agedi.api.dataset

Dataset creation.

Functions

create_dataset(→ agedi.data.Dataset)

Create and setup an AGeDi Dataset from ASE Atoms objects.

Module Contents

agedi.api.dataset.create_dataset(data: Sequence[ase.Atoms], cutoff: float = 6.0, batch_size: int = 64, train_split: float | int = 0.9, val_split: float | int = 0.1, mask: str = 'none', confinement: Tuple[float, float] | None = None, conditioning: str = 'none', conditioning_type: str = 'scalar', repeat: int | None = None, canonical_cell: bool = False, regressor_data: Sequence[ase.Atoms] | None = None, properties: List[Dict] | None = None) → agedi.data.Dataset

Create and setup an AGeDi Dataset from ASE Atoms objects.

Parameters:
  • data (Sequence[Atoms]) – ASE Atoms objects to add to the dataset.

  • cutoff (float, optional) – Neighbour-list cutoff radius in Ångström. Defaults to 6.0.

  • batch_size (int, optional) – Mini-batch size used during training and validation. Defaults to 64.

  • train_split (Union[float, int], optional) – Fraction or absolute number of samples for the training split. Defaults to 0.9.

  • val_split (Union[float, int], optional) – Fraction or absolute number of samples for the validation split. Defaults to 0.1.

  • mask (str, optional) – Atom-mask method (e.g. "MaskFixed" or "none"). Defaults to "none".

  • confinement (Tuple[float, float], optional) – Z-axis confinement bounds (z_min, z_max).

  • conditioning (str, optional) – Name of the per-structure property to use as a conditioning signal. The value is read from atoms.info[conditioning] or the corresponding atoms.get_<conditioning>() method. Ignored when set to "none" (default).

  • conditioning_type (str, optional) – "scalar" (default) or "node"; controls how the conditioning property is broadcast onto the graph.

  • repeat (int, optional) – When given, augment the dataset by repeating each structure up to repeat times along the first two cell vectors.

  • canonical_cell (bool, optional) – Store cells in canonical lower-triangular form.

  • regressor_data (Sequence[Atoms], optional) – Additional ASE Atoms objects used to train a regressor head.

  • properties (List[Dict], optional) – Per-structure property dictionaries; must contain exactly one entry per element in data. Each dictionary is merged into the corresponding graph object via setattr, matching the layout accepted by add_atoms_data(). Keys already produced by the conditioning logic are overwritten by values in properties when both are present.
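
The one-entry-per-structure requirement on properties can be checked before the call. The following is a minimal sketch, not part of AGeDi: build_properties and make_dataset are hypothetical helpers, and the "energy" key is an illustrative property name only. The AGeDi import is deferred so that the helper itself runs without the library installed.

```python
from typing import Any, Dict, List, Sequence


def build_properties(n_structures: int, **columns: Sequence[Any]) -> List[Dict[str, Any]]:
    """Turn per-property columns into one dictionary per structure.

    Raises ValueError when a column does not hold exactly one value per
    structure, mirroring the documented requirement on `properties`.
    """
    for name, values in columns.items():
        if len(values) != n_structures:
            raise ValueError(
                f"property {name!r} has {len(values)} values, expected {n_structures}"
            )
    return [
        {name: values[i] for name, values in columns.items()}
        for i in range(n_structures)
    ]


def make_dataset(structures: Sequence[Any], energies: Sequence[float]):
    """Hypothetical wrapper; needs AGeDi installed and ASE Atoms as input."""
    # Imported lazily so build_properties can be used on its own.
    from agedi.api.dataset import create_dataset

    # "energy" is an illustrative key, not one required by the library.
    properties = build_properties(len(structures), energy=list(energies))
    return create_dataset(structures, properties=properties)
```

Building the list from columns keeps the length check in one place; passing a list that is too short or too long then fails early with a clear message, before any graph is built.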

Returns:

A fully set-up Dataset ready for training.

Return type:

Dataset
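
A call with absolute split sizes and a z-axis confinement might look like the sketch below. It relies only on the parameters listed above. The helper names, the numeric values, and the check that z_min is smaller than z_max are assumptions made for illustration, and integers are assumed to be read as absolute sample counts.

```python
from typing import Any, Dict, Sequence


def slab_dataset_kwargs(
    z_min: float, z_max: float, n_train: int, n_val: int
) -> Dict[str, Any]:
    """Collect create_dataset keyword arguments for a z-confined dataset."""
    # Assumption: the bounds are ordered; the reference does not state this.
    if z_min >= z_max:
        raise ValueError("expected z_min < z_max")
    return {
        "cutoff": 6.0,           # documented default
        "batch_size": 64,        # documented default
        "train_split": n_train,  # int, assumed to mean an absolute sample count
        "val_split": n_val,
        "confinement": (z_min, z_max),
    }


def make_slab_dataset(structures: Sequence[Any]):
    """Hypothetical wrapper; needs AGeDi installed and ASE Atoms as input."""
    # Imported lazily so slab_dataset_kwargs works without AGeDi.
    from agedi.api.dataset import create_dataset

    # Illustrative values: 900 training and 100 validation samples.
    kwargs = slab_dataset_kwargs(z_min=5.0, z_max=15.0, n_train=900, n_val=100)
    return create_dataset(structures, **kwargs)
```

Keeping the keyword arguments in a plain dictionary makes the configuration easy to log or reuse across runs.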