agedi.api.dataset
Dataset creation.
Functions
create_dataset(data, ...) – Create and set up an AGeDi Dataset from ASE Atoms objects.
Module Contents
- agedi.api.dataset.create_dataset(data: Sequence[ase.Atoms], cutoff: float = 6.0, batch_size: int = 64, train_split: float | int = 0.9, val_split: float | int = 0.1, mask: str = 'none', confinement: Tuple[float, float] | None = None, conditioning: str = 'none', conditioning_type: str = 'scalar', repeat: int | None = None, canonical_cell: bool = False, regressor_data: Sequence[ase.Atoms] | None = None, properties: List[Dict] | None = None) → agedi.data.Dataset
Create and set up an AGeDi Dataset from ASE Atoms objects.
- Parameters:
data (Sequence[Atoms]) – ASE Atoms objects to add to the dataset.
cutoff (float, optional) – Neighbour-list cutoff radius in Ångström.
batch_size (int, optional) – Mini-batch size used during training/validation.
train_split (Union[float, int], optional) – Fraction or absolute number of samples for the training split.
val_split (Union[float, int], optional) – Fraction or absolute number of samples for the validation split.
mask (str, optional) – Atom-mask method (e.g. "MaskFixed" or "none").
confinement (Tuple[float, float], optional) – Z-axis confinement bounds (z_min, z_max).
conditioning (str, optional) – Name of the per-structure property to use as a conditioning signal. The value is read from atoms.info[conditioning] or the corresponding atoms.get_<conditioning>() method. Ignored when set to "none" (default).
conditioning_type (str, optional) – "scalar" (default) or "node"; controls how the conditioning property is broadcast onto the graph.
repeat (int, optional) – When given, augment the dataset by repeating each structure up to repeat times along the first two cell vectors.
canonical_cell (bool, optional) – Store cells in canonical lower-triangular form.
regressor_data (Sequence[Atoms], optional) – Additional ASE Atoms objects used to train a regressor head.
properties (List[Dict], optional) – Per-structure property dictionaries; must contain exactly one entry per element in data. Each dictionary is merged into the corresponding graph object via setattr, matching the layout accepted by add_atoms_data(). Keys already produced by the conditioning logic are overwritten by values in properties when both are present.
- Returns:
A fully set-up Dataset ready for training.
- Return type:
agedi.data.Dataset
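The train_split and val_split parameters each accept either a fraction or an absolute number of samples. The following is a minimal sketch of how such values could be turned into sample counts. resolve_split_sizes is a hypothetical helper, not part of AGeDi, and it assumes that a float means a fraction of the data while an int means an absolute number of structures.

```python
from typing import Tuple, Union


def resolve_split_sizes(
    n_samples: int,
    train_split: Union[float, int] = 0.9,
    val_split: Union[float, int] = 0.1,
) -> Tuple[int, int]:
    """Hypothetical helper: convert split specifications into sample counts.

    Assumption: a float is a fraction of n_samples, an int is an absolute count.
    """

    def to_count(split: Union[float, int]) -> int:
        # bool is a subclass of int; reject it so True/False is never read as 1/0.
        if isinstance(split, bool):
            raise TypeError("split must be a float or an int, not bool")
        if isinstance(split, float):
            if not 0.0 <= split <= 1.0:
                raise ValueError(f"fractional split must be in [0, 1], got {split}")
            return int(round(split * n_samples))
        return int(split)

    n_train = to_count(train_split)
    n_val = to_count(val_split)
    # The two subsets must fit inside the available data.
    if n_train + n_val > n_samples:
        raise ValueError(
            f"train ({n_train}) + val ({n_val}) exceeds {n_samples} samples"
        )
    return n_train, n_val
```

With 100 structures and the default splits, resolve_split_sizes(100) returns (90, 10). Rejecting splits that together exceed the dataset size surfaces configuration mistakes before any data is partitioned.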
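The conditioning and properties parameters together determine which per-structure values end up on each graph. The sketch below mirrors the documented behaviour under stated assumptions. read_conditioning, attach_properties and attach_all are hypothetical helpers. The conditioning value is assumed to be stored under an attribute with the same name as the property, and atoms.info is assumed to take precedence over atoms.get_<conditioning>() (the documentation names both sources but not their order). FakeAtoms is a small stand-in for ase.Atoms so the example runs without ASE, and "band_gap" and "phase" are illustrative keys.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Sequence


class FakeAtoms:
    """Minimal stand-in for ase.Atoms (illustration only)."""

    def __init__(self, info: Dict[str, Any], energy: float) -> None:
        self.info = info
        self._energy = energy

    def get_potential_energy(self) -> float:
        return self._energy


def read_conditioning(atoms: Any, conditioning: str) -> Optional[Any]:
    """Return the conditioning value for one structure, or None if disabled."""
    if conditioning == "none":
        return None  # the default: no conditioning signal
    info = getattr(atoms, "info", {})
    if conditioning in info:  # assumed to take precedence over the getter
        return info[conditioning]
    getter = getattr(atoms, f"get_{conditioning}", None)
    if getter is None:
        raise KeyError(f"neither info[{conditioning!r}] nor get_{conditioning}() found")
    return getter()


def attach_properties(graph: Any, atoms: Any, conditioning: str,
                      props: Optional[Dict[str, Any]] = None) -> Any:
    """Set the conditioning attribute, then merge props via setattr.

    props is applied last, so its keys overwrite a conditioning value
    stored under the same name, as the parameter description states.
    """
    value = read_conditioning(atoms, conditioning)
    if value is not None:
        setattr(graph, conditioning, value)
    for key, val in (props or {}).items():
        setattr(graph, key, val)
    return graph


def attach_all(graphs: Sequence[Any], data: Sequence[Any], conditioning: str,
               properties: Optional[List[Dict[str, Any]]] = None) -> List[Any]:
    """Apply attach_properties to every structure, one dict per structure."""
    if properties is not None and len(properties) != len(data):
        raise ValueError(
            f"properties has {len(properties)} entries but data has {len(data)}"
        )
    per_structure = properties if properties is not None else [None] * len(data)
    return [attach_properties(g, a, conditioning, p)
            for g, a, p in zip(graphs, data, per_structure)]


atoms = FakeAtoms(info={"band_gap": 1.2}, energy=-3.5)
graph = attach_properties(SimpleNamespace(), atoms, "band_gap",
                          {"band_gap": 2.0, "phase": "fcc"})
print(graph.band_gap, graph.phase)  # 2.0 fcc
```

Applying the property dictionary after the conditioning lookup is what makes an explicit entry in properties win over the value read from the structure itself.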