agedi.api.dataset
=================

.. py:module:: agedi.api.dataset

.. autoapi-nested-parse::

   Dataset creation.


Functions
---------

.. autoapisummary::

   agedi.api.dataset.create_dataset


Module Contents
---------------

.. py:function:: create_dataset(data: Sequence[ase.Atoms], cutoff: float = 6.0, batch_size: int = 64, train_split: Union[float, int] = 0.9, val_split: Union[float, int] = 0.1, mask: str = 'none', confinement: Optional[Tuple[float, float]] = None, conditioning: str = 'none', conditioning_type: str = 'scalar', repeat: Optional[int] = None, canonical_cell: bool = False, regressor_data: Optional[Sequence[ase.Atoms]] = None, properties: Optional[List[Dict]] = None) -> agedi.data.Dataset

   Create and setup an AGeDi Dataset from ASE Atoms objects.

   :param data: ASE Atoms objects to add to the dataset.
   :type data: Sequence[Atoms]
   :param cutoff: Neighbour-list cutoff radius in Ångström.
   :type cutoff: float, optional
   :param batch_size: Mini-batch size used during training/validation.
   :type batch_size: int, optional
   :param train_split: Fraction or absolute number of samples for the training split.
   :type train_split: Union[float, int], optional
   :param val_split: Fraction or absolute number of samples for the validation split.
   :type val_split: Union[float, int], optional
   :param mask: Atom-mask method (e.g. ``"MaskFixed"`` or ``"none"``).
   :type mask: str, optional
   :param confinement: Z-axis confinement bounds ``(z_min, z_max)``.
   :type confinement: Tuple[float, float], optional
   :param conditioning: Name of the per-structure property to use as a conditioning signal.
                        The value is read from ``atoms.info[conditioning]`` or the
                        corresponding ``atoms.get_()`` method. Ignored when set to
                        ``"none"`` (default).
   :type conditioning: str, optional
   :param conditioning_type: ``"scalar"`` (default) or ``"node"``; controls how the
                             conditioning property is broadcast onto the graph.
   :type conditioning_type: str, optional
   :param repeat: When given, augment the dataset by repeating each structure up to
                  ``repeat`` times along the first two cell vectors.
   :type repeat: int, optional
   :param canonical_cell: Store cells in canonical lower-triangular form.
   :type canonical_cell: bool, optional
   :param regressor_data: Additional ASE Atoms objects used to train a regressor head.
   :type regressor_data: Sequence[Atoms], optional
   :param properties: Per-structure property dictionaries; **must** contain exactly one
                      entry per element in *data*. Each dictionary is merged into the
                      corresponding graph object via ``setattr``, matching the layout
                      accepted by :meth:`~agedi.data.Dataset.add_atoms_data`. Keys
                      already produced by the *conditioning* logic are overwritten by
                      values in *properties* when both are present.
   :type properties: List[Dict], optional

   :returns: A fully set-up :class:`~agedi.data.Dataset` ready for training.
   :rtype: Dataset
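
The contract described for *properties* (one dictionary per structure, merged via ``setattr``, overriding values set by the *conditioning* logic) can be illustrated with a minimal sketch. It is not the AGeDi implementation: ``merge_properties`` is a hypothetical helper, and ``SimpleNamespace`` objects stand in for the real graph objects, so the example runs without ``agedi`` or ``ase`` installed.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Sequence


def merge_properties(
    graphs: Sequence[Any],
    properties: Optional[List[Dict[str, Any]]] = None,
) -> None:
    """Attach per-structure properties to graph objects in place.

    Hypothetical helper: it only mirrors the documented contract
    (one dict per structure, merged via setattr) and is not agedi code.
    """
    if properties is None:
        return
    if len(properties) != len(graphs):
        raise ValueError(
            f"expected {len(graphs)} property dicts, got {len(properties)}"
        )
    for graph, props in zip(graphs, properties):
        for key, value in props.items():
            # setattr replaces any attribute of the same name, so a value
            # attached earlier by a conditioning step is overwritten here.
            setattr(graph, key, value)


# Stand-in graph objects; "energy" plays the role of a conditioning value
# that was attached before the properties are merged.
graphs = [SimpleNamespace(energy=-1.0), SimpleNamespace(energy=-2.0)]
properties = [{"energy": -1.5, "label": "a"}, {"label": "b"}]

merge_properties(graphs, properties)
```

After the call, the first stand-in graph carries the ``energy`` value from *properties* rather than the earlier one, while the second keeps its original ``energy`` and only gains ``label``. Passing a list whose length differs from the number of structures raises ``ValueError`` in this sketch. How ``create_dataset`` itself reports such a mismatch is not stated in the documentation above.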