agedi.api
=========

.. py:module:: agedi.api

.. autoapi-nested-parse::

   Public API for AGeDi.

   Re-exports all public symbols from the api sub-modules so that
   ``from agedi.api import X`` works for every user-facing name.


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/agedi/api/_display/index
   /autoapi/agedi/api/_registry/index
   /autoapi/agedi/api/dataset/index
   /autoapi/agedi/api/diffusion/index
   /autoapi/agedi/api/prediction/index
   /autoapi/agedi/api/sampling/index
   /autoapi/agedi/api/training/index


Functions
---------

.. autoapisummary::

   agedi.api.register_model
   agedi.api.create_dataset
   agedi.api.create_diffusion
   agedi.api.load_diffusion
   agedi.api.predict
   agedi.api.sample
   agedi.api.create_trainer
   agedi.api.train
   agedi.api.train_from_atoms
   agedi.api.train_from_config


Package Contents
----------------

.. py:function:: register_model(name: str, factory: Callable) -> None

   Register a custom score model backbone factory under *name*.

   The factory is called with the keyword arguments ``cutoff``, ``heads``,
   ``feature_size``, ``n_blocks``, ``head_dim``, and ``n_rbf`` and must return a
   3-tuple ``(translator, representation, List[Head])``.

   Registered models can be selected by passing ``model=name`` to :func:`create_diffusion`.

   :param name: Alias used to select this backend (e.g. ``"PaiNN"``).
   :type name: str
   :param factory: Factory function with signature::

         factory(cutoff, heads, feature_size, n_blocks, head_dim, n_rbf) -> Tuple[Translator, nn.Module, List[Head]]

   :type factory: Callable

   .. rubric:: Examples

   ::

      from agedi.functional import register_model

      def my_factory(cutoff, heads, feature_size, n_blocks, head_dim, n_rbf):
          ...
          return translator, representation, head_list

      register_model("MyModel", my_factory)

.. py:function:: create_dataset(data: Sequence[ase.Atoms], cutoff: float = 6.0, batch_size: int = 64, train_split: Union[float, int] = 0.9, val_split: Union[float, int] = 0.1, mask: str = 'none', confinement: Optional[Tuple[float, float]] = None, conditioning: str = 'none', conditioning_type: str = 'scalar', repeat: Optional[int] = None, canonical_cell: bool = False, regressor_data: Optional[Sequence[ase.Atoms]] = None, properties: Optional[List[Dict]] = None) -> agedi.data.Dataset

   Create and set up an AGeDi Dataset from ASE Atoms objects.

   :param data: ASE Atoms objects to add to the dataset.
   :type data: Sequence[Atoms]
   :param cutoff: Neighbour-list cutoff radius in Ångström.
   :type cutoff: float, optional
   :param batch_size: Mini-batch size used during training/validation.
   :type batch_size: int, optional
   :param train_split: Fraction or absolute number of samples for the training split.
   :type train_split: Union[float, int], optional
   :param val_split: Fraction or absolute number of samples for the validation split.
   :type val_split: Union[float, int], optional
   :param mask: Atom-mask method (e.g. ``"MaskFixed"`` or ``"none"``).
   :type mask: str, optional
   :param confinement: Z-axis confinement bounds ``(z_min, z_max)``.
   :type confinement: Tuple[float, float], optional
   :param conditioning: Name of the per-structure property to use as a conditioning signal.
      The value is read from ``atoms.info[conditioning]`` or the corresponding
      ``atoms.get_<conditioning>()`` method. Ignored when set to ``"none"`` (default).
   :type conditioning: str, optional
   :param conditioning_type: ``"scalar"`` (default) or ``"node"``; controls how the
      conditioning property is broadcast onto the graph.
   :type conditioning_type: str, optional
   :param repeat: When given, augment the dataset by repeating each structure up to
      ``repeat`` times along the first two cell vectors.
   :type repeat: int, optional
   :param canonical_cell: Store cells in canonical lower-triangular form.
   :type canonical_cell: bool, optional
   :param regressor_data: Additional ASE Atoms objects used to train a regressor head.
   :type regressor_data: Sequence[Atoms], optional
   :param properties: Per-structure property dictionaries; **must** contain exactly one
      entry per element in *data*. Each dictionary is merged into the corresponding graph
      object via ``setattr``, matching the layout accepted by
      :meth:`~agedi.data.Dataset.add_atoms_data`. Keys already produced by the
      *conditioning* logic are overwritten by values in *properties* when both are present.
   :type properties: List[Dict], optional

   :returns: A fully set-up :class:`~agedi.data.Dataset` ready for training.
   :rtype: Dataset

.. py:function:: create_diffusion(model: str = 'PaiNN', cutoff: float = 6.0, feature_size: int = 64, n_blocks: int = 4, n_rbf: int = 30, noisers: Sequence[Union[str, Noiser]] = ('CellPositions', ), sde: Union[str, SDE] = 've', conditioning: str = 'none', conditioning_type: str = 'scalar', confinement: Optional[Tuple[float, float]] = None, force_field: bool = False, lr: float = 0.0001, lr_factor: float = 0.95, lr_patience: int = 100, weight_decay: float = 0.0, eps: float = 1e-05, guidance_weight: float = -1.0, device: Optional[Union[str, torch.device]] = None, type_map: Optional[List[int]] = None) -> agedi.Agedi

   Create a diffusion model for script-based training and sampling.

   :param model: GNN backbone architecture. The name is looked up in the model registry;
      use :func:`register_model` to add custom backends. The built-in default is
      ``"PaiNN"`` (SchNetPack PaiNN).
   :type model: str, optional
   :param cutoff: Neighbour-list cutoff radius in Å. Defaults to ``6.0``.
   :type cutoff: float, optional
   :param feature_size: Embedding / feature dimension. Defaults to ``64``.
   :type feature_size: int, optional
   :param n_blocks: Number of interaction blocks. Defaults to ``4``.
   :type n_blocks: int, optional
   :param n_rbf: Number of radial basis functions. Defaults to ``30``.
   :type n_rbf: int, optional
   :param noisers: Noiser identifiers or instances to include. Defaults to
      ``("CellPositions",)``. Recognised string identifiers (CamelCase preferred;
      snake_case aliases also accepted for backwards compatibility):

      * ``"Positions"`` / ``"positions"`` – :class:`~agedi.diffusion.noisers.Positions`
        (StandardNormal prior + Normal, for gas-phase clusters).
      * ``"CellPositions"`` / ``"cell_positions"`` –
        :class:`~agedi.diffusion.noisers.CellPositions`
        (UniformCell prior + Normal, for periodic bulk/surface systems).
      * ``"ConfinedCellPositions"`` / ``"confined_cell_positions"`` –
        :class:`~agedi.diffusion.noisers.ConfinedCellPositions`
        (UniformCellConfined prior + TruncatedNormal, for Z-confined systems).
      * ``"Types"`` / ``"types"`` – :class:`~agedi.diffusion.noisers.Types`.
   :type noisers: Sequence[str or Noiser], optional
   :param sde: SDE for position noisers. Short aliases: ``"ve"`` (default), ``"vp"``.
      Pass an instantiated :class:`~agedi.diffusion.sdes.SDE` for full control.
   :type sde: str or SDE, optional
   :param conditioning: Property to condition on, or ``"none"`` for time-only conditioning.
      Defaults to ``"none"``.
   :type conditioning: str, optional
   :param conditioning_type: Type of the conditioning module: ``"scalar"`` or ``"integer"``.
      Defaults to ``"scalar"``.
   :type conditioning_type: str, optional
   :param confinement: Z-direction confinement bounds ``(z_min, z_max)`` in Å.
   :type confinement: Tuple[float, float], optional
   :param force_field: When ``True``, attach a ``diffusion.regressor_model``. The head
      **shares** the same representation and translator as the score model so that atomic
      embeddings are learned jointly. It is trained whenever the training batch contains
      per-atom forces and total energies (i.e. the ASE training structures carry energies
      and forces from DFT or another reference method). The trained forces head enables
      force-field guided sampling via :class:`~agedi.diffusion.ForcefieldGuidanceConfig`.
      Defaults to ``False``.
   :type force_field: bool, optional
   :param lr: Learning rate. Defaults to ``1e-4``.
   :type lr: float, optional
   :param lr_factor: LR-scheduler reduction factor. Defaults to ``0.95``.
   :type lr_factor: float, optional
   :param lr_patience: LR-scheduler patience (epochs). Defaults to ``100``.
   :type lr_patience: int, optional
   :param weight_decay: Optimizer weight-decay. Defaults to ``0.0``.
   :type weight_decay: float, optional
   :param eps: Minimum diffusion time. Defaults to ``1e-5``.
   :type eps: float, optional
   :param guidance_weight: Classifier-free guidance weight. Defaults to ``-1.0`` (disabled).
   :type guidance_weight: float, optional
   :param device: Target compute device. When ``None`` CUDA is used if available,
      otherwise CPU.
   :type device: str or torch.device, optional
   :param type_map: Compact type map for the :class:`~agedi.diffusion.noisers.Types` noiser.
      ``type_map[0]`` must be ``0`` (absorbing state) and ``type_map[i]`` is the atomic
      number for compact index ``i``. When provided, the ``Types`` noiser and the
      ``TypesScore`` head use a reduced vocabulary of size ``len(type_map)`` instead of the
      default 100. Auto-populated by :func:`train_from_atoms` when a ``"Types"`` noiser is
      requested.
   :type type_map: List[int], optional

   :returns: A freshly initialised :class:`~agedi.Agedi` model.
   :rtype: Agedi

.. py:function:: load_diffusion(path: Union[str, pathlib.Path], checkpoint: Optional[Union[str, pathlib.Path]] = None, device: Optional[Union[str, torch.device]] = None) -> Agedi

   Load a trained diffusion model from an AGeDi log directory.

   The model architecture is fully reconstructed from the Hydra-compatible ``diffusion``
   config stored in ``hparams.yaml``, so no additional parameters are needed.

   :param path: Path to the AGeDi log / model directory (or directly to the
      ``hparams.yaml`` file).
   :param checkpoint: Path to a specific checkpoint file. When ``None`` the latest
      checkpoint (``checkpoints/last_model.ckpt``) is loaded automatically.
   :param device: Device to load the model onto. When ``None`` CUDA is used if available,
      otherwise CPU.

.. py:function:: predict(diffusion: Agedi, structures: Sequence[ase.Atoms], *, batch_size: int = 64, cutoff: Optional[float] = None) -> List[ase.Atoms]

   Predict energies and forces for input structures using a trained force-field.

   The model must have been trained with ``force_field=True`` (i.e. it must have a
   ``regressor_model`` attached). The predicted energy and forces are attached to the
   returned :class:`~ase.Atoms` objects via an
   :class:`~ase.calculators.singlepoint.SinglePointCalculator`.

   :param diffusion: A trained :class:`~agedi.Agedi` model with a force-field regressor
      (trained with ``--force_field``).
   :param structures: Input ASE :class:`~ase.Atoms` objects to run predictions on.
   :param batch_size: Number of structures per inference batch. Defaults to ``64``.
   :param cutoff: Neighbour-list cutoff in Å. When ``None`` (default), the cutoff is read
      from the model's representation automatically.

   :returns: The input structures with a
             :class:`~ase.calculators.singlepoint.SinglePointCalculator` attached
             containing the predicted energy and/or forces.
   :rtype: List[Atoms]

   :raises ValueError: If the model does not have a force-field regressor.

.. py:function:: sample(diffusion: Agedi, *, n_samples: int, n_atoms: Optional[int] = None, atomic_numbers: Optional[List[int]] = None, formula: Optional[str] = None, positions: Optional[numpy.ndarray] = None, cell: Optional[numpy.ndarray] = None, pbc: Optional[numpy.ndarray] = None, template: Optional[Union[agedi.data.AtomsGraph, ase.Atoms]] = None, confinement: Optional[Tuple[float, float]] = None, compile: bool = False, steps: int = 500, eps: float = 0.001, batch_size: int = 64, ff_guidance: Optional[agedi.diffusion.ForcefieldGuidanceConfig] = None, property: Optional[Dict[str, float]] = None, progress_bar: bool = False, save_trajectory: bool = False, print_timings: bool = False, as_atoms: bool = True) -> Union[List[agedi.data.AtomsGraph], List[ase.Atoms], List[List[agedi.data.AtomsGraph]], List[List[ase.Atoms]]]

   Sample structures from a trained diffusion model.

   :param diffusion: A trained :class:`~agedi.Agedi` model.
   :param n_samples: Number of structures to generate.
   :param n_atoms: Number of atoms per structure. Automatically determined from
      ``formula`` if provided, or from the length of ``atomic_numbers`` when ``n_atoms``
      is not explicitly given.
   :param atomic_numbers: Atomic numbers of the generated atoms. Not required when the
      model has a types-noiser or when ``formula`` is provided.
   :param formula: Chemical formula (e.g. ``"H2O"``). Used to derive ``n_atoms`` and
      ``atomic_numbers`` when they are not provided explicitly.
   :param positions: Fixed positions of the atoms (shape ``(n_atoms, 3)``). Required when
      no positions-noiser is configured (type-only diffusion). Positions will not be
      modified during sampling.
   :param cell: Unit-cell matrix (3×3 array or flat length-9 array). Not required when
      ``template`` is provided (the template's cell is used instead).
   :param pbc: Periodic boundary conditions as a length-3 boolean array
      (e.g. ``[True, True, False]``). When ``template`` is provided its ``pbc`` is used
      unless this argument is given explicitly.
      Defaults to ``[True, True, True]`` (fully periodic) when neither ``template`` nor
      ``pbc`` is supplied.
   :param template: Template structure. May be an :class:`~agedi.AtomsGraph` or an ASE
      :class:`~ase.Atoms` object; the latter is automatically converted to an
      :class:`~agedi.AtomsGraph` (with ``confinement`` applied when provided). When given,
      ``cell`` and ``pbc`` are taken from the template unless explicitly provided.
   :param ff_guidance: Force-field guidance configuration. When ``None`` (default) a
      :class:`~agedi.diffusion.ForcefieldGuidanceConfig` with default values is used
      (i.e. guidance is disabled).
   :param compile: When ``True``, use ``torch.compile`` on the reverse diffusion step for
      faster sampling. Before the sampling loop starts, the maximum number of neighbors
      and cell-list dimensions are estimated automatically via NVIDIA nvalchemiops
      (``estimate_max_neighbors`` and ``estimate_cell_list_sizes``), and all neighbor-list
      buffers are pre-allocated with fixed shapes. Requires NVIDIA nvalchemiops.
      Defaults to ``False``.
   :param print_timings: When ``True``, print a per-stage timing breakdown at the end of
      each sampling batch (graph init, score model, denoise, neighbor list, etc.).
      Defaults to ``False``.

.. py:function:: create_trainer(*, epochs: int = -1, max_time: Optional[Union[int, Dict, datetime.timedelta]] = 24, accelerator: str = 'auto', devices: int = 1, logger: str = 'tensorboard', log_dir: str = 'logs', project: str = 'agedi', name: str = 'agedi', log_interval: int = 10, gradient_clip_val: float = 10.0, progress_bar: bool = False, print_epoch_interval: int = 10, log_grad_norm: bool = True, repeat: Optional[int] = None, repeat_epoch: Optional[int] = None, hparams: Optional[Dict] = None, extra_callbacks: Optional[List[lightning.pytorch.callbacks.Callback]] = None) -> lightning.Trainer

   Create a Lightning trainer configured for AGeDi.

   :param epochs: Maximum number of training epochs (``-1`` = unlimited).
   :param max_time: Wall-clock time limit for training.
      Accepts:

      * ``int`` – number of *hours* (e.g. ``24`` ≡ 24 hours).
      * ``dict`` – Lightning-style mapping, e.g.
        ``{"days": 0, "hours": 12, "minutes": 30, "seconds": 0}``.
      * :class:`datetime.timedelta` – a Python timedelta object.
      * ``None`` – no time limit.
   :param accelerator: Hardware accelerator to use (e.g. ``"auto"``, ``"gpu"``, ``"cpu"``).
      Default: ``"auto"``.
   :param devices: Number of devices to train on. Default: ``1``.
   :param logger: Logging backend: ``"tensorboard"`` (default) or ``"wandb"``.
   :param log_dir: Root directory for logs and checkpoints. Default: ``"logs"``.
   :param project: WandB project name (only used when ``logger="wandb"``).
   :param name: Experiment display name used by TensorBoard and WandB as the run
      sub-directory / run name. Default: ``"agedi"``.
   :param log_interval: How often (in steps) to log metrics. Default: ``10``.
   :param gradient_clip_val: Maximum gradient norm for gradient clipping. Default: ``10.0``.
   :param progress_bar: Whether to show a Lightning progress bar. Default: ``False``.
   :param print_epoch_interval: Interval, in epochs, at which a one-line training summary
      is printed to stdout. Set to ``0`` to disable. Default: ``10``.
   :param log_grad_norm: Whether to log the total gradient norm during training. Disable
      for large models where the per-step overhead is undesirable. Default: ``True``.
   :param repeat: Number of repetition levels for cell-repeat data augmentation. Must be
      set together with *repeat_epoch*. When ``None`` (default), no repetition
      augmentation is applied.
   :param repeat_epoch: How many epochs between repetition-level increases. Required when
      *repeat* is set.
   :param hparams: Hyperparameters dict logged to ``hparams.yaml`` via
      :class:`~agedi.data.callbacks.HParamsMetricLogger`. When ``None`` (default), no
      extra hyperparameter logging is performed.
   :param extra_callbacks: Extra Lightning callbacks to append to the default callback
      list. When ``None`` (default) only the built-in callbacks are used.

   :returns: A configured :class:`~lightning.Trainer` ready to call
             ``trainer.fit(diffusion, dataset)``.
   :rtype: lightning.Trainer

.. py:function:: train(diffusion: Agedi, dataset: agedi.data.Dataset, trainer: Optional[lightning.Trainer] = None, ckpt_path: Optional[Union[str, pathlib.Path]] = None, **trainer_kwargs) -> lightning.Trainer

   Train a diffusion model and return the trainer used.

   :param diffusion: The diffusion model to train.
   :param dataset: The dataset to train on.
   :param trainer: A pre-configured Lightning :class:`~lightning.Trainer`. When ``None``
      a new trainer is created from *trainer_kwargs*.
   :param ckpt_path: Path to a Lightning checkpoint (``.ckpt``) to resume training from.
      When provided the full training state (model weights, optimiser, LR-scheduler, and
      epoch counter) is restored before fitting. Equivalent to passing ``ckpt_path`` to
      ``trainer.fit()``.
   :param \*\*trainer_kwargs: Additional keyword arguments forwarded to
      :func:`create_trainer` when *trainer* is ``None``.

.. py:function:: train_from_atoms(data: Sequence[ase.Atoms], *, model: str = 'PaiNN', cutoff: float = 6.0, feature_size: int = 64, n_blocks: int = 4, n_rbf: int = 30, noisers: Sequence[str] = ('CellPositions', ), sde: Union[str, SDE] = 've', conditioning: str = 'none', conditioning_type: str = 'scalar', mask: str = 'none', confinement: Optional[Tuple[float, float]] = None, force_field: bool = False, batch_size: int = 64, train_split: Union[float, int] = 0.9, val_split: Union[float, int] = 0.1, repeat: Optional[int] = None, canonical_cell: bool = False, lr: float = 0.0001, lr_factor: float = 0.95, lr_patience: int = 100, weight_decay: float = 0.0, eps: float = 1e-05, guidance_weight: float = -1.0, data_path: Optional[str] = None, regressor_data: Optional[Sequence[ase.Atoms]] = None, checkpoint: Optional[Union[str, pathlib.Path]] = None, trainer: Optional[lightning.Trainer] = None, n_classes: Optional[int] = None, **trainer_kwargs) -> Tuple[Agedi, agedi.data.Dataset, lightning.Trainer]

   Build (or restore), train, and return an AGeDi model from ASE Atoms data.

   When a ``"Types"`` noiser is included and no *checkpoint* is given, the unique element
   types present in *data* are automatically detected and a compact type map is built so
   that the vocabulary size equals the number of distinct element types (plus the
   absorbing state at index 0). The ``n_classes`` parameter can be used to restrict the
   vocabulary to the *n_classes* most frequently occurring element types (sorted by
   atomic number).

   :param data: ASE :class:`~ase.Atoms` objects to train on.
   :param model: GNN backbone architecture name. Looked up in the model registry; use
      :func:`register_model` to add custom backends. Default: ``"PaiNN"`` (SchNetPack PaiNN).
   :param cutoff: Neighbour-list cutoff radius in Å. Default: ``6.0``.
   :param feature_size: Embedding / feature dimension. Default: ``64``.
   :param n_blocks: Number of interaction blocks in the GNN backbone. Default: ``4``.
   :param n_rbf: Number of radial basis functions. Default: ``30``.
   :param noisers: Sequence of noiser identifiers. Recognised string identifiers:
      ``"Positions"``, ``"CellPositions"``, ``"ConfinedCellPositions"``, ``"Types"``
      (snake_case aliases also accepted). Default: ``("CellPositions",)``.
   :param sde: SDE for position noisers. Short aliases: ``"ve"`` (default), ``"vp"``.
      Pass an instantiated :class:`~agedi.diffusion.sdes.SDE` for full control.
   :param conditioning: Per-structure property to condition on (read from
      ``atoms.info[conditioning]`` or ``atoms.get_<conditioning>()``), or ``"none"`` for
      time-only conditioning (default).
   :param conditioning_type: Type of the conditioning module: ``"scalar"`` (default) or
      ``"integer"``.
   :param mask: Atom-masking strategy: ``"MaskFixed"`` (freeze atoms tagged with ASE
      :class:`~ase.constraints.FixAtoms`) or ``"none"`` (default).
   :param confinement: Z-direction confinement bounds ``(z_min, z_max)`` in Å. Required
      when using the ``"ConfinedCellPositions"`` noiser.
   :param force_field: When ``True``, attach a regressor head (sharing the backbone) that
      predicts per-atom forces and total energy. Enables force-field guided sampling via
      :class:`~agedi.diffusion.ForcefieldGuidanceConfig`. The training data must contain
      DFT (or other) forces and energy. Default: ``False``.
   :param batch_size: Mini-batch size used during training. Default: ``64``.
   :param train_split: Fraction or absolute count of structures for the training split.
      Default: ``0.9``.
   :param val_split: Fraction or absolute count of structures for the validation split.
      Default: ``0.1``.
   :param repeat: When given, augment the dataset by repeating each structure up to
      ``repeat`` times along the first two cell vectors. Requires ``repeat_epoch``
      (passed via ``**trainer_kwargs``) to specify how often the repetition level increases.
   :param canonical_cell: Store unit cells in canonical lower-triangular form.
      Default: ``False``.
   :param lr: Learning rate. Default: ``1e-4``.
   :param lr_factor: LR-scheduler reduction factor. Default: ``0.95``.
   :param lr_patience: LR-scheduler patience (epochs). Default: ``100``.
   :param weight_decay: Optimiser weight decay. Default: ``0.0``.
   :param eps: Minimum diffusion time value. Default: ``1e-5``.
   :param guidance_weight: Classifier-free guidance weight. Default: ``-1.0`` (disabled).
   :param data_path: String path to the training data file; stored in ``hparams.yaml``
      for reference only. When ``None``, no path metadata is saved.
   :param regressor_data: Optional additional ASE Atoms objects used *exclusively* for
      training the force-field regressor head. Structures here are never passed through
      the diffusion loss. Each structure must have an ASE calculator with energy and
      forces attached.
   :param checkpoint: Path to a previously saved run directory (containing
      ``hparams.yaml``) or directly to a ``.ckpt`` checkpoint file. When provided the
      model architecture and weights are loaded from the checkpoint instead of being
      built from the architecture parameters (*model*, *cutoff*, *feature_size*, etc.).
      The full training state (optimiser, LR-scheduler, epoch counter) is also restored
      so that training continues seamlessly. Supply *data* to train on new data, or use
      the original data path to resume on the same dataset.
   :param trainer: A pre-configured Lightning :class:`~lightning.Trainer`. When ``None``
      (default) a new trainer is built from ``**trainer_kwargs``.
   :param n_classes: Number of element-type classes to use for the
      :class:`~agedi.diffusion.noisers.Types` noiser (not counting the absorbing state at
      index 0). When ``None`` (default), all distinct element types present in *data* are
      used. Must not exceed the number of distinct types in the training data. Ignored
      when *checkpoint* is provided (the vocabulary is loaded from the checkpoint).
   :param \*\*trainer_kwargs: Additional keyword arguments forwarded to
      :func:`create_trainer` when *trainer* is ``None``.
      Common keys: ``epochs``, ``max_time``, ``logger``, ``log_dir``,
      ``gradient_clip_val``, ``repeat_epoch``.

   :returns: The trained diffusion model, the dataset, and the Lightning trainer.
   :rtype: Tuple[Agedi, Dataset, Trainer]

.. py:function:: train_from_config(config: Union[str, pathlib.Path, Dict]) -> Tuple[Agedi, agedi.data.Dataset, lightning.Trainer]

   Train an AGeDi model from a YAML configuration file or dictionary.

   This is the *Hydra-style* entry point. The configuration can be provided as:

   * a path to a YAML file (``str`` or :class:`~pathlib.Path`),
   * a plain Python ``dict``,
   * a Hydra / OmegaConf ``DictConfig``.

   The function loads the training data from ``config["data_path"]`` (an ASE-readable
   file) and delegates to :func:`train_from_atoms` with the remaining configuration values.

   The minimal required key is ``data_path``. All other keys are optional and fall back
   to the same defaults as :func:`train_from_atoms`. A ready-to-edit template is shipped
   with the package at ``agedi/conf/train.yaml``.

   :param config: Configuration source – a YAML file path, a ``dict``, or an OmegaConf
      ``DictConfig``.

   :returns: The trained diffusion model, the dataset used, and the Lightning trainer.
   :rtype: Tuple[Agedi, Dataset, Trainer]

   .. rubric:: Examples

   Minimal Python usage::

       from agedi import train_from_config
       diffusion, dataset, trainer = train_from_config("conf/train.yaml")

   Programmatic override::

       from agedi import train_from_config
       cfg = {"data_path": "train.traj", "epochs": 50, "feature_size": 128}
       diffusion, _, _ = train_from_config(cfg)
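
Because :func:`train_from_config` takes a plain ``dict`` in which only ``data_path`` is required, a misspelled optional key is easy to miss. The sketch below shows one way to assemble such a dictionary and check its keys first. ``make_train_config`` and the two key sets are hypothetical helpers, not part of AGeDi. The key names are copied from the :func:`train_from_atoms` and :func:`create_trainer` signatures documented above, and it is an assumption that :func:`train_from_config` accepts exactly these keys.

```python
# Hypothetical helper (not part of AGeDi): build a config dict for
# train_from_config and reject misspelled keys before a long training run.

# Keyword names taken from the documented train_from_atoms signature
# (object-valued arguments such as `trainer` and `regressor_data` are left out).
MODEL_KEYS = {
    "model", "cutoff", "feature_size", "n_blocks", "n_rbf", "noisers", "sde",
    "conditioning", "conditioning_type", "mask", "confinement", "force_field",
    "batch_size", "train_split", "val_split", "repeat", "canonical_cell",
    "lr", "lr_factor", "lr_patience", "weight_decay", "eps",
    "guidance_weight", "checkpoint", "n_classes",
}

# Keyword names taken from the documented create_trainer signature.
TRAINER_KEYS = {
    "epochs", "max_time", "accelerator", "devices", "logger", "log_dir",
    "project", "name", "log_interval", "gradient_clip_val", "progress_bar",
    "print_epoch_interval", "log_grad_norm", "repeat_epoch",
}


def make_train_config(data_path, **overrides):
    """Return a config dict with ``data_path`` plus validated overrides."""
    if not data_path:
        raise ValueError("data_path is the one required key")
    unknown = set(overrides) - MODEL_KEYS - TRAINER_KEYS
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return {"data_path": data_path, **overrides}


cfg = make_train_config("train.traj", epochs=50, feature_size=128)
print(cfg)
```

The resulting ``cfg`` has the same shape as the dictionary in the "Programmatic override" example and could be passed to ``train_from_config(cfg)``. Whether AGeDi itself rejects unknown keys is not stated in this reference, so the check here is only a local safeguard.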