agedi.api
=========

.. py:module:: agedi.api

.. autoapi-nested-parse::

   Public API for AGeDi.

   Re-exports all public symbols from the api sub-modules so that
   ``from agedi.api import X`` works for every user-facing name.


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/agedi/api/_display/index
   /autoapi/agedi/api/_registry/index
   /autoapi/agedi/api/dataset/index
   /autoapi/agedi/api/diffusion/index
   /autoapi/agedi/api/prediction/index
   /autoapi/agedi/api/sampling/index
   /autoapi/agedi/api/training/index


Functions
---------

.. autoapisummary::

   agedi.api.register_model
   agedi.api.create_dataset
   agedi.api.create_diffusion
   agedi.api.load_diffusion
   agedi.api.predict
   agedi.api.sample
   agedi.api.create_trainer
   agedi.api.train
   agedi.api.train_from_atoms
   agedi.api.train_from_config


Package Contents
----------------

.. py:function:: register_model(name: str, factory: Callable) -> None

   Register a custom score model backbone factory under *name*.

   The factory is called with the keyword arguments ``cutoff``,
   ``heads``, ``feature_size``, ``n_blocks``, ``head_dim``, and ``n_rbf``
   and must return a 3-tuple ``(translator, representation, List[Head])``.

   Registered models can be selected by passing ``model=name`` to
   :func:`create_diffusion`.

   :param name: Alias used to select this backend (e.g. ``"PaiNN"``).
   :type name: str
   :param factory:
                   Factory function with signature::

                       factory(cutoff, heads, feature_size, n_blocks, head_dim, n_rbf)
                           -> Tuple[Translator, nn.Module, List[Head]]
   :type factory: Callable

   .. rubric:: Examples

   ::

       from agedi.api import register_model

       def my_factory(cutoff, heads, feature_size, n_blocks, head_dim, n_rbf):
           ...
           return translator, representation, head_list

       register_model("MyModel", my_factory)


.. py:function:: create_dataset(data: Sequence[ase.Atoms], cutoff: float = 6.0, batch_size: int = 64, train_split: Union[float, int] = 0.9, val_split: Union[float, int] = 0.1, mask: str = 'none', confinement: Optional[Tuple[float, float]] = None, conditioning: str = 'none', conditioning_type: str = 'scalar', repeat: Optional[int] = None, canonical_cell: bool = False, regressor_data: Optional[Sequence[ase.Atoms]] = None, properties: Optional[List[Dict]] = None) -> agedi.data.Dataset

   Create and set up an AGeDi Dataset from ASE Atoms objects.

   :param data: ASE Atoms objects to add to the dataset.
   :type data: Sequence[Atoms]
   :param cutoff: Neighbour-list cutoff radius in Ångström.
   :type cutoff: float, optional
   :param batch_size: Mini-batch size used during training/validation.
   :type batch_size: int, optional
   :param train_split: Fraction or absolute number of samples for the training split.
   :type train_split: Union[float, int], optional
   :param val_split: Fraction or absolute number of samples for the validation split.
   :type val_split: Union[float, int], optional
   :param mask: Atom-mask method (e.g. ``"MaskFixed"`` or ``"none"``).
   :type mask: str, optional
   :param confinement: Z-axis confinement bounds ``(z_min, z_max)``.
   :type confinement: Tuple[float, float], optional
   :param conditioning: Name of the per-structure property to use as a conditioning signal.
                        The value is read from ``atoms.info[conditioning]`` or the
                        corresponding ``atoms.get_<conditioning>()`` method.  Ignored when
                        set to ``"none"`` (default).
   :type conditioning: str, optional
   :param conditioning_type: ``"scalar"`` (default) or ``"node"``; controls how the conditioning
                             property is broadcast onto the graph.
   :type conditioning_type: str, optional
   :param repeat: When given, augment the dataset by repeating each structure up to
                  ``repeat`` times along the first two cell vectors.
   :type repeat: int, optional
   :param canonical_cell: Store cells in canonical lower-triangular form.
   :type canonical_cell: bool, optional
   :param regressor_data: Additional ASE Atoms objects used to train a regressor head.
   :type regressor_data: Sequence[Atoms], optional
   :param properties: Per-structure property dictionaries; **must** contain exactly one
                      entry per element in *data*.  Each dictionary is merged into the
                      corresponding graph object via ``setattr``, matching the layout
                      accepted by :meth:`~agedi.data.Dataset.add_atoms_data`.  Keys
                      already produced by the *conditioning* logic are overwritten by
                      values in *properties* when both are present.
   :type properties: List[Dict], optional

   :returns: A fully set-up :class:`~agedi.data.Dataset` ready for training.
   :rtype: Dataset
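
The precedence rule described for *properties* in :func:`create_dataset` can be pictured with a small stand-alone sketch. It does not use AGeDi's classes: ``SimpleNamespace`` stands in for a graph object and ``merge_properties`` is a hypothetical helper name. Only the one-entry-per-structure check and the ``setattr`` overwrite behaviour are taken from the parameter description; the actual implementation may differ.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Sequence


def merge_properties(graphs: Sequence[Any], properties: List[Dict[str, Any]]) -> None:
    """Attach one property dict to each graph-like object, in order."""
    # The documentation requires exactly one dictionary per structure.
    if len(properties) != len(graphs):
        raise ValueError(
            f"expected {len(graphs)} property dicts, got {len(properties)}"
        )
    for graph, props in zip(graphs, properties):
        for key, value in props.items():
            # setattr overwrites any attribute already set, e.g. by conditioning.
            setattr(graph, key, value)


# Stand-in graphs that already carry a value from the conditioning logic.
graphs = [SimpleNamespace(energy=-1.0), SimpleNamespace(energy=-2.0)]
merge_properties(graphs, [{"energy": -1.5, "label": "a"}, {"label": "b"}])
```

After the call, the first stand-in graph carries the overriding ``energy`` from *properties*, while the second keeps its original value and only gains ``label``.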


.. py:function:: create_diffusion(model: str = 'PaiNN', cutoff: float = 6.0, feature_size: int = 64, n_blocks: int = 4, n_rbf: int = 30, noisers: Sequence[Union[str, Noiser]] = ('CellPositions', ), sde: Union[str, SDE] = 've', conditioning: str = 'none', conditioning_type: str = 'scalar', confinement: Optional[Tuple[float, float]] = None, force_field: bool = False, lr: float = 0.0001, lr_factor: float = 0.95, lr_patience: int = 100, weight_decay: float = 0.0, eps: float = 1e-05, guidance_weight: float = -1.0, device: Optional[Union[str, torch.device]] = None, type_map: Optional[List[int]] = None) -> agedi.Agedi

   Create a diffusion model for script-based training and sampling.

   :param model: GNN backbone architecture.  The name is looked up in the model
                 registry; use :func:`register_model` to add custom backends.
                 The built-in default is ``"PaiNN"`` (SchNetPack PaiNN).
   :type model: str, optional
   :param cutoff: Neighbour-list cutoff radius in Å.  Defaults to ``6.0``.
   :type cutoff: float, optional
   :param feature_size: Embedding / feature dimension.  Defaults to ``64``.
   :type feature_size: int, optional
   :param n_blocks: Number of interaction blocks.  Defaults to ``4``.
   :type n_blocks: int, optional
   :param n_rbf: Number of radial basis functions.  Defaults to ``30``.
   :type n_rbf: int, optional
   :param noisers: Noiser identifiers or instances to include.  Defaults to
                   ``("CellPositions",)``.  Recognised string identifiers (CamelCase
                   preferred; snake_case aliases also accepted for backwards compatibility):

                   * ``"Positions"`` / ``"positions"`` – :class:`~agedi.diffusion.noisers.Positions`
                     (StandardNormal prior + Normal, for gas-phase clusters).
                   * ``"CellPositions"`` / ``"cell_positions"`` – :class:`~agedi.diffusion.noisers.CellPositions`
                     (UniformCell prior + Normal, for periodic bulk/surface systems).
                   * ``"ConfinedCellPositions"`` / ``"confined_cell_positions"`` –
                     :class:`~agedi.diffusion.noisers.ConfinedCellPositions`
                     (UniformCellConfined prior + TruncatedNormal, for Z-confined systems).
                   * ``"Types"`` / ``"types"`` – :class:`~agedi.diffusion.noisers.Types`.
   :type noisers: Sequence[str or Noiser], optional
   :param sde: SDE for position noisers.  Short aliases: ``"ve"`` (default),
               ``"vp"``.  Pass an instantiated
               :class:`~agedi.diffusion.sdes.SDE` for full control.
   :type sde: str or SDE, optional
   :param conditioning: Property to condition on, or ``"none"`` for time-only
                        conditioning.  Defaults to ``"none"``.
   :type conditioning: str, optional
   :param conditioning_type: Type of the conditioning module: ``"scalar"`` or ``"integer"``.
                             Defaults to ``"scalar"``.
   :type conditioning_type: str, optional
   :param confinement: Z-direction confinement bounds ``(z_min, z_max)`` in Å.
   :type confinement: Tuple[float, float], optional
   :param force_field: When ``True``, attach a ``diffusion.regressor_model``.  Its heads **share** the
                       same representation and translator as the score model so that atomic
                       embeddings are learned jointly.  The regressor is trained whenever the training
                       batch contains per-atom forces and total energies (i.e. the ASE training structures
                       carry DFT or other reference energies and forces).  The trained forces head enables
                       force-field guided sampling via :class:`~agedi.diffusion.ForcefieldGuidanceConfig`.
                       Defaults to ``False``.
   :type force_field: bool, optional
   :param lr: Learning rate.  Defaults to ``1e-4``.
   :type lr: float, optional
   :param lr_factor: LR-scheduler reduction factor.  Defaults to ``0.95``.
   :type lr_factor: float, optional
   :param lr_patience: LR-scheduler patience (epochs).  Defaults to ``100``.
   :type lr_patience: int, optional
   :param weight_decay: Optimizer weight-decay.  Defaults to ``0.0``.
   :type weight_decay: float, optional
   :param eps: Minimum diffusion time.  Defaults to ``1e-5``.
   :type eps: float, optional
   :param guidance_weight: Classifier-free guidance weight.  Defaults to ``-1.0`` (disabled).
   :type guidance_weight: float, optional
   :param device: Target compute device.  When ``None`` CUDA is used if available,
                  otherwise CPU.
   :type device: str or torch.device, optional
   :param type_map: Compact type map for the :class:`~agedi.diffusion.noisers.Types`
                    noiser.  ``type_map[0]`` must be ``0`` (absorbing state) and
                    ``type_map[i]`` is the atomic number for compact index ``i``.
                    When provided, the ``Types`` noiser and the ``TypesScore`` head use
                    a reduced vocabulary of size ``len(type_map)`` instead of the
                    default 100.  Auto-populated by :func:`train_from_atoms` when a
                    ``"Types"`` noiser is requested.
   :type type_map: List[int], optional

   :returns: A freshly initialised :class:`~agedi.Agedi` model.
   :rtype: Agedi
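
The layout of *type_map* in :func:`create_diffusion` (index ``0`` for the absorbing state, then one atomic number per compact index) may be easier to follow with a concrete construction. The sketch below is a stand-alone illustration, not AGeDi's code: ``build_type_map`` is a hypothetical helper, structures are plain lists of atomic numbers, and the handling of ``n_classes`` (keep the most frequent elements, then sort by atomic number) is one reading of the :func:`train_from_atoms` description.

```python
from collections import Counter
from typing import Iterable, List, Optional


def build_type_map(
    structures: Iterable[Iterable[int]], n_classes: Optional[int] = None
) -> List[int]:
    """Return [0, Z_1, Z_2, ...] with atomic numbers in ascending order."""
    counts = Counter(z for numbers in structures for z in numbers)
    if n_classes is None:
        kept = list(counts)
    else:
        if n_classes > len(counts):
            raise ValueError("n_classes exceeds the number of distinct element types")
        # Keep only the n_classes most frequent elements.
        kept = [z for z, _ in counts.most_common(n_classes)]
    # Index 0 is reserved for the absorbing state.
    return [0] + sorted(kept)


# Two structures given as lists of atomic numbers: H2O and CO2.
type_map = build_type_map([[1, 1, 8], [6, 8, 8]])
# Inverse lookup: atomic number -> compact index.
compact_index = {z: i for i, z in enumerate(type_map)}
```

A list built this way satisfies the documented constraint that ``type_map[0]`` is ``0`` and could be passed as ``type_map=...`` when constructing the model by hand.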


.. py:function:: load_diffusion(path: Union[str, pathlib.Path], checkpoint: Optional[Union[str, pathlib.Path]] = None, device: Optional[Union[str, torch.device]] = None) -> Agedi

   Load a trained diffusion model from an AGeDi log directory.

   The model architecture is fully reconstructed from the Hydra-compatible
   ``diffusion`` config stored in ``hparams.yaml``, so no additional
   parameters are needed.

   :param path: Path to the AGeDi log / model directory (or directly to the
                ``hparams.yaml`` file).
   :param checkpoint: Path to a specific checkpoint file.  When ``None`` the latest
                      checkpoint (``checkpoints/last_model.ckpt``) is loaded automatically.
   :param device: Device to load the model onto.  When ``None`` CUDA is used if
                  available, otherwise CPU.
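
The path conventions accepted by :func:`load_diffusion` can be summarised in a few lines of ``pathlib`` code. This is a sketch under assumptions: ``resolve_run_files`` is a hypothetical helper, the run directory ``logs/agedi`` is only an illustrative name, and the real function may locate files differently. The file names ``hparams.yaml`` and ``checkpoints/last_model.ckpt`` come from the description above.

```python
from pathlib import Path
from typing import Optional, Tuple, Union

PathLike = Union[str, Path]


def resolve_run_files(
    path: PathLike, checkpoint: Optional[PathLike] = None
) -> Tuple[Path, Path]:
    """Return (hparams_file, checkpoint_file) for a run directory or hparams path."""
    path = Path(path)
    # Accept either the run directory or the hparams.yaml file inside it.
    run_dir = path.parent if path.name == "hparams.yaml" else path
    hparams_file = run_dir / "hparams.yaml"
    if checkpoint is None:
        # Documented default: the latest checkpoint of the run.
        checkpoint_file = run_dir / "checkpoints" / "last_model.ckpt"
    else:
        checkpoint_file = Path(checkpoint)
    return hparams_file, checkpoint_file


hparams_file, checkpoint_file = resolve_run_files("logs/agedi")
```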


.. py:function:: predict(diffusion: Agedi, structures: Sequence[ase.Atoms], *, batch_size: int = 64, cutoff: Optional[float] = None) -> List[ase.Atoms]

   Predict energies and forces for input structures using a trained force-field.

   The model must have been trained with ``force_field=True`` (i.e. it must
   have a ``regressor_model`` attached).  The predicted energy and forces are
   attached to the returned :class:`~ase.Atoms` objects via an
   :class:`~ase.calculators.singlepoint.SinglePointCalculator`.

   :param diffusion: A trained :class:`~agedi.Agedi` model with a force-field
                     regressor (trained with ``force_field=True``).
   :param structures: Input ASE :class:`~ase.Atoms` objects to run predictions on.
   :param batch_size: Number of structures per inference batch.  Defaults to ``64``.
   :param cutoff: Neighbour-list cutoff in Å.  When ``None`` (default), the cutoff is
                  read from the model's representation automatically.

   :returns: The input structures with a
             :class:`~ase.calculators.singlepoint.SinglePointCalculator` attached
             containing the predicted energy and/or forces.
   :rtype: List[Atoms]

   :raises ValueError: If the model does not have a force-field regressor.
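
Because :func:`predict` stores its results on a ``SinglePointCalculator``, they can be read back through the usual ASE getters. The helper below is a minimal sketch with a hypothetical name (``collect_results``); it relies only on ``get_potential_energy()`` and ``get_forces()`` and assumes both quantities were predicted.

```python
from typing import Any, List, Sequence, Tuple


def collect_results(structures: Sequence[Any]) -> Tuple[List[float], List[Any]]:
    """Read the energies and forces stored on each returned structure."""
    energies, forces = [], []
    for atoms in structures:
        # With a SinglePointCalculator attached, these getters return the
        # stored values instead of triggering a new calculation.
        energies.append(atoms.get_potential_energy())
        forces.append(atoms.get_forces())
    return energies, forces


# Intended use (requires a trained model, so it is left commented out):
# energies, forces = collect_results(predict(diffusion, structures))
```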


.. py:function:: sample(diffusion: Agedi, *, n_samples: int, n_atoms: Optional[int] = None, atomic_numbers: Optional[List[int]] = None, formula: Optional[str] = None, positions: Optional[numpy.ndarray] = None, cell: Optional[numpy.ndarray] = None, pbc: Optional[numpy.ndarray] = None, template: Optional[Union[agedi.data.AtomsGraph, ase.Atoms]] = None, confinement: Optional[Tuple[float, float]] = None, compile: bool = False, steps: int = 500, eps: float = 0.001, batch_size: int = 64, ff_guidance: Optional[agedi.diffusion.ForcefieldGuidanceConfig] = None, property: Optional[Dict[str, float]] = None, progress_bar: bool = False, save_trajectory: bool = False, print_timings: bool = False, as_atoms: bool = True) -> Union[List[agedi.data.AtomsGraph], List[ase.Atoms], List[List[agedi.data.AtomsGraph]], List[List[ase.Atoms]]]

   Sample structures from a trained diffusion model.

   :param diffusion: A trained :class:`~agedi.Agedi` model.
   :param n_samples: Number of structures to generate.
   :param n_atoms: Number of atoms per structure.  When omitted, it is derived from
                   ``formula`` if provided, otherwise from the length of
                   ``atomic_numbers``.
   :param atomic_numbers: Atomic numbers of the generated atoms.  Not required when the model
                          has a types-noiser or when ``formula`` is provided.
   :param formula: Chemical formula (e.g. ``"H2O"``).  Used to derive ``n_atoms`` and
                   ``atomic_numbers`` when they are not provided explicitly.
   :param positions: Fixed positions of the atoms (shape ``(n_atoms, 3)``).  Required
                     when no positions-noiser is configured (type-only diffusion).
                     Positions will not be modified during sampling.
   :param cell: Unit-cell matrix (3×3 array or flat length-9 array).  Not required
                when ``template`` is provided (the template's cell is used instead).
   :param pbc: Periodic boundary conditions as a length-3 boolean array (e.g.
               ``[True, True, False]``).  When ``template`` is provided its ``pbc``
               is used unless this argument is given explicitly.  Defaults to
               ``[True, True, True]`` (fully periodic) when neither ``template``
               nor ``pbc`` is supplied.
   :param template: Template structure.  May be an :class:`~agedi.AtomsGraph` or an
                    ASE :class:`~ase.Atoms` object; the latter is automatically converted
                    to an :class:`~agedi.AtomsGraph` (with ``confinement`` applied when
                    provided).  When given, ``cell`` and ``pbc`` are taken from the
                    template unless explicitly provided.
   :param ff_guidance: Force-field guidance configuration.  When ``None`` (default) a
                       :class:`~agedi.diffusion.ForcefieldGuidanceConfig` with default
                       values is used (i.e. guidance is disabled).
   :param compile: When ``True``, use ``torch.compile`` on the reverse diffusion step
                   for faster sampling.  Before the sampling loop starts, the maximum
                   number of neighbors and cell-list dimensions are estimated
                   automatically via NVIDIA nvalchemiops
                   (``estimate_max_neighbors`` and ``estimate_cell_list_sizes``), and
                   all neighbor-list buffers are pre-allocated with fixed shapes.
                   Requires NVIDIA nvalchemiops.  Defaults to ``False``.
   :param print_timings: When ``True``, print a per-stage timing breakdown at the end of
                         each sampling batch (graph init, score model, denoise, neighbor
                         list, etc.).  Defaults to ``False``.
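
The fallback order documented for ``pbc`` in :func:`sample` (explicit argument, then template, then fully periodic) is sketched below. ``resolve_pbc`` is a hypothetical helper written for illustration; it mirrors the description and is not taken from the AGeDi source.

```python
from typing import Optional, Sequence, Tuple


def resolve_pbc(
    pbc: Optional[Sequence[bool]] = None,
    template_pbc: Optional[Sequence[bool]] = None,
) -> Tuple[bool, ...]:
    """Pick the periodic boundary conditions following the documented order."""
    if pbc is not None:
        chosen = pbc  # an explicit argument wins
    elif template_pbc is not None:
        chosen = template_pbc  # otherwise fall back to the template
    else:
        chosen = (True, True, True)  # fully periodic default
    if len(chosen) != 3:
        raise ValueError("pbc must have exactly three entries")
    return tuple(bool(flag) for flag in chosen)


slab_pbc = resolve_pbc(template_pbc=[True, True, False])
```

For a surface slab template, the sketch keeps the template's ``[True, True, False]`` unless the caller overrides it.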


.. py:function:: create_trainer(*, epochs: int = -1, max_time: Optional[Union[int, Dict, datetime.timedelta]] = 24, accelerator: str = 'auto', devices: int = 1, logger: str = 'tensorboard', log_dir: str = 'logs', project: str = 'agedi', name: str = 'agedi', log_interval: int = 10, gradient_clip_val: float = 10.0, progress_bar: bool = False, print_epoch_interval: int = 10, log_grad_norm: bool = True, repeat: Optional[int] = None, repeat_epoch: Optional[int] = None, hparams: Optional[Dict] = None, extra_callbacks: Optional[List[lightning.pytorch.callbacks.Callback]] = None) -> lightning.Trainer

   Create a Lightning trainer configured for AGeDi.

   :param epochs: Maximum number of training epochs (``-1`` = unlimited).
   :param max_time: Wall-clock time limit for training.  Accepts:

                    * ``int``   – number of *hours* (e.g. ``24`` ≡ 24 hours).
                    * ``dict``  – Lightning-style mapping, e.g.
                      ``{"days": 0, "hours": 12, "minutes": 30, "seconds": 0}``.
                    * :class:`datetime.timedelta` – a Python timedelta object.
                    * ``None``  – no time limit.
   :param accelerator: Hardware accelerator to use (e.g. ``"auto"``, ``"gpu"``, ``"cpu"``).
                       Default: ``"auto"``.
   :param devices: Number of devices to train on.  Default: ``1``.
   :param logger: Logging backend: ``"tensorboard"`` (default) or ``"wandb"``.
   :param log_dir: Root directory for logs and checkpoints.  Default: ``"logs"``.
   :param project: WandB project name (only used when ``logger="wandb"``).
   :param name: Experiment display name used by TensorBoard and WandB as the
                run sub-directory / run name.  Default: ``"agedi"``.
   :param log_interval: How often (in steps) to log metrics.  Default: ``10``.
   :param gradient_clip_val: Maximum gradient norm for gradient clipping.  Default: ``10.0``.
   :param progress_bar: Whether to show a Lightning progress bar.  Default: ``False``.
   :param print_epoch_interval: Interval, in epochs, at which a one-line training summary is printed
                                to stdout.  Set to ``0`` to disable.  Default: ``10``.
   :param log_grad_norm: Whether to log the total gradient norm during training.
                         Disable for large models where the per-step overhead is undesirable.
                         Default: ``True``.
   :param repeat: Number of repetition levels for cell-repeat data augmentation.
                  Must be set together with *repeat_epoch*.  When ``None`` (default),
                  no repetition augmentation is applied.
   :param repeat_epoch: How many epochs between repetition-level increases.  Required when
                        *repeat* is set.
   :param hparams: Hyperparameters dict logged to ``hparams.yaml`` via
                   :class:`~agedi.data.callbacks.HParamsMetricLogger`.  When ``None``
                   (default), no extra hyperparameter logging is performed.
   :param extra_callbacks: Extra Lightning callbacks to append to the default callback list.
                           When ``None`` (default) only the built-in callbacks are used.

   :returns: A configured :class:`~lightning.Trainer` ready to call
             ``trainer.fit(diffusion, dataset)``.
   :rtype: lightning.Trainer
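
The four accepted forms of ``max_time`` in :func:`create_trainer` all describe a wall-clock duration. The sketch below shows one way they could be reduced to a single :class:`datetime.timedelta`; ``normalize_max_time`` is a hypothetical helper, and whether AGeDi converts the value itself or hands it to Lightning unchanged is not stated in this reference.

```python
from datetime import timedelta
from typing import Dict, Optional, Union


def normalize_max_time(
    max_time: Optional[Union[int, Dict[str, int], timedelta]]
) -> Optional[timedelta]:
    """Convert the documented max_time forms into a timedelta (or None)."""
    if max_time is None:
        return None  # no time limit
    if isinstance(max_time, timedelta):
        return max_time
    if isinstance(max_time, dict):
        # Keys such as "days", "hours", "minutes", "seconds" map onto timedelta.
        return timedelta(**max_time)
    # A bare integer is interpreted as a number of hours.
    return timedelta(hours=max_time)


limit = normalize_max_time({"days": 0, "hours": 12, "minutes": 30, "seconds": 0})
```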


.. py:function:: train(diffusion: Agedi, dataset: agedi.data.Dataset, trainer: Optional[lightning.Trainer] = None, ckpt_path: Optional[Union[str, pathlib.Path]] = None, **trainer_kwargs) -> lightning.Trainer

   Train a diffusion model and return the trainer used.

   :param diffusion: The diffusion model to train.
   :param dataset: The dataset to train on.
   :param trainer: A pre-configured Lightning :class:`~lightning.Trainer`.  When
                   ``None`` a new trainer is created from *trainer_kwargs*.
   :param ckpt_path: Path to a Lightning checkpoint (``.ckpt``) to resume training from.
                     When provided the full training state (model weights, optimiser,
                     LR-scheduler, and epoch counter) is restored before fitting.
                     Equivalent to passing ``ckpt_path`` to ``trainer.fit()``.
   :param \*\*trainer_kwargs: Additional keyword arguments forwarded to :func:`create_trainer`
                              when *trainer* is ``None``.


.. py:function:: train_from_atoms(data: Sequence[ase.Atoms], *, model: str = 'PaiNN', cutoff: float = 6.0, feature_size: int = 64, n_blocks: int = 4, n_rbf: int = 30, noisers: Sequence[str] = ('CellPositions', ), sde: Union[str, SDE] = 've', conditioning: str = 'none', conditioning_type: str = 'scalar', mask: str = 'none', confinement: Optional[Tuple[float, float]] = None, force_field: bool = False, batch_size: int = 64, train_split: Union[float, int] = 0.9, val_split: Union[float, int] = 0.1, repeat: Optional[int] = None, canonical_cell: bool = False, lr: float = 0.0001, lr_factor: float = 0.95, lr_patience: int = 100, weight_decay: float = 0.0, eps: float = 1e-05, guidance_weight: float = -1.0, data_path: Optional[str] = None, regressor_data: Optional[Sequence[ase.Atoms]] = None, checkpoint: Optional[Union[str, pathlib.Path]] = None, trainer: Optional[lightning.Trainer] = None, n_classes: Optional[int] = None, **trainer_kwargs) -> Tuple[Agedi, agedi.data.Dataset, lightning.Trainer]

   Build (or restore), train, and return an AGeDi model from ASE Atoms data.

   When a ``"Types"`` noiser is included and no *checkpoint* is given, the
   unique element types present in *data* are automatically detected and a
   compact type map is built so that the vocabulary size equals the number of
   distinct element types (plus the absorbing state at index 0).  The
   ``n_classes`` parameter restricts the vocabulary to the *n_classes* most
   frequently occurring element types; the retained types are ordered by
   atomic number.

   :param data: ASE :class:`~ase.Atoms` objects to train on.
   :param model: GNN backbone architecture name.  Looked up in the model registry;
                 use :func:`register_model` to add custom backends.  Default:
                 ``"PaiNN"`` (SchNetPack PaiNN).
   :param cutoff: Neighbour-list cutoff radius in Å.  Default: ``6.0``.
   :param feature_size: Embedding / feature dimension.  Default: ``64``.
   :param n_blocks: Number of interaction blocks in the GNN backbone.  Default: ``4``.
   :param n_rbf: Number of radial basis functions.  Default: ``30``.
   :param noisers: Sequence of noiser identifiers.  Recognised string identifiers:
                   ``"Positions"``, ``"CellPositions"``, ``"ConfinedCellPositions"``,
                   ``"Types"`` (snake_case aliases also accepted).
                   Default: ``("CellPositions",)``.
   :param sde: SDE for position noisers.  Short aliases: ``"ve"`` (default),
               ``"vp"``.  Pass an instantiated
               :class:`~agedi.diffusion.sdes.SDE` for full control.
   :param conditioning: Per-structure property to condition on (read from
                        ``atoms.info[conditioning]`` or ``atoms.get_<conditioning>()``),
                        or ``"none"`` for time-only conditioning (default).
   :param conditioning_type: Type of the conditioning module: ``"scalar"`` (default) or
                             ``"integer"``.
   :param mask: Atom-masking strategy: ``"MaskFixed"`` (freeze atoms tagged with
                ASE :class:`~ase.constraints.FixAtoms`) or ``"none"`` (default).
   :param confinement: Z-direction confinement bounds ``(z_min, z_max)`` in Å.  Required
                       when using the ``"ConfinedCellPositions"`` noiser.
   :param force_field: When ``True``, attach a regressor head (sharing the backbone) that
                       predicts per-atom forces and total energy.  Enables force-field
                       guided sampling via :class:`~agedi.diffusion.ForcefieldGuidanceConfig`.
                       The training data must contain DFT (or other) forces and energy.
                       Default: ``False``.
   :param batch_size: Mini-batch size used during training.  Default: ``64``.
   :param train_split: Fraction or absolute count of structures for the training split.
                       Default: ``0.9``.
   :param val_split: Fraction or absolute count of structures for the validation split.
                     Default: ``0.1``.
   :param repeat: When given, augment the dataset by repeating each structure up to
                  ``repeat`` times along the first two cell vectors.  Requires
                  ``repeat_epoch`` (passed via ``**trainer_kwargs``) to specify how
                  often the repetition level increases.
   :param canonical_cell: Store unit cells in canonical lower-triangular form.  Default:
                          ``False``.
   :param lr: Learning rate.  Default: ``1e-4``.
   :param lr_factor: LR-scheduler reduction factor.  Default: ``0.95``.
   :param lr_patience: LR-scheduler patience (epochs).  Default: ``100``.
   :param weight_decay: Optimiser weight decay.  Default: ``0.0``.
   :param eps: Minimum diffusion time value.  Default: ``1e-5``.
   :param guidance_weight: Classifier-free guidance weight.  Default: ``-1.0`` (disabled).
   :param data_path: String path to the training data file; stored in ``hparams.yaml``
                     for reference only.  When ``None``, no path metadata is saved.
   :param regressor_data: Optional additional ASE Atoms objects used *exclusively* for
                          training the force-field regressor head.  Structures here are never
                          passed through the diffusion loss.  Each structure must have an ASE
                          calculator with energy and forces attached.
   :param checkpoint: Path to a previously saved run directory (containing ``hparams.yaml``)
                      or directly to a ``.ckpt`` checkpoint file.  When provided the model
                      architecture and weights are loaded from the checkpoint instead of
                      being built from the architecture parameters (*model*, *cutoff*,
                      *feature_size*, etc.).  The full training state (optimiser,
                      LR-scheduler, epoch counter) is also restored so that training
                      continues seamlessly.  Supply *data* to train on new data, or use
                      the original data path to resume on the same dataset.
   :param trainer: A pre-configured Lightning :class:`~lightning.Trainer`.  When
                   ``None`` (default) a new trainer is built from ``**trainer_kwargs``.
   :param n_classes: Number of element-type classes to use for the
                     :class:`~agedi.diffusion.noisers.Types` noiser (not counting the
                     absorbing state at index 0).  When ``None`` (default), all distinct
                     element types present in *data* are used.  Must not exceed the number
                     of distinct types in the training data.  Ignored when *checkpoint* is
                     provided (the vocabulary is loaded from the checkpoint).
   :param \*\*trainer_kwargs: Additional keyword arguments forwarded to :func:`create_trainer`
                              when *trainer* is ``None``.  Common keys: ``epochs``, ``max_time``,
                              ``logger``, ``log_dir``, ``gradient_clip_val``, ``repeat_epoch``.

   :returns: The trained diffusion model, the dataset, and the Lightning trainer.
   :rtype: Tuple[Agedi, Dataset, Trainer]
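
A typical script might chain :func:`train_from_atoms` and :func:`sample` as sketched below. The function name ``train_and_sample``, the choice of ``epochs=50``, and reusing the cell of the first training frame are illustrative assumptions; the keyword arguments themselves follow the signatures documented on this page. Imports are placed inside the function so that the definition can be loaded without ASE or AGeDi installed.

```python
from typing import Any, List


def train_and_sample(traj_path: str, formula: str, n_samples: int = 8) -> List[Any]:
    """Train on structures read from ``traj_path`` and draw new samples."""
    # Imported lazily so that defining this function does not require the
    # optional dependencies to be installed.
    from ase.io import read
    from agedi.api import sample, train_from_atoms

    frames = read(traj_path, index=":")  # list of ase.Atoms
    diffusion, dataset, trainer = train_from_atoms(
        frames,
        noisers=("CellPositions",),
        epochs=50,  # forwarded to create_trainer via **trainer_kwargs
        log_dir="logs",
    )
    return sample(
        diffusion,
        n_samples=n_samples,
        formula=formula,
        cell=frames[0].cell.array,  # reuse the cell of the first training frame
        steps=500,
    )
```

With the default ``as_atoms=True``, the returned list should contain ASE ``Atoms`` objects that can be written out with ``ase.io.write``.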


.. py:function:: train_from_config(config: Union[str, pathlib.Path, Dict]) -> Tuple[Agedi, agedi.data.Dataset, lightning.Trainer]

   Train an AGeDi model from a YAML configuration file or dictionary.

   This is the *Hydra-style* entry point.  The configuration can be provided
   as:

   * a path to a YAML file (``str`` or :class:`~pathlib.Path`),
   * a plain Python ``dict``,
   * a Hydra / OmegaConf ``DictConfig``.

   The function loads the training data from ``config["data_path"]`` (an
   ASE-readable file) and delegates to :func:`train_from_atoms` with the
   remaining configuration values.

   The minimal required key is ``data_path``.  All other keys are optional
   and fall back to the same defaults as :func:`train_from_atoms`.

   A ready-to-edit template is shipped with the package at
   ``agedi/conf/train.yaml``.

   :param config: Configuration source – a YAML file path, a ``dict``, or an OmegaConf
                  ``DictConfig``.

   :returns: The trained diffusion model, the dataset used, and the Lightning
             trainer.
   :rtype: Tuple[Agedi, Dataset, Trainer]

   .. rubric:: Examples

   Minimal Python usage::

       from agedi import train_from_config
       diffusion, dataset, trainer = train_from_config("conf/train.yaml")

   Programmatic override::

       from agedi import train_from_config
       cfg = {"data_path": "train.traj", "epochs": 50, "feature_size": 128}
       diffusion, _, _ = train_from_config(cfg)