agedi.functional

Backward-compatibility shim.

All public symbols are now implemented in agedi.api. This module re-exports them so that existing code using from agedi.functional import X continues to work unchanged.
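
A quick way to convince yourself that a shim only re-exports is to check that a name resolves to the same object in both modules. The helper below is a minimal sketch: `reexports_same_object` is a hypothetical name, and it assumes the shim uses plain imports rather than wrapper functions. The demonstration line uses the standard library, which follows the same pattern, so it runs without AGeDi installed.

```python
import importlib


def reexports_same_object(name, shim="agedi.functional", impl="agedi.api"):
    """Return True if `name` is the very same object in both modules."""
    shim_mod = importlib.import_module(shim)
    impl_mod = importlib.import_module(impl)
    return getattr(shim_mod, name) is getattr(impl_mod, name)


# json re-exports JSONDecoder from json.decoder, so the identity check holds.
print(reexports_same_object("JSONDecoder", shim="json", impl="json.decoder"))
```

With AGeDi installed, calling `reexports_same_object("create_dataset")` would perform the same check for this module.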

Functions

create_dataset(→ agedi.data.Dataset)

Create and setup an AGeDi Dataset from ASE Atoms objects.

create_diffusion(→ agedi.Agedi)

Create a diffusion model for script-based training and sampling.

create_trainer(→ lightning.Trainer)

Create a Lightning trainer configured for AGeDi.

load_diffusion(→ Agedi)

Load a trained diffusion model from an AGeDi log directory.

predict(→ List[ase.Atoms])

Predict energies and forces for input structures using a trained force-field.

register_model(→ None)

Register a custom score model backbone factory under name.

sample(→ Union[List[agedi.data.AtomsGraph], ...)

Sample structures from a trained diffusion model.

train(→ lightning.Trainer)

Train a diffusion model and return the trainer used.

_build_type_map_from_data(→ List[int])

Build a compact type map from the element types present in training data.

train_from_atoms(*args, **kwargs)

train_from_config(*args, **kwargs)

Module Contents

agedi.functional.create_dataset(data: Sequence[ase.Atoms], cutoff: float = 6.0, batch_size: int = 64, train_split: float | int = 0.9, val_split: float | int = 0.1, mask: str = 'none', confinement: Tuple[float, float] | None = None, conditioning: str = 'none', conditioning_type: str = 'scalar', repeat: int | None = None, canonical_cell: bool = False, regressor_data: Sequence[ase.Atoms] | None = None, properties: List[Dict] | None = None) agedi.data.Dataset

Create and setup an AGeDi Dataset from ASE Atoms objects.

Parameters:
  • data (Sequence[Atoms]) – ASE Atoms objects to add to the dataset.

  • cutoff (float, optional) – Neighbour-list cutoff radius in Ångström.

  • batch_size (int, optional) – Mini-batch size used during training/validation.

  • train_split (Union[float, int], optional) – Fraction or absolute number of samples for the training split.

  • val_split (Union[float, int], optional) – Fraction or absolute number of samples for the validation split.

  • mask (str, optional) – Atom-mask method (e.g. "MaskFixed" or "none").

  • confinement (Tuple[float, float], optional) – Z-axis confinement bounds (z_min, z_max).

  • conditioning (str, optional) – Name of the per-structure property to use as a conditioning signal. The value is read from atoms.info[conditioning] or the corresponding atoms.get_<conditioning>() method. Ignored when set to "none" (default).

  • conditioning_type (str, optional) – "scalar" (default) or "node"; controls how the conditioning property is broadcast onto the graph.

  • repeat (int, optional) – When given, augment the dataset by repeating each structure up to repeat times along the first two cell vectors.

  • canonical_cell (bool, optional) – Store cells in canonical lower-triangular form.

  • regressor_data (Sequence[Atoms], optional) – Additional ASE Atoms objects used to train a regressor head.

  • properties (List[Dict], optional) – Per-structure property dictionaries; must contain exactly one entry per element in data. Each dictionary is merged into the corresponding graph object via setattr, matching the layout accepted by add_atoms_data(). Keys already produced by the conditioning logic are overwritten by values in properties when both are present.

Returns:

A fully set-up Dataset ready for training.

Return type:

Dataset
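
The requirement that properties holds exactly one dictionary per structure is easy to get wrong, so it can help to check it before calling create_dataset(). The wrapper below is a sketch only: `build_dataset`, `labels`, and the property key "label" are hypothetical, and the keyword values simply repeat the documented defaults.

```python
def build_dataset(atoms_list, labels):
    """Create a Dataset with one extra property per structure.

    `labels` and the key "label" are illustrative assumptions; use whatever
    per-structure quantity your workflow needs.
    """
    if len(labels) != len(atoms_list):
        raise ValueError(
            f"need one property dict per structure: got {len(labels)} "
            f"labels for {len(atoms_list)} structures"
        )
    properties = [{"label": float(value)} for value in labels]

    # Imported lazily so this sketch can be loaded without AGeDi installed.
    from agedi.functional import create_dataset

    return create_dataset(
        atoms_list,
        cutoff=6.0,
        batch_size=64,
        train_split=0.9,
        val_split=0.1,
        properties=properties,
    )
```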

agedi.functional.create_diffusion(model: str = 'PaiNN', cutoff: float = 6.0, feature_size: int = 64, n_blocks: int = 4, n_rbf: int = 30, noisers: Sequence[str | Noiser] = ('CellPositions',), sde: str | SDE = 've', conditioning: str = 'none', conditioning_type: str = 'scalar', confinement: Tuple[float, float] | None = None, force_field: bool = False, lr: float = 0.0001, lr_factor: float = 0.95, lr_patience: int = 100, weight_decay: float = 0.0, eps: float = 1e-05, guidance_weight: float = -1.0, device: str | torch.device | None = None, type_map: List[int] | None = None) agedi.Agedi

Create a diffusion model for script-based training and sampling.

Parameters:
  • model (str, optional) – GNN backbone architecture. The name is looked up in the model registry; use register_model() to add custom backends. The built-in default is "PaiNN" (SchNetPack PaiNN).

  • cutoff (float, optional) – Neighbour-list cutoff radius in Å. Defaults to 6.0.

  • feature_size (int, optional) – Embedding / feature dimension. Defaults to 64.

  • n_blocks (int, optional) – Number of interaction blocks. Defaults to 4.

  • n_rbf (int, optional) – Number of radial basis functions. Defaults to 30.

  • noisers (Sequence[str or Noiser], optional) –

    Noiser identifiers or instances to include. Defaults to ("CellPositions",). Recognised string identifiers (CamelCase preferred; snake_case aliases also accepted for backwards compatibility):

    • "Positions" / "positions" → Positions (StandardNormal prior + Normal, for gas-phase clusters).

    • "CellPositions" / "cell_positions" → CellPositions (UniformCell prior + Normal, for periodic bulk/surface systems).

    • "ConfinedCellPositions" / "confined_cell_positions" → ConfinedCellPositions (UniformCellConfined prior + TruncatedNormal, for Z-confined systems).

    • "Types" / "types" → Types.

  • sde (str or SDE, optional) – SDE for position noisers. Short aliases: "ve" (default), "vp". Pass an instantiated SDE for full control.

  • conditioning (str, optional) – Property to condition on, or "none" for time-only conditioning. Defaults to "none".

  • conditioning_type (str, optional) – Type of the conditioning module: "scalar" or "integer". Defaults to "scalar".

  • confinement (Tuple[float, float], optional) – Z-direction confinement bounds (z_min, z_max) in Å.

  • force_field (bool, optional) – When True, attach a regressor model as diffusion.regressor_model. Its head shares the same representation and translator as the score model, so atomic embeddings are learned jointly. It is trained whenever the training batch contains per-atom forces and total energies (i.e. the ASE training structures carry energies and forces from DFT or another reference method). The trained forces head enables force-field-guided sampling via ForcefieldGuidanceConfig. Defaults to False.

  • lr (float, optional) – Learning rate. Defaults to 1e-4.

  • lr_factor (float, optional) – LR-scheduler reduction factor. Defaults to 0.95.

  • lr_patience (int, optional) – LR-scheduler patience (epochs). Defaults to 100.

  • weight_decay (float, optional) – Optimizer weight-decay. Defaults to 0.0.

  • eps (float, optional) – Minimum diffusion time. Defaults to 1e-5.

  • guidance_weight (float, optional) – Classifier-free guidance weight. Defaults to -1.0 (disabled).

  • device (str or torch.device, optional) – Target compute device. When None, CUDA is used if available, otherwise the CPU.

  • type_map (List[int], optional) – Compact type map for the Types noiser. type_map[0] must be 0 (absorbing state) and type_map[i] is the atomic number for compact index i. When provided, the Types noiser and the TypesScore head use a reduced vocabulary of size len(type_map) instead of the default 100. Auto-populated by train_from_atoms() when a "Types" noiser is requested.

Returns:

A freshly initialised Agedi model.

Return type:

Agedi
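
The noiser identifiers listed above map onto three kinds of system (gas-phase, periodic, Z-confined), optionally combined with "Types". The sketch below shows one way to choose them. `pick_noisers` and `build_model` are hypothetical helpers, and the selection rule is only a reading of the descriptions above, not AGeDi's own logic.

```python
def pick_noisers(periodic, confinement=None, with_types=False):
    """Choose noiser identifiers from the documented list."""
    if not periodic:
        names = ["Positions"]              # gas-phase clusters
    elif confinement is not None:
        names = ["ConfinedCellPositions"]  # Z-confined periodic systems
    else:
        names = ["CellPositions"]          # periodic bulk/surface systems
    if with_types:
        names.append("Types")
    return tuple(names)


def build_model(periodic=True, confinement=None, with_types=False):
    """Create a model using only keyword arguments documented above."""
    # Imported lazily so this sketch can be loaded without AGeDi installed.
    from agedi.functional import create_diffusion

    return create_diffusion(
        model="PaiNN",
        noisers=pick_noisers(periodic, confinement, with_types),
        sde="ve",
        confinement=confinement,
    )
```

For example, `build_model(confinement=(2.0, 8.0))` would request the "ConfinedCellPositions" noiser with the same bounds passed as confinement.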

agedi.functional.create_trainer(*, epochs: int = -1, max_time: int | Dict | datetime.timedelta | None = 24, accelerator: str = 'auto', devices: int = 1, logger: str = 'tensorboard', log_dir: str = 'logs', project: str = 'agedi', name: str = 'agedi', log_interval: int = 10, gradient_clip_val: float = 10.0, progress_bar: bool = False, print_epoch_interval: int = 10, log_grad_norm: bool = True, repeat: int | None = None, repeat_epoch: int | None = None, hparams: Dict | None = None, extra_callbacks: List[lightning.pytorch.callbacks.Callback] | None = None) lightning.Trainer

Create a Lightning trainer configured for AGeDi.

Parameters:
  • epochs – Maximum number of training epochs (-1 = unlimited).

  • max_time

    Wall-clock time limit for training. Accepts:

    • int – number of hours (e.g. 24 ≡ 24 hours).

    • dict – Lightning-style mapping, e.g. {"days": 0, "hours": 12, "minutes": 30, "seconds": 0}.

    • datetime.timedelta – a Python timedelta object.

    • None – no time limit.

  • accelerator – Hardware accelerator to use (e.g. "auto", "gpu", "cpu"). Default: "auto".

  • devices – Number of devices to train on. Default: 1.

  • logger – Logging backend: "tensorboard" (default) or "wandb".

  • log_dir – Root directory for logs and checkpoints. Default: "logs".

  • project – WandB project name (only used when logger="wandb").

  • name – Experiment display name used by TensorBoard and WandB as the run sub-directory / run name. Default: "agedi".

  • log_interval – How often (in steps) to log metrics. Default: 10.

  • gradient_clip_val – Maximum gradient norm for gradient clipping. Default: 10.0.

  • progress_bar – Whether to show a Lightning progress bar. Default: False.

  • print_epoch_interval – Interval, in epochs, between one-line training summaries printed to stdout. Set to 0 to disable. Default: 10.

  • log_grad_norm – Whether to log the total gradient norm during training. Disable for large models where the per-step overhead is undesirable. Default: True.

  • repeat – Number of repetition levels for cell-repeat data augmentation. Must be set together with repeat_epoch. When None (default), no repetition augmentation is applied.

  • repeat_epoch – How many epochs between repetition-level increases. Required when repeat is set.

  • hparams – Hyperparameters dict logged to hparams.yaml via HParamsMetricLogger. When None (default), no extra hyperparameter logging is performed.

  • extra_callbacks – Extra Lightning callbacks to append to the default callback list. When None (default) only the built-in callbacks are used.

Returns:

A configured Trainer ready to call trainer.fit(diffusion, dataset).

Return type:

lightning.Trainer
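
The four accepted forms of max_time can all be reduced to a single timedelta (or None). The function below is a sketch of that normalisation under the rules listed above. `normalize_max_time` is a hypothetical name, and this is not necessarily how create_trainer() handles the value internally.

```python
from datetime import timedelta


def normalize_max_time(max_time):
    """Convert the documented max_time forms to a timedelta (or None)."""
    if max_time is None:
        return None                        # no time limit
    if isinstance(max_time, timedelta):
        return max_time
    if isinstance(max_time, dict):
        # Keys such as "days", "hours", "minutes", "seconds" are valid
        # timedelta keyword arguments.
        return timedelta(**max_time)
    if isinstance(max_time, int) and not isinstance(max_time, bool):
        return timedelta(hours=max_time)   # a plain int means hours
    raise TypeError(f"unsupported max_time: {max_time!r}")
```

Under this reading, the default of 24 corresponds to a 24-hour limit.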

agedi.functional.load_diffusion(path: str | pathlib.Path, checkpoint: str | pathlib.Path | None = None, device: str | torch.device | None = None) Agedi

Load a trained diffusion model from an AGeDi log directory.

The model architecture is fully reconstructed from the Hydra-compatible diffusion config stored in hparams.yaml, so no additional parameters are needed.

Parameters:
  • path – Path to the AGeDi log / model directory (or directly to the hparams.yaml file).

  • checkpoint – Path to a specific checkpoint file. When None, the latest checkpoint (checkpoints/last_model.ckpt) is loaded automatically.

  • device – Device to load the model onto. When None, CUDA is used if available, otherwise the CPU.
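
The path handling described above can be pictured as follows. This is a sketch, not the library's implementation: `resolve_log_paths` is a hypothetical helper, and it assumes that checkpoints/last_model.ckpt is resolved relative to the log directory.

```python
from pathlib import Path


def resolve_log_paths(path, checkpoint=None):
    """Return (hparams_file, checkpoint_file) for a log directory."""
    path = Path(path)
    # `path` may point at the directory or directly at hparams.yaml.
    log_dir = path.parent if path.name == "hparams.yaml" else path
    hparams_file = log_dir / "hparams.yaml"
    if checkpoint is not None:
        checkpoint_file = Path(checkpoint)
    else:
        checkpoint_file = log_dir / "checkpoints" / "last_model.ckpt"
    return hparams_file, checkpoint_file
```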

agedi.functional.predict(diffusion: Agedi, structures: Sequence[ase.Atoms], *, batch_size: int = 64, cutoff: float | None = None) List[ase.Atoms]

Predict energies and forces for input structures using a trained force-field.

The model must have been trained with force_field=True (i.e. it must have a regressor_model attached). The predicted energy and forces are attached to the returned Atoms objects via a SinglePointCalculator.

Parameters:
  • diffusion – A trained Agedi model with a force-field regressor (trained with --force_field).

  • structures – Input ASE Atoms objects to run predictions on.

  • batch_size – Number of structures per inference batch. Defaults to 64.

  • cutoff – Neighbour-list cutoff in Å. When None (default), the cutoff is read from the model’s representation automatically.

Returns:

The input structures with a SinglePointCalculator attached containing the predicted energy and/or forces.

Return type:

List[Atoms]

Raises:

ValueError – If the model does not have a force-field regressor.
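
Because the results are attached through a calculator, they can be read back with the usual ASE accessors get_potential_energy() and get_forces(). The helpers below are a sketch with hypothetical names. They assume both energy and forces were predicted; if only one is available, the corresponding accessor would fail.

```python
def collect_predictions(atoms_list):
    """Read energy and forces back from each returned structure."""
    results = []
    for atoms in atoms_list:
        results.append(
            {
                "energy": atoms.get_potential_energy(),
                "forces": atoms.get_forces(),
            }
        )
    return results


def predict_and_collect(diffusion, structures, batch_size=64):
    """Run predict() and return plain dictionaries of the results."""
    # Imported lazily so this sketch can be loaded without AGeDi installed.
    from agedi.functional import predict

    predicted = predict(diffusion, structures, batch_size=batch_size)
    return collect_predictions(predicted)
```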

agedi.functional.register_model(name: str, factory: Callable) None

Register a custom score model backbone factory under name.

The factory is called with the keyword arguments cutoff, heads, feature_size, n_blocks, head_dim, and n_rbf and must return a 3-tuple (translator, representation, List[Head]).

Registered models can be selected by passing model=name to create_diffusion().

Parameters:
  • name (str) – Alias used to select this backend (e.g. "PaiNN").

  • factory (Callable) –

    Factory function with signature:

    factory(cutoff, heads, feature_size, n_blocks, head_dim, n_rbf)
        -> Tuple[Translator, nn.Module, List[Head]]
    

Examples

from agedi.functional import register_model

def my_factory(cutoff, heads, feature_size, n_blocks, head_dim, n_rbf):
    ...
    return translator, representation, head_list

register_model("MyModel", my_factory)

agedi.functional.sample(diffusion: Agedi, *, n_samples: int, n_atoms: int | None = None, atomic_numbers: List[int] | None = None, formula: str | None = None, positions: numpy.ndarray | None = None, cell: numpy.ndarray | None = None, pbc: numpy.ndarray | None = None, template: agedi.data.AtomsGraph | ase.Atoms | None = None, confinement: Tuple[float, float] | None = None, compile: bool = False, steps: int = 500, eps: float = 0.001, batch_size: int = 64, ff_guidance: agedi.diffusion.ForcefieldGuidanceConfig | None = None, property: Dict[str, float] | None = None, progress_bar: bool = False, save_trajectory: bool = False, print_timings: bool = False, as_atoms: bool = True) List[agedi.data.AtomsGraph] | List[ase.Atoms] | List[List[agedi.data.AtomsGraph]] | List[List[ase.Atoms]]

Sample structures from a trained diffusion model.

Parameters:
  • diffusion – A trained Agedi model.

  • n_samples – Number of structures to generate.

  • n_atoms – Number of atoms per structure. Automatically determined from formula if provided, or from the length of atomic_numbers when n_atoms is not explicitly given.

  • atomic_numbers – Atomic numbers of the generated atoms. Not required when the model has a types-noiser or when formula is provided.

  • formula – Chemical formula (e.g. "H2O"). Used to derive n_atoms and atomic_numbers when they are not provided explicitly.

  • positions – Fixed positions of the atoms (shape (n_atoms, 3)). Required when no positions-noiser is configured (type-only diffusion). Positions will not be modified during sampling.

  • cell – Unit-cell matrix (3×3 array or flat length-9 array). Not required when template is provided (the template’s cell is used instead).

  • pbc – Periodic boundary conditions as a length-3 boolean array (e.g. [True, True, False]). When template is provided its pbc is used unless this argument is given explicitly. Defaults to [True, True, True] (fully periodic) when neither template nor pbc is supplied.

  • template – Template structure. May be an AtomsGraph or an ASE Atoms object; the latter is automatically converted to an AtomsGraph (with confinement applied when provided). When given, cell and pbc are taken from the template unless explicitly provided.

  • ff_guidance – Force-field guidance configuration. When None (default) a ForcefieldGuidanceConfig with default values is used (i.e. guidance is disabled).

  • compile – When True, use torch.compile on the reverse diffusion step for faster sampling. Before the sampling loop starts, the maximum number of neighbors and cell-list dimensions are estimated automatically via NVIDIA nvalchemiops (estimate_max_neighbors and estimate_cell_list_sizes), and all neighbor-list buffers are pre-allocated with fixed shapes. Requires NVIDIA nvalchemiops. Defaults to False.

  • print_timings – When True, print a per-stage timing breakdown at the end of each sampling batch (graph init, score model, denoise, neighbor list, etc.). Defaults to False.
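
Several of the arguments above fall back on one another. The sketch below spells out two of the documented rules: n_atoms taken from the length of atomic_numbers, and the precedence for pbc (explicit argument, then template, then fully periodic). `resolve_sampling_inputs` and `template_pbc` are hypothetical names, formula parsing is left out, and the function only illustrates the documented behaviour.

```python
def resolve_sampling_inputs(n_atoms=None, atomic_numbers=None,
                            pbc=None, template_pbc=None):
    """Apply the documented fallbacks for n_atoms and pbc."""
    # n_atoms falls back to the number of atomic numbers supplied.
    if n_atoms is None and atomic_numbers is not None:
        n_atoms = len(atomic_numbers)
    # pbc precedence: explicit argument, then template, then fully periodic.
    if pbc is None:
        pbc = template_pbc if template_pbc is not None else [True, True, True]
    return n_atoms, [bool(flag) for flag in pbc]
```

With a trained model, the matching call would look like sample(diffusion, n_samples=8, atomic_numbers=[8, 1, 1], cell=cell, pbc=[True, True, False]), where cell is a 3×3 array supplied by the caller.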

agedi.functional.train(diffusion: Agedi, dataset: agedi.data.Dataset, trainer: lightning.Trainer | None = None, ckpt_path: str | pathlib.Path | None = None, **trainer_kwargs) lightning.Trainer

Train a diffusion model and return the trainer used.

Parameters:
  • diffusion – The diffusion model to train.

  • dataset – The dataset to train on.

  • trainer – A pre-configured Lightning Trainer. When None a new trainer is created from trainer_kwargs.

  • ckpt_path – Path to a Lightning checkpoint (.ckpt) to resume training from. When provided the full training state (model weights, optimiser, LR-scheduler, and epoch counter) is restored before fitting. Equivalent to passing ckpt_path to trainer.fit().

  • **trainer_kwargs – Additional keyword arguments forwarded to create_trainer() when trainer is None.
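
Put together, a script-based training run might look like the sketch below. The helper names and the chosen values (100 epochs, a one-hour limit) are assumptions for illustration. Every keyword used is one documented on this page.

```python
def short_run_kwargs(**overrides):
    """Trainer keyword arguments for a short run; callers may override any."""
    kwargs = {"epochs": 100, "max_time": 1, "log_dir": "logs", "name": "agedi"}
    kwargs.update(overrides)
    return kwargs


def fit_model(atoms_list, resume_from=None, **trainer_overrides):
    """Build a dataset and model, then train; returns (diffusion, trainer)."""
    # Imported lazily so this sketch can be loaded without AGeDi installed.
    from agedi.functional import create_dataset, create_diffusion, train

    dataset = create_dataset(atoms_list, cutoff=6.0, batch_size=64)
    diffusion = create_diffusion(model="PaiNN", cutoff=6.0)

    # No trainer is passed, so the keyword arguments go to create_trainer().
    trainer = train(
        diffusion,
        dataset,
        ckpt_path=resume_from,  # None starts from scratch
        **short_run_kwargs(**trainer_overrides),
    )
    return diffusion, trainer
```

To resume an interrupted run, pass the checkpoint path as resume_from; it is forwarded unchanged as ckpt_path.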

agedi.functional._build_type_map_from_data(data: Sequence[Atoms]) List[int]

Build a compact type map from the element types present in training data.

The map is [0, z1, z2, ...] where z1 < z2 < ... are the sorted unique atomic numbers found in data. Index 0 is reserved for the absorbing state.

Parameters:

data (Sequence[Atoms]) – List of ASE Atoms objects to inspect.

Returns:

A list where type_map[i] is the atomic number corresponding to compact index i (and type_map[0] == 0 for the absorbing state).

Return type:

List[int]
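
The construction described above is small enough to sketch directly. The version below works on plain sequences of atomic numbers rather than Atoms objects (with ASE one would pass atoms.numbers for each structure). `build_type_map` and `compact_indices` are hypothetical names, and dropping a stray 0 from the input is an assumption made here so that the absorbing state is never listed twice.

```python
def build_type_map(structures):
    """Return [0, z1, z2, ...] with the sorted unique atomic numbers."""
    present = set()
    for numbers in structures:
        present.update(int(z) for z in numbers)
    present.discard(0)  # index 0 is reserved for the absorbing state
    return [0] + sorted(present)


def compact_indices(numbers, type_map):
    """Translate atomic numbers to their compact indices."""
    lookup = {z: index for index, z in enumerate(type_map)}
    return [lookup[int(z)] for z in numbers]


type_map = build_type_map([[1, 1, 8], [6, 8]])
print(type_map)                              # [0, 1, 6, 8]
print(compact_indices([8, 1, 1], type_map))  # [3, 1, 1]
```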

agedi.functional.train_from_atoms(*args, **kwargs)

agedi.functional.train_from_config(*args, **kwargs)