agedi.functional¶

Backward-compatibility shim.

All public symbols are now implemented in agedi.api. This module re-exports them so that existing code using from agedi.functional import X continues to work unchanged.

Functions¶

`create_dataset`(→ agedi.data.Dataset)	Create and setup an AGeDi Dataset from ASE Atoms objects.
`create_diffusion`(→ agedi.Agedi)	Create a diffusion model for script-based training and sampling.
`create_trainer`(→ lightning.Trainer)	Create a Lightning trainer configured for AGeDi.
`load_diffusion`(→ Agedi)	Load a trained diffusion model from an AGeDi log directory.
`predict`(→ List[ase.Atoms])	Predict energies and forces for input structures using a trained force-field.
`register_model`(→ None)	Register a custom score model backbone factory under name.
`sample`(→ Union[List[agedi.data.AtomsGraph], ...)	Sample structures from a trained diffusion model.
`train`(→ lightning.Trainer)	Train a diffusion model and return the trainer used.
`_build_type_map_from_data`(→ List[int])	Build a compact type map from the element types present in training data.
`train_from_atoms`(*args, **kwargs)
`train_from_config`(*args, **kwargs)

Module Contents¶

agedi.functional.create_dataset(data: Sequence[ase.Atoms], cutoff: float | None = None, batch_size: int = 64, train_split: float | int = 0.9, val_split: float | int = 0.1, mask: str = 'none', confinement: Tuple[float, float] | None = None, conditioning: str = 'none', conditioning_type: str = 'scalar', repeat: int | None = None, canonical_cell: bool = False, regressor_data: Sequence[ase.Atoms] | None = None, properties: List[Dict] | None = None, fully_connected: bool = False) → agedi.data.Dataset¶

Create and setup an AGeDi Dataset from ASE Atoms objects.

Parameters:

data (Sequence[Atoms]) – ASE Atoms objects to add to the dataset.
cutoff (float, optional) – Neighbour-list cutoff radius in Ångström.
batch_size (int, optional) – Mini-batch size used during training/validation.
train_split (Union[float, int], optional) – Fraction or absolute number of samples for the training split.
val_split (Union[float, int], optional) – Fraction or absolute number of samples for the validation split.
mask (str, optional) – Atom-mask method (e.g. "MaskFixed" or "none").
confinement (Tuple[float, float], optional) – Z-axis confinement bounds (z_min, z_max).
conditioning (str, optional) – Name of the per-structure property to use as a conditioning signal. The value is read from atoms.info[conditioning] or the corresponding atoms.get_<conditioning>() method. Ignored when set to "none" (default).
conditioning_type (str, optional) – "scalar" (default) or "node"; controls how the conditioning property is broadcast onto the graph.
repeat (int, optional) – When given, augment the dataset by repeating each structure up to repeat times along the first two cell vectors.
canonical_cell (bool, optional) – Store cells in canonical lower-triangular form.
regressor_data (Sequence[Atoms], optional) – Additional ASE Atoms objects used to train a regressor head.
properties (List[Dict], optional) – Per-structure property dictionaries; must contain exactly one entry per element in data. Each dictionary is merged into the corresponding graph object via setattr, matching the layout accepted by add_atoms_data(). Keys already produced by the conditioning logic are overwritten by values in properties when both are present.

Returns:

A fully set-up Dataset ready for training.

Return type:

Dataset
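The properties rule above (one dictionary per structure, merged with setattr, overriding conditioning values) can be sketched without AGeDi installed. This is a minimal illustration under stated assumptions, not the library's implementation: merge_properties is a hypothetical helper, and SimpleNamespace objects stand in for the real graph objects.

```python
from types import SimpleNamespace


def merge_properties(graphs, properties):
    """Attach one property dict per graph via setattr (hypothetical helper)."""
    if len(properties) != len(graphs):
        raise ValueError(
            f"expected {len(graphs)} property dicts, got {len(properties)}"
        )
    for graph, props in zip(graphs, properties):
        for key, value in props.items():
            # setattr replaces any attribute of the same name, which is how a
            # value from `properties` would win over one set by conditioning.
            setattr(graph, key, value)
    return graphs


# Stand-ins for graph objects; both already carry a conditioning value.
graphs = [SimpleNamespace(energy=-1.0), SimpleNamespace(energy=-2.0)]
merged = merge_properties(graphs, [{"energy": -1.5, "label": "a"}, {"label": "b"}])
```

Here the first graph ends up with the energy from properties, while the second keeps its original energy because its dictionary does not define that key.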

agedi.functional.create_diffusion(model: str = 'PaiNN', cutoff: float | None = None, feature_size: int = 64, n_blocks: int = 4, n_rbf: int = 30, noisers: Sequence[str | Noiser] = ('CellPositions',), sde: str | SDE | None = None, conditioning: str = 'none', conditioning_type: str = 'scalar', confinement: Tuple[float, float] | None = None, force_field: bool = False, lr: float = 0.0001, lr_factor: float = 0.95, lr_patience: int = 100, weight_decay: float = 0.0, eps: float = 1e-05, guidance_weight: float = -1.0, device: str | torch.device | None = None, type_map: List[int] | None = None, prediction_type: str = 'score', sampler: str = 'em', loss_weighting: str = 'uniform', fully_connected: bool = False) → agedi.Agedi¶

Create a diffusion model for script-based training and sampling.

Parameters:

model (str, optional) – GNN backbone architecture. The name is looked up in the model registry; use register_model() to add custom backends. The built-in default is "PaiNN" (SchNetPack PaiNN).
cutoff (float, optional) – Neighbour-list cutoff radius in Å. Defaults to 6.0.
feature_size (int, optional) – Embedding / feature dimension. Defaults to 64.
n_blocks (int, optional) – Number of interaction blocks. Defaults to 4.
n_rbf (int, optional) – Number of radial basis functions. Defaults to 30.
noisers (Sequence[str or Noiser], optional) –
Noiser identifiers or instances to include. Defaults to ("CellPositions",). Recognised string identifiers (CamelCase preferred; snake_case aliases also accepted for backwards compatibility):
- "Positions" / "positions" – Positions (StandardNormal prior + Normal, for gas-phase clusters).
- "CellPositions" / "cell_positions" – CellPositions (UniformCell prior + Normal, for periodic bulk/surface systems).
- "ConfinedCellPositions" / "confined_cell_positions" – ConfinedCellPositions (UniformCellConfined prior + TruncatedNormal, for Z-confined systems).
- "Types" / "types" – Types.
sde (str or SDE, optional) – SDE for position noisers. Short aliases: "ve" (default), "vp". Pass an instantiated SDE for full control.
conditioning (str, optional) – Property to condition on, or "none" for time-only conditioning. Defaults to "none".
conditioning_type (str, optional) – Type of the conditioning module: "scalar" or "integer". Defaults to "scalar".
confinement (Tuple[float, float], optional) – Z-direction confinement bounds (z_min, z_max) in Å.
force_field (bool, optional) – When True, attach a diffusion.regressor_model. The head shares the same representation and translator as the score model so that atomic embeddings are learned jointly. It is trained whenever the training batch contains per-atom forces and total energies (i.e. the ASE training structures carry energies and forces from DFT or another reference method). The trained forces head enables force-field guided sampling via ForcefieldGuidanceConfig. Defaults to False.
lr (float, optional) – Learning rate. Defaults to 1e-4.
lr_factor (float, optional) – LR-scheduler reduction factor. Defaults to 0.95.
lr_patience (int, optional) – LR-scheduler patience (epochs). Defaults to 100.
weight_decay (float, optional) – Optimizer weight-decay. Defaults to 0.0.
eps (float, optional) – Minimum diffusion time. Defaults to 1e-5.
guidance_weight (float, optional) – Classifier-free guidance weight. Defaults to -1.0 (disabled).
device (str or torch.device, optional) – Target compute device. When None CUDA is used if available, otherwise CPU.
type_map (List[int], optional) – Compact type map for the Types noiser. type_map[0] must be 0 (absorbing state) and type_map[i] is the atomic number for compact index i. When provided, the Types noiser and the TypesScore head use a reduced vocabulary of size len(type_map) instead of the default 100. Auto-populated by train_from_atoms() when a "Types" noiser is requested.

Returns:

A freshly initialised Agedi model.

Return type:

Agedi
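The noisers parameter accepts both CamelCase identifiers and snake_case aliases. One way such aliasing could work is sketched below. The function names are hypothetical and the conversion rule is an assumption; only the four identifiers themselves come from the parameter description above.

```python
# Canonical identifiers listed in the noisers parameter description.
CANONICAL = ("Positions", "CellPositions", "ConfinedCellPositions", "Types")


def snake_to_camel(name):
    """Convert e.g. 'cell_positions' to 'CellPositions'."""
    return "".join(part.capitalize() for part in name.split("_"))


def normalise_noiser_name(name):
    """Map a CamelCase or snake_case identifier to its canonical form."""
    candidate = name if name in CANONICAL else snake_to_camel(name)
    if candidate not in CANONICAL:
        raise ValueError(f"unknown noiser identifier: {name!r}")
    return candidate


names = [normalise_noiser_name(n) for n in ("cell_positions", "Types")]
```

Normalising names up front means the rest of a pipeline only ever has to compare against the four canonical strings.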

agedi.functional.create_trainer(*, epochs: int = -1, max_time: int | Dict | datetime.timedelta | None = 24, accelerator: str = 'auto', devices: int = 1, logger: str = 'tensorboard', log_dir: str = 'logs', project: str = 'agedi', name: str = 'agedi', log_interval: int = 10, gradient_clip_val: float = 10.0, progress_bar: bool = False, print_epoch_interval: int = 10, log_grad_norm: bool = True, repeat: int | None = None, repeat_epoch: int | None = None, hparams: Dict | None = None, extra_callbacks: List[lightning.pytorch.callbacks.Callback] | None = None) → lightning.Trainer¶

Create a Lightning trainer configured for AGeDi.

Parameters:

epochs – Maximum number of training epochs (-1 = unlimited).
max_time –
Wall-clock time limit for training. Accepts:
- int – number of hours (e.g. 24 ≡ 24 hours).
- dict – Lightning-style mapping, e.g. {"days": 0, "hours": 12, "minutes": 30, "seconds": 0}.
- datetime.timedelta – a Python timedelta object.
- None – no time limit.
accelerator – Hardware accelerator to use (e.g. "auto", "gpu", "cpu"). Default: "auto".
devices – Number of devices to train on. Default: 1.
logger – Logging backend: "tensorboard" (default) or "wandb".
log_dir – Root directory for logs and checkpoints. Default: "logs".
project – WandB project name (only used when logger="wandb").
name – Experiment display name used by TensorBoard and WandB as the run sub-directory / run name. Default: "agedi".
log_interval – How often (in steps) to log metrics. Default: 10.
gradient_clip_val – Maximum gradient norm for gradient clipping. Default: 10.0.
progress_bar – Whether to show a Lightning progress bar. Default: False.
print_epoch_interval – Print a one-line training summary to stdout every this many epochs. Set to 0 to disable. Default: 10.
log_grad_norm – Whether to log the total gradient norm during training. Disable for large models where the per-step overhead is undesirable. Default: True.
repeat – Number of repetition levels for cell-repeat data augmentation. Must be set together with repeat_epoch. When None (default), no repetition augmentation is applied.
repeat_epoch – How many epochs between repetition-level increases. Required when repeat is set.
hparams – Hyperparameters dict logged to hparams.yaml via HParamsMetricLogger. When None (default), no extra hyperparameter logging is performed.
extra_callbacks – Extra Lightning callbacks to append to the default callback list. When None (default) only the built-in callbacks are used.

Returns:

A configured Trainer ready to call trainer.fit(diffusion, dataset).

Return type:

lightning.Trainer
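The accepted forms of max_time (hours as an int, a Lightning-style dict, a timedelta, or None) can all be reduced to a single timedelta. The sketch below shows one plausible normalisation using only the standard library. How AGeDi converts the value internally is not documented here, so normalise_max_time is a hypothetical helper.

```python
from datetime import timedelta


def normalise_max_time(max_time):
    """Reduce the documented max_time forms to a timedelta (or None)."""
    if max_time is None:
        return None  # no time limit
    if isinstance(max_time, timedelta):
        return max_time
    if isinstance(max_time, dict):
        # Keys such as "days", "hours", "minutes", "seconds" are valid
        # timedelta keyword arguments.
        return timedelta(**max_time)
    if isinstance(max_time, int) and not isinstance(max_time, bool):
        return timedelta(hours=max_time)  # an int is a number of hours
    raise TypeError(f"unsupported max_time: {max_time!r}")


one_day = normalise_max_time(24)
half_day = normalise_max_time({"days": 0, "hours": 12, "minutes": 30, "seconds": 0})
```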

agedi.functional.load_diffusion(path, checkpoint=None, device=None) → Agedi¶

Load a trained diffusion model from an AGeDi log directory.

The model architecture is fully reconstructed from the Hydra-compatible diffusion config stored in hparams.yaml, so no additional parameters are needed.

Parameters:

path – Path to the AGeDi log / model directory (or directly to the hparams.yaml file).
checkpoint – Path to a specific checkpoint file. When None the latest checkpoint (checkpoints/last_model.ckpt) is loaded automatically.
device – Device to load the model onto. When None CUDA is used if available, otherwise CPU.

agedi.functional.predict(diffusion: Agedi, structures: Sequence[ase.Atoms], *, batch_size: int = 64, cutoff: float | None = None) → List[ase.Atoms]¶

Predict energies and forces for input structures using a trained force-field.

The model must have been trained with force_field=True (i.e. it must have a regressor_model attached). The predicted energy and forces are attached to the returned Atoms objects via a SinglePointCalculator.

Parameters:

diffusion – A trained Agedi model with a force-field regressor (created with force_field=True).
structures – Input ASE Atoms objects to run predictions on.
batch_size – Number of structures per inference batch. Defaults to 64.
cutoff – Neighbour-list cutoff in Å. When None (default), the cutoff is read from the model’s representation automatically.

Returns:

The input structures with a SinglePointCalculator attached containing the predicted energy and/or forces.

Return type:

List[Atoms]

Raises:

ValueError – If the model does not have a force-field regressor.

agedi.functional.register_model(name: str, factory: Callable) → None¶

Register a custom score model backbone factory under name.

The factory is called with the keyword arguments cutoff, heads, feature_size, n_blocks, head_dim, and n_rbf and must return a 3-tuple (translator, representation, List[Head]).

Registered models can be selected by passing model=name to create_diffusion().

Parameters:

name (str) – Alias used to select this backend (e.g. "PaiNN").

factory (Callable) –

Factory function with signature:

factory(cutoff, heads, feature_size, n_blocks, head_dim, n_rbf)
    -> Tuple[Translator, nn.Module, List[Head]]

Examples

from agedi.functional import register_model

def my_factory(cutoff, heads, feature_size, n_blocks, head_dim, n_rbf):
    ...
    return translator, representation, head_list

register_model("MyModel", my_factory)
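The name-to-factory lookup described above follows a common registry pattern. The sketch below illustrates that pattern without importing AGeDi. It is not the library's code: the names _REGISTRY, register, and build are hypothetical, and the toy factory returns strings where a real factory would return a translator, a representation module, and a list of heads.

```python
_REGISTRY = {}  # hypothetical module-level mapping: name -> factory


def register(name, factory):
    """Store a factory under a name, replacing any earlier entry."""
    if not callable(factory):
        raise TypeError("factory must be callable")
    _REGISTRY[name] = factory


def build(name, **kwargs):
    """Look up a factory by name and call it with keyword arguments."""
    try:
        factory = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown model: {name!r}") from None
    return factory(**kwargs)


def toy_factory(cutoff, heads, feature_size, n_blocks, head_dim, n_rbf):
    # Placeholder strings stand in for the real objects.
    translator = f"translator(cutoff={cutoff})"
    representation = f"representation({feature_size}, {n_blocks}, {n_rbf})"
    head_list = [f"{head}(dim={head_dim})" for head in heads]
    return translator, representation, head_list


register("Toy", toy_factory)
translator, representation, head_list = build(
    "Toy", cutoff=6.0, heads=["score"], feature_size=64,
    n_blocks=4, head_dim=32, n_rbf=30,
)
```

Calling the factory with keyword arguments only, as the description states, keeps custom factories independent of argument order.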

agedi.functional.sample(diffusion: Agedi, *, n_samples: int, n_atoms: int | None = None, atomic_numbers: List[int] | None = None, formula: str | None = None, positions: numpy.ndarray | None = None, cell: numpy.ndarray | None = None, pbc: numpy.ndarray | None = None, template: agedi.data.AtomsGraph | ase.Atoms | None = None, confinement: Tuple[float, float] | None = None, compile: bool = False, steps: int = 500, eps: float = 0.001, batch_size: int = 64, ff_guidance: agedi.diffusion.ForcefieldGuidanceConfig | None = None, property: Dict[str, float] | None = None, progress_bar: bool = False, save_trajectory: bool = False, print_timings: bool = False, as_atoms: bool = True, sampler=None, sampler_kwargs=None) → List[agedi.data.AtomsGraph] | List[ase.Atoms] | List[List[agedi.data.AtomsGraph]] | List[List[ase.Atoms]]¶

Sample structures from a trained diffusion model.

Parameters:

diffusion – A trained Agedi model.
n_samples – Number of structures to generate.
n_atoms – Number of atoms per structure. Automatically determined from formula if provided, or from the length of atomic_numbers when n_atoms is not explicitly given.
atomic_numbers – Atomic numbers of the generated atoms. Not required when the model has a types-noiser or when formula is provided.
formula – Chemical formula (e.g. "H2O"). Used to derive n_atoms and atomic_numbers when they are not provided explicitly.
positions – Fixed positions of the atoms (shape (n_atoms, 3)). Required when no positions-noiser is configured (type-only diffusion). Positions will not be modified during sampling.
cell – Unit-cell matrix (3×3 array or flat length-9 array). Not required when template is provided (the template’s cell is used instead).
pbc – Periodic boundary conditions as a length-3 boolean array (e.g. [True, True, False]). When template is provided its pbc is used unless this argument is given explicitly. Defaults to [True, True, True] (fully periodic) when neither template nor pbc is supplied.
template – Template structure. May be an AtomsGraph or an ASE Atoms object; the latter is automatically converted to an AtomsGraph (with confinement applied when provided). When given, cell and pbc are taken from the template unless explicitly provided.
ff_guidance – Force-field guidance configuration. When None (default) a ForcefieldGuidanceConfig with default values is used (i.e. guidance is disabled).
compile – When True, use torch.compile on the reverse diffusion step for faster sampling. Before the sampling loop starts, the maximum number of neighbors and cell-list dimensions are estimated automatically via NVIDIA nvalchemiops (estimate_max_neighbors and estimate_cell_list_sizes), and all neighbor-list buffers are pre-allocated with fixed shapes. Requires NVIDIA nvalchemiops. Defaults to False.
print_timings – When True, print a per-stage timing breakdown at the end of each sampling batch (graph init, score model, denoise, neighbor list, etc.). Defaults to False.
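The interplay of n_atoms, atomic_numbers, and formula can be illustrated with a small standard-library sketch. Everything here is an assumption made for illustration: the helper names are hypothetical, the element table is a tiny subset, only flat formulas such as "H2O" are handled, and the real function may well delegate formula parsing to ASE and order atoms differently.

```python
import re

# Small illustrative subset of the periodic table (symbol -> atomic number).
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "Si": 14, "Cu": 29, "Pt": 78}

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def formula_to_atomic_numbers(formula):
    """Expand a flat formula such as 'H2O' into a list of atomic numbers."""
    numbers, pos = [], 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:
            break  # unparseable characters between tokens
        symbol, count = match.groups()
        if symbol not in ATOMIC_NUMBERS:
            raise ValueError(f"element not in illustrative table: {symbol}")
        numbers.extend([ATOMIC_NUMBERS[symbol]] * int(count or 1))
        pos = match.end()
    if pos != len(formula) or not numbers:
        raise ValueError(f"cannot parse formula: {formula!r}")
    return numbers


def resolve_composition(n_atoms=None, atomic_numbers=None, formula=None):
    """Fill in whichever of n_atoms / atomic_numbers was not given."""
    if atomic_numbers is None and formula is not None:
        atomic_numbers = formula_to_atomic_numbers(formula)
    if n_atoms is None and atomic_numbers is not None:
        n_atoms = len(atomic_numbers)
    return n_atoms, atomic_numbers


water = resolve_composition(formula="H2O")
dimer = resolve_composition(atomic_numbers=[29, 29])
```

Explicit arguments take precedence in this sketch: formula is consulted only when atomic_numbers is missing, and n_atoms is derived only when it was not passed.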

agedi.functional.train(diffusion: Agedi, dataset: agedi.data.Dataset, trainer: lightning.Trainer | None = None, ckpt_path: str | pathlib.Path | None = None, **trainer_kwargs) → lightning.Trainer¶

Train a diffusion model and return the trainer used.

Parameters:

diffusion – The diffusion model to train.
dataset – The dataset to train on.
trainer – A pre-configured Lightning Trainer. When None a new trainer is created from trainer_kwargs.
ckpt_path – Path to a Lightning checkpoint (.ckpt) to resume training from. When provided the full training state (model weights, optimiser, LR-scheduler, and epoch counter) is restored before fitting. Equivalent to passing ckpt_path to trainer.fit().
**trainer_kwargs – Additional keyword arguments forwarded to create_trainer() when trainer is None.

agedi.functional._build_type_map_from_data(data: Sequence[Atoms]) → List[int]¶

Build a compact type map from the element types present in training data.

The map is [0, z1, z2, ...] where z1 < z2 < ... are the sorted unique atomic numbers found in data. Index 0 is reserved for the absorbing state.

Parameters:

data (Sequence[Atoms]) – List of ASE Atoms objects to inspect.

Returns:

A list where type_map[i] is the atomic number corresponding to compact index i (and type_map[0] == 0 for the absorbing state).

Return type:

List[int]
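The compact type map described above can be reproduced in a few lines. This sketch takes plain lists of atomic numbers where the real function takes ASE Atoms objects; build_type_map and to_compact_index are hypothetical names used only for illustration.

```python
def build_type_map(atomic_numbers_per_structure):
    """Return [0, z1, z2, ...] with the sorted unique atomic numbers."""
    unique = sorted({z for numbers in atomic_numbers_per_structure for z in numbers})
    return [0] + unique  # index 0 is reserved for the absorbing state


def to_compact_index(type_map, atomic_number):
    """Look up the compact index of an atomic number."""
    return type_map.index(atomic_number)


# Two structures: water (H, H, O) and carbon monoxide (C, O).
type_map = build_type_map([[1, 1, 8], [6, 8]])
oxygen_index = to_compact_index(type_map, 8)
```

With three distinct elements in the data, the vocabulary has four entries, including the absorbing state, in place of the default 100.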

agedi.functional.train_from_atoms(*args, **kwargs)¶

agedi.functional.train_from_config(*args, **kwargs)¶