agedi.functional
Backward-compatibility shim.
All public symbols are now implemented in agedi.api.
This module re-exports them so that existing code using
from agedi.functional import X continues to work unchanged.
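The re-export pattern described above can be sketched without AGeDi installed. The module names `demo_pkg_api` and `demo_pkg_functional` and the function `create_thing` are hypothetical stand-ins, not part of AGeDi; a real shim would more likely use a plain `from ... import` statement in the legacy module.

```python
import sys
import types

# Stand-in for the module that now holds the implementation (hypothetical name).
api = types.ModuleType("demo_pkg_api")


def create_thing(x):
    """Pretend public function that lives in the new module."""
    return x * 2


api.create_thing = create_thing
api.__all__ = ["create_thing"]
sys.modules["demo_pkg_api"] = api

# Stand-in for the legacy module: it implements nothing itself and only
# re-exports the public names of the new module.
legacy = types.ModuleType("demo_pkg_functional")
for name in api.__all__:
    setattr(legacy, name, getattr(api, name))
legacy.__all__ = list(api.__all__)
sys.modules["demo_pkg_functional"] = legacy

# Old-style and new-style imports resolve to the very same object.
from demo_pkg_functional import create_thing as old_style
from demo_pkg_api import create_thing as new_style
```

Because both names are bound to one function object, behaviour cannot drift between the old and the new import path.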
Functions
| create_dataset | Create and set up an AGeDi Dataset from ASE Atoms objects. |
| create_diffusion | Create a diffusion model for script-based training and sampling. |
| create_trainer | Create a Lightning trainer configured for AGeDi. |
| load_diffusion | Load a trained diffusion model from an AGeDi log directory. |
| predict | Predict energies and forces for input structures using a trained force-field. |
| register_model | Register a custom score model backbone factory under name. |
| sample | Sample structures from a trained diffusion model. |
| train | Train a diffusion model and return the trainer used. |
| _build_type_map_from_data | Build a compact type map from the element types present in training data. |
| train_from_atoms | |
| train_from_config | |
Module Contents
- agedi.functional.create_dataset(data: Sequence[ase.Atoms], cutoff: float = 6.0, batch_size: int = 64, train_split: float | int = 0.9, val_split: float | int = 0.1, mask: str = 'none', confinement: Tuple[float, float] | None = None, conditioning: str = 'none', conditioning_type: str = 'scalar', repeat: int | None = None, canonical_cell: bool = False, regressor_data: Sequence[ase.Atoms] | None = None, properties: List[Dict] | None = None) -> agedi.data.Dataset
Create and set up an AGeDi Dataset from ASE Atoms objects.
- Parameters:
data (Sequence[Atoms]) – ASE Atoms objects to add to the dataset.
cutoff (float, optional) – Neighbour-list cutoff radius in Ångström.
batch_size (int, optional) – Mini-batch size used during training/validation.
train_split (Union[float, int], optional) – Fraction or absolute number of samples for the training split.
val_split (Union[float, int], optional) – Fraction or absolute number of samples for the validation split.
mask (str, optional) – Atom-mask method (e.g. "MaskFixed" or "none").
confinement (Tuple[float, float], optional) – Z-axis confinement bounds (z_min, z_max).
conditioning (str, optional) – Name of the per-structure property to use as a conditioning signal. The value is read from atoms.info[conditioning] or the corresponding atoms.get_<conditioning>() method. Ignored when set to "none" (default).
conditioning_type (str, optional) – "scalar" (default) or "node"; controls how the conditioning property is broadcast onto the graph.
repeat (int, optional) – When given, augment the dataset by repeating each structure up to repeat times along the first two cell vectors.
canonical_cell (bool, optional) – Store cells in canonical lower-triangular form.
regressor_data (Sequence[Atoms], optional) – Additional ASE Atoms objects used to train a regressor head.
properties (List[Dict], optional) – Per-structure property dictionaries; must contain exactly one entry per element in data. Each dictionary is merged into the corresponding graph object via setattr, matching the layout accepted by add_atoms_data(). Keys already produced by the conditioning logic are overwritten by values in properties when both are present.
- Returns:
A fully set-up Dataset ready for training.
- Return type:
agedi.data.Dataset
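The merge rule for properties (one dictionary per structure, applied via setattr, overriding values written by the conditioning step) can be illustrated with a minimal sketch. The helper `merge_properties` and the `SimpleNamespace` stand-ins for graph objects are assumptions made for illustration; they are not AGeDi's implementation.

```python
from types import SimpleNamespace


def merge_properties(graphs, properties):
    """Attach one property dict to each graph-like object via setattr."""
    if len(properties) != len(graphs):
        raise ValueError(
            f"expected {len(graphs)} property dicts, got {len(properties)}"
        )
    for graph, props in zip(graphs, properties):
        for key, value in props.items():
            # setattr replaces an existing attribute of the same name,
            # e.g. one written earlier by the conditioning logic.
            setattr(graph, key, value)
    return graphs


# Hypothetical graph stand-ins; "energy" plays the role of a value that the
# conditioning step has already set.
graphs = [SimpleNamespace(energy=-1.0), SimpleNamespace(energy=-2.0)]
merge_properties(graphs, [{"energy": -1.5, "label": "a"}, {"label": "b"}])
```

After the call, the first object carries the overriding energy, while the second keeps its original energy and only gains the new label.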
- agedi.functional.create_diffusion(model: str = 'PaiNN', cutoff: float = 6.0, feature_size: int = 64, n_blocks: int = 4, n_rbf: int = 30, noisers: Sequence[str | Noiser] = ('CellPositions',), sde: str | SDE = 've', conditioning: str = 'none', conditioning_type: str = 'scalar', confinement: Tuple[float, float] | None = None, force_field: bool = False, lr: float = 0.0001, lr_factor: float = 0.95, lr_patience: int = 100, weight_decay: float = 0.0, eps: float = 1e-05, guidance_weight: float = -1.0, device: str | torch.device | None = None, type_map: List[int] | None = None) -> agedi.Agedi
Create a diffusion model for script-based training and sampling.
- Parameters:
model (str, optional) – GNN backbone architecture. The name is looked up in the model registry; use register_model() to add custom backends. The built-in default is "PaiNN" (SchNetPack PaiNN).
cutoff (float, optional) – Neighbour-list cutoff radius in Å. Defaults to 6.0.
feature_size (int, optional) – Embedding / feature dimension. Defaults to 64.
n_blocks (int, optional) – Number of interaction blocks. Defaults to 4.
n_rbf (int, optional) – Number of radial basis functions. Defaults to 30.
noisers (Sequence[str or Noiser], optional) – Noiser identifiers or instances to include. Defaults to ("CellPositions",). Recognised string identifiers (CamelCase preferred; snake_case aliases also accepted for backwards compatibility):
  - "Positions" / "positions" – Positions (StandardNormal prior + Normal, for gas-phase clusters).
  - "CellPositions" / "cell_positions" – CellPositions (UniformCell prior + Normal, for periodic bulk/surface systems).
  - "ConfinedCellPositions" / "confined_cell_positions" – ConfinedCellPositions (UniformCellConfined prior + TruncatedNormal, for Z-confined systems).
  - "Types" / "types" – Types.
sde (str or SDE, optional) – SDE for position noisers. Short aliases: "ve" (default), "vp". Pass an instantiated SDE for full control.
conditioning (str, optional) – Property to condition on, or "none" for time-only conditioning. Defaults to "none".
conditioning_type (str, optional) – Type of the conditioning module: "scalar" or "integer". Defaults to "scalar".
confinement (Tuple[float, float], optional) – Z-direction confinement bounds (z_min, z_max) in Å.
force_field (bool, optional) – When True, attach a diffusion.regressor_model. The head shares the same representation and translator as the score model so that atomic embeddings are learned jointly. It is trained whenever the training batch contains per-atom forces and total energies (i.e. the ASE training structures carry energies and forces from DFT or another reference method). The trained forces head enables force-field guided sampling via ForcefieldGuidanceConfig. Defaults to False.
lr (float, optional) – Learning rate. Defaults to 1e-4.
lr_factor (float, optional) – LR-scheduler reduction factor. Defaults to 0.95.
lr_patience (int, optional) – LR-scheduler patience (epochs). Defaults to 100.
weight_decay (float, optional) – Optimizer weight decay. Defaults to 0.0.
eps (float, optional) – Minimum diffusion time. Defaults to 1e-5.
guidance_weight (float, optional) – Classifier-free guidance weight. Defaults to -1.0 (disabled).
device (str or torch.device, optional) – Target compute device. When None, CUDA is used if available, otherwise CPU.
type_map (List[int], optional) – Compact type map for the Types noiser. type_map[0] must be 0 (absorbing state) and type_map[i] is the atomic number for compact index i. When provided, the Types noiser and the TypesScore head use a reduced vocabulary of size len(type_map) instead of the default 100. Auto-populated by train_from_atoms() when a "Types" noiser is requested.
- Returns:
A freshly initialised Agedi model.
- Return type:
agedi.Agedi
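The accepted noiser spellings listed above amount to an alias table. A minimal sketch of such a lookup follows; the table contents come from the parameter description, while the names `NOISER_ALIASES` and `canonical_noiser_names` are hypothetical, and how AGeDi resolves aliases internally is not documented here.

```python
# Alias table built from the identifiers listed in the parameter description.
NOISER_ALIASES = {
    "Positions": "Positions",
    "positions": "Positions",
    "CellPositions": "CellPositions",
    "cell_positions": "CellPositions",
    "ConfinedCellPositions": "ConfinedCellPositions",
    "confined_cell_positions": "ConfinedCellPositions",
    "Types": "Types",
    "types": "Types",
}


def canonical_noiser_names(noisers):
    """Map string identifiers to CamelCase names; pass other objects through."""
    resolved = []
    for item in noisers:
        if not isinstance(item, str):
            # Already-instantiated noisers are kept as they are.
            resolved.append(item)
            continue
        try:
            resolved.append(NOISER_ALIASES[item])
        except KeyError:
            raise ValueError(f"unknown noiser identifier: {item!r}") from None
    return resolved


names = canonical_noiser_names(("cell_positions", "Types"))
```

Rejecting unknown strings early gives a clearer error than a failure deep inside model construction.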
- agedi.functional.create_trainer(*, epochs: int = -1, max_time: int | Dict | datetime.timedelta | None = 24, accelerator: str = 'auto', devices: int = 1, logger: str = 'tensorboard', log_dir: str = 'logs', project: str = 'agedi', name: str = 'agedi', log_interval: int = 10, gradient_clip_val: float = 10.0, progress_bar: bool = False, print_epoch_interval: int = 10, log_grad_norm: bool = True, repeat: int | None = None, repeat_epoch: int | None = None, hparams: Dict | None = None, extra_callbacks: List[lightning.pytorch.callbacks.Callback] | None = None) -> lightning.Trainer
Create a Lightning trainer configured for AGeDi.
- Parameters:
epochs – Maximum number of training epochs (-1 = unlimited).
max_time – Wall-clock time limit for training. Accepts:
  - int – number of hours (e.g. 24 ≡ 24 hours).
  - dict – Lightning-style mapping, e.g. {"days": 0, "hours": 12, "minutes": 30, "seconds": 0}.
  - datetime.timedelta – a Python timedelta object.
  - None – no time limit.
accelerator – Hardware accelerator to use (e.g. "auto", "gpu", "cpu"). Default: "auto".
devices – Number of devices to train on. Default: 1.
logger – Logging backend: "tensorboard" (default) or "wandb".
log_dir – Root directory for logs and checkpoints. Default: "logs".
project – WandB project name (only used when logger="wandb").
name – Experiment display name used by TensorBoard and WandB as the run sub-directory / run name. Default: "agedi".
log_interval – How often (in steps) to log metrics. Default: 10.
gradient_clip_val – Maximum gradient norm for gradient clipping. Default: 10.0.
progress_bar – Whether to show a Lightning progress bar. Default: False.
print_epoch_interval – Interval, in epochs, at which a one-line training summary is printed to stdout. Set to 0 to disable. Default: 10.
log_grad_norm – Whether to log the total gradient norm during training. Disable for large models where the per-step overhead is undesirable. Default: True.
repeat – Number of repetition levels for cell-repeat data augmentation. Must be set together with repeat_epoch. When None (default), no repetition augmentation is applied.
repeat_epoch – How many epochs between repetition-level increases. Required when repeat is set.
hparams – Hyperparameters dict logged to hparams.yaml via HParamsMetricLogger. When None (default), no extra hyperparameter logging is performed.
extra_callbacks – Extra Lightning callbacks to append to the default callback list. When None (default), only the built-in callbacks are used.
- Returns:
A configured Trainer ready to call trainer.fit(diffusion, dataset).
- Return type:
lightning.Trainer
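The four accepted forms of max_time can all be reduced to a `datetime.timedelta` (or None). The helper below is a sketch of that normalisation under the rules listed above; the name `normalize_max_time` is hypothetical and AGeDi's own conversion may differ in detail.

```python
from datetime import timedelta


def normalize_max_time(max_time):
    """Convert an int (hours), dict, timedelta, or None into a timedelta or None."""
    if max_time is None:
        return None  # no wall-clock limit
    if isinstance(max_time, timedelta):
        return max_time
    if isinstance(max_time, dict):
        # Keys such as "days", "hours", "minutes", "seconds" are valid
        # timedelta keyword arguments.
        return timedelta(**max_time)
    if isinstance(max_time, int) and not isinstance(max_time, bool):
        return timedelta(hours=max_time)
    raise TypeError(f"unsupported max_time: {max_time!r}")


limit = normalize_max_time({"days": 0, "hours": 12, "minutes": 30, "seconds": 0})
```

Here `limit` equals twelve and a half hours, and `normalize_max_time(24)` gives a 24-hour limit.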
- agedi.functional.load_diffusion(path: str | pathlib.Path, checkpoint: str | pathlib.Path | None = None, device: str | torch.device | None = None) -> Agedi
Load a trained diffusion model from an AGeDi log directory.
The model architecture is fully reconstructed from the Hydra-compatible diffusion config stored in hparams.yaml, so no additional parameters are needed.
- Parameters:
path – Path to the AGeDi log / model directory (or directly to the hparams.yaml file).
checkpoint – Path to a specific checkpoint file. When None, the latest checkpoint (checkpoints/last_model.ckpt) is loaded automatically.
device – Device to load the model onto. When None, CUDA is used if available, otherwise CPU.
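The path handling described above (directory or hparams.yaml file, with a default checkpoint) can be sketched with pathlib alone. The function `resolve_model_files` is hypothetical, and it assumes that the checkpoints directory sits inside the log directory, which the description suggests but does not state outright.

```python
from pathlib import Path


def resolve_model_files(path, checkpoint=None):
    """Return (hparams_file, checkpoint_file) for a log directory or hparams.yaml path."""
    path = Path(path)
    if path.name == "hparams.yaml":
        # The caller pointed directly at the config file.
        log_dir, hparams = path.parent, path
    else:
        log_dir, hparams = path, path / "hparams.yaml"
    if checkpoint is not None:
        ckpt = Path(checkpoint)
    else:
        # Assumed layout: <log_dir>/checkpoints/last_model.ckpt
        ckpt = log_dir / "checkpoints" / "last_model.ckpt"
    return hparams, ckpt


hparams_file, ckpt_file = resolve_model_files("logs/run1")
```

The sketch only builds paths and never touches the file system, so it runs without any log directory present.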
- agedi.functional.predict(diffusion: Agedi, structures: Sequence[ase.Atoms], *, batch_size: int = 64, cutoff: float | None = None) -> List[ase.Atoms]
Predict energies and forces for input structures using a trained force-field.
The model must have been trained with force_field=True (i.e. it must have a regressor_model attached). The predicted energy and forces are attached to the returned Atoms objects via a SinglePointCalculator.
- Parameters:
diffusion – A trained Agedi model with a force-field regressor (trained with --force_field).
structures – Input ASE Atoms objects to run predictions on.
batch_size – Number of structures per inference batch. Defaults to 64.
cutoff – Neighbour-list cutoff in Å. When None (default), the cutoff is read from the model’s representation automatically.
- Returns:
The input structures with a SinglePointCalculator attached containing the predicted energy and/or forces.
- Return type:
List[Atoms]
- Raises:
ValueError – If the model does not have a force-field regressor.
- agedi.functional.register_model(name: str, factory: Callable) -> None
Register a custom score model backbone factory under name.
The factory is called with the keyword arguments cutoff, heads, feature_size, n_blocks, head_dim, and n_rbf and must return a 3-tuple (translator, representation, List[Head]).
Registered models can be selected by passing model=name to create_diffusion().
- Parameters:
name (str) – Alias used to select this backend (e.g. "PaiNN").
factory (Callable) – Factory function with signature:
    factory(cutoff, heads, feature_size, n_blocks, head_dim, n_rbf) -> Tuple[Translator, nn.Module, List[Head]]
Examples
    from agedi.functional import register_model

    def my_factory(cutoff, heads, feature_size, n_blocks, head_dim, n_rbf):
        ...
        return translator, representation, head_list

    register_model("MyModel", my_factory)
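For readers unfamiliar with the registry pattern, the following self-contained sketch shows the general idea: a name-to-factory dictionary, a lookup, and a call with the documented keyword arguments. The names `_MODEL_REGISTRY`, `register`, `build`, and `toy_factory`, and the dictionary placeholders standing in for the translator, representation, and heads, are all hypothetical; this is not AGeDi's registry code.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a backend name to its factory function.
_MODEL_REGISTRY: Dict[str, Callable] = {}


def register(name: str, factory: Callable) -> None:
    """Store the factory under the given name (later registrations win)."""
    _MODEL_REGISTRY[name] = factory


def build(name: str, **kwargs):
    """Look up a factory by name and call it with keyword arguments."""
    try:
        factory = _MODEL_REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown model backend: {name!r}") from None
    translator, representation, heads = factory(**kwargs)
    return translator, representation, heads


def toy_factory(cutoff, heads, feature_size, n_blocks, head_dim, n_rbf):
    # Plain dicts stand in for the real translator, network, and head objects.
    translator = {"cutoff": cutoff}
    representation = {"feature_size": feature_size, "n_blocks": n_blocks, "n_rbf": n_rbf}
    head_list = [{"name": head, "dim": head_dim} for head in heads]
    return translator, representation, head_list


register("Toy", toy_factory)
translator, representation, head_list = build(
    "Toy", cutoff=6.0, heads=["positions"], feature_size=64,
    n_blocks=4, head_dim=64, n_rbf=30,
)
```

Calling the factory with keyword arguments, as the description specifies, keeps the contract independent of parameter order.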
- agedi.functional.sample(diffusion: Agedi, *, n_samples: int, n_atoms: int | None = None, atomic_numbers: List[int] | None = None, formula: str | None = None, positions: numpy.ndarray | None = None, cell: numpy.ndarray | None = None, pbc: numpy.ndarray | None = None, template: agedi.data.AtomsGraph | ase.Atoms | None = None, confinement: Tuple[float, float] | None = None, compile: bool = False, steps: int = 500, eps: float = 0.001, batch_size: int = 64, ff_guidance: agedi.diffusion.ForcefieldGuidanceConfig | None = None, property: Dict[str, float] | None = None, progress_bar: bool = False, save_trajectory: bool = False, print_timings: bool = False, as_atoms: bool = True) -> List[agedi.data.AtomsGraph] | List[ase.Atoms] | List[List[agedi.data.AtomsGraph]] | List[List[ase.Atoms]]
Sample structures from a trained diffusion model.
- Parameters:
diffusion – A trained Agedi model.
n_samples – Number of structures to generate.
n_atoms – Number of atoms per structure. Automatically determined from formula if provided, or from the length of atomic_numbers when n_atoms is not explicitly given.
atomic_numbers – Atomic numbers of the generated atoms. Not required when the model has a types-noiser or when formula is provided.
formula – Chemical formula (e.g. "H2O"). Used to derive n_atoms and atomic_numbers when they are not provided explicitly.
positions – Fixed positions of the atoms (shape (n_atoms, 3)). Required when no positions-noiser is configured (type-only diffusion). Positions will not be modified during sampling.
cell – Unit-cell matrix (3×3 array or flat length-9 array). Not required when template is provided (the template’s cell is used instead).
pbc – Periodic boundary conditions as a length-3 boolean array (e.g. [True, True, False]). When template is provided, its pbc is used unless this argument is given explicitly. Defaults to [True, True, True] (fully periodic) when neither template nor pbc is supplied.
template – Template structure. May be an AtomsGraph or an ASE Atoms object; the latter is automatically converted to an AtomsGraph (with confinement applied when provided). When given, cell and pbc are taken from the template unless explicitly provided.
ff_guidance – Force-field guidance configuration. When None (default), a ForcefieldGuidanceConfig with default values is used (i.e. guidance is disabled).
compile – When True, use torch.compile on the reverse diffusion step for faster sampling. Before the sampling loop starts, the maximum number of neighbors and cell-list dimensions are estimated automatically via NVIDIA nvalchemiops (estimate_max_neighbors and estimate_cell_list_sizes), and all neighbor-list buffers are pre-allocated with fixed shapes. Requires NVIDIA nvalchemiops. Defaults to False.
print_timings – When True, print a per-stage timing breakdown at the end of each sampling batch (graph init, score model, denoise, neighbor list, etc.). Defaults to False.
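How a formula such as "H2O" yields both n_atoms and atomic_numbers can be shown with a small parser. This is a sketch only: the function `atomic_numbers_from_formula` is hypothetical, the element table is a deliberately tiny subset, brackets and nested groups are not handled, and AGeDi may rely on a different parser entirely.

```python
import re

# Tiny illustrative subset of the periodic table (symbol -> atomic number).
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "Cu": 29, "Pt": 78}


def atomic_numbers_from_formula(formula):
    """Expand a flat formula like 'H2O' into a list of atomic numbers."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    numbers = []
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in ATOMIC_NUMBERS:
            raise ValueError(f"element not in the demo table: {symbol!r}")
        # A missing count means exactly one atom of that element.
        numbers.extend([ATOMIC_NUMBERS[symbol]] * (int(count) if count else 1))
    return numbers


atomic_numbers = atomic_numbers_from_formula("H2O")
n_atoms = len(atomic_numbers)
```

For "H2O" this gives two hydrogens followed by one oxygen, so n_atoms follows directly from the length of the list.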
- agedi.functional.train(diffusion: Agedi, dataset: agedi.data.Dataset, trainer: lightning.Trainer | None = None, ckpt_path: str | pathlib.Path | None = None, **trainer_kwargs) -> lightning.Trainer
Train a diffusion model and return the trainer used.
- Parameters:
diffusion – The diffusion model to train.
dataset – The dataset to train on.
trainer – A pre-configured Lightning Trainer. When None, a new trainer is created from trainer_kwargs.
ckpt_path – Path to a Lightning checkpoint (.ckpt) to resume training from. When provided, the full training state (model weights, optimiser, LR-scheduler, and epoch counter) is restored before fitting. Equivalent to passing ckpt_path to trainer.fit().
**trainer_kwargs – Additional keyword arguments forwarded to create_trainer() when trainer is None.
- agedi.functional._build_type_map_from_data(data: Sequence[Atoms]) -> List[int]
Build a compact type map from the element types present in training data.
The map is [0, z1, z2, ...] where z1 < z2 < ... are the sorted unique atomic numbers found in data. Index 0 is reserved for the absorbing state.
- Parameters:
data (Sequence[Atoms]) – List of ASE Atoms objects to inspect.
- Returns:
A list where type_map[i] is the atomic number corresponding to compact index i (and type_map[0] == 0 for the absorbing state).
- Return type:
List[int]
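The construction described above is short enough to sketch directly. To stay free of dependencies, the hypothetical helper `build_type_map` below takes plain lists of atomic numbers per structure instead of ASE Atoms objects; with ASE, each inner list would correspond to what `atoms.get_atomic_numbers()` returns.

```python
def build_type_map(atomic_numbers_per_structure):
    """Return [0, z1, z2, ...] with the sorted unique atomic numbers after index 0."""
    unique = sorted({z for numbers in atomic_numbers_per_structure for z in numbers})
    # Index 0 is reserved for the absorbing state.
    return [0] + unique


# Two toy structures: a water molecule and a Cu-O pair.
type_map = build_type_map([[8, 1, 1], [29, 8]])

# Inverse lookup: atomic number -> compact index.
compact_index = {z: i for i, z in enumerate(type_map)}
```

With three distinct elements in the data, the vocabulary shrinks to four entries (including the absorbing state) rather than the default 100 mentioned for create_diffusion.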
- agedi.functional.train_from_atoms(*args, **kwargs)
- agedi.functional.train_from_config(*args, **kwargs)