Concepts and model behavior¶
Graph representation¶
AGeDi uses AtomsGraph as the main data object:
Nodes: atomic numbers (
x) and positions (pos)Edges: neighbor graph from periodic cutoff
Graph-level data: cell, pbc, optional confinement
Optional mask marks fixed atoms during diffusion updates
Diffusion components¶
Agedi combines:
A score model (predicts scores for configured targets)
One or more noisers (e.g., positions, types)
Optimizer/scheduler configuration for Lightning training
Supported score/noiser pairing is enforced by key matching.
Position noisers¶
Three position noisers are available, each with a fixed prior and noise distribution baked in. Choose based on the physics of your system:
Class / identifier |
Prior |
Distribution |
Use case |
|---|---|---|---|
|
Gas-phase (molecules, clusters) |
||
|
Periodic bulk / surface (default) |
||
|
Surface overlayer/adsorbate |
The prior is the distribution used to initialise atomic positions at the
start of the reverse (sampling) process. The distribution is the noise
kernel applied during the forward (training) process. The SDE can still be
chosen freely on all three classes (default: Variance-Exploding, "ve").
Discrete atom types can be diffused by adding a
Types to the noiser list.
Sampling semantics¶
During sampling, required defaults depend on enabled noisers:
n_atomscan come from explicit input,atomic_numbers, orformulaatomic_numbersare needed if type noising is not enabled and formula is not providedpositionsare needed if position noising is not enabledcellis needed unless a template provides it
If a template is provided, generated atoms are appended to template atoms and template atoms are masked as fixed.
Training outputs¶
By default, training writes to logs/version_x:
hparams.yaml: run hyperparameters and data metadatacheckpoints/: model checkpoints
load_diffusion reconstructs the model from these artifacts.
Property conditioning¶
The score model can be conditioned on a per-structure scalar or integer
property so that sampling can be steered towards a target value (e.g.
formation energy or band gap). Use the conditioning parameter (CLI:
--conditioning) to specify the property name and
conditioning_type (CLI: --conditioning_type) to choose between
"scalar" (continuous, default) and "integer" (discrete) encoding.
The property value is looked up from atoms.info[conditioning] or
atoms.get_<conditioning>() for each training structure. At sampling
time pass the target value in the property dict:
structures = sample(diffusion, n_samples=10, formula="Pd4O4",
property={"energy": -3.5})
Data augmentation (cell repeat)¶
For periodic systems it can be beneficial to augment the training data by
tiling each structure along the first two cell vectors. Enable this with
repeat (CLI: --repeat) and set the epoch interval at which the
repetition level increases with repeat_epoch (CLI: --repeat_epoch).
For example, repeat=3, repeat_epoch=50 starts training on the original
cells, increases to 2×2×1 at epoch 50, then to 3×3×1 at epoch 100.