Command-line interface
AGeDi installs a CLI entrypoint named agedi.
Discover commands
agedi --help
Main commands
- agedi train: train a diffusion model from a trajectory file or YAML config
- agedi sample: sample structures from a saved training run
- agedi predict: predict energies and forces for input structures (requires --force_field training)
- agedi inspect: print hparams.yaml from a run directory
To list the options of a specific command, append --help to it, for example
agedi train --help
The same works for sample, predict, and inspect.
Training
Choose one of the three position noisers to match your system type:
| Noiser | Prior | Distribution | Use case |
|---|---|---|---|
| Positions | StandardNormal | Normal | Gas-phase (molecules, clusters) |
| CellPositions | UniformCell | Normal | Periodic bulk / surface (default) |
| ConfinedCellPositions | UniformCellConfined | TruncatedNormal | Surface overlayer/adsorbate |
Minimal training example for a surface system with Z-confinement:
agedi train --noisers ConfinedCellPositions --mask MaskFixed --confinement 2 10 training_data.traj
Minimal training example for a periodic bulk or surface system:
agedi train --noisers CellPositions training_data.traj
Minimal training example for a periodic bulk system that also diffuses atomic types:
agedi train --noisers CellPositions,Types training_data.traj
Minimal training example for a gas-phase cluster:
agedi train --noisers Positions training_data.traj
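When many runs are launched from a script, the invocations above can be assembled programmatically. The following is a minimal sketch, not part of AGeDi: the helper build_train_command and the system-type labels are hypothetical names chosen for illustration, and it only emits flags that appear in this section.

```python
import shlex

# Hypothetical labels mapped to the noiser names used in the examples above.
NOISER_FOR_SYSTEM = {
    "gas_phase": "Positions",
    "periodic": "CellPositions",
    "confined_surface": "ConfinedCellPositions",
}


def build_train_command(system, data_path, confinement=None,
                        mask_fixed=False, diffuse_types=False):
    """Return the argv list for an `agedi train` call (illustrative helper)."""
    noisers = [NOISER_FOR_SYSTEM[system]]
    if diffuse_types:
        noisers.append("Types")
    cmd = ["agedi", "train", "--noisers", ",".join(noisers)]
    if mask_fixed:
        cmd += ["--mask", "MaskFixed"]
    if system == "confined_surface":
        # ConfinedCellPositions needs z-direction bounds.
        if confinement is None:
            raise ValueError("confined_surface requires confinement=(zmin, zmax)")
        zmin, zmax = confinement
        cmd += ["--confinement", str(zmin), str(zmax)]
    cmd.append(data_path)
    return cmd


cmd = build_train_command(
    "confined_surface", "training_data.traj", confinement=(2, 10), mask_fixed=True
)
print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting issues.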
Important options:
- --max_time_minutes/-t or --epochs/-e: stopping criteria (use -T to specify time in hours instead of minutes)
- --noisers: CellPositions (default), ConfinedCellPositions, Positions, Types. Accepts a comma-separated list to specify multiple noisers in one flag (e.g. --noisers ConfinedCellPositions,Types), or repeat the flag (e.g. --noisers ConfinedCellPositions --noisers Types).
- --sde: ve (default), vp
- --mask MaskFixed: freezes atoms tagged with ASE FixAtoms
- --confinement zmin zmax: z-direction confinement bounds (required for ConfinedCellPositions)
- --n_classes N: restrict the Types noiser vocabulary to the first N element types (sorted by atomic number); defaults to all distinct types found in the training data
- --canonical_cell: store unit cells in canonical lower-triangular form
- --force_field: train a force-field head jointly with the diffusion score (see below)
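The two accepted spellings of --noisers (one comma-separated value, or the flag repeated) can be reconciled with a small amount of parsing. The sketch below uses the standard-library argparse module to show one way this could work; it is an illustration under that assumption, not AGeDi's actual parser, and parse_noisers is a hypothetical name.

```python
import argparse


def parse_noisers(argv):
    """Collect noiser names from repeated and/or comma-separated --noisers flags."""
    parser = argparse.ArgumentParser()
    # action="append" gathers every occurrence of the flag into a list.
    parser.add_argument("--noisers", action="append", default=None)
    args = parser.parse_args(argv)
    raw = args.noisers or ["CellPositions"]  # documented default noiser
    # Split each occurrence on commas and drop empty fragments.
    return [name.strip() for item in raw for name in item.split(",") if name.strip()]


print(parse_noisers(["--noisers", "ConfinedCellPositions,Types"]))
print(parse_noisers(["--noisers", "ConfinedCellPositions", "--noisers", "Types"]))
```

Both calls yield the same two-element list, so downstream code does not need to know which spelling was used.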
Continue training from a checkpoint
To resume an interrupted run or continue fine-tuning on new data, pass
--checkpoint with either a run directory or a specific .ckpt file:
# Resume the last checkpoint of a previous run (same data)
agedi train training_data.traj --checkpoint logs/agedi/version_0
# Resume from a specific checkpoint file
agedi train training_data.traj --checkpoint logs/agedi/version_0/checkpoints/best_model.ckpt
# Fine-tune on new data starting from a previous checkpoint
agedi train new_data.traj --checkpoint logs/agedi/version_0
In all cases the model architecture and weights are loaded from the checkpoint,
and the full training state (optimiser, LR-scheduler, epoch counter) is
restored. Combine with --epochs or --max_time_minutes to control how long
the continued run should train.
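The distinction between passing a run directory and passing a .ckpt file can be pictured as a small path-resolution step. This is a sketch under assumptions: the layout run_dir/checkpoints/last_model.ckpt is inferred from the paths shown in this section, and resolve_checkpoint is a hypothetical helper, not an AGeDi function.

```python
import tempfile
from pathlib import Path


def resolve_checkpoint(path, name="last_model.ckpt"):
    """Return a .ckpt path for either a run directory or an explicit file."""
    path = Path(path)
    if path.suffix == ".ckpt":
        return path  # explicit checkpoint file: use it as given
    # Assumed layout: <run_dir>/checkpoints/<name>
    candidate = path / "checkpoints" / name
    if not candidate.is_file():
        raise FileNotFoundError(f"no {name} under {path / 'checkpoints'}")
    return candidate


# Demonstrate both cases against a throwaway run directory.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "version_0"
    (run_dir / "checkpoints").mkdir(parents=True)
    (run_dir / "checkpoints" / "last_model.ckpt").touch()
    from_dir = resolve_checkpoint(run_dir)
    from_file = resolve_checkpoint(run_dir / "checkpoints" / "best_model.ckpt")
    print(from_dir.name, from_file.name)
```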
When using a config file, set the checkpoint key:
data_path: training_data.traj
checkpoint: logs/agedi/version_0 # or a .ckpt file path
From Python:
from agedi import train_from_atoms

diffusion, dataset, trainer = train_from_atoms(
    data,  # training structures (assumed here: a list of ASE Atoms objects)
    checkpoint="logs/agedi/version_0",
    epochs=100,
)
Sampling
agedi sample logs/agedi/version_0 -f Pd2O2 --template_path template.traj --steps 500 --confinement 2 10
This samples using the last_model.ckpt checkpoint found in
logs/agedi/version_0. If you want to use a different checkpoint, you can
specify the exact path to it.
Important options:
- -f/--formula or -a/--n_atoms
- --template_path for template-guided generation
- --steps, --eps for reverse diffusion resolution
- --save_trajectory: save the full reverse-diffusion trajectory for each sample (one file per sample rather than only the final structures)
- --print_timings: print a per-stage timing breakdown after each sampling batch (useful for profiling GPU bottlenecks)
- --compile: compile the reverse-diffusion step with torch.compile for faster GPU sampling; neighbor-list buffer sizes are estimated automatically (requires NVIDIA nvalchemiops)
Force-field guided training and sampling
To train a force-prediction head alongside the diffusion model, add the
--force_field flag during training:
agedi train --noisers ConfinedCellPositions --mask MaskFixed --confinement 2 10 --force_field training_data.traj
The training data must contain DFT (or other source) per-atom forces and total energy (e.g. loaded from a VASP/GPAW calculation via ASE). The force field is trained jointly with the diffusion score.
Regressor-only dataset
You can optionally supply a second dataset that is used exclusively to train the force-field head — its structures are never passed through the diffusion loss. This is useful when you have non-equilibrium structures (e.g. from MD or NEB calculations) that would be unsuitable as diffusion training targets but contain valuable force/energy information for the regressor:
agedi train --noisers ConfinedCellPositions --mask MaskFixed --confinement 2 10 --force_field training_data.traj
and in train.yaml:
data_path: training_data.traj
force_field: true
regressor_data_path: nonequilibrium_data.traj
Or from Python:
from agedi import train_from_atoms

diffusion, dataset, trainer = train_from_atoms(
    equilibrium_structures,
    force_field=True,
    regressor_data=nonequilibrium_structures,
)
Once training is complete, force-field guidance can be used during sampling
via the --ff_guidance option:
agedi sample logs/agedi/version_0 -f Pd2O2 --ff_guidance 5.0
- --ff_guidance: guidance scale (0 = disabled, > 0 enables guidance). Higher values increase the influence of the predicted forces on the generated structures.
- --ff_zeta: time-weight exponent (default 3.0). Higher values concentrate guidance near the end of the reverse trajectory.
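To see why a larger exponent concentrates guidance near the end of the reverse trajectory, it helps to evaluate the time weight (1-t)**zeta at a few times. The sketch below assumes that the effective strength is the guidance scale multiplied by this weight and that the reverse trajectory runs from t = 1 down to t = 0; guidance_weight is a hypothetical helper, not an AGeDi function.

```python
def guidance_weight(t, guidance=5.0, zeta=3.0):
    """Assumed effective guidance strength at diffusion time t in [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    # The weight is 0 at t = 1 (pure noise) and 1 at t = 0 (final structure).
    return guidance * (1.0 - t) ** zeta


# Compare a linear ramp (zeta=1) with the default cubic ramp (zeta=3).
for t in (1.0, 0.5, 0.0):
    print(t, guidance_weight(t, zeta=1.0), guidance_weight(t, zeta=3.0))
```

Halfway through the trajectory the cubic ramp applies a quarter of the strength of the linear one, so most of the guidance is deferred to small t.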
In Python this is equivalent to:
from agedi.functional import load_diffusion, sample
from agedi.diffusion import ForcefieldGuidanceConfig

diffusion = load_diffusion("logs/agedi/version_0")
structures = sample(
    diffusion,
    n_samples=10,
    formula="Pd2O2",
    ff_guidance=ForcefieldGuidanceConfig(
        guidance=5.0,
        zeta=3.0,
        force_threshold=0.05,  # max per-atom force (eV/Å) for post-diffusion relaxation
        max_extra_steps=0,  # number of extra relaxation steps after the trajectory
    ),
)
ForcefieldGuidanceConfig fields:
- guidance (float): guidance scale; 0.0 disables guidance entirely.
- zeta (float): time-weight exponent in (1-t)**zeta; default 3.0.
- force_threshold (float): convergence criterion (max per-atom force in eV/Å) for the optional post-diffusion relaxation; default 0.05.
- max_extra_steps (int): maximum extra relaxation steps performed after the main diffusion trajectory when guidance > 0; default 0 (disabled).
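The interplay of force_threshold and max_extra_steps amounts to a stopping rule for the optional relaxation. The following sketch shows one plausible reading of that rule. It assumes the per-atom force is measured as the Euclidean norm of each force vector and that convergence means falling strictly below the threshold; both function names are hypothetical.

```python
import math


def max_atomic_force(forces):
    """Largest force magnitude over all atoms; forces is a list of (fx, fy, fz)."""
    return max(math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces)


def should_stop(forces, steps_done, force_threshold=0.05, max_extra_steps=0):
    """Stop when the step budget is spent or the forces have converged."""
    out_of_steps = steps_done >= max_extra_steps
    converged = max_atomic_force(forces) < force_threshold
    return out_of_steps or converged


forces = [(0.1, 0.0, 0.0), (0.0, 0.02, 0.0)]  # eV/Å, made-up values
print(should_stop(forces, steps_done=0))                      # default budget of 0 steps
print(should_stop(forces, steps_done=0, max_extra_steps=10))  # budget left, not converged
```

With the default max_extra_steps=0 the rule stops immediately, which matches the description of the relaxation as disabled by default.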
Predicting energies and forces
When the model has been trained with --force_field, you can run energy
and force predictions on existing structures with agedi predict:
agedi predict logs/agedi/version_0 structures.traj
This reads all structures from structures.traj, runs the force-field
regressor, and writes the results (with predicted energies and forces
attached as an ASE SinglePointCalculator) to predicted.traj in
the current directory.
Important options:
- -o/--output: directory to save the output file (default: .)
- --name: base name for the output file (default: predicted)
- -b/--batch_size: number of structures per inference batch (default: 64)
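The batch size only controls how the input structures are partitioned for inference. As a rough illustration of that partitioning (a generic sketch, not AGeDi's internal code; iter_batches is a hypothetical name), 150 structures with the default batch size give two full batches and one partial batch:

```python
def iter_batches(items, batch_size=64):
    """Yield consecutive slices of items with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


structures = list(range(150))  # stand-ins for 150 input structures
sizes = [len(batch) for batch in iter_batches(structures)]
print(sizes)
```

A smaller batch size lowers peak memory use at the cost of more inference calls.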
In Python this is equivalent to:
from ase.io import read, write
from agedi import load_diffusion, predict
diffusion = load_diffusion("logs/agedi/version_0")
structures = read("structures.traj", index=":")
predicted = predict(diffusion, structures)
write("predicted.traj", predicted)
Inspect run metadata
agedi inspect logs/agedi/version_0
This prints the saved hyperparameters from the run directory (for example, the parsed contents of hparams.yaml).
Logging options
AGeDi writes TensorBoard logs by default. To log to Weights & Biases (WandB)
instead, pass the --logger wandb option when training.
To follow training, run
tensorboard --logdir .
This serves TensorBoard on localhost (port 6006 by default). When training
on an HPC system, remember to forward that port to your local machine. Use
the --port xxxx option to make TensorBoard listen on a specific port instead.