Python API workflow

This page shows the script-based workflow built on the functions in agedi.functional, which are re-exported from the top-level agedi package. The functional API allows more customisation than the CLI.

Position noisers

Choose the noiser that matches your system type:

Noiser string / class                             Prior                 Distribution      Use case
------------------------------------------------  --------------------  ----------------  ---------------------------------
"Positions" / Positions                           StandardNormal        Normal            Gas-phase (molecules, clusters)
"CellPositions" / CellPositions                   UniformCell           Normal            Periodic bulk / surface (default)
"ConfinedCellPositions" / ConfinedCellPositions   UniformCellConfined   TruncatedNormal   Surface overlayer / adsorbate
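
In scripts that handle several system types, the choice of noiser can be written as a small lookup. The following is only a sketch: NOISER_BY_SYSTEM, its keys and pick_noiser are hypothetical names that are not part of agedi; only the noiser strings are taken from the table.

```python
# Hypothetical helper: maps a system type to a noiser string from the table.
# The keys ("gas", "periodic", "confined") are illustrative labels only.
NOISER_BY_SYSTEM = {
    "gas": "Positions",                   # molecules, clusters
    "periodic": "CellPositions",          # periodic bulk / surface
    "confined": "ConfinedCellPositions",  # surface overlayer / adsorbate
}


def pick_noiser(system_type: str) -> tuple[str, ...]:
    """Return a one-element tuple suitable for a noisers=(...) argument."""
    try:
        return (NOISER_BY_SYSTEM[system_type],)
    except KeyError:
        valid = ", ".join(sorted(NOISER_BY_SYSTEM))
        raise ValueError(
            f"unknown system type {system_type!r}; expected one of: {valid}"
        ) from None
```

The result can be passed straight through, for example noisers=pick_noiser("confined").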

Training

Here we show the same example as with the CLI, using train_from_atoms().

from ase.io import read
from agedi import train_from_atoms

data = read("training_data.traj", ":")

diffusion, dataset, trainer = train_from_atoms(
    data,
    noisers=("ConfinedCellPositions",),
    mask="MaskFixed",
    confinement=(2.0, 10.0),
    max_time=2,  # hours
    log_dir="logs",
)

Force-field training with a regressor dataset

To train a force-field head alongside the diffusion model, pass force_field=True. You can additionally supply a separate regressor_data sequence of Atoms objects that will be used only to train the force-field head (not the diffusion score). This is useful for non-equilibrium structures that carry informative forces but would be unsuitable as diffusion training targets:

from ase.io import read
from agedi import train_from_atoms

equilibrium = read("training_data.traj", ":")
nonequilibrium = read("nonequilibrium.traj", ":")

diffusion, dataset, trainer = train_from_atoms(
    equilibrium,
    force_field=True,
    regressor_data=nonequilibrium,
    noisers=("ConfinedCellPositions",),
    mask="MaskFixed",
    confinement=(2.0, 10.0),
)
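
When relaxed and unrelaxed structures are stored in the same trajectory, they need to be separated before the call above. One possible criterion is the largest per-atom force. The sketch below assumes each structure exposes get_forces() returning an N x 3 array (as ASE Atoms objects with an attached calculator do). The helper names and the 0.05 eV/Å threshold are illustrative assumptions, not agedi settings, and whether a force threshold is the right split depends on the dataset.

```python
import math


def max_force_norm(atoms) -> float:
    """Largest per-atom force magnitude (eV/Å) for one structure."""
    return max(
        math.sqrt(sum(float(c) ** 2 for c in force))
        for force in atoms.get_forces()
    )


def split_by_max_force(structures, threshold=0.05):
    """Split structures into (relaxed, unrelaxed) by their largest force norm.

    The 0.05 eV/Å default is an illustrative choice, not an agedi setting.
    """
    relaxed, unrelaxed = [], []
    for atoms in structures:
        if max_force_norm(atoms) <= threshold:
            relaxed.append(atoms)
        else:
            unrelaxed.append(atoms)
    return relaxed, unrelaxed
```

The two lists can then be passed as the first argument and as regressor_data, respectively.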

The same regressor_data argument can be passed when calling create_dataset() directly:

from ase.io import read
from agedi import create_dataset

dataset = create_dataset(
    read("training_data.traj", ":"),
    mask="MaskFixed",
    confinement=(2.0, 10.0),
    regressor_data=read("nonequilibrium.traj", ":"),
)

More detailed workflow

Here we show a more detailed example setting up the diffusion model, the dataset and the trainer individually.

from ase.io import read
from agedi import create_diffusion, create_dataset, create_trainer, train

data = read("training_data.traj", ":")

diffusion = create_diffusion(
    noisers=("ConfinedCellPositions",),
)

dataset = create_dataset(
    data,
    mask="MaskFixed",
    confinement=(2.0, 10.0)
)

trainer = create_trainer(
    max_time=2,  # hours
    log_dir="logs"
)

train(diffusion, dataset, trainer=trainer)

Sampling with template

To sample from a trained model:

from ase.io import read, write
from agedi import load_diffusion, sample, AtomsGraph

diffusion = load_diffusion("logs/agedi/version_0")

template = AtomsGraph.from_atoms(read("template.traj"), confinement=(2.0, 10.0))

structures = sample(
    diffusion,
    n_samples=12,
    formula="X2Y3",
    template=template,
    confinement=(2.0, 10.0),
    steps=500,
)

write("sampled.traj", structures)

Similar to the CLI, this samples using the last_model.ckpt checkpoint found in logs/agedi/version_0. If you want to use a different checkpoint, you can specify the exact path to it when calling load_diffusion().
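
If several training runs share one log directory, a small helper can pick the most recent run instead of hard-coding version_0. This is a sketch under an assumption: that each run creates a sibling directory named version_<N> with increasing N (only version_0 appears on this page). latest_version_dir is a hypothetical helper, not an agedi function.

```python
from pathlib import Path


def latest_version_dir(root):
    """Return the version_<N> subdirectory of root with the largest N."""
    candidates = []
    for path in Path(root).glob("version_*"):
        suffix = path.name.removeprefix("version_")
        # Skip plain files and names such as "version_old".
        if path.is_dir() and suffix.isdigit():
            candidates.append((int(suffix), path))
    if not candidates:
        raise FileNotFoundError(f"no version_<N> directories under {root}")
    # Compare on the integer only, so version_10 sorts after version_2.
    return max(candidates, key=lambda item: item[0])[1]
```

The returned path can be passed to load_diffusion() in place of the literal "logs/agedi/version_0".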

Force-field training and prediction

To train a force-prediction head alongside the diffusion model, pass force_field=True to train_from_atoms(). Each training structure must carry per-atom forces and a total energy (e.g. from a DFT calculation loaded via ASE):

from ase.io import read
from agedi import train_from_atoms

data = read("training_data.traj", ":")  # must contain forces and energy

diffusion, dataset, trainer = train_from_atoms(
    data,
    noisers=("ConfinedCellPositions",),
    mask="MaskFixed",
    confinement=(2.0, 10.0),
    force_field=True,
    max_time=2,
)

Once trained, use predict() to run energy and force predictions on existing structures. The results are returned as ASE Atoms objects with a SinglePointCalculator attached:

from ase.io import read, write
from agedi import load_diffusion, predict

diffusion = load_diffusion("logs/agedi/version_0")

structures = read("structures.traj", index=":")
predicted = predict(diffusion, structures)

# Access predictions on the first structure
print(predicted[0].get_potential_energy())  # eV
print(predicted[0].get_forces())            # eV/Å

write("predicted.traj", predicted)
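
If the input structures also carry reference forces (for example DFT labels), the predictions can be scored with a mean absolute error. The sketch below is not part of agedi. It assumes that predict() returns new Atoms objects in the input order and leaves the calculators on the inputs untouched, which this page does not state; if that does not hold, read the reference forces before calling predict().

```python
def force_mae(predicted, reference) -> float:
    """Mean absolute error over all force components (eV/Å).

    Both arguments are sequences of structures exposing get_forces(),
    given in the same order and with matching atom counts.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference differ in length")
    total, count = 0.0, 0
    for pred, ref in zip(predicted, reference):
        pred_forces, ref_forces = pred.get_forces(), ref.get_forces()
        if len(pred_forces) != len(ref_forces):
            raise ValueError("structures differ in number of atoms")
        for f_pred, f_ref in zip(pred_forces, ref_forces):
            for p, r in zip(f_pred, f_ref):
                total += abs(float(p) - float(r))
                count += 1
    if count == 0:
        raise ValueError("no force components to compare")
    return total / count
```

With the names from the example above, this would be called as force_mae(predicted, structures).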

Core public functions

Custom model backends

AGeDi ships with the "PaiNN" SchNetPack backend. You can register your own GNN backbone via register_model():

from agedi import create_diffusion, register_model

def my_factory(cutoff, heads, feature_size, n_blocks, head_dim, n_rbf):
    # Build and return (translator, representation, head_list)
    ...

register_model("MyModel", my_factory)

# Then use it in create_diffusion / train_from_atoms
# (add any further arguments as usual):
diffusion = create_diffusion(model="MyModel")
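
Conceptually, this kind of API is a name-to-factory registry, which is why the string passed as model= must match the registered name exactly. The sketch below illustrates the general pattern only; it is not agedi's implementation, and register_model_sketch, build_model_sketch and _MODEL_FACTORIES are hypothetical names.

```python
from typing import Callable, Dict

# Illustrative registry; agedi's internal data structure may differ.
_MODEL_FACTORIES: Dict[str, Callable] = {}


def register_model_sketch(name: str, factory: Callable) -> None:
    """Store a factory under a name, refusing silent overwrites."""
    if name in _MODEL_FACTORIES:
        raise ValueError(f"model {name!r} is already registered")
    _MODEL_FACTORIES[name] = factory


def build_model_sketch(name: str, **hyperparameters):
    """Look up a factory by name and call it with the given hyperparameters."""
    try:
        factory = _MODEL_FACTORIES[name]
    except KeyError:
        raise KeyError(f"no model registered under {name!r}") from None
    return factory(**hyperparameters)
```

In this pattern the factory receives its hyperparameters as keyword arguments, so its parameter names have to match what the caller supplies.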

Additional sampling options

The sample() function supports several advanced options beyond the basic n_samples / formula / steps arguments:

  • compile=True — compile the reverse-diffusion step with torch.compile for faster GPU sampling. Requires NVIDIA nvalchemiops. Neighbor-list buffer sizes are estimated automatically before the sampling loop.

  • save_trajectory=True — return a list of per-sample diffusion trajectories (one list of AtomsGraph / ASE Atoms per sample) instead of only the final structures.

  • print_timings=True — print a per-stage timing breakdown after each sampling batch (graph init, score model, denoise step, neighbor list, etc.). Useful for profiling.

  • property — pass a dict of property values to condition sampling on (requires the model to have been trained with conditioning).

from agedi import load_diffusion, sample

diffusion = load_diffusion("logs/agedi/version_0")

# Compiled, 500 steps, save full trajectories
trajectories = sample(
    diffusion,
    n_samples=4,
    formula="Pd4O4",
    steps=500,
    compile=True,
    save_trajectory=True,
    print_timings=True,
)
# trajectories[i] is the full reverse-diffusion path for sample i
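
Full trajectories with hundreds of steps are often thinned before inspection or storage. The helpers below are a sketch that is not part of agedi. They assume each trajectory is a list-like sequence ordered from the noisy start to the final structure, so that the last frame is the generated sample.

```python
def final_frames(trajectories):
    """Last frame of each per-sample trajectory."""
    return [trajectory[-1] for trajectory in trajectories]


def thin(trajectory, every=50):
    """Keep every `every`-th frame and always include the final frame."""
    if every < 1:
        raise ValueError("every must be a positive integer")
    if len(trajectory) == 0:
        return []
    kept = list(trajectory[::every])
    # The slice misses the last frame unless (len - 1) is a multiple of every.
    if (len(trajectory) - 1) % every != 0:
        kept.append(trajectory[-1])
    return kept
```

For example, thin(trajectories[0], every=50) keeps roughly one frame in fifty from the first sample's path. If the frames are AtomsGraph objects rather than ASE Atoms, they would need converting before being written with ase.io.write.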

Property conditioning

Models can optionally be conditioned on a scalar or integer per-structure property (e.g. formation energy, band gap, or total magnetisation). Enable conditioning at training time and then supply the target value at sampling time.

Training with conditioning:

from ase.io import read
from agedi import train_from_atoms

data = read("training_data.traj", ":")

diffusion, dataset, trainer = train_from_atoms(
    data,
    noisers=("CellPositions",),
    conditioning="energy",        # key in atoms.info or atoms.get_energy()
    conditioning_type="scalar",   # "scalar" (default) or "integer"
)

Sampling with a conditioning value:

from agedi import load_diffusion, sample

diffusion = load_diffusion("logs/agedi/version_0")

structures = sample(
    diffusion,
    n_samples=10,
    formula="Pd4O4",
    property={"energy": -3.5},   # target value for the conditioned property
)

The conditioning key must match the atoms.info key (or an atoms.get_<key>() method) used in the training data.
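
A pre-flight check over the training set can catch structures that lack the conditioning value before training starts. The sketch below takes the lookup rule literally from the sentence above (an entry in atoms.info or a get_<key>() method); how agedi actually resolves the key may differ, and missing_conditioning is a hypothetical helper.

```python
def missing_conditioning(structures, key):
    """Indices of structures with neither info[key] nor a get_<key>() method."""
    missing = []
    for index, atoms in enumerate(structures):
        has_info = key in getattr(atoms, "info", {})
        has_getter = callable(getattr(atoms, f"get_{key}", None))
        if not (has_info or has_getter):
            missing.append(index)
    return missing
```

An empty result means every structure provides the key in one of the two forms.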