Python API workflow

This page shows the script-based workflow using functions from agedi.functional, which are re-exported from the top-level agedi package. The functional API allows more customisation than the CLI.

Position noisers

Choose the noiser that matches your system type:

Noiser (string / class)	Prior	Noise distribution	Use case
`"Positions"` / `Positions`	StandardNormal	Normal	Gas-phase (molecules, clusters)
`"CellPositions"` / `CellPositions`	UniformCell	Normal	Periodic bulk / surface (default)
`"ConfinedCellPositions"` / `ConfinedCellPositions`	UniformCellConfined	TruncatedNormal	Surface overlayer/adsorbate
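To make the three priors concrete, the sketch below draws initial positions the way the prior names suggest. It is a stdlib-only illustration, not AGeDi's implementation. It assumes an orthorhombic cell given by its three edge lengths, and it assumes that a confinement such as (2.0, 10.0) bounds only the z coordinate, in Å. All helper names are hypothetical.

```python
import random

def sample_standard_normal(n_atoms, rng):
    # StandardNormal-style prior: every Cartesian coordinate ~ N(0, 1)
    return [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_atoms)]

def sample_uniform_cell(n_atoms, cell_lengths, rng):
    # UniformCell-style prior: uniform over an orthorhombic cell
    # (assumption: the cell is given as edge lengths a, b, c)
    return [[rng.uniform(0.0, length) for length in cell_lengths] for _ in range(n_atoms)]

def sample_uniform_cell_confined(n_atoms, cell_lengths, confinement, rng):
    # UniformCellConfined-style prior: uniform in x and y over the cell,
    # uniform in z only inside the confinement window (assumed to be a z range)
    z_lo, z_hi = confinement
    a, b, _ = cell_lengths
    return [[rng.uniform(0.0, a), rng.uniform(0.0, b), rng.uniform(z_lo, z_hi)]
            for _ in range(n_atoms)]

def truncated_normal(mean, std, lo, hi, rng, max_tries=10_000):
    # TruncatedNormal noise via rejection sampling: redraw until inside [lo, hi]
    for _ in range(max_tries):
        value = rng.gauss(mean, std)
        if lo <= value <= hi:
            return value
    raise RuntimeError("acceptance region too small for rejection sampling")

rng = random.Random(0)
cell = (8.0, 8.0, 20.0)
gas = sample_standard_normal(4, rng)
bulk = sample_uniform_cell(5, cell, rng)
confined = sample_uniform_cell_confined(5, cell, (2.0, 10.0), rng)
# perturb each confined z with truncated-normal noise so it never leaves the window
noisy_z = [truncated_normal(p[2], 1.0, 2.0, 10.0, rng) for p in confined]
```

Rejection sampling is the simplest correct way to draw from a truncated normal; it becomes slow only when the window is narrow compared with the noise scale.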

Training

The following reproduces the CLI example using train_from_atoms():

from ase.io import read
from agedi import train_from_atoms

data = read("training_data.traj", ":")

diffusion, dataset, trainer = train_from_atoms(
    data,
    noisers=("ConfinedCellPositions",),
    mask="MaskFixed",
    confinement=(2.0, 10.0),
    max_time=2,  # hours
    log_dir="logs",
)

Force-field training with a regressor dataset

To train a force-field head alongside the diffusion model, pass force_field=True. You can also supply regressor_data, a separate sequence of Atoms objects that is used only to train the force-field head, not the diffusion score. This is useful for non-equilibrium structures, whose forces are informative but which would be unsuitable as diffusion training targets:

from ase.io import read
from agedi import train_from_atoms

equilibrium = read("training_data.traj", ":")
nonequilibrium = read("nonequilibrium.traj", ":")

diffusion, dataset, trainer = train_from_atoms(
    equilibrium,
    force_field=True,
    regressor_data=nonequilibrium,
    noisers=("ConfinedCellPositions",),
    mask="MaskFixed",
    confinement=(2.0, 10.0),
)

When building the dataset with create_dataset() directly, pass regressor_data in the same way:

from ase.io import read
from agedi import create_dataset

dataset = create_dataset(
    read("training_data.traj", ":"),
    mask="MaskFixed",
    confinement=(2.0, 10.0),
    regressor_data=read("nonequilibrium.traj", ":"),
)
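One way to picture how regressor-only structures differ from diffusion targets is a per-structure flag that masks the diffusion (score) loss but not the force loss. The sketch below is a hypothetical illustration on plain numbers; the Sample class, the is_regressor_only flag and the loss weighting are assumptions for exposition, not AGeDi's internals.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    score_error: float       # toy squared error of the denoising score
    force_error: float       # toy squared error of predicted vs reference forces
    is_regressor_only: bool  # True for structures passed via regressor_data

def combined_loss(batch, force_weight=1.0):
    # Diffusion loss: averaged only over structures that are valid diffusion targets
    diffusion_terms = [s.score_error for s in batch if not s.is_regressor_only]
    diffusion_loss = sum(diffusion_terms) / len(diffusion_terms) if diffusion_terms else 0.0
    # Force loss: averaged over every structure, including regressor-only ones
    force_loss = sum(s.force_error for s in batch) / len(batch)
    return diffusion_loss + force_weight * force_loss

batch = [
    Sample(score_error=0.2, force_error=0.1, is_regressor_only=False),
    # the large score error below is ignored because this structure is regressor-only
    Sample(score_error=9.9, force_error=0.3, is_regressor_only=True),
]
loss = combined_loss(batch)
```

Here the non-equilibrium structure still contributes its force error, while its (meaningless) score error never reaches the diffusion objective.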

More detailed workflow

The example below creates the diffusion model, the dataset and the trainer individually, then trains with train().

from ase.io import read
from agedi import create_diffusion, create_dataset, create_trainer, train

data = read("training_data.traj", ":")

diffusion = create_diffusion(
    noisers=("ConfinedCellPositions",),
)

dataset = create_dataset(
    data,
    mask="MaskFixed",
    confinement=(2.0, 10.0)
)

trainer = create_trainer(
    max_time=2,  # hours
    log_dir="logs"
)

train(diffusion, dataset, trainer=trainer)

Sampling with template

To sample from a trained model:

from ase.io import read, write
from agedi import load_diffusion, sample, AtomsGraph

diffusion = load_diffusion("logs/agedi/version_0")

template = AtomsGraph.from_atoms(read("template.traj"), confinement=(2.0, 10.0))

structures = sample(
    diffusion,
    n_samples=12,
    formula="X2Y3",
    template=template,
    confinement=(2.0, 10.0),
    steps=500,
)

write("sampled.traj", structures)

As with the CLI, passing a log directory samples with the last_model.ckpt checkpoint found in logs/agedi/version_0. To use a different checkpoint, pass its exact path to load_diffusion().
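The directory-versus-file behaviour can be sketched as a small path-resolution helper. This is a hypothetical illustration using pathlib, assuming last_model.ckpt may sit anywhere below the log directory (for example in a checkpoints/ subfolder); it is not the code load_diffusion() actually runs.

```python
import tempfile
from pathlib import Path

def resolve_checkpoint(path):
    """Return a checkpoint file: the path itself, or last_model.ckpt below a directory."""
    p = Path(path)
    if p.is_file():
        return p  # an explicit checkpoint path is used as-is
    if p.is_dir():
        # search recursively, since the checkpoint may live in a subfolder (assumption)
        matches = sorted(p.rglob("last_model.ckpt"))
        if matches:
            return matches[0]
        raise FileNotFoundError(f"no last_model.ckpt under {p}")
    raise FileNotFoundError(f"{p} does not exist")

# Demo on a throwaway directory tree that mimics logs/agedi/version_0
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp, "agedi", "version_0", "checkpoints", "last_model.ckpt")
    ckpt.parent.mkdir(parents=True)
    ckpt.touch()
    found_from_dir = resolve_checkpoint(Path(tmp, "agedi", "version_0"))
    found_from_file = resolve_checkpoint(ckpt)
    same = found_from_dir == ckpt and found_from_file == ckpt
```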

Choosing a sampler

Pass sampler to sample() to select the reverse-diffusion algorithm:

`sampler`	Description
`None` (default)	Euler–Maruyama (EM): one score evaluation per step
`"em"`	Euler–Maruyama (explicit alias)
`"pc"`	Predictor-corrector: EM predictor + Langevin corrector steps at t_{i-1}
`"heun"`	2nd-order stochastic (Karras et al. 2022): two score evaluations per step
`"ddim"`	Deterministic probability-flow ODE: no noise, fully reproducible
`"heun_ode"`	2nd-order deterministic ODE (Heun’s method on the PF-ODE)
`"ffpc"`	Force-field augmented predictor-corrector (requires a force-field head)

structures = sample(diffusion, n_samples=10, formula="Pd2O2",
                    template=template, sampler="heun", steps=200)
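The difference between first-order and second-order deterministic samplers is easiest to see on a toy problem where the probability-flow ODE has a closed-form solution. The sketch below assumes 1-D Gaussian "data" N(0, S0²) noised with Gaussian noise of scale σ, so the exact score is −x/(S0² + σ²), and it integrates the PF-ODE in the σ parametrisation of Karras et al. (2022). It illustrates the numerical idea behind a Heun ODE sampler; it is not AGeDi's HeunSampler, and the geometric noise schedule is an arbitrary choice.

```python
import math

S0 = 0.5  # std of the toy 1-D "data" distribution N(0, S0^2) (assumption)

def score(x, sigma):
    # exact score of the noised marginal N(0, S0^2 + sigma^2)
    return -x / (S0 ** 2 + sigma ** 2)

def pf_ode_drift(x, sigma):
    # probability-flow ODE in the sigma parametrisation: dx/dsigma = -sigma * score
    return -sigma * score(x, sigma)

def sigma_schedule(n_steps, sigma_max=10.0, sigma_min=1e-3):
    # geometric schedule from sigma_max down to sigma_min
    return [sigma_max * (sigma_min / sigma_max) ** (i / n_steps) for i in range(n_steps + 1)]

def euler_ode(x, sigmas):
    # first order: one drift (score) evaluation per step
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        x = x + (s_next - s_cur) * pf_ode_drift(x, s_cur)
    return x

def heun_ode(x, sigmas):
    # second order: Euler predictor, then average the drift at both ends of the step
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        h = s_next - s_cur
        d_cur = pf_ode_drift(x, s_cur)
        x_pred = x + h * d_cur
        d_next = pf_ode_drift(x_pred, s_next)
        x = x + 0.5 * h * (d_cur + d_next)
    return x

def exact_solution(x, sigma_from, sigma_to):
    # closed form for this toy ODE: x scales with sqrt(S0^2 + sigma^2)
    return x * math.sqrt((S0 ** 2 + sigma_to ** 2) / (S0 ** 2 + sigma_from ** 2))

sigmas = sigma_schedule(20)
x_start = 3.0
reference = exact_solution(x_start, sigmas[0], sigmas[-1])
euler_error = abs(euler_ode(x_start, sigmas) - reference)
heun_error = abs(heun_ode(x_start, sigmas) - reference)
```

At the same number of steps, Heun's method costs two score evaluations per step but tracks the exact trajectory much more closely, which is the trade-off listed for "heun" and "heun_ode" in the table.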

Sampler-specific keyword arguments are passed via sampler_kwargs:

structures = sample(
    diffusion, n_samples=10, formula="Pd2O2", template=template,
    sampler="pc",
    sampler_kwargs=dict(corrector_steps=3, corrector_step_size=1e-3),
)
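In a predictor-corrector sampler (Song et al., 2021), each corrector iteration is a Langevin MCMC step that uses the score at the current noise level. The sketch below applies that update to a toy 1-D target whose score is known exactly. Treating corrector_steps as the number of iterations and corrector_step_size as the step ε is an assumption about how these kwargs map onto the update, not a statement of AGeDi's implementation.

```python
import math
import random

def langevin_corrector(x, score_fn, step_size, n_steps, rng):
    # Unadjusted Langevin update: x <- x + eps * score(x) + sqrt(2 * eps) * z
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        x = x + step_size * score_fn(x) + math.sqrt(2.0 * step_size) * z
    return x

def standard_normal_score(x):
    return -x  # score of N(0, 1)

rng = random.Random(0)
# many independent chains started far from the target; the corrector pulls them in
samples = [langevin_corrector(5.0, standard_normal_score, 0.05, 200, rng)
           for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

With a small step the chains settle close to N(0, 1). Larger steps converge faster but bias the result: for this target the stationary variance of the discretised update is 1/(1 − ε/2), about 1.026 for ε = 0.05.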

You can also pass a Sampler instance directly instead of a string alias:

from agedi.diffusion.samplers import HeunSampler

sampler = HeunSampler(diffusion.score_model, diffusion.noisers)
structures = sample(diffusion, n_samples=10, formula="Pd2O2",
                    template=template, sampler=sampler)

Force-field augmented sampling (`ffpc`)

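The ffpc sampler blends the neural score with the force-field forces \(F(x)\) (the negative gradient of the force-field energy) during the corrector phase:

\[\tilde{s}(x, t) = (1 - f(t))\,s_\theta(x) + f(t)\,F(x)\]

where \(f(t) = (1-t)^\zeta\) and \(\zeta\) is set by mixing_zeta, so the force field contributes nothing at \(t = 1\) and dominates as \(t \to 0\). The sampler can also run additional Langevin dynamics after the last diffusion step, controlled by terminal_steps: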

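from ase.io import read
from agedi import AtomsGraph, load_diffusion, sample

diffusion = load_diffusion("logs/agedi/version_0")
template = AtomsGraph.from_atoms(read("template.traj"), confinement=(2.0, 10.0))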

# EM predictor + force-field augmented Langevin corrector
structures = sample(
    diffusion, n_samples=10, formula="Pd2O2", template=template,
    sampler="ffpc",
    sampler_kwargs=dict(corrector_steps=1, mixing_zeta=1.0),
)

# Add overdamped Langevin terminal steps for extra relaxation at 300 K
structures = sample(
    diffusion, n_samples=10, formula="Pd2O2", template=template,
    sampler="ffpc",
    sampler_kwargs=dict(
        corrector_steps=0,
        terminal_steps=200,
        terminal_dynamics="overdamped",
        temperature=0.026,          # eV ≈ 300 K
        # terminal_step_size auto-selected (1e-3, T-independent)
    ),
)

# BAOAB Langevin MD terminal steps with real atomic masses
structures = sample(
    diffusion, n_samples=10, formula="Pd2O2", template=template,
    sampler="ffpc",
    sampler_kwargs=dict(
        corrector_steps=0,
        terminal_steps=500,
        terminal_dynamics="langevin_md",
        temperature=0.026,          # eV ≈ 300 K
        # terminal_step_size auto-selected (1.0 fs for eV/Å models)
        # terminal_friction  auto-selected (γ·dt = 0.1)
    ),
    save_trajectory=True,           # includes bridge + terminal frames
)

With save_trajectory=True, the returned list contains one trajectory per sample. Each trajectory has steps + 1 + terminal_steps frames: one diffusion frame per reverse step (recorded before the step is applied), a bridge frame (the denoised structure before terminal dynamics), and one frame per terminal step.
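The langevin_md terminal mode is described above as BAOAB Langevin MD. For readers unfamiliar with the scheme, the sketch below is a textbook BAOAB integrator (Leimkuhler and Matthews) for one 1-D particle in a harmonic well, with k_B T given as an energy in consistent units; it is not AGeDi's code. B is a half-step velocity kick from the force, A a half-step position drift, and O an exact Ornstein-Uhlenbeck update of the velocity with friction γ.

```python
import math
import random

def baoab_step(x, v, force_fn, mass, dt, gamma, kT, rng):
    # B: half-step velocity kick from the force
    v += 0.5 * dt * force_fn(x) / mass
    # A: half-step position drift
    x += 0.5 * dt * v
    # O: exact Ornstein-Uhlenbeck update (friction plus thermal noise)
    c1 = math.exp(-gamma * dt)
    c2 = math.sqrt((1.0 - c1 * c1) * kT / mass)
    v = c1 * v + c2 * rng.gauss(0.0, 1.0)
    # A: second half-step position drift
    x += 0.5 * dt * v
    # B: second half-step kick (a real code would cache this force for the next step)
    v += 0.5 * dt * force_fn(x) / mass
    return x, v

def harmonic_force(x, k=1.0):
    return -k * x  # E = k x^2 / 2

rng = random.Random(0)
x, v = 2.0, 0.0
kT, dt, gamma = 0.5, 0.1, 1.0  # gamma * dt = 0.1
xs = []
for step in range(100_000):
    x, v = baoab_step(x, v, harmonic_force, 1.0, dt, gamma, kT, rng)
    if step >= 1_000:  # discard the equilibration phase
        xs.append(x)
x2_mean = sum(xi * xi for xi in xs) / len(xs)  # should approach kT / k = 0.5
```

BAOAB is popular for this job because its configurational sampling is very accurate at moderate time steps, so positions are distributed close to the Boltzmann distribution at the target k_B T.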

Full list of ffpc kwargs:

corrector_steps (default 1): corrector iterations per diffusion step
corrector_step_size (default 1e-3): Langevin corrector step size
mixing_zeta (default 1.0): mixing schedule exponent
temperature (default 1.0): temperature for terminal dynamics, given as an energy k_B T (e.g. 0.026 eV ≈ 300 K)
terminal_steps (default 0): number of post-diffusion terminal steps; 0 disables them
terminal_dynamics (default "overdamped"): "overdamped" or "langevin_md"
terminal_step_size (default None): auto-selected per mode (1e-3 for "overdamped"; 1.0 fs for "langevin_md" with eV/Å models)
terminal_friction (default None): auto-selected so that γ·dt = 0.1 ("langevin_md" only)
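The mixing rule and an overdamped terminal step can each be written in a few lines. The sketch below evaluates the blended score on plain lists, then applies an overdamped Langevin update x ← x + η F(x) + sqrt(2 η k_B T) z, with the mobility folded into the step size η. The function names are hypothetical, and using this particular overdamped update is an assumption about what the "overdamped" mode does.

```python
import math
import random

def mixing_weight(t, zeta=1.0):
    # f(t) = (1 - t)^zeta: 0 at t = 1 (pure noise), 1 at t = 0 (data)
    return (1.0 - t) ** zeta

def blended_score(score, force, t, zeta=1.0):
    # s~(x, t) = (1 - f(t)) * s_theta(x) + f(t) * F(x), component by component
    f = mixing_weight(t, zeta)
    return [(1.0 - f) * s + f * F for s, F in zip(score, force)]

def overdamped_step(x, force, step_size, kT, rng):
    # x <- x + eta * F(x) + sqrt(2 * eta * kT) * z; samples exp(-E / kT) as eta -> 0
    noise = math.sqrt(2.0 * step_size * kT)
    return [xi + step_size * Fi + noise * rng.gauss(0.0, 1.0)
            for xi, Fi in zip(x, force)]

rng = random.Random(0)
score_vec, force_vec = [1.0, -2.0], [3.0, 0.5]
at_noise = blended_score(score_vec, force_vec, t=1.0)  # pure neural score
at_data = blended_score(score_vec, force_vec, t=0.0)   # pure force field

x = [1.0]
for _ in range(100):
    force = [-xi for xi in x]  # harmonic well E = x^2 / 2, so F = -x
    x = overdamped_step(x, force, step_size=0.1, kT=0.0, rng=rng)  # kT = 0: plain descent
```

Because \(0 \le 1 - t \le 1\), a larger mixing_zeta makes \(f(t)\) smaller at every intermediate t, confining the force-field contribution to the final, low-noise part of sampling.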

Force-field training and prediction

To train a force-prediction head alongside the diffusion model, pass force_field=True to train_from_atoms(). The training data must include per-atom forces and total energies (e.g. from DFT calculations, loaded via ASE):

from ase.io import read
from agedi import train_from_atoms

data = read("training_data.traj", ":")  # must contain forces and energy

diffusion, dataset, trainer = train_from_atoms(
    data,
    noisers=("ConfinedCellPositions",),
    mask="MaskFixed",
    confinement=(2.0, 10.0),
    force_field=True,
    max_time=2,
)

Once the model is trained, use predict() to compute energies and forces for existing structures. The results are returned as ASE Atoms objects with a SinglePointCalculator attached:

from ase.io import read, write
from agedi import load_diffusion, predict

diffusion = load_diffusion("logs/agedi/version_0")

structures = read("structures.traj", index=":")
predicted = predict(diffusion, structures)

# Access predictions on the first structure
print(predicted[0].get_potential_energy())  # eV
print(predicted[0].get_forces())            # eV/Å

write("predicted.traj", predicted)
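To check predictions against reference data, a common metric is the mean absolute error (MAE) of energies and force components. The stdlib sketch below is a hypothetical helper on plain nested lists; it is not part of AGeDi.

```python
def energy_force_mae(ref_energies, pred_energies, ref_forces, pred_forces):
    """Mean absolute errors of per-structure energies and of all force components."""
    if len(ref_energies) != len(pred_energies):
        raise ValueError("energy lists differ in length")
    e_mae = sum(abs(r - p) for r, p in zip(ref_energies, pred_energies)) / len(ref_energies)
    # flatten forces: structures -> atoms -> (x, y, z) components
    errors = [abs(rc - pc)
              for rf, pf in zip(ref_forces, pred_forces)
              for ra, pa in zip(rf, pf)
              for rc, pc in zip(ra, pa)]
    f_mae = sum(errors) / len(errors)
    return e_mae, f_mae

ref_e, pred_e = [-10.0, -12.0], [-10.5, -11.0]
ref_f = [[[0.0, 0.0, 1.0]], [[0.5, 0.0, 0.0]]]
pred_f = [[[0.0, 0.0, 0.5]], [[0.5, 0.0, 0.0]]]
e_mae, f_mae = energy_force_mae(ref_e, pred_e, ref_f, pred_f)
```

Assuming structures.traj stores reference energies and forces, the inputs would be [a.get_potential_energy() for a in structures], [a.get_potential_energy() for a in predicted], and likewise get_forces().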

Core public functions

Custom model backends

AGeDi ships with the SchNetPack-based "PaiNN" backend. You can register your own GNN backbone with register_model():

from agedi import create_diffusion, register_model

def my_factory(cutoff, heads, feature_size, n_blocks, head_dim, n_rbf):
    # Build and return (translator, representation, head_list)
    ...

register_model("MyModel", my_factory)

# Then use it in create_diffusion / train_from_atoms:
diffusion = create_diffusion(model="MyModel", ...)
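Name-based registration of this kind is commonly a dictionary from names to factory callables. The sketch below illustrates that general pattern with hypothetical names (MODEL_REGISTRY, register, build); it is not AGeDi's internals, and refusing duplicate names is a design choice of the sketch. The factory's return value is a placeholder standing in for (translator, representation, head_list).

```python
from typing import Callable, Dict

MODEL_REGISTRY: Dict[str, Callable] = {}

def register(name, factory):
    # refuse silent overwrites so two backends cannot share a name by accident
    if name in MODEL_REGISTRY:
        raise ValueError(f"model {name!r} is already registered")
    MODEL_REGISTRY[name] = factory

def build(name, **hyperparams):
    # look up the factory by name and forward the hyperparameters to it
    try:
        factory = MODEL_REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown model {name!r}; registered: {sorted(MODEL_REGISTRY)}") from None
    return factory(**hyperparams)

def toy_factory(cutoff, heads, feature_size, n_blocks, head_dim, n_rbf):
    # placeholder for (translator, representation, head_list)
    return ("translator", {"cutoff": cutoff, "n_blocks": n_blocks}, list(heads))

register("MyModel", toy_factory)
parts = build("MyModel", cutoff=5.0, heads=("score",), feature_size=64,
              n_blocks=3, head_dim=32, n_rbf=20)
```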

Additional sampling options

The sample() function supports several advanced options beyond the basic n_samples / formula / steps arguments:

compile=True: compile the reverse-diffusion step with torch.compile for faster GPU sampling. Requires NVIDIA nvalchemiops. Neighbor-list buffer sizes are estimated automatically before the sampling loop.
save_trajectory=True: return one diffusion trajectory per sample (a list of AtomsGraph / ASE Atoms objects) instead of only the final structures.
print_timings=True: print a per-stage timing breakdown (graph init, score model, denoise step, neighbor list, etc.) after each sampling batch. Useful for profiling.
property: a dict of target property values to condition sampling on (requires a model trained with conditioning).

from agedi import load_diffusion, sample

diffusion = load_diffusion("logs/agedi/version_0")

# Compiled, 500 steps, save full trajectories
trajectories = sample(
    diffusion,
    n_samples=4,
    formula="Pd4O4",
    steps=500,
    compile=True,
    save_trajectory=True,
    print_timings=True,
)
# trajectories[i] is the full reverse-diffusion path for sample i

Property conditioning

Models can optionally be conditioned on a scalar or integer per-structure property (e.g. formation energy, band gap, or total magnetisation). Enable conditioning at training time and then supply the target value at sampling time.

Training with conditioning:

from ase.io import read
from agedi import train_from_atoms

data = read("training_data.traj", ":")

diffusion, dataset, trainer = train_from_atoms(
    data,
    noisers=("CellPositions",),
    conditioning="energy",        # key in atoms.info or atoms.get_energy()
    conditioning_type="scalar",   # "scalar" (default) or "integer"
)

Sampling with a conditioning value:

from agedi import load_diffusion, sample

diffusion = load_diffusion("logs/agedi/version_0")

structures = sample(
    diffusion,
    n_samples=10,
    formula="Pd4O4",
    property={"energy": -3.5},   # target value for the conditioned property
)

The conditioning key must match the atoms.info key (or an atoms.get_<key>() method) used in the training data.
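The lookup rule can be sketched as: read atoms.info[key] if present, otherwise call atoms.get_<key>(). The helper below is hypothetical, the precedence (info before the getter) is an assumption, and a minimal stand-in class replaces ase.Atoms so the example runs without ASE.

```python
class FakeAtoms:
    """Minimal stand-in for ase.Atoms with an info dict and one getter."""
    def __init__(self, info, magmom):
        self.info = info
        self._magmom = magmom

    def get_magnetic_moment(self):
        return self._magmom

def conditioning_value(atoms, key):
    # 1) prefer an explicit entry in atoms.info
    if key in atoms.info:
        return atoms.info[key]
    # 2) otherwise fall back to an atoms.get_<key>() method
    getter = getattr(atoms, f"get_{key}", None)
    if callable(getter):
        return getter()
    raise KeyError(f"{key!r} not in atoms.info and no get_{key}() method")

atoms = FakeAtoms(info={"energy": -3.5}, magmom=2.0)
from_info = conditioning_value(atoms, "energy")
from_getter = conditioning_value(atoms, "magnetic_moment")
```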