agedi
=====

.. py:module:: agedi


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/agedi/api/index
   /autoapi/agedi/cli/index
   /autoapi/agedi/data/index
   /autoapi/agedi/diffusion/index
   /autoapi/agedi/functional/index
   /autoapi/agedi/models/index
   /autoapi/agedi/utils/index


Classes
-------

.. autoapisummary::

   agedi.AtomsGraph
   agedi.Agedi
   agedi.Diffusion
   agedi.ForcefieldGuidanceConfig


Functions
---------

.. autoapisummary::

   agedi.create_dataset
   agedi.create_diffusion
   agedi.create_trainer
   agedi.load_diffusion
   agedi.predict
   agedi.register_model
   agedi.sample
   agedi.train
   agedi.train_from_atoms
   agedi.train_from_config


Package Contents
----------------

.. py:class:: AtomsGraph

   Bases: :py:obj:`torch_geometric.data.Data`


   Atomistic Graph Class

   Class defining a graph with atoms as nodes and edges formed between all atoms within a finite cutoff radius.

   :param pos: The positions of the atoms with shape (n_atoms, 3).
   :type pos: torch.Tensor
   :param x: The node features, i.e. the atomic types, of the graph with shape (n_nodes, 1).
   :type x: torch.Tensor
   :param edge_index: The edge index tensor of the graph with shape (2, n_edges).
   :type edge_index: torch.Tensor
   :param edge_attr: The edge attributes of the graph with shape (n_edges, n_edge_features).
   :type edge_attr: torch.Tensor
   :param y: The target tensor of the graph with shape (n_targets,).
   :type y: Optional[torch.Tensor]
   :param representation: The representation of the atoms in the graph.
   :type representation: Optional[Representation]
   :param confinement: z-directional confinement of the atoms with shape (1,2).
   :type confinement: Optional[torch.Tensor]
   :param kwargs:
   :type kwargs: Dict[str, torch.Tensor]


   .. py:method:: from_atoms(atoms: ase.Atoms, cutoff: float = 6.0, dtype: torch.dtype = torch.float, initialize_mask: Optional[bool] = None, confinement: Optional[Tuple[float, float]] = None, canonical_cell: bool = False) -> AtomsGraph
      :classmethod:


      Create a graph from an ASE Atoms object.

      :param atoms: The ASE Atoms object.
      :type atoms: Atoms
      :param cutoff: The cutoff radius for the edges.
      :type cutoff: float
      :param dtype: The data type of the tensors.
      :type dtype: torch.dtype
      :param initialize_mask: Whether to initialize the mask tensor. When ``None`` (the default), the mask is initialised only when ``confinement`` is not provided (i.e. ``initialize_mask`` defaults to ``False`` for template / confinement graphs).
      :type initialize_mask: Optional[bool]
      :param confinement: Optional z-directional confinement bounds ``(z_min, z_max)`` to attach to the graph. When provided, a ``confinement`` tensor of shape ``(1, 2)`` is stored on the graph. When ``None`` (the default), no confinement attribute is added.
      :type confinement: Optional[Tuple[float, float]]
      :param canonical_cell: When ``True``, the cell is stored in canonical lower-triangular form. If the input cell is not already canonical, Cartesian positions are recomputed to preserve fractional coordinates and a warning is raised. Set to ``False`` (the default) to store the cell exactly as provided by ASE (no rotation or recomputation is performed).
      :type canonical_cell: bool

      :returns: **graph** -- The graph object.
      :rtype: AtomsGraph


   .. py:method:: empty(cutoff: float = 6.0) -> AtomsGraph
      :classmethod:


      Create an empty graph.

      :param cutoff: The cutoff radius for the edges.
      :type cutoff: float

      :returns: **graph** -- The graph object.
      :rtype: AtomsGraph


   .. py:method:: add_batch_attr(key: str, value: torch.Tensor, type: str = 'node') -> None

      Add a batch attribute to the graph.

      :param key: The key of the attribute.
      :type key: str
      :param value: The value of the attribute.
      :type value: torch.Tensor
      :param type: The type of the attribute. Can be either "node" or "graph".
      :type type: str

      :rtype: None


   .. py:method:: to_atoms() -> ase.Atoms

      Convert the graph to an ASE Atoms object.

      Only works on unbatched graphs.

      :returns: **atoms** -- The atoms object.
      :rtype: ase.Atoms


   .. py:method:: _get_scalar_attr(key: str) -> Optional[float]


   .. py:method:: prepare_for_compile(cutoff: float) -> None

      Pre-allocate neighbor-list buffers for ``torch.compile`` compatibility.

      Estimates the maximum number of neighbors per atom using :func:`~nvalchemiops.torch.neighbors.neighbor_utils.estimate_max_neighbors` and the cell-list dimensions using :func:`~nvalchemiops.torch.neighbors.cell_list.estimate_cell_list_sizes`, then allocates the cell list and all output buffers with fixed shapes. Fixed shapes are required for ``torch.compile`` to trace the reverse diffusion step once without retracing on subsequent iterations.

      Must be called on a :class:`~torch_geometric.data.Batch` **before** the first :meth:`update_graph` call. Requires the ``nvalchemiops`` package.

      :param cutoff: Neighbor-list cutoff radius (Å).
      :type cutoff: float

      :raises RuntimeError: When ``nvalchemiops`` is not installed.
      :raises TypeError: When called on an unbatched :class:`AtomsGraph` instead of a :class:`~torch_geometric.data.Batch`.


   .. py:method:: _cell_list_to_graph(neighbor_matrix: torch.Tensor, neighbor_shifts: torch.Tensor, cell: torch.Tensor, dtype: torch.dtype, batch_idx: Optional[torch.Tensor] = None) -> Tuple[torch.Tensor, torch.Tensor]
      :staticmethod:


      Convert cell-list query output to ``(edge_index, shift_vectors)``.


   .. py:method:: update_graph() -> bool

      Update the graph with new edges.

      This should be called after changing any of the positions or cell.

      :returns: **rebuilt** -- ``True`` when the neighbor list was fully recomputed.
      :rtype: bool


   .. py:method:: _make_graph_matscipy(positions: torch.Tensor, cell: torch.Tensor, cutoff: float, pbc: torch.Tensor, dtype: Optional[torch.dtype] = None, batch_idx: Optional[torch.Tensor] = None) -> Tuple[torch.Tensor, torch.Tensor]
      :staticmethod:



   .. py:method:: make_graph(positions: torch.Tensor, cell: torch.Tensor, cutoff: float, pbc: torch.Tensor, dtype: torch.dtype = None, batch_idx: Optional[torch.Tensor] = None) -> Tuple[torch.Tensor, torch.Tensor]
      :staticmethod:


      Create the graph edges from the positions and cell.

      :param positions: The positions of the atoms.
      :type positions: torch.Tensor
      :param cell: The cell of the system.
      :type cell: torch.Tensor
      :param cutoff: The cutoff radius for the edges.
      :type cutoff: float
      :param pbc: The periodic boundary conditions.
      :type pbc: torch.Tensor
      :param dtype: The data type of the output.
      :type dtype: torch.dtype

      :returns: * **edge_index** (*torch.Tensor*) -- The edge index tensor.
                * **shift_vectors** (*torch.Tensor*) -- The shift vectors tensor.


   .. py:method:: clear_graph() -> None

      Clear the graph, removing all edges.

      :rtype: None


   .. py:method:: __len__() -> int

      Return the number of atoms in the graph.

      :returns: **n_atoms** -- The number of atoms in the graph.
      :rtype: int


   .. py:property:: cell
      :type: torch.Tensor


      Return the canonical cell matrix of the graph.

      :returns: **cell** -- The cell matrix of shape ``(3, 3)``.
      :rtype: torch.Tensor


   .. py:property:: frac
      :type: torch.Tensor


      Return the fractional coordinates of the positions.

      :returns: **frac** -- The fractional coordinates of the atoms.
      :rtype: torch.Tensor


   .. py:method:: frac_to_pos(f: torch.Tensor) -> torch.Tensor

      Fractional -> Cartesian coordinates.

      Convert fractional coordinates to Cartesian coordinates.

      :param f: The fractional coordinates.
      :type f: torch.Tensor

      :returns: **r** -- The Cartesian coordinates.
      :rtype: torch.Tensor


   .. py:method:: pos_to_frac(r: torch.Tensor) -> torch.Tensor

      Cartesian -> Fractional coordinates.

      Convert Cartesian coordinates to fractional coordinates.

      :param r: The Cartesian coordinates.
      :type r: torch.Tensor

      :returns: **f** -- The fractional coordinates.
      :rtype: torch.Tensor


   .. py:property:: positions_mask
      :type: torch.Tensor


      Return the mask of the positions that are fixed.
      ``True`` for fixed atom positions, ``False`` otherwise.

      :returns: **mask** -- The mask of the positions that are fixed.
      :rtype: torch.Tensor


   .. py:property:: time
      :type: torch.Tensor


      Return the time of the graph.

      :returns: **time** -- The time of the graph.
      :rtype: torch.Tensor


   .. py:property:: representation
      :type: Optional[Representation]


      Return the representation of the graph.

      :returns: **representation** -- The representation of the graph, or ``None`` if not set.
      :rtype: Optional[Representation]


   .. py:method:: wrap_positions() -> None

      Wrap the positions of the atoms to the unit cell.

      :rtype: None


   .. py:method:: apply_mask(x: torch.Tensor, val: float = 0.0) -> torch.Tensor

      Apply the mask to the tensor x.

      :param x: The tensor to apply the mask to.
      :type x: torch.Tensor
      :param val: The value to set the masked values to.
      :type val: float

      :returns: **x** -- The tensor with the mask applied.
      :rtype: torch.Tensor


   .. py:property:: confinement
      :type: torch.Tensor


      Return the confinement of the graph.

      :returns: **confinement** -- The confinement of the graph.
      :rtype: torch.Tensor


   .. py:property:: cellpar
      :type: torch.Tensor


      Return the cell parameters of the graph.


   .. py:method:: _is_lower_triangular(cell: torch.Tensor) -> bool
      :staticmethod:


      Return True if *cell* is in canonical lower-triangular form.

      A cell matrix is considered canonical when the three strictly upper-triangular entries (cell[0,1], cell[0,2], cell[1,2]) are all zero (within a tight floating-point tolerance of 1e-10).

      :param cell: The cell matrix.
      :type cell: torch.Tensor

      :returns: True if the cell is already lower-triangular.
      :rtype: bool


   .. py:method:: cell_to_vectors(cell: torch.Tensor) -> torch.Tensor
      :staticmethod:


      Convert cell matrix to cell parameters.

      :param cell: The cell matrix of shape ``(N, 3)`` or ``(N, 3, 3)``.
      :type cell: torch.Tensor

      :returns: The cell parameters of shape ``(N, 6)``.
      :rtype: torch.Tensor


   .. py:method:: vector_to_cell(cellpar: torch.Tensor) -> torch.Tensor
      :staticmethod:


      Convert cell parameters to cell matrix.

      :param cellpar: The cell parameters of shape ``(N, 6)``.
      :type cellpar: torch.Tensor

      :returns: The cell matrix of shape ``(N, 3, 3)`` where each row is a lattice vector.
      :rtype: torch.Tensor


.. py:class:: Agedi(score_model: agedi.models.ScoreModel, noisers: List[agedi.diffusion.noisers.Noiser], regressor_model: Optional[torch.nn.Module] = None, regressor_heads: Optional[List] = None, regressor_loss_weight: float = 1.0, optim_config: Optional[Dict] = None, scheduler_config: Optional[Dict] = None, eps: float = 1e-05)

   Bases: :py:obj:`lightning.LightningModule`, :py:obj:`agedi.diffusion.diffusion.Diffusion`


   Full diffusion model: training + sampling.

   Combines the :class:`~agedi.diffusion.Diffusion` sampling pipeline with :class:`~lightning.LightningModule` training hooks.

   :param score_model: The score model.
   :type score_model: ScoreModel
   :param noisers: A list of noisers.
   :type noisers: List[Noiser]
   :param regressor_model: An optional regressor model used for force-field guidance during sampling. When present, its loss is added to the diffusion loss during training.
   :type regressor_model: torch.nn.Module, optional
   :param regressor_heads: When provided, a :class:`~agedi.models.regressor.RegressorModel` is built internally using these heads while **sharing** the translator and representation from ``score_model``. Use this parameter (instead of ``regressor_model``) when the backbone should be shared.
   :type regressor_heads: List, optional
   :param regressor_loss_weight: Weight applied to the regressor loss. Defaults to ``1.0``.
   :type regressor_loss_weight: float, optional
   :param optim_config: Keyword arguments forwarded to :class:`torch.optim.AdamW`.
   :type optim_config: dict, optional
   :param scheduler_config: Keyword arguments forwarded to :class:`torch.optim.lr_scheduler.ReduceLROnPlateau`.
   :type scheduler_config: dict, optional
   :param eps: Minimum diffusion time value.
   :type eps: float, optional


   .. py:attribute:: regressor_loss_weight
      :value: 1.0



   .. py:attribute:: optim_config
      :value: None



   .. py:attribute:: scheduler_config
      :value: None



   .. py:attribute:: _regressor_training
      :value: False



   .. py:method:: on_fit_start() -> None

      Write ``hparams.yaml`` to the trainer log directory at training start.


   .. py:method:: get_hparams() -> Dict

      Return hyperparameters sufficient to reconstruct this diffusion model.

      :returns: Hyperparameter dictionary with ``_target_``, ``score_model``, ``noisers``, ``optim_config``, ``scheduler_config``, ``eps``, and optionally ``regressor_heads`` or ``regressor_model``.
      :rtype: dict


   .. py:method:: setup(stage: str = None) -> None

      Set up the model (put score model in training mode).


   .. py:method:: forward(batch: agedi.data.AtomsGraph) -> agedi.data.AtomsGraph

      Forward pass through the score model.

      :param batch: A batch of AtomsGraph data.
      :type batch: AtomsGraph

      :returns: The output of the score model forward pass.
      :rtype: AtomsGraph


   .. py:method:: loss(batch: agedi.data.AtomsGraph, batch_idx: torch.Tensor) -> Dict

      Compute the combined diffusion + regressor loss.

      Always computes the diffusion (denoising) loss on a noised copy of the batch. When a regressor model is present and the batch contains force labels, the regressor loss is added with weight ``regressor_loss_weight``.

      :param batch: A batch of AtomsGraph data.
      :type batch: AtomsGraph
      :param batch_idx: The index of the batch.
      :type batch_idx: torch.Tensor

      :returns: A dictionary of losses.
      :rtype: dict


   .. py:method:: diffusion_loss(batch: agedi.data.AtomsGraph, batch_idx: torch.Tensor) -> Dict

      Compute the diffusion (denoising score-matching) loss.

      :param batch: A batch of AtomsGraph data.
      :type batch: AtomsGraph
      :param batch_idx: The index of the batch.
      :type batch_idx: torch.Tensor

      :returns: A dictionary of losses.
      :rtype: dict


   .. py:method:: regressor_loss(batch: agedi.data.AtomsGraph, batch_idx: torch.Tensor) -> Dict

      Compute the regressor loss on the un-noised batch.

      :param batch: A batch of AtomsGraph data.
      :type batch: AtomsGraph
      :param batch_idx: The index of the batch.
      :type batch_idx: torch.Tensor

      :returns: A dictionary of losses.
      :rtype: dict

      :raises ValueError: If no regressor model is attached.


   .. py:method:: training_step(batch, batch_idx: torch.Tensor) -> torch.Tensor

      Perform a training step.

      Computes the combined diffusion + regressor loss (see :meth:`loss`).

      When the :class:`~agedi.data.Dataset` was set up with a dedicated regressor dataset (via :meth:`~agedi.data.Dataset.add_regressor_data`), ``batch`` is a dict with two keys:

      * ``"main"`` – a regular training batch used for both the diffusion and regressor loss.
      * ``"regressor"`` – a regressor-only batch whose structures are *only* forwarded through the regressor loss (not the diffusion loss).

      When no regressor dataset is present ``batch`` is a plain :class:`~agedi.data.AtomsGraph` batch and the behaviour is identical to the pre-existing implementation.

      :param batch: A batch of AtomsGraph data, or a dict with ``"main"`` and ``"regressor"`` keys when a dedicated regressor dataset is used.
      :type batch: AtomsGraph or dict
      :param batch_idx: The index of the batch.
      :type batch_idx: torch.Tensor

      :returns: The combined loss.
      :rtype: torch.Tensor


   .. py:method:: validation_step(batch: agedi.data.AtomsGraph, batch_idx: torch.Tensor) -> torch.Tensor

      Perform a validation step.

      :param batch: A batch of AtomsGraph data.
      :type batch: AtomsGraph
      :param batch_idx: The index of the batch.
      :type batch_idx: torch.Tensor

      :returns: The combined loss.
      :rtype: torch.Tensor


   .. py:method:: configure_optimizers() -> Dict

      Configure optimizers and learning-rate schedulers.

      When a regressor model is present a single optimizer is built over the deduplicated union of ``score_model`` and ``regressor_model`` parameters (shared parameters appear only once).

      :returns: A dictionary with ``"optimizer"``, ``"lr_scheduler"``, and ``"monitor"`` keys.
      :rtype: dict


   .. py:method:: _scheduler_monitor() -> str

      Return the metric used by ReduceLROnPlateau.


   .. py:property:: regressor_training
      :type: bool


      Whether the regressor model is in training mode.


.. py:class:: Diffusion(score_model: ScoreModel, noisers: List[agedi.diffusion.noisers.Noiser], regressor_model: Optional[torch.nn.Module] = None, eps: float = 1e-05)

   Pure-Python sampling core for diffusion models.

   Holds the score model, noisers, and an optional regressor and provides the full forward / reverse / sampling pipeline. This class does **not** inherit from :class:`torch.nn.Module` or :class:`lightning.LightningModule` and therefore has no training hooks.

   When used through :class:`~agedi.diffusion.Agedi` (which inherits from both this class and :class:`lightning.LightningModule`), the Lightning infrastructure manages device placement and module registration. When used standalone, device information is derived from the score model's parameters via the :attr:`device` property.

   :param score_model: The score model.
   :type score_model: ScoreModel
   :param noisers: A list of noisers.
   :type noisers: List[Noiser]
   :param regressor_model: An optional regressor model used for force-field guidance during sampling.
   :type regressor_model: torch.nn.Module, optional
   :param eps: Minimum value for the diffusion time step (used in :meth:`sample_time`).
   :type eps: float, optional


   .. py:attribute:: score_model


   .. py:attribute:: noisers


   .. py:attribute:: regressor_model
      :value: None



   .. py:attribute:: eps
      :value: 1e-05



   .. py:attribute:: lbfgs_step_sizer
      :type: Optional[agedi.diffusion.guidance.BatchedLBFGSStepSizer]
      :value: None



   .. py:attribute:: zeta
      :type: float
      :value: 3.0



   .. py:attribute:: noiser_keys


   .. py:attribute:: score_keys


   .. py:attribute:: _compiled_reverse_step
      :value: None



   .. py:property:: device
      :type: torch.device


      Infer the computation device from the score model's parameters.

      When used through :class:`~agedi.diffusion.Agedi` (which also inherits :class:`lightning.LightningModule`), Lightning's own ``device`` property takes precedence.


   .. py:method:: sample_time(batch: agedi.data.AtomsGraph) -> None

      Sample a random diffusion time for each graph in *batch*.

      Draws times uniformly from ``[eps, 1]`` and assigns them to ``batch.time`` at atom resolution.

      :param batch: A batch of AtomsGraph data; modified in-place.
      :type batch: AtomsGraph


   .. py:method:: forward_step(batch: agedi.data.AtomsGraph) -> agedi.data.AtomsGraph

      Forward diffusion step (corruption).

      Applies each noiser in order to corrupt the batch.

      :param batch: A batch of AtomsGraph data.
      :type batch: AtomsGraph

      :returns: The corrupted batch.
      :rtype: AtomsGraph


   .. py:method:: reverse_step(batch: agedi.data.AtomsGraph, delta_t: float, force_field_guidance: float, last: bool = False, timings: Optional[SamplingTimings] = None) -> agedi.data.AtomsGraph

      Reverse diffusion step (denoising).

      Evaluates the score model and applies one reverse-SDE step through all noisers. Optionally applies force-field guidance afterwards.

      :param batch: A batch of AtomsGraph data.
      :type batch: AtomsGraph
      :param delta_t: The time step.
      :type delta_t: float
      :param force_field_guidance: Scale of the force-field guidance (``0.0`` disables it).
      :type force_field_guidance: float
      :param last: Whether this is the final denoising step.
      :type last: bool, optional
      :param timings: If provided, timing measurements are accumulated here.
      :type timings: SamplingTimings, optional

      :returns: The denoised batch.
      :rtype: AtomsGraph


   .. py:method:: corrector_step(batch: agedi.data.AtomsGraph, corrector_dt: float) -> agedi.data.AtomsGraph

      Langevin corrector step at constant time.

      Evaluates the score model and applies one Langevin corrector step through all noisers (in reverse order).

      :param batch: A batch of AtomsGraph data.
      :type batch: AtomsGraph
      :param corrector_dt: Step size for the Langevin corrector.
      :type corrector_dt: float

      :returns: The corrected batch.
      :rtype: AtomsGraph


   .. py:method:: force_field_guidance_step(batch: agedi.data.AtomsGraph, scale: float, max_step_size: float = 0.1) -> agedi.data.AtomsGraph

      Apply one force-field guidance step.

      :param batch: A batch of AtomsGraph data.
      :type batch: AtomsGraph
      :param scale: Base scale of the force field guidance.
      :type scale: float
      :param max_step_size: Maximum allowed step size magnitude.
      :type max_step_size: float, optional

      :returns: Updated batch.
      :rtype: AtomsGraph


   .. py:method:: post_diffusion_relaxation_step(batch: agedi.data.AtomsGraph, scale: float = 0.1) -> agedi.data.AtomsGraph

      Perform a pure force-based relaxation step.

      :param batch: A batch of AtomsGraph data.
      :type batch: AtomsGraph
      :param scale: Step size scaling factor.
      :type scale: float, optional

      :returns: Updated batch.
      :rtype: AtomsGraph


   .. py:method:: _initialize_graph(cutoff: float, **kwargs) -> agedi.data.AtomsGraph

      Initialise a single graph from noiser priors.

      :param cutoff: Cutoff radius for the neighbour list.
      :type cutoff: float
      :param \*\*kwargs: Additional keyword arguments passed to the graph (e.g. ``cell``, ``template``, ``pbc``).

      :returns: The initialised graph.
      :rtype: AtomsGraph


   .. py:method:: _sync_for_timing(device: Optional[torch.device]) -> None
      :staticmethod:



   .. py:method:: _time_sampling_call(device: Optional[torch.device], timings: SamplingTimings, key: str, fn, *args, **kwargs)


   .. py:method:: _format_timing_line(label: str, value: float, count: Optional[int] = None) -> str
      :staticmethod:



   .. py:method:: _print_sampling_timings(timings: SamplingTimings) -> None


   .. py:property:: compiled_reverse_step

      Lazily compile :meth:`reverse_step` with ``torch.compile``.

      The compiled kernel is cached as ``self._compiled_reverse_step`` so that compilation happens at most once per model instance.
      Using a per-instance cache (rather than a class-level ``@torch.compile`` decorator) means that two :class:`Diffusion` objects with different architectures will each compile their own kernel and never interfere.

      .. note::

         ``timings`` must **not** be passed to the compiled function, because ``time.perf_counter`` is not traceable by Dynamo. Time the compiled call from outside in :meth:`_sample_batch` using the ``is_compiled`` flag.


   .. py:method:: _sample_batch(batch: torch_geometric.data.Batch, steps: int, eps: float, force_field_guidance: float, save_trajectory: bool, progress_bar: bool, force_threshold: float, max_extra_steps: int, corrector_steps: int = 0, corrector_step_size: float = 0.001, timings: Optional[SamplingTimings] = None, reverse_step_fn=None, is_compiled: bool = False) -> List[agedi.data.AtomsGraph]

      Run the reverse-diffusion loop for a pre-built batch.

      :param batch: A batch of :class:`~agedi.data.AtomsGraph` data at ``t=1``.
      :type batch: Batch
      :param steps: Number of reverse-diffusion steps.
      :type steps: int
      :param eps: Minimum time value (end of trajectory).
      :type eps: float
      :param force_field_guidance: Scale of the force-field guidance (``0.0`` disables it).
      :type force_field_guidance: float
      :param save_trajectory: Whether to collect and return all intermediate states.
      :type save_trajectory: bool
      :param progress_bar: Whether to display a tqdm progress bar.
      :type progress_bar: bool
      :param force_threshold: Maximum per-atom force for terminating post-diffusion relaxation.
      :type force_threshold: float
      :param max_extra_steps: Maximum extra relaxation steps after the main trajectory.
      :type max_extra_steps: int
      :param corrector_steps: Number of Langevin corrector passes after each predictor step. ``0`` (default) disables the corrector (standard DDPM/EM sampling).
      :type corrector_steps: int, optional
      :param corrector_step_size: Step size used for each Langevin corrector step. Defaults to ``1e-3``.
      :type corrector_step_size: float, optional
      :param timings: If provided, timing measurements are accumulated here.
      :type timings: SamplingTimings, optional
      :param reverse_step_fn: The reverse step function to use. Defaults to ``self.reverse_step``. Pass a ``torch.compile``-wrapped version to enable compiled sampling.
      :type reverse_step_fn: callable, optional
      :param is_compiled: Whether ``reverse_step_fn`` is a compiled function.
      :type is_compiled: bool, optional

      :returns: Final structures, or (when *save_trajectory* is ``True``) a list of trajectories (one per graph).
      :rtype: List[AtomsGraph]


   .. py:method:: _sample(N: int, steps: int, cutoff: float, eps: float, force_field_guidance: float, force_threshold: float, max_extra_steps: int, progress_bar: bool, save_trajectory: bool, corrector_steps: int = 0, corrector_step_size: float = 0.001, print_timings: bool = False, compile: bool = False, **kwargs) -> List[agedi.data.AtomsGraph]

      Build *N* graphs from priors and run the sampling loop.

      :param N: Number of structures to generate.
      :type N: int
      :param steps: Number of reverse-diffusion steps.
      :type steps: int
      :param cutoff: Cutoff radius for the neighbour list.
      :type cutoff: float
      :param eps: Minimum time value (end of trajectory).
      :type eps: float
      :param force_field_guidance: Scale of the force-field guidance.
      :type force_field_guidance: float
      :param force_threshold: Maximum per-atom force for post-diffusion relaxation.
      :type force_threshold: float
      :param max_extra_steps: Maximum extra relaxation steps.
      :type max_extra_steps: int
      :param progress_bar: Show tqdm progress bar.
      :type progress_bar: bool
      :param save_trajectory: Collect all intermediate states.
      :type save_trajectory: bool
      :param corrector_steps: Langevin corrector passes per predictor step.
      :type corrector_steps: int, optional
      :param corrector_step_size: Step size for each corrector pass.
      :type corrector_step_size: float, optional
      :param print_timings: Print a timing breakdown after sampling completes.
      :type print_timings: bool, optional
      :param compile: Use ``torch.compile`` on the reverse diffusion step.
      :type compile: bool, optional
      :param \*\*kwargs: Keyword arguments forwarded to :meth:`_initialize_graph`.

      :returns: Sampled structures (or trajectories when *save_trajectory* is ``True``).
      :rtype: List[AtomsGraph]


   .. py:method:: sample(N: int, template=None, batch_size: Optional[int] = 64, steps: Optional[int] = 500, cutoff: Optional[float] = 6.0, eps: Optional[float] = 0.001, n_atoms: Optional[int] = None, atomic_numbers: Optional[List[int]] = None, formula: Optional[str] = None, positions: Optional[numpy.ndarray] = None, cell: Optional[numpy.ndarray] = None, pbc: Optional[numpy.ndarray] = None, confinement: Optional[Tuple[float, float]] = None, compile: bool = False, ff_guidance: Optional[agedi.diffusion.guidance.ForcefieldGuidanceConfig] = None, property: Optional[Dict] = None, progress_bar: Optional[bool] = False, save_trajectory: Optional[bool] = False, print_timings: Optional[bool] = False, corrector_steps: int = 0, corrector_step_size: float = 0.001) -> List[agedi.data.AtomsGraph]

      Sample structures from the diffusion model.

      The minimum required arguments depend on the configured noisers and whether a template is provided:

      * ``n_atoms`` -- always required unless derivable from ``atomic_numbers`` or ``formula``.
      * ``atomic_numbers`` -- required unless a types-noiser is configured (key ``"x"``), or derivable from ``formula``.
      * ``positions`` -- required when no positions-noiser is configured (type-only diffusion).
      * ``cell`` -- required when no ``template`` is given.
      * ``pbc`` -- optional; defaults to ``[True, True, True]``.

      :param N: Number of structures to generate.
      :type N: int
      :param template: Template structure. ``cell`` and ``pbc`` are taken from the template when not explicitly provided.
      :type template: AtomsGraph or ase.Atoms, optional
      :param batch_size: Internal batch size for splitting large *N*.
      :type batch_size: int, optional
      :param steps: Number of reverse-diffusion steps.
      :type steps: int, optional
      :param cutoff: Cutoff radius for the neighbour list.
      :type cutoff: float, optional
      :param eps: Minimum time value at the end of the trajectory.
      :type eps: float, optional
      :param n_atoms: Number of atoms per structure.
      :type n_atoms: int, optional
      :param atomic_numbers: Atomic numbers of the atoms to generate.
      :type atomic_numbers: List[int], optional
      :param formula: Chemical formula (e.g. ``"H2O"``).
      :type formula: str, optional
      :param positions: Fixed atom positions (shape ``(n_atoms, 3)``).
      :type positions: np.ndarray, optional
      :param cell: Unit-cell matrix (3x3).
      :type cell: np.ndarray, optional
      :param pbc: Periodic boundary conditions.
      :type pbc: np.ndarray, optional
      :param confinement: Z-directional confinement ``(z_min, z_max)``.
      :type confinement: Tuple[float, float], optional
      :param compile: When ``True``, use ``torch.compile`` on the reverse diffusion step for improved throughput on CUDA hardware.
      :type compile: bool, optional
      :param ff_guidance: Force-field guidance configuration.
      :type ff_guidance: ForcefieldGuidanceConfig, optional
      :param property: Conditioning properties (key -> scalar tensor).
      :type property: dict, optional
      :param progress_bar: Show a tqdm progress bar.
      :type progress_bar: bool, optional
      :param save_trajectory: Return full trajectories instead of final structures.
      :type save_trajectory: bool, optional
      :param print_timings: Print a timing breakdown after sampling completes.
      :type print_timings: bool, optional
      :param corrector_steps: Number of Langevin corrector passes after each predictor step. ``0`` (default) gives standard (predictor-only) sampling.
      :type corrector_steps: int, optional
      :param corrector_step_size: Step size for each corrector pass. Defaults to ``1e-3``.
      :type corrector_step_size: float, optional

      :returns: Sampled structures, or trajectories when *save_trajectory* is ``True``.
      :rtype: List[AtomsGraph]


.. py:class:: ForcefieldGuidanceConfig

   Configuration for force-field guided sampling.

   :param guidance: Scale of the force-field guidance applied at each reverse step. Set to ``0.0`` (the default) to disable guidance entirely.
   :type guidance: float
   :param zeta: Exponent for the time-dependent weight factor ``(1 - t)**zeta``. Higher values concentrate guidance near the end of the trajectory.
   :type zeta: float
   :param force_threshold: Convergence criterion for the optional post-diffusion relaxation: the maximum per-atom force magnitude (eV/Å) below which relaxation stops.
   :type force_threshold: float
   :param max_extra_steps: Maximum number of additional relaxation steps performed after the main diffusion trajectory when ``guidance > 0``.
   :type max_extra_steps: int


   .. py:attribute:: guidance
      :type: float
      :value: 0.0



   .. py:attribute:: zeta
      :type: float
      :value: 3.0



   .. py:attribute:: force_threshold
      :type: float
      :value: 0.05



   .. py:attribute:: max_extra_steps
      :type: int
      :value: 0



.. py:function:: create_dataset(data: Sequence[ase.Atoms], cutoff: float = 6.0, batch_size: int = 64, train_split: Union[float, int] = 0.9, val_split: Union[float, int] = 0.1, mask: str = 'none', confinement: Optional[Tuple[float, float]] = None, conditioning: str = 'none', conditioning_type: str = 'scalar', repeat: Optional[int] = None, canonical_cell: bool = False, regressor_data: Optional[Sequence[ase.Atoms]] = None, properties: Optional[List[Dict]] = None) -> agedi.data.Dataset

   Create and set up an AGeDi Dataset from ASE Atoms objects.

   :param data: ASE Atoms objects to add to the dataset.
   :type data: Sequence[Atoms]
   :param cutoff: Neighbour-list cutoff radius in Ångström.
   :type cutoff: float, optional
   :param batch_size: Mini-batch size used during training/validation.
   :type batch_size: int, optional
   :param train_split: Fraction or absolute number of samples for the training split.
   :type train_split: Union[float, int], optional
   :param val_split: Fraction or absolute number of samples for the validation split.
   :type val_split: Union[float, int], optional
   :param mask: Atom-mask method (e.g. ``"MaskFixed"`` or ``"none"``).
   :type mask: str, optional
   :param confinement: Z-axis confinement bounds ``(z_min, z_max)``.
   :type confinement: Tuple[float, float], optional
   :param conditioning: Name of the per-structure property to use as a conditioning signal. The value is read from ``atoms.info[conditioning]`` or the corresponding ``atoms.get_()`` method. Ignored when set to ``"none"`` (default).
   :type conditioning: str, optional
   :param conditioning_type: ``"scalar"`` (default) or ``"node"``; controls how the conditioning property is broadcast onto the graph.
   :type conditioning_type: str, optional
   :param repeat: When given, augment the dataset by repeating each structure up to ``repeat`` times along the first two cell vectors.
   :type repeat: int, optional
   :param canonical_cell: Store cells in canonical lower-triangular form.
   :type canonical_cell: bool, optional
   :param regressor_data: Additional ASE Atoms objects used to train a regressor head.
   :type regressor_data: Sequence[Atoms], optional
   :param properties: Per-structure property dictionaries; **must** contain exactly one entry per element in *data*. Each dictionary is merged into the corresponding graph object via ``setattr``, matching the layout accepted by :meth:`~agedi.data.Dataset.add_atoms_data`. Keys already produced by the *conditioning* logic are overwritten by values in *properties* when both are present.
   :type properties: List[Dict], optional

   :returns: A fully set-up :class:`~agedi.data.Dataset` ready for training.
   :rtype: Dataset


.. py:function:: create_diffusion(model: str = 'PaiNN', cutoff: float = 6.0, feature_size: int = 64, n_blocks: int = 4, n_rbf: int = 30, noisers: Sequence[Union[str, Noiser]] = ('CellPositions', ), sde: Union[str, SDE] = 've', conditioning: str = 'none', conditioning_type: str = 'scalar', confinement: Optional[Tuple[float, float]] = None, force_field: bool = False, lr: float = 0.0001, lr_factor: float = 0.95, lr_patience: int = 100, weight_decay: float = 0.0, eps: float = 1e-05, guidance_weight: float = -1.0, device: Optional[Union[str, torch.device]] = None, type_map: Optional[List[int]] = None) -> agedi.Agedi

   Create a diffusion model for script-based training and sampling.

   :param model: GNN backbone architecture. The name is looked up in the model registry; use :func:`register_model` to add custom backends. The built-in default is ``"PaiNN"`` (SchNetPack PaiNN).
   :type model: str, optional
   :param cutoff: Neighbour-list cutoff radius in Å. Defaults to ``6.0``.
   :type cutoff: float, optional
   :param feature_size: Embedding / feature dimension. Defaults to ``64``.
   :type feature_size: int, optional
   :param n_blocks: Number of interaction blocks. Defaults to ``4``.
   :type n_blocks: int, optional
   :param n_rbf: Number of radial basis functions. Defaults to ``30``.
   :type n_rbf: int, optional
   :param noisers: Noiser identifiers or instances to include. Defaults to ``("CellPositions",)``. Recognised string identifiers (CamelCase preferred; snake_case aliases also accepted for backwards compatibility):

      * ``"Positions"`` / ``"positions"`` – :class:`~agedi.diffusion.noisers.Positions` (StandardNormal prior + Normal, for gas-phase clusters).
      * ``"CellPositions"`` / ``"cell_positions"`` – :class:`~agedi.diffusion.noisers.CellPositions` (UniformCell prior + Normal, for periodic bulk/surface systems).
      * ``"ConfinedCellPositions"`` / ``"confined_cell_positions"`` – :class:`~agedi.diffusion.noisers.ConfinedCellPositions` (UniformCellConfined prior + TruncatedNormal, for Z-confined systems).
* ``"Types"`` / ``"types"`` – :class:`~agedi.diffusion.noisers.Types`. :type noisers: Sequence[str or Noiser], optional :param sde: SDE for position noisers. Short aliases: ``"ve"`` (default), ``"vp"``. Pass an instantiated :class:`~agedi.diffusion.sdes.SDE` for full control. :type sde: str or SDE, optional :param conditioning: Property to condition on, or ``"none"`` for time-only conditioning. Defaults to ``"none"``. :type conditioning: str, optional :param conditioning_type: Type of the conditioning module: ``"scalar"`` or ``"integer"``. Defaults to ``"scalar"``. :type conditioning_type: str, optional :param confinement: Z-direction confinement bounds ``(z_min, z_max)`` in Å. :type confinement: Tuple[float, float], optional :param force_field: When ``True``, attach a ``diffusion.regressor_model``. The head **shares** the same representation and translator as the score model so that atomic embeddings are learned jointly. It is trained whenever the training batch contains per-atom forces and total energies (i.e. the ASE training structures have DFT (or other) energy and forces). The trained forces head enables force-field guided sampling via :class:`~agedi.diffusion.ForcefieldGuidanceConfig`. Defaults to ``False``. :type force_field: bool, optional :param lr: Learning rate. Defaults to ``1e-4``. :type lr: float, optional :param lr_factor: LR-scheduler reduction factor. Defaults to ``0.95``. :type lr_factor: float, optional :param lr_patience: LR-scheduler patience (epochs). Defaults to ``100``. :type lr_patience: int, optional :param weight_decay: Optimizer weight-decay. Defaults to ``0.0``. :type weight_decay: float, optional :param eps: Minimum diffusion time. Defaults to ``1e-5``. :type eps: float, optional :param guidance_weight: Classifier-free guidance weight. Defaults to ``-1.0`` (disabled). :type guidance_weight: float, optional :param device: Target compute device. When ``None`` CUDA is used if available, otherwise CPU.
:type device: str or torch.device, optional :param type_map: Compact type map for the :class:`~agedi.diffusion.noisers.Types` noiser. ``type_map[0]`` must be ``0`` (absorbing state) and ``type_map[i]`` is the atomic number for compact index ``i``. When provided, the ``Types`` noiser and the ``TypesScore`` head use a reduced vocabulary of size ``len(type_map)`` instead of the default 100. Auto-populated by :func:`train_from_atoms` when a ``"Types"`` noiser is requested. :type type_map: List[int], optional :returns: A freshly initialised :class:`~agedi.Agedi` model. :rtype: Agedi .. py:function:: create_trainer(*, epochs: int = -1, max_time: Optional[Union[int, Dict, datetime.timedelta]] = 24, accelerator: str = 'auto', devices: int = 1, logger: str = 'tensorboard', log_dir: str = 'logs', project: str = 'agedi', name: str = 'agedi', log_interval: int = 10, gradient_clip_val: float = 10.0, progress_bar: bool = False, print_epoch_interval: int = 10, log_grad_norm: bool = True, repeat: Optional[int] = None, repeat_epoch: Optional[int] = None, hparams: Optional[Dict] = None, extra_callbacks: Optional[List[lightning.pytorch.callbacks.Callback]] = None) -> lightning.Trainer Create a Lightning trainer configured for AGeDi. :param epochs: Maximum number of training epochs (``-1`` = unlimited). :param max_time: Wall-clock time limit for training. Accepts: * ``int`` – number of *hours* (e.g. ``24`` ≡ 24 hours). * ``dict`` – Lightning-style mapping, e.g. ``{"days": 0, "hours": 12, "minutes": 30, "seconds": 0}``. * :class:`datetime.timedelta` – a Python timedelta object. * ``None`` – no time limit. :param accelerator: Hardware accelerator to use (e.g. ``"auto"``, ``"gpu"``, ``"cpu"``). Default: ``"auto"``. :param devices: Number of devices to train on. Default: ``1``. :param logger: Logging backend: ``"tensorboard"`` (default) or ``"wandb"``. :param log_dir: Root directory for logs and checkpoints. Default: ``"logs"``. 
:param project: WandB project name (only used when ``logger="wandb"``). :param name: Experiment display name used by TensorBoard and WandB as the run sub-directory / run name. Default: ``"agedi"``. :param log_interval: How often (in steps) to log metrics. Default: ``10``. :param gradient_clip_val: Maximum gradient norm for gradient clipping. Default: ``10.0``. :param progress_bar: Whether to show a Lightning progress bar. Default: ``False``. :param print_epoch_interval: Print a one-line training summary to stdout every this many epochs. Set to ``0`` to disable. Default: ``10``. :param log_grad_norm: Whether to log the total gradient norm during training. Disable for large models where the per-step overhead is undesirable. Default: ``True``. :param repeat: Number of repetition levels for cell-repeat data augmentation. Must be set together with *repeat_epoch*. When ``None`` (default), no repetition augmentation is applied. :param repeat_epoch: How many epochs between repetition-level increases. Required when *repeat* is set. :param hparams: Hyperparameters dict logged to ``hparams.yaml`` via :class:`~agedi.data.callbacks.HParamsMetricLogger`. When ``None`` (default), no extra hyperparameter logging is performed. :param extra_callbacks: Extra Lightning callbacks to append to the default callback list. When ``None`` (default) only the built-in callbacks are used. :returns: A configured :class:`~lightning.Trainer` ready to call ``trainer.fit(diffusion, dataset)``. :rtype: lightning.Trainer .. py:function:: load_diffusion(path: Union[str, pathlib.Path], checkpoint: Optional[Union[str, pathlib.Path]] = None, device: Optional[Union[str, torch.device]] = None) -> Agedi Load a trained diffusion model from an AGeDi log directory. The model architecture is fully reconstructed from the Hydra-compatible ``diffusion`` config stored in ``hparams.yaml``, so no additional parameters are needed. 
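As a rough illustration of the path conventions that :func:`load_diffusion` documents (a log directory or its ``hparams.yaml``, and a default checkpoint of ``checkpoints/last_model.ckpt``), the sketch below resolves both files with ``pathlib``. The helper name ``resolve_log_paths`` is hypothetical and not part of AGeDi, and it assumes the default checkpoint path is relative to the log directory; the library's own resolution logic may differ.

```python
from pathlib import Path
from typing import Optional, Tuple, Union


def resolve_log_paths(
    path: Union[str, Path],
    checkpoint: Optional[Union[str, Path]] = None,
) -> Tuple[Path, Path]:
    """Hypothetical helper: return (hparams_file, checkpoint_file)."""
    path = Path(path)
    # Accept either the log directory or the hparams.yaml file inside it.
    log_dir = path.parent if path.name == "hparams.yaml" else path
    hparams_file = log_dir / "hparams.yaml"
    # Assumption: the default checkpoint lives under the log directory.
    if checkpoint is None:
        checkpoint_file = log_dir / "checkpoints" / "last_model.ckpt"
    else:
        checkpoint_file = Path(checkpoint)
    return hparams_file, checkpoint_file


hparams_file, checkpoint_file = resolve_log_paths("logs/agedi")
print(hparams_file, checkpoint_file)
```

Under these assumptions, passing the directory or the ``hparams.yaml`` file yields the same pair of paths, which mirrors the two forms that ``path`` accepts.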
:param path: Path to the AGeDi log / model directory (or directly to the ``hparams.yaml`` file). :param checkpoint: Path to a specific checkpoint file. When ``None`` the latest checkpoint (``checkpoints/last_model.ckpt``) is loaded automatically. :param device: Device to load the model onto. When ``None`` CUDA is used if available, otherwise CPU. .. py:function:: predict(diffusion: Agedi, structures: Sequence[ase.Atoms], *, batch_size: int = 64, cutoff: Optional[float] = None) -> List[ase.Atoms] Predict energies and forces for input structures using a trained force-field. The model must have been trained with ``force_field=True`` (i.e. it must have a ``regressor_model`` attached). The predicted energy and forces are attached to the returned :class:`~ase.Atoms` objects via an :class:`~ase.calculators.singlepoint.SinglePointCalculator`. :param diffusion: A trained :class:`~agedi.Agedi` model with a force-field regressor (trained with ``--force_field``). :param structures: Input ASE :class:`~ase.Atoms` objects to run predictions on. :param batch_size: Number of structures per inference batch. Defaults to ``64``. :param cutoff: Neighbour-list cutoff in Å. When ``None`` (default), the cutoff is read from the model's representation automatically. :returns: The input structures with a :class:`~ase.calculators.singlepoint.SinglePointCalculator` attached containing the predicted energy and/or forces. :rtype: List[Atoms] :raises ValueError: If the model does not have a force-field regressor. .. py:function:: register_model(name: str, factory: Callable) -> None Register a custom score model backbone factory under *name*. The factory is called with the keyword arguments ``cutoff``, ``heads``, ``feature_size``, ``n_blocks``, ``head_dim``, and ``n_rbf`` and must return a 3-tuple ``(translator, representation, List[Head])``. Registered models can be selected by passing ``model=name`` to :func:`create_diffusion`. :param name: Alias used to select this backend (e.g. ``"PaiNN"``). 
:type name: str :param factory: Factory function with signature:: factory(cutoff, heads, feature_size, n_blocks, head_dim, n_rbf) -> Tuple[Translator, nn.Module, List[Head]] :type factory: Callable .. rubric:: Examples :: from agedi.functional import register_model def my_factory(cutoff, heads, feature_size, n_blocks, head_dim, n_rbf): ... return translator, representation, head_list register_model("MyModel", my_factory) .. py:function:: sample(diffusion: Agedi, *, n_samples: int, n_atoms: Optional[int] = None, atomic_numbers: Optional[List[int]] = None, formula: Optional[str] = None, positions: Optional[numpy.ndarray] = None, cell: Optional[numpy.ndarray] = None, pbc: Optional[numpy.ndarray] = None, template: Optional[Union[agedi.data.AtomsGraph, ase.Atoms]] = None, confinement: Optional[Tuple[float, float]] = None, compile: bool = False, steps: int = 500, eps: float = 0.001, batch_size: int = 64, ff_guidance: Optional[agedi.diffusion.ForcefieldGuidanceConfig] = None, property: Optional[Dict[str, float]] = None, progress_bar: bool = False, save_trajectory: bool = False, print_timings: bool = False, as_atoms: bool = True) -> Union[List[agedi.data.AtomsGraph], List[ase.Atoms], List[List[agedi.data.AtomsGraph]], List[List[ase.Atoms]]] Sample structures from a trained diffusion model. :param diffusion: A trained :class:`~agedi.Agedi` model. :param n_samples: Number of structures to generate. :param n_atoms: Number of atoms per structure. Automatically determined from ``formula`` if provided, or from the length of ``atomic_numbers`` when ``n_atoms`` is not explicitly given. :param atomic_numbers: Atomic numbers of the generated atoms. Not required when the model has a types-noiser or when ``formula`` is provided. :param formula: Chemical formula (e.g. ``"H2O"``). Used to derive ``n_atoms`` and ``atomic_numbers`` when they are not provided explicitly. :param positions: Fixed positions of the atoms (shape ``(n_atoms, 3)``). 
Required when no positions-noiser is configured (type-only diffusion). Positions will not be modified during sampling. :param cell: Unit-cell matrix (3×3 array or flat length-9 array). Not required when ``template`` is provided (the template's cell is used instead). :param pbc: Periodic boundary conditions as a length-3 boolean array (e.g. ``[True, True, False]``). When ``template`` is provided its ``pbc`` is used unless this argument is given explicitly. Defaults to ``[True, True, True]`` (fully periodic) when neither ``template`` nor ``pbc`` is supplied. :param template: Template structure. May be an :class:`~agedi.AtomsGraph` or an ASE :class:`~ase.Atoms` object; the latter is automatically converted to an :class:`~agedi.AtomsGraph` (with ``confinement`` applied when provided). When given, ``cell`` and ``pbc`` are taken from the template unless explicitly provided. :param ff_guidance: Force-field guidance configuration. When ``None`` (default) a :class:`~agedi.diffusion.ForcefieldGuidanceConfig` with default values is used (i.e. guidance is disabled). :param compile: When ``True``, use ``torch.compile`` on the reverse diffusion step for faster sampling. Before the sampling loop starts, the maximum number of neighbors and cell-list dimensions are estimated automatically via NVIDIA nvalchemiops (``estimate_max_neighbors`` and ``estimate_cell_list_sizes``), and all neighbor-list buffers are pre-allocated with fixed shapes. Requires NVIDIA nvalchemiops. Defaults to ``False``. :param print_timings: When ``True``, print a per-stage timing breakdown at the end of each sampling batch (graph init, score model, denoise, neighbor list, etc.). Defaults to ``False``. .. py:function:: train(diffusion: Agedi, dataset: agedi.data.Dataset, trainer: Optional[lightning.Trainer] = None, ckpt_path: Optional[Union[str, pathlib.Path]] = None, **trainer_kwargs) -> lightning.Trainer Train a diffusion model and return the trainer used. :param diffusion: The diffusion model to train. 
:param dataset: The dataset to train on. :param trainer: A pre-configured Lightning :class:`~lightning.Trainer`. When ``None`` a new trainer is created from *trainer_kwargs*. :param ckpt_path: Path to a Lightning checkpoint (``.ckpt``) to resume training from. When provided the full training state (model weights, optimiser, LR-scheduler, and epoch counter) is restored before fitting. Equivalent to passing ``ckpt_path`` to ``trainer.fit()``. :param \*\*trainer_kwargs: Additional keyword arguments forwarded to :func:`create_trainer` when *trainer* is ``None``. .. py:function:: train_from_atoms(*args, **kwargs) .. py:function:: train_from_config(*args, **kwargs)
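The ``max_time`` argument of :func:`create_trainer` (also reachable through the ``**trainer_kwargs`` of :func:`train`) accepts an ``int`` number of hours, a Lightning-style ``dict``, a :class:`datetime.timedelta`, or ``None``. The sketch below shows one way those four forms could be normalised to a single ``timedelta``. The function ``normalize_max_time`` is a hypothetical name used for illustration only; it is not AGeDi's implementation.

```python
from datetime import timedelta
from typing import Dict, Optional, Union

MaxTime = Optional[Union[int, Dict[str, int], timedelta]]


def normalize_max_time(max_time: MaxTime) -> Optional[timedelta]:
    """Hypothetical helper: map the documented max_time forms to a timedelta."""
    if max_time is None:
        # No wall-clock limit.
        return None
    if isinstance(max_time, timedelta):
        return max_time
    if isinstance(max_time, dict):
        # Keys such as "days", "hours", "minutes", "seconds" are valid
        # keyword arguments of datetime.timedelta.
        return timedelta(**max_time)
    if isinstance(max_time, int) and not isinstance(max_time, bool):
        # A bare integer is interpreted as a number of hours.
        return timedelta(hours=max_time)
    raise TypeError(f"Unsupported max_time: {max_time!r}")


limit = normalize_max_time({"days": 0, "hours": 12, "minutes": 30, "seconds": 0})
print(limit)
```

With this reading, ``max_time=24`` and ``max_time={"days": 1}`` describe the same limit.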