agedi.data
==========

.. py:module:: agedi.data


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/agedi/data/atoms_graph/index
   /autoapi/agedi/data/callbacks/index
   /autoapi/agedi/data/dataset/index
   /autoapi/agedi/data/transforms/index


Classes
-------

.. autoapisummary::

   agedi.data.AtomsGraph
   agedi.data.Representation
   agedi.data.Dataset


Package Contents
----------------

.. py:class:: AtomsGraph

   Bases: :py:obj:`torch_geometric.data.Data`

   Atomistic Graph Class

   Class defining a graph with atoms as nodes and edges formed between all atoms within a finite cutoff radius.

   :param pos: The positions of the atoms with shape (n_atoms, 3).
   :type pos: torch.Tensor
   :param x: The node features, i.e. the atomic types, of the graph with shape (n_nodes, 1).
   :type x: torch.Tensor
   :param edge_index: The edge index tensor of the graph with shape (2, n_edges).
   :type edge_index: torch.Tensor
   :param edge_attr: The edge attributes of the graph with shape (n_edges, n_edge_features).
   :type edge_attr: torch.Tensor
   :param y: The target tensor of the graph with shape (n_targets,).
   :type y: Optional[torch.Tensor]
   :param representation: The representation of the atoms in the graph.
   :type representation: Optional[Representation]
   :param confinement: z-directional confinement of the atoms with shape (1, 2).
   :type confinement: Optional[torch.Tensor]
   :param kwargs:
   :type kwargs: Dict[str, torch.Tensor]

   .. py:method:: from_atoms(atoms: ase.Atoms, cutoff: float = 6.0, dtype: torch.dtype = torch.float, initialize_mask: Optional[bool] = None, confinement: Optional[Tuple[float, float]] = None, canonical_cell: bool = False) -> AtomsGraph
      :classmethod:

      Create a graph from an ASE Atoms object.

      :param atoms: The ASE Atoms object.
      :type atoms: Atoms
      :param cutoff: The cutoff radius for the edges.
      :type cutoff: float
      :param dtype: The data type of the tensors.
      :type dtype: torch.dtype
      :param initialize_mask: Whether to initialize the mask tensor.
         When ``None`` (the default), the mask is initialised only when ``confinement`` is not provided (i.e. ``initialize_mask`` defaults to ``False`` for template / confinement graphs).
      :type initialize_mask: Optional[bool]
      :param confinement: Optional z-directional confinement bounds ``(z_min, z_max)`` to attach to the graph. When provided, a ``confinement`` tensor of shape ``(1, 2)`` is stored on the graph. When ``None`` (the default), no confinement attribute is added.
      :type confinement: Optional[Tuple[float, float]]
      :param canonical_cell: When ``True``, the cell is stored in canonical lower-triangular form. If the input cell is not already canonical, Cartesian positions are recomputed to preserve fractional coordinates and a warning is raised. Set to ``False`` (the default) to store the cell exactly as provided by ASE (no rotation or recomputation is performed).
      :type canonical_cell: bool

      :returns: **graph** -- The graph object.
      :rtype: AtomsGraph

   .. py:method:: empty(cutoff: float = 6.0) -> AtomsGraph
      :classmethod:

      Create an empty graph.

      :param cutoff: The cutoff radius for the edges.
      :type cutoff: float

      :returns: **graph** -- The graph object.
      :rtype: AtomsGraph

   .. py:method:: add_batch_attr(key: str, value: torch.Tensor, type: str = 'node') -> None

      Add a batch attribute to the graph.

      :param key: The key of the attribute.
      :type key: str
      :param value: The value of the attribute.
      :type value: torch.Tensor
      :param type: The type of the attribute. Can be either "node" or "graph"
      :type type: str

      :rtype: None

   .. py:method:: to_atoms() -> ase.Atoms

      Convert the graph to an ASE Atoms object.

      Only works on unbatched graphs.

      :returns: **atoms** -- The atoms object.
      :rtype: ase.Atoms

   .. py:method:: _get_scalar_attr(key: str) -> Optional[float]

   .. py:method:: prepare_for_compile(cutoff: float) -> None

      Pre-allocate neighbor-list buffers for ``torch.compile`` compatibility.
      Estimates the maximum number of neighbors per atom using :func:`~nvalchemiops.torch.neighbors.neighbor_utils.estimate_max_neighbors` and the cell-list dimensions using :func:`~nvalchemiops.torch.neighbors.cell_list.estimate_cell_list_sizes`, then allocates the cell list and all output buffers with fixed shapes. Fixed shapes are required for ``torch.compile`` to trace the reverse diffusion step once without retracing on subsequent iterations.

      Must be called on a :class:`~torch_geometric.data.Batch` **before** the first :meth:`update_graph` call. Requires the ``nvalchemiops`` package.

      :param cutoff: Neighbor-list cutoff radius (Å).
      :type cutoff: float

      :raises RuntimeError: When ``nvalchemiops`` is not installed.
      :raises TypeError: When called on an unbatched :class:`AtomsGraph` instead of a :class:`~torch_geometric.data.Batch`.

   .. py:method:: _cell_list_to_graph(neighbor_matrix: torch.Tensor, neighbor_shifts: torch.Tensor, cell: torch.Tensor, dtype: torch.dtype, batch_idx: Optional[torch.Tensor] = None) -> Tuple[torch.Tensor, torch.Tensor]
      :staticmethod:

      Convert cell-list query output to ``(edge_index, shift_vectors)``.

   .. py:method:: update_graph() -> bool

      Update the graph with new edges.

      This should be called after changing any of the positions or the cell.

      :returns: **rebuilt** -- ``True`` when the neighbor list was fully recomputed.
      :rtype: bool

   .. py:method:: _make_graph_matscipy(positions: torch.Tensor, cell: torch.Tensor, cutoff: float, pbc: torch.Tensor, dtype: Optional[torch.dtype] = None, batch_idx: Optional[torch.Tensor] = None) -> Tuple[torch.Tensor, torch.Tensor]
      :staticmethod:

   .. py:method:: make_graph(positions: torch.Tensor, cell: torch.Tensor, cutoff: float, pbc: torch.Tensor, dtype: torch.dtype = None, batch_idx: Optional[torch.Tensor] = None) -> Tuple[torch.Tensor, torch.Tensor]
      :staticmethod:

      Create the graph edges from the positions and cell.

      :param positions: The positions of the atoms.
      :type positions: torch.Tensor
      :param cell: The cell of the system.
      :type cell: torch.Tensor
      :param cutoff: The cutoff radius for the edges.
      :type cutoff: float
      :param pbc: The periodic boundary conditions.
      :type pbc: torch.Tensor
      :param dtype: The data type of the output.
      :type dtype: torch.dtype

      :returns: * **edge_index** (*torch.Tensor*) -- The edge index tensor.
                * **shift_vectors** (*torch.Tensor*) -- The shift vectors tensor.

   .. py:method:: clear_graph() -> None

      Clear the graph, removing all edges.

      :rtype: None

   .. py:method:: __len__() -> int

      Return the number of atoms in the graph.

      :returns: **n_atoms** -- The number of atoms in the graph.
      :rtype: int

   .. py:property:: cell
      :type: torch.Tensor

      Return the canonical cell matrix of the graph.

      :returns: **cell** -- The cell matrix of shape ``(3, 3)``.
      :rtype: torch.Tensor

   .. py:property:: frac
      :type: torch.Tensor

      Return the fractional coordinates of the positions.

      :returns: **frac** -- The fractional coordinates of the atoms.
      :rtype: torch.Tensor

   .. py:method:: frac_to_pos(f: torch.Tensor) -> torch.Tensor

      Fractional -> Cartesian coordinates.

      Convert fractional coordinates to Cartesian coordinates.

      :param f: The fractional coordinates.
      :type f: torch.Tensor

      :returns: **r** -- The Cartesian coordinates.
      :rtype: torch.Tensor

   .. py:method:: pos_to_frac(r: torch.Tensor) -> torch.Tensor

      Cartesian -> Fractional coordinates.

      Convert Cartesian coordinates to fractional coordinates.

      :param r: The Cartesian coordinates.
      :type r: torch.Tensor

      :returns: **f** -- The fractional coordinates.
      :rtype: torch.Tensor

   .. py:property:: positions_mask
      :type: torch.Tensor

      Return the mask of the positions that are fixed.

      ``True`` for atoms whose positions are fixed, ``False`` otherwise.

      :returns: **mask** -- The mask of the positions that are fixed.
      :rtype: torch.Tensor

   .. py:property:: time
      :type: torch.Tensor

      Return the time of the graph.

      :returns: **time** -- The time of the graph.
      :rtype: torch.Tensor

   .. py:property:: representation
      :type: Optional[Representation]

      Return the representation of the graph.

      :returns: **representation** -- The representation of the graph, or ``None`` if not set.
      :rtype: Optional[Representation]

   .. py:method:: wrap_positions() -> None

      Wrap the positions of the atoms to the unit cell.

      :rtype: None

   .. py:method:: apply_mask(x: torch.Tensor, val: float = 0.0) -> torch.Tensor

      Apply the mask to the tensor x.

      :param x: The tensor to apply the mask to.
      :type x: torch.Tensor
      :param val: The value to set the masked values to.
      :type val: float

      :returns: **x** -- The tensor with the mask applied.
      :rtype: torch.Tensor

   .. py:property:: confinement
      :type: torch.Tensor

      Return the confinement of the graph.

      :returns: **confinement** -- The confinement of the graph.
      :rtype: torch.Tensor

   .. py:property:: cellpar
      :type: torch.Tensor

      Return the cell parameters of the graph.

   .. py:method:: _is_lower_triangular(cell: torch.Tensor) -> bool
      :staticmethod:

      Return True if *cell* is in canonical lower-triangular form.

      A cell matrix is considered canonical when the three strictly upper-triangular entries (``cell[0,1]``, ``cell[0,2]``, ``cell[1,2]``) are all zero (within a tight floating-point tolerance of 1e-10).

      :param cell: The cell matrix.
      :type cell: torch.Tensor

      :returns: True if the cell is already lower-triangular.
      :rtype: bool

   .. py:method:: cell_to_vectors(cell: torch.Tensor) -> torch.Tensor
      :staticmethod:

      Convert cell matrix to cell parameters.

      :param cell: The cell matrix of shape ``(N, 3)`` or ``(N, 3, 3)``.
      :type cell: torch.Tensor

      :returns: The cell parameters of shape ``(N, 6)``.
      :rtype: torch.Tensor

   .. py:method:: vector_to_cell(cellpar: torch.Tensor) -> torch.Tensor
      :staticmethod:

      Convert cell parameters to cell matrix.

      :param cellpar: The cell parameters of shape ``(N, 6)``.
      :type cellpar: torch.Tensor

      :returns: The cell matrix of shape ``(N, 3, 3)`` where each row is a lattice vector.
      :rtype: torch.Tensor

.. py:class:: Representation

   Representation class

   A simple container holding the scalar (l=0) and vector (l=1) equivariant representations produced by the backbone network. Both fields are optional so that the class can also be used for partial representations.

   Registered as a ``torch.utils._pytree`` node so that ``torch.compile`` can traverse instances transparently without introducing graph breaks.

   :param scalar: Per-node scalar features of shape ``(n_nodes, n_features, 1)``. Default is ``None``.
   :type scalar: Optional[torch.Tensor]
   :param vector: Per-node vector features of shape ``(n_nodes, n_features, 3)``. Default is ``None``.
   :type vector: Optional[torch.Tensor]

   .. py:attribute:: scalar
      :type: Optional[torch.Tensor]
      :value: None

   .. py:attribute:: vector
      :type: Optional[torch.Tensor]
      :value: None

   .. py:method:: to_tensor(n_graphs: int) -> Tuple[torch.Tensor, torch.Tensor]

      Serialise scalar and vector tensors into a single flat representation.

      Concatenates ``scalar`` and ``vector`` (when present) along the feature dimension. Returns the concatenated tensor together with per-graph slice boundaries and degree values so that :meth:`from_tensor` can reconstruct the original fields.

      :param n_graphs: The number of graphs in the batch. The slice and degree tensors are repeated once per graph so they can be stored as graph-level attributes.
      :type n_graphs: int

      :returns: * **tensor** (*torch.Tensor*) -- Concatenated representation of shape ``(n_nodes, total_features)``.
                * **slices** (*torch.Tensor*) -- Cumulative slice boundaries of shape ``(n_graphs, n_parts + 1)``.
                * **ls** (*torch.Tensor*) -- Degree values of shape ``(n_graphs, n_parts)``.

   .. py:method:: from_tensor(tensor: torch.Tensor, slices: torch.Tensor, ls: torch.Tensor) -> Representation
      :classmethod:

      Reconstruct a :class:`Representation` from a flat serialised form.

      :param tensor: Flat representation of shape ``(n_nodes, total_features)``.
      :type tensor: torch.Tensor
      :param slices: Cumulative slice boundaries of shape ``(n_graphs, n_parts + 1)``.
      :type slices: torch.Tensor
      :param ls: Degree values of shape ``(n_graphs, n_parts)``.
      :type ls: torch.Tensor

      :rtype: Representation

.. py:class:: Dataset(batch_size: int = 32, n_train: Union[float, int] = 0.9, n_val: Union[float, int] = 0.1, n_test: Union[float, int] = 0.0, shuffle: bool = True, properties: List[str] = ['energy', 'forces'], cutoff: float = 6.0, phase_transforms: Optional[List[List[torch_geometric.transforms.BaseTransform]]] = None, num_workers: int = 0, **kwargs)

   Bases: :py:obj:`lightning.LightningDataModule`

   Defines a custom dataset for AtomsGraph data.

   :param batch_size: The batch size for the DataLoader.
   :type batch_size: int
   :param n_train: The number of training samples. If a float, it is interpreted as a fraction of the dataset size.
   :type n_train: Union[float, int]
   :param n_val: The number of validation samples. If a float, it is interpreted as a fraction of the dataset size.
   :type n_val: Union[float, int]
   :param n_test: The number of test samples. If a float, it is interpreted as a fraction of the dataset size.
   :type n_test: Union[float, int]
   :param shuffle: Whether to shuffle the dataset.
   :type shuffle: bool
   :param properties: The properties to include in the dataset. Can be "energy", "forces", or both.
   :type properties: List[str]
   :param cutoff: The cutoff radius for the neighbor list.
   :type cutoff: float
   :param phase_transforms: The data augmentation transforms to apply to each training phase.
   :type phase_transforms: Optional[List[List[BaseTransform]]]

   :rtype: Dataset

   .. py:attribute:: batch_size
      :value: 32

   .. py:attribute:: n_train
      :value: 0.9

   .. py:attribute:: n_val
      :value: 0.1

   .. py:attribute:: n_test
      :value: 0.0

   .. py:attribute:: properties
      :value: ['energy', 'forces']

   .. py:attribute:: cutoff
      :value: 6.0

   .. py:attribute:: dataset
      :value: None

   .. py:attribute:: train_idx
      :value: None

   .. py:attribute:: val_idx
      :value: None

   .. py:attribute:: test_idx
      :value: None

   .. py:attribute:: phase_transforms
      :value: None

   .. py:attribute:: num_workers
      :value: 0

   .. py:attribute:: regressor_dataset
      :value: None

   .. py:attribute:: regressor_train_loader
      :value: None

   .. py:method:: add_atoms_data(data: List[ase.Atoms], mask_method: Optional[str] = None, confinement: Optional[Tuple[float, float]] = None, properties: Optional[List[Dict]] = None, canonical_cell: bool = False) -> None

      Add ASE data to the dataset.

      Converts a list of ASE Atoms objects to AtomsGraph objects and adds them to the dataset.

      :param data: A list of ASE Atoms objects.
      :type data: List[Atoms]
      :param mask_method: Method for computing the atom mask (e.g. ``"MaskFixed"``).
      :type mask_method: str, optional
      :param confinement: Z-axis confinement bounds ``(z_min, z_max)`` applied to every structure.
      :type confinement: Tuple[float, float], optional
      :param properties: Per-structure property dictionaries; each entry is mapped to the corresponding graph via :func:`setattr`.
      :type properties: List[Dict], optional
      :param canonical_cell: When ``True``, cells are stored in canonical lower-triangular form. The default, ``False``, stores cells exactly as provided by ASE.
      :type canonical_cell: bool, optional

      :rtype: None

   .. py:method:: add_graph_data(data: List[agedi.data.atoms_graph.AtomsGraph]) -> None

      Add AtomsGraph data to the dataset.

      Adds a list of AtomsGraph objects to the dataset.

      :param data: A list of AtomsGraph objects.
      :type data: List[AtomsGraph]

      :rtype: None

   .. py:method:: add_regressor_data(data: List[ase.Atoms], canonical_cell: bool = False) -> None

      Add atoms data that will be used exclusively for regressor training.

      Structures in this dataset are only used to train the regressor model (e.g. force-field heads) and are never passed through the diffusion loss. This allows the regressor to learn from non-equilibrium structures that would be unsuitable as diffusion training targets.
      Energy and forces are read from the ASE calculator attached to each :class:`~ase.Atoms` object when available.

      :param data: A list of ASE :class:`~ase.Atoms` objects, each with an attached calculator that provides energy and forces.
      :type data: List[Atoms]
      :param canonical_cell: When ``True``, cells are stored in canonical lower-triangular form. Defaults to ``False``.
      :type canonical_cell: bool, optional

      :rtype: None

   .. py:method:: setup(stage: Optional[str] = None) -> None

      Set up train/validation/test splits and initialise data loaders.

      Performs a random split of the dataset (if not already split) and calls :meth:`set_phase` to create the initial data loaders.

      :param stage: Lightning stage identifier (``"fit"``, ``"test"``, etc.). Not used internally; present for API compatibility.
      :type stage: str, optional

   .. py:method:: train_dataloader() -> torch_geometric.loader.DataLoader

      Get the training DataLoader.

      Returns a DataLoader for the training dataset. When a separate regressor dataset has been added via :meth:`add_regressor_data`, a :class:`~lightning.pytorch.utilities.CombinedLoader` is returned so that each training step receives both a regular batch (key ``"main"``) and a regressor-only batch (key ``"regressor"``).

      :rtype: DataLoader or CombinedLoader

   .. py:method:: val_dataloader() -> torch_geometric.loader.DataLoader

      Get the validation DataLoader.

      Returns a DataLoader for the validation dataset.

      :rtype: DataLoader

   .. py:method:: test_dataloader() -> torch_geometric.loader.DataLoader

      Get the test DataLoader.

      Returns a DataLoader for the test dataset.

      :rtype: DataLoader

   .. py:method:: set_phase(phase: int) -> None

      Switch the dataset to the given training phase.

      Applies the phase-specific transforms to the dataset splits and re-creates the data loaders with the augmented data.

      :param phase: Zero-based phase index. Phase 0 uses the original data; subsequent phases append transformed copies according to ``phase_transforms[phase]``.
      :type phase: int

   .. py:method:: _check_confinement(dataset: List[agedi.data.atoms_graph.AtomsGraph], confinement: Tuple[float, float]) -> None

      Check that all unmasked atoms in *dataset* lie within *confinement*.

      :param dataset: The list of graphs to validate.
      :type dataset: List[AtomsGraph]
      :param confinement: The ``(z_min, z_max)`` confinement bounds.
      :type confinement: Tuple[float, float]

      :raises ValueError: If any unmasked atom has a Z position outside the confinement. The error message includes a suggested confinement that covers all unmasked atoms.

   .. py:method:: _has_energy_forces(atoms)

      Check if the given ASE Atoms object has energy and forces information available.

      This method checks whether a calculator is attached to the Atoms object and whether its results contain the 'energy' and 'forces' properties. It does not trigger a calculation when a calculator is attached but has not yet been used.

      :param atoms: The ASE Atoms object to check for energy and forces information.
      :type atoms: Atoms

      :returns: A tuple indicating whether energy and forces information is available, respectively.
      :rtype: Tuple[bool, bool]
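
The check described for ``_has_energy_forces`` can be illustrated with a minimal sketch. It is not the package's implementation: ``has_energy_forces``, ``_StubCalc`` and ``_StubAtoms`` are hypothetical names introduced here, and the stub classes only mimic the ``calc`` attribute and the cached ``results`` dictionary of an ASE calculator so that the example runs without ASE installed.

.. code-block:: python

   from typing import Any, Tuple


   def has_energy_forces(atoms: Any) -> Tuple[bool, bool]:
       """Report whether cached energy and forces exist, without computing them."""
       calc = getattr(atoms, "calc", None)
       if calc is None:
           # No calculator attached: nothing is available.
           return False, False
       # Only the cached ``results`` dict is inspected, so an unused
       # calculator is never asked to run a calculation.
       results = getattr(calc, "results", None) or {}
       return "energy" in results, "forces" in results


   class _StubCalc:
       """Hypothetical stand-in for a calculator with a ``results`` cache."""

       def __init__(self, results: dict) -> None:
           self.results = results


   class _StubAtoms:
       """Hypothetical stand-in for an Atoms object with a ``calc`` attribute."""

       def __init__(self, calc: Any = None) -> None:
           self.calc = calc


   print(has_energy_forces(_StubAtoms()))                               # (False, False)
   print(has_energy_forces(_StubAtoms(_StubCalc({}))))                  # (False, False)
   print(has_energy_forces(_StubAtoms(_StubCalc({"energy": -1.0}))))    # (True, False)

Reading the cache rather than calling a getter keeps the check free of side effects, which matches the documented intent of avoiding a calculation on a calculator that has not been used yet.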