# Runtime Backends and Integrations

## Neighbor-List Backends

`build_neighbor_list` normalizes backend-specific neighbor-list data into
`NeighborListData`.

- `ase` uses `ase.neighborlist` and is always available with the base
  dependency set.
- `vesin` is optional and can be installed with `.[vesin]`.
- `auto` prefers `vesin` when importable, then falls back to ASE.
- `metatomic` neighbor lists are consumed when running through the metatomic
  adapter.

In this package, the ASE backend currently supports full neighbor lists.
Three-body terms also require full lists, because every source-centered
neighbor pair must be visible from the source atom.
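
The `auto` rule above can be sketched as a small resolver. This is a minimal illustration, not the package's implementation: `resolve_neighbor_backend` is a hypothetical helper name, and it only covers the `ase`, `vesin`, and `auto` choices.

```python
from importlib.util import find_spec


def resolve_neighbor_backend(requested: str = "auto") -> str:
    """Hypothetical helper: map a requested backend name to a concrete one."""
    if requested == "auto":
        # Prefer vesin when it can be imported, otherwise fall back to ASE.
        return "vesin" if find_spec("vesin") is not None else "ase"
    if requested not in {"ase", "vesin"}:
        raise ValueError(f"unknown neighbor-list backend: {requested!r}")
    if requested == "vesin" and find_spec("vesin") is None:
        raise ImportError("vesin is not installed; install the vesin extra")
    return requested
```

Checking importability with `find_spec` avoids importing the optional dependency just to learn whether it exists.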

## Native Three-Body Kernels

The default installation is pure Python and Torch. Native C++ and CUDA
extensions are optional and are built only when requested:

```sh
UFP_BUILD_NATIVE=1 python setup.py build_ext --inplace
UFP_BUILD_NATIVE=1 UFP_BUILD_CUDA=1 python setup.py build_ext --inplace
```

At runtime, high-level helpers use `auto` backend selection and fall back to
Torch when native kernels are unavailable. Set the following environment
variables when you need explicit control:

- `UFP_THREEBODY_BACKEND` controls dynamic three-body evaluation.
- `UFP_THREEBODY_BUCKET_BACKEND` controls source bucketing.
- `UFP_THREEBODY_LSTSQ_BACKEND` controls least-squares assembly.
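
As a sketch, the variables can be pinned from Python before the package is used. The values shown (`torch`, `python`, `auto`) are ones this page names for the respective backends. Setting them before any `ufp` import is an assumption made here for safety, since this page does not state when the variables are read.

```python
import os

# Pin the three-body backends for this process. Assumption: the variables
# are set before `ufp` is imported so that either read time is covered.
os.environ["UFP_THREEBODY_BACKEND"] = "torch"          # bypass native dispatch
os.environ["UFP_THREEBODY_BUCKET_BACKEND"] = "python"  # Python/Torch bucketing
os.environ["UFP_THREEBODY_LSTSQ_BACKEND"] = "auto"     # keep automatic selection
```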

### Three-Body Runtime Map

Three-body execution deliberately keeps several specialized paths instead of a
single generic visitor. The specialization keeps dispatch and cache decisions
outside the innermost spline and matrix loops.

```text
UFPInput with full neighbor list
  -> SplineThreeBodyTerm._bucket_triplets()
  -> preprocess_sources_native_or_torch()
  -> Buckets plus optional tensor pattern plans
  -> dynamic evaluation, feature-cache construction, or least-squares assembly
```

Source bucketing starts from supported center-neighbor pairs, then groups rows by
source atom and neighbor-category pattern. `UFP_THREEBODY_BUCKET_BACKEND` accepts
`auto`, `native`, `python`, or `tensor`; `torch` is accepted as an alias for
`python`. The native source-preprocessing backend is CPU-only. In `auto`, CUDA
inputs use the Python/Torch path for bucketing, while CPU inputs use native
preprocessing only when the extension is available and the dtype/device
contract is met.
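
The bucket-backend rules can be summarized in a short sketch. `resolve_bucket_backend` and its `native_ok` flag are hypothetical names for illustration. The CPU fallback to `python` when native preprocessing is unavailable is an assumption, and explicit choices are returned unchanged because their validation is not described here.

```python
def resolve_bucket_backend(requested: str, device_type: str, native_ok: bool) -> str:
    """Hypothetical resolver for UFP_THREEBODY_BUCKET_BACKEND values."""
    name = requested.strip().lower()
    if name == "torch":
        name = "python"  # `torch` is accepted as an alias for `python`
    if name not in {"auto", "native", "python", "tensor"}:
        raise ValueError(f"unknown bucket backend: {requested!r}")
    if name != "auto":
        return name  # explicit choices pass through in this sketch
    if device_type == "cuda":
        return "python"  # native preprocessing is CPU-only
    # Assumption: CPU inputs fall back to the Python/Torch path when the
    # extension or the dtype/device contract is not available.
    return "native" if native_ok else "python"
```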

Dynamic energy/force evaluation is selected by `UFP_THREEBODY_BACKEND` or the
resolved `ThreeBodyRuntimeConfig`. `auto` tries the native operator when the
optional extension is registered, the spline and dtype are supported, the device
has a kernel, and the tensors are not participating in autograd. Otherwise it
falls back to the Torch evaluator. Explicit `native` raises when those contracts
are not met; explicit `torch` bypasses native dispatch.
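
The selection logic for dynamic evaluation can be sketched as follows. `NativeContract` and `select_evaluator` are hypothetical names; the real checks operate on tensors and registered operators rather than on pre-computed booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NativeContract:
    """Hypothetical summary of the conditions `auto` checks."""
    op_registered: bool
    spline_and_dtype_supported: bool
    device_has_kernel: bool
    needs_autograd: bool

    def satisfied(self) -> bool:
        return (
            self.op_registered
            and self.spline_and_dtype_supported
            and self.device_has_kernel
            and not self.needs_autograd
        )


def select_evaluator(requested: str, contract: NativeContract) -> str:
    if requested == "torch":
        return "torch"  # explicit torch bypasses native dispatch
    if requested == "native":
        if not contract.satisfied():
            raise RuntimeError("native three-body contract not met")
        return "native"
    if requested == "auto":
        return "native" if contract.satisfied() else "torch"
    raise ValueError(f"unknown three-body backend: {requested!r}")
```

Under these rules, a tensor that participates in autograd selects the Torch evaluator in `auto` and raises under explicit `native`.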

Feature caches use the same bucket representation but have their own storage
policy. `feature_cache_mode="auto"` loads a compatible disk cache when one is
available and builds one otherwise. `read` requires a compatible disk cache and
raises if none is found. `refresh` rebuilds and overwrites the settings-named
cache entry. CPU feature caches are held as dense Torch blocks; disk feature
caches are loaded as `.npy` memmaps through the V2 manifest format.
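
The three cache modes reduce to a small decision table. The sketch below uses a hypothetical `plan_feature_cache` helper that returns an action name; the exception type raised in `read` mode is an assumption.

```python
def plan_feature_cache(mode: str, compatible_cache_found: bool) -> str:
    """Hypothetical planner: return the action implied by feature_cache_mode."""
    if mode == "auto":
        # Load when a compatible disk cache exists, build otherwise.
        return "load" if compatible_cache_found else "build"
    if mode == "read":
        if not compatible_cache_found:
            # Assumption: the concrete exception type is not documented here.
            raise FileNotFoundError("no compatible feature cache on disk")
        return "load"
    if mode == "refresh":
        # Always rebuild and overwrite the settings-named entry.
        return "rebuild"
    raise ValueError(f"unknown feature_cache_mode: {mode!r}")
```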

Dense and memmap cache compatibility is checked in `ufp.terms._threebody_cache`
against metadata that includes the input geometry signature, atomic/triplet
categories, coefficient shape, active triplets, spline support parameters,
row semantics, and cache schema version. A cache with a superset of active
triplets can satisfy a narrower request when the remaining metadata matches.
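
The superset rule can be illustrated with plain dictionaries. The metadata keys below are hypothetical stand-ins, not the actual schema used in `ufp.terms._threebody_cache`.

```python
def cache_satisfies(cache_meta: dict, request_meta: dict) -> bool:
    """Hypothetical check: superset of active triplets, all else equal."""
    cached = set(cache_meta["active_triplets"])
    wanted = set(request_meta["active_triplets"])
    if not wanted <= cached:
        return False  # the cache is missing at least one requested triplet
    # Every other metadata field must match exactly.
    rest_cache = {k: v for k, v in cache_meta.items() if k != "active_triplets"}
    rest_request = {k: v for k, v in request_meta.items() if k != "active_triplets"}
    return rest_cache == rest_request
```

For example, a cache built for triplets `(6, 6, 8)` and `(6, 8, 8)` can serve a request for `(6, 6, 8)` alone, but not a request with a different schema version.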

Native availability and fallback checks live in `ufp.terms._threebody_kernels`.
They test operator registration, device kernel availability, spline family,
dtype, autograd requirements, active-mask placement, and CPU-only constraints
for source preprocessing and dense cache construction. These checks happen at
dispatch boundaries before the hot tensor loops or native calls.

Least-squares assembly is selected separately by `UFP_THREEBODY_LSTSQ_BACKEND`
or `LinearFitter(threebody_lstsq_backend=...)`. `auto` can use the native/CUDA
assembly operator when supported and otherwise uses Torch assembly. The cache
metadata written by `LinearFitter` records both the least-squares assembly
backend and the bucket backend so assembled-batch caches are invalidated when
backend choices change.

## ASE

`UFPASECalculator` exposes UFP models through the ASE calculator interface. It
is the simplest integration path for prediction, relaxation, and example
workflows.

## Metatomic

`wrap_atomistic_model` and `UFPMetatomicModule` convert between metatomic
systems, metatensor outputs, and UFP tensor inputs. Optional imports are guarded
so the base package can import without metatomic installed.

This adapter is a Python integration path for prototyping, torch-sim use, and
tests. Production LAMMPS export uses the dedicated UF2+3 exporter instead:

```python
from ufp.adapters.metatomic_export import export_uf23_checkpoint

export_uf23_checkpoint(
    "best.pt",
    model_factory=build_model_architecture,
    output_path="exported-uf23.pt",
    collect_extensions="torch-extensions",
)
```

The checkpoint must contain a `state_dict` or `model_state_dict`; the model
architecture is rebuilt by `model_factory` before loading weights. Legacy
single-element workflow checkpoints with `onebody_energy` are handled when the
factory returns either a compatible one-body term or a single-element interaction
model.

Install optional metatomic dependencies with `ufp[metatomic]`. For CUDA UF2+3
production runs with three-body terms, build the native extension with:

```sh
UFP_BUILD_NATIVE=1 UFP_BUILD_CUDA=1 python setup.py build_ext --inplace
```

The first LAMMPS target is NVE/NVT molecular dynamics with total `energy` and
direct `non_conservative_forces`. Direct stress/virial output is not implemented
yet, so production NPT workflows should wait for validated stress support.

Single-element LAMMPS usage maps the atom type to the exported atomic number:

```text
pair_style metatomic exported-uf23.pt device cuda extensions ./torch-extensions non_conservative on
pair_coeff * * 74
```

For multi-element exports, list one atomic number per LAMMPS atom type in the
same order used by the data file:

```text
pair_coeff * * 6 8
```
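
When the type order comes from a script, the `pair_coeff` line can be generated rather than typed by hand. This is a minimal sketch: `pair_coeff_line` is a hypothetical helper, and the lookup table covers only the elements used in the examples above.

```python
# Atomic numbers for the elements used in this page's examples.
ATOMIC_NUMBERS = {"C": 6, "O": 8, "W": 74}


def pair_coeff_line(type_symbols: list[str]) -> str:
    """Build a pair_coeff line; symbols follow the data file's atom type order."""
    numbers = [ATOMIC_NUMBERS[symbol] for symbol in type_symbols]
    return "pair_coeff * * " + " ".join(str(z) for z in numbers)


print(pair_coeff_line(["C", "O"]))  # pair_coeff * * 6 8
```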

## Torch-Sim

`build_torchsim_model` prefers the metatomic-backed torch-sim path when
`metatomic-torchsim` is installed. An ASE-backed fallback is available for
debugging and compatibility, but it should not be treated as the high-performance
or fully differentiable path for large simulations.