# Architecture Ownership

This page describes module ownership and dependency direction for contributors.
It is a refactor guardrail: performance-sensitive code is allowed to keep local
specialization when that preserves measured behavior.

## Package Boundaries

- `ufp.core` owns normalized inputs, outputs, potentials, device and dtype
  movement, and small disk-cache primitives. Other packages may depend on
  `ufp.core`; `ufp.core` should not depend on model terms, training, adapters,
  or workflows.
- `ufp.cache` owns public settings-addressed cache identity helpers. It may
  delegate low-level digest and name formatting to `ufp.core._disk_cache`, but
  workflow modules should not define competing cache namespaces.
- `ufp.neighbors` owns neighbor-list construction and conversion to
  `NeighborListData`. It is the boundary between ASE, `vesin`, `metatomic`, and
  the tensor layout consumed by `UFPInput`.
- `ufp.splines` owns spline basis and stencil math. Term modules and fitting
  assembly may call these helpers directly, but spline modules should remain
  independent of model terms.
- `ufp.terms` owns additive energy terms and their optional extension hooks.
  Public term classes live in plainly named modules such as `onebody.py`,
  `twobody.py`, and `threebody.py`; shared implementation details stay in
  underscore-prefixed modules.
- `ufp.leastsquares` owns parameter layouts, design-matrix assembly, cache IO,
  normal equations, and linear solve orchestration. It may know about built-in
  terms for optimized assembly, but shared containers live in dependency-light
  modules such as `ufp.leastsquares._types`.
- `ufp.coefficients` owns coefficient-channel discovery, compatibility checks,
  copying, zeroing, and staged-fit selector aliases. It may use least-squares
  layout metadata to identify blocks, but it should not own solvers or assembly
  code.
- `ufp.projection` owns offline projection of analytic priors and existing
  spline channels into ordinary coefficient tensors. Projection is a model
  preparation tool; runtime term evaluation should not depend on projection
  modules.
- `ufp.training` owns ASE-oriented datasets, batches, losses, metrics, and
  gradient training loops. It may warm term caches through the term contract but
  should not duplicate least-squares assembly policy.
- `ufp.adapters` owns simulation-engine integration. Adapter modules should keep
  engine-specific objects at the boundary and pass normalized tensors or UFP
  models inward.
- `ufp.workflows` owns high-level data, model, prediction, training, and runtime
  selection helpers. Workflows may combine lower layers but should not become
  the owner of kernels, cache formats, or term internals.
- `ufp.analysis` and `ufp.benchmarks` own plotting, diagnostics, and benchmark
  harnesses. They may depend on the rest of the package but should not be
  imported by runtime hot paths.

## Dependency Direction

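The usual dependency direction is shown below. Each layer may import from the
layers listed above it, but not from the layers listed below it: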

```text
core + neighbors
  -> splines
  -> terms
  -> leastsquares + training
  -> coefficients + projection
  -> adapters + workflows
  -> examples + benchmarks
```

Private modules can be shared within a package when they clarify ownership, but
cross-package imports should prefer public modules or explicit contract modules.
For custom terms, import extension types from `ufp.terms.contracts` instead of
`ufp.terms._base` or `ufp.terms._parameters`.
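The layering in the diagram above can be checked mechanically. The following is a minimal sketch, not an existing UFP tool: the `LAYERS` table and the `upward_imports` helper are hypothetical names, `analysis` is grouped with `benchmarks` as an assumption, and only absolute `ufp.*` imports are inspected.

```python
import ast
from typing import List, Optional

# Layer index per the diagram; lower numbers sit lower in the stack.
# Placing `analysis` next to `benchmarks` is an assumption of this sketch.
LAYERS = {
    "core": 0, "neighbors": 0,
    "splines": 1,
    "terms": 2,
    "leastsquares": 3, "training": 3,
    "coefficients": 4, "projection": 4,
    "adapters": 5, "workflows": 5,
    "analysis": 6, "benchmarks": 6,
}


def _package(module: str) -> Optional[str]:
    """Return the second-level package of a dotted `ufp.*` name, else None."""
    parts = module.split(".")
    if len(parts) >= 2 and parts[0] == "ufp":
        return parts[1]
    return None


def upward_imports(source: str, module_name: str) -> List[str]:
    """List absolute `ufp.*` imports in `source` that target a higher layer."""
    own = LAYERS.get(_package(module_name))
    if own is None:
        return []
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            targets = [node.module]
        else:
            continue  # not an import, or a relative import within the package
        for target in targets:
            layer = LAYERS.get(_package(target))
            if layer is not None and layer > own:
                found.append(target)
    return found
```

Because the direction is described as "usual" rather than absolute, a check like this is better suited to flagging imports for review than to failing a build.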

## Public And Internal APIs

- Stable user-facing APIs are exported from package `__init__.py` files and
  documented in `docs/api.md`.
- Extension-facing APIs that are not normal user entry points live in explicit
  contract namespaces such as `ufp.terms.contracts`; spline stencils,
  derivative rows, and fitting helpers in `ufp.splines` are also supported for
  term and fitting extensions.
- Underscore-prefixed modules are implementation details unless a contract page
  explicitly says otherwise.
- `ufp.workflows.cache` is a compatibility alias for `ufp.cache`, not a second
  public cache identity namespace.
- Cache data classes, memmap layout helpers, and kernel dispatch functions are
  internal to their owning package.
- `ufp.coefficients.CoefficientSelector` is an additive public alias of
  `ufp.leastsquares.CoefficientSelector`; keep both imports working because
  selectors are used by least squares, coefficient interchange, and training
  freeze helpers.
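An alias in this sense is a re-export of the same object, not a subclass or copy. The sketch below illustrates the identity check a regression test could make; it uses stand-in modules so that it runs without UFP installed, and `is_shared_alias` is a hypothetical helper.

```python
import types

# Stand-in modules; in a real test these would be `ufp.leastsquares`
# and `ufp.coefficients`.
leastsquares = types.ModuleType("leastsquares_standin")
coefficients = types.ModuleType("coefficients_standin")


class CoefficientSelector:
    """Placeholder for the real selector class."""


leastsquares.CoefficientSelector = CoefficientSelector
# The alias re-exports the owner's object unchanged.
coefficients.CoefficientSelector = leastsquares.CoefficientSelector


def is_shared_alias(owner, alias, name):
    """True when both modules expose the very same object under `name`."""
    return getattr(owner, name) is getattr(alias, name)
```

Checking identity rather than equality keeps `isinstance` checks consistent no matter which import path a caller used.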

UFP uses these API stability tiers:

| Tier | Policy |
| --- | --- |
| Stable user API | Normal user imports intended to remain compatible across minor releases. |
| Extension API | Supported hooks for third-party/custom terms and fitting integrations. |
| Expert diagnostics | Importable debugging, profiling, benchmark, or inspection helpers. These remain supported for their documented use, but they may expose lower-level implementation details. |
| Experimental workflow API | High-level orchestration helpers for examples and studies that can change faster than the core model and term APIs. |
| Compatibility-only | Transition and compatibility tools, including `compat/`, that preserve migration paths without becoming new model-building APIs. |
| Internal/private | Underscore modules, runtime dispatch helpers, cache layouts, and hot-path implementation details unless explicitly documented otherwise. |

`ufp.benchmarks` is an expert benchmark API: it is installed and importable for
speed gates and performance comparisons, but runtime packages should not depend
on it. Three-body bucket/evaluator helpers exposed from `ufp.terms.threebody`
are expert diagnostics for benchmark and debugging workflows, not normal model
assembly entry points.

## Hot Paths

Treat the following as performance-sensitive boundaries:

- `UFPInput` pair, category, and atomic metadata caches;
- pair and two-body spline evaluation;
- three-body bucketing, evaluator dispatch, dense and sparse feature caches, and
  memmap cache loading;
- least-squares assembly, normal-equation construction, and block-matrix
  matvec/rmatvec helpers;
- cached ASE training batches and cache warming.

Refactors in these areas should preserve dispatch decisions outside inner loops,
avoid extra tensor materialization, preserve dtype and device metadata, and run
the relevant speed-gate or benchmark checks listed in `docs/benchmarks.md`.
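As an illustration of keeping dispatch decisions outside inner loops, the sketch below selects a kernel once and binds it to a local name before iterating. The kernels and the `evaluate` function are hypothetical stand-ins in plain Python, not UFP code.

```python
from typing import Callable, List, Sequence


def _eval_dense(x: float) -> float:
    return 2.0 * x  # stand-in for a dense kernel


def _eval_sparse(x: float) -> float:
    return 2.0 * x if x != 0.0 else 0.0  # stand-in for a sparse kernel


def evaluate(values: Sequence[float], use_sparse: bool) -> List[float]:
    # Decide once, outside the loop, and bind the choice to a local name.
    kernel: Callable[[float], float] = _eval_sparse if use_sparse else _eval_dense
    out: List[float] = []
    append = out.append  # local binding avoids a repeated attribute lookup
    for x in values:
        append(kernel(x))
    return out
```

A refactor that moves the `use_sparse` branch inside the loop would produce the same results but re-evaluate the decision on every element, which is the kind of change the speed gates are meant to catch.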

## Least-Squares Setup Split

`LinearFitter` is the orchestration entry point, but setup-time responsibilities
are separated from assembled-matrix and solver hot paths:

```text
model + fitter options
  -> ParameterLayout
  -> selection plan and checkpoint/write-back metadata
  -> cache write/projection plan
  -> assembled batches or cached batches
  -> BlockLinearProblem.solve()
```

Selection planning lives in `ufp.leastsquares._setup`. It resolves fit/freeze
selectors once during `LinearFitter.__init__`, stores primitive metadata such as
selected block ids and compact size, and provides checkpoint signatures plus
selected-vector read/write helpers. Cache layout planning remains in
`ufp.leastsquares._cache_layout`, normal-equation cache construction remains in
`ufp.leastsquares._normal_equations`, and block matrix operations remain in
`ufp.leastsquares._block`.

Do not pass setup objects into repeated row assembly, `BlockLinearProblem`
`matvec`/`rmatvec`, normal-equation accumulation, or CG loops. Those paths should
continue to consume concrete tensors, compact block matrices, primitive slices,
and locally bound functions. Additional split points should follow the same
rule: normalize or plan before assembly starts, then unpack the result before
the measured loop.
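The plan-then-unpack rule can be sketched as follows. `SelectionPlan`, `plan_selection`, and `accumulate_rows` are hypothetical names for illustration and do not reflect the real types in `ufp.leastsquares._setup`; plain lists stand in for tensors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionPlan:  # hypothetical stand-in, not the real setup type
    selected_block_ids: tuple
    block_slices: tuple  # one slice per selected block, in compact order
    compact_size: int


def plan_selection(block_sizes: dict, selected: list) -> SelectionPlan:
    """Resolve the selection once, at setup time."""
    slices = []
    offset = 0
    for block_id in selected:
        size = block_sizes[block_id]
        slices.append(slice(offset, offset + size))
        offset += size
    return SelectionPlan(tuple(selected), tuple(slices), offset)


def accumulate_rows(rows: list, plan: SelectionPlan) -> list:
    """Sum compact rows block by block; the loop sees only primitives."""
    # Unpack before the measured loop so it never touches the plan object.
    slices = plan.block_slices
    totals = [0.0] * len(slices)
    for row in rows:
        for i, sl in enumerate(slices):
            totals[i] += sum(row[sl])
    return totals
```

The plan object carries only primitive metadata, and the loop body works with the unpacked tuple of slices rather than with attribute lookups on the plan.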

## Intentional Duplication

Small duplicated loops or validation branches are acceptable when they keep a
hot path direct, avoid a tensor allocation, or preserve TorchScript and native
dispatch constraints. Before removing duplication in a hot path, identify the
benchmark or speed gate that protects it and compare behavior before and after
the change.

Prefer shared helpers for metadata hashing, public contract validation, docs,
configuration parsing, and non-hot-path orchestration. Prefer local specialized
code for kernel-adjacent tensor layout and high-frequency matrix operations.

## Known Technical Debt

The least-squares package still reaches into selected term-private assembly
helpers for optimized design-matrix construction. That dependency is deliberate
for now because moving it would touch performance-sensitive fitting paths.
Treat it as technical debt to document and measure before refactoring, not as a
cleanup candidate for onboarding-only passes.