# Fitting Workflows

UFP has two complementary fitting paths:

- direct least squares for models that are linear in their fitted coefficients;
- gradient training for general differentiable models.

This page keeps the detailed fitting examples in one place. For a shorter map
of coefficient selection, coefficient copying, projection, freezing, staged
workflows, checkpoints, and prepared-geometry caches, start with
[coefficient workflows](coefficient-workflows.md) and
[workflow management](workflow-management.md).

## Direct Least Squares

For fixed geometry, one-body and spline coefficient blocks are linear in their
parameters. UFP assembles weighted design blocks and target vectors from
`FitSample` objects. In the simplest energy-only case,

$$
\mathbf A \mathbf c \approx \mathbf y,
$$

where $\mathbf c$ is the flattened coefficient vector and $\mathbf y$ contains
target energies. Force fitting adds rows corresponding to

$$
-\frac{\partial E}{\partial \mathbf R_i},
$$

which are still linear in spline coefficients because the basis derivatives are
known once geometry is fixed.

The least-squares objective is conceptually

$$
\min_{\mathbf c}
\left\| \mathbf W(\mathbf A\mathbf c - \mathbf y) \right\|_2^2
 + \lambda \left\| \mathbf c \right\|_2^2
 + \Omega(\mathbf c),
$$

where $\mathbf W$ contains target weights, $\lambda$ is ridge regularization,
and $\Omega$ can include block-specific shape penalties.

## Regularization

`LinearFitter` supports global ridge values and term-family-specific ridge
values for one-body, two-body, and three-body blocks. Two-body spline shape
penalties can discourage rough or unphysical coefficient curves by penalizing
third finite differences, early positive slopes, initial curvature, or low first
coefficients.

These penalties operate on coefficients, not on sampled energies. They are
therefore cheap and stable, but they are priors on the representation rather
than direct physical constraints at every distance.

When a selected two-body fit covers only part of a pair channel or knot range,
third-difference shape penalties use boundary-aware partial semantics. Rows that
touch both selected and fixed coefficients still contribute selected-coefficient
terms to the compact solve. The fixed neighboring coefficients are moved to the
regularization right-hand side, so refitting a middle knot range with nonzero
outside knots preserves the same physical prior as the corresponding full-block
penalty.

### Data-Scaled Ridge Guesses

For ridge values, raw neighbor, pair, or triplet counts are only indirect
proxies. They affect the scale and conditioning of the design matrix, but the
quantity that matters for a coefficient group is the weighted block curvature:

$$
\lambda_g = \alpha \frac{\operatorname{trace}(\mathbf A_g^\mathsf T \mathbf A_g)}
{n_g},
$$

where $\mathbf A_g$ is the weighted least-squares design block for group $g$ and
$n_g$ is that group's parameter count. `ufp.workflows` exposes this as a
semi-automated starting point:

```python
from ufp.workflows import estimate_linear_ridge_scales

estimate = estimate_linear_ridge_scales(
    model,
    training_samples,
    fitter_kwargs={
        "fit_energy": True,
        "fit_forces": True,
        "solver": "normal_equation_direct",
        "dtype": dtype,
    },
    fit_kwargs={"batch_size": 64},
    subset_size=128,
    alpha=1.0e-6,
)

ridge_kwargs = estimate.candidate().as_fitter_kwargs()
```

For larger studies, use progressive regularization tuning. It estimates the
same data-scaled seed, evaluates log-spaced candidates on deterministic training
and validation subsets, then rechecks the best candidates on the full validation
split:

```python
from ufp.workflows import RegularizationSearchConfig, tune_linear_regularization

search = tune_linear_regularization(
    make_model,
    dataset,
    config=RegularizationSearchConfig(
        stage_subset_sizes=(64, 256),
        top_k_per_stage=5,
        refit_full=True,
    ),
    fitter_kwargs={
        "fit_energy": True,
        "fit_forces": True,
        "solver": "cg",
        "dtype": dtype,
    },
    fit_kwargs={"batch_size": 64},
)

best_model = search.final_model
best_ridges = search.best_candidate.as_fitter_kwargs()
```

The tuner only adjusts coefficient ridge weights in v1. It preserves any
explicit two-body shape penalty passed through `fitter_kwargs`, but does not tune
that deprecated penalty family.

## Selective Coefficient Fitting

`LinearFitter` can fit a subset of coefficients while preserving the current
values of all other fittable coefficients. Existing string selectors still
select whole blocks or body-order groups, for example `fit_blocks=["threebody"]`
or `freeze_blocks=["twobody"]`. For finer control, use `CoefficientSelector`:

```python
from ufp.leastsquares import (
    CoefficientSelector,
    LinearFitter,
    ParameterLayout,
    resolve_coefficient_selection_summary,
)

# Fit only three-body coefficients.
LinearFitter(model, fit_blocks=["threebody"])

# Keep an existing two-body term fixed and fit the remaining coefficients as a
# correction. Explicit freeze selectors take precedence over fit selectors, and
# fixed coefficients are subtracted from the least-squares target internally.
LinearFitter(model, freeze_blocks=["twobody"])

# Fit one pair channel inside a categorized two-body term. Pair selectors are
# canonicalized with the term's symmetry convention.
LinearFitter(
    model,
    fit_blocks=[CoefficientSelector(block="twobody", channel=(1, 8))],
)

# Fit one source-distinguished three-body triplet channel.
LinearFitter(
    model,
    fit_blocks=[CoefficientSelector(block="threebody", channel=(8, 1, 1))],
)

# Refit only the first four knots for one pair channel.
LinearFitter(
    model,
    fit_blocks=[
        CoefficientSelector(
            block="twobody",
            channel=(1, 8),
            coeff_slice=slice(0, 4),
        )
    ],
)

# Inspect how selectors resolve before starting a fit.
layout = ParameterLayout.from_model(model, include_frozen=True)
summary = resolve_coefficient_selection_summary(
    layout,
    fit_blocks=[
        CoefficientSelector(
            block="twobody",
            channel=(1, 8),
            coeff_slice=slice(0, 4),
        )
    ],
)
summary.selected_block_labels
summary.fit_entries[0].original_indices
summary.fit_entries[0].compact_slice
```

Pair and two-body selectors use pair channels `(Z_i, Z_j)`. Three-body and
2D-triplet selectors use source-distinguished triplet channels
`(Z_center, Z_neighbor_1, Z_neighbor_2)`, with the two neighbor entries
canonicalized into the term's symmetric order. Coefficient slices address the
trailing spline dimensions: one dimension for pair terms, three dimensions for
`SplineThreeBodyTerm`, and two dimensions for `SplineTriplet2DTerm`.

Partial refits preserve non-selected coefficients exactly and subtract their
current linear contribution from the target internally. With the default
`assembly_contract="term"`, terms that implement selected-column assembly build
only the requested compact columns. Unsupported blocks and
`assembly_contract="block"` keep the full assembly plus slicing path as the
compatibility fallback, so missing selected assemblers are a performance
limitation rather than a correctness gap.

For example, this refits four middle pair knots while keeping nonzero boundary
neighbors fixed. Legacy two-body shape penalties, when present in older scripts,
continue to use boundary-aware partial semantics, but new workflows should start
with ridge values and validation-driven tuning:

```python
from ufp.leastsquares import CoefficientSelector, LinearFitter

fitter = LinearFitter(
    model,
    fit_blocks=[CoefficientSelector(block="pair", coeff_slice=slice(2, 6))],
    pair_ridge=1.0e-8,
)
fitter.fit(samples, batch_size=32)
```

Large partial three-body refits can use the same selector syntax. The default
term assembly contract will use selected assembly when the term provides it and
will otherwise fall back to full assembly plus slicing:

```python
fitter = LinearFitter(
    model,
    fit_blocks=[
        CoefficientSelector(
            block="threebody",
            channel=(74, 74, 74),
            coeff_slice=(slice(0, 8), slice(0, 8), slice(0, 8)),
        )
    ],
    solver="cg",
    matrix_storage="column_chunked",
)
fitter.fit(samples, batch_size=8, cache_directory="cache/partial-w")
```

## Coefficient Interchange

The `ufp.coefficients` namespace provides model-surgery helpers for compatible
physical coefficient channels. They build on `CoefficientSelector`, validate
channel identity, spline family, grid, cutoff, dtype, device, active masks,
symmetry, and category ordering, then copy only compatible values.

A common staged workflow is to fit elemental pair models first, then initialize
the matching channels of a mixed model while leaving cross terms at their
template values:

```python
from ufp.coefficients import (
    clone_model_with_zeroed_coefficients,
    copy_matching_coefficients,
)

mixed = clone_model_with_zeroed_coefficients(mixed_template)
hh_report = copy_matching_coefficients(hh_pair_model, mixed)
oo_report = copy_matching_coefficients(oo_pair_model, mixed)

hh_report.copied_count
oo_report.copied_count
```

Use `copy_coefficient_channel()` when the source and target selectors differ or
when a workflow should fail unless one specific channel is compatible.

## Offline Projection

The `ufp.projection` namespace contains offline helpers for turning prior
functions or existing spline channels into coefficient blocks. Projection is a
model-construction step: it writes ordinary spline coefficients and returns
diagnostics, but it does not add projection logic to runtime term evaluation.

Use `project_radial_function()` to sample a callable pair prior and solve for a
1D spline row. Use `project_pair_prior()` when the source is one of the
analytic pair priors from `ufp.terms`; it calls the prior's
`radial_values(pair, distances)` method and attaches the prior's
`projection_metadata()` payload to `result.metadata`. Both results store
detailed sampled values on the result object and shared `ProjectionDiagnostics`
under `result.diagnostics`:

```python
import torch

from ufp import SplinePairTerm, SplineTwoBodyTerm, UFPModel
from ufp.coefficients import copy_matching_coefficients
from ufp.leastsquares import CoefficientSelector, LinearFitter
from ufp.projection import project_radial_function

hh_prior = project_radial_function(
    lambda r: 0.2 * r.square() + 0.05 * r + 0.3,
    coeff_size=8,
    full_support_start=0.0,
    cutoff=2.0,
    dtype=torch.float64,
    channel_label="pair[(1, 1)]",
)

hh_model = UFPModel(
    pair_terms=[
        SplinePairTerm(
            cutoff=2.0,
            pair=(1, 1),
            coeffs=hh_prior.coeffs.clone(),
            full_support_start=0.0,
        )
    ],
    atomic_types=[1],
)
mixed_model = UFPModel(
    pair_terms=[
        SplineTwoBodyTerm(
            cutoff=2.0,
            atomic_types=[1, 8],
            coeffs_by_pair=torch.zeros((3, 8), dtype=torch.float64),
            full_support_start=0.0,
        )
    ],
    atomic_types=[1, 8],
)

copy_matching_coefficients(hh_model, mixed_model)

LinearFitter(
    mixed_model,
    fit_blocks=[
        CoefficientSelector(
            block="twobody",
            channel=(1, 1),
            coeff_slice=slice(2, 6),
        )
    ],
    fit_forces=False,
    solver="normal_equation_direct",
    ridge=1.0e-8,
).fit(samples, batch_size=32)

hh_prior.diagnostics.summary.max_value_rmse
```

Spline-to-spline helpers such as `project_pair_to_twobody()`,
`project_triplet2d_channel()`, and `project_threebody_channel()` support
compatible channel projection between existing terms. Their diagnostics use the
same channel-level objects and include derivative-error summaries when the
projection includes derivative rows.

## Analytic Priors and Residualized Corrections

Stage-wise workflows can combine nonlinear physical priors with linear spline
corrections. Keep the three mechanisms distinct:

- `project_pair_prior()` turns an analytic pair prior into ordinary spline
  coefficients. After projection, least squares sees only a spline block.
- `HybridLinearFitter` subtracts frozen nonlinear terms on the fly, then
  delegates the remaining selected coefficient solve to `LinearFitter`.
- `materialize_residual_dataset()` writes residual energy, force, or stress
  labels into an `ASEAtomsDataset` for longer training runs.

The public analytic prior names are
`InversePowerPairPrior`, `DampedInversePowerPairPrior`,
`ExponentialRepulsionPairPrior`, `MorsePairPrior`, and `ScaledZBLPairPrior`.
`CutoffEnvelope` and the helper functions `cutoff_envelope_values()`,
`cutoff_envelope_derivatives()`, `normalize_cutoff_envelope()`, and
`apply_cutoff_envelope()` live under `ufp.terms`.

Project an analytic prior into a pair spline, then fine-tune only selected
knots:

```python
import torch

from ufp import ExponentialRepulsionPairPrior, SplinePairTerm, UFPModel
from ufp.leastsquares import CoefficientSelector, LinearFitter
from ufp.projection import project_pair_prior

prior = ExponentialRepulsionPairPrior(
    cutoff=2.0,
    atomic_types=[1],
    amplitudes_by_pair={(1, 1): 0.9},
    decay_rates_by_pair={(1, 1): 1.4},
    cutoff_envelope="none",
    trainable=False,
    dtype=torch.float64,
)
projection = project_pair_prior(
    prior,
    (1, 1),
    coeff_size=10,
    full_support_start=0.2,
)
model = UFPModel(
    pair_terms=[
        SplinePairTerm(
            cutoff=projection.cutoff,
            pair=(1, 1),
            coeffs=projection.coeffs.clone(),
            spline=projection.spline,
            full_support_start=projection.full_support_start,
        )
    ],
    atomic_types=[1],
)

LinearFitter(
    model,
    fit_blocks=[CoefficientSelector(block="pair", coeff_slice=slice(2, 6))],
    fit_forces=False,
).fit(samples, batch_size=32)

projection.metadata["prior"]["parameter_state_hash"]
```

Use `HybridLinearFitter` when the prior should stay nonlinear and frozen while
a selected spline correction is solved. This is not the same as
`LinearFitter(freeze_blocks=...)`: freeze selectors only subtract fixed linear
coefficient blocks already represented in the least-squares layout.

```python
import torch

from ufp import ExponentialRepulsionPairPrior, SplineThreeBodyTerm, UFPModel
from ufp.leastsquares import CoefficientSelector, HybridLinearFitter

prior = ExponentialRepulsionPairPrior(
    cutoff=2.0,
    atomic_types=[1],
    amplitudes_by_pair={(1, 1): 0.4},
    decay_rates_by_pair={(1, 1): 0.8},
    cutoff_envelope="none",
    trainable=False,
    dtype=torch.float64,
)
threebody = SplineThreeBodyTerm(
    cutoff=2.0,
    atomic_types=[1],
    coeffs_by_triplet=torch.zeros((1, 8, 8, 8), dtype=torch.float64),
    full_support_start_xy=0.0,
    full_support_start_z=0.0,
)
model = UFPModel(
    pair_terms=[prior],
    threebody_terms=[threebody],
    atomic_types=[1],
)

HybridLinearFitter(
    model,
    frozen_terms=[prior],
    fit_blocks=[
        CoefficientSelector(
            block="threebody",
            channel=(1, 1, 1),
            coeff_slice=(slice(0, 4), slice(0, 4), slice(0, 4)),
        )
    ],
    fit_forces=False,
).fit(samples, batch_size=16, cache_directory="cache/hybrid")
```

For longer optimizer runs, materialize residual labels once and carry strict
metadata with every sample:

```python
from ufp.workflows import materialize_residual_dataset

residual = materialize_residual_dataset(
    model,
    dataset,
    selectors=("pair",),
    targets=("energy", "forces"),
    target_weights={"energy": 1.0, "forces": 0.1},
    metadata_key="frozen_prior_residuals",
)

residual.dataset[0].metadata["frozen_prior_residuals"]["metadata_hash"]
residual.metadata["frozen_terms_state_hash"]
```

Residual metadata hashes include the selected frozen term states, selectors,
target keys, target weights, units, and optional analytic prior projection
metadata. Hybrid cache directories use the same frozen-term state hash under
`HybridLinearFitter.frozen_terms_state_hash`.

## Optimizer-Time Freezing

Gradient training can freeze the same coefficient selections by building a
mask-and-restore wrapper around a PyTorch optimizer:

```python
import torch

from ufp.leastsquares import CoefficientSelector
from ufp.training import freeze_model_coefficients

freeze_state = freeze_model_coefficients(
    model,
    [
        CoefficientSelector(
            block="twobody",
            channel=(1, 8),
            coeff_slice=slice(0, 4),
        )
    ],
)

freeze_state.affected_parameter_names
freeze_state.frozen_counts
freeze_state.selector_metadata

optimizer = torch.optim.AdamW(model.parameters(), lr=1.0e-3, weight_decay=1.0e-2)
freeze_state.wrap_optimizer(optimizer)
```

The wrapper masks gradients, restores frozen entries after every optimizer step,
and clears optimizer state entries for frozen coefficients. This prevents AdamW
weight decay and stale moment buffers from drifting frozen coefficients. Calling
`wrap_optimizer()` with the same `CoefficientFreezeState` is idempotent; applying
a different freeze state to an already wrapped optimizer raises a clear error.

## Workflow Stages and Checkpoints

The `ufp.workflows` namespace includes small stage objects for explicit
staged fitting scripts. They are wrappers around existing primitives, not a
hidden workflow runner: users still build a sequence, run each stage, and own
the context mapping between stages.

Available stage objects are `ProjectStage`, `LinearFitStage`,
`ResidualizeStage`, `TrainStage`, and `ValidateStage`. Each declares required
context keys, produced context keys, and checkpoint-ready metadata. Use
`workflow_stage_metadata()` to collect those stage records, then pass the result
to `save_workflow_checkpoint()`:

```python
import torch

from ufp.leastsquares import CoefficientSelector
from ufp.projection import project_radial_function
from ufp.training import LossWeights
from ufp.workflows import (
    LinearFitStage,
    ProjectStage,
    TrainStage,
    ValidateStage,
    save_workflow_checkpoint,
    workflow_stage_metadata,
)

fit_selector = CoefficientSelector(block="pair", coeff_slice=slice(2, 6))
freeze_selector = CoefficientSelector(block="pair", coeff_slice=slice(0, 2))

context = {"model": model, "fit_samples": samples}
stages = [
    ProjectStage(projector=project_radial_function, projector_kwargs=projection_args),
    LinearFitStage(
        fitter_kwargs={
            "fit_blocks": [fit_selector],
            "fit_forces": False,
            "solver": "normal_equation_direct",
        },
        fit_kwargs={"batch_size": 16},
    ),
    TrainStage(
        freeze_selectors=[freeze_selector],
        fit_kwargs={
            "epochs": 5,
            "loss_weights": LossWeights(energy=1.0, forces=0.0),
            "dtype": torch.float64,
        },
    ),
    ValidateStage(evaluate_kwargs={"dtype": torch.float64}),
]

results = []
for stage in stages:
    result = stage.run(context)
    result.update_context(context)
    results.append(result)

save_workflow_checkpoint(
    "workflow.pt",
    model,
    fit_blocks=[fit_selector],
    freeze_blocks=[freeze_selector],
    stage_metadata=workflow_stage_metadata(results, name="pair-refit"),
    projection_diagnostics=context["projection_result"].diagnostics,
    validation_metrics=context["validation_metrics"],
)
```

Workflow checkpoints store the package version, model and term metadata,
coefficient layout, selector metadata, fixed-coefficient hashes, stage metadata,
projection diagnostics, validation metrics, user metadata, and the model
`state_dict`. `load_workflow_checkpoint()` validates the schema and can reject
model-layout mismatches before loading parameters.

`ufp.workflows.prepared` is an advanced prepared-geometry helper for
workflow-level cache reuse. It can materialize tensorized geometry, neighbor
lists, pair categories, optional triplet cache metadata, and strict source
signatures from `UFPInput`, ASE batches, training batches, or least-squares
prepared batches. It is intentionally not exported from top-level
`ufp.workflows` and is not a runtime input path for model evaluation; import it
directly only for workflows that need reusable prepared geometry.

## Caching

Large three-body fits can spend significant time assembling design blocks.
UFP can write assembled batches, normal-equation components, and dense
three-body feature caches to disk. Cache manifests include enough metadata to
reject incompatible requests when samples, targets, dtype, backend settings, or
coefficient bases change. Assembled least-squares caches use semantic
coefficient-channel metadata for built-in one-body, pair/two-body spline, and
three-body spline terms, so a full non-alchemical cache can be streamed into a
compatible alchemical fit without writing a second adapted cache. This cache
format is intentionally not compatible with assembled caches written by older
UFP versions.

Cache directories are settings-addressed. The library derives child directory
names from a stable SHA-256 digest of the cache identity metadata, and each
written cache also contains a `cache_settings.json` summary next to the manifest.
That file is intended for humans: it records the cache kind, digest, manifest
name, schema version, and the invalidating settings used to derive the folder.
Example and workflow code can use the public `ufp.cache.CacheIdentity` API, or
the top-level `ufp.suggest_cache_directory()` helper, when it needs a
deterministic parent folder name from notebook or CLI parameters.

Training dense three-body feature caches use a separate V2 manifest with input
signatures, dtype and batch size, atomic and triplet category order, active
triplet indices, spline support/grid metadata, coefficient shape, and row
semantics. `feature_cache_mode="read"` requires an exact cache or a compatible
active-triplet superset and fails when the cache is missing or stale.
`feature_cache_mode="refresh"` intentionally rebuilds the cache. The Li-P-S
training examples use `read` by default, so first-time runs require
`--refresh-cache`.

For selective fits, cache and CG checkpoint compatibility metadata includes the
stable keys `coefficient_selection`, `fixed_coefficients_signature`, and
`regularization_semantics`.
`coefficient_selection` records the compact selected layout for each block, and
`fixed_coefficients_signature` changes when preserved coefficient values outside
the solve layout change. `regularization_semantics` changes when the meaning of
partial shape regularization changes. CG checkpoints also store the concrete
regularization weights in a `regularization` payload. Selected assembly does not
need separate correctness metadata because selected-column hooks and full
assembly plus slicing are required to produce the same compact problem.

Use disk-backed caches when repeated solves or checkpointed experiments reuse
the same fixed geometries. Prefer ordinary in-memory assembly for small models
and early debugging.

## Gradient Training

The training stack keeps ASE structures in `ASEAtomsDataset`, collates them into
`ASEAtomsBatch`, prepares `UFPInput`, and optimizes model parameters with
PyTorch. Energy, forces, and stress can contribute to the loss when labels are
available.

Training is more general than linear fitting because it can optimize nonlinear
parameters and alchemical mixing weights directly. Least squares is often faster
and more deterministic when the coefficient-linear assumption applies.