Workflow Management
This page collects the non-hot-path orchestration pieces used by examples and small studies. These helpers make staged experiments reproducible; they are not alternate model-evaluation kernels.
Stages
ufp.workflows exposes small stage objects for explicit scripts:
- ProjectStage runs an offline projection helper and stores its result in the context mapping.
- LinearFitStage wraps LinearFitter.
- ResidualizeStage materializes residual labels.
- TrainStage runs gradient training, optionally with coefficient freezes.
- ValidateStage records evaluation metrics.
Stages declare required inputs, produced outputs, and metadata, but users still own the sequence and context updates:
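from ufp.workflows import LinearFitStage, workflow_stage_metadata
# model and samples are assumed to be built earlier in the script
context = {"model": model, "fit_samples": samples}
stage = LinearFitStage(fit_kwargs={"batch_size": 16})
result = stage.run(context)
# The script, not the stage, decides when outputs are merged into the context
result.update_context(context)
stage_metadata = workflow_stage_metadata([result], name="pair-refit")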
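The contract described above (declared inputs and outputs, with the caller merging results into the context) can be illustrated with a small stand-alone sketch. SketchStage and StageResult are hypothetical names written for this illustration, not ufp classes, and ufp's real stages may validate inputs and record metadata differently.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Mapping, Tuple


@dataclass
class StageResult:
    """Hypothetical result: outputs from one stage plus descriptive metadata."""

    name: str
    outputs: Dict[str, Any]
    metadata: Dict[str, Any] = field(default_factory=dict)

    def update_context(self, context: Dict[str, Any]) -> None:
        # The caller chooses when outputs are merged into the shared context.
        context.update(self.outputs)


@dataclass
class SketchStage:
    """Hypothetical stage that checks its declared inputs and outputs."""

    name: str
    requires: Tuple[str, ...]
    produces: Tuple[str, ...]
    fn: Callable[..., Dict[str, Any]]

    def run(self, context: Mapping[str, Any]) -> StageResult:
        missing = [key for key in self.requires if key not in context]
        if missing:
            raise KeyError(f"{self.name}: missing inputs {missing}")
        # Only the declared inputs are passed to the stage body.
        outputs = self.fn(**{key: context[key] for key in self.requires})
        if set(outputs) != set(self.produces):
            raise ValueError(
                f"{self.name}: produced {sorted(outputs)}, declared {sorted(self.produces)}"
            )
        metadata = {"requires": list(self.requires), "produces": list(self.produces)}
        return StageResult(self.name, outputs, metadata)


# Usage: run() leaves the context alone until update_context() is called.
context = {"samples": [1.0, 2.0, 3.0]}
mean_stage = SketchStage(
    "mean",
    ("samples",),
    ("mean",),
    lambda samples: {"mean": sum(samples) / len(samples)},
)
result = mean_stage.run(context)
result.update_context(context)
```

Keeping the merge step explicit means a script can inspect or discard a stage result before it affects later stages.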
Checkpoints
Use workflow checkpoints when a staged script needs enough metadata to validate that a later reload matches the model layout:
from ufp.workflows import save_workflow_checkpoint
# fit_selectors, freeze_selectors, stage_metadata, and metrics come from earlier steps
save_workflow_checkpoint(
"workflow.pt",
model,
fit_blocks=fit_selectors,
freeze_blocks=freeze_selectors,
stage_metadata=stage_metadata,
validation_metrics=metrics,
)
Checkpoints include package version, model and term metadata, coefficient
layout, selector metadata, fixed-coefficient hashes, stage metadata, projection
diagnostics, validation metrics, user metadata, and the model state_dict.
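One way to picture the reload check is to store a size and a content hash per coefficient block, then compare hashes only for blocks that were declared fixed. The functions below are a hypothetical stand-alone sketch built on hashlib, not ufp's checkpoint format; packing values as little-endian float64 before hashing is an assumption made so that the hash does not depend on the platform.

```python
import hashlib
import struct
from typing import Dict, List, Sequence


def coefficient_hash(values: Sequence[float]) -> str:
    # Little-endian float64 packing gives a platform-independent byte string.
    packed = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(packed).hexdigest()


def layout_metadata(blocks: Dict[str, Sequence[float]]) -> Dict[str, dict]:
    # Record the size and content hash of every coefficient block.
    return {
        name: {"size": len(values), "sha256": coefficient_hash(values)}
        for name, values in blocks.items()
    }


def check_layout(
    saved: Dict[str, dict],
    blocks: Dict[str, Sequence[float]],
    fixed: Sequence[str],
) -> List[str]:
    """Return human-readable problems; an empty list means the reload matches."""
    problems = []
    current = layout_metadata(blocks)
    for name in sorted(set(saved) | set(current)):
        if name not in current:
            problems.append(f"missing block {name}")
        elif name not in saved:
            problems.append(f"unexpected block {name}")
        elif saved[name]["size"] != current[name]["size"]:
            problems.append(f"size mismatch in {name}")
        elif name in fixed and saved[name]["sha256"] != current[name]["sha256"]:
            # Trainable blocks may change values; fixed blocks may not.
            problems.append(f"fixed coefficients changed in {name}")
    return problems


saved = layout_metadata({"onebody": [1.0], "pair": [0.1, 0.2]})
ok = check_layout(saved, {"onebody": [1.5], "pair": [0.1, 0.2]}, fixed=["pair"])
bad = check_layout(saved, {"onebody": [1.0], "pair": [0.1, 0.3]}, fixed=["pair"])
```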
Residualization
materialize_residual_dataset() writes residual energy, force, or stress labels
into an ASEAtomsDataset. Use it when a longer training run should subtract
frozen priors or fixed spline blocks once, then optimize on the residual labels.
Residual metadata records selectors, target weights, units, frozen-term state
hashes, and optional projection metadata so stale residual data can be rejected.
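The residualization idea reduces to subtracting a frozen prediction from each label once and recording a signature of everything that subtraction depended on. This sketch is hypothetical and covers energies only; the dictionary layout, the JSON-based signature, and the function names are assumptions, not the format written by materialize_residual_dataset().

```python
import hashlib
import json
from typing import Any, Dict, Sequence


def state_signature(frozen_state: Dict[str, Any], target_weights: Dict[str, float]) -> str:
    # Canonical JSON so that dictionary key order does not change the signature.
    payload = json.dumps(
        {"frozen_state": frozen_state, "target_weights": target_weights},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def residualize_energies(
    energies: Sequence[float],
    frozen_predictions: Sequence[float],
    frozen_state: Dict[str, Any],
    target_weights: Dict[str, float],
) -> Dict[str, Any]:
    if len(energies) != len(frozen_predictions):
        raise ValueError("labels and frozen predictions must align")
    # Subtract the frozen prior once; training then fits only the remainder.
    residuals = [e - p for e, p in zip(energies, frozen_predictions)]
    return {
        "residual_energy": residuals,
        "metadata": {"signature": state_signature(frozen_state, target_weights)},
    }


def is_stale(
    record: Dict[str, Any],
    frozen_state: Dict[str, Any],
    target_weights: Dict[str, float],
) -> bool:
    # Stale if the frozen terms or the target weights changed after materialization.
    return record["metadata"]["signature"] != state_signature(frozen_state, target_weights)


prior_state = {"pair_spline": [0.5, 0.25]}
weights = {"energy": 1.0}
record = residualize_energies([-10.0, -12.5], [-9.0, -12.0], prior_state, weights)
```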
Prepared Geometry
ufp.workflows.prepared can materialize tensorized geometry, neighbor lists,
pair categories, optional triplet-cache metadata, and strict source signatures.
It is intentionally imported directly from ufp.workflows.prepared rather than
exported from top-level ufp.workflows.
Prepared geometry is useful for cache-reuse experiments and workflow validation. It is not a runtime input path for model evaluation, and it should not acquire hot-path checks or tensor transformations that belong inside terms.
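A neighbor list pairs atoms closer than a cutoff, and pair categories group those pairs by unordered element pair. The brute-force sketch below illustrates what such a preparation step computes. It is hypothetical, quadratic in the number of atoms, and ignores periodic boundary conditions; it does not reflect how ufp.workflows.prepared lays out its tensors.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int, float]


def neighbor_pairs(
    positions: Sequence[Tuple[float, float, float]],
    symbols: Sequence[str],
    cutoff: float,
) -> Dict[Tuple[str, ...], List[Pair]]:
    """Group i < j pairs within the cutoff by element pair (no periodic images)."""
    categories: Dict[Tuple[str, ...], List[Pair]] = {}
    for i, j in combinations(range(len(positions)), 2):
        distance = math.dist(positions[i], positions[j])
        if distance < cutoff:
            # Sorting makes ("O", "H") and ("H", "O") the same category.
            key = tuple(sorted((symbols[i], symbols[j])))
            categories.setdefault(key, []).append((i, j, distance))
    return categories


groups = neighbor_pairs(
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0)],
    ["H", "O", "H"],
    cutoff=2.0,
)
```

Precomputing this once per fixed geometry is what makes reuse across repeated fits worthwhile; the evaluation path itself never needs to see these helper structures.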
Caching
Large least-squares or three-body studies can write assembled batches, normal-equation components, CG checkpoints, and dense feature caches. Cache manifests include enough metadata to reject incompatible sample sets, target weights, dtypes, layouts, coefficient selections, fixed-coefficient values, and regularization semantics.
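A settings-addressed identity can be sketched as a hash of canonicalized settings, paired with a manifest comparison that names exactly which fields disagree. The helpers below are hypothetical and are not the ufp.cache API; JSON with sorted keys is an assumed canonical form chosen for brevity.

```python
import hashlib
import json
from typing import Any, Dict, List

_MISSING = object()  # distinguishes "absent" from an explicit None value


def cache_key(settings: Dict[str, Any]) -> str:
    # Sorted keys and fixed separators give one canonical string per settings dict.
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def manifest_mismatches(manifest: Dict[str, Any], requested: Dict[str, Any]) -> List[str]:
    # A field mismatches if its values differ or it is present on only one side.
    keys = set(manifest) | set(requested)
    return sorted(
        k for k in keys if manifest.get(k, _MISSING) != requested.get(k, _MISSING)
    )


manifest = {
    "sample_ids": [0, 1, 2],
    "target_weights": {"energy": 1.0, "forces": 0.1},
    "dtype": "float64",
    "ridge": 1e-6,
}
requested = dict(manifest, dtype="float32")
```

Reporting the mismatched field names, rather than only a differing hash, is what makes a rejected cache easy to diagnose.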
Use ufp.cache for settings-addressed cache identities and human-readable
cache summaries. Top-level ufp convenience exports expose the same common
helpers for scripts. ufp.workflows.cache is a compatibility alias for older
workflow code; it is not the owner of cache identity policy.
Use disk-backed caches for repeated solves over fixed geometries. Prefer ordinary in-memory assembly for small models, early debugging, and one-off experiments.
Regularization Tuning
ufp.workflows.regularization adds a reusable layer for choosing linear
least-squares ridge weights. It first estimates a scale from the weighted design
matrix, then searches log-spaced candidates for ridge, onebody_ridge,
twobody_ridge, and threebody_ridge. Pair and triplet counts are useful
diagnostics, but the default search is anchored to the design-block scale,
because that scale sets the data curvature seen by each coefficient group.
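One way to read "scale from the weighted design matrix" is the mean diagonal of the weighted normal matrix A^T W A, which is proportional to the per-coefficient curvature of the weighted least-squares objective. The sketch below assumes exactly that definition and a plain grid of decades around it; both are illustrative assumptions, and the function names are hypothetical rather than part of ufp.workflows.regularization.

```python
import math
from typing import Dict, List, Sequence


def block_scale(rows: Sequence[Sequence[float]], weights: Sequence[float]) -> float:
    """Mean diagonal of A^T W A, i.e. trace(A^T W A) / n_columns."""
    n_columns = len(rows[0])
    # trace(A^T W A) equals the weighted sum of squared row norms.
    trace = sum(w * sum(x * x for x in row) for row, w in zip(rows, weights))
    return trace / n_columns


def log_spaced_candidates(
    scale: float, low: int = -8, high: int = 0, per_decade: int = 1
) -> List[float]:
    """Candidates scale * 10**e for e on an even grid from low to high decades."""
    count = (high - low) * per_decade + 1
    return [scale * 10.0 ** (low + k / per_decade) for k in range(count)]


# Each coefficient group gets its own scale from its own design block.
blocks = {
    "onebody_ridge": [[1.0], [1.0], [2.0]],
    "twobody_ridge": [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]],
}
weights = [1.0, 1.0, 2.0]
grids: Dict[str, List[float]] = {
    name: log_spaced_candidates(block_scale(rows, weights), low=-2, high=0)
    for name, rows in blocks.items()
}
```

Anchoring each group's grid to its own scale keeps a ridge weight meaningful relative to that group's data term, independent of how many pairs or triplets contributed to it.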
from ufp.workflows import (
RegularizationSearchConfig,
save_workflow_checkpoint,
tune_linear_regularization,
workflow_stage_metadata,
)
# make_model is the model factory; dataset and dtype are defined earlier
search = tune_linear_regularization(
make_model,
dataset,
config=RegularizationSearchConfig(
stage_subset_sizes=(64, 256),
cache_directory="regularization-cache",
refit_full=True,
),
fitter_kwargs={
"fit_energy": True,
"fit_forces": True,
"solver": "normal_equation_direct",
"dtype": dtype,
},
fit_kwargs={"batch_size": 64},
)
stage_metadata = workflow_stage_metadata(
[search.metadata],
name="regularization-search",
)
save_workflow_checkpoint(
"regularized-workflow.pt",
search.final_model,
stage_metadata=stage_metadata,
validation_metrics=search.metadata,
)
When no validation split is present, tuning carves a deterministic validation
subset from the training indices and leaves holdout indices untouched. Candidate
fits build isolated models from model_factory (make_model in the example above),
so search trials do not mutate a caller-owned model.
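Both guarantees in the paragraph above can be illustrated in a few lines: a seeded shuffle over the training indices only, and a fresh model from the factory for every candidate. This is a hypothetical sketch with made-up names (carve_validation, select_ridge) and a toy scoring function; it does not reproduce the split rule used by tune_linear_regularization.

```python
import random
from typing import Callable, Dict, Iterable, List, Sequence, Tuple, TypeVar

M = TypeVar("M")


def carve_validation(
    train_indices: Iterable[int], fraction: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split training indices only; holdout indices are never passed in."""
    order = list(train_indices)
    if len(order) < 2:
        raise ValueError("need at least two training indices to carve a validation subset")
    # A dedicated Random instance keeps the split independent of global RNG state.
    random.Random(seed).shuffle(order)
    n_val = min(len(order) - 1, max(1, round(fraction * len(order))))
    return sorted(order[n_val:]), sorted(order[:n_val])


def select_ridge(
    model_factory: Callable[[], M],
    candidates: Sequence[float],
    fit_and_score: Callable[[M, float], float],
) -> Tuple[float, Dict[float, float]]:
    scores: Dict[float, float] = {}
    for ridge in candidates:
        model = model_factory()  # a fresh model per trial, never the caller's
        scores[ridge] = fit_and_score(model, ridge)
    best = min(scores, key=scores.get)
    return best, scores


train, holdout = list(range(10)), [10, 11]
fit_idx, val_idx = carve_validation(train, fraction=0.2, seed=3)

caller_model = {"coef": 0.0}


def toy_fit_and_score(model: Dict[str, float], ridge: float) -> float:
    model["coef"] = ridge  # mutates only the trial model
    return (ridge - 0.1) ** 2


best, scores = select_ridge(lambda: {"coef": 0.0}, [0.01, 0.1, 1.0], toy_fit_and_score)
```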