# Workflow Management

This page collects the non-hot-path orchestration pieces used by examples and small studies. These helpers make staged experiments reproducible; they are not alternate model-evaluation kernels.

## Stages

`ufp.workflows` exposes small stage objects for explicit scripts:

- `ProjectStage` runs an offline projection helper and stores its result in the context mapping;
- `LinearFitStage` wraps `LinearFitter`;
- `ResidualizeStage` materializes residual labels;
- `TrainStage` runs gradient training, optionally with coefficient freezes;
- `ValidateStage` records evaluation metrics.

Stages declare required inputs, produced outputs, and metadata, but users still own the sequence and context updates:

```python
from ufp.workflows import LinearFitStage, workflow_stage_metadata

context = {"model": model, "fit_samples": samples}
stage = LinearFitStage(fit_kwargs={"batch_size": 16})
result = stage.run(context)
result.update_context(context)
workflow_stage_metadata([result], name="pair-refit")
```

## Checkpoints

Use workflow checkpoints when a staged script needs enough metadata to validate that a later reload matches the model layout:

```python
from ufp.workflows import save_workflow_checkpoint

save_workflow_checkpoint(
    "workflow.pt",
    model,
    fit_blocks=fit_selectors,
    freeze_blocks=freeze_selectors,
    stage_metadata=stage_metadata,
    validation_metrics=metrics,
)
```

Checkpoints include package version, model and term metadata, coefficient layout, selector metadata, fixed-coefficient hashes, stage metadata, projection diagnostics, validation metrics, user metadata, and the model `state_dict`.

## Residualization

`materialize_residual_dataset()` writes residual energy, force, or stress labels into an `ASEAtomsDataset`. Use it when a longer training run should subtract frozen priors or fixed spline blocks once, then optimize on the residual labels.
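
The subtract-once idea can be sketched in plain Python. This is a minimal illustration only, not the internals of `materialize_residual_dataset()`: `Labels` and `subtract_frozen` are hypothetical names, and the frozen-term prediction is assumed to have been evaluated already.

```python
from dataclasses import dataclass


@dataclass
class Labels:
    """Hypothetical container for one structure's energy and per-atom forces."""
    energy: float
    forces: list


def subtract_frozen(reference: Labels, frozen: Labels) -> Labels:
    """Return reference labels minus the frozen-term prediction."""
    if len(reference.forces) != len(frozen.forces):
        raise ValueError("force arrays must cover the same atoms")
    forces = [
        tuple(r - f for r, f in zip(ref_row, frozen_row))
        for ref_row, frozen_row in zip(reference.forces, frozen.forces)
    ]
    return Labels(energy=reference.energy - frozen.energy, forces=forces)


# Toy two-atom structure: reference labels and a frozen prior's prediction.
reference = Labels(energy=-10.0, forces=[(0.5, 0.0, -0.25), (-0.5, 0.0, 0.25)])
frozen = Labels(energy=-8.0, forces=[(0.25, 0.0, 0.0), (-0.25, 0.0, 0.0)])
residual = subtract_frozen(reference, frozen)
```

Training then targets `residual` instead of `reference`, so the frozen terms do not need to be re-evaluated on every step.
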
Residual metadata records selectors, target weights, units, frozen-term state hashes, and optional projection metadata so stale residual data can be rejected.

## Prepared Geometry

`ufp.workflows.prepared` can materialize tensorized geometry, neighbor lists, pair categories, optional triplet-cache metadata, and strict source signatures. It is intentionally imported directly from `ufp.workflows.prepared` rather than exported from top-level `ufp.workflows`.

Prepared geometry is useful for cache-reuse experiments and workflow validation. It is not a runtime input path for model evaluation, and it should not acquire hot-path checks or tensor transformations that belong inside terms.

## Caching

Large least-squares or three-body studies can write assembled batches, normal-equation components, CG checkpoints, and dense feature caches. Cache manifests include enough metadata to reject incompatible sample sets, target weights, dtypes, layouts, coefficient selections, fixed-coefficient values, and regularization semantics.

Use `ufp.cache` for settings-addressed cache identities and human-readable cache summaries. Top-level `ufp` convenience exports expose the same common helpers for scripts. `ufp.workflows.cache` is a compatibility alias for older workflow code; it is not the owner of cache identity policy.

Use disk-backed caches for repeated solves over fixed geometries. Prefer ordinary in-memory assembly for small models, early debugging, and one-off experiments.

## Regularization Tuning

`ufp.workflows.regularization` adds a reusable layer for choosing linear least-squares ridge weights. It first estimates a scale from the weighted design matrix,

$$
\lambda_g = \alpha \frac{\operatorname{trace}(\mathbf A_g^\mathsf T \mathbf A_g)}{n_g},
$$

then searches log-spaced candidates for `ridge`, `onebody_ridge`, `twobody_ridge`, and `threebody_ridge`.
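
The scale estimate and candidate grid above can be sketched in plain Python. This is a rough illustration under stated assumptions, not the code in `ufp.workflows.regularization`: the text does not define $n_g$, so the sketch assumes it is the number of coefficients (columns) in group $g$, and it assumes the candidates are centered on the estimated scale. `design_block_scale` and `log_spaced` are hypothetical names.

```python
import math


def design_block_scale(block, alpha=1.0):
    """Return alpha * trace(A^T A) / n for one design block given as rows.

    Assumption: n is the number of columns (coefficients) in the block.
    """
    n_columns = len(block[0])
    # trace(A^T A) equals the sum of squared entries of A.
    trace = sum(value * value for row in block for value in row)
    return alpha * trace / n_columns


def log_spaced(center, decades=2, per_decade=1):
    """Candidates from center * 10**-decades up to center * 10**decades."""
    steps = range(-decades * per_decade, decades * per_decade + 1)
    return [center * 10.0 ** (step / per_decade) for step in steps]


# Toy weighted design block: three rows, two coefficients.
block = [[1.0, 2.0], [3.0, 4.0], [0.0, 2.0]]
scale = design_block_scale(block, alpha=1e-3)
candidates = log_spaced(scale, decades=2)
```

Because the scale follows the squared magnitude of the design block, rescaling the features or target weights moves the whole candidate grid with it.
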
Pair and triplet counts are useful diagnostics, but the default is based on design-block scale because that is what sets the data curvature seen by each coefficient group.

```python
from ufp.workflows import (
    RegularizationSearchConfig,
    save_workflow_checkpoint,
    tune_linear_regularization,
    workflow_stage_metadata,
)

search = tune_linear_regularization(
    make_model,
    dataset,
    config=RegularizationSearchConfig(
        stage_subset_sizes=(64, 256),
        cache_directory="regularization-cache",
        refit_full=True,
    ),
    fitter_kwargs={
        "fit_energy": True,
        "fit_forces": True,
        "solver": "normal_equation_direct",
        "dtype": dtype,
    },
    fit_kwargs={"batch_size": 64},
)

stage_metadata = workflow_stage_metadata(
    [search.metadata],
    name="regularization-search",
)
save_workflow_checkpoint(
    "regularized-workflow.pt",
    search.final_model,
    stage_metadata=stage_metadata,
    validation_metrics=search.metadata,
)
```

When no validation split is present, tuning carves a deterministic validation subset from the training indices and leaves holdout indices untouched. Candidate fits use isolated models from `model_factory`, so search trials do not mutate a caller-owned model.
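
One way to picture the deterministic carve-out is a seeded sample over the sorted training indices. The sketch below is an assumption about the general technique, not the selection rule the library uses: `carve_validation`, `fraction`, and `seed` are hypothetical names.

```python
import random


def carve_validation(train_indices, fraction=0.2, seed=0):
    """Split training indices into (fit, validation) reproducibly.

    Sorting first makes the result independent of input order, and a
    private seeded generator avoids touching global random state.
    """
    ordered = sorted(train_indices)
    rng = random.Random(seed)
    n_val = max(1, round(fraction * len(ordered)))
    validation = sorted(rng.sample(ordered, n_val))
    chosen = set(validation)
    fit = [index for index in ordered if index not in chosen]
    return fit, validation


train = list(range(10))
holdout = [10, 11]  # never passed in, so it cannot leak into validation
fit, validation = carve_validation(train)
```

Calling `carve_validation` again with the same indices and seed returns the same split, which keeps candidate fits comparable across search stages.
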