# Predictive Uncertainty

UFP uncertainty workflows build Bayesian posteriors for coefficient-linear models. The posterior is exact for the flattened coefficient blocks exposed to `LinearFitter`; nonlinear parameters are treated as fixed unless the workflow first converts them into a coefficient-linear proxy.

## Linear Coefficient Posterior

Use `fit_linear_uncertainty_model` when the model is ordinary coefficient-linear or when you have built a coefficient-linear proxy for a trained model:

```python
from ufp.leastsquares import LinearFitter
from ufp.uncertainty import fit_linear_uncertainty_model

fitter = LinearFitter(
    model,
    fit_energy=True,
    fit_forces=True,
    solver="normal_equation_direct",
    ridge=1.0e-8,
    dtype=dtype,
)
problem = fitter.build_problem(samples, batch_size=32, cache_directory=cache_dir)
posterior = fit_linear_uncertainty_model(
    model,
    samples,
    fitter=fitter,
    problem=problem,
    refit_mean=False,
)
posterior.save_memmap(posterior_dir)
```

Passing a prebuilt `problem` is useful when a workflow needs to inspect row counts, train an aleatoric noise head, or reuse a cached design assembly without triggering another `LinearFitter.build_problem` call.

## Sparse Prediction Rows

Prediction variances use sparse rows and the dense coefficient covariance:

```python
from ufp.uncertainty import (
    combine_total_energy_rows,
    predict_with_uncertainty,
    variance_for_energy_row,
)

prediction_a = predict_with_uncertainty(
    model,
    atoms_a,
    posterior,
    fitter=fitter,
    return_rows=True,
)
prediction_b = predict_with_uncertainty(
    model,
    atoms_b,
    posterior,
    fitter=fitter,
    return_rows=True,
)
delta_row = combine_total_energy_rows(
    [
        (prediction_a.rows.total_energy_row, 1.0),
        (prediction_b.rows.total_energy_row, -1.0),
    ]
)
delta_variance = variance_for_energy_row(delta_row, posterior)
```

Atomic energy and force-component variances are computed from row diagonals. UFP does not materialize an atom-by-atom covariance matrix.
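
As a minimal sketch of the arithmetic behind these row variances (not UFP's implementation), assume a sparse row is a mapping from coefficient index to value and the posterior covariance is a dense matrix `cov`. The variance of a linear prediction `r · c` is then `rᵀ cov r`, and only the covariance entries at the row's nonzero indices are touched. The helper names `sparse_row_variance` and `combine_rows` and the toy numbers are hypothetical; a real implementation would use array gathers rather than Python loops.

```python
def sparse_row_variance(row, cov):
    """Return r^T cov r for a sparse row given as {index: value}."""
    return sum(
        vi * cov[i][j] * vj
        for i, vi in row.items()
        for j, vj in row.items()
    )

def combine_rows(rows_and_weights):
    """Weighted sum of sparse rows, returned as a new {index: value} row."""
    combined = {}
    for row, weight in rows_and_weights:
        for index, value in row.items():
            combined[index] = combined.get(index, 0.0) + weight * value
    return combined

# Toy posterior covariance over three coefficients (symmetric).
cov = [
    [2.0, 0.5, 0.0],
    [0.5, 1.0, 0.2],
    [0.0, 0.2, 3.0],
]
row_a = {0: 1.0, 1: 2.0}  # energy row for a hypothetical structure A
row_b = {1: 1.0, 2: 1.0}  # energy row for a hypothetical structure B

delta_row = combine_rows([(row_a, 1.0), (row_b, -1.0)])
var_a = sparse_row_variance(row_a, cov)
var_b = sparse_row_variance(row_b, cov)
var_delta = sparse_row_variance(delta_row, cov)
```

In this toy case `var_delta` is smaller than `var_a + var_b`, because the two rows share a coefficient and their predictions are correlated. Combining rows before evaluating the variance accounts for that correlation; adding the two variances separately would not.
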
Use `variance_for_sparse_rows(rows, posterior, chunk_size=...)` when evaluating many atomic or force rows; it batches dense-covariance gathers without forming the full row covariance.

## Alchemical Models

`fit_alchemical_uncertainty_model` freezes non-identity alchemical mixing weights, then builds one fixed-weight direct/proxy posterior. The posterior covers direct coefficients and proxy coefficients; mixing weights remain point estimates.

## Aleatoric Noise

`SplineAleatoricNoiseModel` is a positive spline variance head using `softplus(raw) + variance_floor`. The V2 path uses `SplineAleatoricNoiseBundle`, which stores separate optional heads for structure energy per atom, per-atom energy decomposition, and force components. The default `AleatoricFeatureSpec(kind="log_num_atoms")` evaluates heads from `log1p(n_atoms)`, so prediction-time aleatoric variances can vary by structure size without rebuilding the least-squares design matrix.

Pass an initialized bundle to `fit_linear_uncertainty_model` or `fit_alchemical_uncertainty_model`:

```python
from ufp.uncertainty import (
    SplineAleatoricNoiseBundle,
    SplineAleatoricNoiseModel,
    save_uncertainty_prediction_bundle,
)

noise_bundle = SplineAleatoricNoiseBundle(
    energy_per_atom=SplineAleatoricNoiseModel(...),
    force_component=SplineAleatoricNoiseModel(...),
)
posterior = fit_linear_uncertainty_model(
    model,
    samples,
    fitter=fitter,
    aleatoric_noise_bundle=noise_bundle,
    aleatoric_steps=200,
)
save_uncertainty_prediction_bundle(
    bundle_dir,
    model=linearized_model,
    posterior=posterior,
    aleatoric_noise_bundle=noise_bundle,
)
```

`make_predictions.py --uncertainty-bundle ...` evaluates the serialized bundle for each structure. It writes energy and per-atom aleatoric arrays by default, and force-component aleatoric arrays when `--uncertainty-forces` is supplied. The older scalar `aleatoric_variance` bundle field is still supported for backward-compatible prediction files.
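
To make the `softplus(raw) + variance_floor` construction concrete, here is a minimal sketch evaluated on the `log1p(n_atoms)` feature. It is not the UFP implementation: the linear `raw_head` is a hypothetical stand-in for a fitted spline, and the function names and floor value are assumptions chosen for illustration.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def aleatoric_variance(n_atoms, raw_head, variance_floor=1.0e-8):
    """Strictly positive variance from an unconstrained head of log1p(n_atoms)."""
    feature = math.log1p(n_atoms)
    return softplus(raw_head(feature)) + variance_floor

def raw_head(feature):
    """Hypothetical stand-in for a fitted spline; any real-valued function works."""
    return -3.0 + 0.5 * feature

# Variances for structures of increasing size.
variances = [aleatoric_variance(n, raw_head) for n in (1, 16, 256)]
```

Because `softplus` maps any real number to a positive one, the head can be trained without constraints while the output stays above `variance_floor`. The feature depends only on the atom count, which is why such a head can be evaluated at prediction time without any least-squares design data.
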
## Prediction Bundles

Use `save_uncertainty_prediction_bundle` to persist the model, posterior memmap, posterior layout, optional aleatoric noise bundle, optional calibration state, and manifest needed for standalone prediction:

```python
from ufp.uncertainty import save_uncertainty_prediction_bundle

save_uncertainty_prediction_bundle(
    bundle_dir,
    model=linearized_model,
    posterior=posterior,
    source_checkpoint=checkpoint_path,
    aleatoric_noise_bundle=noise_bundle,
)
```

The bundle is a model artifact, not a least-squares cache. Prediction with `examples/make_predictions.py --uncertainty-bundle bundle_dir` does not need the training-set design cache; that cache only accelerates posterior fitting. The manifest records hashes for the model checkpoint, posterior files, serialized aleatoric artifacts, and calibration files when present, so stale bundle members are rejected on load.

Use `examples/inspect_uncertainty_bundle.py bundle_dir` to print schema version, posterior size/layout, aleatoric state, energy variance scale, source checkpoint metadata, and validation status.

## Variance Scaling

Calibration can fit a post-hoc multiplicative energy variance scale from prediction files:

```sh
python examples/calibrate_uncertainty.py \
  predictions_holdout.npz \
  --fit-energy-scale \
  --save-scale-to-bundle path/to/uncertainty_bundle
```

When a bundle has a saved energy scale, `make_predictions.py` applies it to energy epistemic, aleatoric, total, per-atom standard-deviation, and per-atom energy variance arrays. Force variance scaling is not applied in V2; force calibration is diagnostic-only.
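
As a rough illustration of what a post-hoc multiplicative variance scale does, the sketch below uses the Gaussian maximum-likelihood estimate: the mean of squared residual divided by predicted variance. This is an assumption about one common choice; the estimator used by `calibrate_uncertainty.py` is not stated here and may differ. The function name and the holdout numbers are hypothetical.

```python
def fit_variance_scale(residuals, variances):
    """Scale s minimising the Gaussian NLL when each variance becomes s * variance."""
    ratios = [r * r / v for r, v in zip(residuals, variances)]
    return sum(ratios) / len(ratios)

# Made-up holdout residuals (prediction minus reference) and predicted variances.
residuals = [0.2, -0.1, 0.4, -0.3]
variances = [0.01, 0.01, 0.04, 0.0225]

scale = fit_variance_scale(residuals, variances)
scaled_variances = [scale * v for v in variances]

# After scaling, the mean squared normalized residual is 1 by construction.
mean_normalized = sum(
    r * r / v for r, v in zip(residuals, scaled_variances)
) / len(residuals)
```

A scale above 1 means the unscaled variances were too small on the holdout set (overconfident), and a scale below 1 means they were too large. A single scalar cannot repair variances whose relative ordering is wrong, which is why the calibration diagnostics remain useful after scaling.
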
## Calibration

After writing uncertainty-enabled prediction files, run calibration diagnostics on the split `.npz` outputs:

```sh
python examples/calibrate_uncertainty.py \
  examples/02-tungsten/tungsten_holdout_predictions.npz \
  --plot-dir examples/02-tungsten/uncertainty_plots
```

When `examples/make_predictions.py` is run with `--uncertainty-bundle`, it prints the matching `examples/calibrate_uncertainty.py` command after writing prediction files.

The calibration helper compares per-atom energy residuals with predicted per-atom energy standard deviations derived from `energy_total_variance` by default. It reports Gaussian NLL, normalized residual mean/std, empirical coverage at common nominal intervals, a calibration slope, and the correlation between absolute residual and predicted standard deviation. Use `--variance-key energy_epistemic_variance` to inspect epistemic-only calibration. Add `--include-forces` to compute the same diagnostics for `force_total_variance_components` when prediction files were written with `make_predictions.py --uncertainty-forces`.

`examples/plot_prediction_density.py --with-uncertainty` can also write energy and force calibration plots next to the usual density plots for prediction files that contain uncertainty arrays.

## Minimal Alchemical Example

`examples/alchemical_uncertainty_demo.py` is a small synthetic fixed-weight alchemical example. It fits an alchemical proxy posterior, saves a reusable bundle, reloads it, and verifies prediction uncertainty without requiring an external dataset.

## Li-P-S Alchemical Example

`examples/05-lips/alchemical_uncertainty.py` is the real alchemical uncertainty workflow.
It loads `examples/05-lips/lips_alchemical_uf23_model_lstsq.pt`, builds a fixed-weight direct/proxy posterior without rerunning ALS by default, saves a reusable bundle under `examples/05-lips/uncertainty_models/`, and prints follow-on `make_predictions.py --uncertainty-bundle ...` and `calibrate_uncertainty.py --include-forces ...` commands. Use `--max-training-structures`, `--max-prediction-structures`, and `--aleatoric-steps` for bounded smoke runs.

## Tungsten Example

`examples/02-tungsten/uf23_constrained_wall_uncertainty_demo.py` demonstrates the full workflow on the constrained-wall tungsten training checkpoint. It loads `uf23_constrained_wall_training_best.pt`, converts the constrained wall into an equivalent ordinary spline pair term, fits the posterior over the resulting coefficient-linear proxy, saves a reusable uncertainty bundle under `uncertainty_models/`, and saves an ignored `.npz` summary of holdout uncertainties. The script prints the follow-on `make_predictions.py` and `calibrate_uncertainty.py` commands for full prediction-file calibration.