Benchmark Ownership
`ufp.benchmarks` is an expert benchmark API for speed gates, benchmark automation, and performance investigations. Benchmarks are not part of default CI, but they are the acceptance layer for refactors that may affect runtime behavior. Smoke tests in `tests/speed/test_benchmarking.py` protect entry points and result shape; benchmark runs protect relative timing.
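Relative timing can be read as comparing a candidate against a baseline measured in the same session, rather than against an absolute number. The following is a minimal sketch that uses only the standard library's `timeit` and assumes two zero-argument callables. The helper names `best_time` and `relative_timing` are hypothetical and are not part of `ufp.benchmarks`.

```python
import timeit
from typing import Callable


def best_time(fn: Callable[[], object], repeats: int = 5, number: int = 10) -> float:
    """Best per-call time in seconds over `repeats` timing rounds."""
    # timeit.repeat returns one total per round; each round calls fn `number` times.
    totals = timeit.repeat(fn, repeat=repeats, number=number)
    return min(totals) / number


def relative_timing(
    baseline: Callable[[], object],
    candidate: Callable[[], object],
    repeats: int = 5,
    number: int = 10,
) -> float:
    """Ratio candidate/baseline; values below 1.0 mean the candidate is faster."""
    return best_time(candidate, repeats, number) / best_time(baseline, repeats, number)
```

Taking the fastest round is a common choice because background noise can only slow a run down. The median is a reasonable alternative when the workload itself varies between calls.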
Quick Smoke Checks
Run these after changing benchmark modules or public benchmark exports:

```bash
python -m pytest tests/speed/test_benchmarking.py
```
Run the speed gates after touching three-body evaluation, least-squares assembly, block matrices, or cache warming:

```bash
python -m pytest tests/speed
tox -e speed
```
Area-to-Benchmark Map
| Refactor area | Required checks |
|---|---|
| Pair and two-body term evaluation | Two-body tests, training tests that use pair terms, and speed gates. |
| Three-body bucketing or evaluator dispatch | |
| Three-body feature caches or memmap caches | Three-body cache reuse tests, training cache tests, speed gates, and a three-body cache benchmark comparison. |
| Least-squares assembly or block matrices | Least-squares periodic tests, alchemical tests, speed gates, and a least-squares-vs-training benchmark comparison. |
| Training batch caching | Training tests, workflow example tests, and speed gates. |
| Runtime backend option parsing | Three-body tests, least-squares periodic tests, benchmark smoke tests, and explicit environment override tests. |
| Examples and docs only | Docs build or targeted example tests; no benchmark is required unless executable workflow code changes. |
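When automating this checklist (for example, in a PR bot), the table can be encoded as data. The sketch below assumes short, hypothetical area keys that are not `ufp` identifiers. The check names are copied from the table, and the row with a blank cell maps to an empty tuple.

```python
# Hypothetical encoding of the area-to-benchmark table; keys are illustrative, not ufp names.
REQUIRED_CHECKS: dict[str, tuple[str, ...]] = {
    "pair_terms": ("two-body tests", "training tests that use pair terms", "speed gates"),
    "threebody_dispatch": (),  # the table leaves this cell blank
    "threebody_caches": (
        "three-body cache reuse tests",
        "training cache tests",
        "speed gates",
        "three-body cache benchmark comparison",
    ),
    "leastsquares_assembly": (
        "least-squares periodic tests",
        "alchemical tests",
        "speed gates",
        "least-squares-vs-training benchmark comparison",
    ),
    "training_batch_caching": ("training tests", "workflow example tests", "speed gates"),
    "backend_option_parsing": (
        "three-body tests",
        "least-squares periodic tests",
        "benchmark smoke tests",
        "explicit environment override tests",
    ),
    "docs_only": ("docs build or targeted example tests",),
}


def checks_for(areas: list[str]) -> list[str]:
    """Union of required checks for the touched areas, in first-seen order."""
    ordered: list[str] = []
    for area in areas:
        for check in REQUIRED_CHECKS[area]:  # an unknown area raises KeyError
            if check not in ordered:
                ordered.append(check)
    return ordered
```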
Benchmark Commands
Least-squares versus training toy benchmark:

```bash
python -m ufp.benchmarks._leastsquares_vs_training --scenario triangle_pair_threebody --device cpu --dtype float64 --training-epochs 4 --cg-checkpoints 1,2,3,4
```
Named A/B checkpoints for least-squares and training:

```bash
python -m ufp.benchmarks._leastsquares_vs_training --scenario pair_only --checkpoint baseline --checkpoint cached_neighbor_lists --device cpu
```
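This page does not describe the output format of `_leastsquares_vs_training`. The following sketch therefore assumes you have already collected wall-clock samples for each checkpoint name yourself. `speedups` is a hypothetical helper that uses medians to report each named checkpoint relative to `baseline`.

```python
from statistics import median


def speedups(samples: dict[str, list[float]], baseline: str = "baseline") -> dict[str, float]:
    """Median-based speedup of each checkpoint versus `baseline`.

    `samples` maps a checkpoint name to wall-clock times in seconds.
    Values above 1.0 mean the checkpoint ran faster than the baseline.
    """
    base = median(samples[baseline])
    return {
        name: base / median(times)
        for name, times in samples.items()
        if name != baseline
    }
```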
Three-body dynamic and cache benchmarks currently expose Python entry points. Use a short script when comparing backends or refactors:
```python
from ufp.benchmarks import (
    run_threebody_cache_benchmark,
    run_threebody_dynamic_breakdown_benchmark,
)

# Keep scenario, backend, device, dtype, repeats, and warmup identical
# between baseline and candidate runs so the results are comparable.
print(
    run_threebody_dynamic_breakdown_benchmark(
        scenario="ternary_alloy",
        backend="torch",
        device="cpu",
        dtype="float64",
        repeats=20,
        warmup=5,
    )
)
print(
    run_threebody_cache_benchmark(
        scenario="ternary_alloy",
        backend="torch",
        device="cpu",
        dtype="float64",
        repeats=20,
        warmup=5,
    )
)
```
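The `repeats` and `warmup` arguments follow a common benchmarking pattern. Untimed warm-up calls absorb one-time costs, and the timed repeats are then summarized. The sketch below shows that general pattern using only the standard library. It is not the `ufp.benchmarks` implementation, whose timing method and result format may differ. The names `time_with_warmup` and `summarize` are hypothetical. On CUDA devices, a real harness would also need to synchronize the device before reading the clock, because kernel launches are asynchronous.

```python
import statistics
import time
from typing import Callable


def time_with_warmup(fn: Callable[[], object], repeats: int = 20, warmup: int = 5) -> list[float]:
    """Call `fn` `warmup` times untimed, then return `repeats` timings in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        fn()  # absorbs one-time costs such as lazy initialization and cache fills
    samples: list[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: list[float]) -> dict[str, float]:
    """Median plus range; the median is robust to occasional slow outliers."""
    return {
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
    }
```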
If native C++ or CUDA kernels are part of the change, build the extension first and repeat the relevant commands with `backend="native"` or CUDA devices where available.
Acceptance Rules
- Compare against the same machine, device, dtype, scenario, and repeat counts.
- Keep correctness tests as the first gate. A faster run with changed numerical behavior is not accepted.
- Treat `tests/speed/` as a protected contract. Do not relax a gate as part of a refactor unless the team has agreed that the measured workload or hardware assumption changed.
- For CPU-only refactors, CPU benchmark parity is sufficient unless the code also changes CUDA dispatch or tensor-device movement.
- For backend dispatch refactors, verify both the available and unavailable native-extension paths so fallback behavior remains explicit.
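The rules on comparability and on putting correctness first can be combined into a single acceptance check. The following is a minimal sketch with hypothetical names (`RunConfig`, `accept`). Its tolerances are placeholders, not project thresholds.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical record of the settings that must match between runs."""
    machine: str
    device: str
    dtype: str
    scenario: str
    repeats: int


def accept(
    baseline_cfg: RunConfig,
    candidate_cfg: RunConfig,
    baseline_values: Sequence[float],
    candidate_values: Sequence[float],
    baseline_seconds: float,
    candidate_seconds: float,
    rel_tol: float = 1e-12,  # placeholder numerical tolerance
    max_time_ratio: float = 1.05,  # placeholder noise allowance, not a project gate
) -> bool:
    # Comparability: machine, device, dtype, scenario, and repeats must all match.
    if baseline_cfg != candidate_cfg:
        raise ValueError("runs are not comparable: configurations differ")
    # Correctness first: any numerical change rejects the candidate, however fast.
    if len(baseline_values) != len(candidate_values):
        return False
    if not all(
        math.isclose(b, c, rel_tol=rel_tol)
        for b, c in zip(baseline_values, candidate_values)
    ):
        return False
    # Only then compare timing.
    return candidate_seconds <= baseline_seconds * max_time_ratio
```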