# Benchmark Ownership

`ufp.benchmarks` is an expert benchmark API for speed gates, benchmark automation, and performance investigations. Benchmarks are not part of default CI, but they are the acceptance layer for refactors that may affect runtime behavior. Smoke tests in `tests/speed/test_benchmarking.py` protect entry points and result shape; benchmark runs protect relative timing.

## Quick Smoke Checks

Run these after changing benchmark modules or public benchmark exports:

```sh
python -m pytest tests/speed/test_benchmarking.py
```

Run speed gates after touching three-body evaluation, least-squares assembly, block matrices, or cache warming:

```sh
python -m pytest tests/speed
tox -e speed
```

## Area To Benchmark Map

| Refactor area | Required checks |
| --- | --- |
| Pair and two-body term evaluation | Two-body tests, training tests that use pair terms, and speed gates. |
| Three-body bucketing or evaluator dispatch | `tests/terms/test_threebody_*.py`, `tests/leastsquares/test_periodic_assembly.py`, speed gates, and a dynamic three-body benchmark comparison. |
| Three-body feature caches or memmap caches | Three-body cache reuse tests, training cache tests, speed gates, and a three-body cache benchmark comparison. |
| Least-squares assembly or block matrices | Least-squares periodic tests, alchemical tests, speed gates, and least-squares-vs-training benchmark comparison. |
| Training batch caching | Training tests, workflow example tests, and speed gates. |
| Runtime backend option parsing | Three-body tests, least-squares periodic tests, benchmark smoke tests, and explicit environment override tests. |
| Examples and docs only | Docs build or targeted example tests; no benchmark is required unless executable workflow code changes. |

## Benchmark Commands

Least-squares versus training toy benchmark:

```sh
python -m ufp.benchmarks._leastsquares_vs_training --scenario triangle_pair_threebody --device cpu --dtype float64 --training-epochs 4 --cg-checkpoints 1,2,3,4
```

Named A/B checkpoints for least-squares and training:

```sh
python -m ufp.benchmarks._leastsquares_vs_training --scenario pair_only --checkpoint baseline --checkpoint cached_neighbor_lists --device cpu
```

Three-body dynamic and cache benchmarks currently expose Python entry points. Use a short script when comparing backends or refactors:

```python
from ufp.benchmarks import (
    run_threebody_cache_benchmark,
    run_threebody_dynamic_breakdown_benchmark,
)

print(
    run_threebody_dynamic_breakdown_benchmark(
        scenario="ternary_alloy",
        backend="torch",
        device="cpu",
        dtype="float64",
        repeats=20,
        warmup=5,
    )
)
print(
    run_threebody_cache_benchmark(
        scenario="ternary_alloy",
        backend="torch",
        device="cpu",
        dtype="float64",
        repeats=20,
        warmup=5,
    )
)
```

If native C++ or CUDA kernels are part of the change, build the extension first and repeat the relevant commands with `backend="native"` or CUDA devices where available.

## Acceptance Rules

- Compare against the same machine, device, dtype, scenario, and repeat counts.
- Keep correctness tests as the first gate. A faster run with changed numerical behavior is not accepted.
- Treat `tests/speed/` as a protected contract. Do not relax a gate as part of a refactor unless the team has agreed that the measured workload or hardware assumption changed.
- For CPU-only refactors, CPU benchmark parity is sufficient unless the code also changes CUDA dispatch or tensor-device movement.
- For backend dispatch refactors, verify both available and unavailable native extension paths so fallback behavior remains explicit.
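The first acceptance rule can be enforced mechanically before any timings are compared. The sketch below is one possible shape for such a check, not part of `ufp.benchmarks`: `RunConfig`, `compare_runs`, and the `max_slowdown` tolerance are all hypothetical names introduced here, and the 10% default is a placeholder rather than a project threshold. It assumes each run produces a plain list of per-repeat timings in seconds.

```python
from dataclasses import asdict, dataclass
from statistics import median


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical record of the settings one benchmark run used."""

    machine: str
    device: str
    dtype: str
    scenario: str
    repeats: int


def compare_runs(base_cfg, base_times, cand_cfg, cand_times, max_slowdown=1.10):
    """Return (ratio, accepted) for two runs with identical settings.

    max_slowdown is a placeholder tolerance, not a project threshold.
    """
    base, cand = asdict(base_cfg), asdict(cand_cfg)
    mismatched = sorted(k for k in base if base[k] != cand[k])
    if mismatched:
        # Refuse to compare timings taken under different settings.
        raise ValueError(f"runs are not comparable; differing fields: {mismatched}")
    # Medians are less sensitive to a single slow repeat than means.
    ratio = median(cand_times) / median(base_times)
    return ratio, ratio <= max_slowdown


# Illustrative values only; these are not measured results.
baseline = RunConfig("host-a", "cpu", "float64", "ternary_alloy", 20)
ratio, accepted = compare_runs(
    baseline, [1.00, 1.02, 0.98], baseline, [1.04, 1.05, 1.06]
)
print(f"candidate/baseline = {ratio:.3f}, accepted = {accepted}")
```

Raising on a settings mismatch, rather than returning a warning, keeps a cross-machine or cross-dtype comparison from being read as a pass. Correctness tests still come first; this check only guards the timing comparison.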