Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
[0.6.0] - 2026-05-18
Migration notes
- InferenceConfig.provenance_fault_threshold is deprecated but still accepted
as an alias for numerical_hazard_threshold in strict-provenance
construction. Finite tiny denominators are no longer folded into
fault_mask. The threshold is still honored, but only as monitor-only
metadata: it never contributes to bottom_mask, fault_mask, or
semantic_bottom_mask unless a consumer separately promotes that diagnostic.
- Previously exported bundles are validated under their recorded schema and
metadata, including strict_inference_schema_version,
strict_inference_exports, and any inference-output schema sidecar; they are
not silently reinterpreted under the new hardened schema.
- New hardening-release opt-ins are explicit: SimplificationMode with
simplification_mode="field_rational", experimental structural validity
factors for public FRU provenance, soft_coverage_loss(...),
lift_semantic_targets(...), monitor-only numerical_hazard_threshold
diagnostics, and explicit GaugePolicy / ProjectiveNormalize(...)
projective gauge helpers.
- New hardening-release raises/refusals are explicit: non-finite strict P/Q
payloads and validity factors bottom through isfinite into fault_mask;
bottom-capability mismatches raise unless allow_bottom_unreachable=True;
strict FRU flattening refuses configured depth/degree-bound violations rather
than using field-rational rescue; and strict cancellation refuses unsafe
symbolic factor cancellation without a proven or declared nonzero assumption.
- Most hardening-release changes are additive, but strict-mode flattening may
change masks at singular edge cases and may refuse expressions that
field-rational simplification previously accepted.
- Known workflows to re-check when upgrading to v0.6.0:
examples/fru_strict_check_demo.py, the RR-IK reference deployment, and the
DOSE matrix/artifact path. Strict-mode flattening, bottom-capability checks,
and fail-closed masks can alter singular-edge behavior; DOSE and
reference-robotics workflows already consume fault/semantic split routing in
v0.6.x and benefit from the promotion with no opt-in flag in the hardened
release.
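The deprecated-alias behavior described in the first migration note can be pictured with a small stand-in. This is a minimal sketch, not the library's InferenceConfig: the class name SketchInferenceConfig is hypothetical, and both the DeprecationWarning and the conflict check are assumptions about how such an alias is commonly handled. Only the two field names come from the note above.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchInferenceConfig:
    """Hypothetical stand-in; NOT the real zeroproofml InferenceConfig."""

    numerical_hazard_threshold: Optional[float] = None
    provenance_fault_threshold: Optional[float] = None  # deprecated alias

    def __post_init__(self) -> None:
        if self.provenance_fault_threshold is None:
            return
        # Assumption: the alias warns. The changelog only says it is
        # "deprecated but still accepted".
        warnings.warn(
            "provenance_fault_threshold is deprecated; "
            "use numerical_hazard_threshold",
            DeprecationWarning,
            stacklevel=2,
        )
        if self.numerical_hazard_threshold is None:
            # Forward the legacy value to the new field.
            self.numerical_hazard_threshold = self.provenance_fault_threshold
        elif self.numerical_hazard_threshold != self.provenance_fault_threshold:
            # Assumption: conflicting values are rejected rather than merged.
            raise ValueError("conflicting threshold values")


# Legacy call sites keep working, but surface a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = SketchInferenceConfig(provenance_fault_threshold=1e-6)

# New call sites use the new name and stay silent.
with warnings.catch_warnings(record=True) as caught_modern:
    warnings.simplefilter("always")
    modern = SketchInferenceConfig(numerical_hazard_threshold=1e-6)
```

In this sketch both objects end up with the same numerical_hazard_threshold, so downstream monitoring code only needs to read the new field.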
Changed
- Made sign_consistency_loss and SCMTrainingLoss singular-only by default:
when mask_singular is omitted, the singular mask is derived as abs(Y_d) <= epsilon_sing.
- Added an opt-in reduction="conditional" mode to margin_loss while
keeping the default population-style masked batch mean.
- Added a gauge-policy design-review note comparing
canonical_denominator, unit_l2_projective, and
angular_unit_circle without changing strict-inference behavior.
- Added the public GaugePolicy enum and opt-in ProjectiveNormalize helper
for canonical-denominator, unit-L2 projective, and angular unit-circle gauge
conventions.
- Recorded canonical_denominator as the default gauge policy.
- Clarified that current post-hoc tau_infer sweeps over cached |Q|
distributions remain gauge-correct for the specific head and magnitude
convention that produced those distributions.
- Added a monitor-only numerical_hazard_rate axis to
StrictInferenceMonitor / strict_inference_rates(...) and documented that
fault_rate, semantic_bottom_rate, and finite-hazard alerts should be
thresholded separately.
- Clarified that frequent fault_mask triggers from strict-flattened heads are
an implementation-hygiene signal rather than the normal semantic-bottom path,
while intentional IEEE-arithmetic heads may use fault_mask as the correct
fail-closed bottom path.
- Promoted eager strict-inference provenance outputs to stable result attributes (fault_mask, semantic_bottom_mask, and bottom_provenance) while preserving three-field unpacking as (decoded, bottom_mask, gap_mask).
- Recorded the Q2 provenance gate as superseded for the v0.6.x fault/semantic mask set; its criteria are not retroactively re-applied to the now-stable fault_mask, semantic_bottom_mask, and bottom_provenance result attributes.
- Renamed the provenance bundle metadata sidecar to inference_output_schema; the old experimental_inference_output_schema key remains a deprecated alias that validates with DeprecationWarning.
- Made bundle smoke-test parity compare decoded values only on non-bottom entries after validating bottom_mask, keeping bottom payload sentinels non-semantic.
- Added a deployment-doc warning that consumers must honor bottom_mask instead of treating decoded NaN bottom payloads as semantic data.
- Added a bad-consumer contract test covering decoded-only, bottom-mask-only, and fault-mask-only strict-inference misuse.
- Made strict inference fail closed for all non-finite P/Q payloads and route those bottoms to fault_mask.
- Made censored-direction decoding preserve fault-like bottom orientation only through explicit finite orientation side channels or direction heads, not IEEE-infinity scalar signs.
- Preserved strict SCM fracterm flattening bottom behavior by keeping default flattening uncancelled and carrying divisor denominator factors through division.
- Added the public SimplificationMode literal alias for strict SCM versus unsafe field-rational fracterm/FRU simplification modes.
- Gated field-rational fracterm simplification behind explicit unsafe simplification_mode="field_rational".
- Documented field_rational as an explicit opt-in that is unsafe for strict bottom-preserving semantics.
- Threaded fracterm/FRU simplification mode through internal flattening helpers so strict pipelines reject unsafe field-rational simplifiers before flattening starts.
- Documented and tested common-meadow anchor identities as semantic checks rather than required emitted fracterm normal forms.
- Added explicit strict-preserving target tags for pure strict Fracterm artifacts and guarded strict FlattenedFRU artifacts.
- Allowed pure strict Fracterm zero-numerator collapse when a denominator is covered by explicit nonzero domain assumptions.
- Limited strict fracterm cancellation to safe numeric constants and assumption-covered monomial factors, while guarded FRU cancellation requires retained validity factors.
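The consumer warning in the entries above (honor bottom_mask instead of reading decoded NaN payloads as data) is easy to get wrong in aggregation code. The following is a minimal pure-Python sketch with plain lists standing in for tensors; the helper name masked_mean is hypothetical and is not part of the library.

```python
import math


def masked_mean(decoded, bottom_mask):
    """Average only the entries that bottom_mask does not flag as bottom."""
    if len(decoded) != len(bottom_mask):
        raise ValueError("decoded and bottom_mask must have the same length")
    kept = [v for v, is_bottom in zip(decoded, bottom_mask) if not is_bottom]
    if not kept:
        # Every sample is bottom: there is no semantic value to report.
        return None
    return sum(kept) / len(kept)


# Bottom entries carry a NaN sentinel that has no semantic meaning.
decoded = [0.5, float("nan"), 1.5, float("nan")]
bottom_mask = [False, True, False, True]

naive = sum(decoded) / len(decoded)       # sentinel leaks in: result is NaN
safe = masked_mean(decoded, bottom_mask)  # mean of 0.5 and 1.5 only
```

The naive average is poisoned by the sentinel, while the masked version reports 1.0 for the two valid samples. The same pattern applies to parity checks, which compare decoded values only where the mask marks entries as non-bottom.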
[0.5.1] - 2026-04-30
Added
- Added a dedicated published-docs "Reproduce the Paper" page and linked it from the docs hub plus the experiments landing page.
- Added an examples inventory page that labels every file under examples/ as a quickstart, supported example, benchmark helper, or archival/experimental path.
- Added a promoted examples tutorials page that turns the maintained quickstart, coverage-control, bundle-export, and FRU strict-check scripts into docs-backed walkthroughs.
- Added examples/README.md and examples/robotics/README.md so the maintained examples, benchmark helpers, and archival robotics/Transreal paths are clearly separated in-tree.
- Added examples/deployment/ with docs-backed end-to-end workflows for bundle runtime, reference robotics fallback routing, benchmark report regeneration, and ROS 2 launch.
- Added focused docs guides for choosing tau_infer, namespace imports, masks/provenance, deployment bundles, ROS 2 integration, and visualization/report generation.
- Bootstrapped the ROS 2 companion workspace with the first ament_python package at integrations/ros2/src/zeroproofml_ros, re-exporting the stable Python strict-inference and ONNX runtime helpers for the initial node path.
- Added the ROS 2 interface package integrations/ros2/src/zeroproofml_msgs with StrictInferenceResult.msg for decoded payloads, masks, bundle/schema metadata, threshold metadata, and optional provenance fields.
- Added the first ROS 2 strict_inference_node entry point, loading one startup-selected bundle, consuming Float64MultiArray inputs, publishing StrictInferenceResult, and emitting DiagnosticArray summaries through StrictInferenceMonitor.
- Added ROS 2 strict-inference QoS presets for low-latency control loops and reliable offline/batch replay.
- Added a ROS 2 Float64MultiArray visualization telemetry topic for Foxglove/PlotJuggler-friendly strict-inference mask, provenance, and fallback metrics.
- Added ROS 2 telemetry CSV exporters that flatten strict-inference diagnostics into stable named columns for PlotJuggler-friendly replay and ops traces.
- Added ROS 2 RViz marker helpers that convert workspace heatmap summaries into MarkerArray/CUBE_LIST overlays for RR-style debugging.
- Added RR IK RViz result overlays that render the current arm pose, requested displacement, and accepted end-effector target for single-sample spatial debugging.
- Added a managed ROS 2 lifecycle_strict_inference_node variant that reuses the strict inference bundle/message contract while loading bundles in on_configure and only processing batches after activation.
- Added the first RR IK ROS 2 launch/demo path, including a packaged rr_ik_strict_inference.launch.py, default RR topic/provider YAML, a sample Float64MultiArray payload, and an rr_ik_demo runtime smoke command for reference robotics bundles.
- Added an RR IK ROS 2 graph-composition launch example that runs the strict-inference node with one-shot sample publishing plus result and telemetry echo subscribers.
- Added a DOSE offline-batch ROS 2 launch path with packaged topic/provider YAML and a sample Float64MultiArray payload using offline_batch_replay QoS.
- Added deterministic ROS 2 rosbag-export replay fixtures and golden-output tests for the RR IK strict-inference beta path.
- Added local ROS 2 container recipes under integrations/ros2/containers/ plus a small manifest for Humble/Jammy and Jazzy/Noble CPU images that preinstall ZeroProofML, onnxruntime, and a built companion colcon workspace overlay.
- Added StrictInferenceMonitor.export_state(...) so deployment monitoring can export running counts/update coverage and optional per-batch histograms alongside aggregated rates.
- Added optional structured strict-inference event logging for bottom/gap triggers, fallback routing, and acceptance-rate drift via StrictInferenceMonitor and route_to_analytic_solver(...).
- Added load_onnx_runtime_bundle(...) / ONNXRuntimeBundle so validated ONNX export bundles can be opened through onnxruntime and still return the stable (decoded, bottom_mask, gap_mask) inference contract.
- Added run_bundle_reference_smoke_test(...), a pure-Python parity check that runs a smoke sample through both a wrapped model and its exported bundle and raises on decoded/mask drift.
- Added examples/cpp/minimal_bundle_consumer.cpp, a reference ONNX Runtime C++ consumer for the current stable merged-mask bundle contract.
- Added examples/cpp/zeroproofml_bundle.hpp, a header-only C++ wrapper around stable strict-inference ONNX bundles for robotics/embedded adopters that already use ONNX Runtime from C++.
- Added strict_inference_audit.json to reference deployment runs so operators get a versioned per-batch provenance audit sidecar alongside inference_summary.json.
- Documented the current fracterm flattening utility inventory, including the expression classes supported by zeroproofml.scm.fracterm and the coefficient-list-only boundary of FractermRationalUnit.flatten(...).
- Added an experimental FRU expression library in zeroproofml.layers.fru with typed nodes, local P/Q flattening, symbolic simplification hooks, denominator provenance, and bound validation through FractermRationalUnit.flatten_expression(...).
- Added FRU equivalence tests that compare composed rational expressions against their flattened P/Q outputs, including symbolic cancellation cases.
- Added a runnable FRU strict-check demo that composes multiple rational stages, flattens them into one checked P/Q pair, and feeds that pair into strict_inference(...).
- Added an experimental downstream pipeline simulator (zeroproofml.downstream_pipeline) with configurable stage boundaries and built-in 1-step, 3-step, and 5-step variants for composability studies.
- Added built-in bad-downstream toggles to the experimental downstream pipeline simulator, covering nan_to_num, missing-flag drops, clipping, default fills, JSON/CSV round-trips, and mean reductions.
- Added built-in downstream-pipeline comparison presets for scalar-only, abstention/uncertainty, strict-SCM, and strict-SCM-plus-direction-head composability baselines.
- Documented the examples/robotics/ inventory, marking legacy RR scripts as archive-only, rr_ik_dataset.py as a benchmark-input wrapper, and the 3R/6R paths as experimental examples.
- Added a formal DOSE experiment-matrix doc that maps the head/curriculum/balancing/direction-head follow-up rows onto the existing benchmark scripts.
- Added a compact REPRODUCIBILITY.md and linked it from the README plus the docs landing page.
- Recorded the published paper artifact DOI (10.5281/zenodo.18944465) in the paper archive metadata, reproducibility docs, and README.
- Added benchmark --skip-complete-seeds and --force-rerun controls for explicit reuse of existing run directories.
- Added scripts/ci/generate_license_report.py to emit a dependency license report for built wheels, including selected extras.
- Added a short design memo for the Q1 bottom-mask provenance decision, choosing a heuristic-first experimental rollout over an immediate strict-output contract change.
- Added an opt-in experimental provenance split for strict_inference(...) and SCMInferenceWrapper, exposing heuristic fault_mask / semantic_bottom_mask diagnostics without changing the stable three-field output contract.
- Added a dedicated guide for the experimental provenance representations, including how they refine bottom_mask while leaving gap_mask as a separate stable signal.
- Added a Q2 provenance decision memo that keeps the opt-in provenance path experimental because the published review-artifact and rerun gate has not been met.
- Added a synthetic RR trajectory-evaluation dataset generator (zeroproofml.reference_robotics_trajectory_data plus scripts/generate_reference_robotics_trajectory_data.py) that stratifies trajectories by workspace quadrant and singularity proximity.
- The RR trajectory dataset generator now also writes per-stratum stress-test subset JSONs keyed by workspace quadrant and |det(J)| bin.
- Added an importable RR trajectory-evaluation helper (zeroproofml.reference_robotics_trajectory_eval) so near-singularity policies can be replayed in closed loop and summarized by start singularity bin, fallback usage, and tracking error.
- The RR trajectory evaluator summary now also reports per-step fallback-rate traces, joint-limit/chattering events, and optional latency-budget violations for rollout-level robotics metrics.
- Added 2D/3D strict-inference mask-map helpers for scattering bottom/gap decisions over workspace or sample coordinates.
- Added plot_workspace_rate_heatmaps(...) to the experimental viz helpers so RR-style workspace points can be binned into bottom/gap/fallback heatmaps, with provenance-aware coloring for fault-vs-semantic bottom/fallback regions when available.
- Added plot_route_to_solver_overlay(...) and plot_detj_stratified_metrics(...) to the experimental viz helpers so robotics fallback routes/rejects and |det(J)|-bucketed rollout metrics can be visualized directly.
- Added fallback-route timeline and per-batch monitoring-summary viz helpers, with provenance tag series when batch artifacts include fault/semantic route or monitor breakdowns.
- Added provenance-aware coloring to plot_denominator_hist(...) so denominator histograms can split finite, fault, and semantic samples while retaining tau_infer / tau_train overlays.
- Added plot_tau_infer_sweep(...) and plot_tau_train_sweep(...) to the experimental viz helpers for threshold-sweep curves, including aggregated mean/std sweep artifacts.
- Added plot_safety_accuracy_pareto(...) to the experimental benchmark viz helpers for higher-is-better safety vs accuracy trade-off fronts.
- Added plot_confusion_matrix(...) and plot_categorical_reliability(...) to the experimental benchmark viz helpers for categorical-head diagnostics.
- Added a versioned zeroproofml.metric_log JSONL schema for trainer/eval metric logs, including canonical nested metrics plus flat-key compatibility for existing readers.
- Added JSONL metric-log aggregation helpers for multi-seed and multi-run summaries with source/run/seed context annotations.
- Added experimental metric-log wide/long row converters so dashboards and downstream tooling can flatten versioned JSONL logs without bespoke schema parsing.
- Added an optional interactive extra with Plotly-backed training-log report HTML, so python -m zeroproofml.report training-log ... can emit interactive metric traces without changing the default benchmark or bundle report dependencies.
- Validation reports now save a concise *.summary.json sidecar with bundle settings, provenance diagnostics, and any benchmark mask-rate/model-summary highlights.
- Validation reports now also pick up optional runtime/fallback summaries from deployment inference_summary.json artifacts plus DOSE calibration split provenance from aggregated/dose_operating_points.json when those sidecars are present.
- Added python -m zeroproofml.report to regenerate benchmark RUN_REPORT.md / optional RUN_REPORT.html from existing artifact directories.
- Extended python -m zeroproofml.report to accept deployment bundle directories, JSONL training logs, and paired benchmark baseline run directories as report inputs.
- Standard report generation now writes SVG summary figures for benchmark metrics, deployment validation summaries, and JSONL training logs; optional benchmark HTML embeds the generated figures.
- DOSE benchmark report regeneration now writes a domain-specific SVG figure pack for threshold sweeps, macro-F1/finite-MAE trade-offs, censor-direction confusion, assay-limit edge cases, and operating-point bottom provenance when the saved diagnostics provide those inputs.
- IK benchmark report regeneration now writes a robotics SVG figure pack for workspace heatmaps, |det(J)|-stratified error/fallback plots, analytic-fallback route maps, and fallback timelines when saved RR IK diagnostics provide those inputs.
Changed
- GitLab Pages publishes the latest main docs at the site root.
- ROS 2 strict-inference telemetry now exports explicit batch bottom/gap rates plus fallback_rate / fault_fallback_rate / semantic_fallback_rate fields so live Foxglove/PlotJuggler charts do not need client-side rate reconstruction.
- Strengthened tests/utils/test_viz.py so the experimental viz contract now checks saved artifacts, axis metadata, and non-empty rendered plot data.
- Added a small hash-based visual regression suite for representative threshold-sweep, confusion-matrix, workspace-heatmap, and Pareto plots so CI catches accidental plotting drift without relying on backend-specific golden images.
- Added a portable viz API contract test that locks the documented zeroproofml.utils.viz exports and call signatures without depending on backend rendering details.
- Kept optional extras lazy at import time by making the zeroproofml compatibility aliases and experimental viz helper exports resolve on first use instead of walking/importing child modules eagerly.
- GitLab CI now runs the ROS 2 companion workspace through colcon build, colcon test, and colcon test-result --verbose on both Humble/Jammy and Jazzy/Noble base images.
- Documented the ROS 2 layout decision: keep the first beta as an in-repo optional companion workspace under integrations/ros2/ instead of creating a separate repository up front.
- Documented the ROS 2 bundle-loading decision: the first beta node should load bundles from startup ROS params, with any reload service deferred until a later lifecycle-managed path.
- The ROS 2 beta now selects CycloneDDS (rmw_cyclonedds_cpp) as its first validated RMW across launch defaults, package metadata, container images, and CI, with a second RMW deferred until the beta path is stable.
- Documented the bundle-service decision: an optional minimal REST adapter around validated ONNX bundles is worthwhile, while gRPC remains deferred until a concrete protobuf/streaming need appears.
- Documented the Triton-style inference-server decision: treat Triton as an optional downstream recipe after ONNX Runtime bundle stability, not as a first-party runtime path yet.
- Documented ROS 2 Kilted as experimental/manual-only until the existing Humble/Jazzy beta coverage is stable, keeping the supported distro matrix aligned with the current CI/container scope.
- ROS 2 strict-inference diagnostics now expose fallback-routing telemetry fields when bundle metadata selects route_to_analytic_solver, including routed counts/rates and provenance-aware route-vs-reject breakdowns when available.
- ONNX bundle metadata.json now carries its own schema_name / schema_version declaration in addition to format_version and strict_inference_schema_version, while bundle validation remains backward-compatible with older sidecars.
- Documented the FRU flattening placement decision: keep training on the projective path, run symbolic flattening as a post-training analysis pass, and only reuse/revalidate that artifact at export time.
- Documented the FRU symbolic blow-up limits (2 ** (L - 1) * d, capped at 16 * d by default) and the cases where flattening should be refused instead of expanding unsupported or out-of-bounds heads.
- Experimental downstream-pipeline stage summaries now also report propagated reject fidelity, downstream decision accuracy, and a payload/flag-consistency calibration proxy for corruption stress tests.
- Experimental downstream-pipeline stage summaries now also split provenance fidelity by label, so fault-vs-semantic survival can be tracked per stage whenever provenance data is present.
- Experimental downstream-pipeline reports now render Markdown/HTML stage-loss summaries, calling out the exact stage where reject/provenance information is first lost and splitting fault-vs-semantic provenance fidelity when available.
- Experimental downstream-pipeline report writes now also emit DOWNSTREAM_PIPELINE_REPORT.json, capturing machine-readable per-stage sample snapshots alongside the Markdown/HTML artifacts.
- Documented the decision to keep examples/robotics/rrr_ik_* and ik6r_* as example-level workflows until they have a maintained zeroproofml.* surface, artifact contract, and CI coverage.
- Harmonized the experimental robotics example naming and CLI defaults so 3R now uses the rrr_ik artifact stem, 3R/6R dataset generators accept RR-style bucket flags, and the experiments doc records the shared convention.
- route_to_analytic_solver(...) now collapses batched invalid masks to per-sample routing decisions, coerces fallback outputs onto the decoded tensor dtype/device, and powers the reference robotics RR DLS fallback path.
- The robotics reference deployment now has an importable zeroproofml.reference_robotics_deployment API returning structured artifact paths, and scripts/reference_robotics_deployment.py is a thin wrapper over that module.
- The reference robotics deployment artifact API now also exposes the exported ONNX path, parsed bundle metadata, validation-report text, and a load_reference_robotics_deployment_artifacts(...) helper for completed run directories.
- Reference robotics deployment run directories now include a versioned output_contract.json, and the loader validates that contract while preserving legacy fixed-layout fallback for older runs.
- Reference robotics deployment summaries now record a provenance-backed comparison between merged-mask routing and provenance-aware routing, keeping fault-like bottoms on the analytic fallback path while rejecting semantic bottoms.
- Reference robotics deployment summaries now also emit hybrid_path_metrics, tracking fallback frequency plus finite-sample accuracy/runtime deltas for merged-mask and provenance-aware hybrid routing.
- Reference robotics deployment summaries now also compare merged-mask and provenance-aware hybrid routing against both the strict SCM-only path and an unconstrained decode baseline.
- Reference robotics deployment summaries now also emit provenance_routing_materiality, quantifying unsafe-accept / semantic-misroute reduction plus the coverage/runtime guardrails from the robotics provenance promotion gate.
- RF benchmark helper scripts now live under importable zeroproofml.benchmarks.domains.rf_* modules, with the scripts/rf/*.py entrypoints kept as deprecation-warning compatibility wrappers and the RF paper-suite staying in-process when it dispatches the per-seed runner.
- RF benchmark seed artifacts now stamp explicit synthetic-dataset metadata (dataset_name, dataset_version, dataset_generator) into data_config, and provenance fingerprints prefer that generator/version marker when present.
- RF benchmark seed artifacts now also stamp canonical split definitions plus artifact_naming, so the train/validation/test/extrapolation contract and checkpoint/result filenames are recorded in-band.
- RF benchmark schema parsing now preserves the domain-specific model_meta, artifact_naming, per-run train_log, and wall_time_s fields without changing the shared outer per-seed run layout.
- RF benchmark summaries now expose standardized peak-retention yield, hallucination-rate, in-band vs extrapolation error, coverage/strict-trigger, and denominator-minimum metrics from the per-seed RF artifacts.
- RF peak-clipping artifacts now include shared-denominator axis diagnostics covering minimum-frequency offset, near-minimum sweep width, and edge-hit rate when a model exposes Q(jw).
- RF seed directories now also emit rf_signal_traces.json with deterministic representative per-seed response traces for later figure regeneration.
- RF seed directories now also emit rf_frequency_response.svg, turning the saved RF trace sidecar into a first-class Bode-style artifact for debugging and paper figures.
- rf_frequency_response.svg now overlays peak annotations, model-specific strict-trigger bands, and shared-denominator minimum guides on the saved RF response traces.
- RF seed directories now also emit rf_qualitative_figure_pack/, packaging saved washout and invented-peak baseline examples into targeted SVGs plus manifest/README sidecars.
- Benchmark runs now emit a root-level RUN_REPORT.md, and the benchmark CLI/API can optionally add RUN_REPORT.html via --html-report / BenchmarkConfig.html_report.
- Standard report regeneration now avoids host-specific absolute paths, refreshes benchmark summary Markdown from stored JSON, uses deterministic paired-stat bootstrap seeds, and auto-discovers parent bundle benchmark/calibration sidecars.
- Standard benchmark reports now include a fault-vs-semantic bottom breakdown whenever provenance split rates are available.
- Standardized the DOSE benchmark metric contract: false-censored / false-in-range rates are now explicitly conditional on their target subsets, summaries expose accept_rate and gap_rate, and the benchmark docs record the canonical operating-point metric names.
- DOSE benchmark runs now emit aggregated/dose_operating_points.{json,md} to back the named safety_first / direction_aware / accuracy_first presets with recorded metrics and threshold values, while only surfacing balanced when it is actually distinct on the run.
- DOSE operating-point reports now also persist the deterministic calibration/evaluation split provenance per seed, so the sample selection behind each operating-point artifact is auditable.
- DOSE benchmark runs now emit seed_*/dose_diagnostics.json and aggregated/dose_diagnostics.json with per-seed plus aggregated confusion matrices, threshold sweeps, |Q| histograms, borderline examples, and censored-subset direction diagnostics.
- DOSE follow-up variants now record canonical config_id snapshots (variant_config.json / dirhead-only config) and the experiment-matrix doc lists the current IDs for reproducible plots and tables.
- DOSE nextsteps curriculum variants now serialize a reusable curriculum_schedule object inside variant_config.json, replacing duplicated one-off schedule assembly with named presets plus resolved lambda targets.
- DOSE nextsteps now include strict-SCM mirrors for the DirBalance and finite-MSE rows, plus a paired issue-separation catalog that labels optimization-only vs representation-only comparisons.
- Added explicit regression coverage for the DOSE mixed-objective angular_curriculum_fmse follow-up so CI now checks that the safe-censoring curriculum remains intact while the finite-MSE anchor is wired through end to end.
- InferenceConfig now owns experimental provenance controls, including opt-in mode selection, an optional fault-threshold override, and split-mask vs bottom_provenance representation selection while keeping the stable three-field inference contract unchanged by default.
- Experimental provenance results now keep the stable (decoded, bottom_mask, gap_mask) unpacking/export contract and surface richer diagnostics as backward-compatible attributes on the same object.
- The API reference stability map now marks opt-in provenance outputs and schema sidecars as experimental until the Q2 promotion decision, while keeping the core inference tuple contract stable.
- docs/06_inference_deployment.md now spells out the opt-in provenance experiment contract and the measurable Q2 promotion gate for any future stable contract change.
- ONNX bundle metadata.json now also records per-input/per-output tensor signatures, explicit batch-axis semantics, optional preprocessing/postprocessing IDs, optional normalization metadata, and a structured mask-semantics block for deployment consumers.
- StrictInferenceMonitor and strict_inference_rates(...) now report fault_rate and semantic_bottom_rate whenever experimental provenance diagnostics are supplied.
- decode_strict_censored_3way(...) now accepts optional experimental provenance diagnostics so DOSE-style direction heads can stay focused on semantic bottoms while fault-like bottoms fall back to sign(P).
- route_to_analytic_solver(...) now accepts optional experimental provenance diagnostics so robotics-style analytic fallback can skip semantic bottoms while still routing fault-like bottoms and other invalid samples.
- generate_validation_report(...) now shows experimental provenance schema details plus fault-vs-semantic bottom-rate breakdowns whenever bundle metadata or benchmark artifacts provide them.
- DOSE operating-point calibration now down-weights semantic bottoms relative to fault bottoms whenever experimental provenance split rates are present, so provenance-bearing runs can choose tau_infer with a finer-grained bottom-cost signal.
- Experimentally configured bundle sidecars now declare a versioned experimental_inference_output_schema so tooling can distinguish the split_masks and bottom_provenance diagnostic layouts without changing stable ONNX outputs.
- ONNX bundle metadata.json now declares strict_inference_exports so deployment tooling can distinguish current merged-mask bundles from future provenance-bearing output contracts.
- Clarified the Q1 bottom-mask provenance-source decision: use inference/diagnostic signals first and keep any model/training-time provenance path behind the Q2 evidence gate.
- Defined the bottom-mask provenance rollout stages explicitly in the design memo: Q1 opt-in diagnostics, Q2 promotion review, and an explicit post-Q2 disposition.
- Added measurable Q2 promotion criteria to the bottom-mask provenance design memo, including value thresholds, non-regression gates, and repeatability requirements.
- Added committed golden bundle/report fixtures for the strict inference export contract, with snapshot tests that fail on accidental metadata or validation-report drift.
- Added fixture-backed compatibility tests that validate bundles exported by the v0.4.2 and v0.4.3 release lines.
- Added focused inference bundle tests that round-trip export_bundle(...) metadata through the loader/validator and assert the expected bundle file structure.
- Added strict inference contract tests covering output ordering, merged bottom-mask provenance semantics, and bundle metadata/order validation.
- Added golden-fixture tests that snapshot the opt-in provenance result contract for strict_inference(...) and SCMInferenceWrapper.
- Added explicit wheel/sdist install smoke tests in isolated virtualenvs and wired them into the build:dist CI job.
- Build/release packaging now emits a CycloneDX SBOM from the built wheel and keeps Twine checks/uploads scoped to the actual distribution artifacts.
- Release checklist guidance now requires release notes to cite generated-artifact provenance (artifacts/paper_2026/manifest.json, run-local manifest.json / provenance.json) plus third-party dependency reports (*.sbom.cdx.json, *.licenses.json).
- Added a scheduled/manual GitLab security:vulnerability-audit job that runs pip-audit against the repo requirement manifests, and documented that live advisory scanning stays out of the tag-triggered release lane.
- Added a test:viz-extra GitLab CI lane that installs the viz extra and runs the optional plotting/logging smoke tests.
- Added a minimal test:jax-extra GitLab CI lane that installs the jax extra and runs focused JAX SCM smoke tests on CPU.
- Re-audited the API stability map in docs/08_api_reference.md to use canonical zeroproofml.* module names and capture the currently supported metrics, projective-rational, and benchmark entry points.
- Documented the visualization architecture decision: keep zeroproofml.utils.viz as experimental lightweight primitives, reserve higher-level reports for a separate layer, and keep zeroproof.utils.viz as a compatibility import.
- Split the experimental zeroproofml.utils.viz helpers into grouped training, strict_inference, benchmarks, and domains submodules while preserving the existing top-level imports.
- Documented the utility support boundary: zeroproofml.utils.logging now has a stable JSONL core plus experimental reporting conveniences, zeroproofml.utils.viz remains experimental, and zeroproof.* paths are called out as compatibility imports.
- Marked the experimental autodiff/logging/viz modules more prominently in docs and public docstrings.
- Marked zeroproofml.inference.script_module(...) as a legacy TorchScript compatibility helper in the API/deployment docs; ONNX remains the preferred deployment path.
- Package version metadata now comes from zeroproof/_version.py and both zeroproof / zeroproofml import paths use that shared source (removed hard-coded fallback strings).
- Documented the public namespace policy in the README and getting-started guide: zeroproofml is canonical, while zeroproof remains a supported compatibility namespace.
- Clarified that the zeroproof compatibility namespace remains supported through the M4 core 1.0-or-stay-0.x decision, with no planned namespace deprecation warning before a concrete migration plan exists.
- Documented the namespace layering rule for new code: SCM implementation stays under zeroproof.*, zeroproofml.* remains the canonical public import/docs surface, and zeroproofml.*-only packages are reserved for product-level entry points such as zeroproofml.benchmarks.
- Added explicit pytest coverage for the supported zeroproofml and zeroproof namespace import paths.
- Added a stable-surface API compatibility test table that keeps the documented zeroproofml.* exports aligned with their supported zeroproof.* compatibility imports.
- Synced the release version references across packaging/runtime/docs metadata (pyproject.toml, changelogs, CITATION.cff, and MkDocs version banner/footer fields).
- Added scripts/ci/check_version_sync.py and CI wiring so merges fail on version metadata drift (with release-tag validation when CI_COMMIT_TAG is set).
- Added an explicit release checklist in CONTRIBUTING.md covering namespace consistency, docs-link alignment, and install smoke checks.
- Documented the benchmark layout decision: scientific claim benchmarks stay in zeroproofml/benchmarks/, while the top-level performance suite is slated for perf/.
- Renamed GitLab benchmark jobs to claim-benchmark:* and clarified in the README/docs that the DOSE/RF/IK scientific claim benchmarks are separate from the performance microbenchmark suite.
- Benchmark domain adapters now tell the DOSE/RF paper-suite runners to skip their legacy aggregate summary write, so the canonical benchmark aggregated/summary.json no longer needs a summary_script.json backup.
- Benchmark domain dispatch now invokes the benchmark Python entrypoints in-process from zeroproofml.benchmarks.domains instead of shelling out with subprocess.
- Split the benchmark runners into importable domain modules under zeroproofml.benchmarks.domains.{dose,rf,ik} while keeping the existing runner API stable.
- The benchmark CLI now dispatches through public run_dose_benchmark(...), run_rf_benchmark(...), and run_ik_benchmark(...) helpers instead of assembling BenchmarkConfig directly.
- Corrected the CI coverage note: GitLab reports SCM-suite coverage and updates the badge, but does not enforce --cov-fail-under=90.
- GitLab CI now keeps merge-request benchmark jobs in smoke mode and exposes 10-seed paper-mode benchmark runs as scheduled/manual jobs on the default branch.
- GitLab CI now includes a scheduled/manual ONNX bundle compatibility job that runs tests/inference/test_export_compatibility.py with onnxruntime.
- GitLab CI now fans the scheduled/manual ONNX export compatibility lane out across curated Torch, ONNX, and onnxruntime version triples, running both ONNX bundle validation and onnxruntime roundtrip tests.
- Added a committed artifacts/paper_2026/ replay bundle with pinned commands, config snapshots, expected outputs, tolerance bands, and SHA256 inputs for the current paper-facing runs.
- Added make reproduce-paper plus per-domain make reproduce-* shortcuts for the frozen paper replay commands and the reference robotics deployment run.
- Added pinned CPU/GPU paper-rerun container recipes to artifacts/paper_2026/manifest.json, including base image digests and hashed build inputs.
- Added explicit schema versions to run manifests and unified provenance validation on shared benchmark schema constants.
- Benchmark provenance/manifests now record per-seed dataset fingerprints for generated configs and file-backed datasets.
- Benchmark provenance/manifests now record SHA256 hashes for saved checkpoints, discovered bundle directories, and post-processed summary files.
- Benchmark provenance now snapshots whether the git worktree was dirty when a benchmark run started, rather than evaluating that flag only after artifacts were written.
- Benchmark provenance now records CPU/GPU model details, OS/Python runtime info, Torch and ONNX/onnxruntime versions, and installed optional backend versions.
- Added validate_run_dir(...) to verify benchmark run directories have the required artifacts, valid JSON schemas, and on-disk hash consistency.
- Paper-mode benchmark runs now support --resume, persist resume_state.json at run start, and record original plus resumed attempt metadata in provenance.json.
- Moved the Phase 17 dose/RF paper-suite runners plus the RR IK dataset/comparison implementations into importable zeroproofml.benchmarks.domains.* modules, with the legacy entrypoint files kept as thin compatibility wrappers over those Python APIs.
- CI now exercises the importable smoke-mode benchmark path directly in pytest, and the legacy compatibility wrappers emit runtime DeprecationWarning messages that point to the supported Python APIs.
- Expanded the GitLab test stage with an explicit Python 3.10-3.13 matrix so CI matches the versions advertised in packaging metadata.
- Marked DOI-backed code-artifact archival as non-executable locally because minting the snapshot DOI requires an authenticated external archive upload.
- Added typed benchmark schema dataclasses for per-seed raw results, summaries, paired stats, claim audits, manifests, and provenance artifacts.
- Benchmark per-seed raw result artifacts now carry explicit schema names/versions, and the domain adapters return/write canonical per-seed payloads before benchmark metadata generation (with runner backfill kept only for legacy test doubles that still return paths alone).
- Benchmark artifact loaders now fail fast with a clear compatibility error when asked to read older run schemas; migration loaders are still deferred.
- Benchmark smoke and paper runs now use seed_{n} directories consistently across the DOSE, RF, and IK domains; the RF suite still reads legacy quick_s{n} folders when aggregating older artifacts.
- Benchmark smoke and paper runs now normalize all per-seed raw artifacts to seed_*/per_seed_result.json, while transparently migrating legacy domain-specific filenames during harness execution and resume.
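The on-disk hash consistency that validate_run_dir(...) verifies can be pictured with a minimal sketch. This is not the project's implementation: the helper name find_hash_mismatches and the manifest layout (a "files" mapping from relative path to SHA256 hex digest) are assumptions made purely for illustration.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA256 so large checkpoints are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_hash_mismatches(run_dir: Path) -> list[str]:
    """Return relative paths that are missing or whose hash differs (hypothetical helper)."""
    # Assumed layout: manifest.json holds {"files": {"<relative path>": "<sha256 hex>"}}.
    manifest = json.loads((run_dir / "manifest.json").read_text(encoding="utf-8"))
    mismatches = []
    for rel_path, expected in sorted(manifest["files"].items()):
        target = run_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            mismatches.append(rel_path)
    return mismatches
```

An empty return value means every recorded file exists and matches its recorded digest; a real validator would also check required artifacts and JSON schemas, as the entries above describe.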
[0.4.3] - 2026-03-03
Release focused on documentation coherence & reproducibility entry points.
Added
- Experiments & reproduction landing page (docs/21_experiments.md) with benchmark commands, artifact contract, and provenance-based baseline comparison.
- API reference stability map (stable vs experimental) in docs/08_api_reference.md.
Changed
- Legacy experimental suite guide is now explicitly archived and points to the current reproduction entry points (scripts/EXPERIMENTAL_SUITE_README.md).
- zeroproof.autodiff.graph is explicitly documented as experimental.
Fixed
- GitLab CI coverage regex now matches pytest-cov's TOTAL … % output (coverage badge).
- ONNX export CI now installs onnxscript, and export_bundle(...) defaults to opset 18 for Torch 2.8 compatibility.
- Ubuntu CI jobs now install Python deps in a venv to avoid PEP 668 externally-managed-environment failures.
- CI now installs CPU-only PyTorch wheels on Linux to avoid pulling CUDA libraries (disk exhaustion).
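As an illustration of the coverage-regex fix, the sketch below extracts the percentage from a pytest-cov style TOTAL line. The pattern and the sample report are assumptions for demonstration; the project's actual CI regex is not reproduced in this changelog.

```python
import re
from typing import Optional

# Assumed pattern: the last "<number>%" on a line that starts with TOTAL.
TOTAL_LINE = re.compile(r"^TOTAL\s+.*?(\d+(?:\.\d+)?)%\s*$", re.MULTILINE)


def extract_total_coverage(report: str) -> Optional[float]:
    """Return the TOTAL coverage percentage, or None if no TOTAL line is found."""
    match = TOTAL_LINE.search(report)
    return float(match.group(1)) if match else None


# Made-up sample in the shape of a pytest-cov terminal report.
SAMPLE_REPORT = """\
Name          Stmts   Miss  Cover
---------------------------------
example.py      120      6    95%
---------------------------------
TOTAL           120      6    95%
"""

print(extract_total_coverage(SAMPLE_REPORT))
```

Anchoring on the TOTAL prefix keeps per-file rows, which also end in a percentage, from being picked up by mistake.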
[0.4.2] - 2026-03-01
Release focused on observability & interop (NumPy parity, export bundles, and lightweight plotting/logging utilities).
Added
- NumPy parity for projective tooling: lift_targets_numpy(...) and NumPy dispatch for strict SCM inference.
- ONNX “deployment bundle” helper export_bundle(...) that writes model.onnx plus a metadata.json sidecar.
- Optional visualization helpers (zeroproof.utils.viz) and first-party log hooks (zeroproof.utils.logging) including JSONL readers / DataFrame loader (via the viz extra).
- Trainer config additions: TrainingConfig.val_loader, use_amp alias for AMP, and gradient_policy override; validation summaries are stored in SCMTrainer.val_history.
- GitLab CI pipeline with a manual, tag-only PyPI publish job (.gitlab-ci.yml).
Changed
- gradient_policy(...) now overrides registered per-layer defaults while the context is active.
- Repository/documentation links updated to GitLab (gitlab.com/domezsolt/zeroproofml).
[0.4.0] - 2025-12-11
Release completing the SCM migration and preparing PyPI publication.
Added
- SCM-first documentation and README refresh describing weak sign tracking, projective tuples, and absorptive arithmetic.
- Benchmark regression gate (scripts/ci/benchmark_gate.py) and CI wiring for SCM-specific pytest markers and coverage thresholds.
- License headers across the v0.4 codebase to match release compliance requirements.
Changed
- CI pipelines now run strict linting (black, ruff, isort), mypy --strict, and publish SCM-suite coverage reports/artifacts.
- Benchmarks workflow uses the SCM benchmark runner and fails on malformed or slow results.
- README development commands align with the tightened release checks.
[0.2.0] - 2025-10-29
Integration-focused release with unified runner, parity, and docs.
Added
- Unified integration runner: scripts/run_integration_suite.py
  - Runs focused unit tests (Torch/JAX + parity), writes a consolidated log, and saves parity_results.json.
  - CPU‑friendly defaults: caps threads on ≤4‑thread CPUs, uses headless plotting, disables CUDA visibility, and enables JAX x64.
- Parity helpers: zeroproof/utils/parity.py
  - run_backend_parity and parity_within_tolerance across NumPy, PyTorch, and JAX.
  - Torch parity enabled; JAX parity enabled on CPU.
- Determinism test for Torch: tests/unit/test_torch_determinism.py.
- Documentation: docs/11_integrations.md (framework integrations & runner) and index link. PyPI install instructions now primary.
- README simplified with vision/goal and links to full docs and robotics example (Zenodo).
Changed
- JAX bridge (custom_vjp): backward signatures updated for cross‑version compatibility (removed nondiff_argnums; pass scalars via residuals).
- JAX deterministic sum tolerance relaxed to atol=1e-5 for CPU order stability.
- Integration runner hardens imports (ensures repo root on sys.path) and environment (JAX x64, thread caps).
- Torch parity uses non‑grad inputs to avoid graph reuse/backward issues in integration context.
Fixed
- JAX tr_div backward now masks gradients when the denominator is non‑REAL or zero; gradient clipping was rewritten using JAX arrays to avoid tracer boolean conversion.
- Resolved the parity “No module named 'zeroproof'” import error by inserting the repo root into sys.path in the runner.
- JAX dtype warnings suppressed by enabling x64 in runner and parity path.
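The sys.path fix mentioned above follows a common pattern for scripts that live inside a repository but are run without installing the package. A minimal sketch, assuming a hypothetical helper name (ensure_on_sys_path) rather than the runner's actual code:

```python
import sys
from pathlib import Path


def ensure_on_sys_path(root: Path) -> bool:
    """Put ``root`` at the front of sys.path; return True if it was added."""
    resolved = str(root.resolve())
    if resolved in sys.path:
        return False  # Already importable from this location; leave the order alone.
    sys.path.insert(0, resolved)
    return True


# In a script stored at <repo>/scripts/<name>.py, the repo root is one directory
# above the script's own directory:
# ensure_on_sys_path(Path(__file__).resolve().parents[1])
```

Inserting at index 0 makes the checkout take precedence over any installed copy of the package, which is usually what an in-repo test runner wants.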
[0.1.0] - 2025-10-01
First external-ready release candidate (repo hardening).
Added
- PEP 621 packaging via pyproject.toml with extras: torch, jax, all, dev.
- py.typed marker to ship type information for the public API.
- CI release workflow to build wheels/sdist and publish on tag.
- Dependabot configuration for dependency updates.
- Initial mypy configuration scoped to the public API.
Changed
- CI matrix aligned to Python 3.9–3.12; lint job installs black, ruff, isort, mypy.
- README installation section clarifies install-from-source and extras.
Fixed
- Minor import fix in evaluator utilities to avoid a NameError when generating the default evaluation grid.
- Hybrid trainer now aggregates per‑sample losses via a balanced pairwise
reduction to bound graph depth and avoid recursion issues during backprop.
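The balanced pairwise reduction named in the last fix can be sketched generically. This is an illustration of the technique, not the hybrid trainer's code; pairwise_reduce is a hypothetical name, and plain numbers stand in for per-sample loss nodes.

```python
import operator
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def pairwise_reduce(values: Sequence[T], combine: Callable[[T, T], T]) -> T:
    """Combine values as a balanced binary tree instead of a left-to-right chain."""
    if not values:
        raise ValueError("pairwise_reduce needs at least one value")
    level = list(values)
    while len(level) > 1:
        # Combine neighbours; an odd trailing element is carried to the next level.
        paired = [combine(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])
        level = paired
    return level[0]


# Summing works as usual.
total = pairwise_reduce([1, 2, 3, 4, 5], operator.add)

# Track tree depth instead of a sum: each combine adds one level above its inputs.
depth = pairwise_reduce([0] * 1000, lambda a, b: max(a, b) + 1)
```

A left-to-right sum over n terms builds a chain of depth n - 1, whereas the balanced form has depth ceil(log2 n): here `depth` is 10 for 1000 inputs, which is why a recursive backward traversal stays well within Python's default recursion limit.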
Notes
- Until PyPI publication, use pip install -e .[dev] for development and pip install -e .[all] for full features.
- Torch/JAX remain optional dependencies; import zeroproof should work without them.