Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

[0.6.0] - 2026-05-18

Migration notes

- InferenceConfig.provenance_fault_threshold is deprecated but still accepted as an alias for numerical_hazard_threshold in strict-provenance construction. Finite tiny denominators are no longer folded into fault_mask; this threshold is monitor-only metadata, not ignored, and never contributes to bottom_mask, fault_mask, or semantic_bottom_mask unless a consumer separately promotes that diagnostic.
- Previously exported bundles are validated under their recorded schema and metadata, including strict_inference_schema_version, strict_inference_exports, and any inference-output schema sidecar; they are not silently reinterpreted under the new hardened schema.
- New hardening-release opt-ins are explicit: SimplificationMode with simplification_mode="field_rational", experimental structural validity factors for public FRU provenance, soft_coverage_loss(...), lift_semantic_targets(...), monitor-only numerical_hazard_threshold diagnostics, and explicit GaugePolicy / ProjectiveNormalize(...) projective gauge helpers.
- New hardening-release raises/refusals are explicit: non-finite strict P/Q payloads and validity factors bottom through isfinite into fault_mask; bottom-capability mismatches raise unless allow_bottom_unreachable=True; strict FRU flattening refuses configured depth/degree-bound violations rather than using field-rational rescue; and strict cancellation refuses unsafe symbolic factor cancellation without a proven or declared nonzero assumption.
- Most hardening-release changes are additive, but strict-mode flattening may change masks at singular edge cases and may refuse expressions that field-rational simplification previously accepted.
- Known workflows to re-check when upgrading to v0.6.0: examples/fru_strict_check_demo.py, the RR-IK reference deployment, and the DOSE matrix/artifact path. Strict-mode flattening, bottom-capability checks, and fail-closed masks can alter singular-edge behavior; DOSE and reference-robotics workflows already consume fault/semantic split routing in v0.6.x and benefit from the promotion with no opt-in flag in the hardened release.
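The split between fail-closed faults and monitor-only hazards described in the migration notes can be illustrated with a small standalone sketch. It is not the library's implementation: classify_payloads, hazard_flags, and the inclusive <= comparison are assumptions made for illustration, and the sketch deliberately leaves semantic bottoms out.

```python
import math


def classify_payloads(p_values, q_values, numerical_hazard_threshold):
    """Toy split of fail-closed faults vs. monitor-only hazards (hypothetical helper)."""
    fault_mask = []
    hazard_flags = []
    for p, q in zip(p_values, q_values):
        finite = math.isfinite(p) and math.isfinite(q)
        # A non-finite P or Q payload fails closed into the fault mask.
        fault_mask.append(not finite)
        # A finite but tiny denominator is only flagged for monitoring;
        # it never changes fault_mask. The inclusive comparison is an assumption.
        hazard_flags.append(finite and abs(q) <= numerical_hazard_threshold)
    return fault_mask, hazard_flags


p = [1.0, float("nan"), 2.0, 3.0]
q = [0.5, 1.0, 1e-12, float("inf")]
fault_mask, hazard_flags = classify_payloads(p, q, numerical_hazard_threshold=1e-9)

# The hazard rate is tracked on its own axis, separate from the fault rate.
numerical_hazard_rate = sum(hazard_flags) / len(hazard_flags)
fault_rate = sum(fault_mask) / len(fault_mask)
```

In this toy batch the third sample has a finite denominator of 1e-12, so it raises the hazard flag while staying out of fault_mask; only the NaN and infinity payloads are faults.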

Changed

- Made sign_consistency_loss and SCMTrainingLoss singular-only by default when mask_singular is omitted, using abs(Y_d) <= epsilon_sing.
- Added an opt-in reduction="conditional" mode to margin_loss while keeping the default population-style masked batch mean.
- Added a gauge-policy design-review note comparing canonical_denominator, unit_l2_projective, and angular_unit_circle without changing strict-inference behavior.
- Added the public GaugePolicy enum and opt-in ProjectiveNormalize helper for canonical-denominator, unit-L2 projective, and angular unit-circle gauge conventions.
- Recorded canonical_denominator as the default gauge policy.
- Clarified that current post-hoc tau_infer sweeps over cached |Q| distributions remain gauge-correct for the specific head and magnitude convention that produced those distributions.
- Added a monitor-only numerical_hazard_rate axis to StrictInferenceMonitor / strict_inference_rates(...) and documented that fault_rate, semantic_bottom_rate, and finite-hazard alerts should be thresholded separately.
- Clarified that frequent fault_mask triggers from strict-flattened heads are an implementation-hygiene signal rather than the normal semantic-bottom path, while intentional IEEE-arithmetic heads may use fault_mask as the correct fail-closed bottom path.
- Promoted eager strict-inference provenance outputs to stable result attributes (fault_mask, semantic_bottom_mask, and bottom_provenance) while preserving three-field unpacking as (decoded, bottom_mask, gap_mask).
- Recorded the Q2 provenance gate as superseded for the v0.6.x fault/semantic mask set; its criteria are not retroactively re-applied to the now-stable fault_mask, semantic_bottom_mask, and bottom_provenance result attributes.
- Renamed the provenance bundle metadata sidecar to inference_output_schema; the old experimental_inference_output_schema key remains a deprecated alias that validates with DeprecationWarning.
- Made bundle smoke-test parity compare decoded values only on non-bottom entries after validating bottom_mask, keeping bottom payload sentinels non-semantic.
- Added a deployment-doc warning that consumers must honor bottom_mask instead of treating decoded NaN bottom payloads as semantic data.
- Added a bad-consumer contract test covering decoded-only, bottom-mask-only, and fault-mask-only strict-inference misuse.
- Made strict inference fail closed for all non-finite P/Q payloads and route those bottoms to fault_mask.
- Made censored-direction decoding preserve fault-like bottom orientation only through explicit finite orientation side channels or direction heads, not IEEE-infinity scalar signs.
- Preserved strict SCM fracterm flattening bottom behavior by keeping default flattening uncancelled and carrying divisor denominator factors through division.
- Added the public SimplificationMode literal alias for strict SCM versus unsafe field-rational fracterm/FRU simplification modes.
- Gated field-rational fracterm simplification behind explicit unsafe simplification_mode="field_rational".
- Documented field_rational as an explicit opt-in that is unsafe for strict bottom-preserving semantics.
- Threaded fracterm/FRU simplification mode through internal flattening helpers so strict pipelines reject unsafe field-rational simplifiers before flattening starts.
- Documented and tested common-meadow anchor identities as semantic checks rather than required emitted fracterm normal forms.
- Added explicit strict-preserving target tags for pure strict Fracterm artifacts and guarded strict FlattenedFRU artifacts.
- Allowed pure strict Fracterm zero-numerator collapse when a denominator is covered by explicit nonzero domain assumptions.
- Limited strict fracterm cancellation to safe numeric constants and assumption-covered monomial factors, while guarded FRU cancellation requires retained validity factors.
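The mask-first parity rule from the bundle smoke-test entry above can be sketched in plain Python. This is a minimal illustration under assumptions, not the project's test code: decoded_parity and its abs_tol parameter are hypothetical names, and list inputs stand in for tensors.

```python
import math


def decoded_parity(ref_decoded, ref_bottom, out_decoded, out_bottom, abs_tol=1e-6):
    """Compare two (decoded, bottom_mask) pairs the mask-first way (hypothetical helper)."""
    # Step 1: the bottom masks themselves must agree exactly.
    if list(ref_bottom) != list(out_bottom):
        return False
    # Step 2: decoded values are compared only where the sample is not bottom.
    for ref, out, is_bottom in zip(ref_decoded, out_decoded, ref_bottom):
        if is_bottom:
            continue  # the payload here is a sentinel (for example NaN), not data
        if not math.isclose(ref, out, rel_tol=0.0, abs_tol=abs_tol):
            return False
    return True


reference = [1.0, float("nan"), 3.0]
exported = [1.0, 0.0, 3.0]          # a different sentinel in the bottom slot
bottom_mask = [False, True, False]

same = decoded_parity(reference, bottom_mask, exported, bottom_mask)
mask_drift = decoded_parity(reference, bottom_mask, exported, [False, False, False])

# A consumer that honors bottom_mask averages only the accepted samples.
accepted = [v for v, is_bottom in zip(exported, bottom_mask) if not is_bottom]
accepted_mean = sum(accepted) / len(accepted)
```

The two runs differ only in the sentinel stored at the bottom position, so they count as equal, while any disagreement in the masks is reported as drift before decoded values are inspected.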

[0.5.1] - 2026-04-30

Added

- Added a dedicated published-docs "Reproduce the Paper" page and linked it from the docs hub plus the experiments landing page.
- Added an examples inventory page that labels every file under examples/ as a quickstart, supported example, benchmark helper, or archival/experimental path.
- Added a promoted examples tutorials page that turns the maintained quickstart, coverage-control, bundle-export, and FRU strict-check scripts into docs-backed walkthroughs.
- Added examples/README.md and examples/robotics/README.md so the maintained examples, benchmark helpers, and archival robotics/Transreal paths are clearly separated in-tree.
- Added examples/deployment/ with docs-backed end-to-end workflows for bundle runtime, reference robotics fallback routing, benchmark report regeneration, and ROS 2 launch.
- Added focused docs guides for choosing tau_infer, namespace imports, masks/provenance, deployment bundles, ROS 2 integration, and visualization/report generation.
- Bootstrapped the ROS 2 companion workspace with the first ament_python package at integrations/ros2/src/zeroproofml_ros, re-exporting the stable Python strict-inference and ONNX runtime helpers for the initial node path.
- Added the ROS 2 interface package integrations/ros2/src/zeroproofml_msgs with StrictInferenceResult.msg for decoded payloads, masks, bundle/schema metadata, threshold metadata, and optional provenance fields.
- Added the first ROS 2 strict_inference_node entry point, loading one startup-selected bundle, consuming Float64MultiArray inputs, publishing StrictInferenceResult, and emitting DiagnosticArray summaries through StrictInferenceMonitor.
- Added ROS 2 strict-inference QoS presets for low-latency control loops and reliable offline/batch replay.
- Added a ROS 2 Float64MultiArray visualization telemetry topic for Foxglove/PlotJuggler-friendly strict-inference mask, provenance, and fallback metrics.
- Added ROS 2 telemetry CSV exporters that flatten strict-inference diagnostics into stable named columns for PlotJuggler-friendly replay and ops traces.
- Added ROS 2 RViz marker helpers that convert workspace heatmap summaries into MarkerArray/CUBE_LIST overlays for RR-style debugging.
- Added RR IK RViz result overlays that render the current arm pose, requested displacement, and accepted end-effector target for single-sample spatial debugging.
- Added a managed ROS 2 lifecycle_strict_inference_node variant that reuses the strict inference bundle/message contract while loading bundles in on_configure and only processing batches after activation.
- Added the first RR IK ROS 2 launch/demo path, including a packaged rr_ik_strict_inference.launch.py, default RR topic/provider YAML, a sample Float64MultiArray payload, and an rr_ik_demo runtime smoke command for reference robotics bundles.
- Added an RR IK ROS 2 graph-composition launch example that runs the strict-inference node with one-shot sample publishing plus result and telemetry echo subscribers.
- Added a DOSE offline-batch ROS 2 launch path with packaged topic/provider YAML and a sample Float64MultiArray payload using offline_batch_replay QoS.
- Added deterministic ROS 2 rosbag-export replay fixtures and golden-output tests for the RR IK strict-inference beta path.
- Added local ROS 2 container recipes under integrations/ros2/containers/ plus a small manifest for Humble/Jammy and Jazzy/Noble CPU images that preinstall ZeroProofML, onnxruntime, and a built companion colcon workspace overlay.
- Added StrictInferenceMonitor.export_state(...) so deployment monitoring can export running counts/update coverage and optional per-batch histograms alongside aggregated rates.
- Added optional structured strict-inference event logging for bottom/gap triggers, fallback routing, and acceptance-rate drift via StrictInferenceMonitor and route_to_analytic_solver(...).
- Added load_onnx_runtime_bundle(...) / ONNXRuntimeBundle so validated ONNX export bundles can be opened through onnxruntime and still return the stable (decoded, bottom_mask, gap_mask) inference contract.
- Added run_bundle_reference_smoke_test(...), a pure-Python parity check that runs a smoke sample through both a wrapped model and its exported bundle and raises on decoded/mask drift.
- Added examples/cpp/minimal_bundle_consumer.cpp, a reference ONNX Runtime C++ consumer for the current stable merged-mask bundle contract.
- Added examples/cpp/zeroproofml_bundle.hpp, a header-only C++ wrapper around stable strict-inference ONNX bundles for robotics/embedded adopters that already use ONNX Runtime from C++.
- Added strict_inference_audit.json to reference deployment runs so operators get a versioned per-batch provenance audit sidecar alongside inference_summary.json.
- Documented the current fracterm flattening utility inventory, including the expression classes supported by zeroproofml.scm.fracterm and the coefficient-list-only boundary of FractermRationalUnit.flatten(...).
- Added an experimental FRU expression library in zeroproofml.layers.fru with typed nodes, local P/Q flattening, symbolic simplification hooks, denominator provenance, and bound validation through FractermRationalUnit.flatten_expression(...).
- Added FRU equivalence tests that compare composed rational expressions against their flattened P/Q outputs, including symbolic cancellation cases.
- Added a runnable FRU strict-check demo that composes multiple rational stages, flattens them into one checked P/Q pair, and feeds that pair into strict_inference(...).
- Added an experimental downstream pipeline simulator (zeroproofml.downstream_pipeline) with configurable stage boundaries and built-in 1-step, 3-step, and 5-step variants for composability studies.
- Added built-in bad-downstream toggles to the experimental downstream pipeline simulator, covering nan_to_num, missing-flag drops, clipping, default fills, JSON/CSV round-trips, and mean reductions.
- Added built-in downstream-pipeline comparison presets for scalar-only, abstention/uncertainty, strict-SCM, and strict-SCM-plus-direction-head composability baselines.
- Documented the examples/robotics/ inventory, marking legacy RR scripts as archive-only, rr_ik_dataset.py as a benchmark-input wrapper, and the 3R/6R paths as experimental examples.
- Added a formal DOSE experiment-matrix doc that maps the head/curriculum/balancing/direction-head follow-up rows onto the existing benchmark scripts.
- Added a compact REPRODUCIBILITY.md and linked it from the README plus the docs landing page.
- Recorded the published paper artifact DOI (10.5281/zenodo.18944465) in the paper archive metadata, reproducibility docs, and README.
- Added benchmark --skip-complete-seeds and --force-rerun controls for explicit reuse of existing run directories.
- Added scripts/ci/generate_license_report.py to emit a dependency license report for built wheels, including selected extras.
- Added a short design memo for the Q1 bottom-mask provenance decision, choosing a heuristic-first experimental rollout over an immediate strict-output contract change.
- Added an opt-in experimental provenance split for strict_inference(...) and SCMInferenceWrapper, exposing heuristic fault_mask / semantic_bottom_mask diagnostics without changing the stable three-field output contract.
- Added a dedicated guide for the experimental provenance representations, including how they refine bottom_mask while leaving gap_mask as a separate stable signal.
- Added a Q2 provenance decision memo that keeps the opt-in provenance path experimental because the published review-artifact and rerun gate has not been met.
- Added a synthetic RR trajectory-evaluation dataset generator (zeroproofml.reference_robotics_trajectory_data plus scripts/generate_reference_robotics_trajectory_data.py) that stratifies trajectories by workspace quadrant and singularity proximity.
- The RR trajectory dataset generator now also writes per-stratum stress-test subset JSONs keyed by workspace quadrant and |det(J)| bin.
- Added an importable RR trajectory-evaluation helper (zeroproofml.reference_robotics_trajectory_eval) so near-singularity policies can be replayed in closed loop and summarized by start singularity bin, fallback usage, and tracking error.
- The RR trajectory evaluator summary now also reports per-step fallback-rate traces, joint-limit/chattering events, and optional latency-budget violations for rollout-level robotics metrics.
- Added 2D/3D strict-inference mask-map helpers for scattering bottom/gap decisions over workspace or sample coordinates.
- Added plot_workspace_rate_heatmaps(...) to the experimental viz helpers so RR-style workspace points can be binned into bottom/gap/fallback heatmaps, with provenance-aware coloring for fault-vs-semantic bottom/fallback regions when available.
- Added plot_route_to_solver_overlay(...) and plot_detj_stratified_metrics(...) to the experimental viz helpers so robotics fallback routes/rejects and |det(J)|-bucketed rollout metrics can be visualized directly.
- Added fallback-route timeline and per-batch monitoring-summary viz helpers, with provenance tag series when batch artifacts include fault/semantic route or monitor breakdowns.
- Added provenance-aware coloring to plot_denominator_hist(...) so denominator histograms can split finite, fault, and semantic samples while retaining tau_infer / tau_train overlays.
- Added plot_tau_infer_sweep(...) and plot_tau_train_sweep(...) to the experimental viz helpers for threshold-sweep curves, including aggregated mean/std sweep artifacts.
- Added plot_safety_accuracy_pareto(...) to the experimental benchmark viz helpers for higher-is-better safety vs accuracy trade-off fronts.
- Added plot_confusion_matrix(...) and plot_categorical_reliability(...) to the experimental benchmark viz helpers for categorical-head diagnostics.
- Added a versioned zeroproofml.metric_log JSONL schema for trainer/eval metric logs, including canonical nested metrics plus flat-key compatibility for existing readers.
- Added JSONL metric-log aggregation helpers for multi-seed and multi-run summaries with source/run/seed context annotations.
- Added experimental metric-log wide/long row converters so dashboards and downstream tooling can flatten versioned JSONL logs without bespoke schema parsing.
- Added an optional interactive extra with Plotly-backed training-log report HTML, so python -m zeroproofml.report training-log ... can emit interactive metric traces without changing the default benchmark or bundle report dependencies.
- Validation reports now save a concise *.summary.json sidecar with bundle settings, provenance diagnostics, and any benchmark mask-rate/model-summary highlights.
- Validation reports now also pick up optional runtime/fallback summaries from deployment inference_summary.json artifacts plus DOSE calibration split provenance from aggregated/dose_operating_points.json when those sidecars are present.
- Added python -m zeroproofml.report to regenerate benchmark RUN_REPORT.md / optional RUN_REPORT.html from existing artifact directories.
- Extended python -m zeroproofml.report to accept deployment bundle directories, JSONL training logs, and paired benchmark baseline run directories as report inputs.
- Standard report generation now writes SVG summary figures for benchmark metrics, deployment validation summaries, and JSONL training logs; optional benchmark HTML embeds the generated figures.
- DOSE benchmark report regeneration now writes a domain-specific SVG figure pack for threshold sweeps, macro-F1/finite-MAE trade-offs, censor-direction confusion, assay-limit edge cases, and operating-point bottom provenance when the saved diagnostics provide those inputs.
- IK benchmark report regeneration now writes a robotics SVG figure pack for workspace heatmaps, |det(J)|-stratified error/fallback plots, analytic-fallback route maps, and fallback timelines when saved RR IK diagnostics provide those inputs.
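The nested-to-flat and long-row flattening mentioned in the metric-log entries above can be pictured with a stdlib-only sketch. Everything named here is an assumption for illustration: the record fields step and metrics, the "/" key separator, and the helpers flatten_metrics and to_long_rows are hypothetical and do not describe the actual zeroproofml.metric_log schema or converters.

```python
import json


def flatten_metrics(nested, prefix=""):
    """Turn nested metric dicts into flat keys such as 'loss/total' (separator is an assumption)."""
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_metrics(value, prefix=f"{name}/"))
        else:
            flat[name] = value
    return flat


def to_long_rows(jsonl_text):
    """Emit one row per (step, metric, value) from JSONL text with hypothetical fields."""
    rows = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        for name, value in flatten_metrics(record["metrics"]).items():
            rows.append({"step": record["step"], "metric": name, "value": value})
    return rows


log_text = "\n".join([
    json.dumps({"step": 0, "metrics": {"loss": {"total": 1.5}, "bottom_rate": 0.25}}),
    json.dumps({"step": 1, "metrics": {"loss": {"total": 1.0}, "bottom_rate": 0.125}}),
])
rows = to_long_rows(log_text)
```

A long layout like this keeps one metric per row, which suits dashboards that group by metric name; a wide layout would instead keep one row per step with one column per flattened key.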

Changed

- GitLab Pages publishes the latest main docs at the site root.
- ROS 2 strict-inference telemetry now exports explicit batch bottom/gap rates plus fallback_rate / fault_fallback_rate / semantic_fallback_rate fields so live Foxglove/PlotJuggler charts do not need client-side rate reconstruction.
- Strengthened tests/utils/test_viz.py so the experimental viz contract now checks saved artifacts, axis metadata, and non-empty rendered plot data.
- Added a small hash-based visual regression suite for representative threshold-sweep, confusion-matrix, workspace-heatmap, and Pareto plots so CI catches accidental plotting drift without relying on backend-specific golden images.
- Added a portable viz API contract test that locks the documented zeroproofml.utils.viz exports and call signatures without depending on backend rendering details.
- Kept optional extras lazy at import time by making the zeroproofml compatibility aliases and experimental viz helper exports resolve on first use instead of walking/importing child modules eagerly.
- GitLab CI now runs the ROS 2 companion workspace through colcon build, colcon test, and colcon test-result --verbose on both Humble/Jammy and Jazzy/Noble base images.
- Documented the ROS 2 layout decision: keep the first beta as an in-repo optional companion workspace under integrations/ros2/ instead of creating a separate repository up front.
- Documented the ROS 2 bundle-loading decision: the first beta node should load bundles from startup ROS params, with any reload service deferred until a later lifecycle-managed path.
- The ROS 2 beta now selects CycloneDDS (rmw_cyclonedds_cpp) as its first validated RMW across launch defaults, package metadata, container images, and CI, with a second RMW deferred until the beta path is stable.
- Documented the bundle-service decision: an optional minimal REST adapter around validated ONNX bundles is worthwhile, while gRPC remains deferred until a concrete protobuf/streaming need appears.
- Documented the Triton-style inference-server decision: treat Triton as an optional downstream recipe after ONNX Runtime bundle stability, not as a first-party runtime path yet.
- Documented ROS 2 Kilted as experimental/manual-only until the existing Humble/Jazzy beta coverage is stable, keeping the supported distro matrix aligned with the current CI/container scope.
- ROS 2 strict-inference diagnostics now expose fallback-routing telemetry fields when bundle metadata selects route_to_analytic_solver, including routed counts/rates and provenance-aware route-vs-reject breakdowns when available.
- ONNX bundle metadata.json now carries its own schema_name / schema_version declaration in addition to format_version and strict_inference_schema_version, while bundle validation remains backward-compatible with older sidecars.
- Documented the FRU flattening placement decision: keep training on the projective path, run symbolic flattening as a post-training analysis pass, and only reuse/revalidate that artifact at export time.
- Documented the FRU symbolic blow-up limits (2 ** (L - 1) * d, capped at 16 * d by default) and the cases where flattening should be refused instead of expanding unsupported or out-of-bounds heads.
- Experimental downstream-pipeline stage summaries now also report propagated reject fidelity, downstream decision accuracy, and a payload/flag-consistency calibration proxy for corruption stress tests.
- Experimental downstream-pipeline stage summaries now also split provenance fidelity by label, so fault-vs-semantic survival can be tracked per stage whenever provenance data is present.
- Experimental downstream-pipeline reports now render Markdown/HTML stage-loss summaries, calling out the exact stage where reject/provenance information is first lost and splitting fault-vs-semantic provenance fidelity when available.
- Experimental downstream-pipeline report writes now also emit DOWNSTREAM_PIPELINE_REPORT.json, capturing machine-readable per-stage sample snapshots alongside the Markdown/HTML artifacts.
- Documented the decision to keep examples/robotics/rrr_ik_* and ik6r_* as example-level workflows until they have a maintained zeroproofml.* surface, artifact contract, and CI coverage.
- Harmonized the experimental robotics example naming and CLI defaults so 3R now uses the rrr_ik artifact stem, 3R/6R dataset generators accept RR-style bucket flags, and the experiments doc records the shared convention.
- route_to_analytic_solver(...) now collapses batched invalid masks to per-sample routing decisions, coerces fallback outputs onto the decoded tensor dtype/device, and powers the reference robotics RR DLS fallback path.
- The robotics reference deployment now has an importable zeroproofml.reference_robotics_deployment API returning structured artifact paths, and scripts/reference_robotics_deployment.py is a thin wrapper over that module.
- The reference robotics deployment artifact API now also exposes the exported ONNX path, parsed bundle metadata, validation-report text, and a load_reference_robotics_deployment_artifacts(...) helper for completed run directories.
- Reference robotics deployment run directories now include a versioned output_contract.json, and the loader validates that contract while preserving legacy fixed-layout fallback for older runs.
- Reference robotics deployment summaries now record a provenance-backed comparison between merged-mask routing and provenance-aware routing, keeping fault-like bottoms on the analytic fallback path while rejecting semantic bottoms.
- Reference robotics deployment summaries now also emit hybrid_path_metrics, tracking fallback frequency plus finite-sample accuracy/runtime deltas for merged-mask and provenance-aware hybrid routing.
- Reference robotics deployment summaries now also compare merged-mask and provenance-aware hybrid routing against both the strict SCM-only path and an unconstrained decode baseline.
- Reference robotics deployment summaries now also emit provenance_routing_materiality, quantifying unsafe-accept / semantic-misroute reduction plus the coverage/runtime guardrails from the robotics provenance promotion gate.
- RF benchmark helper scripts now live under importable zeroproofml.benchmarks.domains.rf_* modules, with the scripts/rf/*.py entrypoints kept as deprecation-warning compatibility wrappers and the RF paper-suite staying in-process when it dispatches the per-seed runner.
- RF benchmark seed artifacts now stamp explicit synthetic-dataset metadata (dataset_name, dataset_version, dataset_generator) into data_config, and provenance fingerprints prefer that generator/version marker when present.
- RF benchmark seed artifacts now also stamp canonical split definitions plus artifact_naming, so the train/validation/test/extrapolation contract and checkpoint/result filenames are recorded in-band.
- RF benchmark schema parsing now preserves the domain-specific model_meta, artifact_naming, per-run train_log, and wall_time_s fields without changing the shared outer per-seed run layout.
- RF benchmark summaries now expose standardized peak-retention yield, hallucination-rate, in-band vs extrapolation error, coverage/strict-trigger, and denominator-minimum metrics from the per-seed RF artifacts.
- RF peak-clipping artifacts now include shared-denominator axis diagnostics covering minimum-frequency offset, near-minimum sweep width, and edge-hit rate when a model exposes Q(jw).
- RF seed directories now also emit rf_signal_traces.json with deterministic representative per-seed response traces for later figure regeneration.
- RF seed directories now also emit rf_frequency_response.svg, turning the saved RF trace sidecar into a first-class Bode-style artifact for debugging and paper figures.
- rf_frequency_response.svg now overlays peak annotations, model-specific strict-trigger bands, and shared-denominator minimum guides on the saved RF response traces.
- RF seed directories now also emit rf_qualitative_figure_pack/, packaging saved washout and invented-peak baseline examples into targeted SVGs plus manifest/README sidecars.
- Benchmark runs now emit a root-level RUN_REPORT.md, and the benchmark CLI/API can optionally add RUN_REPORT.html via --html-report / BenchmarkConfig.html_report.
- Standard report regeneration now avoids host-specific absolute paths, refreshes benchmark summary Markdown from stored JSON, uses deterministic paired-stat bootstrap seeds, and auto-discovers parent bundle benchmark/calibration sidecars.
- Standard benchmark reports now include a fault-vs-semantic bottom breakdown whenever provenance split rates are available.
- Standardized the DOSE benchmark metric contract: false-censored / false-in-range rates are now explicitly conditional on their target subsets, summaries expose accept_rate and gap_rate, and the benchmark docs record the canonical operating-point metric names.
- DOSE benchmark runs now emit aggregated/dose_operating_points.{json,md} to back the named safety_first / direction_aware / accuracy_first presets with recorded metrics and threshold values, while only surfacing balanced when it is actually distinct on the run.
- DOSE operating-point reports now also persist the deterministic calibration/evaluation split provenance per seed, so the sample selection behind each operating-point artifact is auditable.
- DOSE benchmark runs now emit seed_*/dose_diagnostics.json and aggregated/dose_diagnostics.json with per-seed plus aggregated confusion matrices, threshold sweeps, |Q| histograms, borderline examples, and censored-subset direction diagnostics.
- DOSE follow-up variants now record canonical config_id snapshots (variant_config.json / dirhead-only config) and the experiment-matrix doc lists the current IDs for reproducible plots and tables.
- DOSE nextsteps curriculum variants now serialize a reusable curriculum_schedule object inside variant_config.json, replacing duplicated one-off schedule assembly with named presets plus resolved lambda targets.
- DOSE nextsteps now include strict-SCM mirrors for the DirBalance and finite-MSE rows, plus a paired issue-separation catalog that labels optimization-only vs representation-only comparisons.
- Added explicit regression coverage for the DOSE mixed-objective angular_curriculum_fmse follow-up so CI now checks that the safe-censoring curriculum remains intact while the finite-MSE anchor is wired through end to end.
- InferenceConfig now owns experimental provenance controls, including opt-in mode selection, an optional fault-threshold override, and split-mask vs bottom_provenance representation selection while keeping the stable three-field inference contract unchanged by default.
- Experimental provenance results now keep the stable (decoded, bottom_mask, gap_mask) unpacking/export contract and surface richer diagnostics as backward-compatible attributes on the same object.
- The API reference stability map now marks opt-in provenance outputs and schema sidecars as experimental until the Q2 promotion decision, while keeping the core inference tuple contract stable.
- docs/06_inference_deployment.md now spells out the opt-in provenance experiment contract and the measurable Q2 promotion gate for any future stable contract change.
- ONNX bundle metadata.json now also records per-input/per-output tensor signatures, explicit batch-axis semantics, optional preprocessing/postprocessing IDs, optional normalization metadata, and a structured mask-semantics block for deployment consumers.
- StrictInferenceMonitor and strict_inference_rates(...) now report fault_rate and semantic_bottom_rate whenever experimental provenance diagnostics are supplied.
- decode_strict_censored_3way(...) now accepts optional experimental provenance diagnostics so DOSE-style direction heads can stay focused on semantic bottoms while fault-like bottoms fall back to sign(P).
- route_to_analytic_solver(...) now accepts optional experimental provenance diagnostics so robotics-style analytic fallback can skip semantic bottoms while still routing fault-like bottoms and other invalid samples.
- generate_validation_report(...) now shows experimental provenance schema details plus fault-vs-semantic bottom-rate breakdowns whenever bundle metadata or benchmark artifacts provide them.
- DOSE operating-point calibration now down-weights semantic bottoms relative to fault bottoms whenever experimental provenance split rates are present, so provenance-bearing runs can choose tau_infer with a finer-grained bottom-cost signal.
- Experimentally configured bundle sidecars now declare a versioned experimental_inference_output_schema so tooling can distinguish the split_masks and bottom_provenance diagnostic layouts without changing stable ONNX outputs.
- ONNX bundle metadata.json now declares strict_inference_exports so deployment tooling can distinguish current merged-mask bundles from future provenance-bearing output contracts.
- Clarified the Q1 bottom-mask provenance-source decision: use inference/diagnostic signals first and keep any model/training-time provenance path behind the Q2 evidence gate.
- Defined the bottom-mask provenance rollout stages explicitly in the design memo: Q1 opt-in diagnostics, Q2 promotion review, and an explicit post-Q2 disposition.
- Added measurable Q2 promotion criteria to the bottom-mask provenance design memo, including value thresholds, non-regression gates, and repeatability requirements.
- Added committed golden bundle/report fixtures for the strict inference export contract, with snapshot tests that fail on accidental metadata or validation-report drift.
- Added fixture-backed compatibility tests that validate bundles exported by the v0.4.2 and v0.4.3 release lines.
- Added focused inference bundle tests that round-trip export_bundle(...) metadata through the loader/validator and assert the expected bundle file structure.
- Added strict inference contract tests covering output ordering, merged bottom-mask provenance semantics, and bundle metadata/order validation.
- Added golden-fixture tests that snapshot the opt-in provenance result contract for strict_inference(...) and SCMInferenceWrapper.
- Added explicit wheel/sdist install smoke tests in isolated virtualenvs and wired them into the build:dist CI job.
- Build/release packaging now emits a CycloneDX SBOM from the built wheel and keeps Twine checks/uploads scoped to the actual distribution artifacts.
- Release checklist guidance now requires release notes to cite generated-artifact provenance (artifacts/paper_2026/manifest.json, run-local manifest.json / provenance.json) plus third-party dependency reports (*.sbom.cdx.json, *.licenses.json).
- Added a scheduled/manual GitLab security:vulnerability-audit job that runs pip-audit against the repo requirement manifests, and documented that live advisory scanning stays out of the tag-triggered release lane.
- Added a test:viz-extra GitLab CI lane that installs the viz extra and runs the optional plotting/logging smoke tests.
- Added a minimal test:jax-extra GitLab CI lane that installs the jax extra and runs focused JAX SCM smoke tests on CPU.
- Re-audited the API stability map in docs/08_api_reference.md to use canonical zeroproofml.* module names and capture the currently supported metrics, projective-rational, and benchmark entry points.
- Documented the visualization architecture decision: keep zeroproofml.utils.viz as experimental lightweight primitives, reserve higher-level reports for a separate layer, and keep zeroproof.utils.viz as a compatibility import.
- Split the experimental zeroproofml.utils.viz helpers into grouped training, strict_inference, benchmarks, and domains submodules while preserving the existing top-level imports.
- Documented the utility support boundary: zeroproofml.utils.logging now has a stable JSONL core plus experimental reporting conveniences, zeroproofml.utils.viz remains experimental, and zeroproof.* paths are called out as compatibility imports.
- Marked experimental autodiff/logging/viz modules more aggressively in docs and public docstrings.
- Marked zeroproofml.inference.script_module(...) as a legacy TorchScript compatibility helper in the API/deployment docs; ONNX remains the preferred deployment path.
- Package version metadata now comes from zeroproof/_version.py and both zeroproof / zeroproofml import paths use that shared source (removed hard-coded fallback strings).
- Documented the public namespace policy in the README and getting-started guide: zeroproofml is canonical, while zeroproof remains a supported compatibility namespace.
- Clarified that the zeroproof compatibility namespace remains supported through the M4 core 1.0-or-stay-0.x decision, with no planned namespace deprecation warning before a concrete migration plan exists.
- Documented the namespace layering rule for new code: SCM implementation stays under zeroproof.*, zeroproofml.* remains the canonical public import/docs surface, and zeroproofml.*-only packages are reserved for product-level entry points such as zeroproofml.benchmarks.
- Added explicit pytest coverage for the supported zeroproofml and zeroproof namespace import paths.
- Added a stable-surface API compatibility test table that keeps the documented zeroproofml.* exports aligned with their supported zeroproof.* compatibility imports.
- Synced the release version references across packaging/runtime/docs metadata (pyproject.toml, changelogs, CITATION.cff, and MkDocs version banner/footer fields).
- Added scripts/ci/check_version_sync.py and CI wiring so merges fail on version metadata drift (with release-tag validation when CI_COMMIT_TAG is set).
- Added an explicit release checklist in CONTRIBUTING.md covering namespace consistency, docs-link alignment, and install smoke checks.
- Documented the benchmark layout decision: scientific claim benchmarks stay in zeroproofml/benchmarks/, while the top-level performance suite is slated for perf/.
- Renamed GitLab benchmark jobs to claim-benchmark:* and clarified in the README/docs that the DOSE/RF/IK scientific claim benchmarks are separate from the performance microbenchmark suite.
- Benchmark domain adapters now tell the DOSE/RF paper-suite runners to skip their legacy aggregate summary write, so the canonical benchmark aggregated/summary.json no longer needs a summary_script.json backup.
- Benchmark domain dispatch now invokes the benchmark Python entrypoints in-process from zeroproofml.benchmarks.domains instead of shelling out with subprocess.
- Split the benchmark runners into importable domain modules under zeroproofml.benchmarks.domains.{dose,rf,ik} while keeping the existing runner API stable.
- The benchmark CLI now dispatches through public run_dose_benchmark(...), run_rf_benchmark(...), and run_ik_benchmark(...) helpers instead of assembling BenchmarkConfig directly.
- Corrected the CI coverage note: GitLab reports SCM-suite coverage and updates the badge, but does not enforce --cov-fail-under=90.
- GitLab CI now keeps merge-request benchmark jobs in smoke mode and exposes 10-seed paper-mode benchmark runs as scheduled/manual jobs on the default branch.
- GitLab CI now includes a scheduled/manual ONNX bundle compatibility job that runs tests/inference/test_export_compatibility.py with onnxruntime.
- GitLab CI now fans the scheduled/manual ONNX export compatibility lane out across curated Torch, ONNX, and onnxruntime version triples, running both ONNX bundle validation and onnxruntime roundtrip tests.
- Added a committed artifacts/paper_2026/ replay bundle with pinned commands, config snapshots, expected outputs, tolerance bands, and SHA256 inputs for the current paper-facing runs.
- Added make reproduce-paper plus per-domain make reproduce-* shortcuts for the frozen paper replay commands and the reference robotics deployment run.
- Added pinned CPU/GPU paper-rerun container recipes to artifacts/paper_2026/manifest.json, including base image digests and hashed build inputs.
- Added explicit schema versions to run manifests and unified provenance validation on shared benchmark schema constants.
- Benchmark provenance/manifests now record per-seed dataset fingerprints for generated configs and file-backed datasets.
- Benchmark provenance/manifests now record SHA256 hashes for saved checkpoints, discovered bundle directories, and post-processed summary files.
- Benchmark provenance now snapshots whether the git worktree was dirty when a benchmark run started, rather than evaluating that flag only after artifacts were written.
- Benchmark provenance now records CPU/GPU model details, OS/Python runtime info, Torch and ONNX/onnxruntime versions, and installed optional backend versions.
- Added validate_run_dir(...) to verify benchmark run directories have the required artifacts, valid JSON schemas, and on-disk hash consistency.
- Paper-mode benchmark runs now support --resume, persist resume_state.json at run start, and record original plus resumed attempt metadata in provenance.json.
- Moved the Phase 17 dose/RF paper-suite runners plus the RR IK dataset/comparison implementations into importable zeroproofml.benchmarks.domains.* modules, with the legacy entrypoint files kept as thin compatibility wrappers over those Python APIs.
- CI now exercises the importable smoke-mode benchmark path directly in pytest, and the legacy compatibility wrappers emit runtime DeprecationWarning messages that point to the supported Python APIs.
- Expanded the GitLab test stage with an explicit Python 3.10-3.13 matrix so CI matches the versions advertised in packaging metadata.
- Marked DOI-backed code-artifact archival as non-executable locally because minting the snapshot DOI requires an authenticated external archive upload.
- Added typed benchmark schema dataclasses for per-seed raw results, summaries, paired stats, claim audits, manifests, and provenance artifacts.
- Benchmark per-seed raw result artifacts now carry explicit schema names/versions, and the domain adapters return/write canonical per-seed payloads before benchmark metadata generation (with runner backfill kept only for legacy test doubles that still return paths alone).
- Benchmark artifact loaders now fail fast with a clear compatibility error when asked to read older run schemas; migration loaders are still deferred.
- Benchmark smoke and paper runs now use seed_{n} directories consistently across the DOSE, RF, and IK domains; the RF suite still reads legacy quick_s{n} folders when aggregating older artifacts.
- Benchmark smoke and paper runs now normalize all per-seed raw artifacts to seed_*/per_seed_result.json, while transparently migrating legacy domain-specific filenames during harness execution and resume.
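
The on-disk hash consistency check mentioned for validate_run_dir(...) can be pictured roughly as follows. This is a minimal sketch and not the library's implementation: the manifest layout (a "files" mapping from relative path to SHA256 hex digest) and the helper names sha256_of and find_hash_mismatches are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_hash_mismatches(run_dir: Path) -> list[str]:
    """List recorded files that are missing or whose digest differs.

    Assumes a hypothetical manifest.json of the form
    {"files": {"<relative path>": "<sha256 hex>"}}.
    """
    manifest = json.loads((run_dir / "manifest.json").read_text())
    problems = []
    for rel_path, expected in sorted(manifest["files"].items()):
        target = run_dir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(target) != expected:
            problems.append(f"hash mismatch: {rel_path}")
    return problems
```

An empty list means every recorded file exists and still matches its recorded digest. A real validator would also check required artifacts and JSON schemas, which this sketch leaves out.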

[0.4.3] - 2026-03-03

Release focused on documentation coherence & reproducibility entry points.

Added
- Experiments & reproduction landing page (docs/21_experiments.md) with benchmark commands, artifact contract, and provenance-based baseline comparison.
- API reference stability map (stable vs experimental) in docs/08_api_reference.md.

Changed
- Legacy experimental suite guide is now explicitly archived and points to the current reproduction entry points (scripts/EXPERIMENTAL_SUITE_README.md).
- zeroproof.autodiff.graph is explicitly documented as experimental.

Fixed
- GitLab CI coverage regex now matches pytest-cov's TOTAL … % output (coverage badge).
- ONNX export CI now installs onnxscript, and export_bundle(...) defaults to opset 18 for Torch 2.8 compatibility.
- Ubuntu CI jobs now install Python deps in a venv to avoid PEP 668 externally-managed-environment failures.
- CI now installs CPU-only PyTorch wheels on Linux to avoid pulling CUDA libraries (disk exhaustion).
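
To illustrate the coverage-regex fix, the sketch below extracts the percentage from the TOTAL row of a pytest-cov terminal report. The pattern and the helper name extract_total_coverage are illustrative assumptions, not the regex committed in the project's CI configuration, and the sample report is made up for the example.

```python
import re

# Illustrative pattern (an assumption, not the project's actual CI regex):
# capture the trailing percentage on the TOTAL row of a pytest-cov report.
TOTAL_COVERAGE = re.compile(r"^TOTAL\s+.*?(\d+(?:\.\d+)?)%\s*$", re.MULTILINE)


def extract_total_coverage(report: str):
    """Return the TOTAL coverage percentage as a float, or None if absent."""
    match = TOTAL_COVERAGE.search(report)
    return float(match.group(1)) if match else None


# Made-up report in the column layout pytest-cov prints to the terminal.
sample_report = """\
Name                 Stmts   Miss  Cover
----------------------------------------
pkg/__init__.py          4      0   100%
pkg/core.py            120     12    90%
----------------------------------------
TOTAL                  124     12    90%
"""

total = extract_total_coverage(sample_report)
```

Anchoring on the TOTAL row keeps per-file percentages, such as the 100% line above, from being picked up by mistake.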

[0.4.2] - 2026-03-01

Release focused on observability & interop (NumPy parity, export bundles, and lightweight plotting/logging utilities).

Added
- NumPy parity for projective tooling: lift_targets_numpy(...) and NumPy dispatch for strict SCM inference.
- ONNX “deployment bundle” helper export_bundle(...) that writes model.onnx plus a metadata.json sidecar.
- Optional visualization helpers (zeroproof.utils.viz) and first-party log hooks (zeroproof.utils.logging) including JSONL readers / DataFrame loader (via the viz extra).
- Trainer config additions: TrainingConfig.val_loader, use_amp alias for AMP, and gradient_policy override; validation summaries are stored in SCMTrainer.val_history.
- GitLab CI pipeline with a manual, tag-only PyPI publish job (.gitlab-ci.yml).

Changed
- gradient_policy(...) now overrides registered per-layer defaults while the context is active.
- Repository/documentation links updated to GitLab (gitlab.com/domezsolt/zeroproofml).
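
The "context overrides per-layer defaults" behaviour can be sketched with a plain context manager. Everything in this block is hypothetical: the registry, the policy strings, and the names override_policy and effective_policy are placeholders chosen for illustration and are not the zeroproof API.

```python
from contextlib import contextmanager

# Hypothetical registry of per-layer defaults; names and values are placeholders.
_LAYER_DEFAULTS = {"rational_head": "clamp", "encoder": "passthrough"}
_ACTIVE_OVERRIDE = []  # used as a stack so nested contexts restore correctly


@contextmanager
def override_policy(policy: str):
    """Make `policy` win over every registered per-layer default while active."""
    _ACTIVE_OVERRIDE.append(policy)
    try:
        yield
    finally:
        _ACTIVE_OVERRIDE.pop()


def effective_policy(layer_name: str) -> str:
    """Resolve a layer's policy: the active override first, then its default."""
    if _ACTIVE_OVERRIDE:
        return _ACTIVE_OVERRIDE[-1]
    return _LAYER_DEFAULTS[layer_name]
```

Inside `with override_policy("reject"):` every layer resolves to "reject"; once the block exits, each layer falls back to its registered default.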

[0.4.0] - 2025-12-11

Release completing the SCM migration and preparing PyPI publication.

Added
- SCM-first documentation and README refresh describing weak sign tracking, projective tuples, and absorptive arithmetic.
- Benchmark regression gate (scripts/ci/benchmark_gate.py) and CI wiring for SCM-specific pytest markers and coverage thresholds.
- License headers across the v0.4 codebase to match release compliance requirements.

Changed
- CI pipelines now run strict linting (black, ruff, isort), mypy --strict, and publish SCM-suite coverage reports/artifacts.
- Benchmarks workflow uses the SCM benchmark runner and fails on malformed or slow results.
- README development commands align with the tightened release checks.

[0.2.0] - 2025-10-29

Integration-focused release with unified runner, parity, and docs.

Added
- Unified integration runner: scripts/run_integration_suite.py
  - Runs focused unit tests (Torch/JAX + parity), writes a consolidated log, and saves parity_results.json.
  - CPU‑friendly defaults (threads capped on ≤4‑thread CPUs), headless plotting, disables CUDA visibility, enables JAX x64.
- Parity helpers: zeroproof/utils/parity.py
  - run_backend_parity and parity_within_tolerance across NumPy, PyTorch, and JAX.
  - Torch parity enabled; JAX parity enabled on CPU.
- Determinism test for Torch: tests/unit/test_torch_determinism.py.
- Documentation: docs/11_integrations.md (framework integrations & runner) and index link. PyPI install instructions now primary.
- README simplified with vision/goal and links to full docs and robotics example (Zenodo).

Changed
- JAX bridge (custom_vjp): backward signatures updated for cross‑version compatibility (removed nondiff_argnums; pass scalars via residuals).
- JAX deterministic sum tolerance relaxed to atol=1e-5 for CPU order stability.
- Integration runner hardens imports (ensures repo root on sys.path) and environment (JAX x64, thread caps).
- Torch parity uses non‑grad inputs to avoid graph reuse/backward issues in integration context.

Fixed
- JAX tr_div backward now masks gradients when denominator is non‑REAL or zero; gradient clipping rewritten using JAX arrays to avoid tracer boolean conversion.
- Parity “No module named 'zeroproof'” import issue resolved by inserting repo root into sys.path in the runner.
- JAX dtype warnings suppressed by enabling x64 in runner and parity path.
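
The sys.path fix for the "No module named 'zeroproof'" error follows a common pattern, sketched below. The helper name ensure_importable is an assumption for illustration; the actual runner code may differ.

```python
import sys
from pathlib import Path


def ensure_importable(repo_root: Path) -> bool:
    """Put repo_root at the front of sys.path if it is not already listed.

    Returns True when the path was inserted, False when it was already present.
    """
    root = str(repo_root.resolve())
    if root in sys.path:
        return False
    sys.path.insert(0, root)
    return True
```

A script under scripts/ would typically call `ensure_importable(Path(__file__).resolve().parents[1])` before importing the package, assuming the repository root is one directory above the script.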

[0.1.0] - 2025-10-01

First external-ready release candidate (repo hardening).

Added
- PEP 621 packaging via pyproject.toml with extras: torch, jax, all, dev.
- py.typed marker to ship type information for the public API.
- CI release workflow to build wheels/sdist and publish on tag.
- Dependabot configuration for dependency updates.
- Initial mypy configuration scoped to the public API.

Changed
- CI matrix aligned to Python 3.9–3.12; lint job installs black, ruff, isort, mypy.
- README installation section clarifies install-from-source and extras.

Fixed
- Added a missing import in evaluator utilities to avoid a NameError when generating the default evaluation grid.
- Hybrid trainer now aggregates per‑sample losses via a balanced pairwise reduction to bound graph depth and avoid recursion issues during backprop.
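
A balanced pairwise reduction of the kind described for the hybrid trainer can be sketched in plain Python. This is an illustration of the general technique, not the trainer's code; pairwise_reduce and its signature are assumptions. Combining neighbours level by level keeps the expression tree at roughly log2(n) depth, whereas a left-to-right fold builds a chain of depth n - 1.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def pairwise_reduce(items: Sequence[T], combine: Callable[[T, T], T]) -> T:
    """Combine items level by level so the expression tree stays shallow."""
    if not items:
        raise ValueError("pairwise_reduce needs at least one item")
    level = list(items)
    while len(level) > 1:
        # Combine neighbours (0,1), (2,3), ... into the next level.
        nxt = [combine(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # carry the unpaired last element up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


def add_with_depth(a, b):
    """Add (value, depth) pairs, tracking the depth of the resulting tree."""
    return (a[0] + b[0], max(a[1], b[1]) + 1)


# 1000 unit "losses", each starting at depth 0.
total, depth = pairwise_reduce([(1, 0)] * 1000, add_with_depth)
```

With 1000 inputs the tracked depth is 10, compared with 999 for a sequential fold, which is why this shape avoids deep recursion when a framework walks the graph backwards.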

Notes
- Until PyPI publication, use pip install -e .[dev] for development and pip install -e .[all] for full features.
- Torch/JAX remain optional dependencies; import zeroproof should work without them.