Debug Logging (v0.6.0 SCM)

ZeroProofML v0.6.0 keeps logging lightweight: the core trainer exposes a per-step log_hook callback, and most examples rely on standard Python logging / printing.

zeroproofml.utils.logging is split into a stable core (JsonlLogger, read_jsonl) and experimental reporting conveniences (TensorBoardLogger, jsonl_to_dataframe). The legacy zeroproof.utils.logging path remains supported as a compatibility import, but the canonical docs/import path is zeroproofml.*. The same module also provides experimental metric_log_records_to_wide_rows(...) and metric_log_records_to_long_rows(...) helpers for dashboard/CSV/BI pipelines that need flattened metric rows.

Prefer the stable JSONL path (JsonlLogger, read_jsonl) in release-facing integrations; the TensorBoard and DataFrame helpers below remain experimental and may change without notice.

Python logging basics

import logging
logging.basicConfig(level=logging.INFO)
logging.getLogger("zeroproof").setLevel(logging.INFO)

Capturing per-step metrics from SCMTrainer

zeroproofml.training.trainer.TrainingConfig accepts a log_hook(metrics) callable. The trainer calls it with a versioned JSONL metric record:

  • schema_name: zeroproofml.metric_log
  • schema_version: 1
  • record_type: metric
  • phase: train or eval
  • step / epoch: trainer position when available
  • metrics: the canonical metric payload

For compatibility with older flat log readers, non-reserved metric keys are also mirrored at the record top level.

The metrics payload contains at least:

  • loss (float)
  • coverage (float; fraction of non-⊥ predictions)

and may also include:

  • tau_train (float; sampled threshold for the current step)
  • bottom_frac (float; 1 - coverage)
  • denom_abs_min, denom_abs_mean (floats; available when the model outputs a projective (P, Q) tuple)

Validation records use phase="eval" and val_-prefixed metric names.
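
A log_hook can act on these fields directly. The sketch below assumes only the record layout described above: it reads the nested metrics payload, falls back to the mirrored top-level keys, and emits a Python logging warning when coverage drops below a threshold. The names make_coverage_watch, min_coverage, and the "scm_debug" logger are hypothetical and not part of ZeroProofML.

```python
import logging

logger = logging.getLogger("scm_debug")  # hypothetical logger name


def make_coverage_watch(min_coverage=0.9):
    """Return a log_hook-style callable that warns on low coverage."""

    def hook(record):
        # Prefer the nested payload; fall back to mirrored top-level keys.
        metrics = record.get("metrics") or record
        phase = record.get("phase", "train")
        # Validation records use val_-prefixed metric names.
        key = "val_coverage" if phase == "eval" else "coverage"
        coverage = metrics.get(key)
        if coverage is None:
            return None
        low = float(coverage) < min_coverage
        if low:
            logger.warning(
                "step=%s phase=%s %s=%.3f is below %.2f",
                record.get("step"), phase, key, float(coverage), min_coverage,
            )
        return low

    return hook


watch = make_coverage_watch(min_coverage=0.9)
watch({"phase": "train", "step": 3, "metrics": {"loss": 0.4, "coverage": 0.5}})
```

The return value is only there to make the hook easy to check in isolation; the sketch assumes the trainer ignores whatever a log_hook returns.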

JSONL logger (stable)

from zeroproofml.training.trainer import TrainingConfig
from zeroproofml.utils.logging import JsonlLogger

cfg = TrainingConfig(log_hook=JsonlLogger("runs/scm_train_metrics.jsonl"))

Manual evaluation logs can use the same schema helper:

from zeroproofml.utils.logging import JsonlLogger, metric_log_record

logger = JsonlLogger("runs/evaluation_metrics.jsonl")
logger(metric_log_record({"mse": 0.012, "coverage": 0.99}, phase="eval", step=1))

Benchmark runs use the same stable JSONL schema. The harness writes aggregated/benchmark_metrics.jsonl with benchmark_seed, benchmark_summary, and optional benchmark_delta records:

from zeroproofml.utils.logging import JsonlLogger, metric_log_record

logger = JsonlLogger("results/benchmarks/dose/run_x/aggregated/benchmark_metrics.jsonl")
logger(
    metric_log_record(
        {"macro_f1": 0.91, "bottom_rate": 0.08},
        phase="benchmark_seed",
        step=1,
        context={"domain": "dose", "mode": "smoke", "model": "strict", "seed": 1},
        record_type="benchmark_seed",
    )
)

TensorBoard logger (experimental)

from zeroproofml.training.trainer import TrainingConfig
from zeroproofml.utils.logging import TensorBoardLogger

cfg = TrainingConfig(log_hook=TensorBoardLogger("runs/tensorboard"))

For benchmark debugging, use TensorBoard only as a local mirror and keep JsonlLogger as the release-facing artifact:

from zeroproofml.utils.logging import TensorBoardLogger

tb_logger = TensorBoardLogger("results/benchmarks/dose/run_x/tensorboard", step_key="step")
tb_logger({"step": 1, "macro_f1": 0.91, "bottom_rate": 0.08})
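
Because log_hook takes a single callable, keeping JSONL as the artifact while mirroring to TensorBoard needs a small fan-out wrapper. The sketch below assumes both loggers are plain callables that accept the same record, as in the examples above. tee_hooks is a hypothetical helper, and two lists stand in for JsonlLogger(...) and TensorBoardLogger(...) so the snippet runs on its own.

```python
def tee_hooks(*hooks):
    """Combine several log_hook callables into one (hypothetical helper)."""

    def combined(record):
        # Call every hook in order with the same record.
        for hook in hooks:
            hook(record)

    return combined


# Stand-ins for JsonlLogger(...) and TensorBoardLogger(...).
jsonl_seen, tb_seen = [], []
hook = tee_hooks(jsonl_seen.append, tb_seen.append)
hook({"step": 1, "macro_f1": 0.91, "bottom_rate": 0.08})
```

With the real loggers, the combined hook would be passed as TrainingConfig(log_hook=hook). Putting the JSONL logger first means the release-facing file is written even if the mirror raises.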

Loading JSONL logs

Read the JSONL file back into memory:

from zeroproofml.utils.logging import read_jsonl

records = read_jsonl("runs/scm_train_metrics.jsonl")

If you have pandas installed, the experimental DataFrame helper loads the same JSONL file as a DataFrame:

from zeroproofml.utils.logging import jsonl_to_dataframe

df = jsonl_to_dataframe("runs/scm_train_metrics.jsonl")
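
Once loaded, the records can be filtered with ordinary Python. The sketch below assumes read_jsonl returns a list of dicts in the schema described earlier. It uses inline sample records so it runs without a log file, and metric_series is a hypothetical helper. Treating records without a phase field as train records is also an assumption.

```python
def metric_series(records, name, phase="train"):
    """Return (step, value) pairs for one metric in one phase."""
    series = []
    for record in records:
        if record.get("phase", "train") != phase:
            continue
        # Nested payload first, then legacy flat keys.
        metrics = record.get("metrics") or record
        if name in metrics:
            series.append((record.get("step"), float(metrics[name])))
    return series


records = [
    {"phase": "train", "step": 1, "metrics": {"loss": 0.9, "coverage": 0.80}},
    {"phase": "train", "step": 2, "metrics": {"loss": 0.6, "coverage": 0.85}},
    {"phase": "eval", "step": 2, "metrics": {"val_loss": 0.7}},
]
loss_curve = metric_series(records, "loss")
val_curve = metric_series(records, "val_loss", phase="eval")
```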

Aggregating multi-seed logs

Use aggregate_metric_logs(...) when several seeds or reruns emit the same JSONL metric schema. The helper reads nested metrics payloads and legacy flat records, keeps only finite numeric metrics, and returns rows with count, mean, sample std, min, max, first, and latest values.

from zeroproofml.utils.logging import aggregate_metric_logs

rows = aggregate_metric_logs(
    {
        "seed_1": "runs/seed_1/metrics.jsonl",
        "seed_2": "runs/seed_2/metrics.jsonl",
    },
    group_by=("phase", "step"),
)

Group keys are read from the record first and then from context, so group_by=("phase", "step") gives step-aligned mean/std curves across seeds, while group_by=("run_id", "phase") keeps reruns separate. For path lists, read_metric_log_runs(...) also annotates records with a path-derived run_id, source_path, and a seed value when the path includes common seed_{n} or quick_s{n} directories.
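
To make the grouping semantics concrete, here is a stdlib-only sketch of the behaviour described above. It is an illustration, not the library's implementation: aggregate_sketch is a hypothetical name, only nested metrics payloads are handled, and returning None for the sample std of a single value is an assumption.

```python
import math
import statistics
from collections import defaultdict


def aggregate_sketch(records, group_by=("phase", "step")):
    """Group records and summarize each finite numeric metric."""
    buckets = defaultdict(lambda: defaultdict(list))
    for record in records:
        context = record.get("context") or {}
        # Group keys come from the record first, then from its context.
        key = tuple(record.get(k, context.get(k)) for k in group_by)
        for name, value in (record.get("metrics") or {}).items():
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if is_number and math.isfinite(value):
                buckets[key][name].append(float(value))
    rows = []
    for key, by_metric in buckets.items():
        for name, values in by_metric.items():
            row = dict(zip(group_by, key))
            row.update(
                metric=name,
                count=len(values),
                mean=statistics.fmean(values),
                # Sample std needs two or more values; None otherwise (assumption).
                std=statistics.stdev(values) if len(values) > 1 else None,
                min=min(values),
                max=max(values),
                first=values[0],
                latest=values[-1],
            )
            rows.append(row)
    return rows


records = [
    {"phase": "train", "step": 1, "context": {"run_id": "seed_1"},
     "metrics": {"loss": 0.4, "coverage": 0.9}},
    {"phase": "train", "step": 1, "context": {"run_id": "seed_2"},
     "metrics": {"loss": 0.6, "coverage": float("nan")}},
]
rows = aggregate_sketch(records)
per_run = aggregate_sketch(records, group_by=("run_id", "phase"))
```

In this sample, grouping by ("phase", "step") merges the two seeds into one loss row with count 2, and the NaN coverage value is dropped. Grouping by ("run_id", "phase") resolves run_id from context and keeps the seeds apart.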

Converting logs for dashboards and downstream tooling

Use the experimental conversion helpers when you need one row per record or one row per metric value:

from zeroproofml.utils.logging import (
    metric_log_records_to_long_rows,
    metric_log_records_to_wide_rows,
    read_metric_log_runs,
)

records = read_metric_log_runs(["runs/seed_1/metrics.jsonl", "runs/seed_2/metrics.jsonl"])
wide_rows = metric_log_records_to_wide_rows(records)
long_rows = metric_log_records_to_long_rows(records)

wide_rows keeps one row per trainer/eval record and flattens metrics with a metric_ prefix plus context fields such as context_run_id / context_seed. long_rows emits one row per metric/value pair with metric and value columns, which is often easier for charting tools and SQL-style ingestion.
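
The difference between the two shapes is easiest to see on a single record. The functions below are hypothetical stand-ins written from the description above. The exact columns produced by the library helpers may differ, and the choice of phase, step, and epoch as identifying columns is an assumption.

```python
ID_KEYS = ("phase", "step", "epoch")  # assumed identifying columns


def _base_columns(record):
    """Identifying columns plus context fields with a context_ prefix."""
    row = {k: record[k] for k in ID_KEYS if k in record}
    for key, value in (record.get("context") or {}).items():
        row[f"context_{key}"] = value
    return row


def to_wide_row(record):
    """One row per record; metrics flattened with a metric_ prefix."""
    row = _base_columns(record)
    for name, value in (record.get("metrics") or {}).items():
        row[f"metric_{name}"] = value
    return row


def to_long_rows(record):
    """One row per metric/value pair, with metric and value columns."""
    base = _base_columns(record)
    return [
        dict(base, metric=name, value=value)
        for name, value in (record.get("metrics") or {}).items()
    ]


record = {
    "phase": "train",
    "step": 1,
    "context": {"run_id": "seed_1", "seed": 1},
    "metrics": {"loss": 0.4, "coverage": 0.9},
}
wide = to_wide_row(record)
long = to_long_rows(record)
```

Here wide is a single dict with metric_loss and metric_coverage columns, while long holds two dicts that share the same identifying columns and differ only in metric and value.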

Minimal manual example (JSONL)

import json
from pathlib import Path

from zeroproofml.training.trainer import TrainingConfig, SCMTrainer

log_path = Path("runs/scm_train_metrics.jsonl")
log_path.parent.mkdir(parents=True, exist_ok=True)

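def _to_jsonable(value):
    # Recurse into the nested "metrics" payload of the versioned record.
    if isinstance(value, dict):
        return {k: _to_jsonable(v) for k, v in value.items()}
    # Keep native JSON types unchanged so integer fields such as step stay integers.
    if value is None or isinstance(value, (str, bool, int, float)):
        return value
    # Tensor and NumPy scalars expose __float__; anything else passes through.
    return float(value) if hasattr(value, "__float__") else value

def log_hook(metrics):
    # Append one JSON object per line.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(_to_jsonable(metrics)) + "\n")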

cfg = TrainingConfig(log_hook=log_hook)
# trainer = SCMTrainer(..., config=cfg)
# trainer.fit()

Debugging propagation

  • For Torch layers, inspect bottom_mask directly (e.g., from SCMRationalLayer.forward).
  • For vectorised backends (zeroproofml.scm.ops), log both (payload, mask); the payload is intentionally zeroed at mask=True positions to keep tensor math stable.
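
When logging a (payload, mask) pair, two quick checks are the bottom fraction and whether any payload entry is non-zero under the mask. The sketch below uses plain Python lists as stand-ins for tensors so it runs without any backend. summarize_masked is a hypothetical helper, not part of zeroproofml.scm.ops.

```python
def summarize_masked(payload, mask):
    """Summarize a (payload, mask) pair where mask=True marks bottom entries."""
    if len(payload) != len(mask):
        raise ValueError("payload and mask must have the same length")
    total = len(mask)
    bottoms = sum(1 for m in mask if m)
    # Payload is expected to be zeroed wherever mask is True.
    leaked = [i for i, (p, m) in enumerate(zip(payload, mask)) if m and p != 0]
    bottom_frac = bottoms / total if total else 0.0
    return {
        "coverage": 1.0 - bottom_frac,
        "bottom_frac": bottom_frac,
        "nonzero_under_mask": leaked,
    }


payload = [0.5, 0.0, -1.25, 0.0]
mask = [False, True, False, True]
summary = summarize_masked(payload, mask)
```

A non-empty nonzero_under_mask list points at positions where the payload was not zeroed, which is worth investigating before trusting downstream tensor math.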