Training Guide

This guide covers the v0.6.0 trainer loop and how to combine SCM semantics with coverage-aware optimisation.

Trainer Overview

zeroproofml.training.trainer.SCMTrainer implements the reference loop with mixed precision, gradient accumulation, and coverage-based early stopping.
zeroproofml.training.targets.lift_targets is the legacy/simple sentinel path for finite values plus NaN/Inf bottom labels. It remains supported for compatibility and small demos, but it is not the preferred audit path because sentinels cannot distinguish censored, domain-invalid, missing, and fault labels.
For auditable datasets with explicit label semantics, prefer zeroproofml.training.targets.lift_semantic_targets(values, status_labels) with status labels finite, bottom, censored_below, censored_above, domain_invalid, missing, or fault. It returns SemanticTargets carrying (Y_n, Y_d), finite/bottom masks, censored orientation labels, and bottom-kind codes that distinguish semantic bottoms from faults.
SCMTrainer accepts typed batches as (inputs, values, status_labels) and passes a semantic_targets keyword argument to loss functions that opt into it. Losses should use that object to mask labels such as missing, which are neither finite training targets nor bottom targets.
Thresholds are perturbed per batch (perturbed_threshold) to reduce the gap between training-time and inference-time behaviour.
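
To make the typed-batch idea concrete, here is a minimal standalone sketch of semantic lifting. It is not the library's implementation. It assumes a finite value y lifts to (Y_n, Y_d) = (y, 1) and a bottom label lifts to (1, 0); those encodings, the function name lift_semantic_sketch, and the LiftedSketch container are hypothetical. Only the finite, bottom, and missing statuses are handled, because this guide does not specify how the remaining statuses are encoded.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LiftedSketch:
    """Hypothetical stand-in for SemanticTargets; field names are assumptions."""
    y_n: List[float]
    y_d: List[float]
    finite_mask: List[bool]
    bottom_mask: List[bool]


def lift_semantic_sketch(values: Sequence[float],
                         status_labels: Sequence[str]) -> LiftedSketch:
    """Lift (value, status) pairs to numerator/denominator targets plus masks."""
    if len(values) != len(status_labels):
        raise ValueError("values and status_labels must have the same length")
    out = LiftedSketch([], [], [], [])
    for value, status in zip(values, status_labels):
        if status == "finite":
            # Assumed encoding: y -> (y, 1).
            y_n, y_d, is_finite, is_bottom = float(value), 1.0, True, False
        elif status == "bottom":
            # Assumed encoding: bottom -> (1, 0); the raw value is ignored.
            y_n, y_d, is_finite, is_bottom = 1.0, 0.0, False, True
        elif status == "missing":
            # Placeholder pair; both masks are False so a loss can skip it.
            y_n, y_d, is_finite, is_bottom = 0.0, 1.0, False, False
        else:
            raise ValueError(f"status not covered by this sketch: {status!r}")
        out.y_n.append(y_n)
        out.y_d.append(y_d)
        out.finite_mask.append(is_finite)
        out.bottom_mask.append(is_bottom)
    return out


lifted = lift_semantic_sketch([2.5, float("nan"), 0.0],
                              ["finite", "bottom", "missing"])
```

A loss that opts into semantic targets could then reduce only over entries where finite_mask or bottom_mask is set, so that missing labels contribute nothing.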

zeroproofml.training.trainer.TrainingConfig controls the trainer loop:

Epochs/updates: max_epochs, gradient_accumulation_steps
AMP: mixed_precision (alias: use_amp) and amp_dtype
Thresholds: tau_train_min, tau_train_max, and strict tau_infer
Coverage early-stop: coverage_threshold, coverage_patience
Logging: log_hook(metrics) (see 15_debug_logging.md)
Validation: val_loader runs once per epoch; aggregated metrics are stored in trainer.val_history and emitted to log_hook with val_-prefixed keys.
Gradient policy override: gradient_policy applies a global GradientPolicy override during training steps (see 03_gradient_policies.md).
Bottom capability check: if lifted targets contain bottom labels (NaN/Inf) and a projective head reports bottom_capability(tau_infer) == "unreachable_by_construction", the trainer raises an error before optimising. Use allow_bottom_unreachable=True only when those labels are intentional noise or outside the current task.
Loss curricula (optional): loss_curriculum can produce per-epoch loss_weights that are passed into loss functions that accept loss_weights (and epoch / global_step).
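
The opt-in behaviour described for loss_weights, epoch, and global_step can be pictured with signature inspection. This is a sketch under assumptions: the guide does not say how SCMTrainer detects which keywords a loss accepts, and call_loss, linear_warmup_curriculum, the "margin" weight key, and both toy losses are hypothetical names used only for illustration.

```python
import inspect


def call_loss(loss_fn, outputs, targets, **optional):
    """Forward only the optional keywords that loss_fn declares."""
    params = inspect.signature(loss_fn).parameters
    takes_any_kw = any(p.kind is inspect.Parameter.VAR_KEYWORD
                       for p in params.values())
    kwargs = {name: value for name, value in optional.items()
              if takes_any_kw or name in params}
    return loss_fn(outputs, targets, **kwargs)


def linear_warmup_curriculum(epoch, warmup_epochs=5):
    """Hypothetical curriculum: ramp the margin weight up to 1.0 linearly."""
    return {"margin": min(1.0, (epoch + 1) / warmup_epochs)}


def plain_loss(outputs, targets):
    """Mean squared error; does not opt into any optional keyword."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


def weighted_loss(outputs, targets, loss_weights=None, epoch=0):
    """Opts into loss_weights and epoch by naming them in its signature."""
    weights = loss_weights or {"margin": 1.0}
    return weights["margin"] * plain_loss(outputs, targets)


weights = linear_warmup_curriculum(epoch=0)
unweighted = call_loss(plain_loss, [1.0], [0.0], loss_weights=weights, epoch=0)
weighted = call_loss(weighted_loss, [1.0], [0.0],
                     loss_weights=weights, epoch=0, global_step=0)
```

With this pattern, existing losses keep working unchanged, while a loss that wants curriculum weights only has to name the keyword in its signature.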

Prepare data with explicit semantic status labels when available; the trainer lifts (inputs, values, status_labels) batches to (Y_n, Y_d).
Select gradient policy (usually CLAMP for SCM-only graphs or PROJECT for projective heads).
Assemble losses: implicit + margin + sign consistency + rejection (via SCMTrainingLoss).
Train loop:
forward pass (SCM or projective mode),
compute losses and coverage,
backprop using the active gradient policy,
update optimiser (supports AMP via torch.amp on PyTorch 2.x).
Monitor coverage; stop early when coverage stays below coverage_threshold for coverage_patience epochs.
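
The coverage-based stopping rule in the last step can be sketched as a small counter. This is an illustration, not the trainer's code: the class name CoverageEarlyStop is hypothetical, and it assumes "stays below" means consecutive epochs, so any epoch at or above the threshold resets the count.

```python
class CoverageEarlyStop:
    """Signal a stop after `coverage_patience` consecutive low-coverage epochs."""

    def __init__(self, coverage_threshold: float, coverage_patience: int):
        if coverage_patience < 1:
            raise ValueError("coverage_patience must be at least 1")
        self.coverage_threshold = coverage_threshold
        self.coverage_patience = coverage_patience
        self.low_epochs = 0  # consecutive epochs below the threshold

    def update(self, coverage: float) -> bool:
        """Record one epoch's coverage; return True when training should stop."""
        if coverage < self.coverage_threshold:
            self.low_epochs += 1
        else:
            self.low_epochs = 0  # assumed: recovery resets the counter
        return self.low_epochs >= self.coverage_patience


stopper = CoverageEarlyStop(coverage_threshold=0.9, coverage_patience=2)
decisions = [stopper.update(c) for c in [0.95, 0.85, 0.92, 0.80, 0.70]]
# decisions == [False, False, False, False, True]
```

In a real loop the update call would sit at the end of each epoch, after coverage has been aggregated over all batches.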

Treat NaN outputs as ⊥ when computing coverage during training.
Keep tau_train_min and tau_train_max close together unless you specifically need stronger perturbations.
Log last_thresholds from the trainer to understand how often the model sees near-singular regimes.
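
The tips above can be illustrated with two small helpers. Both are sketches under stated assumptions: coverage is taken here to mean the fraction of outputs that are not ⊥, and the per-batch threshold is assumed to be drawn uniformly from [tau_train_min, tau_train_max]. The guide does not confirm either detail, and the function names are hypothetical.

```python
import math
import random


def sample_train_threshold(tau_train_min, tau_train_max, rng):
    """Draw one per-batch threshold (uniform sampling is an assumption)."""
    if tau_train_min > tau_train_max:
        raise ValueError("tau_train_min must not exceed tau_train_max")
    return rng.uniform(tau_train_min, tau_train_max)


def coverage(outputs, bottom_mask=None):
    """Fraction of outputs that are not bottom; NaN outputs count as bottom."""
    if len(outputs) == 0:
        raise ValueError("coverage is undefined for an empty batch")
    if bottom_mask is None:
        bottom_mask = [False] * len(outputs)
    covered = sum(1 for y, is_bottom in zip(outputs, bottom_mask)
                  if not is_bottom and not math.isnan(y))
    return covered / len(outputs)


rng = random.Random(0)  # seeded so the logged thresholds are reproducible
last_thresholds = [sample_train_threshold(1e-4, 2e-4, rng) for _ in range(3)]
batch_coverage = coverage([1.0, float("nan"), 2.0, 3.0])  # 0.75
```

A narrow [tau_train_min, tau_train_max] interval keeps the sampled thresholds close to one another, which is the low-perturbation setting recommended above.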