Training Guide

This guide covers the v0.4 trainer loop and how to combine SCM semantics with coverage-aware optimisation.

Trainer Overview

  • training.trainer.SCMTrainer implements the reference loop with mixed precision, gradient accumulation, and coverage-based early stopping.
  • Targets are lifted to projective tuples via training.targets.lift_targets to unify finite and infinite labels.
  • Thresholds are perturbed per batch (perturbed_threshold) to reduce train/infer gaps.
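The per-batch threshold perturbation can be sketched as a uniform draw from the configured range. This is a hypothetical stand-in for the library's `perturbed_threshold` helper (the actual distribution and signature may differ):

```python
import random

def perturbed_threshold(tau_min: float, tau_max: float, rng: random.Random) -> float:
    """Draw a per-batch threshold uniformly from [tau_min, tau_max].

    Hypothetical sketch of the perturbation described above; the real
    perturbed_threshold may use a different distribution or RNG source.
    """
    return rng.uniform(tau_min, tau_max)

rng = random.Random(0)
taus = [perturbed_threshold(1e-6, 1e-4, rng) for _ in range(4)]
assert all(1e-6 <= t <= 1e-4 for t in taus)
```

Because each batch sees a slightly different threshold, the model is less sensitive to the exact inference-time threshold choice.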

TrainingConfig

zeroproof.training.trainer.TrainingConfig controls the trainer loop:

  • Epochs/updates: max_epochs, gradient_accumulation_steps
  • AMP: mixed_precision (alias: use_amp) and amp_dtype
  • Thresholds: tau_train_min, tau_train_max
  • Coverage early-stop: coverage_threshold, coverage_patience
  • Logging: log_hook(metrics) (see 15_debug_logging.md)
  • Validation: val_loader runs once per epoch; aggregated metrics are stored in trainer.val_history and emitted to log_hook with val_-prefixed keys.
  • Gradient policy override: gradient_policy applies a global GradientPolicy override during training steps (see 03_gradient_policies.md).
  • Loss curricula (optional): loss_curriculum can produce per-epoch loss_weights that are passed into loss functions that accept loss_weights (and epoch / global_step).
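Putting the fields above together, a minimal configuration might look like the following. The field names come from the list above, but the concrete values, the string form of `amp_dtype`, and the constructor defaults are assumptions; check the `TrainingConfig` docstring for the authoritative signature:

```python
from zeroproof.training.trainer import TrainingConfig

config = TrainingConfig(
    max_epochs=50,
    gradient_accumulation_steps=4,
    mixed_precision=True,      # alias: use_amp
    amp_dtype="bfloat16",      # assumed string form; may instead be a torch dtype
    tau_train_min=1e-6,        # hypothetical threshold range
    tau_train_max=1e-4,
    coverage_threshold=0.85,   # hypothetical early-stop floor
    coverage_patience=5,
    log_hook=print,            # any callable accepting a metrics dict
)
```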

Typical Flow

  1. Prepare data with finite/infinite labels; lift to (Y_n, Y_d) inside the trainer.
  2. Select a gradient policy (usually CLAMP for SCM-only graphs or PROJECT for projective heads).
  3. Assemble losses: implicit + margin + sign consistency + rejection (via SCMTrainingLoss).
  4. Run the train loop:
       • forward pass (SCM or projective mode),
       • compute losses and coverage,
       • backprop using the active gradient policy,
       • optimiser update (supports AMP via torch.amp on PyTorch 2.x).
  5. Monitor coverage; stop early when coverage stays below coverage_threshold for coverage_patience epochs.
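The coverage early-stop rule in the last step can be sketched as a small helper over the per-epoch coverage history. This is an illustrative reimplementation, not the trainer's internal bookkeeping:

```python
def coverage_early_stop(history: list[float], threshold: float, patience: int) -> bool:
    """Stop when the trailing `patience` epochs of coverage all sit below `threshold`.

    Hypothetical sketch of the rule described above; SCMTrainer may track this
    state incrementally rather than rescanning the history.
    """
    if len(history) < patience:
        return False
    return all(c < threshold for c in history[-patience:])

# Coverage recovers in the last epoch, so the patience window never fills with misses.
assert coverage_early_stop([0.9, 0.7, 0.7, 0.9], threshold=0.8, patience=2) is False
# Three consecutive sub-threshold epochs trigger the stop.
assert coverage_early_stop([0.9, 0.7, 0.7, 0.7], threshold=0.8, patience=3) is True
```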

Tips

  • Treat NaN outputs as ⊥ when computing coverage during training.
  • Keep tau_train_min and tau_train_max close unless you specifically need stronger perturbations.
  • Log last_thresholds from the trainer to understand how often the model sees near-singular regimes.
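The first tip (NaN counts as ⊥) amounts to computing coverage as the fraction of non-NaN outputs. A minimal sketch over a flat list of outputs:

```python
import math

def batch_coverage(outputs: list[float]) -> float:
    """Fraction of accepted outputs, treating NaN as bottom (⊥/rejected).

    Illustrative helper; the trainer computes this over tensors, but the
    accounting is the same.
    """
    accepted = sum(1 for y in outputs if not math.isnan(y))
    return accepted / len(outputs)

assert batch_coverage([1.0, float("nan"), 2.0, 3.0]) == 0.75
```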