# Training Guide
This guide covers the v0.4 trainer loop and how to combine SCM semantics with coverage-aware optimisation.
## Trainer Overview
`training.trainer.SCMTrainer` implements the reference loop with mixed precision, gradient accumulation, and coverage-based early stopping.

- Targets are lifted to projective tuples via `training.targets.lift_targets` to unify finite and infinite labels.
- Thresholds are perturbed per batch (`perturbed_threshold`) to reduce train/infer gaps.
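The per-batch threshold perturbation can be sketched in isolation. `sample_tau` below is a hypothetical stand-in written for illustration, not the library's actual `perturbed_threshold` implementation; the uniform-sampling strategy and bounds are assumptions.

```python
import random

def sample_tau(tau_min: float, tau_max: float, rng: random.Random) -> float:
    """Illustrative per-batch threshold perturbation: draw a fresh tau
    uniformly in [tau_min, tau_max] for each batch, so the model does not
    overfit to a single fixed decision threshold."""
    return rng.uniform(tau_min, tau_max)

rng = random.Random(0)
taus = [sample_tau(1e-6, 1e-4, rng) for _ in range(4)]
assert all(1e-6 <= t <= 1e-4 for t in taus)
```

Because a fresh threshold is drawn every batch, the model is exposed to the whole `[tau_min, tau_max]` band during training, which is what narrows the train/infer gap.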
## TrainingConfig
`zeroproof.training.trainer.TrainingConfig` controls the trainer loop:

- Epochs/updates: `max_epochs`, `gradient_accumulation_steps`
- AMP: `mixed_precision` (alias: `use_amp`) and `amp_dtype`
- Thresholds: `tau_train_min`, `tau_train_max`
- Coverage early-stop: `coverage_threshold`, `coverage_patience`
- Logging: `log_hook(metrics)` (see `15_debug_logging.md`)
- Validation: `val_loader` runs once per epoch; aggregated metrics are stored in `trainer.val_history` and emitted to `log_hook` with `val_`-prefixed keys.
- Gradient policy override: `gradient_policy` applies a global `GradientPolicy` override during training steps (see `03_gradient_policies.md`).
- Loss curricula (optional): `loss_curriculum` can produce per-epoch `loss_weights` that are passed into loss functions that accept `loss_weights` (and `epoch`/`global_step`).
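Putting these fields together, a configuration might look like the following sketch. The field names come from the list above, but the exact constructor signature, defaults, and value ranges are assumptions; check them against `zeroproof.training.trainer` before use.

```python
from zeroproof.training.trainer import TrainingConfig

# All values below are illustrative, not recommended defaults.
config = TrainingConfig(
    max_epochs=50,
    gradient_accumulation_steps=2,
    mixed_precision=True,        # alias: use_amp
    tau_train_min=1e-6,          # per-batch thresholds are drawn in
    tau_train_max=1e-4,          # [tau_train_min, tau_train_max]
    coverage_threshold=0.95,     # early stop when coverage stays below this
    coverage_patience=5,         # ... for this many consecutive epochs
)
```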
## Typical Flow
- Prepare data with finite/infinite labels; lift to `(Y_n, Y_d)` inside the trainer.
- Select a gradient policy (usually `CLAMP` for SCM-only graphs or `PROJECT` for projective heads).
- Assemble losses: implicit + margin + sign consistency + rejection (via `SCMTrainingLoss`).
- Train loop:
    - forward pass (SCM or projective mode),
    - compute losses and coverage,
    - backprop using the active gradient policy,
    - update the optimiser (supports AMP via `torch.amp` on PyTorch 2.x).
- Monitor coverage; early stop when coverage stays below `coverage_threshold` for `coverage_patience` epochs.
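The coverage-based early stop in the last step can be sketched on its own. `CoverageEarlyStop` is a hypothetical helper written for illustration, not the trainer's internal class; it only shows the stop-after-patience logic.

```python
class CoverageEarlyStop:
    """Signal a stop once coverage stays below `threshold` for
    `patience` consecutive epochs (illustrative sketch)."""

    def __init__(self, threshold: float, patience: int):
        self.threshold = threshold
        self.patience = patience
        self.bad_epochs = 0

    def step(self, coverage: float) -> bool:
        """Feed one epoch's coverage; return True when training should stop."""
        if coverage < self.threshold:
            self.bad_epochs += 1
        else:
            self.bad_epochs = 0  # one good epoch resets the counter
        return self.bad_epochs >= self.patience

stopper = CoverageEarlyStop(threshold=0.95, patience=3)
history = [0.97, 0.93, 0.94, 0.96, 0.92, 0.91, 0.90]
stops = [stopper.step(c) for c in history]
# stops == [False, False, False, False, False, False, True]
```

Note that the recovery at epoch four (0.96) resets the counter, so only the final three consecutive sub-threshold epochs trigger the stop.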
## Tips
- Treat NaN outputs as ⊥ when computing coverage during training.
- Keep `tau_train_min` and `tau_train_max` close unless you specifically need stronger perturbations.
- Log `last_thresholds` from the trainer to understand how often the model sees near-singular regimes.
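The first tip (NaN as ⊥) can be made concrete with a small sketch. `coverage` here is a hypothetical helper, not a library function; it computes the covered fraction while treating NaN outputs as rejected (⊥).

```python
import math

def coverage(outputs: list[float]) -> float:
    """Fraction of outputs that are not bottom (⊥).
    NaN is treated as bottom, matching the training-time convention."""
    if not outputs:
        return 0.0
    covered = sum(1 for y in outputs if not math.isnan(y))
    return covered / len(outputs)

assert coverage([1.0, float("nan"), 2.5, float("nan")]) == 0.5
```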