Loss Functions in v0.6.0¶
ZeroProofML v0.6.0 combines multiple losses to stabilise training near singularities and to distinguish projective orientations.
Implicit Loss¶
Cross-product form that avoids direct division:
[ E = (P \cdot Y_d - Q \cdot Y_n)^2, \quad L = \operatorname{mean}\left( \frac{E}{Q^2 Y_d^2 + P^2 Y_n^2 + \gamma} \right) ]
Handles poles by keeping gradients defined even when Q → 0.
The default detach_scale=False keeps the scale factor attached in the
backward pass, preserving the ideal homogeneous loss's near-zero radial
gradient away from the γ floor. Pass detach_scale=True to recover the
legacy detached-scale heuristic, which treats the denominator as constant and
therefore creates an inward radial shrink force for nonzero residuals.
Margin Loss¶
Encourages denominators to stay away from zero during training:
[ L_{\text{margin}} = \operatorname{mean}(\max(0, \tau_{train} - |Q|)^2) ]
Optionally masked to finite paths. margin_loss(..., reduction="population")
keeps the default population risk by averaging the masked penalties over the
whole batch. reduction="conditional" averages over finite targets only and
returns zero when the finite-target mask is empty.
With a finite-target event (F), population reduction controls: [ \Pr(F\cap B) \leq \frac{L_{\text{margin,pop}}}{(\tau_{\text{train}}-\tau_{\text{infer}})^2} ]
Conditional finite-label reporting instead uses the finite-label probability: [ \Pr(B\mid F) \leq \frac{L_{\text{margin,pop}}} {\Pr(F)(\tau_{\text{train}}-\tau_{\text{infer}})^2}. ]
Sign Consistency Loss (Singular Orientation)¶
Projective cosine similarity to align (P, Q) with target orientation tuples
(Y_n, Y_d) on singular/censored labels or another explicitly signed task
domain. It is not a generic finite-regression loss; finite projective targets
that are equivalent up to a global sign should normally be handled by
implicit_loss rather than an orientation penalty.
When mask_singular is omitted, sign_consistency_loss and SCMTrainingLoss
apply the sign term only to targets with abs(Y_d) <= epsilon_sing. Pass
mask_singular explicitly to supervise finite orientations or another signed
task domain.
Coverage & Rejection¶
- Coverage metric: fraction of outputs that are finite (non-⊥).
- Hard rejection penalty:
rejection_loss(is_bottom, ...)penalises measured coverage below a target threshold from an already-thresholded mask. This hard-mask path is detached and should be treated as metric-style accounting, not as a differentiable gate-training signal. - Soft coverage surrogate:
soft_coverage_loss(Q, tau, temperature, target)approximates finite-output probability from denominator magnitude: [ p_{\text{finite}}(Q)=\sigma\left(\frac{|Q|-\tau}{T}\right). ] The defaultmode="under"penalises under-coverage as [ \max(0,c_{\text{target}}-\mathbb E[p_{\text{finite}}])^2. ]
Combined Training Objective¶
SCMTrainingLoss mixes the components:
[ L = L_{fit} + \lambda_{margin} L_{margin} + \lambda_{sign} L_{sign} + \lambda_{rej} L_{rej} ]
Default hyperparameters: γ=1e-9, τ_{train}=1e-4, ε_{sing}=1e-3, λ_{margin}=0.1, λ_{sign}=1.0.