Loss Functions in v0.6.0¶

ZeroProofML v0.6.0 combines multiple losses to stabilise training near singularities and to distinguish projective orientations.

Implicit Loss¶

Cross-product form that avoids direct division: [ E = (P \cdot Y_d - Q \cdot Y_n)^2, \quad L = \operatorname{mean}\left( \frac{E}{Q^2 Y_d^2 + P^2 Y_n^2 + \gamma} \right) ] Handles poles by keeping gradients defined even when Q → 0. The default detach_scale=False keeps the scale factor attached in the backward pass, preserving the ideal homogeneous loss's near-zero radial gradient away from the γ floor. Pass detach_scale=True to recover the legacy detached-scale heuristic, which treats the denominator as constant and therefore creates an inward radial shrink force for nonzero residuals.

Margin Loss¶

Encourages denominators to stay away from zero during training: [ L_{\text{margin}} = \operatorname{mean}(\max(0, \tau_{train} - |Q|)^2) ] Optionally masked to finite paths. margin_loss(..., reduction="population") keeps the default population risk by averaging the masked penalties over the whole batch. reduction="conditional" averages over finite targets only and returns zero when the finite-target mask is empty.

With a finite-target event (F), population reduction controls: [ \Pr(F\cap B) \leq \frac{L_{\text{margin,pop}}}{(\tau_{\text{train}}-\tau_{\text{infer}})^2} ]

Conditional finite-label reporting instead uses the finite-label probability: [ \Pr(B\mid F) \leq \frac{L_{\text{margin,pop}}} {\Pr(F)(\tau_{\text{train}}-\tau_{\text{infer}})^2}. ]

Sign Consistency Loss (Singular Orientation)¶

Projective cosine similarity to align (P, Q) with target orientation tuples (Y_n, Y_d) on singular/censored labels or another explicitly signed task domain. It is not a generic finite-regression loss; finite projective targets that are equivalent up to a global sign should normally be handled by implicit_loss rather than an orientation penalty.

When mask_singular is omitted, sign_consistency_loss and SCMTrainingLoss apply the sign term only to targets with abs(Y_d) <= epsilon_sing. Pass mask_singular explicitly to supervise finite orientations or another signed task domain.

Coverage & Rejection¶

Coverage metric: fraction of outputs that are finite (non-⊥).
Hard rejection penalty: rejection_loss(is_bottom, ...) penalises measured coverage below a target threshold from an already-thresholded mask. This hard-mask path is detached and should be treated as metric-style accounting, not as a differentiable gate-training signal.
Soft coverage surrogate: soft_coverage_loss(Q, tau, temperature, target) approximates finite-output probability from denominator magnitude: [ p_{\text{finite}}(Q)=\sigma\left(\frac{|Q|-\tau}{T}\right). ] The default mode="under" penalises under-coverage as [ \max(0,c_{\text{target}}-\mathbb E[p_{\text{finite}}])^2. ]

Combined Training Objective¶

SCMTrainingLoss mixes the components: [ L = L_{fit} + \lambda_{margin} L_{margin} + \lambda_{sign} L_{sign} + \lambda_{rej} L_{rej} ] Default hyperparameters: γ=1e-9, τ_{train}=1e-4, ε_{sing}=1e-3, λ_{margin}=0.1, λ_{sign}=1.0.