Projective Training Developer Guide¶
This guide shows how the (N, D) projective tuples flow through training and how they are decoded back to SCM values during inference.
Tuple Lifecycle¶
- Encoding: Lift finite SCM values with `φ(x) = (x, 1)` and represent absorptive bottom as `φ(⊥) = (1, 0)` (or `(s, 0)` when sign metadata is tracked).
- Detached renormalization: Keep tuples bounded with a stop-gradient scale:

  ```python
  # sg(x) denotes stop-gradient, e.g. x.detach() in PyTorch
  S = sg(torch.sqrt(N ** 2 + D ** 2)) + gamma
  N_hat, D_hat = N / S, D / S
  ```

  `sg(·)` prevents gradients from leaking through the norm; `gamma` avoids division by zero when the tuple norm vanishes (`N = D = 0`).
- Denominator anchoring (recommended): For finite-target regression heads, anchor denominators around 1 to avoid drift into tiny-but-nonzero values that amplify decoded ratios:

  ```python
  Q = 1 + delta_Q
  P, Q = projective.renormalize(P, Q, gamma=1e-9)
  ```

  This keeps the projective direction learnable while biasing the representation toward the finite chart.
- Gradient flow: Autograd runs on `(N_hat, D_hat)`. Losses that depend on decoded SCM values should decode after renormalization to avoid biasing the tuple scale.
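The forward arithmetic of the lifecycle above can be checked without an autograd framework. The following is a minimal sketch in plain Python on scalar tuples; `encode` and `renormalize` here are local stand-ins written for illustration, not the `zeroproofml` implementations, and the stop-gradient is only noted in a comment because plain floats carry no gradient. It illustrates that renormalization bounds the tuple while leaving the ratio `N / D` unchanged.

```python
import math

def encode(x):
    """Lift a finite value to a projective tuple: phi(x) = (x, 1)."""
    return (float(x), 1.0)

def renormalize(N, D, gamma=1e-9):
    """Scale (N, D) to roughly unit norm.

    In an autograd framework S would be wrapped in a stop-gradient
    (for example .detach() in PyTorch); only the forward pass is shown.
    """
    S = math.sqrt(N ** 2 + D ** 2) + gamma
    return N / S, D / S

N, D = encode(250.0)
N_hat, D_hat = renormalize(N, D)

norm_after = math.hypot(N_hat, D_hat)  # close to 1
ratio_before = N / D
ratio_after = N_hat / D_hat            # same ratio as before

# gamma keeps the all-zero tuple from dividing by zero
zero_hat = renormalize(0.0, 0.0)

# Denominator anchoring (illustrative values): the head predicts an
# offset delta_Q, so a zero output corresponds to Q = 1.
delta_Q = -0.2
P_hat, Q_hat = renormalize(0.6, 1.0 + delta_Q)
anchored_ratio = P_hat / Q_hat         # 0.6 / 0.8
```

Because both components are divided by the same scale, any loss computed on the decoded ratio is unaffected by where the renormalization leaves the tuple norm.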
Decoding Back to SCM¶
- Inference decode: Use the inverse map `φ⁻¹(N, D) = N / D` when `D ≠ 0`; emit `⊥` when `D = 0`. Apply any inference-time gap thresholds (e.g., `|Q| < τ_infer`) after decoding.
- Bridge boundaries: When projective regions hand off to SCM-only layers, decode immediately after the last renormalization step to maintain the training distribution seen by downstream components.
- Logging: Surface both the tuple norm `‖(N, D)‖` and the decoded SCM value in debug logs so divergence between training tuples and inference values is visible.
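The decode rules above can be sketched in a few lines of plain Python. Everything named here is hypothetical and chosen for illustration: the `BOTTOM` sentinel, the helper names, and the default `tau_infer`. The sketch also assumes one particular reading of the gap threshold, namely that tuples with a denominator magnitude below `tau_infer` are reported as bottom.

```python
import math

BOTTOM = None  # hypothetical sentinel standing in for the absorptive bottom

def decode(N, D):
    """Inverse map: N / D when D != 0, bottom when D == 0."""
    if D == 0:
        return BOTTOM
    return N / D

def decode_with_gap(N, D, tau_infer=1e-6):
    """Decode, then treat near-zero denominators as bottom.

    Assumption: the gap threshold rejects tuples with |D| < tau_infer.
    """
    value = decode(N, D)
    if value is not BOTTOM and abs(D) < tau_infer:
        return BOTTOM
    return value

def debug_record(N, D):
    """Pair the tuple norm with the decoded value for a debug log line."""
    return {"norm": math.hypot(N, D), "decoded": decode(N, D)}

finite = decode(3.0, 2.0)            # finite ratio
bottom = decode(1.0, 0.0)            # exact zero denominator
gapped = decode_with_gap(1.0, 1e-9)  # inside the assumed gap
record = debug_record(3.0, 4.0)
```

Keeping the strict decode and the thresholded decode as separate functions makes it easier to confirm that training and inference share the same base decode path.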
Minimal Training Skeleton¶
```python
import torch

from zeroproofml.autodiff import projective

# batch, targets, and loss_fn come from the surrounding training loop

# Forward: lift and renormalize
N, D = projective.encode(batch)  # φ(x)
N, D = projective.renormalize(N, D, gamma=1e-9)

# Losses on tuples or decoded SCM values
outputs = projective.decode(N, D)  # φ⁻¹(N, D)
loss = loss_fn(outputs, targets)
loss.backward()

# Inference: decode only
with torch.no_grad():
    outputs = projective.decode(N, D)
```
Keep a single renormalization site per forward pass to avoid scale drift, and ensure inference uses the same decode path invoked during training.
Note on the Implicit Loss Scale¶
The implicit cross-product fit loss uses a squared-norm scale factor (not a square root):
```python
cross = P * Y_d - Q * Y_n
scale_sq = Q**2 * Y_d**2 + P**2 * Y_n**2 + gamma
loss = mean((cross**2) / scale_sq)
```
With the default `detach_scale=False`, the scale participates in the backward
pass. In the well-conditioned regime where `gamma` is negligible, this matches
the projective invariance contract described in `02_projective_learning.md`:
multiplying `(P, Q)` by a factor `alpha` scales both `cross**2` and `scale_sq`
by `alpha**2`, so the radial derivative along `(P, Q)` is near zero.
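The invariance can be checked numerically. The following is a minimal single-sample sketch in plain Python; the function names and the sample values are chosen for illustration, and the radial derivative is estimated by a central finite difference rather than by autograd.

```python
def implicit_loss(P, Q, Y_n, Y_d, gamma=1e-9):
    """Single-sample implicit cross-product loss with squared-norm scale."""
    cross = P * Y_d - Q * Y_n
    scale_sq = Q ** 2 * Y_d ** 2 + P ** 2 * Y_n ** 2 + gamma
    return cross ** 2 / scale_sq

def radial_derivative(P, Q, Y_n, Y_d, eps=1e-6):
    """Central finite difference along alpha * (P, Q) at alpha = 1."""
    up = implicit_loss((1 + eps) * P, (1 + eps) * Q, Y_n, Y_d)
    down = implicit_loss((1 - eps) * P, (1 - eps) * Q, Y_n, Y_d)
    return (up - down) / (2 * eps)

P, Q = 2.0, 1.0      # prediction tuple, decodes to 2.0
Y_n, Y_d = 3.0, 1.0  # target tuple, decodes to 3.0

base = implicit_loss(P, Q, Y_n, Y_d)
scaled = implicit_loss(10 * P, 10 * Q, Y_n, Y_d)  # same direction, larger norm
d_radial = radial_derivative(P, Q, Y_n, Y_d)      # close to zero
```

The two loss values differ only through `gamma`, which is why the invariance holds approximately rather than exactly.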
Passing `detach_scale=True` applies `sg(...)` to the scale factor. The backward
pass then treats the denominator as constant: for a nonzero residual, scaling
`(P, Q)` outward by a factor `alpha > 1` makes the loss grow approximately like
`alpha**2`. The radial derivative is therefore positive, so gradient descent
applies an inward radial shrink force.
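A surrogate makes the detached case concrete. The sketch below, in plain Python with illustrative names and values, freezes the scale at its `alpha = 1` value and rescales only the residual, which mirrors what the backward pass sees when the scale is wrapped in a stop-gradient. It is an illustration of the argument, not the library's code path.

```python
def frozen_scale_loss(alpha, P, Q, Y_n, Y_d, gamma=1e-9):
    """Loss along alpha * (P, Q) with the scale held at its alpha = 1 value."""
    cross = alpha * (P * Y_d - Q * Y_n)
    scale_sq = Q ** 2 * Y_d ** 2 + P ** 2 * Y_n ** 2 + gamma  # not rescaled
    return cross ** 2 / scale_sq

P, Q = 2.0, 1.0
Y_n, Y_d = 3.0, 1.0

base = frozen_scale_loss(1.0, P, Q, Y_n, Y_d)
doubled = frozen_scale_loss(2.0, P, Q, Y_n, Y_d)
growth = doubled / base  # alpha ** 2 for alpha = 2

# Central finite difference of the surrogate at alpha = 1
eps = 1e-6
d_radial = (
    frozen_scale_loss(1 + eps, P, Q, Y_n, Y_d)
    - frozen_scale_loss(1 - eps, P, Q, Y_n, Y_d)
) / (2 * eps)
```

For this surrogate the radial derivative at `alpha = 1` equals twice the loss value, so it is positive whenever the residual is nonzero.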
Treat detached scale as an explicit legacy/ablation heuristic.
Before any future default flip, run `python scripts/run_complete_ablations.py`
and use the `default_policy/default_policy_ablation.json` control/treatment
manifest as the recorded evidence for the detach-scale comparison.