Projective Learning Mode
Projective learning lifts selected subgraphs to homogeneous tuples (N, D) so training occurs on a smooth manifold while inference retains strict SCM semantics.
When to Use
- Rational heads that should avoid instantiating ⊥ during training.
- Safety-critical outputs where distinguishing +∞ vs −∞ matters (use with sign-consistency loss).
- Scenarios where gradient dead zones around Q ≈ 0 hurt convergence.
Forward/Backward Contract
- Encoding: φ(x) = (x, 1) for finite values; φ(⊥) = (1, 0).
- Decoding: φ⁻¹(N, D) = N/D when D ≠ 0, otherwise ⊥.
- Detached renormalisation: (N, D) ← (N, D) / sg(√(N² + D²) + γ), where sg is the stop-gradient operator, to keep tuples bounded without leaking gradients through the norm.
- Gradients: standard autograd on (N, D); coverage/penalties are computed after decoding.
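The contract above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `BOTTOM`, `encode`, `decode`, and `renormalise` are hypothetical names, ⊥ is represented by `None`, and the default `gamma` is an assumed small constant. Plain floats carry no gradients, so the stop-gradient appears only as a comment; in an autograd framework the scale factor would be detached before the division.

```python
import math

BOTTOM = None  # hypothetical stand-in for the bottom element ⊥


def encode(x):
    """phi: finite x -> (x, 1); bottom -> (1, 0)."""
    return (1.0, 0.0) if x is BOTTOM else (float(x), 1.0)


def decode(n, d):
    """phi^-1: N/D when D != 0, otherwise bottom."""
    return n / d if d != 0.0 else BOTTOM


def renormalise(n, d, gamma=1e-9):
    """Divide the tuple by sqrt(N^2 + D^2) + gamma.

    In an autograd framework the scale would be detached (stop-gradient)
    so that no gradient flows through the norm; here it is a plain float.
    """
    scale = math.hypot(n, d) + gamma
    return n / scale, d / scale


n, d = renormalise(*encode(3.0))
print(decode(n, d))  # close to 3.0: rescaling keeps the ratio
print(decode(*renormalise(*encode(BOTTOM))))  # None: bottom stays bottom
```

Because both components are divided by the same positive scale, decoding before or after renormalisation gives the same value up to floating-point rounding.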
Integration Steps
- For simple sentinel-encoded streams, lift targets with training.targets.lift_targets; for audited target labels, prefer training.targets.lift_semantic_targets.
- Use GradientPolicy.PROJECT inside projective regions to mask gradients when a path decodes to ⊥.
- Combine implicit, margin, and sign-consistency losses to shape the tuple dynamics.
- Decode to SCM at boundaries and apply coverage/rejection losses there.
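As a rough sketch of the lifting and implicit-loss steps above, the snippet below lifts scalar targets to tuples and scores a prediction with a cross-product residual. Everything here is an assumption made for illustration: `lift_target` and `cross_product_residual` are hypothetical helpers, non-finite floats are taken as the sentinel for ⊥, and the residual is one common division-free way to compare homogeneous tuples, not necessarily what training.targets.lift_targets or the library's implicit loss compute.

```python
import math


def lift_target(y):
    """Lift a scalar target to (Y_n, Y_d); non-finite sentinels map to (1, 0)."""
    return (y, 1.0) if math.isfinite(y) else (1.0, 0.0)


def cross_product_residual(p, q, y_n, y_d):
    """Squared residual (P * Y_d - Q * Y_n)^2.

    It is zero when (P, Q) and (Y_n, Y_d) are proportional, and it never
    divides by Q, so it stays finite near and at the singularity.
    """
    return (p * y_d - q * y_n) ** 2


exact = cross_product_residual(4.0, 2.0, *lift_target(2.0))  # 0.0: 4/2 matches 2
off = cross_product_residual(3.0, 1.0, *lift_target(2.0))    # 1.0: 3/1 misses 2 by 1
pole = cross_product_residual(1.0, 0.0, *lift_target(float("inf")))  # 0.0
print(exact, off, pole)
```

The residual grows with the magnitude of the tuples it compares, which is one reason a loss of this shape is usually paired with some form of tuple normalisation.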
Gap Region
Training uses stochastic thresholds (τ_train_min, τ_train_max) to avoid learning a brittle boundary at exactly τ_train. Inference sets a fixed τ_infer and returns ⊥ when |Q| < τ_infer.
When τ_train > τ_infer, the interval τ_infer ≤ |Q| < τ_train is the gap region where inference still returns a finite value but the denominator is small enough to be numerically risky. Use strict_inference(..., InferenceConfig(tau_infer=τ_infer, tau_train=τ_train)) to obtain an explicit gap_mask for monitoring.
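The three regimes described above (⊥, gap, and ordinary finite output) can be sketched for a single tuple as follows. `strict_decode` is a hypothetical helper written for this illustration; it is not the library's strict_inference, and the threshold values are example numbers only.

```python
def strict_decode(p, q, tau_infer, tau_train):
    """Return (decoded, is_bottom, in_gap) for one (P, Q) tuple.

    decoded is None (bottom) when |Q| < tau_infer; in_gap flags the
    interval tau_infer <= |Q| < tau_train, where a finite value is
    still returned but the denominator is small.
    """
    mag = abs(q)
    is_bottom = mag < tau_infer or q == 0.0  # also guards tau_infer == 0
    in_gap = (not is_bottom) and mag < tau_train
    decoded = None if is_bottom else p / q
    return decoded, is_bottom, in_gap


tau_infer, tau_train = 1e-6, 1e-4
for q in (1e-7, 1e-5, 1e-2):
    print(q, strict_decode(1.0, q, tau_infer, tau_train))
```

Tracking the fraction of outputs with `in_gap` set gives a simple monitor for how often inference lands in the numerically risky interval.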
Gauge Policy
Projective tuples are scale-equivalent: for any nonzero alpha, (P, Q) and
(alpha * P, alpha * Q) decode to the same finite scalar when Q != 0. Use
zeroproofml.autodiff.projective.GaugePolicy metadata to state which magnitude
convention a head or exported bundle uses:
- GaugePolicy.CANONICAL_DENOMINATOR keeps the denominator magnitude exactly as exported by the trained head. This is the compatibility default.
- GaugePolicy.UNIT_L2_PROJECTIVE applies ProjectiveNormalize so denominator thresholds operate on (P, Q) / sqrt(P**2 + Q**2).
- GaugePolicy.ANGULAR_UNIT_CIRCLE records the unit-circle convention used by angular heads; ProjectiveNormalize enforces the same unit L2 magnitude but does not replace head-specific orientation or wrap rules.
from zeroproofml.autodiff.projective import GaugePolicy, ProjectiveNormalize
normalize = ProjectiveNormalize(GaugePolicy.UNIT_L2_PROJECTIVE)
P_hat, Q_hat = normalize(P, Q)
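To see why the gauge matters for thresholding, the sketch below compares a denominator threshold applied to raw tuples with the same threshold applied after unit L2 normalisation. `unit_l2` is a hypothetical plain-Python stand-in, not the implementation of ProjectiveNormalize, and it assumes the tuple is not (0, 0); the numbers are illustrative only.

```python
import math


def unit_l2(p, q):
    """Rescale (P, Q) to unit L2 norm; the ratio P/Q is unchanged."""
    norm = math.hypot(p, q)
    return p / norm, q / norm


tau = 1e-6
p, q = 2.0, 1e-3
alpha = 1e-4  # same projective point, different gauge

# Thresholding the raw denominator depends on the arbitrary scale alpha.
raw_decisions = (abs(q) < tau, abs(alpha * q) < tau)

# After normalisation both gauges yield the same denominator.
_, q_hat = unit_l2(p, q)
_, q_hat_scaled = unit_l2(alpha * p, alpha * q)
norm_decisions = (abs(q_hat) < tau, abs(q_hat_scaled) < tau)

print(raw_decisions)   # the two gauges disagree on the raw denominator
print(norm_decisions)  # after normalisation they agree
```

With a fixed magnitude convention, a threshold on the denominator measures how close the tuple is to the singular direction rather than how large the head's outputs happen to be.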
Angular parameterization (unit-circle tuples)
For some censoring / boundary problems, you can eliminate the free tuple magnitude by predicting an angle θ and emitting a unit-circle tuple:
- P = cos(θ)
- Q = sin(θ)
This removes the projective scale gauge and can make strict gating around Q≈0 more symmetric/stable.
import torch
import torch.nn as nn
from zeroproofml.inference import InferenceConfig, SCMInferenceWrapper
from zeroproofml.layers.angular_projective import AngularProjectiveHead
from zeroproofml.losses.implicit import implicit_loss
from zeroproofml.training import SCMTrainer, TrainingConfig
from torch.utils.data import DataLoader, TensorDataset
class ToyAngularProjective(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 32), nn.ReLU())
        self.head = AngularProjectiveHead(input_dim=32, output_dim=1, theta_scale=1.0)

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        h = self.backbone(x)
        p, q = self.head(h)
        return p.squeeze(-1), q.squeeze(-1)
model = ToyAngularProjective()
wrapped = SCMInferenceWrapper(model, config=InferenceConfig(tau_infer=1e-6, tau_train=1e-4))
# Train on smooth tuples (wrapper in train mode passes (P, Q) through unchanged).
x = torch.linspace(-1.0, 1.0, 512).unsqueeze(-1)
y = 1.0 / (x + 0.1) # may include large-magnitude targets near the pole
train_loader = DataLoader(TensorDataset(x, y.squeeze(-1)), batch_size=128, shuffle=True)
def loss_fn(outputs, lifted_targets):
    p, q = outputs
    y_n, y_d = lifted_targets
    return implicit_loss(p, q, y_n, y_d)
opt = torch.optim.AdamW(wrapped.parameters(), lr=1e-3)
trainer = SCMTrainer(model=wrapped, optimizer=opt, loss_fn=loss_fn, train_loader=train_loader)
trainer.fit()
# Strict inference (wrapper in eval mode decodes and emits masks).
wrapped.eval()
decoded, bottom_mask, gap_mask = wrapped(x)
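Independently of the library classes used above, the unit-circle mapping itself can be sketched in a few lines of plain Python. `angular_tuple` and `decode_angular` are hypothetical helpers for illustration, with ⊥ represented by `None` and an example threshold; they make no claim about how AngularProjectiveHead computes or scales θ.

```python
import math


def angular_tuple(theta):
    """Map an angle to the unit-circle tuple (P, Q) = (cos(theta), sin(theta))."""
    return math.cos(theta), math.sin(theta)


def decode_angular(theta, tau_infer=1e-6):
    """Decode P/Q = cot(theta); return None (bottom) when |Q| < tau_infer."""
    p, q = angular_tuple(theta)
    return None if abs(q) < tau_infer else p / q


p, q = angular_tuple(0.3)
print(p * p + q * q)                # ~1.0 for every theta: no free magnitude
print(decode_angular(math.pi / 4))  # ~1.0
print(decode_angular(0.0))          # None, since Q = sin(0) = 0
print(decode_angular(1e-3), decode_angular(-1e-3))  # large, opposite signs
```

Since P² + Q² = 1 by construction, the strict gate |Q| < τ_infer corresponds to a fixed angular window around θ = 0 (and θ = π), which treats both sides of the pole alike.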