Projective Learning Mode

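Projective learning lifts selected subgraphs to homogeneous tuples (N, D), a numerator and a denominator, so training occurs on a smooth manifold while inference retains strict SCM semantics. Head outputs in later sections are written (P, Q), with Q as the denominator.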

When to Use

  • Rational heads that should avoid instantiating ⊥ during training.
  • Safety-critical outputs where distinguishing +∞ from −∞ matters (use with the sign-consistency loss).
  • Scenarios where gradient dead zones around Q ≈ 0 hurt convergence.

Forward/Backward Contract

  • Encoding: φ(x) = (x, 1) for finite values; φ(⊥) = (1, 0).
  • Decoding: φ⁻¹(N, D) = N/D when D ≠ 0, otherwise ⊥.
  • Detached renormalisation: (N, D) ← (N, D) / sg(√(N² + D²) + γ) to keep tuples bounded without leaking gradients through the norm.
  • Gradients: standard autograd on (N, D); coverage and penalty terms are computed after decoding.
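
The contract above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: BOTTOM, encode, decode, and renormalise are hypothetical names, and γ = 1e-9 is an arbitrary choice. Plain floats cannot show the stop-gradient sg; in an autograd framework the scale factor would be treated as a constant.

```python
import math

BOTTOM = None  # hypothetical stand-in for the SCM bottom element ⊥


def encode(x):
    """phi: lift a scalar (or BOTTOM) to a homogeneous tuple (N, D)."""
    return (1.0, 0.0) if x is BOTTOM else (float(x), 1.0)


def decode(n, d):
    """phi^-1: return N/D when D != 0, otherwise BOTTOM."""
    return BOTTOM if d == 0 else n / d


def renormalise(n, d, gamma=1e-9):
    """Divide both components by (sqrt(N^2 + D^2) + gamma).

    With autograd, the scale would be wrapped in a stop-gradient so that
    no gradient flows through the norm.
    """
    scale = math.sqrt(n * n + d * d) + gamma
    return n / scale, d / scale


n, d = renormalise(*encode(3.0))
print(decode(n, d))             # approximately 3.0: rescaling leaves N/D unchanged
print(decode(*encode(BOTTOM)))  # None, i.e. BOTTOM
```

Because both components are divided by the same positive scale, the decoded value is unchanged while the tuple norm stays at or below 1.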

Integration Steps

  1. For simple sentinel-encoded streams, lift targets with training.targets.lift_targets; for audited target labels, prefer training.targets.lift_semantic_targets.
  2. Use GradientPolicy.PROJECT inside projective regions to mask gradients when a path decodes to ⊥.
  3. Combine implicit, margin, and sign-consistency losses to shape the tuple dynamics.
  4. Decode to SCM at boundaries and apply coverage/rejection losses there.
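
Steps 1 and 3 can be illustrated with a small sketch. Everything here is an assumption made for illustration: lift_target and cross_product_residual are hypothetical helpers, not the signatures or behaviour of training.targets.lift_targets or implicit_loss, and the sketch assumes that non-finite floats act as the sentinel for ⊥.

```python
import math


def lift_target(y):
    """Lift a scalar target to (Y_n, Y_d); non-finite values map to (1, 0).

    Assumption: NaN and +/-inf are the sentinels for a singular target.
    The sign of an infinite target is discarded by this lift.
    """
    if math.isinf(y) or math.isnan(y):
        return 1.0, 0.0
    return y, 1.0


def cross_product_residual(p, q, y_n, y_d):
    """Squared residual (P*Y_d - Q*Y_n)^2.

    It is zero exactly when (P, Q) and (Y_n, Y_d) are proportional,
    so it never divides by a small denominator.
    """
    r = p * y_d - q * y_n
    return r * r


finite_loss = cross_product_residual(2.0, 0.5, *lift_target(4.0))           # 2.0 / 0.5 == 4.0
pole_loss = cross_product_residual(1.0, 0.0, *lift_target(float("inf")))    # both tuples are (1, 0)
miss_loss = cross_product_residual(1.0, 0.5, *lift_target(float("inf")))    # finite prediction, singular target
print(finite_loss, pole_loss, miss_loss)
```

Since this lift maps +∞ and −∞ to the same tuple, a residual of this kind cannot separate them on its own, which is one reason to pair it with a sign-aware term.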

Gap Region

Training draws its threshold stochastically from the range (τ_train_min, τ_train_max) so that the model does not learn a brittle boundary at exactly one value of τ_train. Inference sets a fixed τ_infer and returns ⊥ when |Q| < τ_infer.
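
As a rough sketch of the stochastic threshold, the snippet below draws one training threshold per step. The uniform distribution and the name sample_train_threshold are assumptions; the library may sample differently (for example log-uniformly).

```python
import random


def sample_train_threshold(tau_min, tau_max, rng=random):
    """Draw one training threshold from [tau_min, tau_max] (uniform is an assumption)."""
    return rng.uniform(tau_min, tau_max)


rng = random.Random(0)  # seeded so repeated runs draw the same thresholds
taus = [sample_train_threshold(1e-5, 1e-3, rng) for _ in range(4)]
print(taus)
```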

When τ_train > τ_infer, the interval τ_infer ≤ |Q| < τ_train is the gap region where inference still returns a finite value but the denominator is small enough to be numerically risky. Use strict_inference(..., InferenceConfig(tau_infer=τ_infer, tau_train=τ_train)) to obtain an explicit gap_mask for monitoring.
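
The two masks follow directly from the inequalities above. The helper below is a hypothetical stand-alone sketch of that logic on plain floats, not the implementation of strict_inference.

```python
def strict_masks(q_values, tau_infer, tau_train):
    """Return (bottom_mask, gap_mask) as lists of bools, one entry per denominator."""
    # Below tau_infer, inference returns bottom.
    bottom_mask = [abs(q) < tau_infer for q in q_values]
    # Between tau_infer and tau_train, the value is finite but numerically risky.
    gap_mask = [tau_infer <= abs(q) < tau_train for q in q_values]
    return bottom_mask, gap_mask


bottom, gap = strict_masks([0.0, 5e-6, -5e-5, 0.3], tau_infer=1e-6, tau_train=1e-4)
print(bottom)  # [True, False, False, False]
print(gap)     # [False, True, True, False]
```

The masks are disjoint by construction, so the fraction of True entries in gap_mask can be monitored separately from the rejection rate.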

Gauge Policy

Projective tuples are scale-equivalent: for any α ≠ 0, (P, Q) and (αP, αQ) decode to the same finite scalar when Q ≠ 0. Use zeroproofml.autodiff.projective.GaugePolicy metadata to state which magnitude convention a head or exported bundle uses:

  • GaugePolicy.CANONICAL_DENOMINATOR keeps the denominator magnitude exactly as exported by the trained head. This is the compatibility default.
  • GaugePolicy.UNIT_L2_PROJECTIVE applies ProjectiveNormalize so denominator thresholds operate on (P, Q) / √(P² + Q²).
  • GaugePolicy.ANGULAR_UNIT_CIRCLE records the unit-circle convention used by angular heads; ProjectiveNormalize enforces the same unit L2 magnitude but does not replace head-specific orientation or wrap rules.

from zeroproofml.autodiff.projective import GaugePolicy, ProjectiveNormalize

# P, Q: numerator and denominator outputs of a projective head (assumed to be defined already).
normalize = ProjectiveNormalize(GaugePolicy.UNIT_L2_PROJECTIVE)
P_hat, Q_hat = normalize(P, Q)
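
To see why the gauge matters for thresholds, the stdlib-only sketch below compares a raw denominator with its unit-L2 counterpart. The helper unit_l2 is hypothetical and only mirrors the formula (P, Q) / √(P² + Q²); it is not ProjectiveNormalize.

```python
import math


def unit_l2(p, q):
    """Rescale (P, Q) to unit L2 norm; assumes (P, Q) != (0, 0)."""
    norm = math.hypot(p, q)
    return p / norm, q / norm


p, q = 3.0, 4.0
alpha = 1e-3
scaled = (alpha * p, alpha * q)

# Both tuples decode to the same scalar P/Q ...
same_value = math.isclose(p / q, scaled[0] / scaled[1])
# ... but a threshold on the raw |Q| sees very different magnitudes,
raw_q = (abs(q), abs(scaled[1]))
# whereas the unit-L2 gauge maps both to the same normalised denominator.
normalised_q = (unit_l2(p, q)[1], unit_l2(*scaled)[1])
print(same_value, raw_q, normalised_q)
```

A fixed τ applied to the raw denominator would treat the two tuples differently even though they represent the same value; after normalisation the comparison depends only on the direction of the tuple.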

Angular Parameterization (Unit-Circle Tuples)

For some censoring or boundary problems, you can eliminate the free tuple magnitude by predicting an angle θ and emitting a unit-circle tuple:

  • P = cos(θ)
  • Q = sin(θ)

This removes the projective scale gauge and can make strict gating around Q ≈ 0 more symmetric and stable.
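
The effect of the angular parameterization can be checked numerically with the standard library alone. The function angular_tuple is a hypothetical illustration of the two formulas above, not the AngularProjectiveHead layer.

```python
import math


def angular_tuple(theta):
    """Map an angle to a unit-circle tuple (P, Q) = (cos(theta), sin(theta))."""
    return math.cos(theta), math.sin(theta)


for theta in (math.pi / 4, 1e-8, math.pi / 2):
    p, q = angular_tuple(theta)
    # The tuple always has unit norm, so |Q| = |sin(theta)| can be compared
    # to a threshold without any dependence on tuple magnitude.
    print(f"theta={theta:.3g} norm={math.hypot(p, q):.6f} |Q|={abs(q):.3g}")
```

Angles near 0 (or π) give Q ≈ 0 and therefore fall under the strict gate, while the decoded value P/Q equals cot(θ) elsewhere.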

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

from zeroproofml.inference import InferenceConfig, SCMInferenceWrapper
from zeroproofml.layers.angular_projective import AngularProjectiveHead
from zeroproofml.losses.implicit import implicit_loss
from zeroproofml.training import SCMTrainer


class ToyAngularProjective(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 32), nn.ReLU())
        self.head = AngularProjectiveHead(input_dim=32, output_dim=1, theta_scale=1.0)

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        h = self.backbone(x)
        p, q = self.head(h)
        return p.squeeze(-1), q.squeeze(-1)


model = ToyAngularProjective()
wrapped = SCMInferenceWrapper(model, config=InferenceConfig(tau_infer=1e-6, tau_train=1e-4))

# Train on smooth tuples (wrapper in train mode passes (P, Q) through unchanged).
x = torch.linspace(-1.0, 1.0, 512).unsqueeze(-1)
y = 1.0 / (x + 0.1)  # may include large-magnitude targets near the pole
train_loader = DataLoader(TensorDataset(x, y.squeeze(-1)), batch_size=128, shuffle=True)


def loss_fn(outputs, lifted_targets):
    p, q = outputs
    y_n, y_d = lifted_targets
    return implicit_loss(p, q, y_n, y_d)


opt = torch.optim.AdamW(wrapped.parameters(), lr=1e-3)
trainer = SCMTrainer(model=wrapped, optimizer=opt, loss_fn=loss_fn, train_loader=train_loader)
trainer.fit()

# Strict inference (wrapper in eval mode decodes and emits masks).
wrapped.eval()
decoded, bottom_mask, gap_mask = wrapped(x)