REST/gRPC Bundle Service Decision

Review date: 2026-04-19.

Decision

A minimal REST inference service around validated ONNX bundles is worthwhile as an optional integration surface, but it should stay a thin adapter over load_onnx_runtime_bundle(...) rather than becoming a new runtime stack. Do not add a gRPC service or protobuf contract yet. Keep any future HTTP server outside the base install path so the core package does not gain web-framework or serving dependencies.
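A minimal sketch of that dependency boundary follows. It assumes a hypothetical `serve` extra and a hypothetical `require_optional` helper that only the server entry point calls. Web dependencies are imported lazily, and a missing one produces an actionable error instead of affecting `import` of the core package.

```python
import importlib
import importlib.util
from types import ModuleType


def require_optional(module_name: str, extra: str = "serve") -> ModuleType:
    """Import an optional serving dependency on demand.

    `extra` names a hypothetical packaging extra. Nothing in the core
    package calls this helper, so a base install never imports web frameworks.
    """
    # find_spec returns None for a missing top-level module without importing it.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name!r} is needed for the optional REST adapter; "
            f"install the '{extra}' extra to enable it."
        )
    return importlib.import_module(module_name)
```

A server entry point would call `require_optional(...)` with its chosen framework at startup. The framework itself is deliberately left open here.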

REST Scope

The useful first service is deliberately small:

load exactly one bundle at startup from a configured bundle_dir,
fail fast through the existing bundle metadata validation path,
expose health and metadata readback endpoints for operators,
expose one inference endpoint that accepts named tensor inputs in bundle metadata order,
return the stable decoded, bottom_mask, and gap_mask fields plus any metadata needed to interpret the response, and
reuse StrictInferenceMonitor for aggregate mask/gap/provenance summaries.

The service should not include model registry behavior, live bundle reloads, auth, autoscaling, streaming, request batching, or training-stack imports. Those belong in deployment infrastructure or a later serving layer once there is a real consumer requirement.

Why REST First

REST covers non-ROS consumers that need a simple process boundary without linking against Python or ONNX Runtime directly.
The current bundle contract already provides the necessary runtime metadata, tensor order, mask semantics, and strict output fields.
A tiny HTTP adapter is easier to document and smoke-test than a stable protobuf API while the provenance-bearing bundle contract remains experimental.
ROS 2 remains the first-party robotics path; REST is for local services, batch jobs, notebooks, and operator handoff where ROS is unnecessary.
Keeping the server optional preserves the dependency-light core package.
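The smoke-test claim can be illustrated with only the standard library. The sketch starts a server on an ephemeral loopback port, requests a health route, and shuts the server down. The `/healthz` path and `HealthHandler` are hypothetical stand-ins. `http.server` is used only because it needs no extra install; this is not a recommendation for production serving.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in exposing only a health route."""

    def do_GET(self) -> None:
        if self.path == "/healthz":
            payload = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)
        else:
            self.send_error(404)

    def log_message(self, format: str, *args) -> None:
        pass  # keep smoke-test output quiet


def smoke_test() -> dict:
    """Start the server, hit /healthz once, and return the decoded JSON body."""
    # Port 0 asks the OS for a free ephemeral port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), HealthHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        with urllib.request.urlopen(f"http://{host}:{port}/healthz", timeout=5) as resp:
            if resp.status != 200:
                raise RuntimeError(f"unexpected status {resp.status}")
            return json.loads(resp.read())
    finally:
        server.shutdown()
        server.server_close()
```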

Deferred gRPC Path

Revisit gRPC only when a concrete deployment needs one of the things REST does not provide well:

binary tensor transport with lower serialization overhead,
streaming or bidirectional request patterns,
a language-neutral IDL that multiple maintained clients will consume, or
strict integration with an existing gRPC control plane.
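The serialization-overhead point can be made concrete without committing to gRPC. The sketch encodes the same flat tensor twice: once as a JSON number list and once as packed float32 bytes in native byte order. It compares payload size only, under stated assumptions: a flat list of floats, no compression, and no protobuf framing. It says nothing about gRPC latency. The packed form also narrows Python's float64 values to float32.

```python
import array
import json
from typing import Sequence, Tuple


def encoded_sizes(values: Sequence[float]) -> Tuple[int, int]:
    """Return (json_bytes, raw_float32_bytes) for the same tensor payload."""
    as_json = json.dumps({"values": list(values)}).encode("utf-8")
    # Packed float32: a fixed 4 bytes per element, no per-element text formatting.
    as_raw = array.array("f", values).tobytes()
    return len(as_json), len(as_raw)
```

With `values = [i / 7 for i in range(1000)]`, the raw buffer is exactly 4000 bytes. The JSON text is several times larger, because each element is written out as a decimal string with separators.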

Until then, a protobuf surface would create another compatibility contract before the project has enough evidence that it is worth maintaining.

Relationship To Inference Servers

This decision does not replace the later Triton-or-similar evaluation. A minimal REST adapter is a convenience and interoperability path for one bundle per process. Triton-style serving should be evaluated separately for batching, multi-model hosting, GPU scheduling, metrics, and production deployment features.