REST/gRPC Bundle Service Decision
Review date: 2026-04-19.
Decision
A minimal REST inference service around validated ONNX bundles is worthwhile as
an optional integration surface, but it should stay a thin adapter over
load_onnx_runtime_bundle(...) rather than becoming a new runtime stack.
Do not add a gRPC service or protobuf contract yet.
Keep any future HTTP server outside the base install path so the core package
does not gain web-framework or serving dependencies.
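One way to keep serving dependencies off the base install path is to import the web framework only inside the optional server entry point. The sketch below is a minimal illustration under stated assumptions: the helper name `import_server_backend` and the idea of a "serving extra" are hypothetical, and this document does not name a specific framework, so the caller passes the module name in.

```python
import importlib
from types import ModuleType


def import_server_backend(module_name: str) -> ModuleType:
    """Import an optional serving dependency only when the server is requested.

    Hypothetical helper: `module_name` is whichever web framework the optional
    server would use. No framework is chosen in this decision record.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Only the optional server path calls this function, so importing the
        # core package never triggers the framework import.
        raise RuntimeError(
            f"The REST adapter needs the optional dependency {module_name!r}; "
            "install the serving extra to use it."
        ) from exc
```

With this shape, a missing framework surfaces as a clear error when someone starts the server, not when they import the core package.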
REST Scope
The useful first service is deliberately small:
- load exactly one bundle at startup from a configured bundle_dir,
- fail fast through the existing bundle metadata validation path,
- expose health and metadata readback endpoints for operators,
- expose one inference endpoint that accepts named tensor inputs in bundle metadata order,
- return the stable decoded, bottom_mask, and gap_mask fields plus any metadata needed to interpret the response, and
- reuse StrictInferenceMonitor for aggregate mask/gap/provenance summaries.
The service should not include model registry behavior, live bundle reloads, auth, autoscaling, streaming, request batching, or training-stack imports. Those belong in deployment infrastructure or a later serving layer once there is a real consumer requirement.
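The scope above can be sketched as a set of framework-independent handlers that an HTTP layer would only route to. This is a minimal sketch, not the project's implementation: `StubBundle`, `BundleService`, `create_service`, the `input_order` response key, and the tensor names are all hypothetical. It assumes the loader takes a bundle directory and returns an object exposing input names, metadata, and a run callable, and it reads "bundle metadata order" as reordering named inputs to match the metadata. StrictInferenceMonitor is left out because its interface is not shown here.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Stable output fields named in the scope list.
OUTPUT_FIELDS = ("decoded", "bottom_mask", "gap_mask")


@dataclass
class StubBundle:
    """Hypothetical stand-in for what load_onnx_runtime_bundle(...) returns."""
    input_names: list            # tensor names in bundle metadata order
    metadata: dict               # metadata exposed for operator readback
    run: Callable[[list], dict]  # runs inference on tensors in metadata order


class BundleService:
    """Handlers return (status_code, body) so any HTTP layer can wrap them."""

    def __init__(self, bundle: Any) -> None:
        self.bundle = bundle

    def health(self):
        return 200, {"status": "ok"}

    def metadata(self):
        return 200, dict(self.bundle.metadata)

    def infer(self, payload: Any):
        inputs = payload.get("inputs") if isinstance(payload, dict) else None
        if not isinstance(inputs, dict):
            return 400, {"error": "'inputs' must be an object of named tensors"}
        expected = list(self.bundle.input_names)
        missing = [name for name in expected if name not in inputs]
        unexpected = [name for name in inputs if name not in expected]
        if missing or unexpected:
            return 400, {
                "error": "input names do not match bundle metadata",
                "missing": missing,
                "unexpected": unexpected,
            }
        # Reorder the named tensors into bundle metadata order before running.
        outputs = self.bundle.run([inputs[name] for name in expected])
        body = {field: outputs[field] for field in OUTPUT_FIELDS}
        body["input_order"] = expected
        return 200, body


def create_service(bundle_dir: str, loader: Callable[[str], Any]) -> BundleService:
    # Load exactly one bundle at startup. Any validation error raised by the
    # loader propagates, so the process fails fast instead of serving.
    return BundleService(loader(bundle_dir))


def _echo_run(tensors):
    # Toy model for the demo: echoes the first tensor with all-False masks.
    first = tensors[0]
    return {
        "decoded": first,
        "bottom_mask": [False] * len(first),
        "gap_mask": [False] * len(first),
    }


service = create_service(
    "/path/to/bundle",  # placeholder path; the stub loader ignores it
    lambda _dir: StubBundle(["first_input", "second_input"], {"name": "demo"}, _echo_run),
)
status, body = service.infer(
    {"inputs": {"second_input": [9.0], "first_input": [1.0, 2.0]}}
)
```

Keeping the handlers free of any web framework means the same logic can be smoke-tested directly, and the HTTP layer stays a thin routing shim.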
Why REST First
- REST covers non-ROS consumers that need a simple process boundary without linking against Python or ONNX Runtime directly.
- The current bundle contract already provides the necessary runtime metadata, tensor order, mask semantics, and strict output fields.
- A tiny HTTP adapter is easier to document and smoke-test than a stable protobuf API while the provenance-bearing bundle contract remains experimental.
- ROS 2 remains the first-party robotics path; REST is for local services, batch jobs, notebooks, and operator handoff where ROS is unnecessary.
- Keeping the server optional preserves the dependency-light core package.
Deferred gRPC Path
Revisit gRPC only when a concrete deployment needs one of the things REST does not provide well:
- binary tensor transport with lower serialization overhead,
- streaming or bidirectional request patterns,
- a language-neutral IDL that multiple maintained clients will consume, or
- strict integration with an existing gRPC control plane.
Until then, a protobuf surface would create another compatibility contract before the project has enough evidence that it is worth maintaining.
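To gauge whether the first trigger (serialization overhead) applies to a given deployment, compare the size of a tensor as raw bytes with the same tensor as a JSON number list. The sketch below uses an arbitrary, hypothetical 1,000-element tensor. The exact ratio depends on the values and on how the JSON is formatted, so treat it as a way to measure, not as a benchmark result.

```python
import json
import math
from array import array

# Hypothetical 1,000-element tensor; the values are arbitrary.
values = [math.sin(i) for i in range(1000)]

# Raw float32 payload, as a binary transport could send it (4 bytes per element).
binary_payload = array("f", values).tobytes()

# The same tensor as a JSON list of numbers, as a plain REST body would carry it.
# Python serializes the double-precision repr, so each element takes many characters.
json_payload = json.dumps(values).encode("utf-8")

print(f"binary: {len(binary_payload)} bytes, JSON: {len(json_payload)} bytes")
```

If the JSON size measured on real request shapes is acceptable for the consumers listed under Why REST First, this trigger does not yet justify a protobuf contract.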
Relationship To Inference Servers
This decision does not replace the later Triton-or-similar evaluation. A minimal REST adapter is a convenience and interoperability path for one bundle per process. Triton-style serving should be evaluated separately for batching, multi-model hosting, GPU scheduling, metrics, and production deployment features.