Marco Rinaldi

Posted on May 21

Routing Event-Camera Pipelines Through an LLM Gateway: A Field Report

#llm #machinelearning #computervision #mlops

TL;DR: We added a vision-language stage to an event-camera pipeline at Prophesee and the LLM provider routing became the messiest part. Bifrost handled the failover and the OpenAI-compatible surface without forcing us to rewrite the C++ side. Honest comparison vs LiteLLM and Portkey below.

So, the thing is, when you spend your day writing CUDA kernels for event cameras, you do not expect to also become an expert in LLM provider quotas. But here we are. A few weeks back our team at Prophesee built a small captioning service on top of our event-based object detector. The detector runs on the sensor itself, sub-1MB, quantised to int8. The captioning stage, obviously, does not. That part calls out to a vision-language model, and that is where things got annoying.

Let me give you the full picture here.

The setup

Our pipeline is the usual neuromorphic story. A Prophesee Gen4 sensor produces events, we accumulate them into time surfaces every 10ms, run a tiny YOLO-ish detector on a Jetson Orin Nano, and then for a subset of detections we want a natural-language description of what is happening. Think security analytics, where you want "person carrying a long object near the loading bay" instead of "bbox 0.87".
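For readers who have not met time surfaces before, here is a minimal sketch of the accumulation step in Go. It is not our on-device code: the `Event` and `TimeSurface` types, the exponential decay, and the `tauUs` parameter are illustrative assumptions, and polarity is ignored to keep it short.

```go
package main

import (
	"fmt"
	"math"
)

// Event is one sensor event: pixel coordinates and a timestamp in microseconds.
// (Hypothetical type for illustration; polarity is left out.)
type Event struct {
	X, Y int
	TUs  int64
}

// TimeSurface remembers the most recent event timestamp for every pixel.
type TimeSurface struct {
	W, H   int
	lastUs []int64 // -1 means "no event seen yet"
}

func NewTimeSurface(w, h int) *TimeSurface {
	last := make([]int64, w*h)
	for i := range last {
		last[i] = -1
	}
	return &TimeSurface{W: w, H: h, lastUs: last}
}

// Add records an event and silently drops out-of-bounds coordinates.
func (s *TimeSurface) Add(e Event) {
	if e.X < 0 || e.X >= s.W || e.Y < 0 || e.Y >= s.H {
		return
	}
	s.lastUs[e.Y*s.W+e.X] = e.TUs
}

// Snapshot returns exp(-(now-last)/tau) per pixel, and 0 where no event was seen.
func (s *TimeSurface) Snapshot(nowUs int64, tauUs float64) []float32 {
	out := make([]float32, len(s.lastUs))
	for i, t := range s.lastUs {
		if t < 0 || t > nowUs {
			continue
		}
		out[i] = float32(math.Exp(-float64(nowUs-t) / tauUs))
	}
	return out
}

func main() {
	s := NewTimeSurface(4, 4)
	s.Add(Event{X: 1, Y: 2, TUs: 9_000})
	// Take a snapshot at the end of a 10 ms window, with tau also set to 10 ms.
	frame := s.Snapshot(10_000, 10_000)
	fmt.Printf("pixel (1,2) = %.3f\n", frame[2*4+1])
}
```

Recent events land close to 1 and older ones decay toward 0, which is what gives the detector a dense image-like input from a sparse event stream.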

The captioning runs at maybe 2 Hz, not 200 Hz, so we can afford a cloud call. We started with Anthropic's Claude for the vision-language part because it handled our weird grayscale-ish event reconstructions better than the alternatives in our internal eval (37 test scenes, blind A/B with two annotators from the Milan office, Claude won 24 of them).
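Getting from a 10 ms frame cadence down to roughly 2 Hz of caption requests needs some kind of gate in front of the cloud call. A minimal sketch, assuming a simple minimum-interval rule (the `Gate` type is hypothetical, and a real selection policy would probably also look at detection class and confidence):

```go
package main

import "fmt"

// Gate admits at most one item per minimum interval.
// Timestamps are plain milliseconds to keep the sketch dependency-free.
type Gate struct {
	minIntervalMs int64
	lastMs        int64
	started       bool
}

// NewGate builds a gate for a target rate in Hz (2 Hz gives a 500 ms interval).
func NewGate(hz float64) *Gate {
	return &Gate{minIntervalMs: int64(1000 / hz)}
}

// Allow reports whether a detection seen at nowMs should be sent for captioning.
func (g *Gate) Allow(nowMs int64) bool {
	if g.started && nowMs-g.lastMs < g.minIntervalMs {
		return false
	}
	g.started = true
	g.lastMs = nowMs
	return true
}

func main() {
	g := NewGate(2)
	sent := 0
	// Simulate one second of detections arriving every 10 ms.
	for i := int64(0); i < 100; i++ {
		if g.Allow(i * 10) {
			sent++
		}
	}
	fmt.Println("caption requests in one second:", sent)
}
```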

Then Anthropic had a regional outage in February, our pipeline went dark for 90 minutes, and the customer was not happy. That is when we started looking at gateways.

What we tried

Three candidates. I will be fair to all of them.

Tool	Language	Deploy	Failover	Built-in cache
LiteLLM	Python	pip / Docker	Yes	Redis-backed
Portkey	Hosted + OSS	SaaS-first	Yes	Yes
Bifrost	Go	npx / Docker	Yes	Semantic

LiteLLM is the one most of you know. It is fine. We ran it for two weeks. The Python process held up but our captioning service is Go (we share a binary with the on-device telemetry agent), and adding a Python sidecar just to talk to LiteLLM felt wrong. Memory footprint on our edge box mattered too.

Portkey is genuinely good if you want a hosted control plane. The dashboard is nice. But we have a contractual requirement to keep all inference routing inside our VPC. The self-hosted Portkey works, but the SaaS-first orientation showed up in small ways, and the docs assume the cloud path more often than not.

Bifrost won mostly because it is a single Go binary, the API is OpenAI-compatible end to end (so our existing OpenAI SDK code did not change), and npx -y @maximhq/bifrost got us a working gateway in about 40 seconds on the Orin. That is not a marketing claim: I timed it while making espresso.

The actual config

Here is roughly what we shipped. Anthropic is primary with two API keys for load balancing, and OpenAI is the fallback with a single key.

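providers:
  anthropic:
    # Primary provider: traffic is split evenly across two keys.
    keys:
      - value: env.ANTHROPIC_KEY_1
        weight: 0.5
      - value: env.ANTHROPIC_KEY_2
        weight: 0.5
    models:
      - claude-sonnet-4-6
  openai:
    # Fallback provider: a single key, so no weight is set.
    keys:
      - value: env.OPENAI_KEY
    models:
      - gpt-4o
# If the Claude model fails, retry the same request on gpt-4o.
fallbacks:
  - from: anthropic/claude-sonnet-4-6
    to: openai/gpt-4o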
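To make the two ideas in that config concrete, here is a small sketch of weighted key selection and ordered fallback. This is not Bifrost's implementation, which I have not read closely enough to quote. `pickKey`, `withFallback`, and `errSimulated` are hypothetical names, and the provider call is a stub.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// Key mirrors one entry under "keys" in the config above.
type Key struct {
	Name   string
	Weight float64
}

// errSimulated stands in for a provider outage in this sketch.
var errSimulated = errors.New("simulated outage")

// pickKey selects a key in proportion to its weight. r must be in [0, 1).
func pickKey(keys []Key, r float64) (Key, error) {
	total := 0.0
	for _, k := range keys {
		total += k.Weight
	}
	if len(keys) == 0 || total <= 0 {
		return Key{}, errors.New("no usable keys")
	}
	target := r * total
	for _, k := range keys {
		if target < k.Weight {
			return k, nil
		}
		target -= k.Weight
	}
	return keys[len(keys)-1], nil // guards against float rounding
}

// withFallback tries each model in order and returns the first success.
func withFallback(models []string, call func(model string) (string, error)) (string, error) {
	if len(models) == 0 {
		return "", errors.New("no models configured")
	}
	var errs []error
	for _, m := range models {
		out, err := call(m)
		if err == nil {
			return out, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", m, err))
	}
	return "", errors.Join(errs...)
}

func main() {
	keys := []Key{
		{Name: "ANTHROPIC_KEY_1", Weight: 0.5},
		{Name: "ANTHROPIC_KEY_2", Weight: 0.5},
	}
	k, _ := pickKey(keys, rand.Float64())
	fmt.Println("using key:", k.Name)

	// Stub provider call: the primary is down, the fallback answers.
	out, err := withFallback(
		[]string{"anthropic/claude-sonnet-4-6", "openai/gpt-4o"},
		func(model string) (string, error) {
			if model == "anthropic/claude-sonnet-4-6" {
				return "", errSimulated
			}
			return "caption from " + model, nil
		})
	fmt.Println(out, err)
}
```

The point of having the gateway do this is that none of it lives in the captioning service.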

Then on the client side, nothing changed. Same OpenAI SDK call, just pointed at http://gateway:8080/v1. The drop-in replacement claim in the README is accurate: we did not rewrite a single client.
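For anyone not using an SDK, the request is plain OpenAI chat-completions JSON. Below is a standard-library sketch that builds, but does not send, a vision request against the gateway URL above. The struct names and `newCaptionRequest` are hypothetical, and the provider-prefixed model string is an assumption carried over from the fallback entries in the config.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Minimal subset of the OpenAI chat-completions request shape.
type imageURL struct {
	URL string `json:"url"`
}

type contentPart struct {
	Type     string    `json:"type"`
	Text     string    `json:"text,omitempty"`
	ImageURL *imageURL `json:"image_url,omitempty"`
}

type message struct {
	Role    string        `json:"role"`
	Content []contentPart `json:"content"`
}

type chatRequest struct {
	Model     string    `json:"model"`
	Messages  []message `json:"messages"`
	MaxTokens int       `json:"max_tokens,omitempty"`
}

// newCaptionRequest builds a POST to <baseURL>/chat/completions carrying a
// text prompt and one base64-encoded PNG as a data URL.
func newCaptionRequest(baseURL, apiKey, model, prompt, pngBase64 string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model: model,
		Messages: []message{{
			Role: "user",
			Content: []contentPart{
				{Type: "text", Text: prompt},
				{Type: "image_url", ImageURL: &imageURL{URL: "data:image/png;base64," + pngBase64}},
			},
		}},
		MaxTokens: 64,
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if apiKey != "" {
		req.Header.Set("Authorization", "Bearer "+apiKey)
	}
	return req, nil
}

func main() {
	req, err := newCaptionRequest(
		"http://gateway:8080/v1", "",
		"anthropic/claude-sonnet-4-6",
		"Describe what is happening in this frame in one sentence.",
		"AAAA", // placeholder image payload
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
}
```

Sending it is then a matter of `http.DefaultClient.Do(req)` with a sensible timeout.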

The semantic caching was an unexpected win. Our captions repeat a lot ("person walks across frame" happens 200 times an hour in some deployments). Turning on semantic cache cut our Anthropic bill by roughly 31% over a two-week window. The cache lives in Redis, which we were already running.
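The idea behind a semantic cache is to match on meaning rather than exact bytes: embed the request, and reuse a stored answer if a previous request is close enough. Here is a toy in-memory sketch, assuming cosine similarity over embedding vectors and a fixed threshold. The `SemanticCache` type and the two-dimensional "embeddings" are illustrative only. The real thing in our setup is Bifrost's feature backed by Redis, not this code.

```go
package main

import (
	"fmt"
	"math"
)

type entry struct {
	vec     []float64
	caption string
}

// SemanticCache returns a stored caption when a query vector is similar
// enough to one seen before. Lookup is a linear scan, fine for a sketch.
type SemanticCache struct {
	Threshold float64
	entries   []entry
}

// cosine returns the cosine similarity of a and b, or 0 for mismatched,
// empty, or zero-length vectors.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Get returns the best match at or above the threshold, if any.
func (c *SemanticCache) Get(vec []float64) (string, bool) {
	best, bestSim := -1, c.Threshold
	for i, e := range c.entries {
		if sim := cosine(vec, e.vec); sim >= bestSim {
			best, bestSim = i, sim
		}
	}
	if best < 0 {
		return "", false
	}
	return c.entries[best].caption, true
}

// Put stores a copy of the vector alongside its caption.
func (c *SemanticCache) Put(vec []float64, caption string) {
	c.entries = append(c.entries, entry{vec: append([]float64(nil), vec...), caption: caption})
}

func main() {
	cache := &SemanticCache{Threshold: 0.95}
	cache.Put([]float64{1, 0}, "person walks across frame")

	// A near-duplicate query hits the cache; an unrelated one misses.
	if caption, ok := cache.Get([]float64{0.99, 0.05}); ok {
		fmt.Println("hit:", caption)
	}
	if _, ok := cache.Get([]float64{0, 1}); !ok {
		fmt.Println("miss: would call the provider")
	}
}
```

The threshold is the knob that matters: too low and distinct scenes share a caption, too high and the cache never hits.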

What broke

To be fair, not everything was smooth.

The MCP integration looked interesting on paper but we did not need it for this use case. Our tools are CUDA kernels, not filesystem helpers. We turned it off.

The Prometheus metrics endpoint worked but the default scrape interval in our Grafana setup was too aggressive and we briefly thought we had a memory leak. Operator error, not Bifrost's fault.

The web UI is genuinely useful for non-engineers on the team (our PM kept asking for cost breakdowns) but I personally edited the config file directly. Different strokes.

Trade-offs and Limitations

Bifrost is younger than LiteLLM. The community is smaller and Stack Overflow answers are sparse. If you get stuck at 2am, you are reading source code, which for me is fine, but might not be for everyone.

If your stack is already pure Python, LiteLLM has tighter integration with Python-native tools like LangChain. Bifrost speaks the OpenAI API, so it works, but it does not pretend to be Pythonic.

Portkey's analytics UI is more polished. If you care about that more than deployment shape, look there first.

And honestly, if you only call one provider and never hit quotas, you do not need any gateway. We did not, until we did.

The point

Event cameras are about doing more with less data. Adding a gateway between us and three different LLM vendors is the opposite philosophy: more layers, more moving parts. I resisted it for a while. But once the captioning service started going down for reasons completely unrelated to our computer vision work, the cost of not having a gateway became obvious. The cheapest model is the one you never had to call twice.
