Centralising tool access for our prompt-assembly agent with the Bifrost MCP gateway

#machinelearning #mlops #computervision #llm

TL;DR: Before our SDXL stack renders a single product photo, a small LLM agent assembles the request from product metadata, a template lookup, and a brand-colour database. Wiring those tools separately for each provider kept drifting. Bifrost's MCP gateway let us register the tools once and keep them when we fail over from GPT-4o-mini to Claude. Below is what it cost, and where LiteLLM and Portkey would honestly have served us better.

The step nobody benchmarks in a diffusion pipeline

At Photoroom I work on the diffusion stack that generates product photography. The denoising loop gets all the attention. The model is the easy part.

Before any of that runs, a small agent assembles the generation request. It reads a product metadata JSON off our object store, looks up a background template by category in Postgres, and sometimes runs a web search to pin down a brand's palette. Three tool calls, maybe 40 tokens of reasoning, then it emits a structured prompt and a set of conditioning parameters for SDXL.

That agent is gpt-4o-mini about 90% of the time. It's cheap, and the latency budget is roughly 600 ms. When OpenAI rate-limits us during a traffic spike, we fail over to claude-haiku, and the agent has to keep working with the exact same tools.

Why per-provider tool wiring broke

We had every tool defined twice. OpenAI function-calling schemas and Anthropic's tool blocks are close but not identical, and our 9 tools lived in both formats inside the agent service.

That duplication drifted. Someone would add a region argument to the template-lookup tool on the OpenAI path and forget the Anthropic one. The failover path then silently produced worse prompts, which we only caught because a reviewer noticed bland backgrounds in a sample set. Hard to test. Easy to miss.
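
A check along these lines can surface that kind of drift before a reviewer has to. It is a minimal sketch under assumptions: the tool dictionaries follow the public OpenAI function-calling shape (`function.parameters`) and the Anthropic tool shape (`input_schema`), while the `category` argument and the helper names are hypothetical, chosen only for illustration.

```python
from typing import Any

def openai_params(tool: dict[str, Any]) -> tuple[str, set[str]]:
    """Return (tool name, argument names) from an OpenAI function-calling tool."""
    fn = tool["function"]
    return fn["name"], set(fn["parameters"].get("properties", {}))

def anthropic_params(tool: dict[str, Any]) -> tuple[str, set[str]]:
    """Return (tool name, argument names) from an Anthropic tool definition."""
    return tool["name"], set(tool["input_schema"].get("properties", {}))

def find_drift(openai_tools: list[dict[str, Any]],
               anthropic_tools: list[dict[str, Any]]) -> dict[str, set[str]]:
    """Map tool name -> arguments that exist on only one provider path.

    A tool missing entirely from one side is reported too.
    """
    a = dict(openai_params(t) for t in openai_tools)
    b = dict(anthropic_params(t) for t in anthropic_tools)
    drift: dict[str, set[str]] = {}
    for name in a.keys() | b.keys():
        diff = a.get(name, set()) ^ b.get(name, set())
        if diff or name not in a or name not in b:
            drift[name] = diff
    return drift

# Hypothetical example: `region` was added on the OpenAI path only.
OPENAI_TOOLS = [{
    "type": "function",
    "function": {
        "name": "template_lookup",
        "description": "Look up a background template by category.",
        "parameters": {
            "type": "object",
            "properties": {"category": {"type": "string"},
                           "region": {"type": "string"}},
            "required": ["category"],
        },
    },
}]
ANTHROPIC_TOOLS = [{
    "name": "template_lookup",
    "description": "Look up a background template by category.",
    "input_schema": {
        "type": "object",
        "properties": {"category": {"type": "string"}},
        "required": ["category"],
    },
}]

drift = find_drift(OPENAI_TOOLS, ANTHROPIC_TOOLS)
```

Run in CI, a non-empty `drift` would fail the build. It only compares argument names, so a changed type or description would still slip through.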

The deeper issue: tool execution was glued into the agent process. Filesystem reads, the Postgres query, the web search, all ran inline, so each provider integration re-implemented the same plumbing.

What the MCP gateway changed

Bifrost exposes a Model Context Protocol gateway that lets models call external tools (filesystem, web search, databases) registered once at the gateway rather than per provider. We moved our tool definitions there. The agent now talks to one OpenAI-compatible endpoint, and the same tool set is presented whether the request lands on GPT-4o-mini or Claude.

Config looks like this:

providers:
  openai:
    keys:
      - value: env.OPENAI_KEY
  anthropic:
    keys:
      - value: env.ANTHROPIC_KEY

fallbacks:
  - openai/gpt-4o-mini
  - anthropic/claude-haiku

mcp:
  servers:
    - name: photoroom-tools
      tools: [template_lookup, brand_palette, metadata_read]

Failover and the tool surface are now described in the same file. When automatic fallback kicks in, the tools come along. No second schema to keep in sync.
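
To make "the tools come along" concrete, here is a conceptual sketch of the behaviour as seen from outside the gateway. It is not Bifrost's implementation: `RateLimited`, `complete_with_fallback`, and the fake provider functions are hypothetical stand-ins, and only the model names and tool names come from the config above.

```python
from typing import Callable

class RateLimited(Exception):
    """Stand-in for a provider's rate-limit (HTTP 429) error."""

# The single tool list, defined once and shared by every provider in the chain.
TOOLS = ["template_lookup", "brand_palette", "metadata_read"]

ProviderCall = Callable[[str, list[str]], dict]

def complete_with_fallback(prompt: str,
                           chain: list[tuple[str, ProviderCall]],
                           tools: list[str] = TOOLS) -> tuple[str, dict]:
    """Try each (model, call) pair in order, passing the same tools every time."""
    last_error = None
    for model, call in chain:
        try:
            return model, call(prompt, tools)
        except RateLimited as err:
            last_error = err  # move on to the next provider in the chain
    raise RuntimeError("all providers in the fallback chain failed") from last_error

# Fake providers for illustration: the primary is rate-limited, the fallback answers.
def fake_openai(prompt: str, tools: list[str]) -> dict:
    raise RateLimited("429 from primary")

def fake_anthropic(prompt: str, tools: list[str]) -> dict:
    return {"tools_offered": list(tools)}

model, result = complete_with_fallback(
    "assemble generation request",
    [("openai/gpt-4o-mini", fake_openai),
     ("anthropic/claude-haiku", fake_anthropic)],
)
```

The point of the shape is that the tool list is an argument to the chain, not to each provider, so there is nothing provider-specific left to forget.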

One side benefit we didn't plan for: the Prometheus metrics Bifrost emits gave us per-tool call counts. We learned the web-search tool fired on only 4% of requests but accounted for a third of agent latency. We've since gated it behind a confidence check.
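
A back-of-envelope reading of those two numbers, assuming "a third of agent latency" means a third of total latency summed across all requests. The gate at the end is a hypothetical shape for a confidence check: the `palette_confidence` input and the 0.8 threshold are illustrative assumptions, not our production values.

```python
def relative_cost_per_call(fire_rate: float, latency_share: float) -> float:
    """How many times the mean per-request agent latency one tool call costs.

    With N requests and total latency T, the mean per request is T / N.
    The tool accounts for latency_share * T spread over fire_rate * N calls,
    so one call costs (latency_share / fire_rate) * (T / N).
    """
    return latency_share / fire_rate

ratio = relative_cost_per_call(fire_rate=0.04, latency_share=1 / 3)
print(f"one web-search call costs about {ratio:.1f}x the mean per-request latency")

def should_search(palette_confidence: float, threshold: float = 0.8) -> bool:
    """Hypothetical gate: only pay for web search when the known palette is not trusted."""
    return palette_confidence < threshold
```

Under that assumption a single web-search call costs roughly eight times the mean per-request agent latency, which is why gating it matters far more than the 4% fire rate suggests.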

How it compares

I evaluated Bifrost against LiteLLM and Portkey, both of which we'd used before. None of these is strictly best.

Concern	Bifrost	LiteLLM	Portkey
MCP tool gateway	Built in, registered once	Not a first-class feature	Limited
Provider breadth	23+	Largest provider list	Broad
Runtime	Go, low overhead	Python proxy	Hosted-first
Observability	Native Prometheus	Callbacks, more wiring	Strong hosted dashboard
Self-host simplicity	`npx` or Docker	pip + config	Possible, cloud-leaning

LiteLLM is the honest pick if your differentiator is provider coverage or you live entirely in Python and want callbacks in-process. Portkey's hosted analytics are more polished than anything I'd build on top of raw Prometheus, and their guardrails UI is genuinely nicer for non-engineers. We chose Bifrost for two reasons: the MCP gateway matched our specific shape (an agent with shared tools across a failover path), and the Go proxy added under a millisecond at p50 in our load tests.

Trade-offs and limitations

The MCP gateway centralises tool definitions, but it also centralises a failure mode. If the gateway is down, both the model call and the tools go with it. We run two replicas behind a load balancer and treat the gateway as a tier-1 dependency now, which is more operational weight than the old inline tools carried.

Debugging a tool call across the MCP boundary is harder than a local function. A bad Postgres query used to throw inside our process with a clean stack trace. Now I read gateway logs and correlate by request ID. Workable, not free.
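
The correlation step can be sketched as below. It assumes the gateway logs are available as JSON lines carrying a `request_id` field; that field name, the `event` values, and the sample records are hypothetical, not Bifrost's documented log format.

```python
import json
from collections import defaultdict

def group_by_request(lines: list[str]) -> dict[str, list[dict]]:
    """Group JSON log lines by request_id, keeping their original order.

    Lines that are not JSON objects, or that lack a request_id, are skipped.
    """
    grouped: dict[str, list[dict]] = defaultdict(list)
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        request_id = record.get("request_id")
        if request_id is not None:
            grouped[request_id].append(record)
    return dict(grouped)

# Hypothetical log excerpt: two interleaved requests plus one non-JSON line.
SAMPLE = [
    '{"request_id": "req-1", "event": "model_call", "model": "openai/gpt-4o-mini"}',
    '{"request_id": "req-2", "event": "model_call", "model": "openai/gpt-4o-mini"}',
    '{"request_id": "req-1", "event": "tool_call", "tool": "template_lookup", "error": "query failed"}',
    "plain text line from another component",
]

trace = group_by_request(SAMPLE)["req-1"]
for record in trace:
    print(record["event"], record.get("tool", ""), record.get("error", ""))
```

It gets you the sequence of events for one request, which is the rough equivalent of the stack trace we lost.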

And the comparison above is not eternal. LiteLLM was adding MCP support when I last checked, so the gap here may close. Evaluate against current versions, not this table.

Semantic caching, which Bifrost also offers, did nothing for us on this step. Each request is keyed on a unique product, so cache hit rate was near zero. We left it off. Worth saying plainly, since the feature is often pitched as a default win.