Model Context Protocol (MCP) changed how AI agents interact with external tools. Instead of writing custom integrations for every API, database, or service, MCP provides a standard protocol. AI models discover available tools at runtime and use them to complete tasks.
But there's a gap between MCP's promise and production reality. Running MCP servers directly exposes security vulnerabilities, creates observability blind spots, and introduces operational complexity that the base protocol doesn't address.
This is where MCP gateways become critical. They're not simple proxies - they're control planes that transform MCP from a protocol specification into production-ready infrastructure.
maximhq/bifrost
Fastest LLM gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.
Bifrost: the fastest way to build AI applications that never go down
Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.
Quick Start
Go from zero to production-ready AI gateway in under a minute.
Step 1: Start Bifrost Gateway
# Install and run locally
npx -y @maximhq/bifrost
# Or use Docker
docker run -p 8080:8080 maximhq/bifrost
Step 2: Configure via Web UI
# Open the built-in web interface
open http://localhost:8080
Step 3: Make your first API call
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'
That's it! Your AI gateway is running with a web interface for visual configuration, real-time monitoring…
The Production Problems MCP Doesn't Solve
The MCP specification handles communication between AI models and tool servers. What it doesn't handle:
Security isolation. MCP servers execute with whatever permissions you grant them. As agent deployments grow from a handful to dozens of tools across multiple environments, managing authentication, role-based access control, and security boundaries becomes unmanageable. A misconfigured MCP server can expose sensitive data or enable unauthorized actions.
Observability. Direct MCP connections provide zero insight into what agents do with tools. Without structured logging, performance metrics, and audit trails, debugging failures is guesswork. Production teams need to know which tools were called, what parameters were used, and why something failed.
Operational management. Deploying and managing dozens of MCP servers, handling connection failures, routing requests, and controlling access requires infrastructure that most teams end up building themselves.
MCP gateways solve these problems by sitting between AI models and MCP servers, providing security, observability, and operational controls.
How Bifrost Approaches MCP
Bifrost is an open-source LLM gateway built in Go, known for extremely low latency (11µs overhead at 5,000 RPS). The team added MCP support as a first-class feature rather than bolting it on.
The architecture positions Bifrost as both an MCP client (connecting to tool servers) and optionally as an MCP server (exposing tools to external clients like Claude Desktop).
Four Connection Types
Bifrost supports four ways to connect to MCP servers:
InProcess connections run tools directly in the gateway's memory. For teams using Go, this provides sub-millisecond latency (around 0.1ms) with compile-time type safety. Tools are registered with typed handlers, eliminating runtime surprises about parameter types or missing fields; a sketch of this pattern follows after the connection types.
STDIO connections launch external processes and communicate via stdin/stdout. Latency is typically 1-10ms, making this suitable for local tools, scripts, or Python/Node.js MCP servers. The gateway manages process lifecycle and handles cleanup.
HTTP connections communicate with remote MCP servers over HTTP. Network-dependent latency usually falls in the 10-500ms range. This approach scales better than local connections - run multiple instances behind a load balancer, isolate tool execution, and handle failures gracefully.
SSE (Server-Sent Events) connections hold a long-lived connection open for streaming data to the client. Real-time monitoring, market data feeds, and event-driven workflows benefit from this pattern.
This flexibility lets teams choose the right integration pattern for each tool rather than forcing everything through a single communication model.
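The typed-handler idea behind InProcess connections can be sketched in plain Go. This is a minimal, self-contained illustration, not Bifrost's actual registration API: the registry, registerTyped helper, and read_file tool are stand-ins that show why decoding arguments into a struct at the boundary removes runtime type surprises.

package main

import (
    "context"
    "encoding/json"
    "fmt"
)

// handler is the untyped shape the illustrative registry stores.
type handler func(ctx context.Context, rawArgs json.RawMessage) (string, error)

// tools is an in-memory registry used only for this sketch.
var tools = map[string]handler{}

// registerTyped wraps a strongly typed handler so arguments are decoded into a
// struct once, at the boundary, instead of being passed around as loose maps.
func registerTyped[T any](name string, fn func(context.Context, T) (string, error)) {
    tools[name] = func(ctx context.Context, raw json.RawMessage) (string, error) {
        var args T
        if err := json.Unmarshal(raw, &args); err != nil {
            return "", fmt.Errorf("%s: bad arguments: %w", name, err)
        }
        return fn(ctx, args)
    }
}

// readFileArgs is the typed parameter struct for a hypothetical read_file tool.
type readFileArgs struct {
    Path string `json:"path"`
}

func main() {
    registerTyped("read_file", func(_ context.Context, a readFileArgs) (string, error) {
        return "contents of " + a.Path, nil // stubbed body for illustration
    })

    out, err := tools["read_file"](context.Background(), json.RawMessage(`{"path":"/project/README.md"}`))
    fmt.Println(out, err)
}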
Security Model
Bifrost's default behavior is critical: tool calls are suggestions, not automatic executions. When an AI model wants to use a tool, it returns the request to the application. The application decides whether to execute it.
This prevents production disasters. Agents can't accidentally delete databases, charge credit cards, or expose sensitive data without explicit approval.
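From the application side, the default flow looks roughly like the sketch below. It assumes the gateway returns OpenAI-style tool_calls (Bifrost exposes an OpenAI-compatible API); the approved map and the tool names are illustrative application policy, not part of Bifrost.

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

// Minimal OpenAI-style response shapes; only the fields this sketch needs.
type toolCall struct {
    Function struct {
        Name      string `json:"name"`
        Arguments string `json:"arguments"`
    } `json:"function"`
}

type chatResponse struct {
    Choices []struct {
        Message struct {
            ToolCalls []toolCall `json:"tool_calls"`
        } `json:"message"`
    } `json:"choices"`
}

// approved is the application's own policy; nothing runs without it.
var approved = map[string]bool{"filesystem/read_file": true}

func main() {
    body := []byte(`{"model":"openai/gpt-4o-mini","messages":[{"role":"user","content":"Read the README"}]}`)
    resp, err := http.Post("http://localhost:8080/v1/chat/completions", "application/json", bytes.NewReader(body))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var out chatResponse
    if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
        panic(err)
    }

    // The model's tool calls come back as suggestions; the application decides.
    for _, choice := range out.Choices {
        for _, tc := range choice.Message.ToolCalls {
            if approved[tc.Function.Name] {
                fmt.Println("would execute:", tc.Function.Name, tc.Function.Arguments)
            } else {
                fmt.Println("blocked (not whitelisted):", tc.Function.Name)
            }
        }
    }
}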
For cases where automatic execution is safe, Bifrost supports "agent mode" with whitelisted tools. Configuration specifies which tools can auto-execute, keeping everything else under manual control.
Request-level filtering provides granular access control:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-mcp-include-clients: filesystem,websearch" \
  -H "x-bf-mcp-include-tools: filesystem/read_file,websearch/search" \
  -d '{"model": "gpt-4o-mini", "messages": [...]}'
The agent only sees tools specified in headers. Customer-facing chatbots don't access admin tools. Financial applications can restrict agents to read-only database operations. Wildcards work too - database/* includes all database tools while admin/* stays hidden.
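The same per-request filtering works from application code by setting those headers on the request. A short Go sketch, using the header names from the example above; the client name and wildcard are illustrative:

package main

import (
    "bytes"
    "fmt"
    "io"
    "net/http"
)

func main() {
    body := []byte(`{"model":"gpt-4o-mini","messages":[{"role":"user","content":"List recent orders"}]}`)

    req, err := http.NewRequest("POST", "http://localhost:8080/v1/chat/completions", bytes.NewReader(body))
    if err != nil {
        panic(err)
    }
    req.Header.Set("Content-Type", "application/json")
    // Only these MCP clients and tools are visible to the model for this request.
    req.Header.Set("x-bf-mcp-include-clients", "database")
    req.Header.Set("x-bf-mcp-include-tools", "database/*") // wildcard: every database tool, nothing else

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    out, _ := io.ReadAll(resp.Body)
    fmt.Println(string(out))
}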
Code Mode vs Agent Mode
Two execution patterns handle different use cases:
Agent mode has the AI call one tool at a time. It sees the result, processes it, and calls the next tool. This works well for interactive agents where human oversight matters at each step.
Code mode lets the AI write TypeScript that orchestrates multiple tools in a single execution:
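// listFiles, analyzeFile, and summarize stand in for tool bindings the gateway exposes to the generated code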
const files = await listFiles("/project");
const results = await Promise.all(
files.map(file => analyzeFile(file))
);
return summarize(results);
The code runs atomically. This is significantly faster for batch operations or complex workflows where per-tool approval creates too much friction. The tradeoff is losing step-by-step visibility - the code either runs or it doesn't.
Production Observability
Every tool call generates structured logs. Full audit trails show what the agent did, which tools it used, what data it accessed, and what happened. Prometheus metrics track latency, success rates, and error patterns per tool. Distributed tracing visualizes the complete request flow from AI model through the gateway to MCP servers and back.
MCP-specific metrics include connection health, tool availability, and execution times. If an MCP server becomes unavailable, monitoring surfaces it immediately instead of letting it show up as mysterious agent failures.
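As an illustration of what per-tool metrics can look like, here is a sketch using the standard Prometheus Go client. The metric name, labels, and port are assumptions for the example, not Bifrost's actual metric schema.

package main

import (
    "log"
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// toolLatency records execution time per tool and outcome, the kind of signal
// that turns a mysterious agent failure into a specific failing tool.
var toolLatency = prometheus.NewHistogramVec(
    prometheus.HistogramOpts{
        Name:    "mcp_tool_duration_seconds",
        Help:    "MCP tool execution time by tool and status.",
        Buckets: prometheus.DefBuckets,
    },
    []string{"tool", "status"},
)

func observe(tool string, start time.Time, err error) {
    status := "ok"
    if err != nil {
        status = "error"
    }
    toolLatency.WithLabelValues(tool, status).Observe(time.Since(start).Seconds())
}

func main() {
    prometheus.MustRegister(toolLatency)

    // Simulate one instrumented tool call.
    start := time.Now()
    time.Sleep(5 * time.Millisecond)
    observe("filesystem/read_file", start, nil)

    // Expose metrics for Prometheus to scrape.
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":9090", nil))
}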
Performance Characteristics
Real-world performance data from teams running Bifrost with MCP:
- STDIO connection overhead: 1-2ms average
- HTTP connection overhead: 10-20ms average (local network)
- InProcess tool calls: 0.05-0.1ms average
- Tool discovery: Under 100ms for 50+ tools
- Memory usage: ~2MB per active STDIO connection
These numbers reflect Go's runtime. Python and Node.js MCP servers will show different characteristics, but the relative performance differences hold.
When Bifrost Makes Sense
Teams building production AI agents with specific requirements benefit most:
Low-latency applications benefit when every millisecond of overhead compounds across multiple tool calls. InProcess connections at 0.1ms and STDIO at 1-2ms feel instant compared to alternatives that add 50-100ms of overhead.
Security-conscious deployments need default-deny tool execution and request-level filtering. Financial services, healthcare, and regulated industries can't accept automatic tool execution without controls.
High-throughput systems processing thousands of requests per second need efficient connection management and minimal memory overhead. Bifrost's Go implementation handles concurrency well.
Multi-environment deployments benefit from supporting all four connection types. Development can use STDIO for fast iteration, staging can use HTTP for realistic testing, and production can optimize based on actual requirements.
Trade-offs
The InProcess connection type only works with Go. Teams using Python or JavaScript for tool development can't leverage the lowest-latency option.
Agent mode and code mode require thoughtful configuration. Overly permissive auto-execution defeats the security controls; overly restrictive settings create friction that slows agents down.
Setting up request-level filtering adds complexity to API calls. Applications need to determine which tools each request should access, which requires understanding tool dependencies and security boundaries.
Alternatives
Other approaches to MCP gateway functionality exist:
Direct MCP integration without a gateway works for prototypes and small deployments. Security and observability gaps become problems at scale.
Cloud-managed solutions (AWS, Azure, Google Cloud) integrate MCP with existing infrastructure but introduce vendor lock-in and network latency.
Purpose-built agent platforms include MCP support but often bundle features you might not need, increasing complexity and cost.
Bifrost's approach is to be a standalone gateway focused specifically on routing, security, and observability, without bundling agent development frameworks or deployment infrastructure.
Getting Started
MCP support is available in Bifrost's open-source release. The documentation covers setup for all connection types, agent mode configuration, and request-level filtering. Examples demonstrate common patterns for filesystem operations, database queries, and web APIs.
Teams can start with STDIO connections for prototyping - they're easiest to set up and test locally. Production deployments typically move to HTTP connections for better scalability and isolation.
The code is available on GitHub. Teams can review the implementation, contribute improvements, or fork for custom requirements.
Thanks for reading!
References: https://docs.getbifrost.ai/mcp/overview
