TL;DR
MCP (Model Context Protocol) lets AI agents call external tools at runtime. In an enterprise, though, you need centralized control over which agents can access which tools, with budgets and rate limits per team. Bifrost is an open-source, Go-based LLM gateway that ships MCP support with multi-level RBAC through Virtual Keys, sub-3ms tool execution latency, and a Code Mode that cuts token usage by 50%+.
Star Bifrost on GitHub | Read the docs
The Problem: MCP Without Governance is a Liability
MCP is an open standard that lets AI models discover and execute external tools at runtime. Instead of just generating text, your model can read files, query databases, search the web, and trigger business logic through MCP servers.
That is powerful. It is also dangerous in production without governance.
When you have 10 teams running AI agents, each connected to MCP servers for filesystem access, database queries, and third-party APIs, you need answers to basic questions. Which team can call which tools? What is the spending cap per agent? How do you revoke access instantly if something goes wrong?
Most setups today rely on per-agent configuration. Each agent has its own hardcoded list of allowed tools, its own API keys, its own rate limits scattered across config files. There is no centralized view, no single point of control, no audit trail.
We built Bifrost's MCP gateway to solve this.
How Bifrost Handles MCP
Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.
Quick Start
Go from zero to production-ready AI gateway in under a minute.
Step 1: Start Bifrost Gateway
```shell
# Install and run locally
npx -y @maximhq/bifrost

# Or use Docker
docker run -p 8080:8080 maximhq/bifrost
```
Step 2: Configure via Web UI
```shell
# Open the built-in web interface
open http://localhost:8080
```
Step 3: Make your first API call
```shell
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'
```
That's it! Your AI gateway is running with a web interface for visual configuration…
Bifrost acts as both an MCP client (connecting to external tool servers) and optionally as an MCP server (exposing tools to clients like Claude Desktop).
The default pattern is stateless with explicit execution:
```
1. POST /v1/chat/completions
   -> LLM returns tool call suggestions (NOT executed)
2. Your app reviews the tool calls
   -> Apply security rules, get user approval
3. POST /v1/mcp/tool/execute
   -> Execute approved tool calls explicitly
4. POST /v1/chat/completions
   -> Continue conversation with tool results
```
Tools never execute without explicit approval. This is a deliberate design choice. By default, Bifrost treats tool calls from the LLM as suggestions only.
If you want autonomous execution for specific tools, you opt into Agent Mode with tools_to_auto_execute configuration. But you have to choose which tools, per MCP client. It is never a blanket "execute everything."
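The suggest-then-approve loop can be sketched in a few lines. This is a hypothetical illustration, not Bifrost's internals: the `AUTO_EXECUTE` set stands in for the `tools_to_auto_execute` configuration, and `approve_fn` is whatever review step your application implements between steps 2 and 3.

```python
# Sketch of the suggest-then-approve loop. AUTO_EXECUTE mirrors the
# per-tool opt-in of Agent Mode; everything else needs explicit approval.
AUTO_EXECUTE = {"websearch/search"}

def review_tool_calls(tool_calls, approve_fn):
    """Split LLM tool-call suggestions into approved and rejected."""
    approved, rejected = [], []
    for call in tool_calls:
        name = f"{call['client']}/{call['tool']}"
        # Auto-execute only tools explicitly opted in; everything else
        # requires an application-level approval decision.
        if name in AUTO_EXECUTE or approve_fn(call):
            approved.append(call)
        else:
            rejected.append(call)
    return approved, rejected

# Example: the model suggested two calls; the approval policy denies all,
# so only the opted-in tool survives.
suggestions = [
    {"client": "filesystem", "tool": "delete_file", "args": {"path": "/tmp/x"}},
    {"client": "websearch", "tool": "search", "args": {"q": "bifrost"}},
]
approved, rejected = review_tool_calls(suggestions, approve_fn=lambda c: False)
print([f"{c['client']}/{c['tool']}" for c in approved])  # ['websearch/search']
```

Only approved calls would then be sent to `/v1/mcp/tool/execute`.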
Try Bifrost - zero-config deployment via npx -y @maximhq/bifrost or Docker.
Multi-Level Tool Filtering
This is where the RBAC story comes together. Bifrost provides three levels of tool filtering:
1. Request-Level Filtering (HTTP Headers)
Per-request control over which MCP clients and tools are available:
```shell
# Include only specific MCP clients
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "x-bf-mcp-include-clients: filesystem,websearch" \
  -d '{"model": "gpt-4o-mini", "messages": [...]}'

# Include only specific tools
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "x-bf-mcp-include-tools: filesystem/read_file,websearch/search" \
  -d '{"model": "gpt-4o-mini", "messages": [...]}'
```
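Conceptually, the two headers act as allow-lists applied before tool definitions ever reach the model. A minimal sketch of that matching logic (my own illustration, not the gateway's code):

```python
def parse_filter(header_value):
    """Split a comma-separated header value into a set of names."""
    return {v.strip() for v in header_value.split(",") if v.strip()}

def is_tool_allowed(client, tool, include_clients=None, include_tools=None):
    """A tool passes only if both client and client/tool allow-lists admit it."""
    if include_clients is not None and client not in include_clients:
        return False
    if include_tools is not None and f"{client}/{tool}" not in include_tools:
        return False
    return True

clients = parse_filter("filesystem,websearch")
tools = parse_filter("filesystem/read_file,websearch/search")
print(is_tool_allowed("filesystem", "read_file", clients, tools))    # True
print(is_tool_allowed("filesystem", "delete_file", clients, tools))  # False
```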
2. Configuration-Level Filtering
At the config level, you can blacklist dangerous tools permanently, map user roles to available tool sets, and set different tool availability for dev vs production environments.
3. Virtual Key-Level Filtering
Each Virtual Key can be restricted to specific providers and models, which in turn controls the MCP capabilities available to that key's users.
Virtual Keys: The Governance Entity
Virtual Keys are the central governance unit. Each key gets its own access controls, budgets, and rate limits.
Here is what a Virtual Key creation looks like through the API:
```json
{
  "name": "Engineering Team API",
  "provider_configs": [
    {
      "provider": "openai",
      "weight": 0.5,
      "allowed_models": ["gpt-4o-mini"]
    },
    {
      "provider": "anthropic",
      "weight": 0.5,
      "allowed_models": ["claude-3-sonnet-20240229"]
    }
  ],
  "team_id": "team-eng-001",
  "budget": {
    "max_limit": 100.00,
    "reset_duration": "1M"
  },
  "rate_limit": {
    "token_max_limit": 10000,
    "token_reset_duration": "1h",
    "request_max_limit": 100,
    "request_reset_duration": "1m"
  },
  "is_active": true
}
```
Key capabilities per Virtual Key:
- Model/provider filtering: Restrict which providers and models a key can access
- Independent budgets: Dollar-amount caps with reset durations (1m, 1h, 1d, 1w, 1M)
- Token and request rate limiting: Separate limits for tokens consumed and requests made
- Active/inactive toggle: Kill access instantly without deleting configuration
- Key restrictions: Limit a VK to specific provider API keys
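Put together, a request against a Virtual Key passes several gates before it reaches a provider. A simplified sketch, assuming a `vk` dict shaped like the JSON above plus a running `spent` total (that field and the gate order are my own illustration):

```python
def check_virtual_key(vk, provider, model, est_cost):
    """Gate a request on key state, model allow-list, and budget."""
    if not vk["is_active"]:
        return "rejected: key inactive"  # the instant kill switch
    allowed = {(p["provider"], m)
               for p in vk["provider_configs"]
               for m in p["allowed_models"]}
    if (provider, model) not in allowed:
        return "rejected: model not allowed"
    if vk["spent"] + est_cost > vk["budget"]["max_limit"]:
        return "rejected: budget exceeded"
    return "allowed"

vk = {
    "is_active": True,
    "provider_configs": [
        {"provider": "openai", "allowed_models": ["gpt-4o-mini"]},
    ],
    "budget": {"max_limit": 100.00},
    "spent": 99.90,
}
print(check_virtual_key(vk, "openai", "gpt-4o-mini", 0.25))  # rejected: budget exceeded
print(check_virtual_key(vk, "anthropic", "claude-3-sonnet-20240229", 0.01))  # rejected: model not allowed
```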
Four-Tier Budget Hierarchy
Bifrost enforces budgets at four levels, checked in order:
- Customer (highest): Organization-wide budget. Top-level cap.
- Team: Department-level budget, rolled up under a customer.
- Virtual Key: Per-key budget, checked alongside team and customer budgets.
- Provider Config: Per-provider spending controls.
A Virtual Key belongs to either one team OR one customer (mutually exclusive). When a request comes in, Bifrost checks budgets at every applicable level. If any tier is exceeded, the request is rejected.
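The check order can be illustrated with a small sketch; the spend figures below are invented for the example:

```python
def check_budgets(request_cost, tiers):
    """tiers: ordered (name, spent, limit) tuples, customer first.
    Reject if any applicable tier would be pushed over its limit."""
    for name, spent, limit in tiers:
        if limit is not None and spent + request_cost > limit:
            return f"rejected at {name}"
    return "allowed"

tiers = [
    ("customer", 980.0, 1000.0),       # organization-wide cap
    ("team", 140.0, 200.0),            # department roll-up
    ("virtual_key", 95.0, 100.0),      # per-key budget
    ("provider_config", 10.0, 50.0),   # per-provider controls
]
print(check_budgets(4.0, tiers))   # allowed
print(check_budgets(25.0, tiers))  # rejected at customer
```

Note that a cheap request can still be rejected if a *higher* tier (here, the customer cap) is nearly exhausted.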
Code Mode: 50%+ Token Reduction for MCP
When you connect 3+ MCP servers, the tool definitions sent to the LLM can consume thousands of tokens per request. Code Mode solves this.
Instead of sending 100+ tool definitions to the model, Bifrost generates a Virtual File System (VFS) with TypeScript declaration files for all connected tools:
```
servers/
  filesystem.d.ts  -> All filesystem tools
  web_search.d.ts  -> All web search tools
  database.d.ts    -> All database tools
```
The AI writes TypeScript code that orchestrates these tools, and Bifrost executes it in a sandboxed Goja VM. The result: 50%+ token reduction and 40-50% latency improvement compared to classic MCP tool calling.
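To make the VFS idea concrete, here is a rough sketch of how per-tool schemas could collapse into one declaration file per server. This mirrors the tree above but is my own illustration, not Bifrost's actual generator:

```python
# Sketch: turn a server's tool schemas into a single TypeScript .d.ts,
# so the model reads one compact file instead of many tool definitions.
def to_dts(server, tools):
    lines = [f"// servers/{server}.d.ts"]
    for t in tools:
        params = ", ".join(f"{p}: {ty}" for p, ty in t["params"].items())
        lines.append(f"export declare function {t['name']}({params}): Promise<any>;")
    return "\n".join(lines)

print(to_dts("filesystem", [
    {"name": "read_file", "params": {"path": "string"}},
    {"name": "write_file", "params": {"path": "string", "content": "string"}},
]))
```

The model then imports from these virtual files and composes calls in code, rather than emitting one JSON tool call per step.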
MCP connection latency by type:
- InProcess: ~0.1ms
- STDIO: ~1-10ms
- HTTP: ~10-500ms
- SSE: real-time streaming (no fixed per-call latency)
Overall MCP tool execution stays under 3ms for local connections.
Why Not Just Configure Per-Agent?
You can. Many teams do. The problem compounds at scale.
Per-agent configuration means no central audit trail, no single dashboard for budgets across teams, no way to instantly revoke access for a compromised key, and no consistent rate limiting policy. You end up building a governance layer yourself.
Bifrost is open source, written in Go with 11 microsecond latency overhead, and handles 5,000 requests per second. You can deploy it with a single command and start routing immediately.
