TL;DR
MCP (Model Context Protocol) lets AI agents call external tools at runtime. In an enterprise, though, you need centralized control over which agents can access which tools, with budgets and rate limits per team. Bifrost is an open-source, Go-based LLM gateway that ships MCP support with multi-level RBAC through Virtual Keys, sub-3ms tool execution latency, and a Code Mode that cuts token usage by 50%+.
Star Bifrost on GitHub | Read the docs
The Problem: MCP Without Governance is a Liability
MCP is an open standard that lets AI models discover and execute external tools at runtime. Instead of just generating text, your model can read files, query databases, search the web, and trigger business logic through MCP servers.
That is powerful. It is also dangerous in production without governance.
When you have 10 teams running AI agents, each connected to MCP servers for filesystem access, database queries, and third-party APIs, you need answers to basic questions. Which team can call which tools? What is the spending cap per agent? How do you revoke access instantly if something goes wrong?
Most setups today rely on per-agent configuration. Each agent has its own hardcoded list of allowed tools, its own API keys, its own rate limits scattered across config files. There is no centralized view, no single point of control, no audit trail.
We built Bifrost's MCP gateway to solve this.
How Bifrost Handles MCP
Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.
Quick Start
Go from zero to production-ready AI gateway in under a minute.
Step 1: Start Bifrost Gateway
```shell
# Install and run locally
npx -y @maximhq/bifrost

# Or use Docker
docker run -p 8080:8080 maximhq/bifrost
```
Step 2: Configure via Web UI
```shell
# Open the built-in web interface
open http://localhost:8080
```
Step 3: Make your first API call
```shell
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'
```
That's it! Your AI gateway is running with a web interface for visual configuration…
Bifrost acts as both an MCP client (connecting to external tool servers) and optionally as an MCP server (exposing tools to clients like Claude Desktop).
The default pattern is stateless with explicit execution:
```
1. POST /v1/chat/completions
   -> LLM returns tool call suggestions (NOT executed)
2. Your app reviews the tool calls
   -> Apply security rules, get user approval
3. POST /v1/mcp/tool/execute
   -> Execute approved tool calls explicitly
4. POST /v1/chat/completions
   -> Continue conversation with tool results
```
Tools never execute without explicit approval. This is a deliberate design choice. By default, Bifrost treats tool calls from the LLM as suggestions only.
If you want autonomous execution for specific tools, you opt into Agent Mode with tools_to_auto_execute configuration. But you have to choose which tools, per MCP client. It is never a blanket "execute everything."
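The suggest-then-approve loop can be sketched in a few lines. This is a hypothetical illustration, not Bifrost's internals: the `AUTO_EXECUTE` set stands in for the `tools_to_auto_execute` configuration, and `approve_fn` is whatever review step your application implements between steps 2 and 3.

```python
# Sketch of the suggest-then-approve loop. AUTO_EXECUTE mirrors the
# per-tool opt-in of Agent Mode; everything else needs explicit approval.
AUTO_EXECUTE = {"websearch/search"}

def review_tool_calls(tool_calls, approve_fn):
    """Split LLM tool-call suggestions into approved and rejected."""
    approved, rejected = [], []
    for call in tool_calls:
        name = f"{call['client']}/{call['tool']}"
        # Auto-execute only tools explicitly opted in; everything else
        # requires an application-level approval decision.
        if name in AUTO_EXECUTE or approve_fn(call):
            approved.append(call)
        else:
            rejected.append(call)
    return approved, rejected

# Example: the model suggested two calls; the approval policy denies all,
# so only the opted-in tool survives.
suggestions = [
    {"client": "filesystem", "tool": "delete_file", "args": {"path": "/tmp/x"}},
    {"client": "websearch", "tool": "search", "args": {"q": "bifrost"}},
]
approved, rejected = review_tool_calls(suggestions, approve_fn=lambda c: False)
print([f"{c['client']}/{c['tool']}" for c in approved])  # ['websearch/search']
```

Only approved calls would then be sent to `/v1/mcp/tool/execute`.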
Try Bifrost - zero-config deployment via npx -y @maximhq/bifrost or Docker.
Multi-Level Tool Filtering
This is where the RBAC story comes together. Bifrost provides three levels of tool filtering:
1. Request-Level Filtering (HTTP Headers)
Per-request control over which MCP clients and tools are available:
```shell
# Include only specific MCP clients
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "x-bf-mcp-include-clients: filesystem,websearch" \
  -d '{"model": "gpt-4o-mini", "messages": [...]}'

# Include only specific tools
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "x-bf-mcp-include-tools: filesystem/read_file,websearch/search" \
  -d '{"model": "gpt-4o-mini", "messages": [...]}'
```
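Conceptually, the two headers act as allow-lists applied before tool definitions ever reach the model. A minimal sketch of that matching logic (my own illustration, not the gateway's code):

```python
def parse_filter(header_value):
    """Split a comma-separated header value into a set of names."""
    return {v.strip() for v in header_value.split(",") if v.strip()}

def is_tool_allowed(client, tool, include_clients=None, include_tools=None):
    """A tool passes only if both client and client/tool allow-lists admit it."""
    if include_clients is not None and client not in include_clients:
        return False
    if include_tools is not None and f"{client}/{tool}" not in include_tools:
        return False
    return True

clients = parse_filter("filesystem,websearch")
tools = parse_filter("filesystem/read_file,websearch/search")
print(is_tool_allowed("filesystem", "read_file", clients, tools))    # True
print(is_tool_allowed("filesystem", "delete_file", clients, tools))  # False
```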
2. Configuration-Level Filtering
At the config level, you can blacklist dangerous tools permanently, map user roles to available tool sets, and set different tool availability for dev vs production environments.
3. Virtual Key-Level Filtering
Each Virtual Key can be restricted to specific providers and models, which in turn controls the MCP capabilities available to that key's users.
Virtual Keys: The Governance Entity
Virtual Keys are the central governance unit. Each key gets its own access controls, budgets, and rate limits.
Here is what a Virtual Key creation looks like through the API:
```json
{
  "name": "Engineering Team API",
  "provider_configs": [
    {
      "provider": "openai",
      "weight": 0.5,
      "allowed_models": ["gpt-4o-mini"]
    },
    {
      "provider": "anthropic",
      "weight": 0.5,
      "allowed_models": ["claude-3-sonnet-20240229"]
    }
  ],
  "team_id": "team-eng-001",
  "budget": {
    "max_limit": 100.00,
    "reset_duration": "1M"
  },
  "rate_limit": {
    "token_max_limit": 10000,
    "token_reset_duration": "1h",
    "request_max_limit": 100,
    "request_reset_duration": "1m"
  },
  "is_active": true
}
```
Key capabilities per Virtual Key:
- Model/provider filtering: Restrict which providers and models a key can access
- Independent budgets: Dollar-amount caps with reset durations (1m, 1h, 1d, 1w, 1M)
- Token and request rate limiting: Separate limits for tokens consumed and requests made
- Active/inactive toggle: Kill access instantly without deleting configuration
- Key restrictions: Limit a VK to specific provider API keys
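Put together, a request against a Virtual Key passes several gates before it reaches a provider. A simplified sketch, assuming a `vk` dict shaped like the JSON above plus a running `spent` total (that field and the gate order are my own illustration):

```python
def check_virtual_key(vk, provider, model, est_cost):
    """Gate a request on key state, model allow-list, and budget."""
    if not vk["is_active"]:
        return "rejected: key inactive"  # the instant kill switch
    allowed = {(p["provider"], m)
               for p in vk["provider_configs"]
               for m in p["allowed_models"]}
    if (provider, model) not in allowed:
        return "rejected: model not allowed"
    if vk["spent"] + est_cost > vk["budget"]["max_limit"]:
        return "rejected: budget exceeded"
    return "allowed"

vk = {
    "is_active": True,
    "provider_configs": [
        {"provider": "openai", "allowed_models": ["gpt-4o-mini"]},
    ],
    "budget": {"max_limit": 100.00},
    "spent": 99.90,
}
print(check_virtual_key(vk, "openai", "gpt-4o-mini", 0.25))  # rejected: budget exceeded
print(check_virtual_key(vk, "anthropic", "claude-3-sonnet-20240229", 0.01))  # rejected: model not allowed
```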
Four-Tier Budget Hierarchy
Bifrost enforces budgets at four levels, checked in order:
- Customer (highest): Organization-wide budget. Top-level cap.
- Team: Department-level budget, rolled up under a customer.
- Virtual Key: Per-key budget, checked alongside team and customer budgets.
- Provider Config: Per-provider spending controls.
A Virtual Key belongs to either one team OR one customer (mutually exclusive). When a request comes in, Bifrost checks budgets at every applicable level. If any tier is exceeded, the request is rejected.
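The check order can be illustrated with a small sketch; the spend figures below are invented for the example:

```python
def check_budgets(request_cost, tiers):
    """tiers: ordered (name, spent, limit) tuples, customer first.
    Reject if any applicable tier would be pushed over its limit."""
    for name, spent, limit in tiers:
        if limit is not None and spent + request_cost > limit:
            return f"rejected at {name}"
    return "allowed"

tiers = [
    ("customer", 980.0, 1000.0),       # organization-wide cap
    ("team", 140.0, 200.0),            # department roll-up
    ("virtual_key", 95.0, 100.0),      # per-key budget
    ("provider_config", 10.0, 50.0),   # per-provider controls
]
print(check_budgets(4.0, tiers))   # allowed
print(check_budgets(25.0, tiers))  # rejected at customer
```

Note that a cheap request can still be rejected if a *higher* tier (here, the customer cap) is nearly exhausted.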
Code Mode: 50%+ Token Reduction for MCP
When you connect 3+ MCP servers, the tool definitions sent to the LLM can consume thousands of tokens per request. Code Mode solves this.
Instead of sending 100+ tool definitions to the model, Bifrost generates a Virtual File System (VFS) with TypeScript declaration files for all connected tools:
```
servers/
  filesystem.d.ts  -> All filesystem tools
  web_search.d.ts  -> All web search tools
  database.d.ts    -> All database tools
```
The AI writes TypeScript code that orchestrates these tools, and Bifrost executes it in a sandboxed Goja VM. The result: 50%+ token reduction and 40-50% latency improvement compared to classic MCP tool calling.
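To make the VFS idea concrete, here is a rough sketch of how per-tool schemas could collapse into one declaration file per server. This mirrors the tree above but is my own illustration, not Bifrost's actual generator:

```python
# Sketch: turn a server's tool schemas into a single TypeScript .d.ts,
# so the model reads one compact file instead of many tool definitions.
def to_dts(server, tools):
    lines = [f"// servers/{server}.d.ts"]
    for t in tools:
        params = ", ".join(f"{p}: {ty}" for p, ty in t["params"].items())
        lines.append(f"export declare function {t['name']}({params}): Promise<any>;")
    return "\n".join(lines)

print(to_dts("filesystem", [
    {"name": "read_file", "params": {"path": "string"}},
    {"name": "write_file", "params": {"path": "string", "content": "string"}},
]))
```

The model then imports from these virtual files and composes calls in code, rather than emitting one JSON tool call per step.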
MCP connection latency by type:
- InProcess: ~0.1ms
- STDIO: ~1-10ms
- HTTP: ~10-500ms
- SSE: real-time streaming (no fixed per-call latency)
Overall MCP tool execution stays under 3ms for local connections.
Why Not Just Configure Per-Agent?
You can. Many teams do. The problem compounds at scale.
Per-agent configuration means no central audit trail, no single dashboard for budgets across teams, no way to instantly revoke access for a compromised key, and no consistent rate limiting policy. You end up building a governance layer yourself.
Bifrost is open source, written in Go with 11 microsecond latency overhead, and handles 5,000 requests per second. You can deploy it with a single command and start routing immediately.
