**TL;DR:** Self-hosting an MCP gateway gives you control over auth, audit logging, and tool access in a way managed services do not. I set up Bifrost end to end on a single instance, walked through the security configuration (OAuth 2.0 with PKCE, virtual keys with deny-by-default), and pushed it to its documented sustained throughput. This post covers the setup, the security layer, and the scaling levers, plus the gotchas.
This post assumes familiarity with the Model Context Protocol, basic Docker or npx deployment, and how OAuth 2.0 PKCE flows work.
## Why Self-Host an MCP Gateway
Direct MCP connections work for one developer on a laptop. They break in three ways once you scale.
- **Credentials get duplicated** across every agent config. Each Claude Code or Cursor instance holds its own MCP server credentials.
- **Audit trails fragment.** When a tool call modifies the wrong record, you cannot answer who called it from which agent session without grepping through agent logs.
- **Tool access is all-or-nothing.** Every agent sees every tool from every connected server, including the dangerous ones.
A self-hosted gateway puts these behind one entry point. Bifrost is the option I worked with because it is open source, written in Go, and ships MCP gateway functionality alongside LLM routing in a single binary.
## Step 1: Run the Gateway

```bash
npx -y @maximhq/bifrost
```
That starts Bifrost on port 8080 with a default config. For production, Docker is the path most teams take:
```bash
docker run -d \
  -p 8080:8080 \
  -v $(pwd)/config:/app/config \
  -v $(pwd)/data:/app/data \
  maximhq/bifrost:latest
```
The volume mounts persist configuration and the SQLite store across restarts. The setup docs cover Postgres if you need it for production observability.
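For repeatable deployments, the same container setup can be expressed as a Compose file. A minimal sketch assuming the image and mounts above (the Compose file itself is mine, not from the Bifrost docs):

```yaml
# docker-compose.yml -- illustrative sketch, not from the Bifrost docs
services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    volumes:
      - ./config:/app/config   # gateway configuration
      - ./data:/app/data       # SQLite store, survives restarts
    restart: unless-stopped
```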
## Step 2: Register MCP Servers
Bifrost supports STDIO, HTTP, and SSE connection types. The configuration sits under the mcp block.
```yaml
mcp:
  servers:
    - name: github-mcp
      type: stdio
      command: ["npx", "-y", "@modelcontextprotocol/server-github"]
      env:
        GITHUB_TOKEN: ${GITHUB_TOKEN}
    - name: linear-mcp
      type: http
      url: ${LINEAR_MCP_URL}
      headers:
        Authorization: "Bearer ${LINEAR_TOKEN}"
    - name: filesystem
      type: stdio
      command: ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/data"]
      env:
        HOME: /root
        PATH: /usr/local/bin:/usr/bin
```
Once saved, the gateway connects to each server and discovers its tool list. Bifrost health-checks each server on a 10-second interval and retries failures with exponential backoff, so a flapping upstream does not take down the whole gateway path.
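The backoff behavior matters when you size alert thresholds for flapping servers. A minimal sketch of an exponential backoff schedule like the one described, with assumed base delay, cap, and multiplier (Bifrost's actual parameters may differ):

```python
def backoff_schedule(base=1.0, cap=60.0, factor=2.0, retries=6):
    """Yield retry delays in seconds: base, base*factor, ..., capped at `cap`."""
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= factor

# A failing upstream gets retried at widening intervals instead of being hammered.
print(list(backoff_schedule()))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```

With these assumed parameters, a server that stays down stops generating retry noise after roughly a minute per attempt, which is the behavior you want behind a shared gateway.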
## Step 3: Lock Down Auth With OAuth 2.0 and Virtual Keys
The security layer is where self-hosting earns its keep. Bifrost ships OAuth 2.0 with PKCE for upstream MCP servers that support it, plus virtual keys with deny-by-default access for downstream agents.
```yaml
virtual_keys:
  - key_name: claude-code-eng
    key: vk-cc-eng
    mcp_clients:
      - name: github-mcp
        allowed_tools: ["search_code", "get_pull_request", "get_issue"]
      - name: filesystem
        allowed_tools: ["read_text_file", "list_directory", "search_files"]
  - key_name: claude-code-write
    key: vk-cc-write
    mcp_clients:
      - name: github-mcp
        allowed_tools: ["*"]
      - name: linear-mcp
        allowed_tools: ["*"]
```
The first key is read-only across both servers. The second grants every tool on github-mcp and linear-mcp. If a virtual key has no mcp_clients block, no MCP tools are exposed through that key. The default is deny.
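The deny-by-default rule is easy to mirror in a policy check. An illustrative sketch of how a tool call gets filtered against a virtual key's mcp_clients allowlist (my own logic, not Bifrost's implementation):

```python
def is_allowed(virtual_key: dict, server: str, tool: str) -> bool:
    """Deny by default: a tool passes only if the key explicitly allows it."""
    for client in virtual_key.get("mcp_clients", []):  # no block -> nothing exposed
        if client["name"] == server:
            allowed = client.get("allowed_tools", [])
            return "*" in allowed or tool in allowed
    return False

read_only = {"mcp_clients": [{"name": "github-mcp",
                              "allowed_tools": ["search_code", "get_issue"]}]}
assert is_allowed(read_only, "github-mcp", "search_code")
assert not is_allowed(read_only, "github-mcp", "create_issue")  # not listed
assert not is_allowed({}, "github-mcp", "search_code")          # no block: deny
```

The important property is the final `return False`: an unknown server, an empty allowlist, or a missing mcp_clients block all fall through to a denial rather than an implicit grant.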
The four-tier budget hierarchy (Customer, Team, Virtual Key, Provider Config) applies to MCP traffic as well as LLM traffic, documented on the Bifrost governance resource.
## Step 4: Enable Code Mode for Token Reduction
For agentic workloads with more than a handful of tools, Code Mode is the single biggest cost lever. It replaces full tool definition injection with a Python stub generation flow.
```yaml
mcp:
  code_mode:
    enabled: true
    sandbox: starlark
```
Bifrost's published Code Mode benchmarks, from the MCP gateway resource page:
| MCP tools connected | Token reduction | Pass rate |
|---|---|---|
| 96 tools | 58% | 100% |
| 251 tools | 84.5% | 100% |
| 508 tools | 92.8% | 100% |
The docs also call out the round-trip impact: at 5 servers and around 100 tools, Classic MCP runs ~6 LLM turns versus 3-4 turns under Code Mode, with documented "~50% cost reduction + 30-40% faster execution" in that flow. Source: docs.getbifrost.ai/mcp/code-mode.
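The scaling intuition behind those numbers: classic MCP injects every tool definition into every request, so prompt overhead grows linearly with catalog size, while Code Mode injects a roughly constant stub preamble. A back-of-the-envelope sketch with made-up constants, back-solved so they roughly reproduce the published percentages (illustrative only, not Bifrost's measurements):

```python
def classic_overhead(n_tools, tokens_per_def=350):
    # Classic MCP: every tool definition is injected into every request.
    return n_tools * tokens_per_def

def code_mode_overhead(stub_preamble=14000):
    # Code Mode: roughly constant stub/runtime preamble, independent of catalog size.
    return stub_preamble

for n in (96, 251, 508):
    saved = 1 - code_mode_overhead() / classic_overhead(n)
    print(f"{n} tools: ~{saved:.0%} reduction under these assumptions")
```

The fixed cost dominates at small catalogs and vanishes at large ones, which is exactly the shape of the benchmark table: savings climb from 58% to 92.8% as the catalog grows.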
## Step 5: Connect Claude Code to the Gateway
```bash
claude mcp add-json bifrost '{
  "type": "http",
  "url": "http://localhost:8080/mcp",
  "headers": {
    "Authorization": "Bearer vk-cc-eng"
  }
}'
```
Claude Code now sees only the tools allowed by vk-cc-eng, routed through Bifrost. The full integration walkthrough is on the Bifrost Claude Code resource.
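To verify what a given key actually exposes, you can speak the protocol directly: `tools/list` is a standard MCP JSON-RPC method, so POSTing this body to the gateway's `/mcp` endpoint with the key's Authorization header (endpoint and key from the config above) should return the scoped tool list:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/list"
}
```

For vk-cc-eng, the response should contain only the six allowed read tools; anything beyond that means the allowlist is not being enforced.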
## Step 6: Scaling Levers
Bifrost's benchmarks resource documents 11 microseconds of added latency per request and 5,000 RPS sustained throughput on a single instance. The Go runtime, object pooling via sync.Pool, async log writes, and batch processing keep observability overhead under 0.1 ms.
For production scaling:
- Move from SQLite to Postgres for the observability store
- Run multiple Bifrost instances behind a load balancer with session affinity by virtual key
- Use Redis (RediSearch), Weaviate, or Qdrant as a shared vector store for semantic caching
- Enable streaming response caching for chat workloads with long completions
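Session affinity by virtual key can live entirely at the load balancer, since the key rides in the Authorization header. An illustrative nginx sketch (my own config, not from the Bifrost docs; instance addresses are placeholders):

```nginx
upstream bifrost {
    # Consistent hash on the Authorization header pins each virtual key
    # to one instance, so per-key state and caches stay warm.
    hash $http_authorization consistent;
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

server {
    listen 80;
    location /mcp {
        proxy_pass http://bifrost;
        proxy_set_header Authorization $http_authorization;
    }
}
```

Consistent hashing also limits the blast radius of scaling events: adding or removing an instance remaps only a fraction of the keys instead of reshuffling all of them.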
## Comparison
| Capability | Direct MCP | Cloudflare AI Gateway | Bifrost |
|---|---|---|---|
| Self-hosted | Yes | No | Yes |
| OAuth 2.0 with PKCE | Per-server | No | Yes |
| Per-tool audit logging | No | Basic | Yes |
| Tool groups | No | No | Yes |
| Code Mode token reduction | No | No | Yes |
| Latency overhead | None | Depends on managed infra | 11 µs |
## Trade-offs and Limitations
Bifrost is self-hosted only. There is no managed cloud, so you take on ops cost.
The provider catalog for LLMs is smaller than LiteLLM. Major providers and custom endpoints are covered, but niche providers may not be.
Code Mode adds a small overhead on tiny tool catalogs. Below 5 to 10 tools, the upfront-definition path can be cheaper.
OpenRouter is incompatible because of a tool call streaming issue.
The project is newer than alternatives, so the community of plugins and Stack Overflow answers is still building up.
## Quick Recap
- One binary covers MCP gateway, LLM routing, semantic caching, and observability
- OAuth 2.0 with PKCE, virtual keys with deny-by-default, and per-tool audit logging form the security layer
- Code Mode cuts tool definition tokens by 58% to 92.8% depending on catalog size, with ~50% cost reduction at the 100-tool example documented by Bifrost
- Scale by moving to Postgres, sharing vector stores, and load-balancing multiple instances by virtual key
GitHub: https://git.new/bifrost | Docs: https://getmax.im/bifrostdocs | Website: https://getmax.im/bifrost-home