**TL;DR:** Self-hosting an MCP gateway gives you control over auth, audit logging, and tool access in a way managed services do not. I set up Bifrost end to end on a single instance, walked through the security configuration (OAuth 2.0 with PKCE, virtual keys with deny-by-default), and pushed it to its documented sustained throughput. This post covers the setup, the security layer, and the scaling levers, plus the gotchas.
This post assumes familiarity with the Model Context Protocol, basic Docker or npx deployment, and how OAuth 2.0 PKCE flows work.
## Why Self-Host an MCP Gateway
Direct MCP connections work for one developer on a laptop. They break in three ways once you scale.
- **Credentials get duplicated** across every agent config. Each Claude Code or Cursor instance holds its own MCP server credentials.
- **Audit trails fragment.** When a tool call modifies the wrong record, you cannot answer who called it from which agent session without grepping through agent logs.
- **Tool access is all-or-nothing.** Every agent sees every tool from every connected server, including the dangerous ones.
A self-hosted gateway puts these behind one entry point. Bifrost is the option I worked with because it is open source, written in Go, and ships MCP gateway functionality alongside LLM routing in a single binary.
## Step 1: Run the Gateway

```bash
npx -y @maximhq/bifrost
```
That starts Bifrost on port 8080 with a default config. For production, Docker is the path most teams take:
```bash
docker run -d \
  -p 8080:8080 \
  -v $(pwd)/config:/app/config \
  -v $(pwd)/data:/app/data \
  maximhq/bifrost:latest
```
The volume mounts persist configuration and the SQLite store across restarts. The setup docs cover Postgres if you need it for production observability.
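For repeatable deployments, the same container setup can be expressed as a Compose file. A minimal sketch assuming the image and mounts above (the Compose file itself is mine, not from the Bifrost docs):

```yaml
# docker-compose.yml -- illustrative sketch, not from the Bifrost docs
services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    volumes:
      - ./config:/app/config   # gateway configuration
      - ./data:/app/data       # SQLite store, survives restarts
    restart: unless-stopped
```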
## Step 2: Register MCP Servers
Bifrost supports STDIO, HTTP, and SSE connection types. The configuration sits under the mcp block.
```yaml
mcp:
  servers:
    - name: github-mcp
      type: stdio
      command: ["npx", "-y", "@modelcontextprotocol/server-github"]
      env:
        GITHUB_TOKEN: ${GITHUB_TOKEN}
    - name: linear-mcp
      type: http
      url: ${LINEAR_MCP_URL}
      headers:
        Authorization: "Bearer ${LINEAR_TOKEN}"
    - name: filesystem
      type: stdio
      command: ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/data"]
      env:
        HOME: /root
        PATH: /usr/local/bin:/usr/bin
```
Once saved, the gateway connects to each server and discovers its tool list. Bifrost health-checks each server on a 10-second interval and retries failures with exponential backoff, so a flapping upstream does not take down the whole gateway path.
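The backoff behavior matters when you size alert thresholds for flapping servers. A minimal sketch of an exponential backoff schedule like the one described, with assumed base delay, cap, and multiplier (Bifrost's actual parameters may differ):

```python
def backoff_schedule(base=1.0, cap=60.0, factor=2.0, retries=6):
    """Yield retry delays in seconds: base, base*factor, ..., capped at `cap`."""
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= factor

# A failing upstream gets retried at widening intervals instead of being hammered.
print(list(backoff_schedule()))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```

With these assumed parameters, a server that stays down stops generating retry noise after roughly a minute per attempt, which is the behavior you want behind a shared gateway.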
## Step 3: Lock Down Auth With OAuth 2.0 and Virtual Keys
The security layer is where self-hosting earns its keep. Bifrost ships OAuth 2.0 with PKCE for upstream MCP servers that support it, plus virtual keys with deny-by-default access for downstream agents.
```yaml
virtual_keys:
  - key_name: claude-code-eng
    key: vk-cc-eng
    mcp_clients:
      - name: github-mcp
        allowed_tools: ["search_code", "get_pull_request", "get_issue"]
      - name: filesystem
        allowed_tools: ["read_text_file", "list_directory", "search_files"]
  - key_name: claude-code-write
    key: vk-cc-write
    mcp_clients:
      - name: github-mcp
        allowed_tools: ["*"]
      - name: linear-mcp
        allowed_tools: ["*"]
```
The first key is read-only across both servers. The second grants every tool on github-mcp and linear-mcp. If a virtual key has no mcp_clients block, no MCP tools are exposed through that key. The default is deny.
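The deny-by-default rule is easy to mirror in a policy check. An illustrative sketch of how a tool call gets filtered against a virtual key's mcp_clients allowlist (my own logic, not Bifrost's implementation):

```python
def is_allowed(virtual_key: dict, server: str, tool: str) -> bool:
    """Deny by default: a tool passes only if the key explicitly allows it."""
    for client in virtual_key.get("mcp_clients", []):  # no block -> nothing exposed
        if client["name"] == server:
            allowed = client.get("allowed_tools", [])
            return "*" in allowed or tool in allowed
    return False

read_only = {"mcp_clients": [{"name": "github-mcp",
                              "allowed_tools": ["search_code", "get_issue"]}]}
assert is_allowed(read_only, "github-mcp", "search_code")
assert not is_allowed(read_only, "github-mcp", "create_issue")  # not listed
assert not is_allowed({}, "github-mcp", "search_code")          # no block: deny
```

The important property is the final `return False`: an unknown server, an empty allowlist, or a missing mcp_clients block all fall through to a denial rather than an implicit grant.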
The four-tier budget hierarchy (Customer, Team, Virtual Key, Provider Config) applies to MCP traffic as well as LLM traffic, documented on the Bifrost governance resource.
## Step 4: Enable Code Mode for Token Reduction
For agentic workloads with more than a handful of tools, Code Mode is the single biggest cost lever. It replaces full tool definition injection with a Python stub generation flow.
```yaml
mcp:
  code_mode:
    enabled: true
    sandbox: starlark
```
Bifrost's published Code Mode benchmarks, from the MCP gateway resource page:
| MCP tools connected | Token reduction | Pass rate |
|---|---|---|
| 96 tools | 58% | 100% |
| 251 tools | 84.5% | 100% |
| 508 tools | 92.8% | 100% |
The docs also call out the round-trip impact: at 5 servers and around 100 tools, Classic MCP runs ~6 LLM turns versus 3-4 turns under Code Mode, with documented "~50% cost reduction + 30-40% faster execution" in that flow. Source: docs.getbifrost.ai/mcp/code-mode.
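The scaling intuition behind those numbers: classic MCP injects every tool definition into every request, so prompt overhead grows linearly with catalog size, while Code Mode injects a roughly constant stub preamble. A back-of-the-envelope sketch with made-up constants, back-solved so they roughly reproduce the published percentages (illustrative only, not Bifrost's measurements):

```python
def classic_overhead(n_tools, tokens_per_def=350):
    # Classic MCP: every tool definition is injected into every request.
    return n_tools * tokens_per_def

def code_mode_overhead(stub_preamble=14000):
    # Code Mode: roughly constant stub/runtime preamble, independent of catalog size.
    return stub_preamble

for n in (96, 251, 508):
    saved = 1 - code_mode_overhead() / classic_overhead(n)
    print(f"{n} tools: ~{saved:.0%} reduction under these assumptions")
```

The fixed cost dominates at small catalogs and vanishes at large ones, which is exactly the shape of the benchmark table: savings climb from 58% to 92.8% as the catalog grows.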
## Step 5: Connect Claude Code to the Gateway
```bash
claude mcp add-json bifrost '{
  "type": "http",
  "url": "http://localhost:8080/mcp",
  "headers": {
    "Authorization": "Bearer vk-cc-eng"
  }
}'
```
Claude Code now sees only the tools allowed by vk-cc-eng, routed through Bifrost. The full integration walkthrough is on the Bifrost Claude Code resource.
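To verify what a given key actually exposes, you can speak the protocol directly: `tools/list` is a standard MCP JSON-RPC method, so POSTing this body to the gateway's `/mcp` endpoint with the key's Authorization header (endpoint and key from the config above) should return the scoped tool list:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/list"
}
```

For vk-cc-eng, the response should contain only the six allowed read tools; anything beyond that means the allowlist is not being enforced.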
## Step 6: Scaling Levers
Bifrost's benchmarks resource documents 11 microseconds of added latency per request and 5,000 RPS sustained throughput on a single instance. The Go runtime, object pooling via sync.Pool, async log writes, and batch processing keep observability overhead under 0.1 ms.
For production scaling:
- Move from SQLite to Postgres for the observability store
- Run multiple Bifrost instances behind a load balancer with session affinity by virtual key
- Use Redis (RediSearch), Weaviate, or Qdrant as a shared vector store for semantic caching
- Enable streaming response caching for chat workloads with long completions
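Session affinity by virtual key can live entirely at the load balancer, since the key rides in the Authorization header. An illustrative nginx sketch (my own config, not from the Bifrost docs; instance addresses are placeholders):

```nginx
upstream bifrost {
    # Consistent hash on the Authorization header pins each virtual key
    # to one instance, so per-key state and caches stay warm.
    hash $http_authorization consistent;
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

server {
    listen 80;
    location /mcp {
        proxy_pass http://bifrost;
        proxy_set_header Authorization $http_authorization;
    }
}
```

Consistent hashing also limits the blast radius of scaling events: adding or removing an instance remaps only a fraction of the keys instead of reshuffling all of them.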
## Comparison
| Capability | Direct MCP | Cloudflare AI Gateway | Bifrost |
|---|---|---|---|
| Self-hosted | Yes | No | Yes |
| OAuth 2.0 with PKCE | Per-server | No | Yes |
| Per-tool audit logging | No | Basic | Yes |
| Tool groups | No | No | Yes |
| Code Mode token reduction | No | No | Yes |
| Latency overhead | None | Depends on managed infra | 11 µs |
## Trade-offs and Limitations
Bifrost is self-hosted only. There is no managed cloud, so you take on ops cost.
The provider catalog for LLMs is smaller than LiteLLM. Major providers and custom endpoints are covered, but niche providers may not be.
Code Mode adds a small overhead on tiny tool catalogs. Below 5 to 10 tools, the upfront-definition path can be cheaper.
OpenRouter is incompatible because of a tool call streaming issue.
The project is newer than alternatives, so the community of plugins and Stack Overflow answers is still building up.
## Quick Recap
- One binary covers MCP gateway, LLM routing, semantic caching, and observability
- OAuth 2.0 with PKCE, virtual keys with deny-by-default, and per-tool audit logging form the security layer
- Code Mode cuts tool definition tokens by 58% to 92.8% depending on catalog size, with ~50% cost reduction at the 100-tool example documented by Bifrost
- Scale by moving to Postgres, sharing vector stores, and load-balancing multiple instances by virtual key
GitHub: https://git.new/bifrost | Docs: https://getmax.im/bifrostdocs | Website: https://getmax.im/bifrost-home