Pranay Batta

Self-Hosting an Open Source MCP Gateway: Setup, Security, and Scaling Guide

TL;DR: Self-hosting an MCP gateway gives you control over auth, audit logging, and tool access in a way managed services do not. I set up Bifrost end to end on a single instance, walked through the security configuration (OAuth 2.0 with PKCE, virtual keys with deny-by-default), and pushed it to its documented sustained throughput. This post covers the setup, the security layer, and the scaling levers, plus the gotchas.

This post assumes familiarity with the Model Context Protocol, basic Docker or npx deployment, and how OAuth 2.0 PKCE flows work.

Why Self-Host an MCP Gateway

Direct MCP connections work for one developer on a laptop. They break in three ways once you scale.

Credentials get duplicated across every agent config. Each Claude Code or Cursor instance holds its own MCP server credentials.

Audit trails fragment. When a tool call modifies the wrong record, you cannot answer who called it from which agent session without grepping through agent logs.

Tool access is all-or-nothing. Every agent sees every tool from every connected server, including the dangerous ones.

A self-hosted gateway puts these behind one entry point. Bifrost is the option I worked with because it is open source, written in Go, and ships MCP gateway functionality alongside LLM routing in a single binary.

Step 1: Run the Gateway

npx -y @maximhq/bifrost

That starts Bifrost on port 8080 with a default config. For production, Docker is the path most teams take:

docker run -d \
  -p 8080:8080 \
  -v $(pwd)/config:/app/config \
  -v $(pwd)/data:/app/data \
  maximhq/bifrost:latest

The volume mounts persist configuration and the SQLite store across restarts. The setup docs cover Postgres if you need it for production observability.
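
The same thing under Compose, as a minimal sketch mirroring the flags above (the service name and restart policy are my additions, not Bifrost requirements):

# docker-compose.yml -- mirrors the docker run flags above
services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    volumes:
      - ./config:/app/config   # gateway config
      - ./data:/app/data       # SQLite store
    restart: unless-stopped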

Step 2: Register MCP Servers

Bifrost supports STDIO, HTTP, and SSE connection types. The configuration sits under the mcp block.

mcp:
  servers:
    - name: github-mcp
      type: stdio
      command: ["npx", "-y", "@modelcontextprotocol/server-github"]
      env:
        GITHUB_TOKEN: ${GITHUB_TOKEN}

    - name: linear-mcp
      type: http
      url: ${LINEAR_MCP_URL}
      headers:
        Authorization: "Bearer ${LINEAR_TOKEN}"

    - name: filesystem
      type: stdio
      command: ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/data"]
      env:
        HOME: /root
        PATH: /usr/local/bin:/usr/bin

Once saved, the gateway connects to each server and discovers its tool list. Bifrost runs a 10-second health check per server with exponential backoff retry on failure, so a flapping upstream does not take down the whole gateway path.
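
A quick way to sanity-check discovery is to speak MCP to the gateway directly. This sketch uses the standard JSON-RPC tools/list method against the /mcp endpoint (the same one the Claude Code step below points at); depending on your config you may need a virtual key header from the next step, and strict MCP servers expect an initialize handshake before this call.

# Ask the gateway for its merged tool list via a raw MCP JSON-RPC call
curl -s http://localhost:8080/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}'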

Step 3: Lock Down Auth With OAuth 2.0 and Virtual Keys

The security layer is where self-hosting earns its keep. Bifrost ships OAuth 2.0 with PKCE for upstream MCP servers that support it, plus virtual keys with deny-by-default access for downstream agents.

virtual_keys:
  - key_name: claude-code-eng
    key: vk-cc-eng
    mcp_clients:
      - name: github-mcp
        allowed_tools: ["search_code", "get_pull_request", "get_issue"]
      - name: filesystem
        allowed_tools: ["read_text_file", "list_directory", "search_files"]

  - key_name: claude-code-write
    key: vk-cc-write
    mcp_clients:
      - name: github-mcp
        allowed_tools: ["*"]
      - name: linear-mcp
        allowed_tools: ["*"]

The first key is read-only across both servers. The second key is broader. If a virtual key has no mcp_clients block, no MCP tools are exposed through that key. The default is deny.
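
You can verify the deny path with a raw call. Through vk-cc-eng, a write tool like create_issue (assuming the GitHub server exposes one by that name) should be rejected before it ever reaches the upstream; the exact error payload depends on Bifrost.

# vk-cc-eng is read-only, so a write tool should be rejected at the gateway
curl -s http://localhost:8080/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Authorization: Bearer vk-cc-eng" \
  -d '{"jsonrpc": "2.0", "id": 2, "method": "tools/call",
       "params": {"name": "create_issue", "arguments": {"title": "should fail"}}}'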

The four-tier budget hierarchy (Customer, Team, Virtual Key, Provider Config) applies to MCP traffic as well as LLM traffic, as documented in Bifrost's governance docs.
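
I have not reproduced the full governance schema here, but as a purely hypothetical sketch of the shape (these field names are my guesses, not the documented schema), a budget hanging off a virtual key might look like:

virtual_keys:
  - key_name: claude-code-eng
    key: vk-cc-eng
    budget:              # hypothetical field names -- check the governance docs
      max_limit: 50      # spend cap per reset window
      reset_duration: 30d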

Step 4: Enable Code Mode for Token Reduction

For agentic workloads with more than a handful of tools, Code Mode is the single biggest cost lever. It replaces full tool definition injection with a Python stub generation flow.

mcp:
  code_mode:
    enabled: true
    sandbox: starlark
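
To make the mechanism concrete: rather than injecting every tool's JSON schema into the prompt, the gateway exposes typed stubs and the model writes a short program against them inside the sandbox. A rough illustration of what that generated code looks like; the stub signatures and result fields here are invented for the example, not Bifrost's actual generated API:

# Illustrative only: stub signatures and result fields are invented
def search_code(query: str) -> list[dict]: ...          # generated tool stub
def get_issue(repo: str, number: int) -> dict: ...      # generated tool stub

# One generated program replaces several model round-trips
open_titles = []
for ref in search_code("TODO(security)"):
    issue = get_issue(ref["repo"], ref["number"])
    if issue["state"] == "open":
        open_titles.append(issue["title"])
print(open_titles)

Intermediate results stay in the sandbox instead of round-tripping through the model's context, which is where the token and turn savings below come from.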

Bifrost's published Code Mode benchmarks on the MCP gateway resource page:

MCP tools connected   Token reduction   Pass rate
96 tools              58%               100%
251 tools             84.5%             100%
508 tools             92.8%             100%

The docs also call out the round-trip impact: at 5 servers and around 100 tools, Classic MCP runs ~6 LLM turns versus 3-4 turns under Code Mode, with documented "~50% cost reduction + 30-40% faster execution" in that flow. Source: docs.getbifrost.ai/mcp/code-mode.

Step 5: Connect Claude Code to the Gateway

claude mcp add-json bifrost '{
  "type": "http",
  "url": "http://localhost:8080/mcp",
  "headers": {
    "Authorization": "Bearer vk-cc-eng"
  }
}'

Claude Code now sees only the tools allowed by vk-cc-eng, routed through Bifrost. The full integration walkthrough is on the Bifrost Claude Code resource.

Step 6: Scaling Levers

Bifrost's documented numbers are 11 microseconds of added latency per request and 5,000 RPS sustained throughput on a single instance, per the Bifrost benchmarks page. The Go runtime, asynchronous log writes backed by sync.Pool object reuse, and batch processing keep observability overhead under 0.1ms.

For production scaling:

  • Move from SQLite to Postgres for the observability store
  • Run multiple Bifrost instances behind a load balancer with session affinity by virtual key (see the nginx sketch after this list)
  • Use Redis (RediSearch), Weaviate, or Qdrant as a shared vector store for semantic caching
  • Enable streaming response caching for chat workloads with long completions
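
For the affinity lever, one way to pin each virtual key to a backend is to hash the Authorization header, since the key rides in it. A minimal nginx sketch, assuming two instances; the addresses are placeholders:

# Consistent-hash on the Authorization header => same virtual key, same instance
upstream bifrost {
    hash $http_authorization consistent;
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://bifrost;
        proxy_buffering off;   # keep SSE/streaming responses flowing
    }
}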

Comparison

Capability                  Direct MCP   Cloudflare AI Gateway   Bifrost
Self-hosted                 Yes          No                      Yes
OAuth 2.0 with PKCE         Per-server   No                      Yes
Per-tool audit logging      No           Basic                   Yes
Tool groups                 No           No                      Yes
Code Mode token reduction   No           No                      Yes
Latency overhead            None         Managed                 11µs

Trade-offs and Limitations

Bifrost is self-hosted only. There is no managed cloud, so you take on ops cost.

The LLM provider catalog is smaller than LiteLLM's. Major providers and custom endpoints are covered, but niche providers may not be.

Code Mode adds a small overhead on tiny tool catalogs. Below 5 to 10 tools, the upfront-definition path can be cheaper.

OpenRouter is incompatible because of a tool call streaming issue.

The project is newer than alternatives, so the community of plugins and Stack Overflow answers is still building up.

Quick Recap

  • One binary covers MCP gateway, LLM routing, semantic caching, and observability
  • OAuth 2.0 with PKCE, virtual keys with deny-by-default, and per-tool audit logging form the security layer
  • Code Mode cuts tool definition tokens by 58% to 92.8% depending on catalog size, with ~50% cost reduction at the 100-tool example documented by Bifrost
  • Scale by moving to Postgres, sharing vector stores, and load-balancing multiple instances by virtual key

GitHub: https://git.new/bifrost | Docs: https://getmax.im/bifrostdocs | Website: https://getmax.im/bifrost-home
