<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nakul T Krishnan</title>
    <description>The latest articles on DEV Community by Nakul T Krishnan (@nakul_tk).</description>
    <link>https://dev.to/nakul_tk</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3778279%2F1c2dabcf-9bf5-4a37-a76b-9d7475175411.png</url>
      <title>DEV Community: Nakul T Krishnan</title>
      <link>https://dev.to/nakul_tk</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nakul_tk"/>
    <language>en</language>
    <item>
      <title>Using an MCP Gateway with Claude Code: A Practical Guide</title>
      <dc:creator>Nakul T Krishnan</dc:creator>
      <pubDate>Wed, 06 May 2026 07:52:25 +0000</pubDate>
      <link>https://dev.to/nakul_tk/using-an-mcp-gateway-with-claude-code-a-practical-guide-31dc</link>
      <guid>https://dev.to/nakul_tk/using-an-mcp-gateway-with-claude-code-a-practical-guide-31dc</guid>
      <description>&lt;p&gt;&lt;em&gt;Learn how to integrate an MCP gateway with Claude Code to consolidate tool access, enforce governance policies, and reduce token consumption across connected MCP servers.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Claude Code has emerged as a standard terminal-based coding agent for engineering teams. Its built-in support for the Model Context Protocol (MCP) enables interaction with filesystems, databases, GitHub, web search, Slack, internal APIs, and an expanding ecosystem of community-hosted tool servers. While connecting Claude Code to a small number of MCP servers is straightforward, scaling to dozens of servers introduces operational complexity. Each server requires separate credentials, configuration, and approval handling, leading to tool sprawl, fragmented governance, and limited visibility into costs. An MCP gateway resolves this by acting as a unified access layer in front of all upstream tool servers. Bifrost, an open-source AI gateway developed by Maxim AI, is designed specifically for this architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Role of an MCP Gateway in Claude Code Environments
&lt;/h2&gt;

&lt;p&gt;An &lt;a href="https://www.bishopwcmartin.com/best-mcp-gateways-for-enterprise-ai-2026/" rel="noopener noreferrer"&gt;MCP gateway&lt;/a&gt; functions as both an aggregation and governance layer between Claude Code and upstream MCP servers. It establishes a single connection to each tool server while exposing a consolidated &lt;code&gt;/mcp&lt;/code&gt; endpoint to Claude Code. All tool invocations pass through this layer, where policies related to access control, observability, and routing are enforced before reaching the underlying systems.&lt;/p&gt;

&lt;p&gt;In the absence of a gateway, each MCP server must be configured independently within Claude Code. The gateway model simplifies this by reducing multiple connections into a single interface, centralizing operational concerns such as authentication, auditing, budgeting, and tool filtering. The Model Context Protocol itself is an open standard that enables AI systems to dynamically discover and execute external tools. It was initially &lt;a href="https://www.anthropic.com/news/model-context-protocol" rel="noopener noreferrer"&gt;introduced by Anthropic in November 2024&lt;/a&gt; and is now broadly adopted across AI platforms.&lt;/p&gt;
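&lt;p&gt;As a minimal sketch of what this consolidation looks like on the client side, a project-level &lt;code&gt;.mcp.json&lt;/code&gt; can collapse to a single HTTP entry pointing at the gateway (the port assumes the default local setup described later in this guide):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Write a project-scoped Claude Code MCP config with one entry: the gateway.
# Without a gateway, this file would need one block (and one credential) per server.
cat &gt; .mcp.json &lt;&lt;'EOF'
{
  "mcpServers": {
    "bifrost": {
      "type": "http",
      "url": "http://localhost:8080/mcp"
    }
  }
}
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;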

&lt;h2&gt;
  
  
  Limitations of Scaling MCP Servers Without a Gateway
&lt;/h2&gt;

&lt;p&gt;As the number of connected MCP servers increases, several systemic issues arise:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Configuration overhead&lt;/strong&gt;: Each server requires its own configuration entry, transport setup, and credentials. Replicating this setup across teams introduces friction and inconsistency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lack of centralized governance&lt;/strong&gt;: Claude Code can access any tool exposed by connected servers without a unified policy layer to restrict usage by user, team, or project.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inefficient token usage&lt;/strong&gt;: Every MCP server contributes its full set of tool definitions to the model context on each request. For example, five servers with thirty tools each result in 150 tool definitions being injected into every prompt. Anthropic has &lt;a href="https://www.anthropic.com/engineering/code-execution-with-mcp" rel="noopener noreferrer"&gt;reported scenarios where this leads to 150,000 tokens per agent interaction&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An MCP gateway mitigates these challenges by introducing a centralized control plane.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture of Bifrost’s MCP Gateway with Claude Code
&lt;/h2&gt;

&lt;p&gt;Bifrost operates as both an MCP client and server. It connects upstream to MCP-compatible services such as filesystems, databases, GitHub, web search, internal APIs, Notion, and Slack, then aggregates these tools into a single &lt;code&gt;/mcp&lt;/code&gt; endpoint. From Claude Code’s perspective, Bifrost appears as a single MCP server, while internally it manages multiple upstream connections.&lt;/p&gt;

&lt;p&gt;In addition to tool aggregation, Bifrost also acts as a &lt;a href="https://kivomind.com/top-5-ai-gateways-developers-use-to-run-claude-code-with-non-anthropic-models/" rel="noopener noreferrer"&gt;unified inference gateway&lt;/a&gt;, allowing Claude Code to route requests to &lt;a href="https://kivomind.com/top-5-ai-gateways-developers-use-to-run-claude-code-with-non-anthropic-models/" rel="noopener noreferrer"&gt;non-Anthropic&lt;/a&gt; models without any client-side changes. This enables teams to use providers such as OpenAI, Azure OpenAI, Vertex AI, or open-weight models behind the same interface, while preserving Claude Code’s native workflow.&lt;/p&gt;
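&lt;p&gt;In practice, this routing is typically enabled by pointing Claude Code's base URL at the gateway. The sketch below uses the &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; environment variable that Claude Code honours; the &lt;code&gt;/anthropic&lt;/code&gt; path is illustrative, so confirm the exact Anthropic-compatible route in the Bifrost documentation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Route Claude Code's model traffic through the gateway instead of api.anthropic.com.
# The /anthropic path is a placeholder; check Bifrost's docs for the exact route.
export ANTHROPIC_BASE_URL="http://localhost:8080/anthropic"
claude  # start Claude Code as usual; inference requests now pass through Bifrost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;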

&lt;p&gt;The &lt;a href="https://www.getmaxim.ai/bifrost/resources/mcp-gateway" rel="noopener noreferrer"&gt;MCP gateway&lt;/a&gt; in Bifrost supports three transport mechanisms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;STDIO&lt;/strong&gt;: Executes a subprocess and communicates via standard input and output, suitable for local tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTP&lt;/strong&gt;: Uses JSON-RPC to communicate with remote MCP servers, typically for cloud-hosted services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSE&lt;/strong&gt;: Maintains persistent connections through Server-Sent Events for streaming use cases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Upon registering a new upstream server, Bifrost automatically discovers available tools and synchronizes them. Claude Code does not require updates when new tools are added. Additional configuration guidance is available in the &lt;a href="https://www.getmaxim.ai/bifrost/resources/claude-code" rel="noopener noreferrer"&gt;Claude Code integration resource&lt;/a&gt;.&lt;/p&gt;
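&lt;p&gt;As a quick sanity check, the aggregated tool catalog can be queried directly over JSON-RPC. This is a minimal sketch: depending on the negotiated transport, the server may first require an MCP initialization handshake, so treat it as illustrative rather than a complete client:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Ask the gateway for every tool it has synchronized from upstream MCP servers.
curl -s http://localhost:8080/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;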

&lt;h2&gt;
  
  
  Setting Up an MCP Gateway with Claude Code
&lt;/h2&gt;

&lt;p&gt;The setup process is minimal and assumes Node.js 18+ and an authenticated Claude Code environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Launch Bifrost
&lt;/h3&gt;

&lt;p&gt;Bifrost can be started locally using NPX or Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @maximhq/bifrost
&lt;span class="c"&gt;# or&lt;/span&gt;
docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 8080:8080 maximhq/bifrost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Access the dashboard at &lt;code&gt;http://localhost:8080&lt;/code&gt;. Deployment is also supported on Kubernetes, Docker Swarm, and bare metal environments.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 2: Register Upstream MCP Servers
&lt;/h3&gt;

&lt;p&gt;Within the Bifrost dashboard, navigate to the MCP section and add each upstream server. Specify the connection type and provide the required endpoint or command. For HTTP-based servers, authentication headers such as API keys can be configured directly. Bifrost handles tool discovery and synchronization automatically. Detailed instructions are available in the &lt;a href="https://docs.getbifrost.ai/mcp/connecting-to-servers" rel="noopener noreferrer"&gt;MCP connection documentation&lt;/a&gt;.&lt;/p&gt;
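&lt;p&gt;For reference, the values entered in the dashboard map directly onto the transports described above. A local STDIO server is a command plus arguments, such as the reference filesystem server from the MCP project (the path below is a placeholder), while an HTTP server is registered by its URL and authentication headers:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# STDIO upstream: the reference filesystem MCP server (path is a placeholder).
# In the dashboard this becomes connection type STDIO with this command and args.
npx -y @modelcontextprotocol/server-filesystem /path/to/project

# HTTP upstream: registered by endpoint URL plus any auth headers,
# e.g. https://example.com/mcp with an "Authorization: Bearer ..." header.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;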
&lt;h3&gt;
  
  
  Step 3: Define Virtual Keys with Scoped Permissions
&lt;/h3&gt;

&lt;p&gt;Virtual keys serve as the primary governance mechanism. Each key defines which tools are accessible, along with constraints such as budgets, rate limits, and routing policies. Tool access is scoped at a granular level, enabling selective permissions within the same server. For example, a key may allow &lt;code&gt;crm_lookup_customer&lt;/code&gt; while restricting &lt;code&gt;crm_delete_customer&lt;/code&gt;. Refer to the &lt;a href="https://docs.getbifrost.ai/features/governance/virtual-keys" rel="noopener noreferrer"&gt;virtual keys documentation&lt;/a&gt; for implementation details.&lt;/p&gt;
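&lt;p&gt;To illustrate how a scoped key is exercised at request time, the sketch below sends an inference request through the gateway with the virtual key as a bearer token. The key value and model name are placeholders, and the exact header Bifrost expects for virtual keys is specified in the documentation linked above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Placeholder virtual key supplied as a bearer token; any tool calls made on
# behalf of this request inherit the key's tool scope, budget, and rate limits.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer vk-marketing-placeholder" \
  -d '{
    "model": "anthropic/claude-sonnet-4",
    "messages": [{"role": "user", "content": "Look up customer 4217 in the CRM"}]
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;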
&lt;h3&gt;
  
  
  Step 4: Connect Claude Code to the Gateway
&lt;/h3&gt;

&lt;p&gt;Add Bifrost as an MCP server in Claude Code:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add &lt;span class="nt"&gt;--transport&lt;/span&gt; http bifrost http://localhost:8080/mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Verify the connection using &lt;code&gt;/mcp&lt;/code&gt; within Claude Code. All permitted tools associated with the virtual key will be available. Future additions to Bifrost are automatically reflected without further configuration.&lt;/p&gt;
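&lt;p&gt;The same check can be made from the terminal using Claude Code's MCP subcommands:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp list        # the gateway should appear as "bifrost" with its endpoint
claude mcp get bifrost # print connection details for the gateway entry
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;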
&lt;h2&gt;
  
  
  Governance and Access Control in Production
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://growwebtraffic.com/best-ai-gateways-for-routing-claude-code-requests-in-production/" rel="noopener noreferrer"&gt;production environments&lt;/a&gt;, unrestricted tool access is rarely acceptable. Bifrost enforces governance through two mechanisms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Virtual key scoping&lt;/strong&gt;: Each key restricts access to a defined set of tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Tool Groups&lt;/strong&gt;: Logical groupings of tools that can be assigned to users, teams, or services, enabling scalable permission management.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every tool invocation is logged with metadata including tool name, server origin, input parameters, output, latency, associated virtual key, and the originating LLM request. This level of observability supports compliance requirements such as SOC 2 Type II, GDPR, HIPAA, and ISO 27001, as outlined in the &lt;a href="https://www.getmaxim.ai/bifrost/resources/governance" rel="noopener noreferrer"&gt;Bifrost governance layer&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This centralized control plane is particularly critical in &lt;a href="https://www.computertechreviews.com/5-mcp-gateways-for-regulated-industries/" rel="noopener noreferrer"&gt;regulated industries&lt;/a&gt;, where access to sensitive systems must be tightly scoped, audited, and attributable to specific users or services. By enforcing policy at the gateway layer, teams can ensure consistent compliance across all MCP-connected tools without relying on per-server controls.&lt;/p&gt;

&lt;p&gt;For external deployments, Bifrost supports OAuth 2.1 with automatic client discovery and per-user identity mapping via &lt;a href="https://docs.getbifrost.ai/mcp/oauth" rel="noopener noreferrer"&gt;OAuth authentication&lt;/a&gt;. This aligns with the MCP specification update released in &lt;a href="https://modelcontextprotocol.io/specification/2025-11-25" rel="noopener noreferrer"&gt;November 2025&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Reducing Token Usage with Code Mode
&lt;/h2&gt;

&lt;p&gt;A significant cost factor in multi-server MCP environments is context inflation caused by large tool catalogs. By default, all tool definitions are included in every request, increasing token usage substantially.&lt;/p&gt;

&lt;p&gt;Bifrost’s &lt;a href="https://docs.getbifrost.ai/mcp/code-mode" rel="noopener noreferrer"&gt;Code Mode&lt;/a&gt; addresses this by representing MCP tools as a Python API. Instead of preloading all tool definitions, Claude Code dynamically invokes only the required tools for a given task. This approach minimizes context size, filters outputs before they reach the model, and consolidates multi-step workflows into a single execution cycle.&lt;/p&gt;

&lt;p&gt;In environments with multiple MCP servers, this results in roughly a 50 percent reduction in token usage and a 30 to 40 percent improvement in latency. Further detail is available in the article on &lt;a href="https://www.londondaily.news/how-you-can-reduce-mcp-token-costs-by-50-with-bifrost-ai-gateway/" rel="noopener noreferrer"&gt;reducing token costs using Bifrost&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Recommended Operational Practices
&lt;/h2&gt;

&lt;p&gt;Effective MCP gateway deployments typically follow these practices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enable &lt;code&gt;enforce_auth_on_inference&lt;/code&gt; to ensure all requests are authenticated via virtual keys.&lt;/li&gt;
&lt;li&gt;Deploy Bifrost behind HTTPS using a reverse proxy such as nginx or Cloudflare (a minimal nginx sketch follows this list).&lt;/li&gt;
&lt;li&gt;Activate Code Mode when managing large tool catalogs to optimize cost and performance.&lt;/li&gt;
&lt;li&gt;Route both LLM and tool traffic through the same gateway to maintain unified observability and governance.&lt;/li&gt;
&lt;li&gt;Configure routing rules to support fallback across providers such as Vertex AI, AWS Bedrock, or Azure.&lt;/li&gt;
&lt;/ul&gt;
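
&lt;p&gt;Expanding on the reverse-proxy recommendation, a minimal nginx sketch is shown below. The hostname and certificate paths are placeholders, and &lt;code&gt;proxy_buffering&lt;/code&gt; is disabled so SSE streams pass through unbuffered:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Minimal TLS-terminating reverse proxy in front of Bifrost.
# Hostname and certificate paths are placeholders for your environment.
cat &gt; /etc/nginx/conf.d/bifrost.conf &lt;&lt;'EOF'
server {
    listen 443 ssl;
    server_name gateway.example.internal;

    ssl_certificate     /etc/ssl/certs/gateway.crt;
    ssl_certificate_key /etc/ssl/private/gateway.key;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_buffering off;  # keep SSE / streaming responses unbuffered
    }
}
EOF
nginx -s reload
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;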

&lt;p&gt;Bifrost introduces minimal latency overhead, measured at approximately 11 microseconds per request under sustained load of 5,000 RPS, ensuring it does not become a bottleneck.&lt;/p&gt;
&lt;h2&gt;
  
  
  Getting Started with Bifrost and Claude Code
&lt;/h2&gt;

&lt;p&gt;Adopting an MCP gateway with Claude Code requires minimal configuration yet delivers significant operational benefits. Bifrost consolidates multiple tool connections into a single endpoint, introduces structured governance through virtual keys, and reduces token costs via Code Mode. &lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/maximhq" rel="noopener noreferrer"&gt;
        maximhq
      &lt;/a&gt; / &lt;a href="https://github.com/maximhq/bifrost" rel="noopener noreferrer"&gt;
        bifrost
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support &amp;amp; &amp;lt;100 µs overhead at 5k RPS.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Bifrost AI Gateway&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://goreportcard.com/report/github.com/maximhq/bifrost/core" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/7f7e70df9fdaaf4f485f59ca6bc0b5cbbf134d03dd5721da4e31f90f618fc304/68747470733a2f2f676f7265706f7274636172642e636f6d2f62616467652f6769746875622e636f6d2f6d6178696d68712f626966726f73742f636f7265" alt="Go Report Card"&gt;&lt;/a&gt;
&lt;a href="https://discord.gg/exN5KAydbU" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/282b7719f04b28f5959f5e1e17aee806d65f8eea3b862b57af350df0ab57be6f/68747470733a2f2f646362616467652e6c696d65732e70696e6b2f6170692f7365727665722f68747470733a2f2f646973636f72642e67672f65784e354b41796462553f7374796c653d666c6174" alt="Discord badge"&gt;&lt;/a&gt;
&lt;a href="https://codecov.io/gh/maximhq/bifrost" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/8bc2db302c566210d14c09b278639a3f63f07def5fc635a8869e59c996b3100f/68747470733a2f2f636f6465636f762e696f2f67682f6d6178696d68712f626966726f73742f6272616e63682f6d61696e2f67726170682f62616467652e737667" alt="codecov"&gt;&lt;/a&gt;
&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/b0899925aadfed8626116707178a4015d8cf4aaa0b80acb632cb4782c6dc7272/68747470733a2f2f696d672e736869656c64732e696f2f646f636b65722f70756c6c732f6d6178696d68712f626966726f7374"&gt;&lt;img src="https://camo.githubusercontent.com/b0899925aadfed8626116707178a4015d8cf4aaa0b80acb632cb4782c6dc7272/68747470733a2f2f696d672e736869656c64732e696f2f646f636b65722f70756c6c732f6d6178696d68712f626966726f7374" alt="Docker Pulls"&gt;&lt;/a&gt;
&lt;a href="https://app.getpostman.com/run-collection/31642484-2ba0e658-4dcd-49f4-845a-0c7ed745b916?action=collection%2Ffork&amp;amp;source=rip_markdown&amp;amp;collection-url=entityId%3D31642484-2ba0e658-4dcd-49f4-845a-0c7ed745b916%26entityType%3Dcollection%26workspaceId%3D63e853c8-9aec-477f-909c-7f02f543150e" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/82ccefddb001e2caf9d399f1153fdda561cf3da341bb270e18644d516906bc64/68747470733a2f2f72756e2e7073746d6e2e696f2f627574746f6e2e737667" alt="Run In Postman"&gt;&lt;/a&gt;
&lt;a href="https://artifacthub.io/packages/search?repo=bifrost" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/a6a3c734d6bd57fa8e1d508ac0cdba555bdbcd9191b29b32cf37a964b86b9c67/68747470733a2f2f696d672e736869656c64732e696f2f656e64706f696e743f75726c3d68747470733a2f2f61727469666163746875622e696f2f62616467652f7265706f7369746f72792f626966726f7374" alt="Artifact Hub"&gt;&lt;/a&gt;
&lt;a href="https://github.com/maximhq/bifrost/LICENSE" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/3cb44c15a532770a066ba8e61bf11506ad5400e5c61d48f6b639101e442bee79/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f6c6963656e73652f6d6178696d68712f626966726f7374" alt="License"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;The fastest way to build AI applications that never go down&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Quick Start&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/maximhq/bifrost/./docs/media/getting-started.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fmaximhq%2Fbifrost%2FHEAD%2F.%2Fdocs%2Fmedia%2Fgetting-started.png" alt="Get started"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Go from zero to production-ready AI gateway in under a minute.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 1:&lt;/strong&gt; Start Bifrost Gateway&lt;/p&gt;
&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Install and run locally&lt;/span&gt;
npx -y @maximhq/bifrost

&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Or use Docker&lt;/span&gt;
docker run -p 8080:8080 maximhq/bifrost&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Step 2:&lt;/strong&gt; Configure via Web UI&lt;/p&gt;
&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Open the built-in web interface&lt;/span&gt;
open http://localhost:8080&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Step 3:&lt;/strong&gt; Make your first API call&lt;/p&gt;
&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;curl -X POST http://localhost:8080/v1/chat/completions \
  -H &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Content-Type: application/json&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  -d &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;{&lt;/span&gt;
&lt;span class="pl-s"&gt;    "model": "openai/gpt-4o-mini",&lt;/span&gt;
&lt;span class="pl-s"&gt;    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]&lt;/span&gt;
&lt;span class="pl-s"&gt;  }&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;That's it!&lt;/strong&gt; Your AI gateway is running with a web interface for visual configuration…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/maximhq/bifrost" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;To evaluate this architecture within your own infrastructure, &lt;a href="https://getmaxim.ai/bifrost/book-a-demo" rel="noopener noreferrer"&gt;book a demo&lt;/a&gt; with the Bifrost team.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>claude</category>
      <category>llm</category>
    </item>
    <item>
      <title>A Practical Guide to Running Claude for a Team Without Hitting Quotas</title>
      <dc:creator>Nakul T Krishnan</dc:creator>
      <pubDate>Thu, 26 Feb 2026 17:09:25 +0000</pubDate>
      <link>https://dev.to/nakul_tk/a-practical-guide-to-running-claude-for-a-team-without-hitting-quotas-4jd8</link>
      <guid>https://dev.to/nakul_tk/a-practical-guide-to-running-claude-for-a-team-without-hitting-quotas-4jd8</guid>
      <description>&lt;p&gt;I’m a marketer working in a tech startup and my days involve creating content strategies, competitor analysis, generating email campaigns. Me and my team run Claude through a custom internal web app backed by a centralized, Dockerized server connected to the Anthropic API. The backend routes structured tool calls over MCP to Notion, which has our entire knowledge base and Figma, which lets Claude inspect design files and suggest layout changes based on the design principles. &lt;/p&gt;

&lt;p&gt;On paper, the architecture was clean: single org key, audited tool access, and deterministic workflows. When I'm building out a campaign, Claude isn't just answering questions in a chat window; it's reading Notion databases, cross-referencing brand documents, and pulling in Figma files for design references. These deep, context-rich sessions burn through the shared quota fast. What follows is a wall of rate-limit errors that stalls pipelines and becomes a bottleneck for the whole team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Temporary Solution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A solution I figured out was to offload the simpler tasks, like quick copy edits, summarization, and subject line generation, to a different model. Open-source models like Llama and GPT OSS provide impressive output quality for structured tasks, and I decided to use them via Groq. For things like "rewrite this paragraph to be more direct" or "generate five variations of this CTA," they hold up really well.&lt;/p&gt;

&lt;p&gt;The problem wasn’t the open-source models; it was the overhead I hadn't accounted for. For every “simple” task I had to re-inject context, be it brand voice guidelines, audience definitions, or campaign objectives. What initially felt like a five-minute task turned into fifteen minutes of prompt scaffolding just to get an output that made sense in context. The mental load of switching between two models running independently was too much, and I slowly gave up on the idea.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding the Problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The real issue was that I had no intelligent layer between me and the models, something that could route requests based on complexity, manage quota consumption, maintain context, and fail over gracefully when one provider hit its rate limits.&lt;/p&gt;

&lt;p&gt;After some more research, I figured out that LLM gateways could solve this problem comprehensively. LLM gateways sit between your client (in my case, my workflow and MCP-connected tools) and the model providers. They handle routing logic, usage tracking, and provider fallback. They also let you define rules for routing specific request types to designated models, with automatic fallback to a backup model if one fails.&lt;/p&gt;

&lt;p&gt;I looked at several options. Each had merits. But Bifrost stood out for a few specific reasons that mattered to my situation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Bifrost?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Bifrost provides adaptive load balancing, which tracks real-time performance across providers and API keys. This is done through a weight calculation that runs every 5 seconds: based on the weights, Bifrost first selects a provider and then the API key to use within it. As a result, if multiple API keys are configured under the same provider, Bifrost automatically routes each incoming request to the key currently offering the best error rate and latency.&lt;/p&gt;

&lt;p&gt;With Bifrost I could send complex, context-heavy requests, anything that touches the MCP-connected Notion or Figma, to Claude, while routing lighter requests like summarization, reformatting, and copy variations to open-source models through Groq.&lt;/p&gt;
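
&lt;p&gt;To make the split concrete, here's roughly what the two tiers look like against Bifrost's OpenAI-compatible endpoint. The model names are just examples of the kind of pairing I mean, not a prescription:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Heavy, context-rich work (Notion / Figma sessions) goes to Claude...
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "anthropic/claude-sonnet-4",
       "messages": [{"role": "user", "content": "Draft the Q3 campaign brief from our Notion strategy docs"}]}'

# ...while light copy tasks go to an open-source model on Groq.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "groq/llama-3.3-70b-versatile",
       "messages": [{"role": "user", "content": "Give me five variations of this CTA: Start your free trial"}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;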

&lt;p&gt;Bifrost ships with a built-in MCP gateway. I'd built integrations that worked well and didn't want to rebuild them. Bifrost's MCP gateway enables AI models to discover and execute external tools seamlessly, so my Notion and Figma connections could route through it without losing their context or behaviour.&lt;/p&gt;

&lt;p&gt;Bifrost's fallback feature provides automatic failover when the primary provider faces an outage, model unavailability, or rate limits. It tries backup providers in the order you specify until one succeeds.&lt;/p&gt;

&lt;p&gt;Bifrost also features semantic caching, which reduces unnecessary LLM calls while delivering faster responses. Unlike traditional caching, it understands the intent behind the queries, so two differently worded questions that mean the same thing, like "What are the key buyer personas for my product in the US?" and "What is the demographic of people looking for my product in the US?", are treated as equivalent. If one has already been cached, Bifrost will serve that cached response for the other as well.&lt;/p&gt;
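
&lt;p&gt;An easy way to see this: send two differently worded versions of the same question and compare latency. Assuming semantic caching is enabled on the gateway, the second response should come back from cache (the model name is again just an example):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# First request populates the semantic cache.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "groq/llama-3.3-70b-versatile",
       "messages": [{"role": "user", "content": "What are the key buyer personas for my product in the US?"}]}'

# Same intent, different wording: eligible to be served from the cache.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "groq/llama-3.3-70b-versatile",
       "messages": [{"role": "user", "content": "What is the demographic of people looking for my product in the US?"}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;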

&lt;p&gt;Setting it up was also pretty easy compared to the others. You can run either of:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx -y @maximhq/bifrost
# or
docker pull maximhq/bifrost
docker run -p 8080:8080 maximhq/bifrost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;and you should be able to see Bifrost at &lt;a href="http://localhost:8080" rel="noopener noreferrer"&gt;http://localhost:8080&lt;/a&gt;.&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://docs.getbifrost.ai/quickstart/gateway/setting-up#docker" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fbifrost.mintlify.app%2Fmintlify-assets%2F_next%2Fimage%3Furl%3D%252F_mintlify%252Fapi%252Fog%253Fdivision%253DGateway%2526title%253DSetting%252BUp%2526description%253DGet%252BBifrost%252Brunning%252Bas%252Ban%252BHTTP%252BAPI%252Bgateway%252Bin%252B30%252Bseconds%252Bwith%252Bzero%252Bconfiguration.%252BPerfect%252Bfor%252Bany%252Bprogramming%252Blanguage.%2526logoLight%253Dhttps%25253A%25252F%25252Fmintcdn.com%25252Fbifrost%25252FqFMmk8bNSnvgFYDI%25252Fmedia%25252Fbifrost-logo.png%25253Ffit%25253Dmax%252526auto%25253Dformat%252526n%25253DqFMmk8bNSnvgFYDI%252526q%25253D85%252526s%25253D6af701a560aee4103444fa017b226cf0%2526logoDark%253Dhttps%25253A%25252F%25252Fmintcdn.com%25252Fbifrost%25252FqFMmk8bNSnvgFYDI%25252Fmedia%25252Fbifrost-logo-dark.png%25253Ffit%25253Dmax%252526auto%25253Dformat%252526n%25253DqFMmk8bNSnvgFYDI%252526q%25253D85%252526s%25253D84d264aad5f421c526dd17893e7c5739%2526primaryColor%253D%2525230C3B43%2526lightColor%253D%25252307C983%2526backgroundLight%253D%252523ffffff%2526backgroundDark%253D%252523090d0d%26w%3D1200%26q%3D100" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://docs.getbifrost.ai/quickstart/gateway/setting-up#docker" rel="noopener noreferrer" class="c-link"&gt;
            Setting Up - Bifrost
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            Get Bifrost running as an HTTP API gateway in 30 seconds with zero configuration. Perfect for any programming language.
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdocs.getbifrost.ai%2Fmintlify-assets%2F_mintlify%2Ffavicons%2Fbifrost%2FkXIL-Qs6IzN4ZLK4%2F_generated%2Ffavicon%2Ffavicon-16x16.png"&gt;
          docs.getbifrost.ai
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;




&lt;p&gt;The observability layer gave me visibility into my token consumption by model and task type. It also showed the success rate, latency, and tokens used for each request. The screenshot below shows the dashboard while I was testing out Bifrost.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13pq0n53r8bmwp1px21q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13pq0n53r8bmwp1px21q.png" alt="Bifrost Dashboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Current Workflow&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Notion and Figma integrations remain intact. Rate limits still exist, but they no longer affect my workflow because fallback routing catches the overflow.&lt;/p&gt;

&lt;p&gt;What really changed for me was the reliability. I'm in a much better headspace now that I don’t have to worry, “what will I do when Claude hits rate limits?”&lt;/p&gt;

&lt;p&gt;If you have LLM workflows with multiple integrations and context dependencies, you should definitely look into LLM gateways.&lt;/p&gt;

&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/maximhq" rel="noopener noreferrer"&gt;
        maximhq
      &lt;/a&gt; / &lt;a href="https://github.com/maximhq/bifrost" rel="noopener noreferrer"&gt;
        bifrost
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support &amp;amp; &amp;lt;100 µs overhead at 5k RPS.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Bifrost AI Gateway&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a href="https://goreportcard.com/report/github.com/maximhq/bifrost/core" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/7f7e70df9fdaaf4f485f59ca6bc0b5cbbf134d03dd5721da4e31f90f618fc304/68747470733a2f2f676f7265706f7274636172642e636f6d2f62616467652f6769746875622e636f6d2f6d6178696d68712f626966726f73742f636f7265" alt="Go Report Card"&gt;&lt;/a&gt;
&lt;a href="https://discord.gg/exN5KAydbU" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/282b7719f04b28f5959f5e1e17aee806d65f8eea3b862b57af350df0ab57be6f/68747470733a2f2f646362616467652e6c696d65732e70696e6b2f6170692f7365727665722f68747470733a2f2f646973636f72642e67672f65784e354b41796462553f7374796c653d666c6174" alt="Discord badge"&gt;&lt;/a&gt;
&lt;a href="https://snyk.io/test/github/maximhq/bifrost" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/fb28496133f724daf03a8107e56978a14f6f2ed7e7283df0573747fa46ff8f86/68747470733a2f2f736e796b2e696f2f746573742f6769746875622f6d6178696d68712f626966726f73742f62616467652e737667" alt="Known Vulnerabilities"&gt;&lt;/a&gt;
&lt;a href="https://codecov.io/gh/maximhq/bifrost" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/8bc2db302c566210d14c09b278639a3f63f07def5fc635a8869e59c996b3100f/68747470733a2f2f636f6465636f762e696f2f67682f6d6178696d68712f626966726f73742f6272616e63682f6d61696e2f67726170682f62616467652e737667" alt="codecov"&gt;&lt;/a&gt;
&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/b0899925aadfed8626116707178a4015d8cf4aaa0b80acb632cb4782c6dc7272/68747470733a2f2f696d672e736869656c64732e696f2f646f636b65722f70756c6c732f6d6178696d68712f626966726f7374"&gt;&lt;img src="https://camo.githubusercontent.com/b0899925aadfed8626116707178a4015d8cf4aaa0b80acb632cb4782c6dc7272/68747470733a2f2f696d672e736869656c64732e696f2f646f636b65722f70756c6c732f6d6178696d68712f626966726f7374" alt="Docker Pulls"&gt;&lt;/a&gt;
&lt;a href="https://app.getpostman.com/run-collection/31642484-2ba0e658-4dcd-49f4-845a-0c7ed745b916?action=collection%2Ffork&amp;amp;source=rip_markdown&amp;amp;collection-url=entityId%3D31642484-2ba0e658-4dcd-49f4-845a-0c7ed745b916%26entityType%3Dcollection%26workspaceId%3D63e853c8-9aec-477f-909c-7f02f543150e" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/82ccefddb001e2caf9d399f1153fdda561cf3da341bb270e18644d516906bc64/68747470733a2f2f72756e2e7073746d6e2e696f2f627574746f6e2e737667" alt="Run In Postman"&gt;&lt;/a&gt;
&lt;a href="https://artifacthub.io/packages/search?repo=bifrost" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/a6a3c734d6bd57fa8e1d508ac0cdba555bdbcd9191b29b32cf37a964b86b9c67/68747470733a2f2f696d672e736869656c64732e696f2f656e64706f696e743f75726c3d68747470733a2f2f61727469666163746875622e696f2f62616467652f7265706f7369746f72792f626966726f7374" alt="Artifact Hub"&gt;&lt;/a&gt;
&lt;a href="https://github.com/maximhq/bifrost/LICENSE" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/3cb44c15a532770a066ba8e61bf11506ad5400e5c61d48f6b639101e442bee79/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f6c6963656e73652f6d6178696d68712f626966726f7374" alt="License"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;The fastest way to build AI applications that never go down&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Quick Start&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/maximhq/bifrost/./docs/media/getting-started.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fmaximhq%2Fbifrost%2F.%2Fdocs%2Fmedia%2Fgetting-started.png" alt="Get started"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Go from zero to production-ready AI gateway in under a minute.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 1:&lt;/strong&gt; Start Bifrost Gateway&lt;/p&gt;
&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Install and run locally&lt;/span&gt;
npx -y @maximhq/bifrost

&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Or use Docker&lt;/span&gt;
docker run -p 8080:8080 maximhq/bifrost&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Step 2:&lt;/strong&gt; Configure via Web UI&lt;/p&gt;
&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;&lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; Open the built-in web interface&lt;/span&gt;
open http://localhost:8080&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Step 3:&lt;/strong&gt; Make your first API call&lt;/p&gt;
&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;curl -X POST http://localhost:8080/v1/chat/completions \
  -H &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Content-Type: application/json&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; \
  -d &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;{&lt;/span&gt;
&lt;span class="pl-s"&gt;    "model": "openai/gpt-4o-mini",&lt;/span&gt;
&lt;span class="pl-s"&gt;    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]&lt;/span&gt;
&lt;span class="pl-s"&gt;  }&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;That's it!&lt;/strong&gt; Your AI gateway is running with a web interface for visual configuration…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/maximhq/bifrost" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;




&lt;p&gt;&lt;strong&gt;Quick Links&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adaptive load balancing: &lt;a href="https://docs.getbifrost.ai/enterprise/adaptive-load-balancing" rel="noopener noreferrer"&gt;https://docs.getbifrost.ai/enterprise/adaptive-load-balancing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Semantic caching: &lt;a href="https://docs.getbifrost.ai/features/semantic-caching" rel="noopener noreferrer"&gt;https://docs.getbifrost.ai/features/semantic-caching&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>claude</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
