Kuldeep Paul
Bifrost MCP Gateway: Cutting Token Costs in Claude Code and Codex CLI by 92%

Bifrost MCP Gateway cuts token costs in Claude Code and Codex CLI by up to 92% through Code Mode, tool filtering, and unified governance.

Claude Code, Codex CLI, and every other coding agent on the market share one expensive habit: they consume tokens at an alarming rate. Plug in a handful of MCP servers for filesystem access, GitHub operations, internal APIs, or database tooling, and the full tool catalog gets serialized into the agent's context on every loop iteration. Most engineering teams notice the damage only after the monthly bill lands. Bifrost MCP Gateway addresses the underlying problem by rethinking how tools reach the model, pairing Code Mode with per-consumer virtual keys plus fine-grained tool filtering, so coding agents spend a fraction of the tokens they otherwise would. In controlled tests spanning 508 tools across 16 MCP servers, token usage collapsed by 92.8% while the pass rate stayed pinned at 100%.

Why Tool Bloat in MCP Drains Coding Agent Tokens

The default behavior of classic MCP is costly: every tool schema from every connected server gets pushed into the model's prompt on every request. For a coding agent fronted by five MCP servers carrying thirty tools apiece, that means 150 tool schemas land before the model has parsed the first line of your instruction. Push the setup further, to 16 servers with roughly 500 tools, and the problem compounds, because classic MCP resends every definition on every call regardless of which tools the model will invoke.

Anthropic's own engineering team called this out directly. A recent writeup on code execution with MCP walked through a Drive-to-Salesforce workflow where context fell from 150,000 tokens down to 2,000 once tool definitions were loaded lazily instead of upfront. The same dynamic bites anyone driving Claude Code or Codex CLI against many MCP servers, since the bulk of token spend goes to catalogs the model never touches on that particular turn.

Two downstream effects follow. First, inference cost scales with the size of your MCP footprint rather than with the work you want the agent to accomplish. Second, coding agents slow down as their tool catalog expands, because the model spends more of its context budget digesting schemas instead of reasoning through code. Claude Code's own docs note that tool search is on by default specifically to dampen this effect, but client-side patches do not fix the problem when many teams, agents, and customers share a common tool fleet.

The Hidden Token Math Behind Claude Code and Codex CLI

A familiar pattern keeps surfacing in coding agent deployments:

  • A developer wires Claude Code or Codex CLI to a filesystem MCP server, a GitHub server, and several internal tool servers.
  • Each server publishes between ten and fifty tools.
  • Completing a non-trivial task takes the agent loop six to ten turns.
  • Every turn reinjects the full tool list into the prompt.

With 150 tool schemas running a few hundred tokens apiece, a single ten-turn coding task can readily consume 300K input tokens before producing a useful response. Multiply across hundreds of daily runs per engineer and the math compounds into thousands of dollars per month in raw schema overhead. Tool selection accuracy also suffers, since the model has to pick the right option out of dozens of irrelevant candidates.
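The arithmetic is easy to sanity-check. A back-of-the-envelope sketch, where the per-schema token count and runs-per-day figure are illustrative assumptions rather than measured values:

```python
# Back-of-the-envelope estimate of schema overhead in a classic MCP setup.
# tokens_per_schema and runs_per_day are assumptions for illustration.
schemas = 150            # tool schemas across all connected MCP servers
tokens_per_schema = 200  # "a few hundred tokens apiece"
turns = 10               # agent-loop iterations for one non-trivial task

overhead_per_task = schemas * tokens_per_schema * turns
print(overhead_per_task)  # 300_000 input tokens before any useful output

runs_per_day = 200
print(overhead_per_task * runs_per_day)  # 60_000_000 tokens/day of pure schema resends
```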

How Bifrost MCP Gateway Attacks Token Costs at the Root

Bifrost is Maxim AI's open-source AI gateway, written in Go and adding only 11 microseconds of overhead at 5,000 requests per second. It plays both sides of the MCP protocol: it acts as an MCP client against upstream tool servers and as an MCP server that exposes a single /mcp endpoint to Claude Code, Codex CLI, Cursor, and other clients. Cost reduction for coding agents flows from three layers working in concert.

Code Mode: stubs on demand, not full schema dumps

Code Mode is the core engine. Rather than pushing every tool schema into context, Bifrost presents upstream MCP servers as a virtual filesystem of lightweight Python stub files. Four meta-tools let the model walk that catalog lazily:

  • listToolFiles: see which servers and tools are reachable
  • readToolFile: pull compact Python function signatures for a specific server or tool
  • getToolDocs: retrieve detailed documentation for a tool before invoking it
  • executeToolCode: execute an orchestration script against live tool bindings inside a sandboxed Starlark runtime

The model loads only the stubs actually relevant to the current task, composes a short script to chain the tools, and submits that script through executeToolCode. Bifrost runs it in the sandbox, chains the underlying calls, and hands back only the final result. Intermediate outputs never round-trip through the prompt.
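The mechanics can be illustrated with a toy simulation. Everything below is a hypothetical sketch, not Bifrost's actual interface: it mimics a gateway that exposes a stub catalog through meta-tools and executes a model-written script against live bindings, returning only the final result.

```python
# Hypothetical sketch of the Code Mode flow; names and signatures are
# illustrative, not Bifrost's real API.
CATALOG = {
    "github": {"create_issue": lambda title: f"issue: {title}"},
    "fs":     {"read_file": lambda path: f"contents of {path}"},
}

def list_tool_files():
    # listToolFiles: enumerate reachable servers and their tools
    return {server: sorted(tools) for server, tools in CATALOG.items()}

def read_tool_file(server):
    # readToolFile: compact stub signatures for one server only
    return [f"def {name}(...): ..." for name in CATALOG[server]]

def execute_tool_code(script):
    # executeToolCode: run the model's orchestration script against live
    # bindings; only the final `result` value round-trips to the prompt
    scope = {name: fn for tools in CATALOG.values() for name, fn in tools.items()}
    exec(script, scope)
    return scope["result"]

# The model reads only the stubs it needs, then submits a short script:
result = execute_tool_code(
    "text = read_file('README.md')\n"
    "result = create_issue('Summarize: ' + text)"
)
print(result)
```

The key property the sketch captures: intermediate values (`text` here) live inside the execution scope and never re-enter the model's context.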

Code Mode offers two binding granularities. Server-level binding bundles all tools from a server into one stub file, well-suited to servers carrying a modest number of tools. Tool-level binding gives each tool its own stub, which helps when a server ships thirty-plus tools with dense schemas. Both modes rely on the same four meta-tools. Teams evaluating broader options can also review Bifrost's dedicated MCP gateway resources on centralized tool discovery and governance.

Tool filtering: narrow what each coding agent can see

Claude Code and Codex CLI rarely need unrestricted access to every tool behind the gateway. Bifrost's tool filtering lets you define, per virtual key, the exact MCP tool set exposed. A key provisioned for a CI agent might be restricted to read-only operations. A key issued to a human developer's Claude Code session might cover the full catalog. Whatever scope you choose, the model only ever sees tools it is cleared to invoke, keeping context size and blast radius tight.
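As a rough illustration, a per-key tool scope might look something like the fragment below. The field names here are hypothetical, assumed purely for illustration; consult Bifrost's virtual key documentation for the real schema.

```json
{
  "virtual_keys": [
    {
      "name": "ci-agent",
      "allowed_tools": ["fs.read_file", "github.get_pull_request"]
    },
    {
      "name": "dev-claude-code",
      "allowed_tools": ["*"]
    }
  ]
}
```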

One /mcp endpoint for centralized discovery

Instead of registering multiple MCP servers inside every coding agent's config, teams point Claude Code or Codex CLI at Bifrost's single /mcp endpoint. Every connected server is discovered and governed centrally. Add a new MCP server to Bifrost and it becomes available to every connected coding agent automatically, with no client-side config edits required.
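On the client side, that reduces to a single MCP entry. A sketch of a project-level `.mcp.json` for Claude Code, assuming Bifrost runs locally on port 8080 and the virtual key travels as a bearer token (both are assumptions; check the integration guide for the exact header and URL your deployment expects):

```json
{
  "mcpServers": {
    "bifrost": {
      "type": "http",
      "url": "http://localhost:8080/mcp",
      "headers": {
        "Authorization": "Bearer <your-virtual-key>"
      }
    }
  }
}
```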

Benchmark Results: 92% Cost Reduction at Scale

Bifrost ran three rounds of controlled benchmarks, toggling Code Mode on and off while stepping tool count upward between rounds to measure how savings behave as MCP footprints grow:

| Round | Tools × Servers | Input Tokens (OFF) | Input Tokens (ON) | Token Reduction | Cost Reduction | Pass Rate |
|-------|------------------------|------|------|--------|--------|------|
| 1 | 96 tools · 6 servers | 19.9M | 8.3M | −58.2% | −55.7% | 100% |
| 2 | 251 tools · 11 servers | 35.7M | 5.5M | −84.5% | −83.4% | 100% |
| 3 | 508 tools · 16 servers | 75.1M | 5.4M | −92.8% | −92.2% | 100% |

Two observations stand out. Savings are not linear; they compound as the MCP footprint grows, because classic MCP ships every schema on every call while Code Mode's cost is bounded by what the model actively reads. Accuracy holds too: the pass rate sits at 100% in every round. The complete report lives in the Bifrost MCP Code Mode benchmarks repo, and further performance benchmarks document Bifrost's overhead profile under production load.
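The compounding is visible directly in the numbers: recomputing each reduction from the (rounded) table values reproduces the reported percentages to within rounding error.

```python
# Recompute token reductions from the rounded benchmark figures.
rounds = [
    (19.9e6, 8.3e6, 0.582),   # round 1: 96 tools, 6 servers
    (35.7e6, 5.5e6, 0.845),   # round 2: 251 tools, 11 servers
    (75.1e6, 5.4e6, 0.928),   # round 3: 508 tools, 16 servers
]
for off, on, reported in rounds:
    reduction = (off - on) / off
    # rounded inputs leave a sub-0.5-point residual vs. the reported figure
    assert abs(reduction - reported) < 0.005, (reduction, reported)
```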

For a deeper look at how Code Mode sits alongside governance and audit, the Bifrost MCP Gateway overview post walks through access control, cost attribution, and tool groups in detail.

Putting Bifrost MCP Gateway in Front of Claude Code and Codex CLI

Placing Claude Code or Codex CLI behind Bifrost takes only a few minutes. The Claude Code integration guide and Codex CLI integration guide cover the full configuration. The essential steps:

  1. Run Bifrost locally or inside your VPC, then attach upstream MCP servers through the dashboard (HTTP, SSE, and STDIO transports are all supported).
  2. Turn Code Mode on per MCP client; no schema changes or redeployment are needed.
  3. Issue a virtual key for each consumer (human developer, CI pipeline, customer integration) and bind it to the tool set it is cleared to call.
  4. Point Claude Code or Codex CLI at Bifrost's /mcp endpoint, passing the virtual key as credential.
  5. Where team-wide or customer-wide scope matters more than per-key scope, reach for MCP Tool Groups instead.

Once the agent is wired up, each tool call is captured as a first-class log entry containing tool name, source server, arguments, result, latency, virtual key, and the parent LLM request that triggered the loop. That puts token-level cost tracking and per-tool cost tracking side by side, making spend attribution straightforward. Teams onboarding multiple terminal-based coding agents can also reference Bifrost's broader CLI coding agent resources for integration patterns.
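With logs shaped like that, spend attribution is a one-pass aggregation. A minimal sketch, where the log field names mirror those described above but the exact log schema is an assumption:

```python
# Hypothetical aggregation over gateway tool-call logs; the entries and
# field names are illustrative, not Bifrost's real log format.
from collections import defaultdict

logs = [
    {"virtual_key": "ci-agent",  "tool": "fs.read_file",        "input_tokens": 1200},
    {"virtual_key": "ci-agent",  "tool": "fs.read_file",        "input_tokens": 900},
    {"virtual_key": "dev-alice", "tool": "github.create_issue", "input_tokens": 2100},
]

# Sum token spend per (consumer, tool) pair
spend = defaultdict(int)
for entry in logs:
    spend[(entry["virtual_key"], entry["tool"])] += entry["input_tokens"]

for (key, tool), tokens in sorted(spend.items()):
    print(f"{key:10s} {tool:22s} {tokens:>6d} tokens")
```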

What You Gain Beyond Token Savings

Token cost reduction is the headline outcome, but coding agents running through Bifrost MCP Gateway also inherit capabilities most teams otherwise build internally:

  • Scoped access that restricts each coding agent to the tools it genuinely needs.
  • Audit trails where every tool execution is recorded with full arguments and results, which accelerates security reviews and debugging.
  • Health monitoring covering automatic reconnection when upstream servers fail, plus periodic refresh to surface newly published tools.
  • OAuth 2.0 with PKCE for MCP servers that demand user-scoped auth, including dynamic client registration and automatic token refresh.
  • Unified model routing, since the same gateway that governs MCP traffic also handles provider routing, failover, and load balancing across 20+ LLM providers.

For teams running Claude Code or Codex CLI at scale, the Bifrost MCP gateway resource page and the Claude Code integration resource cover deployment patterns and cost-saving configurations in greater depth.

Start Reducing Coding Agent Token Costs Today

Token cost in coding agents stops being a rounding error once you hit production scale. When Claude Code, Codex CLI, and every other agent in the fleet push full tool catalogs on every turn, the invoice outruns the value delivered. Bifrost MCP Gateway brings those token costs back to heel by loading tool definitions lazily, scoping access through virtual keys, and consolidating every MCP server behind a single endpoint, without trading capability or accuracy.

To see how Bifrost can cut token costs across your coding agent fleet, schedule a demo with the Bifrost team.
