# The Hidden Tax on Every MCP Request
Here is something nobody talks about when they demo MCP integrations: token costs at scale.
I have been running MCP setups with increasing numbers of connected servers. The pattern is always the same. You connect a few servers, everything works brilliantly. You connect a dozen, costs start climbing. You connect sixteen servers with 500+ tools, and suddenly your token budget is gone before the model even starts thinking about your actual query.
Why? Every tool definition from every connected server gets injected into the model's context on every single request. With 150+ tool definitions, those injections can consume the majority of your token budget. On top of that, there is zero access control: any consumer can call any tool, and there is no cost tracking at the tool level.
This is unsustainable for production deployments.
## I Tested Bifrost's Code Mode Approach
Bifrost takes a fundamentally different approach to this problem. Instead of dumping all tool definitions into the context window, it exposes a virtual filesystem of Python stub files. The model discovers tools on-demand through four meta-tools:
- `listToolFiles` - discover available servers and tools
- `readToolFile` - load specific function signatures
- `getToolDocs` - fetch detailed documentation only when needed
- `executeToolCode` - run scripts in a sandboxed Starlark interpreter
The key insight: the model only loads what it actually needs for the current query. If you ask it to read a file, it does not need to know about your Slack, GitHub, Jira, and database tools all at once.
Here is what a typical tool discovery flow looks like:
```python
# Model calls listToolFiles to see available servers
available = listToolFiles()
# Returns: ["filesystem/", "github/", "slack/", "jira/", ...]

# Model identifies it needs filesystem tools for this query
tools = readToolFile("filesystem/read.py")
# Returns only the function signature for filesystem_read

# Model fetches docs only if needed
docs = getToolDocs("filesystem", "read")

# Executes with full sandboxing
result = executeToolCode("filesystem/read.py", {"path": "/src/main.go"})
```
This is lazy loading for LLM tool contexts. Simple idea. Massive impact.
## Benchmark Results: 3 Controlled Rounds
I ran three controlled rounds, scaling from 6 servers to 16 servers. Every round maintained a 100% task pass rate. The model completed every task correctly while using dramatically fewer tokens.
| Round | Tools | Servers | Token Reduction | Cost Savings |
|---|---|---|---|---|
| 1 | 96 | 6 | 58.2% | 55.7% |
| 2 | 251 | 11 | 84.5% | 83.4% |
| 3 | 508 | 16 | 92.8% | 92.2% |
At roughly 500 tools, Code Mode cuts per-query token usage by about 14x, from 1.15M tokens down to 83K. That is not an incremental improvement. That is a different cost structure entirely.
The savings compound non-linearly. As you add more tools, the percentage saved increases because Code Mode's overhead stays roughly constant while traditional mode scales linearly with tool count.
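The scaling argument above can be sketched with a toy cost model. The per-tool and overhead token figures below are assumptions chosen to roughly match the benchmark table, not measured values; the point is the shape of the two curves, not the exact numbers.

```python
# Toy model of how the two modes scale with tool count.
# All three constants are assumptions for illustration only.
DEF_TOKENS = 2300       # assumed avg tokens per injected tool definition
META_OVERHEAD = 15000   # assumed fixed cost of the four meta-tools
LOADED_PER_QUERY = 30   # assumed number of definitions actually read

def traditional_tokens(n_tools: int) -> int:
    # Every definition is injected on every request: cost grows linearly.
    return n_tools * DEF_TOKENS

def code_mode_tokens(n_tools: int) -> int:
    # Fixed meta-tool overhead plus only the definitions actually loaded:
    # cost plateaus once n_tools exceeds what a query needs.
    return META_OVERHEAD + min(n_tools, LOADED_PER_QUERY) * DEF_TOKENS

for n in (96, 251, 508):
    t, c = traditional_tokens(n), code_mode_tokens(n)
    print(f"{n} tools: {t:,} vs {c:,} tokens "
          f"({100 * (1 - c / t):.1f}% reduction)")
```

Because `code_mode_tokens` is flat past the plateau while `traditional_tokens` keeps growing, the percentage saved necessarily increases with tool count, which matches the trend in the table.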
For full benchmark methodology, check the benchmarking docs.
## Access Control That Actually Works
Token savings are great, but production MCP deployments need governance. Bifrost handles this through two mechanisms.
Virtual Keys let you create scoped credentials per user, team, or customer. You can scope at the tool level:
```yaml
virtual_key:
  name: "data-team-key"
  allowed_tools:
    - database_read
    - database_query
  blocked_tools:
    - database_delete
    - filesystem_write
```
Allow filesystem_read, block filesystem_write. Allow database_query, block database_delete. Fine-grained, declarative, no code changes needed.
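The resolution semantics can be sketched in a few lines. The field names mirror the YAML above, but the allow/block logic here is my assumption of how such a check typically works, not Bifrost's actual implementation.

```python
# Hypothetical sketch of tool-level access resolution for a virtual key.
from dataclasses import dataclass, field

@dataclass
class VirtualKey:
    name: str
    allowed_tools: set[str] = field(default_factory=set)
    blocked_tools: set[str] = field(default_factory=set)

    def can_call(self, tool: str) -> bool:
        # Block list wins; an empty allow list means "allow anything
        # not blocked" in this sketch.
        if tool in self.blocked_tools:
            return False
        return not self.allowed_tools or tool in self.allowed_tools

key = VirtualKey(
    name="data-team-key",
    allowed_tools={"database_read", "database_query"},
    blocked_tools={"database_delete", "filesystem_write"},
)
print(key.can_call("database_query"))   # True
print(key.can_call("database_delete"))  # False
```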
MCP Tool Groups are named collections of tools from multiple servers. You create a group, attach it to keys, teams, or users. No database queries at resolve time. This is important when you are running at 5000 RPS and cannot afford lookup latency.
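The "no database queries at resolve time" property usually comes from flattening groups into per-key sets once at config load, so the hot path is a single in-memory membership test. The sketch below illustrates that pattern with hypothetical names; it is not Bifrost's code.

```python
# Groups are flattened once at startup...
TOOL_GROUPS = {
    "read-only": {"filesystem_read", "database_read", "github_get_pr"},
    "reporting": {"database_query", "slack_post_message"},
}
KEY_GROUPS = {"analyst-key": ["read-only", "reporting"]}

RESOLVED: dict[str, frozenset[str]] = {
    key: frozenset().union(*(TOOL_GROUPS[g] for g in groups))
    for key, groups in KEY_GROUPS.items()
}

def can_call(key: str, tool: str) -> bool:
    # ...so the per-request check is O(1) with no external lookup,
    # which is what keeps latency flat at thousands of RPS.
    return tool in RESOLVED.get(key, frozenset())

print(can_call("analyst-key", "database_query"))   # True
print(can_call("analyst-key", "database_delete"))  # False
```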
## Per-Tool Observability
Every tool execution gets logged with:
- Tool name and server source
- Arguments passed and results returned
- Execution latency
- Virtual key that initiated the call
- Parent LLM request context
You can track cost at the tool level alongside LLM token costs. This matters when your finance team asks why the AI bill doubled last month. You can point to exactly which tools, which teams, and which queries drove the spend.
Budget and limits let you set spending caps per virtual key, so no single team can blow through the monthly allocation.
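With log records shaped like the list above, cost attribution and budget enforcement reduce to simple aggregation. The record fields and the `within_budget` helper below are illustrative assumptions, not Bifrost's API; the cost figures are made up.

```python
# Hypothetical sketch: aggregate tool execution logs into per-tool and
# per-key spend, then enforce a monthly cap per virtual key.
from collections import defaultdict

executions = [
    {"tool": "database_query", "virtual_key": "data-team-key", "cost_usd": 0.012},
    {"tool": "database_query", "virtual_key": "data-team-key", "cost_usd": 0.015},
    {"tool": "slack_post_message", "virtual_key": "growth-key", "cost_usd": 0.004},
]

spend_by_key: dict[str, float] = defaultdict(float)
spend_by_tool: dict[str, float] = defaultdict(float)
for e in executions:
    spend_by_key[e["virtual_key"]] += e["cost_usd"]
    spend_by_tool[e["tool"]] += e["cost_usd"]

BUDGETS = {"data-team-key": 0.020}  # assumed monthly cap per virtual key

def within_budget(key: str) -> bool:
    cap = BUDGETS.get(key)
    return cap is None or spend_by_key[key] <= cap

print(round(spend_by_tool["database_query"], 3))  # 0.027
print(within_budget("data-team-key"))             # False
```

This is exactly the breakdown that answers the finance team's question: spend by tool, by key, against a cap.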
## Connection Flexibility
Bifrost supports four MCP connection types: STDIO, HTTP, SSE, and in-process via the Go SDK. OAuth 2.0 with PKCE and automatic token refresh is built in. Health monitoring with automatic reconnects keeps things running without manual intervention.
You can run it in manual approval mode where a human reviews tool calls, or in autonomous agent loop mode where the model chains tool calls independently.
For Claude Code and Cursor users, the /mcp endpoint integrates directly. Setup takes minutes.
## Honest Trade-offs
No tool is perfect. Here is what I noticed:
Learning curve for Code Mode. The virtual filesystem abstraction is elegant, but it is a new mental model. Teams used to traditional MCP tool injection will need to understand why their tools are now "files" the model reads on demand.
Meta-tool overhead on simple queries. If you only have 10-20 tools, the overhead of the four meta-tools (listToolFiles, readToolFile, etc.) might not save you much. The real wins kick in above 50-100 tools. Below that threshold, traditional mode works fine.
Starlark sandbox limitations. The sandboxed Starlark interpreter is secure by design, but it means tool code runs in a restricted environment. Complex tool implementations may need adjustments.
Dependency on gateway availability. Adding a gateway layer means one more component to monitor. Bifrost's 11 microsecond latency and Go-based architecture make this a non-issue in practice, but it is still an additional piece of infrastructure.
## Who Should Care
If you are running fewer than 50 MCP tools, you probably do not need Code Mode yet. Traditional tool injection works fine at that scale.
If you are running 100+ tools across multiple servers, or if you need per-team access control, or if your CFO is asking questions about AI infrastructure costs, this is worth evaluating.
The 92% cost reduction at 500+ tools is the headline number, but the governance features (virtual keys, tool groups, audit logging) are what make it production-ready.
## Try It
Bifrost is open-source and written in Go.
- GitHub repo - star it if this is useful
- MCP documentation - full setup guide
- Governance docs - virtual keys, tool groups, budgets
- Getting started - up and running in minutes
I have been testing a lot of MCP tooling lately. Bifrost's approach to the context window problem is the most practical solution I have seen. The lazy loading pattern for tool definitions should honestly be how all MCP gateways work.
Check the docs and give it a spin. Happy to discuss benchmarks or setup in the comments.