<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kuldeep Paul</title>
    <description>The latest articles on DEV Community by Kuldeep Paul (@kuldeep_paul).</description>
    <link>https://dev.to/kuldeep_paul</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2945723%2F40d70f4f-01f5-49ae-b4b5-2a1c2f77c64f.jpeg</url>
      <title>DEV Community: Kuldeep Paul</title>
      <link>https://dev.to/kuldeep_paul</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kuldeep_paul"/>
    <language>en</language>
    <item>
      <title>MCP Governance Layer: Access, Audit, and Cost Control</title>
      <dc:creator>Kuldeep Paul</dc:creator>
      <pubDate>Mon, 20 Apr 2026 20:09:46 +0000</pubDate>
      <link>https://dev.to/kuldeep_paul/mcp-governance-layer-access-audit-and-cost-control-5009</link>
      <guid>https://dev.to/kuldeep_paul/mcp-governance-layer-access-audit-and-cost-control-5009</guid>
      <description>&lt;p&gt;&lt;em&gt;Enterprise AI teams need an MCP governance layer to enforce tool-level access, capture audit trails, and manage cost across every Model Context Protocol server.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Enterprise teams are adopting Model Context Protocol (MCP) faster than they are building the controls to run it safely. A file access server goes in first, then one for search, then a few for internal APIs, and before long an AI agent has reach across systems no junior engineer would be trusted with on day one. An MCP governance layer closes that gap. It acts as the single control plane for deciding which tools each agent can invoke, who is invoking them, what the call returns, and what the workflow costs. Bifrost, the open-source AI gateway from Maxim AI, delivers this layer behind one &lt;a href="https://www.getmaxim.ai/bifrost/resources/mcp-gateway" rel="noopener noreferrer"&gt;MCP gateway&lt;/a&gt; that sits between your models and every upstream MCP server.&lt;/p&gt;

&lt;p&gt;These risks are well documented. &lt;a href="https://genai.owasp.org/llmrisk/llm062025-excessive-agency/" rel="noopener noreferrer"&gt;Excessive Agency&lt;/a&gt; sits on the 2025 OWASP Top 10 for LLM Applications as one of the more serious production concerns, with three root causes: too much functionality, too many permissions, and too much autonomy. MCP makes every one of them easier to slip into.&lt;/p&gt;

&lt;h2&gt;An MCP Governance Layer, Defined&lt;/h2&gt;

&lt;p&gt;An MCP governance layer is the infrastructure that sits between your AI agents and the MCP servers they depend on. Its job is threefold: enforce per-tool access policies, capture every tool invocation as an auditable event, and roll up token and tool-level spend across every connected server. Instead of spreading security and visibility across each individual MCP server, the layer consolidates them into one policy and observability plane. That consolidation is what lets a team move from a single MCP server to several dozen without losing operational control.&lt;/p&gt;

&lt;h2&gt;Where Ungoverned MCP Falls Apart in Production&lt;/h2&gt;

&lt;p&gt;Once MCP leaves a developer laptop and enters a shared production environment, three structural failure modes show up quickly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Too much agency.&lt;/strong&gt; Agents routinely end up with more tools than the task justifies, and with privileges well beyond what any single run requires. OWASP files this exact pattern under LLM06.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool poisoning and indirect injection.&lt;/strong&gt; Compromised or outright malicious MCP servers can slip hidden directives into tool descriptions, which the model then reads and trusts as part of its legitimate instructions. Microsoft's developer team has published a detailed walkthrough of &lt;a href="https://developer.microsoft.com/blog/protecting-against-indirect-injection-attacks-mcp" rel="noopener noreferrer"&gt;how tool poisoning plays out in MCP&lt;/a&gt; and why client-side defenses alone leave gaps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runaway token consumption.&lt;/strong&gt; By default, every tool definition from every attached MCP server is loaded into the model's context on each request. In a 2025 breakdown, &lt;a href="https://www.anthropic.com/engineering/code-execution-with-mcp" rel="noopener noreferrer"&gt;Anthropic engineers described how code execution with MCP&lt;/a&gt; moved one Google Drive to Salesforce workflow from 150,000 tokens down to 2,000 tokens once tool definitions stopped being shipped on every turn.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With no governance layer in place, none of these problems has a single place to be fixed. Every MCP server becomes its own isolated island of policy, logging, and cost.&lt;/p&gt;

&lt;h2&gt;Access Control: Defining What Each Agent Can Reach&lt;/h2&gt;

&lt;p&gt;The right granularity for MCP access control is the tool, not the server. Consider a single MCP server that exposes both &lt;code&gt;filesystem_read&lt;/code&gt; and &lt;code&gt;filesystem_write&lt;/code&gt;, or that carries &lt;code&gt;crm_lookup_customer&lt;/code&gt; next to &lt;code&gt;crm_delete_customer&lt;/code&gt;. A server-level allowlist can only permit or deny the whole bundle, which destroys the principle of least privilege before any request is ever made.&lt;/p&gt;

&lt;p&gt;Bifrost addresses this with two primitives: &lt;a href="https://docs.getbifrost.ai/features/governance/virtual-keys" rel="noopener noreferrer"&gt;virtual keys&lt;/a&gt; and MCP Tool Groups.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Virtual keys&lt;/strong&gt; function as scoped credentials for every gateway consumer: a user, a team, an internal service, or a customer integration. Each key carries an explicit allowlist of MCP tools it may invoke, enforced by &lt;a href="https://docs.getbifrost.ai/mcp/filtering" rel="noopener noreferrer"&gt;per-key tool filtering&lt;/a&gt;. Any model operating behind that key is blind to definitions outside its allowlist, so there is no prompt-level loophole to exploit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Tool Groups&lt;/strong&gt; are named bundles attachable to any mix of keys, teams, customers, or providers. Bifrost resolves the applicable set in memory at request time, with no database round trip, and deterministically merges overlapping groups.&lt;/li&gt;
&lt;/ul&gt;
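&lt;p&gt;The two primitives compose roughly as in this minimal sketch. The group names, key IDs, and merge rule here are illustrative assumptions, not Bifrost's actual configuration format:&lt;/p&gt;

```python
# Hypothetical sketch of virtual-key tool filtering; names and the
# merge rule are assumptions, not Bifrost's actual configuration.
TOOL_GROUPS = {
    "crm-readonly": {"crm_lookup_customer", "crm_list_orders"},
    "files-readonly": {"filesystem_read"},
}

VIRTUAL_KEYS = {
    "vk-support-team": ["crm-readonly", "files-readonly"],
}

def allowed_tools(virtual_key: str) -> set:
    """Deterministically merge every group attached to a key (set union),
    resolved in memory at request time."""
    merged = set()
    for group in VIRTUAL_KEYS[virtual_key]:
        merged |= TOOL_GROUPS[group]
    return merged

def admit(virtual_key: str, tool: str) -> bool:
    # A model behind this key never sees definitions outside the
    # allowlist, so a denial here signals a spoofed or misbehaving client.
    return tool in allowed_tools(virtual_key)
```

&lt;p&gt;Note that a key holding both groups still cannot reach &lt;code&gt;crm_delete_customer&lt;/code&gt;: the tool was never in any attached group, which is the least-privilege property a server-level allowlist cannot express.&lt;/p&gt;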

&lt;p&gt;This approach tracks where the wider MCP ecosystem is moving. The specification was recently revised to mandate &lt;a href="https://docs.getbifrost.ai/mcp/oauth" rel="noopener noreferrer"&gt;OAuth 2.1 with PKCE&lt;/a&gt;, and identity providers have started treating MCP as a first-class authorization surface. The governance layer is where those standards get applied uniformly, even when individual upstream servers do not implement them natively.&lt;/p&gt;

&lt;h2&gt;Audit Logging: Keeping Every Agent Action on the Record&lt;/h2&gt;

&lt;p&gt;Once an AI agent can invoke production tools, each call needs to live as a first-class audit event, not a byproduct of general request logs.&lt;/p&gt;

&lt;p&gt;For every MCP tool execution, Bifrost records:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool name and the source MCP server&lt;/li&gt;
&lt;li&gt;Input arguments sent to the tool and the payload returned&lt;/li&gt;
&lt;li&gt;End-to-end latency for the invocation&lt;/li&gt;
&lt;li&gt;The virtual key that authorized the call&lt;/li&gt;
&lt;li&gt;The parent LLM request that kicked off the agent loop&lt;/li&gt;
&lt;/ul&gt;
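&lt;p&gt;As a sketch, one such audit event might be assembled like this. The field names are illustrative assumptions, not Bifrost's actual log schema:&lt;/p&gt;

```python
# Illustrative audit-event builder; field names are assumptions, not
# Bifrost's actual log schema.
import json
import time

def audit_event(tool, server, args, result, virtual_key,
                parent_request_id, latency_ms, log_content=True):
    """Record one tool invocation. Metadata is always captured; argument
    and result payloads are included only when content logging is on."""
    event = {
        "tool": tool,
        "server": server,
        "virtual_key": virtual_key,
        "parent_request_id": parent_request_id,
        "latency_ms": latency_ms,
        "timestamp": time.time(),
    }
    if log_content:
        event["arguments"] = args
        event["result"] = result
    return json.dumps(event)
```

&lt;p&gt;The &lt;code&gt;log_content&lt;/code&gt; toggle mirrors the per-environment behavior described above: in a redacted environment the payloads drop out, but the metadata trail stays intact for audit.&lt;/p&gt;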

&lt;p&gt;From there, any agent run can be opened and traced through its exact sequence of tool calls. Filtering by virtual key shows what a specific team or customer has been running in production. When arguments or results contain sensitive data, content logging can be toggled off per environment while metadata (tool name, server, latency, status) still gets captured.&lt;/p&gt;

&lt;p&gt;Debugging is only part of the value. SOC 2, HIPAA, GDPR, and ISO 27001 programs all require immutable audit trails, and auditors now expect those trails to extend to AI tool invocations, not just traditional API calls. For that exact scope, Bifrost ships &lt;a href="https://docs.getbifrost.ai/enterprise/audit-logs" rel="noopener noreferrer"&gt;enterprise audit logs&lt;/a&gt; with per-environment retention and export paths into downstream SIEM and data lake systems.&lt;/p&gt;

&lt;h2&gt;Cost Control: Taming Token Bloat and Tool Call Spend&lt;/h2&gt;

&lt;p&gt;Two separate cost problems sit inside any MCP deployment, and a governance layer has to handle both. The first is the token cost incurred by loading tool definitions. The second is the actual money spent when tools run.&lt;/p&gt;

&lt;h3&gt;The Context Window Problem&lt;/h3&gt;

&lt;p&gt;By default, MCP execution loads the full tool catalog from every connected server into model context on each request. With five servers at thirty tools apiece, that is 150 tool definitions sitting in the prompt before the user's actual message is even read. The industry's working answer is to flip the pattern: let the agent write code against the tool catalog rather than receive the entire catalog each turn. This approach is covered in depth by both &lt;a href="https://www.anthropic.com/engineering/code-execution-with-mcp" rel="noopener noreferrer"&gt;Anthropic's engineering team&lt;/a&gt; and &lt;a href="https://blog.cloudflare.com/code-mode/" rel="noopener noreferrer"&gt;Cloudflare&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Bifrost ships this pattern natively as &lt;a href="https://docs.getbifrost.ai/mcp/code-mode" rel="noopener noreferrer"&gt;Code Mode&lt;/a&gt;. Rather than pushing every tool definition into the context, Code Mode surfaces MCP servers as a virtual filesystem made up of small Python stubs. The model pulls only what it needs through four meta-tools (&lt;code&gt;listToolFiles&lt;/code&gt;, &lt;code&gt;readToolFile&lt;/code&gt;, &lt;code&gt;getToolDocs&lt;/code&gt;, &lt;code&gt;executeToolCode&lt;/code&gt;), and Bifrost runs the resulting script inside a sandboxed Starlark interpreter. Bifrost's &lt;a href="https://www.getmaxim.ai/bifrost/blog/bifrost-mcp-gateway-access-control-cost-governance-and-92-lower-token-costs-at-scale" rel="noopener noreferrer"&gt;controlled MCP benchmarks&lt;/a&gt; recorded a 92.8% reduction in input tokens at 508 tools across 16 servers, with pass rate holding at 100%.&lt;/p&gt;
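&lt;p&gt;The scaling difference can be sketched with rough arithmetic. The per-definition and per-stub token counts below are illustrative assumptions, not measured figures:&lt;/p&gt;

```python
# Rough context-cost arithmetic; the per-definition and per-stub token
# counts are illustrative assumptions, not measured figures.
TOKENS_PER_DEFINITION = 600   # one JSON Schema tool definition
TOKENS_PER_STUB_READ = 80     # one compact Python stub signature

def classic_context_tokens(servers: int, tools_per_server: int) -> int:
    # Every definition from every server rides along on every request.
    return servers * tools_per_server * TOKENS_PER_DEFINITION

def code_mode_context_tokens(stubs_read: int) -> int:
    # Four meta-tools, plus only the stubs the model chose to read.
    return 4 * TOKENS_PER_DEFINITION + stubs_read * TOKENS_PER_STUB_READ
```

&lt;p&gt;Under these assumptions, five servers of thirty tools put 90,000 definition tokens into every Classic MCP request, while a Code Mode run that reads three stubs stays around 2,640, the same order-of-magnitude gap the benchmarks above report.&lt;/p&gt;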

&lt;h3&gt;The Per-Tool Pricing Problem&lt;/h3&gt;

&lt;p&gt;Tools themselves cost real money. Paid data providers, search APIs, enrichment vendors, and code execution services each bill per invocation. Bifrost captures these costs at the tool level, driven by a pricing config you define per MCP client, and renders them in the same log view as LLM token spend. The result is a complete cost picture for every agent run, not just the model portion.&lt;/p&gt;
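&lt;p&gt;A minimal sketch of how per-tool pricing rolls up alongside token spend. The config shape, tool names, and prices are assumptions, not Bifrost's pricing format:&lt;/p&gt;

```python
# Hypothetical per-tool pricing table and run-level rollup; the config
# shape and prices are assumptions, not Bifrost's pricing format.
TOOL_PRICING_USD = {
    "search_api.query": 0.005,   # paid search API, per invocation
    "enrichment.lookup": 0.020,  # data enrichment vendor, per invocation
}

def run_cost(tool_calls, llm_spend_usd):
    """Combine per-invocation tool spend with model token spend so one
    agent run shows a complete cost picture, not just the model portion."""
    tool_spend = sum(TOOL_PRICING_USD.get(name, 0.0) for name in tool_calls)
    return {
        "tool_spend_usd": round(tool_spend, 6),
        "llm_spend_usd": llm_spend_usd,
        "total_usd": round(tool_spend + llm_spend_usd, 6),
    }
```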

&lt;h2&gt;The Five Traits of an MCP Governance Layer That Scales&lt;/h2&gt;

&lt;p&gt;Five properties separate an MCP governance layer that actually scales from one that merely exists:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One endpoint for the whole MCP fleet.&lt;/strong&gt; Every connected MCP server sits behind a single &lt;code&gt;/mcp&lt;/code&gt; URL that agents target. Adding servers requires no client-side changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authorization at the tool level.&lt;/strong&gt; Scoping happens per individual tool, is enforced inside the gateway, and remains invisible to anything outside the scope.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One audit model for models and tools.&lt;/strong&gt; Both LLM calls and tool calls land in a single log schema and are correlated by request ID.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dual-layer cost reporting.&lt;/strong&gt; Token spend and tool spend surface together, with breakdowns by virtual key, by team, and by provider.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authentication built on standards.&lt;/strong&gt; PKCE-backed OAuth 2.1, identity-provider hooks, and automatic token refresh replace the pattern of sharing static bearer tokens across services.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bifrost rolls all five into its &lt;a href="https://www.getmaxim.ai/bifrost/resources/governance" rel="noopener noreferrer"&gt;governance stack&lt;/a&gt;, which spans MCP and model traffic alongside the identity and budget primitives connecting them.&lt;/p&gt;

&lt;h2&gt;Move Your MCP Deployments Behind Bifrost&lt;/h2&gt;

&lt;p&gt;At this point, MCP is the default interface between AI agents and the systems enterprises actually run on, and ungoverned deployments tend to announce themselves loudly. Access drift, audit holes, and unexpected token bills each get worse as the connected server count climbs. The path from early experiments to production-grade AI infrastructure runs through an MCP governance layer, one that preserves control over what agents can reach, what workflows cost, and how each step gets recorded. To see how Bifrost's MCP governance layer works against your current agents and servers, &lt;a href="https://getmaxim.ai/bifrost/book-a-demo" rel="noopener noreferrer"&gt;book a demo&lt;/a&gt; with the Bifrost team.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Classic MCP vs Code Mode: How the Two Patterns Stack Up</title>
      <dc:creator>Kuldeep Paul</dc:creator>
      <pubDate>Mon, 20 Apr 2026 20:07:49 +0000</pubDate>
      <link>https://dev.to/kuldeep_paul/classic-mcp-vs-code-mode-how-the-two-patterns-stack-up-2fdo</link>
      <guid>https://dev.to/kuldeep_paul/classic-mcp-vs-code-mode-how-the-two-patterns-stack-up-2fdo</guid>
      <description>&lt;p&gt;&lt;em&gt;A side-by-side look at Classic MCP vs MCP Code Mode on context footprint, token cost, latency, and accuracy, plus how Bifrost runs Code Mode at scale.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most production agents aren't wired to a single MCP server. A typical stack stitches together search, filesystem, CRM, and several more, with each connected server pushing tool definitions into the model's context window on every turn. Every Classic MCP vs MCP Code Mode conversation eventually lands on the same three questions: when are tools loaded, how are they invoked, and what happens to intermediate results? Bifrost, the open-source AI gateway built by Maxim AI, runs both patterns through its &lt;a href="https://docs.getbifrost.ai/mcp/overview" rel="noopener noreferrer"&gt;MCP gateway&lt;/a&gt;, enabled per client. The comparison below is anchored in the &lt;a href="https://modelcontextprotocol.io/specification/2025-06-18/server/tools" rel="noopener noreferrer"&gt;Model Context Protocol specification&lt;/a&gt; along with published work from Anthropic and Cloudflare.&lt;/p&gt;

&lt;h2&gt;How Classic MCP Tool Calling Works&lt;/h2&gt;

&lt;p&gt;The default execution model in the Model Context Protocol is what most teams call Classic MCP. Discovery happens up front: the client sends &lt;code&gt;tools/list&lt;/code&gt; to each connected server, gets back JSON Schema definitions for every tool, and loads those definitions into the model's context. Invocation follows the same loop turn by turn. When the model picks a tool, the client issues a &lt;code&gt;tools/call&lt;/code&gt; with the chosen arguments, waits for the result, and writes that result back into the conversation before the next turn. The full discovery-and-invocation sequence is laid out in the &lt;a href="https://modelcontextprotocol.io/specification/2025-06-18/server/tools" rel="noopener noreferrer"&gt;MCP tools specification&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;With a small tool catalog, the pattern is tidy. As the catalog grows, costs climb quickly. The defining traits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Every tool definition stays in context, every turn&lt;/strong&gt;: all connected servers' tools are loaded before the loop starts and remain resident through the entire agent run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serial tool invocation&lt;/strong&gt;: each model turn produces exactly one tool call, the client runs it, and the result has to return before the model can pick a second tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The model sees every intermediate payload&lt;/strong&gt;: results of any size are serialized straight back into the conversation, large ones included.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token cost compounds with server count&lt;/strong&gt;: wire in ten servers of fifteen tools each and the model is carrying 150 tool definitions on every request.&lt;/li&gt;
&lt;/ul&gt;
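&lt;p&gt;The loop above can be sketched in a few lines of Python. Here &lt;code&gt;model&lt;/code&gt; and &lt;code&gt;send&lt;/code&gt; are stand-ins for a real model client and JSON-RPC transport, not actual MCP SDK calls:&lt;/p&gt;

```python
# Minimal sketch of the Classic MCP loop; model and send are stand-ins
# for a real model client and JSON-RPC transport, not MCP SDK calls.
def classic_agent_loop(model, servers, user_message, send):
    # Discovery: every tool definition from every server enters context
    # up front and stays resident for the whole run.
    tools = []
    for server in servers:
        tools.extend(send(server, "tools/list")["tools"])

    messages = [{"role": "user", "content": user_message}]
    while True:
        turn = model(messages, tools)        # one turn, one tool pick
        if turn.get("tool_call") is None:
            return turn["content"]           # final answer
        call = turn["tool_call"]
        result = send(call["server"], "tools/call", call["arguments"])
        # The full result, whatever its size, is serialized straight
        # back into the conversation before the next turn.
        messages.append({"role": "tool", "content": result})
```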

&lt;h2&gt;The MCP Code Mode Pattern&lt;/h2&gt;

&lt;p&gt;In MCP Code Mode, the "one call per turn" loop is replaced by a "write code that orchestrates tools" loop. The gateway no longer hands the model a full tool catalog. Instead, it presents a small set of meta-tools that let the model discover what's available on demand and then submit a single script that chains multiple calls together inside a sandbox. Cloudflare introduced the pattern in their &lt;a href="https://blog.cloudflare.com/code-mode/" rel="noopener noreferrer"&gt;Code Mode post&lt;/a&gt;, built around a straightforward premise: LLMs have been trained on far more real-world code than synthetic tool-calling sequences, which is why they handle messy multi-step workflows more dependably when asked to write code. Anthropic's engineering team ran a &lt;a href="https://www.anthropic.com/engineering/code-execution-with-mcp" rel="noopener noreferrer"&gt;parallel study on code execution with MCP&lt;/a&gt;, and their results showed a Google Drive to Salesforce workflow collapsing from about 150,000 tokens to 2,000 under the same approach.&lt;/p&gt;

&lt;p&gt;Mechanically, Code Mode rests on four ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lazy tool discovery&lt;/strong&gt;: servers are listed first, and only the tools the model actually plans to call get their compact stub signatures loaded.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandbox-side orchestration&lt;/strong&gt;: a short script written by the model chains multiple tool calls server-side, keeping the sequence off the conversation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local intermediate results&lt;/strong&gt;: the model's context only receives the final output; everything between stays inside the sandbox.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bounded context footprint&lt;/strong&gt;: total cost tracks what the model reads, not how large the underlying tool catalog happens to be.&lt;/li&gt;
&lt;/ul&gt;
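&lt;p&gt;For illustration, this is the shape of script a model might submit under Code Mode. The &lt;code&gt;drive&lt;/code&gt; and &lt;code&gt;salesforce&lt;/code&gt; bindings and their methods are hypothetical; in Bifrost a body like this would run inside the sandbox via &lt;code&gt;executeToolCode&lt;/code&gt;:&lt;/p&gt;

```python
# The shape of script a model might submit under Code Mode; the drive
# and salesforce bindings and their methods are hypothetical examples.
def orchestrate(drive, salesforce):
    # Several chained calls execute sandbox-side; the full document
    # payload never enters the model's context.
    doc = drive.read_document("meeting-notes")
    leads = [line for line in doc.splitlines() if line.startswith("LEAD:")]
    for lead in leads:
        salesforce.create_lead(name=lead.removeprefix("LEAD:").strip())
    # Only this compact summary returns to the conversation.
    return {"leads_created": len(leads)}
```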

&lt;h2&gt;Classic MCP vs Code Mode, Dimension by Dimension&lt;/h2&gt;

&lt;p&gt;On every axis that matters to production economics, the two patterns diverge. The breakdown below reflects how &lt;a href="https://docs.getbifrost.ai/mcp/code-mode" rel="noopener noreferrer"&gt;Bifrost's Code Mode&lt;/a&gt; is built along with the public measurements from Anthropic and Cloudflare:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Classic MCP&lt;/th&gt;
&lt;th&gt;MCP Code Mode&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tools loaded into context&lt;/td&gt;
&lt;td&gt;Entire catalog, on every turn&lt;/td&gt;
&lt;td&gt;Four meta-tools plus stubs fetched on demand&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orchestration pattern&lt;/td&gt;
&lt;td&gt;Single tool call per turn&lt;/td&gt;
&lt;td&gt;One script calling many tools, in one turn&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intermediate results&lt;/td&gt;
&lt;td&gt;Routed through model context&lt;/td&gt;
&lt;td&gt;Kept inside the sandbox&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Typical round trips (multi-step)&lt;/td&gt;
&lt;td&gt;Roughly 6 to 10 turns&lt;/td&gt;
&lt;td&gt;Roughly 3 to 4 turns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;How token cost scales&lt;/td&gt;
&lt;td&gt;Linearly with server count&lt;/td&gt;
&lt;td&gt;Flat; tied to reads, not catalog size&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Common failure modes&lt;/td&gt;
&lt;td&gt;Tool misselection, context overflow&lt;/td&gt;
&lt;td&gt;Script errors, sandbox timeouts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Where it fits best&lt;/td&gt;
&lt;td&gt;1 or 2 small servers, direct calls&lt;/td&gt;
&lt;td&gt;3 or more servers, chained workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The gap is narrow at small scale and opens quickly as workloads grow. Controlled &lt;a href="https://www.getmaxim.ai/bifrost/blog/bifrost-mcp-gateway-access-control-cost-governance-and-92-lower-token-costs-at-scale" rel="noopener noreferrer"&gt;MCP gateway benchmarks&lt;/a&gt; from Bifrost recorded a 58 percent drop in token usage at 96 tools and a 92 percent drop at 508 tools, while pass rate stayed at 100 percent across all three rounds. On the API side, Cloudflare measured something similar: their &lt;a href="https://blog.cloudflare.com/code-mode-mcp/" rel="noopener noreferrer"&gt;Code Mode MCP server&lt;/a&gt; exposed 2,500 endpoints in roughly 1,000 tokens, against more than 1.17 million under the classic pattern.&lt;/p&gt;

&lt;h2&gt;Where Classic MCP Still Makes Sense&lt;/h2&gt;

&lt;p&gt;Classic MCP has not been retired. For workloads that match its shape, it's often the simpler and faster option:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A couple of small servers&lt;/strong&gt;: the fixed cost of spinning through Code Mode's meta-tool cycle isn't justified for a handful of tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One-shot, direct calls&lt;/strong&gt;: a weather lookup or a single record fetch is exactly one invocation, and code orchestration adds no value there.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hard latency budgets&lt;/strong&gt;: Code Mode is generally faster on multi-step work, but for a simple one-shot call Classic MCP skips the extra parse and sandbox step.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflows that require explicit per-call human approval&lt;/strong&gt;: Classic MCP lines up cleanly with manual approval gates, without the extra validation that Code Mode layers on.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Small utility servers can stay on Classic MCP while heavier ones move to Code Mode. Because Bifrost enables Code Mode per client rather than globally, that trade-off is made server by server, not once for the whole gateway.&lt;/p&gt;

&lt;h2&gt;Where MCP Code Mode Pulls Ahead&lt;/h2&gt;

&lt;p&gt;As the tool surface expands, Code Mode starts earning its added complexity. It becomes the stronger default when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Three or more MCP servers are connected at once&lt;/strong&gt;: under Classic MCP, each added server adds definitions to every request linearly. Code Mode holds the cost flat regardless.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflows chain multiple tools together&lt;/strong&gt;: a lookup into a join into a filter into a write takes four round trips in Classic MCP and often a single script execution in Code Mode.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intermediate payloads are large&lt;/strong&gt;: reading a document and writing to another is the exact scenario behind &lt;a href="https://www.anthropic.com/engineering/code-execution-with-mcp" rel="noopener noreferrer"&gt;Anthropic's 150,000-to-2,000-token benchmark&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bills are dominated by token spend&lt;/strong&gt;: when tool definitions are eating more of the request budget than actual reasoning, Code Mode goes after that waste head-on.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Efficiency does not come at the expense of accuracy here. Pass rate held at 100 percent in Bifrost's controlled benchmarks, with Code Mode on and off, across every tool-count tier tested.&lt;/p&gt;

&lt;h2&gt;Inside Bifrost's MCP Code Mode Implementation&lt;/h2&gt;

&lt;p&gt;Code Mode in Bifrost is native to the gateway, not bolted on as a plugin or wrapper. Four meta-tools are exposed to the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;listToolFiles&lt;/code&gt;: enumerate the virtual &lt;code&gt;.pyi&lt;/code&gt; stub files across every connected Code Mode server.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;readToolFile&lt;/code&gt;: pull the compact Python function signatures for a chosen server or tool.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;getToolDocs&lt;/code&gt;: retrieve full documentation for a single tool when the compact signature isn't sufficient.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;executeToolCode&lt;/code&gt;: execute the orchestration script against live tool bindings.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Execution happens inside an embedded Starlark interpreter, a deterministic Python subset with no imports, no file I/O, and no network access. The constraint is intentional: the sandbox exists to call tools and process their outputs, and nothing else. Bindings can be configured at either the server or tool level, so a single stub per server works for compact discovery, while one stub per tool helps when servers carry dozens of tools and per-read context budgets get tight. Code Mode plays well with the rest of Bifrost's MCP stack, including &lt;a href="https://docs.getbifrost.ai/mcp/agent-mode" rel="noopener noreferrer"&gt;Agent Mode auto-execution&lt;/a&gt;, &lt;a href="https://docs.getbifrost.ai/mcp/filtering" rel="noopener noreferrer"&gt;tool filtering&lt;/a&gt;, and per-consumer scoping via &lt;a href="https://docs.getbifrost.ai/features/governance/virtual-keys" rel="noopener noreferrer"&gt;virtual keys&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Auto-execution rules are stricter in Code Mode than in Classic MCP. The submitted Python is parsed, every tool call is extracted, and each one is checked against the per-server auto-execute allowlist. A single call outside that allowlist routes the whole script to manual approval. This closes the obvious loophole where the sandbox could otherwise be used to run tool invocations that would have been rejected under Classic MCP.&lt;/p&gt;
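&lt;p&gt;A simplified version of that check, using Python's &lt;code&gt;ast&lt;/code&gt; module as a stand-in for Bifrost's Starlark-side parsing:&lt;/p&gt;

```python
# Simplified allowlist check over a submitted script, using Python's ast
# module as a stand-in for Bifrost's Starlark-side parsing.
import ast

def called_names(script: str) -> set:
    """Collect every function or method name the script calls."""
    names = set()
    for node in ast.walk(ast.parse(script)):
        if isinstance(node, ast.Call):
            fn = node.func
            if isinstance(fn, ast.Attribute):
                names.add(fn.attr)
            elif isinstance(fn, ast.Name):
                names.add(fn.id)
    return names

def auto_executable(script: str, allowlist: set) -> bool:
    # A single call outside the allowlist routes the whole script to
    # manual approval rather than silently dropping that one call.
    return called_names(script).issubset(allowlist)
```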

&lt;h2&gt;Picking the Right MCP Pattern for Your Agent Stack&lt;/h2&gt;

&lt;p&gt;The practical read on this Classic MCP vs Code Mode comparison is that the two patterns complement each other rather than compete. For small tool catalogs and one-shot workflows, Classic MCP is still the correct default. Once token cost, latency, and context bloat start to dominate in multi-server agent workflows, MCP Code Mode becomes the better default. Bifrost runs both, which lets teams flip the switch per client and migrate gradually as their MCP footprint keeps growing. Teams evaluating the broader gateway trade-offs alongside this MCP-level choice can walk through the &lt;a href="https://www.getmaxim.ai/bifrost/resources/buyers-guide" rel="noopener noreferrer"&gt;LLM Gateway Buyer's Guide&lt;/a&gt; for a full capability matrix.&lt;/p&gt;

&lt;p&gt;To watch Bifrost's &lt;a href="https://www.getmaxim.ai/bifrost/resources/mcp-gateway" rel="noopener noreferrer"&gt;MCP gateway&lt;/a&gt; run Code Mode over your own tool catalog, with access control, audit logging, and per-tool cost tracking in place, &lt;a href="https://getmaxim.ai/bifrost/book-a-demo" rel="noopener noreferrer"&gt;book a demo&lt;/a&gt; with the Bifrost team.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>AI Governance Explained: A Reference Guide for Enterprise Teams</title>
      <dc:creator>Kuldeep Paul</dc:creator>
      <pubDate>Mon, 20 Apr 2026 19:55:35 +0000</pubDate>
      <link>https://dev.to/kuldeep_paul/ai-governance-explained-a-reference-guide-for-enterprise-teams-1gge</link>
      <guid>https://dev.to/kuldeep_paul/ai-governance-explained-a-reference-guide-for-enterprise-teams-1gge</guid>
      <description>&lt;p&gt;&lt;em&gt;AI governance shapes how enterprises design, run, and oversee AI responsibly. Explore the frameworks, standards, and runtime controls behind safe enterprise AI.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;AI governance is the combination of policies, processes, and technical enforcement that determines how an organization designs, deploys, and operates artificial intelligence with accountability. Once AI moves from isolated pilots into production systems handling customer records, financial decisions, and regulated workflows, governance becomes the discipline keeping adoption aligned with business risk tolerance, regulatory duties, and ethical commitments. Bifrost, the open-source AI gateway from Maxim AI, is built so that AI governance is enforceable at the infrastructure layer, with every LLM request, tool invocation, and agent action subject to consistent policy rather than ad hoc configuration.&lt;/p&gt;

&lt;p&gt;This guide walks through what AI governance looks like in 2026, why boards now treat it as a priority, the frameworks that give it shape, and how platform teams operationalize it using a gateway-centric approach.&lt;/p&gt;

&lt;h2&gt;What AI Governance Actually Means&lt;/h2&gt;

&lt;p&gt;AI governance is a structured discipline for handling the risks, accountabilities, and lifecycle of AI systems deployed inside an enterprise. Its scope covers who (people, agents, or applications) may invoke which models, what data those models are allowed to see, how outputs are evaluated, how decisions are logged, and how responsibility is assigned when something breaks. Effective AI governance is never a single document or tool; it is a layered mix of policy, process, and runtime enforcement that runs continuously, from the moment a model is selected through its life in production.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.nist.gov/itl/ai-risk-management-framework" rel="noopener noreferrer"&gt;National Institute of Standards and Technology (NIST) AI Risk Management Framework&lt;/a&gt; organizes AI governance around four interlocking functions: Govern, Map, Measure, and Manage. The Govern function sets culture, roles, and policy. Map places each AI system in context alongside its risks. Measure applies both quantitative and qualitative assessment techniques. Manage treats identified risks with concrete controls and response playbooks.&lt;/p&gt;

&lt;h3&gt;The building blocks of an AI governance program&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Access control&lt;/strong&gt;: defining which people, agents, and applications are permitted to call which models and tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy enforcement&lt;/strong&gt;: runtime rules that block, redact, or reroute requests based on their content or context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost and usage limits&lt;/strong&gt;: budgets, quotas, and rate limits applied at the individual, team, and organizational tiers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability and audit&lt;/strong&gt;: end-to-end logs covering prompts, responses, tool calls, and decisions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data protection&lt;/strong&gt;: controls over what data reaches external models and how that data is handled downstream&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance mapping&lt;/strong&gt;: alignment with regulatory regimes such as GDPR, HIPAA, SOC 2, and the EU AI Act&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lifecycle management&lt;/strong&gt;: processes governing model onboarding, evaluation, rollout, and retirement&lt;/li&gt;
&lt;/ul&gt;
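&lt;p&gt;The cost-and-usage-limits block above can be made concrete with a small sketch. The tier names and dollar ceilings here are assumptions for illustration, not a real configuration:&lt;/p&gt;

```python
# Illustrative tiered budget check; tier names and dollar ceilings are
# assumptions, not a real configuration.
BUDGETS_USD = {
    "user:alice": 50.0,
    "team:support": 500.0,
    "org": 10_000.0,
}

def admit_request(spend_so_far, estimated_cost, tiers):
    """A request must clear every nested ceiling (user, team, org); any
    single exhausted tier is enough to deny it."""
    for tier in tiers:
        if spend_so_far.get(tier, 0.0) + estimated_cost > BUDGETS_USD[tier]:
            return False
    return True
```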

&lt;h2&gt;Why AI Governance Has Become a Board-Level Priority&lt;/h2&gt;

&lt;p&gt;AI governance matters because AI already sits inside most organizations, often without oversight attached. IBM's 2026 study of enterprise AI adoption found that &lt;a href="https://www.ibm.com/think/insights/rising-ai-adoption-creating-shadow-risks" rel="noopener noreferrer"&gt;35% of surveyed Gen Z employees indicated they are likely to rely only on personal AI applications rather than company-approved alternatives&lt;/a&gt;, a pattern that sharply widens the attack surface for data leakage and compliance violations. Shadow AI, meaning the use of unsanctioned tools on corporate data, has become one of the most urgent governance problems facing enterprise teams.&lt;/p&gt;

&lt;p&gt;Meanwhile, the regulatory landscape has hardened in parallel. The &lt;a href="https://artificialintelligenceact.eu/" rel="noopener noreferrer"&gt;EU AI Act&lt;/a&gt; came into force in August 2024. Unacceptable-risk systems have been prohibited since February 2025, general-purpose AI model duties took effect in August 2025, and the core obligations for Annex III high-risk systems become enforceable on August 2, 2026. Fines can reach €35 million, or 7% of worldwide annual turnover, for the most severe violations, a ceiling that sits well above GDPR's.&lt;/p&gt;

&lt;p&gt;Three converging pressures push AI governance onto the board agenda:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Regulatory exposure&lt;/strong&gt;: the EU AI Act, state-level AI laws across the United States, and sector rules in financial services and healthcare now demand documented controls, not statements of intent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security and data risk&lt;/strong&gt;: prompt injection attempts, supply chain incidents targeting models, and accidental data exposure are no longer hypothetical scenarios; they surface as recurring production incidents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost sprawl&lt;/strong&gt;: without hard budget controls, multi-provider LLM spend expands faster than most FinOps teams can keep pace with, and reconstructing usage attribution across teams after the fact becomes nearly impossible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Platform teams designing enterprise AI infrastructure can explore &lt;a href="https://www.getmaxim.ai/bifrost/resources/governance" rel="noopener noreferrer"&gt;Bifrost's approach to enterprise governance&lt;/a&gt; for a detailed view of how gateway-level controls address each of these pressures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Global Standards That Shape AI Governance
&lt;/h2&gt;

&lt;p&gt;Most mature AI governance programs anchor themselves to one or more established frameworks. Four have risen to prominence as the dominant reference points.&lt;/p&gt;

&lt;h3&gt;
  
  
  NIST AI Risk Management Framework
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://www.nist.gov/itl/ai-risk-management-framework" rel="noopener noreferrer"&gt;NIST AI RMF 1.0&lt;/a&gt;, released in January 2023, is a voluntary framework developed through a multi-stakeholder, consensus-driven process. The framework groups trustworthy AI around a set of characteristics: validity and reliability, safety, security, resilience, accountability, transparency and explainability, privacy, and fairness. For U.S. enterprises and federal contractors, it has become the most widely adopted starting point.&lt;/p&gt;

&lt;h3&gt;
  
  
  EU AI Act
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://artificialintelligenceact.eu/" rel="noopener noreferrer"&gt;EU AI Act, formally Regulation (EU) 2024/1689&lt;/a&gt;, ranks as the first comprehensive horizontal AI law adopted anywhere in the world. Systems are classified into four risk tiers: unacceptable (prohibited), high (Annex III), limited (transparency duties), and minimal. High-risk system providers and deployers have to operate a risk management process, apply data governance controls, keep logs, provide human oversight, and complete conformity assessments. Its extraterritorial scope reaches any organization whose AI outputs reach EU users, wherever the organization itself happens to sit.&lt;/p&gt;

&lt;h3&gt;
  
  
  ISO/IEC 42001
&lt;/h3&gt;

&lt;p&gt;Released in December 2023, &lt;a href="https://www.iso.org/standard/42001" rel="noopener noreferrer"&gt;ISO/IEC 42001&lt;/a&gt; became the first certifiable international management-system standard written specifically for AI. It specifies how organizations establish, implement, maintain, and continually improve an AI Management System (commonly abbreviated AIMS). Much as ISO 27001 became the default certification signal for information security, ISO 42001 is quickly emerging as the way organizations demonstrate to customers and regulators that they govern AI in a disciplined, repeatable fashion.&lt;/p&gt;

&lt;h3&gt;
  
  
  OECD AI Principles
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://www.oecd.org/en/topics/sub-issues/ai-principles.html" rel="noopener noreferrer"&gt;OECD AI Principles&lt;/a&gt; were adopted in 2019 and refreshed in 2024, and they stand as the first intergovernmental standard aimed at trustworthy AI. They rest on five values-based principles: inclusive growth and well-being; human-centered values and fairness; transparency and explainability; robustness, security, and safety; and accountability. These principles underpin many national AI strategies and align closely with the risk-based logic behind the EU AI Act.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Bifrost Turns AI Governance Into Runtime Enforcement
&lt;/h2&gt;

&lt;p&gt;AI governance policies matter only when they are applied at runtime, on every request, before any provider is contacted. Bifrost makes this possible through a gateway layer positioned between applications and the 20+ LLM providers Bifrost connects to, giving platform teams a single, consistent choke point for access, budget, and policy rules.&lt;/p&gt;

&lt;h3&gt;
  
  
  Virtual keys as the governance primitive
&lt;/h3&gt;

&lt;p&gt;Bifrost's central governance object is the &lt;a href="https://docs.getbifrost.ai/features/governance/virtual-keys" rel="noopener noreferrer"&gt;virtual key&lt;/a&gt;. Every developer, team, application, or external customer receives a distinct virtual key that carries its own access policy. The underlying provider API credentials remain locked inside the gateway and are never handed to individual consumers, which removes key sprawl and credential-rotation overhead in one step.&lt;/p&gt;

&lt;p&gt;Virtual keys enforce:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model access rules&lt;/strong&gt;: which providers and models a given key is permitted to call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Budget caps&lt;/strong&gt;: hard spending ceilings with configurable reset windows (daily, weekly, monthly)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limits&lt;/strong&gt;: per-minute and per-hour maximums on both requests and tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP tool filtering&lt;/strong&gt;: which Model Context Protocol tools are exposed to that key&lt;/li&gt;
&lt;/ul&gt;
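&lt;p&gt;As a mental model, the policy attached to a virtual key behaves roughly like the sketch below. This is an illustrative Python model only; the field names and the &lt;code&gt;authorize&lt;/code&gt; method are hypothetical, not Bifrost's actual configuration schema or API.&lt;/p&gt;

```python
from dataclasses import dataclass, field

@dataclass
class VirtualKey:
    # Illustrative policy object; field names are hypothetical,
    # not Bifrost's actual configuration schema.
    allowed_models: set
    allowed_tools: set = field(default_factory=set)
    budget_usd: float = 0.0
    spent_usd: float = 0.0

    def authorize(self, model, tool, est_cost):
        """Reject on model access, tool exposure, or budget ceiling."""
        if model not in self.allowed_models:
            return False
        if tool is not None and tool not in self.allowed_tools:
            return False
        # Remaining budget must cover the estimated request cost.
        return self.budget_usd - self.spent_usd >= est_cost

key = VirtualKey(
    allowed_models={"gpt-4o"},
    allowed_tools={"search", "read_file"},
    budget_usd=50.0,
)
print(key.authorize("gpt-4o", "search", 0.02))       # True: all checks pass
print(key.authorize("gpt-4o", "delete_repo", 0.02))  # False: tool not exposed to this key
```

&lt;p&gt;Rate limits would slot into the same check as per-minute request and token counters; the point is that one object carries every rule for one consumer.&lt;/p&gt;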

&lt;h3&gt;
  
  
  Hierarchical budget controls
&lt;/h3&gt;

&lt;p&gt;Real enterprises need cost discipline at multiple levels simultaneously. Bifrost supports a hierarchical model that tracks budgets independently at the customer, team, and virtual key tiers. A group of engineers can share a monthly team budget while each developer's personal key also carries an individual cap, giving platform teams two layers of financial guardrails. Teams pairing Bifrost with coding agents can find a detailed example in the &lt;a href="https://www.getmaxim.ai/bifrost/blog/bifrost-mcp-gateway-access-control-cost-governance-and-92-lower-token-costs-at-scale" rel="noopener noreferrer"&gt;Bifrost MCP gateway writeup&lt;/a&gt; covering access control, cost governance, and token-reduction patterns.&lt;/p&gt;
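&lt;p&gt;The enforcement rule is simple to state: a request passes only if every tier the key rolls up to still has headroom, and a successful request is charged against all of them. A minimal sketch, with invented tier names and amounts:&lt;/p&gt;

```python
# Remaining headroom in USD per tier (hypothetical numbers).
budgets = {
    "customer:acme": 1000.0,
    "team:platform": 200.0,
    "vk:alice": 25.0,
}

def authorize(tiers, est_cost):
    """Pass only if every tier has headroom, then charge all tiers."""
    if all(budgets[t] >= est_cost for t in tiers):
        for t in tiers:
            budgets[t] -= est_cost
        return True
    return False

chain = ["customer:acme", "team:platform", "vk:alice"]
print(authorize(chain, 20.0))  # True: every tier has headroom
print(authorize(chain, 20.0))  # False: alice's key has only 5.0 left
```

&lt;p&gt;The second call fails on the individual cap even though the team and customer tiers could absorb it, which is exactly the layered guardrail behavior described above.&lt;/p&gt;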

&lt;h3&gt;
  
  
  Content safety and guardrails
&lt;/h3&gt;

&lt;p&gt;Access control only covers half of governance. Protecting what goes in and what comes out is the other half. Bifrost's &lt;a href="https://docs.getbifrost.ai/enterprise/guardrails" rel="noopener noreferrer"&gt;enterprise guardrails&lt;/a&gt; integrate with AWS Bedrock Guardrails, Azure Content Safety, and Patronus AI to apply content policies, PII redaction, and safety classification on both request and response paths. Because policies attach to virtual keys, the same rules are applied consistently no matter which application or agent is making the call. For a broader view of these patterns, teams can consult the Bifrost &lt;a href="https://www.getmaxim.ai/bifrost/resources/guardrails" rel="noopener noreferrer"&gt;guardrails resource page&lt;/a&gt;, which covers the enforcement surface in more depth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Identity, RBAC, and compliance
&lt;/h3&gt;

&lt;p&gt;Enterprise deployments demand that every governance decision trace back to a real identity. For single sign-on, Bifrost plugs into OpenID Connect providers such as Okta and Entra. Role-based access control with custom role definitions is built in, and the gateway produces immutable audit logs that line up with the evidence expectations of SOC 2, GDPR, HIPAA, and ISO 27001. Credential handling can be delegated to backends like AWS Secrets Manager, HashiCorp Vault, Azure Key Vault, or Google Secret Manager, which means secret material never needs to live inside configuration files.&lt;/p&gt;

&lt;h3&gt;
  
  
  Observability that underpins audit
&lt;/h3&gt;

&lt;p&gt;Governance cannot be verified without observability beneath it. Bifrost emits native Prometheus metrics, supports OpenTelemetry (OTLP) distributed tracing, and exposes request-level logs that can flow into data lakes and SIEMs for long-term retention. Every request is tagged with the virtual key, user ID, provider, model, and token count, which is the baseline metadata regulators and internal audit teams expect to see.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting a Practical AI Governance Program Into Motion
&lt;/h2&gt;

&lt;p&gt;Frameworks and tools are necessary but not sufficient on their own. A workable AI governance program usually moves through five phases.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inventory&lt;/strong&gt;: catalog every AI system, model, and integration in use, shadow AI included. This aligns directly with the "Map" function in the NIST RMF.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Classify&lt;/strong&gt;: rank each system by risk tier using criteria drawn from the EU AI Act or an internal rubric. The highest-risk systems attract the tightest controls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Centralize access&lt;/strong&gt;: route all LLM and agent traffic through a single governed entry point so policy can apply uniformly. At this step, a gateway stops being optional and becomes structural.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enforce and evaluate&lt;/strong&gt;: apply runtime controls (budgets, rate limits, guardrails, tool filtering) and continuously assess model quality, safety, and compliance outcomes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document and audit&lt;/strong&gt;: maintain evidence of the controls, decisions, and incidents involved. Both ISO/IEC 42001 and the EU AI Act expect demonstrable records, not bare assertions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Teams in regulated verticals such as &lt;a href="https://www.getmaxim.ai/bifrost/industry-pages/financial-services-and-banking" rel="noopener noreferrer"&gt;financial services&lt;/a&gt; and &lt;a href="https://www.getmaxim.ai/bifrost/industry-pages/healthcare-life-sciences" rel="noopener noreferrer"&gt;healthcare and life sciences&lt;/a&gt; will find deployment patterns that address sector-specific compliance obligations on top of the general governance baseline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started With AI Governance on Bifrost
&lt;/h2&gt;

&lt;p&gt;AI governance is no longer a policy document tucked away in a risk register. It is now a runtime property of the infrastructure carrying AI traffic through an organization. By consolidating model access, budgets, guardrails, observability, and audit logging inside a single open-source gateway, Bifrost turns governance from a set of intentions into enforced behavior on every request. To see how Bifrost supports enterprise AI governance in production, &lt;a href="https://getmaxim.ai/bifrost/book-a-demo" rel="noopener noreferrer"&gt;book a Bifrost demo&lt;/a&gt; with the team.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The Hidden Cost of Multiple MCP Servers in Agent Infrastructure</title>
      <dc:creator>Kuldeep Paul</dc:creator>
      <pubDate>Mon, 20 Apr 2026 19:54:39 +0000</pubDate>
      <link>https://dev.to/kuldeep_paul/the-hidden-cost-of-multiple-mcp-servers-in-agent-infrastructure-286b</link>
      <guid>https://dev.to/kuldeep_paul/the-hidden-cost-of-multiple-mcp-servers-in-agent-infrastructure-286b</guid>
      <description>&lt;p&gt;&lt;em&gt;Every MCP server you connect adds hidden cost to agent workloads. See where token bloat, result ping-pong, and governance drift actually come from.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;There is a familiar pattern in production AI engineering. An agent ships, more tools get wired in over time, and one day the finance team flags the model-API bill. The hidden cost of multiple MCP servers rarely lands as a single obvious charge. It leaks in through input tokens that creep up by the week, latencies that drift higher, audit trails that never quite come together, and tool-level API invoices that no one attributed to a workflow when they were approved. None of this is accidental. The classic MCP execution model scales these costs in direct proportion to how many servers a team connects. Bifrost, the open-source AI gateway by Maxim AI, targets this as an infrastructure-layer problem, not an application one, and removes the pressure at its source.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Mechanics of Multiple MCP Servers in an Agent Runtime
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; is an open standard that lets AI agents reach external systems and tools through a uniform interface. In the default runtime, connecting an agent to an MCP server means injecting that server's entire tool catalog into the model's context window on each request. Five servers with thirty tools apiece means one hundred fifty tool definitions shipped to the model before the user prompt is even parsed. That mechanic is the root of the hidden cost of MCP servers.&lt;/p&gt;

&lt;p&gt;Anthropic's engineering team laid out this pattern in their work on &lt;a href="https://www.anthropic.com/engineering/code-execution-with-mcp" rel="noopener noreferrer"&gt;code execution with MCP&lt;/a&gt;, pointing out that rising tool counts and intermediate results moving through context are what drive agent cost and latency upward. Cloudflare reported the same observation in their &lt;a href="https://blog.cloudflare.com/code-mode/" rel="noopener noreferrer"&gt;Code Mode research&lt;/a&gt;, describing how directly exposing MCP tools to the model burns tokens every time the agent has to chain calls together. The two write-ups arrive at the same conclusion from different angles: the protocol itself is sound, but the default way of running it does not hold up at production scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Driver One: Tool Definitions Injected on Every Request
&lt;/h2&gt;

&lt;p&gt;The most direct cost is the token tax on tool definitions. Classic MCP sends the full catalog of tools from every connected server into the context window of each request, whether or not any of them will actually be used in that turn.&lt;/p&gt;

&lt;p&gt;Here is how quickly the math turns against you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Individual MCP servers typically ship twenty to fifty tools each&lt;/li&gt;
&lt;li&gt;Production agents often sit on top of five or more servers&lt;/li&gt;
&lt;li&gt;A single tool definition can range from fifty tokens for a simple function to several hundred for a complex schema&lt;/li&gt;
&lt;li&gt;The entire catalog is re-loaded on every turn of the agent loop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anthropic's team has noted that agents with access to thousands of tools may burn hundreds of thousands of tokens on definitions alone before reading the user's input. For a ten-server setup sitting at around one hundred fifty tools, definition overhead frequently becomes the majority of the input token footprint. The cost is folded into the input token line, so it looks like noise on a per-request basis. It only becomes obvious when someone divides the monthly bill by the number of requests.&lt;/p&gt;
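&lt;p&gt;The arithmetic is easy to reproduce. Using the figures above (five servers, thirty tools each, 50 to 300 tokens per definition, and an assumed eight-turn agent loop), a quick sketch of the overhead:&lt;/p&gt;

```python
tools = 5 * 30          # five servers, thirty tools apiece
per_def = (50, 300)     # token range per tool definition
turns = 8               # the catalog is re-sent on every agent turn

low, high = tools * per_def[0], tools * per_def[1]
print(low, high)                  # 7500 45000 tokens per turn
print(low * turns, high * turns)  # 60000 360000 tokens per agent loop
```

&lt;p&gt;Even at the low end, definition overhead dwarfs a typical user prompt before the first tool is ever called.&lt;/p&gt;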

&lt;h2&gt;
  
  
  Cost Driver Two: Intermediate Results Round-Trip Through the Model
&lt;/h2&gt;

&lt;p&gt;The second cost is less visible but often larger. In the standard MCP execution loop, every tool result is routed back through the model, even when the model's only job is to forward that data into the next tool. A Google Drive to Salesforce workflow is the canonical example from Anthropic's research: the entire meeting transcript flows through the model once on retrieval from Drive, then a second time on the way into Salesforce.&lt;/p&gt;

&lt;p&gt;Anthropic reported that moving the same workflow from direct tool calls to code-based execution dropped input token usage from roughly 150,000 tokens to 2,000 tokens, around a 98.7% reduction. The exact figure belongs to one benchmark, but the ratio illustrates how much of the total MCP cost is simply data transiting the model instead of being handled next to it.&lt;/p&gt;

&lt;p&gt;The pattern compounds as workflows grow longer. Each additional tool call adds another round trip. Every intermediate payload (spreadsheet rows, document bodies, API responses) gets re-serialized into context. Input token counts grow in step with payload size while reasoning quality does not.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Driver Three: Governance and Observability Gaps
&lt;/h2&gt;

&lt;p&gt;Token cost is the easiest piece of the hidden cost of multiple MCP servers to measure. It is not the most expensive one. Running MCP servers without a centralized gateway pushes several other costs into the system that rarely show up in a budget line:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Credential sprawl&lt;/strong&gt;: Each MCP server connection carries its own credentials, so every new connection widens the security surface&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Split audit trails&lt;/strong&gt;: Tool executions are logged by the agent host rather than a single system that ties each call back to the caller, their permissions, and the parent LLM request&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Opaque tool-level spend&lt;/strong&gt;: Paid third-party APIs invoked through MCP tools generate charges that are hard to pin to a specific agent, workflow, or customer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment drift&lt;/strong&gt;: Every agent maintains its own server list, so tool access ends up inconsistent across dev, staging, and production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scope without limits&lt;/strong&gt;: In the absence of tool-level access controls, every agent can reach every tool on every connected server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not hypothetical concerns. A customer-facing agent that can reach internal admin tooling is a real governance incident. An enterprise AI rollout without first-class audit logs on tool calls will not survive a SOC 2 review. Retroactive fixes typically cost more than building the right layer on day one. Bifrost's &lt;a href="https://www.getmaxim.ai/bifrost/resources/governance" rel="noopener noreferrer"&gt;governance capabilities&lt;/a&gt; show how access control, scoped credentials, and audit trails fit together in one infrastructure layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cutting the Tool List Is a Trade-Off, Not a Fix
&lt;/h2&gt;

&lt;p&gt;The standard advice for shrinking MCP token usage is to cut the tool list. Fewer tools means fewer definitions in context, and per-request overhead drops. Arithmetically that works. As an engineering decision, it does not. Pruning is a trade-off, not a resolution. Every removed tool narrows agent capability, and the team ends up choosing between cost and what the agent can actually do.&lt;/p&gt;

&lt;p&gt;Anthropic's engineering team described the same mismatch. The scaling problem is rooted in the execution architecture, not in the tool count. Tool-list length is only a symptom. A fix that holds up over time has to change how tools get exposed and composed, not just how many of them are on the table at once.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bifrost's Approach to the Hidden Cost of MCP Servers
&lt;/h2&gt;

&lt;p&gt;Bifrost's &lt;a href="https://www.getmaxim.ai/bifrost/resources/mcp-gateway" rel="noopener noreferrer"&gt;MCP gateway&lt;/a&gt; puts every connected server behind a single entry point and introduces a different execution path called &lt;a href="https://docs.getbifrost.ai/mcp/code-mode" rel="noopener noreferrer"&gt;Code Mode&lt;/a&gt;. Code Mode removes the token tax without asking teams to drop tools. Rather than pushing every tool definition into context on every request, it presents connected servers as a virtual filesystem of Python stub files. The model reads only the definitions relevant to the task at hand, writes a short orchestration script, and Bifrost runs that script inside a sandboxed Starlark interpreter. The full approach, benchmarks, and operational details live in our deep dive on &lt;a href="https://www.getmaxim.ai/bifrost/blog/bifrost-mcp-gateway-access-control-cost-governance-and-92-lower-token-costs-at-scale" rel="noopener noreferrer"&gt;Bifrost's MCP gateway and 92% lower token costs at scale&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In Code Mode, the model interacts with four meta-tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;listToolFiles&lt;/strong&gt;: Enumerate the servers and tools available to this agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;readToolFile&lt;/strong&gt;: Pull the Python function signatures for a specific server or tool&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;getToolDocs&lt;/strong&gt;: Retrieve detailed documentation for a single tool on demand&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;executeToolCode&lt;/strong&gt;: Execute the orchestration script against live tool bindings&lt;/li&gt;
&lt;/ul&gt;
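&lt;p&gt;A single Code Mode turn strings these together: discover, read one stub, execute. The sequence below is a sketch from the model's side; the meta-tool names come from Bifrost's documentation, but the stub bodies and return values here are invented for illustration.&lt;/p&gt;

```python
# Stand-ins for the four meta-tools; real calls go through the gateway.
def list_tool_files():
    return ["github/", "slack/"]                  # virtual filesystem roots

def read_tool_file(path):
    return "def create_issue(repo, title): ..."   # Python signatures only

def get_tool_docs(tool):
    return "create_issue: opens a GitHub issue and returns its URL"

def execute_tool_code(script):
    return {"status": "ok"}                       # runs in the Starlark sandbox

# One turn: only the relevant stub enters context, not the full catalog.
servers = list_tool_files()
signature = read_tool_file("github/create_issue.py")
result = execute_tool_code("create_issue('org/repo', 'Fix flaky test')")
print(result["status"])  # ok
```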

&lt;p&gt;Bifrost's &lt;a href="https://www.getmaxim.ai/bifrost/resources/benchmarks" rel="noopener noreferrer"&gt;controlled benchmarks&lt;/a&gt; captured how the savings scale with tool count. At 96 tools across 6 servers, input tokens fell 58%. At 251 tools across 11 servers, they fell 84%. At 508 tools across 16 servers, they fell 92%. Pass rate stayed at 100% in every round. Savings compound with MCP footprint instead of eroding, the reverse of the classic MCP curve, and that direction is consistent with what Anthropic and Cloudflare have reported in their own evaluations.&lt;/p&gt;

&lt;p&gt;On the governance axis, Bifrost's &lt;a href="https://docs.getbifrost.ai/features/governance/virtual-keys" rel="noopener noreferrer"&gt;virtual keys&lt;/a&gt; give teams a way to issue scoped credentials per consumer, with access enforced at the tool level rather than only the server level. Every tool execution becomes a first-class &lt;a href="https://docs.getbifrost.ai/enterprise/audit-logs" rel="noopener noreferrer"&gt;audit entry&lt;/a&gt; capturing tool name, arguments, result, latency, the virtual key that called it, and the parent LLM request. Per-tool cost tracking sits alongside LLM token cost in the same pane, making spend attribution to a specific agent, customer, or workflow straightforward.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Checklist for Production MCP Infrastructure
&lt;/h2&gt;

&lt;p&gt;A short scoring rubric helps separate durable MCP infrastructure from a collection of server connections held together by config files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context efficiency&lt;/strong&gt;: Does execution cost stay roughly flat as tools are added, or does every new server raise the price of every request?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scoped access&lt;/strong&gt;: Can permissions be set at the tool level, not just the server level, and can they vary between consumers?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit completeness&lt;/strong&gt;: Is each tool call a first-class log record that traces back to a credential and a parent LLM request?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost attribution&lt;/strong&gt;: Can tool API spend be reconciled against LLM token spend per workflow, customer, or agent?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational consolidation&lt;/strong&gt;: Do agents share one MCP entry point, or does each one maintain an independent server list?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A system that answers yes across the board treats multiple MCP servers as a managed fleet rather than a loose set of integrations. For teams comparing gateway options more directly, the &lt;a href="https://www.getmaxim.ai/bifrost/resources/buyers-guide" rel="noopener noreferrer"&gt;LLM Gateway Buyer's Guide&lt;/a&gt; covers MCP, governance, observability, and performance side by side.&lt;/p&gt;

&lt;h2&gt;
  
  
  Run Your Agent on Bifrost
&lt;/h2&gt;

&lt;p&gt;The hidden cost of multiple MCP servers is not an edge-case failure mode. It is the expected result of running classic MCP at production scale, and it grows a little more each time a team wires in another server. Bifrost's MCP gateway neutralizes the token tax, the round-trip overhead, and the governance drift at the gateway level, so agent capability stops costing more than it should. To see the hidden cost of MCP infrastructure come down in your own environment, &lt;a href="https://getmaxim.ai/bifrost/book-a-demo" rel="noopener noreferrer"&gt;book a demo&lt;/a&gt; with the Bifrost team.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>MCP Token Costs at Scale: How Code Mode Drives a 92% Reduction</title>
      <dc:creator>Kuldeep Paul</dc:creator>
      <pubDate>Mon, 20 Apr 2026 19:52:38 +0000</pubDate>
      <link>https://dev.to/kuldeep_paul/mcp-token-costs-at-scale-how-code-mode-drives-a-92-reduction-2fgn</link>
      <guid>https://dev.to/kuldeep_paul/mcp-token-costs-at-scale-how-code-mode-drives-a-92-reduction-2fgn</guid>
      <description>&lt;p&gt;&lt;em&gt;Scaling MCP tools inflates context windows fast. See how Bifrost Code Mode cuts MCP token costs by up to 92.8% at 500+ tools in verified benchmarks.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For teams running production agents, MCP token costs are the infrastructure line item that nobody sees coming. An agent that behaved fine with three connected Model Context Protocol servers turns into a four-figure daily bill once the connected server count passes twenty and the tool catalog crosses a few hundred entries. Neither the model nor the prompt is responsible. The root cause is the default MCP execution path, which ships every tool definition from every connected server into the model's context on every request. Bifrost, the open-source AI gateway from Maxim AI, tackles the problem at its source with a Code Mode execution path that, in controlled benchmarks, dropped input tokens by 92.8% at 508 tools without sacrificing any accuracy.&lt;/p&gt;

&lt;p&gt;What follows is a breakdown of why MCP token costs scale so sharply, why teams across the industry are coalescing around code execution as the answer, and what the Bifrost benchmark tells us about the true cost of running agents in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes MCP Token Costs Blow Up at Scale
&lt;/h2&gt;

&lt;p&gt;Model Context Protocol, originally introduced by Anthropic, is the standard that defines how AI applications wire up to external tools. By default, MCP clients serialize every tool definition from every connected server into the model's context on every single turn. Anthropic's own engineering team laid out the consequence in their writeup on &lt;a href="https://www.anthropic.com/engineering/code-execution-with-mcp" rel="noopener noreferrer"&gt;code execution with MCP&lt;/a&gt;: as connected tool counts grow, upfront tool loading combined with intermediate-result round-trips drives up latency and pushes cost in the wrong direction.&lt;/p&gt;

&lt;p&gt;The math is brutal. With five MCP servers at thirty tools each, the model takes in 150 tool schemas before it ever sees the actual user request. Multi-turn workflows make the situation worse because large intermediate payloads (documents, datasets, API responses) loop back through the context more than once. A single cross-system workflow that pulls a transcript from Google Drive and writes it to Salesforce, for instance, can drop from around 150,000 tokens under the default flow to roughly 2,000 tokens when the same job is expressed as code execution, roughly a 98.7% savings.&lt;/p&gt;
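&lt;p&gt;The headline percentage falls straight out of those two numbers:&lt;/p&gt;

```python
before, after = 150_000, 2_000        # input tokens: default flow vs. code execution
reduction = (before - after) / before
print(f"{reduction:.1%}")             # 98.7%
```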

&lt;p&gt;Four distinct forces turn MCP adoption into a climbing cost curve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Entire tool catalog sent on every turn&lt;/strong&gt;: all tool schemas are serialized into the prompt regardless of whether the model will actually invoke them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Round-tripped intermediate outputs&lt;/strong&gt;: tool results travel back through context before being fed into the next tool call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repeated catalog loads across agent loops&lt;/strong&gt;: each new turn re-injects the full tool list into context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost scaling with server fanout&lt;/strong&gt;: each new MCP server is roughly linear integration work but exacts a proportional token tax on every request&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The obvious workaround, pruning the tool list, is not actually a fix. It swaps capability for cost and calls the tradeoff an optimization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code Execution Is Becoming the Default Pattern for MCP
&lt;/h2&gt;

&lt;p&gt;Over the past several months, a cleaner pattern has been emerging across AI infrastructure teams. Rather than presenting tools to the model as a flat set of function-call schemas, the tools are exposed as a typed API, and the model is asked to write a compact program that orchestrates the calls. Documentation is pulled on demand, logic runs locally, and only the final result is handed back.&lt;/p&gt;

&lt;p&gt;Cloudflare took this public first with Code Mode. In the &lt;a href="https://blog.cloudflare.com/code-mode/" rel="noopener noreferrer"&gt;Code Mode announcement&lt;/a&gt;, they reported that rendering an MCP server as a TypeScript API and asking the model to write code against it produced roughly an 81% drop in token usage versus direct tool calling. A follow-up implementation went even further: the &lt;a href="https://developers.cloudflare.com/agents/model-context-protocol/mcp-servers-for-cloudflare/" rel="noopener noreferrer"&gt;Cloudflare MCP server&lt;/a&gt; now fronts the full Cloudflare API, more than 2,500 endpoints spanning DNS, Workers, R2, and Zero Trust, behind two meta-tools (&lt;code&gt;search()&lt;/code&gt; and &lt;code&gt;execute()&lt;/code&gt;) that together consume about 1,000 tokens regardless of catalog size. Wrapping the same surface as a flat tool list would blow past a million tokens, which is larger than the context window of most foundation models.&lt;/p&gt;

&lt;p&gt;Anthropic's engineering team arrived at the same design independently, describing it as a way to give agents more tools while spending fewer tokens. The pattern now has two common names in the wild: code execution with MCP and Code Mode. It has three defining properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MCP tools are presented to the model as a filesystem of typed API stubs rather than as a flat tool list&lt;/li&gt;
&lt;li&gt;The model only loads the stubs it needs for the current task&lt;/li&gt;
&lt;li&gt;The model writes a short script that executes in a sandbox, invoking tools directly and handing back only the final output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bifrost Code Mode is this pattern, implemented at the gateway layer, inside the same control plane that already runs routing, access control, and observability. Teams evaluating how it fits into a broader architecture can skim the &lt;a href="https://www.getmaxim.ai/bifrost/resources/mcp-gateway" rel="noopener noreferrer"&gt;Bifrost MCP gateway resource page&lt;/a&gt; for the complete feature surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Bifrost Code Mode Works: Python Stubs, Meta-Tools, and a Starlark Sandbox
&lt;/h2&gt;

&lt;p&gt;Inside Bifrost, connected MCP servers are rendered as a virtual filesystem of lightweight Python stub files. Python was chosen over JavaScript deliberately. Large language models have encountered far more real-world Python than any other language during training, which translates into higher first-pass success rates on generated orchestration scripts. A dedicated documentation tool trims the footprint further, letting the model pull docstrings for a specific tool only at the moment it is about to call it.&lt;/p&gt;

&lt;p&gt;Four meta-tools give the model everything it needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;listToolFiles&lt;/code&gt;: surface the available servers and tools&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;readToolFile&lt;/code&gt;: fetch the Python function signatures for a given server or tool&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;getToolDocs&lt;/code&gt;: pull detailed documentation for a specific tool before invoking it&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;executeToolCode&lt;/code&gt;: run the orchestration script against the live tool bindings&lt;/li&gt;
&lt;/ul&gt;
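&lt;p&gt;A simplified simulation makes the loop concrete. Everything below is illustrative: the registry, tool names, and plain-Python stand-ins are hypothetical, not Bifrost's actual API, and in real Code Mode the script arrives as source text for the sandbox rather than as a callable.&lt;/p&gt;

```python
# Illustrative simulation of the four meta-tools over a toy registry.
REGISTRY = {
    "github": {"search_issues": "search_issues(query)"},
    "slack": {"post_message": "post_message(channel, text)"},
}

def listToolFiles():
    return sorted(REGISTRY)  # surface the available servers

def readToolFile(server):
    return list(REGISTRY[server].values())  # compact signatures only

def getToolDocs(server, tool):
    return f"docs for {server}.{tool}"  # pulled just before invocation

def executeToolCode(script, bindings):
    # Stand-in for the Starlark sandbox: run the script against live
    # bindings and hand back only its final result.
    return script(bindings)

def orchestration(tools):
    # The kind of short script the model writes after reading stubs.
    issues = tools["search_issues"]("label:bug")
    return tools["post_message"]("#eng", f"{len(issues)} open bugs")

bindings = {
    "search_issues": lambda q: ["#101", "#202"],
    "post_message": lambda ch, text: f"sent to {ch}: {text}",
}
print(executeToolCode(orchestration, bindings))  # sent to #eng: 2 open bugs
```

&lt;p&gt;Note what never touches the model's context: the full issue list stays inside the sandbox, and only the one-line result comes back.&lt;/p&gt;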

&lt;p&gt;Generated code is executed inside a Starlark interpreter sandbox that blocks imports, file I/O, and network access. That restriction keeps runs deterministic, fast, and safe to trigger automatically inside an agent loop. Platform teams can choose server-level stubs for compact discovery or tool-level stubs for finer permission control. Because tool scoping is enforced through &lt;a href="https://docs.getbifrost.ai/features/governance/virtual-keys" rel="noopener noreferrer"&gt;virtual keys&lt;/a&gt;, a model that lacks permission to call a tool never sees that tool's definition in the first place. The broader governance picture, including MCP Tool Groups and per-tool cost accounting, is documented in the Bifrost engineering deep-dive on &lt;a href="https://www.getmaxim.ai/bifrost/blog/bifrost-mcp-gateway-access-control-cost-governance-and-92-lower-token-costs-at-scale" rel="noopener noreferrer"&gt;MCP access control, cost governance, and 92% lower token costs at scale&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers: Three Benchmark Rounds Across 96, 251, and 508 Tools
&lt;/h2&gt;

&lt;p&gt;The savings were measured with three controlled rounds, flipping Code Mode on and off while scaling the tool count between rounds. The same query set was used against the same models in every configuration, and pass rate was tracked throughout to confirm the reduction did not come at the expense of accuracy. The full methodology sits alongside additional performance data on the &lt;a href="https://www.getmaxim.ai/bifrost/resources/benchmarks" rel="noopener noreferrer"&gt;Bifrost benchmarks resource page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Headline outcomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Round 1, 96 tools across 6 servers&lt;/strong&gt;: input tokens moved from 19.9M to 8.3M (−58.2%); estimated cost dropped from $104.04 to $46.06 (−55.7%); pass rate stayed at 100% in both configurations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Round 2, 251 tools across 11 servers&lt;/strong&gt;: input tokens moved from 35.7M to 5.5M (−84.5%); estimated cost dropped from $180.07 to $29.80 (−83.4%); pass rate reached 100% with Code Mode enabled&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Round 3, 508 tools across 16 servers&lt;/strong&gt;: input tokens moved from 75.1M to 5.4M (−92.8%); estimated cost dropped from $377.00 to $29.00 (−92.2%); pass rate stayed at 100% in both configurations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two patterns emerge from the data. First, the savings are nonlinear. They compound as the MCP footprint grows, because the default pattern's cost tracks with tool count while Code Mode's cost tracks with what the model actually reads. Second, the gain did not cost accuracy: pass rate held at 100% in Rounds 1 and 3 under both configurations and reached 100% in Round 2 with Code Mode enabled. The complete raw data, query set, and methodology are published in the &lt;a href="https://github.com/maximhq/bifrost-benchmarking/blob/main/mcp-code-mode-benchmark/benchmark_report.md" rel="noopener noreferrer"&gt;Bifrost MCP Code Mode benchmark report&lt;/a&gt;.&lt;/p&gt;
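&lt;p&gt;The percentages are easy to re-derive from the rounded round data above (token counts in millions, costs in dollars); each recomputed figure lands within a tenth of a point of the published number, the slack coming from rounding in the inputs:&lt;/p&gt;

```python
# Recompute the reported reductions from the round data quoted above.
# Token counts are in millions, costs in dollars; tiny deviations from
# the published percentages come from rounding in these inputs.
rounds = [
    ("Round 1 (96 tools)", 19.9, 8.3, 104.04, 46.06),
    ("Round 2 (251 tools)", 35.7, 5.5, 180.07, 29.80),
    ("Round 3 (508 tools)", 75.1, 5.4, 377.00, 29.00),
]

for name, tok_off, tok_on, cost_off, cost_on in rounds:
    tok_drop = round(100 * (tok_off - tok_on) / tok_off, 1)
    cost_drop = round(100 * (cost_off - cost_on) / cost_off, 1)
    print(f"{name}: tokens -{tok_drop}%, cost -{cost_drop}%")

# At 508 tools the two approaches sit in different cost regimes entirely.
print(round(75.1 / 5.4, 1))   # roughly 14x fewer input tokens per query
print(round(377.00 / 29.00))  # total cost ratio near 13 to 1
```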

&lt;h2&gt;
  
  
  Reading the Cost Curve: What It Says About MCP Economics
&lt;/h2&gt;

&lt;p&gt;The most instructive part of the benchmark is the shape of the savings curve itself. Around 100 tools, Code Mode produces a solid but unspectacular advantage. At 250 tools, the gap widens noticeably. By 500 tools, the two approaches operate in entirely different cost regimes, with roughly 14× fewer input tokens per query and a total cost ratio near 13 to 1.&lt;/p&gt;

&lt;p&gt;Three takeaways follow for teams architecting AI infrastructure in 2026:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context economics, not tool count, sets the ceiling on agent capability.&lt;/strong&gt; The right question has shifted from "how many tools can we connect?" to "how many tools can we afford to expose on every turn?" Code execution removes that ceiling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP governance and MCP cost are the same problem wearing two hats.&lt;/strong&gt; The cleanest way to stop paying for tool definitions that go unused is to stop injecting them into context by default. Scoped access through virtual keys, tool groups, and per-tool bindings shrinks both the blast radius and the token bill simultaneously.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The gateway layer is the correct place to solve this.&lt;/strong&gt; Implementing code execution once per agent or per application is fragile and duplicative. Solving it inside a gateway that already handles routing, authentication, and observability gives every MCP consumer the same economics with zero client changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Rolling Out Code Mode in Production
&lt;/h2&gt;

&lt;p&gt;Switching Code Mode on inside Bifrost is a configuration change, not a migration project. In practice, the rollout that has worked best follows four steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Register the MCP clients&lt;/strong&gt;: add each MCP server along with its connection type (HTTP, SSE, STDIO, or in-process). Bifrost discovers the available tools and starts syncing them on a configurable interval.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flip Code Mode on per client&lt;/strong&gt;: toggle Code Mode in the client settings and the four meta-tools replace the flat tool catalog automatically. No schema changes and no redeployment are involved.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mark safe tools as auto-executable&lt;/strong&gt;: add read-only tools to the auto-execute allowlist. &lt;code&gt;executeToolCode&lt;/code&gt; only becomes auto-executable once every tool the generated script calls is itself on the allowlist, which keeps write operations behind an explicit approval gate by default.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scope consumers with virtual keys and MCP Tool Groups&lt;/strong&gt;: issue a scoped credential per consumer and bundle tools into named groups that attach to keys, teams, or customers. Access and &lt;a href="https://www.getmaxim.ai/bifrost/resources/governance" rel="noopener noreferrer"&gt;enterprise AI governance&lt;/a&gt; policies are applied at request time.&lt;/li&gt;
&lt;/ol&gt;
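&lt;p&gt;The auto-execute rule for generated scripts reduces to a single predicate: every tool the script calls must itself be allowlisted. A minimal sketch, with hypothetical tool names:&lt;/p&gt;

```python
# Minimal sketch of the auto-execute gate described above: a generated
# script runs unattended only if all of its tool calls are allowlisted.
def auto_executable(called_tools, allowlist):
    return set(called_tools).issubset(allowlist)

allowlist = {"fs.read_file", "github.search_issues"}  # read-only tools

# A purely read-only script clears the gate.
print(auto_executable(["fs.read_file"], allowlist))                    # True

# One write operation drops the whole script behind manual approval.
print(auto_executable(["fs.read_file", "fs.write_file"], allowlist))   # False
```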

&lt;p&gt;Every tool invocation is written to an audit log as a first-class entry with the tool name, the source server, the arguments passed in, the result returned, the latency, the virtual key that triggered the call, and the parent LLM request that started the agent loop. That level of telemetry turns the cost curve from anecdote into something teams can audit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start Cutting MCP Token Costs Today
&lt;/h2&gt;

&lt;p&gt;When tool exposure is decoupled from context loading, MCP token costs stop being the ceiling on how far an agent can scale. The Bifrost benchmark at 508 tools across 16 servers delivered a 92.8% drop in input tokens and a 92.2% drop in estimated cost with no loss of accuracy, and the gap keeps widening as the tool catalog grows. To see how Bifrost handles MCP token cost optimization, governance, and observability in a live environment, &lt;a href="https://getmaxim.ai/bifrost/book-a-demo" rel="noopener noreferrer"&gt;book a Bifrost demo&lt;/a&gt; with the team.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Bifrost MCP Gateway: Cutting Token Costs in Claude Code and Codex CLI by 92%</title>
      <dc:creator>Kuldeep Paul</dc:creator>
      <pubDate>Thu, 16 Apr 2026 19:54:00 +0000</pubDate>
      <link>https://dev.to/kuldeep_paul/bifrost-mcp-gateway-cutting-token-costs-in-claude-code-and-codex-cli-by-92-2o1b</link>
      <guid>https://dev.to/kuldeep_paul/bifrost-mcp-gateway-cutting-token-costs-in-claude-code-and-codex-cli-by-92-2o1b</guid>
      <description>&lt;p&gt;&lt;em&gt;Bifrost MCP Gateway cuts token costs in Claude Code and Codex CLI by up to 92% through Code Mode, tool filtering, and unified governance.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Claude Code, Codex CLI, and every other coding agent on the market share one expensive habit: they consume tokens at an alarming rate. Plug in a handful of MCP servers for filesystem access, GitHub operations, internal APIs, or database tooling, and the full tool catalog gets serialized into the agent's context on every loop iteration. Most engineering teams notice the damage only after the monthly bill lands. Bifrost MCP Gateway addresses the underlying problem by rethinking how tools reach the model, pairing &lt;a href="https://docs.getbifrost.ai/mcp/code-mode" rel="noopener noreferrer"&gt;Code Mode&lt;/a&gt; with per-consumer &lt;a href="https://docs.getbifrost.ai/features/governance/virtual-keys" rel="noopener noreferrer"&gt;virtual keys&lt;/a&gt; plus fine-grained tool filtering, so coding agents spend a fraction of the tokens they would otherwise waste. In controlled tests spanning 508 tools across 16 MCP servers, token usage collapsed by 92.8% while the pass rate stayed pinned at 100%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Tool Bloat in MCP Drains Coding Agent Tokens
&lt;/h2&gt;

&lt;p&gt;The default behavior of classic MCP is costly: every tool schema from every connected server gets pushed into the model's prompt on every request. For a coding agent fronted by five MCP servers carrying thirty tools apiece, that means 150 tool schemas land before the model has parsed the first line of your instruction. Push the setup further, to 16 servers with roughly 500 tools, and the problem compounds, because classic MCP resends every definition on every call regardless of which tools the model will invoke.&lt;/p&gt;

&lt;p&gt;Anthropic's own engineering team called this out directly. A recent writeup on &lt;a href="https://www.anthropic.com/engineering/code-execution-with-mcp" rel="noopener noreferrer"&gt;code execution with MCP&lt;/a&gt; walked through a Drive-to-Salesforce workflow where context fell from 150,000 tokens down to 2,000 once tool definitions were loaded lazily instead of upfront. The same dynamic bites anyone driving Claude Code or Codex CLI against many MCP servers, since the bulk of token spend goes to catalogs the model never touches on that particular turn.&lt;/p&gt;

&lt;p&gt;Two downstream effects follow. First, inference cost scales with the size of your MCP footprint rather than with the work you want the agent to accomplish. Second, coding agents slow down as their tool catalog expands, because the model spends more of its context budget digesting schemas instead of reasoning through code. Claude Code's own docs note that &lt;a href="https://code.claude.com/docs/en/mcp" rel="noopener noreferrer"&gt;tool search is on by default&lt;/a&gt; specifically to dampen this effect, but client-side patches do not fix the problem when many teams, agents, and customers share a common tool fleet.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hidden Token Math Behind Claude Code and Codex CLI
&lt;/h2&gt;

&lt;p&gt;A familiar pattern keeps surfacing in coding agent deployments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A developer wires Claude Code or Codex CLI to a filesystem MCP server, a GitHub server, and several internal tool servers.&lt;/li&gt;
&lt;li&gt;Each server publishes between ten and fifty tools.&lt;/li&gt;
&lt;li&gt;Completing a non-trivial task takes the agent loop six to ten turns.&lt;/li&gt;
&lt;li&gt;Every turn reinjects the full tool list into the prompt.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With 150 tool schemas running a few hundred tokens apiece, a single ten-turn coding task can readily consume 300K input tokens before producing a useful response. Multiply across hundreds of daily runs per engineer and the math compounds into thousands of dollars per month in raw schema overhead. Tool selection accuracy also suffers, since the model has to pick the right option out of dozens of irrelevant candidates.&lt;/p&gt;
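&lt;p&gt;That arithmetic is worth writing down explicitly. Taking an assumed 200 tokens per schema, the low end of "a few hundred":&lt;/p&gt;

```python
# Schema overhead for the scenario above: 150 tool schemas reinjected
# on every turn of a ten-turn task. The 200-token figure is an assumed
# low-end average, not a measured value.
TOOL_SCHEMAS = 150
TOKENS_PER_SCHEMA = 200
TURNS = 10

overhead = TOOL_SCHEMAS * TOKENS_PER_SCHEMA * TURNS
print(overhead)  # 300000 input tokens before any useful work happens
```

&lt;p&gt;None of those 300K tokens do any reasoning; they are pure catalog re-transmission.&lt;/p&gt;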

&lt;h2&gt;
  
  
  How Bifrost MCP Gateway Attacks Token Costs at the Root
&lt;/h2&gt;

&lt;p&gt;Bifrost is Maxim AI's open-source AI gateway, written in Go and adding only 11 microseconds of overhead at 5,000 requests per second. It plays both sides of the MCP protocol: it acts as an MCP client against upstream tool servers and as an MCP server that exposes a single &lt;a href="https://docs.getbifrost.ai/mcp/gateway-url" rel="noopener noreferrer"&gt;&lt;code&gt;/mcp&lt;/code&gt; endpoint&lt;/a&gt; to Claude Code, Codex CLI, Cursor, and other clients. Cost reduction for coding agents flows from three layers working in concert.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Mode: stubs on demand, not full schema dumps
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.getbifrost.ai/mcp/code-mode" rel="noopener noreferrer"&gt;Code Mode&lt;/a&gt; is the core engine. Rather than pushing every tool schema into context, Bifrost presents upstream MCP servers as a virtual filesystem of lightweight Python stub files. Four meta-tools let the model walk that catalog lazily:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;listToolFiles&lt;/code&gt;: see which servers and tools are reachable&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;readToolFile&lt;/code&gt;: pull compact Python function signatures for a specific server or tool&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;getToolDocs&lt;/code&gt;: retrieve detailed documentation for a tool before invoking it&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;executeToolCode&lt;/code&gt;: execute an orchestration script against live tool bindings inside a sandboxed Starlark runtime&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model loads only the stubs actually relevant to the current task, composes a short script to chain the tools, and submits that script through &lt;code&gt;executeToolCode&lt;/code&gt;. Bifrost runs it in the sandbox, chains the underlying calls, and hands back only the final result. Intermediate outputs never round-trip through the prompt.&lt;/p&gt;

&lt;p&gt;Code Mode offers two binding granularities. Server-level binding bundles all tools from a server into one stub file, well-suited to servers carrying a modest number of tools. Tool-level binding gives each tool its own stub, which helps when a server ships thirty-plus tools with dense schemas. Both modes rely on the same four meta-tools. Teams evaluating broader options can also review Bifrost's dedicated &lt;a href="https://www.getmaxim.ai/bifrost/resources/mcp-gateway" rel="noopener noreferrer"&gt;MCP gateway&lt;/a&gt; resources on centralized tool discovery and governance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tool filtering: narrow what each coding agent can see
&lt;/h3&gt;

&lt;p&gt;Claude Code and Codex CLI rarely need unrestricted access to every tool behind the gateway. Bifrost's &lt;a href="https://docs.getbifrost.ai/mcp/filtering" rel="noopener noreferrer"&gt;tool filtering&lt;/a&gt; lets you define, per virtual key, the exact MCP tool set exposed. A key provisioned for a CI agent might be restricted to read-only operations. A key issued to a human developer's Claude Code session might cover the full catalog. Whatever scope you choose, the model only ever sees tools it is cleared to invoke, keeping context size and blast radius tight.&lt;/p&gt;
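&lt;p&gt;The filtering behavior described above amounts to an intersection between the catalog and the key's scope. A minimal sketch with a hypothetical catalog and key scopes (not Bifrost's actual configuration schema):&lt;/p&gt;

```python
# Illustrative per-key tool filtering. Catalog, key names, and scopes
# are hypothetical stand-ins for the behavior described above.
CATALOG = {
    "fs.read_file", "fs.write_file",
    "github.merge_pr", "github.search_issues",
}

KEY_SCOPES = {
    "ci-agent": {"fs.read_file", "github.search_issues"},  # read-only
    "dev-session": CATALOG,                                # full catalog
}

def visible_tools(virtual_key):
    # The model behind this key never sees definitions outside its scope.
    return sorted(CATALOG.intersection(KEY_SCOPES[virtual_key]))

print(visible_tools("ci-agent"))  # ['fs.read_file', 'github.search_issues']
```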

&lt;h3&gt;
  
  
  One &lt;code&gt;/mcp&lt;/code&gt; endpoint for centralized discovery
&lt;/h3&gt;

&lt;p&gt;Instead of registering multiple MCP servers inside every coding agent's config, teams point Claude Code or Codex CLI at Bifrost's single &lt;code&gt;/mcp&lt;/code&gt; endpoint. Every connected server is discovered and governed centrally. Add a new MCP server to Bifrost and it becomes available to every connected coding agent automatically, with no client-side config edits required.&lt;/p&gt;
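&lt;p&gt;In practice the client ends up with a single MCP entry instead of one per server. The sketch below shows the general shape of such an entry as a Python dict; the exact field names, the port, and the &lt;code&gt;x-bf-vk&lt;/code&gt; header name are assumptions here, and each coding agent's config format differs, so consult the integration guides linked above for the real values.&lt;/p&gt;

```python
import json

# Hypothetical shape of a client-side MCP config pointing every tool
# lookup at Bifrost's single endpoint. Field names, port, and the
# virtual-key header are assumptions for illustration.
mcp_config = {
    "mcpServers": {
        "bifrost": {
            "type": "http",
            "url": "http://localhost:8080/mcp",
            "headers": {"x-bf-vk": "YOUR_VIRTUAL_KEY"},
        }
    }
}

print(json.dumps(mcp_config, indent=2))
```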

&lt;h2&gt;
  
  
  Benchmark Results: 92% Cost Reduction at Scale
&lt;/h2&gt;

&lt;p&gt;Bifrost ran three rounds of controlled benchmarks, toggling Code Mode on and off while stepping tool count upward between rounds to measure how savings behave as MCP footprints grow:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Round&lt;/th&gt;
&lt;th&gt;Tools × Servers&lt;/th&gt;
&lt;th&gt;Input Tokens (OFF)&lt;/th&gt;
&lt;th&gt;Input Tokens (ON)&lt;/th&gt;
&lt;th&gt;Token Reduction&lt;/th&gt;
&lt;th&gt;Cost Reduction&lt;/th&gt;
&lt;th&gt;Pass Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;96 tools · 6 servers&lt;/td&gt;
&lt;td&gt;19.9M&lt;/td&gt;
&lt;td&gt;8.3M&lt;/td&gt;
&lt;td&gt;−58.2%&lt;/td&gt;
&lt;td&gt;−55.7%&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;251 tools · 11 servers&lt;/td&gt;
&lt;td&gt;35.7M&lt;/td&gt;
&lt;td&gt;5.5M&lt;/td&gt;
&lt;td&gt;−84.5%&lt;/td&gt;
&lt;td&gt;−83.4%&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;508 tools · 16 servers&lt;/td&gt;
&lt;td&gt;75.1M&lt;/td&gt;
&lt;td&gt;5.4M&lt;/td&gt;
&lt;td&gt;−92.8%&lt;/td&gt;
&lt;td&gt;−92.2%&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two observations stand out. Savings are not linear: they compound as the MCP footprint grows, because classic MCP ships every schema on every call while Code Mode's cost is bounded by what the model actively reads. Accuracy holds too: pass rate sits at 100% in every round. The complete report lives in the &lt;a href="https://github.com/maximhq/bifrost-benchmarking/blob/main/mcp-code-mode-benchmark/benchmark_report.md" rel="noopener noreferrer"&gt;Bifrost MCP Code Mode benchmarks repo&lt;/a&gt;, and further &lt;a href="https://www.getmaxim.ai/bifrost/resources/benchmarks" rel="noopener noreferrer"&gt;performance benchmarks&lt;/a&gt; document Bifrost's overhead profile under production load.&lt;/p&gt;

&lt;p&gt;For a deeper look at how Code Mode sits alongside governance and audit, the &lt;a href="https://www.getmaxim.ai/bifrost/blog/bifrost-mcp-gateway-access-control-cost-governance-and-92-lower-token-costs-at-scale" rel="noopener noreferrer"&gt;Bifrost MCP Gateway overview post&lt;/a&gt; walks through access control, cost attribution, and tool groups in detail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting Bifrost MCP Gateway in Front of Claude Code and Codex CLI
&lt;/h2&gt;

&lt;p&gt;Placing Claude Code or Codex CLI behind Bifrost takes only a few minutes. The &lt;a href="https://docs.getbifrost.ai/cli-agents/claude-code" rel="noopener noreferrer"&gt;Claude Code integration guide&lt;/a&gt; and &lt;a href="https://docs.getbifrost.ai/cli-agents/codex-cli" rel="noopener noreferrer"&gt;Codex CLI integration guide&lt;/a&gt; cover the full configuration. The essential steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run Bifrost locally or inside your VPC, then attach upstream MCP servers through the dashboard (HTTP, SSE, and STDIO transports are all supported).&lt;/li&gt;
&lt;li&gt;Turn Code Mode on per MCP client; no schema changes or redeployment are needed.&lt;/li&gt;
&lt;li&gt;Issue a virtual key for each consumer (human developer, CI pipeline, customer integration) and bind it to the tool set it is cleared to call.&lt;/li&gt;
&lt;li&gt;Point Claude Code or Codex CLI at Bifrost's &lt;code&gt;/mcp&lt;/code&gt; endpoint, passing the virtual key as credential.&lt;/li&gt;
&lt;li&gt;Where team-wide or customer-wide scope matters more than per-key scope, reach for &lt;a href="https://docs.getbifrost.ai/mcp/filtering" rel="noopener noreferrer"&gt;MCP Tool Groups&lt;/a&gt; instead.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once the agent is wired up, each tool call is captured as a first-class log entry containing tool name, source server, arguments, result, latency, virtual key, and the parent LLM request that triggered the loop. That puts token-level cost tracking and per-tool cost tracking side by side, making spend attribution straightforward. Teams onboarding multiple terminal-based coding agents can also reference Bifrost's broader &lt;a href="https://www.getmaxim.ai/bifrost/resources/cli-agents" rel="noopener noreferrer"&gt;CLI coding agent resources&lt;/a&gt; for integration patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Gain Beyond Token Savings
&lt;/h2&gt;

&lt;p&gt;Token cost reduction is the headline outcome, but coding agents running through Bifrost MCP Gateway also inherit capabilities most teams otherwise build internally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scoped access&lt;/strong&gt; that restricts each coding agent to the tools it genuinely needs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit trails&lt;/strong&gt; where every tool execution is recorded with full arguments and results, which accelerates security reviews and debugging.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health monitoring&lt;/strong&gt; covering automatic reconnection when upstream servers fail, plus periodic refresh to surface newly published tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OAuth 2.0 with PKCE&lt;/strong&gt; for MCP servers that demand user-scoped auth, including dynamic client registration and automatic token refresh.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified model routing&lt;/strong&gt;, since the same gateway that governs MCP traffic also handles provider routing, failover, and load balancing across 20+ LLM providers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For teams running Claude Code or Codex CLI at scale, the &lt;a href="https://www.getmaxim.ai/bifrost/resources/mcp-gateway" rel="noopener noreferrer"&gt;Bifrost MCP gateway resource page&lt;/a&gt; and the &lt;a href="https://www.getmaxim.ai/bifrost/resources/claude-code" rel="noopener noreferrer"&gt;Claude Code integration resource&lt;/a&gt; cover deployment patterns and cost-saving configurations in greater depth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start Reducing Coding Agent Token Costs Today
&lt;/h2&gt;

&lt;p&gt;Token cost in coding agents stops being a rounding error once you hit production scale. When Claude Code, Codex CLI, and every other agent in the fleet push full tool catalogs on every turn, the invoice outruns the value delivered. Bifrost MCP Gateway brings those token costs back to heel by loading tool definitions lazily, scoping access through virtual keys, and consolidating every MCP server behind a single endpoint, without trading capability or accuracy.&lt;/p&gt;

&lt;p&gt;To see how Bifrost can cut token costs across your coding agent fleet, &lt;a href="https://getmaxim.ai/bifrost/book-a-demo" rel="noopener noreferrer"&gt;schedule a demo&lt;/a&gt; with the Bifrost team.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Code Mode in Bifrost MCP Gateway: How Sandboxed Python Cuts Token Costs</title>
      <dc:creator>Kuldeep Paul</dc:creator>
      <pubDate>Thu, 16 Apr 2026 19:52:36 +0000</pubDate>
      <link>https://dev.to/kuldeep_paul/code-mode-in-bifrost-mcp-gateway-how-sandboxed-python-cuts-token-costs-320e</link>
      <guid>https://dev.to/kuldeep_paul/code-mode-in-bifrost-mcp-gateway-how-sandboxed-python-cuts-token-costs-320e</guid>
      <description>&lt;p&gt;&lt;em&gt;With Code Mode in Bifrost MCP Gateway, agents orchestrate tools through short Python scripts, trimming token consumption by as much as 92% with no loss of capability.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Code Mode in Bifrost MCP Gateway replaces the conventional execution path, where every tool schema lands in the model's context on every request, with a compact scripting layer. Rather than pushing hundreds of tool definitions into the prompt, Bifrost surfaces four lightweight meta-tools and lets the model assemble a short Python program to coordinate the work. Across controlled benchmarks with more than 500 connected tools, this model-driven scripting approach has cut input tokens by up to 92.8% while keeping pass rate pinned at 100%. For any team operating production AI agents across several &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; servers, Code Mode is what separates a predictable AI bill from a runaway one.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Working Definition of Code Mode in Bifrost MCP Gateway
&lt;/h2&gt;

&lt;p&gt;At its core, Code Mode in Bifrost MCP Gateway is an orchestration mode in which the AI model composes Python to invoke MCP tools, rather than firing them individually through the standard function-calling loop. Connected MCP servers get projected as a virtual filesystem of Python stub files (&lt;code&gt;.pyi&lt;/code&gt; signatures), and the model pulls only the tools it actually needs. It then writes a script that wires those tools together, and Bifrost runs that script inside a sandboxed &lt;a href="https://github.com/bazelbuild/starlark" rel="noopener noreferrer"&gt;Starlark&lt;/a&gt; interpreter. Only the final result gets returned to the model's context.&lt;/p&gt;

&lt;p&gt;The design targets the context-bloat problem that surfaces the moment a team hooks up more than a handful of MCP servers. In the classic execution flow, every tool definition from every server is packed into the prompt on every turn. Five servers with thirty tools each means 150 schemas in context before the model has even read the user's message. Code Mode severs that coupling, so context cost is bounded by what the model chooses to read, not by how many tools sit in the registry. Teams evaluating &lt;a href="https://www.getmaxim.ai/bifrost/resources/mcp-gateway" rel="noopener noreferrer"&gt;MCP gateway options&lt;/a&gt; often hit this ceiling first.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the Default MCP Execution Model Breaks Down on Cost
&lt;/h2&gt;

&lt;p&gt;Standard MCP usage hands the gateway the job of injecting every available tool schema into every LLM call. That works fine for demos and early prototypes. In production, three problems show up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Token spend grows with every connected server.&lt;/strong&gt; The classic flow transmits the full tool catalog on each request and each intermediate turn of an agent loop. Plugging in more MCP servers makes the situation worse, not better.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency climbs alongside context size.&lt;/strong&gt; Longer tool catalogs mean longer prompts, which drive up time-to-first-token and overall request latency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Just prune the tool list" is a compromise, not a solution.&lt;/strong&gt; Dropping tools to manage cost means dropping capability. Teams end up juggling separate, artificially narrow tool sets for different agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Public engineering work has quantified this pattern. &lt;a href="https://www.anthropic.com/engineering/code-execution-with-mcp" rel="noopener noreferrer"&gt;Anthropic's engineering team&lt;/a&gt; reported a drop from 150,000 to 2,000 tokens on a Google Drive to Salesforce workflow once tool calls were swapped out for code execution, and &lt;a href="https://blog.cloudflare.com/code-mode" rel="noopener noreferrer"&gt;Cloudflare&lt;/a&gt; explored a parallel approach with a TypeScript runtime. Bifrost's Code Mode applies the same insight directly inside the &lt;a href="https://www.getmaxim.ai/bifrost/resources/mcp-gateway" rel="noopener noreferrer"&gt;Bifrost MCP gateway&lt;/a&gt;, with two deliberate design choices: Python rather than JavaScript (LLMs see considerably more Python in training), and a dedicated documentation meta-tool that squeezes context usage down further.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inside Code Mode: The Four Meta-Tools
&lt;/h2&gt;

&lt;p&gt;Whenever Code Mode is active on an MCP client, Bifrost automatically injects four generic meta-tools into every request in place of the direct tool schemas that the classic flow would otherwise load.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Meta-tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;listToolFiles&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Discover which servers and tools are available as virtual &lt;code&gt;.pyi&lt;/code&gt; stub files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;readToolFile&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Load compact Python function signatures for a specific server or tool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;getToolDocs&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fetch detailed documentation for a specific tool before using it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;executeToolCode&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Run an orchestration script against the live tool bindings&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Navigation through the tool catalog happens on demand. The model lists stub files, opens only the signatures it needs, optionally pulls detailed docs for a specific tool, and finally emits a short Python script that Bifrost executes in the sandbox. Both server-level and tool-level bindings are supported: one stub per server for compact discovery, or one stub per tool when more granular lookups are needed. The four-tool interface is identical across both modes. Full configuration details live in the &lt;a href="https://docs.getbifrost.ai/mcp/code-mode" rel="noopener noreferrer"&gt;Code Mode configuration reference&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Sandbox Allows (and Blocks)
&lt;/h3&gt;

&lt;p&gt;Model-generated scripts run inside a Starlark interpreter, a deterministic Python-like language that Google originally built for configuring its build system. The sandbox is intentionally tight:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No imports&lt;/li&gt;
&lt;li&gt;No file I/O&lt;/li&gt;
&lt;li&gt;No network access&lt;/li&gt;
&lt;li&gt;Only tool calls against the permitted bindings and basic Python-like control flow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That scope makes execution fast, deterministic, and safe enough to run under &lt;a href="https://docs.getbifrost.ai/mcp/agent-mode" rel="noopener noreferrer"&gt;Agent Mode&lt;/a&gt; with auto-execution turned on. Because they are read-only, the three meta-tools &lt;code&gt;listToolFiles&lt;/code&gt;, &lt;code&gt;readToolFile&lt;/code&gt;, and &lt;code&gt;getToolDocs&lt;/code&gt; are always auto-executable. &lt;code&gt;executeToolCode&lt;/code&gt; becomes auto-executable only once every tool its generated script calls appears on the configured allow-list.&lt;/p&gt;
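&lt;p&gt;A script in the permitted subset looks like ordinary Python minus everything the sandbox forbids: no imports, no I/O, no network, just control flow plus calls on the bound tools. The sketch below uses plain Python as a stand-in for the Starlark interpreter, with a hypothetical &lt;code&gt;check_health&lt;/code&gt; tool:&lt;/p&gt;

```python
# A script in the permitted subset: control flow plus bound tool calls,
# nothing else. Plain Python stands in for Starlark; the tool name and
# bindings are hypothetical.
def script(tools):
    degraded = []
    for region in ["us-east", "eu-west"]:
        status = tools["check_health"](region)
        if status != "ok":
            degraded.append(region)
    return degraded  # only this final value re-enters the model's context

bindings = {
    "check_health": lambda region: "ok" if region == "us-east" else "degraded",
}
print(script(bindings))  # ['eu-west']
```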

&lt;h2&gt;
  
  
  How Code Mode Lowers Token Costs in Real Workflows
&lt;/h2&gt;

&lt;p&gt;Take a multi-step e-commerce workflow: look up a customer, pull their order history, apply a discount, then send a confirmation. The gap between classic MCP and Code Mode shows up in the shape of the context, not just in the final output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Classic MCP flow:&lt;/strong&gt; Every turn drags the full tool list along with it. Every intermediate tool result flows back through the model. With 10 MCP servers and more than 100 tools, most of each prompt gets spent on tool definitions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code Mode flow:&lt;/strong&gt; The model reads a single stub file, writes one script that chains the calls together, and Bifrost runs that script inside the sandbox. Intermediate results stay in the sandbox. Only the compact final output reaches the model's context.&lt;/p&gt;
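&lt;p&gt;For the workflow above, a generated script might look roughly like the sketch below. This is illustrative pseudocode only: the binding names are hypothetical stand-ins, not real Bifrost stubs, and the actual signatures come from the stub files the model reads.&lt;/p&gt;

```
# Hypothetical Code Mode script (Starlark, Python-like).
# Binding names below are invented for illustration.
customer = crm_lookup_customer(email = "jane@example.com")
orders = orders_list_for_customer(customer_id = customer["id"])
promo = promotions_apply_discount(customer_id = customer["id"], percent = 10)
email_send_confirmation(to = customer["email"], promo_code = promo["code"])

# Only this compact summary re-enters the model's context;
# the full order history never leaves the sandbox.
result = {"customer": customer["id"], "orders": len(orders), "promo": promo["code"]}
```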

&lt;p&gt;Bifrost publishes three rounds of controlled benchmarks that compare Code Mode on and off, scaling the tool count between rounds:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Input tokens (off)&lt;/th&gt;
&lt;th&gt;Input tokens (on)&lt;/th&gt;
&lt;th&gt;Token reduction&lt;/th&gt;
&lt;th&gt;Cost reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;96 tools / 6 servers&lt;/td&gt;
&lt;td&gt;19.9M&lt;/td&gt;
&lt;td&gt;8.3M&lt;/td&gt;
&lt;td&gt;-58.2%&lt;/td&gt;
&lt;td&gt;-55.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;251 tools / 11 servers&lt;/td&gt;
&lt;td&gt;35.7M&lt;/td&gt;
&lt;td&gt;5.5M&lt;/td&gt;
&lt;td&gt;-84.5%&lt;/td&gt;
&lt;td&gt;-83.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;508 tools / 16 servers&lt;/td&gt;
&lt;td&gt;75.1M&lt;/td&gt;
&lt;td&gt;5.4M&lt;/td&gt;
&lt;td&gt;-92.8%&lt;/td&gt;
&lt;td&gt;-92.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Savings compound as tool count grows: the classic flow pays for every definition on every call, while Code Mode's bill is bounded by what the model actually reads. Pass rate held at 100% across all three rounds, confirming that efficiency did not come at the cost of accuracy. Bifrost's broader &lt;a href="https://www.getmaxim.ai/bifrost/resources/benchmarks" rel="noopener noreferrer"&gt;performance benchmarks&lt;/a&gt; cover the surrounding architecture, and the complete methodology and results for Code Mode are documented in the &lt;a href="https://github.com/maximhq/bifrost-benchmarking/blob/main/mcp-code-mode-benchmark/benchmark_report.md" rel="noopener noreferrer"&gt;Bifrost MCP Code Mode benchmark report&lt;/a&gt;.&lt;/p&gt;
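&lt;p&gt;The reduction column follows directly from the two input-token columns. Recomputing it from the rounded figures in the table is a quick sanity check; expect small drift, since the published percentages come from unrounded token counts:&lt;/p&gt;

```shell
# Recompute "Token reduction" from the rounded input-token figures (millions).
awk 'BEGIN {
  printf "%.0f%%\n", (19.9 - 8.3) / 19.9 * 100   # 96 tools / 6 servers
  printf "%.0f%%\n", (35.7 - 5.5) / 35.7 * 100   # 251 tools / 11 servers
  printf "%.0f%%\n", (75.1 - 5.4) / 75.1 * 100   # 508 tools / 16 servers
}'
```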

&lt;p&gt;How this cascades through production, including cost governance, access control, and per-tool pricing, is covered end-to-end in the &lt;a href="https://www.getmaxim.ai/bifrost/blog/bifrost-mcp-gateway-access-control-cost-governance-and-92-lower-token-costs-at-scale" rel="noopener noreferrer"&gt;Bifrost MCP Gateway launch post&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Code Mode Matters for Enterprise AI Teams
&lt;/h2&gt;

&lt;p&gt;Token cost is just one reason Code Mode pays off in production. For platform and infrastructure teams running AI agents at scale, Code Mode opens up a set of operational properties that classic MCP execution cannot match:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capability without a cost penalty.&lt;/strong&gt; Every MCP server a team needs (internal APIs, search, databases, filesystem, CRM) can be connected without incurring a per-request token tax on each tool definition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Predictable scaling.&lt;/strong&gt; Adding an MCP server no longer inflates the context window of every downstream agent. Per-request cost stays flat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quicker execution.&lt;/strong&gt; Fewer, larger model turns, with sandboxed orchestration between them, cut end-to-end latency compared to turn-by-turn tool invocation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic workflows.&lt;/strong&gt; Orchestration logic sits in a deterministic Starlark script instead of being reassembled across several stochastic model turns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auditable execution.&lt;/strong&gt; Every tool call inside a Code Mode script still shows up as a first-class log entry in Bifrost, carrying tool name, server, arguments, result, latency, virtual key, and parent LLM request.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Paired with Bifrost's &lt;a href="https://docs.getbifrost.ai/features/governance/virtual-keys" rel="noopener noreferrer"&gt;virtual keys and governance&lt;/a&gt;, Code Mode slots into the broader pattern enterprise AI teams need: capability, cost control, and governance handled at the infrastructure layer rather than stitched onto each agent. For a wider view of how this pattern extends, Bifrost's &lt;a href="https://www.getmaxim.ai/bifrost/resources/governance" rel="noopener noreferrer"&gt;governance capabilities&lt;/a&gt; cover the full policy surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Turning Code Mode On for a Bifrost MCP Client
&lt;/h2&gt;

&lt;p&gt;Code Mode is a per-client toggle. Any MCP client connected to Bifrost (STDIO, HTTP, SSE, or in-process via the Go SDK) can be flipped between classic mode and Code Mode without a redeployment or a schema change.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Connect an MCP server
&lt;/h3&gt;

&lt;p&gt;Open the MCP section of the Bifrost dashboard and add a client. Give it a name, choose the connection type, and supply the endpoint or command. Bifrost then discovers the server's tools and keeps them in sync on a configurable interval, with each client appearing in the list alongside a live health indicator. Complete setup instructions are in the &lt;a href="https://docs.getbifrost.ai/mcp/connecting-to-servers" rel="noopener noreferrer"&gt;connecting to MCP servers guide&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Flip on Code Mode
&lt;/h3&gt;

&lt;p&gt;Open the client's settings and turn Code Mode on. From that point, Bifrost stops packing the full tool catalog into context for that client. Starting with the next request, the model receives the four meta-tools and walks the tool filesystem on demand. Token usage on agent loops drops immediately.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Set up auto-execution
&lt;/h3&gt;

&lt;p&gt;Tool calls need manual approval by default. To let the agent loop run autonomously, allowlist specific tools under the auto-execute settings. Allowlisting is per-tool, so &lt;code&gt;filesystem_read&lt;/code&gt; can auto-execute while &lt;code&gt;filesystem_write&lt;/code&gt; stays behind an approval gate. Under Code Mode, the three read-only meta-tools are always auto-executable, and &lt;code&gt;executeToolCode&lt;/code&gt; gets auto-execution only when every tool its script invokes sits on the allow-list.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Scope access using virtual keys
&lt;/h3&gt;

&lt;p&gt;Pair Code Mode with &lt;a href="https://docs.getbifrost.ai/features/governance/virtual-keys" rel="noopener noreferrer"&gt;virtual keys&lt;/a&gt; to scope tool access per consumer. A virtual key tied to a customer-facing agent can be locked down to a specific subset of tools, while an internal admin key gets broader reach. Tools outside a virtual key's scope are invisible to the model, so prompt-level workarounds go away.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started with Code Mode in Bifrost MCP Gateway
&lt;/h2&gt;

&lt;p&gt;Code Mode is the pragmatic answer to the question every team running MCP in production eventually asks: how do we keep adding capability without watching our token bill go exponential? By pulling orchestration out of prompts and into sandboxed Python, Code Mode in Bifrost MCP Gateway delivers as much as 92% lower token costs, quicker agent execution, and complete auditability, all through a single per-client switch. It works with any MCP server, plugs into virtual keys and tool groups for access control, and fits cleanly into the &lt;a href="https://www.getmaxim.ai/bifrost/resources/mcp-gateway" rel="noopener noreferrer"&gt;MCP gateway architecture&lt;/a&gt; alongside Bifrost's LLM routing, fallbacks, and observability.&lt;/p&gt;

&lt;p&gt;To see what Code Mode in Bifrost MCP Gateway can do on your own agent workloads, &lt;a href="https://getmaxim.ai/bifrost/book-a-demo" rel="noopener noreferrer"&gt;book a Bifrost demo&lt;/a&gt; with the team.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Running Claude Code and Other Coding Agents Through the Bifrost CLI</title>
      <dc:creator>Kuldeep Paul</dc:creator>
      <pubDate>Thu, 16 Apr 2026 19:51:20 +0000</pubDate>
      <link>https://dev.to/kuldeep_paul/running-claude-code-and-other-coding-agents-through-the-bifrost-cli-4a3g</link>
      <guid>https://dev.to/kuldeep_paul/running-claude-code-and-other-coding-agents-through-the-bifrost-cli-4a3g</guid>
      <description>&lt;p&gt;&lt;em&gt;One Bifrost CLI command launches Claude Code, Codex CLI, Gemini CLI, and Opencode. No environment variables, MCP tools attached automatically, every model in one place.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A single command wires coding agents like Claude Code into your &lt;a href="https://docs.getbifrost.ai/overview" rel="noopener noreferrer"&gt;Bifrost AI gateway&lt;/a&gt; when you use the Bifrost CLI. Rather than hand-editing base URLs, rotating API keys between providers, and touching each agent's own config file, engineers just type &lt;code&gt;bifrost&lt;/code&gt; in a terminal, pick which agent they want, pick a model, and get to work. This guide covers how the Bifrost CLI works with Claude Code and every other supported coding agent, starting with gateway setup and moving through features like tabbed sessions, git worktrees, and automatic MCP attach.&lt;/p&gt;

&lt;p&gt;Engineering teams now lean on coding agents as a default part of how they ship. Anthropic states that &lt;a href="https://www.anthropic.com/product/claude-code" rel="noopener noreferrer"&gt;the majority of code at Anthropic is now written by Claude Code&lt;/a&gt;, and its engineers increasingly spend their time on architecture, code review, and orchestration instead of writing every line themselves. As the number of agents in daily use grows (Claude Code for large refactors, Codex CLI for quick fixes, Gemini CLI for specific models), configuration overhead stacks up fast. That overhead is what the Bifrost CLI is designed to eliminate, collapsing multiple agent-specific setups into one launcher.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Bifrost CLI Actually Does
&lt;/h2&gt;

&lt;p&gt;The Bifrost CLI is an interactive terminal launcher that fronts every supported coding agent with your Bifrost gateway. Provider setup, model picking, API key injection, and MCP auto-attach all happen under the hood. Bifrost itself is the &lt;a href="https://www.getmaxim.ai/bifrost" rel="noopener noreferrer"&gt;open-source AI gateway by Maxim AI&lt;/a&gt;, giving teams unified access to 20+ LLM providers behind one OpenAI-compatible API with just 11 microseconds of overhead at 5,000 requests per second.&lt;/p&gt;

&lt;p&gt;Four coding agents are supported out of the box today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; (binary: &lt;code&gt;claude&lt;/code&gt;, provider path: &lt;code&gt;/anthropic&lt;/code&gt;), with automatic MCP attach and git worktree support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI&lt;/strong&gt; (binary: &lt;code&gt;codex&lt;/code&gt;, provider path: &lt;code&gt;/openai&lt;/code&gt;), where &lt;code&gt;OPENAI_BASE_URL&lt;/code&gt; is pointed at &lt;code&gt;{base}/openai/v1&lt;/code&gt; and the model is passed via &lt;code&gt;--model&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini CLI&lt;/strong&gt; (binary: &lt;code&gt;gemini&lt;/code&gt;, provider path: &lt;code&gt;/genai&lt;/code&gt;), with model override through &lt;code&gt;--model&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Opencode&lt;/strong&gt; (binary: &lt;code&gt;opencode&lt;/code&gt;, provider path: &lt;code&gt;/openai&lt;/code&gt;), using a generated Opencode runtime config to load custom models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Per-agent integration specifics live in the &lt;a href="https://docs.getbifrost.ai/cli-agents/overview" rel="noopener noreferrer"&gt;CLI agents documentation&lt;/a&gt;. For a broader view of how Bifrost fits into terminal-first developer workflows, the &lt;a href="https://www.getmaxim.ai/bifrost/resources/cli-agents" rel="noopener noreferrer"&gt;CLI coding agents resource page&lt;/a&gt; walks through the full set of supported agents and integration patterns.&lt;/p&gt;
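&lt;p&gt;As a concrete picture of what the launcher automates, the Codex CLI mapping above reduces to environment setup like the following. This is a hand-rolled sketch of what the Bifrost CLI does for you; the key value is a placeholder, since the gateway holds the real provider keys (a Bifrost virtual key goes here when gateway auth is enabled).&lt;/p&gt;

```shell
# Manual equivalent of the Bifrost CLI's Codex CLI setup (sketch).
# Assumes a local gateway on the default port.
export OPENAI_BASE_URL="http://localhost:8080/openai/v1"
export OPENAI_API_KEY="bifrost-virtual-key"   # placeholder, not a real key
# then launch the agent:  codex --model gpt-4o
```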

&lt;h2&gt;
  
  
  Why Run Coding Agents Through Bifrost
&lt;/h2&gt;

&lt;p&gt;Putting Claude Code and every other coding agent behind Bifrost gives an engineering org three concrete wins: one entry point for every model, centralized governance over agent spend, and a shared MCP tool layer. Instead of each engineer wiring API keys into their own agent and each agent carrying its own tool config, Bifrost becomes the single control plane.&lt;/p&gt;

&lt;h3&gt;
  
  
  One Interface to Every Model
&lt;/h3&gt;

&lt;p&gt;By default Claude Code runs on Claude Opus and Sonnet, but teams frequently want more flexibility. Certain tasks map better to GPT-4o from OpenAI or a Gemini model from Google, whether for language coverage, framework compatibility, or cost. When you launch Claude Code through the Bifrost CLI, it talks to Bifrost's OpenAI-compatible API, which means any of the 20+ providers Bifrost covers (OpenAI, Anthropic, AWS Bedrock, Google Vertex AI, Azure OpenAI, Gemini, Groq, Mistral, Cohere, Cerebras, Ollama, and others) can sit behind your coding agent. This is possible because of Bifrost's &lt;a href="https://docs.getbifrost.ai/features/drop-in-replacement" rel="noopener noreferrer"&gt;drop-in replacement&lt;/a&gt; design: the agent believes it is calling OpenAI or Anthropic directly, and Bifrost handles the routing behind that illusion.&lt;/p&gt;

&lt;h3&gt;
  
  
  Governance and Spend Control
&lt;/h3&gt;

&lt;p&gt;Token budgets go fast when coding agents are in play. A single multi-file refactor inside Claude Code can chew through hundreds of thousands of tokens, and usage scales roughly linearly with the size of your team. &lt;a href="https://docs.getbifrost.ai/features/governance/virtual-keys" rel="noopener noreferrer"&gt;Bifrost governance&lt;/a&gt; treats virtual keys as the primary governance entity, which lets you set budgets, rate limits, and model access permissions per engineer or per team. Senior engineers can be permitted to run expensive reasoning models, while more junior ones default to cost-efficient options. Every token gets attributed, shows up in dashboards, and stays within virtual-key budgets. For the full enterprise picture, the &lt;a href="https://www.getmaxim.ai/bifrost/resources/governance" rel="noopener noreferrer"&gt;enterprise governance resource page&lt;/a&gt; goes through the governance model in depth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Shared MCP Tools Across the Team
&lt;/h3&gt;

&lt;p&gt;MCP tools add real leverage to every coding agent (filesystem access, database queries, GitHub integration, docs lookup, internal APIs), but configuring MCP servers separately inside each agent for each engineer is tedious. Bifrost's &lt;a href="https://www.getmaxim.ai/bifrost/resources/mcp-gateway" rel="noopener noreferrer"&gt;MCP gateway&lt;/a&gt; puts that configuration in one place. When the Bifrost CLI fires up Claude Code, Bifrost's MCP endpoint is attached automatically, so every tool configured in Bifrost is immediately usable in the agent, no &lt;code&gt;claude mcp add-json&lt;/code&gt; calls or hand-edited JSON files required. Teams that are standardizing on MCP for internal tools and data access feel this the most. For more detail on how that architecture compounds into token savings, read &lt;a href="https://www.getmaxim.ai/bifrost/blog/bifrost-mcp-gateway-access-control-cost-governance-and-92-lower-token-costs-at-scale" rel="noopener noreferrer"&gt;Bifrost MCP Gateway access control, cost governance, and 92% lower token costs&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisite: A Running Bifrost Gateway
&lt;/h2&gt;

&lt;p&gt;You need a running Bifrost gateway for the CLI to connect to. Starting one takes zero configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @maximhq/bifrost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By default the gateway comes up on &lt;code&gt;http://localhost:8080&lt;/code&gt;. Opening that URL in a browser gives you the web UI for adding providers, setting up virtual keys, and turning on features like semantic caching or observability. Docker works equally well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker pull maximhq/bifrost
docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 8080:8080 &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;/data:/app/data maximhq/bifrost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-v $(pwd)/data:/app/data&lt;/code&gt; flag mounts persistent storage so your configuration survives container restarts. For more advanced setup (different ports, log levels, file-based config, PostgreSQL-backed persistence), every flag and mode is documented in the &lt;a href="https://docs.getbifrost.ai/quickstart/gateway/setting-up" rel="noopener noreferrer"&gt;gateway setup guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Once at least one provider is configured in the gateway, the CLI is ready to launch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing the Bifrost CLI
&lt;/h2&gt;

&lt;p&gt;The Bifrost CLI runs on Node.js 18+ and installs via npx:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @maximhq/bifrost-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that first run the &lt;code&gt;bifrost&lt;/code&gt; binary is on your path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bifrost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you need to pin a specific CLI version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @maximhq/bifrost-cli &lt;span class="nt"&gt;--cli-version&lt;/span&gt; v1.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Starting a Claude Code Session via the Bifrost CLI
&lt;/h2&gt;

&lt;p&gt;Typing &lt;code&gt;bifrost&lt;/code&gt; drops you into an interactive TUI that walks through five setup steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Base URL&lt;/strong&gt;: Point the CLI at your Bifrost gateway (usually &lt;code&gt;http://localhost:8080&lt;/code&gt; in local dev).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Virtual Key (optional)&lt;/strong&gt;: If &lt;a href="https://docs.getbifrost.ai/features/governance/virtual-keys" rel="noopener noreferrer"&gt;virtual key authentication&lt;/a&gt; is on, enter your key here. Virtual keys get written to your OS keyring (macOS Keychain, Windows Credential Manager, Linux Secret Service), not to plaintext files on disk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose a Harness&lt;/strong&gt;: Pick Claude Code from the list. The CLI reports install status and version. If Claude Code is missing, it offers to install the binary via npm for you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Select a Model&lt;/strong&gt;: The CLI hits your gateway's &lt;code&gt;/v1/models&lt;/code&gt; endpoint and shows a searchable list of available models. Type to filter, arrow through the list, or paste in any model ID manually (for instance, &lt;code&gt;anthropic/claude-sonnet-4-5-20250929&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Launch&lt;/strong&gt;: Check the configuration summary and hit Enter.&lt;/li&gt;
&lt;/ol&gt;
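&lt;p&gt;The model list in step 4 is plain HTTP underneath: the CLI calls &lt;code&gt;GET {base_url}/v1/models&lt;/code&gt; and renders the result. The snippet below sketches that lookup against a canned OpenAI-style response, which stands in for a live gateway here, so the exact payload shape is an assumption:&lt;/p&gt;

```shell
# Equivalent of the CLI's model lookup:  curl -s http://localhost:8080/v1/models
# A canned response stands in for the live gateway below.
response='{"data":[{"id":"anthropic/claude-sonnet-4-5-20250929"},{"id":"openai/gpt-4o"}]}'
echo "$response" | grep -o '"id":"[^"]*"' | cut -d'"' -f4
```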

&lt;p&gt;From there the CLI handles every required environment variable, applies the provider-specific config, and starts Claude Code right in your current terminal. You are now inside Claude Code as usual, except every request is flowing through Bifrost.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP Auto-Attach for Claude Code
&lt;/h3&gt;

&lt;p&gt;Whenever Claude Code starts through the Bifrost CLI, the CLI registers Bifrost's MCP endpoint at &lt;code&gt;/mcp&lt;/code&gt; automatically, making all your configured MCP tools available from inside Claude Code. When a virtual key is in use, the CLI also configures authenticated MCP access with the proper &lt;code&gt;Authorization&lt;/code&gt; header. No &lt;code&gt;claude mcp add-json&lt;/code&gt; invocations are needed on your end. For the other harnesses (Codex CLI, Gemini CLI, Opencode), the CLI prints out the MCP server URL and you wire it into the agent's settings manually. Teams going deeper on this workflow can review Bifrost's &lt;a href="https://www.getmaxim.ai/bifrost/resources/claude-code" rel="noopener noreferrer"&gt;Claude Code integration resources&lt;/a&gt; for provider failover, cost tracking, and MCP attach patterns.&lt;/p&gt;
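&lt;p&gt;For those manual wirings, the entry you add to an agent's MCP settings looks roughly like this. The shape below is a sketch using the common &lt;code&gt;mcpServers&lt;/code&gt; convention; check each agent's own MCP documentation for its exact schema, and substitute your gateway URL and virtual key.&lt;/p&gt;

```json
{
  "mcpServers": {
    "bifrost": {
      "type": "http",
      "url": "http://localhost:8080/mcp",
      "headers": { "Authorization": "Bearer YOUR_VIRTUAL_KEY" }
    }
  }
}
```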

&lt;h2&gt;
  
  
  The Tabbed Session Interface
&lt;/h2&gt;

&lt;p&gt;Rather than exiting when a session ends, the Bifrost CLI keeps you inside a tabbed terminal UI. A tab bar at the bottom shows the CLI version, one tab for each running or recent agent session, and a status badge on each tab:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🧠 means the session's output is actively changing (the agent is working)&lt;/li&gt;
&lt;li&gt;✅ means the session is idle and waiting for input&lt;/li&gt;
&lt;li&gt;🔔 means the session raised a terminal bell alert&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hitting &lt;code&gt;Ctrl+B&lt;/code&gt; focuses the tab bar at any time. Once there:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;n&lt;/code&gt; spawns a new tab and launches another agent session&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;x&lt;/code&gt; closes the current tab&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;h&lt;/code&gt; / &lt;code&gt;l&lt;/code&gt; navigate left and right between tabs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;1&lt;/code&gt;-&lt;code&gt;9&lt;/code&gt; jump straight to a tab by number&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Esc&lt;/code&gt; / &lt;code&gt;Enter&lt;/code&gt; / &lt;code&gt;Ctrl+B&lt;/code&gt; take you back to the active session&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This pays off when you want to flip between Claude Code on one task and Gemini CLI on another, or run two Claude Code sessions in parallel against separate branches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Git Worktrees with Claude Code
&lt;/h2&gt;

&lt;p&gt;Worktree support ships for Claude Code, which lets sessions run in isolated git worktrees so parallel development stays clean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @maximhq/bifrost-cli &lt;span class="nt"&gt;-worktree&lt;/span&gt; feature-branch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The TUI also exposes worktree mode during setup. Under the hood the CLI passes the &lt;code&gt;--worktree&lt;/code&gt; flag through to Claude Code, which spins up a fresh working directory on the specified branch. That enables patterns like running two Claude Code agents at once, one on &lt;code&gt;main&lt;/code&gt; and one on a feature branch, with no file conflicts between them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuration File and CLI Flags
&lt;/h2&gt;

&lt;p&gt;CLI configuration persists at &lt;code&gt;~/.bifrost/config.json&lt;/code&gt;. The file gets created on first run and updates as you make changes in the TUI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"base_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:8080"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"default_harness"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"default_model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic/claude-sonnet-4-5-20250929"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Virtual keys are not written to this file; they live in your OS keyring.&lt;/p&gt;

&lt;p&gt;Flags the CLI accepts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--config &amp;lt;path&amp;gt;&lt;/code&gt;: Load a custom &lt;code&gt;config.json&lt;/code&gt; (handy for per-project gateway setups)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--no-resume&lt;/code&gt;: Skip the resume flow and start a fresh setup&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--worktree &amp;lt;n&amp;gt;&lt;/code&gt;: Spin up a git worktree for the session (Claude Code only)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From the summary screen, shortcut keys let you change settings without restarting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;u&lt;/code&gt; swaps the base URL&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;v&lt;/code&gt; updates the virtual key&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;h&lt;/code&gt; moves to a different harness&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;m&lt;/code&gt; picks a different model&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;w&lt;/code&gt; sets a worktree name (Claude Code only)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;d&lt;/code&gt; opens the Bifrost dashboard in the browser&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;l&lt;/code&gt; toggles harness exit logs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Jumping Between Coding Agents
&lt;/h2&gt;

&lt;p&gt;The real value of the Bifrost CLI shows up when you want to switch agents quickly. Ending a Claude Code session lands you back at the summary screen with your previous configuration still in place. Tap &lt;code&gt;h&lt;/code&gt; to swap Claude Code for Codex CLI, tap &lt;code&gt;m&lt;/code&gt; to try GPT-4o instead of Claude Sonnet, then hit Enter to relaunch. Base URLs, API keys, model flags, agent-specific settings: the CLI reconfigures all of it on your behalf.&lt;/p&gt;

&lt;p&gt;Opencode gets two extra behaviors: the CLI produces a provider-qualified model reference and a runtime config so Opencode comes up with the correct model, and it keeps whatever theme is already defined in your &lt;code&gt;tui.json&lt;/code&gt;, falling back to the adaptive system theme when nothing is set.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflows That Show Up in Practice
&lt;/h2&gt;

&lt;p&gt;A handful of patterns keep appearing among teams running the Bifrost CLI with coding agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Side-by-side agent comparison&lt;/strong&gt;: An engineer opens a tab with Claude Code on a task, opens a second tab with Codex CLI on the same task, and compares the outputs. Because traffic all flows through Bifrost, each request is logged and tied back to the same virtual key.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worktree-driven parallel work&lt;/strong&gt;: A single engineer runs Claude Code against a bug fix in one worktree and another Claude Code session against a feature in a different worktree, with both tabs visible at once.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Different models for different tasks&lt;/strong&gt;: Claude Opus takes the heavy architectural refactors, Gemini covers documentation-heavy work, and a local Ollama model picks up the small edits. None of that requires leaving the CLI or redoing config.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team-wide MCP tool sharing&lt;/strong&gt;: Platform engineers wire MCP servers up once inside the Bifrost dashboard (filesystem access, internal APIs, database tools), and every engineer's Claude Code session picks those tools up automatically.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Fixing Common Problems
&lt;/h2&gt;

&lt;p&gt;A few snags come up regularly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"npm not found in path"&lt;/strong&gt;: The CLI relies on npm to install missing harnesses. Make sure Node.js 18+ is installed and &lt;code&gt;npm --version&lt;/code&gt; resolves cleanly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent binary not found after install&lt;/strong&gt;: Either restart your terminal or put npm's global bin on your &lt;code&gt;PATH&lt;/code&gt; with &lt;code&gt;export PATH="$(npm config get prefix)/bin:$PATH"&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model list empty&lt;/strong&gt;: Check that your Bifrost gateway answers at the configured base URL, confirm at least one provider is set up, and (if virtual keys are on) verify your key is permitted to list models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Virtual key getting dropped between sessions&lt;/strong&gt;: OS keyring storage is what preserves it. On Linux, make sure &lt;code&gt;gnome-keyring&lt;/code&gt; or &lt;code&gt;kwallet&lt;/code&gt; is active. If the keyring is unreachable, the CLI logs a warning and keeps running, but you'll re-enter the key each session.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Up and Running with the Bifrost CLI
&lt;/h2&gt;

&lt;p&gt;The Bifrost CLI makes every coding agent a first-class citizen of your AI gateway. Engineers stop wrestling with environment variables and per-agent config files, and platform teams get centralized governance, observability, and MCP tool management spanning every agent their org uses. Claude Code, Codex CLI, Gemini CLI, and Opencode all route through one launcher, one credential set, and one dashboard.&lt;/p&gt;

&lt;p&gt;To begin using the Bifrost CLI with Claude Code or any other supported coding agent, bring up a gateway with &lt;code&gt;npx -y @maximhq/bifrost&lt;/code&gt;, install the CLI with &lt;code&gt;npx -y @maximhq/bifrost-cli&lt;/code&gt;, and walk through the setup. Teams looking at Bifrost for production coding agent workflows can &lt;a href="https://getmaxim.ai/bifrost/book-a-demo" rel="noopener noreferrer"&gt;book a demo&lt;/a&gt; with the Bifrost team to see how the CLI, MCP gateway, and governance layer work together at scale.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Bifrost's Interactive Prompt Playground: Author, Version, and Ship Prompts From the Gateway</title>
      <dc:creator>Kuldeep Paul</dc:creator>
      <pubDate>Thu, 16 Apr 2026 19:49:16 +0000</pubDate>
      <link>https://dev.to/kuldeep_paul/bifrosts-interactive-prompt-playground-author-version-and-ship-prompts-from-the-gateway-1pmp</link>
      <guid>https://dev.to/kuldeep_paul/bifrosts-interactive-prompt-playground-author-version-and-ship-prompts-from-the-gateway-1pmp</guid>
      <description>&lt;p&gt;&lt;em&gt;Build, test, and version prompts inside Bifrost's interactive prompt playground, then promote committed versions to production through a single HTTP header.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Every LLM application has a control layer, and that layer is its prompts. They set the tone, define guardrails, guide tool selection, and steer reasoning, yet most engineering teams still keep them buried as hardcoded strings inside application code. An interactive prompt playground changes the situation by giving engineers, product managers, and QA a single workspace to draft, run, and version prompts before anything reaches production. Bifrost embeds this workflow directly into the AI gateway, which means the version you iterate on in the UI is the same artifact your application invokes in production. No separate tool, no parallel SDK, no additional network hop.&lt;/p&gt;

&lt;p&gt;The sections below walk through how the Bifrost &lt;a href="https://docs.getbifrost.ai/features/prompt-repository/playground" rel="noopener noreferrer"&gt;prompt repository and playground&lt;/a&gt; are structured, how sessions and versions keep experimentation safe, and how committed versions attach to live inference traffic through simple HTTP headers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Defining the Interactive Prompt Playground
&lt;/h2&gt;

&lt;p&gt;An interactive prompt playground is a workspace where developers write messages, execute them against real LLM providers, inspect the completions, adjust parameters, and save versions without redeploying code. Think of it as a REPL for natural-language instructions: compose a prompt, run it, review the output, tune it, and repeat. A production-grade playground layers version control, cross-provider testing, and a clean promotion path from draft to deployed prompt on top of that core loop.&lt;/p&gt;

&lt;p&gt;What makes Bifrost different is that its playground lives inside the gateway itself. Placement is the whole point here. Every run you kick off in the playground travels through the same routing, governance, observability, and key management that carries your production traffic. There is no sandbox with surprise differences from production; you are testing on production infrastructure with a UI attached.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the Bifrost Prompt Repository Is Organized
&lt;/h2&gt;

&lt;p&gt;Four concepts shape the Bifrost prompt repository, and each one mirrors how engineering teams actually work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Folders&lt;/strong&gt;: Logical containers for prompts, generally grouped by product area, feature, or use case. A folder takes a name and an optional description, and prompts can either live inside folders or sit at the root.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompts&lt;/strong&gt;: The primary unit in the repository. Each prompt is a container that holds the full lifecycle of one prompt template, from early drafts through to production-ready releases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sessions&lt;/strong&gt;: Editable working copies used for experimentation. You can tweak messages, swap providers, change parameters, and run the prompt as many times as you like without affecting any committed version.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Versions&lt;/strong&gt;: Immutable snapshots of a prompt. Once committed, a version is locked. Each version captures the complete message history, the provider and model configuration, the model parameters, and a commit message.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Numbering is sequential (v1, v2, v3, and so on), and any previous version can be restored from the dropdown next to the Commit Version button. That structure is the minimum bar every &lt;a href="https://www.getmaxim.ai/articles/prompt-versioning-and-its-best-practices-2025/" rel="noopener noreferrer"&gt;prompt versioning workflow&lt;/a&gt; should clear: immutable history, a clear commit trail, and one-click rollback.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Workspace Layout and a First Run
&lt;/h2&gt;

&lt;p&gt;A three-panel layout keeps authoring, testing, and configuration on screen at the same time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sidebar (left)&lt;/strong&gt;: Browse prompts, manage folders, and reorganize items with drag-and-drop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Playground (center)&lt;/strong&gt;: Compose and run your prompt messages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Settings (right)&lt;/strong&gt;: Choose provider, model, API key, variables, and model parameters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A first run typically follows this sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a folder if you want to group related prompts by team or feature.&lt;/li&gt;
&lt;li&gt;Create a new prompt and drop it into a folder.&lt;/li&gt;
&lt;li&gt;Add messages in the playground: system messages for instructions, user messages for input, and assistant messages for few-shot examples.&lt;/li&gt;
&lt;li&gt;Configure the provider, model, and parameters from the settings panel.&lt;/li&gt;
&lt;li&gt;Click Run (or press Cmd/Ctrl + S) to execute. The + Add button appends a message to history without triggering a run.&lt;/li&gt;
&lt;li&gt;Save the session to keep your work, then commit a version once you are happy with it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A red asterisk appears next to the prompt name whenever a session has unsaved edits. Saved sessions can be renamed and reopened from the dropdown next to the Save button, which keeps parallel experimental branches accessible without crowding the version history.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing Across Providers From Inside the Gateway
&lt;/h2&gt;

&lt;p&gt;Comparing behavior across models is one of the hardest parts of prompt engineering. A system prompt that performs well on one provider can return noticeably different completions on another. In the Bifrost playground, switching providers and models happens right in the settings panel, with every run traveling through Bifrost's unified OpenAI-compatible interface.&lt;/p&gt;

&lt;p&gt;Because the playground runs on top of &lt;a href="https://docs.getbifrost.ai/providers/supported-providers/overview" rel="noopener noreferrer"&gt;Bifrost's 20+ supported providers&lt;/a&gt;, a single prompt can be tried against OpenAI, Anthropic, AWS Bedrock, Google Vertex AI, Azure OpenAI, Groq, Mistral, Cohere, and more, all without switching tools or re-entering credentials. The API key used for a run is also configurable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Auto&lt;/strong&gt;: Picks the first available key for the chosen provider.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specific key&lt;/strong&gt;: Uses a particular key for this run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Virtual key&lt;/strong&gt;: Uses a governance-managed key with its own budgets, rate limits, and access controls.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Routing playground traffic through &lt;a href="https://docs.getbifrost.ai/features/governance/virtual-keys" rel="noopener noreferrer"&gt;virtual keys&lt;/a&gt; means experiments remain inside the same budgets, quotas, and audit logs that cover everything else. Prompt experimentation no longer acts as a governance blind spot and instead behaves like any other controlled engineering activity. Teams that need to go deeper can explore Bifrost's &lt;a href="https://www.getmaxim.ai/bifrost/resources/governance" rel="noopener noreferrer"&gt;governance capabilities&lt;/a&gt; for policy enforcement, RBAC, and access control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Message Roles and Multimodal Content
&lt;/h2&gt;

&lt;p&gt;The playground supports every message role and artifact type that real agent workflows demand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;System messages&lt;/strong&gt; for behavior and instructions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User messages&lt;/strong&gt; for input.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assistant messages&lt;/strong&gt; for model responses or few-shot examples.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool calls&lt;/strong&gt; for function calls issued by the model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool results&lt;/strong&gt; for mock or real responses from the invoked tool.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That coverage is what lifts the playground beyond single-turn chat. Teams building agents can replay a complete tool-use loop, trace how the model selects which tool to call, and catch the cases where a reasoning chain breaks. For any model that accepts multimodal input, user messages can also carry attachments such as images and PDFs, which become available automatically once the selected model supports them. Teams wiring up MCP-based tool calls can pair the playground with Bifrost's &lt;a href="https://www.getmaxim.ai/bifrost/resources/mcp-gateway" rel="noopener noreferrer"&gt;MCP gateway&lt;/a&gt; for centralized tool discovery and governance across every MCP server in use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Version Control for Prompts Headed to Production
&lt;/h2&gt;

&lt;p&gt;Production prompts deserve the same rigor as application code. An &lt;a href="https://dev.to/kuldeep_paul/mastering-prompt-versioning-best-practices-for-scalable-llm-development-2mgm"&gt;analysis of prompt versioning best practices&lt;/a&gt; calls out immutability, commit messages, and traceable rollback as the three pillars of a reliable workflow, and Bifrost's version model maps directly onto all three.&lt;/p&gt;

&lt;p&gt;Committing a version freezes the following into an immutable snapshot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The chosen message history (system, user, assistant, tool calls, tool results).&lt;/li&gt;
&lt;li&gt;The provider and model configuration.&lt;/li&gt;
&lt;li&gt;The model parameters, including temperature, max tokens, streaming flag, and any other settings.&lt;/li&gt;
&lt;li&gt;A commit message explaining the change.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whenever the current session has drifted from the last committed version, an &lt;strong&gt;Unpublished Changes&lt;/strong&gt; badge surfaces. That removes any ambiguity about what is actually shipping. If a teammate opens the prompt a week later and sees v7, they can be confident that v7 is still exactly what it was on the day it was committed, no matter how much session-level iteration has happened since.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running Committed Prompt Versions in Production
&lt;/h2&gt;

&lt;p&gt;A playground only pays off when the prompts it generates run unchanged in production. Bifrost closes that loop through the &lt;a href="https://docs.getbifrost.ai/features/prompt-repository/prompts-plugin" rel="noopener noreferrer"&gt;Prompts plugin&lt;/a&gt;, which attaches committed versions to live inference requests with zero client-side prompt management code required.&lt;/p&gt;

&lt;p&gt;Behavior is controlled by two HTTP headers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;bf-prompt-id&lt;/code&gt;: UUID of the prompt in the repository. Required to activate injection.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bf-prompt-version&lt;/code&gt;: Integer version number (for example, &lt;code&gt;3&lt;/code&gt; for v3). Optional; when omitted, the latest committed version is used.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The plugin resolves the requested prompt and version, folds the stored model parameters into the request (request values win on conflicts), and prepends the version's message history to the incoming &lt;code&gt;messages&lt;/code&gt; (Chat Completions) or &lt;code&gt;input&lt;/code&gt; (Responses API). Your application still sends the dynamic user turn; the template itself comes from the repository.&lt;/p&gt;

&lt;p&gt;A Chat Completions request ends up looking like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"bf-prompt-id: YOUR-PROMPT-UUID"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"bf-prompt-version: 3"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"x-bf-vk: sk-bf-your-virtual-key"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "openai/gpt-5.4",
    "messages": [
      { "role": "user", "content": "Tell me about Bifrost Gateway?" }
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because the plugin maintains an in-memory cache that refreshes whenever prompts are created, updated, or deleted through the gateway APIs, new commits become visible to production without any process restart. Prompt releases get fully decoupled from application deploys, which is the outcome every mature prompt management setup is trying to reach.&lt;/p&gt;
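&lt;p&gt;For clients not using curl, the same two headers ride along with any OpenAI-compatible request. The sketch below builds the headers and payload in Python; &lt;code&gt;build_prompt_request&lt;/code&gt; is an illustrative helper rather than part of any Bifrost SDK, and the gateway URL and prompt UUID are placeholders:&lt;/p&gt;

```python
# Illustrative sketch: attach a committed prompt version to a standard
# OpenAI-compatible Chat Completions request through Bifrost.
# build_prompt_request is a hypothetical helper; the bf-prompt-id,
# bf-prompt-version, and x-bf-vk headers are the ones the plugin reads.

def build_prompt_request(prompt_id, user_message, version=None, virtual_key=None):
    """Return (headers, payload) for a Chat Completions call via Bifrost."""
    headers = {"Content-Type": "application/json", "bf-prompt-id": prompt_id}
    if version is not None:
        # Omit this header entirely to use the latest committed version.
        headers["bf-prompt-version"] = str(version)
    if virtual_key is not None:
        headers["x-bf-vk"] = virtual_key
    payload = {
        "model": "openai/gpt-5.4",
        # The stored template is prepended server-side; the client sends
        # only the dynamic user turn.
        "messages": [{"role": "user", "content": user_message}],
    }
    return headers, payload

headers, payload = build_prompt_request(
    "YOUR-PROMPT-UUID", "Tell me about Bifrost Gateway", version=3
)
# To send: requests.post("http://localhost:8080/v1/chat/completions",
#                        headers=headers, json=payload)
```

&lt;p&gt;Dropping the &lt;code&gt;bf-prompt-version&lt;/code&gt; header pins the request to the latest committed version instead, which is usually what a continuous-delivery setup wants.&lt;/p&gt;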

&lt;h2&gt;
  
  
  Why a Gateway-Native Playground Changes the Math
&lt;/h2&gt;

&lt;p&gt;Most LLM teams end up operating three or four tools stitched together: one for authoring prompts, one for evaluation, one for routing, and one for observability. Every boundary between those tools creates a place where a prompt that worked in staging ends up different from the one that actually runs in production. A gateway-native playground collapses those boundaries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Identical execution path&lt;/strong&gt;: Playground runs and production runs share the same routing, fallbacks, caching, and guardrails. There is no "but it worked in the playground" category of bug.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared governance&lt;/strong&gt;: Virtual keys, budgets, rate limits, and audit logs apply to experimentation in exactly the same way they apply to production traffic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One source of truth&lt;/strong&gt;: Committed versions sit in the same config store that serves inference. A production request always references the precise artifact you committed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No extra SDK&lt;/strong&gt;: Clients keep using standard OpenAI-compatible APIs with two optional headers. There is no prompt-fetching library to pin, upgrade, or babysit.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Teams that want deeper evaluation, scenario simulation, and live-traffic quality monitoring can combine the Bifrost playground with Maxim AI's evaluation stack, but the core loop of authoring, testing, versioning, and serving prompts already lives inside Bifrost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started With the Bifrost Prompt Playground
&lt;/h2&gt;

&lt;p&gt;An interactive prompt playground turns prompt engineering into a disciplined, collaborative practice: folders for organization, sessions for safe iteration, versions for immutable releases, and HTTP headers for production attachment. Because it ships as part of the &lt;a href="https://www.getmaxim.ai/bifrost" rel="noopener noreferrer"&gt;Bifrost AI gateway&lt;/a&gt;, you get it alongside multi-provider routing, governance, caching, and observability, with no second platform to run.&lt;/p&gt;

&lt;p&gt;To see how Bifrost can unify prompt management with your AI gateway, browse the &lt;a href="https://www.getmaxim.ai/bifrost/resources" rel="noopener noreferrer"&gt;Bifrost resources hub&lt;/a&gt; or &lt;a href="https://getmaxim.ai/bifrost/book-a-demo" rel="noopener noreferrer"&gt;book a demo&lt;/a&gt; with the Bifrost team.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Reduce MCP Token Costs for Claude Code Without Losing Capability</title>
      <dc:creator>Kuldeep Paul</dc:creator>
      <pubDate>Thu, 16 Apr 2026 19:47:03 +0000</pubDate>
      <link>https://dev.to/kuldeep_paul/reduce-mcp-token-costs-for-claude-code-without-losing-capability-2djn</link>
      <guid>https://dev.to/kuldeep_paul/reduce-mcp-token-costs-for-claude-code-without-losing-capability-2djn</guid>
      <description>&lt;p&gt;&lt;em&gt;Cut MCP token costs for Claude Code by up to 92% using Bifrost's MCP gateway, Code Mode orchestration, and centralized tool governance.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Wiring Claude Code up to more than a few MCP servers tends to produce the same outcome: token consumption rises, responses slow down, and the monthly bill lands higher than anyone forecasted. The tools are not the real issue. The problem sits in how the Model Context Protocol (MCP) injects tool definitions into context on every single request. To reduce MCP token costs for Claude Code without stripping away functionality, teams need an infrastructure tier that controls tool exposure, caches what can be cached, and shifts orchestration out of the model prompt. Bifrost, the open-source AI gateway built by Maxim AI, is designed for exactly this role. This guide breaks down where MCP token costs actually come from, what Claude Code's built-in features can and cannot handle, and how Bifrost's &lt;a href="https://docs.getbifrost.ai/mcp/overview" rel="noopener noreferrer"&gt;MCP gateway&lt;/a&gt; combined with Code Mode trims token usage by as much as 92% in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where MCP Token Costs Come From in Claude Code
&lt;/h2&gt;

&lt;p&gt;MCP token costs balloon because tool schemas are loaded into every message, not once per session. Each MCP server connected to Claude Code pushes its complete tool catalog, including names, descriptions, parameter schemas, and expected outputs, into the model's context with every turn. Hook up five servers carrying thirty tools each and the model is reading 150 tool definitions before the user's prompt even arrives.&lt;/p&gt;

&lt;p&gt;The numbers have been measured. One recent breakdown found that &lt;a href="https://www.jdhodges.com/blog/claude-code-mcp-server-token-costs/" rel="noopener noreferrer"&gt;a typical four-server MCP setup in Claude Code adds around 7,000 tokens of overhead per message, with heavier setups crossing 50,000 tokens before a single prompt is typed&lt;/a&gt;. A separate teardown reported &lt;a href="https://www.mindstudio.ai/blog/claude-code-mcp-server-token-overhead" rel="noopener noreferrer"&gt;multi-server configurations commonly adding 15,000 to 20,000 tokens of overhead per turn on usage-based billing&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Three dynamics amplify the pain as workloads scale:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Loading on every message&lt;/strong&gt;: Tool definitions reload with every turn, so a 50-message conversation pays that overhead 50 separate times.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idle tools still charge you&lt;/strong&gt;: A Playwright server's 22 browser tools tag along even when the task is editing a Python script.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wordy descriptions&lt;/strong&gt;: Open-source MCP servers often ship with long, human-friendly tool descriptions that inflate per-tool token consumption.&lt;/li&gt;
&lt;/ul&gt;
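&lt;p&gt;The compounding is easy to estimate. A back-of-envelope sketch in Python, using the roughly 7,000-token four-server figure cited above (the 50-message session length is an assumption for illustration):&lt;/p&gt;

```python
# Back-of-envelope estimate of per-session MCP schema overhead.
# The ~7,000 tokens/message figure is the measured four-server overhead
# cited above; the 50-message session length is an assumption.
overhead_per_message = 7_000
messages_per_session = 50

session_overhead = overhead_per_message * messages_per_session
print(session_overhead)  # 350000 tokens paid before any real work
```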

&lt;p&gt;Token overhead is more than a line on an invoice. It squeezes the working context the model needs for the actual task, which erodes output quality in long sessions and triggers compaction earlier than it should.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Claude Code's Built-In Optimizations Cover
&lt;/h2&gt;

&lt;p&gt;Anthropic has shipped several optimizations that handle the straightforward cases. Mapping what they cover helps clarify where an external layer still has to carry the load.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://code.claude.com/docs/en/costs" rel="noopener noreferrer"&gt;Claude Code's official cost management guidance&lt;/a&gt; recommends a mix of tool search deferral, prompt caching, auto-compaction, model tiering, and custom hooks. Tool search is the most relevant mechanism for MCP: once total tool definitions cross a threshold, Claude Code defers them so only tool names enter context until Claude actually calls one. That can save 13,000+ tokens in intensive sessions.&lt;/p&gt;

&lt;p&gt;These client-side controls help, but they leave three gaps for teams running MCP in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No centralized governance&lt;/strong&gt;: Tool deferral is a local optimization. It gives platform teams no control over which tools a specific developer, team, or customer integration is permitted to call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No orchestration layer&lt;/strong&gt;: Even with deferral, multi-step tool workflows still pay for schema loads, intermediate tool outputs, and model round-trips at every step.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No cross-session visibility&lt;/strong&gt;: Individual developers can run &lt;code&gt;/context&lt;/code&gt; and &lt;code&gt;/mcp&lt;/code&gt; to inspect their own sessions, but there is no organization-wide view of which MCP tools are draining tokens across the team.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a solo developer running Claude Code on a laptop with two or three servers, the built-in optimizations are enough. For a platform team rolling Claude Code out to dozens or hundreds of engineers on shared MCP infrastructure, they are not.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Bifrost Cuts MCP Token Costs for Claude Code
&lt;/h2&gt;

&lt;p&gt;Bifrost sits between Claude Code and the fleet of MCP servers your team depends on. Rather than talking to each server directly, Claude Code connects to Bifrost's single &lt;code&gt;/mcp&lt;/code&gt; endpoint. Bifrost takes over discovery, tool governance, execution, and the orchestration pattern that actually moves the needle on token cost: Code Mode.&lt;/p&gt;

&lt;p&gt;The evidence is documented in &lt;a href="https://www.getmaxim.ai/bifrost/blog/bifrost-mcp-gateway-access-control-cost-governance-and-92-lower-token-costs-at-scale" rel="noopener noreferrer"&gt;Bifrost's MCP gateway cost benchmark&lt;/a&gt;, where input tokens dropped 58% with 96 tools connected, 84% with 251 tools, and 92% with 508 tools, all while pass rate stayed at 100%. Teams evaluating &lt;a href="https://www.getmaxim.ai/bifrost/resources/mcp-gateway" rel="noopener noreferrer"&gt;MCP gateway options&lt;/a&gt; can see the centralized tool discovery architecture in more depth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Mode: orchestration that stops paying the per-turn schema tax
&lt;/h3&gt;

&lt;p&gt;Code Mode is the single biggest contributor to token reduction. Rather than pushing every MCP tool definition into context, Bifrost exposes connected MCP servers as a virtual filesystem of lightweight Python stub files. The model reads only what it needs, writes a short Python script that orchestrates the tools, and Bifrost runs that script inside a sandboxed Starlark interpreter.&lt;/p&gt;

&lt;p&gt;Regardless of how many MCP servers are wired up, the model works with only four meta-tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;listToolFiles&lt;/code&gt;: Discover which servers and tools are accessible.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;readToolFile&lt;/code&gt;: Load Python function signatures for a specific server or tool.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;getToolDocs&lt;/code&gt;: Pull detailed documentation for a specific tool before invoking it.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;executeToolCode&lt;/code&gt;: Run the orchestration script against live tool bindings.&lt;/li&gt;
&lt;/ul&gt;
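&lt;p&gt;To make the pattern concrete, here is the kind of short script the model might write and hand to &lt;code&gt;executeToolCode&lt;/code&gt;. The tool names and local mocks are invented for illustration; in Bifrost the script would run against live bindings inside the sandbox:&lt;/p&gt;

```python
# Hypothetical Code Mode orchestration script. The two tool functions
# are invented stand-ins, mocked locally; in Bifrost they would be live
# bindings generated from the connected MCP servers' stub files.

def search_issues(query):
    # Mock of an issue-tracker MCP tool.
    return [{"id": 101, "title": "Gateway timeout on large uploads"}]

def post_summary(channel, text):
    # Mock of a chat MCP tool; returns a delivery receipt.
    return {"channel": channel, "delivered": True, "text": text}

# Intermediate results stay inside the script, so they never re-enter
# the model's context as tokens.
issues = search_issues("label:bug state:open")
titles = ", ".join(issue["title"] for issue in issues)
receipt = post_summary("#eng-triage", f"Open bugs: {titles}")
print(receipt["delivered"])  # True
```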

&lt;p&gt;This pattern is conceptually close to what &lt;a href="https://www.anthropic.com/engineering/code-execution-with-mcp" rel="noopener noreferrer"&gt;Anthropic's engineering team described for code execution with MCP&lt;/a&gt;, where a Google Drive to Salesforce workflow collapsed from 150,000 tokens to 2,000. Bifrost builds this approach directly into the gateway, picks Python over JavaScript for better LLM fluency, and adds the dedicated docs tool to compress context further. &lt;a href="https://blog.cloudflare.com/code-mode/" rel="noopener noreferrer"&gt;Cloudflare independently documented the same exponential savings pattern&lt;/a&gt; in their own evaluation.&lt;/p&gt;

&lt;p&gt;The savings compound as servers are added. Classic MCP charges for every tool definition on every request, so connecting more servers worsens the tax. Code Mode's cost is capped by what the model actually reads, not by how many tools happen to exist.&lt;/p&gt;

&lt;h3&gt;
  
  
  Virtual keys and tool groups: stop paying for access a consumer should not have
&lt;/h3&gt;

&lt;p&gt;Every request routed through Bifrost carries a virtual key. Each key is scoped to a defined set of tools, and scoping operates at the tool level rather than just the server level. A key can be granted &lt;code&gt;filesystem_read&lt;/code&gt; access without ever seeing &lt;code&gt;filesystem_write&lt;/code&gt; from the same MCP server. The model only encounters definitions for tools the key is allowed to use, so unauthorized tools cost exactly zero tokens.&lt;/p&gt;
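&lt;p&gt;Conceptually, the scoping behaves like an allow-list applied before any schema is serialized into context. A simplified sketch follows; the tool names and the filter function are illustrative, not Bifrost internals:&lt;/p&gt;

```python
# Simplified sketch of tool-level scoping: only definitions on the key's
# allow-list are serialized into context, so unauthorized tools cost
# zero tokens. Tool names here are illustrative, not Bifrost internals.

ALL_TOOLS = {
    "filesystem_read": {"description": "Read a file from disk"},
    "filesystem_write": {"description": "Write a file to disk"},
    "web_search": {"description": "Search the web"},
}

def visible_tools(key_scopes):
    """Return only the tool definitions this virtual key may see."""
    return {name: spec for name, spec in ALL_TOOLS.items() if name in key_scopes}

read_only_key = {"filesystem_read", "web_search"}
exposed = visible_tools(read_only_key)
print(sorted(exposed))  # ['filesystem_read', 'web_search']
```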

&lt;p&gt;At organizational scale, &lt;a href="https://docs.getbifrost.ai/mcp/filtering" rel="noopener noreferrer"&gt;MCP Tool Groups&lt;/a&gt; push this further: a named bundle of tools can be attached to any mix of virtual keys, teams, customers, or providers. Bifrost resolves the correct set at request time from in-memory state synchronized across cluster nodes, with no database lookups. Broader &lt;a href="https://www.getmaxim.ai/bifrost/resources/governance" rel="noopener noreferrer"&gt;governance capabilities&lt;/a&gt; including RBAC, audit logs, and budget controls apply across the same gateway.&lt;/p&gt;

&lt;h3&gt;
  
  
  Centralized gateway: one connection, one audit trail
&lt;/h3&gt;

&lt;p&gt;Bifrost surfaces every connected MCP server through a single &lt;code&gt;/mcp&lt;/code&gt; endpoint. Claude Code connects once and discovers every tool across every MCP server the virtual key permits. Register a new MCP server in Bifrost and it shows up in Claude Code automatically, with zero changes on the client side.&lt;/p&gt;

&lt;p&gt;This matters for cost because it gives platform teams the visibility Claude Code's per-session tooling cannot provide. Every tool execution becomes a first-class log entry with tool name, server, arguments, result, latency, virtual key, and parent LLM request, plus token costs and per-tool costs whenever the tools call paid external APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up Bifrost as Your MCP Gateway for Claude Code
&lt;/h2&gt;

&lt;p&gt;Going from a fresh Bifrost instance to Claude Code with Code Mode enabled takes only a few minutes. Bifrost runs as a &lt;a href="https://docs.getbifrost.ai/features/drop-in-replacement" rel="noopener noreferrer"&gt;drop-in replacement for existing SDKs&lt;/a&gt;, so no changes to application code are required.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Register MCP clients in Bifrost&lt;/strong&gt;: Go to the MCP section of the Bifrost dashboard and add each MCP server you want to expose, including connection type (HTTP, SSE, or STDIO), endpoint, and any required headers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Turn on Code Mode&lt;/strong&gt;: Open the client settings and flip the Code Mode toggle. No schema rewrites, no redeployment. Token usage drops immediately as the four meta-tools take the place of full schema injection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure auto-execute and virtual keys&lt;/strong&gt;: In the &lt;a href="https://docs.getbifrost.ai/features/governance/virtual-keys" rel="noopener noreferrer"&gt;virtual keys&lt;/a&gt; section, create scoped credentials for each consumer and pick which tools each key can call. For autonomous agent loops, allow read-only tools to auto-execute while keeping write operations gated behind approval.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Point Claude Code at Bifrost&lt;/strong&gt;: In Claude Code's MCP settings, add Bifrost as an MCP server using the gateway URL. Claude Code discovers every tool the virtual key permits through a single connection.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;From that point forward, Claude Code sees a governed, token-efficient view of your MCP ecosystem, and every tool call is logged with complete cost attribution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Measuring the Impact on Your Team
&lt;/h2&gt;

&lt;p&gt;Cutting MCP token costs for Claude Code only matters if the impact is measurable. Bifrost's &lt;a href="https://docs.getbifrost.ai/features/observability" rel="noopener noreferrer"&gt;observability&lt;/a&gt; exposes the data that drives cost decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Token cost broken out by virtual key, by tool, and by MCP server over time.&lt;/li&gt;
&lt;li&gt;End-to-end traces of every agent run: which tools fired, in what sequence, with what arguments, and at what latency.&lt;/li&gt;
&lt;li&gt;Spend breakdowns that put LLM token costs and tool costs side by side, revealing the complete cost of every agent workflow.&lt;/li&gt;
&lt;li&gt;Native Prometheus metrics and &lt;a href="https://docs.getbifrost.ai/features/telemetry" rel="noopener noreferrer"&gt;OpenTelemetry (OTLP)&lt;/a&gt; export for Grafana, New Relic, Honeycomb, and Datadog.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Teams assessing the cost impact at their own scale can cross-reference &lt;a href="https://www.getmaxim.ai/bifrost/resources/benchmarks" rel="noopener noreferrer"&gt;Bifrost's published performance benchmarks&lt;/a&gt;, which record 11 microseconds of overhead at 5,000 requests per second, and consult the &lt;a href="https://www.getmaxim.ai/bifrost/resources/buyers-guide" rel="noopener noreferrer"&gt;LLM Gateway Buyer's Guide&lt;/a&gt; for a full capability comparison.&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond Token Costs: The Production MCP Stack
&lt;/h2&gt;

&lt;p&gt;MCP without governance and cost control becomes unworkable the moment you move past one developer's local setup. Bifrost's MCP gateway covers the full set of production concerns in one layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scoped access through virtual keys and per-tool filtering.&lt;/li&gt;
&lt;li&gt;Organization-wide governance via MCP Tool Groups.&lt;/li&gt;
&lt;li&gt;Complete audit trails for every tool call, suitable for SOC 2, GDPR, HIPAA, and ISO 27001.&lt;/li&gt;
&lt;li&gt;Per-tool cost visibility alongside LLM token spend.&lt;/li&gt;
&lt;li&gt;Code Mode to trim context cost without trimming capability.&lt;/li&gt;
&lt;li&gt;The same gateway that governs MCP traffic also handles LLM provider routing, &lt;a href="https://docs.getbifrost.ai/features/fallbacks" rel="noopener noreferrer"&gt;automatic failover&lt;/a&gt;, load balancing, &lt;a href="https://docs.getbifrost.ai/features/semantic-caching" rel="noopener noreferrer"&gt;semantic caching&lt;/a&gt;, and unified key management across 20+ AI providers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When LLM calls and tool calls both flow through one gateway, model tokens and tool costs sit in one audit log under one access control model. That is the infrastructure pattern production AI systems actually require. Teams already using Claude Code with Bifrost can review the &lt;a href="https://www.getmaxim.ai/bifrost/resources/claude-code" rel="noopener noreferrer"&gt;Claude Code integration guide&lt;/a&gt; for implementation specifics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start Reducing MCP Token Costs for Claude Code
&lt;/h2&gt;

&lt;p&gt;Reducing MCP token costs for Claude Code is not about cutting tools or settling for less capability. It is about moving tool governance and orchestration down into the infrastructure layer where they belong. Bifrost's MCP gateway and Code Mode cut token usage by up to 92% on large tool catalogs while strengthening access control and handing platform teams the cost visibility they need to run Claude Code at scale.&lt;/p&gt;

&lt;p&gt;To see what Bifrost can do for your team's Claude Code token bill while giving you production-grade MCP governance, &lt;a href="https://getmaxim.ai/bifrost/book-a-demo" rel="noopener noreferrer"&gt;book a demo&lt;/a&gt; with the Bifrost team.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Top 5 AI Gateways for Seamless Integration of OpenAI GPT Models in Enterprise</title>
      <dc:creator>Kuldeep Paul</dc:creator>
      <pubDate>Tue, 07 Apr 2026 19:49:20 +0000</pubDate>
      <link>https://dev.to/kuldeep_paul/top-5-ai-gateways-for-seamless-integration-of-openai-gpt-models-in-enterprise-12im</link>
      <guid>https://dev.to/kuldeep_paul/top-5-ai-gateways-for-seamless-integration-of-openai-gpt-models-in-enterprise-12im</guid>
<description>&lt;p&gt;Enterprise adoption of OpenAI's GPT models has reached a critical inflection point. Usage of structured workflows such as Projects and Custom GPTs has increased 19× year-to-date, showing a shift from casual querying to integrated, repeatable processes, with organizations now leveraging GPT across production systems at scale. However, integrating OpenAI's APIs directly into applications without a centralized control layer creates substantial operational, financial, and governance risks.&lt;/p&gt;

&lt;p&gt;In 2025, AI adoption reached a tipping point, with around 78% of organizations already using AI in at least one business function, and roughly 71% leveraging generative AI in their daily operations. Yet despite this widespread adoption, most enterprises lack the infrastructure to manage multiple models, enforce consistent governance, control costs, and maintain observability across distributed teams. This is where AI gateways become essential—a unified control plane that transforms how enterprises govern, secure, and optimize access to OpenAI's models and other LLM providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Enterprise AI Gateways Matter for OpenAI GPT Integration
&lt;/h2&gt;

&lt;p&gt;An AI gateway sits between your applications and model providers, transforming direct API calls into a managed, monitored, and governed experience. Rather than calling OpenAI directly from each application, teams route traffic through a centralized gateway that provides multiple critical capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost Control and Budget Management&lt;/strong&gt;: Aggregate spending on AI APIs surpassed billions of dollars in 2025, with many organizations discovering that their actual bills far exceeded initial estimates. Without proper controls, a single poorly scoped agent loop or misconfigured API key can consume an entire quarterly budget in hours. Enterprise-grade gateways provide hierarchical cost controls at the team, project, and customer level, enabling precise cost allocation and preventing runaway expenses.&lt;/p&gt;
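
&lt;p&gt;The hierarchical model can be sketched as a walk up a key's ancestry, approving a request only if it fits the remaining budget at every level. The paths and limits below are illustrative, not a real gateway configuration:&lt;/p&gt;

```python
# Illustrative budget tree: every prefix of a key's path carries its own limit.
LIMITS = {"org": 10000.0, "org/team-ml": 2000.0, "org/team-ml/project-x": 500.0}
SPENT = {"org": 9800.0, "org/team-ml": 300.0, "org/team-ml/project-x": 120.0}

def can_spend(key_path: str, cost: float) -> bool:
    """Approve a request only if it fits within every ancestor's budget."""
    parts = key_path.split("/")
    for i in range(1, len(parts) + 1):
        level = "/".join(parts[:i])
        if SPENT.get(level, 0.0) + cost > LIMITS.get(level, float("inf")):
            return False  # some level of the hierarchy would be exceeded
    return True

allowed = can_spend("org/team-ml/project-x", 50.0)   # fits all three levels
blocked = can_spend("org/team-ml/project-x", 300.0)  # the org-wide cap blocks it
```

&lt;p&gt;The second call is the interesting case: the project and team budgets would both allow it, but the organization-level cap still stops the spend.&lt;/p&gt;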

&lt;p&gt;&lt;strong&gt;Reliability and Failover&lt;/strong&gt;: Production AI applications cannot afford downtime when a single provider experiences outages. Gateways enable automatic failover between providers or models, ensuring requests are rerouted seamlessly to alternative endpoints without user-facing disruptions. This reliability is critical for mission-critical applications in finance, healthcare, and customer support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Governance and Compliance&lt;/strong&gt;: Regulatory pressure is increasing globally, with enterprises needing centralized logs, full traceability, and policy enforcement at the infrastructure layer. Gateways enforce compliance requirements as executable rules rather than manual processes, enabling organizations to demonstrate control through comprehensive audit trails and automated policy enforcement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability and Performance&lt;/strong&gt;: Direct API integrations scatter observability across applications, making it difficult to track model performance, identify bottlenecks, or correlate usage with business outcomes. Gateways provide unified visibility into latency, token consumption, cost, and quality metrics across all LLM calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Five Leading AI Gateways for OpenAI GPT Integration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Bifrost
&lt;/h3&gt;

&lt;p&gt;Bifrost is a purpose-built LLM gateway designed for enterprises deploying OpenAI GPT models alongside multiple providers. As an AI-native gateway, Bifrost currently defines the enterprise benchmark for performance, governance depth, and integrated observability.&lt;/p&gt;

&lt;p&gt;Bifrost's architecture is optimized for zero-overhead integration. Teams can replace their OpenAI client library with Bifrost's OpenAI-compatible API in a single line of code, with no application refactoring required. This drop-in replacement capability eliminates migration friction and enables rapid deployment across existing systems.&lt;/p&gt;
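
&lt;p&gt;A minimal sketch of what that drop-in switch looks like from application code, assuming a Bifrost instance listening at an illustrative local address (the URL, port, and virtual key below are placeholders, not Bifrost defaults):&lt;/p&gt;

```python
import json
import urllib.request

# Point requests at the local gateway instead of api.openai.com.
# (Gateway address and key are illustrative assumptions.)
BIFROST_BASE_URL = "http://localhost:8080/v1"

def build_chat_request(model: str, messages: list) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request routed via the gateway."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BIFROST_BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer VIRTUAL_KEY"},  # gateway-issued key
        method="POST",
    )

req = build_chat_request("gpt-4o", [{"role": "user", "content": "Hello"}])
```

&lt;p&gt;Only the base URL changes; the request body and response shape stay OpenAI-compatible, which is what makes the swap a one-line edit in most SDK-based clients.&lt;/p&gt;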

&lt;p&gt;The platform unifies access to 12+ providers—OpenAI, Anthropic, AWS Bedrock, Google Vertex, Azure, Cohere, Mistral, Ollama, Groq, and others—through a single interface. This abstraction is critical for enterprises evaluating multiple models or implementing multi-vendor strategies without creating vendor lock-in. Automatic failover ensures that if OpenAI experiences rate limits or outages, requests transparently route to alternative providers with no application changes.&lt;/p&gt;

&lt;p&gt;Bifrost's governance layer provides hierarchical cost controls at multiple levels: virtual API keys for different teams, fine-grained rate limiting, usage quotas per project, and customer-level budgeting. This enables organizations to safely delegate API access to distributed teams while maintaining centralized financial oversight. Native integration with HashiCorp Vault ensures API keys are securely managed and rotated automatically.&lt;/p&gt;

&lt;p&gt;The platform's semantic caching layer reduces both cost and latency. By analyzing request semantics rather than exact string matching, Bifrost caches responses to conceptually similar queries, delivering cached results when appropriate and reducing token consumption to OpenAI's APIs. For organizations processing high volumes of similar requests—common in customer support, RAG systems, and data analysis—semantic caching can reduce costs by 30-50%.&lt;/p&gt;
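
&lt;p&gt;The idea behind semantic caching can be sketched with a toy similarity function. A production cache like Bifrost's would use a real embedding model; the bag-of-words stand-in below only illustrates the lookup logic:&lt;/p&gt;

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class SemanticCache:
    """Serve a cached response when a query is close enough to a past one."""
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached_response)

    def get(self, query: str):
        query_vec = embed(query)
        for vec, response in self.entries:
            if cosine(query_vec, vec) >= self.threshold:
                return response  # hit: the upstream LLM call is skipped
        return None

    def put(self, query: str, response: str):
        self.entries.append((embed(query), response))

cache = SemanticCache()
cache.put("what is our refund policy", "Refunds are processed within 14 days.")
hit = cache.get("what is our refund policy please")   # near-duplicate query
miss = cache.get("summarize the quarterly report")    # unrelated query
```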

&lt;p&gt;Additional enterprise capabilities include Model Context Protocol (MCP) support, enabling GPT models to access external tools and data sources; distributed tracing for debugging complex AI workflows; and Prometheus metrics for production monitoring. See more: &lt;a href="https://www.getbifrost.ai" rel="noopener noreferrer"&gt;Bifrost AI Gateway&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. LangSmith
&lt;/h3&gt;

&lt;p&gt;LangSmith, developed by the LangChain creators, provides a comprehensive prompt management and observability platform designed primarily for LangChain-based applications. The platform has processed over 15 billion traces and serves more than 300 enterprise customers.&lt;/p&gt;

&lt;p&gt;LangSmith excels at capturing the complete execution context of LLM calls, including intermediate steps, tool invocations, and metadata. This detailed tracing enables teams to inspect the exact prompt sent to OpenAI, the response received, and any downstream processing. The platform's prompt hub allows teams to version and manage prompts as first-class components, with the ability to test different versions against production datasets.&lt;/p&gt;

&lt;p&gt;For organizations deeply invested in LangChain, LangSmith's tight integration provides seamless workflow enhancement. However, the platform's architecture is optimized for LangChain ecosystems. Teams using other frameworks or building custom AI orchestration logic may find the integration less seamless and face vendor lock-in concerns.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Langfuse
&lt;/h3&gt;

&lt;p&gt;Langfuse is an open-source platform supporting the full LLM application lifecycle: development, monitoring, evaluation, and debugging. Its open-source nature makes it attractive to organizations prioritizing flexibility and avoiding proprietary vendor lock-in.&lt;/p&gt;

&lt;p&gt;The platform provides prompt management capabilities including versioned registries and interactive playgrounds for testing prompt variations. Real-time monitoring dashboards surface key metrics including latency, token consumption, cost, and quality assessments. Langfuse supports both automated evaluation methods and human feedback collection, enabling teams to quantify improvements and track regressions.&lt;/p&gt;

&lt;p&gt;For teams with infrastructure expertise and the operational capacity to self-host, Langfuse provides excellent flexibility. However, maintaining an open-source deployment requires dedicated DevOps resources, infrastructure provisioning, and ongoing operational overhead that many enterprises prefer to avoid.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Traditional API Gateways (Kong/Apigee)
&lt;/h3&gt;

&lt;p&gt;Some enterprises repurpose traditional API gateways like Kong or Apigee for LLM traffic, adding custom middleware for OpenAI integration. This approach leverages existing API infrastructure investments but requires significant custom development to implement LLM-specific features like semantic caching, cost tracking, and provider failover.&lt;/p&gt;

&lt;p&gt;Traditional API gateways excel at HTTP routing and basic rate limiting but lack LLM-native capabilities. They do not understand token counting, semantic similarity for caching, or provider-specific configuration requirements. Organizations choosing this path typically invest engineering resources equivalent to building a custom gateway, with limited ability to leverage industry best practices or keep pace with evolving LLM provider APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. vLLM (Open Source Inference Engine)
&lt;/h3&gt;

&lt;p&gt;vLLM is an open-source inference engine optimized for efficient large language model inference. While primarily designed for serving self-hosted models rather than managing provider APIs, some organizations deploy vLLM to serve cached responses and reduce dependency on external APIs.&lt;/p&gt;

&lt;p&gt;vLLM provides exceptional throughput and low-latency inference for self-hosted deployments, achieving up to 24× higher throughput than standard transformers. However, it does not provide the governance, cost management, or multi-provider orchestration capabilities that enterprise applications require. vLLM is best suited as a component within a larger gateway architecture, not as a standalone enterprise solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Critical Capabilities for Enterprise AI Gateways
&lt;/h2&gt;

&lt;p&gt;When evaluating gateways for OpenAI GPT integration, assess these core dimensions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency and Performance&lt;/strong&gt;: Gateway overhead directly impacts application responsiveness. For real-time AI applications such as copilots, chat interfaces, and agentic workflows, gateway latency compounds quickly under sustained traffic, making ultra-low-overhead architectures a measurable advantage at scale. Measure end-to-end latency (the time from application request to final response), not just gateway processing time.&lt;/p&gt;
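
&lt;p&gt;A simple way to capture that distinction is to time the full round trip from the client's side and report percentiles rather than averages. The sleep below is only a stand-in for an actual gateway call:&lt;/p&gt;

```python
import time
import statistics

def measure_end_to_end(call, n: int = 20) -> dict:
    """Time the full request path (client, gateway, provider, and back)."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        call()  # the real call would be an HTTP request through the gateway
        latencies.append((time.perf_counter() - start) * 1000.0)  # milliseconds
    return {
        "p50_ms": statistics.median(latencies),
        "p95_ms": statistics.quantiles(latencies, n=20)[18],  # 95th percentile
        "max_ms": max(latencies),
    }

# Stand-in for a gateway round trip; replace with your actual client call.
stats = measure_end_to_end(lambda: time.sleep(0.001))
```

&lt;p&gt;Tail percentiles matter more than the mean here: a gateway that adds little median overhead can still hurt user experience if its p95 degrades under load.&lt;/p&gt;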

&lt;p&gt;&lt;strong&gt;Cost Management Sophistication&lt;/strong&gt;: Simple rate limiting is insufficient. Enterprise gateways must provide hierarchical cost controls, token-level granularity, customer-level budgeting, and the ability to allocate costs across departments or business units. Teams need visibility into actual spend versus budget and the ability to enforce limits before costs spiral.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-Provider Flexibility&lt;/strong&gt;: The ability to route requests across multiple providers—OpenAI, Anthropic, Azure, Bedrock—without code changes is critical for reducing vendor lock-in and implementing failover strategies. Evaluate whether the gateway supports provider-agnostic configurations and automatic request translation across provider APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compliance and Auditability&lt;/strong&gt;: Regulatory requirements demand comprehensive audit trails, data residency controls, and encryption at rest and in transit. For regulated industries (financial services, healthcare, legal), ensure the gateway provides SOC 2 Type II compliance, GDPR support, and the ability to enforce data residency policies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Developer Experience&lt;/strong&gt;: Gateways should integrate seamlessly with existing SDKs and frameworks. Zero-configuration startup, drop-in replacement APIs, and minimal code changes reduce friction during deployment and lower the risk of integration errors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Choosing the Right Gateway
&lt;/h2&gt;

&lt;p&gt;The choice of AI gateway determines whether your organization can scale OpenAI GPT integration safely and profitably. In 2026, the most mature AI organizations will not be those that simply use AI, but those that govern, secure, and optimize it through a centralized, intelligent gateway layer.&lt;/p&gt;

&lt;p&gt;For enterprises prioritizing zero-friction integration, comprehensive cost management, multi-provider flexibility, and enterprise-grade governance without operational overhead, Bifrost delivers the full stack of capabilities required for production-scale OpenAI GPT deployments. For teams already invested in LangChain ecosystems, LangSmith provides tight integration at the cost of some flexibility. For organizations with strong infrastructure teams preferring open-source solutions, Langfuse offers excellent flexibility with the trade-off of operational complexity.&lt;/p&gt;

&lt;p&gt;The time to implement a centralized AI gateway is now—before costs spiral, governance becomes fragmented, and operational complexity outpaces your team's capacity to manage it. Start evaluating your options, assess your organization's architectural requirements, and implement the gateway that enables safe, profitable, and compliant AI integration at scale.&lt;/p&gt;

&lt;p&gt;Ready to unify your OpenAI GPT integration with enterprise-grade governance and observability? &lt;a href="https://getmaxim.ai/demo" rel="noopener noreferrer"&gt;Book a demo with Maxim AI&lt;/a&gt; to see how Bifrost and Maxim's evaluation platform work together to deliver reliable AI applications. Or &lt;a href="https://app.getmaxim.ai/sign-up" rel="noopener noreferrer"&gt;get started free&lt;/a&gt; to begin managing your AI gateway and evaluation workflows today.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Top 5 MCP Gateways for Secure AI Agent Access and Tool Provisioning</title>
      <dc:creator>Kuldeep Paul</dc:creator>
      <pubDate>Tue, 07 Apr 2026 19:48:48 +0000</pubDate>
      <link>https://dev.to/kuldeep_paul/top-5-mcp-gateways-for-secure-ai-agent-access-and-tool-provisioning-217p</link>
      <guid>https://dev.to/kuldeep_paul/top-5-mcp-gateways-for-secure-ai-agent-access-and-tool-provisioning-217p</guid>
      <description>&lt;h2&gt;
  
  
  Understanding the MCP Gateway Challenge
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol emerged as an open standard in November 2024, providing a universal interface for AI systems to integrate with data sources and tools. Unlike proprietary alternatives such as OpenAI's Function Calling or Assistants API, MCP offered the promise of vendor-neutral standardization for agent-to-tool communication.&lt;/p&gt;

&lt;p&gt;However, early production deployments revealed a critical gap. While the MCP specification focused on protocol mechanics, it did not prescribe infrastructure patterns for managing multiple servers at scale. Teams deploying dozens of MCP servers directly to AI agents discovered that this decentralized model created three compounding problems: authentication fragmentation, security governance blind spots, and operational chaos at scale.&lt;/p&gt;

&lt;p&gt;An MCP gateway addresses these challenges by acting as a single, secure front door that abstracts multiple Model Context Protocol servers behind one endpoint, providing a reverse proxy and management layer that handles authentication, routing, and policy enforcement. The result is unified governance, centralized security enforcement, and production-grade reliability for AI agent tool access.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why MCP Gateways Are Critical for Production Security
&lt;/h2&gt;

&lt;p&gt;The stakes of unsecured MCP deployments are significant. The Model Context Protocol enables powerful capabilities through arbitrary data access and code execution paths, requiring implementors to carefully address security and trust considerations.&lt;/p&gt;

&lt;p&gt;Without a gateway, three categories of threats proliferate:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token Passthrough Attacks.&lt;/strong&gt; If an MCP client holds a user's high-privilege OAuth token and connects to a malicious or compromised MCP server, an attacker could trick the client into sending that token to an external endpoint or using it to modify resources without explicit user intent. A gateway enforces token audience-binding so that credentials issued for one server are cryptographically unusable by another, preventing lateral movement across compromised tools.&lt;/p&gt;
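
&lt;p&gt;The audience-binding rule reduces to a check the gateway runs before forwarding any request. The claim layout and server IDs below are illustrative, and a real deployment would verify the token's signature before trusting its claims:&lt;/p&gt;

```python
def is_token_usable(claims: dict, server_id: str) -> bool:
    """Accept a token only if its audience claim names this MCP server."""
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return server_id in audiences

# A token minted for the search server is inert against the files server,
# which is exactly what blocks lateral movement through a compromised tool.
search_token_claims = {"sub": "agent-42", "aud": "mcp-search"}
usable_on_search = is_token_usable(search_token_claims, "mcp-search")
usable_on_files = is_token_usable(search_token_claims, "mcp-files")
```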

&lt;p&gt;&lt;strong&gt;Tool Poisoning.&lt;/strong&gt; The April 2025 security backlash highlighted dangers of tool poisoning and tool mimicry, where attackers create fake tools that mimic legitimate ones. A governance-forward gateway maintains an allowlist of approved tools and returns explicit failure responses when agents attempt to access unapproved endpoints, preventing silent data leakage.&lt;/p&gt;
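
&lt;p&gt;The allowlist pattern can be sketched in a few lines. The tool names below are hypothetical; the point is the explicit, auditable failure for anything not approved:&lt;/p&gt;

```python
APPROVED_TOOLS = {"search_docs", "read_file"}  # illustrative allowlist

def invoke_tool(name: str, arguments: dict) -> dict:
    """Gate every tool call: unapproved tools fail loudly, never silently."""
    if name not in APPROVED_TOOLS:
        return {"error": f"tool '{name}' is not on the approved allowlist"}
    # In a real gateway this would forward the call to the upstream MCP server.
    return {"result": f"dispatched {name} with {len(arguments)} arguments"}

ok = invoke_tool("search_docs", {"query": "pricing"})
blocked = invoke_tool("mimic_search_docs", {"query": "pricing"})  # poisoned tool
```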

&lt;p&gt;&lt;strong&gt;Data Exfiltration Through Tool Responses.&lt;/strong&gt; AI agents handling sensitive customer data can inadvertently leak information through tool outputs. Gateways that intercept all data flows between agents and MCP servers enable inspection and transformation, detecting and redacting personally identifiable information before data reaches agents and blocking secrets from being sent to MCP tools.&lt;/p&gt;

&lt;p&gt;A centralized gateway architecture shifts the security burden from individual users to centralized security administrators, ensuring consistent policy application across the organization regardless of which AI agents or MCP servers are used.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Bifrost: Developer-Optimized MCP Gateway with Production Reliability
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://getbifrost.ai" rel="noopener noreferrer"&gt;Bifrost&lt;/a&gt; stands as the leading MCP gateway solution, combining developer-first design with enterprise-grade security and governance. As Maxim AI's open-source AI gateway, Bifrost extends beyond MCP to provide unified access to 12+ LLM providers while managing tool provisioning through comprehensive MCP support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core MCP Capabilities&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Bifrost leads the MCP gateway market with sub-3ms latency, built-in tool registry, and seamless integration capabilities. The &lt;a href="https://docs.getbifrost.ai/features/mcp" rel="noopener noreferrer"&gt;MCP integration&lt;/a&gt; enables AI models to interact with external tools including filesystem access, web search, and database queries, all managed through the gateway's unified policy framework.&lt;/p&gt;

&lt;p&gt;Bifrost's tool provisioning model balances flexibility with security. Teams configure tool access through the gateway's &lt;a href="https://docs.getbifrost.ai/features/governance" rel="noopener noreferrer"&gt;governance layer&lt;/a&gt;, enabling hierarchical budget management, team-based access control, and granular usage tracking per tool and agent. This approach allows organizations to approve new tools without custom code deployment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security and Governance at Scale&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Bifrost implements comprehensive controls addressing the three threat categories outlined above. Token management is handled transparently through the gateway, preventing passthrough attacks. Tool access is controlled via allowlists with real-time monitoring, and all data flows through Bifrost's policy engine for inspection and filtering.&lt;/p&gt;

&lt;p&gt;The platform provides &lt;a href="https://docs.getbifrost.ai/features/observability" rel="noopener noreferrer"&gt;native observability&lt;/a&gt; including Prometheus metrics and distributed tracing, enabling security teams to audit every tool invocation, monitor for anomalous patterns, and attribute costs to specific agents and tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployment Flexibility&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Bifrost supports multiple deployment patterns: Docker containers for self-managed infrastructure, Kubernetes for enterprise scale, and &lt;a href="https://getbifrost.ai/cloud" rel="noopener noreferrer"&gt;Bifrost Cloud&lt;/a&gt; for fully managed deployments with automated scaling. This flexibility ensures organizations can standardize on Bifrost regardless of infrastructure preferences.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Lasso Security: Purpose-Built for AI Agent Threat Detection
&lt;/h2&gt;

&lt;p&gt;Lasso Security, recognized as a 2024 Gartner Cool Vendor for AI Security, focuses on the "invisible agent" problem, prioritizing security monitoring and threat detection over raw performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Specialized Security Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The plugin-based architecture enables real-time security scanning, token masking, and AI safety guardrails, allowing organizations to add security capabilities incrementally rather than adopting an all-or-nothing approach.&lt;/p&gt;

&lt;p&gt;Lasso's differentiator lies in tool reputation analysis. The system tracks and scores MCP servers based on behavior patterns, code analysis, and community feedback, addressing supply chain security concerns that many organizations cite as their primary barrier to MCP adoption. Real-time threat detection monitors for jailbreaks, unauthorized access patterns, and data exfiltration attempts using AI agent-specific behavioral analytics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Case Fit&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Lasso is optimal for organizations where threat modeling and intrusion detection are primary concerns. If your deployment prioritizes security monitoring above operational simplicity, Lasso's specialized capabilities justify the architectural trade-off of additional complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Amazon Bedrock AgentCore Gateway: Managed Service with Semantic Tool Discovery
&lt;/h2&gt;

&lt;p&gt;Amazon Bedrock AgentCore Gateway provides a fully managed service that enables organizations to convert APIs, Lambda functions, and existing services into MCP-compatible tools with zero-code tool creation from OpenAPI specifications and Smithy models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automated Complexity Reduction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The translation capability converts agent requests arriving over protocols like MCP into API requests and Lambda invocations, eliminating the need to manage protocol integrations or version support, while the composition capability combines multiple APIs and functions into a single MCP endpoint.&lt;/p&gt;

&lt;p&gt;AgentCore Gateway automatically provisions semantic search capabilities, enabling intelligent tool discovery through natural language queries rather than requiring agents to enumerate available tools. For organizations with hundreds of tools, this semantic approach dramatically improves agent decision-making and reduces prompt overhead.&lt;/p&gt;
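
&lt;p&gt;Semantic tool discovery can be illustrated with a toy ranking over tool descriptions. AgentCore uses embedding-based search; the token-overlap score below is only a readable stand-in, and the tool catalog is hypothetical:&lt;/p&gt;

```python
def jaccard(a: set, b: set) -> float:
    """Token-overlap similarity; a real system would use embeddings instead."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0

# Illustrative tool catalog: name mapped to a natural-language description.
TOOLS = {
    "create_ticket": "open a new support ticket in the helpdesk",
    "query_orders": "look up customer orders and shipment status",
    "post_message": "send a message to a team chat channel",
}

def discover(task: str, top_k: int = 1) -> list:
    """Return the tools whose descriptions best match the task description."""
    task_tokens = set(task.lower().split())
    ranked = sorted(
        TOOLS,
        key=lambda name: jaccard(task_tokens, set(TOOLS[name].lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

best = discover("check the shipment status of an order")
```

&lt;p&gt;The agent's prompt only needs to carry the top-ranked tools rather than the entire catalog, which is where the prompt-overhead savings come from.&lt;/p&gt;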

&lt;p&gt;AgentCore Gateway provides comprehensive ingress and egress authentication as a fully managed service, with one-click integrations for popular tools such as Salesforce, Slack, Jira, Asana, and Zendesk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS-Native Optimization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AgentCore Gateway is the clear choice for organizations standardized on AWS infrastructure. Tight integration with IAM, VPC, CloudWatch, and Lambda eliminates external authentication complexity. However, if your architecture spans multiple cloud providers or requires on-premises MCP server access, AgentCore's AWS-specific constraints become limiting.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. IBM Context Forge: Federation for Complex Enterprise Environments
&lt;/h2&gt;

&lt;p&gt;IBM's Context Forge represents the most architecturally ambitious approach in the market, with auto-discovery via mDNS, health monitoring, and capability merging that enable deployments where multiple gateways work together seamlessly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Federation and Composition&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For very large organizations with complex infrastructure spanning multiple environments, the federation model solves real operational problems by enabling virtual server composition where teams combine multiple MCP servers into single logical endpoints, simplifying agent interactions while maintaining backend flexibility.&lt;/p&gt;

&lt;p&gt;Flexible authentication supports JWT Bearer tokens, Basic Auth, and custom header schemes with AES encryption for tool credentials, accommodating heterogeneous security requirements across enterprise environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important Caveat&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Context Forge ships with an explicit disclaimer that it is not an officially supported IBM product, which creates adoption friction for enterprise customers and requires careful evaluation of support SLAs and maintenance commitments.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. TrueFoundry: Unified AI Infrastructure with MCP Integration
&lt;/h2&gt;

&lt;p&gt;TrueFoundry provides MCP gateway capabilities as part of a broader unified AI infrastructure management platform. For organizations building comprehensive AI stacks spanning model deployment, prompt management, and observability, TrueFoundry offers integrated MCP tool provisioning within a unified control plane.&lt;/p&gt;

&lt;p&gt;TrueFoundry is particularly valuable for teams already standardizing on the platform who require MCP capabilities without introducing additional tools. However, if MCP gateway simplicity is your primary concern, single-purpose solutions may offer better developer experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Selecting the Right MCP Gateway: Decision Framework
&lt;/h2&gt;

&lt;p&gt;Your MCP gateway choice depends on four dimensions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security Requirements.&lt;/strong&gt; If threat detection and behavioral monitoring are non-negotiable, Lasso Security's specialized architecture justifies additional complexity. For standard governance needs, Bifrost's built-in controls are sufficient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cloud Infrastructure.&lt;/strong&gt; AWS-native organizations benefit from Bedrock AgentCore's managed service approach and direct IAM integration. Multi-cloud or on-premises deployments require Bifrost or other provider-agnostic solutions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Operational Scale.&lt;/strong&gt; Organizations managing hundreds of tools across multiple environments benefit from federation capabilities like IBM Context Forge. Smaller deployments are well-served by simpler architectures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Developer Experience.&lt;/strong&gt; Bifrost's &lt;a href="https://docs.getbifrost.ai/features/drop-in-replacement" rel="noopener noreferrer"&gt;drop-in replacement model&lt;/a&gt; for OpenAI and Anthropic APIs, combined with zero-configuration startup, enables rapid deployment. Other solutions require greater setup effort.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation Considerations for Secure MCP Deployment
&lt;/h2&gt;

&lt;p&gt;Regardless of which gateway you select, three implementation patterns emerge:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool Discovery and Governance.&lt;/strong&gt; Implement semantic tool discovery so agents can identify appropriate tools without explicit prompting. Require explicit approval workflows for new tools, preventing supply chain attacks through malicious tool injection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Credential and Token Management.&lt;/strong&gt; Never pass user credentials directly to MCP servers. Use the gateway to manage audience-bound tokens, ensuring that credentials issued for one tool are unusable by others. Implement token rotation policies to limit blast radius of compromised credentials.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability and Anomaly Detection.&lt;/strong&gt; Log every tool invocation with agent context, tool name, arguments, and response. Monitor for anomalous patterns such as unusual tool combinations, unexpected data access patterns, or repeated failures to invoke specific tools. Use these logs to inform security policies and detect early indicators of compromise.&lt;/p&gt;
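
&lt;p&gt;A minimal shape for such an invocation log, with hypothetical field names, is one JSON line per call that downstream anomaly detection can parse:&lt;/p&gt;

```python
import json
import time

def log_tool_invocation(agent_id: str, tool: str,
                        arguments: dict, outcome: str) -> str:
    """Emit one JSON line per tool call with full agent context."""
    record = {
        "ts": time.time(),
        "agent_id": agent_id,
        "tool": tool,
        "arguments": arguments,
        "outcome": outcome,  # e.g. "ok", "denied", "error"
    }
    return json.dumps(record)

line = log_tool_invocation("agent-42", "read_file",
                           {"path": "/tmp/report.txt"}, "ok")
```

&lt;p&gt;Structured lines like this make it straightforward to query for the anomalous patterns described above, such as unusual tool combinations or repeated denials for a single agent.&lt;/p&gt;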

&lt;h2&gt;
  
  
  Moving Forward: Building Trustworthy AI Agent Ecosystems
&lt;/h2&gt;

&lt;p&gt;The rapid adoption of MCP across OpenAI, Google DeepMind, and enterprise platforms validates the protocol's architectural value. However, security researchers have identified multiple outstanding security issues with MCP including prompt injection and tool permissions that allow unauthorized access, reinforcing that gateway-level security controls are essential for production deployments.&lt;/p&gt;

&lt;p&gt;Organizations treating the MCP gateway as core infrastructure rather than an afterthought achieve both operational simplicity and security assurance. The gateway becomes your control plane for trusted tool access, enabling confident deployment of AI agents across your organization.&lt;/p&gt;

&lt;p&gt;To explore how Bifrost and &lt;a href="https://getmaxim.ai/products/agent-simulation-evaluation" rel="noopener noreferrer"&gt;Maxim's evaluation platform&lt;/a&gt; work together to ensure reliable AI agent behavior before and after tool access is provisioned, &lt;a href="https://getmaxim.ai/demo" rel="noopener noreferrer"&gt;schedule a demo with our team&lt;/a&gt;. We'll walk through real-world tool governance patterns, security controls that prevent data exfiltration, and evaluation strategies that confirm agents use approved tools correctly.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
