Hadil Ben Abdallah

How to Run MCP Servers in Production (Security, Scaling & Governance for AI Tooling)

Over the past year, MCP servers have quickly become one of the most important building blocks in modern AI systems.

Instead of limiting LLMs to static prompts, MCP (Model Context Protocol) allows models to interact with external tools and services in a structured way. That means agents can query databases, read repositories, call APIs, or trigger internal workflows while reasoning through a task.

At small scale, setting up MCP servers is surprisingly simple. You connect a tool, expose its schema, and the model can start using it almost immediately.

But once MCP tooling moves into production environments, the architecture starts to matter.

Questions appear quickly:

  • Who controls which tools an agent can access?
  • How do you prevent accidental access to sensitive systems?
  • How do you monitor tool usage across teams?
  • How do you enforce budgets and rate limits?

In other words, MCP servers introduce a new infrastructure layer: AI tooling infrastructure.

This article explores how teams are running MCP servers in production, and why introducing an AI gateway like Bifrost becomes essential for securing, scaling, and governing AI tooling.


What MCP Servers Actually Do

Before diving into infrastructure concerns, it’s helpful to understand what MCP servers provide.

The Model Context Protocol (MCP) allows LLMs to access structured tools in a standardized way. Instead of writing custom integrations for every API, developers expose tools through MCP servers that describe:

  • available actions
  • input parameters
  • expected outputs
  • permissions or constraints

For example, an MCP server might expose tools like:

  • search_docs
  • query_database
  • create_github_issue
  • deploy_service

When an AI agent reasons about a task, it can decide whether to call one of these tools.

In practice, this transforms LLMs from simple chat interfaces into autonomous problem-solving systems capable of interacting with real infrastructure.
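To make the schema idea concrete, a tool exposed over MCP is essentially a named action plus a JSON-Schema-style description of its inputs. The sketch below is a simplified illustration of that shape (the field names are illustrative, not the exact MCP wire format), along with a minimal check that a proposed call supplies every required parameter:

```python
# Simplified sketch of a tool description an MCP server might expose.
# Field names are illustrative, not the exact MCP wire format.
search_docs_tool = {
    "name": "search_docs",
    "description": "Full-text search over internal documentation.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
}

def validate_call(tool: dict, args: dict) -> bool:
    """Check that a proposed tool call supplies every required parameter."""
    schema = tool["input_schema"]
    return all(param in args for param in schema.get("required", []))

print(validate_call(search_docs_tool, {"query": "rate limits"}))  # True
print(validate_call(search_docs_tool, {"limit": 3}))              # False
```

Because the model only ever sees this structured description, anything that sits between the agent and the server can inspect and validate calls the same way.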


The MCP Architecture Most Developers Start With

When developers experiment with MCP, they usually connect servers directly to the agent environment.

In early MCP experiments, that architecture typically looks like this:

Direct MCP architecture where an AI agent connects to multiple MCP servers.

At first, this approach works well.

The agent can access multiple tools, and each MCP server handles its own logic.

However, once the number of tools grows, several problems begin to appear:

  • Tool permissions become difficult to manage
  • Logging is fragmented across servers
  • Security policies are inconsistent
  • Teams configure tools differently

For a solo developer this may be manageable.
For a shared AI platform, it quickly becomes fragile.


The Hidden Problem: Uncontrolled AI Tool Access

The biggest challenge with MCP servers isn’t connecting tools. It’s controlling how those tools are used.

Imagine an engineering organization running multiple AI agents across development environments.

Each agent might have access to tools capable of:

  • querying internal databases
  • modifying infrastructure
  • triggering deployments
  • reading proprietary data

Without centralized governance, several risks emerge.

1. Security Risks

If MCP tools are configured locally, developers may accidentally expose sensitive systems.

For example:

  • production databases
  • internal APIs
  • CI/CD pipelines

Once the model knows a tool exists, it may attempt to use it while reasoning.

That makes permission management essential.
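In its simplest form, permission management is an explicit allow-list checked before any tool call is forwarded, with unknown agents denied by default. A minimal sketch (the agent and tool names here are hypothetical):

```python
# Deny-by-default allow-list: an agent may only invoke tools it was
# explicitly granted. Agent and tool names are hypothetical.
PERMISSIONS = {
    "docs-agent": {"search_docs"},
    "ops-agent": {"search_docs", "query_database"},
}

def is_allowed(agent: str, tool: str) -> bool:
    """Unknown agents get the empty set, so every call is denied."""
    return tool in PERMISSIONS.get(agent, set())

print(is_allowed("docs-agent", "search_docs"))     # True
print(is_allowed("docs-agent", "query_database"))  # False
```

The hard part is not the check itself but keeping this mapping in one place rather than scattered across per-developer configs.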

2. Cost and Resource Usage

Many tools invoked by AI agents generate external costs.

Examples include:

  • LLM API calls
  • database queries
  • search indexes
  • compute-heavy workflows

Without centralized governance, it becomes difficult to answer questions like:

  • Which team is generating the most AI cost?
  • Which tools are used most frequently?
  • Which agents trigger expensive operations?

3. Observability Challenges

Debugging AI systems is already difficult.

When tool calls are distributed across multiple MCP servers, tracing a single request can become nearly impossible.

You may need to inspect logs across:

  • the agent
  • each MCP server
  • external APIs

A centralized logging layer simplifies this dramatically.


Introducing the MCP Gateway Layer

This is where the concept of an MCP gateway becomes useful.

Instead of connecting AI agents directly to every MCP server, requests pass through a centralized control layer.

MCP gateway architecture where a centralized gateway routes AI agent requests to multiple MCP tool servers.

The gateway becomes responsible for:

  • tool discovery
  • permission enforcement
  • logging and observability
  • rate limiting
  • request validation
  • budget control

From the agent’s perspective, nothing changes.
But the infrastructure becomes far easier to manage.
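As a rough mental model of those responsibilities, here is a stdlib-only sketch of a gateway-side check pipeline: permission enforcement, a sliding-window rate limit, and a centralized audit log. Real gateways implement this in infrastructure, and every name and limit below is illustrative:

```python
import time
from collections import defaultdict, deque

# Conceptual sketch only; production gateways implement these checks
# in infrastructure. All names and limits here are illustrative.
RATE_LIMIT = 5        # max calls per agent per window
WINDOW_SECONDS = 60.0

_recent_calls = defaultdict(deque)  # agent -> timestamps of recent calls
audit_log = []                      # one centralized log for every request

def gateway_dispatch(agent, tool, allowed_tools, now=None):
    """Run the gateway's checks before forwarding a tool call."""
    now = time.monotonic() if now is None else now
    # 1. Permission enforcement: deny tools the agent was never granted.
    if tool not in allowed_tools:
        audit_log.append((agent, tool, "denied"))
        return "denied"
    # 2. Sliding-window rate limit per agent.
    window = _recent_calls[agent]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= RATE_LIMIT:
        audit_log.append((agent, tool, "rate_limited"))
        return "rate_limited"
    window.append(now)
    # 3. Forward to the MCP server (stubbed out here) and log the call.
    audit_log.append((agent, tool, "ok"))
    return "ok"
```

Because every request passes through one function, the audit log automatically becomes a complete record of tool usage, which is exactly the property the gateway layer is meant to provide.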


Running MCP Infrastructure with Bifrost

This is exactly where the Bifrost AI gateway fits into the architecture.

Bifrost is an open-source AI gateway designed for production LLM workloads. In addition to handling multi-provider model routing, it also supports MCP tooling infrastructure.

Instead of exposing MCP servers directly to agents, teams can route all traffic through Bifrost.

AI gateway architecture where Bifrost routes AI agent requests to multiple MCP servers.

This architecture allows Bifrost to act as a central control plane for AI tooling.

Key capabilities include:

  • MCP tool discovery and routing
  • access control and permission policies
  • cost tracking for tool usage
  • centralized observability
  • rate limiting and quotas

Because these policies live in infrastructure rather than local configuration, governance becomes consistent across teams.


Centralized Governance with Virtual Keys

When AI agents gain access to external tools, governance quickly becomes a critical concern.

Unlike simple API calls, MCP tools can trigger real actions: querying databases, interacting with internal APIs, or modifying resources. Without proper controls, a misconfigured agent could easily execute operations it should never have access to.

A practical solution is to move governance away from individual applications and enforce it at the infrastructure layer.

Bifrost approaches this problem through Virtual Keys, which act as programmable access policies for AI workloads.

Instead of distributing unrestricted API keys to every agent or service, teams issue Virtual Keys that define exactly what an AI workload is allowed to do.

A single Virtual Key can enforce rules such as:

  • maximum monthly spending for LLM usage
  • request throughput limits
  • allowed or restricted model providers
  • access permissions for specific MCP tools

For example, an organization might create separate keys for different teams or environments.

A development key might allow access to internal documentation tools and staging APIs, while production automation keys might be restricted to read-only tools.

Requests routed through Bifrost include the Virtual Key header:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: vk-engineering-main" \
  -d '{ ... }'

Before the request reaches a model provider or MCP server, the gateway evaluates the policy associated with that key.

If the request violates the defined constraints, it is rejected immediately.

This approach shifts security decisions from application logic to infrastructure policy, which is significantly easier to audit and maintain as systems grow.
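Bifrost defines its own policy schema; purely as a mental model of the checks described above (every field name below is hypothetical, not Bifrost's actual configuration), evaluating a Virtual Key before forwarding a request could be sketched like this:

```python
# Hypothetical policy record for a Virtual Key. The real schema is
# defined by the gateway; this only models the checks described above.
POLICIES = {
    "vk-engineering-main": {
        "monthly_budget_usd": 500.0,
        "spent_usd": 120.0,
        "allowed_providers": {"openai", "anthropic"},
        "allowed_tools": {"search_docs", "query_database"},
    },
}

def evaluate(key, provider, tool, est_cost_usd):
    """Return 'allowed' or a rejection reason for a proposed request."""
    policy = POLICIES.get(key)
    if policy is None:
        return "rejected: unknown key"
    if provider not in policy["allowed_providers"]:
        return "rejected: provider not allowed"
    if tool is not None and tool not in policy["allowed_tools"]:
        return "rejected: tool not allowed"
    if policy["spent_usd"] + est_cost_usd > policy["monthly_budget_usd"]:
        return "rejected: budget exceeded"
    return "allowed"
```

The key design point is that the application never sees this logic: it only holds the key string, and the policy can be tightened or revoked in one place.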


Observability for AI Tooling

Another major advantage of routing MCP traffic through a gateway is observability.

Every request passing through Bifrost can capture metadata such as:

  • prompt inputs
  • tool calls triggered by the model
  • selected model provider
  • token usage
  • latency
  • cost
  • error messages

Logs are accessible through the built-in dashboard:

http://localhost:8080/logs

This provides a unified view of how AI agents interact with tools across the entire system.

Instead of debugging across multiple services, engineers can trace requests directly from the gateway layer.
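Since every request flows through one place, each tool call can be captured as a single structured event carrying all of that metadata. A small stdlib-only sketch of what one such log record might look like (the field names are illustrative, not Bifrost's actual log format):

```python
import json
import time

def log_tool_call(agent, tool, provider, tokens, latency_ms, cost_usd):
    """Emit one structured log line per gateway request (illustrative fields)."""
    record = {
        "ts": time.time(),
        "agent": agent,
        "tool": tool,
        "provider": provider,
        "tokens": tokens,
        "latency_ms": latency_ms,
        "cost_usd": cost_usd,
    }
    print(json.dumps(record))  # in practice: ship to the dashboard/log store
    return record
```

One record per request, in one schema, is what makes questions like "which agent triggered this expensive call?" answerable with a single query.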


Performance and Scalability

Once MCP tooling becomes part of a production system, the number of interactions between models and tools increases rapidly.

Agents might query multiple services during a single task: retrieving documentation, inspecting logs, and triggering automation pipelines. In larger environments, this traffic can grow to thousands of tool calls per minute.

At that scale, infrastructure design matters.

Bifrost is implemented in Go and built specifically for high-concurrency workloads. The gateway is capable of handling large volumes of AI requests while simultaneously performing several critical operations:

  • request routing to different model providers
  • enforcement of policy rules
  • logging of tool interactions
  • tracking token consumption and cost

Because these responsibilities are centralized in the gateway layer, individual applications remain lightweight. They simply send requests to a single endpoint while the gateway handles the operational complexity behind the scenes.

In practice, this design allows organizations to support:

  • large fleets of AI agents
  • multiple MCP tool servers
  • shared AI infrastructure across teams

without turning the application layer into a tangled web of integrations.


How MCP Servers Changed the Way I Build AI Workflows

When I first started experimenting with MCP servers, the setup was surprisingly simple. I connected an AI agent to a few MCP tools, exposed their schemas, and the model could immediately start using them to search documentation, query APIs, or inspect code.

At first, everything worked smoothly.

But as I added more tools and MCP servers, the system started becoming harder to manage. Understanding which tool an agent had used, tracking logs, and debugging tool failures often meant jumping between multiple services.

The complexity didn’t come from the tools themselves, but from how they were connected.

Introducing a gateway layer in front of the MCP servers made a big difference. Instead of agents connecting directly to every tool, all requests passed through a single control point.

That small architectural change made the system much easier to operate.

Tool usage became visible in one place, policies could be applied consistently, and adding new tools no longer required updating multiple integrations.

For small experiments, direct MCP integrations work fine. But once MCP servers start powering real workflows, treating them as AI infrastructure rather than just tools makes everything significantly easier to manage.


When You Actually Need MCP Governance

Not every project requires a full MCP governance layer.

For prototypes or personal projects, direct MCP connections are usually sufficient.

However, once AI systems interact with real infrastructure, centralized governance becomes important.

You’ll likely benefit from an MCP gateway when:

  • multiple MCP servers are deployed
  • AI agents interact with production systems
  • teams share AI tooling infrastructure
  • security policies must be enforced
  • cost control becomes necessary

At that point, treating MCP as infrastructure rather than tooling becomes the right architectural decision.


Final Thoughts

MCP servers are transforming how AI systems interact with software environments.

Instead of limiting models to static prompts, developers can give them access to tools capable of retrieving data, triggering workflows, and modifying systems.

That power comes with new operational challenges.

As the number of tools grows, teams need ways to control access, track usage, and enforce governance policies.

Introducing an MCP gateway layer solves many of these challenges by centralizing security, observability, and cost control.

If you're interested in how organizations build full multi-provider AI infrastructure around gateway architectures, I explored that topic in more detail in this guide:
How to Build a Multi-Provider LLM Infrastructure with an AI Gateway (OpenAI, Claude, Azure & Vertex).

For teams building production AI platforms, the Bifrost AI gateway provides the infrastructure necessary to run MCP tooling safely at scale.


Thanks for reading! 🙏🏻
I hope you found this useful ✅
Please react and follow for more 😍
Made with 💙 by Hadil Ben Abdallah

Top comments (2)

Ben Abdallah Hanadi

This really captures the shift from “cool demo” to real infrastructure. MCP feels simple at first, but the moment you scale, governance becomes unavoidable.
The gateway layer insight is spot on.

Hadil Ben Abdallah

Exactly. That transition is where most of the real challenges start.

MCP makes it feel easy in the beginning, which is why the infrastructure side often gets overlooked. But once it touches real systems, governance isn’t optional anymore.

Glad the gateway layer insight resonated 😍