Hadil Ben Abdallah

How to Run MCP Servers in Production (Security, Scaling & Governance for AI Tooling)

Over the past year, MCP servers have quickly become one of the most important building blocks in modern AI systems.

Instead of limiting LLMs to static prompts, MCP (Model Context Protocol) allows models to interact with external tools and services in a structured way. That means agents can query databases, read repositories, call APIs, or trigger internal workflows while reasoning through a task.

At small scale, setting up MCP servers is surprisingly simple. You connect a tool, expose its schema, and the model can start using it almost immediately.

But once MCP tooling moves into production environments, the architecture starts to matter.

Questions appear quickly:

  • Who controls which tools an agent can access?
  • How do you prevent accidental access to sensitive systems?
  • How do you monitor tool usage across teams?
  • How do you enforce budgets and rate limits?

In other words, MCP servers introduce a new infrastructure layer: AI tooling infrastructure.

This article explores how teams are running MCP servers in production, and why introducing an AI gateway like Bifrost becomes essential for securing, scaling, and governing AI tooling.


What MCP Servers Actually Do

Before diving into infrastructure concerns, it’s helpful to understand what MCP servers provide.

The Model Context Protocol (MCP) allows LLMs to access structured tools in a standardized way. Instead of writing custom integrations for every API, developers expose tools through MCP servers that describe:

  • available actions
  • input parameters
  • expected outputs
  • permissions or constraints

For example, an MCP server might expose tools like:

  • search_docs
  • query_database
  • create_github_issue
  • deploy_service

When an AI agent reasons about a task, it can decide whether to call one of these tools.

In practice, this transforms LLMs from simple chat interfaces into autonomous problem-solving systems capable of interacting with real infrastructure.
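To make the schema idea concrete, a tool exposed over MCP is essentially a named action plus a JSON-Schema-style description of its inputs. The sketch below is a simplified illustration of that shape (the field names are illustrative, not the exact MCP wire format), along with a minimal check that a proposed call supplies every required parameter:

```python
# Simplified sketch of a tool description an MCP server might expose.
# Field names are illustrative, not the exact MCP wire format.
search_docs_tool = {
    "name": "search_docs",
    "description": "Full-text search over internal documentation.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
}

def validate_call(tool: dict, args: dict) -> bool:
    """Check that a proposed tool call supplies every required parameter."""
    schema = tool["input_schema"]
    return all(param in args for param in schema.get("required", []))

print(validate_call(search_docs_tool, {"query": "rate limits"}))  # True
print(validate_call(search_docs_tool, {"limit": 3}))              # False
```

Because the model only ever sees this structured description, anything that sits between the agent and the server can inspect and validate calls the same way.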


The MCP Architecture Most Developers Start With

When developers experiment with MCP, they usually connect servers directly to the agent environment.

In early MCP experiments, that architecture typically looks like this:

Direct MCP architecture where an AI agent connects to multiple MCP servers.

At first, this approach works well.

The agent can access multiple tools, and each MCP server handles its own logic.

However, once the number of tools grows, several problems begin to appear:

  • Tool permissions become difficult to manage
  • Logging is fragmented across servers
  • Security policies are inconsistent
  • Teams configure tools differently

For a solo developer this may be manageable.
For a shared AI platform, it quickly becomes fragile.


The Hidden Problem: Uncontrolled AI Tool Access

The biggest challenge with MCP servers isn’t connecting tools. It’s controlling how those tools are used.

Imagine an engineering organization running multiple AI agents across development environments.

Each agent might have access to tools capable of:

  • querying internal databases
  • modifying infrastructure
  • triggering deployments
  • reading proprietary data

Without centralized governance, several risks emerge.

1. Security Risks

If MCP tools are configured locally, developers may accidentally expose sensitive systems.

For example:

  • production databases
  • internal APIs
  • CI/CD pipelines

Once the model knows a tool exists, it may attempt to use it while reasoning.

That makes permission management essential.
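In its simplest form, permission management is an explicit allow-list checked before any tool call is forwarded, with unknown agents denied by default. A minimal sketch (the agent and tool names here are hypothetical):

```python
# Deny-by-default allow-list: an agent may only invoke tools it was
# explicitly granted. Agent and tool names are hypothetical.
PERMISSIONS = {
    "docs-agent": {"search_docs"},
    "ops-agent": {"search_docs", "query_database"},
}

def is_allowed(agent: str, tool: str) -> bool:
    """Unknown agents get the empty set, so every call is denied."""
    return tool in PERMISSIONS.get(agent, set())

print(is_allowed("docs-agent", "search_docs"))     # True
print(is_allowed("docs-agent", "query_database"))  # False
```

The hard part is not the check itself but keeping this mapping in one place rather than scattered across per-developer configs.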

2. Cost and Resource Usage

Many tools invoked by AI agents generate external costs.

Examples include:

  • LLM API calls
  • database queries
  • search indexes
  • compute-heavy workflows

Without centralized governance, it becomes difficult to answer questions like:

  • Which team is generating the most AI cost?
  • Which tools are used most frequently?
  • Which agents trigger expensive operations?

3. Observability Challenges

Debugging AI systems is already difficult.

When tool calls are distributed across multiple MCP servers, tracing a single request can become nearly impossible.

You may need to inspect logs across:

  • the agent
  • each MCP server
  • external APIs

A centralized logging layer simplifies this dramatically.


Introducing the MCP Gateway Layer

This is where the concept of an MCP gateway becomes useful.

Instead of connecting AI agents directly to every MCP server, requests pass through a centralized control layer.

MCP gateway architecture where a centralized gateway routes AI agent requests to multiple MCP tool servers.

The gateway becomes responsible for:

  • tool discovery
  • permission enforcement
  • logging and observability
  • rate limiting
  • request validation
  • budget control

From the agent’s perspective, nothing changes.
But the infrastructure becomes far easier to manage.
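As a rough mental model of those responsibilities, here is a stdlib-only sketch of a gateway-side check pipeline: permission enforcement, a sliding-window rate limit, and a centralized audit log. Real gateways implement this in infrastructure, and every name and limit below is illustrative:

```python
import time
from collections import defaultdict, deque

# Conceptual sketch only; production gateways implement these checks
# in infrastructure. All names and limits here are illustrative.
RATE_LIMIT = 5        # max calls per agent per window
WINDOW_SECONDS = 60.0

_recent_calls = defaultdict(deque)  # agent -> timestamps of recent calls
audit_log = []                      # one centralized log for every request

def gateway_dispatch(agent, tool, allowed_tools, now=None):
    """Run the gateway's checks before forwarding a tool call."""
    now = time.monotonic() if now is None else now
    # 1. Permission enforcement: deny tools the agent was never granted.
    if tool not in allowed_tools:
        audit_log.append((agent, tool, "denied"))
        return "denied"
    # 2. Sliding-window rate limit per agent.
    window = _recent_calls[agent]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= RATE_LIMIT:
        audit_log.append((agent, tool, "rate_limited"))
        return "rate_limited"
    window.append(now)
    # 3. Forward to the MCP server (stubbed out here) and log the call.
    audit_log.append((agent, tool, "ok"))
    return "ok"
```

Because every request passes through one function, the audit log automatically becomes a complete record of tool usage, which is exactly the property the gateway layer is meant to provide.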


Running MCP Infrastructure with Bifrost

This is exactly where the Bifrost AI gateway fits into the architecture.

Bifrost is an open-source AI gateway designed for production LLM workloads. In addition to handling multi-provider model routing, it also supports MCP tooling infrastructure.

Instead of exposing MCP servers directly to agents, teams can route all traffic through Bifrost.

AI gateway architecture where Bifrost routes AI agent requests to multiple MCP servers.

This architecture allows Bifrost to act as a central control plane for AI tooling.

Key capabilities include:

  • MCP tool discovery and routing
  • access control and permission policies
  • cost tracking for tool usage
  • centralized observability
  • rate limiting and quotas

Because these policies live in infrastructure rather than local configuration, governance becomes consistent across teams.


Centralized Governance with Virtual Keys

When AI agents gain access to external tools, governance quickly becomes a critical concern.

Unlike simple API calls, MCP tools can trigger real actions: querying databases, interacting with internal APIs, or modifying resources. Without proper controls, a misconfigured agent could easily execute operations it should never have access to.

A practical solution is to move governance away from individual applications and enforce it at the infrastructure layer.

Bifrost approaches this problem through Virtual Keys, which act as programmable access policies for AI workloads.

Instead of distributing unrestricted API keys to every agent or service, teams issue Virtual Keys that define exactly what an AI workload is allowed to do.

A single Virtual Key can enforce rules such as:

  • maximum monthly spending for LLM usage
  • request throughput limits
  • allowed or restricted model providers
  • access permissions for specific MCP tools

For example, an organization might create separate keys for different teams or environments.

A development key might allow access to internal documentation tools and staging APIs, while production automation keys might be restricted to read-only tools.

Requests routed through Bifrost include the Virtual Key header:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: vk-engineering-main" \
  -d '{ ... }'

Before the request reaches a model provider or MCP server, the gateway evaluates the policy associated with that key.

If the request violates the defined constraints, it is rejected immediately.

This approach shifts security decisions from application logic to infrastructure policy, which is significantly easier to audit and maintain as systems grow.
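Bifrost defines its own policy schema; purely as a mental model of the checks described above (every field name below is hypothetical, not Bifrost's actual configuration), evaluating a Virtual Key before forwarding a request could be sketched like this:

```python
# Hypothetical policy record for a Virtual Key. The real schema is
# defined by the gateway; this only models the checks described above.
POLICIES = {
    "vk-engineering-main": {
        "monthly_budget_usd": 500.0,
        "spent_usd": 120.0,
        "allowed_providers": {"openai", "anthropic"},
        "allowed_tools": {"search_docs", "query_database"},
    },
}

def evaluate(key, provider, tool, est_cost_usd):
    """Return 'allowed' or a rejection reason for a proposed request."""
    policy = POLICIES.get(key)
    if policy is None:
        return "rejected: unknown key"
    if provider not in policy["allowed_providers"]:
        return "rejected: provider not allowed"
    if tool is not None and tool not in policy["allowed_tools"]:
        return "rejected: tool not allowed"
    if policy["spent_usd"] + est_cost_usd > policy["monthly_budget_usd"]:
        return "rejected: budget exceeded"
    return "allowed"
```

The key design point is that the application never sees this logic: it only holds the key string, and the policy can be tightened or revoked in one place.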


Observability for AI Tooling

Another major advantage of routing MCP traffic through a gateway is observability.

Every request passing through Bifrost can capture metadata such as:

  • prompt inputs
  • tool calls triggered by the model
  • selected model provider
  • token usage
  • latency
  • cost
  • error messages

Logs are accessible through the built-in dashboard:

http://localhost:8080/logs

This provides a unified view of how AI agents interact with tools across the entire system.

Instead of debugging across multiple services, engineers can trace requests directly from the gateway layer.
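Since every request flows through one place, each tool call can be captured as a single structured event carrying all of that metadata. A small stdlib-only sketch of what one such log record might look like (the field names are illustrative, not Bifrost's actual log format):

```python
import json
import time

def log_tool_call(agent, tool, provider, tokens, latency_ms, cost_usd):
    """Emit one structured log line per gateway request (illustrative fields)."""
    record = {
        "ts": time.time(),
        "agent": agent,
        "tool": tool,
        "provider": provider,
        "tokens": tokens,
        "latency_ms": latency_ms,
        "cost_usd": cost_usd,
    }
    print(json.dumps(record))  # in practice: ship to the dashboard/log store
    return record
```

One record per request, in one schema, is what makes questions like "which agent triggered this expensive call?" answerable with a single query.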


Performance and Scalability

Once MCP tooling becomes part of a production system, the number of interactions between models and tools increases rapidly.

Agents might query multiple services during a single task: retrieving documentation, inspecting logs, and triggering automation pipelines. In larger environments, this traffic can grow to thousands of tool calls per minute.

At that scale, infrastructure design matters.

Bifrost is implemented in Go and built specifically for high-concurrency workloads. The gateway is capable of handling large volumes of AI requests while simultaneously performing several critical operations:

  • request routing to different model providers
  • enforcement of policy rules
  • logging of tool interactions
  • tracking token consumption and cost

Because these responsibilities are centralized in the gateway layer, individual applications remain lightweight. They simply send requests to a single endpoint while the gateway handles the operational complexity behind the scenes.

In practice, this design allows organizations to support:

  • large fleets of AI agents
  • multiple MCP tool servers
  • shared AI infrastructure across teams

without turning the application layer into a tangled web of integrations.


How MCP Servers Changed the Way I Build AI Workflows

When I first started experimenting with MCP servers, the setup was surprisingly simple. I connected an AI agent to a few MCP tools, exposed their schemas, and the model could immediately start using them to search documentation, query APIs, or inspect code.

At first, everything worked smoothly.

But as I added more tools and MCP servers, the system started becoming harder to manage. Understanding which tool an agent had used, tracking logs, and debugging tool failures often meant jumping between multiple services.

The complexity didn’t come from the tools themselves, but from how they were connected.

Introducing a gateway layer in front of the MCP servers made a big difference. Instead of agents connecting directly to every tool, all requests passed through a single control point.

That small architectural change made the system much easier to operate.

Tool usage became visible in one place, policies could be applied consistently, and adding new tools no longer required updating multiple integrations.

For small experiments, direct MCP integrations work fine. But once MCP servers start powering real workflows, treating them as AI infrastructure rather than just tools makes everything significantly easier to manage.


When You Actually Need MCP Governance

Not every project requires a full MCP governance layer.

For prototypes or personal projects, direct MCP connections are usually sufficient.

However, once AI systems interact with real infrastructure, centralized governance becomes important.

You’ll likely benefit from an MCP gateway when:

  • multiple MCP servers are deployed
  • AI agents interact with production systems
  • teams share AI tooling infrastructure
  • security policies must be enforced
  • cost control becomes necessary

At that point, treating MCP as infrastructure rather than tooling becomes the right architectural decision.


Final Thoughts

MCP servers are transforming how AI systems interact with software environments.

Instead of limiting models to static prompts, developers can give them access to tools capable of retrieving data, triggering workflows, and modifying systems.

That power comes with new operational challenges.

As the number of tools grows, teams need ways to control access, track usage, and enforce governance policies.

Introducing an MCP gateway layer solves many of these challenges by centralizing security, observability, and cost control.

If you're interested in how organizations build full multi-provider AI infrastructure around gateway architectures, I explored that topic in more detail in this guide:
How to Build a Multi-Provider LLM Infrastructure with an AI Gateway (OpenAI, Claude, Azure & Vertex).

For teams building production AI platforms, the Bifrost AI gateway provides the infrastructure necessary to run MCP tooling safely at scale.


Thanks for reading! 🙏🏻
I hope you found this useful ✅
Please react and follow for more 😍
Made with 💙 by Hadil Ben Abdallah

Top comments (2)

Ben Abdallah Hanadi

This really captures the shift from “cool demo” to real infrastructure. MCP feels simple at first, but the moment you scale, governance becomes unavoidable.
The gateway layer insight is spot on.

Hadil Ben Abdallah

Exactly. That transition is where most of the real challenges start.

MCP makes it feel easy in the beginning, which is why the infrastructure side often gets overlooked. But once it touches real systems, governance isn’t optional anymore.

Glad the gateway layer insight resonated 😍