When Anthropic released the Model Context Protocol in November 2024, the initial conversation was mostly about the protocol itself: a standard way for AI agents to discover and call tools without building custom integrations for every API. That problem was real and the protocol mostly solved it.
But MCP adoption created a second problem that teams started hitting around mid-2025: how do you manage dozens of MCP server connections at scale, control what agents can access, see what they're actually doing, and handle credential rotation without your security team losing sleep? The base protocol doesn't address any of this.
That's the gap MCP gateways fill. I spent several weeks evaluating the main options. This covers what I found, including where each tool has a genuine edge and where it falls short.
What actually matters when evaluating an MCP gateway
Most comparison posts lead with latency benchmarks or feature checkboxes. Those matter, but three questions did more to differentiate the tools in practice:
Where does it fit in your existing stack? Some gateways are standalone infrastructure; others integrate tightly with a specific cloud provider or container runtime. The right choice depends heavily on what you're already running, and adopting a gateway that fits your architecture poorly creates more work than it saves.
What security model does it enforce? Tool poisoning, credential exposure, and unauthorized agent access are production concerns, not theoretical ones. Gateways take meaningfully different approaches, and the differences aren't cosmetic.
What's the operational overhead at scale? Managing 5 MCP connections with no central observability is fine. Managing 50 without it isn't. Solutions that are easy to set up often become painful to operate as workloads grow.
The five tools I evaluated
1. TrueFoundry

TrueFoundry's MCP gateway is built around a specific architectural bet: teams managing LLM workloads shouldn't have to run separate infrastructure for MCP orchestration. The unified platform handles both, with the same security, observability, and rate-limiting mechanisms applying to LLM calls and tool calls alike.
The performance claims in their docs are specific: sub-3ms latency and 350+ RPS on a single vCPU, attributed to in-memory auth and rate limiting rather than database lookups. The architecture makes that plausible, but no benchmark methodology or test configuration is published alongside these numbers. [NEEDS: test configuration details (payload size, model size, infra spec, test harness) so readers can reproduce or compare against their own workload.] If latency is a hard requirement, run your own test before planning capacity.
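TrueFoundry doesn't publish its implementation, but the claim rests on a well-known idea: if rate-limit state lives in process memory, checking it costs a dictionary lookup and some arithmetic rather than a database round trip. The sketch below is a generic token bucket in Python, not TrueFoundry's code; the class names and the per-key bucketing are assumptions for illustration.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Token bucket whose state lives entirely in process memory."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start full
        self.updated = now          # timestamp of the last refill

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """One bucket per API key in a plain dict (no database lookup).

    Not thread-safe; a real gateway would add locking or per-worker buckets.
    """

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[key] = bucket
        return bucket.allow(now)
```

The trade-off is that in-memory state is per process: several gateway replicas each keep their own buckets unless they share state some other way.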
The genuinely strong parts: unified billing and observability across LLM and tool usage, MCP Server Groups for per-team isolation without separate gateway deployments, and an interactive playground that generates production-ready client code across multiple languages. If you're already tracking LLM costs through TrueFoundry's AI gateway, getting consolidated tool-call data in the same dashboard is a real operational win rather than a feature checkbox.
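To make per-team isolation concrete, here is a hypothetical model of server groups: each group lists the MCP servers it exposes and the teams allowed to use it, and a call is authorized only if some group grants that team that server. The `ServerGroup` type and all server and team names are invented for illustration and are not TrueFoundry's configuration format.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class ServerGroup:
    name: str
    servers: FrozenSet[str]  # MCP server IDs this group exposes
    teams: FrozenSet[str]    # teams allowed to use this group


def allowed_servers(team: str, groups: List[ServerGroup]) -> Set[str]:
    """Union of the servers in every group the team belongs to."""
    result: Set[str] = set()
    for group in groups:
        if team in group.teams:
            result |= group.servers
    return result


def authorize(team: str, server: str, groups: List[ServerGroup]) -> bool:
    """Deny by default: a call is allowed only if some group grants it."""
    return server in allowed_servers(team, groups)
```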
The weaknesses: this is a full-platform product, which means you're adopting a broader dependency. If you want a thin, standalone MCP proxy, TrueFoundry is more than you need. It's also a commercial product — pricing isn't published in a way that makes quick evaluation easy for smaller teams. [NEEDS: pricing information or at least an order-of-magnitude range.]
Best fit: Teams already running significant AI workloads on TrueFoundry, or those who want a single vendor managing both LLM routing and MCP orchestration with unified cost visibility.
2. Docker MCP Gateway
Docker applied its core capability, container isolation, to the MCP problem, and the result is coherent. Each MCP server runs in its own container with CPU capped at 1 core, memory at 2GB, and no host filesystem access by default. Cryptographically signed container images address supply chain security in a way none of the other tools on this list match.
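The isolation defaults described above map onto standard `docker run` flags. This sketch builds an argument list that approximates them for a single stdio-based MCP server; Docker MCP Gateway manages containers itself, so its actual invocation may differ, and the image name used in the test is hypothetical.

```python
from typing import List


def mcp_container_args(image: str, cpus: float = 1.0, memory: str = "2g") -> List[str]:
    """Build a `docker run` command approximating the isolation defaults above."""
    return [
        "docker", "run",
        "--rm",                 # remove the container when the server exits
        "-i",                   # keep stdin open: stdio MCP servers speak over stdin/stdout
        f"--cpus={cpus}",       # CPU cap (1 core by default)
        f"--memory={memory}",   # memory cap (2 GB by default)
        # Deliberately no -v/--volume/--mount flags: no host paths are bound in.
        image,
    ]
```

Passing the list to `subprocess.run(...)` would start the container; leaving out every mount flag is what keeps host directories out of the container.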
The Docker MCP Catalog now ships with 300+ verified, pre-packaged server images. That's the largest pre-built library of any option here, and it lowers the barrier to trying new tools significantly — the difference between "pull this image" and "read the setup docs and figure out how to auth it" is non-trivial when you're evaluating a dozen servers at once.
The Docker Desktop integration is a genuine differentiator for local development. Developers get safe, isolated MCP experimentation without complex setup, and the same container model carries forward to production environments.
Where Docker is weaker: the latency profile (source benchmarks show 50–200ms [NEEDS: benchmark methodology]) reflects container overhead, which compounds for agents making many short sequential tool calls. There's also limited built-in observability beyond logging and call tracing — you need to bring your own monitoring stack for meaningful analytics.
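The compounding is simple arithmetic: overhead paid on every call adds up linearly when calls run one after another. Using the cited 50–200ms range (itself unverified), an agent that makes 20 sequential tool calls picks up roughly 1–4 seconds of gateway overhead alone.

```python
from typing import Tuple


def added_latency_ms(sequential_calls: int, overhead_ms: Tuple[float, float]) -> Tuple[float, float]:
    """Gateway overhead paid per call adds up linearly across sequential calls."""
    low, high = overhead_ms
    return sequential_calls * low, sequential_calls * high


# 20 sequential tool calls at the cited (unverified) 50–200 ms per-call overhead:
print(added_latency_ms(20, (50, 200)))  # → (1000, 4000)
```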
Best fit: Container-first infrastructure teams, use cases involving code execution or high-isolation requirements (the resource caps matter a lot here), and teams that want standardized packaging across many MCP servers without managing custom deployment scripts.
3. IBM ContextForge
One important correction to the framing you'll find in some other writeups: IBM ContextForge is no longer in alpha/beta, and "no commercial support" is no longer accurate. IBM released v1.0.0-GA and offers IBM Elite Support for commercial deployments. The project now has 100+ open source contributors and powers IBM Consulting Advantage, which serves 160,000+ users. It's not an experimental side project — that framing is outdated.
With that corrected: ContextForge's federation capabilities are the most architecturally sophisticated of any option here. Auto-discovery via mDNS, health monitoring across gateway instances, and capability merging that lets multiple gateways present as a unified endpoint — these are features that matter in genuinely complex deployments where multiple teams or regions each manage their own MCP infrastructure. Virtual server composition lets you combine multiple MCP backends into a single logical endpoint, simplifying agent interactions without restructuring your backend.
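Capability merging raises an obvious question: what happens when two federated gateways expose a tool with the same name? ContextForge's actual strategy isn't described here, so the sketch below assumes one simple policy: names unique across gateways pass through unchanged, and colliding names get a gateway prefix. The gateway and tool names are hypothetical, and the `__` separator is an arbitrary choice.

```python
from typing import Dict, List, Tuple


def merge_catalogs(catalogs: Dict[str, List[str]]) -> Dict[str, Tuple[str, str]]:
    """Merge per-gateway tool lists into one catalog.

    Returns {exposed_name: (gateway, original_tool_name)}. Names that appear on
    only one gateway pass through unchanged; colliding names are prefixed.
    """
    counts: Dict[str, int] = {}
    for tools in catalogs.values():
        for tool in tools:
            counts[tool] = counts.get(tool, 0) + 1

    merged: Dict[str, Tuple[str, str]] = {}
    for gateway, tools in catalogs.items():
        for tool in tools:
            exposed = tool if counts[tool] == 1 else f"{gateway}__{tool}"
            merged[exposed] = (gateway, tool)
    return merged
```

Routing a call is then a dictionary lookup: the exposed name resolves to the gateway that owns the tool and the name that gateway expects.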
Authentication flexibility is also notable: JWT Bearer, Basic Auth, custom header schemes, AES encryption for tool credentials, and multi-database backend support (PostgreSQL, MySQL, SQLite). If you're integrating with existing enterprise identity systems, this matters.
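For a sense of what "JWT Bearer" authentication checks on each request, here is a standard-library sketch of HS256 token verification: recompute the HMAC over the header and payload, compare it in constant time, then check expiry. It is not ContextForge's code; a production gateway would typically rely on a maintained library such as PyJWT and support more algorithms than this one.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore the padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Verify an HS256-signed JWT and return its claims, or raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts

    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":  # reject anything else, including "none"
        raise ValueError("unexpected algorithm")

    # The signature covers the raw "header.payload" text exactly as received.
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")

    claims = json.loads(_b64url_decode(payload_b64))
    now = time.time() if now is None else now
    if "exp" in claims and now >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```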
Where ContextForge is weaker: the developer experience is steeper than Docker or TrueFoundry. Configuration requires more infrastructure expertise, and the latency overhead (100–300ms per source benchmarks [NEEDS: methodology]) is meaningfully higher than lighter options. Also worth evaluating: how tightly it integrates with IBM's own cloud services versus cloud-agnostic deployments.
Best fit: Large enterprises anticipating multiple gateway deployments across environments or regions — this is the only tool here that was designed for federation from the ground up. Requires a team comfortable with infrastructure complexity.
4. Microsoft MCP Gateway (Azure)
Microsoft's approach isn't a single product — it's a set of integration points across Azure services that together handle gateway responsibilities. Azure API Management handles policy enforcement and OAuth flows; Azure App Service and Functions handle server hosting; Microsoft Entra ID handles authentication and RBAC. The recent Build 2026 updates added built-in MCP for Azure App Service and improved Functions MCP extensions with native Entra auth.
For Azure-native organizations, this native integration is a real advantage. OAuth flows work without additional configuration. Entra ID policies apply directly. The Azure Resource Manager MCP Server gives agents first-class access to infrastructure operations — querying, deploying, and managing Azure resources through ARM — in ways that would require significant custom integration with other gateways.
Where Microsoft wins clearly: if your security and compliance posture is built around Entra ID, and your agents primarily interact with Azure-hosted services, the native identity flow is substantive, not cosmetic. No other tool here matches it in the Azure-native scenario.
Where it's weaker: anything outside the Azure ecosystem. Multi-cloud or hybrid deployments require significant custom work. The operational surface area is large — you're managing multiple Azure services rather than a single gateway product — and getting comprehensive monitoring requires stitching together Azure Monitor, Application Insights, and service-specific logging.
Best fit: Azure-committed organizations where Entra ID investment can be leveraged directly. Not a good fit for multi-cloud architectures or teams that need quick setup and centralized observability in a single console.
5. Lasso Security
Lasso (a 2024 Gartner Cool Vendor for AI Security) is built around a problem the other tools treat as secondary: when AI agents interact with tools, most gateway infrastructure gives you almost no visibility into whether those interactions were legitimate or malicious.
Their specific capabilities: real-time prompt injection detection that blocks malicious inputs before they reach MCP tools; MCP server reputation scoring based on behavior patterns, code analysis, and community data (automatic blocking of flagged servers addresses supply chain attacks from compromised tool packages); token masking to prevent credential exposure in tool call logs. The plugin-based architecture lets organizations add security controls incrementally rather than adopting all-or-nothing.
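Token masking is the easiest of these capabilities to illustrate. Lasso's detection logic isn't public, so this is a generic regex-based sketch: the patterns (a bearer token, an `sk-` style API key, an AWS access key ID) are illustrative assumptions, and a real deployment would tune them to the credentials its tools actually handle.

```python
import re

# Illustrative patterns only: a bearer token, an "sk-" style API key, and an
# AWS access key ID. Group 1 is the non-secret prefix that stays visible.
SECRET_PATTERNS = [
    re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._~+/-]+=*"),
    re.compile(r"\b(sk-)[A-Za-z0-9]{16,}\b"),
    re.compile(r"\b(AKIA)[A-Z0-9]{16}\b"),
]


def mask_secrets(text: str) -> str:
    """Replace credential-looking substrings before a tool call is logged."""
    for pattern in SECRET_PATTERNS:
        # Keep the recognizable prefix, replace the secret itself.
        text = pattern.sub(lambda m: m.group(1) + "****", text)
    return text
```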
This is the only tool here that was built with agent security as the primary design axis. That shows both in the depth of the security features and in what it doesn't do well as a general gateway — it's more security overlay than full orchestration infrastructure.
Where Lasso wins clearly: regulated industries and environments where agent tool access is a high-consequence security surface. Healthcare, financial services, and legal sectors handling sensitive data get threat detection specifically designed for AI agent behavior patterns, which general-purpose security tools don't provide.
Where it's weaker: the security scanning adds latency overhead (100–250ms range per source benchmarks [NEEDS: methodology]), which compounds for high-frequency tool calls. It's also not a complete gateway replacement — most teams would run it alongside Docker or TrueFoundry for the routing and management layer.
Best fit: Regulated industries, and any team where a security incident involving agent tool access would have severe downstream consequences. Likely used alongside another gateway rather than as a standalone solution.
How to choose
Rather than a ranked list, here's how I'd map situations to tools:
Already running TrueFoundry for LLM routing, want unified cost and observability → TrueFoundry
Container-first infra, code execution isolation needed, want a large pre-built server catalog → Docker
Multiple gateway deployments across environments or regions, federation is a real requirement → IBM ContextForge
Azure-native, Entra ID is your identity backbone, agents interact primarily with Azure services → Microsoft
Regulated industry, agent security threat detection is non-negotiable → Lasso Security (likely layered with one of the above)
One honest caveat on the performance comparisons: the latency numbers cited above — and in most comparison posts, including earlier versions of this one — come without published benchmark methodology, test configuration, or hardware specs. They're directionally useful but not reliable for capacity planning. If latency matters for your use case, run your own test on representative payloads and tool call patterns before committing.
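A minimal harness for that test, assuming only that you can wrap one tool call (through the gateway, or directly against the server for a baseline) in a zero-argument function. It times calls sequentially and reports percentiles, which is what matters for agents making chains of calls; it does not measure throughput under concurrency.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(call: Callable[[], object], runs: int = 200, warmup: int = 10) -> Dict[str, float]:
    """Time `call` sequentially and return latency percentiles in milliseconds."""
    for _ in range(warmup):
        call()  # discard warm-up runs (connection setup, cold caches)

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)

    # 99 cut points; with method="inclusive", cuts[k - 1] is the k-th percentile
    # and every value stays within the observed range.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "p50": statistics.median(samples),
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(samples),
    }
```

Running it once against the gateway and once against the MCP server directly, with the same payload, gives you the gateway's overhead on your own hardware.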
Things this post didn't settle
A few open questions I'm still thinking about after this evaluation:
Observability standards: Each tool exports metrics and logs in different schemas. There's no common format yet, which means your monitoring stack needs custom adapters regardless of which gateway you choose. This is underdiscussed in most gateway comparisons.
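Until a common format exists, the usual answer is a thin adapter per source that maps each gateway's records onto one internal shape. The source field names below are hypothetical and do not reflect the actual export formats of any gateway discussed above.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ToolCallEvent:
    """The one record shape your monitoring stack consumes."""
    gateway: str
    server: str
    tool: str
    latency_ms: float
    ok: bool


# Hypothetical source schemas: field names are illustrative only.
def from_gateway_a(rec: dict) -> ToolCallEvent:
    return ToolCallEvent(gateway="gateway_a", server=rec["mcp_server"], tool=rec["tool_name"],
                         latency_ms=rec["duration_ms"], ok=rec["status"] == "success")


def from_gateway_b(rec: dict) -> ToolCallEvent:
    # This source nests fields and reports seconds, so convert to milliseconds.
    return ToolCallEvent(gateway="gateway_b", server=rec["server"]["id"], tool=rec["call"]["tool"],
                         latency_ms=rec["call"]["elapsed_s"] * 1000, ok=rec["error"] is None)


ADAPTERS: Dict[str, Callable[[dict], ToolCallEvent]] = {
    "gateway_a": from_gateway_a,
    "gateway_b": from_gateway_b,
}


def normalize(source: str, rec: dict) -> ToolCallEvent:
    return ADAPTERS[source](rec)
```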
Cost modeling at scale: The operational cost picture — caching overhead, retry rates, security scanning compute — is hard to predict without production data. Most teams I've talked to have been surprised by total tool-call costs at scale.
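One way to make those cost drivers explicit is a back-of-envelope model. Every input is an assumption you would fill in from your own usage data; the sketch assumes each retried call is retried exactly once, that every attempt pays cache and security-scanning overhead, and that only cache misses pay the upstream cost.

```python
def estimate_tool_call_cost(
    calls: int,
    retry_rate: float,       # fraction of calls retried (assumed once each)
    cache_hit_rate: float,   # fraction of attempts answered from cache
    upstream_cost: float,    # cost of one uncached call to the tool's backend
    cache_cost: float,       # per-attempt cache lookup/storage overhead
    scan_cost: float,        # per-attempt security-scanning compute
) -> float:
    """Back-of-envelope total cost; every input is an assumption you supply."""
    attempts = calls * (1 + retry_rate)
    misses = attempts * (1 - cache_hit_rate)
    # Every attempt pays cache and scanning overhead; only misses pay upstream.
    return misses * upstream_cost + attempts * (cache_cost + scan_cost)
```

The total matters less than the sensitivity: varying `retry_rate` or `scan_cost` shows which assumption your budget depends on most.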
Multi-agent coordination: None of these tools natively handle agent-to-agent tool routing in a multi-agent architecture. If your setup involves several agents sharing a gateway, expect to hit undocumented edge cases.
If you've deployed any of these in production and have data that contradicts what I've written here — especially on latency or operational overhead at scale — I'd genuinely like to hear about it in the comments.