The Model Context Protocol (MCP) has emerged as a critical standard for enabling AI models to interact with external tools and data sources securely. As organisations increasingly deploy MCP servers at scale in Kubernetes environments, understanding their performance characteristics under load becomes essential for production readiness.
This article analyses the findings from initial load testing performed on MCP servers running in Kubernetes with ToolHive, examining three different transport protocols and their suitability for high-concurrency production workloads.
Test Methodology and Setup
The load testing was conducted using a systematic approach to evaluate three MCP transport implementations:
- stdio: Standard input/output communication requiring direct container attachment
- SSE (Server-Sent Events): HTTP-based streaming protocol
- Streamable HTTP: the newer HTTP-based streaming transport defined in the MCP specification
Each transport type was subjected to various load scenarios to measure throughput, error rates, latency, and scalability characteristics. The tests focused on identifying bottlenecks and determining which transport mechanisms could reliably handle production-scale traffic.
The MCP server used for testing was yardstick, which exposes an echo tool that simply returns the text provided in the request. This design helps eliminate caching effects, giving a clearer view of raw MCP server and ToolHive performance. Functionally similar to the mcp/everything server, yardstick is containerised and supports all three transport types.
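For context, each test request is a single JSON-RPC tools/call message aimed at the echo tool. Below is a minimal sketch in Go of what such a message looks like; the "text" argument name is an illustrative assumption and may not match yardstick's actual input schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCall mirrors the JSON-RPC 2.0 envelope used by MCP's tools/call method.
type toolCall struct {
	JSONRPC string         `json:"jsonrpc"`
	ID      int            `json:"id"`
	Method  string         `json:"method"`
	Params  toolCallParams `json:"params"`
}

type toolCallParams struct {
	Name      string            `json:"name"`
	Arguments map[string]string `json:"arguments"`
}

func main() {
	// Build an echo request; the "text" argument name is illustrative and
	// may not match yardstick's actual schema.
	req := toolCall{
		JSONRPC: "2.0",
		ID:      1,
		Method:  "tools/call",
		Params: toolCallParams{
			Name:      "echo",
			Arguments: map[string]string{"text": "hello, MCP"},
		},
	}
	body, _ := json.Marshal(req)
	fmt.Println(string(body))
}
```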
This MCP server was deployed onto a local Kubernetes cluster using kind, with ToolHive running the MCP server and simple port forwarding for access. Real environments will look quite different from this setup and will add some latency to response times.
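The load harness itself is purpose-built (see the caveats later), but conceptually each scenario boils down to firing requests at the forwarded endpoint at a target rate while recording latencies and failures. A rough sketch of that loop follows, using a placeholder endpoint and payload rather than the real test configuration:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"sync"
	"time"
)

func main() {
	const (
		targetRPS = 10
		duration  = 5 * time.Second
		// Placeholder for the port-forwarded endpoint; not the real test configuration.
		endpoint = "http://localhost:8080/mcp"
	)

	var (
		mu        sync.Mutex
		latencies []time.Duration
		failed    int
		wg        sync.WaitGroup
	)

	client := &http.Client{Timeout: 30 * time.Second}
	// Any cheap request works for a skeleton; MCP defines a ping method for this kind of check.
	payload := []byte(`{"jsonrpc":"2.0","id":1,"method":"ping"}`)

	ticker := time.NewTicker(time.Second / targetRPS)
	defer ticker.Stop()
	deadline := time.After(duration)

loop:
	for {
		select {
		case <-deadline:
			break loop
		case <-ticker.C:
			wg.Add(1)
			go func() {
				defer wg.Done()
				start := time.Now()
				resp, err := client.Post(endpoint, "application/json", bytes.NewReader(payload))
				elapsed := time.Since(start)
				if err != nil {
					mu.Lock()
					failed++
					mu.Unlock()
					return
				}
				resp.Body.Close()

				mu.Lock()
				if resp.StatusCode >= 400 {
					failed++
				} else {
					latencies = append(latencies, elapsed)
				}
				mu.Unlock()
			}()
		}
	}
	wg.Wait()

	// Summarise the run: success/failure counts and a simple average latency.
	var total time.Duration
	for _, l := range latencies {
		total += l
	}
	avg := time.Duration(0)
	if len(latencies) > 0 {
		avg = total / time.Duration(len(latencies))
	}
	fmt.Printf("successful=%d failed=%d avg=%s\n", len(latencies), failed, avg)
}
```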
Performance Findings by Transport Type
stdio Transport
The stdio implementation demonstrated severe performance limitations that make it unsuitable for production use.
Test Name | Concurrent Connections | Duration | Target RPS | Total Expected | Actual Requests | Successful | Failed | Actual req/sec | Min RT | Max RT | Avg RT |
---|---|---|---|---|---|---|---|---|---|---|---|
Basic Test | 20 | 5s | 10 | 50 | 22 | 2 | 20 | 0.64 | 19.78ms | 30.02s | 20.01s |
Error Breakdown:
- Timeouts: 8
- Connection resets: 3
- Connection closed: 9
The underlying architecture’s reliance on direct container attachment introduces built-in scalability limits. Every connection consumes dedicated container resources, making horizontal scaling costly and unreliable. As a result, performance was poor even at low concurrency: out of 50 requests, only 2 succeeded, and over half never left the client due to the cascading effects of earlier timeout errors.
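To make the attachment problem concrete, here is a rough sketch of what a single stdio client looks like when the server runs as a local subprocess rather than behind a container attach. The command-line flags are hypothetical; the one-process-per-client shape is the point.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
)

func main() {
	// Each client needs its own attached process (or container attach in
	// Kubernetes); nothing about this channel can be shared or load-balanced.
	// The binary name and flags here are placeholders for illustration.
	cmd := exec.Command("yardstick", "--transport", "stdio")

	stdin, err := cmd.StdinPipe()
	if err != nil {
		panic(err)
	}
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		panic(err)
	}
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	// Send an initialize request as a single newline-delimited JSON-RPC line;
	// the params are abbreviated for illustration.
	fmt.Fprintln(stdin, `{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}`)

	// Read the single-line response from the server's stdout.
	scanner := bufio.NewScanner(stdout)
	if scanner.Scan() {
		fmt.Println("response:", scanner.Text())
	}

	stdin.Close()
	cmd.Wait()
}
```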
SSE Transport
Test Name | Concurrent Connections | Duration | Target RPS | Total Expected | Actual Requests | Success Rate | Actual req/sec | Min RT | Max RT | Avg RT |
---|---|---|---|---|---|---|---|---|---|---|
Basic Test | 20 | 5s | 10 | 50 | 50 | 100.00% | 7.23 | 11.13ms | 21.89ms | 18.56ms |
Sustained Load | 20 | 60s | 50 | 3000 | 1861 | 100.00%* | 29.87 | 4.76ms | 2.00s | 564.57ms |
Success Rate: 100% of sent requests (Note: Load harness limitations prevented sending all intended requests at peak load)
Compared to stdio, SSE delivered far better throughput and reliability, completing all 50 requests in the basic test (including the ones that never even left the client under stdio) and maintaining solid performance at moderate volumes. However, under sustained heavy load, response times deteriorated, and at peak rates the test harness timed out before all intended requests could be issued.
SSE is now officially deprecated (in favour of Streamable HTTP), so expect fewer and fewer MCP servers to offer this as a transport type in future.
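For reference, this is roughly what consuming the SSE transport looks like from the client side using only Go's standard library; the /sse path and the simplified parsing are illustrative assumptions rather than the exact setup used in these tests.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// The SSE transport keeps a long-lived GET stream open for server-to-client
	// messages, while requests are POSTed to a separate endpoint.
	// The /sse path is an assumption for illustration.
	resp, err := http.Get("http://localhost:8080/sse")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Server-Sent Events arrive as "event:" and "data:" lines separated by blank lines.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		switch {
		case strings.HasPrefix(line, "event:"):
			fmt.Println("event type:", strings.TrimSpace(strings.TrimPrefix(line, "event:")))
		case strings.HasPrefix(line, "data:"):
			fmt.Println("payload:", strings.TrimSpace(strings.TrimPrefix(line, "data:")))
		}
	}
}
```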
Streamable HTTP Transport
Streamable HTTP dominated across all metrics, but with one crucial caveat around session management: performance differs dramatically depending on whether requests share a pool of sessions or each request creates its own.
Shared Session Pool (10 sessions)
Test Scenario | Concurrent Connections | Duration | Target RPS | Total Expected | Actual Requests | Success Rate | Actual req/sec | Min RT | Max RT | Avg RT |
---|---|---|---|---|---|---|---|---|---|---|
Basic Load Test | 20 | 5s | 10 | 50 | 50 | 100.00% | 7.24 | 1.88ms | 15.66ms | 5.31ms |
Sustained | 20 | 60s | 50 | 3000 | 3000 | 100.00% | 48.40 | 1.02ms | 97.55ms | 5.03ms |
High Load | 50 | 60s | 100 | 6000 | 6000 | 100.00% | 96.78 | 831µs | 135.05ms | 6.68ms |
Very High Load | 200 | 60s | 500 | 30000 | 18757 | 100.00% | 299.85 | 1.33ms | 783.43ms | 622.20ms |
Very High Load | 400 | 60s | 500 | 30000 | 18546 | 100.00% | 293.16 | 36.87ms | 1.69s | 1.28s |
Very High Load | 1000 | 60s | 500 | 30000 | 19112 | 100.00% | 292.62 | 5.09ms | 3.58s | 3.09s |
Success Rate: 100% of sent requests (Note: Load harness limitations prevented sending all intended requests at peak load)
Unique Session Per Request
Test Scenario | Concurrent Connections | Duration | Target RPS | Total Expected | Actual Requests | Success Rate | Actual req/sec | Min RT | Max RT | Avg RT |
---|---|---|---|---|---|---|---|---|---|---|
Sustained | 20 | 60s | 50 | 3000 | 2244 | 100.00% | 36.07 | 4.05ms | 1.31s | 272.93ms |
High Load | 50 | 60s | 100 | 6000 | 2086 | 100.00% | 33.03 | 5.37ms | 4.23s | 1.12s |
Success Rate: 100% of sent requests (Note: Load harness limitations prevented sending all intended requests at peak load)
Streamable HTTP maintained 100% success rates across all requests sent during the scenarios while delivering 290-300 requests per second with shared sessions versus only 30-36 requests per second with unique sessions.
The Key Insight: Session Management is Everything
The most striking finding was the 10x performance difference between shared and unique session handling in Streamable HTTP. This reveals that session reuse isn't just an optimisation - it's fundamental to achieving production-scale performance.
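In practice the difference comes down to how often the client pays the initialisation cost. With Streamable HTTP, the server can assign a session on initialise (returned via the Mcp-Session-Id header) that subsequent requests present instead of re-initialising. A rough sketch of the shared-session pattern follows, with a placeholder endpoint and abbreviated payloads rather than the exact requests the harness sends:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

const endpoint = "http://localhost:8080/mcp" // placeholder for the Streamable HTTP endpoint

// initialiseSession performs the MCP initialise handshake once and returns the
// session ID the server assigns (sent back in the Mcp-Session-Id header).
func initialiseSession(client *http.Client) (string, error) {
	body := []byte(`{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}`) // abbreviated
	resp, err := client.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return resp.Header.Get("Mcp-Session-Id"), nil
}

// callEcho reuses an existing session: no handshake, just the tool call with
// the session header attached.
func callEcho(client *http.Client, sessionID string) error {
	body := []byte(`{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"echo","arguments":{"text":"hi"}}}`)
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Mcp-Session-Id", sessionID)
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

func main() {
	client := &http.Client{} // a shared client also reuses TCP connections

	// Shared-session pattern: initialise once, then fan every request through it.
	sessionID, err := initialiseSession(client)
	if err != nil {
		panic(err)
	}
	for i := 0; i < 100; i++ {
		if err := callEcho(client, sessionID); err != nil {
			fmt.Println("call failed:", err)
		}
	}
	// The unique-session configuration in the tables above corresponds to moving
	// initialiseSession inside the loop, which is the roughly 10x slower setup.
}
```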
Recommendations:
- Build around sessions: Pool and reuse aggressively (where appropriate)
- Avoid stdio in production; prefer Streamable HTTP by default (unless you have a good reason not to)
The Caveats
- The yardstick MCP server is a simple echo tool with no long-running work, so it responds extremely quickly. Real MCP servers in the wild will almost certainly benchmark slower than the figures shown here.
- Tests were run on a local Kubernetes cluster with port-forwarding, minimising latency. Expect slower results on remote clusters.
- The load testing tool used was built specifically to run performance tests against MCP servers and is not a battle-hardened tool.
The Takeaway
These results fundamentally change how we should think about MCP server deployments. Transport choice isn't just a technical detail - it's a make-or-break architectural decision that can determine whether your AI capabilities scale or fail under load.
For teams building production AI systems with MCP, Streamable HTTP with optimised session management represents a key path forward in the current MCP landscape for achieving the reliability and performance modern applications demand.