Chris Burns for Stacklok

Performance Testing MCP Servers in Kubernetes: Transport Choice is THE Make-or-Break Decision for Scaling MCP

The Model Context Protocol (MCP) has emerged as a critical standard for enabling AI models to interact with external tools and data sources securely. As organisations increasingly deploy MCP servers at scale in Kubernetes environments, understanding their performance characteristics under load becomes essential for production readiness.

This article analyses the findings from initial load testing performed on MCP servers running in Kubernetes with ToolHive, examining three different transport protocols and their suitability for high-concurrency production workloads.

Test Methodology and Setup

The load testing was conducted using a systematic approach to evaluate three MCP transport implementations:

  • stdio: Standard input/output communication requiring direct container attachment
  • SSE (Server-Sent Events): HTTP-based streaming protocol
  • Streamable HTTP: HTTP-based transport purpose-built for MCP (the successor to SSE)

Each transport type was subjected to various load scenarios to measure throughput, error rates, latency, and scalability characteristics. The tests focused on identifying bottlenecks and determining which transport mechanisms could reliably handle production-scale traffic.

The MCP server used for testing was yardstick, which exposes an echo tool that simply returns the text provided in the request. This design helps eliminate caching effects, giving a clearer view of raw MCP server and ToolHive performance. Functionally similar to the mcp/everything server, yardstick is containerised and supports all three transport types.
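To make the unit of work concrete, here is a minimal sketch of the kind of call each test iteration performs, written against the official TypeScript MCP SDK (@modelcontextprotocol/sdk). The endpoint URL and the echo tool's argument shape are assumptions based on the local port-forwarded setup used in these tests, not the actual harness code.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Assumed endpoint: the yardstick MCP server port-forwarded to localhost.
const transport = new StreamableHTTPClientTransport(new URL("http://localhost:8080/mcp"));

const client = new Client({ name: "yardstick-probe", version: "0.0.1" });
await client.connect(transport); // performs the MCP initialize handshake

// Call the echo tool; the argument name ("message") is an assumption.
const result = await client.callTool({
  name: "echo",
  arguments: { message: "hello from the load test" },
});
console.log(result);

await client.close();
```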

This MCP server was deployed onto a local Kubernetes cluster using kind, with ToolHive running the MCP server and simple port forwarding for access. Real environments would look quite different and add some latency to response times.

Performance Findings by Transport Type

stdio Transport

The stdio implementation demonstrated severe performance limitations that make it unsuitable for production use.

| Test Name | Concurrent Connections | Duration | RPS | Total Expected | Actual Requests | Successful | Failed | Req/sec | Min RT | Max RT | Avg RT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Basic Test | 20 | 5s | 10 | 50 | 22 | 2 | 20 | 0.64 | 19.78ms | 30.02s | 20.01s |

Error Breakdown:

  • Timeouts: 8
  • Connection resets: 3
  • Connection closed: 9

The underlying architecture’s reliance on direct container attachment introduces built-in scalability limits. Every connection consumes dedicated container resources, making horizontal scaling costly and unreliable. As a result, performance was poor even at low concurrency: out of 50 requests, only 2 succeeded, and over half never left the client due to the cascading effects of earlier timeout errors.
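For contrast, this is roughly what a stdio connection looks like from the client side: the transport has to spawn or attach to a process and own its stdin/stdout for the life of the session. The sketch below uses the TypeScript SDK's stdio transport; the kubectl command is purely illustrative and not ToolHive's actual attachment mechanism.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Each stdio client needs exclusive use of a process's stdin/stdout.
// The command below is illustrative only; in these tests ToolHive handled
// the container attachment.
const transport = new StdioClientTransport({
  command: "kubectl",
  args: ["attach", "-i", "deploy/yardstick"],
});

const client = new Client({ name: "stdio-probe", version: "0.0.1" });
await client.connect(transport);

// Twenty concurrent clients means twenty separate attachments, which is where
// the timeouts and connection resets in the table above come from.
await client.callTool({ name: "echo", arguments: { message: "hi" } });
await client.close();
```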

SSE Transport

| Test Name | Concurrent Connections | Duration | RPS | Total Expected | Actual Requests | Success Rate | Req/sec | Min RT | Max RT | Avg RT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Basic Test | 20 | 5s | 10 | 50 | 50 | 100.00% | 7.23 | 11.13ms | 21.89ms | 18.56ms |
| Sustained Load | 20 | 60s | 50 | 3000 | 1861 | 100.00%* | 29.87 | 4.76ms | 2.00s | 564.57ms |

Success Rate: 100% of sent requests (Note: Load harness limitations prevented sending all intended requests at peak load)

Compared to stdio, SSE demonstrated far better throughput and reliability, successfully completing every request it sent (including volumes stdio never managed to transmit) and maintaining solid performance at moderate load. Under sustained heavy load, however, response times deteriorated, and at peak rates the test harness timed out before all intended requests could be issued.

SSE is now officially deprecated (in favour of Streamable HTTP), so expect fewer and fewer MCP servers to offer this as a transport type in future.

Streamable HTTP Transport

Streamable HTTP dominated across all metrics, but with one crucial caveat around session management. Two configurations were tested: a shared session pool and a unique session per request.
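Before the numbers, it is worth being precise about what a "session" means for this transport. Below is a rough sketch of the handshake, based on the Streamable HTTP portion of the MCP specification; the URL and JSON-RPC bodies are illustrative, and the follow-up notifications/initialized message is omitted for brevity.

```typescript
// Initialize: the server may assign a session and return it as a response header.
const initRes = await fetch("http://localhost:8080/mcp", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
  },
  body: JSON.stringify({
    jsonrpc: "2.0",
    id: 1,
    method: "initialize",
    params: {
      protocolVersion: "2025-03-26",
      capabilities: {},
      clientInfo: { name: "load-gen", version: "0.0.1" },
    },
  }),
});
const sessionId = initRes.headers.get("Mcp-Session-Id");

// Subsequent calls reuse the session by echoing the header back.
// A shared session pool amortises the handshake above across many calls;
// a unique session per request repeats it every single time.
await fetch("http://localhost:8080/mcp", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
    ...(sessionId ? { "Mcp-Session-Id": sessionId } : {}),
  },
  body: JSON.stringify({
    jsonrpc: "2.0",
    id: 2,
    method: "tools/call",
    params: { name: "echo", arguments: { message: "hello" } },
  }),
});
```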

Shared Session Pool (10 sessions)

| Test Scenario | Concurrent Connections | Duration | RPS | Total Expected | Actual Requests | Success Rate | Req/sec | Min RT | Max RT | Avg RT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Basic Load Test | 20 | 5s | 10 | 50 | 50 | 100.00% | 7.24 | 1.88ms | 15.66ms | 5.31ms |
| Sustained | 20 | 60s | 50 | 3000 | 3000 | 100.00% | 48.40 | 1.02ms | 97.55ms | 5.03ms |
| High Load | 50 | 60s | 100 | 6000 | 6000 | 100.00% | 96.78 | 831µs | 135.05ms | 6.68ms |
| Very High Load | 200 | 60s | 500 | 30000 | 18757 | 100.00% | 299.85 | 1.33ms | 783.43ms | 622.20ms |
| Very High Load | 400 | 60s | 500 | 30000 | 18546 | 100.00% | 293.16 | 36.87ms | 1.69s | 1.28s |
| Very High Load | 1000 | 60s | 500 | 30000 | 19112 | 100.00% | 292.62 | 5.09ms | 3.58s | 3.09s |

Success Rate: 100% of sent requests (Note: Load harness limitations prevented sending all intended requests at peak load)

Unique Session Per Request

| Test Scenario | Concurrent Connections | Duration | RPS | Total Expected | Actual Requests | Success Rate | Req/sec | Min RT | Max RT | Avg RT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Sustained | 20 | 60s | 50 | 3000 | 2244 | 100.00% | 36.07 | 4.05ms | 1.31s | 272.93ms |
| High Load | 50 | 60s | 100 | 6000 | 2086 | 100.00% | 33.03 | 5.37ms | 4.23s | 1.12s |

Success Rate: 100% of sent requests (Note: Load harness limitations prevented sending all intended requests at peak load)

Streamable HTTP maintained 100% success rates across all requests sent during the scenarios while delivering 290-300 requests per second with shared sessions versus only 30-36 requests per second with unique sessions.

The Key Insight: Session Management is Everything

The most striking finding was the 10x performance difference between shared and unique session handling in Streamable HTTP. This reveals that session reuse isn't just an optimisation - it's fundamental to achieving production-scale performance.
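As a rough illustration of the two strategies compared above, here is what a shared pool versus a per-request session might look like inside a load generator, again sketched against the TypeScript MCP SDK. The pool size, URL, and tool arguments are assumptions for illustration, not the actual harness implementation.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const MCP_URL = new URL("http://localhost:8080/mcp"); // assumed port-forwarded endpoint

// Shared pool: initialise a fixed number of sessions up front and reuse them.
async function buildSharedPool(size: number): Promise<Client[]> {
  const pool: Client[] = [];
  for (let i = 0; i < size; i++) {
    const client = new Client({ name: "load-gen", version: "0.0.1" });
    await client.connect(new StreamableHTTPClientTransport(MCP_URL));
    pool.push(client);
  }
  return pool;
}

// Each request reuses an existing session (round-robin), so there is no
// per-request initialize handshake or session allocation on the server.
async function callWithSharedPool(pool: Client[], i: number): Promise<void> {
  const client = pool[i % pool.length];
  await client.callTool({ name: "echo", arguments: { message: `req-${i}` } });
}

// Unique session per request: every call pays for connect + initialize +
// teardown, which is where the roughly 10x throughput gap comes from.
async function callWithUniqueSession(i: number): Promise<void> {
  const client = new Client({ name: "load-gen", version: "0.0.1" });
  await client.connect(new StreamableHTTPClientTransport(MCP_URL));
  try {
    await client.callTool({ name: "echo", arguments: { message: `req-${i}` } });
  } finally {
    await client.close();
  }
}
```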

Recommendations:

  • Build around sessions: Pool and reuse aggressively (where appropriate)
  • Avoid stdio in production; prefer Streamable HTTP by default (unless you have a good reason not to)

The Caveats

  • The yardstick MCP server is a simple echo tool with no long-running work, so it responds extremely quickly. Real MCP servers in the wild will almost certainly benchmark slower than the figures shown here.
  • Tests were run on a local Kubernetes cluster with port-forwarding, minimising latency. Expect slower results on remote clusters.
  • The load testing tool was built specifically to run performance tests against MCP servers and is not a battle-hardened tool.

The Takeaway

These results fundamentally change how we should think about MCP server deployments. Transport choice isn't just a technical detail - it's a make-or-break architectural decision that can determine whether your AI capabilities scale or fail under load.

For teams building production AI systems with MCP, Streamable HTTP with optimised session management represents a key path forward in the current MCP landscape for achieving the reliability and performance modern applications demand.
