Chris Burns for Stacklok

Performance Testing MCP Servers in Kubernetes: Transport Choice is THE Make-or-Break Decision for Scaling MCP

The Model Context Protocol (MCP) has emerged as a critical standard for enabling AI models to interact with external tools and data sources securely. As organisations increasingly deploy MCP servers at scale in Kubernetes environments, understanding their performance characteristics under load becomes essential for production readiness.

This article analyses the findings from initial load testing performed on MCP servers running in Kubernetes with ToolHive, examining three different transport protocols and their suitability for high-concurrency production workloads.

Test Methodology and Setup

The load testing was conducted using a systematic approach to evaluate three MCP transport implementations:

  • stdio: Standard input/output communication requiring direct container attachment
  • SSE (Server-Sent Events): HTTP-based streaming protocol
  • Streamable HTTP: HTTP-based transport purpose-built for MCP (the successor to SSE)

Each transport type was subjected to various load scenarios to measure throughput, error rates, latency, and scalability characteristics. The tests focused on identifying bottlenecks and determining which transport mechanisms could reliably handle production-scale traffic.

The MCP server used for testing was yardstick, which exposes an echo tool that simply returns the text provided in the request. This design helps eliminate caching effects, giving a clearer view of raw MCP server and ToolHive performance. Functionally similar to the mcp/everything server, yardstick is containerised and supports all three transport types.
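To make the unit of work concrete, here is a minimal sketch of the kind of call each test iteration performs, written against the official TypeScript MCP SDK (@modelcontextprotocol/sdk). The endpoint URL and the echo tool's argument shape are assumptions based on the local port-forwarded setup used in these tests, not the actual harness code.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Assumed endpoint: the yardstick MCP server port-forwarded to localhost.
const transport = new StreamableHTTPClientTransport(new URL("http://localhost:8080/mcp"));

const client = new Client({ name: "yardstick-probe", version: "0.0.1" });
await client.connect(transport); // performs the MCP initialize handshake

// Call the echo tool; the argument name ("message") is an assumption.
const result = await client.callTool({
  name: "echo",
  arguments: { message: "hello from the load test" },
});
console.log(result);

await client.close();
```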

This MCP server was deployed onto a local Kubernetes cluster using kind, with ToolHive running the MCP server and simple port forwarding for access. Real environments would look quite different and add some latency to response times.

Performance Findings by Transport Type

stdio Transport

The stdio implementation demonstrated severe performance limitations that make it unsuitable for production use.

| Test Name | Concurrent Connections | Duration | RPS | Total Expected | Actual Requests | Successful | Failed | Req/sec | Min RT | Max RT | Avg RT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Basic Test | 20 | 5s | 10 | 50 | 22 | 2 | 20 | 0.64 | 19.78ms | 30.02s | 20.01s |

Error Breakdown:

  • Timeouts: 8
  • Connection resets: 3
  • Connection closed: 9

The underlying architecture’s reliance on direct container attachment introduces built-in scalability limits. Every connection consumes dedicated container resources, making horizontal scaling costly and unreliable. As a result, performance was poor even at low concurrency: out of 50 requests, only 2 succeeded, and over half never left the client due to the cascading effects of earlier timeout errors.
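For contrast, this is roughly what a stdio connection looks like from the client side: the transport has to spawn or attach to a process and own its stdin/stdout for the life of the session. The sketch below uses the TypeScript SDK's stdio transport; the kubectl command is purely illustrative and not ToolHive's actual attachment mechanism.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Each stdio client needs exclusive use of a process's stdin/stdout.
// The command below is illustrative only; in these tests ToolHive handled
// the container attachment.
const transport = new StdioClientTransport({
  command: "kubectl",
  args: ["attach", "-i", "deploy/yardstick"],
});

const client = new Client({ name: "stdio-probe", version: "0.0.1" });
await client.connect(transport);

// Twenty concurrent clients means twenty separate attachments, which is where
// the timeouts and connection resets in the table above come from.
await client.callTool({ name: "echo", arguments: { message: "hi" } });
await client.close();
```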

SSE Transport

| Test Name | Concurrent Connections | Duration | RPS | Total Expected | Actual Requests | Success Rate | Req/sec | Min RT | Max RT | Avg RT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Basic Test | 20 | 5s | 10 | 50 | 50 | 100.00% | 7.23 | 11.13ms | 21.89ms | 18.56ms |
| Sustained Load | 20 | 60s | 50 | 3000 | 1861 | 100.00%* | 29.87 | 4.76ms | 2.00s | 564.57ms |

Success Rate: 100% of sent requests (Note: Load harness limitations prevented sending all intended requests at peak load)

Compared to stdio, SSE demonstrated far better throughput and reliability, successfully completing every request it sent (including volumes stdio never managed to transmit) and maintaining solid performance at moderate load. Under sustained heavy load, however, response times deteriorated, and at peak rates the test harness timed out before all intended requests could be issued.

SSE is now officially deprecated (in favour of Streamable HTTP), so expect fewer and fewer MCP servers to offer this as a transport type in future.

Streamable HTTP Transport

Streamable HTTP dominated across all metrics, but with one crucial caveat around session management. Two configurations were tested: a shared session pool and a unique session per request.
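Before the numbers, it is worth being precise about what a "session" means for this transport. Below is a rough sketch of the handshake, based on the Streamable HTTP portion of the MCP specification; the URL and JSON-RPC bodies are illustrative, and the follow-up notifications/initialized message is omitted for brevity.

```typescript
// Initialize: the server may assign a session and return it as a response header.
const initRes = await fetch("http://localhost:8080/mcp", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
  },
  body: JSON.stringify({
    jsonrpc: "2.0",
    id: 1,
    method: "initialize",
    params: {
      protocolVersion: "2025-03-26",
      capabilities: {},
      clientInfo: { name: "load-gen", version: "0.0.1" },
    },
  }),
});
const sessionId = initRes.headers.get("Mcp-Session-Id");

// Subsequent calls reuse the session by echoing the header back.
// A shared session pool amortises the handshake above across many calls;
// a unique session per request repeats it every single time.
await fetch("http://localhost:8080/mcp", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
    ...(sessionId ? { "Mcp-Session-Id": sessionId } : {}),
  },
  body: JSON.stringify({
    jsonrpc: "2.0",
    id: 2,
    method: "tools/call",
    params: { name: "echo", arguments: { message: "hello" } },
  }),
});
```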

Shared Session Pool (10 sessions)

| Test Scenario | Concurrent Connections | Duration | RPS | Total Expected | Actual Requests | Success Rate | Req/sec | Min RT | Max RT | Avg RT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Basic Load Test | 20 | 5s | 10 | 50 | 50 | 100.00% | 7.24 | 1.88ms | 15.66ms | 5.31ms |
| Sustained | 20 | 60s | 50 | 3000 | 3000 | 100.00% | 48.40 | 1.02ms | 97.55ms | 5.03ms |
| High Load | 50 | 60s | 100 | 6000 | 6000 | 100.00% | 96.78 | 831µs | 135.05ms | 6.68ms |
| Very High Load | 200 | 60s | 500 | 30000 | 18757 | 100.00% | 299.85 | 1.33ms | 783.43ms | 622.20ms |
| Very High Load | 400 | 60s | 500 | 30000 | 18546 | 100.00% | 293.16 | 36.87ms | 1.69s | 1.28s |
| Very High Load | 1000 | 60s | 500 | 30000 | 19112 | 100.00% | 292.62 | 5.09ms | 3.58s | 3.09s |

Success Rate: 100% of sent requests (Note: Load harness limitations prevented sending all intended requests at peak load)

Unique Session Per Request

| Test Scenario | Concurrent Connections | Duration | RPS | Total Expected | Actual Requests | Success Rate | Req/sec | Min RT | Max RT | Avg RT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Sustained | 20 | 60s | 50 | 3000 | 2244 | 100.00% | 36.07 | 4.05ms | 1.31s | 272.93ms |
| High Load | 50 | 60s | 100 | 6000 | 2086 | 100.00% | 33.03 | 5.37ms | 4.23s | 1.12s |

Success Rate: 100% of sent requests (Note: Load harness limitations prevented sending all intended requests at peak load)

Streamable HTTP maintained 100% success rates across all requests sent during the scenarios while delivering 290-300 requests per second with shared sessions versus only 30-36 requests per second with unique sessions.

The Key Insight: Session Management is Everything

The most striking finding was the 10x performance difference between shared and unique session handling in Streamable HTTP. This reveals that session reuse isn't just an optimisation - it's fundamental to achieving production-scale performance.
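As a rough illustration of the two strategies compared above, here is what a shared pool versus a per-request session might look like inside a load generator, again sketched against the TypeScript MCP SDK. The pool size, URL, and tool arguments are assumptions for illustration, not the actual harness implementation.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const MCP_URL = new URL("http://localhost:8080/mcp"); // assumed port-forwarded endpoint

// Shared pool: initialise a fixed number of sessions up front and reuse them.
async function buildSharedPool(size: number): Promise<Client[]> {
  const pool: Client[] = [];
  for (let i = 0; i < size; i++) {
    const client = new Client({ name: "load-gen", version: "0.0.1" });
    await client.connect(new StreamableHTTPClientTransport(MCP_URL));
    pool.push(client);
  }
  return pool;
}

// Each request reuses an existing session (round-robin), so there is no
// per-request initialize handshake or session allocation on the server.
async function callWithSharedPool(pool: Client[], i: number): Promise<void> {
  const client = pool[i % pool.length];
  await client.callTool({ name: "echo", arguments: { message: `req-${i}` } });
}

// Unique session per request: every call pays for connect + initialize +
// teardown, which is where the roughly 10x throughput gap comes from.
async function callWithUniqueSession(i: number): Promise<void> {
  const client = new Client({ name: "load-gen", version: "0.0.1" });
  await client.connect(new StreamableHTTPClientTransport(MCP_URL));
  try {
    await client.callTool({ name: "echo", arguments: { message: `req-${i}` } });
  } finally {
    await client.close();
  }
}
```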

Recommendations:

  • Build around sessions: Pool and reuse aggressively (where appropriate)
  • Avoid stdio in production; prefer Streamable HTTP by default (unless you have a good reason not to)

The Caveats

  • The yardstick MCP server is a simple echo tool with no long-running work, so it responds extremely quickly. Real MCP servers in the wild will almost certainly benchmark slower than the figures shown here.
  • Tests were run on a local Kubernetes cluster with port-forwarding, minimising latency. Expect slower results on remote clusters.
  • The load testing tool was built specifically to run performance tests against MCP servers and is not a battle-hardened tool.

The Takeaway

These results fundamentally change how we should think about MCP server deployments. Transport choice isn't just a technical detail - it's a make-or-break architectural decision that can determine whether your AI capabilities scale or fail under load.

For teams building production AI systems with MCP, Streamable HTTP with optimised session management represents a key path forward in the current MCP landscape for achieving the reliability and performance modern applications demand.
