DEV Community

Dylan Dumont

HTTP/2 Multiplexing: Why One Connection Is Enough

In the age of high-concurrency systems, opening a new TCP connection for every request is performance suicide.

What We're Building

We are designing a backend service that handles thousands of concurrent requests without triggering connection timeouts. The goal is to eliminate the latency overhead of TCP handshakes for every single interaction. By leveraging HTTP/2 features, we can serve multiple requests simultaneously over a single persistent connection. This article explains how to configure a client to utilize multiplexing effectively and the architectural benefits this brings to a distributed backend system.

Step 1 — Reusing the TCP Handshake

The biggest bottleneck in network programming is establishing a connection. A three-way handshake adds round-trip latency that shouldn't be paid for every small request. HTTP/1.1 uses persistent connections, but HTTP/2 takes this further by allowing parallel requests on that one connection.

transport := &http.Transport{
    MaxIdleConnsPerHost:   100,
    IdleConnTimeout:       90 * time.Second,
    ForceAttemptHTTP2:     true,
    TLSHandshakeTimeout:   10 * time.Second,
}

In Go, setting ForceAttemptHTTP2: true matters whenever you build a custom Transport. The default transport attempts HTTP/2 automatically, but supplying your own TLSClientConfig or DialContext disables that upgrade unless you force it back on. By configuring MaxIdleConnsPerHost, we ensure the client keeps idle connections alive, amortizing the TCP (and TLS) handshake cost across hundreds of requests. This reduces latency significantly, especially for cold starts.
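To see the negotiation end to end, here is a runnable sketch. It uses a local httptest server as a stand-in for a real endpoint, and fetchProto is an illustrative helper name, not a standard API:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchProto starts a local TLS test server with HTTP/2 enabled and
// returns the protocol the client actually negotiated.
func fetchProto() (string, error) {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, "ok")
		}))
	srv.EnableHTTP2 = true
	srv.StartTLS() // srv.Client() trusts the test certificate
	defer srv.Close()

	client := srv.Client()
	// Carry over the pool settings from the transport above.
	if t, ok := client.Transport.(*http.Transport); ok {
		t.MaxIdleConnsPerHost = 100
		t.IdleConnTimeout = 90 * time.Second
	}

	resp, err := client.Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return resp.Proto, nil // "HTTP/2.0" when negotiation succeeds
}

func main() {
	proto, err := fetchProto()
	if err != nil {
		panic(err)
	}
	fmt.Println(proto)
}
```

Checking resp.Proto is a quick way to confirm in production, too, that your client didn't silently fall back to HTTP/1.1.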

Step 2 — Sending Multiple Requests Per Connection

HTTP/2 introduces a critical improvement over HTTP/1.1: multiplexing. This allows multiple requests and responses to share the same underlying TCP connection. Unlike HTTP/1.1, where requests on a connection are serialized, HTTP/2 allows a client to fire GET /user/1 and GET /user/2 in parallel on the same wire.

req1, _ := http.NewRequest("GET", "https://api.example.com/user/1", nil)
req2, _ := http.NewRequest("GET", "https://api.example.com/user/2", nil)
resp1, err1 := client.Do(req1)
resp2, err2 := client.Do(req2)

Here a single http.Client instance issues two requests. One caveat: client.Do blocks, so as written these run back to back; to actually exploit multiplexing, issue them from separate goroutines. Either way, the transport interleaves the frames of concurrent streams in one TCP connection. This eliminates head-of-line blocking at the HTTP layer, meaning a slow server response won't stall other requests on the same connection (TCP-level head-of-line blocking can still occur, which is what HTTP/3 addresses).

Step 3 — Compressing Overhead with HPACK

HTTP headers can be verbose. Repeating Host, User-Agent, or Authorization headers for every request wastes bandwidth and CPU. HTTP/2 uses the HPACK compression format to encode headers.

// HPACK is handled transparently inside the transport layer.
// You don't usually need to configure it manually in standard Go clients.
client := &http.Client{
    Transport: transport,
}

By enabling HTTP/2 (ForceAttemptHTTP2: true), the client automatically applies HPACK. This drastically reduces the bytes written over the wire without altering the payload. For high-throughput services, this means less network chatter and lower per-request overhead, especially on constrained networks.
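HPACK's dynamic table only pays off while the connection stays alive, so it's worth verifying that requests actually reuse one connection. A sketch using the standard httptrace hooks against a local test server (connReused is an illustrative name):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
)

// connReused makes two sequential requests to a local HTTP/2 test server
// and reports whether the second request reused the first connection --
// the precondition for HPACK's dynamic table to save bytes.
func connReused() (bool, error) {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			io.WriteString(w, "ok")
		}))
	srv.EnableHTTP2 = true
	srv.StartTLS()
	defer srv.Close()

	client := srv.Client()
	var reused bool
	for i := 0; i < 2; i++ {
		trace := &httptrace.ClientTrace{
			// Overwritten per request; the final value reflects request two.
			GotConn: func(info httptrace.GotConnInfo) { reused = info.Reused },
		}
		req, err := http.NewRequest("GET", srv.URL, nil)
		if err != nil {
			return false, err
		}
		req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
		resp, err := client.Do(req)
		if err != nil {
			return false, err
		}
		io.Copy(io.Discard, resp.Body) // drain so the stream completes
		resp.Body.Close()
	}
	return reused, nil
}

func main() {
	ok, err := connReused()
	if err != nil {
		panic(err)
	}
	fmt.Println(ok)
}
```

If this prints false in your environment, something (a proxy, an aggressive idle timeout) is forcing new connections, and you're paying the handshake and full-header cost on every request.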

Step 4 — Managing Stream State and Errors

HTTP/2 introduces new error-handling semantics compared to HTTP/1.1, and errors are scoped. A RST_STREAM frame aborts a single stream while leaving the connection usable for everything else; a connection-level failure, such as a TLS certificate error or a GOAWAY frame, takes down every stream on that connection.

// Distinguish transport-level failures from application-level status codes.
resp, err := client.Do(req)
if err != nil {
	// Stream- or connection-level failure (e.g. a reset stream): retry or fail gracefully.
	return err
}
if resp.StatusCode == http.StatusServiceUnavailable {
	// Application-level error: the stream and connection are still healthy.
}

We must ensure our client handles these scenarios gracefully, with timeouts and retry logic. When a server sends a GOAWAY frame during a graceful shutdown, it advertises the last stream ID it will process; the client should stop opening new streams on that connection and move subsequent requests to a fresh one. Go's transport retires the connection for you, but requests that were in flight may still surface errors that your application has to retry.

Key Takeaways

  • TCP Persistence — Establishing TCP is expensive; reuse connections to amortize the handshake cost.
  • Parallel Streams — Sending multiple requests simultaneously eliminates queuing latency.
  • Header Compression — Reduces bandwidth usage without changing application logic.
  • Flow Control — Manages backpressure to prevent overwhelmed servers from crashing.

What's Next

Now that you understand the mechanics, the next step is integrating this into your service. You might consider moving to HTTP/3 (QUIC) which provides better resilience against packet loss, or exploring gRPC which sits on top of HTTP/2 for even lower latency. You should also look into connection pooling strategies for database drivers which work similarly.

Conclusion

By optimizing your client for multiplexing, you gain the resilience to handle load spikes and the efficiency to reduce latency. The next time you open a new connection for a small request, remember that HTTP/2 provides the tools to do better. Happy coding.
