You deploy a perfectly fine service. Load tests passed. Latency looked clean. Then production hits—and suddenly:
- P95 latency spikes
- Requests hang without logs
- CPU is fine, memory is fine… but users are screaming
This isn’t a “bug.” This is you not understanding the actual HTTP request lifecycle beyond the textbook diagram.
If you don’t know what really happens between a client sending a request and your handler returning a response, you’re flying blind.
HTTP Request Lifecycle — What Actually Happens Under the Hood
Forget diagrams like Client → Server → Response. That’s marketing-level abstraction.
A real request goes through:
1. Connection Establishment
- DNS resolution
- TCP handshake (3-way)
- TLS handshake (if HTTPS)
- Connection pooling / reuse (keep-alive)
2. Kernel → User Space Transition
- NIC receives packet → kernel buffer
- Socket read readiness via epoll/kqueue
- Data copied into user space buffers
3. HTTP Parsing
- Raw bytes → protocol parsing (headers, method, path)
- Chunked decoding / content-length validation
- Header normalization
4. Routing & Middleware Chain
- Path matching (often regex or trie-based)
- Middleware execution (auth, logging, rate limiting)
5. Business Logic Execution
- DB calls
- External APIs
- CPU-bound work
6. Response Construction
- Serialization (JSON, protobuf, etc.)
- Compression (gzip, brotli)
7. Write Back to Socket
- Kernel send buffer
- TCP congestion control
- Potential partial writes
8. Connection Lifecycle Decision
- Keep-alive reuse vs close
- Idle timeout tracking
Miss any one of these layers, and you’ll misdiagnose production issues.
Where Systems Actually Break
Let’s cut the theory. Real failures:
🔴 1. Head-of-Line Blocking in Connection Pools
You think you're async, but your HTTP client pool is exhausted.
Result:
- Requests queue waiting for a free connection
- Latency explodes without CPU increase
🔴 2. Slow Clients = Resource Leaks
If a client reads slowly:
- Your server keeps buffers open
- Threads/event-loop slots remain occupied
This is classic slowloris territory.
🔴 3. Middleware Abuse
Stacking 10 middlewares sounds clean.
Reality:
- Each adds latency
- Each may block (logging, auth calls)
- Hard to reason about ordering
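Middleware cost is easy to make visible: wrap each one in a timer so its latency shows up per-name instead of vanishing into aggregate request latency. A sketch assuming an Express-style `(req, res, next)` signature:

```javascript
// Wrap a middleware so its wall-clock cost is logged on every request.
function timed(name, middleware) {
  return (req, res, next) => {
    const start = process.hrtime.bigint();
    middleware(req, res, (err) => {
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      console.log(`${name}: ${ms.toFixed(2)}ms`);
      next(err); // propagate errors unchanged
    });
  };
}

// Hypothetical usage with an Express-like app:
// app.use(timed('auth', authMiddleware));
// app.use(timed('rateLimit', rateLimitMiddleware));
```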
🔴 4. TLS Handshake Overhead
Without reuse:
- Every request pays ~1–2 RTT extra
- CPU spikes due to crypto
🔴 5. Kernel Buffer Backpressure
Your app “sent” the response.
Kernel says:
Nope, buffer full. Try later.
If you ignore this:
- Writes block
- Event loop stalls
- Throughput collapses
Architecture Decisions That Actually Matter
1. Thread-per-request vs Event Loop
Thread-per-request (e.g., classic Java)
Pros:
- Simpler mental model
- Blocking code is fine
Cons:
- Context switching overhead
- Memory per thread (~1MB stack)
Event-driven (Node.js, Netty, Go runtime hybrid)
Pros:
- High concurrency
- Efficient IO
Cons:
- Blocking = catastrophic
- Debugging harder
2. Reverse Proxy in Front (NGINX / Envoy)
You should not expose your app server directly.
Why:
- Handles TLS termination
- Absorbs slow clients
- Better connection management
3. Connection Reuse Strategy
Bad:
- New TCP per request
Better:
- HTTP/1.1 keep-alive
Best:
- HTTP/2 multiplexing
Trade-off:
- HTTP/2 removes HTTP-level head-of-line blocking but still suffers it at the TCP layer: one lost packet stalls every multiplexed stream
- QUIC (HTTP/3) fixes this by multiplexing over UDP, but adds operational complexity
Implementation: What This Looks Like in Code
Example: Minimal HTTP Server (Node.js — showing lifecycle touchpoints)
```javascript
const http = require('http');

const server = http.createServer((req, res) => {
  // 1. Request received (already parsed by Node's HTTP parser)

  // 2. Middleware simulation
  const start = Date.now();
  if (req.headers['x-block']) {
    // simulate bad middleware: a busy-wait blocks the event loop for 100ms
    while (Date.now() - start < 100) {}
  }

  // 3. Business logic (simulated async work)
  setTimeout(() => {
    const responseBody = JSON.stringify({ ok: true });

    // 4. Response write
    res.setHeader('Content-Type', 'application/json');
    res.setHeader('Content-Length', Buffer.byteLength(responseBody));
    res.write(responseBody);

    // 5. End response (flush to kernel)
    res.end();
  }, 10);
});

// 6. Connection-level tuning
server.keepAliveTimeout = 5000;
server.headersTimeout = 6000;

server.listen(3000);
```
Where This Code Lies to You
- You don’t see TCP
- You don’t see kernel buffers
- You don’t control backpressure explicitly
- You don’t see partial writes
That abstraction is convenient—and dangerous.
Advanced Concern: Backpressure Handling
Most people ignore this. That’s why systems collapse under load.
Example (Node.js stream backpressure):
```javascript
function writeResponse(res, data) {
  const canContinue = res.write(data);
  if (!canContinue) {
    // Kernel send buffer full: stop producing data and resume
    // only once the socket drains
    res.once('drain', () => {
      console.log('Resumed writing');
      // resume reading / producing the next chunk here
    });
  }
  return canContinue;
}
```
If you ignore this:
- Memory spikes
- Latency spikes
- Eventually crashes
Failure Case: Timeout Mismatch Hell
You configure:
- Load balancer timeout: 60s
- App server timeout: 30s
- DB timeout: 10s
What happens under load?
- The DB call times out at 10s → the app retries, and a couple of retries push it past its own 30s limit
- The app server gives up mid-retry → the LB, still waiting on its 60s budget, eventually kills the connection
- The client retries the whole request → duplicate work amplifies the load
Result:
- Cascade failure: each layer's retries multiply the work of the layers below
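One way to keep deadlines aligned is to make every outbound call carry a timeout strictly smaller than the layer above it. A sketch using `AbortSignal.timeout` (Node 17.3+); the service URL and the 10s budget are hypothetical:

```javascript
// Outbound timeout must sit below the app-server timeout, which sits
// below the load-balancer timeout: outer > inner, always.
const OUTBOUND_TIMEOUT_MS = 10_000;

async function fetchUser(id) {
  // fetch rejects with an AbortError once the deadline passes,
  // so the handler can fail fast instead of outliving its caller.
  return fetch(`https://user-service.internal/users/${id}`, {
    signal: AbortSignal.timeout(OUTBOUND_TIMEOUT_MS),
  });
}
```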
Trade-offs You Can’t Avoid
Latency vs Throughput
- Small buffers → lower latency, more syscalls
- Large buffers → better throughput, worse tail latency
Simplicity vs Control
- Frameworks hide complexity
- But you lose control over:
  - connection reuse
  - backpressure
  - parsing behavior
CPU vs Network Efficiency
- Compression saves bandwidth
- Costs CPU
- Under load, CPU becomes bottleneck
Keep-Alive vs Resource Locking
- Keep-alive reduces handshake overhead
- But holds connections longer
- Risk: connection pool exhaustion
Final System Design (What Actually Works in Production)
A sane architecture:
Client
↓
CDN (optional)
↓
Reverse Proxy (NGINX / Envoy)
↓
App Server (stateless, event-driven)
↓
Service Layer
↓
Database / Cache
Key rules:
- Terminate TLS early
- Enforce timeouts at every layer
- Use connection pooling aggressively
- Monitor queueing, not just CPU
Key Takeaways (No Fluff)
- HTTP lifecycle is mostly not in your code—it’s in the kernel and network stack
- Most latency issues are queueing problems, not computation problems
- Backpressure is real; ignoring it will kill your system
- Middleware is not free—treat it like production code, not decoration
- Timeouts must be aligned across layers or you create cascading failures
- Keep-alive and pooling are double-edged swords
If you still think HTTP is just “request comes in, response goes out,” you’re not ready to debug production systems.