DEV Community

Kshitij Sharma

When Your API “Randomly” Starts Timing Out

You deploy a perfectly fine service. Load tests passed. Latency looked clean. Then production hits—and suddenly:

  • P95 latency spikes
  • Requests hang without logs
  • CPU is fine, memory is fine… but users are screaming

This isn’t a “bug.” This is you not understanding the actual HTTP request lifecycle beyond the textbook diagram.

If you don’t know what really happens between a client sending a request and your handler returning a response, you’re flying blind.

HTTP Request Lifecycle — What Actually Happens Under the Hood

Forget diagrams like Client → Server → Response. That’s marketing-level abstraction.

A real request goes through:

1. Connection Establishment

  • DNS resolution
  • TCP handshake (3-way)
  • TLS handshake (if HTTPS)
  • Connection pooling / reuse (keep-alive)

2. Kernel → User Space Transition

  • NIC receives packet → kernel buffer
  • Socket read readiness via epoll/kqueue
  • Data copied into user space buffers

3. HTTP Parsing

  • Raw bytes → protocol parsing (headers, method, path)
  • Chunked decoding / content-length validation
  • Header normalization

4. Routing & Middleware Chain

  • Path matching (often regex or trie-based)
  • Middleware execution (auth, logging, rate limiting)
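The chain itself is simple machinery. A minimal sketch of how frameworks execute middleware in order (`compose` and the sample middlewares are illustrative, not any specific framework's API):

```javascript
// Each middleware receives `next` and decides whether to continue;
// ordering is explicit, which is exactly why it is easy to get wrong.
function compose(middlewares) {
  return function handle(req, res) {
    let i = 0;
    function next() {
      const mw = middlewares[i++];
      if (mw) mw(req, res, next);
    }
    next();
  };
}

// Hypothetical middlewares:
const auth = (req, res, next) => { req.user = 'anonymous'; next(); };
const log = (req, res, next) => { req.logged = true; next(); };
const handler = compose([auth, log, (req, res) => { req.handled = true; }]);
```

A middleware that forgets to call `next()` silently drops the request; a middleware that blocks stalls everything behind it.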

5. Business Logic Execution

  • DB calls
  • External APIs
  • CPU-bound work

6. Response Construction

  • Serialization (JSON, protobuf, etc.)
  • Compression (gzip, brotli)

7. Write Back to Socket

  • Kernel send buffer
  • TCP congestion control
  • Potential partial writes

8. Connection Lifecycle Decision

  • Keep-alive reuse vs close
  • Idle timeout tracking

Miss any one of these layers, and you’ll misdiagnose production issues.


Where Systems Actually Break

Let’s cut the theory. Real failures:

🔴 1. Head-of-Line Blocking in Connection Pools

You think you're async, but your HTTP client pool is exhausted.

Result:

  • Requests queue waiting for a free connection
  • Latency explodes without CPU increase

🔴 2. Slow Clients = Resource Leaks

If a client reads slowly:

  • Your server keeps buffers open
  • Threads/event-loop slots remain occupied

This is classic Slowloris territory.


🔴 3. Middleware Abuse

Stacking 10 middlewares sounds clean.

Reality:

  • Each adds latency
  • Each may block (logging, auth calls)
  • Hard to reason about ordering
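Before deleting middleware, measure it. A small wrapper sketch (`timed` is hypothetical, not a framework feature):

```javascript
// Wraps a middleware and reports its wall-clock cost: the cheap way to
// find which of your ten middlewares actually adds latency.
function timed(name, mw, report) {
  return (req, res, next) => {
    const start = process.hrtime.bigint();
    mw(req, res, () => {
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      report(name, ms);
      next();
    });
  };
}
```

Wrap each middleware once at registration time and feed `report` into your metrics pipeline.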

🔴 4. TLS Handshake Overhead

Without reuse:

  • Every request pays ~1–2 RTT extra
  • CPU spikes due to crypto

🔴 5. Kernel Buffer Backpressure

Your app “sent” the response.

Kernel says:

Nope, buffer full. Try later.

If you ignore this:

  • Writes block
  • Event loop stalls
  • Throughput collapses

Architecture Decisions That Actually Matter

1. Thread-per-request vs Event Loop

Thread-per-request (e.g., classic Java)

Pros:

  • Simpler mental model
  • Blocking code is fine

Cons:

  • Context switching overhead
  • Memory per thread (~1MB stack)

Event-driven (Node.js, Netty, Go runtime hybrid)

Pros:

  • High concurrency
  • Efficient IO

Cons:

  • Blocking = catastrophic
  • Debugging harder

2. Reverse Proxy in Front (NGINX / Envoy)

You should not expose your app server directly.

Why:

  • Handles TLS termination
  • Absorbs slow clients
  • Better connection management
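An illustrative NGINX fragment covering those three jobs (certificate paths and the upstream address are placeholders):

```nginx
upstream app {
    server 127.0.0.1:3000;
    keepalive 32;                        # pooled connections to the app
}

server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/cert.pem;
    ssl_certificate_key /etc/nginx/key.pem;

    location / {
        proxy_pass http://app;
        proxy_http_version 1.1;
        proxy_set_header Connection "";  # required for upstream keep-alive
        proxy_read_timeout 30s;          # bound the upstream, not the client
    }
}
```

The proxy buffers slow clients, so your app server's sockets are freed as soon as the response is handed off.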

3. Connection Reuse Strategy

Bad:

  • New TCP per request

Better:

  • HTTP/1.1 keep-alive

Best:

  • HTTP/2 multiplexing

Trade-off:

  • HTTP/2 still suffers head-of-line blocking at the TCP layer: one lost packet stalls every multiplexed stream
  • QUIC (HTTP/3) fixes it but adds complexity

Implementation: What This Looks Like in Code

Example: Minimal HTTP Server (Node.js — showing lifecycle touchpoints)

const http = require('http');

const server = http.createServer((req, res) => {
  // 1. Request received (already parsed by Node's HTTP parser)

  // 2. Middleware simulation
  const start = Date.now();

  if (req.headers['x-block']) {
    // simulate bad middleware
    while (Date.now() - start < 100) {}
  }

  // 3. Business logic
  setTimeout(() => {
    const responseBody = JSON.stringify({ ok: true });

    // 4. Response write
    res.setHeader('Content-Type', 'application/json');
    res.setHeader('Content-Length', Buffer.byteLength(responseBody));

    res.write(responseBody);

    // 5. End response (flush to kernel)
    res.end();
  }, 10);
});

// 6. Connection-level tuning
server.keepAliveTimeout = 5000;
server.headersTimeout = 6000;

server.listen(3000);

Where This Code Lies to You

  • You don’t see TCP
  • You don’t see kernel buffers
  • You don’t control backpressure explicitly
  • You don’t see partial writes

That abstraction is convenient—and dangerous.


Advanced Concern: Backpressure Handling

Most people ignore this. That’s why systems collapse under load.

Example (Node.js stream backpressure):

function writeResponse(res, data) {
  const canContinue = res.write(data);

  if (!canContinue) {
    // Kernel send buffer is full: stop writing and wait
    res.once('drain', () => {
      // Buffer flushed: safe to write the next queued chunk here
      console.log('Resumed writing');
    });
  }
}

If you ignore this:

  • Memory spikes
  • Latency spikes
  • Eventually crashes

Failure Case: Timeout Mismatch Hell

You configure:

  • Load balancer timeout: 60s
  • App server timeout: 30s
  • DB timeout: 10s

What happens?

  • DB call times out at 10s → app retries it
  • Retries keep the app busy past its budget → LB kills the connection mid-flight
  • Client sees a failure and retries → duplicate work at every layer

Result:

  • Cascade failure
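The fix is a budget invariant: inner timeouts, multiplied by retry attempts, must fit inside outer ones. A sketch worth asserting at startup (names and values are illustrative):

```javascript
const cfg = {
  lbTimeout: 60_000,   // outermost: configured on the load balancer
  appTimeout: 30_000,
  dbTimeout: 10_000,
  dbRetries: 1,        // total DB attempts = 1 + dbRetries
};

// Worst-case DB time must fit inside the app budget, which must fit
// inside the LB budget; otherwise a layer above kills live work below.
function timeoutsAligned(c) {
  const worstDbTime = (1 + c.dbRetries) * c.dbTimeout;
  return worstDbTime < c.appTimeout && c.appTimeout < c.lbTimeout;
}
```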

Trade-offs You Can’t Avoid

Latency vs Throughput

  • Small buffers → lower latency, more syscalls
  • Large buffers → better throughput, worse tail latency

Simplicity vs Control

  • Frameworks hide complexity
  • But you lose control over:

    • connection reuse
    • backpressure
    • parsing behavior

CPU vs Network Efficiency

  • Compression saves bandwidth
  • Costs CPU
  • Under load, CPU becomes bottleneck

Keep-Alive vs Resource Locking

  • Keep-alive reduces handshake overhead
  • But holds connections longer
  • Risk: connection pool exhaustion

Final System Design (What Actually Works in Production)

A sane architecture:

Client
  ↓
CDN (optional)
  ↓
Reverse Proxy (NGINX / Envoy)
  ↓
App Server (stateless, event-driven)
  ↓
Service Layer
  ↓
Database / Cache

Key rules:

  • Terminate TLS early
  • Enforce timeouts at every layer
  • Use connection pooling aggressively
  • Monitor queueing, not just CPU

Key Takeaways (No Fluff)

  • HTTP lifecycle is mostly not in your code—it’s in the kernel and network stack
  • Most latency issues are queueing problems, not computation problems
  • Backpressure is real; ignoring it will kill your system
  • Middleware is not free—treat it like production code, not decoration
  • Timeouts must be aligned across layers or you create cascading failures
  • Keep-alive and pooling are double-edged swords

If you still think HTTP is just “request comes in, response goes out,” you’re not ready to debug production systems.
