Introduction
Imagine this scenario: you’ve built a backend application, tested it on your localhost, and the requests are blazing fast with minimal resource usage. But when the server load increases, requests slow down or even start to time out. You’re left scratching your head — is the problem in the backend logic, the programming language, the database queries, the hosting machine, or the reverse proxy?
In this article, we’ll take a deep dive into how backend HTTP requests are executed under the hood. By understanding what your server is actually doing, you’ll be better equipped to identify performance bottlenecks.
How TCP Connections Are Established
We won’t go into too much depth here, just enough lingo to follow the rest of the article. (If you’d like a deep-dive on TCP itself, drop a comment and I’ll write a separate post.)
TCP uses a three-way handshake:
- SYN: the client sends a sync request to the server.
- SYN/ACK: the server acknowledges and responds with its own sync request.
- ACK: the client acknowledges, and both sides are in sync.
At this point, the TCP connection is established.
👉 Why TCP?
Because HTTP/1.1 and HTTP/2 — the most widely used versions — are built on top of TCP. (HTTP/3 uses QUIC, a different protocol, which we won’t cover here.)
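To make this concrete, here is a minimal sketch (assuming Node.js, with example.com as a placeholder host) that opens a raw TCP connection and writes an HTTP/1.1 request over it by hand. Every HTTP client and framework does essentially this for you under the hood.
import net from 'node:net';

// Opening the connection triggers the three-way handshake described above.
const socket = net.createConnection({ host: 'example.com', port: 80 }, () => {
  // Once the handshake completes, HTTP/1.1 is just text written over the socket.
  socket.write('GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n');
});

socket.on('data', (chunk) => process.stdout.write(chunk));
socket.on('end', () => console.log('\nConnection closed'));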
The Kernel’s Role
When a client opens a connection to your server, the Linux kernel tracks it using two queues:
1. The SYN Queue
This queue holds half-open connections: after the server replies with SYN/ACK but before the client's final ACK arrives, the connection sits here.
2. The Accept Queue
Once the handshake is complete, the connection is moved to the accept queue. At this point the kernel's work is done; it's up to the backend application to pick it up using the accept() syscall.
Important detail: if the accept queue is full, new connections will be dropped (or reset). The queue's maximum length is capped by the kernel parameter:
cat /proc/sys/net/core/somaxconn
-> 128
(default is often 128). Increasing this can help servers under heavy load.
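You can raise it with sysctl -w net.core.somaxconn=1024 (and persist the change in /etc/sysctl.conf). The application side matters too: when it starts listening it asks the kernel for a backlog, and on Linux the effective accept-queue limit is the smaller of that backlog and somaxconn. Here is a minimal sketch, assuming Node.js, of requesting a larger backlog:
import http from 'node:http';

const server = http.createServer((req, res) => {
  res.end('ok');
});

// backlog is the accept-queue size this server asks the kernel for.
// On Linux the effective limit is min(backlog, somaxconn), so raising
// one without the other changes nothing.
server.listen({ port: 3000, backlog: 1024 }, () => {
  console.log('Listening with a requested backlog of 1024');
});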
Another detail: if the client's final ACK never arrives, the server retransmits the SYN/ACK a limited number of times (controlled by tcp_synack_retries) and then drops the half-open connection from the SYN queue.
The Backend Application’s Role
After a connection enters the accept queue, the application must call the accept() syscall to start handling it.
Your application’s concurrency performance depends on how quickly it can:
- Accept new connections.
- Execute requests without blocking other connections (see the sketch below).
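In Node.js, for example, you never call accept() yourself: once listen() is called, libuv pulls completed connections off the kernel's accept queue and hands each one to your code as a 'connection' event. A minimal sketch of those two phases (a bare TCP echo server, not HTTP):
import net from 'node:net';

// Accept phase: libuv calls accept() under the hood and emits
// a 'connection' event for every socket it picks up.
const server = net.createServer((socket) => {
  // Execute phase: handle each connection without blocking the event loop.
  socket.on('data', (chunk) => {
    socket.write(`echo: ${chunk}`);
  });
});

server.listen(3000, () => console.log('Accepting connections on :3000'));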
How Different Backends Handle Connections
We need to distinguish two phases:
- Accepting connections: how the server calls accept().
- Executing connections: how the HTTP request itself is processed.
Node.js
- Single-threaded for both accepting and executing requests.
- Famous for non-blocking I/O. While a request is waiting for an I/O operation (like a DB query), Node.js can accept new connections.
👉 Problem: if a request is CPU-bound (e.g., a huge loop or sync file read), the event loop blocks. Other requests won’t even be logged until the CPU task finishes.
Example:
import express from 'express';

const app = express();
const LIMIT = 1_000_000_000;

function processData() {
  let sum = 0;
  for (let i = 0; i < LIMIT; i++) sum += i;
  return sum;
}

app.get('/blocking', (req, res) => {
  console.log('Blocking request received');
  const result = processData();
  res.send(`Result is ${result}`);
});

app.listen(3000);
Call /blocking twice in quick succession: the second request won't even log until the first finishes.
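A common fix, sketched below under the same ESM setup, is to push the CPU-bound loop onto a worker thread so the event loop stays free to accept and log new requests. The /non-blocking route name is just for illustration, and a real service would reuse a worker pool instead of spawning a worker per request.
import express from 'express';
import { Worker, isMainThread, parentPort, workerData } from 'node:worker_threads';

const LIMIT = 1_000_000_000;

if (!isMainThread) {
  // Worker thread: run the CPU-bound loop off the main thread.
  let sum = 0;
  for (let i = 0; i < workerData.limit; i++) sum += i;
  parentPort.postMessage(sum);
} else {
  const app = express();

  app.get('/non-blocking', (req, res) => {
    console.log('Non-blocking request received');
    // Spawn a worker from this same file; the event loop keeps serving
    // other requests while the loop runs elsewhere.
    const worker = new Worker(new URL(import.meta.url), { workerData: { limit: LIMIT } });
    worker.once('message', (result) => res.send(`Result is ${result}`));
    worker.once('error', (err) => res.status(500).send(err.message));
  });

  app.listen(3000);
}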
Go
- Accepts connections in a single accept loop (the goroutine that calls Serve).
- Immediately spawns a new goroutine for each connection, leaving the accept loop free to keep accepting new ones.
From the net/http docs:
Serve accepts incoming connections, creating a new service goroutine for each. The service goroutines read requests and then call the handler to reply.
This model balances simplicity with concurrency.
Python (WSGI servers like Gunicorn/uWSGI)
- Concurrency depends on workers (threads or processes).
- When a connection is accepted, it’s assigned to a worker.
- Multiple workers prevent one slow request from blocking others.
- CPU-bound workloads still block individual workers, so scaling often means running multiple processes.
PHP (Apache/Nginx + PHP-FPM)
- The web server (Apache or Nginx) handles connection management.
- Each request is passed to a PHP worker (via mod_php or PHP-FPM).
- Requests run in isolated processes, so one request won’t block another — but each worker is heavier than a lightweight goroutine or async event loop.
Others (quick mentions)
- Java (Servlets, Tomcat, Netty): thread-per-request (Tomcat) or event-driven (Netty).
- Rust (Tokio, Actix): async event loops, similar in spirit to Node.js's non-blocking model but able to run across multiple threads.
Useful Commands
Play around with these to observe queues and connections:
cat /proc/sys/net/ipv4/tcp_max_syn_backlog # max SYN queue length
cat /proc/sys/net/core/somaxconn # max accept queue length
ss -tln # show sockets + queue info
netstat -anp # legacy alternative
For load testing:
ab -n 1000 -c 100 http://localhost:3000/
wrk -t12 -c400 -d30s http://localhost:3000/
Conclusion
By now, you should have a clearer picture of what happens when a client connects to your backend:
- The kernel manages the handshake and queues.
- The backend application must efficiently accept() and execute connections.
- Different runtimes handle this differently: event loops, goroutines, workers, or processes.
Key takeaway: performance issues often aren’t just about your code or your database queries — they can stem from queue limits, blocking workloads, or how your runtime handles concurrency.
In part two, we'll look at how the kernel manages sending responses using the send/receive queues (sendq and recvq).