Krishna Aditya Srivastava
From Sockets to Server: What I Learned Building My Own Web Server

Most of us never think about what happens when a web server actually receives a request. Frameworks handle it. Infrastructure hides it. And that's fine — until you want to really understand what's going on underneath.

So I built one myself. An HTTP server in C++, starting from raw POSIX sockets. No frameworks, no libraries for the hard parts. Just system calls, byte buffers, and a lot of edge cases.

What started as a learning exercise turned into something more specific: watching performance bottlenecks shift layers as the architecture improved. That turned out to be the most interesting part.


Why Build This at All?

A few questions I couldn't answer confidently kept nagging me:

  • How does a server know when a full HTTP request has arrived?

  • What actually happens when headers come in as fragments?

  • Why does a server handle 10 users fine, then struggle at 500?

  • Where do production servers like NGINX actually spend their time?

The only way to stop guessing was to build it and find out.

The goal wasn't to beat NGINX. It was to make the costs visible.


The Architecture (Deliberately Simple)

I kept the design modular so failures were easy to trace:

Client → Accept → HTTP Read/Parse → Route → Response → Write

Each layer had one job:

  • Socket layer — socket, bind, listen, accept

  • HTTP I/O — buffered reads, parsing, response writing

  • Router — static and dynamic path matching

  • Runtime — thread pool and/or epoll-based execution

That separation paid off. When something broke, it was obvious where to look.
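The socket layer boils down to a short chain of system calls. Here is a minimal sketch of that layer; the function name `make_listener` is my own for illustration, not from the original project:

```cpp
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Minimal socket layer: socket -> bind -> listen.
// Returns a listening fd, or -1 on failure. Port 0 lets the
// kernel pick a free port (handy for tests).
int make_listener(uint16_t port, int backlog = 128) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

From here, `accept` on the returned fd hands each client its own connection fd, which flows into the HTTP I/O layer.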


Networking Is Messier Than the Textbook

The textbook version of a server looks clean:

  1. Accept connection

  2. Read request

  3. Process it

  4. Write response

  5. Close

The real version? Reads are partial. Clients disconnect mid-write. Malformed requests arrive constantly. Keep-alive connections blur the line between "done" and "waiting."

Even tiny decisions matter. Adding SO_REUSEADDR — one line — prevents restart failures caused by sockets stuck in TIME_WAIT. The details add up fast.
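For reference, that one line looks roughly like this (wrapped in a hypothetical helper for clarity), and it has to run between `socket()` and `bind()`:

```cpp
#include <sys/socket.h>

// Allow rebinding the port while old sockets from a previous run
// linger in TIME_WAIT. Call after socket() and before bind().
bool enable_reuseaddr(int fd) {
    int yes = 1;
    return setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
                      &yes, sizeof(yes)) == 0;
}
```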


HTTP Parsing: The First Humbling Moment

My first assumption was wrong immediately:

One read() call = one complete HTTP request.

Almost never true.

What actually works:

  • Accumulate incoming bytes in a buffer

  • Scan for \r\n\r\n (the end of headers)

  • Only then parse the headers

  • Use Content-Length to know how much body to expect

And you need guardrails:

  • Cap header size (16KB is common)

  • Cap body size

  • Reject malformed requests early

These defensive checks improved stability more than any performance optimization I made. Correctness has to come before speed.
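The accumulate-and-scan loop, with the header-size cap applied, can be sketched like this. The names (`feed`, `ParseState`, `kMaxHeaderBytes`) are mine, not the project's, but the logic matches the steps above:

```cpp
#include <cstddef>
#include <string>

// Outcome of feeding one read()'s worth of bytes into the buffer.
enum class ParseState { NeedMore, HeadersDone, TooLarge };

constexpr size_t kMaxHeaderBytes = 16 * 1024;  // cap header size

// 'buf' persists across reads for one connection. On success,
// 'header_end' is set just past the "\r\n\r\n" terminator; only
// then is it safe to parse headers and read Content-Length.
ParseState feed(std::string& buf, const char* data, size_t n,
                size_t& header_end) {
    buf.append(data, n);
    size_t pos = buf.find("\r\n\r\n");
    if (pos == std::string::npos) {
        // Reject early rather than buffer unbounded garbage.
        return buf.size() > kMaxHeaderBytes ? ParseState::TooLarge
                                            : ParseState::NeedMore;
    }
    header_end = pos + 4;
    return ParseState::HeadersDone;
}
```

A request arriving in two fragments simply produces `NeedMore` on the first call and `HeadersDone` on the second, which is exactly the case the one-read-one-request assumption gets wrong.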


The Concurrency Problem (Where It Gets Interesting)

My first concurrency model was simple: a thread pool with blocking I/O. Each thread picks up a connection and handles it start to finish.
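That model is simple enough to sketch in full. This is an illustrative version (the class name and queue-of-fds design are my own simplification), with `handle` standing in for the whole read/parse/route/write cycle:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Fixed pool of workers; each pulls a connection fd off a queue
// and handles it start to finish with blocking I/O. The flaw is
// visible in run(): handle_ can block a worker indefinitely on a
// slow or idle client.
class BlockingPool {
public:
    BlockingPool(size_t n, std::function<void(int)> handle)
        : handle_(std::move(handle)) {
        for (size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    void submit(int fd) {
        { std::lock_guard<std::mutex> lk(mu_); q_.push(fd); }
        cv_.notify_one();
    }
    ~BlockingPool() {  // drain the queue, then join workers
        { std::lock_guard<std::mutex> lk(mu_); done_ = true; }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
private:
    void run() {
        for (;;) {
            int fd;
            {
                std::unique_lock<std::mutex> lk(mu_);
                cv_.wait(lk, [this] { return done_ || !q_.empty(); });
                if (q_.empty()) return;  // done_ and nothing left
                fd = q_.front();
                q_.pop();
            }
            handle_(fd);  // blocks here for the whole connection
        }
    }
    std::function<void(int)> handle_;
    std::vector<std::thread> workers_;
    std::queue<int> q_;
    std::mutex mu_;
    std::condition_variable cv_;
    bool done_ = false;
};
```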

This works great — until it doesn't.

The breaking point: threads block while waiting for slow or idle clients. With enough connections, every thread is just waiting. New requests queue up. Latency climbs. Throughput flatlines.

That's when I started benchmarking seriously.


Benchmarking: Watching Bottlenecks Move

I measured throughput (req/s), latency (avg and p99), and CPU behavior across four configurations. The question I asked at every stage:

What's the bottleneck now, and why?

Stage 1 — Baseline: ~5,000 req/s

Throughput stayed flat no matter how many connections I threw at it. Latency shot up from 10ms to 150ms+.

This is textbook queueing saturation — like a single checkout lane with a growing line. The system was fully occupied. More load just meant more waiting, not more work done.

The lesson: the architecture itself was the ceiling, not the code.

Stage 2 — Thread Pool Optimization: ~21,000 req/s

With 4 threads handling 800 connections in parallel, throughput jumped to ~21K req/s with p99 latency around 48ms.

Profiling with perf showed heavy time in:

  • Syscalls (write, do_syscall_64)

  • TCP stack functions (tcp_sendmsg, ip_output)

That's a good sign. The bottleneck moved from application logic to the kernel's networking stack.

Stage 3 — Epoll: ~43,000 req/s

Switching to epoll roughly doubled throughput again.

The old model scanned all connections to find active ones — O(N) work even for idle sockets. Epoll flips this: the kernel tells you which sockets are ready. You only touch active connections.

Epoll isn't an optimization. It's a different cost model entirely. Without it, high connection counts just waste CPU on sockets that aren't doing anything.
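The core of that cost model fits in two small functions. This is a sketch under my own naming (`make_epoll_watching`, `ready_fds`), not the project's actual event loop; note that `epoll_wait` returns only ready fds, so one loop iteration never touches idle connections:

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <vector>

// Create an epoll instance and register 'fd' for readability.
// Returns the epoll fd, or -1 on failure.
int make_epoll_watching(int fd) {
    int epfd = epoll_create1(0);
    if (epfd < 0) return -1;
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

// One loop iteration: the kernel hands back only the ready fds,
// so cost scales with activity, not with total connection count.
std::vector<int> ready_fds(int epfd, int timeout_ms) {
    epoll_event events[64];
    int n = epoll_wait(epfd, events, 64, timeout_ms);
    std::vector<int> out;
    for (int i = 0; i < n; ++i) out.push_back(events[i].data.fd);
    return out;
}
```

The server's main loop then becomes: wait, walk the ready list, read or write each fd without blocking, repeat.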

Stage 4 — Epoll + Threads: ~57,000 req/s

Combining event-driven I/O with parallel execution got close to NGINX territory. Workers stayed fully utilized. Latency held steady under load.


How Does This Compare to NGINX?

NGINX clocked in around 60,000 req/s — slightly better, with lower average latency.

But not because of magic. The gap comes from:

  • Minimal userspace overhead

  • Highly tuned event loops

  • Efficient buffering

  • Fewer syscalls per request

The key realization: the gap isn't conceptual. It's maturity. The architecture is similar. NGINX just has years of refinement on top.


The Pattern That Surprised Me

Looking back at all four stages, the same thing kept happening: as throughput improved, the bottleneck moved downward.

  1. Start with an architectural ceiling (queueing)

  2. Fix concurrency, hit kernel I/O limits

  3. Optimize I/O, hit kernel networking costs

That progression — bottlenecks migrating from your code toward the kernel — is exactly what you want to see. It means you've eliminated most of what's in your control.


What I'd Do Next

If I continued this project:

  • Fully event-driven model (no blocking anywhere)

  • Better HTTP compliance (chunked encoding, more header handling)

  • Keep-alive connection tuning

  • Response and file caching

  • Built-in metrics and tracing


The Real Takeaway

This started as "build a web server."

It ended as: learn to read where performance goes by watching it move.

Frameworks are great. But rebuilding the abstractions they hide is one of the best ways to understand what they're actually doing — and what it costs.
