<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Krishna Aditya Srivastava</title>
    <description>The latest articles on DEV Community by Krishna Aditya Srivastava (@krishna_adityasrivastava).</description>
    <link>https://dev.to/krishna_adityasrivastava</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3861309%2F614c7a4e-3246-4d2d-933b-f3c28db64446.jpg</url>
      <title>DEV Community: Krishna Aditya Srivastava</title>
      <link>https://dev.to/krishna_adityasrivastava</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/krishna_adityasrivastava"/>
    <language>en</language>
    <item>
      <title>From Sockets to Server: What I Learned Building My Own Web Server</title>
      <dc:creator>Krishna Aditya Srivastava</dc:creator>
      <pubDate>Tue, 21 Apr 2026 21:53:59 +0000</pubDate>
      <link>https://dev.to/krishna_adityasrivastava/from-sockets-to-server-what-i-learned-building-my-own-web-server-an8</link>
      <guid>https://dev.to/krishna_adityasrivastava/from-sockets-to-server-what-i-learned-building-my-own-web-server-an8</guid>
      <description>&lt;p&gt;Most of us never think about what happens when a web server actually receives a request. Frameworks handle it. Infrastructure hides it. And that's fine — until you want to &lt;em&gt;really&lt;/em&gt; understand what's going on underneath.&lt;/p&gt;

&lt;p&gt;So I built one myself. An HTTP server in C++, starting from raw POSIX sockets. No frameworks, no libraries for the hard parts. Just system calls, byte buffers, and a lot of edge cases.&lt;/p&gt;

&lt;p&gt;What started as a learning exercise turned into something more specific: &lt;strong&gt;watching performance bottlenecks shift layers as the architecture improved.&lt;/strong&gt; That turned out to be the most interesting part.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Build This at All?
&lt;/h2&gt;

&lt;p&gt;A few questions I couldn't answer confidently kept nagging at me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;How does a server know when a full HTTP request has arrived?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What actually happens when headers come in as fragments?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Why does a server handle 10 users fine, then struggle at 500?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Where do production servers like NGINX actually spend their time?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The only way to stop guessing was to build it and find out.&lt;/p&gt;

&lt;p&gt;The goal wasn't to beat NGINX. It was to make the costs &lt;em&gt;visible&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture (Deliberately Simple)
&lt;/h2&gt;

&lt;p&gt;I kept the design modular so failures were easy to trace:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client → Accept → HTTP Read/Parse → Route → Response → Write
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each layer had one job:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Socket layer&lt;/strong&gt; — &lt;code&gt;socket&lt;/code&gt;, &lt;code&gt;bind&lt;/code&gt;, &lt;code&gt;listen&lt;/code&gt;, &lt;code&gt;accept&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;HTTP I/O&lt;/strong&gt; — buffered reads, parsing, response writing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Router&lt;/strong&gt; — static and dynamic path matching&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Runtime&lt;/strong&gt; — thread pool and/or epoll-based execution&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That separation paid off. When something broke, it was obvious &lt;em&gt;where&lt;/em&gt; to look.&lt;/p&gt;




&lt;h2&gt;
  
  
  Networking Is Messier Than the Textbook
&lt;/h2&gt;

&lt;p&gt;The textbook version of a server looks clean:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Accept connection&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Read request&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Process it&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write response&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Close&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The real version? Reads are partial. Clients disconnect mid-write. Malformed requests arrive constantly. Keep-alive connections blur the line between "done" and "waiting."&lt;/p&gt;

&lt;p&gt;Even tiny decisions matter. Adding &lt;code&gt;SO_REUSEADDR&lt;/code&gt; — one line — prevents restart failures caused by sockets stuck in &lt;code&gt;TIME_WAIT&lt;/code&gt;. The details add up fast.&lt;/p&gt;




&lt;h2&gt;
  
  
  HTTP Parsing: The First Humbling Moment
&lt;/h2&gt;

&lt;p&gt;My first assumption was wrong immediately:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;One&lt;/em&gt; &lt;code&gt;read()&lt;/code&gt; &lt;em&gt;call = one complete HTTP request.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Almost never true.&lt;/p&gt;

&lt;p&gt;What actually works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Accumulate incoming bytes in a buffer&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scan for &lt;code&gt;\r\n\r\n&lt;/code&gt; (the end of headers)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Only &lt;em&gt;then&lt;/em&gt; parse the headers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use &lt;code&gt;Content-Length&lt;/code&gt; to know how much body to expect&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And you need guardrails:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Cap header size (16KB is common)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cap body size&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reject malformed requests early&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These defensive checks improved stability more than any performance optimization I made. Correctness has to come before speed.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Concurrency Problem (Where It Gets Interesting)
&lt;/h2&gt;

&lt;p&gt;My first concurrency model was simple: a &lt;strong&gt;thread pool with blocking I/O&lt;/strong&gt;. Each thread picks up a connection and handles it start to finish.&lt;/p&gt;

&lt;p&gt;This works great — until it doesn't.&lt;/p&gt;

&lt;p&gt;The breaking point: threads block while waiting for slow or idle clients. With enough connections, every thread is just &lt;em&gt;waiting&lt;/em&gt;. New requests queue up. Latency climbs. Throughput flatlines.&lt;/p&gt;

&lt;p&gt;That's when I started benchmarking seriously.&lt;/p&gt;




&lt;h2&gt;
  
  
  Benchmarking: Watching Bottlenecks Move
&lt;/h2&gt;

&lt;p&gt;I measured throughput (req/s), latency (avg and p99), and CPU behavior across four configurations. The question I asked at every stage:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What's the bottleneck now, and why?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Stage 1 — Baseline: ~5,000 req/s
&lt;/h3&gt;

&lt;p&gt;Throughput stayed flat no matter how many connections I threw at it. Latency shot up from 10ms to 150ms+.&lt;/p&gt;

&lt;p&gt;This is textbook queueing saturation — like a single checkout lane with a growing line. By Little's law (L = λW), roughly 5,000 req/s at 150ms of latency means about 750 requests sitting in the system at any moment. The server was fully occupied; more load just meant more waiting, not more work done.&lt;/p&gt;

&lt;p&gt;The lesson: &lt;strong&gt;the architecture itself was the ceiling, not the code.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 2 — Thread Pool Optimization: ~21,000 req/s
&lt;/h3&gt;

&lt;p&gt;With 4 threads handling 800 connections in parallel, throughput jumped to ~21K req/s with p99 latency around 48ms.&lt;/p&gt;

&lt;p&gt;Profiling with &lt;code&gt;perf&lt;/code&gt; showed heavy time in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Syscalls (&lt;code&gt;write&lt;/code&gt;, &lt;code&gt;do_syscall_64&lt;/code&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TCP stack functions (&lt;code&gt;tcp_sendmsg&lt;/code&gt;, &lt;code&gt;ip_output&lt;/code&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's a &lt;em&gt;good&lt;/em&gt; sign. The bottleneck moved from application logic to the kernel's networking stack.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 3 — Epoll: ~43,000 req/s
&lt;/h3&gt;

&lt;p&gt;Switching to &lt;code&gt;epoll&lt;/code&gt; roughly doubled throughput again.&lt;/p&gt;

&lt;p&gt;The old model scanned all connections to find active ones — O(N) work even for idle sockets. Epoll flips this: the kernel tells you which sockets are ready. You only touch active connections.&lt;/p&gt;

&lt;p&gt;Epoll isn't an optimization. It's a different cost model entirely. Without it, high connection counts just waste CPU on sockets that aren't doing anything.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 4 — Epoll + Threads: ~57,000 req/s
&lt;/h3&gt;

&lt;p&gt;Combining event-driven I/O with parallel execution got close to NGINX territory. Workers stayed fully utilized. Latency held steady under load.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Does This Compare to NGINX?
&lt;/h2&gt;

&lt;p&gt;NGINX clocked in around &lt;strong&gt;60,000 req/s&lt;/strong&gt; — slightly better, with lower average latency.&lt;/p&gt;

&lt;p&gt;But not because of magic. The gap comes from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Minimal userspace overhead&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Highly tuned event loops&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Efficient buffering&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fewer syscalls per request&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key realization: &lt;strong&gt;the gap isn't conceptual. It's maturity.&lt;/strong&gt; The architecture is similar. NGINX just has years of refinement on top.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Pattern That Surprised Me
&lt;/h2&gt;

&lt;p&gt;Looking back at all four stages, the same thing kept happening: as throughput improved, the bottleneck moved &lt;em&gt;downward&lt;/em&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Start with an architectural ceiling (queueing)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fix concurrency, hit kernel I/O limits&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Optimize I/O, hit kernel networking costs&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That progression — bottlenecks migrating from your code toward the kernel — is exactly what you want to see. It means you've eliminated most of what's in your control.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Do Next
&lt;/h2&gt;

&lt;p&gt;If I continued this project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Fully event-driven model (no blocking anywhere)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Better HTTP compliance (chunked encoding, more header handling)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keep-alive connection tuning&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Response and file caching&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Built-in metrics and tracing&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Real Takeaway
&lt;/h2&gt;

&lt;p&gt;This started as "build a web server."&lt;/p&gt;

&lt;p&gt;It ended as: &lt;strong&gt;learn to read where performance goes by watching it move.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Frameworks are great. But rebuilding the abstractions they hide is one of the best ways to understand what they're actually doing — and what it costs.&lt;/p&gt;

</description>
      <category>linux</category>
      <category>cpp</category>
      <category>c</category>
      <category>http</category>
    </item>
  </channel>
</rss>
