Ethan

Posted on Oct 10

Building an HTTP Server from Scratch in C: A Journey into Network Programming

#programming #security #c #webdev

When you type a URL into your browser and hit enter, a complex dance of network protocols springs into action. But how many developers actually understand what's happening under the hood? I decided to find out by building a bare-metal HTTP/1.1 server in C—no frameworks, no libraries, just sockets and the HTTP specification.

Why Build This?

I had three main motivations for this project:

Learning by doing. Reading about TCP sockets and HTTP is one thing; implementing them yourself is entirely different. You discover edge cases the documentation never mentions and develop an intuition for how these fundamental protocols actually work.

Preparing for security work. My ultimate goal is to add TLS encryption and build security testing tools (scanner, fuzzer, analyzer) on top of this server. You can't secure what you don't understand, and understanding starts at the lowest level.

The satisfaction of first principles. There's something deeply satisfying about writing socket(), bind(), and listen() yourself and watching your browser successfully connect to your creation.

What I Built

The server currently supports:

HTTP/1.1 protocol with proper socket programming
GET requests for static file serving
POST requests with form data parsing
Multiple route support (/, /about, /submit)
Proper HTTP status codes (200, 404, 400, 405, 500)

The server listens on port 8080 and handles one connection at a time (yes, I know—more on that later).

The Technical Deep Dive

Setting Up the Socket

The foundation of any network server is the socket. Here's the basic flow I implemented:

Create address info structure using getaddrinfo() to handle both IPv4 and IPv6
Create the socket with socket()
Set socket options with SO_REUSEADDR to avoid "Address already in use" errors during development
Bind to a port so the OS knows which port belongs to our server
Listen for connections with a backlog of 1 (the backlog is the kernel's queue of pending connections; the server itself still handles one connection at a time)

The beauty of using getaddrinfo() is that it makes the code protocol-agnostic. The same code works for IPv4 and IPv6 without modification.
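For illustration, the setup steps above could be sketched roughly as follows. This is not the project's actual code: the helper name create_listener and its parameters are hypothetical, and error reporting is reduced to returning -1.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns a listening socket, or -1 on failure. */
int create_listener(const char *port, int backlog)
{
    struct addrinfo hints, *res, *p;
    int fd = -1;
    int yes = 1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 results */
    hints.ai_socktype = SOCK_STREAM; /* TCP */
    hints.ai_flags = AI_PASSIVE;     /* wildcard address, suitable for bind() */

    if (getaddrinfo(NULL, port, &hints, &res) != 0)
        return -1;

    /* Try each returned address until one can be bound and listened on. */
    for (p = res; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd == -1)
            continue;
        if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes) == 0 &&
            bind(fd, p->ai_addr, p->ai_addrlen) == 0 &&
            listen(fd, backlog) == 0)
            break; /* success: keep this fd */
        close(fd);
        fd = -1;
    }

    freeaddrinfo(res);
    return fd;
}
```

A call such as create_listener("8080", 1) would match the port and backlog described in this post. The caller then loops on accept() with the returned descriptor.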

Parsing HTTP Requests

Once a connection arrives via accept(), I receive the raw HTTP request into a buffer:

bytes_received = recv(new_fd, buffer, sizeof(buffer) - 1, 0);
if (bytes_received < 0)
    bytes_received = 0;  // recv() failed: never index the buffer with -1
buffer[bytes_received] = '\0';

The request comes in as plain text that looks like this:

GET /about HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0...

I parse the request line using sscanf() to extract the method, path, and HTTP version. Simple, but effective for this learning project.
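As a rough sketch of that sscanf() approach: the buffer sizes and the function name parse_request_line below are assumptions for illustration, not taken from the project. The field widths in the format string are what keep sscanf() from writing past the destination buffers.

```c
#include <stdio.h>
#include <string.h>

#define METHOD_MAX 8     /* hypothetical sizes, chosen for the example */
#define PATH_MAX_LEN 256
#define VERSION_MAX 16

/* Returns 0 on success, -1 if the request line is malformed. */
int parse_request_line(const char *request,
                       char method[METHOD_MAX],
                       char path[PATH_MAX_LEN],
                       char version[VERSION_MAX])
{
    /* Each width is one less than its buffer, leaving room for '\0'. */
    if (sscanf(request, "%7s %255s %15s", method, path, version) != 3)
        return -1;

    /* Catches lines where an over-long field spilled into the next one. */
    if (strncmp(version, "HTTP/", 5) != 0)
        return -1;

    return 0;
}
```

A -1 result maps naturally onto a 400 Bad Request response.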

Serving Static Files

For GET requests, the server maps URLs to files and serves them. The routing logic is straightforward—special paths like / map to index.html, while other paths are treated as literal filenames.
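The mapping could look something like the sketch below. The name route_to_file is hypothetical, and rejecting any path containing ".." is an extra safety assumption of this example (to keep requests inside the served directory), not something described above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Maps a URL path to a relative filename.
 * Returns 0 on success, -1 if the path is rejected or does not fit in out.
 */
int route_to_file(const char *path, char *out, size_t out_size)
{
    int n;

    /* Assumption of this sketch: refuse anything that could climb directories. */
    if (path[0] != '/' || strstr(path, "..") != NULL)
        return -1;

    if (strcmp(path, "/") == 0)
        n = snprintf(out, out_size, "index.html");   /* special case */
    else
        n = snprintf(out, out_size, "%s", path + 1); /* drop the leading '/' */

    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```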

The file serving function handles several scenarios:

Success case (200 OK): Read the file, calculate its size, send proper headers with Content-Length, then send the file content.

File not found (404): Return an HTML error page explaining the file doesn't exist.

Memory allocation failure (500): Return an internal server error if we can't allocate memory for the file content.

One interesting challenge here was getting the Content-Length header right. Browsers need to know how many bytes to expect, so I use ftell() after seeking to the end of the file to get the exact size.
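Here is a minimal sketch of the fseek()/ftell() technique feeding Content-Length. The names load_file and format_ok_header are hypothetical, and the fixed text/html content type is an assumption of the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads a whole file into a malloc'd buffer; the caller frees it.
 * Returns NULL on any failure. */
char *load_file(const char *name, long *size_out)
{
    FILE *fp = fopen(name, "rb"); /* binary mode so the size matches the bytes sent */
    char *data = NULL;
    long size;

    if (fp == NULL)
        return NULL;

    /* Seek to the end, read the position as the size, then rewind. */
    if (fseek(fp, 0, SEEK_END) != 0 || (size = ftell(fp)) < 0 ||
        fseek(fp, 0, SEEK_SET) != 0) {
        fclose(fp);
        return NULL;
    }

    data = malloc((size_t)size + 1); /* +1 for the null terminator */
    if (data != NULL) {
        if (fread(data, 1, (size_t)size, fp) != (size_t)size) {
            free(data);
            data = NULL;
        } else {
            data[size] = '\0';
            *size_out = size;
        }
    }
    fclose(fp);
    return data;
}

/* Writes the 200 OK header block; returns its length, or -1 if it did not fit. */
int format_ok_header(char *out, size_t out_size, long body_len)
{
    int n = snprintf(out, out_size,
                     "HTTP/1.1 200 OK\r\n"
                     "Content-Type: text/html\r\n"
                     "Content-Length: %ld\r\n"
                     "\r\n", body_len);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

Note that this sketch returns NULL both for a missing file and for an allocation failure. A real handler would need to tell them apart to choose between 404 and 500.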

Handling POST Requests

POST request handling is significantly more complex than GET. Here's why:

The HTTP request body might not arrive all at once. The headers come first, terminated by \r\n\r\n, and then the body follows. The body might be small enough to arrive in the same TCP packet, or it might require multiple recv() calls.

My approach:

Extract Content-Length from the headers to know how many body bytes to expect
Find the body start by locating the \r\n\r\n delimiter
Calculate partial body bytes already received in the initial buffer
Loop and receive additional data until we have the full content length
Parse the form data (key=value pairs separated by &)
Write to output.txt for persistence
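The delimiter search and the partial-body calculation from the list above can be sketched like this. Both function names are hypothetical, and the sketch assumes the receive buffer has already been null-terminated.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns the offset of the first body byte (just past "\r\n\r\n"),
 * or -1 if the header terminator is not in the buffer.
 */
long find_body_offset(const char *buf)
{
    const char *end = strstr(buf, "\r\n\r\n");
    if (end == NULL)
        return -1;
    return (long)(end - buf) + 4; /* skip the 4-byte delimiter */
}

/* Body bytes that arrived with the headers, capped at content_length. */
size_t partial_body_bytes(size_t bytes_received, size_t body_offset,
                          size_t content_length)
{
    size_t have;

    if (body_offset > bytes_received)
        return 0;
    have = bytes_received - body_offset;
    return have < content_length ? have : content_length;
}
```

Whatever partial_body_bytes() reports is copied into the body buffer first, and the receive loop then only has to fetch the remainder.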

Here's the loop that ensures we get the complete body:

while (body_bytes_received < content_length) {
    ssize_t new_bytes = recv(new_fd, full_body + body_bytes_received,
                            content_length - body_bytes_received, 0);
    if (new_bytes <= 0) {
        break;  // 0: client closed the connection, -1: error; stop instead of spinning forever
    }
    body_bytes_received += new_bytes;
}

This was crucial to learn: TCP delivers a stream of bytes, not discrete messages, so you can't assume a whole request arrives in a single recv() call.
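Once the full body is in memory, the form-parsing step can be sketched as below. The struct and the function name parse_form are hypothetical, and the sketch deliberately skips URL decoding (%XX escapes and '+' for spaces), which real form data would need.

```c
#include <string.h>

#define MAX_FIELDS 16 /* hypothetical cap for the example */

struct form_field {
    char *key;
    char *value;
};

/*
 * Splits "k1=v1&k2=v2" in place by overwriting '&' and '=' with '\0'.
 * Returns the number of fields stored. Pairs without '=' are skipped.
 */
size_t parse_form(char *body, struct form_field fields[], size_t max_fields)
{
    size_t count = 0;
    char *pair = body;

    while (pair != NULL && *pair != '\0' && count < max_fields) {
        char *next = strchr(pair, '&');
        char *eq;

        if (next != NULL)
            *next++ = '\0'; /* terminate this pair, move to the next */

        eq = strchr(pair, '=');
        if (eq != NULL) {
            *eq = '\0';
            fields[count].key = pair;
            fields[count].value = eq + 1;
            count++;
        }
        pair = next;
    }
    return count;
}
```

The key and value pointers refer into the original body buffer, so that buffer must stay alive (and unfreed) until the fields have been written out.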

Challenges and Lessons Learned

Memory Management in C is Unforgiving

Coming from higher-level languages, C's manual memory management was a wake-up call. Every malloc() needs a corresponding free(), and forgetting one creates a memory leak. I use Valgrind during development to catch these issues.

For the file content and POST body, I dynamically allocate based on the size:

char *file_content = malloc(size + 1);  // +1 for null terminator
// ... use it ...
free(file_content);  // must not forget!

Buffer Overflows Are Real

With a fixed BUFFER_SIZE of 8192 bytes, I had to be careful about buffer overflows. Using sizeof(buffer) - 1 when receiving data and properly null-terminating strings became second nature. A single unbounded strcpy() into a fixed buffer can become a security vulnerability, and even strncpy() needs care because it does not null-terminate the destination when the source is too long.
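One common way to get a bounded copy that always terminates is snprintf(). The helper below is a sketch under that assumption; copy_bounded is a hypothetical name, not a function from the project.

```c
#include <stdio.h>

/*
 * Copies src into dst without ever writing past dst_size bytes.
 * dst is always null-terminated (when dst_size > 0).
 * Returns 0 if src fit completely, -1 if it was truncated.
 */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    int n = snprintf(dst, dst_size, "%s", src);
    return (n < 0 || (size_t)n >= dst_size) ? -1 : 0;
}
```

Checking the return value matters: silently truncating a path or header value can be its own source of bugs, so a -1 here is usually a reason to reject the request.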

HTTP Parsing Edge Cases

The HTTP specification has many edge cases I didn't initially consider:

What if there's no Content-Length header in a POST request?
What if the request is malformed with no \r\n\r\n delimiter?
What if the Content-Length value isn't a valid integer?

Each of these required adding defensive checks and returning proper 400 Bad Request responses.
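As one illustration of such defensive checks, here is a sketch of a strict Content-Length lookup. The function name get_content_length and the MAX_BODY limit are assumptions made for the example. It treats a missing header, a non-numeric value, and trailing junk all as errors.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h> /* strncasecmp */

#define MAX_BODY 65536L /* hypothetical upper bound on accepted bodies */

/*
 * Scans the null-terminated header block for Content-Length.
 * Returns the value, or -1 if it is missing, invalid, or too large.
 */
long get_content_length(const char *headers)
{
    static const char name[] = "Content-Length:";
    const char *line = headers;

    while (line != NULL && *line != '\0') {
        if (line[0] == '\r' && line[1] == '\n')
            break; /* blank line: end of headers, do not scan the body */

        if (strncasecmp(line, name, sizeof name - 1) == 0) {
            const char *val = line + sizeof name - 1;
            char *end;
            long n;

            while (*val == ' ' || *val == '\t')
                val++;
            if (!isdigit((unsigned char)*val))
                return -1; /* rejects "abc", "-5", and empty values */

            errno = 0;
            n = strtol(val, &end, 10);
            while (*end == ' ' || *end == '\t')
                end++;
            if (errno != 0 || (*end != '\r' && *end != '\0') || n > MAX_BODY)
                return -1; /* overflow, trailing junk, or too large */
            return n;
        }

        line = strstr(line, "\r\n"); /* advance to the next header line */
        if (line != NULL)
            line += 2;
    }
    return -1; /* header not present */
}
```

For a POST request, a -1 result would translate into the 400 Bad Request response mentioned above.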

The Debug Printf Dilemma

You'll notice the code has many printf() statements with "DEBUG:" prefixes. During development, these were invaluable for understanding the flow of data. In a production server, these would be replaced with a proper logging framework, but for learning purposes, seeing the raw data flow in the terminal was incredibly educational.

Concurrency (or Lack Thereof)

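The biggest limitation of my current implementation is that it handles only one connection at a time: a single loop calls accept(), serves that client to completion, and only then accepts the next one. (The BACKLOG of 1 only limits how many pending connections the kernel queues in the meantime.) While one client is being served, others must wait. The traditional solutions are: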

Fork a new process for each connection (the Apache model)
Use threads with pthread
Implement non-blocking I/O with select/poll/epoll (the nginx model)

I plan to explore these approaches as the project evolves.

What's Next: Security Focus

The natural next step is adding TLS/SSL encryption, but I'm taking this further by building a security testing suite:

TLS 1.2/1.3 Implementation: Understanding encryption protocols from the ground up using OpenSSL or mbedTLS.

Security Scanner: A tool to scan web servers for common vulnerabilities (missing headers, outdated protocols, weak ciphers).

HTTP Fuzzer: Automated testing that sends malformed requests to find parsing bugs and potential crashes.

Vulnerability Analysis: Documentation of common web server vulnerabilities and how to prevent them.

The goal is to understand not just how to build a server, but how attackers might exploit one and how to defend against those attacks.

Key Takeaways

Building this HTTP server taught me more about networking in a week than years of using high-level frameworks:

Abstractions hide complexity. Express.js and Flask make web development easy, but they obscure the fundamental protocols. Understanding the foundation makes you a better developer at every level.

C demands precision. Every byte matters. Every pointer must be valid. Every buffer must be sized correctly. This discipline translates to better code in any language.

TCP is a byte stream. It preserves the order of the bytes but not message boundaries, so you can't assume a request arrives all at once.

Security starts at the design level. Every parsing decision, every buffer allocation, every user input is a potential vulnerability if not handled carefully.

Try It Yourself

If you're interested in learning network programming, I highly recommend building your own server. Start simple with a hello-world HTTP response, then gradually add features. Beej's Guide to Network Programming is an excellent resource for learning socket programming in C.
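A hello-world response can be as small as the sketch below. The function names send_all and send_hello are hypothetical. The loop in send_all is there because send(), like recv(), may transfer fewer bytes than requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keeps calling send() until every byte is written. Returns 0 or -1. */
int send_all(int fd, const char *data, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = send(fd, data + sent, len - sent, 0);
        if (n <= 0)
            return -1; /* error or connection closed */
        sent += (size_t)n;
    }
    return 0;
}

/* Sends a complete plain-text response on an accepted client socket. */
int send_hello(int client_fd)
{
    static const char body[] = "Hello, world!";
    char response[256];
    int n = snprintf(response, sizeof response,
                     "HTTP/1.1 200 OK\r\n"
                     "Content-Type: text/plain\r\n"
                     "Content-Length: %zu\r\n"
                     "Connection: close\r\n"
                     "\r\n"
                     "%s", sizeof body - 1, body);

    if (n < 0 || (size_t)n >= sizeof response)
        return -1;
    return send_all(client_fd, response, (size_t)n);
}
```

Calling send_hello() on the descriptor returned by accept(), then closing it, is enough for a browser to render the text.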

The complete source code for this project is available on my GitHub, and I'll be documenting the security additions in future posts.

What low-level project are you working on? I'd love to hear about your experience in the comments below.

This is part of my journey into network security. Follow along as I add TLS encryption and build security testing tools on top of this foundation.

Top comments (2)

Youssef El Idrissi • Oct 11 • Edited

This was fun to read.

You picking a web server as a project is frankly a strategic choice because it's an intersection of multiple IT/CS fields, such as cryptography, OS, low-level programming, and even some networking.

I have a couple of notes to share that you might find a little bit useful.

1- In terms of Content-Length, there is a Header that isn't that famous called "Transfer-Encoding: chunked":

It could be used in an HTTP Request (by Clients) to send an unknown amount of data (aka no Content-Length is specified) - But this is very very rare.
It could be used in an HTTP Response (by Servers) to send an unknown amount of data, same thing as above but this one is used in SSE (Server-Sent Events)
"Transfer-Encoding: chunked" is exactly what can make it possible to send real-time updates without using something like WebSockets, the downside is that it's unidirectional (because HTTP is a Request/Response protocol at the end of the day)

2- for TLS/SSL stuff, if you're that serious about it, I suggest that you Research PKI (Public Key Infrastructure), Digital Signatures, Hashing, Asymmetric and Symmetric encryption (at least just try to understand the concept behind them) and last but not least Digital Certificates (X.509), TLS is a combination of the technologies that I've mentioned, I know it's a lot but if you're taking a security approach in the future then I think it's pretty worth to know all these, they're very foundational. You could use a Note taking tool like "Obsidian" to logically structure the things you learn so that it doesn't become rote learning yk

3- If you're someone that loves to visualize the concepts that you learn or the systems that you build, then I would highly recommend using diagram tools such as Draw.io or excalidraw, I use draw.io myself, here's just an example of a diagram I made about an HTTP Header called "Connection", it has something to do with how TCP Connections are made and closed/remain opened:

It's not too pretty but you get the point, visuals/diagrams are the thing that could make you even more proud of any behind-the-scenes (backend) code that you develop, otherwise It's just text yk

I hope the yapping was not too excessive (I know it was), but yeah, I wish you good luck man

Ethan • Oct 12

Thank you so much for the insightful comment, and I'm glad you enjoyed the post! I'll read more into the HTTP chunked transfer encoding, as well as the TLS topics you mentioned.

Draw.io is such a cool tool! I'll have to use it for when I post about my TLS integration.