Douxx

Posted on Mar 16

Building a Web Server from Scratch (No, Actually)

#webdev #assembly #tutorial #programming

When I say from scratch, I mean it. No frameworks, no node_modules taking 500MB of disk space, no runtime. Just you, and your Linux kernel.

A Bit of Context

Exactly one week ago, I was in my NoSQL class, and got bored, like, really. And what does a sane person do when they're bored? Certainly not learn assembly. But that's what I did.

To be honest, the idea had been running through my mind for some time already. So I told myself it could be more interesting than reading papers about MongoDB, and looked for a guide.

I quickly found this guide by Alex Kuleshov, and started reading. That afternoon, I read about 3 posts instead of listening to my teacher, and then went home.

Since I didn't want to digest more theory that day, I decided to do some practice. You learn more from random segfaults than from pages of theory.

The guide didn't cover exercises + answers, so I decided to use the thing that will probably steal my job in a few years: Claude. Even if it can't (yet) write good assembly code, it can create "good" exercises and correct them. So I spent the evening doing that.

The next day, I continued the course and read the final chapters. After that, I felt like I knew enough things but had clearly not enough practice.

And damn, I was so right.

I decided to create an HTTP client to train. Basically curl, but with no other feature than get-ing pages. It was a horror. Every time I took a step forward, I took three steps back due to code that stopped working, mostly because of those damn CPU registers >:[

Well, after a bit of practice, I got something working:

One day passes, we're now Tuesday, 10AM. My next project was pretty obvious: a web server. What's the point of having a web client without one?

So in the rest of this article, I'll explain how I built NASMServer, the web server written 95% in NASM (Netwide Assembler) assembly that runs douxx.tech.

Quick note: I won't talk assembly in this article. It would require you, the reader, to have knowledge about it, and it isn't needed.

Ok, let's start!

The Basics

This article covers only x86_64 Linux. Any other OS or architecture would have different instructions.

I'll try to avoid talking directly in assembly, but I'll regularly add links to the relevant parts on the GitHub repo. You don't need assembly knowledge, but you might need some about Linux and programming in general.

Two things to keep in mind before we continue:

In Linux, everything is a file (Dev.to article)
We talk to the Linux kernel using System Calls, the bridges between our application and the hardware.

Here's a system call in C:

#include <unistd.h>
#include <sys/syscall.h>

int main() {
    const char *msg = "Hello, world!\n";
    syscall(SYS_write, 1, msg, 14);  // fd=1, buffer, length
}

And here's one in NASM:

_start:
    mov rax, 1    ; syscall number for write
    mov rdi, 1    ; fd = 1 (stdout)
    mov rsi, msg  ; buffer
    mov rdx, len  ; length
    syscall       ; call kernel

System calls will be the only thing we use for I/O, so make sure you're comfortable with them. Here's the full Linux x86_64 syscalls table for reference.

The Logic

Before writing a single line, you need to plan what the program will do and leave the how to your future self. Here's what I planned:

☐ Listen to a port
☐ Wait for requests, and accept them
☐ Read the content
☐ Parse the HTTP request
☐ Read the requested file
☐ Send a HTTP response back, with the file content

Listen To a Port

The first thing we need is something clients can connect to and "talk with us": a TCP Socket. It's, well, a file, and it's basically the way the client says "I'm here, and I want to talk to X application".

[-> program.asm]

Creating the socket alone isn't enough though. It exists, it can do its job, but it isn't accessible to anyone yet. We need to bind it to a port and an interface.

The interface is one of the IP addresses available to the system: 127.0.0.1, 192.168.x.x, etc. To simplify our lives, we'll use 0.0.0.0, "listen on every interface". The port is a value between 1 and 65535, and HTTP usually lives on 80.

We give the kernel the socket file descriptor and the interface + port to bind to. It either returns 0 (done), or a negative error code, usually meaning the port is already in use on the given interface, or we don't have enough permissions (binding to ports below 1024 normally requires root).

Finally, we tell the kernel we're ready to listen with the listen syscall.
[-> program.asm]

To summarize:

Create a socket: socket syscall
Bind it: bind syscall
Start listening: listen syscall

And just like that, we're reachable on 0.0.0.0:80!
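For readers more at home in C than in assembly, here is a rough sketch of the same three steps. This is not NASMServer's code: `listen_on` is a made-up helper, and the backlog value is whatever the caller picks. One difference worth knowing: the libc wrappers return -1 and set `errno`, while the raw syscalls return the negative error code directly.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a listening socket fd, or -1 on error. */
int listen_on(uint16_t port, int backlog)
{
    assert(backlog > 0);

    /* socket: create a TCP (stream) socket over IPv4 */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Describe "0.0.0.0:port"; both fields are in network byte order */
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    /* bind: fails if the port is taken or we lack permissions;
     * listen: tells the kernel to start queueing connections */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling `listen_on(80, 16)` would match the setup described above; passing port 0 instead asks the kernel to pick any free port, which is handy for experiments that shouldn't need root.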

☑ ~~Listen to a port~~

Wait For Requests, And Accept Them

This is where the main loop lives:

```plaintext
[Wait for a request] --> [Accept it] --> [Handle it (explained later)] --+
        ^                                                                |
        +----------------------------------------------------------------+
```

The [`accept`](https://manpages.debian.org/unstable/manpages-dev/accept.2.en.html) syscall handles both waiting (blocking) and accepting in one shot. And guess what it returns? A file!!
[[-> program.asm]](https://github.com/douxxtech/nasmserver/blob/0f7cab0cbe27963e078fb7257371919416c107b9/program.asm#L142-L156)

That file is the private space where we and the client will talk to each other.
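In C terms, the loop could look roughly like this. It's a sketch, not a translation of the assembly: `accept_client` and `serve_forever` are names I made up, and `handle` stands in for everything that happens to a request afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Blocks until a client connects, then returns the per-client fd
 * (or -1 on error). Retries if a signal interrupts the wait. */
int accept_client(int listen_fd)
{
    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd >= 0 || errno != EINTR)
            return client_fd;
    }
}

/* The main loop: wait, accept, handle, repeat.
 * Only returns (-1) if accepting fails. */
int serve_forever(int listen_fd, void (*handle)(int client_fd))
{
    assert(handle != NULL);
    for (;;) {
        int client_fd = accept_client(listen_fd);
        if (client_fd < 0)
            return -1;
        handle(client_fd);   /* read the request, send the response */
        close(client_fd);    /* done with this client's "private space" */
    }
}
```

Passing `NULL` for the address arguments of `accept` means we don't ask who the client is; a server that logs client IPs would pass a `struct sockaddr_storage` there instead.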

☑ ~~Wait for requests, and accept them~~

Read The Client Request

The "private space" file contains the request the client wrote. Reading it is easy: use the [`read`](https://manpages.debian.org/unstable/manpages-dev/read.2.en.html) syscall and dump it into a buffer.

[[-> program.asm]](https://github.com/douxxtech/nasmserver/blob/0f7cab0cbe27963e078fb7257371919416c107b9/program.asm#L222)
[[-> fileutils.asm]](https://github.com/douxxtech/nasmserver/blob/0f7cab0cbe27963e078fb7257371919416c107b9/macros/fileutils.asm#L113-L121)

Then we check if it's a valid HTTP request. If not, we send back a [400 Bad Request](https://developer.mozilla.org/de/docs/Web/HTTP/Reference/Status/400). A very minimal valid request looks like:



```plaintext
GET / HTTP/1.0
\r\n
```

Which breaks down to:

<METHOD> <path> <HTTP_VERSION>
<crlf>

As a static server, we only handle GET, and anything else gets a 405 Method Not Allowed. If the method is valid, we parse the path and append it to the document root (e.g. /var/www/html), which is the directory we'll be serving files from.
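Here's one way the validation and method check could be written in C. Treat it as a sketch under my own assumptions: `parse_request_line` is a hypothetical function, it returns the HTTP status code directly (200 meaning "keep going"), and it only looks at the first line of the request.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parses "<METHOD> <path> <HTTP_VERSION>\r\n" and copies the path
 * into path_out. Returns 200, 400 or 405. */
int parse_request_line(const char *req, char *path_out, size_t path_cap)
{
    assert(req != NULL && path_out != NULL && path_cap > 0);

    /* Only look at the first line, so spaces in later lines don't count */
    const char *eol = strstr(req, "\r\n");
    if (eol == NULL)
        return 400;
    const char *sp1 = memchr(req, ' ', (size_t)(eol - req));
    if (sp1 == NULL || sp1 == req)
        return 400;
    const char *sp2 = memchr(sp1 + 1, ' ', (size_t)(eol - (sp1 + 1)));
    if (sp2 == NULL)
        return 400;

    /* The path must start with '/' and fit in the output buffer */
    size_t path_len = (size_t)(sp2 - (sp1 + 1));
    if (path_len == 0 || sp1[1] != '/' || path_len >= path_cap)
        return 400;

    /* The version must at least look like "HTTP/..." */
    if ((size_t)(eol - (sp2 + 1)) < 5 || memcmp(sp2 + 1, "HTTP/", 5) != 0)
        return 400;

    /* Well-formed request: now reject every method except GET */
    if ((size_t)(sp1 - req) != 3 || memcmp(req, "GET", 3) != 0)
        return 405;

    memcpy(path_out, sp1 + 1, path_len);
    path_out[path_len] = '\0';
    return 200;
}
```

The order matters: a malformed line gets 400 before the method is even considered, so `POST / HTTP/1.0` yields 405 while plain garbage yields 400.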

One important thing: path traversal prevention. In Linux, .. means "the parent directory", so a path like /../../../opt/sensitive/passwords.txt appended to /var/www/html would resolve to /opt/sensitive/passwords.txt. Not great. We simply check for any .. in the path and drop the request with a 403 Forbidden if we find one.
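The check itself is tiny. A minimal C sketch, assuming a hypothetical `build_fs_path` helper that returns 0 on success, 403 when it spots `..`, and -1 when the result doesn't fit the buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Joins doc_root and the request path into out.
 * Returns 0 on success, 403 if the path contains "..",
 * -1 if the joined path does not fit. */
int build_fs_path(const char *doc_root, const char *req_path,
                  char *out, size_t out_cap)
{
    assert(doc_root != NULL && req_path != NULL && out != NULL);

    /* Blunt but safe: refuse any ".." anywhere in the path */
    if (strstr(req_path, "..") != NULL)
        return 403;

    int n = snprintf(out, out_cap, "%s%s", doc_root, req_path);
    if (n < 0 || (size_t)n >= out_cap)
        return -1;
    return 0;
}
```

The trade-off of a plain substring check is that it also refuses harmless names such as `/notes..txt`. For a small static server that's an acceptable price for not having to normalize paths.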

[-> program.asm]
[-> httputils.asm]

☑ ~~Read the content~~
☑ ~~Parse the HTTP request~~

Read The Requested File

We have a safe path, now let's actually get the file. A couple of things to handle first.

If the client requested /, we'd end up with /var/www/html/, figure out it's a directory, and go crazy. So we internally append an index file (e.g. /index.html) to the path (no redirecting the client, I see you bad programs). This works for subdirectories too: /home/ becomes /home/index.html.

"But what about directories that don't end with /?" Fair point, and we'll get there. For now, let's move on.

We use the stat syscall to check if the file exists and what type it is:

Doesn't exist → 404 Not Found
It's a directory → the trailing slash was missing, add it and loop back to the index-appending step
It's a regular file but not readable → 403 Forbidden
Otherwise → continue!
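Put together, the index-appending step and the stat checks might look like this in C. Again a sketch with my own assumptions: `resolve_file` is a made-up name, `index.html` is the assumed index file, any stat failure is treated as "not found", and readability is checked with `access`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define INDEX_FILE "index.html"   /* assumed index file name */

/* Rewrites path in place to the file to serve and fills *st.
 * Returns 200, 403 or 404. */
int resolve_file(char *path, size_t cap, struct stat *st)
{
    assert(path != NULL && cap > 0 && st != NULL);

    /* Two passes at most: one for the path itself, one after
     * adding the missing trailing slash to a directory */
    for (int pass = 0; pass < 2; pass++) {
        size_t len = strlen(path);
        if (len > 0 && path[len - 1] == '/') {
            if (len + strlen(INDEX_FILE) >= cap)
                return 404;             /* overlong: treat as not found */
            strcat(path, INDEX_FILE);
        }
        if (stat(path, st) != 0)
            return 404;                 /* doesn't exist */
        if (S_ISDIR(st->st_mode)) {
            len = strlen(path);
            if (len + 1 >= cap)
                return 404;
            path[len] = '/';            /* directory: add the slash, retry */
            path[len + 1] = '\0';
            continue;
        }
        if (!S_ISREG(st->st_mode) || access(path, R_OK) != 0)
            return 403;                 /* not a readable regular file */
        return 200;
    }
    return 404;                         /* the index itself was a directory */
}
```

Capping the loop at two passes keeps odd setups, like a directory literally named `index.html`, from looping forever.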

[-> program.asm]
[-> fileutils.asm]

☑ ~~Read the requested file~~

Send The Response

All edge cases handled, time to actually send something. We write to the "private space" file, starting with the HTTP header:

HTTP/1.0 200 OK
Server: NASMServer/1.0
Content-Type: text/html
Content-Length: 1442
Connection: close

[file content]

Breaking it down:

HTTP/1.0 200 OK: static string, HTTP version + status code
Server: NASMServer/1.0: not required, but nice to have
Content-Type: text/html: tells the client what it's receiving, must follow Media Types format
Content-Length: 1442: byte count of the body (the file size), grabbed from stat
Connection: close: we won't keep the connection alive after sending
\r\n: blank line separating header from body. HTTP uses CRLF
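Building that header is mostly string formatting. A small C sketch, assuming a hypothetical `build_header` helper where the caller supplies the media type and the file size from stat:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Formats a 200 OK header block into buf.
 * Returns its length in bytes, or -1 if buf is too small. */
int build_header(char *buf, size_t cap,
                 const char *content_type, long long content_length)
{
    assert(buf != NULL && content_type != NULL);

    int n = snprintf(buf, cap,
                     "HTTP/1.0 200 OK\r\n"
                     "Server: NASMServer/1.0\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %lld\r\n"
                     "Connection: close\r\n"
                     "\r\n",              /* blank line ends the header */
                     content_type, content_length);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

Every line ends in an explicit `\r\n`; a bare `\n` would be easy to type by habit but isn't what HTTP specifies.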

We write the header with write, send the file content with sendfile (no manual copying needed), then close up with:

shutdown: tell the client we're done
close: close the connection

Then jump back to waiting. :D
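For the curious, the write + sendfile + shutdown + close sequence could be sketched in C like this. `send_response` and `write_all` are hypothetical helpers, and the error handling is deliberately minimal.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* write() may send less than asked, so loop until everything is out */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sends header + file content, then closes the connection.
 * Returns 0 on success, -1 on any failure. */
int send_response(int client_fd, const char *header, size_t header_len,
                  const char *file_path)
{
    assert(header != NULL && file_path != NULL);

    int ok = 0;
    struct stat st;
    int file_fd = open(file_path, O_RDONLY);
    if (file_fd >= 0) {
        ok = fstat(file_fd, &st) == 0 &&
             write_all(client_fd, header, header_len) == 0;

        /* sendfile copies file -> socket inside the kernel and
         * advances offset by the number of bytes it sent */
        off_t offset = 0;
        while (ok && offset < st.st_size) {
            ssize_t sent = sendfile(client_fd, file_fd, &offset,
                                    (size_t)(st.st_size - offset));
            if (sent <= 0)
                ok = 0;
        }
        close(file_fd);
    }

    shutdown(client_fd, SHUT_WR);   /* tell the client we're done */
    close(client_fd);               /* close the connection */
    return ok ? 0 : -1;
}
```

Like `write`, `sendfile` can stop short of the requested length, which is why both calls sit in a loop.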

[-> program.asm]

☑ ~~Send a HTTP response back, with the file content~~

And just like that, we have a working HTTP 1.0 static file server!!

And Now?

I lied, but not entirely. This works, but it wouldn't survive being spammed. The server handles one request at a time, so a connection arriving while another is being processed waits in the kernel's listen queue, and gets dropped if that queue is full.

The fix is to fork the process on each request: the main process immediately goes back to waiting while the clone handles the request. I won't go into detail here, but the code is there if you want to look!
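To give an idea of the shape without the assembly, here's a hedged C sketch of the fork-per-request pattern. None of these names come from NASMServer: `handle_in_child`, `reap_children` and `demo_handler` are placeholders I invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the real request handling: just greets the client */
void demo_handler(int client_fd)
{
    static const char msg[] = "hi\n";
    if (write(client_fd, msg, sizeof msg - 1) < 0) {
        /* a real handler would react to this; the demo ignores it */
    }
}

/* Forks a child to serve client_fd. The parent gets the child's pid
 * back right away (or -1 if fork failed) and can return to accept. */
pid_t handle_in_child(int listen_fd, int client_fd,
                      void (*handle)(int client_fd))
{
    assert(handle != NULL);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: it never accepts, so drop the listening socket */
        if (listen_fd >= 0)
            close(listen_fd);
        handle(client_fd);
        close(client_fd);
        _exit(0);
    }
    /* Parent (or failed fork): our copy of the client fd isn't needed */
    close(client_fd);
    return pid;
}

/* Collects finished children without blocking, to avoid zombies */
void reap_children(void)
{
    while (waitpid(-1, NULL, WNOHANG) > 0) {
        /* nothing to do, we only want them reaped */
    }
}
```

In the main loop, the parent would call `handle_in_child` right after accepting and `reap_children` every so often. Setting `SIGCHLD` to `SIG_IGN` is another common way on Linux to avoid collecting children by hand.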

Other improvements are possible too, but this post only covers the basics. If you're interested, consider reading, starring, or contributing!
GitHub: douxxtech/nasmserver

The logic explanation ends here, feel free to leave now. Otherwise, let's talk numbers.

How Fast Is It?

Three servers, three environments, same file, no TLS:

NASMServer: fully built in assembly
BusyBox HTTPD: a really small HTTP server
Apache2: one of the most used web servers

Speed measured with cURL:

curl -o /dev/null -s -w "
DNS: %{time_namelookup}s
Connect: %{time_connect}s
TLS: %{time_appconnect}s
Start Transfer: %{time_starttransfer}s
Total: %{time_total}s
\n" http://localhost

Each command was run 10 times, and the results were averaged.

Environments:

localhost: staying on the machine
Windows <> WSL: servers running in Fedora WSL, testing the virtual interface
Local network: fetching over LAN

Results

Server	Localhost	Windows Host	Network	Average
BusyBox HTTPD	0.0004677s	0.0075919s	0.0038408s	0.0039668s
NASMServer	0.0005997s	0.0082924s	0.0076072s	0.0054998s
Apache2	0.0004769s	0.0102861s	0.0062916s	0.0056849s

BusyBox HTTPD wins across the board. NASMServer stays within about 0.13 ms of the other two on localhost, but is the slowest of the three over the network. Apache2 is the slowest on the Windows host by a noticeable margin, which makes sense given its heavier feature set.

All three servers being slower over WSL than over LAN is likely due to WSL's virtual network interface adding overhead that a direct LAN connection doesn't have. Not 100% sure on that though.

The Final Words

I really loved building this project, writing this article, and learning assembly. I'll keep updating the server, so if you have feature ideas, bug reports, etc. feel free to reach out via GitHub issues, the dev.to comments, or mail!

Would I recommend NASMServer in production? For god's sake, NO!
Did I do it? Maybe.
Will I regret it? Surely.

But remember, I started this because I was bored in a NoSQL class.
