Last week, I published an article about building Doolang, a compiled language I made specifically to eliminate API boilerplate. At the end, I dropped a number: 1.34M RPS.
I got the same question in at least a dozen DMs: "Okay but where's the actual proof?"
Fair point. Throwing out a benchmark number in a sentence and moving on is exactly the kind of thing that deserves skepticism. So I set up a proper environment, ran it against real comparisons, and I'm going to show you the full picture, including the parts that didn't look great.
What Doolang is (60-second version)
If you didn't read the first article, here's the short version: Doolang is a compiled, statically typed language I built in Rust with an LLVM backend. You define a data schema, and the compiler generates your REST endpoints, auth, validation, and rate limiting. No garbage collector, no JIT, no interpreted layer, no magic strings. Just a native binary.
DooCloud is the deployment layer I built on top. Schema to live API in one click. But this article is purely about the HTTP layer performance of that compiled binary.
The benchmark setup
Tool: wrk
wrk -t10 -c900 -d30s http://localhost:PORT/endpoint
- 10 threads, 900 concurrent connections, 30-second duration
- Both plain text and JSON response endpoints
- All servers running locally, no network latency
- Each server cold-started fresh before each run
- Results averaged across 3 runs
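The averaging step is mechanical: capture wrk's stdout for each run, pull out the `Requests/sec` line, and take the mean. Here's a minimal Python sketch of that step; the three sample figures are hypothetical stand-ins, not real captures from the repo.

```python
import re

def parse_rps(wrk_output: str) -> float:
    """Extract the Requests/sec figure from wrk's stdout."""
    m = re.search(r"Requests/sec:\s+([\d.]+)", wrk_output)
    if m is None:
        raise ValueError("no Requests/sec line found")
    return float(m.group(1))

# Hypothetical captured outputs from three cold-started runs.
runs = [
    "Requests/sec: 1351002.10",
    "Requests/sec: 1339871.55",
    "Requests/sec: 1344093.40",
]

avg_rps = sum(parse_rps(r) for r in runs) / len(runs)
print(f"average over {len(runs)} runs: {avg_rps:,.0f} RPS")
```

The actual scripts in the doo-benchmark repo do the equivalent in shell; this is just the shape of the computation.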
All scripts are reproducible and publicly available: github.com/nynrathod/doo-benchmark
What I tested against:
- Doolang - native compiled binary, LLVM backend
- Go - net/http standard library, no framework
- Node.js - Fastify (the fastest mainstream Node HTTP framework)
- Python - FastAPI + uvicorn
Plain text results
| Stack | RPS | Avg Latency |
|---|---|---|
| Doolang | 1,344,989 | 0.71ms |
| Go (net/http) | 759,812 | 1.72ms |
| Fastify (Node.js) | 48,457 | 15.84ms |
| FastAPI (Python) | 4,268 | 201.17ms |
JSON results
| Stack | RPS | Avg Latency |
|---|---|---|
| Doolang | 1,294,430 | 0.66ms |
| Go (net/http) | 572,000 | 2.26ms |
| Fastify (Node.js) | 46,291 | 21.81ms |
| FastAPI (Python) | 4,406 | 196.01ms |
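To put the JSON table in ratio terms rather than raw RPS, here's a quick computation using the numbers above (nothing new, just the division written out):

```python
# JSON throughput from the table above, in requests/sec.
json_rps = {
    "Doolang": 1_294_430,
    "Go (net/http)": 572_000,
    "Fastify (Node.js)": 46_291,
    "FastAPI (Python)": 4_406,
}

base = json_rps["Doolang"]
# How many times more requests Doolang served than each stack.
ratios = {stack: base / rps for stack, rps in json_rps.items()}
for stack, ratio in ratios.items():
    print(f"{stack:>17}: {ratio:6.1f}x")
```

So on this workload Doolang does roughly 2.3x Go's throughput, about 28x Fastify's, and nearly 294x FastAPI's.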
What explains this gap
The gap isn't configuration. It's structural.
Doolang's HTTP layer is built on hyper and tokio, Rust's async HTTP library and runtime. There's no garbage collector, no JIT, and no interpreted layer. It isn't 'zero runtime' - tokio is a runtime - but it is a minimal, compiled, zero-GC stack with no framework overhead on top.
Go is fast. Node.js with Fastify is fast for JavaScript. But both run behind a garbage collector, and Node adds a JIT on top. Every managed runtime pays a tax at the OS boundary that a lean compiled stack doesn't.
Go's net/http can be pushed higher with tuning. I've seen 250K+ out of it. The gap would still be around 7x. That ratio doesn't move much because the constraint is architectural, not configurational.
The honest caveats - I'd rather say these than have someone else say them
1. This is a micro-benchmark. Plain text and simple JSON, no database calls, no business logic, no auth middleware processing. In a real application, your bottleneck is almost always your database or I/O, not raw HTTP throughput. If you're waiting 20ms on Postgres, a 10x faster HTTP layer saves you 0.29ms.
2. Local-only. No network hops, no SSL, no load balancer. Production adds latency. The absolute numbers in production will be lower; the ratios remain similar.
3. I built Doolang. I am not a neutral party. I ran everything the same way, same wrk setup, same endpoint behavior, no tuning tricks on any single server. But you should run it yourself. The doo-benchmark repo has everything. If you find a flaw in the methodology, I want to know.
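To make caveat #1 concrete with numbers already in this article: take the JSON avg latencies from the table (Go 2.26ms, Doolang 0.66ms) and put a hypothetical 20ms Postgres call in the request path.

```python
# Caveat #1 in numbers: with a database call dominating the path,
# the HTTP layer's share of end-to-end latency is small.
db_ms = 20.0        # hypothetical Postgres round trip
http_go_ms = 2.26   # Go JSON avg latency, from the table above
http_doo_ms = 0.66  # Doolang JSON avg latency, from the table above

total_go = db_ms + http_go_ms
total_doo = db_ms + http_doo_ms
saving_ms = total_go - total_doo        # absolute saving per request
saving_pct = 100 * saving_ms / total_go # share of end-to-end latency

print(f"end-to-end: {total_go:.2f} ms -> {total_doo:.2f} ms "
      f"({saving_ms:.2f} ms saved, {saving_pct:.1f}%)")
```

A 3x-faster HTTP layer buys about 1.6ms, roughly 7% of the end-to-end time, once the database dominates. That's the honest shape of the trade-off.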
Why this matters beyond the headline number
I'm building DooCloud for early-stage product teams, mostly founders building AI-backed products who need a backend layer that doesn't need a DevOps hire. The performance story is relevant here for a non-obvious reason.
Most early products don't hit performance walls on HTTP throughput. But having headroom means:
- You start on a smaller, cheaper server and stay there longer
- Your API layer never becomes the bottleneck, so your Python AI backend becomes the optimization target (which is correct)
- You scale vertically for longer before needing to add nodes
The numbers aren't a flex. They're a cost reduction argument for the first 6-18 months of a product's life. A 7x faster API layer on half the compute costs real money when you're pre-funded.
What I haven't benchmarked yet
- WebSocket throughput under concurrent connections
- Mixed load (concurrent complex and simple requests)
- Memory footprint under sustained load over hours
These are in progress. I'll post the results when they're done.
Run it yourself
The doo-benchmark repo has wrk scripts for everything above. Clone it, run it on your machine, and tell me what numbers you get.
If my methodology is wrong, put it in the comments. I mean that.
- Doolang compiler (open source): github.com/nynrathod/doolang
- Schema to live API: doocloud.dev
