gudhalarya

Posted on Jun 10

I Built My Own API Gateway in Rust — Here's What I Learned

#backend #rust #apigateway #systemdesign

Every backend project I've worked on eventually hits the same wall.

You start clean — one service, simple routes, everything works. Then slowly the requirements creep in. "We need rate limiting." "Can we add auth middleware?" "What happens when the user service goes down — does it take everything else with it?"

You either bolt these things onto every service individually, copy-paste the same middleware across projects, or pay for a managed gateway like Kong or AWS API Gateway and hope it does what you need.

I wanted to actually understand how these things work under the hood. So I'm building one — and this is what I've learned so far.

What is Ferrox?

Ferrox is a self-hosted, programmable API gateway written entirely in Rust. It sits in front of your backend services and handles everything a production system needs in one place:

Dynamic routing — point any path prefix to any upstream service
Authentication — JWT and API key validation on protected routes
Rate limiting — Redis-backed per-IP and per-API-key limiting
Circuit breaking — stops hammering a dead upstream service
Response caching — Redis-backed TTL cache per route
Real-time observability — WebSocket dashboard with live request stats
Prometheus metrics — plug straight into Grafana

The idea is simple. Instead of this:

Client → Service A (has its own auth, rate limiting, logging)
Client → Service B (has its own auth, rate limiting, logging)
Client → Service C (has its own auth, rate limiting, logging)

You get this:

Client
  |
  v
FERROX  (auth, rate limiting, circuit breaking, logging — once)
  |
  +--------+--------+
  |        |        |
Svc A    Svc B    Svc C
(clean)  (clean)  (clean)

Your services stay clean. Ferrox handles the cross-cutting concerns.

Why Rust?

Honest answer — I already knew Rust from my backend work. But for a gateway specifically, it felt like the obvious choice.

A gateway sits on the critical path of every single request. Every millisecond of latency it adds is latency your users feel. You need predictable performance — not "usually fast with occasional GC pauses". Rust gives you that. No garbage collector, no runtime overhead, memory safety without paying for it at runtime.

Actix-Web, which I used for the HTTP server, has consistently placed near the top of the TechEmpower benchmarks. Not because of Rust magic, but because zero-cost abstractions mean you're not paying for things you don't use.

There's also something deeply satisfying about building systems infrastructure in a systems language. It just fits.

The Interesting Parts

I'm not going to walk through everything. Here are the three things I found most interesting to build.

1. The Circuit Breaker

I'd heard the term a hundred times. Never actually implemented one. Turns out it's elegant.

The circuit has three states:

CLOSED → normal, all requests go through
OPEN   → upstream is broken, reject immediately, don't even try
HALF_OPEN → timeout passed, let one request through to test

In Rust, this becomes a clean enum with a state machine:

#[derive(Debug, Clone, PartialEq)]
pub enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

The Rust type system made transitions feel natural. A match on the enum has to be exhaustive, so the compiler wouldn't let me forget a state, and there's no way to represent a state outside these three. If a variant doesn't carry data, you can't accidentally access data that isn't there. The logic just fell into place.

The rule is: five consecutive failures open the circuit. After 30 seconds it moves to HalfOpen. One success closes it again. Simple. Effective.
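Here is a minimal sketch of that rule as a state machine, using only the standard library. The CircuitBreaker struct, its fields, and its method names are hypothetical and not taken from the Ferrox code. Two details are assumptions of this sketch: a failed probe in HalfOpen reopens the circuit, and the enum also derives Copy for convenience. The current time is passed in so the transitions are easy to test.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical breaker for one upstream; names are illustrative.
pub struct CircuitBreaker {
    state: CircuitState,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
    failure_threshold: u32,
    open_timeout: Duration,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, open_timeout: Duration) -> Self {
        Self {
            state: CircuitState::Closed,
            consecutive_failures: 0,
            opened_at: None,
            failure_threshold,
            open_timeout,
        }
    }

    pub fn state(&self) -> CircuitState {
        self.state
    }

    /// Returns true if a request may be forwarded at time `now`.
    pub fn allow_request(&mut self, now: Instant) -> bool {
        match self.state {
            CircuitState::Closed => true,
            CircuitState::Open => {
                let elapsed = self
                    .opened_at
                    .map(|t| now.duration_since(t))
                    .unwrap_or_default();
                if elapsed >= self.open_timeout {
                    // Timeout passed: let exactly one probe request through.
                    self.state = CircuitState::HalfOpen;
                    true
                } else {
                    false
                }
            }
            // A probe is already in flight; reject until it reports back.
            CircuitState::HalfOpen => false,
        }
    }

    /// Any success closes the circuit and resets the failure count.
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.opened_at = None;
        self.state = CircuitState::Closed;
    }

    /// Opens the circuit on the Nth consecutive failure, or when the
    /// HalfOpen probe fails (an assumption of this sketch).
    pub fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        let probe_failed = self.state == CircuitState::HalfOpen;
        if probe_failed || self.consecutive_failures >= self.failure_threshold {
            self.state = CircuitState::Open;
            self.opened_at = Some(now);
        }
    }
}
```

With CircuitBreaker::new(5, Duration::from_secs(30)), the proxy would call allow_request before forwarding and then record_success or record_failure depending on the upstream response. In a multi-threaded server the breaker would also need to sit behind a lock or use atomics, which this sketch leaves out.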

2. Redis Rate Limiting

I expected this to be complex. It's not. It's almost embarrassingly simple once you see it.

The whole thing comes down to two Redis commands:

INCR  rl:{route_id}:{client_ip}
EXPIRE rl:{route_id}:{client_ip}  60

INCR runs on every request: it creates the key if it doesn't exist, increments it, and returns the new count. EXPIRE sets a 60-second TTL, but it only runs on the first request in a window (when the count hits 1). After 60 seconds the key disappears and the window resets.

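If the counter exceeds your limit, return 429. That's it. The whole rate limiter is about 15 lines of Rust. The insight is that Redis atomic operations do all the hard work for you. One caveat: INCR is atomic on its own, but the INCR and EXPIRE pair is not, so a crash between the two can leave a counter with no TTL. Running both inside a Lua script or a MULTI/EXEC transaction closes that gap.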

What makes it production-ready is that it works across multiple Ferrox instances — since the counter lives in Redis, not in-process memory, horizontal scaling just works.
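The INCR-then-EXPIRE logic can be sketched like this. The CounterStore trait, the is_allowed function, and FakeStore are hypothetical names made up for illustration, not Ferrox's actual code. FakeStore is an in-memory stand-in with a manual clock so the example runs without a Redis server; in a real gateway the trait would be implemented on top of a Redis client.

```rust
use std::collections::HashMap;

/// The two Redis operations the limiter relies on (hypothetical trait).
pub trait CounterStore {
    /// Mirrors INCR: creates the key if missing, increments it,
    /// and returns the new value.
    fn incr(&mut self, key: &str) -> u64;
    /// Mirrors EXPIRE: sets a TTL in seconds on an existing key.
    fn expire(&mut self, key: &str, ttl_secs: u64);
}

/// Fixed-window check: true if the request is within the limit.
pub fn is_allowed<S: CounterStore>(
    store: &mut S,
    route_id: &str,
    client_ip: &str,
    limit: u64,
    window_secs: u64,
) -> bool {
    let key = format!("rl:{route_id}:{client_ip}");
    let count = store.incr(&key);
    if count == 1 {
        // First request in this window: start the TTL.
        store.expire(&key, window_secs);
    }
    count <= limit
}

/// In-memory stand-in for Redis with a manually advanced clock.
#[derive(Default)]
pub struct FakeStore {
    pub now_secs: u64,
    // key -> (count, optional expiry time in seconds)
    entries: HashMap<String, (u64, Option<u64>)>,
}

impl CounterStore for FakeStore {
    fn incr(&mut self, key: &str) -> u64 {
        let now = self.now_secs;
        // Drop the key if its TTL has passed, as Redis would.
        if matches!(self.entries.get(key), Some((_, Some(exp))) if *exp <= now) {
            self.entries.remove(key);
        }
        let entry = self.entries.entry(key.to_string()).or_insert((0, None));
        entry.0 += 1;
        entry.0
    }

    fn expire(&mut self, key: &str, ttl_secs: u64) {
        let expires_at = self.now_secs + ttl_secs;
        if let Some(entry) = self.entries.get_mut(key) {
            entry.1 = Some(expires_at);
        }
    }
}
```

The caller maps a false result to a 429 response. Because the counter is a fixed window, a client can send up to twice the limit across a window boundary; a sliding window avoids that at the cost of more bookkeeping.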

3. Middleware Order

This one isn't glamorous but it's where I made the most mistakes.

The order your middleware runs matters enormously. Get it wrong and you do expensive work for requests that should have been rejected, like verifying a JWT or hitting the database before checking the rate limit. Ferrox runs middleware in this exact order:

1. Request Logger   — assign ID, log start
2. Rate Limiter     — check Redis counter (fail fast, no DB hit)
3. Auth Validator   — JWT / API key check
4. Route Matcher    — find upstream in PostgreSQL
5. Circuit Breaker  — is upstream healthy?
6. Cache Lookup     — return early if cached
7. Request Transform — add/strip headers
8. Proxy Forward    — actually send to upstream
9. Cache Store      — cache response if TTL > 0
10. Metrics + Log   — record everything

Rate limiting before auth is intentional. An unauthenticated attacker hammering your gateway shouldn't even reach your auth logic. Fail as early and as cheaply as possible.
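The fail-fast idea can be sketched as a chain of checks that returns at the first rejection. Everything here is hypothetical and only illustrates the ordering: RequestFacts stands in for the results of the real Redis, JWT, and PostgreSQL lookups, and the trace vector records which stages actually ran. Only the rejecting stages (steps 2 to 5) are modeled.

```rust
/// Why a request was turned away (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Rejection {
    RateLimited,
    Unauthorized,
    NoRoute,
    UpstreamDown,
}

/// Stand-in for the outcome of each real check (hypothetical type).
pub struct RequestFacts {
    pub under_rate_limit: bool,
    pub credentials_valid: bool,
    pub route_found: bool,
    pub circuit_closed: bool,
}

/// Runs the checks in gateway order and stops at the first failure.
/// `trace` collects the names of the stages that were executed.
pub fn run_checks(
    req: &RequestFacts,
    trace: &mut Vec<&'static str>,
) -> Result<(), Rejection> {
    trace.push("rate_limiter");
    if !req.under_rate_limit {
        return Err(Rejection::RateLimited);
    }
    trace.push("auth");
    if !req.credentials_valid {
        return Err(Rejection::Unauthorized);
    }
    trace.push("route_matcher");
    if !req.route_found {
        return Err(Rejection::NoRoute);
    }
    trace.push("circuit_breaker");
    if !req.circuit_closed {
        return Err(Rejection::UpstreamDown);
    }
    Ok(())
}
```

A rate-limited request with bad credentials leaves only "rate_limiter" in the trace: the auth stage never runs, which is the whole point of the ordering.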

The Stack

For anyone curious:

Rust — core language
Actix-Web — HTTP server and middleware
reqwest — HTTP client for forwarding requests upstream
SQLx + PostgreSQL — route config, users, API keys, request logs
Redis (deadpool-redis) — rate limiting, response caching
thiserror — clean custom error types
tracing — structured logging
Prometheus crate — metrics endpoint
Docker + Nginx — production deployment on AWS EC2

What I'm Learning Along the Way

A few honest takeaways after building this:

The hard part wasn't the code. It was design decisions. What order should middleware run? Should circuit breaker state live in Redis or in-memory? When do you return a cached response vs always hitting upstream? These questions have real tradeoffs and no single right answer.

thiserror is underrated. I was writing verbose manual error implementations before someone pointed me to it. Now my entire error module is about 20 lines and the derive macro generates the rest. If you're writing Rust and not using thiserror, stop what you're doing.
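For a sense of what gets replaced, here is roughly the manual boilerplate for a small error enum, using only the standard library. The GatewayError type, its variants, and its messages are hypothetical, not Ferrox's real error module. The comments show the thiserror attribute that would stand in for each match arm once the enum derives thiserror::Error.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical gateway error type, written out by hand.
#[derive(Debug)]
pub enum GatewayError {
    RateLimited { retry_after_secs: u64 },
    Unauthorized,
    UpstreamUnavailable(String),
}

impl fmt::Display for GatewayError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // thiserror: #[error("rate limit exceeded, retry in {retry_after_secs}s")]
            GatewayError::RateLimited { retry_after_secs } => {
                write!(f, "rate limit exceeded, retry in {retry_after_secs}s")
            }
            // thiserror: #[error("missing or invalid credentials")]
            GatewayError::Unauthorized => {
                write!(f, "missing or invalid credentials")
            }
            // thiserror: #[error("upstream {0} is unavailable")]
            GatewayError::UpstreamUnavailable(name) => {
                write!(f, "upstream {name} is unavailable")
            }
        }
    }
}

// With thiserror, this impl and the Display impl above are generated.
impl Error for GatewayError {}
```

Three variants are manageable by hand, but every new variant means another match arm to keep in sync, which is where the derive starts paying for itself.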

Structured logging from day one. I added tracing and tracing-actix-web early. Every request automatically gets a request ID, timing, status code. When something breaks — and something always breaks — having logs that actually tell you what happened is the difference between a 5-minute fix and a 2-hour debug session.

Building something real teaches you more than tutorials. I've read about circuit breakers, rate limiting, and API gateways many times. I understood maybe 60% of it. Building them has taken that to 95%. There's no substitute.

Current State & What's Next

Ferrox is a work in progress. The core proxy, JWT auth, Redis rate limiting, circuit breaking, and request logging are all functional and running locally, and I'm currently working on getting it deployed on AWS EC2.

Still building:

WebSocket real-time dashboard (in progress)
gRPC admin API for managing routes without restarting
Full frontend dashboard in React
Load testing with k6 to get real benchmark numbers

Follow Along / Contribute

Ferrox is fully open source and actively being built.

GitHub: github.com/gudhalchauhan/ferrox

If you've built something similar, I'd genuinely love to hear how you approached the circuit breaker state or rate limiting. And if you find something broken or dumb in the code — open an issue, this is very much a learning project.

Thanks for reading. If this was useful, share it with someone who's building backend systems in Rust.
