
speed engineer

Posted on • Originally published at Medium

The $5 Staging Environment: How Poverty Engineering Saved Our Architecture

We purposely strangled our resources to see what would break. Everything did, and it was glorious.



Sometimes the best way to test your system’s resilience is to take away its toys.

There is a distinct luxury in ignoring efficiency. When you’re deploying to a managed Kubernetes cluster with autoscaling node pools, “performance optimization” usually just means “throwing more credit card debt at the problem.” If a service is slow, we scale it horizontally. If it runs out of memory, we bump the request limits. It’s easy. It’s safe. And honestly? It made us lazy.

I realized this while looking at our cloud bill. It wasn’t astronomical, but it felt… heavy. We had twelve microservices, a message broker, a distributed cache, and a log aggregator, all for an application that, at peak, served maybe 50 requests per second.

So, I had a dumb idea.

“What if,” I asked my lead dev, “we tried to run the entire staging stack on a single $5 Droplet?”

He laughed. Then he realized I wasn’t joking. Then he looked terrified.

We weren’t trying to save money on staging; that costs peanuts anyway. We wanted to see what would happen if we took our “scalable” architecture and put it in a straitjacket. We wanted to induce failure.

Constraints force clarity.

The Setup

For those uninitiated in the world of budget hosting, the “$5 box” is the standard unit of “I want to host a blog but I want to make it difficult.” It typically gets you:

  • 1 vCPU (shared, likely stolen from you by a crypto-miner next door)
  • 1 GB RAM
  • 25 GB SSD
  • 1000 GB transfer

Our stack, by contrast, was a “modern” Go-based microservices architecture. We had a User Service, an Inventory Service, a Notification Service (why?), a frontend in React served by Nginx, Postgres for data, and Redis for caching.

We wrote a Docker Compose file to spin it all up. I SSH’d into the tiny box, pulled the repo, ran docker-compose up -d, and waited.
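For the curious, the Compose file was nothing exotic. A heavily abridged sketch (service names and images here are illustrative, not our real config):

```yaml
version: "3.8"
services:
  gateway:
    build: ./gateway
    ports: ["80:8080"]
    depends_on: [user-service, redis]
  user-service:
    build: ./user-service
    depends_on: [postgres]
  postgres:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: staging
  redis:
    image: redis:7
```

Multiply the app services by twelve, add the message broker and the log shipper sidecars, and you have our stack.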

The Crash

It didn’t even boot.

I’m serious. The command hung for forty seconds, the SSH session lagged so hard my keystrokes were arriving via carrier pigeon, and then the connection dropped.

When I finally got back in, dmesg told the sad story:

[ 124.562112] Out of memory: Kill process 1422 (postgres) score 301 or sacrifice child  
[ 124.563441] Killed process 1422 (postgres) total-vm:204800kB, anon-rss:184320kB...

The Linux OOM (Out of Memory) killer had woken up, seen the chaotic gluttony of our container stack, and decided Postgres was the first to go.

I thought the database was the bottleneck — then I saw what the profiler was actually telling me. It wasn’t Postgres eating the RAM. It was our “observability sidecar.” We had a Java-based log shipper attached to every service that consumed 250MB of RAM just to say “hello.” On a 64GB node, you don’t notice that. On a 1GB node, that’s 25% of your capacity gone before you handle a single request.

Optimization 1: The Bloat Audit

The first lesson was immediate: Dependency bloat is invisible until you can’t afford it.

We stripped the sidecars. We configured the Go services to log directly to stdout (which Docker captures anyway) and decided to just use grep for the duration of the experiment.

But even after we got the stack to boot, it was crawling.

I’m talking 2-second latency for a simple “Get User Profile” request.

The Network is Not Free (Even on Localhost)

Here is where I had to backtrack on a fundamental belief. I always assumed that if services talk to each other over localhost (or a Docker bridge network), the latency is negligible.

Technically, the network latency is low. But the serialization cost is not.

Our architecture looked like this:

Frontend → Gateway → User Service → DB

For a single request, the Gateway would accept JSON, parse it, create a new request to the User Service, serialize that to JSON, send it. The User Service would parse that JSON, talk to the DB, serialize the result to JSON, send it back. The Gateway would parse that JSON… you get the point.

On a powerful CPU, this JSON shuffling is just background noise. On a single vCPU that is also trying to run Postgres and Redis? It’s a disaster. The CPU spent more time marshalling and unmarshalling JSON than it did executing business logic.

We were burning 40% of our CPU just translating English to English.

Optimization 2: The Logical Monolith

We couldn’t rewrite the whole app. But we could cheat.

We realized that Inventory and Notification didn’t need to be separate binaries. They were separate domains, sure, but they didn’t need to be separate processes.

We used Go’s interface system to create a “Service Weaver” pattern (conceptually similar to Google’s Service Weaver, but dumber).

Here is the trick. We defined our service boundaries as Interfaces, not HTTP endpoints.

// The interface defines the contract
type InventoryProvider interface {
    CheckStock(ctx context.Context, itemID string) (int, error)
}

// Implementation 1: The "Real" Remote Client (for Prod)
type RemoteInventory struct {
    client *http.Client
    url    string
}

func (r *RemoteInventory) CheckStock(ctx context.Context, id string) (int, error) {
    // ... expensive HTTP call ...
}

// Implementation 2: The In-Memory Local Version (for the $5 box... and maybe Prod?)
type LocalInventory struct {
    db *sql.DB
}

func (l *LocalInventory) CheckStock(ctx context.Context, id string) (int, error) {
    // ... direct DB call, zero network overhead ...
    return l.queryStock(id)
}

By injecting LocalInventory into our main API binary, we eliminated three microservices. We compiled them into one binary.

The result? Latency dropped from 2s to 200ms. We deleted the network overhead.

Why does this matter? Because in production, even if we split them up, we now have the option not to. We realized we had microservices not because we had scaling problems, but because we had organizational boundaries.

The Noisy Neighbor and the I/O Trap

Everything was running smoother, and then, randomly, the API would stall for 10 seconds. No logs, no errors, just a freeze.

I spent hours debugging application locks. I looked for mutex deadlocks in the Go code. Nothing.

Finally, I installed iotop.

When the stall happened, the disk write speed plummeted and the I/O “Wait” percentage skyrocketed.

We were on a shared SSD. Someone else on that physical host (probably mining crypto or serving terabytes of illicit content) was hammering the disk. Our little Postgres transaction log (WAL) couldn’t flush to disk because the I/O queue was full.

Gotcha: In the cloud, “SSD” doesn’t mean “Fast.” It means “Not Spinning Rust.”

We couldn’t fix the neighbor. But we could fix how we reacted to it. Our application code had a default SQL timeout of… well, infinite. It would just sit there waiting for the disk.

We introduced aggressive timeouts and a circuit breaker. If the DB writes were taking > 500ms, we failed fast and returned a 503 rather than holding the connection open and piling up goroutines until the web server crashed.

// Failing fast is better than hanging forever  
ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)  
defer cancel()  

err := db.QueryRowContext(ctx, "INSERT INTO logs ...").Scan(&id)  
if err != nil {  
    // Log it, drop it, move on. Don't kill the request for a log entry.  
    log.Warn("Disk is sad, skipping audit log")  
    return nil   
}

This saved us. When the noisy neighbor flared up, we dropped non-essential writes (like audit logs) and kept the read-path alive.

The 25GB Ceiling

Two days into the experiment, the server died again.

No space left on device.

We had filled 25GB. How? Docker logs.

We were logging everything. Every HTTP header, every payload, every SQL query in debug mode. In production, we piped this to an external aggregator (Datadog/Splunk/ELK) and forgot about it. On the $5 box, it wrote to the local JSON file driver.

We were logging about 1GB per hour.
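Worth noting: Docker’s json-file driver can also cap log growth on its own. A Compose snippet like this (sizes are illustrative) would have bounded the damage regardless of what the application did:

```yaml
services:
  user-service:
    build: ./user-service
    logging:
      driver: json-file
      options:
        max-size: "50m"   # rotate each log file at 50 MB
        max-file: "3"     # keep at most 3 rotated files
```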

We implemented a sampling middleware. This is something I honestly wouldn’t do in prod without careful thought, but for high-volume endpoints, it’s a lifesaver.

func LoggingMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Skip logging for 99% of health checks; we only sample 1% of them.
        if r.URL.Path == "/health" && rand.Float32() < 0.99 {
            next.ServeHTTP(w, r)
            return
        }

        // Everything else (including all errors) is logged every time.
        // ... standard logging logic, then next.ServeHTTP(w, r) ...
    })
}

This forced us to ask: “Do we actually read these logs?” The answer was usually no. We turned off debug logging for the database driver and reduced the HTTP logs to errors and sampled successes. The disk usage stabilized at 4GB.

The graph of a system learning to live within its means.

What We Kept

We eventually moved Staging back to a slightly larger instance (a luxurious $10/month box with 2GB RAM), but the changes we made to survive on the $5 box made it to Production.

  • The Monolith-ish Approach: We merged the Notification and Inventory services back into the main API. It simplified deployment, testing, and debugging. We realized we didn’t need independent scaling for those components yet.
  • Resource Limits: We set hard memory limits on our Kubernetes pods based on the $5 box data. We realized our containers didn’t need 512MB; they ran fine on 64MB if configured correctly. This saved us about 30% on our cluster bill.
  • Timeout Discipline: The aggressive timeouts we added to handle the “noisy neighbor” disk latency made our production system incredibly resilient to minor network blips.
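The second bullet, in manifest form: a sketch of the kind of pod resource stanza we ended up with (the numbers are what worked for our services after measuring on the small box; yours will differ):

```yaml
# Fragment of a Deployment's container spec
resources:
  requests:
    memory: "64Mi"   # what the service actually needs, measured, not guessed
    cpu: "50m"
  limits:
    memory: "128Mi"  # headroom before the OOM killer steps in
    cpu: "250m"
```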

Why You Should Do This

You don’t need to run your production bank software on a Raspberry Pi. That’s irresponsible. But you should try to run it there once.

Modern cloud architecture is often a cover-up for inefficient software. We layer caching, autoscaling, and load balancers to hide the fact that our core application is wasteful.

When you remove the infinite resources, you see the cracks. You see the chatty network calls. You see the heavy dependencies. You see the sloppy I/O.

The $5 server didn’t just teach us about our architecture. It taught us humility. It reminded us that at the bottom of all the YAML files and cloud dashboards, there is just a computer, executing instructions, trying its best not to crash.

And sometimes, the kindest thing you can do for that computer is to write better code.


Enjoyed the read? Let’s stay connected!

  • 🚀 Follow The Speed Engineer for more Rust, Go and high-performance engineering stories.
  • 💡 Like this article? Follow for daily speed-engineering benchmarks and tactics.
  • ⚡ Stay ahead in Rust and Go — follow for a fresh article every morning & night.

Your support means the world and helps me create more content you’ll love. ❤️
