I launched my SaaS, Crate, just 72 hours ago. It’s an API Gateway designed to protect backends from traffic spikes and anomalies.
Last night, I became my own first victim.
I was working late on a new feature and needed to verify a config change. I pulled up my admin dashboard, not realizing I was on the production tab and that my auth token had expired hours ago.
What happened next was a classic "Frontend vs. Backend" failure that I didn't see coming.
The "Retry Storm"
I assumed my frontend was smart. I assumed that if it hit a 401 Unauthorized error, it would gracefully redirect me to the login page.
I was wrong.
Instead, a race condition in my data-fetching logic caused the dashboard to attempt to "refresh" the view immediately upon failure. Because the token was still invalid, the refresh failed again. And again.
In less than 30 seconds, my laptop rapid-fired requests to my production API, all returning 401.
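For illustration, the failure mode looked roughly like this. This is a minimal sketch, not my actual dashboard code, and `loadDashboard`, `getToken`, and `render` are hypothetical stand-ins: an error handler that immediately re-triggers the same fetch, with no status check, no backoff, and no retry cap.

```typescript
// Illustrative sketch of the retry-storm anti-pattern (hypothetical names).
const getToken = (): string => localStorage.getItem("auth_token") ?? "";
const render = (data: unknown): void => console.log("rendered", data);

async function loadDashboard(): Promise<void> {
  try {
    const res = await fetch("/api/v1/dashboard", {
      headers: { Authorization: `Bearer ${getToken()}` },
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    render(await res.json());
  } catch {
    // Bug: no check for 401, no backoff, no retry cap.
    // With an expired token, this loops as fast as the network allows.
    void loadDashboard();
  }
}
```

Every iteration fails with a 401 the server still has to reject, which is exactly the noise the gateway ended up seeing.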
If this had been a user, they would have just seen a spinning loading wheel. But because I was dogfooding my own tool, I got to see the consequences on the backend.
The Watchdog Barks
I built Crate with Go and Redis specifically to catch this kind of noise in near real time. I set up a "General 400" rule to track client-side errors, assuming it would mostly catch bad form submissions or broken links.
I didn't expect it to catch me.
Less than a minute after opening my dashboard, my phone buzzed.
Subject: Alert: High Volume of 4xx Errors
Trigger: > 10 errors in 1 minute
The alerting engine had detected the anomaly—a 1200% spike in error rates from a single IP—and fired the notification.
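Under the hood, the rule is just a fixed-window counter with a threshold. The real engine is Go and Redis; what follows is a minimal TypeScript/ioredis sketch of the same idea, where the key scheme and the `record4xx` helper are assumptions for illustration, not Crate's actual code.

```typescript
import Redis from "ioredis";

const redis = new Redis(); // assumes a reachable Redis instance

const WINDOW_SECONDS = 60;
const THRESHOLD = 10; // "> 10 errors in 1 minute"

// Count a 4xx response for this client and report whether the current
// window has just crossed the threshold (so the alert fires once per window).
async function record4xx(clientIp: string): Promise<boolean> {
  const windowId = Math.floor(Date.now() / 1000 / WINDOW_SECONDS);
  const key = `errors:4xx:${clientIp}:${windowId}`; // hypothetical key scheme

  const count = await redis.incr(key);
  if (count === 1) {
    // First error in this window: expire the key once the window is long past.
    await redis.expire(key, WINDOW_SECONDS * 2);
  }
  return count === THRESHOLD + 1;
}

// Somewhere in the gateway's response path (sketch):
// if (status >= 400 && status < 500 && (await record4xx(clientIp))) {
//   await notify("High Volume of 4xx Errors"); // hypothetical notifier
// }
```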
Why This Matters (The Dogfooding Argument)
If I hadn't been using my own tool to build my own tool, two things would have happened:
The bug would have shipped: I would have deployed that frontend logic to customers. A user with an expired token would have silently hammered my API in the background, wasting their bandwidth and my server resources.
I wouldn't trust the alerting: Seeing that email hit my inbox instantly gave me more confidence in the system than 100 unit tests ever could. Even more so when I saw the resolution email two minutes later, letting me know the spike was over.
The Fix
I patched the frontend before breakfast. I added a strict check in the API interceptor to halt all outgoing requests immediately upon receiving a 401 and force a hard redirect to /login.
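Here's a sketch of what that patch looks like, assuming an axios-style client (the exact HTTP client and names are assumptions): the first 401 trips a flag, the request interceptor short-circuits everything after it, and the browser gets a hard redirect to /login.

```typescript
import axios from "axios";

// Hypothetical client setup; the same pattern works with any fetch wrapper.
const api = axios.create({ baseURL: "/api/v1" });

let sessionExpired = false;

// Outgoing: once the session is known to be dead, refuse to send anything.
api.interceptors.request.use((config) => {
  if (sessionExpired) {
    throw new Error("Session expired; redirecting to login");
  }
  return config;
});

// Incoming: the first 401 trips the flag and forces a hard redirect.
api.interceptors.response.use(
  (response) => response,
  (error) => {
    if (error.response?.status === 401 && !sessionExpired) {
      sessionExpired = true;
      window.location.assign("/login"); // hard redirect, no SPA retry loop
    }
    return Promise.reject(error);
  }
);
```

The hard redirect, rather than a client-side route change, also throws away any in-flight state tied to the dead session, so nothing is left running that could retry.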
The graph went flat. The alerts stopped.
The Takeaway
We talk a lot about "Chaos Engineering" and complex testing frameworks. But sometimes, the best test is just being tired, having an expired token, and trying to use your own product on a Monday night.
If you aren't dogfooding your own critical path, you aren't seeing the sharp edges.
I'm building Crate in public. If you want to stop your own "Retry Storms" from running up your serverless bill, check it out at Crate.cc.