VivekLumbhani

"I Accidentally DDoS'd My Own Database (And My Boss's Reaction Was... Unexpected)"

The Slack Message That Made My Heart Stop
Thursday, 2:47 PM.

I'm happily coding, headphones on, in the zone. Writing beautiful,
elegant queries. Feeling like a 10x engineer.

Then Slack lights up:

@vivek why is the production database at 100% CPU?

Then another:

@vivek the website is down

Then the one that made me want to crawl under my desk:

@vivek we're getting alerts from AWS. Database bill is at $400
for the day. Normal is $20.

I pulled up the monitoring dashboard.

CPU: 100%
Memory: 97%
IOPS: Maxed out
Active connections: 2,847

Normal active connections: ~50.

Oh no.
Oh no no no no no.

I knew exactly what I'd done.

The "Clever" Code That Broke Everything
Two hours earlier, I had deployed what I thought was an improvement.
A "smart" feature to keep our dashboard data fresh.

Here's what I wrote:

// dashboard.js - Frontend React component
useEffect(() => {
  const fetchData = async () => {
    const devices = await getDevices();

    // Fetch latest reading for EACH device
    const readings = await Promise.all(
      devices.map(device =>
        fetch(`/api/readings/${device.id}`).then(res => res.json())
      )
    );

    setDashboardData(readings);
  };

  // Update every 5 seconds to keep data "fresh"
  const interval = setInterval(fetchData, 5000);

  return () => clearInterval(interval);
}, []);

Looks fine, right?

Here's what I didn't think about:

  1. We had 500 devices
  2. Each dashboard refresh = 501 API calls (1 for devices + 500 for readings)
  3. 20 users had dashboards open
  4. Every 5 seconds
  5. That's 501 × 20 = 10,020 requests every 5 seconds
  6. Or 2,004 requests per second
  7. To a database that was happy with ~10 queries per second

I had essentially written a distributed denial-of-service attack
against my own database.

But with the best intentions! 🤦‍♂️

The Panic (A Timeline)
2:47 PM - First alert
I see the Slack messages. Instant dread.

2:48 PM - Confirm it's my code
Check deployment logs. My code went live 2 hours ago.
Check monitoring. CPU spiked exactly when my deployment went live.
It's definitely me.

2:49 PM - Try to think of excuses
Maybe it's a coincidence?
Maybe someone else deployed something?
Maybe there's a sudden traffic spike?

2:50 PM - Accept responsibility
Nope, it's me. I broke production. On a Thursday afternoon.

2:51 PM - Emergency Slack
Me: "I think I know what happened. Rolling back now."
Boss: "How bad is it?"
Me: "... bad"

2:52 PM - Rollback
Git revert. Deploy. Wait.

2:55 PM - Still broken
Wait, why is it still at 100%?
Oh. Right. 20 users still have the OLD version running in their
browsers.

2:56 PM - More panic
Me: "Everyone needs to refresh their dashboards NOW"
Post in company Slack: "URGENT: Please refresh all dashboards
immediately"

2:58 PM - Slowly recovering
CPU drops to 80%... 60%... 40%... 20%... normal.

3:03 PM - Crisis over
Database back to normal. Website responding.
Heart rate still at 180 BPM.

3:05 PM - The meeting
Boss: "My office. Now."

This is it. I'm getting fired. First job out of university,
lasted 4 months.

The Boss's Reaction (Not What I Expected)
I walked into his office ready to hand over my laptop.

Boss: "So, you took down production."

Me: "Yes. I'm really sorry. I didn't think about—"

Boss: "How many queries were you making?"

Me: "About... 2,000 per second."

Boss: whistles "That's impressive, actually. Did you know our
database could even handle that many?"

Me: "... No?"

Boss: "Neither did I. Interesting stress test."

Long pause

Me: "So... am I fired?"

Boss: laughs "Fired? No. But you're going to write a postmortem.
And you're going to present it to the entire engineering team.
And you're going to make sure this never happens again."

Me: "I can do that."

Boss: "Good. Also, you're going to redesign the dashboard data
fetching. We can't have 500 individual API calls. That's insane."

Me: "Agreed."

Boss: "One more thing."

Me: bracing for impact

Boss: "Welcome to engineering. Everyone breaks production eventually.
Some people just do it more spectacularly than others. Your AWS
bill is going in the company newsletter."

He was smiling.

I walked out confused but relieved. I still had a job.

What I Did Wrong (A Technical Breakdown)
Let me break down all the mistakes, because there were MANY:

Mistake #1: N+1 Query Pattern

// BAD: N+1 queries
devices.forEach(device => {
  fetch(`/api/readings/${device.id}`); // Separate query for each!
});

// GOOD: Single query
fetch(`/api/readings?deviceIds=${deviceIds.join(',')}`);

Lesson: Never make individual requests for related data.
Batch them.
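
Of course, the batch endpoint has to exist on the backend too. Here's a rough sketch of what the single-query handler could look like; the Express route and Postgres schema are my assumptions for illustration, not our real code:

// Hypothetical handler backing /api/readings?deviceIds=1,2,3
app.get('/api/readings', async (req, res) => {
  const deviceIds = req.query.deviceIds.split(',');

  // One round trip instead of N: latest reading for every requested device
  const { rows } = await db.query(
    `SELECT DISTINCT ON (device_id) *
       FROM readings
      WHERE device_id = ANY($1)
      ORDER BY device_id, created_at DESC`,
    [deviceIds]
  );

  res.json(rows);
});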

Mistake #2: No Rate Limiting

// BAD: Unlimited requests
setInterval(fetchData, 5000);

// GOOD: Rate limiting + debouncing
const fetchWithRateLimit = useRateLimit(fetchData, {
  maxRequests: 10,
  perSeconds: 1
});
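
For the record, useRateLimit isn't a built-in React hook; it's the helper I wished I had. A minimal sketch of how such a hook could work, assuming a simple sliding-window counter is enough:

import { useRef, useCallback } from 'react';

// Hypothetical hook: skips calls once the window's budget is spent
function useRateLimit(fn, { maxRequests, perSeconds }) {
  const timestamps = useRef([]);

  return useCallback((...args) => {
    const now = Date.now();
    const windowStart = now - perSeconds * 1000;

    // Keep only calls that happened inside the current window
    timestamps.current = timestamps.current.filter(t => t > windowStart);

    if (timestamps.current.length >= maxRequests) {
      return; // Over budget: drop this call
    }

    timestamps.current.push(now);
    return fn(...args);
  }, [fn, maxRequests, perSeconds]);
}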

Mistake #3: Aggressive Polling

Why 5 seconds? I don't know. It felt right.
Spoiler: It was not right.

// BAD: Constant polling
setInterval(fetchData, 5000);

// GOOD: Smart polling based on activity
const interval = userActive ? 30000 : 120000;
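
One cheap way to approximate "activity" (my example here, not what we actually shipped) is the Page Visibility API: poll at a modest rate while the tab is visible, and slow right down when it isn't:

// Sketch: 30s polling while the tab is visible, 2 minutes otherwise
function startPolling(fetchData) {
  let timerId;

  const schedule = () => {
    const delay = document.visibilityState === 'visible' ? 30000 : 120000;
    timerId = setTimeout(async () => {
      await fetchData();
      schedule(); // Re-evaluate the delay after every fetch
    }, delay);
  };

  schedule();
  return () => clearTimeout(timerId); // Call this on unmount
}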

Mistake #4: No Request Deduplication

If 20 users want the same data, why make 20 separate database
queries?

// BAD: Every user gets their own query
const data = await fetchFromDB(deviceId);

// GOOD: Cache and share
const data = await cachedFetch(deviceId, { ttl: 10000 });
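
cachedFetch is made up too. A bare-bones version is an in-memory TTL cache that also reuses in-flight requests, so concurrent callers share one fetch. (This only dedupes within a single browser; actually sharing across 20 users would need something server-side, like a Redis or in-process cache on the API.)

const cache = new Map(); // deviceId -> { promise, expiresAt }

async function cachedFetch(deviceId, { ttl }) {
  const now = Date.now();
  const hit = cache.get(deviceId);

  // Reuse a fresh (or still in-flight) result instead of hitting the API again
  if (hit && hit.expiresAt > now) {
    return hit.promise;
  }

  const promise = fetch(`/api/readings/${deviceId}`).then(res => res.json());
  cache.set(deviceId, { promise, expiresAt: now + ttl });
  return promise;
}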

Mistake #5: No Error Handling

When the database started failing, my code just kept retrying.
And retrying. And retrying.

// BAD: Retry forever
while (true) {
  try {
    await fetch(url);
    break; // only stops once a request succeeds
  } catch {
    // Try again immediately!
  }
}

// GOOD: Exponential backoff
await fetchWithBackoff(url, {
  maxRetries: 3,
  backoff: 'exponential'
});
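
fetchWithBackoff is another helper I'm imagining rather than a real library call. A minimal version, assuming the delay simply doubles on every attempt:

async function fetchWithBackoff(url, { maxRetries = 3 } = {}) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const res = await fetch(url);
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return await res.json();
    } catch (err) {
      if (attempt === maxRetries) throw err;
      // Wait 1s, 2s, 4s, ... before the next attempt
      await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** attempt));
    }
  }
}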

Mistake #6: No Monitoring/Alerts

I had no idea my code was causing problems until someone told me.

Should have had (a rough sketch follows this list):

  • Request rate monitoring
  • Database query metrics
  • Cost anomaly alerts
  • Performance budgets
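
Even something as small as request-rate logging in the API layer would have surfaced this within minutes. A crude sketch, assuming an Express backend (which may not match your stack):

// Hypothetical Express middleware: warn when requests/second cross a threshold
let requestsThisSecond = 0;

setInterval(() => {
  if (requestsThisSecond > 100) {
    console.warn(`High request rate: ${requestsThisSecond} req/s`);
    // In real life, push this to your alerting system instead of the console
  }
  requestsThisSecond = 0;
}, 1000);

function requestRateMonitor(req, res, next) {
  requestsThisSecond++;
  next();
}

// app.use(requestRateMonitor);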

Mistake #7: No Load Testing

I tested with 1 device. Works fine!
Deployed to 500 devices. Narrator: It did not work fine.

Should have (a load-test sketch follows this list):

  • Load tested with realistic data
  • Simulated multiple concurrent users
  • Monitored resource usage during testing
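
The fix for us was basically "simulate the dashboard before shipping it." Here's what that could look like with k6; the tool choice and URL are mine for illustration, not something we had at the time:

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 20,          // 20 concurrent "users" with dashboards open
  duration: '2m',
};

export default function () {
  // Hit the same endpoint the dashboard hits, at the same 5-second cadence
  http.get('https://staging.example.com/api/devices');
  sleep(5);
}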

The Postmortem Presentation
As promised (threatened?), I had to present this to the entire
engineering team.

I made a slide titled: "How I DDoS'd Production: A Love Story"

The team loved it. Especially the part about the $400 AWS bill.

Someone made it into a meme. It's still on our Slack.

But the best part? Three other developers privately messaged me:

"I did something similar last year"
"I once took down production with an infinite loop"
"My first week, I dropped the production database"

Turns out, breaking production is a rite of passage.

Who knew?

What I Actually Learned

  1. Everyone breaks production. It's how you respond that matters.

My boss didn't fire me because:

  • I owned the mistake immediately
  • I fixed it quickly
  • I learned from it
  • I documented it for others

Hiding mistakes or blaming others? That'll get you fired.

  2. Load testing isn't optional

Test with:

  • Realistic data volumes
  • Multiple concurrent users
  • Network issues and delays
  • What happens when things fail

"It works on my machine" is not a deployment strategy.

  3. The N+1 query problem is EVERYWHERE

Before:
for (const item of items) {
  database.fetch(item.id); // N queries
}

After:
database.fetch(items.map(i => i.id)); // 1 query

This pattern shows up constantly. Learn to recognize it.

  4. Caching is your friend
  • Cache expensive operations
  • Share data between users when possible
  • Invalidate intelligently
  • Set reasonable TTLs

But remember: There are only two hard things in computer science -
cache invalidation and naming things.

  5. Monitor everything

Set up alerts for:

  • Request rates (sudden spikes)
  • Database CPU/memory
  • API response times
  • Cost anomalies
  • Error rates

Find out from monitoring, not from your boss.

  6. Rate limiting protects YOU

Not just from malicious users, but from yourself:

  • Prevent runaway loops
  • Catch bugs before they scale
  • Protect your infrastructure
  • Control costs

  7. Good bosses value learning

My boss could have fired me. Instead, he:

  • Helped me fix it
  • Made it a learning opportunity
  • Created psychological safety
  • Turned a mistake into a teaching moment

I'm still at this company a year later, partly because of
how he handled this.
