Sandeep Bansod

Posted on • Originally published at stackdevlife.com

We Deployed a "Small Fix" and Took Down Production — Here's What Actually Happened

A minor backend change caused a production outage, high CPU usage, and API failures. Here's how it happened, what we missed, and how we fixed it.

The Incident

It started as a simple task.

"Just add one more field to the API response."

No major logic change. No risky deployment.
Just a small enhancement.

We deployed it to production… and within minutes:

  • API response time jumped from 120ms → 5s
  • CPU usage hit 95%
  • Some endpoints started timing out
  • Users began reporting failures

At first, nothing made sense.

What Changed?

Here's the actual change:

// Before
const users = await User.find({ isActive: true });
// After
const users = await User.find({ isActive: true })
  .populate("orders");

Looks harmless, right?

That .populate("orders") was the killer.

The Real Problem

Each user had multiple orders.

So instead of:

  • 1 query

We now had:

  • 1 query + N additional queries (for each user)

This is called:

N+1 Query Problem

With ~2,000 users:

  • That turned into 2,001 database queries per request
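To make the arithmetic concrete, here is a minimal sketch that counts round trips against an in-memory stand-in for the database. Everything in it is hypothetical: `createFakeDb`, `findActiveUsers`, `findOrdersByUser`, and `findOrdersByUsers` are illustrative names, not Mongoose APIs, and a real driver adds network latency on top of every call.

```javascript
// Hypothetical in-memory stand-in for a database; it only counts "queries".
function createFakeDb(userCount, ordersPerUser) {
  const users = [];
  const orders = [];
  for (let u = 1; u <= userCount; u++) {
    users.push({ _id: u, isActive: true });
    for (let o = 0; o < ordersPerUser; o++) {
      orders.push({ _id: `${u}-${o}`, userId: u });
    }
  }
  const db = {
    queryCount: 0,
    async findActiveUsers() {
      db.queryCount++;
      return users.filter((user) => user.isActive);
    },
    async findOrdersByUser(userId) {
      db.queryCount++;
      return orders.filter((order) => order.userId === userId);
    },
    async findOrdersByUsers(userIds) {
      db.queryCount++;
      const wanted = new Set(userIds);
      return orders.filter((order) => wanted.has(order.userId));
    },
  };
  return db;
}

// N+1 shape: one query for the users, then one more query per user.
async function loadNPlusOne(db) {
  const users = await db.findActiveUsers();
  const result = [];
  for (const user of users) {
    const orders = await db.findOrdersByUser(user._id);
    result.push({ ...user, orders });
  }
  return result;
}

// Batched shape: one query for the users, one for all of their orders.
async function loadBatched(db) {
  const users = await db.findActiveUsers();
  const orders = await db.findOrdersByUsers(users.map((user) => user._id));
  const byUser = new Map();
  for (const order of orders) {
    if (!byUser.has(order.userId)) byUser.set(order.userId, []);
    byUser.get(order.userId).push(order);
  }
  return users.map((user) => ({ ...user, orders: byUser.get(user._id) || [] }));
}
```

With `createFakeDb(2000, 3)`, `loadNPlusOne` leaves `queryCount` at 2,001 and `loadBatched` leaves it at 2, while both return the same data.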

Why It Broke Production

  • MongoDB connections got saturated
  • CPU usage spiked due to excessive queries
  • API latency exploded
  • Node.js event loop got blocked

Even worse:

  • This endpoint was used in the dashboard
  • Every page load triggered this heavy query

Why We Didn't Catch It

Because:

  • Local data was small (10–20 users)
  • No load testing
  • No query monitoring in staging
  • No performance checks before deploy

Everything worked "fine" locally.

The Fix

We replaced .populate() with two explicit queries and an in-memory join:

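// Query 1: active users as plain objects (no Mongoose document overhead).
const users = await User.find({ isActive: true }).lean();
const userIds = users.map((u) => u._id);

// Query 2: every order for those users in a single round trip.
const orders = await Order.find({
  userId: { $in: userIds }
}).lean();

// Group orders by user id. ObjectIds are stringified to serve as object keys.
const ordersMap = orders.reduce((acc, order) => {
  const key = String(order.userId);
  acc[key] = acc[key] || [];
  acc[key].push(order);
  return acc;
}, {});

// Attach each user's orders, defaulting to an empty list.
const result = users.map((user) => ({
  ...user,
  orders: ordersMap[String(user._id)] || []
}));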
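The grouping step can also be pulled out into a pure function, which makes it easy to unit test without a database. This is a minimal sketch, assuming each order carries a `userId` that matches a user's `_id`; `attachOrders` is a hypothetical helper name. Keys are normalized with `String()` because two ObjectId instances for the same id are different objects and would not match as `Map` keys.

```javascript
// Hypothetical helper: joins orders onto users in memory.
function attachOrders(users, orders) {
  const ordersByUser = new Map();
  for (const order of orders) {
    const key = String(order.userId);
    const bucket = ordersByUser.get(key);
    if (bucket) bucket.push(order);
    else ordersByUser.set(key, [order]);
  }
  return users.map((user) => ({
    ...user,
    orders: ordersByUser.get(String(user._id)) || [],
  }));
}

// Usage with plain objects standing in for .lean() results.
const sampleUsers = [{ _id: "u1" }, { _id: "u2" }];
const sampleOrders = [
  { _id: "o1", userId: "u1" },
  { _id: "o2", userId: "u1" },
];
const joined = attachOrders(sampleUsers, sampleOrders);
console.log(joined[0].orders.length, joined[1].orders.length); // 2 0
```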

Result After Fix

  • API response time: 5s → 180ms
  • DB queries per request: 2,001 → 2
  • CPU usage normalized
  • System stable again

Lessons Learned

1. Never trust .populate() blindly

It reads like a one-line change, but it adds extra queries and loads every referenced document into memory, so its cost grows with your data.

2. Always think in queries

Ask yourself:

"How many DB calls will this line generate?"

3. Test with realistic data

Your local environment lies.

4. Add performance monitoring

Track:

  • query count
  • response time
  • CPU usage
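As a rough illustration of tracking those three numbers for a single handler, here is a sketch that uses only Node.js built-ins (`process.hrtime.bigint()` and `process.cpuUsage()`). The `measure` wrapper and the `counter` object are hypothetical; it assumes your data layer increments `counter.queries` on every database call, which you would have to wire up yourself.

```javascript
// Hypothetical wrapper: runs an async handler and reports three metrics.
// `counter` is any object whose `queries` field the data layer increments.
async function measure(label, counter, handler) {
  const queriesBefore = counter.queries;
  const cpuBefore = process.cpuUsage();
  const startedAt = process.hrtime.bigint();

  const result = await handler();

  // process.cpuUsage(previous) returns the delta in microseconds.
  const cpu = process.cpuUsage(cpuBefore);
  const metrics = {
    label,
    queryCount: counter.queries - queriesBefore,
    // hrtime.bigint() is in nanoseconds; convert to milliseconds.
    responseMs: Number(process.hrtime.bigint() - startedAt) / 1e6,
    cpuMs: (cpu.user + cpu.system) / 1000,
  };
  return { result, metrics };
}

// Usage with a fake handler that pretends to run two queries.
const counter = { queries: 0 };
async function fakeDashboardHandler() {
  counter.queries += 2;
  return "ok";
}
measure("dashboard", counter, fakeDashboardHandler).then(({ metrics }) => {
  console.log(metrics.label, metrics.queryCount); // dashboard 2
});
```

A shared counter like this is only meaningful when handlers are measured one at a time, and the CPU figure covers the whole process. It is enough for a staging check or a test that fails when the query count jumps, but production monitoring would need per-request tracking.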

5. Use .lean() when possible

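It returns plain JavaScript objects instead of full Mongoose documents, which reduces memory overhead and speeds up reads. The trade-off: you lose document features such as virtuals, getters/setters, and save().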

Bonus: Safer Alternative Pattern

For large datasets:

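  • Use aggregation pipelines (for example $lookup) so the join runs inside the database
  • Use pagination so one request never loads every user
  • Limit populated fields to the ones the response actually needs
  • Cache frequently used data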
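For the caching point, a minimal in-process sketch might look like the following. `createTtlCache` and `getOrLoad` are hypothetical names, the clock is injectable so expiry can be tested, and the sketch makes no attempt to deduplicate concurrent misses or bound memory, so treat it as a starting point rather than a production cache.

```javascript
// Hypothetical TTL cache; `now` is injectable so expiry can be tested.
function createTtlCache(ttlMs, now = Date.now) {
  const entries = new Map();
  return {
    // Returns the cached value for `key`, or calls `loader` and caches its result.
    async getOrLoad(key, loader) {
      const hit = entries.get(key);
      if (hit && hit.expiresAt > now()) return hit.value;
      const value = await loader();
      entries.set(key, { value, expiresAt: now() + ttlMs });
      return value;
    },
  };
}
```

A dashboard endpoint could wrap its expensive load in something like `cache.getOrLoad("active-users", loadActiveUsersWithOrders)`, where the loader name is again an assumption. That way repeated page loads within the TTL reuse one result instead of hitting the database each time.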

Final Thought

Most production outages don't come from big changes.
They come from small changes that scale badly.
