A minor backend change caused a production outage, high CPU usage, and API failures. Here's how it happened, what we missed, and how we fixed it.
The Incident
It started as a simple task.
"Just add one more field to the API response."
No major logic change. No risky deployment.
Just a small enhancement.
We deployed it to production… and within minutes:
- API response time jumped from 120ms → 5s
- CPU usage hit 95%
- Some endpoints started timing out
- Users began reporting failures
At first, nothing made sense.
What Changed?
Here's the actual change:
// Before
const users = await User.find({ isActive: true });
// After
const users = await User.find({ isActive: true })
  .populate("orders");
Looks harmless, right?
That .populate("orders") was the killer.
The Real Problem
Each user had multiple orders.
So instead of:
- 1 query
We now had:
- 1 query + N additional queries (one per user)
This is called the N+1 query problem.
With ~2,000 users:
- That turned into 2,001 database queries per request
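The arithmetic above can be reproduced with a toy model. This is only a sketch: `createFakeDb`, `loadPerUser`, and `loadBatched` are hypothetical names, and the in-memory "database" just counts calls rather than talking to MongoDB. It contrasts the per-user access pattern with a batched one.

```javascript
// Hypothetical in-memory stand-in for a database; it only counts calls.
function createFakeDb(userCount, ordersPerUser) {
  const users = [];
  const orders = [];
  for (let i = 0; i < userCount; i++) {
    users.push({ _id: `u${i}`, isActive: true });
    for (let j = 0; j < ordersPerUser; j++) {
      orders.push({ _id: `o${i}-${j}`, userId: `u${i}` });
    }
  }
  const db = {
    queryCount: 0,
    findActiveUsers() {
      db.queryCount++;
      return users.filter((u) => u.isActive);
    },
    findOrdersByUser(userId) {
      db.queryCount++;
      return orders.filter((o) => o.userId === userId);
    },
    findOrdersByUsers(userIds) {
      db.queryCount++;
      const wanted = new Set(userIds);
      return orders.filter((o) => wanted.has(o.userId));
    },
  };
  return db;
}

// N+1 shape: one query for the users, then one more per user.
function loadPerUser(db) {
  const users = db.findActiveUsers();
  return users.map((u) => ({ ...u, orders: db.findOrdersByUser(u._id) }));
}

// Batched shape: one query for the users, one for all of their orders.
function loadBatched(db) {
  const users = db.findActiveUsers();
  const orders = db.findOrdersByUsers(users.map((u) => u._id));
  const byUser = new Map();
  for (const o of orders) {
    if (!byUser.has(o.userId)) byUser.set(o.userId, []);
    byUser.get(o.userId).push(o);
  }
  return users.map((u) => ({ ...u, orders: byUser.get(u._id) || [] }));
}

const perUserDb = createFakeDb(2000, 3);
loadPerUser(perUserDb);
console.log(perUserDb.queryCount); // 2001

const batchedDb = createFakeDb(2000, 3);
loadBatched(batchedDb);
console.log(batchedDb.queryCount); // 2
```

Both loaders return the same data; only the number of round trips differs, and that number grows with the user count in the first case.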
Why It Broke Production
- MongoDB connections got saturated
- CPU usage spiked due to excessive queries
- API latency exploded
- Node.js event loop got blocked
Even worse:
- This endpoint was used in the dashboard
- Every page load triggered this heavy query
Why We Didn't Catch It
Because:
- Local data was small (10–20 users)
- No load testing
- No query monitoring in staging
- No performance checks before deploy
Everything worked "fine" locally.
The Fix
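We replaced .populate() with two explicit queries and an in-memory join:
// Query 1: active users as plain objects
const users = await User.find({ isActive: true }).lean();
const userIds = users.map(u => u._id);
// Query 2: every order for those users in a single $in lookup
const orders = await Order.find({
  userId: { $in: userIds }
}).lean();
// Group orders by user (the ObjectId is converted to its string form as the key)
const ordersMap = orders.reduce((acc, order) => {
  acc[order.userId] = acc[order.userId] || [];
  acc[order.userId].push(order);
  return acc;
}, {});
// Attach each user's orders, defaulting to an empty array
const result = users.map(user => ({
  ...user,
  orders: ordersMap[user._id] || []
}));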
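The grouping step works because an ObjectId used as a property key is converted to its string form. A variant that makes this conversion explicit might look like the sketch below. `attachOrders` is a hypothetical helper name, and the sample data uses plain string ids so the block runs without a database.

```javascript
// Hypothetical helper: joins users and orders in memory.
// String(...) makes the key conversion explicit, so it works whether the
// ids are plain strings or objects with a toString() method.
function attachOrders(users, orders) {
  const ordersByUser = new Map();
  for (const order of orders) {
    const key = String(order.userId);
    if (!ordersByUser.has(key)) ordersByUser.set(key, []);
    ordersByUser.get(key).push(order);
  }
  return users.map((user) => ({
    ...user,
    orders: ordersByUser.get(String(user._id)) || [],
  }));
}

// Sample data (made up for illustration).
const sampleUsers = [{ _id: "a1", name: "Ada" }, { _id: "b2", name: "Bo" }];
const sampleOrders = [
  { _id: "o1", userId: "a1", total: 30 },
  { _id: "o2", userId: "a1", total: 12 },
];
const joined = attachOrders(sampleUsers, sampleOrders);
console.log(joined[0].orders.length); // 2
console.log(joined[1].orders.length); // 0
```

Using a Map instead of a plain object also avoids any collision with inherited property names, though that is unlikely to matter for hex ids.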
Result After Fix
- API response time: 5s → 180ms
- DB queries: 2,001 → 2 per request
- CPU usage normalized
- System stable again
Lessons Learned
1. Never trust .populate() blindly
It looks simple but can be expensive at scale.
2. Always think in queries
Ask yourself:
"How many DB calls will this line generate?"
3. Test with realistic data
Your local environment lies.
4. Add performance monitoring
Track:
- query count
- response time
- CPU usage
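One way to track these three numbers per request is a small counter object, sketched below. `createRequestMetrics` and its `queryBudget` parameter are hypothetical and not part of Mongoose or Node.js; the sketch only relies on the built-in `performance.now()` and `process.cpuUsage()`. How `recordQuery()` gets called is left open: in a Mongoose app, the callback form of `mongoose.set('debug', fn)` is one possible hook.

```javascript
// Hypothetical per-request tracker for query count, duration, and CPU time.
function createRequestMetrics(queryBudget) {
  const startedAt = performance.now();
  const cpuStart = process.cpuUsage();
  let queryCount = 0;
  return {
    // Call once for every database query issued during the request.
    recordQuery() {
      queryCount++;
    },
    // Call when the response has been sent.
    finish() {
      const cpu = process.cpuUsage(cpuStart); // microseconds since cpuStart
      return {
        queryCount,
        durationMs: performance.now() - startedAt,
        cpuMs: (cpu.user + cpu.system) / 1000,
        overBudget: queryCount > queryBudget,
      };
    },
  };
}

// Simulate the incident: 2,001 queries against a budget of 10.
const metrics = createRequestMetrics(10);
for (let i = 0; i < 2001; i++) metrics.recordQuery();
const report = metrics.finish();
console.log(report.queryCount, report.overBudget); // 2001 true
```

Logging or alerting on `overBudget` in staging would have flagged the change before it reached production, assuming staging data is large enough to show the growth.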
5. Use .lean() when possible
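It returns plain JavaScript objects instead of full Mongoose documents, which reduces memory overhead and skips hydration. The trade-off is that the results have no getters, virtuals, or save().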
Bonus: Safer Alternative Pattern
For large datasets:
- Use aggregation pipelines
- Use pagination
- Limit populated fields
- Cache frequently used data
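The first three points can be combined in a single aggregation pipeline. The sketch below only builds the pipeline as plain data. `buildUsersWithOrdersPipeline` is a hypothetical helper, and the collection name `orders` and the fields `userId`, `name`, and `total` are assumptions about the schema, not something taken from the original code.

```javascript
// Hypothetical builder for a paginated users-with-orders pipeline.
// Collection and field names are assumptions about the schema.
function buildUsersWithOrdersPipeline({ page = 1, pageSize = 50 } = {}) {
  return [
    { $match: { isActive: true } },
    { $sort: { _id: 1 } }, // stable order so pages do not overlap
    { $skip: (page - 1) * pageSize },
    { $limit: pageSize },
    {
      // Join each user's orders inside the database, in the same request.
      $lookup: {
        from: "orders",
        localField: "_id",
        foreignField: "userId",
        as: "orders",
      },
    },
    // Return only the fields the dashboard needs.
    { $project: { name: 1, "orders._id": 1, "orders.total": 1 } },
  ];
}

const pipeline = buildUsersWithOrdersPipeline({ page: 3, pageSize: 20 });
console.log(pipeline[2].$skip, pipeline[3].$limit); // 40 20

// With Mongoose this could be run as:
//   const rows = await User.aggregate(pipeline);
```

Paginating before the `$lookup` keeps the join limited to one page of users, so the cost per request stays bounded as the user count grows.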
Final Thought
Most production outages don't come from big changes.
They come from small changes that scale badly.
Originally published at stackdevlife.com
