DEV Community


Detective mode: Fixing the MongoDB Aggregation pipeline

This article was written by Darshan Jayarama.

It was a Friday, and I was quickly wrapping up my week, finishing the tasks that were assigned. Suddenly, the board became red — another S1. I was like, not now!!!

The application team started complaining: API response times against the database had jumped from 600ms to 17 seconds. Booooom!!! That’s insane. No other choice — since it was production, I had to jump in. The war room was ready. The monitoring dashboard looked like a heart patient’s ECG — spikes, timeouts, and the main app was choking.

What's the Mystery?

From the monitoring dashboard, we traced the spike to a query that had recently changed: the VP had asked for a breakdown by region on the dashboard. It was just one more field. But it was ONE. MORE. FIELD.

Here’s the innocent query that had been harmless for years:

db.orders.aggregate([
 {
   $match: {
     status: "completed",
     created_at: { $gte: new Date("2025-01-31") }
   }
 },
 {
   $group: {
     _id: "$customer_id",
     total_orders: { $sum: 1 },
     total_revenue: { $sum: "$amount" },
     avg_order_value: { $avg: "$amount" }
   }
 },
 {
   $sort: { total_revenue: -1 }
 },
 {
   $limit: 100
 }
])

Simple, right? A top-100-customers-by-revenue dashboard.

Execution time: 500–700ms. Docs examined: ~2.5M.

Then we added just one more field:

db.orders.aggregate([
 {
   $match: {
     status: "completed",
     created_at: { $gte: new Date("2024-01-01") }
   }
 },
 {
   $lookup: { 
     from: "customers",
     localField: "customer_id",
     foreignField: "_id",
     as: "customer"
   }
 },
 {
   $unwind: "$customer"
 },
 {
   $group: {
     _id: {
       customer_id: "$customer_id",
       region: "$customer.region"
     },
     total_orders: { $sum: 1 },
     total_revenue: { $sum: "$amount" },
     avg_order_value: { $avg: "$amount" }
   }
 },
 {
   $sort: { total_revenue: -1 }
 },
 {
   $limit: 100
 }
])

Execution time: >15,000ms. Docs examined: 2.5M orders, each one triggering an unindexed lookup into 2.5M customers. Memory exploded.

The Investigation: Agent Darshan reporting

Evidence #1:

The MongoDB profiler, set to level 2, collects every query being run against the given database. Set this level only when needed, or you’re gonna blow up the log file:

db.setProfilingLevel(2, { slowms: 100 })
db.system.profile.find().sort({ ts: -1 }).limit(1).pretty()
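And once you’re done collecting evidence, remember to switch the profiler back off — level 2 on a busy production database is expensive. A quick mongosh sketch:

```
db.getProfilingStatus()   // confirm the current level and slowms threshold
db.setProfilingLevel(0)   // stop profiling every operation
```

Level 0 disables profiling entirely; level 1 would keep logging only operations slower than slowms, which is a reasonable steady-state setting.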

The profiler output showed our query brutally attacking the MongoDB cluster, no mercy:

{
 "op": "command",
 "ns": "production.orders",
 "command": { /* our aggregation */ },
 "keysExamined": 2489234,      //  Used index on status
 "docsExamined": 2489234,      // But still examined all matched docs
 "numYield": 18943,
 "nreturned": 100,
 "responseLength": 8234,
 "millis": 647,
 "planSummary": "IXSCAN { status: 1 }",  //  Used status index
 "execStats": {
   "stage": "FETCH",
   "filter": {
     "created_at": { "$gte": ISODate("2025-01-31") }  // 👈 Filter applied AFTER index
   },
   "inputStage": {
     "stage": "IXSCAN",
     "keyPattern": { "status": 1 },
     "indexName": "status_1",
     "keysExamined": 2489234,
     "seeks": 1,
     "dupsTested": 0,
     "dupsDropped": 0,
     "direction": "forward"
   },
   "totalDocsExamined": 2489234,
   "totalKeysExamined": 2489234,
   "executionTimeMillisEstimate": 447
 }
}

Red Flags:

  • docsExamined = ~2.5M, whereas nReturned was only 100.
  • planSummary: IXSCAN on the single-field status_1 index — a low-cardinality field.
  • High numYield (18,943 yields).

Evidence #2:

Time to review the explain plan and make the query confess to the brutal attack:
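This kind of plan comes from running the aggregation through explain with execution statistics — roughly like this in mongosh (pipeline body abbreviated; note that "executionStats" actually executes the query, so be careful on production):

```
db.orders.explain("executionStats").aggregate([
  { $match: { status: "completed", created_at: { $gte: new Date("2024-01-01") } } },
  { $lookup: { from: "customers", localField: "customer_id", foreignField: "_id", as: "customer" } },
  // ...remaining stages as above
])
```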

{
 "stages": [
   {
     "$cursor": {
       "queryPlanner": {
         "winningPlan": {
          "stage": "IXSCAN",
           "keyPattern": { "status": 1 },
           "filter": {
             "status": { "$eq": "completed" },
             "created_at": { "$gte": new Date("2025-01-31") }
           }
         }
       },
       "executionStats": {
         "executionTimeMillis": 647,  // $match stage: 647ms
         "totalDocsExamined": 2489234,
          "totalKeysExamined": 2489234,
         "nReturned": 2489234  // Passed 2.48M docs to next stage
       }
     }
   },
   {
     "$lookup": {  // 💀💀💀 THE SERIAL KILLER
       "from": "customers",
       "as": "customer",
       "executionTimeMillis": 13934,  // 13.9 seconds just for $lookup!
       "totalDocsExamined": 2489234,  // Looked up 2.48M customers
       "stage": "NESTED_LOOP_JOIN",  // No index on customers._id
       "indexesUsed": []  // 👈 DISASTER
     }
   },
   {
     "$unwind": {
       "executionTimeMillis": 823
     }
   },
   {
     "$group": {
       "executionTimeMillis": 1156,
       "usedDisk": true,  // 💀 Spilled to disk (>100MB memory)
       "spilledRecords": 847293
     }
   },
   {
     "$sort": {
       "executionTimeMillis": 87,
       "sortPattern": { "total_revenue": -1 },
       "usedDisk": false  // Lucky, only 100 docs to sort after $limit
     }
   }
 ],
 "executionTimeMillis": 16847
}

Time breakdown:

  • $match: 647ms → $lookup: 13.9 seconds → $unwind: 823ms → $group: 1156ms → $sort: 87ms → Total 16.9 seconds

Suspects list

Suspect #1:

An unoptimized index for $match. We had a single-field index on status, but no compound index on (status, created_at) — so the date filter was applied only after the index scan had fetched every "completed" document.

  • Motive: Index inefficiency
  • Proof: a huge nReturned (~2.48M docs) passed to the next stage.
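The fix for this suspect is a compound index covering both filter fields — equality field first, range field second — so the $match becomes a tight index scan instead of a scan-then-filter:

```
// Equality on status first, then range on created_at:
// the date predicate is now answered by the index itself.
db.orders.createIndex({ status: 1, created_at: 1 })
```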

Suspect #2:

The brutal $lookup: for each of the ~2.48M documents coming out of the previous stage, it ran a join against the customers collection, with indexesUsed coming back empty. Isn’t that brutal?

  • Motive: just one more field request
  • Proof: empty indexesUsed and 13.9s execution time.

Suspect #3:

$group became a memory hog, spilling ~847K records to disk once it blew past MongoDB’s 100MB per-stage limit.

  • Motive: Grouping 2.4M records with usedDisk: true
  • Proof: 847K spilledRecords.
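As a stopgap (not a fix!), the allowDiskUse option lets memory-hungry stages spill to disk instead of erroring at the 100MB limit — on MongoDB 6.0+ this spilling is already the default behavior. It trades an error for slow disk I/O, so the real cure is still shrinking the data before $group:

```
// Band-aid: permit disk spills for stages exceeding 100MB of RAM,
// where `pipeline` is the stage array shown above.
db.orders.aggregate(pipeline, { allowDiskUse: true })
```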

Suspect #4:

We were performing an expensive $lookup operation before we could filter down to 100 records.

  • Motive: Not thinking about pipeline optimization
  • Proof: Processing 2.48M docs when we only needed 100

Recapping all mistakes to fix!

We were doing: $match: 647ms (inefficient index) → $lookup: 13.9s (massive 2.4M join) → $unwind: 823ms → $group: 1,156ms (memory overflow) → $sort: 87ms → total: 16.9 seconds.

We redesigned the pipeline as follows:

  • $match with a compound index for efficient filtering
  • $group by customer_id to reduce the dataset
  • $sort and $limit to the top 100 customers — only 100 records left
  • $lookup for just those 100 customers
  • $project to add the region information

Does it fix the issues?

The Solution:

2.5M orders
  ↓ $match (INDEXED — fast!)
2.48M orders
  ↓ $group (by customer_id only)
~500K customer groups
  ↓ $sort + $limit
100 top customers
  ↓ $lookup (only 100 lookups!) ← ✅ FIXED
100 results with customer data
  ↓ $project (extract region)
100 final results

The best way to use $lookup is to use it as little — and as late — as possible!

db.orders.aggregate([
 {
   $match: {
     status: "completed",
     created_at: { $gte: new Date("2024-01-01") }
   }
 },
 {
   $group: {
     _id: "$customer_id",  // Group BEFORE lookup
     total_orders: { $sum: 1 },
     total_revenue: { $sum: "$amount" },
     avg_order_value: { $avg: "$amount" }
   }
 },
 {
   $sort: { total_revenue: -1 }
 },
 {
   $limit: 100  // Reduce to 100 BEFORE lookup
 },
 {
   $lookup: {  // Now only looking up 100 customers
     from: "customers",
     localField: "_id",
     foreignField: "_id",
     as: "customer"
   }
 },
 {
   $unwind: "$customer"
 },
 {
   $project: {
     customer_id: "$_id",
     region: "$customer.region",
     total_orders: 1,
     total_revenue: 1,
     avg_order_value: 1
   }
 }
])

Curious about the metrics now? I was excited to see them too:

Execution time: 847ms. docsExamined: 2.5M (the $match still scans the matching orders), but the $lookup now touches only 100 documents.

✅ Issue resolved

  • Root cause: $lookup before aggregation.
  • Fix: Aggregate first, then join the top 100
  • Performance: 17s → 0.8s (20x improvement)
  • Added indexes: (status, created_at)

The Costly Lessons Learned:

  1. One field can kill you. That “simple” region addition cost us $24,000 in downtime.
  2. $lookup is not free. It’s actually very expensive. Treat it like a loaded gun.
  3. Order matters. Aggregate → Limit → Join. Not Join → Aggregate → Limit.
  4. Indexes aren’t optional. They’re the difference between 1.8 seconds and 0.15 seconds.
  5. Explain is your friend. If you’re not running explain(), you’re flying blind.
  6. The 100MB limit is real. When $group says “usedDisk: true”, you’re already in trouble.
  7. Test with production data volumes. 1,000 docs is not the same as 2,500,000 docs.
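On that last lesson: you don’t need real production data to test at production volume. A hypothetical helper like this (plain JavaScript, with made-up field distributions) generates synthetic orders you can bulk-insert into a staging cluster before the pipeline ships:

```javascript
// Hypothetical data generator: synthetic orders at production-like scale.
// Field names match the article's schema; the distributions are invented.
function makeOrders(n) {
  const statuses = ["completed", "pending", "cancelled"];
  const orders = [];
  for (let i = 0; i < n; i++) {
    orders.push({
      customer_id: i % 500000,                          // ~500K distinct customers
      status: statuses[i % statuses.length],            // cycle through statuses
      amount: Math.round(Math.random() * 50000) / 100,  // 0.00 - 500.00
      created_at: new Date(2024, 0, 1 + (i % 365))      // spread across a year
    });
  }
  return orders;
}

// In mongosh you would insert this in batches, e.g.:
// db.orders.insertMany(makeOrders(10000))
```

Seed 2.5M of these in batches and the pathological $lookup behavior reproduces long before a VP ever asks for one more field.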
