SQL databases offer several join algorithms. The query planner selects the most efficient one by evaluating cardinality and estimated cost. For example, a nested loop join is ideal when the outer table has few rows and an index allows fast access to the inner table. In contrast, a hash join is better suited for situations where the outer table contains many rows and the inner table must be fully scanned, resulting in fewer costly loops.
While MongoDB provides similar algorithms, adapted to flexible documents, as a NoSQL database it shifts more responsibility to the developer: you must design for optimal data access upfront, in the data model itself, which has the advantage of more predictable performance.
I'll base my example on a question on Reddit: Optimizing a MongoDB JOIN with $lookup and $limit. I use a collection of users, where each user references a profile. The profile has a status. The query lists the users with no profile or with a profile with a status equal to 2.
In my demo, I set up two profile keys: "_id," which is automatically indexed in MongoDB, and an "ID" field, which is not. This setup illustrates both situations—an indexed lookup table versus a non-indexed one. Normally, you'd use just one method, depending on which join algorithm you favor.
db.profiles.drop()
db.users.drop()
db.profiles.insertMany([
{ _id:102, ID: 102, status: 2 },
{ _id:201, ID: 201, status: 1 },
{ _id:302, ID: 302, status: 2 },
{ _id:403, ID: 403, status: 3 }
]);
db.users.insertMany([
{ name: "Alice" , profileID: 403 }, // profile status = 3
{ name: "Bob", profileID: 102 }, // profile status = 2
{ name: "Charlie", profileID: 201 }, // profile status = 1
{ name: "Diana", profileID: 102 }, // profile status = 2
{ name: "Eve" }, // no profile
{ name: "Franck" , profileID: 403 }, // profile status = 3
{ name: "Gaspar" , profileID: 403 }, // profile status = 3
{ name: "Hans" , profileID: 403 }, // profile status = 3
{ name: "Iona" , profileID: 403 }, // profile status = 3
{ name: "Jane" , profileID: 403 }, // profile status = 3
{ name: "Karl" , profileID: 403 }, // profile status = 3
{ name: "Lili" }, // no profile
{ name: "Math" }, // no profile
{ name: "Niall" }, // no profile
{ name: "Oscar" , profileID: 403 }, // status = 3
{ name: "Paula" , profileID: 102 }, // status = 2
{ name: "Quentin" , profileID: 201 }, // status = 1
{ name: "Ravi" , profileID: 102 }, // status = 2
{ name: "Sofia" }, // no profile
{ name: "Takumi" , profileID: 403 }, // status = 3
{ name: "Uma" , profileID: 403 }, // status = 3
{ name: "Viktor" , profileID: 403 }, // status = 3
{ name: "Wafa" , profileID: 403 }, // status = 3
{ name: "Ximena" , profileID: 403 }, // status = 3
{ name: "Yara" }, // no profile
{ name: "Zubair" }, // no profile
]);
Here is my query on this small data set:
db.users.aggregate([
{
$lookup: {
from: "profiles",
localField: "profileID",
foreignField: "ID",
as: "profile"
}
},
{
$match: {
$or: [
{ profile: { $eq: [] } }, // no profile
{ profile: { $elemMatch: { status: 2 } } } // profile status = 2
]
}
},
{
$project: {
_id: 0,
name: 1,
"profile.status": 1 // keep only the status field from joined profile
}
}
]);
Note that the first optimization I made was replacing the join expression (let: { profileID: "$profileID" }, pipeline: [ { $match: { $expr: { $eq: ["$ID", "$$profileID"] } } } ]) with localField/foreignField, which allows the join predicate to be pushed down (EQ_LOOKUP).
Here is the result:
[
{ name: 'Bob', profile: [ { status: 2 } ] },
{ name: 'Diana', profile: [ { status: 2 } ] },
{ name: 'Eve', profile: [] },
{ name: 'Lili', profile: [] },
{ name: 'Math', profile: [] },
{ name: 'Niall', profile: [] },
{ name: 'Paula', profile: [ { status: 2 } ] },
{ name: 'Ravi', profile: [ { status: 2 } ] },
{ name: 'Sofia', profile: [] },
{ name: 'Yara', profile: [] },
{ name: 'Zubair', profile: [] },
]
To scale the number of users, I multiply each existing user by 10,000 using the following script:
// Materialize the cursor first so the inserts don't feed back into the scan
const currentUsers = db.users.find({}, { _id: 0, name: 1, profileID: 1 }).toArray();
currentUsers.forEach(userDoc => {
  print(`Inserting 10,000 documents for: ${JSON.stringify(userDoc)}`);
  const newUsers = [];
  for (let i = 0; i < 10000; i++) {
    const clone = { ...userDoc };
    clone.name = `${i}${clone.name}`;
    newUsers.push(clone);
  }
  db.users.insertMany(newUsers); // one batched insert per original user
});
I now have 260,026 users.
Indexed nested loop join
I run my query with explain("executionStats") and extract the most important statistics about the time and the number of documents examined:
x=db.users.aggregate([
{ "$lookup" : {
"from" : "profiles",
"localField" : "profileID",
"foreignField" : "_id",
"as" : "profile" }},
{
$match: {
$or: [
{ "profile": { $eq: [] } },
{ "profile": { $elemMatch: { status: 2 } } },
]
}
},
]).explain("executionStats")
xs=x["stages"][0]["$cursor"]["executionStats"];
xp=x["stages"][0]["$cursor"]["queryPlanner"]["winningPlan"]["queryPlan"];
print("nReturned:", xs["nReturned"],
"executionTimeMillis:", xs["executionTimeMillis"],
"totalKeysExamined:", xs["totalKeysExamined"],
"totalDocsExamined:", xs["totalDocsExamined"],
xp["stage"]+" strategy:", xp["strategy"],
)
The lookup stage returned all documents, since every document must be joined before filtering, and took 2.5 seconds:
nReturned: 260026
executionTimeMillis: 2456
totalKeysExamined: 190019
totalDocsExamined: 450045
EQ_LOOKUP strategy: IndexedLoopJoin
The equality lookup used an indexed loop join strategy, with an index scan for each document:
- nReturned: 260026: all local documents with their joined profile arrays
- executionTimeMillis: 2456: total execution time, including both the join and filter stages
- totalKeysExamined: 190019: only keys that found a match in the profiles collection's "_id" index are counted (lookup_query_stats); the index can determine "key not found" without actually examining a key entry
- totalDocsExamined: 450045: users collection scan (260,026) + profile documents fetched (190,019)
The number of profiles examined is high compared to the number of profiles in the collection. Another algorithm can be faster by reading all profiles once and looking them up from a hash table.
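The mechanics of the indexed loop join can be sketched in plain JavaScript (not mongosh; the Map stands in for the "_id" index, and the counter mimics how only matching keys are counted):

```javascript
// Simplified sketch of an indexed nested loop join: for each local
// document, probe an index on the foreign key. The Map plays the role
// of the "_id" index on profiles; names and counters are illustrative.
const profiles = [
  { _id: 102, status: 2 },
  { _id: 201, status: 1 },
  { _id: 302, status: 2 },
  { _id: 403, status: 3 },
];
const users = [
  { name: "Bob", profileID: 102 },
  { name: "Eve" },                   // no profile
  { name: "Alice", profileID: 403 },
];

// The index is built once, ahead of time, by the database.
const idIndex = new Map(profiles.map(p => [p._id, p]));

let keysExamined = 0;
const joined = users.map(u => {
  const match = u.profileID !== undefined ? idIndex.get(u.profileID) : undefined;
  if (match !== undefined) keysExamined++; // only matching keys are counted
  return { ...u, profile: match ? [{ status: match.status }] : [] };
});
console.log(joined);
console.log("keysExamined:", keysExamined); // prints: keysExamined: 2
```

Each local document costs one index probe plus one foreign document fetch when a match exists, which is why totalDocsExamined grows with the match count.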
Hash join on small unindexed table
MongoDB 8.0 chooses a hash join when the following conditions hold:
- allowDiskUse: true is set (required for spilling if the hash table exceeds memory)
- The foreign collection is small enough, controlled by these parameters: internalQueryCollectionMaxNoOfDocumentsToChooseHashJoin (default: 10,000 docs), internalQueryCollectionMaxDataSizeBytesToChooseHashJoin (default: 100 MB), and internalQueryCollectionMaxStorageSizeBytesToChooseHashJoin (default: 100 MB)
- No compatible index exists, disk use is allowed, and a hash join is estimated to be more efficient
To show this, I use the "ID" field, which is not indexed, for the lookup:
x=db.users.aggregate([
{ "$lookup" : {
"from" : "profiles",
"localField" : "profileID",
"foreignField" : "ID",
"as" : "profile" }},
{
$match: {
$or: [
{ "profile": { $eq: [] } },
{ "profile": { $elemMatch: { status: 2 } } },
]
}
},
],{ allowDiskUse: true } ).explain("executionStats")
xs=x["stages"][0]["$cursor"]["executionStats"];
xp=x["stages"][0]["$cursor"]["queryPlanner"]["winningPlan"]["queryPlan"];
print("nReturned:", xs["nReturned"],
"executionTimeMillis:", xs["executionTimeMillis"],
"totalKeysExamined:", xs["totalKeysExamined"],
"totalDocsExamined:", xs["totalDocsExamined"],
xp["stage"]+" strategy:", xp["strategy"],
)
The hash join completed 3.3x faster (750ms vs 2,456ms) with significantly different execution patterns:
nReturned: 260026
executionTimeMillis: 750
totalKeysExamined: 0
totalDocsExamined: 260030
EQ_LOOKUP strategy: HashJoin
Hash join works in two phases, with no index required (totalKeysExamined: 0):
- Build phase: Scans the foreign collection once to build an in-memory hash table keyed by foreignField values. It has read the four profiles.
- Probe phase: Scans the local collection once, probing the hash table for each local key. It has read 260,026 users.
The total is 260,030 documents examined.
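The two phases can be sketched in plain JavaScript (illustrative names and counters, not MongoDB internals):

```javascript
// Simplified sketch of a hash join: build a hash table from the foreign
// collection once, then probe it once per local document.
const profiles = [ { ID: 102, status: 2 }, { ID: 201, status: 1 },
                   { ID: 302, status: 2 }, { ID: 403, status: 3 } ];
const users = [ { name: "Bob", profileID: 102 }, { name: "Eve" } ];

let docsExamined = 0;

// Build phase: scan the foreign collection once, keyed by foreignField.
const hashTable = new Map();
for (const p of profiles) {
  docsExamined++;
  hashTable.set(p.ID, p);
}

// Probe phase: scan the local collection once, probing the hash table.
const joined = users.map(u => {
  docsExamined++;
  const match = hashTable.get(u.profileID);
  return { ...u, profile: match ? [{ status: match.status }] : [] };
});

console.log("docsExamined:", docsExamined); // prints: docsExamined: 6
```

Each collection is scanned exactly once, so the documents examined are simply the sum of the two collection sizes, regardless of the match rate.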
Nested loop join without index
A third option is a nested loop join that scans the collection in each loop, rather than using an index or building a hash table. I disable disk usage to avoid hash join and use the non-indexed field to avoid indexed nested loop:
x=db.users.aggregate([
{ "$lookup" : {
"from" : "profiles",
"localField" : "profileID",
"foreignField" : "ID",
"as" : "profile" }},
{
$match: {
$or: [
{ "profile": { $eq: [] } },
{ "profile": { $elemMatch: { status: 2 } } },
]
}
},
],{ allowDiskUse: false } ).explain("executionStats")
xs=x["stages"][0]["$cursor"]["executionStats"];
xp=x["stages"][0]["$cursor"]["queryPlanner"]["winningPlan"]["queryPlan"];
print("nReturned:", xs["nReturned"],
"executionTimeMillis:", xs["executionTimeMillis"],
"totalKeysExamined:", xs["totalKeysExamined"],
"totalDocsExamined:", xs["totalDocsExamined"],
xp["stage"]+" strategy:", xp["strategy"],
)
Here are the execution statistics:
nReturned: 260026
executionTimeMillis: 1618
totalKeysExamined: 0
totalDocsExamined: 1300130
EQ_LOOKUP strategy: NestedLoopJoin
Like the other algorithms, it read all 260,026 users. Since there is no index on the foreign field, no index is scanned at all (totalKeysExamined: 0). Instead, the four profiles are scanned for each user, for a total of 260,026 + 4 × 260,026 = 1,300,130 documents examined.
In terms of documents examined, this approach is about three times more costly than the indexed loop join and five times more costly than the hash join, because a nested loop join repeatedly scans the foreign collection. Interestingly, because the lookup collection is tiny and sequential scans are cache-friendly, it still ran faster than the indexed nested loop here (1.6s vs 2.5s), since each index seek incurs additional cost.
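The rescanning behavior can be sketched in plain JavaScript (illustrative, not MongoDB internals):

```javascript
// Simplified sketch of an unindexed nested loop join: the foreign
// collection is rescanned for every local document, which is why
// totalDocsExamined grows as users + users × profiles.
const profiles = [ { ID: 102, status: 2 }, { ID: 201, status: 1 },
                   { ID: 302, status: 2 }, { ID: 403, status: 3 } ];
const users = [ { name: "Bob", profileID: 102 },
                { name: "Alice", profileID: 403 },
                { name: "Eve" } ];

let docsExamined = 0;
const joined = users.map(u => {
  docsExamined++; // read the local document
  const matches = profiles.filter(p => {
    docsExamined++; // every profile is re-read on every outer iteration
    return p.ID === u.profileID;
  });
  return { ...u, profile: matches.map(m => ({ status: m.status })) };
});
console.log("docsExamined:", docsExamined); // prints: docsExamined: 15
```

With 3 users and 4 profiles this is 3 + 3 × 4 = 15 documents, the same quadratic pattern that produced 1,300,130 documents above.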
Indexed nested loop with covering index
If your key is in a field other than "_id", like "ID" here, you can create an index on it and the plan will use an indexed nested loop join. This gives the same performance as the query using "_id" because, except for clustered collections, all indexes in MongoDB are secondary and access the document via an internal record identifier. Using "_id" avoids creating one more index. However, one reason to create a separate index is to have it cover all necessary fields:
db.profiles.createIndex({ ID: 1, status: 1 })
Now the previous query, using "ID" as the foreign field, will use this index and get the same performance as the indexed nested loop join above on "_id". However, it is not yet used as a covering index because no projection is defined.
x=db.users.aggregate([
{ "$lookup" : {
"from" : "profiles",
"localField" : "profileID",
"foreignField" : "ID",
"as" : "profile" }},
{
$match: {
$or: [
{ "profile": { $eq: [] } },
{ "profile": { $elemMatch: { status: 2 } } },
]
}
},
]).explain("executionStats")
xs=x["stages"][0]["$cursor"]["executionStats"];
xp=x["stages"][0]["$cursor"]["queryPlanner"]["winningPlan"]["queryPlan"];
print("nReturned:", xs["nReturned"],
"executionTimeMillis:", xs["executionTimeMillis"],
"totalKeysExamined:", xs["totalKeysExamined"],
"totalDocsExamined:", xs["totalDocsExamined"],
xp["stage"]+" strategy:", xp["strategy"],
)
Here are the execution statistics, the same as with the index on "_id", running in 2.5 seconds:
nReturned: 260026
executionTimeMillis: 2484
totalKeysExamined: 190019
totalDocsExamined: 450045
EQ_LOOKUP strategy: IndexedLoopJoin
I can add a projection to the lookup pipeline, but be careful, it might be slower:
x=db.users.aggregate([
{ "$lookup" : {
"from" : "profiles",
"localField" : "profileID",
"foreignField" : "ID",
pipeline: [
{ $project: { _id: 0, ID: 1, status: 1 } }
],
"as" : "profile" }},
{
$match: {
$or: [
{ "profile": { $eq: [] } },
{ "profile": { $elemMatch: { status: 2 } } },
]
}
},
] ).explain("executionStats")
xs=x["stages"][1];
The index is used as a covering index but, unfortunately, this doesn't use the Slot-Based Query Execution Engine (SBE) and ends up taking longer:
{
'$lookup': {
from: 'profiles',
as: 'profile',
localField: 'profileID',
foreignField: 'ID',
let: {},
pipeline: [ { '$project': { _id: 0, ID: 1, status: 1 } } ]
},
totalDocsExamined: Long('0'),
totalKeysExamined: Long('190019'),
collectionScans: Long('0'),
indexesUsed: [ 'ID_1_status_1' ],
nReturned: Long('260026'),
executionTimeMillisEstimate: Long('21309')
}
The absence of a collection scan (collectionScans: 0) confirms that an index is used, and that it is the index I created (indexesUsed: [ 'ID_1_status_1' ]). The absence of documents examined (totalDocsExamined: 0) confirms that it is used as a covering index. However, it took 21 seconds to execute.
When a pipeline is added to the lookup stage, SBE can no longer be used, so the classic engine plans each lookup query at runtime, which explains the longer elapsed time. SBE's optimized execution model for joins requires the join to be completely specified at planning time, with only localField/foreignField and no pipeline (determineSbeCompatibility()). In contrast, covering indexes need a pipeline for runtime projection control, making it impossible to use both optimizations together. Additionally, the current version (findSbeCompatibleStagesForPushdown) limits lookup pushdown to fully local collections and considers whether an unwind follows. If you know there will be only one element, use { $arrayElemAt: ["$profiles.status", 0] } instead of $unwind: "$profiles" for better efficiency.
Nested loop with materialization
There is one more possibility that can be faster than a nested loop join: similar to a hash join, but without building a hash table, it materializes the foreign rows once as an array per document and performs the join in a projection. It is comparable to PostgreSQL's Nested Loop with Materialize.
To read the lookup collection only once, we use a pipeline with no join condition. Since I want to exercise the covering index, instead of an empty pipeline (pipeline: [ ]) I add a filter and a projection. This attaches all profiles to each user. I then define the join in a projection after the lookup, where I find the array position of the profile with $indexOfArray and read the status at that position:
x=db.users.aggregate([
{
$lookup: {
from: "profiles",
let: { pid: "$profileID" },
pipeline: [
{ $match: { ID: { $ne: null } } },
{ $project: { _id: 0, ID: 1, status: 1 } }
],
as: "profiles"
}
},
{
$project: {
status: {
$arrayElemAt: [
"$profiles.status",
{ $indexOfArray: ["$profiles.ID", "$profileID"] }
]
},
matchFound: {
$ne: [
{ $indexOfArray: ["$profiles.ID", "$profileID"] },
-1
]
}
}
},
{
$match: {
$or: [
{ matchFound: false }, // no profile found
{ status: 2 }
]
}
}
]).explain("executionStats")
The execution takes 3.5 seconds without the SBE, and the lookup stage reads the profiles only once, from the covering index:
test> xs=x["stages"][1]
{
'$lookup': {
from: 'profiles',
as: 'profiles',
let: { pid: '$profileID' },
pipeline: [
{ '$match': { ID: { '$ne': null } } },
{ '$project': { _id: 0, ID: 1, status: 1 } }
]
},
totalDocsExamined: Long('0'),
totalKeysExamined: Long('4'),
collectionScans: Long('0'),
indexesUsed: [ 'ID_1_status_1' ],
nReturned: Long('260026'),
executionTimeMillisEstimate: Long('3447')
}
Rather than being a native join algorithm, this approach is similar to executing two separate queries and joining the results in the application, except that here the join is performed with aggregation pipeline operators. It reads fewer documents and uses the covering index, but doesn't benefit from the SBE optimizations.
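The $indexOfArray / $arrayElemAt join can be mimicked in plain JavaScript (illustrative names; the two arrays stand in for "$profiles.ID" and "$profiles.status" after the lookup):

```javascript
// Plain-JavaScript analogue of the materialized array join: the foreign
// rows are read once into parallel arrays, then each local document
// looks up its key's position in those arrays.
const profileIDs   = [102, 201, 302, 403]; // plays "$profiles.ID"
const profileStats = [2,   1,   2,   3  ]; // plays "$profiles.status"
const users = [ { name: "Bob", profileID: 102 },
                { name: "Eve" } ];

const joined = users.map(u => {
  const i = profileIDs.indexOf(u.profileID);        // like $indexOfArray
  return {
    name: u.name,
    status: i !== -1 ? profileStats[i] : undefined, // like $arrayElemAt
    matchFound: i !== -1,
  };
});
console.log(joined);
```

Note that the probe is a linear scan of the array per user, so this only pays off when the foreign collection is tiny, exactly as in the demo.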
Summary
Like with any database, it is important to understand the join algorithms. When you use a lookup in the aggregation pipeline, MongoDB selects the join strategy based on the query, collections, and existing indexes. The three algorithms are:
Indexed Loop Join is used when a compatible index exists on foreignField. It is best for a low to medium match rate or when the foreign collection is large.
Hash Join is used when allowDiskUse is set, and there is no compatible index, but only when the foreign collection is small (< 10,000 docs, < 100MB). It is best with a high match rate on lots of documents.
Nested Loop Join is a fallback when the previous ones cannot be used. It is acceptable only for tiny foreign collections.
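The selection rules above can be condensed into a small decision function. This is a hedged, simplified model of the behavior described in this article, not MongoDB source code; the thresholds mirror the documented defaults (10,000 docs / 100 MB):

```javascript
// Simplified model of MongoDB 8.0's $lookup strategy selection:
// compatible index -> IndexedLoopJoin; otherwise a small foreign
// collection with allowDiskUse -> HashJoin; otherwise NestedLoopJoin.
function chooseLookupStrategy({ hasForeignIndex, allowDiskUse,
                                foreignDocCount, foreignDataBytes }) {
  if (hasForeignIndex) return "IndexedLoopJoin";
  if (allowDiskUse &&
      foreignDocCount <= 10_000 &&
      foreignDataBytes <= 100 * 1024 * 1024) return "HashJoin";
  return "NestedLoopJoin";
}

console.log(chooseLookupStrategy({ hasForeignIndex: false, allowDiskUse: true,
                                   foreignDocCount: 4, foreignDataBytes: 200 }));
// prints: HashJoin
```

This matches the three experiments above: the "_id" and "ID" indexes produced IndexedLoopJoin, the unindexed lookup with allowDiskUse: true produced HashJoin, and disabling disk use fell back to NestedLoopJoin.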
Unlike SQL databases, where the optimizer makes all decisions but can also lead to surprises, MongoDB shifts responsibility to developers. You must:
- Design your schema with join performance in mind. For bounded relationships, embedding may be the best solution.
- Understand your data (collection sizes, match rates) to predict which strategy will be the best.
- Test different strategies by creating/dropping indexes and toggling allowDiskUse. Create an index only when you want to use it.
- Measure performance using explain("executionStats") to validate your choices.
This design favors predictability and control over relying on automatic optimization. So when you hear statements like “Joins are slow” or “Lookups are fast,” take time to understand the facts and how these operations are actually executed, before forming an opinion.