3 Steps to Optimize Your Queries for Speed

When performance matters, how you store your application objects, write queries, and design indexes can be the difference between scanning a collection for minutes and getting results in milliseconds—without touching a single document.

In this article, we’ll walk through an example in MongoDB. We’ll start with a normalized model, move to an embedded model, add an index to avoid a full collection scan, and then refine both the index and query to read only what’s needed for the result. The execution plan and execution statistics will guide us at each step.

Scenario

Our domain is an e‑commerce order‑entry application with customers, products, orders, and order lines. We want to:

Find the most recent order in Switzerland containing the book “MongoDB Data Modeling and Schema Design”.

Initial model: normalized with references

If we build the data model without considering how applications will access it—or simply migrate from a relational database by creating one collection per table—we end up with a model where a single domain aggregate is split across multiple collections.

In this model, we insert a customer, some products (books), an order for this customer, and the corresponding order lines for those books:

// Customer  
db.customers.insertOne({  
  "_id": "C0001",  
  "name": "Franck Pachot",  
  "email": "franck.pachot@example.com",  
  "address": { "city": "Geneva", "country": "Switzerland" }  
})  

// Products  
db.products.insertOne({  
  "_id": "P0001",  
  "title": "MongoDB in Action, Third Edition: Building on the Atlas Data Platform",  
  "author": "Arek Borucki",  
  "price": 59.99, "category": "Database", "format": "Paperback"  
})  
db.products.insertOne({  
  "_id": "P0002",  
  "title": "MongoDB Data Modeling and Schema Design",  
  "author": ["Daniel Coupal", "Pascal Desmarets", "Steve Hoberman"],  
  "price": 54.99, "category": "Database", "format": "Paperback"  
})  
db.products.insertOne({  
  "_id": "P0003",  
  "title": "High Performance with MongoDB: Best Practices for Performance Tuning, Scaling, and Architecture",  
  "author": ["Asya Kamsky", "Ger Hartnett", "Alex Bevilacqua"],  
  "price": 49.99, "category": "Database", "format": "Paperback"  
})  

// Order  
db.orders.insertOne({  
  "_id": "O0001",  
  "customerId": "C0001",  
  "orderDate": ISODate("2024-10-22T09:00:00Z"),  
  "status": "Processing"  
})  

// Order lines  
db.orderlines.insertMany([  
  { "orderId": "O0001", "productId": "P0001", "quantity": 1 },  
  { "orderId": "O0001", "productId": "P0002", "quantity": 1 },  
  { "orderId": "O0001", "productId": "P0003", "quantity": 1 }  
])


This is a classic relational database pattern, but querying and optimizing it is complex because joins must occur before filtering.

In MongoDB, to rebuild the domain object aggregate from such a model, we use an aggregation pipeline with $lookup joins:

db.orders.aggregate([  
  // Get customer info to filter by country  
  { $lookup: {  
      from: "customers",  
      localField: "customerId",  
      foreignField: "_id",  
      as: "customer"  
    }},  
  { $unwind: "$customer" },  

  // Get product info to filter by title  
  { $lookup: {  
      from: "orderlines",  
      let: { orderIdVar: "$_id" },  
      pipeline: [  
        { $match: { $expr: { $eq: ["$orderId", "$$orderIdVar"] } } },  
        { $lookup: {  
            from: "products",  
            localField: "productId",  
            foreignField: "_id",  
            as: "product"  
          }},  
        { $unwind: "$product" }  
      ],  
      as: "orderLines"  
    }},  

  // Apply filters  
  { $match: {  
      "customer.address.country": "Switzerland",  
      "orderLines.product.title": "MongoDB Data Modeling and Schema Design"  
    }},  

  // Sort and limit  
  { $sort: { orderDate: -1 } },  
  { $limit: 1 }  
])  

This query works, but it scans the entire orders collection. Indexes can’t help here because the $match runs only after the $lookup and $unwind stages have materialized the joined fields.

With one million orders, it would read all one million, perform three million lookups, filter on country and book title, sort a large intermediate result, and then return just one document.
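
We can verify this by asking for the execution plan of the aggregation itself. Here is a minimal sketch on the sample data above, shortened to the first lookup: the winning plan starts with a COLLSCAN on orders, and no index on the filtered fields can change that.

// Explain the pipeline: the country filter can only run after
// $lookup and $unwind, so the scan of "orders" is a COLLSCAN.
db.orders.explain("executionStats").aggregate([
  { $lookup: { from: "customers", localField: "customerId",
               foreignField: "_id", as: "customer" } },
  { $unwind: "$customer" },
  { $match: { "customer.address.country": "Switzerland" } }
])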

That’s why joins can be slow: when data is split across many tables or collections, you lose the benefit of efficient index access, since a compound index can only cover fields from a single collection. It also forces you to hand-code every query in the domain repository.

We need to rethink the data model.


Step 1: Embedding to fit the domain model

Knowing the database is for an order‑entry system, we can design a document model that matches the domain. An order should be a self‑contained document with all customer info and complete product details for each order line—because that’s how it’s displayed, printed, or emailed. If it’s a single logical document on paper, it should be a single document in MongoDB.

Here’s the embedded form:

db.orders.insertOne(
  {
    _id: 'O0002',
    customerId: 'C0001',
    orderDate: ISODate('2024-10-22T09:00:00.000Z'),
    status: 'Processing',
    customer: {
      _id: 'C0001',
      name: 'Franck Pachot',
      email: 'franck.pachot@example.com',
      address: { city: 'Geneva', country: 'Switzerland' }
    },
    orderLines: [
      {
        quantity: 1,
        product: {
          _id: 'P0001',
          title: "'MongoDB in Action, Third Edition: Building on the Atlas Data Platform',"
          author: [ 'Arek Borucki' ],
          price: 59.99,
          category: 'Database',
          format: 'Paperback'
        }
      },
      {
        quantity: 1,
        product: {
          _id: 'P0002',
          title: "'MongoDB Data Modeling and Schema Design',"
          author: [ 'Daniel Coupal', 'Pascal Desmarets', 'Steve Hoberman' ],
          price: 54.99,
          category: 'Database',
          format: 'Paperback'
        }
      },
      {
        quantity: 1,
        product: {
          _id: 'P0003',
          title: "'High Performance with MongoDB: Best practices for performance tuning, scaling, and architecture',"
          author: [ 'Asya Kamsky', 'Ger Hartnett', 'Alex Bevilacqua' ],
          price: 49.99,
          category: 'Database',
          format: 'Paperback'
        }
      }
    ]
  }
);


Duplicated data isn’t a concern here: documents are compressed on disk, and changes to product descriptions or a customer’s address don’t affect completed orders. Systems of record store data and relationships as they were at the time of the business event.

Querying becomes far simpler:

db.orders.find(  
  {  
    "customer.address.country": "Switzerland",  
    "orderLines.product.title": "MongoDB Data Modeling and Schema Design"  
  }  
).sort({ orderDate: -1 }).limit(1)


No lookups needed. But it still does a collection scan, filters, sorts, and finally returns one document.

I add one million documents and run the query with execution statistics:

// Synthetic filler: none of these documents contain the book title,
// so only the embedded order O0002 matches the full filter
db.orders.insertMany(Array.from({length: 333333}, () => ({customer: {address: {country: "Switzerland"}}})))
db.orders.insertMany(Array.from({length: 666666}, () => ({customer: {address: {country: "Germany"}}})))

db.orders.find(  
  {  
    "customer.address.country": "Switzerland",  
    "orderLines.product.title": "MongoDB Data Modeling and Schema Design"  
  }  
).sort({ orderDate: -1 }).limit(1).explain("executionStats").executionStats

{
  executionSuccess: true,
  nReturned: 1,                                -- 👀
  executionTimeMillis: 559,
  totalKeysExamined: 0,                        -- 👀
  totalDocsExamined: 1000001,                  -- 👀
  executionStages: {
    isCached: false,
    stage: 'SORT',                             -- 👀
    nReturned: 1,
    executionTimeMillisEstimate: 542,
    works: 1000004,                            -- 👀
    advanced: 1,
    needTime: 1000002,
    needYield: 0,
    saveState: 29,
    restoreState: 29,
    isEOF: 1,
    sortPattern: { orderDate: -1 },
    memLimit: 104857600,
    limitAmount: 1,
    type: 'simple',
    totalDataSizeSorted: 0,
    usedDisk: false,
    spills: 0,
    spilledRecords: 0,
    spilledBytes: 0,
    spilledDataStorageSize: 0,
    inputStage: {
      stage: 'COLLSCAN',                       -- 👀
      filter: {                                -- 👀
        '$and': [
          { 'customer.address.country': { '$eq': 'Switzerland' } },
          {
            'orderLines.product.title': { '$eq': 'MongoDB Data Modeling and Schema Design' }
          }
        ]
      },
      nReturned: 1,                            -- 👀
      executionTimeMillisEstimate: 516,
      works: 1000002,                          -- 👀
      advanced: 1,
      needTime: 1000000,
      needYield: 0,
      saveState: 29,
      restoreState: 29,
      isEOF: 1,
      direction: 'forward',
      docsExamined: 1000001                    -- 👀
    }
  }
}

In order to return a single document (nReturned: 1), one million documents were read (totalDocsExamined: 1000001), and sorted (stage: 'SORT', works: 1000004).

Now that we have everything in one collection, we can optimize with an index. Our objective is to reduce totalDocsExamined down to nReturned.
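
While tuning, a small convenience helper (my own shell function, not a MongoDB API) makes this objective easy to track by printing documents examined per document returned:

// Convenience helper: documents examined per document returned.
// A ratio of 1 means the query reads nothing it doesn't return.
function examineRatio(explained) {
  const s = explained.executionStats;
  print(`keysExamined: ${s.totalKeysExamined}, docsExamined: ${s.totalDocsExamined}, ` +
        `nReturned: ${s.nReturned}, ratio: ${s.totalDocsExamined / Math.max(s.nReturned, 1)}`);
}

examineRatio(
  db.orders.find({
    "customer.address.country": "Switzerland",
    "orderLines.product.title": "MongoDB Data Modeling and Schema Design"
  }).sort({ orderDate: -1 }).limit(1).explain("executionStats")
)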


Step 2: Indexing for filtering, sorting, and pagination

Our query filters by:

  • Equality on customer country
  • Equality on product title
  • A range on order date (due to sort().limit())

Here is the ideal compound index for those filters:

db.orders.createIndex({  
  "customer.address.country": 1,   // equality  
  "orderLines.product.title": 1,   // equality  
  "orderDate": 1                   // sort/range  
})  

Here is the execution plan:

db.orders.find(  
  {  
    "customer.address.country": "Switzerland",  
    "orderLines.product.title": "MongoDB Data Modeling and Schema Design"  
  }  
).sort({ orderDate: -1 }).limit(1).explain("executionStats").executionStats

{
  executionSuccess: true,
  nReturned: 1,                                -- 👀
  executionTimeMillis: 0,
  totalKeysExamined: 1,                        -- 👀
  totalDocsExamined: 1,                        -- 👀
  executionStages: {
    isCached: false,
    stage: 'LIMIT',
    nReturned: 1,
    executionTimeMillisEstimate: 0,
    works: 2,
    advanced: 1,
    needTime: 0,
    needYield: 0,
    saveState: 0,
    restoreState: 0,
    isEOF: 1,
    limitAmount: 1,
    inputStage: {
      stage: 'FETCH',                          -- 👀
      nReturned: 1,
      executionTimeMillisEstimate: 0,
      works: 1,
      advanced: 1,
      needTime: 0,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 0,
      docsExamined: 1,
      alreadyHasObj: 0,
      inputStage: {
        stage: 'IXSCAN',                       -- 👀
        nReturned: 1,
        executionTimeMillisEstimate: 0,
        works: 1,
        advanced: 1,
        needTime: 0,
        needYield: 0,
        saveState: 0,
        restoreState: 0,
        isEOF: 0,
        keyPattern: {                          -- 👀
          'customer.address.country': 1,
          'orderLines.product.title': 1,
          orderDate: 1
        },
        indexName: 'customer.address.country_1_orderLines.product.title_1_orderDate_1',
        isMultiKey: true,
        multiKeyPaths: {
          'customer.address.country': [],
          'orderLines.product.title': [ 'orderLines' ],
          orderDate: []
        },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'backward',                 -- 👀
        indexBounds: {
          'customer.address.country': [ '["Switzerland", "Switzerland"]' ],
          'orderLines.product.title': [
            '["MongoDB Data Modeling and Schema Design", "MongoDB Data Modeling and Schema Design"]'
          ],
          orderDate: [ '[MaxKey, MinKey]' ]    -- 👀
        },
        keysExamined: 1,                       -- 👀
        seeks: 1,                              -- 👀
        dupsTested: 1,
        dupsDropped: 0
      }
    }
  }
}

With this index, the query jumps straight to the target document:

  • totalKeysExamined: 1 from one range (seeks: 1 in stage: 'IXSCAN')
  • totalDocsExamined: 1 from stage: 'FETCH' without the need to sort
  • nReturned: 1

The result is retrieved in sorted order directly from the index, minimizing the number of documents to read. The same index also serves pagination, as the step title promises: a hypothetical keyset query can resume below the last orderDate seen, as sketched below.
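
// Hypothetical "next page" query: resume below the last orderDate
// from the previous page. The two equalities, the range, and the
// sort order are all served by the same compound index.
const lastSeen = ISODate("2024-10-22T09:00:00Z");  // from the previous page
db.orders.find({
  "customer.address.country": "Switzerland",
  "orderLines.product.title": "MongoDB Data Modeling and Schema Design",
  orderDate: { $lt: lastSeen }
}).sort({ orderDate: -1 }).limit(10)

We can still do better: our next goal is to eliminate totalDocsExamined entirely.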


Step 3: Projection and covering index

If the application only needs the customer’s name and country, we can project:

db.orders.find(  
  {  
    "customer.address.country": "Switzerland",  
    "orderLines.product.title": "MongoDB Data Modeling and Schema Design"  
  },  
  { "customer.name": 1, "customer.address.country": 1, _id: 0 }  
).sort({ orderDate: -1 })

[
  {
    customer: { name: 'Franck Pachot', address: { country: 'Switzerland' } }
  }
]

Now, we can create a covering index:

db.orders.createIndex({  
  "customer.address.country": 1,  
  "orderLines.product.title": 1,  
  "orderDate": 1,  
  "customer.name": 1  
})  

Here is the execution plan:

db.orders.find(  
  {  
    "customer.address.country": "Switzerland",  
    "orderLines.product.title": "MongoDB Data Modeling and Schema Design"  
  },  
  { "customer.name": 1, "customer.address.country": 1, _id: 0 }  
).sort({ orderDate: -1}).limit(1).explain("executionStats").executionStats

{
  executionSuccess: true,
  nReturned: 1,                                -- 👀
  executionTimeMillis: 1,
  totalKeysExamined: 1,                        -- 👀
  totalDocsExamined: 0,                        -- 👀
  executionStages: {
    isCached: false,
    stage: 'LIMIT',
    nReturned: 1,
    executionTimeMillisEstimate: 0,
    works: 2,
    advanced: 1,
    needTime: 0,
    needYield: 0,
    saveState: 0,
    restoreState: 0,
    isEOF: 1,
    limitAmount: 1,
    inputStage: {
      stage: 'PROJECTION_DEFAULT',             -- 👀
      nReturned: 1,
      executionTimeMillisEstimate: 0,
      works: 1,
      advanced: 1,
      needTime: 0,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 0,
      transformBy: { 'customer.name': 1, 'customer.address.country': 1, _id: 0 },
      inputStage: {
        stage: 'IXSCAN',                       -- 👀
        nReturned: 1,
        executionTimeMillisEstimate: 0,
        works: 1,
        advanced: 1,
        needTime: 0,
        needYield: 0,
        saveState: 0,
        restoreState: 0,
        isEOF: 0,
        keyPattern: {
          'customer.address.country': 1,
          'orderLines.product.title': 1,
          orderDate: 1,
          'customer.name': 1
        },
        indexName: 'customer.address.country_1_orderLines.product.title_1_orderDate_1_customer.name_1',
        isMultiKey: true,
        multiKeyPaths: {
          'customer.address.country': [],
          'orderLines.product.title': [ 'orderLines' ],
          orderDate: [],
          'customer.name': []
        },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'backward',                 -- 👀
        indexBounds: {
          'customer.address.country': [ '["Switzerland", "Switzerland"]' ],
          'orderLines.product.title': [
            '["MongoDB Data Modeling and Schema Design", "MongoDB Data Modeling and Schema Design"]'
          ],
          orderDate: [ '[MaxKey, MinKey]' ],
          'customer.name': [ '[MaxKey, MinKey]' ]   -- 👀
        },
        keysExamined: 1,
        seeks: 1,
        dupsTested: 1,
        dupsDropped: 0
      }
    }
  }
}

This is the optimal plan: all documents returned (nReturned: 1) come straight from the index entries scanned (totalKeysExamined: 1), without any FETCH (totalDocsExamined: 0).

We can still optimize—for example, by making the "orderDate" index descending, as forward scans can be slightly faster than backward ones. But here, where only one index entry is read, it doesn’t matter. What matters most is knowing when to stop tuning and keeping only the minimal set of indexes that delivers the required performance for your application.
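
For reference, the descending variant is a one-key change. A sketch (not worth an extra index here, as explained above):

// Same keys with orderDate descending: sort({ orderDate: -1 }) is
// then served by a forward index scan instead of a backward one.
db.orders.createIndex({
  "customer.address.country": 1,
  "orderLines.product.title": 1,
  "orderDate": -1,
  "customer.name": 1
})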

This compound index, in either direction, supports multiple query shapes (a few are sketched after the list). It lets you:

  • Find orders from a specific country
  • Find orders containing a specific product title, combined with the country filter, or alone (with skip scan)
  • Filter further by date range
  • Sort orders by date (ascending or descending)
  • Filter by customer name
  • Retrieve the customer name, country, and order date without accessing the full document
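
Here are a few of those query shapes as sketches, reusing values from the sample data:

// Orders from a specific country (leading index key)
db.orders.find({ "customer.address.country": "Switzerland" })

// Both equalities plus a date range, returned in index order
db.orders.find({
  "customer.address.country": "Switzerland",
  "orderLines.product.title": "MongoDB Data Modeling and Schema Design",
  orderDate: { $gte: ISODate("2024-01-01T00:00:00Z") }
}).sort({ orderDate: 1 })

// Covered query: name, country, and date come from index entries only
db.orders.find(
  {
    "customer.address.country": "Switzerland",
    "orderLines.product.title": "MongoDB Data Modeling and Schema Design"
  },
  { "customer.name": 1, "customer.address.country": 1, orderDate: 1, _id: 0 }
)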

Conclusion

MongoDB query performance isn’t just about adding indexes—it’s about aligning your data model and queries with how your application really uses data. Turning a four-collection join with millions of lookups into a single index scan that runs in milliseconds comes down to three rules:

  1. Model documents as aggregates in your domain layer
    Embed data that’s always accessed together. If it fits on the same “piece of paper” in the real world—like an order with customer and product details—it belongs in one MongoDB document. This simplifies queries, reduces totalDocsExamined, and adds more indexing possibilities.

  2. Design indexes for your query access patterns
    Use compound indexes that serve your filters and pagination order (ESR guideline). This reduces totalKeysExamined and totalDocsExamined.

  3. Refine repositories to query only the required fields
    Project only the fields you need. Fully covered queries (all fields in the index) can achieve totalDocsExamined: 0 and return results directly from the index — fast and resource-efficient.

You don’t need to apply this to all queries—use these steps only when necessary. Excessive lookups and limited indexing options drove us to optimize the data model. After that, the remaining need for selective filters and the absence of a suitable index led to creating a new one. A covering projection wasn’t needed here because we only fetch one document. It becomes useful mainly when document fetch time is a significant part of the response time, or when we want to avoid loading large documents that increase memory pressure.
