DEV Community

Dmitry Romanoff
Dmitry Romanoff

Posted on

Querying a Nested Array with `elemMatch` in Amazon DocumentDB

Let's say you have a MongoDB-like database (Amazon DocumentDB) where your collections include documents with nested arrays. You want to query documents where an element in the nested array matches a specific condition. One of the most efficient ways to do this is by using the elemMatch operator, and also indexing the fields involved to improve query performance.

Scenario

You're working with a collection called orders in your DocumentDB database, which stores customer orders. Each order document contains an array of items, and each item has several fields such as product_id, quantity, and price.

Example Document
{
  "_id": "order123",
  "customer_id": "cust_001",
  "order_date": "2023-07-10",
  "items": [
    {
      "product_id": "prod_001",
      "quantity": 2,
      "price": 20
    },
    {
      "product_id": "prod_002",
      "quantity": 1,
      "price": 40
    }
  ]
}
Enter fullscreen mode Exit fullscreen mode
Problem

You want to efficiently query all orders that contain an item with a specific product_id and quantity that meets certain conditions, for example, orders that contain at least one item with product_id of "prod_001" and quantity greater than 1.

db.orders.find({
  "items": {
    "$elemMatch": {
      "product_id": "prod_001",
      "quantity": { "$gt": 1 }
    }
  }
})
Enter fullscreen mode Exit fullscreen mode

This query will return orders where at least one of the items has product_id of "prod_001" and quantity greater than 1.

Solution

To improve the performance of the query, you'll need to ensure that DocumentDB can use an index to speed up lookups. Here's how you can index it.

Step 1: Indexing with elemMatch and Compound Indexes

When querying nested arrays like this, a compound index on the array field is critical for performance. For your orders collection, you can create an index that includes the items.product_id and items.quantity fields. This will allow DocumentDB to quickly find the specific items within the array that match your elemMatch query condition.

Index Creation
db.orders.createIndex({
  "items.product_id": 1,
  "items.quantity": 1
})
Enter fullscreen mode Exit fullscreen mode
Explanation:
  • items.product_id: 1: This creates an index on the product_id field inside the items array.
  • items.quantity: 1: This creates an index on the quantity field inside the items array.
  • The compound index on both fields will help optimize the query that looks for documents containing items with specific product IDs and quantities.

Step 2: Query with elemMatch

Now you can run your elemMatch query, and DocumentDB will use the compound index to speed up access to the matching documents.

db.orders.find({
  "items": {
    "$elemMatch": {
      "product_id": "prod_001",
      "quantity": { "$gt": 1 }
    }
  }
})
Enter fullscreen mode Exit fullscreen mode

This query will return all orders that contain at least one item with a product_id of "prod_001" and a quantity greater than 1. Because of the compound index on items.product_id and items.quantity, DocumentDB can efficiently perform the query even if the orders collection is large.

Step 3: Index Monitoring and Optimization

Once the index is created, it's important to monitor the query performance over time and adjust your indexes as needed. You can do this using the following:

Profile the query: Use the explain() method to check how DocumentDB is executing your query.

db.orders.find({
  "items": {
    "$elemMatch": {
      "product_id": "prod_001",
      "quantity": { "$gt": 1 }
    }
  }
}).explain("executionStats")
Enter fullscreen mode Exit fullscreen mode

Adjust indexes: If the query still isn't fast enough, consider:

  • Adding additional fields to the index if you frequently query by other attributes (e.g., order_date or customer_id).
  • Ensure indexes are balanced and not too many to avoid overhead.

Bonus: Additional Indexing Considerations

  • Partial Index: If you're only interested in certain types of orders (e.g., only orders from a specific customer_id or within a date range), you can use a partial index that includes only those documents that meet certain criteria. This can save space and improve query performance.

Example:

db.orders.createIndex(
  { "items.product_id": 1, "items.quantity": 1 },
  { partialFilterExpression: { "order_date": { "$gte": new ISODate("2023-01-01") } } }
)
Enter fullscreen mode Exit fullscreen mode

This index would only be used for orders with an order_date after January 1, 2023, improving query performance for those specific orders.

Conclusion

To summarize:

  • Use elemMatch for querying nested arrays in DocumentDB.
  • Create a compound index on the fields inside the array that are frequently queried.
  • Monitor performance using explain() and adjust indexes as necessary.
  • Consider partial indexes for more specific query patterns to optimize storage and performance further.

Top comments (0)