Dmitry Romanoff

Posted on Jul 12

Boosting Performance with AWS DocumentDB: Exercises in Data Modeling and Optimization

#database #aws #programming #performance

When working with AWS DocumentDB, it's essential to understand how to efficiently query and manipulate large datasets. In this article, we'll go through several practical exercises, showcasing common use cases, and providing examples of how to optimize your database performance using indexes. We’ll focus on operations like inserting documents, performing aggregations, and comparing query performance before and after adding indexes. Let’s dive in!

1. Inserting Multiple Documents into AWS DocumentDB

In many real-world applications, you’ll need to handle large datasets, such as orders and customer transactions. Below, we simulate inserting 1000 orders, each containing a list of items purchased by the customer. The goal here is to bulk-insert data into AWS DocumentDB using the insertMany function.

Example: Bulk Inserting Orders

let orders = [];

for (let i = 1; i <= 1000; i++) {
  let numItems = Math.floor(Math.random() * 301); // Random number of items between 0 and 300

  let items = [];
  for (let j = 0; j < numItems; j++) {
    items.push({
      "product_id": "prod_" + String(Math.floor(Math.random() * 10) + 1).padStart(3, '0'),
      "quantity": Math.floor(Math.random() * 5) + 1, // Random quantity between 1 and 5
      "price": Math.floor(Math.random() * 50) + 10  // Random price between 10 and 50
    });
  }

  let order = {
    "_id": "order" + String(i).padStart(3, '0'),
    "customer_id": "cust_" + String(Math.floor(Math.random() * 100) + 1).padStart(3, '0'),
    "order_date": new ISODate("2023-07-" + String(i % 31 + 1).padStart(2, '0') + "T00:00:00Z"),
    "items": items // The dynamically generated items array
  };

  orders.push(order);
}

// Now insert all orders into the collection
db.orders.insertMany(orders);

In this script:

We generate 1000 orders, each with a random number of items.
The items have random product IDs, quantities, and prices.
After creating the orders, we use insertMany to bulk insert the documents into the orders collection.

Once the insert operation completes, you can check the document count using:

db.orders.countDocuments();

2. Querying Specific Product in Orders

Let’s now query for orders that contain a specific product, such as prod_006. This is a typical operation when you need to analyze how many times a product appears across all orders.

Example: Query with \$elemMatch

db.orders.countDocuments({
  "items": {
    $elemMatch: {
      "product_id": "prod_006"
    }
  }
});

In this query:

We use $elemMatch to find documents where at least one item in the items array matches the product ID prod_006.

3. Querying for Products with Specific Quantities

Another common query involves filtering based on both product and quantity. Let’s say we want to find orders that contain prod_007 with quantities greater than 1.

Example: Query with Multiple Conditions

db.orders.countDocuments({
  "items": {
    $elemMatch: {
      "product_id": "prod_007",
      "quantity": { $gt: 1 }
    }
  }
});

This query retrieves orders where:

The items array contains prod_007, and
The quantity of prod_007 is greater than 1.

4. Aggregation: Finding Min and Max Quantities

Aggregations are powerful tools for summarizing data. Let’s say we need to find the minimum and maximum quantities of a specific product, like prod_010, across all orders.

Example: Aggregation Pipeline with \$unwind

db.orders.aggregate([
  { 
    $unwind: "$items"  // Flatten the items array
  },
  { 
    $match: { 
      "items.product_id": "prod_010"  // Match documents where product_id is 'prod_010'
    }
  },
  { 
    $group: { 
      _id: null,  // Grouping all documents together
      minQuantity: { $min: "$items.quantity" },  // Find the minimum quantity
      maxQuantity: { $max: "$items.quantity" }   // Find the maximum quantity
    }
  }
]);

In this aggregation:

$unwind is used to flatten the items array, making it easier to work with individual products.
$match filters for orders containing prod_010.
$group then calculates the minimum and maximum quantities for prod_010.

5. Aggregating Quantities for Specific Product and Condition

Let’s take it further and calculate the total quantity of prod_007 when the quantity is less than 2.

Example: Aggregation with \$sum

db.orders.aggregate([
  { 
    $unwind: "$items"  // Flatten the items array
  },
  { 
    $match: { 
      "items.product_id": "prod_007",  // Match product_id = 'prod_007'
      "items.quantity": { $lt: 2 }     // Match quantity < 2
    }
  },
  { 
    $group: { 
      _id: null,  // Grouping all documents together
      totalQuantity: { $sum: "$items.quantity" }  // Sum the quantities of matching items
    }
  }
]);

Here:

We again use $unwind to flatten the items.
$match filters for items where the product is prod_007 and the quantity is less than 2.
Finally, $group sums the matching quantities.

6. Querying Specific Orders with \$unwind

Sometimes you might want to query a specific order based on its ID, customer, or order date, and also unwind the items array for further analysis. Here’s how you can do it.

Example: Querying Specific Orders

db.orders.aggregate([
  { 
    $match: { 
      "_id": "order007",  // Match the specific order by _id
      "customer_id": "cust_057",  // Optionally match by customer_id if needed
      "order_date": ISODate('2023-07-08T00:00:00.000Z')  // Match by order_date if needed
    }
  },
  { 
    $unwind: "$items"  // Unwind the items array
  }
]);

This query matches an order with the ID order007, belonging to customer cust_057, and orders placed on 2023-07-08. Then, it unwinds the items array so each item is returned as a separate document.

7. Comparing Query Performance: With and Without Indexes

Lastly, let’s discuss performance optimization. Without an index, searching through large arrays in documents can be slow. Let’s compare the performance of querying for prod_007 with and without an index on product_id and quantity.

Example: Query Performance Without Index

// Without index
const startWithoutIndex = new Date().getTime();  // Capture start time
const resultWithoutIndex = db.orders.find({
  "items": {
    $elemMatch: {
      "product_id": "prod_007",
      "quantity": { $gt: 1 }
    }
  }
});

const endWithoutIndex = new Date().getTime();  // Capture end time
const executionTimeWithoutIndex = endWithoutIndex - startWithoutIndex;  // Execution time in ms

print("Execution Time Without Index: " + executionTimeWithoutIndex + " ms");

Example: Creating an Index and Querying Again

// Create index
db.orders.createIndex({
  "items.product_id": 1,
  "items.quantity": 1
});

// With index
const startWithIndex = new Date().getTime();  // Capture start time
const resultWithIndex = db.orders.find({
  "items": {
    $elemMatch: {
      "product_id": "prod_007",
      "quantity": { $gt: 1 }
    }
  }
});
const endWithIndex = new Date().getTime();  // Capture end time
const executionTimeWithIndex = endWithIndex - startWithIndex;  // Execution time in ms

print("Execution Time With Index: " + executionTimeWithIndex + " ms");

// Compare the difference
const timeDifference = executionTimeWithoutIndex - executionTimeWithIndex;
print("Difference in Execution Time: " + timeDifference + " ms");

In this example:

We first capture the execution time of a query without an index.
Then, we create an index on product_id and quantity.
Finally, we compare the execution times, and you’ll see how much faster the query runs with the index in place.

Conclusion

Optimizing queries and understanding how to structure your data in AWS DocumentDB can make a significant difference in performance. Through these exercises, we've learned how to insert large datasets, perform common queries, aggregate data, and optimize performance with indexing. Always test performance before and after adding indexes to ensure your queries run efficiently at scale.

DEV Community