When working with AWS DocumentDB, it's essential to understand how to efficiently query and manipulate large datasets. In this article, we'll go through several practical exercises, showcasing common use cases, and providing examples of how to optimize your database performance using indexes. We’ll focus on operations like inserting documents, performing aggregations, and comparing query performance before and after adding indexes. Let’s dive in!
1. Inserting Multiple Documents into AWS DocumentDB
In many real-world applications, you’ll need to handle large datasets, such as orders and customer transactions. Below, we simulate inserting 1000 orders, each containing a list of items purchased by the customer. The goal here is to bulk-insert data into AWS DocumentDB using the insertMany
function.
Example: Bulk Inserting Orders
let orders = [];
for (let i = 1; i <= 1000; i++) {
let numItems = Math.floor(Math.random() * 301); // Random number of items between 0 and 300
let items = [];
for (let j = 0; j < numItems; j++) {
items.push({
"product_id": "prod_" + String(Math.floor(Math.random() * 10) + 1).padStart(3, '0'),
"quantity": Math.floor(Math.random() * 5) + 1, // Random quantity between 1 and 5
"price": Math.floor(Math.random() * 50) + 10 // Random price between 10 and 50
});
}
let order = {
"_id": "order" + String(i).padStart(3, '0'),
"customer_id": "cust_" + String(Math.floor(Math.random() * 100) + 1).padStart(3, '0'),
"order_date": new ISODate("2023-07-" + String(i % 31 + 1).padStart(2, '0') + "T00:00:00Z"),
"items": items // The dynamically generated items array
};
orders.push(order);
}
// Now insert all orders into the collection
db.orders.insertMany(orders);
In this script:
- We generate 1000 orders, each with a random number of items.
- The items have random product IDs, quantities, and prices.
- After creating the orders, we use
insertMany
to bulk insert the documents into theorders
collection.
Once the insert operation completes, you can check the document count using:
db.orders.countDocuments();
2. Querying Specific Product in Orders
Let’s now query for orders that contain a specific product, such as prod_006
. This is a typical operation when you need to analyze how many times a product appears across all orders.
Example: Query with \$elemMatch
db.orders.countDocuments({
"items": {
$elemMatch: {
"product_id": "prod_006"
}
}
});
In this query:
- We use
$elemMatch
to find documents where at least one item in theitems
array matches the product IDprod_006
.
3. Querying for Products with Specific Quantities
Another common query involves filtering based on both product and quantity. Let’s say we want to find orders that contain prod_007
with quantities greater than 1.
Example: Query with Multiple Conditions
db.orders.countDocuments({
"items": {
$elemMatch: {
"product_id": "prod_007",
"quantity": { $gt: 1 }
}
}
});
This query retrieves orders where:
- The
items
array containsprod_007
, and - The quantity of
prod_007
is greater than 1.
4. Aggregation: Finding Min and Max Quantities
Aggregations are powerful tools for summarizing data. Let’s say we need to find the minimum and maximum quantities of a specific product, like prod_010
, across all orders.
Example: Aggregation Pipeline with \$unwind
db.orders.aggregate([
{
$unwind: "$items" // Flatten the items array
},
{
$match: {
"items.product_id": "prod_010" // Match documents where product_id is 'prod_010'
}
},
{
$group: {
_id: null, // Grouping all documents together
minQuantity: { $min: "$items.quantity" }, // Find the minimum quantity
maxQuantity: { $max: "$items.quantity" } // Find the maximum quantity
}
}
]);
In this aggregation:
-
$unwind
is used to flatten theitems
array, making it easier to work with individual products. -
$match
filters for orders containingprod_010
. -
$group
then calculates the minimum and maximum quantities forprod_010
.
5. Aggregating Quantities for Specific Product and Condition
Let’s take it further and calculate the total quantity of prod_007
when the quantity is less than 2.
Example: Aggregation with \$sum
db.orders.aggregate([
{
$unwind: "$items" // Flatten the items array
},
{
$match: {
"items.product_id": "prod_007", // Match product_id = 'prod_007'
"items.quantity": { $lt: 2 } // Match quantity < 2
}
},
{
$group: {
_id: null, // Grouping all documents together
totalQuantity: { $sum: "$items.quantity" } // Sum the quantities of matching items
}
}
]);
Here:
- We again use
$unwind
to flatten the items. -
$match
filters for items where the product isprod_007
and the quantity is less than 2. - Finally,
$group
sums the matching quantities.
6. Querying Specific Orders with \$unwind
Sometimes you might want to query a specific order based on its ID, customer, or order date, and also unwind the items
array for further analysis. Here’s how you can do it.
Example: Querying Specific Orders
db.orders.aggregate([
{
$match: {
"_id": "order007", // Match the specific order by _id
"customer_id": "cust_057", // Optionally match by customer_id if needed
"order_date": ISODate('2023-07-08T00:00:00.000Z') // Match by order_date if needed
}
},
{
$unwind: "$items" // Unwind the items array
}
]);
This query matches an order with the ID order007
, belonging to customer cust_057
, and orders placed on 2023-07-08
. Then, it unwinds the items
array so each item is returned as a separate document.
7. Comparing Query Performance: With and Without Indexes
Lastly, let’s discuss performance optimization. Without an index, searching through large arrays in documents can be slow. Let’s compare the performance of querying for prod_007
with and without an index on product_id
and quantity
.
Example: Query Performance Without Index
// Without index
const startWithoutIndex = new Date().getTime(); // Capture start time
const resultWithoutIndex = db.orders.find({
"items": {
$elemMatch: {
"product_id": "prod_007",
"quantity": { $gt: 1 }
}
}
});
const endWithoutIndex = new Date().getTime(); // Capture end time
const executionTimeWithoutIndex = endWithoutIndex - startWithoutIndex; // Execution time in ms
print("Execution Time Without Index: " + executionTimeWithoutIndex + " ms");
Example: Creating an Index and Querying Again
// Create index
db.orders.createIndex({
"items.product_id": 1,
"items.quantity": 1
});
// With index
const startWithIndex = new Date().getTime(); // Capture start time
const resultWithIndex = db.orders.find({
"items": {
$elemMatch: {
"product_id": "prod_007",
"quantity": { $gt: 1 }
}
}
});
const endWithIndex = new Date().getTime(); // Capture end time
const executionTimeWithIndex = endWithIndex - startWithIndex; // Execution time in ms
print("Execution Time With Index: " + executionTimeWithIndex + " ms");
// Compare the difference
const timeDifference = executionTimeWithoutIndex - executionTimeWithIndex;
print("Difference in Execution Time: " + timeDifference + " ms");
In this example:
- We first capture the execution time of a query without an index.
- Then, we create an index on
product_id
andquantity
. - Finally, we compare the execution times, and you’ll see how much faster the query runs with the index in place.
Conclusion
Optimizing queries and understanding how to structure your data in AWS DocumentDB can make a significant difference in performance. Through these exercises, we've learned how to insert large datasets, perform common queries, aggregate data, and optimize performance with indexing. Always test performance before and after adding indexes to ensure your queries run efficiently at scale.
Top comments (0)