While going through a backend colleague's code on a project, I was amazed by how clean and simple the codebase was. I kept looking for the queries behind some of his implementations and wondering, "How did he do this?", until one day he told me to read about MongoDB Aggregation.
After reading up on it and understanding how MongoDB aggregation works, I was impressed enough to write an article about it.
Before I get started, here is what you will be familiar with after reading this article:
- What is aggregation in MongoDB
- How does the MongoDB aggregation pipeline work
- MongoDB aggregation pipeline syntax
- Practical example of MongoDB aggregation
What is aggregation in MongoDB?
Aggregation is a way of processing a large number of documents in a collection by passing them through a sequence of stages. Together, these stages are called a pipeline. The stages in a pipeline can filter, sort, group, reshape, and modify documents, and do much more.
Key Point🎯
- A pipeline can have one or more stages.
- The order of these stages is important.
- This aggregation happens within the database engine, enabling it to handle large datasets efficiently.
How Does the MongoDB Aggregation Pipeline Work?
[Diagram: a typical MongoDB aggregation pipeline. Image credit: Studio 3T]
Let's understand each of the stages and what they do:
- $match stage - It filters the documents we need to work with, those that fit our needs.
- $group stage - This is where the aggregation happens. It groups documents by a specified key to perform calculations like sum, average, max, min, and so on.
- $sort stage - This sorts the documents in ascending or descending order based on specified fields.
There are many more stages; we will explore some others in the example below.
Key Point🎯
- Each stage acts upon the results of the previous stage.
- There can be one or more stages in a pipeline, depending on what you are planning to achieve.
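To make the "each stage works on the previous stage's output" idea concrete, here is a rough sketch in plain JavaScript that mimics a $match, $group, $sort pipeline over an in-memory array. It only imitates the behaviour for intuition; the data and the helper names (matchStage, groupStage, sortStage, runPipeline) are made up for illustration and are not part of MongoDB.

```javascript
// Hypothetical in-memory documents (not from a real collection).
const docs = [
  { category: "pizza", price: 12 },
  { category: "pasta", price: 9 },
  { category: "pizza", price: 8 },
  { category: "salad", price: 4 },
];

// Imitates $match: keep only documents that pass the predicate.
const matchStage = (predicate) => (input) => input.filter(predicate);

// Imitates $group with a $sum accumulator: one output document per key.
const groupStage = (keyField, sumField) => (input) => {
  const totals = new Map();
  for (const doc of input) {
    totals.set(doc[keyField], (totals.get(doc[keyField]) ?? 0) + doc[sumField]);
  }
  return [...totals].map(([key, total]) => ({ _id: key, total }));
};

// Imitates $sort on a single numeric field (1 = ascending, -1 = descending).
const sortStage = (field, direction) => (input) =>
  [...input].sort((a, b) => (a[field] - b[field]) * direction);

// Each stage receives the output of the stage before it.
const runPipeline = (input, stages) =>
  stages.reduce((current, stage) => stage(current), input);

const result = runPipeline(docs, [
  matchStage((doc) => doc.price >= 5),
  groupStage("category", "price"),
  sortStage("total", -1),
]);

console.log(result);
```

Moving matchStage after groupStage would filter grouped documents, which no longer have a price field, so the result changes. That is why the order of stages matters.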
Now that we understand how the pipeline works, let's take a look at the syntax.
MongoDB Aggregate Pipeline Syntax
This is the general form of an aggregation query:
db.collectionName.aggregate(pipeline, options)
- collectionName – the name of a collection,
- pipeline – an array that contains the aggregation stages,
- options – optional parameters for the aggregation.
This is an example of the aggregation pipeline syntax:
pipeline = [
  { $match : { … } },
  { $group : { … } },
  { $sort : { … } }
]
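As a slightly more concrete sketch, the pipeline and options can be built as ordinary JavaScript values before being passed to aggregate(). The collection name orders and the fields status, customerId, and amount are placeholders of my own, not part of the example that follows; allowDiskUse is one of the documented aggregate options.

```javascript
// Placeholder field names (status, customerId, amount) chosen for illustration.
const pipeline = [
  // Keep only completed orders.
  { $match: { status: "completed" } },
  // One output document per customer, with the sum of their order amounts.
  { $group: { _id: "$customerId", totalSpent: { $sum: "$amount" } } },
  // Biggest spenders first.
  { $sort: { totalSpent: -1 } },
];

// allowDiskUse lets stages write to temporary files when they need more memory.
const options = { allowDiskUse: true };

// In mongosh this would be run as (the collection name "orders" is an assumption):
// db.orders.aggregate(pipeline, options)
console.log(JSON.stringify(pipeline, null, 2));
```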
Let's now see a practical example of how MongoDB aggregation works.
Practical Example Using MongoDB Aggregation
In this example, we'll use a sample dataset of restaurant orders. The goal of this aggregation is to retrieve the average order total and most popular dish for each day of the week, specifically for dine-in orders. The output will be a set of documents containing the day of the week and the corresponding rounded average order total and most frequently ordered dish.
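The dataset itself isn't shown here, so below is a guess at what a few documents might look like, based only on the fields the pipeline reads (orderType, orderDate, orderTotal, mainDish). The values, and the "takeaway" order type, are invented for illustration.

```javascript
// Invented sample orders; only the field names come from the pipeline in this article.
const sampleOrders = [
  { orderType: "dine-in", orderDate: new Date("2024-03-03T19:30:00Z"), orderTotal: 42.5, mainDish: "Lasagna" },
  { orderType: "dine-in", orderDate: new Date("2024-03-03T20:10:00Z"), orderTotal: 30.25, mainDish: "Lasagna" },
  { orderType: "takeaway", orderDate: new Date("2024-03-04T12:05:00Z"), orderTotal: 18.0, mainDish: "Burger" },
  { orderType: "dine-in", orderDate: new Date("2024-03-04T13:45:00Z"), orderTotal: 27.0, mainDish: "Risotto" },
];

// In mongosh, the documents could be loaded with:
// db.restaurantOrders.insertMany(sampleOrders)
console.log(sampleOrders.length, "sample orders prepared");
```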
Let's approach this step by step:
Step 1: Filter data by order type
db.restaurantOrders.aggregate([
  { $match: { orderType: "dine-in" } }
])
The $match stage filters the data to include only documents where the orderType is "dine-in".
Step 2: Group data by day of the week and calculate averages and most popular dish
db.restaurantOrders.aggregate([
  { $match: { orderType: "dine-in" } },
  {
    $group: {
      _id: { dayOfWeek: { $dayOfWeek: "$orderDate" } },
      avgOrderTotal: { $avg: "$orderTotal" },
      dishes: { $push: "$mainDish" }
    }
  }
])
The $group stage groups the data by the day of the week of the orderDate, calculates the average order total, and creates an array of all main dishes ordered on that day.
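If it helps to see the same grouping outside the database, the sketch below mirrors this $group stage in plain JavaScript. It is only an approximation for intuition: the orders are invented, groupByDayOfWeek is a made-up helper, and getUTCDay() + 1 is used to match $dayOfWeek, which numbers days from 1 (Sunday) to 7 (Saturday) and works in UTC unless a timezone is given.

```javascript
// Invented dine-in orders (already filtered, as after the $match stage).
const dineInOrders = [
  { orderDate: new Date("2024-03-03T19:30:00Z"), orderTotal: 40, mainDish: "Lasagna" }, // a Sunday
  { orderDate: new Date("2024-03-03T20:10:00Z"), orderTotal: 30, mainDish: "Lasagna" }, // a Sunday
  { orderDate: new Date("2024-03-04T13:45:00Z"), orderTotal: 27, mainDish: "Risotto" }, // a Monday
];

// Made-up helper that imitates the $group stage above.
function groupByDayOfWeek(orders) {
  const groups = new Map();
  for (const order of orders) {
    const dayOfWeek = order.orderDate.getUTCDay() + 1; // 1 = Sunday ... 7 = Saturday
    if (!groups.has(dayOfWeek)) {
      groups.set(dayOfWeek, { sum: 0, count: 0, dishes: [] });
    }
    const group = groups.get(dayOfWeek);
    group.sum += order.orderTotal; // running total, used for the average
    group.count += 1;
    group.dishes.push(order.mainDish); // like $push
  }
  return [...groups].map(([dayOfWeek, g]) => ({
    _id: { dayOfWeek },
    avgOrderTotal: g.sum / g.count, // like $avg
    dishes: g.dishes,
  }));
}

const grouped = groupByDayOfWeek(dineInOrders);
console.log(JSON.stringify(grouped, null, 2));
```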
Step 3: Calculate the most popular dish
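db.restaurantOrders.aggregate([
  { $match: { orderType: "dine-in" } },
  {
    $group: {
      _id: { dayOfWeek: { $dayOfWeek: "$orderDate" } },
      avgOrderTotal: { $avg: "$orderTotal" },
      dishes: { $push: "$mainDish" }
    }
  },
  {
    $addFields: {
      popularDish: {
        $reduce: {
          input: "$dishes",
          initialValue: { name: "", count: 0 },
          in: {
            $cond: [
              // Does the current dish ($$this) occur more often than the best so far?
              // as: "d" keeps $filter from shadowing $reduce's $$this.
              { $gt: [{ $size: { $filter: { input: "$dishes", as: "d", cond: { $eq: ["$$d", "$$this"] } } } }, "$$value.count"] },
              { name: "$$this", count: { $size: { $filter: { input: "$dishes", as: "d", cond: { $eq: ["$$d", "$$this"] } } } } },
              "$$value"
            ]
          }
        }
      }
    }
  }
])
The $addFields stage adds a new field popularDish, which uses $reduce to iterate through the dishes array and find the most frequently occurring dish. For each dish, $filter counts how many times it appears in the array, and the dish replaces the current best only if its count is higher. The as: "d" option gives the $filter element its own name; without it, $$this inside $filter would refer to the filter's element rather than the dish $reduce is currently visiting, and the comparison would never find a winner.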
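The idea behind this stage, picking the value that occurs most often in an array, can be sketched in plain JavaScript. This is an illustration rather than what MongoDB runs internally; mostPopularDish is a made-up helper, and because it counts in a single pass, ties may resolve differently than in the pipeline.

```javascript
// Made-up helper: returns the most frequent dish and how often it appears.
function mostPopularDish(dishes) {
  const counts = new Map();
  let best = { name: "", count: 0 }; // same starting value as the $reduce above
  for (const dish of dishes) {
    const count = (counts.get(dish) ?? 0) + 1;
    counts.set(dish, count);
    // Strictly greater, so the current leader is kept on ties.
    if (count > best.count) {
      best = { name: dish, count };
    }
  }
  return best;
}

const popular = mostPopularDish(["Lasagna", "Risotto", "Lasagna", "Burger"]);
console.log(popular); // { name: 'Lasagna', count: 2 }
```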
Step 4: Sort data by day of the week
db.restaurantOrders.aggregate([
  { $match: { orderType: "dine-in" } },
  {
    $group: {
      _id: { dayOfWeek: { $dayOfWeek: "$orderDate" } },
      avgOrderTotal: { $avg: "$orderTotal" },
      dishes: { $push: "$mainDish" }
    }
  },
  {
    $addFields: {
      popularDish: {
        $reduce: {
          input: "$dishes",
          initialValue: { name: "", count: 0 },
          in: {
            $cond: [
              // as: "d" keeps $filter from shadowing $reduce's $$this.
              { $gt: [{ $size: { $filter: { input: "$dishes", as: "d", cond: { $eq: ["$$d", "$$this"] } } } }, "$$value.count"] },
              { name: "$$this", count: { $size: { $filter: { input: "$dishes", as: "d", cond: { $eq: ["$$d", "$$this"] } } } } },
              "$$value"
            ]
          }
        }
      }
    }
  },
  { $sort: { "_id.dayOfWeek": 1 } }
])
The $sort stage sorts the data by the day of the week in ascending order.
Step 5: Project and format the output
db.restaurantOrders.aggregate([
  { $match: { orderType: "dine-in" } },
  {
    $group: {
      _id: { dayOfWeek: { $dayOfWeek: "$orderDate" } },
      avgOrderTotal: { $avg: "$orderTotal" },
      dishes: { $push: "$mainDish" }
    }
  },
  {
    $addFields: {
      popularDish: {
        $reduce: {
          input: "$dishes",
          initialValue: { name: "", count: 0 },
          in: {
            $cond: [
              // as: "d" keeps $filter from shadowing $reduce's $$this.
              { $gt: [{ $size: { $filter: { input: "$dishes", as: "d", cond: { $eq: ["$$d", "$$this"] } } } }, "$$value.count"] },
              { name: "$$this", count: { $size: { $filter: { input: "$dishes", as: "d", cond: { $eq: ["$$d", "$$this"] } } } } },
              "$$value"
            ]
          }
        }
      }
    }
  },
  { $sort: { "_id.dayOfWeek": 1 } },
  {
    $project: {
      _id: 0,
      dayOfWeek: {
        $switch: {
          branches: [
            { case: { $eq: ["$_id.dayOfWeek", 1] }, then: "Sunday" },
            { case: { $eq: ["$_id.dayOfWeek", 2] }, then: "Monday" },
            { case: { $eq: ["$_id.dayOfWeek", 3] }, then: "Tuesday" },
            { case: { $eq: ["$_id.dayOfWeek", 4] }, then: "Wednesday" },
            { case: { $eq: ["$_id.dayOfWeek", 5] }, then: "Thursday" },
            { case: { $eq: ["$_id.dayOfWeek", 6] }, then: "Friday" },
            { case: { $eq: ["$_id.dayOfWeek", 7] }, then: "Saturday" }
          ],
          default: "Unknown"
        }
      },
      avgOrderTotal: { $round: ["$avgOrderTotal", 2] },
      popularDish: "$popularDish.name"
    }
  }
])
The $project stage reshapes the output by excluding the _id field, converting the numeric day of the week to its string representation, rounding the average order total to two decimal places, and including only the name of the most popular dish.
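As a side note, the seven $switch branches could probably be replaced by an array lookup with $arrayElemAt. The sketch below is an alternative of my own, not part of the pipeline above: dayNames and projectStage are illustrative names, and unlike $switch with its default, an out-of-range index here would simply leave the field out instead of producing "Unknown".

```javascript
// Day names in the order $dayOfWeek uses: 1 = Sunday ... 7 = Saturday.
const dayNames = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];

// Alternative $project stage: index into dayNames instead of listing seven branches.
const projectStage = {
  $project: {
    _id: 0,
    // $arrayElemAt is zero-based, so subtract 1 from the $dayOfWeek number.
    dayOfWeek: { $arrayElemAt: [dayNames, { $subtract: ["$_id.dayOfWeek", 1] }] },
    avgOrderTotal: { $round: ["$avgOrderTotal", 2] },
    popularDish: "$popularDish.name",
  },
};

// The same lookup in plain JavaScript, for a group whose _id.dayOfWeek is 2:
console.log(dayNames[2 - 1]); // Monday
console.log(JSON.stringify(projectStage, null, 2));
```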
This example demonstrates how MongoDB Aggregation can be used to process restaurant order data, grouping by day of the week, calculating averages, finding the most popular dish, and formatting the output. It showcases several aggregation stages and operators, providing insights that could be valuable for restaurant management and menu planning.
Conclusion
MongoDB Aggregation is a powerful and flexible tool that can streamline data processing workflows, reduce the complexity of application code, and make it easier to extract valuable insights from data. Whether you're working with large amounts of data or complex data structures, MongoDB Aggregation offers a robust set of capabilities to help you get the most out of your data.
In our restaurant orders example, we've seen how we can use aggregation to:
- Filter data ($match)
- Group and calculate averages ($group)
- Perform complex calculations ($addFields with $reduce)
- Sort results ($sort)
- Reshape and format output ($project)
These operations, when combined, allow us to transform raw data into meaningful insights that can drive business decisions. For instance, a restaurant owner could use this aggregation to see which dishes are most popular on different days and how average order totals vary throughout the week.
Remember, this is just scratching the surface of what's possible with MongoDB Aggregation. As you become more comfortable with these concepts, you'll find that aggregation can be applied to solve a wide variety of data processing challenges efficiently and elegantly.
For more MongoDB aggregation operators and stages, I recommend checking out the official MongoDB documentation or resources like Studio 3T's knowledge base.
Thanks for reading!
Top comments (6)
An aggregation query always returns its result as a cursor. You can perform various operations on the result documents using the cursor API (for example, the toArray() or forEach() cursor methods). An exception is that when using certain pipeline stages like $out, the cursor will always be empty.
While we develop, we occasionally use the MongoDB UI to create aggregations and understand what kind of data we have in the DB. MongoDB is one of the best DBs out there, for real. So helpful. So generous.
I really love it! Aggregation is a killer feature.
Antonio, CEO & Founder at Litlyx
100% @litlyx, MongoDB is one of the most powerful databases I have seen out there. Doing complex analytics with it is just mind-blowing. I'm really grateful to you for taking the time to read through it
This is so cool. I work with MongoDB so much, this is definitely something I will be using in my other projects.
I'm glad you find it helpful 😌
I never knew MongoDB was this powerful.