Doing SQL-style ROLLUPs in MongoDB
This article was written by Robert Jones.
Have you ever found yourself wanting to create queries in MongoDB to aggregate data at multiple levels, such as aggregating total sales by city, then by state, then by country? Have you ever wished MongoDB had a SQL-style ROLLUP extension? Well, keep reading as this article will provide tips on how to leverage the power of MongoDB’s aggregation pipelines to build your own ROLLUP-style queries.
MongoDB’s aggregation pipeline supports many different stages and operators that allow users to create very powerful and expressive queries. The $group stage enables grouped aggregations and will form the foundation of the aggregation pipeline that we build to perform a ROLLUP-style query. How can we use $group to give us a query result with multiple levels of aggregation like ROLLUP? Let’s jump right in with an example!
Example scenario: Retail store ROLLUP
Let’s imagine a scenario where we need to generate a ROLLUP-style report for a retail store chain that stocks various products. The chain has many stores located in different cities, and for each store, they track the product id, product name, number of units of the product in stock, and the average selling price of the product at that store.
Here’s an example document to make things a little more clear:
The report that we want to generate should do a roll up of all stores’ stock and average selling price for all stocked products and should produce multiple levels of aggregation to include the individual stores, then aggregated to all stores in each city, then aggregated to all cities in each province, then to all provinces in each country, then finally to a result that for each product stocked will return a single nested JSON document capturing these levels of hierarchy.
An example will probably help to make this more clear:
{
"countries": [
{
"country": "Canada",
"provinces": [
{
"province": "Ontario",
"cities": [
{
"city": "Toronto",
"stores": [
{
"store_id": "STORE004",
"store_name": "Store 4",
"stock_on_hand": 166,
"average_selling_price": 8.2775
}
],
"stock_on_hand": 166,
"average_selling_price": 8.2775
}
],
"stock_on_hand": 166,
"average_selling_price": 8.2775
},
{...}
],
"stock_on_hand": 771,
"average_selling_price": 12.097333333333333
}
],
"product": "Notebook",
"product_id": "P005",
"average_selling_price": 11.374041666666667,
"stock_on_hand": 1251
As you can see in the result above, we have a JSON document that represents the Notebook product and rolls up the average selling price and stock on hand at each desired level of hierarchy. This is merely one example of how to roll up data in MongoDB. However, there are many other ways to produce the roll up of data that may better suit your needs. Courtesy of MongoDB’s aggregation pipelines, any number of different projections of roll up data can be achieved.
One advantage of structuring the roll up data in such a way is that, unlike with traditional SQL ROLLUPs, a self-contained JSON document containing the entire hierarchy of roll ups exists for each product. This makes accessing all of the roll up data from code incredibly easy compared to SQL, where you have to process the returned rows of data in order and stitch them back together into an object model that will likely end up mirroring the object structure that the JSON documents returned from MongoDB will already provide to you!
So you’re probably wondering what the MongoDB aggregation pipeline looks like to generate such a useful result of JSON roll up documents! The entire aggregation pipeline used is provided below.
It may look a little daunting but each grouping stage actually follows a very similar pattern, with the earlier grouping stages grouping across more fields since they are the more fine-grained groups and the later stages grouping across fewer fields since they are the more coarsely-grained groups. Each level of hierarchy has its own $group stage and then the top level document has a final $group stage before ending with a $project stage to provide a prettier output document.
One final tip: If you’re like me and you’re not looking forward to creating a big pipeline for a deeply nested rollup query by hand, you can use a modern LLM to generate the aggregation pipeline for you. In fact, I used an LLM to generate the sample data and the aggregation pipeline below!
Example MongoDB ROLLUP aggregation pipeline
[
// Stage 1: Group by Store Level for Product-Level Summaries
{
$group: {
_id: {
store_id: "$store_id",
store_name: "$store_name",
city: "$location.city",
province: "$location.province",
country: "$location.country",
product_id: "$product_id",
product_name: "$product_name"
},
stock_on_hand_sum: { $sum: "$stock_on_hand" },
average_selling_price_avg: { $avg: "$average_selling_price" }
}
},
// Stage 2: Group by City Level (Aggregate by City)
{
$group: {
_id: {
city: "$_id.city",
province: "$_id.province",
country: "$_id.country",
product_id: "$_id.product_id",
product_name: "$_id.product_name"
},
stores: {
$push: {
store_id: "$_id.store_id",
store_name: "$_id.store_name",
stock_on_hand: "$stock_on_hand_sum",
average_selling_price: "$average_selling_price_avg"
}
},
stock_on_hand_sum: { $sum: "$stock_on_hand_sum" },
average_selling_price_avg: { $avg: "$average_selling_price_avg" }
}
},
// Stage 3: Group by Province Level (Aggregate by Province)
{
$group: {
_id: {
province: "$_id.province",
country: "$_id.country",
product_id: "$_id.product_id",
product_name: "$_id.product_name"
},
cities: {
$push: {
city: "$_id.city",
stores: "$stores",
stock_on_hand: "$stock_on_hand_sum",
average_selling_price: "$average_selling_price_avg"
}
},
stock_on_hand_sum: { $sum: "$stock_on_hand_sum" },
average_selling_price_avg: { $avg: "$average_selling_price_avg" }
}
},
// Stage 4: Group by Country Level (Aggregate by Country)
{
$group: {
_id: {
country: "$_id.country",
product_id: "$_id.product_id",
product_name: "$_id.product_name"
},
provinces: {
$push: {
province: "$_id.province",
cities: "$cities",
stock_on_hand: "$stock_on_hand_sum",
average_selling_price: "$average_selling_price_avg"
}
},
stock_on_hand_sum: { $sum: "$stock_on_hand_sum" },
average_selling_price_avg: { $avg: "$average_selling_price_avg" }
}
},
// Stage 5: Group by Products as Top-Level
{
$group: {
_id: {
product_id: "$_id.product_id",
product_name: "$_id.product_name"
},
countries: {
$push: {
country: "$_id.country",
provinces: "$provinces",
stock_on_hand: "$stock_on_hand_sum",
average_selling_price: "$average_selling_price_avg"
}
},
stock_on_hand_sum: { $sum: "$stock_on_hand_sum" }, // Total global stock for the product
average_selling_price_avg: { $avg: "$average_selling_price_avg" } // Global average price for the product
}
},
// Stage 6: Reshape Final Output Structure
{
$project: {
_id: 0,
product: "$_id.product_name",
product_id: "$_id.product_id",
average_selling_price: "$average_selling_price_avg",
stock_on_hand: "$stock_on_hand_sum",
countries: 1
}
}
]
Closing thoughts
Now you’ve seen how easy it is to create ROLLUPs in MongoDB and how much more usable these ROLLUPs are courtesy of the natural ability of JSON to store hierarchies such as ROLLUPs.
I encourage you to try writing your own ROLLUPs in MongoDB and strongly recommend using our free interactive tool, MongoDB Compass, to help build your aggregation pipelines. I find it easiest to build and test pipelines one stage at a time which the Compass tool makes very easy to do with its aggregation builder. I also encourage you to try out the new gen AI-powered Natural Language Querying feature in Compass that can take a natural language query that you express in your own words and help to turn it into an aggregation pipeline automatically.
If you don’t already have an Atlas Cloud account, you can register an account and use MongoDB’s free cluster to get started building your own ROLLUPs in MongoDB!
Top comments (0)