If you have worked with MongoDB it's highly probable that you have used aggregation pipelines. And its a very common use-case that you need to connect documents from one collection to another, aka 'The Join Query' from our SQL days.
Let's elaborate the problem statement first. We have 2 collections, orders
and products
.
Every order has some user related information and a list of products which the user has ordered along with its quantity.
sample order object
{
...
_id: ObjectId(...),
userId: ObjectId(...),
createdAt: 1622928483,
products: [
{
productId: ObjectId(...),
quantity: 1,
},
{
productId: ObjectId(...),
quantity: 3,
},
]
...
}
sample product object
{
...
_id: ObjectId(...),
name: 'Macbook Pro M1',
description: 'All systems pro.',
price: 1200,
...
}
The result we want from aggregation
{
...
_id: ObjectId(...),
userId: ObjectId(...),
createdAt: 1622928483,
products: [
{
productId: ObjectId(...),
quantity: 1,
// this got added
product: {
...
_id: ObjectId(...),
name: 'Macbook Pro M1',
description: 'All systems pro.',
price: 1200,
...
}
},
{
productId: ObjectId(...),
quantity: 3,
// this too
product: {
...
_id: ObjectId(...),
name: 'iPhone 12',
description: 'Blast past fast.',
price: 900,
...
}
},
]
...
}
Here you can see that we have inflated the product details inside the products array.
At the time of writing this article, I couldn't find any sweet and easy way of achieving this, so I decided to dive in a little deeper.
A point to note here, if the product array would have been something like products: [ObjectId(...), ObjectId(...)]
then it would be a breeze. But since we also need to accommodate quantity
alongside the productId, it's not quite feasible (at least with the current schema).
So, let's get started with the solution.
Step 1: Split the array
Use the $unwind
operator to split the products array so that each document formed will have only one product instead of the entire array.
{
$unwind: {
path: '$products'
}
}
Now each of our unwound orders would looks something like this
{
...
_id: ObjectId(...),
userId: ObjectId(...),
createdAt: 1622928483,
products: {
productId: ObjectId(...),
quantity: 1,
}
...
}
This looks simpler, but wait, where are the rest of the ordered products?
It turns out, using $unwind
would split one document into multiple documents where all the data is identical except the one list on which $unwind
is called.
Here, it will split our sample document into 2 documents like the following
#doc 1
{
...
_id: ObjectId(...),
userId: ObjectId(...),
createdAt: 1622928483,
products: {
productId: ObjectId(...),
quantity: 1,
}
...
}
#doc 2
{
...
_id: ObjectId(...),
userId: ObjectId(...),
createdAt: 1622928483,
products: {
productId: ObjectId(...),
quantity: 3,
}
...
}
So, I hope it's clear that, if you have n orders with m products in each, it will generate a total m*n
documents.
Step 2: Inflate product details
This is the important step, fetching product details from another collection
{
$lookup: {
from: 'products',
localField: 'products.productId',
foreignField: '_id',
as: 'products.product'
}
}
this step inflates the data and makes it look something like the following
{
...
_id: ObjectId(...),
userId: ObjectId(...),
createdAt: 1622928483,
products: {
productId: ObjectId(...),
quantity: 1,
product: [
{
...
_id: ObjectId(...),
name: 'Macbook Pro M1',
description: 'All systems pro.',
price: 1200,
...
}
]
}
...
}
If you notice one thing, the product is inflated, but its an array instead of an object. The reason being, mongodb expects that multiple documents in the products
table may match with this criteria, and hence by default puts all the docs in an array. In our case we know its going to be unique, so we can just unwind it out.
Step 2a: Unwind the product inflated
{
$unwind: {
path: '$products.product'
}
}
Cool! we got the data we needed. Now we can merge those docs up to make it into original.
Step 3: Merging the docs
the key for this step is $group
. we will push all the products into an array.
But, here is a small catch!
For grouping them together, you need to preserve all the other fields as well. like userId
, createdAt
and _id
.
One way is, put everything in _id
of your group, something like
{
$group: {
_id: {
_id: '$_id',
userId: '$userId',
createdAt: '$createdAt'
}
}
}
BUT...
there are 3 shortcomings
- Difficult to execute if docs have different keys.
- High chances of forgetting/mistyping any field
- If the schema is altered even slightly, you need to go through the pipeline and rewrite this part again.
As a typical engineer who believes in writing 20 smart lines to avoid hardcoding 5 lines, I came up with a way to make this bit future proof.
The key idea is, merge on the basis of _id
alone, and then lookup itself to get a fresh copy of the doc.
Step 3a: Join on basis of _id
{
_id: '$_id',
products: {
$push: '$products'
}
}
Here, I am pushing elements to products
after picking them from products
, it might sound confusing, but what it does is that, it takes all of the products
from existing docs, put them in an array and then call it products in the end.
Step 3b: Fetch a fresh copy of orders for regeneration
{
$lookup: {
from: 'orders',
localField: '_id',
foreignField: '_id',
as: 'orderDetails'
},
$unwind: {
path: '$orderDetails'
}
}
This is pretty straightforward, shouldn't need much explaining. At this point the doc looks something like this
{
_id: ObjectId(...),
products: [
{
productId: ObjectId(...),
quantity: 1,
product: {
...
}
},
...
],
orderDetails: {
_id: ObjectId(...),
userId: ObjectId(...),
createdAt: 1622928483,
products: [
{
// No product: {...} here, since its original doc
productId: ObjectId(...),
quantity: 1
}
]
}
}
I know it looks weird, don't worry. We have some beautification left.
Step 3c: Beautify and reshape the entire doc
All we need to do is put all the stuff in their proper places.
We shall do it in 2 steps.
- Put the
products: {}
from root toorderDetails.products
. - Make the
orderDetails
as the root doc
Something like this
{
$addFields: {
'orderDetails.products': '$products'
}
}
and then
{
$replaceRoot: {
newRoot: '$orderDetails'
}
}
This one takes the contents inside orderDetails
and make it as the root document.
And here we have our required order object with inflated product details!
The complete aggregation looks something like this
[
{
$unwind: {
path: '$products'
}
},
{
$lookup: {
from: 'products',
localField: 'products.productId',
foreignField: '_id',
as: 'products.product'
}
},
{
$unwind: {
path: '$products.product'
}
},
{
$group: {
_id: '$_id',
products: {
$push: '$products'
}
}
},
{
$lookup: {
from: 'orders',
localField: '_id',
foreignField: '_id',
as: 'orderDetails'
}
},
{
$unwind: {
path: '$orderDetails'
}
},
{
$addFields: {
'orderDetails.products': '$products'
}
},
{
$replaceRoot: {
newRoot: '$orderDetails'
}
}
]
So, here we have it!
Inflating array of items using keys nested inside objects.
If there is a better way, let me know. I'd be happy to make corrections.
Plus this is my first Dev.to post, so everyone please let me know ways to improve my quality of contribution here.
Happy coding :D
Top comments (8)
Maybe you can use three stage to do this:
$lookup stage: join
products.productId
of collection products to collection orders.$addFields stage: add and overwrite the products field。The every element of old products merge a product whitch from productDetail.
$unset stage: remove the productDetail filed.
Your solution is 1 case maybe wrong. that is {$indexOfArray:['$products','$$product']} . indexOfArray return -1 if $$product not in $products.
then $arrayElemAt will return the last item when received the value is -1. It mean your case is wrong. These above will return the last item instead NOT FOUND as normally.
Thank you for this! Have not found it anywhere else!
If it helps - the answer in this post by Gary Wild allowed me to use objectIds as reference values: stackoverflow.com/questions/410936... .
To @phatlaunchdeck , don't know if this is what you asked, but if you repeat step 2 for as many times as you have different references (I have, for example, products AND inWarehouse refernces), and then move on to step 3, this aggregation still does the trick!
Man I spent countless days searching for a solution on my own project which has the exact model schema (with slightly different fields names), thank you so much!
One thing I would like to ask if you don't mind: in my schema I actually have 2 arrays that have the same structure with your "products". If I apply the above method, only the latter array would be populated, I guess that is because of the "re-fetch root document step" would override the first array that gets populated.
Any idea on this?
Hey, thats a good question. At this point I cannot find any good solution for the scenario, but one thing I've learnt lately is that if your resource is read intensive, you should consider having a structure of data that allows you to fetch the documents without multiple lookups. Even if this means duplication of data, as long as you have high read to write ratio (say 100:1) or even higher, then it still results in a optimal strategy.
Should replicate the same issue on stackoverflow :D noice job!. Thanx
Thanks for writing about this approach, this approach helped me improve API latency by up to 50%, compared to the traditional populate approach.
Glad that it helped