DEV Community

loading...

MongoDB lookup on an array of objects which contains (foreign) ObjectId as a key

Krishna Bose
A passionate coder | Interested in App development and Augmented Reality | Slowly working my way towards that 'next big thing'.
・4 min read

If you have worked with MongoDB it's highly probable that you have used aggregation pipelines. And its a very common use-case that you need to connect documents from one collection to another, aka 'The Join Query' from our SQL days.

Let's elaborate the problem statement first. We have 2 collections, orders and products.
Every order has some user related information and a list of products which the user has ordered along with its quantity.

sample order object

{
    ...
    _id: ObjectId(...),
    userId: ObjectId(...),
    createdAt: 1622928483,
    products: [
        {
            productId: ObjectId(...),
            quantity: 1,
        },
        {
            productId: ObjectId(...),
            quantity: 3,
        },
    ]
    ...
}
Enter fullscreen mode Exit fullscreen mode

sample product object

{
   ...
   _id: ObjectId(...),
   name: 'Macbook Pro M1',
   description: 'All systems pro.',
   price: 1200,
   ...
}
Enter fullscreen mode Exit fullscreen mode

The result we want from aggregation

{
    ...
    _id: ObjectId(...),
    userId: ObjectId(...),
    createdAt: 1622928483,
    products: [
        {
            productId: ObjectId(...),
            quantity: 1,
             // this got added
            product: {
               ...
               _id: ObjectId(...),
               name: 'Macbook Pro M1',
               description: 'All systems pro.',
               price: 1200,
               ...
            }
        },
        {
            productId: ObjectId(...),
            quantity: 3,
             // this too
            product: {
               ...
               _id: ObjectId(...),
               name: 'iPhone 12',
               description: 'Blast past fast.',
               price: 900,
               ...
            }
        },
    ]
    ...
}
Enter fullscreen mode Exit fullscreen mode

Here you can see that we have inflated the product details inside the products array.
At the time of writing this article, I couldn't find any sweet and easy way of achieving this, so I decided to dive in a little deeper.

A point to note here, if the product array would have been something like products: [ObjectId(...), ObjectId(...)] then it would be a breeze. But since we also need to accommodate quantity alongside the productId, it's not quite feasible (at least with the current schema).

So, let's get started with the solution.

Step 1: Split the array

Use the $unwind operator to split the products array so that each document formed will have only one product instead of the entire array.

{
    $unwind: {
        path: '$products'
    }
}
Enter fullscreen mode Exit fullscreen mode

Now each of our unwound orders would looks something like this

{
    ...
    _id: ObjectId(...),
    userId: ObjectId(...),
    createdAt: 1622928483,
    products: {
        productId: ObjectId(...),
        quantity: 1,
    }
    ...
}
Enter fullscreen mode Exit fullscreen mode

This looks simpler, but wait, where are the rest of the ordered products?
It turns out, using $unwind would split one document into multiple documents where all the data is identical except the one list on which $unwind is called.
Here, it will split our sample document into 2 documents like the following

#doc 1
{
    ...
    _id: ObjectId(...),
    userId: ObjectId(...),
    createdAt: 1622928483,
    products: {
        productId: ObjectId(...),
        quantity: 1,
    }
    ...
}
#doc 2
{
    ...
    _id: ObjectId(...),
    userId: ObjectId(...),
    createdAt: 1622928483,
    products: {
        productId: ObjectId(...),
        quantity: 3,
    }
    ...
}
Enter fullscreen mode Exit fullscreen mode

So, I hope it's clear that, if you have n orders with m products in each, it will generate a total m*n documents.

Step 2: Inflate product details

This is the important step, fetching product details from another collection

{
    $lookup: {
        from: 'products',
        localField: 'products.productId',
        foreignField: '_id',
        as: 'products.product'
    }
}
Enter fullscreen mode Exit fullscreen mode

this step inflates the data and makes it look something like the following

{
    ...
    _id: ObjectId(...),
    userId: ObjectId(...),
    createdAt: 1622928483,
    products: {
        productId: ObjectId(...),
        quantity: 1,
        product: [
            {
               ...
               _id: ObjectId(...),
               name: 'Macbook Pro M1',
               description: 'All systems pro.',
               price: 1200,
               ...
            }
        ]
    }
    ...
}
Enter fullscreen mode Exit fullscreen mode

If you notice one thing, the product is inflated, but its an array instead of an object. The reason being, mongodb expects that multiple documents in the products table may match with this criteria, and hence by default puts all the docs in an array. In our case we know its going to be unique, so we can just unwind it out.

Step 2a: Unwind the product inflated

{
    $unwind: {
        path: '$products.product'
    }
}
Enter fullscreen mode Exit fullscreen mode

Cool! we got the data we needed. Now we can merge those docs up to make it into original.

Step 3: Merging the docs

the key for this step is $group. we will push all the products into an array.

But, here is a small catch!
For grouping them together, you need to preserve all the other fields as well. like userId, createdAt and _id.
One way is, put everything in _id of your group, something like

{
    $group: {
        _id: {
            _id: '$_id',
            userId: '$userId',
            createdAt: '$createdAt'
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

BUT...
there are 3 shortcomings

  1. Difficult to execute if docs have different keys.
  2. High chances of forgetting/mistyping any field
  3. If the schema is altered even slightly, you need to go through the pipeline and rewrite this part again.

As a typical engineer who believes in writing 20 smart lines to avoid hardcoding 5 lines, I came up with a way to make this bit future proof.

The key idea is, merge on the basis of _id alone, and then lookup itself to get a fresh copy of the doc.

Step 3a: Join on basis of _id

{
    _id: '$_id',
    products: {
        $push: '$products'
    }
}
Enter fullscreen mode Exit fullscreen mode

Here, I am pushing elements to products after picking them from products, it might sound confusing, but what it does is that, it takes all of the products from existing docs, put them in an array and then call it products in the end.

Step 3b: Fetch a fresh copy of orders for regeneration

{
    $lookup: {
        from: 'orders',
        localField: '_id',
        foreignField: '_id',
        as: 'orderDetails'
    },
    $unwind: {
        path: '$orderDetails'
    }
}
Enter fullscreen mode Exit fullscreen mode

This is pretty straightforward, shouldn't need much explaining. At this point the doc looks something like this

{
    _id: ObjectId(...),
    products: [
        {
            productId: ObjectId(...),
            quantity: 1,
            product: {
                ...
            }
        },
        ...
    ],
    orderDetails: {
        _id: ObjectId(...),
        userId: ObjectId(...),
        createdAt: 1622928483,
        products: [
            {
                // No product: {...} here, since its original doc
                productId: ObjectId(...),
                quantity: 1
            }
        ]
    }
}
Enter fullscreen mode Exit fullscreen mode

I know it looks weird, don't worry. We have some beautification left.

Step 3c: Beautify and reshape the entire doc

All we need to do is put all the stuff in their proper places.
We shall do it in 2 steps.

  1. Put the products: {} from root to orderDetails.products.
  2. Make the orderDetails as the root doc

Something like this

{
    $addFields: {
        'orderDetails.products': '$products'
    }
}
Enter fullscreen mode Exit fullscreen mode

and then

{
    $replaceRoot: {
        newRoot: '$orderDetails'
    }
}
Enter fullscreen mode Exit fullscreen mode

This one takes the contents inside orderDetails and make it as the root document.

And here we have our required order object with inflated product details!

The complete aggregation looks something like this

[
    {
        $unwind: {
            path: '$products'
        }
    },
    {
        $lookup: {
            from: 'products',
            localField: 'products.productId',
            foreignField: '_id',
            as: 'products.product'
        }
    },
    {
        $unwind: {
            path: '$products.product'
        }
    },
    {
        $group: {
            _id: '$_id',
            products: {
                $push: '$products'
            }
        }
    },
    {
        $lookup: {
            from: 'orders',
            localField: '_id',
            foreignField: '_id',
            as: 'orderDetails'
        }
    },
    {
        $unwind: {
            path: '$orderDetails'
        }
    },
    {
        $addFields: {
            'orderDetails.products': '$products'
        }
    },
    {
        $replaceRoot: {
            newRoot: '$orderDetails'
        }
    }
]
Enter fullscreen mode Exit fullscreen mode

So, here we have it!
Inflating array of items using keys nested inside objects.

If there is a better way, let me know. I'd be happy to make corrections.
Plus this is my first Dev.to post, so everyone please let me know ways to improve my quality of contribution here.

Happy coding :D

Discussion (1)

Collapse
phatlaunchdeck profile image
Hoang-Phat Pham

Man I spent countless days searching for a solution on my own project which has the exact model schema (with slightly different fields names), thank you so much!

One thing I would like to ask if you don't mind: in my schema I actually have 2 arrays that have the same structure with your "products". If I apply the above method, only the latter array would be populated, I guess that is because of the "re-fetch root document step" would override the first array that gets populated.

Any idea on this?