Shubham Dutta

Posted on Feb 12, 2023 • Edited on Apr 19, 2023

MongoDB Aggregation Pipeline - $lookup

#devmeme #watercooler

Introduction:

The Aggregation Pipeline in MongoDB is a strong tool that lets you handle and examine data kept in a MongoDB collection. The $lookup step, which enables you to carry out a left outer join between two collections, is one of the crucial stages in the Aggregation Pipeline. When you need to compile data from various collections into a single document, this stage is quite helpful.

A versatile and effective tool that can handle a variety of usage situations is the $lookup stage. For instance, you could use it to unite a collection with a reference data collection to add more information to your papers or to integrate data from two collections to create a customer order report.

To use the $lookup stage, you need to specify the following parameters:

from: the collection's name that will be merged with the existing one
localField: the current collection's field that serves as the join key
foreignField: the from collection field that serves as the join key
as: The new array field's name in the result document, which will include the join's outcomes

Syntax

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}

Pseudo SQL Statement

SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (
   SELECT *
   FROM <collection to join>
   WHERE <foreignField> = <collection.localField>
);

Example:

Orders Collection:

{
  "_id": 1,
  "user_id": 1,
  "total": 100
}

{
  "_id": 2,
  "user_id": 2,
  "total": 200
}

User Collection:

{
  "_id": 1,
  "name": "John"
}

{
  "_id": 2,
  "name": "Jane"
}

You can use the following aggregation pipeline to combine user and order data:

db.orders.aggregate([
  {
    $lookup: {
      from: "users",
      localField: "user_id",
      foreignField: "_id",
      as: "user_info"
    }
  }
])

Output:

{
  "_id": 1,
  "user_id": 1,
  "total": 100,
  "user_info": [
    {
      "_id": 1,
      "name": "John"
    }
  ]
}

{
  "_id": 2,
  "user_id": 2,
  "total": 200,
  "user_info": [
    {
      "_id": 2,
      "name": "Jane"
    }
  ]
}

Join Conditions and Subqueries on a Joined Collection

A powerful tool for combining data from two or more collections is the MongoDB $lookup stage.
To further improve the results, you might need to specify conditions or run subqueries on the joined collection in some circumstances. This is where the let and pipeline options come in.
You can specify expressions that can be used in the pipeline option by using the let option. Variables that can be referred to in the pipeline stages can be defined using these expressions.

Syntax:

{
   $lookup:
      {
         from: <combined collection>,
         let: { <variable-1>: <expr>, …, <variable-N>: <expr> },
         pipeline: [ <pipeline to run on combined collection> ],
         as: <resultant array field>
      }
}

Example:

Orders Collection:

{
  "_id": 1,
  "user_id": 1,
  "amount": 300
}

{
  "_id": 2,
  "user_id": 2,
  "amount": 400
}

User Collection:

{
  "_id": 1,
  "name": "Shubham",
  "created_at": ISODate("2021-02-11")
}

{
  "_id": 2,
  "name": "Random",
  "created_at": ISODate("2022-02-12")
}

To join the Orders and Users collections and return only customers who joined in the last year, you can use the following aggregation pipeline:

db.orders.aggregate([
  {
    $lookup: {
      from: "users",
      let: { created_at: "$created_at" },
      pipeline: [
        {
          $match: {
            $expr: {
              $gt: [ "$$created_at", new Date(ISODate().getTime() - 365 * 24 * 60 * 60 * 1000) ]
            }
          }
        }
      ],
      as: "user_info"
    }
  }
])

SQL Statement for the above query:

SELECT *
FROM Orders
JOIN Users
ON Orders.customer_id = Users._id
WHERE Users.created_at > DATEADD(year, -1, GETDATE());

Output:

{
  "_id": 2,
  "user_id": 2,
  "amount": 400,
  "user_info": {
    "_id": 2,
    "name": "Random",
    "created_at": ISODate("2022-02-12")
  }
}

Correlated Subqueries

Correlated subqueries are an effective tool for running complex queries. They enable you to improve the results of a different query by applying the findings from the first.
A correlated subquery is a subquery that references a column from the outer query.
To perform a correlated subquery in MongoDB's aggregation pipeline, you can use the $lookup stage to join a collection to itself. For example, consider the following aggregation pipeline:

db.Orders.aggregate([
  {
    $lookup: {
      from: "Orders",
      let: { user_id: "$user_id" },
      pipeline: [
        {
          $match: {
            $expr: {
              $eq: [
                "$user_id",
                "$$user_id"
              ]
            }
          }
        },
        {
          $group: {
            _id: "$user_id",
            avg_order_amount: { $avg: "$amount" }
          }
        }
      ],
      as: "user_orders"
    }
  },
  {
    $match: {
      "users.created_at": {
        $gt: ISODate("2022-02-05T00:00:00.000Z")
      }
    }
  }
]);

The $lookup stage in this pipeline performs a left outer join with the Orders collection and with itself.
The value of the user id field to be used in the subquery is specified by the let expression. The subquery to be executed on the joined collection is specified by the pipeline expression.
The $match stage of this subquery specifies the requirement that only orders placed by the relevant user should be taken into account, and the $group stage determines the average order value for each user.
The as expression specifies the name of the new array field in the result document that will contain the results of the subquery.
Only users who joined within the past year are included in the result set after being filtered by the pipeline's final $match stage.
A document containing the user ID, order date, and average order amount for each new customer added in the previous year is the output of this pipeline.

SQL Statement for the above query:

SELECT customer_id, order_date,
(SELECT AVG(total_amount)
FROM Orders
WHERE Orders.customer_id = Customers._id) AS avg_order_amount
FROM Customers
WHERE joined_date > DATEADD(year, -1, GETDATE());

Conclusion:

An effective feature that enables users to carry out complicated data analysis and manipulation is the MongoDB aggregation pipeline. You can make the most of this feature and maximise the potential of your MongoDB data by comprehending the architecture and pipeline stages as well as the usage scenarios for which they are best suited. Follow Shubham Dutta for more content like this.

DEV Community

MongoDB Aggregation Pipeline - $lookup

Introduction:

Syntax

Pseudo SQL Statement

Orders Collection:

User Collection:

Join Conditions and Subqueries on a Joined Collection

Orders Collection:

User Collection:

Correlated Subqueries

Conclusion:

Top comments (0)