DEV Community

Cover image for Mastering MongoDB Aggregation Pipelines: A Developer’s Complete Guide
Yukti Sahu
Yukti Sahu

Posted on

Mastering MongoDB Aggregation Pipelines: A Developer’s Complete Guide

Mastering MongoDB Aggregation Pipelines: A Developer’s Complete Guide

When I started coding, I first learned SQL. I wrote queries with WHERE, HAVING, and even did JOIN across tables to find relationships and insights.

Later, when I moved to MongoDB, I mostly used it for simple things — insert, find, update, delete. I only knew a few operators like $set, $or, $in. Honestly, I thought MongoDB was just about basic CRUD.

But recently I found something new — Aggregation Pipelines.

This felt like a next level of MongoDB for me. I realized you can actually:

  • filter data like WHERE,
  • group results like GROUP BY,
  • and even do joins using $lookup.

In this article, I’ll explain everything I learned:

  • What pipelines are and how they work
  • Why operators in MongoDB start with $
  • The most useful stages like $match, $lookup, $addFields, $project
  • SQL comparisons to make it easier to understand
  • A line-by-line breakdown of a real controller I wrote in my project

By the end, you’ll see how MongoDB queries can be as powerful as SQL — just written in a different style.


What is an Aggregation Pipeline?

Think of a pipeline as a step-by-step process. Each stage takes documents, processes them, and passes the result to the next stage.

You can imagine it like a filter chain:

  1. First stage → filter users
  2. Second stage → join their subscriptions
  3. Third stage → add new fields
  4. Final stage → select only required fields

Dummy structure

db.collection.aggregate([
  { stage1 },
  { stage2 },
  { stage3 }
])
Enter fullscreen mode Exit fullscreen mode

Each {} is a stage, and the order matters. The output of one stage becomes the input of the next.

👉 Thing to remember: In the first stage, the amount of data you take (for example, if there are 100 documents and you keep 50) becomes the input for the next stage.


Why $ before operators?

In MongoDB, every stage or operator starts with $. It simply tells MongoDB — “Hey, this is a special command, not a field name.”

For example:

  • $match → like WHERE
  • $group → like GROUP BY
  • $project → like SELECT
  • $lookup → like JOIN

So whenever you see $, just think of it as an operator keyword.


Aggregation Stages with SQL Comparison

MongoDB Stage SQL Equivalent Example
$match WHERE Filter users with age > 20
$project SELECT col1, col2 Show only name and email
$group GROUP BY Count users by country
$sort ORDER BY Sort users by join date
$limit LIMIT Show only top 10 users
$lookup JOIN Join users with subscriptions

Real Example From My Project

This controller fetches a user’s channel profile (like YouTube).

const getUserChannelProfile = asyncHandler(async (req, res) => {
  // taking username from url
  const { username } = req.params;
  if (!username?.trim()) throw new ApiError(400, "Channel id is required");

  const channel = await User.aggregate([
    { $match: { username: username?.toLowerCase() } },
    {
      // first stage:
      $lookup: {
        from: "Subscription",
        localField: "_id",
        foreignField: "channel",
        as: "subscribers",
      },
    },
    {
      // second stage:
      $lookup: {
        from: "Subscription",
        localField: "_id",
        foreignField: "subscriber",
        as: "subscribedTo",
      },
    },
    {
      // third stage:
      $addFields: {
        subscribersCount: { $size: "$subscribers" },
        channelsSubscribedToCount: { $size: "$subscribedTo" },
        issSubscribed: {
          $cond: {
            if: { $in: [req.user?._id, "$subscribers.subscriber"] },
            then: true,
            else: false,
          },
        },
      },
    },
    {
      // fourth stage:
      $project: {
        fullname: 1,
        username: 1,
        avatar: 1,
        coverImage: 1,
        subscribersCount: 1,
        channelsSubscribedToCount: 1,
        issSubscribed: 1,
        email: 1,
        createdAt: 1,
      },
    },
  ]);

  if (!channel || channel?.length === 0)
    throw new ApiError(404, "Channel not found");

  return res
    .status(200)
    .json(new ApiResponse(200, "Channel fetched successfully", channel[0]));
});
Enter fullscreen mode Exit fullscreen mode

Line by Line Breakdown

Input Check

const { username } = req.params;
if (!username?.trim()) throw new ApiError(400, "Channel id is required");
Enter fullscreen mode Exit fullscreen mode

We make sure the request has a username. If not, throw an error.

First Stage → $match

{ $match: { username: username?.toLowerCase() } }
Enter fullscreen mode Exit fullscreen mode

Like SQL WHERE username = 'someuser'. This stage filters the users collection to only that channel.

Second Stage → $lookup for Subscribers

{
  $lookup: {
    from: "Subscription",
    localField: "_id",
    foreignField: "channel",
    as: "subscribers",
  },
}
Enter fullscreen mode Exit fullscreen mode

This joins the user with the Subscription collection.

  • localField: "_id" → user’s id
  • foreignField: "channel" → subscription’s channel field
  • Result stored in subscribers array

Like SQL:

SELECT *
FROM Users u
JOIN Subscription s ON u._id = s.channel
Enter fullscreen mode Exit fullscreen mode

Third Stage → $lookup for Subscribed Channels

{
  $lookup: {
    from: "Subscription",
    localField: "_id",
    foreignField: "subscriber",
    as: "subscribedTo",
  },
}
Enter fullscreen mode Exit fullscreen mode

This gets all the channels the user has subscribed to.

Fourth Stage → $addFields

{
  $addFields: {
    subscribersCount: { $size: "$subscribers" },
    channelsSubscribedToCount: { $size: "$subscribedTo" },
    issSubscribed: {
      $cond: {
        if: { $in: [req.user?._id, "$subscribers.subscriber"] },
        then: true,
        else: false,
      },
    },
  },
}
Enter fullscreen mode Exit fullscreen mode

We add new computed fields:

  • subscribersCount → count of subscribers
  • channelsSubscribedToCount → count of subscriptions
  • issSubscribed → is the current user subscribed (using $in)

Fifth Stage → $project

{
  $project: {
    fullname: 1,
    username: 1,
    avatar: 1,
    coverImage: 1,
    subscribersCount: 1,
    channelsSubscribedToCount: 1,
    issSubscribed: 1,
    email: 1,
    createdAt: 1,
  },
}
Enter fullscreen mode Exit fullscreen mode

We only keep the required fields. Like SQL SELECT fullname, username, avatar, ....


Practicing Pipelines

You can practice this in MongoDB Compass:

  1. Open your database.
  2. Go to the Aggregations tab.
  3. Start adding stages one by one.
  4. You’ll see the transformation after each stage.

MongoDB also has free sample datasets (movies, Airbnb) — great for practice.


Advanced Operators (for later)

Once you’re comfortable with the basics, these powerful operators can level you up:

  • $unwind: Deconstruct an array field and output one document per element. Great for working with array items individually.
  • $facet: Run multiple pipelines in one stage to produce multiple outputs (e.g., average, sum, count) simultaneously.
  • $bucket: Group documents into custom ranges (ideal for histograms like age or price ranges).
  • $bucketAuto: Auto-determine bucket boundaries based on data distribution.
  • $graphLookup: Recursive search for hierarchical/graph data (org charts, trees).
  • $merge: Write pipeline results to a collection (insert/update). Useful for materialized views or summary collections.

Conclusion

For a long time, I thought MongoDB was just about CRUD. But learning aggregation pipelines showed me that it’s just as powerful as SQL — only the style is different.

The key is to think step by step:

  • Filter ($match)
  • Join ($lookup)
  • Compute ($addFields)
  • Select ($project)

That’s how I wrote my own channel profile API — and it felt like building SQL queries, just in MongoDB’s way.

If you’re learning MongoDB, aggregation pipelines are the real deal. Mastering them will take your backend skills to the next level. 🚀

Thanks for reading!

Top comments (0)