Shanu

Posted on Aug 3

What is Aggregation pipeline? how to write it? Learn with example

#pubsub #mongodb #pipeline #aggregatepipeline

Have you ever wondered how data can be transformed and analyzed within a database? Imagine you have a large collection of data and you want to perform complex queries to get meaningful insights. This is where the Aggregation Pipeline in MongoDB comes into the picture!

What is the Aggregation Pipeline?

The Aggregation Pipeline is a framework for data aggregation in MongoDB. It consists of a series of stages where each stage transforms the documents as they pass through. Think of it as a conveyor belt in a factory, where each stage applies a specific operation to the items on the belt. By the end of the pipeline, you get the final transformed data ready for analysis.

Instagram like and Follower Count: Scenario

Absolutely! Let's use a real-world example that's easier to understand. We'll use an Instagram-like scenario to explain the Aggregation Pipeline. Imagine you want to find out how many followers a user has and how many accounts they follow, along with their total number of posts. Let's see how to do this step by step.

Why Use the Aggregation Pipeline?

The Aggregation Pipeline is incredibly powerful and flexible. It allows you to:

Filter data
Sort data
Group data
Transform data

How to Write an Aggregation Pipeline?

Setting Up the Example

Let’s say we have a MongoDB collection named users with the following documents:

json

[
  {
    "_id": 1,
    "username": "userA",
    "followers": [2, 3], // userB and userC follow userA
    "following": [4, 5], // userA follows userD and userE
    "posts": 10
  },
  {
    "_id": 2,
    "username": "userB",
    "followers": [1],
    "following": [3, 4],
    "posts": 5
  },
  {
    "_id": 3,
    "username": "userC",
    "followers": [],
    "following": [1],
    "posts": 8
  },
  {
    "_id": 4,
    "username": "userD",
    "followers": [1, 2],
    "following": [],
    "posts": 3
  },
  {
    "_id": 5,
    "username": "userE",
    "followers": [1],
    "following": [1, 4],
    "posts": 7
  }
]

Creating the Aggregation Pipeline

Step 1: Count Followers

We want to count how many followers each user has. Here’s how you can do it:

javascript

db.users.aggregate([
  {
    $project: {
      username: 1,
      followerCount: { $size: "$followers" }
    }
  }
])

Explanation

$project Stage: This stage helps us to create a new structure. We keep the username field and add a new field followerCount which uses $size to count the number of elements in the followers array.

Step 2: Count Following

Next, let’s count how many accounts each user is following:

javascript

db.users.aggregate([
  {
    $project: {
      username: 1,
      followingCount: { $size: "$following" }
    }
  }
])

Step 3: Combine the Stages

To get all the information in one go, we can combine the stages:

javascript

db.users.aggregate([
  {
    $project: {
      username: 1,
      followerCount: { $size: "$followers" },
      followingCount: { $size: "$following" },
      postCount: "$posts"
    }
  }
])

Breakdown of the Combined Pipeline

$project Stage: We keep the username, posts (renamed to postCount), and add two new fields: followerCount and followingCount. Both new fields use $size to count the elements in the respective arrays.

Running the Aggregation Pipeline

When you run this pipeline, MongoDB processes each document and returns the result:

json

[
  {
    "username": "userA",
    "followerCount": 2,
    "followingCount": 2,
    "postCount": 10
  },
  {
    "username": "userB",
    "followerCount": 1,
    "followingCount": 2,
    "postCount": 5
  },
  {
    "username": "userC",
    "followerCount": 0,
    "followingCount": 1,
    "postCount": 8
  },
  {
    "username": "userD",
    "followerCount": 2,
    "followingCount": 0,
    "postCount": 3
  },
  {
    "username": "userE",
    "followerCount": 1,
    "followingCount": 2,
    "postCount": 7
  }
]

Conclusion

The Aggregation Pipeline is like having a magic wand for your data. It simplifies complex data processing and allows you to extract meaningful insights with ease. By breaking down the operations into stages, you can transform and analyze your data efficiently.

So next time you find yourself faced with a daunting data analysis task, remember the power of the Aggregation Pipeline. It’s your go-to tool for turning raw data into valuable information, making your work not only easier but also more enjoyable.

DEV Community

What is Aggregation pipeline? how to write it? Learn with example

What is the Aggregation Pipeline?

Instagram like and Follower Count: Scenario

Why Use the Aggregation Pipeline?

How to Write an Aggregation Pipeline?

Creating the Aggregation Pipeline

Conclusion

Top comments (0)

Read next

Resolving MongoDB Error When Starting with Homebrew on macOS

Mongoose

Overcoming MongoDB Limitations with Fauna

Building a Scalable URL Shortener with Node.js (Part 1/2)