Optimizing complex relational query in MongoDB using aggregation pipeline

#mongodb #node #database #api

we all know that MongoDB is a nosql database and is mostly used for non relational database.
But you can also design a database with strong relationship between tables (or models for mongo) using MongoDB as well. But what will be the problem then? The problem will be if we use the basic APIs of MongoDB like FindOne(), FindById() etc, then we may end up writing a lot of database calls on a single API endpoint.

For example, let's assume that we have a simple classroom management system where there will be a classroom and under one classroom there can be multiple teachers and students.
So, now if we want to design a database schema for this design what we need to do? Let's first understand the relation between classroom teachers and students. Hence the relation between classroom-teacher is many to many and so as the relation between classroom-student.

Now let's design a classroomSchema using mongoose. How we will do that?
A naive approach can be like this:

but whats the problem? this database is not normalized and may lead to performance issue while scaling up.
So what can we do?

We can come up with a bit efficient solution by storing the object id of the teachers and students documents instead of keeping the whole documents.

Great, but now if we need to build a rest api endpoint which will return all the classrooms along with all the teachers and students associated with them

we can write the following code for that:

but what's the problem with this solution? this solution will lead to a larger response time because there are 3 database calls here and the complexity of the logic being applied is O(n^2) (quadratic) which both are going to make latency in the response. To get out of this problem, we need to make only one database call and get all the necessary information that we need directly from the database query instead of writing any additional logic.

But how can we do that? we are not storing the full document in our array of the classroom document, then how can we do combine classroom, teacher and student document and then make query?

We can use aggregation pipeline. Aggregation pipeline allows you to make all kinds of query that you can perform in SQL with just some different syntax. I don't know why the mongoDB tutorials I found online did not include the concept of aggregation pipeline yet it is necessary to understand.

MongoDB aggregation pipeline allows you to write multiple queries on a single database call.

Right now our goal is to get all the classrooms along with their enrolled students and teachers. Which means we need to combine classroom, teacher and student schemas together and then run the find() query.

we can write the following code for that

we can see that the model.aggregate() function receives an array of objects. In each object we can write individual query. We used first $lookup query to join classroom with teachers schema and second $lookup query to join classroom with students schema.

Let's see these in action and observe the difference.

Let's first create three endpoints
one for creating teachers
one for creating students
one for creating classrooms with 3 teachers and 3 students

Each of these endpoints will create 1000 teacher and students and 100 classroom entries in the database

now we will build two endpoints to fetch the classrooms.

GET /classrooms/unoptimized

GET /classrooms/optimized

now lets make the unoptimized query

see the response time. its around 850 ms

Now lets see the optimized endpoint response time

Just see the magic
we have managed to reduce the response time from 850 ms to just 15 ms.
HOW COOL IS THAT

Full Code -->

Don’t Become a Data Breach Headline

57% of organizations have suffered from a security incident related to DevOps toolchain exposures. Is your CI/CD protected? Check out these nine practical tips to keep your CI/CD secure—without adding friction.

Learn more