Varun Palaniappan

Posted on Mar 20

MongoDB Aggregations: Use them to improve query performance

In this podcast, Krish delves into the topic of MongoDB aggregates, offering insights into their significance for server-side developers and the advantages they present over traditional ORM or document-relational mapping layers. He shares a scenario from his product involving hierarchical charts, highlighting the performance considerations when deciding to implement MongoDB aggregates. Krish emphasizes the efficiency and speed of MongoDB aggregates in retrieving specific or large datasets, citing their ability to handle complex MongoDB structures effectively. Additionally, he discusses the importance of experimenting with different query variations using MongoDB clients and advocates for utilizing MongoDB aggregates for optimal data retrieval and query performance.

Summary

Introduction to MongoDB and Aggregates:

Krish introduces the topic of MongoDB and MongoDB aggregates in this podcast.
He explains the relevance of MongoDB for server-side developers and gives a brief overview for those unfamiliar with it.

Working with MongoDB Layers:

Krish discusses the use of ORM and document-relational mapping layers in MongoDB development.
He highlights the common practice of interacting with MongoDB through these layers.

Considerations for Performance:

Krish illustrates a scenario from his product involving the use of multiple levels of charts.
He deliberates on the performance impact of utilizing MongoDB aggregates versus deferring their implementation.

Benefits of MongoDB Aggregates:

Krish emphasizes the performance advantages of using MongoDB aggregates, especially for retrieving specific or large sets of data.
He discusses the efficiency of MongoDB aggregates in handling complex MongoDB structures.

Utilizing MongoDB Documentation and Clients:

Krish mentions the MongoDB documentation and the need for experimenting with different query variations using MongoDB clients.
He emphasizes the performance improvements achieved through optimized MongoDB aggregate queries.

Final Thoughts and Recommendations:

Krish concludes by advocating for the use of MongoDB aggregates for efficient data retrieval.
He suggests exploring MongoDB aggregates for faster query performance and highlights their effectiveness in his product.

Podcast

Check out on Spotify.

Transcript

0:00

This is Krish, hope you're doing well. In this podcast I want to talk about something quite different from the previous ones I just recorded. In this one I want to talk about MongoDB, and in particular Mongo aggregates. So if you are a server-side developer or API developer who does a lot of work on the server side, you probably have a good amount of experience working with a number of different databases.

0:29

But if you're not, if you're a web developer or a UI person or a mobile developer, or if you're just getting into development, I want to give some background, just so we're all mostly on the same page. Mongo is one of the options available if you want a non-relational, NoSQL database as your persistence layer.

0:49

There are other options; this is one of them. But I like Mongo, so I'm just going to talk about Mongo. MongoDB is again a broad topic, so I'm just going to talk about aggregates, and about when you might want to write Mongo queries using aggregates rather than going through a traditional DRM, a document-relational mapping layer, in between.

1:14

So whether you're building your server side using Java or Ruby or Node, whatever it is, a lot of the time you're going to use one of these layers: if it's Node, you're going to use Mongoose; if it's Ruby, you might use Mongoid.

1:31

And there are a variety of other options out there, but these are some that come to mind. You would use those layers in between so you can interact with your database in a relational manner, if it's relational, or in a document-relational manner.

1:49

Either it's ORM, object-relational mapping as they call it, or document-relational mapping in the case of NoSQL databases. And that's good, because you run your queries and create your objects, your models, in whatever library or framework you're working with, and then you interact with those models.

2:11

And based on how you're doing it, those names might vary. So I'm just going to call them models, because that's pretty generic. You have these models and you interact with them, and then you have collections, which are a number of models together, and so on and so forth.

2:28

Nothing new there, and it works for the most part. But there are times you don't want to do it that way, because your performance is going to take a hit. I'll give you one example. In our product we have a number of charts, quite a number of charts, at different levels of hierarchy as well.

2:50

You can check out the product at snopal.com, but I'm not going to go into the details of our requirements because they're not directly relevant to this topic. They are relevant in some way, shape, or form, but not entirely so.

3:05

So you can check out our charts, sign up, and enjoy the product. But that's a digression of 30 seconds; back to what I was trying to say. We have a number of charts, and to give you some context on the way the charts work, we have multiple levels.

3:21

So you could view these charts at the highest level. When you do that, it's going to pull content from across the board in the system. Or you could go one level deeper and then look at the same chart.

3:36

We have a variety of different charts. But if you took one chart as an example, you could go one level deeper and look at the same chart. It's obviously going to look the same because it's the same kind of chart, but it's going to pull less data because it's now scoped to that particular structure. In our case, it's a key.

3:53

And then if you go further down, we have something called a block, so it's going to be much less data as well. In all of those cases, even though I knew I had to write aggregates, I was initially asking myself whether I could defer that for a little bit, get things going, and come back maybe a couple of months down the road to refactor using aggregates.
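As a minimal sketch of the idea above, the same chart can be served by one aggregation pipeline whose `$match` stage narrows as you go down the hierarchy. The field names `key_id`, `block_id`, and `status` are assumptions made up for illustration; the product's actual schema is not described in the podcast.

```python
from typing import Any, Dict, List, Optional


def build_chart_pipeline(
    key_id: Optional[str] = None,
    block_id: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Build a pipeline that counts documents per status.

    With no arguments the chart covers everything; passing key_id or
    block_id scopes the same chart to a narrower slice of the data.
    All field names here are hypothetical.
    """
    match: Dict[str, Any] = {}
    if key_id is not None:
        match["key_id"] = key_id
    if block_id is not None:
        match["block_id"] = block_id

    pipeline: List[Dict[str, Any]] = []
    if match:
        # Filter first so later stages only see the scoped documents.
        pipeline.append({"$match": match})
    pipeline.append({"$group": {"_id": "$status", "count": {"$sum": 1}}})
    pipeline.append({"$sort": {"_id": 1}})
    return pipeline
```

With a driver such as PyMongo, the resulting list could be passed to a collection's `aggregate` method, for example `collection.aggregate(build_chart_pipeline(key_id="k1"))`.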

4:22

But my hunch was that that might not be a possibility. And that turned out to be the case after a few hours of trying a few different things before writing the actual code. So I decided, OK, let me go back to writing Mongo aggregates, which finally gets me to the purpose of this video.

4:40

Sorry, of this podcast. So with Mongo aggregates, again depending on whether you're using a middle layer in between, like a DRM layer, you may have to do it slightly differently. But for the most part, when you write aggregates, you're interacting with Mongo outside of these layers, outside of these tiers in between.

5:01

So you can go look up the Mongo documentation. It's actually not bad. It's pretty decent for aggregates, but it's tricky because they can only document so much. They take a basic case and they talk about the various constructs for building these aggregates, but you will,

5:18

I guarantee you that to satisfy your particular need, whatever that is, you're going to have to try a number of those variations using a Mongo client. In my case, I use MongoBooster; I write these queries there and then port them over to the actual code base.

5:36

Because you want to create all of the variations in terms of the data. You don't need a lot of data, but you need the unique variations of the data, so you can make sure that the aggregate actually returns what you're looking for. But one thing I can tell you about is the performance difference.

5:55

It's almost hard to explain. You expect a difference. You expect that much of a difference, actually. But even though you expect it, it's pleasantly surprising when you actually see the results. It's entirely different, because the number of queries, the actual amount of interaction you have with the database, and the amount of work the database needs to do by itself are a whole lot less when it comes to aggregates.
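One way to picture the reduced interaction with the database is to compare a loop that issues one query per item with a single aggregate that returns the same numbers in one round trip. This is only a sketch: `collection` is assumed to be any object exposing `count_documents` and `aggregate` the way a PyMongo collection does, and the `key_id` field is hypothetical.

```python
from typing import Any, Dict, Iterable, List


def counts_with_many_queries(collection: Any, key_ids: Iterable[str]) -> Dict[str, int]:
    """One round trip per key: N keys means N queries."""
    return {k: collection.count_documents({"key_id": k}) for k in key_ids}


def counts_with_one_aggregate(collection: Any, key_ids: Iterable[str]) -> Dict[str, int]:
    """A single round trip: the database does the grouping itself."""
    keys: List[str] = list(key_ids)
    pipeline = [
        {"$match": {"key_id": {"$in": keys}}},
        {"$group": {"_id": "$key_id", "count": {"$sum": 1}}},
    ]
    # Keys with no matching documents produce no group, so start them at 0.
    counts = {k: 0 for k in keys}
    for row in collection.aggregate(pipeline):
        counts[row["_id"]] = row["count"]
    return counts
```

Both functions return the same dictionary; the difference is how many times the application has to talk to the database to build it.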

6:23

So it performs very, very well, whether you want to get a large amount of data or a specific set of data from a database that holds a lot of data. And you may have a number of collections, embedded objects, and whatnot.

6:43

However complex your Mongo collection and document structure ends up being, if you want to get to the data in the quickest possible time, there is probably no better alternative than to go with Mongo aggregates and their pipelines.

7:00

There's a variety of different things you could do with them. If you haven't already signed up for the product, when you do and check it out, you'll notice how fast the charts respond. And trust me, they're doing a whole lot of processing; if we had done that without using aggregates, it would not perform at this level.

7:22

There's one more thing. Even after making the choice to write aggregates, you have at least a couple of different options available. One is you can get the data; I mean, your interest is to get to the data in the quickest possible time.

7:40

Sure, that's part of aggregation. So you get the data out. Now when you get the data out, sometimes you have to do a whole lot more with your aggregates so that the structure of the data you get back is exactly how your service layer wants it to be.

7:58

Now again, you can go about this in at least one of two ways. One is you could make your Mongo aggregates more and more complex, so they not only get the data in the quickest possible time, but they also return the data in the exact way you want it, using projections and whatnot.

8:21

There are several ways you could do this in Mongo aggregation. I do it a lot of the time, but there are other times when I try to get the data that I need as quickly as I can, and then I deal with the structuring of the data outside of Mongo.
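The first of those two options, shaping the response inside the aggregate, might look like the sketch below: the same lean pipeline with a `$project` stage appended so the rows come back in the shape a chart expects. The field names `block_id` and `status` and the output keys `label` and `value` are illustrative assumptions, not the product's real names.

```python
from typing import Any, Dict, List


def lean_pipeline(block_id: str) -> List[Dict[str, Any]]:
    """Get the numbers out quickly and leave the shape as Mongo returns it."""
    return [
        {"$match": {"block_id": block_id}},
        {"$group": {"_id": "$status", "count": {"$sum": 1}}},
    ]


def shaped_pipeline(block_id: str) -> List[Dict[str, Any]]:
    """Same data, but projected into the shape the client wants."""
    return lean_pipeline(block_id) + [
        # Rename the group key and the count, and drop the raw _id field.
        {"$project": {"_id": 0, "label": "$_id", "value": "$count"}},
        {"$sort": {"label": 1}},
    ]
```

Each extra stage makes the aggregate a little longer, which is the trade-off weighed in the rest of the discussion.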

8:37

It could be in Node, it could be in Java, it could be in Ruby; it doesn't matter. The reason I mention that is the aggregates can get lengthy pretty quickly, and sometimes they can become not so readable. You want to make sure they look very clean, but no matter what you do, they do tend to get pretty long.

8:58

So in the interest of not making them lengthier than they need to be, you can scope their purpose so they get the data out quickly, and leave structuring the data the way your client needs it to another layer. And the client could be an iPad app, an iPhone app, some other mobile app, a web app, or just another service layer that integrates with you.

9:17

It doesn't matter who the client is; they may want it several different ways, and I don't believe it's architecturally prudent, at least in a lot of scenarios, to have that done in the Mongo layer. You probably want to reuse your existing service-level APIs and methods to do that, because it's very likely that you already have a lot of that code for other purposes in your code base.

9:41

So why try to rewrite that as part of the Mongo aggregation and make it even more complex? Why not just get the data out quickly? Let the aggregates do what they're supposed to do in the first place, and then the rest of the massaging that you want to do, you can do outside of that layer.

9:57

So that's not a bad way to go about it. At the end of the day, your client makes a request, an API call; you delegate that to the Mongo aggregates, you get the response back, you do the massaging in your service layer, and then you send the response back.
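The request flow just described could be sketched roughly as follows: the handler delegates to a short aggregate, then a reusable service-layer function massages the rows into the response. The function names, the `block_id` and `status` fields, and the response shape are all hypothetical; `collection` is assumed to behave like a PyMongo collection.

```python
from typing import Any, Dict, Iterable, List


def to_chart_series(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Service-layer shaping that other endpoints could reuse."""
    ordered = sorted(rows, key=lambda row: str(row["_id"]))
    return [{"label": row["_id"], "value": row["count"]} for row in ordered]


def handle_chart_request(collection: Any, block_id: str) -> Dict[str, Any]:
    """Delegate to the aggregate, then massage the result outside Mongo."""
    pipeline = [
        {"$match": {"block_id": block_id}},
        {"$group": {"_id": "$status", "count": {"$sum": 1}}},
    ]
    rows = list(collection.aggregate(pipeline))
    return {"block_id": block_id, "series": to_chart_series(rows)}
```

Keeping the pipeline to two stages leaves the aggregate readable, while the formatting lives in ordinary application code that can serve different clients.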

10:14

I mean, there's a ton of other things you could do around this, but we'll pick that up in a subsequent podcast. Thanks for listening.