loading...

MongoDB Query Optimization: Covered Query

bladefidz profile image Hafidz Jazuli Luthfi ・3 min read

Introduction

Hi, my name Hafidz. I am just regular software developer like almost of you. Currently, I have enrolled MongoDB for Developers course from MongoDB university. This course is great and has awesome instructors. I like to share what I have already learn from this course. It is about how we optimize MongoDB query by implementing covered query.

This post has correlation with my previous post: How to Evaluate Query Performance on MongoDB. If you want to know more about how to evaluate query performance, than it is good idea to read it before read this post.

Motivation

If we are not used any indexes on the collection, then MongoDB will performs COLLSCAN. It's means, if we have N collections, than MongoDB will perform scanning in N times. Your friend who have computer science degree may said to you, "We can do better!". Yes, we can do better by implementing indexing algorithm. MongoDB has been provides lower bound data structure called B+ trees which gives us Log N performance. If you want to know more about the indexing algorithm, then try to read this documentation.

But, the question is: How we can exploit the indexing algorithm to achieve high performance query?

Query Optimization using Covered Query

MongoDB has excellent documentation about how to optimizing query. In the case to optimize such query operations that are part of such index or indexes, then we can achieve high performance operation by implementing Covered Query. In other word, we expect that such query operations only examined index field, but not document fields.

The Database and Collection

I will be using exactly same collection which used in this post. But in the case you are not have read it or just forget about it, then I will just need to re-explain the same explanation.

In this easy tutorial, we used a school database and students collection contains 1,000,000 documents. All the documents randomly created from javascript file which you can downloaded from here.
Below the description of each key in our document:

  • student_id: Unique identification of each student.
  • scores: An array of contains two keys: type (type of score) and score (float value).
  • class_id: Unique identification of each class where student belongs to.

Implementation

In this section, we want to know any students defined by student_id belong to classes 5, 15, 30. So the query is db.students.find({class_id: {$in: [5, 15, 30]}}, {_id: 0, student_id: 1, class_id: 1})

First implementation: Without indexing

Run this commands in the shell:

var exp = db.students.explain('executionStats').find({class_id: {$in: [5, 15, 30]}}, {_id: 0, student_id: 1, class_id: 1})
exp

We got:

...
"winningPlan" : {
                        "stage" : "PROJECTION",
                        "transformBy" : {
                                "_id" : 0,
                                "student_id" : 1,
                                "class_id" : 1
                        },
                        "inputStage" : {
                                "stage" : "COLLSCAN",
...
"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 5977,
                "executionTimeMillis" : 840,
                "totalKeysExamined" : 0,
                "totalDocsExamined" : 1000000,
...

Second implementation: With indexing and proper covered query

Run this commands in the shell:

db.students.createIndex({class_id: 1, student_id: 1})
var exp = db.students.explain('executionStats').find({class_id: {$in: [5, 15, 30]}}, {_id: 0, student_id: 1, class_id: 1})
exp

We got:

...
"winningPlan" : {
                        "stage" : "PROJECTION",
                        "transformBy" : {
                                "_id" : 0,
                                "student_id" : 1,
                                "class_id" : 1
                        },
                        "inputStage" : {
                                "stage" : "IXSCAN",
...
"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 5977,
                "executionTimeMillis" : 16,
                "totalKeysExamined" : 5980,
                "totalDocsExamined" : 0,
...

Conclusion

As you can see from explained result, the Covered Query achieved far higher performance than regular non-indexed query which has 2/105 execution time ratio and only need to examined 5980 keys to return 5977 documents. It is pretty good performance gain right? Your computer science friend may glad to look that proof. :)

Suggestion

Keep to play around with same collection and try to create multi-key indexes {'scores.score': -1, 'scores.type': 1}, then execute var exp = db.students.explain('executionStats'); exp.find({'scores.type': 'exam', 'scores.score': {$gte: 90}}). You may asking why the covered query did not work. Try to read this documentation on Multikey Covering section. You will found out why.

Posted on Apr 12 '18 by:

bladefidz profile

Hafidz Jazuli Luthfi

@bladefidz

Just learn mongoDB and machine learning.

Discussion

markdown guide