DEV Community

Hafidz Jazuli Luthfi
Hafidz Jazuli Luthfi

Posted on

How to Evaluate Query Performance on MongoDB

Introduction

Hi, my name Hafidz. I am just regular software developer like almost of you. Currently, I have enrolled MongoDB for Developers course from MongoDB university. This course is great and has awesome instructors. I like to share what I have already learn from this course. It is about how we evaluate query performance on MongoDB using explain("executionStats").

Motivation

Evaluating query performance may became complicated if we do not really know which part should be measured. Fortunately, MongoDB provides very handy tool which can be used to evaluate query performance: explain("executionStats"). This tool provide us some general measurements such as number of examined document and execution time that can be used to do statistical analysis.

The explain("executionStats")

You can find detailed explanation of explain("executionStats") output from this documentation. To simplify, we only need to focused on some output keys:

  • nReturned: Number of documents returned by query operation.
  • executionTimeMillis: Total time in milliseconds needed to complete the operation.
  • totalKeysExamined: Returns 0 (zero) if no indexes found in the operations, else returns number of scanned index keys in order to complete the operation.
  • totalDocsExamined: Total number of examined documents in order to complete the operation.

The Database and Collection

In this easy tutorial, we used a school database and students collection contains 1,000,000 documents. All the documents randomly created from javascript file which you can downloaded from here.
Below the description of each key in our document:

  • student_id: Unique identification of each student.
  • scores: An array of contains two keys: type (type of score) and score (float value).
  • class_id: Unique identification of each class where student belongs to.

Evaluation

In this section, we want to know how indexing can improve query performances. In order to simplify the case, we focused only on how to optimizing "Read operation".

Initial evaluation

We need to finds any students that have exam score greater than 90. So the query is: db.students.find({'scores.type': 'exam', 'scores.score': {$gte: 90}})

First evaluation: Without indexing.

Run this commands in the shell:

var exp = db.students.explain('executionStats');
exp.find({'scores.type': 'exam', 'scores.score': {$gte: 90}})

We got:

...
"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 343310,
                "executionTimeMillis" : 2843,
                "totalKeysExamined" : 0,
                "totalDocsExamined" : 1000000,
...

Second evaluation: With indexing.

Run this commands in the shell:

db.students.createIndex({'scores.score': -1, 'scores.type': 1})
var exp = db.students.explain('executionStats');
exp.find({'scores.type': 'exam', 'scores.score': {$gte: 90}})

We got:

...
"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 343310,
                "executionTimeMillis" : 2284,
                "totalKeysExamined" : 399171,
                "totalDocsExamined" : 343310,
...

Conclusion

From the two evaluations results, we knew that the operation to find any students that have exam score greater than 90 will run faster by using compound index. In this case, collection with indexes {'scores.score': -1, 'scores.type': 1} performs faster than without indexes (2284 ms vs 2843 ms).

Suggestion

Try to run this explain query:

db.students.explain('executionStats').find({'scores.score': {$gte: 90}, 'scores.type': 'exam'})

This query differs from the explain query which used in the evaluation because we flip the find parameters. Maybe you will convinced yourself that this query should be run faster than the query we used in the evaluation tutorial. But in fact, its not. Try to look winningPlan key on the explain result, you will find out why.

I have created a simple javascript code which measures means of "executionTimeMillis". Just play around with that code to do statistical analysis, such as measure the standard deviation.

Oldest comments (0)