Using Indexing To Optimize MongoDB Performance

Irfat — Thu, 30 May 2024 17:16:03 +0000

Introduction

Optimizing database performance is essential for any application that depends on a database, because it directly affects the app's overall performance. As data grows, an efficient database can absorb the extra load without noticeable slowdowns, processing queries quickly and keeping the user experience responsive.

What is Indexing?

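Database indexing is a technique for improving a database's performance by speeding up read operations. It works much like the index at the back of a book: instead of reading every page to find a topic, you look it up in the index and go straight to the right page. In the same way, an index guides a query directly to the matching data, improving overall performance.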

Why Indexing?

Indexing can be used to improve the overall performance of MongoDB. Indexes reduce the amount of data that query operations must process, which speeds up read operations and reduces the work MongoDB does to answer each request. For this tutorial, we'll look at a simple MongoDB collection called students, where each document has the following fields:

{
    _id: ObjectId,   // in this example, simple numbers are used instead
    name: string,    // name of the student
    age: number,     // age of the student
    gender: string   // gender of the student
}

Assume that the students collection contains the following documents.

_id	name	age	gender
1	Alice	25	female
2	Bob	30	male
3	Charlie	22	male
4	David	30	male
5	Eve	22	female

Query Performance Without Indexing

Now, without any index, let's find 'Eve' by running db.students.find({name: 'Eve', age: 22}).explain('executionStats'). The explain('executionStats') method reports how the query was executed and how much work it did; here it returns the following object:

{
  queryPlanner: {
    ...
    winningPlan: {
      stage: 'COLLSCAN',
      ...
    }
  },
  executionStats: {
    executionSuccess: true,
    nReturned: 1,
    executionTimeMillis: 0,
    totalKeysExamined: 0,
    totalDocsExamined: 5,
    executionStages: {
      stage: 'COLLSCAN',
      ...
    },
    ...
  },
  ...
}

queryPlanner.winningPlan.stage: 'COLLSCAN' means the query did not use an index. Instead, it performed a collection scan, checking the documents one after another, which is generally expensive on large collections.
executionStats.nReturned: 1 indicates that the query returned only one document.
executionStats.totalKeysExamined: 0 shows that no index keys were examined, confirming that no index was used.
executionStats.totalDocsExamined: 5 indicates that the query examined all 5 documents, i.e. the entire collection was scanned.
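To make these counters concrete, the sketch below simulates a collection scan over the sample students data in plain JavaScript. It is a simplified model, not MongoDB's implementation: collScan is a hypothetical function, its return value merely mirrors the executionStats fields shown above, and it only supports exact equality filters.

```javascript
// Sample documents from the students table.
const students = [
  { _id: 1, name: "Alice", age: 25, gender: "female" },
  { _id: 2, name: "Bob", age: 30, gender: "male" },
  { _id: 3, name: "Charlie", age: 22, gender: "male" },
  { _id: 4, name: "David", age: 30, gender: "male" },
  { _id: 5, name: "Eve", age: 22, gender: "female" },
];

// Hypothetical model of COLLSCAN: with no index to narrow the search,
// every document has to be read and tested against the filter.
function collScan(docs, filter) {
  const results = [];
  let totalDocsExamined = 0;
  for (const doc of docs) {
    totalDocsExamined++;
    // Simplification: a document matches only if every filter field is strictly equal.
    const matches = Object.entries(filter).every(
      ([field, value]) => doc[field] === value
    );
    if (matches) results.push(doc);
  }
  // No index is involved, so no index keys are examined.
  return { nReturned: results.length, totalKeysExamined: 0, totalDocsExamined, results };
}

const stats = collScan(students, { name: "Eve", age: 22 });
console.log(stats.nReturned, stats.totalKeysExamined, stats.totalDocsExamined); // → 1 0 5
```

The docs-examined counter grows linearly with the collection: with a million documents, the scan reads a million documents to return the same single match.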

Query Performance With Indexing

Before creating an index, let's list the indexes that already exist. The db.students.getIndexes() method returns them; here it returns the following array:

[ { v: 2, key: { _id: 1 }, name: '_id_' } ]

In each entry, key shows the field(s) the index is built on and their sort direction. The only index so far is on _id, which MongoDB creates automatically for every collection, so queries that search by _id already use an index.

Let's create a compound index on the age and name fields of our collection, with age in ascending order and name in descending order.

To create the index, run db.students.createIndex({age: 1, name: -1}), where 1 means ascending and -1 means descending order. Calling getIndexes() again returns the following array:

[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { age: 1, name: -1 }, name: 'age_1_name_-1' }
]

A new index named age_1_name_-1 has been created.
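The name age_1_name_-1 comes from MongoDB's default naming rule: when createIndex() is called without a name option, the index is named after each indexed field and its direction, joined with underscores. The helper below reproduces that rule for simple ascending/descending key patterns; defaultIndexName is a hypothetical helper, not part of the MongoDB shell or drivers.

```javascript
// Hypothetical helper: derive the default index name from a key pattern
// by joining each field and its direction (1 or -1) with underscores.
function defaultIndexName(keyPattern) {
  return Object.entries(keyPattern)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
}

console.log(defaultIndexName({ age: 1, name: -1 })); // → age_1_name_-1
console.log(defaultIndexName({ name: 1 }));          // → name_1
```

The automatically created _id index is an exception to this rule: as the first getIndexes() output shows, it is named _id_ rather than _id_1.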

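Note: Running createIndex() again with exactly the same key pattern and options has no effect, but creating an index with the same key pattern and different options (for example, adding unique: true) fails with an error. To change an existing index, first remove it with the db.collection.dropIndex() method and then create it again.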

Now, let's run the query db.students.find({name: 'Eve', age: 22}).explain('executionStats') again and it returns the following result:

{
  queryPlanner: {
    ...
    winningPlan: {
      stage: 'FETCH',
      inputStage: {
        stage: 'IXSCAN',
        keyPattern: { age: 1, name: -1 },
        indexName: 'age_1_name_-1',
        ...
      }
    },
    ...
  },
  executionStats: {
    executionSuccess: true,
    nReturned: 1,
    executionTimeMillis: 1,
    totalKeysExamined: 1,
    totalDocsExamined: 1,
    executionStages: {
      ...
    },
    ...
  },
  ...
}

queryPlanner.winningPlan.inputStage.stage: 'IXSCAN' indicates that this time the query used an index scan on age_1_name_-1; the FETCH stage above it then retrieves the matching documents.
executionStats.nReturned: 1 means the query returned 1 document.
executionStats.totalKeysExamined: 1 shows that the query examined only one index key.
executionStats.totalDocsExamined: 1 means the query examined only one document, down from 5 before.

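So, the index reduced the number of documents examined from 5 to 1. On a collection this small, the execution time is negligible either way (0 ms versus 1 ms in the runs above), but on large collections this reduction in work is where indexes make a significant difference.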
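The drop from 5 documents to 1 can be reproduced with a small in-memory model of a compound index. The sketch below assumes an equality filter on every indexed field: it keeps (age, name) entries sorted in the order the key pattern { age: 1, name: -1 } defines, seeks to the first matching entry with a binary search, and then fetches only the documents whose keys match. compareByPattern, lowerBound and ixScan are hypothetical names, and because the model counts only keys inside the matching range, its counters may not always equal those reported by explain().

```javascript
const students = [
  { _id: 1, name: "Alice", age: 25, gender: "female" },
  { _id: 2, name: "Bob", age: 30, gender: "male" },
  { _id: 3, name: "Charlie", age: 22, gender: "male" },
  { _id: 4, name: "David", age: 30, gender: "male" },
  { _id: 5, name: "Eve", age: 22, gender: "female" },
];

// Key pattern of the compound index: age ascending, then name descending.
const pattern = { age: 1, name: -1 };

// Orders two entries the way the key pattern does: compare fields in
// pattern order and flip the result for descending (-1) fields.
function compareByPattern(a, b) {
  for (const [field, direction] of Object.entries(pattern)) {
    if (a[field] < b[field]) return -direction;
    if (a[field] > b[field]) return direction;
  }
  return 0;
}

// The "index": one entry per document, holding the key fields and a
// reference (here simply the _id) back to the full document.
const index = students
  .map((doc) => ({ age: doc.age, name: doc.name, recordId: doc._id }))
  .sort(compareByPattern);

// Binary search for the first entry that is not less than the target.
function lowerBound(entries, target) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareByPattern(entries[mid], target) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// IXSCAN followed by FETCH: seek into the index, read keys while they still
// equal the filter, and load only the documents those keys point to.
function ixScan(docsById, entries, filter) {
  const results = [];
  let totalKeysExamined = 0;
  let totalDocsExamined = 0;
  let i = lowerBound(entries, filter);
  while (i < entries.length && compareByPattern(entries[i], filter) === 0) {
    totalKeysExamined++;
    const doc = docsById.get(entries[i].recordId); // FETCH stage
    totalDocsExamined++;
    results.push(doc);
    i++;
  }
  return { nReturned: results.length, totalKeysExamined, totalDocsExamined, results };
}

const docsById = new Map(students.map((doc) => [doc._id, doc]));
const stats = ixScan(docsById, index, { name: "Eve", age: 22 });
console.log(index.map((e) => e.name).join(", ")); // → Eve, Charlie, Alice, David, Bob
console.log(stats.nReturned, stats.totalKeysExamined, stats.totalDocsExamined); // → 1 1 1
```

Note the index order: within age 22, Eve comes before Charlie because name is indexed in descending order.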

How MongoDB IXSCAN Works

Like many other databases, MongoDB stores indexes as B-trees. If a collection has n documents, a COLLSCAN has O(n) complexity, because in the worst case it must check every document. A B-tree search, on the other hand, is O(log n). An index scan walks the B-tree to find the matching keys and then fetches the documents those keys point to. The process can be described in the following steps:

Building the B-tree: When you call the createIndex() method, MongoDB builds a B-tree for the index. If the collection already contains documents, the build reads them and adds a key for each one; if the collection is empty, the tree starts out empty and is filled as documents are added.
Updating the Tree Upon Insertion: As documents are inserted into the collection, the B-tree index is updated to reflect these insertions. The B-tree is maintained in a balanced state to ensure efficient querying.
Traversing the Tree for Queries: When you execute a query that can use an index, MongoDB traverses the B-tree to find the matching entries efficiently. Unlike a binary tree, each B-tree node holds several sorted keys and can have many children: at each internal node, the queried value is compared with the node's keys to choose the child whose key range contains it, and this repeats until a leaf node is reached.
Leaf Nodes Pointing to Documents: In a B-tree index, the leaf nodes typically contain references (pointers) to the actual documents in the collection that match the indexed values. This allows MongoDB to quickly locate the documents that satisfy the query conditions.
Selecting Documents from Pointers: When MongoDB finds leaf nodes containing pointers to documents, it retrieves those documents from the collection. These documents are then returned as query results.
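To put the O(n) versus O(log n) comparison into numbers, the back-of-the-envelope sketch below compares a full scan of one million documents with the number of levels a balanced tree needs to cover the same number of keys. The fanout of 100 children per node is an arbitrary assumption for illustration; in a real B-tree the number of keys per node depends on page and key sizes. collScanReads and treeLevels are hypothetical helpers.

```javascript
// Worst case for COLLSCAN: every one of the n documents is read.
function collScanReads(n) {
  return n;
}

// Levels a balanced tree with up to `fanout` children per node needs so that
// fanout^levels >= n, i.e. roughly ceil(log_fanout(n)). Integer arithmetic is
// used instead of Math.log to avoid floating-point rounding at exact powers.
function treeLevels(n, fanout) {
  let levels = 0;
  let capacity = 1;
  while (capacity < n) {
    capacity *= fanout;
    levels++;
  }
  return levels;
}

const n = 1000000;
console.log(collScanReads(n));   // → 1000000
console.log(treeLevels(n, 2));   // → 20 (a binary tree)
console.log(treeLevels(n, 100)); // → 3 (assumed B-tree fanout of 100)
```

Each level corresponds to one node visited on the way down, so a lookup touches a handful of nodes instead of a million documents. A wide fanout keeps the tree shallow, which is one reason B-trees rather than binary trees are used for indexes stored on disk.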

The image below can help to summarize the entire process.

Fig: IXSCAN scan visualization

Note: In MongoDB, the layout of the non-leaf (internal) nodes of a B-tree index is managed internally by the storage engine (WiredTiger by default); it is not something you configure. These internal nodes act as guides for queries, directing them toward the leaf nodes where the index entries are stored.
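The traversal described in the steps and the note above can be sketched with a simplified, in-memory model. It assumes a B+-tree-style layout (separator keys in internal nodes, key/record-id pairs in the leaves), unique keys, and a single-field index on name; it bulk-loads the tree from sorted keys instead of inserting them one at a time, and uses a tiny node size of 2 so that even five documents produce more than one level. makeLeaf, makeInternal, buildTree and search are hypothetical names, and the storage engine's real on-disk structures are considerably more involved.

```javascript
// Simplified, in-memory B+-tree-style index (hypothetical, for illustration).
// Leaves hold sorted keys plus the record ids they point to; internal nodes
// hold separator keys and children, where child i covers the keys k with
// separators[i - 1] <= k < separators[i].

function makeLeaf(pairs) {
  return {
    leaf: true,
    keys: pairs.map(([key]) => key),
    recordIds: pairs.map(([, recordId]) => recordId),
    minKey: pairs[0][0],
  };
}

function makeInternal(children) {
  return {
    leaf: false,
    children,
    // Separator i is the smallest key stored under child i + 1.
    separators: children.slice(1).map((child) => child.minKey),
    minKey: children[0].minKey,
  };
}

// Bulk-loads a tree bottom-up from [key, recordId] pairs sorted by key
// (assumes unique keys and at least one pair).
function buildTree(sortedPairs, maxPerNode) {
  let level = [];
  for (let i = 0; i < sortedPairs.length; i += maxPerNode) {
    level.push(makeLeaf(sortedPairs.slice(i, i + maxPerNode)));
  }
  while (level.length > 1) {
    const next = [];
    for (let i = 0; i < level.length; i += maxPerNode) {
      next.push(makeInternal(level.slice(i, i + maxPerNode)));
    }
    level = next;
  }
  return level[0];
}

// Walks from the root to a leaf, then looks the key up inside that leaf.
function search(root, key) {
  let node = root;
  let nodesVisited = 1;
  while (!node.leaf) {
    // Advance past each separator <= key; the child reached is the one
    // whose key range contains the key.
    let i = 0;
    while (i < node.separators.length && key >= node.separators[i]) i++;
    node = node.children[i];
    nodesVisited++;
  }
  const pos = node.keys.indexOf(key);
  return { recordId: pos === -1 ? null : node.recordIds[pos], nodesVisited };
}

const students = [
  { _id: 1, name: "Alice", age: 25, gender: "female" },
  { _id: 2, name: "Bob", age: 30, gender: "male" },
  { _id: 3, name: "Charlie", age: 22, gender: "male" },
  { _id: 4, name: "David", age: 30, gender: "male" },
  { _id: 5, name: "Eve", age: 22, gender: "female" },
];

// Index on name: (name, _id) pairs sorted by name.
const pairs = students
  .map((doc) => [doc.name, doc._id])
  .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
const root = buildTree(pairs, 2); // tiny nodes so the tree has several levels

// Final step: follow the record id from the leaf to the full document.
const docsById = new Map(students.map((doc) => [doc._id, doc]));
const hit = search(root, "Eve");
const doc = hit.recordId === null ? null : docsById.get(hit.recordId);
console.log(doc ? doc.name : "not found", hit.nodesVisited); // → Eve 3
```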

Execution Of The Query Using COLLSCAN

Fig: COLLSCAN execution

Execution Of The Query Using IXSCAN

Fig: IXSCAN execution

Avoid Indexing When

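A collection has a high write-to-read ratio. Indexes are costly in that case, because every insert and delete must also update each index on the collection, and every update must update the indexes on the fields it changes.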
For smaller databases, implementing indexing may not provide a substantial improvement in performance. The benefits of indexing become more apparent as the size of the database grows, but for smaller datasets, the performance gains might be minimal.
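The write-side cost in the first point can be illustrated with a toy model that keeps one sorted array per index and counts how many index entries each insert has to write. Collection is a hypothetical class, not a MongoDB API; real index maintenance updates B-tree pages, but for ordinary indexes the count grows the same way, with each inserted document adding one entry to every index on the collection. The model includes the _id index, since MongoDB always maintains it.

```javascript
// Hypothetical toy collection that maintains one sorted array per index.
class Collection {
  constructor(indexedFields) {
    this.docs = [];
    // For each indexed field: an array of [value, _id] pairs kept sorted by value.
    this.indexes = new Map(indexedFields.map((field) => [field, []]));
    this.indexWrites = 0;
  }

  insert(doc) {
    this.docs.push(doc);
    // Every index must be updated, not only the ones a query will use.
    for (const [field, entries] of this.indexes) {
      // Linear search for the position that keeps the entries sorted.
      let pos = 0;
      while (pos < entries.length && entries[pos][0] <= doc[field]) pos++;
      entries.splice(pos, 0, [doc[field], doc._id]);
      this.indexWrites++;
    }
  }
}

const students = [
  { _id: 1, name: "Alice", age: 25, gender: "female" },
  { _id: 2, name: "Bob", age: 30, gender: "male" },
  { _id: 3, name: "Charlie", age: 22, gender: "male" },
  { _id: 4, name: "David", age: 30, gender: "male" },
  { _id: 5, name: "Eve", age: 22, gender: "female" },
];

const idOnly = new Collection(["_id"]);
const heavilyIndexed = new Collection(["_id", "name", "age", "gender"]);
for (const doc of students) {
  idOnly.insert(doc);
  heavilyIndexed.insert(doc);
}
console.log(idOnly.indexWrites, heavilyIndexed.indexWrites); // → 5 20
```

Four indexes mean four times the index writes for the same five inserts. On a write-heavy collection that overhead is paid on every write, while the read benefit only appears when queries actually use the indexes.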

Conclusion

Indexing is an excellent way to improve the performance of your database. It makes queries faster, which in turn improves the user experience. A good starting point is to index the fields that your most frequent queries filter or sort on. It is also important to note that indexing has drawbacks: indexes consume extra storage and add work to every write. So, it's important to understand when to use indexes and when not to.

DEV Community: Irfat