Sorting with indexes in MongoDB

#mongodb

In this article, we will explore when MongoDB uses indexes to sort documents. There are several situations to be aware of with index prefixes.

First, we will create some mock data for our database in our mongo shell. Each document will have a i field, a username field and a createdAt field.

> for( i = 0; i < 10000; i++){
... db.users.insert({ i: i, username: 'user' + i, createdAt: new Date()})
}

> db.users.find({}).count()
10000

Compound Indexes

Next, let's create a compound index on our collection. A compound index is a index on two or more fields. For this one, we will have a ascending index on username and i.

db.users.createIndex({username: 1, i: 1})

We are all set, let's now perform a sort on all the documents using our compound index. We will use the explain() method to get informations about how MongoDB performed the query:

> db.users.find({}).sort({username: 1, i:1}).explain("executionStats")

...

"winningPlan" : {
                        "stage" : "FETCH",
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "username" : 1,
                                        "i" : 1
                                },
                                "indexName" : "username_1_i_1",
...

The explain() output indicates that we used an index scan to perform our sort. We used a collection scan to find all the documents in our collection, and an index scan to sort them by ascending username and i.

Note that the field order in our compound index matters. If I invert i and username order in my sort predicate, everything changes:

> db.users.find({}).sort({i:1, username:1}).explain("executionStats")

...

"winningPlan" : {
                        "stage" : "SORT",
                        "sortPattern" : {
                                "i" : 1
                        },
                        "inputStage" : {
                                "stage" : "SORT_KEY_GENERATOR",
                                "inputStage" : {
                                        "stage" : "COLLSCAN",
                                        "direction" : "forward"
                                }
                        }
                },
...

Now, our winningPlan is no longer using a index scan, but an in-memory sort. In-memory sorts are less efficient, so we have to be careful when we design our indexes.

Index Prefixes

Index prefixes are a subset of a compound index. In MongoDB, you can use an index scan even if you have only use certain fields of a compound index. However, there are certain rules:

Index prefixes must be continuous
Index prefixes must start from the left

A few examples. Imagine I have a compound index like so :

{username: 1, firstName: 1, lastName: 1}

These are valid index prefixes:

{username: 1, firstName: 1}
{username: 1}

These are NOT valid index prefixes:

{firstName: 1, lastName: 1}
{username: 1, lastName: 1}
{lastName: 1}
{lastName: 1, firstName: 1, username: 1}

With this in mind, the following query will still prevent an in memory sort:

db.users.find({}).sort({username: 1})

Index prefix overlap in query and sort

Your index prefixes can also overlap in your query and sort predicates. They follow the same rules than before. For example:

> db.users.find({username: {$gt: 'user305'}}).sort({i:1}).explain('executionStats')

...

 "winningPlan" : {
                        "stage" : "SORT",
                        "sortPattern" : {
                                "i" : 1
                        },
                        "inputStage" : {
                                "stage" : "SORT_KEY_GENERATOR",
                                "inputStage" : {
                                        "stage" : "FETCH",
                                        "inputStage" : {
                                                "stage" : "IXSCAN",
                                                "keyPattern" : {
                                                        "username" : 1,
                                                        "i" : 1
                                                },
                                                "indexName" : "username_1_i_1",
                                                "isMultiKey" : false,
                                                "multiKeyPaths" : {
                                                        "username" : [ ],
                                                        "i" : [ ]
                                                },
                                                "isUnique" : false,
                                                "isSparse" : false,
                                                "isPartial" : false,
                                                "indexVersion" : 2,
                                                "direction" : "forward",
                                                "indexBounds" : {
                                                        "username" : [
                                                                "(\"user305\", {})"
                                                        ],
                                                        "i" : [
                                                                "[MinKey, MaxKey]"
                                                        ]
                                                }
                                        }
                                }
                        }
                },
...

As you can see, our winning plan still uses a index scan to query AND sort, even though they overlap our query and sort predicates. The possible combinations are exactly the same than the ones before.

Index directions

MongoDB can read indexes forwards or backwards. So far, every single time we did a sort, we read the index forward, because our sort predicates used the same sort direction than the ones in our compound index. To make MongoDB read index backwards, you have to invert the direction on every single index field. For example:

> db.users.find({}).sort({username: -1, i: -1}).explain("executionStats")

...

 "winningPlan" : {
                        "stage" : "FETCH",
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "username" : 1,
                                        "i" : 1
                                },
                                "indexName" : "username_1_i_1",
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                        "username" : [ ],
                                        "i" : [ ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2,
                                "direction" : "backward",
                                "indexBounds" : {
                                        "username" : [
                                                "[MaxKey, MinKey]"
                                        ],
                                        "i" : [
                                                "[MaxKey, MinKey]"
                                        ]
                                }
                        }
                },
...

We are still using an index scan like before, but you can see that the 'direction' key indicates 'backward' this time.

Again, note that every single key must be inverted. Another example:

> db.users.find({}).sort({username: -1, i: 1}).explain("executionStats")

...

 "winningPlan" : {
                        "stage" : "SORT",
                        "sortPattern" : {
                                "username" : -1,
                                "i" : 1
                        },
                        "inputStage" : {
                                "stage" : "SORT_KEY_GENERATOR",
                                "inputStage" : {
                                        "stage" : "COLLSCAN",
                                        "direction" : "forward"
                                }
                        }
                },
...

This doesn't use an index scan. One on my field in my compound index is not inverted. We fall back to a in-memory sort.

You can still use index prefixes while inverting the sort direction. If I only used {username: -1}, I would still get an index scan:

> db.users.find({}).sort({username: -1}).explain("executionStats")

...

                "winningPlan" : {
                        "stage" : "FETCH",
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "username" : 1,
                                        "i" : 1
                                },
                                "indexName" : "username_1_i_1",
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                        "username" : [ ],
                                        "i" : [ ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2,
                                "direction" : "backward",
                                "indexBounds" : {
                                        "username" : [
                                                "[MaxKey, MinKey]"
                                        ],
                                        "i" : [
                                                "[MaxKey, MinKey]"
                                        ]
                                }
                        }
                },

...

Conclusion

I hope this gave you a better idea of when MongoDB uses indexes in sorts. The goal is to try to avoid inefficient in-memory sort. MongoDB allows a rather flexible approach, but it always depends on the design of your indexes in the end.