
Introduction to MongoDB Indexing

M. Höller ・ 4 min read

This post is about database indexing for MongoDB. I will provide a quick overview of what indexes are and how to use them.

Understanding Indexes

Indexes are important for the efficient execution of queries, and MongoDB is no exception. Let’s assume you have a large collection of documents and no indexes. To select a document that matches a certain query statement, MongoDB must perform a complete collection scan: every single document in the collection is read and checked against the query statement. That’s a huge overhead just to find a single document.

This is where indexes come into play. Indexes are special data structures that store a small portion of the collection’s data in an easy-to-traverse form. Since an index holds only a subset of the collection’s data, it is much faster to read. So if you have an index that supports the queried field(s), MongoDB uses it to limit the number of documents that have to be searched in the collection. Let’s get our hands dirty and walk through an example.
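To build an intuition for the difference, here is a toy sketch in plain JavaScript (with made-up sample documents) of what an index buys you: a collection scan has to examine every document, while an index maps a key value directly to its document.

```javascript
// Hypothetical sample documents standing in for a collection.
const restaurants = [
  { restaurant_id: "40356018", name: "Riviera Caterer" },
  { restaurant_id: "40356151", name: "Brunos On The Boulevard" },
  { restaurant_id: "40356483", name: "Wilken's Fine Food" },
];

// "Collection scan" (COLLSCAN analogue): check every document until a match.
function collScan(docs, id) {
  let examined = 0;
  for (const doc of docs) {
    examined++;
    if (doc.restaurant_id === id) return { doc, examined };
  }
  return { doc: null, examined };
}

// "Index" (IXSCAN analogue): a map from key value straight to the document.
const index = new Map(restaurants.map((d) => [d.restaurant_id, d]));

function indexLookup(idx, id) {
  return { doc: idx.get(id) ?? null, examined: idx.has(id) ? 1 : 0 };
}
```

MongoDB’s B-tree indexes are of course far more sophisticated than a hash map, but the accounting is the same: with an index, far fewer documents have to be examined per query.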

Walk through an example

First we need some data. Let’s retrieve a restaurant dataset and save it to a file named primer-dataset.json. Fire up your mongod to import the data into the database. You can do the import with mongoimport:

mongoimport --db test --collection restaurants --drop --file ~/downloads/primer-dataset.json

Now let’s run a query and get some statistics. I will use the explain() method to get some performance data; in a future post I will discuss this method in more detail.

db.restaurants.explain("executionStats").find( { "restaurant_id" : "40356018"} );

This returns quite a few statistics:

    "executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 1,
        "executionTimeMillis" : 25,
        "totalKeysExamined" : 0,
        "totalDocsExamined" : 25359,
        "executionStages" : {
            "stage" : "COLLSCAN",
            "filter" : {
                "restaurant_id" : {
                    "$eq" : "40356018"
                }
            },
            "nReturned" : 1,
             [...]

So we see that the complete collection was scanned (25359 documents) and that it took 25 ms.

Create an index

Now let’s create an index on the queried field restaurant_id and re-run the query.

>db.restaurants.createIndex({"restaurant_id": 1})
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}

The parameter ‘1’ indicates that the index for the field restaurant_id should be sorted in ascending order.
createdCollectionAutomatically indicates whether the operation created a collection. If the collection does not exist, MongoDB creates it as part of the indexing operation.
numIndexesBefore shows the number of indexes before the command was executed.
numIndexesAfter shows the number of indexes after the command was executed.
ok is almost self-explaining at this point: 1 for a successful execution, 0 for a failure.
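If you pass -1 instead of 1, the index is built in descending order. As a hypothetical example, a descending index on the grades.date field of the sample dataset (useful if you usually read the most recent grades first) would be created like this:

>db.restaurants.createIndex({"grades.date": -1})

For a single-field index the direction rarely matters, since MongoDB can traverse the index in either direction; it becomes relevant for compound indexes and sorting.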

Re-run the query

After we set up an index on restaurant_id, let’s re-run the query with the explain statement.

db.restaurants.explain("executionStats").find( { "restaurant_id" : "40356018"} );

Again we get plenty of statistics; I cut it down to the part we need for this check:

    "executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 1,
        "executionTimeMillis" : 1,
        "totalKeysExamined" : 1,
        "totalDocsExamined" : 1,
        "executionStages" : {
            "stage" : "FETCH",
            "nReturned" : 1,
            "executionTimeMillisEstimate" : 0,
            "works" : 2,
            "advanced" : 1,
            "needTime" : 0,
            "needYield" : 0,
            "saveState" : 0,
            "restoreState" : 0,
            "isEOF" : 1,
            "invalidates" : 0,
            "docsExamined" : 1,
            "alreadyHasObj" : 0,
            "inputStage" : {
                "stage" : "IXSCAN",
                "nReturned" : 1,
                "executionTimeMillisEstimate" : 0,

Cool! The total execution time is down to 1 ms! As you can see there is now only one document examined instead of 25359. And we see that we only do an index scan (IXSCAN) instead of reading the complete collection (collection scan).
Setting up an index is not limited to a single field. You can pass several fields, separated by commas, to the createIndex statement. This results in a compound index.
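As a hypothetical example with two other fields from the sample dataset, a compound index on borough and cuisine would sort documents by borough first, and by cuisine within each borough:

>db.restaurants.createIndex({"borough": 1, "cuisine": 1})

Such an index supports queries on borough alone as well as queries on borough and cuisine together, but not queries on cuisine alone.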

Keep track of created indexes

Now that you know how powerful indexes are, you might have created some. But how do you know what is out there in the wild? Well, just query it:

> db.restaurants.getIndexes()
[
    {
        "v" : 2,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "test.restaurants"
    },
    {
        "v" : 2,
        "key" : {
            "restaurant_id" : 1
        },
        "name" : "restaurant_id_1",
        "ns" : "test.restaurants"
    }
]

The result set shows that there are two indexes: the _id index, which is created by default, and another one based on the field restaurant_id. As mentioned above, the : 1 indicates that the index field values are stored in ascending order.
Unfortunately you cannot simply go ahead and put an index on every field in your database. Indexes can drastically slow down other operations such as inserts, deletes, and updates: whenever documents are frequently inserted, deleted, or updated, every index on an affected field has to change as well, which is pure overhead for the collection.

Removing an index

So what to do if you want to get rid of an index? Just drop it:

> db.restaurants.dropIndex({restaurant_id:1});
{ "nIndexesWas" : 2, "ok" : 1 }

The result tells you how many indexes existed before the drop (nIndexesWas) and whether the operation was successful (ok: 1).
You can even drop all indexes with the command db.restaurants.dropIndexes(). This drops all indexes except for the _id index.

Strategies for Indexing

There are several points to consider when building the best index for your application: how much (free) RAM you have, what read/write ratio you expect, and which queries dominate. I do not know the one and only way to find the best index; ideally you profile a variety of index configurations with data sets that come close to your production environment to see which configurations perform best. And, logical but constantly forgotten: if an index is no longer used, drop it.

MongoDB publishes the following indexing strategies:
Create Indexes to Support Your Queries
An index supports a query when the index contains all the fields scanned by the query. Creating indexes that support your queries results in greatly increased query performance.

Use Indexes to Sort Query Results
To support efficient queries, use the strategies here when you specify the sequential order and sort order of index fields.
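For instance, with the restaurant_id index created above, a sorted read on that field can return documents in index order instead of sorting them in memory:

db.restaurants.find().sort({"restaurant_id": 1})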

Ensure Indexes Fit in RAM
When your index fits in RAM, the system can avoid reading the index from disk and you get the fastest processing.

Create Queries that Ensure Selectivity
Selectivity is the ability of a query to narrow results using the index. Selectivity allows MongoDB to use the index for a larger portion of the work associated with fulfilling the query.
