ScaleGrid for ScaleGrid

Posted on Sep 21, 2017 • Edited on Dec 26, 2019 • Originally published at scalegrid.io

How to Stop a Runaway Index Build in MongoDB

#developer #index #mongodb #database

Index builds in MongoDB can have an adverse impact on the availability of your MongoDB cluster. If you trigger a foreground index build on a large collection on your production server, you may find that your cluster is unresponsive until the index build is complete. On a large collection, this could take several hours or days, as described in "The perils of index building in MongoDB."

The recommended best practice is to trigger index builds in the background. However, on large collections we've seen multiple problems with this approach. In a three-node replica set, both secondaries start building the index and stop responding to requests. Consequently, the primary loses quorum (it can no longer see a majority of the set), steps down to secondary, and takes your cluster down with it. Also, index builds triggered from the command line are foreground builds by default, which makes this a widespread problem. We're hopeful that future releases will make background builds the default.
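In the MongoDB versions discussed here, a background build is requested by passing `{ background: true }` as an option to the shell's `createIndex()` helper (MongoDB 4.2 and later ignore this option). The minimal sketch below wraps that call so the option can't be forgotten. `createIndexInBackground` is a hypothetical helper, and the collection and field names in the usage comment are illustrative only.

```javascript
// Hypothetical helper for the mongo shell: always requests a background
// build by setting { background: true } before calling the collection's
// real createIndex(keys, options) method.
function createIndexInBackground(collection, keys, options) {
  // Copy the caller's options so the original object is not mutated.
  const opts = Object.assign({}, options, { background: true });
  return collection.createIndex(keys, opts);
}

// Example usage in the mongo shell (collection and field are illustrative):
// createIndexInBackground(db.orders, { customerId: 1 }, { name: "customerId_1" });
```

As the paragraph above notes, a background build can still cause trouble on large collections in a replica set, so this only avoids the accidental foreground default.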

Once you've triggered an index build, simply restarting the server does not solve the problem: MongoDB resumes the index build from where it left off. If the index was being built in the background before the restart, the build resumes as a foreground build afterwards, so in this case the restart could make the problem worse.

If you've already triggered an index build, how do you stop it? Luckily, it's relatively easy to stop an index build.

Option 1: Kill the index build process

Locate the index build operation using db.currentOp(), then kill it using db.killOp(<opid>). The index build operation will look something like this (the msg field reports its progress):

```javascript
{
"opid" : 820659355,
"active" : true,
"lockType" : "write",
....
"op" : "insert",
"ns" : "xxxx",
"query" : {
},
"client" : "xxxx",
"desc" : "conn",
"msg" : "index: (2/3) btree bottom up 292168587/398486401 64%"
}
```
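On a busy server the db.currentOp() listing can be long. The sketch below filters it in the shell, assuming the db.currentOp() result is a document with an `inprog` array. It treats an operation as an index build if its `msg` starts with "index" (as in the example above) or if it carries a `createIndexes` command (an assumption about newer server output). `findIndexBuildOps` is a hypothetical helper, not part of MongoDB, so review each match before passing its opid to db.killOp().

```javascript
// Hypothetical helper for the mongo shell: returns the entries of a
// db.currentOp() result that look like index builds.
function findIndexBuildOps(currentOpResult) {
  const ops = (currentOpResult && currentOpResult.inprog) || [];
  return ops.filter(function (op) {
    // Progress messages such as "index: (2/3) btree bottom up ... 64%".
    const hasIndexMsg = typeof op.msg === "string" && /^index/i.test(op.msg);
    // Builds started through the createIndexes command (assumed field layout).
    const isCreateIndexes = Boolean(op.command && op.command.createIndexes);
    return hasIndexMsg || isCreateIndexes;
  });
}

// Usage in the mongo shell: print the candidates, then kill the one you want.
// findIndexBuildOps(db.currentOp()).forEach(function (op) {
//   print(op.opid + "  " + op.ns + "  " + op.msg);
// });
// db.killOp(<opid>);
```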

If the node where the index is building does not respond to new connections, or killOp does not work, use Option 2 below.

Option 2: Configure "--noIndexBuildRetry" and restart

MongoDB provides a "--noIndexBuildRetry" option, which instructs mongod not to resume building incomplete indexes on restart.

This parameter doesn't appear to be supported in the config file, only as a command-line option for the mongod process. We prefer not to run mongod manually with this option, because if you accidentally run the mongod process as an elevated user (e.g., root), it can change the ownership and permissions of the files in the data directory. Also, once mongod has been run as root, we've had intermittent problems running it as the mongod user again.

A simpler option is to edit the /etc/init.d/mongod file. Look for this line:

```bash
OPTIONS=" -f $CONFIGFILE"
```

Replace with this line:

```bash
OPTIONS=" -f $CONFIGFILE --noIndexBuildRetry"
```
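Because the flag has to be added now and removed again later, it can help to make the edit repeatable. The Node.js sketch below toggles --noIndexBuildRetry on the OPTIONS line of the init script's text. It assumes that line has the form `OPTIONS="..."` as shown above; `setNoIndexBuildRetry` is a hypothetical helper, and you should back up /etc/init.d/mongod before writing to it.

```javascript
// Hypothetical helper: adds or removes --noIndexBuildRetry on the
// OPTIONS="..." line of an init script. All other lines are left as-is.
const FLAG = "--noIndexBuildRetry";

function setNoIndexBuildRetry(scriptText, enabled) {
  return scriptText
    .split("\n")
    .map(function (line) {
      const match = /^(\s*OPTIONS=")(.*)("\s*)$/.exec(line);
      if (!match) {
        return line; // not the OPTIONS line
      }
      // Drop any existing copy of the flag so repeated runs are idempotent.
      const args = match[2]
        .split(/\s+/)
        .filter(function (arg) { return arg !== "" && arg !== FLAG; });
      if (enabled) {
        args.push(FLAG);
      }
      const body = args.length ? " " + args.join(" ") : "";
      return match[1] + body + match[3];
    })
    .join("\n");
}

// Example usage with Node.js, run as root after backing up the file:
// const fs = require("fs");
// const path = "/etc/init.d/mongod";
// fs.writeFileSync(path, setNoIndexBuildRetry(fs.readFileSync(path, "utf8"), true));
```

Passing `false` instead of `true` restores the original line, which covers the final clean-up step.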

Detailed steps

For the purposes of this discussion, we're providing instructions for CentOS/Red Hat/Amazon Linux.

1. Configure "--noIndexBuildRetry"
   Add the "--noIndexBuildRetry" option on all your data nodes as explained above.
2. Restart all the nodes building the index
   Check the mongod log file on each data node to determine whether it is building the index. If it is, restart mongod with "service mongod restart".
3. Drop the incomplete index
   Once all of the relevant nodes have been restarted, list the collection's indexes (for example, with db.collection.getIndexes()) and drop the incomplete index with db.collection.dropIndex(<name>) if you see it on the list.
4. Remove "--noIndexBuildRetry"
   Edit the /etc/init.d/mongod file to remove the --noIndexBuildRetry option you added in step 1, so that mongod reverts to its default behavior of resuming incomplete index builds on restart.
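For step 3, the name to pass to dropIndex() can be looked up from the getIndexes() output by matching the key pattern you were building. The sketch below assumes each entry has `key` and `name` fields, as getIndexes() returns, and compares key patterns with a simple JSON string comparison (field order matters). `findIndexNameByKey` is a hypothetical helper, and the collection and key in the usage comment are illustrative.

```javascript
// Hypothetical helper for the mongo shell: given the array returned by
// db.collection.getIndexes(), return the name of the index whose key
// pattern matches keyPattern, or null if it is not listed.
function findIndexNameByKey(indexes, keyPattern) {
  // Key patterns are ordered, so { a: 1, b: 1 } and { b: 1, a: 1 } differ.
  const wanted = JSON.stringify(keyPattern);
  for (const spec of indexes) {
    if (JSON.stringify(spec.key) === wanted) {
      return spec.name;
    }
  }
  return null;
}

// Usage in the mongo shell (collection and key are illustrative):
// const name = findIndexNameByKey(db.orders.getIndexes(), { customerId: 1 });
// if (name !== null) { db.orders.dropIndex(name); }
```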

For any further questions, reach out to us at support@scalegrid.io.

Happy indexing!