
Alexander Sack

Elasticsearch 6.8 Dangling Indices Cleanup

Today I faced a problem: one of our elastic data nodes ran out of disk space. The interesting part was that all other nodes were happy, so I checked and found many GB of garbage indices on that node that had never been properly cleaned up and were eating the disk space.

Before we start

What is the hostname of your elastic cluster? Let's set it in a shell variable to make the next steps easier:

export YOURHOST=your.elastic.hostname
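Before touching anything, it's worth confirming the cluster actually answers on that host (9200 is the default HTTP port; adjust if yours differs):

```shell
# quick sanity check: prints cluster name, status and node count
curl -s "http://$YOURHOST:9200/_cluster/health?pretty"
```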

ATTENTION

This worked for me, but don't blame me if you lose data. Be sure to double-check that your data is replicated to other nodes, and proceed with caution.

Step 1: Check Status

Before you start, check that all indices that are relevant to you (i.e. that you don't want to lose):

  1. are green (all shards properly allocated and replicated)
  2. have a number_of_replicas setting larger than 0

The following curl should return zero hits:

curl -s "http://$YOURHOST:9200/_cat/indices?format=json" \
     | jq '.[] | select ( .health != "green" or .rep == "0" )'
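If that query does return indices with a rep of "0", you can raise their replica count before going on. A minimal sketch using the index settings API (the index name myindex is a placeholder; this also assumes the other nodes have spare capacity for the extra shard copies):

```shell
# raise the replica count of one index to 1 so its shards get
# copied to other nodes (replace myindex with your index name)
curl -s -XPUT "http://$YOURHOST:9200/myindex/_settings" \
     -H 'Content-Type: application/json' \
     -d '{"index": {"number_of_replicas": 1}}'
```

Wait for the index to turn green again before proceeding.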

Step 2: Delete the Backing Store

Once you know you are "safe", you can stop the node that holds the dangling (no longer referenced) data and delete that data.

First let's find the directory where data is stored.

By default the data is in ${es.path.home}/data, e.g. using the official docker containers:

# the directory where all index data is stored looks like below
$ ls -la /usr/share/elasticsearch/data
total 12
drwxr-xr-x 3 elasticsearch elasticsearch 4096 Jul 10 08:32 .
drwxrwxr-x 1 elasticsearch root          4096 Mar 18 23:28 ..
drwxrwxr-x 3 elasticsearch root          4096 Jul 10 08:32 nodes
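To see which index directories are actually eating the disk, you can size them up before wiping anything. Note that under nodes/0/indices the directories are named by index UUID, not index name (the path below assumes the official docker layout, with 0 being the default node directory):

```shell
# list the 20 largest index data directories, biggest first
du -sh /usr/share/elasticsearch/data/nodes/0/indices/* 2>/dev/null \
    | sort -rh | head -20
```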

Once you have found the directory, double-check again that all indices are fine (see Step 1), stop elastic, then delete the data directory:

rm -rf /usr/share/elasticsearch/data/*

Now you can start elastic again.
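The full stop/wipe/start cycle can be sketched like this, assuming the node runs as the official docker container and the container is named elasticsearch (both the container name and the bind-mounted path are assumptions; adapt for systemd or your own setup):

```shell
# stop the node first so no process holds the files
docker stop elasticsearch

# wipe the data; this assumes the data directory is bind-mounted
# on the host at the same path as inside the container
rm -rf /usr/share/elasticsearch/data/*

# bring the node back up
docker start elasticsearch
```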

Step 3: Forget Dangling Indices

Once your elastic node is running again, most indices should be in yellow state, but some might be in red! Those are the ones that were actually dangling (i.e. not replicated) and have now been lost.

We should hence finish our cleanup by deleting these:

# let's find all red ones using the powers of jq
curl -s "http://$YOURHOST:9200/_cat/indices?format=json" \
     | jq -r '.[]
           | select ( .health == "red" )
           | .index'

Do those look reasonable? OK to drop them? If so, let's go ahead:

curl -s "http://$YOURHOST:9200/_cat/indices?format=json" \
     | jq -r '.[]
           | select ( .health == "red" )
           | .index' \
     | while read -r line; do
           curl -s -XDELETE "http://$YOURHOST:9200/$line"
       done

Step 4: Wait for green

Check from time to time how your yellow indices are recovering and turning green.

For that, you can count the indices that are not yet green:

# at this point 48 indices are still not green; try again in a couple of minutes to see the diff
$ curl -s "http://$YOURHOST:9200/_cat/indices?format=json" \
     | jq -r '.[] | select ( .health != "green" ) | .index' \
     | wc -l
48

Eventually you will reach "0" indices that are not green, which means you are done and everything is back up and running.
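If you don't want to re-run the command by hand, a small polling loop can wait for you (the 30-second interval is arbitrary):

```shell
# poll until every index is green, printing progress along the way
while true; do
    n=$(curl -s "http://$YOURHOST:9200/_cat/indices?format=json" \
            | jq -r '.[] | select ( .health != "green" ) | .index' \
            | wc -l)
    echo "$(date): $n indices not green"
    [ "$n" -eq 0 ] && break
    sleep 30
done
```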



