Molly Struve (she/her)

Posted on Feb 13, 2019 • Edited on Apr 11, 2019

Performance Testing Elasticsearch

#elasticsearch #ruby #performance #devops

When you have a large Elasticsearch cluster and you want to make an index change, whether it be to the mappings or settings, you want to have some assurances that the change is going to improve performance. The change also might be hard to implement, so you want to know if the performance gain is worth the time you are going to have to take to update the code.

We Will Do it Live!

I kid you not, in the early days at Kenna, when we were small, we would literally test on our production cluster. Think a new shard setup might be better? Sure, let's reconfigure everything and see what happens. More than once we ended up with late night battles and fire fighting thanks to optimizations gone wrong. It was a steep learning curve, but in the end, we survived!

Now that we are a lot bigger, and a bit wiser 😉, we have chosen to go about making these sorts of changes a bit more responsibly, by fully testing them first! 🎉

How We Test Elasticsearch Changes

TL;DR

We now test Elasticsearch changes using a Ruby SearchTesting class that we wrote. This class allows us to make requests to two different indexes at the same time from our application. Both requests are wrapped in monitoring code, which allows us to track and view them in our monitoring service. With the tracking, we can compare things like request time to see which index performs better. Below is one example of when we used this class.

Optimization: Storing IDs As Keywords

In Elasticsearch training, we were told over and over that storing IDs as keywords could give us a performance boost when it came to search. The reason is that integer data types in Elasticsearch are optimized for range queries, while keywords are optimized for terms queries. A terms query looks like this:

{
  "terms": {
    "id": [1, 5, 9]
  }
}

If you are only ever executing terms queries with your IDs, then you should store them as keywords. This sounded like solid advice, but it was going to take some work to update our mappings and reindex all 3 billion of our documents, so we wanted to make sure we were really going to see some benefits before we made any changes.
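To make the change concrete, here is a rough sketch of what the two mappings and the query body could look like as Ruby hashes handed to the Elasticsearch client. Our real mappings are not shown in this post, so the field name `id`, the typeless mapping layout (as used by Elasticsearch 7 and later), and the `terms_query` helper are all assumptions for illustration.

```ruby
# Hypothetical mappings: the field name and layout are assumptions, not our real ones.
INTEGER_ID_MAPPING = { mappings: { properties: { id: { type: "integer" } } } }.freeze
KEYWORD_ID_MAPPING = { mappings: { properties: { id: { type: "keyword" } } } }.freeze

# Builds a search body containing a terms query for a list of IDs.
# The IDs are sent as strings here to match the keyword type (an assumption).
def terms_query(ids)
  { query: { terms: { id: ids.map(&:to_s) } } }
end
```

The only difference between the two indexes is the field type, which is why a side-by-side test of the same queries is a fair comparison.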

Testing Setup

Before we could start a test, we first had to create a duplicate of our production index, except that this new test index would have all the IDs stored as keywords. We reindexed (copied) all of the data from the production index into the new test index so they both contained the exact same data.

Next, we had to put a hash into Redis to signal to our app that we were going to be testing this index. When our SearchTesting class makes an Elasticsearch request, it first checks Redis to see if we are running a test on the requested index. If we are, it will send the request not just to the original index, but also to the testing index.

We created this hash directly with our SearchTesting class using the method below.

def self.set_test_index(original_index_name, test_index_name)
  index_hashes = cache_index_hashes
  index_hashes[original_index_name] = {
    :index_name => test_index_name,
    :test_searching => true,
    :test_indexing => true
  }
  update_index_cache(index_hashes)
end

The resulting hash looks like this in Redis:

{
  "test_indexes" => {
    "prod_index" => {
      "index_name" => "test_index",
      "test_searching" => true,
      "test_indexing" => true
    }
  }
}
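The cache_index_hashes and update_index_cache helpers are not shown in this post. Here is a minimal sketch of how they might work, assuming the whole mapping is stored as one JSON string under a single key. `FakeStore` and `TestIndexRegistry` are hypothetical names, and the in-memory store only stands in for Redis so the example runs without a server.

```ruby
require "json"

# In-memory stand-in for Redis (hypothetical), exposing simple get/set calls.
class FakeStore
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def set(key, value)
    @data[key] = value
  end
end

class TestIndexRegistry
  CACHE_KEY = "test_indexes" # key name taken from the hash shown above

  def initialize(store)
    @store = store
  end

  # Read the full mapping of original index => test settings.
  def cache_index_hashes
    raw = @store.get(CACHE_KEY)
    raw ? JSON.parse(raw) : {}
  end

  # Persist the mapping as a JSON string.
  def update_index_cache(index_hashes)
    @store.set(CACHE_KEY, JSON.generate(index_hashes))
  end

  def set_test_index(original_index_name, test_index_name)
    index_hashes = cache_index_hashes
    index_hashes[original_index_name] = {
      "index_name" => test_index_name,
      "test_searching" => true,
      "test_indexing" => true
    }
    update_index_cache(index_hashes)
  end

  # True only when a test is registered and search testing is switched on.
  def test_searching_enabled?(test_hash)
    !test_hash.nil? && test_hash["test_searching"] == true
  end
end

registry = TestIndexRegistry.new(FakeStore.new)
registry.set_test_index("prod_index", "test_index")
```

Keeping the flags in a shared store means a test can be switched on or off without deploying code, and an index with no entry simply behaves as before.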

As soon as that hash was in place, the testing would start! There were two things we wanted to test with our SearchTesting class: search speed and indexing speed.

Testing Search Speed

When we introduced this new SearchTesting class, we implemented it in such a way that every request we made to Elasticsearch went through it.

@connection.send(m, *args, &block)
# was replaced with
SearchTesting.new(@connection).send(m, *args, &block)

@connection here is a connection object from our elasticsearch-ruby gem that allows us to talk to Elasticsearch. We pass that to our SearchTesting class, which then defines the methods we want to test. Any undefined methods simply get forwarded on to Elasticsearch with no interference.

def method_missing(m, *args, &block)
  connection.send(m, *args, &block)
end
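For anyone building something similar, here is a self-contained sketch of the same forwarding idea. `FakeConnection` and `SearchProxy` are hypothetical stand-ins, not our real classes. The sketch also adds respond_to_missing?, which is not in the snippet above but keeps respond_to? consistent with method_missing.

```ruby
# Hypothetical stand-in for the Elasticsearch client.
class FakeConnection
  def search(params)
    { index: params[:index], hits: [] }
  end

  def count(params)
    { index: params[:index], count: 0 }
  end
end

class SearchProxy
  attr_reader :connection, :searches_seen

  def initialize(connection)
    @connection = connection
    @searches_seen = 0
  end

  # Explicitly defined, so this call is intercepted before being passed on.
  def search(*args)
    @searches_seen += 1
    connection.search(*args)
  end

  # Anything not defined above goes straight to the wrapped connection.
  def method_missing(name, *args, &block)
    if connection.respond_to?(name)
      connection.public_send(name, *args, &block)
    else
      super
    end
  end

  # Keeps respond_to? in line with what method_missing can handle.
  def respond_to_missing?(name, include_private = false)
    connection.respond_to?(name, include_private) || super
  end
end

proxy = SearchProxy.new(FakeConnection.new)
```

Only the methods under test need to be written out, and everything else behaves exactly as it did before the wrapper was introduced.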

Since we wanted to test the search method, we defined it like so:

def search(*args)
  self.index_name = args&.first&.dig(:index)
  test_hash = cache_index_hashes.dig(index_name)

  test_search_thread = (Thread.new { test_search(*args.deep_dup, test_hash) } if test_searching_enabled?(test_hash))

  connection.search(*args)
end

There are a few things going on here that I want to zoom in on. First, we are getting the name of the index we are making the request to. Then we check to see if that index is being tested.

self.index_name = args&.first&.dig(:index)
test_hash = cache_index_hashes.dig(index_name)

If a test hash is present in Redis AND searching is enabled for that test hash, we create a new thread to run the test request in.

test_search_thread = (Thread.new { test_search(*args.deep_dup, test_hash) } if test_searching_enabled?(test_hash))

That test_search method is pretty simple. All it does is replace the original index name with the test index name and then executes the request against the test index.

def test_search(*args, test_hash)
  test_index_name = test_hash[:index_name] # fetch test index name
  args.first[:index] = test_index_name # replace original index name with test
  connection.search(*args) # execute request against the test index
end

Since the test_search method is running in a different thread we don't have to worry about waiting for it to complete. Once we have kicked off the test_index request, then we make the request to the original index and return those results!
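The monitoring wrapper mentioned in the TL;DR is not shown in this post. Below is a rough, dependency-free sketch of the whole pattern: a shadow request in its own thread, with both requests timed. `FakeConnection`, `timed`, and `shadow_search` are hypothetical names, a Queue stands in for the monitoring service, and Marshal is used for the deep copy in place of ActiveSupport's deep_dup.

```ruby
# Hypothetical stand-in for the Elasticsearch client; records which index was hit.
class FakeConnection
  attr_reader :calls

  def initialize
    @calls = Queue.new # thread-safe, since two threads write to it
  end

  def search(params)
    @calls << params[:index]
    { index: params[:index], hits: [] }
  end
end

# Runs the block and records how long it took under the given label.
def timed(label, timings)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
ensure
  timings << [label, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

def shadow_search(connection, params, test_index_name, timings)
  # Deep copy so the test request cannot mutate the caller's params.
  test_params = Marshal.load(Marshal.dump(params))
  test_params[:index] = test_index_name

  shadow = Thread.new do
    timed("test", timings) { connection.search(test_params) }
  rescue StandardError
    nil # a failing test request must never affect the real one
  end

  result = timed("original", timings) { connection.search(params) }
  [result, shadow]
end

connection = FakeConnection.new
timings = Queue.new
result, shadow = shadow_search(connection, { index: "prod_index", body: {} }, "test_index", timings)
shadow.join # only for this demo, so both timings exist before we inspect them
```

In the application itself there is no join: the caller gets the original result back right away and the test request finishes whenever it finishes.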

Now, search is only half of the picture. We also wanted to make sure indexing was still performant.

Testing Indexing Speed

In order to test indexing speed, we had to test using the bulk method since all of our indexing is done in bulk. We did this similarly to search, by creating a separate thread to run the test indexing in.

def bulk(*args)
  test_bulk_thread = Thread.new { test_bulk(*args.deep_dup) }
  connection.bulk(*args)
end

The hard part here is that sometimes hashes going to different indexes are grouped together. Due to this, we have to check each indexing hash individually. If there are any hashes going to a production index that is being tested, then we will also index those hashes to the test index.

def test_bulk(*args)
  new_bulk_hashes = test_bulk_hashes(*args) # select hashes to go to the test index
  return unless new_bulk_hashes.any? # if there are none return 
  connection.bulk({ :body => new_bulk_hashes }) # index hashes to the test index
end

def test_bulk_hashes(*args)
  @test_bulk_hashes ||= args.first[:body].map do |index_hash|
    index_name = test_index_name(index_hash.values.first[:_index])
    next unless index_name # if there is an index name, a test is running
    index_hash.values.first[:_index] = index_name # replace original index with test index
    index_hash # return new index hash to the map
  end.compact
end
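Here is a standalone sketch of that filtering step that you can run without Elasticsearch. It assumes each bulk item is a single hash keyed by the action, with the metadata (including `_index`) as its value, which is the shape the code above works with. `TEST_INDEXES` and `bulk_actions_for_test` are hypothetical names. Unlike the method above, it builds new hashes instead of modifying the copies in place.

```ruby
# Hypothetical mapping of production index => test index.
TEST_INDEXES = { "prod_index" => "test_index" }.freeze

# Returns copies of the bulk actions that target an index under test,
# re-pointed at the test index. Actions for other indexes are dropped.
def bulk_actions_for_test(bulk_body, test_indexes = TEST_INDEXES)
  bulk_body.filter_map do |action|
    operation, meta = action.first # e.g. [:index, { _index: "...", ... }]
    test_index = test_indexes[meta[:_index]]
    next unless test_index # no test running for this index

    { operation => meta.merge(_index: test_index) }
  end
end

body = [
  { index: { _index: "prod_index", _id: 1, data: { title: "a" } } },
  { index: { _index: "other_index", _id: 2, data: { title: "b" } } },
  { delete: { _index: "prod_index", _id: 3 } }
]
test_body = bulk_actions_for_test(body)
```

Of the three actions, only the two aimed at prod_index end up in test_body, now pointing at test_index, and the original body is left untouched.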

Same as with search, since the indexing is happening in a separate thread, once we kick it off we can move on to indexing the documents to the original index and return that result.

Don't Rush It

One of the keys when you are performance testing indexes in Elasticsearch is to give the test time to run. Elasticsearch builds up caches as you search it. This means your original index, which already has caches built up, will likely be faster initially than your new index. You want to let the test run long enough for the new index to build up the same caches, so the test is a true 1:1 comparison. We usually let our tests run for at least a few days. Not only does this allow the indexes time to build up caches, but it also allows for a wide variety of search and indexing requests to be made.
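To show why the warm-up period matters, here is a small sketch that compares median request times for the two indexes while ignoring everything recorded before a cutoff. The numbers are made up purely for illustration, and `median` and `speedup_percent` are hypothetical helpers, not part of our monitoring service.

```ruby
# Median of a list of numbers.
def median(values)
  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Percentage by which the test index is faster (positive) or slower (negative),
# ignoring samples recorded before warm_up_until.
def speedup_percent(samples, warm_up_until)
  warm = samples.select { |s| s[:at] >= warm_up_until }
  original = median(warm.select { |s| s[:index] == :original }.map { |s| s[:ms] })
  test = median(warm.select { |s| s[:index] == :test }.map { |s| s[:ms] })
  ((original - test) / original.to_f * 100).round(1)
end

# Made-up samples: :at is hours since the test started, :ms is request time.
samples = [
  { index: :original, at: 1,  ms: 100 }, { index: :test, at: 1,  ms: 180 }, # cold caches
  { index: :original, at: 48, ms: 100 }, { index: :test, at: 48, ms: 80 },
  { index: :original, at: 72, ms: 110 }, { index: :test, at: 72, ms: 70 },
  { index: :original, at: 96, ms: 90 },  { index: :test, at: 96, ms: 75 }
]

with_warm_up_excluded = speedup_percent(samples, 24)
with_everything = speedup_percent(samples, 0)
```

With this made-up data, the slow first-hour request on the test index drags the measured improvement down when it is included, which is exactly the kind of distortion a longer test run smooths out.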

So far we have used this search class to test changing IDs to keywords and shard count changes. Changing IDs to keywords gave us a 30% increase in search speed with no change in indexing speed. Decreasing our shard counts hurt our performance across the board so we did not move forward with that change in our code base.

I wanted to share this code so hopefully others can use this class to set up their own Elasticsearch tests in Ruby! If you have any questions or I did not explain something clearly, please don't hesitate to ask!

Top comments (1)

Evaldas Buinauskas • Feb 14 '19

I would have never thought that changing integers to keywords would increase search speed. Thanks for a great tip!