
Getting Started with Elasticsearch and Ruby

Molly Struve (she/her) ・10 min read

Recently, DEV started migrating from Algolia to Elasticsearch. Since I am often asked about the best way to get started with Elasticsearch, I figured I would share how we have been making the switch. Hopefully, you can use this post as a template if you decide to implement Elasticsearch in your Rails or Ruby app in the future.

Before I get started, note that this article assumes you understand the basics of Elasticsearch. You should be familiar with the terms index, mappings, and documents, since we will be covering those. If you need a refresher or want to learn how Elasticsearch works, I highly recommend the Elastic docs!

1) Install Elasticsearch

Well, it isn't quite that easy. Before you start hacking away at your code, you need to get Elasticsearch up and running so you can talk to it. There are many ways to do this depending on your environment, so I am going to point you towards the Installing Elasticsearch docs for getting started.

Many of us at DEV use Macs and ended up installing from archive since the Homebrew install seemed to be broken for the majority of us. Once you have Elasticsearch up and running the next step is to get your code talking to it.

2) Install the Elasticsearch Ruby gem

Related Pull Request

The Elasticsearch Ruby gem installs just like any other gem. All you have to do is add a line to your Gemfile.

gem "elasticsearch", "~> 7.4" 

One important thing to note is which version of Elasticsearch you are planning on using. The gem versions are numbered to match the Elasticsearch versions. If you are on Elasticsearch version 5, then you will want to use the latest version 5 release of the gem.
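
As a rough illustration of that version pairing, the sketch below compares the major version of a gem with the major version of a cluster. The `major_versions_match?` helper is a hypothetical name, and the version strings are examples only.

```ruby
# Hypothetical check: does the gem's major version match the cluster's?
# The server version string could be read from the cluster info response
# under ["version"]["number"].
def major_versions_match?(gem_version, server_version)
  gem_major = Gem::Version.new(gem_version).segments.first
  server_major = Gem::Version.new(server_version).segments.first
  gem_major == server_major
end

puts major_versions_match?("7.4.0", "7.5.2") # gem and cluster share a major version
puts major_versions_match?("5.0.5", "7.5.2") # gem is two major versions behind
```

A check like this is only a convenience. The Gemfile constraint remains the place where the pairing is actually enforced.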

Another thing you might notice in the pull request that I reference above is that we also installed the Typhoeus gem.

gem "typhoeus", "~> 1.3.1"

The Elasticsearch gem docs suggest using an HTTP library such as Typhoeus for optimal performance because it supports persistent ("keep-alive") connections.

Once the gem has been successfully installed, you need to create a client within your code to talk to Elasticsearch. We chose to do this through an initializer file, config/initializers/elasticsearch.rb, and it looks like this:

require "elasticsearch"

SearchClient = Elasticsearch::Client.new(
  url: ApplicationConfig["ELASTICSEARCH_URL"],
  retry_on_failure: 5,
  request_timeout: 30,
  adapter: :typhoeus,
  log: Rails.env.development?,
)

Let's go over the arguments we are passing in here.

  • url: (required) We are passing the client a URL param. You communicate with Elasticsearch via HTTP, so you need a URL that your client can send requests to. In development, by default, this will be http://localhost:9200

The rest of the arguments are optional.

  • retry_on_failure: The number of times the client will retry a failed request before it gives up.
  • request_timeout: Sets the time limit, in seconds, for a request to get a response. Any request that takes over 30 seconds to respond will time out.
  • adapter: The Ruby HTTP library we want to use to make these requests. As stated above, ideally you want to use Typhoeus because of its support for keep-alive connections.
  • log: Determines whether your client outputs logs for each request you make.
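
To see how these options fit together, here is a minimal sketch that builds the same options hash from an environment-like hash, with a local fallback URL. The `search_client_options` helper, the `DEFAULT_ELASTICSEARCH_URL` constant, and the `development` flag are hypothetical names for illustration; only the option keys come from the initializer above.

```ruby
# Hypothetical helper: builds the options hash passed to Elasticsearch::Client.new.
# Only the option keys mirror the initializer above; the helper name is illustrative.
DEFAULT_ELASTICSEARCH_URL = "http://localhost:9200"

def search_client_options(env, development: false)
  {
    url: env.fetch("ELASTICSEARCH_URL", DEFAULT_ELASTICSEARCH_URL),
    retry_on_failure: 5,  # retry a failed request up to 5 times
    request_timeout: 30,  # seconds to wait for a response
    adapter: :typhoeus,   # HTTP library with keep-alive support
    log: development      # log each request only in development
  }
end

options = search_client_options(ENV, development: true)
# With the gems installed, the client would then be created with:
#   SearchClient = Elasticsearch::Client.new(options)
```

Keeping the options in a plain hash makes them easy to inspect in a console before any request is sent.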

There are many other options you can pass to your client but these are the basic ones that we use. At this point, some people might be inclined to start writing code to throw things in Elasticsearch. I'm not one of those people.

Whenever I add a new external dependency like a database I like to deploy the interface for using it, in this case, the gem, by itself. This way you can deploy and then jump into a console and make sure everything is hooked up correctly before you start using it in your code. If there are any configuration tweaks that need to be made then you can make those without having to worry about the code breaking.

To validate that you have the cluster hooked up correctly you can jump into a Rails console and issue this command with your new SearchClient:

[1] pry(main)> SearchClient.info
ETHON: Libcurl initialized
ETHON: performed EASY effective_url=http://localhost:9200/ response_code=200 return_code=ok total_time=0.392646
=> {"name"=>"mollys_computer",
 "cluster_name"=>"elasticsearch",
 "cluster_uuid"=>"123abc456",
 "version"=>
  {"number"=>"7.5.2",
   "build_flavor"=>"default",
   "build_type"=>"tar",
   "build_hash"=>"8bec50e1e0ad29dad5653712cf3bb580cd1afcdf",
   "build_date"=>"2020-01-15T12:11:52.313576Z",
   "build_snapshot"=>false,
   "lucene_version"=>"8.3.0",
   "minimum_wire_compatibility_version"=>"6.8.0",
   "minimum_index_compatibility_version"=>"6.0.0-beta1"},
 "tagline"=>"You Know, for Search"}

If you get a 200 response back like the one above, then you know everything is configured correctly. With the gem set up correctly, the next step is to start using Elasticsearch, and we are going to do that by making our first index!

2) Setting Up the Tag Index

Related Pull Request

For this example, I am going to show you how we set up our very simple Tag index. The capabilities of Elasticsearch are tremendous but I want to keep it simple with this example so you have a good base to get you started.

To start, we need to do a couple of different things. First, we need to create our index.

index_settings = { number_of_shards: 1, number_of_replicas: 0 }
settings = { settings: { index: index_settings } }
SearchClient.indices.create(index: "tags_development", body: settings)

Here, we are creating a simple index with 1 shard and 0 replicas. In development, you will often only have a single node, so keeping indexes to a single shard is usually the way to go. However, in production, depending on your data size and number of requests you are making, you may want more shards for your index.

You can run the above command in a console to see it in action. A successful response will look like this (in this run the index was named "molly"):

[37] pry(main)> SearchClient.indices.create(index: "molly", body: settings)
ETHON: performed EASY effective_url=http://localhost:9200/molly response_code=200 return_code=ok total_time=0.65619
2020-02-24 16:00:54 -0500: PUT http://localhost:9200/molly [status:200, request:0.660s, query:n/a]
{"acknowledged":true,"shards_acknowledged":true,"index":"molly"}
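
The shard and replica choice discussed above can also be made per environment. The sketch below is one way to do that; the `index_settings_for` helper is a hypothetical name, and the production numbers are placeholders, not a recommendation.

```ruby
# Hypothetical helper: picks shard and replica counts per environment.
# The production numbers are placeholders, not a recommendation.
def index_settings_for(environment)
  index_settings =
    if environment == "production"
      { number_of_shards: 3, number_of_replicas: 1 }
    else
      { number_of_shards: 1, number_of_replicas: 0 }
    end
  { settings: { index: index_settings } }
end

settings = index_settings_for("development")
# The result has the same shape as the body passed to indices.create above.
```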

Once your index is created, the next thing you will need to do is define your mappings. This is where you define the fields you want to be able to search on.

I HIGHLY suggest when you are working with Elasticsearch for integrated search within an application that you set your mapping dynamic value to strict. Setting the value to strict means that if you try to index a field that is not in your mappings Elasticsearch will raise an error. When doing integrated search you want to keep your documents lean and mean and this ensures that you don't end up with any surprise fields from possible indexing bugs.
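
To make the effect of strict concrete, here is a client-side approximation that lists the fields of a document that a mapping does not declare. The `unmapped_fields` helper and the two-field `EXAMPLE_MAPPING` are hypothetical; Elasticsearch performs the real check itself at index time and rejects the document.

```ruby
# Client-side approximation of "dynamic": "strict": list the document fields
# that the mapping does not declare. The two-field mapping is hypothetical,
# and Elasticsearch performs the real check itself when indexing.
EXAMPLE_MAPPING = {
  dynamic: "strict",
  properties: { id: { type: "keyword" }, name: { type: "text" } }
}.freeze

def unmapped_fields(document, mapping)
  document.keys.map(&:to_sym) - mapping.fetch(:properties).keys
end

document = { id: 1, name: "ruby", color: "red" }
extra = unmapped_fields(document, EXAMPLE_MAPPING)
puts "Unmapped fields: #{extra.inspect}" unless extra.empty?
```

A check like this can surface a surprise field in a test suite before the request ever reaches the cluster.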

Below are the mappings for our tags index.

{
  "dynamic": "strict",
  "properties": {
    "id": {
      "type": "keyword" 
    },
    "name": {
      "type": "text",
      "fields": {
        "raw": {
          "type": "keyword"
        }
      }
    },
    "hotness_score": {
      "type": "integer"
    },
    "supported": {
      "type": "boolean"
    },
    "short_summary": {
      "type": "text"
    },
    "rules_html": {
      "type": "text"
    }
  }
}

Before I move on, I want to point out a couple of things here. You probably noticed that we are mapping our id field as a keyword rather than an integer. This is because keyword fields are optimized for terms queries, which is what we will be running against our ID field. However, for a field like hotness_score, we want to use an integer because we will be searching it with range queries using comparisons like greater than or less than.

Another thing you will notice is that name has two types. The text datatype means that Elasticsearch will analyze the field and break it up into tokens to make full-text search easier. The keyword datatype is accessed through the name.raw sub-field, which stores the name as is, in one complete string. Having two field types allows us to search the tokens of the tag name or the entire name itself.
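
As a sketch of how these field types line up with query types, the request bodies below pair each field with a query that suits it. The variable names, tag names, and numbers are made up for illustration, and each hash is meant to be passed as the body of a search request.

```ruby
# Hypothetical request bodies showing which query type suits each field.
# The tag names and numbers are made up for illustration.

# Exact lookup by ID against the keyword field
ids_query = { query: { terms: { id: %w[10 40] } } }

# Range filter on the integer field
hot_tags_query = { query: { range: { hotness_score: { gte: 1 } } } }

# Full-text match against the analyzed tokens of the name
name_text_query = { query: { match: { name: "python" } } }

# Exact match against the whole, unanalyzed name
name_exact_query = { query: { term: { "name.raw": "PythonBeginners" } } }
```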

Ok, now that you understand a little bit about our mappings, let's talk about how we apply them to our newly created index. To keep our linters happy we have the mappings stored in a JSON file and then we import them into our Ruby file like so:

MAPPINGS = JSON.parse(File.read("config/elasticsearch/mappings/tags.json"), symbolize_names: true).freeze

Once we have the mappings set, the next step is to apply them to the new index we just created. You can do this by executing the code below:

SearchClient.indices.put_mapping(index: "tags_development", body: MAPPINGS)

If the request is successful, you should get a response like this:

[38] pry(main)> SearchClient.indices.put_mapping(index: "tags_development", body: MAPPINGS)
ETHON: performed EASY effective_url=http://localhost:9200/tags_development/_mapping response_code=200 return_code=ok total_time=0.079915
2020-02-24 16:45:56 -0500: PUT http://localhost:9200/tags_development/_mapping [status:200, request:0.095s, query:n/a]
2020-02-24 16:45:56 -0500: > {"dynamic":"strict","properties":{"id":{"type":"keyword"},"name":{"type":"text","fields":{"raw":{"type":"keyword"}}},"hotness_score":{"type":"integer"},"supported":{"type":"boolean"},"short_summary":{"type":"text"},"rules_html":{"type":"text"}}}
2020-02-24 16:45:56 -0500: < {"acknowledged":true}

Even though you got a 200 response back, you might still want to double-check that your index was created correctly. Once again, you can do this in a console like so:

[2] pry(main)> SearchClient.indices.get(index: "tags_development")
ETHON: performed EASY effective_url=http://localhost:9200/tags_development response_code=200 return_code=ok total_time=0.048122
=> {"tags_development"=>
  {"mappings"=>
    {"dynamic"=>"strict",
     "properties"=>
      {"hotness_score"=>{"type"=>"integer"},
       "id"=>{"type"=>"keyword"},
       "name"=>{"type"=>"text", "fields"=>{"raw"=>{"type"=>"keyword"}}},
       "rules_html"=>{"type"=>"text"},
       "short_summary"=>{"type"=>"text"},
       "supported"=>{"type"=>"boolean"}}},
   "settings"=>
    {"index"=>
      {"creation_date"=>"1581527116462", "number_of_shards"=>"1", "number_of_replicas"=>"0", "uuid"=>"kO-MGUiFSJObSMY_22mrzg", "version"=>{"created"=>"7050299"}, "provided_name"=>"tags_development"}}}}
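
Rather than comparing that output by eye, the check can be scripted. The sketch below is one hedged way to do it: the `mappings_applied?` helper is a hypothetical name, and the response and expected mappings are trimmed-down stand-ins for the real ones. The response uses string keys while the parsed mappings use symbols, so both sides are normalized through JSON before comparing.

```ruby
require "json"

# Hypothetical check: compares the mappings returned by indices.get with the
# mappings we intended to apply. Both sides are round-tripped through JSON so
# that symbol keys and string keys compare equal.
def mappings_applied?(response, index_name, expected_mappings)
  actual = response.dig(index_name, "mappings")
  normalize = ->(value) { JSON.parse(JSON.generate(value)) }
  normalize.call(actual) == normalize.call(expected_mappings)
end

# Trimmed-down stand-ins for the real response and mappings
expected = { dynamic: "strict", properties: { id: { type: "keyword" } } }
response = {
  "tags_development" => {
    "mappings" => {
      "dynamic" => "strict",
      "properties" => { "id" => { "type" => "keyword" } }
    }
  }
}
puts mappings_applied?(response, "tags_development", expected)
```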

Now that we have verified that our index is created and has the proper mappings, it's time to start filling it with data!

3) Indexing a Tag Document

Related Pull Request

Before we can send data to Elasticsearch, we first have to get it in the proper format by serializing it. To handle serializing our ActiveRecord model we use the Fast JSON API serializer.

module Search
  class TagSerializer
    include FastJsonapi::ObjectSerializer

    attributes :id, :name, :hotness_score, :supported, :short_summary, :rules_html
  end
end

Once you have a way to serialize your model data, then all that is left to do is make the request to send it to Elasticsearch. Here is how we do that with our SearchClient:

tag = Tag.find(id)
serialized_data = Search::TagSerializer.new(tag).serializable_hash.dig(:data, :attributes)
SearchClient.index(id: tag.id, index: "tags_development", body: serialized_data)

Here is what a successful response to the index request above will look like:

{"_index"=>"tags_development", "_type"=>"_doc", "_id"=>"39", "_version"=>10, "result"=>"created", "_shards"=>{"total"=>1, "successful"=>1, "failed"=>0}, "_seq_no"=>351, "_primary_term"=>3}
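
If you want to check that response in code, a minimal sketch could look like the following. The `indexed?` helper is a hypothetical name, and the response hash is a shortened copy of the one above. It assumes a response with no shard information counts as having no failures.

```ruby
# Hypothetical helper: treats an index response as successful when the
# document was created or updated and no shard reported a failure.
# A response without "_shards" is assumed to have zero failures.
def indexed?(response)
  %w[created updated].include?(response["result"]) &&
    response.dig("_shards", "failed").to_i.zero?
end

response = {
  "_index" => "tags_development", "_id" => "39", "result" => "created",
  "_shards" => { "total" => 1, "successful" => 1, "failed" => 0 }
}
puts indexed?(response)
```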

Another way we can validate that our indexing worked correctly is by asking Elasticsearch for the tag document using a GET request.

SearchClient.get(id: tag.id, index: "tags_development") 

The above request will give you a response containing all of your tag data under the _source key of the response hash.

{"_index"=>"tags_development",
 "_type"=>"_doc",
 "_id"=>"39",
 "_version"=>10,
 "_seq_no"=>351,
 "_primary_term"=>3,
 "found"=>true,
 "_source"=>
  {"id"=>39,
   "name"=>"coolbean",
   "hotness_score"=>4,
   "supported"=>false,
   "short_summary"=>nil,
   "rules_html"=>""}}

Now that our index is set up and we have data in it, it's time for the best part.

4) Searching Tags

Related Pull Request

For this search example, I am only going to show you how to set up a query string search. However, search is where Elasticsearch (obviously) really shines, so I highly encourage you to check out their search docs and explore all of the possibilities.

Let's say we want to search for all tags that have a name that starts with "python" AND we want to sort them by hotness_score. Here is how we would do that:

SearchClient.search(
  index: "tags_development",
  body: {
    query: {
      query_string: {
        query: "name:python*",
        analyze_wildcard: true,
        allow_leading_wildcard: false
      }
    },
    sort: { hotness_score: "desc" }
  }
)

This request is running a basic query, python*, on the name field in our index. We have also added a wildcard character, *, to indicate that we want all tags that have a name that starts with python. When you run that query you are going to get a result that looks like this:

=> {"took"=>251,
 "timed_out"=>false,
 "_shards"=>{"total"=>1, "successful"=>1, "skipped"=>0, "failed"=>0},
 "hits"=>
  {"total"=>{"value"=>3, "relation"=>"eq"},
   "max_score"=>nil,
   "hits"=>
    [{"_index"=>"tags_development",
      "_type"=>"_doc",
      "_id"=>"10",
      "_score"=>nil,
      "_source"=>{"id"=>10, "name"=>"python", "hotness_score"=>2, "supported"=>true, "short_summary"=>nil, "rules_html"=>nil},
      "sort"=>[2]},
     {"_index"=>"tags_development",
      "_type"=>"_doc",
      "_id"=>"40",
      "_score"=>nil,
      "_source"=>{"id"=>40, "name"=>"PythonBeginners", "hotness_score"=>0, "supported"=>false, "short_summary"=>nil, "rules_html"=>nil},
      "sort"=>[0]},
     {"_index"=>"tags_development",
      "_type"=>"_doc",
      "_id"=>"41",
      "_score"=>nil,
      "_source"=>{"id"=>41, "name"=>"PythonExpert", "hotness_score"=>0, "supported"=>false, "short_summary"=>nil, "rules_html"=>nil},
      "sort"=>[0]}]}}

BOOM! We just ran our first Elasticsearch query! The last thing we need to do is dig out the document hits, aka tags, from our response.

# `...` stands for the search arguments shown above
results = SearchClient.search(...)
results.dig("hits", "hits").map { |tag_doc| tag_doc.dig("_source") }

=> [{"id"=>10, "name"=>"python", "hotness_score"=>2, "supported"=>true, "short_summary"=>nil, "rules_html"=>nil},
    {"id"=>40, "name"=>"PythonBeginners", "hotness_score"=>0, "supported"=>false, "short_summary"=>nil, "rules_html"=>nil},
    {"id"=>41, "name"=>"PythonExpert", "hotness_score"=>0, "supported"=>false, "short_summary"=>nil, "rules_html"=>nil}]
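
To reuse the search above for arbitrary prefixes, the request body and the hit extraction can be wrapped in small helpers. This is a sketch under assumptions: `escape_query`, `tag_prefix_search_body`, and `extract_tags` are hypothetical names, the reserved-character list should be checked against the query_string docs for your Elasticsearch version, and the prefix is assumed to be a single word without spaces.

```ruby
# Characters with special meaning in query_string syntax. The list and the
# helper names are assumptions for illustration.
RESERVED_CHARACTERS = /[+\-=!(){}\[\]^"~*?:\\\/&|]/

def escape_query(text)
  # "<" and ">" cannot be escaped, so drop them, then backslash-escape the rest
  text.delete("<>").gsub(RESERVED_CHARACTERS) { |match| "\\#{match}" }
end

# Builds the same body as the search above for any single-word prefix
def tag_prefix_search_body(prefix)
  {
    query: {
      query_string: {
        query: "name:#{escape_query(prefix)}*",
        analyze_wildcard: true,
        allow_leading_wildcard: false
      }
    },
    sort: { hotness_score: "desc" }
  }
end

# Returns the _source of every hit, or an empty array if there are none
def extract_tags(response)
  response.dig("hits", "hits").to_a.map { |tag_doc| tag_doc["_source"] }
end

body = tag_prefix_search_body("python")
# With a configured client:
#   extract_tags(SearchClient.search(index: "tags_development", body: body))
```

Escaping matters here because the prefix is interpolated into query_string syntax, where characters such as `:` and `*` would otherwise change the meaning of the query.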

Your turn!

Now that you have all of the pieces, it is time for you to go out and start integrating Elasticsearch into your own Ruby or Rails application. Let me know if you have any questions. Happy Searching! 😃

PS I've been on a Schitt's Creek binge lately, you're welcome for all the GIFs

Discussion

aRtoo

Nice article ma'am awesome as always. Question though. So after getting the results, you will still have to query in your own database so you can get the full information needed? like the comments of the posts, the owner, users who liked, and some metadata? thank you again. :)

Molly Struve (she/her) Author

If you want other data associated with tags then yes. Or you store some of that data in Elasticsearch in another index and fetch it that way.

Steven Torrence

Hey, Molly! Thanks for writing such a great article/walkthrough. I saw there are gems that act as a wrapper for elastic search like Searchkick.

How do things like that differ from the implementation you’ve shown above? Are there performance benefits to doing it either way?

Molly Struve (she/her) Author

The benefit of using the plain ruby wrapper is that you have much more control over what and how you are searching. The more you abstract the Elasticsearch interactions away like with Searchkick the less control you have. The trade-off is that it can be very easy to get up and running quickly with minimal understanding of Elasticsearch itself.

Steven Torrence

I see. Makes total sense. Thanks for the quick reply!

Joe Zack

Yay for Elasticsearch! We run with docker locally and it makes life easier for us, especially for upgrades and it's nice to be able to wipe our volumes to start with a clean slate.

Great write up, as always!

Cory McDonald

Thank you so much for writing this! I'm leveraging a lot of what Forem is doing for Brave's creators site. Plus the docs written in the repo are great. 😄

Quentin de Quelen

Hi Molly! Thanks for this article. I would like to know what made you decide to switch from Algolia to Elastic? Is that the price? Or just because you love Elastic in general?

Molly Struve (she/her) Author

  • Price
  • More control over creating indexes and queries
  • Elasticsearch is open-source which makes it accessible to everyone and contributes to our efforts in making the DEV platform completely open-source

Quentin de Quelen

Cool! As expected. I was wondering because I'm kind of in the business.

I'm working on an Algolia open-source alternative. It's free, self-hosted and built-in user-facing search. I'm putting the link here just in case. github.com/meilisearch/meilisearch

rhymes

Interesting! A FTS engine in Rust! Starred :)