DEV Community

Ayrton Fidelis

Better results for product search on Elasticsearch

Intro

This week we significantly improved the search feature on one of our company's platforms with a simple but tricky technique that took considerable research time. Because it was really hard to find references on how to implement it, I decided to write up what I found, in the hope of helping some random dev out there. So here we go:

Scenario

First of all, we're talking about a "product search" feature. It happens by fetching products from a bunch of APIs and storing them in Elasticsearch to... you know... search.

These APIs often provide too little data to search on. Descriptions are vague and brief, titles are often confusing, and there's no categorization, but none of that is something we can explain to our users. When a user searches for "laptops", they mostly want a laptop, not a book about laptops or even a toy laptop.

That's where creativity came into play.

Patterns arise

When brainstorming ways to improve this product search, we noticed a pattern: product titles would usually start with the product's name, followed by some characteristics and features, as in "Laptop Asus i7 8GB RAM". Yet the results our users were getting included "Barbie Laptop" and "How to fix laptops - The definitive guide".

The user looking to buy laptops would often search for "laptops" or include a combination of desired characteristics like "laptop 16GB i7", so we realized that our users would get more relevant products in the first results if we increased the score of results that matched the first word of the search term with the first word of the product title. That's what we did.

The Span First Query

The right tool for this job is the Span First Query, which matches "spans" near the beginning of a field. It expects an end property, which sets how many terms from the beginning of the field are considered, and a match property, which holds another Span Query to run against those first terms.
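To make the end property concrete, here is a minimal pure-Python sketch of what the query checks. It is only an illustration, not how Elasticsearch or Lucene implements span queries. The function name span_first_matches and the hand-written token lists are assumptions made up for this example; the lists stand in for the already-normalized tokens of each title.

```python
def span_first_matches(tokens, term, end):
    """Return True if `term` occurs within the first `end` token positions.

    `tokens` stands in for the analyzed tokens of a field, in order.
    A single-term span at position p ends at p + 1, so it fits within
    `end` exactly when p < end.
    """
    return term in tokens[:end]


# Hypothetical, hand-normalized tokens for the three titles from the article.
titles = {
    "Laptop Asus i7 8GB RAM": ["laptop", "asus", "i7", "8gb", "ram"],
    "Barbie Laptop": ["barbie", "laptop"],
    "How to fix laptops - The definitive guide": [
        "how", "to", "fix", "laptop", "the", "definitive", "guide",
    ],
}

# With end=1 only titles whose first token is "laptop" qualify.
boosted = [
    title
    for title, tokens in titles.items()
    if span_first_matches(tokens, "laptop", end=1)
]
print(boosted)  # ['Laptop Asus i7 8GB RAM']
```

Raising end to 2 would also let "Barbie Laptop" through, which is why we keep it at 1.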

In this case we're looking to match terms, so we're using the Span Term Query. Here's what the query looks like on its own:

GET /products/_search
{
  "query": {
    "span_first": {
      "match": {
        "span_term": {
          "title": {
            "term": "laptops"
          }
        }
      },
      "end": 1
    }
  }
}

But the often overlooked detail about the Span Term Query is that it is not analyzed: the term we provide is compared as-is with the normalized tokens of the field. When you store the words "Dog" and "dogs" in a field whose analyzer lowercases and de-pluralizes, Elasticsearch indexes both as "dog" (more details in this amazing video that anyone ever touching Elasticsearch should watch). The Span Term Query expects the term we provide to be normalized already, so the raw "laptops" in the query above would not match a title indexed as "laptop". We need to take a step back and look at the Analyze API to achieve this.

Normalizing the search term

We need to turn the first word of our search string into a normalized token that the Span Term Query can match against the first term of the title, and for that we use the Analyze API. We must be sure that the analyzer used here is the same one used when indexing the field we'll be matching against. The easiest way to do that is to provide the name of the field, so the analyzer is derived from the field mapping.

GET /products/_analyze
{
  "field" : "title",
  "text" : "laptops"
}

The response of this request will contain information about the tokens:

{
  "tokens": [
    {
      "token": "laptop",
      "start_offset": 0,
      "end_offset": 7,
      "type": "word",
      "position": 0
    }
  ]
}

And this way we get our normalized token "laptop", which is what we must use in the Span Term Query. Remember this, it's important.

Now we need to embed our Span First Query into the query we already use to search for products, and increase the score of the results that match this new clause.
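In application code, picking that token out of the response can be sketched as follows. The helper name first_normalized_token is hypothetical, and the sample response is trimmed to the two fields the helper reads; how you send the request (an official client, plain HTTP) is left out on purpose.

```python
def first_normalized_token(analyze_response):
    """Return the token with the lowest position in an Analyze API response.

    Returns None when the analyzer produced no tokens at all
    (for example, for an empty search string).
    """
    tokens = analyze_response.get("tokens", [])
    if not tokens:
        return None
    return min(tokens, key=lambda t: t["position"])["token"]


# Sample response for the text "laptops", trimmed to the fields used here.
response = {"tokens": [{"token": "laptop", "position": 0}]}

print(first_normalized_token(response))  # laptop
```

One caveat: if the analyzer drops the first word of the search string (a stop word, for instance), the first remaining token has a position greater than 0. Whether to still boost on it is a judgment call for your data.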

Using the Span First Query to boost our previous results

Here's a simplified version of our previous query, with a must property saying that all of our results must include the search term in the title, and a should property saying that results which include the search term in the description as well should have a higher score:

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "laptops"
          }
        }
      ],
      "should": [
        {
          "match": {
            "description": "laptops"
          }
        }
      ]
    }
  }
}

To increase the score of results with matches on the first term, we'll put the span query in the should clause, this time with the normalized term:

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "laptops"
          }
        }
      ],
      "should": [
        {
          "match": {
            "description": "laptops"
          }
        },
        {
          "span_first": {
            "match": {
              "span_term": {
                "title": {
                  "term": "laptop"
                }
              }
            },
            "end": 1
          }
        }
      ]
    }
  }
}
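If you assemble the request body in code, the query above could be built with something like the sketch below. The function name build_product_query and its parameters are assumptions for illustration. It only builds the body as a plain dictionary and leaves sending it to whatever client you use.

```python
import json


def build_product_query(search_text, first_token=None):
    """Build the bool query body shown above.

    `search_text` is the raw user input; the match clauses are analyzed
    by Elasticsearch. `first_token` must already be normalized, because
    span_term compares it as-is. When it is None the boost clause is
    simply left out.
    """
    should = [{"match": {"description": search_text}}]
    if first_token:
        should.append({
            "span_first": {
                "match": {"span_term": {"title": {"term": first_token}}},
                "end": 1,
            }
        })
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": search_text}}],
                "should": should,
            }
        }
    }


print(json.dumps(build_product_query("laptops", "laptop"), indent=2))
```

Keeping the span clause optional means the search still works when the analyzer returns no token for the search string.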

The results

The results are great.

Our users now find what they're looking for among the first products returned by our search feature. This change increased the perceived value of our product, and stakeholders are happy. Remember, though, that this technique worked really well for the patterns we detected in our data and our users' behavior, so do some investigation to see whether it is the right solution for your use case.

Cover image from Andrea Piacquadio.