Release 2.9 was a partial success on my part, because in the end I did manage to get author autocomplete indexing into the back-end.
Admittedly, I've put this blog off a little too long. Partly because my other courses needed my explicit attention lest I fail them all, but also because I'm rather afraid of having to explain exactly how I got it working.
Long story short: lots of time playing around in the Kibana dev tools, lots of time poring over the oddly structured Elasticsearch documentation, and lots of time waiting for Docker images to be rebuilt and posts to be re-indexed. All in all, many, many trials and errors.
But why does it work? Well, that's a whole other story. I cannot guarantee complete accuracy, and can only explain it as I understand it.
Explaining the code
To begin with, here's the code, and in case this ever gets lost, the permalink to the code, with a huge blob of comments consisting of many links to various documentation.
```javascript
const setupPostsIndex = async () => {
  try {
    const response = await client.indices.exists({ index });
    // If the index doesn't exist, a 404 statusCode is returned
    if (response.statusCode === 404) {
      await client.indices.create({
        index,
        body: {
          settings: {
            analysis: {
              analyzer: {
                autocomplete_analyzer: {
                  tokenizer: 'autocomplete',
                  filter: ['lowercase', 'remove_duplicates'],
                },
                autocomplete_search_analyzer: {
                  tokenizer: 'lowercase',
                },
              },
              tokenizer: {
                autocomplete: {
                  type: 'edge_ngram',
                  min_gram: 1,
                  max_gram: 20,
                  token_chars: ['letter', 'digit'],
                },
              },
            },
          },
          mappings: {
            properties: {
              author: {
                type: 'text',
                fields: {
                  autocomplete: {
                    type: 'text',
                    analyzer: 'autocomplete_analyzer',
                    search_analyzer: 'autocomplete_search_analyzer',
                  },
                },
                analyzer: 'standard',
              },
            },
          },
        },
      });
    }
  } catch (error) {
    logger.error({ error }, `Error setting up ${index} index`);
  }
};
```
For starters, Elasticsearch will complain if we try to create an index that already exists. To save ourselves from error spam in case Elasticsearch data was persisted and the index already exists, an extra if-statement checks for the 404 first.
We decided to use the Edge n-gram tokenizer. To make use of it, we needed to set up a custom analyzer configured to use our custom tokenizer of the edge_ngram type with our chosen settings. All of this is done in the analysis section inside the index settings.
After that, we needed to specifically map which properties (in this case, author) are to use the custom analyzer. For Telescope, I used a trick: designating a custom autocomplete field inside the author property to use the custom analyzers instead. This way, our original search query on the index wouldn't be affected.
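To get a feel for what the custom analyzer does to an author's name, here's a rough plain-JavaScript simulation of it (my own sketch with a made-up name, not the actual Lucene tokenizer):

```javascript
// Rough simulation of the autocomplete_analyzer above:
// split on anything that's not a letter or digit (token_chars),
// emit edge n-grams from min_gram to max_gram, then lowercase
// (mirroring the 'lowercase' filter); a Set stands in for the
// 'remove_duplicates' filter.
const edgeNgramAnalyze = (text, minGram = 1, maxGram = 20) => {
  const tokens = text.split(/[^a-zA-Z0-9]+/).filter(Boolean);
  const grams = new Set();
  for (const token of tokens) {
    for (let n = minGram; n <= Math.min(maxGram, token.length); n++) {
      grams.add(token.slice(0, n).toLowerCase());
    }
  }
  return [...grams];
};

console.log(edgeNgramAnalyze('Ada Lovelace'));
// ['a', 'ad', 'ada', 'l', 'lo', 'lov', 'love', 'lovel', 'lovela', 'lovelac', 'lovelace']
```

Every prefix of every word ends up in the index, which is exactly what lets a partial query like "a l" find a match.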
The decision to create a new index instead of updating the already existing index came down to a few things:
- The index.create() function we use to index posts will create the index (if it doesn't already exist) and add a document to it. If there weren't any previous settings and mappings, it'd only apply the standard options.
- We can only update an index setting on a closed index; this would involve closing, updating, and re-opening the index.
- Furthermore, the Update Settings API hasn't been implemented yet.
Now, with this new indexing, instead of querying the author property for an exact match while searching, we can search on the author.autocomplete field for more detailed, possible matches.
```javascript
query: {
  match: {
    'author.autocomplete': {
      query: author,
      operator: 'and',
    },
  },
},
highlight: {
  fields: {
    'author.autocomplete': {},
  },
},
```
highlight is an ES search option that shows where the query matches are in the search results of our designated field.
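By default, Elasticsearch wraps the matched fragments in `<em>` tags, so the highlight part of a hit for an author query like "a l" would come back looking roughly like this (an illustrative shape with a made-up name, not an actual response):

```json
{
  "highlight": {
    "author.autocomplete": ["<em>Ada</em> <em>Lovelace</em>"]
  }
}
```

The front-end can then style those tags to show the user which part of the name matched their partial input.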
Since each author could have more than one post, we had a problem with duplicates of the same author showing up in the query results. Josue (an alumnus) introduced me to the reduce() function as a possible way to deal with it.
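The idea, roughly, is that reduce() walks the hits and accumulates each author name only once. A hypothetical sketch with made-up data (not Telescope's actual code):

```javascript
// Illustrative search hits — one author can appear on several posts
const hits = [
  { author: 'Ada Lovelace', title: 'Post one' },
  { author: 'Ada Lovelace', title: 'Post two' },
  { author: 'Alan Turing', title: 'Post three' },
];

// reduce() folds the hits into a list of unique author names,
// skipping any name already collected
const uniqueAuthors = hits.reduce(
  (authors, { author }) =>
    authors.includes(author) ? authors : [...authors, author],
  []
);

console.log(uniqueAuthors); // ['Ada Lovelace', 'Alan Turing']
```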
Now a query on the new authors/autocomplete/author=r l route will look something like this.
Conclusion
This may have looked easy, but one thing I learned the hard way was that we have to specify settings and mappings inside the body in JavaScript mode, while this isn't needed in console mode. Hopefully this knowledge will save time for anyone else implementing autocomplete in the future.
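To illustrate the difference: in the Kibana console, settings and mappings sit at the top level of the request, whereas the JavaScript client (7.x) wants them nested under body. A rough console-mode sketch (the index name is a placeholder, and the elided sections are the same ones from the code above):

```
PUT /posts
{
  "settings": {
    "analysis": { ... }
  },
  "mappings": {
    "properties": { ... }
  }
}
```

Forgetting the extra body wrapper in JavaScript means the client silently creates the index without any of the custom analysis settings, which is easy to miss.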
Amusingly, before I got a chance to break Telescope, I also encountered Telescope seemingly breaking my WSL during the many rounds of re-indexing and rebuilding Docker images.
During my local runs I like to run

```
docker-compose --env-file ./config/env.development up --build (whatever images I want to build)
```

and

```
docker-compose --env-file ./config/env.development down
```

to bring the containers up and down.
Sometimes, my WSL trips up and returns a message such as this.
```
Traceback (most recent call last):
  File "docker-compose", line 3, in <module>
  File "compose/cli/main.py", line 81, in main
  File "compose/cli/main.py", line 200, in perform_command
  File "compose/cli/command.py", line 40, in project_from_options
  File "compose/config/environment.py", line 67, in from_env_file
  File "compose/config/environment.py", line 57, in _initialize
FileNotFoundError: [Errno 2] No such file or directory
[22287] Failed to execute script docker-compose
```
When this happens, my solution is to close all the existing terminals, open WSL back up, and run the commands again. Doing it this way always works, so my only conclusion is that somehow my WSL got confused somewhere in between running all those commands.
The next part is to try to get this indexing into the Parser, which is surprisingly giving me some problems. Then comes implementing autocomplete in the front-end. Finally, I'd like to add some tests so we can catch errors when we try to update to ES 8.0+.