YASSINE SABRI
How to synchronize MySQL Database with ElasticSearch And perform data querying in a Spring Boot Application

The need for an advanced search mechanism such as ElasticSearch is often not apparent at the beginning of a project, but as the project grows over time, it comes to the surface.

To satisfy this technical need without altering the existing project's architecture, we can implement logic that synchronizes the database data into an ElasticSearch index, where we can perform complex querying without impacting the performance of the production database.

Let’s dive into it and explore how we can implement this …

Disclaimer: I will only share the important code snippets to keep the reading quick; you can find the link to the dedicated GitHub repo for this article at the end.


Setting up the project

First of all, let's create a simple Spring Boot application with a MySQL database.

Tip: you can use Spring Initializr to set up the application's skeleton structure

Project configuration

In order to interface with ElasticSearch, we need to do the following:

  1. Add the data-elasticsearch dependency to the pom.xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

Tip: the ElasticSearch engine version to use depends on the spring-data-elasticsearch version pulled in by the project (see the compatibility matrix)

2. Add the following properties to the application.properties

#MySQL data source
...
#elastic config
spring.data.elasticsearch.cluster-name=${CLUSTER_NAME:elastic}
spring.data.elasticsearch.cluster-nodes=${CLUSTER_NODES:localhost:9300}
spring.data.elasticsearch.repositories.enabled=true
  • spring.data.elasticsearch.cluster-name: the cluster name; CLUSTER_NAME is an environment variable that we use when running the application in a Docker container
  • spring.data.elasticsearch.cluster-nodes: the cluster nodes
  • spring.data.elasticsearch.repositories.enabled: enables Elasticsearch repositories so we can use Spring Data repository interfaces with ElasticSearch.

3. Define the package where the ElasticSearch repositories reside

@SpringBootApplication
@EnableElasticsearchRepositories("com.example.springmysqlelastic.repo.elastic")
public class SpringMysqlElasticApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringMysqlElasticApplication.class, args);
    }
}

In our example it's "com.example.springmysqlelastic.repo.elastic".

Project structure

For demonstration purposes, we're going to create a single table in our database and wrap it in its own index. In a real project, we would have a much more complex schema and more indexes to create.

  • Create a User entity with first name and last name attributes
@Entity
public class User {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String firstName;
    private String lastName;

    // getters and setters
}
  • Create a model class to represent our User index within ElasticSearch

@Document(indexName = "user")
public class UserModel {

    @Id
    private Long id;

    private String firstName;
    private String lastName;

    // getters and setters
}

  • Create an Elasticsearch repository for the UserModel

@Repository
public interface IUserESRepo extends ElasticsearchRepository<UserModel, Long> {
}
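
The synchronizer we'll build later converts User entities into UserModel documents through a mapper (it calls userMapper.toUserModel). The repo has its own implementation, which may differ; a minimal sketch of such a mapper, assuming the getters and setters shown above, could look like this:

// Hypothetical mapper sketch; the actual implementation in the repo may differ
@Component
public class UserMapper {

    // Maps the JPA entity to the ElasticSearch document model
    public UserModel toUserModel(User user) {
        UserModel model = new UserModel();
        model.setId(user.getId());
        model.setFirstName(user.getFirstName());
        model.setLastName(user.getLastName());
        return model;
    }
}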

We're done setting up the main components for our app.


Synchronize data with ElasticSearch

In order to synchronize our data from MySQL with the user index in ElasticSearch, we're going to use a scheduler in which we will implement the synchronization logic.

First of all, we need to enable Spring's scheduled task execution capability by adding the @EnableScheduling annotation:

@SpringBootApplication
@EnableElasticsearchRepositories("com.example.springmysqlelastic.repo.elastic")
@EnableScheduling
public class SpringMysqlElasticApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringMysqlElasticApplication.class, args);
    }
}

After that, since we can't sync all the records from the database each time the scheduler runs, we will add a modification date attribute to our User entity and annotate it with @UpdateTimestamp so that Hibernate updates its value every time the entity changes.

@UpdateTimestamp
private Date modificationDate;

Next, we need to create our scheduler service.

@Service
public class ElasticSynchronizer {

    // Fields
    // Constructor

    @Scheduled(cron = "0 */3 * * * *")
    @Transactional
    public void sync() {
        LOG.info("Start Syncing - {}", LocalDateTime.now());
        this.syncUsers();
        LOG.info("End Syncing - {}", LocalDateTime.now());
    }

    private void syncUsers() {
        // implemented below
    }

    private static Predicate getModificationDatePredicate(CriteriaBuilder cb, Root<?> root) {
        // implemented below
    }
}
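
The // Fields and // Constructor placeholders stand for the injected dependencies. A minimal sketch, assuming the names used in the snippets below (IUserDAO is a hypothetical name for the JPA repository; the repo may name things differently), might be:

// Hypothetical fields/constructor sketch for ElasticSynchronizer
private static final Logger LOG = LoggerFactory.getLogger(ElasticSynchronizer.class);

private final IUserDAO userDAO;        // JPA repository for the User entity
private final IUserESRepo userESRepo;  // ElasticSearch repository for UserModel
private final UserMapper userMapper;   // entity-to-document mapper

@Autowired
public ElasticSynchronizer(IUserDAO userDAO, IUserESRepo userESRepo, UserMapper userMapper) {
    this.userDAO = userDAO;
    this.userESRepo = userESRepo;
    this.userMapper = userMapper;
}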

In this example, the scheduler runs every 3 minutes; this period will vary from one project to another depending on the importance of the synced data.

Each time the main method of the scheduler is executed, we will sync our users.

private void syncUsers() {
    Specification<User> userSpecification = (root, criteriaQuery, criteriaBuilder) ->
            getModificationDatePredicate(criteriaBuilder, root);
    List<User> userList;
    if (userESRepo.count() == 0) {
        userList = userDAO.findAll();
    } else {
        userList = userDAO.findAll(userSpecification);
    }
    for (User user : userList) {
        LOG.info("Syncing User - {}", user.getId());
        userESRepo.save(this.userMapper.toUserModel(user));
    }
}
private static Predicate getModificationDatePredicate(CriteriaBuilder cb, Root<?> root) {
    Expression<Timestamp> currentTime = cb.currentTimestamp();
    Expression<Timestamp> currentTimeMinus = cb.literal(
            new Timestamp(System.currentTimeMillis() - Constants.INTERVAL_IN_MILLISECONDE));
    return cb.between(root.<Date>get(Constants.MODIFICATION_DATE), currentTimeMinus, currentTime);
}

In our syncUsers() method, we have two scenarios:

  • When the index is empty at the first launch of the application, we send all the user records to ElasticSearch
  • Otherwise, we retrieve only the records that changed within the last INTERVAL_IN_MILLISECONDE milliseconds before the scheduler's execution time (as implemented in the getModificationDatePredicate method above; the assumed constants are sketched below).
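
Two details worth noting: userDAO.findAll(userSpecification) only works if the JPA repository supports Specifications, and the predicate relies on two constants. A minimal sketch, assuming a hypothetical IUserDAO name and illustrative constant values (the actual Constants class in the repo contains more entries), might be:

// Hypothetical JPA repository; JpaSpecificationExecutor enables findAll(Specification)
@Repository
public interface IUserDAO extends JpaRepository<User, Long>, JpaSpecificationExecutor<User> {
}

// Illustrative values only; the real Constants class may differ
public final class Constants {
    // Field name of the @UpdateTimestamp attribute on the User entity
    public static final String MODIFICATION_DATE = "modificationDate";
    // Look-back window matching the scheduler period (3 minutes here, as an example)
    public static final long INTERVAL_IN_MILLISECONDE = 3 * 60 * 1000L;
}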

We're done with the synchronization, let's move on to the next part.


Querying data from ElasticSearch

We will start by adding a new REST endpoint that takes the search query as input and returns the result as a ResultQuery object that we will create later on.

@RestController
@RequestMapping(PathResources.SEARCH_CONTROLLER)
public class SearchController {

    private ISearchService searchService;

    @Autowired
    public SearchController(ISearchService searchService) {
        this.searchService = searchService;
    }

    @GetMapping(Constants.SEARCH_QUERY + "/{" + Constants.QUERY + "}")
    public ResponseEntity<ResultQuery> searchQuery(@PathVariable String query) throws IOException {
        return new ResponseEntity<>(
                searchService.searchFromQuery(query.trim().toLowerCase()), HttpStatus.OK);
    }
}
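
The controller injects an ISearchService interface that isn't shown in the snippets (PathResources.SEARCH_CONTROLLER and the path-related constants are assumed to be plain string constants defined in the repo). Based on how the controller uses it, a minimal sketch of the interface could be:

public interface ISearchService {

    // Builds the ElasticSearch request from the raw query and returns the wrapped result
    ResultQuery searchFromQuery(String query) throws IOException;
}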

After that, we will create a search service where we will build our custom request and retrieve data from ElasticSearch.

@Service
public class SearchService implements ISearchService {

    @Value("${api.elasticsearch.uri}")
    private String elasticSearchUri;

    @Value("${api.elasticsearch.search}")
    private String elasticSearchSearchPrefix;

    private static final Logger LOGGER = LoggerFactory.getLogger(SearchService.class);

    @Override
    public ResultQuery searchFromQuery(String query) throws IOException {
        String body = HelperFunctions.buildMultiIndexMatchBody(query);
        return executeHttpRequest(body);
    }

    private ResultQuery executeHttpRequest(String body) throws IOException {
        // implemented below
    }
}

In the service's main method, searchFromQuery(), we do the following:

  • First, we build the body of the request; in our case, we create a multi-index search using the query_string function and add the list of fields where we want the engine to look for the given query.

The body of the request will look something like this:

{
  "from": 0,
  "size": 100,
  "track_total_hits": true,
  "sort": {
    "id": {
      "order": "asc"
    }
  },
  "query": {
    "query_string": {
      "query": "*test*",
      "fields": [
        "firstName",
        "lastName"
      ],
      "default_operator": "AND"
    }
  },
  "highlight": {
    "fields": {
      "*": {}
    },
    "require_field_match": true
  }
}

The query_string fields are a static list in the code, but they could be parameterized in the application for more control and flexibility (a sketch of a possible body-building helper follows the request code below).

  • Second, we execute the POST request with the body we created earlier and wrap the response from ElasticSearch in a ResultQuery
private ResultQuery executeHttpRequest(String body) throws IOException {
    try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
        ResultQuery resultQuery = new ResultQuery();
        HttpPost httpPost = new HttpPost(HelperFunctions.buildSearchUri(elasticSearchUri, "", elasticSearchSearchPrefix));
        httpPost.setHeader(Constants.CONTENT_ACCEPT, Constants.APP_TYPE);
        httpPost.setHeader(Constants.CONTENT_TYPE, Constants.APP_TYPE);
        try {
            httpPost.setEntity(new StringEntity(body, Constants.ENCODING_UTF8));
            HttpResponse response = httpClient.execute(httpPost);
            String message = EntityUtils.toString(response.getEntity());
            JSONObject myObject = new JSONObject(message);
            if (myObject.getJSONObject(Constants.HITS).getInt(Constants.TOTAL_HITS) != 0) {
                resultQuery.setElements(myObject
                        .getJSONObject(Constants.HITS)
                        .getJSONArray(Constants.HITS)
                        .toString());
                resultQuery.setNumberOfResults(myObject
                        .getJSONObject(Constants.HITS)
                        .getInt(Constants.TOTAL_HITS));
                resultQuery.setTimeTook((float) ((double) myObject.getInt(Constants.TOOK) / Constants.TO_MS));
            } else {
                resultQuery.setElements(null);
                resultQuery.setNumberOfResults(0);
                resultQuery.setTimeTook((float) ((double) myObject.getInt(Constants.TOOK) / Constants.TO_MS));
            }
        } catch (IOException | JSONException e) {
            LOGGER.error("Error while connecting to elastic engine --> {}", e.getMessage());
            resultQuery.setNumberOfResults(0);
        }
        return resultQuery;
    }
}
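
The HelperFunctions used above aren't shown in the article. A hedged sketch of what buildMultiIndexMatchBody and buildSearchUri might look like, producing the request body shown earlier and a standard _search URL (the exact implementation in the repo may differ, and the URI/prefix values come from the api.elasticsearch.* properties), is:

// Hypothetical helper sketch; the repo's actual HelperFunctions may differ
public final class HelperFunctions {

    // Builds the multi-index query_string body shown earlier for the given search term
    public static String buildMultiIndexMatchBody(String query) {
        return "{\n" +
                "  \"from\": 0,\n" +
                "  \"size\": 100,\n" +
                "  \"track_total_hits\": true,\n" +
                "  \"sort\": { \"id\": { \"order\": \"asc\" } },\n" +
                "  \"query\": {\n" +
                "    \"query_string\": {\n" +
                "      \"query\": \"*" + query + "*\",\n" +
                "      \"fields\": [\"firstName\", \"lastName\"],\n" +
                "      \"default_operator\": \"AND\"\n" +
                "    }\n" +
                "  },\n" +
                "  \"highlight\": { \"fields\": { \"*\": {} }, \"require_field_match\": true }\n" +
                "}";
    }

    // Concatenates the base URI, an optional index name and the search suffix,
    // e.g. "http://localhost:9200" + "" + "/_search" (illustrative values)
    public static String buildSearchUri(String elasticSearchUri, String index, String searchPrefix) {
        return elasticSearchUri + index + searchPrefix;
    }
}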

Our ResultQuery class contains three main attributes:

public class ResultQuery {

    private Float timeTook;
    private Integer numberOfResults;
    private String elements;

    // getters and setters
}
  • timeTook: the ElasticSearch response time, in seconds
  • numberOfResults: the total number of elements retrieved
  • elements: a stringified JSON array that represents the hits found.

Testing the application

You can run the application either locally or in a Docker container (see the README file for more info).

I added some records to the database and synced them to the user index.

User index’s records

Tip: you can visualize the index content using the ElasticSearch Head extension in Chrome/Firefox

As an example, let's look for the users whose first or last names contain "est":

Query response

Voilà, we've found 2 records that satisfy our criteria in 0.017 seconds.

That's all folks!


Learn more by:

  • Visiting the GitHub repo to see the full code of the application.
  • Visiting the ElasticSearch documentation for more complex querying syntax.
  • For Medium fans, you can read the article there.
