Improving Backend Performance Part 2/3: Using Database Indexes

#java #sql #database #mariadb

Database indexes are a concern of the developers. They have the potential to improve the performance of search and filter features that use an SQL query in the backend. In the second part of this series of articles, I'll show the impact that a database index has in speeding up filters using a Java web application developed with Spring Boot and Vaadin.

Read part 1 of this series if you want to learn how the example application that we'll use here works. You can find the code on GitHub. Also, and if you prefer, I recorded a video version of this article:

The Requirement

We have a web page with a grid that shows a list of books from a MariaDB database:

We want to add a filter to allow users of this page to see which books were published on a given date.

Implementing the Repository Query and Service

We have to make some changes in the backend to support filtering data by the publish date. In the repository class, we can add the following method:

@Repository
public interface BookRepository extends JpaRepository<Book, Integer> {

    Page<Book> findByPublishDate(LocalDate publishDate, Pageable pageable);

}

This uses lazy loading as we saw in part 1 of this series of articles. We don't have to implement this method—Spring Data will create it for us at runtime.

We also have to add a new method to the service class (which is the class that the UI uses to run business logic). Here's how:

@Service
public class BookService {

    private final BookRepository repository;

    ...

    public Stream<Book> findAll(LocalDate publishDate, int page, int pageSize) {
        return repository.findByPublishDate(publishDate, PageRequest.of(page, pageSize)).stream();
    }

}

Adding a Filter to the Web Page

With the backend supporting filtering of books by publish date, we can add a date picker to the web page (or view) implementation:

@Route("")
public class BooksView extends VerticalLayout {

    public BooksView(BookService service) {

        ...

        var filter = new DatePicker("Filter by publish date");
        filter.addValueChangeListener(event ->
                grid.setItems(query ->
                        service.findAll(filter.getValue(), query.getPage(), query.getPageSize())
                )
        );

        add(filter, grid);
        setSizeFull();
    }

    ...
}

This code just creates a new DatePicker object that listens to changes in its value (via a value change listener). When the value changes we use the service class to get the books published on the date selected by the user. The matching books are then set as items of the Grid.

Testing the Slow Query

We have implemented the filter; however, it is extremely slow if you have, for example, 200 thousand rows in the table. Try it! I wrote an article that explains how to generate realistic demo data for Java applications. With this number of rows, the application took several seconds to show the data on the web page on my machine (MacBook Pro 2,3 GHz Quad-Core Intel Core i5). This completely ruins the user experience.

Analyzing Queries With “Explain Query”

If you enabled query logging, you can find the query that is generated by Hibernate in the server's log. Copy it, replace the questions marks with actual values, and run it in an SQL database client. In fact, I can save you some time. Here's a simplified version of the query:

SELECT id, author, image_data, pages, publish_date, title
FROM book
WHERE publish_date = '2021-09-02';

MariaDB includes the EXPLAIN statement that gives us useful information about how the engine estimates that is going to run the query. To use it, just add EXPLAIN before the query:

EXPLAIN SELECT id, author, image_data, pages, publish_date, title
FROM book
WHERE publish_date = '2021-09-02';

Here's the result:

The documentation has everything you need to know about it, but the important bit is the value in the type column: ALL. This value tells us that the engine estimates that it will have to fetch or read all the rows in the table. Not a good thing.

Creating an Index

Fortunately, we can easily fix this by creating an index on the column that we are using to filter the data: publish_date. Here's how:

CREATE INDEX book\_publish\_date_index ON book(publish_date);

A database index is a data structure created by the engine, usually a b-tree (b for balanced), and that speeds up the process of finding a certain row in a table, that is, searching for a row given the value in the column on which the index is built. The process is faster thanks to the nature of b-trees—they keep the data ordered reducing the time complexity from O(N) to O(log(N)) and even O(log(1)) in some cases.

Testing the Improvement

With the index built, we can run the EXPLAIN statement again and see that the type column shows the value ref instead of ALL:

The ref value means that the engine will use the index when we run the query. It's important that you check this when you add indexes to your more complex queries. Always use the EXPLAIN statement to double-check that you are gaining in performance when you introduce an index.

If you try the web application in the browser and select another date in the date picker (no need to restart the server), you'll see a huge difference! For example, the data is retrieved in less than a second on my machine in contrast to several seconds before we created the index!