Digging into Hibernate's Query Cache

#java #hibernate #cache #querycache

I haven't used Hibernate for a long time, and I haven't blogged about it for even longer. Recently, as part of my job, I was working on a blog post about setting up an evergreen cache. While coding the demo, I ran into an issue with Hibernate's Query Cache: it didn't work as I expected. After some time, I managed to fix it.

This post aims to dig deeper into Hibernate's Query Cache in order to help fellow developers confronted with the same issue.

Entity caches

Hibernate offers different kinds of caches, for different objects. Chief among them are entities:

Entities represent persistent data stored in a relational database automatically using container-managed persistence. They are persistent because their data is stored persistently in some form of data storage system, such as a database: they do survive a server failure, failover, or a network failure. When an entity is reinstantiated, the state of the previous instance is automatically restored.

Hibernate offers a two-tiered entity cache, unimaginatively named Level 1 Cache and Level 2 Cache.

Level 1 Cache:

The L1C is enabled by default - and it's not possible to disable it AFAIK. It's automatically managed by the Session object. The cache's lifecycle is bound to the session's.

In standard web applications, a Hibernate Session is opened for each HTTP request. That means that every new request starts from a cold cache. Hence, data needs to be reloaded from the database.

Level 2 Cache:

The L2C needs to be enabled explicitly. It relies on a third-party caching solution, e.g. EHCache, JCache, Hazelcast, etc. Hibernate offers an SPI that cache providers implement to integrate with the framework. Cache capabilities are implementation-dependent, i.e. the storage format, whether data is distributed or not, etc.

The L2C is managed through the SessionFactory: in web applications, a single instance is initialized at startup. That means that the L2C, unlike the L1C, can be (and is) used across multiple HTTP requests.

In other words, only the L2C has an impact on performance across requests, because it's the only cache that outlives a single request. Since it's disabled by default, that benefit requires enabling it explicitly.
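To make the difference in scope concrete, here is a minimal toy model. It's a sketch only: ToyDatabase, ToySessionFactory and ToySession are hypothetical stand-ins written for this illustration, not Hibernate classes, and they ignore everything a real cache has to handle (eviction, concurrency, invalidation).

```kotlin
// Toy model only: these classes are hypothetical stand-ins, not Hibernate types.
data class Thing(val id: Long, val text: String)

class ToyDatabase(private val rows: Map<Long, Thing>) {
    var selectCount = 0
        private set

    fun selectById(id: Long): Thing? {
        selectCount++
        return rows[id]
    }
}

// Plays the role of the SessionFactory: one instance for the whole application.
class ToySessionFactory(val db: ToyDatabase, val l2Enabled: Boolean) {
    val l2Cache = mutableMapOf<Long, Thing>()
    fun openSession() = ToySession(this)
}

// Plays the role of the Session: one instance per HTTP request.
class ToySession(private val factory: ToySessionFactory) {
    private val l1Cache = mutableMapOf<Long, Thing>()

    fun find(id: Long): Thing? {
        // 1. Session-scoped cache first
        l1Cache[id]?.let { return it }
        // 2. Then the factory-scoped cache, if enabled
        if (factory.l2Enabled) {
            factory.l2Cache[id]?.let { l1Cache[id] = it; return it }
        }
        // 3. Finally the database; fill the caches on the way back
        val loaded = factory.db.selectById(id) ?: return null
        l1Cache[id] = loaded
        if (factory.l2Enabled) factory.l2Cache[id] = loaded
        return loaded
    }
}

fun main() {
    val rows = mapOf(1L to Thing(1L, "Foo"))

    val withoutL2 = ToySessionFactory(ToyDatabase(rows), l2Enabled = false)
    withoutL2.openSession().apply { find(1L); find(1L) } // same "request": 1 SELECT
    withoutL2.openSession().find(1L)                     // new "request": cold L1C, 1 more SELECT
    println(withoutL2.db.selectCount)                    // 2

    val withL2 = ToySessionFactory(ToyDatabase(rows), l2Enabled = true)
    withL2.openSession().find(1L)                        // 1 SELECT, fills the L2C
    withL2.openSession().find(1L)                        // new "request": served by the L2C
    println(withL2.db.selectCount)                       // 1
}
```

In this model, the session-scoped cache only saves the repeated lookup inside a single request, while the factory-scoped cache saves it across requests.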

Entity caches are used in two different cases:

1. When the entity is loaded by its primary key. This is the case when the EntityManager.find(Class<T> clazz, Object primaryKey) method is called. With Spring Data JPA, the latter is wrapped by the CrudRepository.findById(ID id) method implementation.
2. When any other method of the EntityManager generates a SELECT query. This happens in a lot of cases, e.g. with any of the createXXXQuery() methods or when using the more typesafe CriteriaBuilder. With Spring Data JPA, this is the case for any of the custom methods added to one's JPA repositories.

Entities that are loaded (1st case) or queried (2nd case) are stored in the cache. However, entities are read from the cache only in the 1st case. Consider the following example:

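import javax.persistence.Entity
import javax.persistence.Id
import org.hibernate.annotations.Cache
import org.hibernate.annotations.CacheConcurrencyStrategy.READ_WRITE
import org.springframework.boot.CommandLineRunner
import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.context.annotation.Bean
import org.springframework.data.jpa.repository.JpaRepository

@Entity
@Cache(region = "entities", usage = READ_WRITE)
class Thing(@Id val id: Long, val text: String)

interface ThingRepository : JpaRepository<Thing, Long>

@SpringBootApplication
class Sample {

    @Bean
    fun init(repo: ThingRepository) = CommandLineRunner {
        repo.findAll()                                       // 1
        repo.findById(1L)                                    // 2
    }
}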

1. All entities are loaded from the database, and stored in the cache
2. Entity with PK 1L will be loaded from the cache - if it exists

Now, let's change the implementation of the bean function:

@Bean
fun clr(repo: ThingRepository) = CommandLineRunner {
    repo.findAll()                                          // 1
    repo.findAll()                                          // 2
}

1. All entities are loaded from the database, and stored in the cache
2. Though all entities are in the cache, it's not used: entities are still loaded from the database

The Query Cache

It's possible to actually read the cached results of general SELECT queries, the 2nd case above. That requires an additional cache, the query cache. Enabling the query cache is a two-step process:

Enable the query cache proper.

For example, in Spring Boot, just add the following to the application.yml:

spring:
  jpa:
    properties:
      hibernate:
        cache:
          use_query_cache: true

Enable the query cache for each query that needs to be cached:

With Spring Data JPA, each query method must be annotated with @QueryHints(QueryHint(name = HINT_CACHEABLE, value = "true")).
If the method isn't a custom one, i.e. it's already provided by the parent JpaRepository (e.g. findAll()), it needs to be overridden and the overriding method annotated:
```
interface ThingRepository : JpaRepository<Thing, Long> {

    @QueryHints(QueryHint(name = HINT_CACHEABLE, value = "true"))
    override fun findAll(): List<Thing>
}
```

With the same code as above, the results will be returned from the cache, and the database won't be accessed:

@Bean
fun clr(repo: ThingRepository) = CommandLineRunner {
    repo.findAll()                                          // 1
    repo.findAll()                                          // 2
}

1. All entities are loaded from the database, and stored in the cache
2. The entities will be returned from the cache
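Outside of Spring Data JPA, the same per-query opt-in can be expressed as a plain JPA query hint. The following is a minimal sketch, assuming Hibernate 5 with the javax.persistence API: ThingQueries is a hypothetical helper class, Thing is the entity shown earlier, and "org.hibernate.cacheable" is the hint name that the HINT_CACHEABLE constant stands for.

```kotlin
import javax.persistence.EntityManager

// Hypothetical helper class; Thing is the entity defined earlier in this post.
class ThingQueries(private val em: EntityManager) {

    fun findAllCached(): List<Thing> =
        em.createQuery("select t from Thing t", Thing::class.java)
            // Same opt-in as @QueryHints: marks this query's result as cacheable.
            .setHint("org.hibernate.cacheable", true)
            .resultList
}
```

As with the annotation, the hint only has an effect once the query cache itself is enabled through use_query_cache.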

A sample demo project

I've created a simple Spring Boot demo project, using Spring Data JPA and Spring Shell.
Both the L2C and the Query Cache are enabled, as well as the Hibernate statistics.
The project offers several commands:

entities reads all entities from the database using the findAll() method
cache displays the content of the L2C
query-cache displays the content of the query cache

Let's use them in order:

After startup, the L2C is empty:

shell:> cache

+--+----+---------+
|id|text|timestamp|
+--+----+---------+

The query cache is also empty:

shell:> query-cache

+-----+----+
|query|keys|
+-----+----+

Let's now load the entities using the findAll() method.

shell:> entities

+--+----+
|id|text|
+--+----+
|1 |Foo |
|2 |Bar |
|3 |Baz |
+--+----+

Let's make sure the L2C is now filled:

shell:> cache

+----------------------------------+----+----------------+
|id                                |text|timestamp       |
+----------------------------------+----+----------------+
|ch.frankel.blog.querycache.Thing#1|Foo |6508639341051904|
|ch.frankel.blog.querycache.Thing#2|Bar |6508639341240320|
|ch.frankel.blog.querycache.Thing#3|Baz |6508639341244416|
+----------------------------------+----+----------------+

The same goes for the query cache:

shell:> query-cache

+------------------------------------------------------------------------------+-------+
|query                                                                         |keys   |
+------------------------------------------------------------------------------+-------+
|sql: select thing0_.id as id1_0_, thing0_.text as text2_0_ from thing thing0_;|[1,2,3]|
|parameters: ;                                                                 |       |
|named parameters: {};                                                         |       |
|transformer: org.hibernate.transform.CacheableResultTransformer@110f2         |       |
+------------------------------------------------------------------------------+-------+

Note that the query cache doesn't store the entities themselves, but only their primary keys. Entities are then loaded from the L2C.
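The interplay between the two caches can be sketched with another toy model. Again, this is only an illustration under simplifying assumptions: ToyCaches is a hypothetical class, the query cache is keyed by the SQL string alone, and nothing is ever invalidated.

```kotlin
// Toy model only: hypothetical stand-ins, not Hibernate's actual classes.
data class Thing(val id: Long, val text: String)

class ToyCaches(private val table: List<Thing>) {
    val entityCache = mutableMapOf<Long, Thing>()        // L2C: primary key -> entity
    val queryCache = mutableMapOf<String, List<Long>>()  // query cache: query -> primary keys
    var selectCount = 0
        private set

    fun findAll(): List<Thing> {
        val sql = "select * from thing"
        val cachedIds = queryCache[sql]
        if (cachedIds != null) {
            // Query cache hit: resolve every key through the entity cache.
            return cachedIds.map { id -> entityCache[id] ?: selectById(id) }
        }
        // Query cache miss: run the query, then fill both caches.
        selectCount++
        table.forEach { entityCache[it.id] = it }
        queryCache[sql] = table.map { it.id }
        return table
    }

    // Fallback when a cached key has no matching entry in the entity cache.
    private fun selectById(id: Long): Thing {
        selectCount++
        return table.first { it.id == id }.also { entityCache[id] = it }
    }
}

fun main() {
    val caches = ToyCaches(listOf(Thing(1L, "Foo"), Thing(2L, "Bar"), Thing(3L, "Baz")))
    caches.findAll()                // miss: 1 SELECT, both caches filled
    caches.findAll()                // hit: keys from the query cache, entities from the L2C
    println(caches.selectCount)     // 1

    caches.entityCache.remove(2L)   // simulate an entity missing from the L2C
    caches.findAll()                // key 2 has to be fetched again
    println(caches.selectCount)     // 2
}
```

The last call hints at why, in this model at least, a query cache without a matching entity cache is of little use: every key that can't be resolved from the L2C costs an extra SELECT.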

The caching behavior can be confirmed by calling the entities command again, and having a look at the Hibernate statistics:

Session Metrics {
    20810 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    0 nanoseconds spent preparing 0 JDBC statements;
    0 nanoseconds spent executing 0 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    1171505 nanoseconds spent performing 4 L2C hits;
    2443442 nanoseconds spent performing 1 L2C misses;
    0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
    2799 nanoseconds spent executing 1 partial-flushes (flushing a total of 0 entities and 0 collections)
}

It mentions 4 cache hits: one for the query cache, and the other 3 for the L2C.
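The same kind of information can also be read programmatically through Hibernate's Statistics API. Here is a minimal sketch: printCacheStatistics is a hypothetical helper, and it assumes that statistics generation is enabled. Note that these counters are global to the SessionFactory, whereas the metrics above are logged per session.

```kotlin
import org.hibernate.SessionFactory

// Hypothetical helper: prints the global cache counters kept by Hibernate.
// Counters are only collected when hibernate.generate_statistics is enabled.
fun printCacheStatistics(sessionFactory: SessionFactory) {
    val stats = sessionFactory.statistics
    println(
        "Query cache hits/misses/puts: " +
            "${stats.queryCacheHitCount}/${stats.queryCacheMissCount}/${stats.queryCachePutCount}"
    )
    println(
        "L2C hits/misses/puts: " +
            "${stats.secondLevelCacheHitCount}/${stats.secondLevelCacheMissCount}/${stats.secondLevelCachePutCount}"
    )
}
```

In a Spring Boot application, the SessionFactory can typically be obtained by calling unwrap(SessionFactory::class.java) on the injected EntityManagerFactory.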

Conclusion

Caching is a trade-off: it boosts performance by accepting that cached data might be stale. When every update to the underlying database goes through the cache, that risk disappears.

An L2C can return individual entities from the cache; a query cache allows returning them in bulk. Think about using the latter along with the former in your projects for an instant performance improvement.

Acknowledgements:

Thanks to my friend Vlad "Vladuts" Mihalcea for his help in reviewing this post.

Originally published at A Java Geek on July 5th, 2020