<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lalitha Govada</title>
    <description>The latest articles on DEV Community by Lalitha Govada (@lalithagovada).</description>
    <link>https://dev.to/lalithagovada</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3843206%2F6054eb71-0f64-454b-8d68-01370a79e675.jpeg</url>
      <title>DEV Community: Lalitha Govada</title>
      <link>https://dev.to/lalithagovada</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lalithagovada"/>
    <language>en</language>
    <item>
      <title>How I Built a Two-Level Cache to Serve Millions of Lookups in Under a Millisecond</title>
      <dc:creator>Lalitha Govada</dc:creator>
      <pubDate>Wed, 25 Mar 2026 20:04:53 +0000</pubDate>
      <link>https://dev.to/lalithagovada/how-i-built-a-two-level-cache-to-serve-millions-of-lookups-in-under-a-millisecond-47hg</link>
      <guid>https://dev.to/lalithagovada/how-i-built-a-two-level-cache-to-serve-millions-of-lookups-in-under-a-millisecond-47hg</guid>
      <description>&lt;p&gt;Every high-traffic system eventually hits the same wall: your data store can't keep up. For us, the breaking point came when a simple product lookup — backed by Elasticsearch — started showing tail latencies creeping past 80ms. At scale, that's the kind of number that keeps you up at night.&lt;br&gt;
The solution wasn't a faster cluster. It was rethinking where data lives before it ever reaches Elasticsearch at all. This post walks through the two-level caching strategy we built using Caffeine as an in-process L1 cache and Redis as a distributed L2 cache, with Elasticsearch sitting behind as the source of truth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem with Single-Layer Caching&lt;/strong&gt;&lt;br&gt;
Most teams reach for Redis the moment they need a cache. It's fast, it's familiar, and it works. But Redis still lives over the network. Even on a low-latency internal network, you're paying 1–5ms per hop. Do that a few thousand times per second across many services, and it adds up.&lt;br&gt;
The other option — caching inside the application process using something like Caffeine — gives you sub-millisecond reads, but it doesn't survive restarts, it doesn't share state between instances, and it can balloon your heap if you're not careful.&lt;br&gt;
Neither option alone was good enough. What we needed was both.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Architecture: Three Layers, One Request Path&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0baekqgqznye73h7du5a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0baekqgqznye73h7du5a.png" alt=" " width="800" height="595"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The lookup flow works like this:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A request arrives. We check Caffeine first. If the key exists in the local heap, we return immediately — no network call, no serialisation, typically under 0.1ms.&lt;/li&gt;
&lt;li&gt;On a Caffeine miss, we check Redis. If Redis has the value, we return it and asynchronously backfill Caffeine so the next request doesn't pay the Redis cost again.&lt;/li&gt;
&lt;li&gt;On a Redis miss, we hit Elasticsearch. We fetch the result, write it back into both Redis and Caffeine, and return the value to the caller.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each layer is a safety net for the one above it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Caffeine for L1?&lt;/strong&gt;&lt;br&gt;
Caffeine is the go-to in-process cache for the JVM. It's built on a variant of the W-TinyLFU eviction policy, which gives it near-optimal hit rates in practice. It supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time-based expiry&lt;/strong&gt; — both after write and after last access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Size-based eviction&lt;/strong&gt; — bounds by entry count or byte weight&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Async loading&lt;/strong&gt; — blocking only the first thread for a cold key, queuing subsequent requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For our use case, we set a short TTL (30–60 seconds) and a bounded size per service instance. The goal isn't to cache everything — it's to absorb the hot tail of your access distribution, the keys that every instance sees repeatedly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// CaffeineConfig.java&lt;/span&gt;
&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Cache&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Product&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;caffeineProductCache&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Caffeine&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newBuilder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;expireAfterWrite&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofSeconds&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ttlSeconds&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;// default 30 s&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maximumSize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maxSize&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;                              &lt;span class="c1"&gt;// default 5 000&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;recordStats&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;   &lt;span class="c1"&gt;// exposes hit rate to Micrometer / Prometheus&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;recordStats() wires Caffeine's internal hit/miss counters into Micrometer — your hit rate shows up in Prometheus or Cloud Monitoring with zero extra code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Redis for L2?&lt;/strong&gt;&lt;br&gt;
Redis bridges the gap between the ephemeral local cache and the durable source of truth. Its role here is twofold: it survives application restarts, and it's shared across all service instances. When a new pod spins up and Caffeine is cold, Redis absorbs the load that would otherwise spike straight to Elasticsearch.&lt;br&gt;
We use Redis with a longer TTL than Caffeine — typically 5–15 minutes depending on the data domain — and we're careful about serialisation. We lean on a compact binary format (MessagePack) rather than JSON to reduce memory footprint and deserialisation cost.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// RedisConfig.java — MessagePack gives ~32% smaller payloads vs JSON&lt;/span&gt;
&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;ObjectMapper&lt;/span&gt; &lt;span class="nf"&gt;msgpackObjectMapper&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;ObjectMapper&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MessagePackFactory&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;registerModule&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;JavaTimeModule&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;disable&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SerializationFeature&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;WRITE_DATES_AS_TIMESTAMPS&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;RedisTemplate&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;redisTemplate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;RedisConnectionFactory&lt;/span&gt; &lt;span class="n"&gt;connectionFactory&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;ObjectMapper&lt;/span&gt; &lt;span class="n"&gt;msgpackObjectMapper&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;serializer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Jackson2JsonRedisSerializer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;msgpackObjectMapper&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RedisTemplate&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;();&lt;/span&gt;
    &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setConnectionFactory&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;connectionFactory&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setKeySerializer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StringRedisSerializer&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setValueSerializer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;serializer&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;afterPropertiesSet&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One thing worth being explicit about: &lt;strong&gt;Redis is not a replacement for your data store&lt;/strong&gt;. It is a cache. Plan your TTL strategy accordingly and make sure your invalidation logic is correct.&lt;br&gt;
On GCP, point REDIS_HOST at your &lt;strong&gt;Memorystore for Redis&lt;/strong&gt; instance IP — fully managed, private VPC, automatic failover. It's a drop-in replacement, no code changes needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Cache-Aside Pattern in Code&lt;/strong&gt;&lt;br&gt;
The core logic is where everything comes together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;java&lt;/span&gt;&lt;span class="c1"&gt;// ProductCacheService.java&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;LookupResult&lt;/span&gt; &lt;span class="nf"&gt;getProduct&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentTimeMillis&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// L1 — Caffeine&lt;/span&gt;
    &lt;span class="nc"&gt;Product&lt;/span&gt; &lt;span class="n"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;caffeineCache&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getIfPresent&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;LookupResult&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromCaffeine&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cached&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// L2 — Redis&lt;/span&gt;
    &lt;span class="nc"&gt;Object&lt;/span&gt; &lt;span class="n"&gt;redisRaw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redisTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;opsForValue&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redisKey&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redisRaw&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nc"&gt;Product&lt;/span&gt; &lt;span class="n"&gt;redisProduct&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;backfillCaffeineAsync&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;redisProduct&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// don't block the caller&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;LookupResult&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromRedis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redisProduct&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// L3 — Elasticsearch&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;productRepository&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;findById&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;map&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;esProduct&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;backfillRedis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;esProduct&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;       &lt;span class="c1"&gt;// TTL 10 min&lt;/span&gt;
                &lt;span class="n"&gt;caffeineCache&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;esProduct&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// warm L1 too&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;LookupResult&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromElasticsearch&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;esProduct&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
            &lt;span class="o"&gt;})&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;orElseGet&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;LookupResult&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;notFound&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;elapsed&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;)));&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key discipline here is that every miss at a higher layer backfills that layer on the way back. This is the cache-aside (lazy-loading) pattern, and it keeps your caches warm without requiring a separate warming process.&lt;br&gt;
Notice that the Caffeine backfill after an L2 hit is async — the caller doesn't wait. If it fails, the next request just hits Redis again. No correctness issue.&lt;br&gt;
The LookupResult record tells the caller which layer answered, which makes warm-up visible in real time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;java&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;LookupResult&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Product&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;servedBy&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;latencyMs&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;LookupResult&lt;/span&gt; &lt;span class="nf"&gt;fromCaffeine&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Product&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;LookupResult&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"L1_CAFFEINE"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;LookupResult&lt;/span&gt; &lt;span class="nf"&gt;fromRedis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Product&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;LookupResult&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"L2_REDIS"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;LookupResult&lt;/span&gt; &lt;span class="nf"&gt;fromElasticsearch&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Product&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;LookupResult&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"L3_ELASTICSEARCH"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Try it yourself — watch the layer flip as the cache warms up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bashdocker-compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
./mvnw spring-boot:run

&lt;span class="c"&gt;# Create a product&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/api/products &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"id":"p1","name":"Widget Pro","category":"widgets","price":9.99,"inStock":true}'&lt;/span&gt;

&lt;span class="c"&gt;# First call — cold cache&lt;/span&gt;
curl http://localhost:8080/api/products/p1
&lt;span class="c"&gt;# → {"servedBy":"L3_ELASTICSEARCH","latencyMs":45}&lt;/span&gt;

&lt;span class="c"&gt;# Second call — Caffeine is warm&lt;/span&gt;
curl http://localhost:8080/api/products/p1
&lt;span class="c"&gt;# → {"servedBy":"L1_CAFFEINE","latencyMs":0}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Invalidation: The Hard Part&lt;/strong&gt;&lt;br&gt;
Cache invalidation in a two-level setup is where most teams get caught out. You now have two places holding potentially stale data, and they expire on different schedules.&lt;br&gt;
Our approach has three layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;TTL-based expiry as the baseline. Short TTLs in Caffeine mean local caches self-heal quickly. Longer TTLs in Redis reduce ES load for moderately static data.&lt;/li&gt;
&lt;li&gt;Event-driven invalidation for critical updates. When a product is updated, we publish an event. Each service instance subscribes and evicts the key from both Caffeine and Redis immediately. This ensures strong consistency when it matters.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;invalidate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;caffeineCache&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;invalidate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;                              &lt;span class="c1"&gt;// local L1&lt;/span&gt;
    &lt;span class="n"&gt;redisTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;delete&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redisKey&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;                        &lt;span class="c1"&gt;// shared L2&lt;/span&gt;
    &lt;span class="n"&gt;redisTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;convertAndSend&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;INVALIDATION_CHANNEL&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// notify all pods&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Redis pub/sub for cross-instance Caffeine invalidation. Without this, the problem is subtle but painful. Pod A updates a product and evicts its own Caffeine copy. Pods B and C still hold the old value in their local heap and will keep serving it until their 30-second TTL expires naturally. In a cluster of 10 pods, 9 out of 10 requests could still be hitting stale data, and you'd have no visibility into it from your metrics. The pub/sub channel costs almost nothing and closes that window from 30 seconds down to milliseconds. We use a lightweight Redis pub/sub channel so that an invalidation event on one instance propagates the eviction to all running instances within milliseconds.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;java&lt;/span&gt;
&lt;span class="c1"&gt;// CacheInvalidationListener.java — runs on every pod&lt;/span&gt;
&lt;span class="nd"&gt;@Override&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;onMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Message&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;productId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getBody&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="n"&gt;productCacheService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;evictLocalCaffeine&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;productId&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this, a product update on pod A evicts A's Caffeine but pods B and C serve stale data for up to 30 seconds. With it, eviction propagates to all pods within milliseconds at near-zero cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Results&lt;/strong&gt;&lt;br&gt;
After rolling this out across our product lookup service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;P50 latency&lt;/strong&gt; dropped from ~15ms to ~0.3ms for hot keys&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Elasticsearch request volume&lt;/strong&gt; fell by over 70% during peak traffic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache hit rate&lt;/strong&gt; across both layers held above 95% for our access pattern&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architecture is not novel — most high-scale systems use some variant of this. But the implementation details matter, and getting those details right in production is where the real engineering lives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When Not to Use This Pattern&lt;/strong&gt;&lt;br&gt;
Two-level caching adds operational complexity. Before reaching for it, ask yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is your data highly cacheable? Frequently changing data will see poor hit rates and risk serving stale values.&lt;/li&gt;
&lt;li&gt;Do you have strong consistency requirements? If stale reads are unacceptable, caching may not be appropriate at all without careful invalidation guarantees.&lt;/li&gt;
&lt;li&gt;Are you actually bottlenecked at the data layer? Profile first. Premature caching is its own kind of technical debt.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the answer to all three is yes, this pattern will likely cause more pain than it relieves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;br&gt;
The two-level cache pattern is one of the highest-leverage architectural moves you can make for read-heavy systems. Caffeine keeps your hottest data at heap speed. Redis absorbs cross-instance and restart volatility. Elasticsearch stays your source of truth without being beaten to death by repetitive reads.&lt;br&gt;
The full working implementation — including config, tests, docker-compose, and GCP deployment notes — is all in the repo:&lt;br&gt;
👉 &lt;a href="https://github.com/lalithaGovada/two-level-cache-demo" rel="noopener noreferrer"&gt;https://github.com/lalithaGovada/two-level-cache-demo&lt;/a&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>backend</category>
      <category>performance</category>
      <category>systemdesign</category>
    </item>
  </channel>
</rss>
