
A while ago I published Hybrid Cache Strategy in Spring Boot: A Guide to Redisson and Caffeine Integration. It walked through the core idea: Caffeine in front for sub-millisecond reads, Redis behind for cross-node coherence, integrated cleanly into Spring's @Cacheable abstraction. That article was the starting point — a working pattern that demonstrated the approach.
This is the follow-up. The pattern that started in the article is now a published Spring Boot library: hybrid-cache-spring-boot-starter, available on Maven Central. Going from a working pattern to a library you can deploy in production turned out to require eight specific things that the article didn't cover. This post walks through each of them — what the production-shaped solution looks like, why each piece matters, and how it all fits together.
If you read the original article and want to use the pattern, you can stop hand-rolling the integration and pull in the library instead:
<dependency>
    <groupId>io.github.nwwarm</groupId>
    <artifactId>hybrid-cache-spring-boot-starter</artifactId>
    <version>0.2.0</version>
</dependency>
The rest of this article walks through what's inside.
1. Cross-node invalidation, the complete pipeline
The original article showed the basic invalidation flow: when Redis evicts an entry, the local Caffeine cache evicts it too. That handles the simple case. The complete pipeline needs to handle three more cases:
- put overwrites. When a node updates an existing key, other nodes need to invalidate their L1 copy.
- TTL expirations. When Redis expires an entry, listeners on other nodes should know.
- Self-skip. The originating node should not receive its own invalidation message, otherwise it would invalidate the L1 entry it just populated and re-fetch from Redis on its very next read.
The library implements this with a single Redis pub/sub topic and a central dispatcher. Every cache instance publishes invalidation messages to one global topic. A central InvalidationDispatcher subscribes once per JVM and routes incoming messages to the right cache by name. The originating node skips its own messages via a node ID embedded in the message.
public record InvalidationMessage(
        String nodeId,
        String cacheName,
        String op,
        String key
) implements Serializable {
    public static final String OP_INVALIDATE = "I";
    public static final String OP_CLEAR = "C";
}
@Override
public void onMessage(CharSequence channel, InvalidationMessage msg) {
    if (nodeId.equals(msg.nodeId())) return; // self-skip
    Cache cache = cacheManager.getCache(msg.cacheName());
    if (cache == null) return;
    switch (msg.op()) {
        case "I" -> cache.evict(msg.key());
        case "C" -> cache.clear();
    }
}
Why one topic instead of one per cache: Redis pub/sub doesn't scale well to high channel cardinality, and the cost of filtering by cache name in user code is negligible. Hazelcast, Coherence, and Apache Ignite all use this pattern for the same reason. The single-topic design also makes the invalidation channel debuggable — you can redis-cli SUBSCRIBE cache:invalidate and watch every cache event in your application live, which is invaluable during incident response.
A subtle but important detail: keys are stringified at the cache boundary. Cache keys arrive from Spring's @Cacheable as typed Java objects (Long, UUID, SimpleKey), but at the L2 boundary they need to round-trip through Redis pub/sub. Without explicit handling, type information is lost on the wire. Stringifying at the boundary unifies the wire format with the existing string-based Redis bucket naming. The contract: cache keys must have stable, unique toString() representations. Built-in types satisfy this; custom key types are the application's responsibility.
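To make that contract concrete, here's a hypothetical composite key that satisfies it — the record and its fields are illustrative, not part of the library:

// Illustrative only: a composite cache key for a hypothetical multi-tenant lookup.
// The canonical record toString() — "ProductKey[tenantId=7, productId=42]" — already
// includes every component, so it is stable and unique; nothing extra is needed.
public record ProductKey(long tenantId, long productId) { }

A key type that omits a field from its toString(), or whose toString() changes between releases, silently breaks cross-node invalidation for that cache.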
2. O(1) cache clears via generation counters
@CacheEvict(allEntries = true) is one of those operations that looks simple in a tutorial and gets dangerous at scale. The naive implementations are:
- Iterate-and-delete (e.g., SCAN + DEL). O(n) in cache size. For a cache with millions of entries, this is a multi-second blocking operation that affects throughput for other Redis clients.
- flushdb. Clears the entire Redis, including data from other applications. Almost never what you want.
The library uses generation counters. Every cache key is written under a generation prefix:
{cacheName}:{generation}:{key}
To clear the cache, increment the generation atomically. Old keys are now unreachable — no read will look for them — and Redis's natural TTL eventually evicts them. The clear operation becomes O(1): one INCR plus one pub/sub publish, regardless of cache size.
private RBucket<Object> bucket(Object key) {
    return redisson.getBucket(cacheName + ":" + currentGeneration() + ":" + key);
}
@Override
public void clear() {
    caffeineCache.clear();
    Long newGen = breaker.executeSupplier(distributedGeneration::incrementAndGet);
    localGeneration.set(newGen);
    publishInvalidation(InvalidationMessage.OP_CLEAR, null);
}
Two things to know about this design:
Memory cost. Old keys are orphaned in Redis until their TTL expires. If you clear the cache more frequently than the TTL window (e.g., clearing every 30 minutes with a 24-hour TTL), orphan memory accumulates. Generation-counter clear is correct for caches that clear rarely (deploys, manual operations) and the wrong choice for caches cleared every few minutes — for those, eager SCAN+DEL is better. The library plans a clearImmediate() method for that case.
Cross-node visibility. Other nodes need to learn the new generation number. The library has them refresh lazily (every 1 second) and immediately on receiving the clear message via pub/sub. So a clear on node A is visible to node B within ~1 second worst case, ~10ms typical case.
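Sketched below is one way that lazy refresh can look — lastGenerationCheck is an assumed field; localGeneration, distributedGeneration, and breaker are the same members used in the clear() snippet above. The fast path reuses the cached generation; at most one thread per interval pays the Redis round-trip:

private long currentGeneration() {
    long now = System.nanoTime();
    long last = lastGenerationCheck.get();
    if (now - last > TimeUnit.SECONDS.toNanos(1)
            && lastGenerationCheck.compareAndSet(last, now)) {
        refreshNow(); // re-read the shared counter at most once per second per node
    }
    return localGeneration.get();
}

private void refreshNow() {
    try {
        localGeneration.set(breaker.executeSupplier(distributedGeneration::get));
    } catch (Exception e) {
        // Redis unavailable or breaker open: keep serving with the last known generation.
    }
}

The pub/sub handler for the clear message calls the same refresh immediately, which is where the ~10ms typical case comes from.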
3. Two-tier single-flight to prevent thundering herds
A "thundering herd" or "cache stampede" happens when a popular key expires and many concurrent requests all try to load it from the source at once. With N application instances each fielding M concurrent requests, you get N×M backend loads. For a database-backed loader on a hot key, this can take down the database.
The library uses a two-layer single-flight pattern:
Local single-flight uses Caffeine's atomic Cache.get(key, mappingFunction). Caffeine guarantees only one thread per JVM enters the function for a given key; other concurrent threads wait. This protection works whether Redis is healthy, slow, or completely unreachable.
Cross-node single-flight uses a Redisson RLock inside the local loader function. The first node to acquire the lock loads from the source; other nodes wait. After the holder populates Redis and releases the lock, waiters re-check the cache, find the value, and return without calling the loader.
@Override
public <T> T get(Object key, Callable<T> valueLoader) {
    String stringKey = stringify(key);
    ValueWrapper wrapper = get(stringKey);
    if (wrapper != null) return (T) wrapper.get();

    com.github.benmanes.caffeine.cache.Cache<Object, Object> nativeCaffeine =
            (com.github.benmanes.caffeine.cache.Cache<Object, Object>) caffeineCache.getNativeCache();

    Object stored = nativeCaffeine.get(stringKey, k -> {
        // Caffeine guarantees only one thread per JVM enters here for key k.
        Object distValue = readFromL2(k);
        if (distValue != null) return distValue;
        Object loaded = loadWithDistributedLock(k, valueLoader);
        return loaded == null ? NullValue.INSTANCE : loaded;
    });
    return (T) (stored instanceof NullValue ? null : stored);
}
The two layers compose naturally. Per-JVM herd is always eliminated by Caffeine. Cross-JVM herd is eliminated when Redis is healthy and the lock can be acquired. When Redis is unreachable, cross-node coordination fails open — local single-flight still protects each JVM, so the worst case during a Redis outage is N database loads across N nodes, not N × threads-per-node.
A subtle but important detail: the re-check after acquiring the distributed lock. Without it, every waiter that queues on the lock would execute the loader sequentially after the holder releases — you'd serialize the herd instead of eliminating it. The re-check is what turns "N concurrent loads" into "1 load + N-1 cache reads."
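Here's a sketch of what the cross-node leg can look like — simplified, with ttlSeconds as an assumed field; the library's version also routes the L2 write through the circuit breaker and records metrics. Taking the lock without an explicit lease time lets Redisson's watchdog keep it alive (section 4 comes back to this), and the re-check right after acquisition is the line that matters:

private <T> Object loadWithDistributedLock(Object key, Callable<T> valueLoader) {
    RLock lock = redisson.getLock(cacheName + ":load:" + key);
    lock.lock(); // no explicit lease time → Redisson's watchdog keeps renewing it
    try {
        Object cached = readFromL2(key);
        if (cached != null) return cached; // another node loaded while we waited on the lock
        Object loaded = valueLoader.call();
        if (loaded != null) {
            bucket(key).set(loaded, ttlSeconds, TimeUnit.SECONDS); // ttlSeconds: assumed field
        }
        return loaded;
    } catch (Exception e) {
        throw new Cache.ValueRetrievalException(key, valueLoader, e);
    } finally {
        lock.unlock();
    }
}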
4. Why Redisson, specifically
The library uses Redisson rather than Spring Data Redis with hand-rolled distributed primitives, and the choice is worth justifying.
Redisson's RLock has watchdog auto-renewal. While the holder thread is alive, Redisson runs a background task that periodically extends the lease (default: re-renews every 10 seconds for a 30-second lease). If your loader unexpectedly takes 35 seconds when you sized the lease for 30, the lock doesn't expire under it.
The alternative — a hand-rolled SET NX PX lock with a Lua unlock script — has no watchdog. The lease must therefore exceed worst-case loader latency with margin. If your loader p99 is 10s, you set lease to 30s. If a query plan regression makes it 35s, the lock expires, another waiter acquires, and two threads run the loader. Stampede protection just degraded.
Redisson's lock waiters use Redis pub/sub for wake-up. When the holder unlocks, Redisson publishes to a channel; waiters subscribed to that channel wake up immediately rather than polling. A hand-rolled lock falls back to polling intervals (typically 50ms), which means up to 50ms of unnecessary delay on every lock release. For a cache lock under high contention, this matters.
Redisson handles cluster redirects transparently. In Redis Cluster, operations on keys that have moved between shards receive MOVED or ASK redirects from Redis. Redisson catches these and retries against the new owner shard automatically. With raw Lettuce commands you'd write that logic yourself.
Richer primitives map directly to the design. The library uses RBucket (key-value with TTL), RAtomicLong (the generation counter), RTopic (pub/sub), and RLock. With Lettuce these would each require building on top of raw commands — about 30% more code in the library, particularly in the lock implementation.
The tradeoff is dependency footprint. Redisson pulls in Netty, JBoss Marshalling, and several other transitive dependencies. For a library where the value proposition is correctness under realistic production conditions — single-flight that actually works, locks that don't expire under slow loaders, generation counters that survive failover — the dependency cost is worth the primitive quality.
5. Failure modes are the actual product
What distinguishes a production cache from a tutorial implementation is what happens when Redis goes wrong. Slow, unreachable, partitioned, replicating, failing over — your cache library either has answers for these scenarios or it makes your application's outages worse.
The library wraps every L2 operation in a Resilience4j circuit breaker. When Redis becomes slow or unreachable, the breaker opens after a configurable threshold and stops sending requests for a configurable cooldown period. During the cooldown, reads degrade to L1-only (hits return cached values, misses invoke the loader and populate only L1) and writes update L1 only.
private Object readFromL2(Object key) {
    try {
        // executeCallable rather than executeSupplier: Timer.recordCallable throws a
        // checked Exception, which the catch block below handles.
        return breaker.executeCallable(() ->
                l2GetLatency.recordCallable(() -> bucket(key).get()));
    } catch (CallNotPermittedException e) {
        l2BreakerOpen.increment();
        return null; // breaker open → degrade to local-only
    } catch (Exception e) {
        l2Failures.increment();
        return null; // single L2 failure → treat as miss
    }
}
The breaker configuration the library defaults to:
CircuitBreakerConfig.custom()
        .slidingWindowSize(20)
        .minimumNumberOfCalls(10)
        .failureRateThreshold(50.0f)
        .slowCallDurationThreshold(Duration.ofMillis(500))
        .slowCallRateThreshold(80.0f)
        .waitDurationInOpenState(Duration.ofSeconds(30))
        .permittedNumberOfCallsInHalfOpenState(3)
        .build();
A few things in this configuration are worth defending explicitly:
slowCallDurationThreshold = 500ms. Healthy Redis on a LAN responds in single-digit milliseconds. 500ms means something is wrong — connection pool exhausted, network degraded, Redis itself thrashing. Treating slow calls as failures is the key insight: a Redis that responds in 30 seconds is operationally identical to a dead Redis, and waiting 30 seconds per call to learn that is the failure mode the breaker exists to prevent.
waitDurationInOpenState = 30s. This is how long the application serves from L1 only before probing Redis again. Too short and you hammer a struggling Redis; too long and a transient outage causes prolonged L2-disabled mode.
minimumNumberOfCalls = 10. Don't open the breaker on the first failure during low-traffic periods. The minimum-calls guard prevents a single failed health check at 3am from tripping production into degraded mode for 30 seconds.
The library's failure-mode commitments are documented in the README and verified by @SpringBootTest integration tests against real Redis instances in single, cluster, and sentinel topologies. Senior engineers evaluating a cache library read this section first; getting it right is what signals the library is production-aware.
There's also a subtle ordering detail in the evict path worth highlighting. The natural-feeling order — local evict, then L2 delete, then publish invalidation — has a failure mode: if the L2 delete fails, this node is correct but L2 still has the value. The publish goes out anyway, other nodes evict their L1, then they read from L2 and re-populate L1 with stale data. The library reorders to L2 delete first, then publish only if the delete succeeded, then local evict last. If the L2 delete fails, no publish happens — the canonical state hasn't changed, so other nodes don't need to be told anything.
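In code, the reordered path looks roughly like this (simplified; the library's version also records metrics):

@Override
public void evict(Object key) {
    String stringKey = stringify(key);
    boolean l2Deleted = false;
    try {
        l2Deleted = breaker.executeSupplier(() -> bucket(stringKey).delete());
    } catch (Exception e) {
        // Breaker open or Redis failure: canonical state unchanged, so nothing to announce.
    }
    if (l2Deleted) {
        publishInvalidation(InvalidationMessage.OP_INVALIDATE, stringKey);
    }
    caffeineCache.evict(stringKey); // local evict last, regardless of L2 outcome
}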
6. Polymorphic deserialization, locked down
Caching JPA entities (or any polymorphic Java types) as JSON requires Jackson's activateDefaultTyping so values deserialize back to their concrete types rather than LinkedHashMap. A List<Animal> containing Dog and Cat round-trips correctly through Redis only if the class names are embedded in the JSON.
The catch: unrestricted polymorphic deserialization is a known attack vector. CVE-2017-7525 and a long tail of related Jackson CVEs all exploit applications that deserialize attacker-controlled JSON without restricting allowed types. The fix that was added to Jackson is BasicPolymorphicTypeValidator, which lets you constrain which base types can be deserialized.
Cache contents are not typically attacker-controlled, so the threat model is narrower than a public web API. But it's not zero — any vector that lets an attacker write to your Redis (a misconfigured Redis exposed to the internet, a compromised application instance, a multi-tenant Redis with insufficient ACLs) becomes a deserialization vulnerability.
The library narrows the validator via configuration and refuses to start if the configuration is missing:
cache:
  allowed-packages:
    - com.example.domain.
    - com.example.dto.
Only types whose fully-qualified name starts with one of these prefixes can be deserialized. java.util., java.time., and java.lang. are always allowed (collections, dates, primitive wrappers).
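For readers who want to see how a prefix allowlist maps onto Jackson's API, here is a sketch — standard Jackson calls, not the library's exact wiring; allowedPackages stands in for the cache.allowed-packages values:

// Build a validator that only accepts types under the configured prefixes.
BasicPolymorphicTypeValidator.Builder builder = BasicPolymorphicTypeValidator.builder()
        .allowIfSubType("java.util.")
        .allowIfSubType("java.time.")
        .allowIfSubType("java.lang.");
for (String prefix : allowedPackages) {                      // e.g. "com.example.domain."
    builder.allowIfBaseType(prefix).allowIfSubType(prefix);  // both do a startsWith match
}

ObjectMapper mapper = new ObjectMapper();
mapper.activateDefaultTyping(builder.build(),
        ObjectMapper.DefaultTyping.NON_FINAL, JsonTypeInfo.As.PROPERTY);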
Two design choices around this configuration are worth knowing:
The trailing dot is required. Jackson's allowIfBaseType(prefix) does a startsWith match. Without a trailing dot, com.example would also accept com.examplerogue.AnyClass — a known footgun. The library validates at startup that each entry ends with . or is a fully qualified class name, and rejects entries that match neither with a clear error message.
The library fails to start without configuration. Earlier versions logged a WARN and proceeded with a permissive validator; this is a deliberate change. Production safety must not depend on operators reading WARN logs. The startup failure is loud enough that operators see it immediately and configure the property correctly. The error message names the property, references CVE-2017-7525, and lists three remediation options (configure the allowlist, switch to Kryo with explicit class registration, supply a custom RedissonClient bean).
The same fail-closed posture applies to the Kryo codec. Kryo's setRegistrationRequired(false) mode has the same threat model as Jackson default-typing. The library forces setRegistrationRequired(true) and requires cache.kryo.registered-classes to enumerate the allowed types.
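In raw Kryo terms the same posture looks like this — conceptual only; the library wires it through its Redisson codec, and the class list comes from cache.kryo.registered-classes:

Kryo kryo = new Kryo();
kryo.setRegistrationRequired(true);              // unregistered classes now fail fast
kryo.register(com.example.domain.Product.class); // every cached type enumerated explicitly
kryo.register(java.util.ArrayList.class);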
7. Three tiers, because not every cache has the same shape
In real applications, different caches need different shapes:
- Per-node rate limiters. A Caffeine-backed counter that intentionally doesn't synchronize across instances. Each node has its own rate limit; that's the design.
- Sessions. Shared across nodes, no L1 (you don't want stale sessions). Fits a Redis-only model.
- Product catalogs. Read-heavy, can tolerate ~100ms staleness, want L1 for latency. Fits a near-cache model.
The library exposes three tiers, selectable per cache name in YAML:
cache:
  caches:
    products:
      tier: NEAR_CACHE          # L1 + L2, default
    user-sessions:
      tier: DISTRIBUTED_ONLY    # L2 only
    request-rate-limits:
      tier: LOCAL_ONLY          # L1 only
Application code is identical regardless of tier — @Cacheable("products") and @Cacheable("user-sessions") look the same. The tier is a deployment-time decision in YAML. This is the right place for it because the same business logic might run with a near-cache in production (with Redis available) and a local-only cache in tests (no Redis), and switching shouldn't require code changes.
Each tier is implemented as a class implementing Spring's Cache interface, dispatched by a custom CacheManager:
@Override
protected Cache getMissingCache(String name) {
    CacheProperties.CacheSpec spec = properties.specFor(name);
    return switch (spec.tier()) {
        case LOCAL_ONLY ->
                new LocalOnlyCache(buildCaffeine(name, spec), meterRegistry);
        case DISTRIBUTED_ONLY ->
                new DistributedOnlyCache(name, spec, redisson, breaker, meterRegistry);
        case NEAR_CACHE ->
                new NearCache(buildCaffeine(name, spec), spec, redisson,
                        dispatcher, breaker, nodeId, meterRegistry);
    };
}
Each tier has its own concerns. DistributedOnlyCache has no L1 to lean on for single-flight, so it uses a ConcurrentMap<String, CompletableFuture<Object>> for the local layer — concurrent threads on a cold key collapse onto one future, the first runs the load, others wait. LocalOnlyCache is the simplest — a thin wrapper around Caffeine with metrics — and skips all of the breaker, generation, invalidation machinery.
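A sketch of that collapse — the field and helper names (inFlight, loadAndStore) are illustrative, not the library's actual members:

private final ConcurrentMap<String, CompletableFuture<Object>> inFlight = new ConcurrentHashMap<>();

private Object loadOnce(String key, Callable<?> valueLoader) {
    CompletableFuture<Object> fresh = new CompletableFuture<>();
    CompletableFuture<Object> existing = inFlight.putIfAbsent(key, fresh);
    if (existing != null) {
        return existing.join(); // another thread is already loading this key; wait for its result
    }
    try {
        Object value = loadAndStore(key, valueLoader); // load from source, write to Redis
        fresh.complete(value);
        return value;
    } catch (Exception e) {
        fresh.completeExceptionally(e);
        throw new Cache.ValueRetrievalException(key, valueLoader, e);
    } finally {
        inFlight.remove(key, fresh); // later misses load fresh rather than reusing a settled future
    }
}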
A note on the integration shape: the library extends Spring's AbstractCacheManager and registers a single CacheManager bean. Spring's @EnableCaching automatically wires this through its default SimpleCacheResolver. No custom CacheResolver is needed; @Cacheable annotations work without modification.
8. Production Redis is rarely a single server
Real production deployments use one of three Redis topologies, and a library that targets production needs to support all three:
Single server. Development, staging, small production. Default.
Redis Cluster. Sharded across multiple masters. Used when the dataset doesn't fit on one node. AWS ElastiCache Cluster Mode Enabled, Azure Cache Premium with clustering, GCP Memorystore for Redis Cluster — all use this protocol.
Redis Sentinel. Primary-with-replicas plus a Sentinel quorum that orchestrates failover. Used when you need HA but don't need horizontal sharding.
The library's connection mode is selected via YAML and validated at startup:
# Cluster mode
cache:
  server:
    mode: CLUSTER
    addresses:
      - redis://node1:6379
      - redis://node2:6379
      - redis://node3:6379

# Sentinel mode
cache:
  server:
    mode: SENTINEL
    master-name: mymaster
    addresses:
      - redis://sentinel1:26379
      - redis://sentinel2:26379
      - redis://sentinel3:26379
Misconfiguration fails fast with a mode-aware error message — not at first cache operation. This matters because a cache that "works" but routes to the wrong topology produces silent staleness, which is much worse than a clear startup failure.
There's a subtle but consequential detail in the cluster and sentinel modes that's worth highlighting: Redisson's default ReadMode is SLAVE. In cluster and sentinel deployments, this routes reads to replicas. For a near-cache library that publishes invalidation messages and expects read-your-writes coherence across nodes, replica reads break the contract — node A writes, the invalidation reaches node B, B drops its L1, B reads from L2, but L2 might still be a replica that hasn't caught up.
The library overrides this default to MASTER for both cluster and sentinel:
case CLUSTER -> {
    var cluster = config.useClusterServers()
            .setReadMode(ReadMode.MASTER)
            .setPassword(server.password())
            .setScanInterval(2000);
    server.addresses().forEach(cluster::addNodeAddress);
}
case SENTINEL -> {
    var sentinel = config.useSentinelServers()
            .setReadMode(ReadMode.MASTER)
            .setMasterName(server.masterName())
            .setPassword(server.password());
    server.addresses().forEach(sentinel::addSentinelAddress);
}
This is the kind of configuration that's easy to overlook and silently wrong if you do. The library defaults to correctness; applications that want replica reads (accepting eventual consistency for higher read throughput) can override the RedissonClient bean. Most Redis-backed cache libraries don't address this — their tests are single-server, the default replica routing is never exercised, and users in production hit subtle staleness that gets blamed on "eventual consistency in distributed systems" rather than a misconfiguration.
How to use it
The integration is the same shape as the original article. Add the dependency:
<dependency>
    <groupId>io.github.nwwarm</groupId>
    <artifactId>hybrid-cache-spring-boot-starter</artifactId>
    <version>0.2.0</version>
</dependency>
Configure caches in application.yml:
cache:
  server:
    address: redis://localhost:6379
  allowed-packages:
    - com.example.domain.
  caches:
    products:
      tier: NEAR_CACHE
      ttl: 6h
      maximum-size: 50000
Use @Cacheable as normal:
@Cacheable("products")
public Product findById(Long id) { ... }
@Cacheable(cacheNames = "products", sync = true) // single-flight protection
public Product findExpensive(Long id) { ... }
@CacheEvict("products")
public void invalidate(Long id) { ... }
@CacheEvict(cacheNames = "products", allEntries = true)
public void reload() { ... }
That's the full integration. The library plugs in below Spring's cache abstraction; the CacheManager is wired automatically. No annotation changes, no aspect configuration, no custom interceptors. Existing @Cacheable annotations across an existing application work unchanged.
Operational concerns
The library exposes per-cache Micrometer metrics (cache.gets, cache.l2.gets, cache.l2.get.latency, cache.l2.failures, cache.l2.breaker.open) and the Resilience4j breaker state gauge. The README includes example Prometheus alert rules for the three signals worth monitoring: breaker open for sustained periods, L1 hit rate collapse, and L2 latency p99 spikes.
A Spring Boot Actuator HealthIndicator reports Redis connectivity, breaker state, and per-cache hit rates — usable directly for Kubernetes readiness probes. The indicator is non-blocking: Redis ping has a short timeout to avoid blocking /actuator/health requests during incidents.
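The non-blocking shape is roughly this — names are illustrative, not the library's actual classes; the point is that the Redis probe is bounded by a short timeout so a hung connection can never stall the health endpoint:

@Override
public Health health() {
    boolean redisUp;
    try {
        // Any successful round-trip counts; the probe key does not need to exist.
        CompletableFuture
                .supplyAsync(() -> redisson.getBucket("healthcheck:probe").isExists())
                .orTimeout(200, TimeUnit.MILLISECONDS)
                .join();
        redisUp = true;
    } catch (Exception e) {
        redisUp = false; // timeout or connection failure
    }
    return (redisUp ? Health.up() : Health.down())
            .withDetail("l2.reachable", redisUp)
            .withDetail("breaker.state", breaker.getState().name())
            .build();
}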
The README's full failure-behavior table covers fourteen scenarios: Redis becoming unreachable, slow, recovering, cluster shard failover, cluster slot migration, sentinel primary failover, sentinel quorum loss, network partitions, L2 delete failure during evict, and several others. Each row corresponds to specific code and is verified by @SpringBootTest integration tests against Testcontainers Redis instances.
What's next
The library is at 0.2.0. The roadmap to 1.0.0 includes:
- Per-cache circuit breakers. Currently a single global breaker fronts all caches; a per-cache breaker registry isolates failures to the specific cache experiencing them.
- Loader stampede semaphore. Per-cache configurable limit on concurrent loader invocations. Protects the source even when Redis is down and cross-node coordination is unavailable.
- TTL jitter on L1 entries to prevent coordinated expiration spikes.
- clearImmediate() for caches needing eager memory reclaim instead of generation-counter clear.
- Stale-while-revalidate refresh strategy for hot keys that should never block on synchronous reload.
- Reactive loader support for applications using Mono/Flux return types.
None of those are required for production use today; all of them are real production concerns to address before claiming API stability at 1.0.
If you're considering adopting it, the things I'd suggest checking against your environment:
- Is your read-to-write ratio high enough that L1 caching pays off (rough rule: at least 10:1)?
- Are your cache values small enough that Redis serialization isn't the bottleneck (rough rule: under ~100 KB per value)?
- Does your application tolerate ~10ms cross-node staleness on writes? (If you need stricter, the DISTRIBUTED_ONLY tier is the right fit, not a near-cache.)
- Are you on Redis 6+ for the cluster and sentinel features to work cleanly?
If those check out, give it a try. The library is on Maven Central, the source is on GitHub, the README walks through every configuration option, and the failure scenarios are all documented and tested. Feedback and issues are welcome.
The original article showed the pattern. This one shows the production-shaped version. I hope it's useful.
Library: github.com/nwwarm/spring-redis-hybrid-cache
Maven Central: io.github.nwwarm:hybrid-cache-spring-boot-starter
Original article: Hybrid Cache Strategy in Spring Boot