DEV Community

날다람쥐
layercache: Stop Paying Redis Latency on Every Hot Read

Every Node.js backend hits the same wall eventually.

Your Redis cache is working, latency is acceptable, and then traffic doubles. Suddenly the Redis round-trip that felt like nothing at 200 req/s starts dominating your p95 at 2,000 req/s. You add an in-process memory cache on top, wire up some invalidation logic by hand, and three months later you are maintaining a fragile two-layer system with no stampede protection and no cross-instance consistency.

layercache is a TypeScript-first library that solves this problem once, cleanly. It stacks memory, Redis, and disk behind a single unified API and handles the hard parts — stampede prevention, cross-instance invalidation, graceful degradation under Redis failures — out of the box.

This post walks through what it does and what the benchmark numbers actually look like on a real Redis backend.


The Core Idea

```
your app ──▶ L1 Memory   ~0.006 ms  (per-process, sub-millisecond)
                │
             L2 Redis    ~0.2 ms    (shared across instances)
                │
             L3 Disk     ~2 ms      (optional, persistent)
                │
             Fetcher     runs once  (even under high concurrency)
```

On a cache hit the fastest available layer responds and the result backfills any warmer layers automatically. On a miss the fetcher runs exactly once, no matter how many concurrent requests arrived at the same time.

That last part — the single-flight guarantee — is where most hand-rolled hybrid caches fall apart.
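Both behaviors, backfill on hit and single-flight on miss, are worth seeing in miniature to appreciate what the library absorbs. A toy sketch under my own assumed `Layer` interface (not layercache's internals):

```typescript
interface Layer<T> {
  get(key: string): Promise<T | undefined>;
  set(key: string, value: T): Promise<void>;
}

// Single-flight bookkeeping: one shared promise per key currently being fetched.
const inflight = new Map<string, Promise<unknown>>();

// Read-through with backfill and single-flight: probe layers fastest-first;
// on a hit, write the value back into every faster layer that missed;
// on a full miss, concurrent callers share one fetcher run.
async function layeredGet<T>(
  layers: Layer<T>[],
  key: string,
  fetcher: () => Promise<T>
): Promise<T> {
  for (let i = 0; i < layers.length; i++) {
    const hit = await layers[i].get(key);
    if (hit !== undefined) {
      // Backfill the warmer layers that missed.
      await Promise.all(layers.slice(0, i).map(l => l.set(key, hit)));
      return hit;
    }
  }
  const existing = inflight.get(key);
  if (existing) return existing as Promise<T>;
  const p = (async () => {
    const value = await fetcher();
    await Promise.all(layers.map(l => l.set(key, value)));
    return value;
  })().finally(() => inflight.delete(key));
  inflight.set(key, p);
  return p;
}
```

The subtle part is the synchronous window between checking `inflight` and setting it: because the map is updated before the function yields, every concurrent caller after the first awaits the same promise, and the `finally` cleanup prevents a rejected fetch from poisoning the key forever.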


Getting Started

```bash
npm install layercache
```

Memory only (no Redis needed):

```typescript
import { CacheStack, MemoryLayer } from 'layercache'

const cache = new CacheStack([
  new MemoryLayer({ ttl: 60, maxSize: 1_000 })
])

const user = await cache.get('user:123', () => db.findUser(123))
```

Memory + Redis layered setup:

```typescript
import { CacheStack, MemoryLayer, RedisLayer } from 'layercache'
import Redis from 'ioredis'

const cache = new CacheStack([
  new MemoryLayer({ ttl: 60, maxSize: 2_000 }),
  new RedisLayer({ client: new Redis(), ttl: 300, prefix: 'myapp:' })
])

const user = await cache.get('user:123', () => db.findUser(123))
```

The API stays the same no matter how many layers you stack: adding or removing a layer requires no changes to your application code.


Benchmark Results

I ran layercache v1.2.9 against a real Redis 7 backend (Docker, not a mock) on Linux. Here is what the numbers look like.

Warm Hit Latency

The most important number for a cache library is how fast the hit path is.

| Mode | Avg ms | P95 ms |
| --- | --- | --- |
| No cache (origin) | 5.175 | 8.742 |
| Memory only | 0.009 | 0.014 |
| Memory + Redis | 0.005 | 0.006 |

Memory-only warm hits averaged 0.009ms. With a Redis layer added, the hot path still resolves from L1 memory and came in at 0.005ms — both are firmly sub-millisecond and effectively the same class of latency for production purposes.

Stampede Prevention

This is where the library earns its keep. 75 concurrent requests for the same missing key, repeated 5 times:

| Mode | Avg ms | Origin executions |
| --- | --- | --- |
| No cache | 409.5 | 375 |
| Memory only | 6.9 | 5 |
| Memory + Redis | 36.7 | 5 |

Without a cache, 75 × 5 = 375 origin calls. With layercache, the fetcher ran exactly 5 times — once per round, regardless of concurrency. The layered case is slower than memory-only because it pays Redis coordination costs, but the correctness guarantee is the same.

HTTP Throughput

Under sustained load with autocannon (40 connections, 8 seconds):

| Route | Avg latency | P97.5 | Req/s |
| --- | --- | --- | --- |
| No cache | 249 ms | 271 ms | 161 |
| Memory only | 1.82 ms | 4 ms | 16,705 |
| Memory + Redis | 1.74 ms | 4 ms | 17,184 |

Caching moved the service from 161 req/s to over 17,000 req/s — roughly a 100× improvement in throughput. Average latency dropped from 249ms to under 2ms. The memory-only and layered routes performed nearly identically in steady state because hot requests stay in L1 after warm-up.


What Happens When Redis Is Slow or Dead?

This is the question that separates a library you can actually run in production from one you can only trust in demos.

Slow Redis

I measured three scenarios with injected TCP latency:

| Redis delay | L1 hot hit | L2 hit | Cold miss |
| --- | --- | --- | --- |
| 0ms | 0.407ms | 2.655ms | 12.259ms |
| 100ms | 0.119ms | 101.172ms | 504.167ms |
| 500ms | 0.196ms | 501.404ms | 2506.013ms |

The key insight: L1 hot hits stayed fast regardless of Redis latency. If a request can be served from in-process memory, slow Redis does not matter at all. The latency penalty only applies when a request needs to reach L2 or perform a cold miss.

Cold misses scaled hard with injected delay because the request paid both the Redis round-trip and the write-back path. If you have traffic patterns with many cold misses, a slow Redis will drag your tail latency even with gracefulDegradation enabled — the benchmark showed graceful and strict modes performing nearly identically under slow conditions.
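If your workload looks like that, bounding the time a request may wait on L2 is the standard mitigation. A generic sketch (the `withTimeout` helper and the 50 ms budget are my own illustration, not a layercache API):

```typescript
// Cap how long a read may wait on Redis before falling through to the
// origin, so a slow or dead L2 degrades latency instead of stalling
// requests until a long client timeout.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    promise.then(
      v => { clearTimeout(timer); resolve(v); },
      e => { clearTimeout(timer); reject(e); }
    );
  });
}

async function readThrough<T>(
  l2Read: () => Promise<T | undefined>,
  origin: () => Promise<T>
): Promise<T> {
  try {
    const hit = await withTimeout(l2Read(), 50); // 50 ms budget for Redis
    if (hit !== undefined) return hit;
  } catch {
    // Budget exceeded or Redis error: treat as a miss.
  }
  return origin();
}
```

The trade-off is that a genuinely slow-but-alive Redis gets treated as down, so every capped read costs an extra origin fetch; tune the budget to your measured L2 tail latency rather than guessing.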

Dead Redis

Under a fully paused Redis instance:

  • Warm L1 hits: still worked — both strict and graceful modes served from memory normally
  • Cold misses: timed out at 2000ms — both modes failed

This is important to understand. gracefulDegradation keeps warm traffic alive when Redis goes down. It does not create a fast fallback path for cold keys. New keys and expired keys that need a Redis write-back will stall until the timeout.

Operationally this means: if your L1 TTL is shorter than your expected Redis outage window, you will see degraded cold-miss behavior. Size your L1 TTLs with this in mind.
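Concretely, reusing the setup from earlier (TTL numbers are illustrative, not a recommendation):

```typescript
// Illustrative numbers only: an L1 TTL of 180s keeps hot keys served
// from memory through a short Redis outage, at the cost of staler reads.
const cache = new CacheStack([
  new MemoryLayer({ ttl: 180, maxSize: 10_000 }), // was 60 in the earlier example
  new RedisLayer({ client: redis, ttl: 3600, prefix: 'myapp:' })
])
```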


Queue Amplification Under Slow Redis

A follow-up benchmark asked: if Redis is slow and 500 concurrent requests pile up on L2-hit traffic, does latency stay bounded or blow up?

| Redis delay | Concurrency 1 | Concurrency 500 | Amplification |
| --- | --- | --- | --- |
| 100ms | 100.8ms | 128.9ms | 1.28× |
| 500ms | 501.1ms | 515.8ms | 1.03× |

No runaway queue amplification. At 500 concurrent requests against a 500ms-latency Redis, wall-clock time only grew by about 15ms above the single-request baseline. The library appears to batch or overlap L2 requests within a shared Redis client rather than serializing them, which keeps the curve nearly flat.
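That flat curve is what you expect when concurrent reads overlap on the event loop rather than queueing behind each other. A toy demonstration with a simulated delay (no real Redis involved):

```typescript
// 500 concurrent 100ms "reads" complete in roughly one round-trip of
// wall-clock time, because the waits overlap instead of serializing.
const fakeRedisGet = () =>
  new Promise<string>(resolve => setTimeout(() => resolve('value'), 100));

async function main() {
  const start = Date.now();
  await Promise.all(Array.from({ length: 500 }, fakeRedisGet));
  console.log(`${Date.now() - start}ms`); // close to 100ms, nowhere near 500 × 100ms
}
main();
```

Serializing the same 500 reads would take around 50 seconds; a shared async client that pipelines or overlaps requests is what keeps the amplification near 1×.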


Memory Pressure and Eviction

With maxSize: 25 and 180 unique keys inserted (each with a 256KB payload), then revisiting the earliest 25 keys:

| Evictions | L1 retained | Revisit avg | Origin fetches |
| --- | --- | --- | --- |
| 180 | 25 | 1.332ms | 0 |

Eviction was predictable. L1 held exactly maxSize entries after the fill phase. When evicted keys were revisited, they reloaded from Redis L2 rather than hitting the origin — zero origin fetches despite L1 having evicted everything. GC activity was measurable (36 events, 78ms total) but no stop-the-world pauses appeared at this payload size.
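The `maxSize` semantics are easy to picture with a classic Map-based LRU. A sketch of the behavior (my own illustration, not layercache's eviction code):

```typescript
// Minimal LRU: a Map iterates in insertion order, so re-inserting an
// entry on access keeps the least-recently-used entry first.
class LruCache<T> {
  private map = new Map<string, T>();
  constructor(private maxSize: number) {}

  get(key: string): T | undefined {
    const v = this.map.get(key);
    if (v !== undefined) {
      this.map.delete(key);
      this.map.set(key, v); // move to most-recent position
    }
    return v;
  }

  set(key: string, value: T): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxSize) {
      const oldest = this.map.keys().next().value; // least recently used
      if (oldest !== undefined) this.map.delete(oldest);
    }
  }

  get size() { return this.map.size; }
}
```

Fill 180 unique keys into a size-25 instance and `size` stays pinned at 25, with the earliest untouched keys gone; in the layered setup those revisits fall through to L2 instead of the origin.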


Multi-Instance and Cross-Process Features

Single-process benchmarks only tell part of the story. layercache ships with primitives for distributed deployments:

```typescript
import {
  CacheStack, MemoryLayer, RedisLayer,
  RedisInvalidationBus,
  RedisSingleFlightCoordinator
} from 'layercache'
import Redis from 'ioredis'

const redis = new Redis()

const cache = new CacheStack(
  [
    new MemoryLayer({ ttl: 60, maxSize: 10_000 }),
    new RedisLayer({ client: redis, ttl: 3600 })
  ],
  {
    invalidationBus: new RedisInvalidationBus({
      publisher: redis,
      subscriber: new Redis() // separate connection for pub/sub
    }),
    singleFlightCoordinator: new RedisSingleFlightCoordinator({ client: redis }),
    gracefulDegradation: { retryAfterMs: 10_000 }
  }
)
```

The edge benchmark verified both of these features work:

  • Cross-instance invalidation: Instance B observed the updated value after Instance A invalidated and repopulated the key.
  • Distributed single-flight: 60 concurrent requests split across two instances triggered exactly 1 origin fetch total.

TTL expiry stampedes are also deduplicated. In the benchmark, 40 concurrent requests hitting the same expired key across 5 rounds produced only 5 origin executions — one per expiry round.


Framework Integrations

layercache ships middleware and adapters for the major Node.js frameworks:

Express:

```typescript
app.get('/api/users', createExpressCacheMiddleware(cache, {
  ttl: 30,
  tags: ['users'],
  keyResolver: (req) => `users:${req.url}`
}), handler)
```

NestJS:

```typescript
@Module({
  imports: [CacheStackModule.forRoot({
    layers: [
      new MemoryLayer({ ttl: 20 }),
      new RedisLayer({ client: redis, ttl: 300 })
    ]
  })]
})
export class AppModule {}
```

Fastify, Hono, tRPC, GraphQL resolver wrappers, and Next.js App Router are also covered.


Payload Size Matters for Redis Reads

One benchmark result worth highlighting explicitly: payload size has almost no effect on L1 memory hits, but has a large effect when Redis is on the read path.

| Mode | 1KB avg | 1MB avg |
| --- | --- | --- |
| Memory hit | 0.012ms | 0.018ms |
| Redis hit | 0.200ms | 4.170ms |

If you are storing large objects — full page renders, heavy API responses — and relying on Redis as the primary read path without a warm L1 in front, you will feel the serialization and network overhead. Keep large objects in L1 where possible, or enable compression at the Redis layer.
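That compression advice is easy to hand-roll if needed; a minimal sketch using Node's built-in zlib (the `pack`/`unpack` helpers are my own, not a layercache API):

```typescript
import { gzipSync, gunzipSync } from 'node:zlib';

// Gzip values before the L2 write: repetitive payloads (rendered HTML,
// JSON API responses) often compress dramatically, shrinking both the
// network transfer and Redis's memory footprint.
function pack(value: unknown): Buffer {
  return gzipSync(Buffer.from(JSON.stringify(value)));
}

function unpack<T>(buf: Buffer): T {
  return JSON.parse(gunzipSync(buf).toString('utf8')) as T;
}

const page = { html: '<li>item</li>'.repeat(20_000) }; // ~260KB, highly repetitive
const packed = pack(page);
console.log(packed.length, JSON.stringify(page).length); // compressed size is far smaller
```

The cost is CPU on every read and write, so it pays off mainly for large, compressible values; tiny keys can come out larger after the gzip header.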


When to Use layercache

Good fit:

  • Services handling repeated reads for the same keys under any meaningful concurrency
  • Multi-instance deployments that need consistent cache state across processes
  • Situations where Redis slowdowns or outages should degrade gracefully rather than cascade
  • Teams that want observable caching with hits/misses/latency metrics without building the instrumentation themselves

Less relevant:

  • Pure write-heavy workloads with no repeated reads
  • Environments where an in-process memory cache is prohibited for compliance reasons
  • Very simple single-key caches where a plain Map with a TTL is already sufficient

Summary

| Scenario | Key number |
| --- | --- |
| Warm L1 hit latency | ~0.006ms |
| HTTP throughput gain (no cache → cached) | ~100× |
| Stampede dedup (75 concurrent, 5 rounds) | 375 fetches → 5 |
| Distributed single-flight (60 requests, 2 instances) | 60 fetches → 1 |
| Slow Redis impact on hot L1 traffic | None |
| Dead Redis impact on warm L1 traffic | None |
| Dead Redis impact on cold-miss traffic | Timeout |

The library makes a clear promise: stack your layers, wire up your fetcher, and it handles the coordination. The benchmarks back that promise up on a real backend.

