DEV Community

날다람쥐
layercache: Stop Paying Redis Latency on Every Hot Read

Every Node.js backend hits the same wall eventually.

Your Redis cache is working, latency is acceptable, and then traffic doubles. Suddenly the Redis round-trip that felt like nothing at 200 req/s starts dominating your p95 at 2,000 req/s. You add an in-process memory cache on top, wire up some invalidation logic by hand, and three months later you are maintaining a fragile two-layer system with no stampede protection and no cross-instance consistency.

layercache is a TypeScript-first library that solves this problem once, cleanly. It stacks memory, Redis, and disk behind a single unified API and handles the hard parts — stampede prevention, cross-instance invalidation, graceful degradation under Redis failures — out of the box.

This post walks through what it does and what the benchmark numbers actually look like on a real Redis backend.


The Core Idea

```
your app ──▶ L1 Memory   ~0.006 ms  (per-process, sub-millisecond)
                │
             L2 Redis    ~0.2 ms    (shared across instances)
                │
             L3 Disk     ~2 ms      (optional, persistent)
                │
             Fetcher     runs once  (even under high concurrency)
```

On a cache hit the fastest available layer responds and the result backfills any warmer layers automatically. On a miss the fetcher runs exactly once, no matter how many concurrent requests arrived at the same time.

That last part — the single-flight guarantee — is where most hand-rolled hybrid caches fall apart.
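Both behaviors, backfill on hit and single-flight on miss, are worth seeing in miniature to appreciate what the library absorbs. A toy sketch under my own assumed `Layer` interface (not layercache's internals):

```typescript
interface Layer<T> {
  get(key: string): Promise<T | undefined>;
  set(key: string, value: T): Promise<void>;
}

// Single-flight bookkeeping: one shared promise per key currently being fetched.
const inflight = new Map<string, Promise<unknown>>();

// Read-through with backfill and single-flight: probe layers fastest-first;
// on a hit, write the value back into every faster layer that missed;
// on a full miss, concurrent callers share one fetcher run.
async function layeredGet<T>(
  layers: Layer<T>[],
  key: string,
  fetcher: () => Promise<T>
): Promise<T> {
  for (let i = 0; i < layers.length; i++) {
    const hit = await layers[i].get(key);
    if (hit !== undefined) {
      // Backfill the warmer layers that missed.
      await Promise.all(layers.slice(0, i).map(l => l.set(key, hit)));
      return hit;
    }
  }
  const existing = inflight.get(key);
  if (existing) return existing as Promise<T>;
  const p = (async () => {
    const value = await fetcher();
    await Promise.all(layers.map(l => l.set(key, value)));
    return value;
  })().finally(() => inflight.delete(key));
  inflight.set(key, p);
  return p;
}
```

The subtle part is the synchronous window between checking `inflight` and setting it: because the map is updated before the function yields, every concurrent caller after the first awaits the same promise, and the `finally` cleanup prevents a rejected fetch from poisoning the key forever.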


Getting Started

```bash
npm install layercache
```

Memory only (no Redis needed):

```typescript
import { CacheStack, MemoryLayer } from 'layercache'

const cache = new CacheStack([
  new MemoryLayer({ ttl: 60, maxSize: 1_000 })
])

const user = await cache.get('user:123', () => db.findUser(123))
```

Memory + Redis layered setup:

```typescript
import { CacheStack, MemoryLayer, RedisLayer } from 'layercache'
import Redis from 'ioredis'

const cache = new CacheStack([
  new MemoryLayer({ ttl: 60, maxSize: 2_000 }),
  new RedisLayer({ client: new Redis(), ttl: 300, prefix: 'myapp:' })
])

const user = await cache.get('user:123', () => db.findUser(123))
```

The API stays the same no matter how many layers you stack: adding or removing a layer requires no changes to your application code.


Benchmark Results

I ran layercache v1.2.9 against a real Redis 7 backend (Docker, not a mock) on Linux. Here is what the numbers look like.

Warm Hit Latency

The most important number for a cache library is how fast the hit path is.

| Mode | Avg ms | P95 ms |
| --- | --- | --- |
| No cache (origin) | 5.175 | 8.742 |
| Memory only | 0.009 | 0.014 |
| Memory + Redis | 0.005 | 0.006 |

Memory-only warm hits averaged 0.009ms. With a Redis layer added, the hot path still resolves from L1 memory and came in at 0.005ms — both are firmly sub-millisecond and effectively the same class of latency for production purposes.

Stampede Prevention

This is where the library earns its keep. 75 concurrent requests for the same missing key, repeated 5 times:

| Mode | Avg ms | Origin executions |
| --- | --- | --- |
| No cache | 409.5 | 375 |
| Memory only | 6.9 | 5 |
| Memory + Redis | 36.7 | 5 |

Without a cache, 75 × 5 = 375 origin calls. With layercache, the fetcher ran exactly 5 times — once per round, regardless of concurrency. The layered case is slower than memory-only because it pays Redis coordination costs, but the correctness guarantee is the same.

HTTP Throughput

Under sustained load with autocannon (40 connections, 8 seconds):

| Route | Avg latency | P97.5 | Req/s |
| --- | --- | --- | --- |
| No cache | 249 ms | 271 ms | 161 |
| Memory only | 1.82 ms | 4 ms | 16,705 |
| Memory + Redis | 1.74 ms | 4 ms | 17,184 |

Caching moved the service from 161 req/s to over 17,000 req/s — roughly a 100× improvement in throughput. Average latency dropped from 249ms to under 2ms. The memory-only and layered routes performed nearly identically in steady state because hot requests stay in L1 after warm-up.


What Happens When Redis Is Slow or Dead?

This is the question that separates a library you can actually run in production from one you can only trust in demos.

Slow Redis

I measured three scenarios with injected TCP latency:

| Redis delay | L1 hot hit | L2 hit | Cold miss |
| --- | --- | --- | --- |
| 0ms | 0.407ms | 2.655ms | 12.259ms |
| 100ms | 0.119ms | 101.172ms | 504.167ms |
| 500ms | 0.196ms | 501.404ms | 2506.013ms |

The key insight: L1 hot hits stayed fast regardless of Redis latency. If a request can be served from in-process memory, slow Redis does not matter at all. The latency penalty only applies when a request needs to reach L2 or perform a cold miss.

Cold misses scaled hard with injected delay because the request paid both the Redis round-trip and the write-back path. If you have traffic patterns with many cold misses, a slow Redis will drag your tail latency even with gracefulDegradation enabled — the benchmark showed graceful and strict modes performing nearly identically under slow conditions.
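If your workload looks like that, bounding the time a request may wait on L2 is the standard mitigation. A generic sketch (the `withTimeout` helper and the 50 ms budget are my own illustration, not a layercache API):

```typescript
// Cap how long a read may wait on Redis before falling through to the
// origin, so a slow or dead L2 degrades latency instead of stalling
// requests until a long client timeout.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    promise.then(
      v => { clearTimeout(timer); resolve(v); },
      e => { clearTimeout(timer); reject(e); }
    );
  });
}

async function readThrough<T>(
  l2Read: () => Promise<T | undefined>,
  origin: () => Promise<T>
): Promise<T> {
  try {
    const hit = await withTimeout(l2Read(), 50); // 50 ms budget for Redis
    if (hit !== undefined) return hit;
  } catch {
    // Budget exceeded or Redis error: treat as a miss.
  }
  return origin();
}
```

The trade-off is that a genuinely slow-but-alive Redis gets treated as down, so every capped read costs an extra origin fetch; tune the budget to your measured L2 tail latency rather than guessing.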

Dead Redis

Under a fully paused Redis instance:

  • Warm L1 hits: still worked — both strict and graceful modes served from memory normally
  • Cold misses: timed out at 2000ms — both modes failed

This is important to understand. gracefulDegradation keeps warm traffic alive when Redis goes down. It does not create a fast fallback path for cold keys. New keys and expired keys that need a Redis write-back will stall until the timeout.

Operationally this means: if your L1 TTL is shorter than your expected Redis outage window, you will see degraded cold-miss behavior. Size your L1 TTLs with this in mind.
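Concretely, reusing the setup from earlier (TTL numbers are illustrative, not a recommendation):

```typescript
// Illustrative numbers only: an L1 TTL of 180s keeps hot keys served
// from memory through a short Redis outage, at the cost of staler reads.
const cache = new CacheStack([
  new MemoryLayer({ ttl: 180, maxSize: 10_000 }), // was 60 in the earlier example
  new RedisLayer({ client: redis, ttl: 3600, prefix: 'myapp:' })
])
```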


Queue Amplification Under Slow Redis

A follow-up benchmark asked: if Redis is slow and 500 concurrent requests pile up on L2-hit traffic, does latency stay bounded or blow up?

| Redis delay | Concurrency 1 | Concurrency 500 | Amplification |
| --- | --- | --- | --- |
| 100ms | 100.8ms | 128.9ms | 1.28× |
| 500ms | 501.1ms | 515.8ms | 1.03× |

No runaway queue amplification. At 500 concurrent requests against a 500ms-latency Redis, wall-clock time only grew by about 15ms above the single-request baseline. The library appears to batch or overlap L2 requests within a shared Redis client rather than serializing them, which keeps the curve nearly flat.
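That flat curve is what you expect when concurrent reads overlap on the event loop rather than queueing behind each other. A toy demonstration with a simulated delay (no real Redis involved):

```typescript
// 500 concurrent 100ms "reads" complete in roughly one round-trip of
// wall-clock time, because the waits overlap instead of serializing.
const fakeRedisGet = () =>
  new Promise<string>(resolve => setTimeout(() => resolve('value'), 100));

async function main() {
  const start = Date.now();
  await Promise.all(Array.from({ length: 500 }, fakeRedisGet));
  console.log(`${Date.now() - start}ms`); // close to 100ms, nowhere near 500 × 100ms
}
main();
```

Serializing the same 500 reads would take around 50 seconds; a shared async client that pipelines or overlaps requests is what keeps the amplification near 1×.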


Memory Pressure and Eviction

With maxSize: 25 and 180 unique keys inserted (each with a 256KB payload), then revisiting the earliest 25 keys:

| Evictions | L1 retained | Revisit avg | Origin fetches |
| --- | --- | --- | --- |
| 180 | 25 | 1.332ms | 0 |

Eviction was predictable. L1 held exactly maxSize entries after the fill phase. When evicted keys were revisited, they reloaded from Redis L2 rather than hitting the origin — zero origin fetches despite L1 having evicted everything. GC activity was measurable (36 events, 78ms total) but no stop-the-world pauses appeared at this payload size.
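The `maxSize` semantics are easy to picture with a classic Map-based LRU. A sketch of the behavior (my own illustration, not layercache's eviction code):

```typescript
// Minimal LRU: a Map iterates in insertion order, so re-inserting an
// entry on access keeps the least-recently-used entry first.
class LruCache<T> {
  private map = new Map<string, T>();
  constructor(private maxSize: number) {}

  get(key: string): T | undefined {
    const v = this.map.get(key);
    if (v !== undefined) {
      this.map.delete(key);
      this.map.set(key, v); // move to most-recent position
    }
    return v;
  }

  set(key: string, value: T): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxSize) {
      const oldest = this.map.keys().next().value; // least recently used
      if (oldest !== undefined) this.map.delete(oldest);
    }
  }

  get size() { return this.map.size; }
}
```

Fill 180 unique keys into a size-25 instance and `size` stays pinned at 25, with the earliest untouched keys gone; in the layered setup those revisits fall through to L2 instead of the origin.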


Multi-Instance and Cross-Process Features

Single-process benchmarks only tell part of the story. layercache ships with primitives for distributed deployments:

```typescript
import {
  CacheStack, MemoryLayer, RedisLayer,
  RedisInvalidationBus,
  RedisSingleFlightCoordinator
} from 'layercache'
import Redis from 'ioredis'

const redis = new Redis()

const cache = new CacheStack(
  [
    new MemoryLayer({ ttl: 60, maxSize: 10_000 }),
    new RedisLayer({ client: redis, ttl: 3600 })
  ],
  {
    invalidationBus: new RedisInvalidationBus({
      publisher: redis,
      subscriber: new Redis() // separate connection for pub/sub
    }),
    singleFlightCoordinator: new RedisSingleFlightCoordinator({ client: redis }),
    gracefulDegradation: { retryAfterMs: 10_000 }
  }
)
```

The edge benchmark verified both of these features work:

  • Cross-instance invalidation: Instance B observed the updated value after Instance A invalidated and repopulated the key.
  • Distributed single-flight: 60 concurrent requests split across two instances triggered exactly 1 origin fetch total.

TTL expiry stampedes are also deduplicated. In the benchmark, 40 concurrent requests hitting the same expired key across 5 rounds produced only 5 origin executions — one per expiry round.


Framework Integrations

layercache ships middleware and adapters for the major Node.js frameworks:

Express:

```typescript
app.get('/api/users', createExpressCacheMiddleware(cache, {
  ttl: 30,
  tags: ['users'],
  keyResolver: (req) => `users:${req.url}`
}), handler)
```

NestJS:

```typescript
@Module({
  imports: [CacheStackModule.forRoot({
    layers: [
      new MemoryLayer({ ttl: 20 }),
      new RedisLayer({ client: redis, ttl: 300 })
    ]
  })]
})
export class AppModule {}
```

Fastify, Hono, tRPC, GraphQL resolver wrappers, and Next.js App Router are also covered.


Payload Size Matters for Redis Reads

One benchmark result worth highlighting explicitly: payload size has almost no effect on L1 memory hits, but has a large effect when Redis is on the read path.

| Mode | 1KB avg | 1MB avg |
| --- | --- | --- |
| Memory hit | 0.012ms | 0.018ms |
| Redis hit | 0.200ms | 4.170ms |

If you are storing large objects — full page renders, heavy API responses — and relying on Redis as the primary read path without a warm L1 in front, you will feel the serialization and network overhead. Keep large objects in L1 where possible, or enable compression at the Redis layer.
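That compression advice is easy to hand-roll if needed; a minimal sketch using Node's built-in zlib (the `pack`/`unpack` helpers are my own, not a layercache API):

```typescript
import { gzipSync, gunzipSync } from 'node:zlib';

// Gzip values before the L2 write: repetitive payloads (rendered HTML,
// JSON API responses) often compress dramatically, shrinking both the
// network transfer and Redis's memory footprint.
function pack(value: unknown): Buffer {
  return gzipSync(Buffer.from(JSON.stringify(value)));
}

function unpack<T>(buf: Buffer): T {
  return JSON.parse(gunzipSync(buf).toString('utf8')) as T;
}

const page = { html: '<li>item</li>'.repeat(20_000) }; // ~260KB, highly repetitive
const packed = pack(page);
console.log(packed.length, JSON.stringify(page).length); // compressed size is far smaller
```

The cost is CPU on every read and write, so it pays off mainly for large, compressible values; tiny keys can come out larger after the gzip header.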


When to Use layercache

Good fit:

  • Services handling repeated reads for the same keys under any meaningful concurrency
  • Multi-instance deployments that need consistent cache state across processes
  • Situations where Redis slowdowns or outages should degrade gracefully rather than cascade
  • Teams that want observable caching with hits/misses/latency metrics without building the instrumentation themselves

Less relevant:

  • Pure write-heavy workloads with no repeated reads
  • Environments where an in-process memory cache is prohibited for compliance reasons
  • Very simple single-key caches where a plain Map with a TTL is already sufficient

Summary

| Scenario | Key number |
| --- | --- |
| Warm L1 hit latency | ~0.006ms |
| HTTP throughput gain (no cache → cached) | ~100× |
| Stampede dedup (75 concurrent, 5 rounds) | 375 fetches → 5 |
| Distributed single-flight (60 requests, 2 instances) | 60 fetches → 1 |
| Slow Redis impact on hot L1 traffic | None |
| Dead Redis impact on warm L1 traffic | None |
| Dead Redis impact on cold-miss traffic | Timeout |

The library makes a clear promise: stack your layers, wire up your fetcher, and it handles the coordination. The benchmarks back that promise up on a real backend.

