Every Node.js backend hits the same wall eventually.
Your Redis cache is working, latency is acceptable, and then traffic doubles. Suddenly the Redis round-trip that felt like nothing at 200 req/s starts dominating your p95 at 2,000 req/s. You add an in-process memory cache on top, wire up some invalidation logic by hand, and three months later you are maintaining a fragile two-layer system with no stampede protection and no cross-instance consistency.
layercache is a TypeScript-first library that solves this problem once, cleanly. It stacks memory, Redis, and disk behind a single unified API and handles the hard parts — stampede prevention, cross-instance invalidation, graceful degradation under Redis failures — out of the box.
This post walks through what it does and what the benchmark numbers actually look like on a real Redis backend.
## The Core Idea
```
your app ──▶ L1 Memory   ~0.006 ms   (per-process, sub-millisecond)
                 │
             L2 Redis    ~0.2 ms     (shared across instances)
                 │
             L3 Disk     ~2 ms       (optional, persistent)
                 │
             Fetcher runs once       (even under high concurrency)
```
On a cache hit, the fastest layer holding the value responds, and the result is automatically backfilled into the faster layers above it. On a miss the fetcher runs exactly once, no matter how many concurrent requests arrived at the same time.
That last part — the single-flight guarantee — is where most hand-rolled hybrid caches fall apart.
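The pattern is easy to state but easy to get wrong under concurrency. Here is a minimal sketch of the single-flight idea — illustrative only, not layercache's actual internals: concurrent callers for the same key share one in-flight promise.

```typescript
// Track one in-flight promise per key; later callers piggyback on it.
const inflight = new Map<string, Promise<unknown>>()

async function singleFlight<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
  const existing = inflight.get(key)
  if (existing) return existing as Promise<T>

  const p = fetcher().finally(() => inflight.delete(key))
  inflight.set(key, p)
  return p
}

// Demo: 75 concurrent callers, one fetcher execution.
let calls = 0
const fetcher = async () => {
  calls++
  await new Promise((r) => setTimeout(r, 10))
  return 'value'
}

const results = await Promise.all(
  Array.from({ length: 75 }, () => singleFlight('user:123', fetcher))
)
console.log(calls, results.every((v) => v === 'value')) // 1 true
```

The subtle part in real systems is everything around this core: clearing the in-flight entry on failure, sharing rejections safely, and (as shown later) coordinating the guarantee across processes.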
## Getting Started
```shell
npm install layercache
```
Memory only (no Redis needed):
```typescript
import { CacheStack, MemoryLayer } from 'layercache'

const cache = new CacheStack([
  new MemoryLayer({ ttl: 60, maxSize: 1_000 })
])

const user = await cache.get('user:123', () => db.findUser(123))
```
Memory + Redis layered setup:
```typescript
import { CacheStack, MemoryLayer, RedisLayer } from 'layercache'
import Redis from 'ioredis'

const cache = new CacheStack([
  new MemoryLayer({ ttl: 60, maxSize: 2_000 }),
  new RedisLayer({ client: new Redis(), ttl: 300, prefix: 'myapp:' })
])

const user = await cache.get('user:123', () => db.findUser(123))
```
The API is the same regardless of how many layers you add. Your application code doesn't change when you add or remove a layer.
## Benchmark Results
I ran layercache v1.2.9 against a real Redis 7 backend (Docker, not a mock) on Linux. Here is what the numbers look like.
### Warm Hit Latency
The most important number for a cache library is how fast the hit path is.
| Mode | Avg ms | P95 ms |
|---|---|---|
| No cache (origin) | 5.175 | 8.742 |
| Memory only | 0.009 | 0.014 |
| Memory + Redis | 0.005 | 0.006 |
Memory-only warm hits averaged 0.009ms. With a Redis layer added, the hot path still resolves from L1 memory and came in at 0.005ms — both are firmly sub-millisecond and effectively the same class of latency for production purposes.
### Stampede Prevention
This is where the library earns its keep. 75 concurrent requests for the same missing key, repeated 5 times:
| Mode | Avg ms | Origin Executions |
|---|---|---|
| No cache | 409.5 | 375 |
| Memory only | 6.9 | 5 |
| Memory + Redis | 36.7 | 5 |
Without a cache, 75 × 5 = 375 origin calls. With layercache, the fetcher ran exactly 5 times — once per round, regardless of concurrency. The layered case is slower than memory-only because it pays Redis coordination costs, but the correctness guarantee is the same.
### HTTP Throughput
Under sustained load with autocannon (40 connections, 8 seconds):
| Route | Avg Latency | P97.5 | Req/s |
|---|---|---|---|
| No cache | 249 ms | 271 ms | 161 |
| Memory only | 1.82 ms | 4 ms | 16,705 |
| Memory + Redis | 1.74 ms | 4 ms | 17,184 |
Caching moved the service from 161 req/s to over 17,000 req/s — roughly a 100× improvement in throughput. Average latency dropped from 249ms to under 2ms. The memory-only and layered routes performed nearly identically in steady state because hot requests stay in L1 after warm-up.
## What Happens When Redis Is Slow or Dead?
This is the question that separates a library you can actually run in production from one you can only trust in demos.
### Slow Redis
I measured three scenarios with injected TCP latency:
| Redis Delay | L1 hot hit | L2 hit | Cold miss |
|---|---|---|---|
| 0ms | 0.407ms | 2.655ms | 12.259ms |
| 100ms | 0.119ms | 101.172ms | 504.167ms |
| 500ms | 0.196ms | 501.404ms | 2506.013ms |
The key insight: L1 hot hits stayed fast regardless of Redis latency. If a request can be served from in-process memory, slow Redis does not matter at all. The latency penalty only applies when a request needs to reach L2 or perform a cold miss.
Cold misses scaled sharply with the injected delay because the request pays both the Redis round-trip and the write-back path. If your traffic pattern includes many cold misses, a slow Redis will drag your tail latency even with `gracefulDegradation` enabled — in the benchmark, graceful and strict modes performed nearly identically under slow conditions.
### Dead Redis
Under a fully paused Redis instance:
- Warm L1 hits: still worked — both strict and graceful modes served from memory normally
- Cold misses: timed out at 2000ms — both modes failed
This is important to understand: `gracefulDegradation` keeps warm traffic alive when Redis goes down. It does not create a fast fallback path for cold keys. New keys, and expired keys that need to consult Redis, will stall until the timeout.
Operationally this means: if your L1 TTL is shorter than your expected Redis outage window, you will see degraded cold-miss behavior. Size your L1 TTLs with this in mind.
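The trade-off can be sketched in a few lines — illustrative only, not layercache's code; the 100ms reject is a stand-in for the 2000ms timeout observed in the benchmark. A warm L1 entry never touches Redis; anything expired or cold pays the full timeout before failing.

```typescript
// L1 with per-entry expiry; a dead L2 that only ever times out.
const l1 = new Map<string, { value: string; expiresAt: number }>()

const deadRedisGet = () =>
  new Promise<string>((_, reject) =>
    setTimeout(() => reject(new Error('L2 timeout')), 100)
  )

async function read(key: string): Promise<string> {
  const entry = l1.get(key)
  if (entry && entry.expiresAt > Date.now()) return entry.value // warm hit: Redis never touched
  return deadRedisGet() // expired or cold: stalls until the timeout, then fails
}

l1.set('warm', { value: 'ok', expiresAt: Date.now() + 60_000 })

console.log(await read('warm')) // 'ok', instantly
await read('cold').catch((e) => console.log(e.message)) // 'L2 timeout' after the delay
```

In this model, every key whose L1 entry expires during the outage silently moves from the fast path to the timeout path — which is exactly why the L1 TTL should outlast your expected outage window.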
### Queue Amplification Under Slow Redis
A follow-up benchmark asked: if Redis is slow and 500 concurrent requests pile up on L2-hit traffic, does latency stay bounded or blow up?
| Redis Delay | Concurrency 1 | Concurrency 500 | Amplification |
|---|---|---|---|
| 100ms | 100.8ms | 128.9ms | 1.28× |
| 500ms | 501.1ms | 515.8ms | 1.03× |
No runaway queue amplification. At 500 concurrent requests against a 500ms-latency Redis, wall-clock time only grew by about 15ms above the single-request baseline. The library appears to batch or overlap L2 requests within a shared Redis client rather than serializing them, which keeps the curve nearly flat.
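The mechanics behind a flat curve like this are not specific to layercache: promise-based I/O overlaps in-flight requests instead of serializing them, so N concurrent round trips cost roughly one round trip of wall-clock time. A self-contained sketch (a timer stands in for a 100ms-latency Redis):

```typescript
const delay = (ms: number) => new Promise<void>((r) => setTimeout(r, ms))

// Stand-in for one 100ms Redis round trip.
const fakeRedisGet = async () => {
  await delay(100)
  return 'value'
}

const start = Date.now()
await Promise.all(Array.from({ length: 500 }, fakeRedisGet))
const elapsed = Date.now() - start

// All 500 round trips overlap on the event loop:
// wall-clock time is on the order of 100ms, not 500 × 100ms = 50 seconds.
console.log(elapsed)
```

Serialization only creeps back in if each request must wait for the previous one — say, a connection pool of size one with strictly ordered commands — which is the failure mode this benchmark was checking for.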
## Memory Pressure and Eviction
With maxSize: 25 and 180 unique keys inserted (each with a 256KB payload), then revisiting the earliest 25 keys:
| Evictions | L1 Retained | Revisit Avg | Origin Fetches |
|---|---|---|---|
| 180 | 25 | 1.332ms | 0 |
Eviction was predictable. L1 held exactly maxSize entries after the fill phase. When evicted keys were revisited, they reloaded from Redis L2 rather than hitting the origin — zero origin fetches despite L1 having evicted everything. GC activity was measurable (36 events, 78ms total) but no stop-the-world pauses appeared at this payload size.
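The bounded-size behavior above is what you get from a standard LRU keyed on a JavaScript `Map`, which iterates in insertion order. A sketch of that pattern — illustrative; layercache's actual eviction policy may differ in detail:

```typescript
// LRU cache bounded by maxSize, using Map's insertion-order iteration.
class LruCache<V> {
  private map = new Map<string, V>()
  constructor(private maxSize: number) {}

  get(key: string): V | undefined {
    const v = this.map.get(key)
    if (v !== undefined) {
      // Re-insert to mark this key as most recently used.
      this.map.delete(key)
      this.map.set(key, v)
    }
    return v
  }

  set(key: string, value: V): void {
    this.map.delete(key)
    this.map.set(key, value)
    if (this.map.size > this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.map.keys().next().value as string
      this.map.delete(oldest)
    }
  }

  get size(): number { return this.map.size }
}

const lru = new LruCache<number>(25)
for (let i = 0; i < 180; i++) lru.set(`key:${i}`, i)
console.log(lru.size) // 25 — only the newest maxSize entries survive
```

This mirrors the benchmark's fill phase: 180 inserts against `maxSize: 25` leaves exactly 25 entries, and the evicted 155 are gone from L1 — in the layered setup they are recovered from Redis instead of the origin.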
## Multi-Instance and Cross-Process Features
Single-process benchmarks only tell part of the story. layercache ships with primitives for distributed deployments:
```typescript
import Redis from 'ioredis'
import {
  CacheStack, MemoryLayer, RedisLayer,
  RedisInvalidationBus,
  RedisSingleFlightCoordinator
} from 'layercache'

const redis = new Redis()

const cache = new CacheStack(
  [
    new MemoryLayer({ ttl: 60, maxSize: 10_000 }),
    new RedisLayer({ client: redis, ttl: 3600 })
  ],
  {
    invalidationBus: new RedisInvalidationBus({
      publisher: redis,
      subscriber: new Redis() // pub/sub needs its own dedicated connection
    }),
    singleFlightCoordinator: new RedisSingleFlightCoordinator({ client: redis }),
    gracefulDegradation: { retryAfterMs: 10_000 }
  }
)
```
The edge benchmark verified both of these features work:
- Cross-instance invalidation: Instance B observed the updated value after Instance A invalidated and repopulated the key.
- Distributed single-flight: 60 concurrent requests split across two instances triggered exactly 1 origin fetch total.
TTL expiry stampedes are also deduplicated. In the benchmark, 40 concurrent requests hitting the same expired key across 5 rounds produced only 5 origin executions — one per expiry round.
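The shape of cross-instance invalidation is worth internalizing even if you never read the library's source. A sketch of the pattern — illustrative, with a local `EventEmitter` standing in for the Redis pub/sub channel a real bus would use:

```typescript
import { EventEmitter } from 'node:events'

// Stand-in for a Redis pub/sub channel shared by all instances.
const bus = new EventEmitter()

class Instance {
  l1 = new Map<string, string>()
  constructor() {
    // Every instance subscribes and drops its local copy on invalidation.
    bus.on('invalidate', (key: string) => this.l1.delete(key))
  }
  invalidate(key: string) {
    this.l1.delete(key)
    bus.emit('invalidate', key) // in production: PUBLISH on the Redis channel
  }
}

const a = new Instance()
const b = new Instance()
a.l1.set('user:123', 'v1')
b.l1.set('user:123', 'v1')

a.invalidate('user:123') // Instance A invalidates...
console.log(b.l1.has('user:123')) // false — Instance B dropped its copy too
```

The real version has to cope with what this sketch ignores: message delivery is asynchronous and at-most-once, so each instance's L1 TTL acts as the backstop for any missed invalidation.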
## Framework Integrations
layercache ships middleware and adapters for the major Node.js frameworks:
Express:
```typescript
app.get('/api/users', createExpressCacheMiddleware(cache, {
  ttl: 30,
  tags: ['users'],
  keyResolver: (req) => `users:${req.url}`
}), handler)
```
NestJS:
```typescript
@Module({
  imports: [CacheStackModule.forRoot({
    layers: [
      new MemoryLayer({ ttl: 20 }),
      new RedisLayer({ client: redis, ttl: 300 })
    ]
  })]
})
export class AppModule {}
```
Fastify, Hono, tRPC, GraphQL resolver wrappers, and Next.js App Router are also covered.
## Payload Size Matters for Redis Reads
One benchmark result worth highlighting explicitly: payload size has almost no effect on L1 memory hits, but has a large effect when Redis is on the read path.
| Mode | 1KB avg | 1MB avg |
|---|---|---|
| Memory hit | 0.012ms | 0.018ms |
| Redis hit | 0.200ms | 4.170ms |
If you are storing large objects — full page renders, heavy API responses — and relying on Redis as the primary read path without a warm L1 in front, you will feel the serialization and network overhead. Keep large objects in L1 where possible, or enable compression at the Redis layer.
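Compression is a worthwhile lever here because cached payloads — JSON documents, rendered HTML — tend to be highly repetitive. A sketch using Node's built-in `zlib`; the payload shape is invented for illustration:

```typescript
import { gzipSync, gunzipSync } from 'node:zlib'

// A large, repetitive payload of the kind you might cache in Redis.
const payload = JSON.stringify(
  Array.from({ length: 2_000 }, (_, i) => ({ id: i, name: 'user-' + i, active: true }))
)

// Compress before writing to Redis: trades CPU for bytes on the wire.
const compressed = gzipSync(Buffer.from(payload))
console.log(payload.length, compressed.length) // compressed is far smaller

// Decompress on read to recover the original value exactly.
const restored = gunzipSync(compressed).toString()
console.log(restored === payload) // true
```

Whether this wins depends on the ratio: for payloads in the hundreds of kilobytes, shrinking the Redis transfer usually beats the extra gzip CPU, while for small keys it is pure overhead.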
## When to Use layercache
Good fit:
- Services handling repeated reads for the same keys under any meaningful concurrency
- Multi-instance deployments that need consistent cache state across processes
- Situations where Redis slowdowns or outages should degrade gracefully rather than cascade
- Teams that want observable caching with hits/misses/latency metrics without building the instrumentation themselves
Less relevant:
- Pure write-heavy workloads with no repeated reads
- Environments where an in-process memory cache is prohibited for compliance reasons
- Very simple single-key caches where a plain `Map` with a TTL is already sufficient
## Summary
| Scenario | Key number |
|---|---|
| Warm L1 hit latency | ~0.006ms |
| HTTP throughput gain (no cache → cached) | ~100× |
| Stampede dedup (75 concurrent, 5 rounds) | 375 fetches → 5 |
| Distributed single-flight (60 requests, 2 instances) | 60 fetches → 1 |
| Slow Redis impact on hot L1 traffic | None |
| Dead Redis impact on warm L1 traffic | None |
| Dead Redis impact on cold-miss traffic | Timeout |
The library makes a clear promise: stack your layers, wire up your fetcher, and it handles the coordination. The benchmarks back that promise up on a real backend.