Filling a maintainer's "Help needed": shipping a Next.js 16 Redis cache handler
Next.js 16 split caching into two distinct handler interfaces:

- `cacheHandler` (singular) — Pages Router ISR, on-demand revalidation
- `cacheHandlers` (plural) — the new `'use cache'` directive, `cacheComponents: true`
The most popular OSS Redis handler today is `@fortedigital/nextjs-cache-handler@3.2.0`. It declares `peerDependencies.next: ">=16.1.5"`. But its README marks the entire plural-API column as ❌:
> - `cacheHandlers` config (plural) — ❌ Not yet supported - Help needed
> - `'use cache'` directive — ❌ Not yet supported - Help needed
> - `'use cache: remote'` directive — ❌ Not yet supported - Help needed
> - `'use cache: private'` directive — ❌ Not yet supported - Help needed
> - `cacheComponents` — ❌ Not yet supported - Help needed
The community attempt to fix this — PR #207 — has been stalled for three months on a PHASE_PRODUCTION_BUILD regression that the maintainer rejected. The maintainer also said in Issue #152: "Next.js does not care about any other cloud or cluster environment than Vercel" — a candid acknowledgement that fortedigital's roadmap may not include this any time soon.
I had a multi-instance Next.js 16 deployment running on AWS ECS Fargate that needed all of this working today. So I built a separate small package focused on filling those gaps:
📦 @leejpsd/nextjs-cache-handler — currently 0.2.0, MIT licensed.
This post is the technical writeup — what it does, why it exists, the trap that almost shipped silently, and what live-traffic dogfood actually verified.
What it actually does
If you have a Next.js 16 app deployed across multiple containers (ECS task / Kubernetes pod / Fly.io machine), the default in-memory cache fragments per-instance. Two tasks behind one ALB will independently evaluate 'use cache' functions, write into their own local LRU, and never see each other's writes. revalidateTag('posts') only invalidates the task that received the call.
The fix Next.js documents is "register a custom cache handler that writes to a shared store". The interface is well-defined; the actual implementation has more landmines than the docs imply.
This package implements both interfaces in one wrapper, with a few production-driven defaults that the upstream OSS landscape currently doesn't cover.
```ts
// next.config.ts
const nextConfig = {
  cacheComponents: true,
  cacheHandler: require.resolve("./cache-incremental.cjs"),
  cacheHandlers: { default: require.resolve("./cache-components.cjs") },
};
```

```js
// cache-components.cjs
const { createCacheComponentsHandler } = require("@leejpsd/nextjs-cache-handler/cache-components");

module.exports = createCacheComponentsHandler({
  client: { type: "redis", url: process.env.REDIS_URL },
  buildNamespace: process.env.DEPLOYMENT_VERSION, // auto deploy isolation
  abortTimeoutMs: 1500,
  staleWhileRevalidate: true,
  singleFlight: true, // optional, opt-in stampede protection (v0.2)
});
```
That's it. 'use cache', revalidateTag, updateTag, cacheLife all work. The library handles the build-time vs runtime split, the Lua-atomic tag updates, and the deploy-boundary key namespacing.
Compatibility matrix
| Feature | this | @fortedigital 3.2.0 | nextjs-turbo-redis-cache 1.13 |
|---|---|---|---|
| `cacheHandlers` config (plural) | ✅ | ❌ Help needed | ✅ since 1.11 |
| `'use cache'` directive | ✅ | ❌ Help needed | ✅ since 1.11 |
| `'use cache: remote'` | ✅ | ❌ | partial |
| `cacheComponents: true` | ✅ Production-validated | ❌ | ✅ |
| Build-phase skip (`PHASE_PRODUCTION_BUILD`) | ✅ default-on | ✅ (singular only) | ✅ |
| Auto deploy isolation | ✅ `BUILD_NAMESPACE` env-resolved | manual | ✅ `BUILD_ID` since 1.13 |
| Lua-atomic SET+tag | ✅ Lua scripts | partial (MULTI) | partial |
| Single-flight refresh lock | ✅ opt-in (v0.2) | ❌ | ❌ |
| AbortSignal timeout | ✅ per-op | ✅ Proxy-wrapped | ❌ |
| OpenTelemetry hook | ✅ `onMetric` (v0.2) | ❌ | ❌ |
| Integration tests vs real Redis | ✅ 21 scenarios (v0.2) | ✅ | partial |
| Live-traffic dogfood report | ✅ public 24h soak | not published | not published |
(Verified 2026-05-10. Both upstream packages move quickly; please check their READMEs for the latest state.)
The trap that almost shipped silently
The most useful artifact in this whole exercise wasn't the handler implementation — it was a single landmine I tripped during dogfood deployment.
Setup: an env-var toggle in next.config.ts that flips between the in-tree handler (existing implementation) and the new library, so I could ship the library to staging behind a one-flag rollback.
```ts
// next.config.ts (the buggy version)
const useLibrary = process.env.USE_LIBRARY_HANDLER === "true";
const path = useLibrary
  ? "./lib-cache-components.cjs"
  : "./redis-handler.cjs";

const nextConfig = {
  cacheHandlers: { default: require.resolve(path) },
  // ...
};
```
Looks fine, right? Toggle flag, swap path, done.
I deployed this. CloudWatch confirmed USE_LIBRARY_HANDLER=true was set on the ECS task. Cache state inspection showed entries being written. But the cache key shapes were wrong — they had no BUILD_NAMESPACE prefix, which is the library's signature feature.
I added console.log("loaded") to the library wrapper. Re-deployed. Searched CloudWatch.
0 results.
The library wrapper was never being required at runtime. Despite:
- `USE_LIBRARY_HANDLER=true` correctly set
- The deploy commit hash showing the latest code
- The library being installed in `node_modules`
- The `next.config.ts` toggle logic being correct
What actually happened
next.config.ts is evaluated at build time. Specifically, require.resolve(...) resolves the absolute file path once during the Docker build, then bakes that resolved path into the standalone server bundle.
In the Docker build environment, USE_LIBRARY_HANDLER was not set. So:
```text
build time:
  process.env.USE_LIBRARY_HANDLER === undefined
  → useLibrary === false
  → path = "./redis-handler.cjs"
  → require.resolve("./redis-handler.cjs") = "/abs/path/redis-handler.cjs"
  → that absolute path is what Next.js bakes into the server bundle

runtime:
  process.env.USE_LIBRARY_HANDLER === "true"  // (irrelevant — already baked)
  → Next.js loads /abs/path/redis-handler.cjs
  → the library is NEVER required
```
The runtime env var was completely ignored.
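The trap can be reproduced in a few lines of plain Node, with no Next.js involved (variable names here are illustrative): a value computed at module-evaluation time, like the `require.resolve` in `next.config.ts`, never sees env vars set afterwards, while a function that reads `process.env` at call time does.

```javascript
// "Build time": env var not set yet, mirroring the Docker build environment.
delete process.env.USE_LIBRARY_HANDLER;

// next.config.ts-style: the decision is evaluated once and the result baked in.
const bakedPath =
  process.env.USE_LIBRARY_HANDLER === "true"
    ? "./lib-cache-components.cjs"
    : "./redis-handler.cjs";

// Router-style: the decision is deferred until the function is actually called.
const resolveAtRequestTime = () =>
  process.env.USE_LIBRARY_HANDLER === "true"
    ? "./lib-cache-components.cjs"
    : "./redis-handler.cjs";

// "Runtime": the ECS task sets the flag, too late for the baked value.
process.env.USE_LIBRARY_HANDLER = "true";

console.log(bakedPath);              // "./redis-handler.cjs" (flag ignored)
console.log(resolveAtRequestTime()); // "./lib-cache-components.cjs"
```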
The fix: a request-time router module
Move the env check from next.config.ts into a dedicated module that's loaded at request time:
```js
// cache-components-router.cjs
"use strict";

const useLibrary = process.env.USE_LIBRARY_HANDLER === "true";

module.exports = useLibrary
  ? require("./lib-cache-components.cjs")
  : require("./redis-handler.cjs");
```
Now next.config.ts always points at the same router file. The router reads the env var when it's actually loaded — which is when a request comes in, with the runtime environment fully populated.
```ts
// next.config.ts (fixed)
cacheHandlers: {
  default: require.resolve("./cache-components-router.cjs"),
},
```
Plus an `outputFileTracingIncludes` entry so Next.js's standalone build copies every file the router might load at runtime — the routers themselves, the library wrappers, and the in-tree fallbacks — into `.next/standalone/`:
```ts
outputFileTracingIncludes: {
  "/**/*": [
    "./cache-components-router.cjs",
    "./incremental-router.cjs",
    "./lib-cache-components.cjs",
    "./lib-incremental-cache-handler.cjs",
    "./redis-handler.cjs",
    "./incremental-cache-handler.js",
    "./node_modules/@leejpsd/nextjs-cache-handler/**/*",
  ],
},
```
After this, the library activated correctly. CloudWatch logs showed the wrapper being loaded. Cache keys carried the BUILD_NAMESPACE prefix.
Why this matters
If you're writing a Next.js cache handler — or any module loaded by next.config.ts — the build-time vs runtime trap is silent in the worst possible way: no errors, no warnings, just the wrong code path at runtime. Even Next.js's own docs don't call it out clearly.
The full writeup with diagrams is in docs/build-phase.md. It's also exactly the trap that the PR #207 maintainer review was pointing at — and which has now been resolved cleanly in this package's shouldUseRedis() helper.
What live-traffic dogfood verified
Before promoting 0.1.0-rc.1 to stable, I deployed it behind the env-var toggle described above and let production traffic flow through it on AWS ECS Fargate (multi-instance, ElastiCache Redis).
Snapshot from the validation window:
```shell
$ curl /api/cache-debug | jq '.cacheState'
{
  "entryKeys": 2,
  "tagKeys": 2,
  "tagExpirationKeys": 1,
  "incrementalEntryKeys": 9,
  "incrementalTagKeys": 1,
  "sample": "next-cache:entry:8d5a4f71c4cc:[\"build-...\"]"
}

$ curl /api/health | jq '.checks.redis'
{
  "ok": true,
  "latencyMs": 2,
  "reason": null
}
```
Two things to note:
1. Cache key shapes carry the `BUILD_NAMESPACE` prefix (`8d5a4f71c4cc` is the deployment SHA). If the in-tree handler were active, keys would be `next-cache:entry:["build-..."]` with no namespace segment. That one extra key segment is the deployment-isolation guarantee in action.
2. Redis ping at 2 ms — well within the 1500 ms `abortTimeoutMs` budget. No timeout events were recorded during the validation window.
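The namespacing scheme can be inferred from the debug output above. As an illustrative sketch (the helper name and exact format are assumptions, not the library's internals), the idea is simply that the deploy namespace becomes a mandatory key segment:

```javascript
// Hypothetical key builder showing the deploy-isolation scheme: with a
// namespace, two deployments can never read each other's entries.
const buildEntryKey = (namespace, cacheKey) =>
  namespace
    ? `next-cache:entry:${namespace}:${cacheKey}`
    : `next-cache:entry:${cacheKey}`;

// Library handler: deployment SHA as namespace.
console.log(buildEntryKey("8d5a4f71c4cc", '["build-..."]'));
// → next-cache:entry:8d5a4f71c4cc:["build-..."]

// In-tree handler: no namespace segment, keys collide across deploys.
console.log(buildEntryKey(undefined, '["build-..."]'));
// → next-cache:entry:["build-..."]
```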
The signals were stable enough to promote to 0.1.0.
v0.2: three differentiators
A pre-publish self-audit on 0.1.0 flagged three gaps the matrix didn't yet cover. v0.2 closes them — each one fills an area that no other Next.js Redis handler currently has:
1. Single-flight refresh lock
Stale entries in the SWR window are served instantly. With many instances all crossing the revalidate boundary at the same moment, each one independently triggers its own background refresh — N parallel re-renders for the same key, all hitting your origin at once.
singleFlight: true adds an opt-in Redis lock at the SWR boundary. The first instance to acquire it becomes the leader and runs the refresh; the rest become followers and keep serving the same stale entry. Lock acquisition uses a Lua-atomic SETNX-style script with a 10-second TTL (configurable):
```lua
-- refresh-tag-lock.lua
if redis.call('GET', KEYS[1]) then return 0 end
redis.call('SET', KEYS[1], ARGV[1], 'EX', tonumber(ARGV[2]))
return 1
```
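The leader/follower decision on top of that lock can be sketched in a few lines. This is a minimal simulation against an in-memory stand-in for Redis (names are illustrative, not the library's API); the point is that exactly one caller wins the lock and everyone else keeps serving stale:

```javascript
// Tiny in-memory stand-in mirroring the Lua script's semantics:
// acquire succeeds (returns 1) only if the lock key is absent.
const store = new Map();
const fakeRedis = {
  acquireLock(key, owner, ttlSeconds) {
    if (store.has(key)) return 0;
    store.set(key, owner);
    // Expire the lock after the TTL; unref so the timer never holds the process open.
    setTimeout(() => store.delete(key), ttlSeconds * 1000).unref?.();
    return 1;
  },
};

function onStaleHit(instanceId) {
  const acquired = fakeRedis.acquireLock("lock:refresh:posts", instanceId, 10);
  // Leader refreshes in the background; followers keep serving the stale entry.
  return acquired === 1 ? "leader" : "follower";
}

console.log(onStaleHit("task-a")); // leader
console.log(onStaleHit("task-b")); // follower
```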
Two new MetricEvent types appear on the onMetric hook so operators can verify leadership balance across the fleet:
| event | meaning |
|---|---|
| `cache.stale.refresh.leader` | this instance just acquired the lock |
| `cache.stale.refresh.follower` | another instance holds it; we serve stale |
If lock acquisition fails (Redis hiccup), the handler defaults to the follower path. The stale entry is always served, never dropped. The lock is an optimization, not a correctness primitive.
2. OpenTelemetry reference adapter
The library deliberately ships zero observability dependencies — the onMetric(event) hook gives strictly-typed events you wire to whichever stack you already run.
examples/opentelemetry/ is a copy-paste reference wrapper that exposes a counter (nextjs_cache.events_total) and a histogram (nextjs_cache.op_latency_ms), both with bounded cardinality (no cache keys or tag names emitted as attributes).
Three suggested dashboards in the example README: hit-rate over time, single-flight leadership distribution, op latency p50/p95/p99.
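As a sketch of how such an adapter stays cardinality-bounded (illustrative names only; the real wrapper lives in `examples/opentelemetry/`), an `onMetric` sink can aggregate purely on the event type, never on keys or tags:

```javascript
// Hypothetical metric sink: the only attribute is the event type string,
// so cardinality is bounded by the (small) set of MetricEvent types.
function createMetricSink() {
  const counts = new Map();
  return {
    onMetric(event) {
      // event.type is e.g. "cache.stale.refresh.leader" (assumed shape).
      counts.set(event.type, (counts.get(event.type) ?? 0) + 1);
    },
    snapshot: () => Object.fromEntries(counts),
  };
}

const sink = createMetricSink();
sink.onMetric({ type: "cache.stale.refresh.leader" });
sink.onMetric({ type: "cache.stale.refresh.follower" });
sink.onMetric({ type: "cache.stale.refresh.follower" });
console.log(sink.snapshot()); // leader counted once, follower twice
```

A real adapter would forward these counts to an OpenTelemetry counter instead of a `Map`, but the bounded-attribute rule is the part worth copying.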
3. Integration tests against real Redis
72 unit tests with a MockRedisClient give fast, hermetic coverage of the spec. They don't catch:
- `redis@5` / `ioredis` method shape changes between minor versions
- Lua EVAL/EVALSHA semantics on a real server
- Cursor-based `scanIterator` chunk behavior (the `redis@4` → `redis@5` upgrade silently broke scanning in the reference deployment, surfacing only under live traffic)
- TTL/EX behavior under real Redis time
v0.2 adds 21 integration scenarios that bring up Redis 7 in docker-compose and run the full test grid against both redis@5 and ioredis adapters. Same scenarios, swapped underlying client. CI runs them on every PR via a service container:
```yaml
# .github/workflows/ci.yml
integration:
  services:
    redis:
      image: redis:7-alpine
      ports: ["6390:6379"]
  steps:
    - run: npm run test:integration
      env:
        INTEGRATION_REDIS_URL: redis://127.0.0.1:6390
```
The hardest bug they caught was during initial setup: a vitest transform hook combined with assetsInclude was running twice on .lua files, emitting export default "export default \"...\"", which Redis rejected with '=' expected near 'default'. A single load hook (no transform) fixed it.
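For reference, the fix can be sketched as a single `load` hook in the vitest config (a hypothetical reconstruction, not the project's actual file). A `load` hook produces the module source directly, so the `.lua` body gets stringified exactly once; a `transform` hook layered on top of another string-loading mechanism is what wrapped it twice:

```javascript
// vitest.config.mjs — hypothetical sketch of loading .lua files as strings.
import { defineConfig } from "vitest/config";
import { readFileSync } from "node:fs";

export default defineConfig({
  plugins: [
    {
      name: "lua-as-string",
      load(id) {
        if (!id.endsWith(".lua")) return null;
        // Stringify the script body exactly once as the default export.
        return `export default ${JSON.stringify(readFileSync(id, "utf8"))};`;
      },
    },
  ],
});
```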
Honest limitations (v0.2)
- The dogfood window is a starting point, not a completion criterion. Memory leaks, timer drift, and edge cases that only surface after extended uptime are not yet covered. Patch releases will accumulate live time as the package ages.
- Redis Cluster is implemented but not load-tested at scale. The `hashTag: true` flag routes multi-key Lua scripts to the same slot, but I haven't run a real-world cluster benchmark. v0.3 milestone.
- Vercel KV / Upstash adapters ship in v0.3. Both work today via the standard `redis@5` adapter against their Redis-compatible endpoints, but native adapters with edge-runtime support are scoped for v0.3.
- Provenance attestation ships from the GitHub Actions OIDC publish path (now wired up in `release.yml`). The first stable was published from a local machine without provenance; v0.2.x tarballs published via the workflow will carry the verified attestation.
These are spelled out in the README's Roadmap section; the goal is that "what's not yet supported" is as visible as "what is".
What I'd repeat / what I'd change
Repeat

- Dogfood before promotion. Running the rc against my own production traffic surfaced the build-time-vs-runtime trap before it could embarrass me publicly. The dogfood plan (`docs/staging-dogfood.md`) is the single most useful piece of project hygiene I added.
- Frozen spec snapshot in repo. I copied Next.js's official `cacheHandlers` spec verbatim into `docs/next16-spec.md` before writing any handler code. CI prints its sha256 on every run so spec drift gets noticed before it bites.
- Compatibility matrix with timestamp. Both upstream packages I compare against are evolving. Putting "verified 2026-05-10" on the matrix is the difference between an honest snapshot and a future lie.
Change
- Should have started with `outputFileTracingIncludes` from day one. I burned half a day on the build-phase trap because Next.js's `output: "standalone"` quietly strips files that aren't transitively required by the build-time code path. If you're building anything that gets loaded via `require.resolve()` from `next.config.ts`, pin it explicitly.
- Should have shipped Redis Cluster load test results before claiming "Redis Cluster ✅" in the matrix. The current matrix line says "unit-tested, not yet load-tested at scale" — honest, but only because I caught the gap during the pre-publish self-audit. Future me writes the load test first.
Try it
```shell
npm install @leejpsd/nextjs-cache-handler redis
# or
npm install @leejpsd/nextjs-cache-handler ioredis
```
Wiring is two CommonJS wrapper files plus a next.config.ts toggle. The full quick-start is in the README.
Most useful entry points if you're considering it:
- README compatibility matrix — verify against your environment
- `docs/build-phase.md` — the build-time vs runtime trap deep dive
- `docs/architecture.md` — read paths, write paths, single-flight state machine
- `docs/staging-dogfood.md` — the verification checklist before promoting your own dogfood to stable
Issues, PRs, and feedback — especially on Redis Cluster behavior under real load — are all welcome at github.com/leejpsd/nextjs-cache-handler.
Disclaimer: I'm not affiliated with @fortedigital, @neshca, or nextjs-turbo-redis-cache. The compatibility matrix is verified against their public READMEs as of 2026-05-10; all three projects move quickly and snapshots can go stale. The maintainers of all three deserve credit — the patterns I built on came directly from reading their source.