You’ve just containerized your shiny Next.js application. You’ve set resource limits in Kubernetes (say, memory: 512Mi). Everything works fine locally.
But in production, after a few hours—or sometimes under the first real load—your pod dies.
kubectl describe pod shows the dreaded:
```text
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
```
You check your code. No infinite loops. No obvious leaks. So what’s happening?
The short answer: Next.js is not a static binary. Under the hood, it caches aggressively, and Kubernetes enforces limits that Next.js was never designed to respect.
Let’s break down why this happens and how to fix it.
The 3 Memory Hogs in Next.js
Most people think Next.js is just React on the server. But in standalone mode, it runs three distinct subsystems, each with its own memory profile.
A) The Node.js Runtime (App Router & Pages Router)
Each request creates a render context.
React Server Components (RSC) payloads are kept in memory longer than you think.
Streaming responses hold buffers.
B) The Built-in Cache System
Next.js caches aggressively by default:
Data Cache: fetch() results cached with force-cache (the default before Next.js 15), which never expire unless you set a revalidate interval.
Full Route Cache: Rendered page payloads (for static/dynamic segments).
Router Cache: Client-side (but that’s the browser).
On the server, the Data Cache and Full Route Cache are held at least partly in the Node.js heap. With many unique pages or cache keys, memory climbs with each new entry unless the cache is capped.
C) Incremental Static Regeneration (ISR)
ISR keeps generated pages in memory as well as on disk in the default setup. Every revalidated page adds another version that stays in memory until it is garbage collected.
Result: In a Kubernetes pod with a 512Mi memory limit, your Next.js app may need 1Gi+ after a few hundred unique cache entries.
Why K8s Makes It Worse
Unlike a VM, Kubernetes doesn’t swap. It uses hard memory limits via cgroups. When Node.js tries to allocate more than the limit, the kernel’s OOM killer instantly terminates the process.
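Node.js has its own garbage collector (GC), which works harder to free memory as the heap approaches its own ceiling. Here’s the catch: that ceiling is chosen by V8, and on older versions it is often chosen without looking at the cgroup limit.
Node.js < v14: The default heap ceiling is not reliably derived from the cgroup limit, so V8 can plan for more memory than the container allows unless you set --max-old-space-size yourself.
Node.js v16+ with NODE_OPTIONS=--max-old-space-size=: You can cap the V8 heap, but total process memory can still exceed that cap. Buffers, native addons, and other external allocations live outside the heap and still count against the container limit, so memory tied to Next.js’s caches can grow past it.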
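To make the heap versus non-heap distinction concrete, here is a minimal sketch, assuming Node.js 14 or newer. It prints the heap ceiling V8 is working with and shows that a large Buffer shows up under arrayBuffers rather than heapUsed. The 50 MB payload and the `toMb` helper are arbitrary choices for illustration.

```javascript
const v8 = require('v8');

// Convert bytes to whole megabytes for readable output.
const toMb = (bytes) => Math.round(bytes / 1024 / 1024);

// heap_size_limit is the ceiling V8 will let the heap grow to;
// it moves when you change --max-old-space-size.
const heapLimitMb = toMb(v8.getHeapStatistics().heap_size_limit);

const before = process.memoryUsage();
// A large Buffer is backed by memory outside the V8 heap.
const payload = Buffer.alloc(50 * 1024 * 1024, 1);
const after = process.memoryUsage();

const report = {
  heapLimitMb,
  heapGrowthMb: toMb(after.heapUsed - before.heapUsed),
  arrayBufferGrowthMb: toMb(after.arrayBuffers - before.arrayBuffers),
  rssMb: toMb(after.rss),
  payloadBytes: payload.length, // keeps the buffer referenced until here
};

console.log(report);
```

Running this inside the container, with and without NODE_OPTIONS set, is a quick way to confirm which heap ceiling the process actually sees. The kernel counts rss-style memory against the pod limit, not heapUsed alone.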
Reproduce It Yourself (Locally with Docker)
Try this to see the crash in action:
Dockerfile:
```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["npm", "start"]
```
Build the image, then run it with a memory limit:
```bash
docker build -t my-nextjs-app .
docker run --memory=256m --memory-swap=256m -p 3000:3000 my-nextjs-app
```
Now bombard it:
```bash
# Simulate many unique cache keys
for i in {1..1000}; do
  curl "http://localhost:3000/blog/$i?nocache=$i"
done
```
Watch docker stats — memory climbs until OOM.
Real Fixes (Not Just "Add More RAM")
You can throw 2Gi at the pod, but that just delays the crash. Fix the root causes.
Fix #1: Correct Node.js Heap Limit
Set NODE_OPTIONS to leave room for non-heap memory (buffers, C++ objects, etc.):
```yaml
# deployment.yaml
spec:
  containers:
    - name: nextjs
      image: my-nextjs-app
      resources:
        limits:
          memory: "1024Mi"
        requests:
          memory: "512Mi"
      env:
        - name: NODE_OPTIONS
          value: "--max-old-space-size=768" # 75% of memory limit
```
Formula: max-old-space-size = (memory limit in MB) * 0.75
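As a quick check of the 75% rule, the arithmetic can be written as a small helper. This is only a sketch: `heapMbForLimit` and its `headroomRatio` parameter are names made up here, and the right ratio depends on how much non-heap memory your app actually uses.

```javascript
// Return a --max-old-space-size value (in MB) for a container memory limit (in Mi).
// The 0.75 default mirrors the formula above; headroomRatio is a hypothetical knob.
function heapMbForLimit(limitMb, headroomRatio = 0.75) {
  if (!Number.isFinite(limitMb) || limitMb <= 0) {
    throw new RangeError('limitMb must be a positive number');
  }
  if (!(headroomRatio > 0 && headroomRatio < 1)) {
    throw new RangeError('headroomRatio must be between 0 and 1');
  }
  // Round down so the heap cap never exceeds the intended share.
  return Math.floor(limitMb * headroomRatio);
}

console.log(`--max-old-space-size=${heapMbForLimit(1024)}`); // prints --max-old-space-size=768
```

For the 512Mi limit from the introduction the same helper gives 384, which leaves 128Mi for buffers, native code, and the runtime itself.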
Fix #2: Disable or Limit Caching (App Router)
In next.config.js:
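```javascript
// next.config.js
module.exports = {
  // Next.js 14.1+: route ISR and Data Cache entries through a custom handler
  cacheHandler: require.resolve('./cache-handler.js'),
  // Turn off the default in-memory cache so entries are not kept in the heap
  cacheMaxMemorySize: 0,
  // On Next.js 13.4 to 14.0 the equivalents were
  // experimental.incrementalCacheHandlerPath and experimental.isrMemoryCacheSize.
};
```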
Create a custom cache handler (Redis/Memcached):
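```javascript
// cache-handler.js
const { createClient } = require('redis');
const client = createClient(); // defaults to localhost:6379
const ready = client.connect(); // node-redis v4+ needs an explicit connect()
module.exports = class RedisCache {
  async get(key) {
    await ready;
    const raw = await client.get(key);
    return raw ? JSON.parse(raw) : null; // null means cache miss
  }
  async set(key, data) {
    await ready; // entries expire after 3600 seconds
    await client.setEx(key, 3600, JSON.stringify({ value: data, lastModified: Date.now() }));
  }
};
```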
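When Redis is not an option, another way to limit caching is to cap how many entries stay in the process. The class below is a hypothetical sketch (`BoundedMemoryCache` and `maxEntries` are not Next.js names) of a least-recently-used cache with the same get/set shape as the handler above. It has not been checked against a specific Next.js version.

```javascript
// A size-capped cache: once maxEntries is reached, the least recently
// used entry is evicted before a new one is stored.
class BoundedMemoryCache {
  constructor(options = {}) {
    // Hypothetical option; falls back to 500 when the caller does not pass it.
    this.maxEntries = options.maxEntries ?? 500;
    // Map iterates in insertion order, so the first key is the oldest.
    this.entries = new Map();
  }

  async get(key) {
    const entry = this.entries.get(key);
    if (!entry) return null;
    // Re-insert to mark this key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry;
  }

  async set(key, data) {
    // Remove any existing copy so an update does not trigger an eviction.
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
    this.entries.set(key, { value: data, lastModified: Date.now() });
  }
}

module.exports = BoundedMemoryCache;
```

Capping by entry count is the simplest policy. If page sizes vary widely, a byte-based cap would track memory more closely. The framework may construct a handler more than once, so a production version would likely keep the Map at module scope rather than on the instance.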
Fix #3: Explicit Garbage Collection
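Add this to your server code (use sparingly):
```javascript
// app/api/gc/route.ts (App Router; a Pages Router API route would use a default-export handler instead)
export async function GET() {
  // global.gc only exists when Node.js was started with --expose-gc
  if (global.gc) {
    global.gc();
    return Response.json({ message: "GC triggered" });
  }
  return Response.json({ error: "Run with --expose-gc" }, { status: 500 });
}
```
Then start Node.js with --expose-gc. Older Node.js releases reject this flag inside NODE_OPTIONS, so if that fails, pass it directly on the node command line instead.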
Fix #4: Horizontal Instead of Vertical
Don’t run one Next.js pod with 2Gi RAM. Run 4 pods with 512Mi each.
Add an HPA (Horizontal Pod Autoscaler) based on memory:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nextjs-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nextjs
  minReplicas: 4 # example value, matching the four pods above
  maxReplicas: 8 # required field; example value
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
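To get a feel for what the 80% target does, the core of the scaling rule described in the Kubernetes documentation is desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), where utilization is measured against the pod's memory request. The sketch below reproduces only that arithmetic. It ignores the tolerance band, stabilization windows, and min/max clamping that the real controller applies, and the function name is made up for this example.

```javascript
// Core HPA arithmetic: scale the replica count by the ratio of observed
// utilization to target utilization, rounding up.
function desiredReplicas(currentReplicas, currentUtilization, targetUtilization) {
  if (targetUtilization <= 0) {
    throw new RangeError('targetUtilization must be positive');
  }
  return Math.ceil(currentReplicas * (currentUtilization / targetUtilization));
}

// Four pods averaging 95% of their memory request, against an 80% target.
console.log(desiredReplicas(4, 95, 80)); // prints 5
```

Because the percentage is relative to requests rather than limits, a pod with a 512Mi request and a 1024Mi limit reaches 80% utilization at roughly 410Mi.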
Monitoring: The Lifesaver
You can’t fix what you can’t see. Add:
Prometheus + Grafana: Monitor process.memoryUsage().heapUsed alongside rss, since the OOM killer acts on total process memory, not just the V8 heap.
Kubernetes Vertical Pod Autoscaler (VPA): Recommends memory requests/limits based on actual usage.
Heap snapshots: Use node --inspect and Chrome DevTools to see what’s cached.
Alert when memory stays above 80% of the limit for 5 minutes.
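In practice that alert would normally live in your monitoring stack as an alerting rule. Purely to illustrate the logic, here is a sketch of the "above 80% for 5 minutes" check over a list of memory samples. The function names and the sample shape ({ timestampMs, rssBytes }) are assumptions made for this example.

```javascript
// Returns true when every sample in the trailing window is above
// threshold * limitBytes. Samples must be in ascending time order.
function sustainedAbove(samples, limitBytes, threshold = 0.8, windowMs = 5 * 60 * 1000) {
  if (samples.length === 0) return false;
  const latest = samples[samples.length - 1].timestampMs;
  const windowStart = latest - windowMs;
  // Require data covering the whole window before alerting.
  if (samples[0].timestampMs > windowStart) return false;
  return samples
    .filter((sample) => sample.timestampMs >= windowStart)
    .every((sample) => sample.rssBytes > threshold * limitBytes);
}

// Take one sample from the running process.
function sampleNow() {
  return { timestampMs: Date.now(), rssBytes: process.memoryUsage().rss };
}
```

Requiring every sample in the window to exceed the threshold avoids paging someone over a single spike, at the cost of reacting a little later to a steady climb.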
Final Checklist Before You Ship
✅ Set NODE_OPTIONS="--max-old-space-size=..."
✅ Disable or externalize ISR/data cache (Redis)
✅ Set resources.limits.memory and resources.requests.memory
✅ Add livenessProbe and readinessProbe (unresponsive pods are restarted and pulled out of rotation sooner)
✅ Run load tests with Docker's --memory flag
✅ Upgrade to Node.js 20+ (better cgroup v2 support)
The Bottom Line
Your Next.js app isn't leaking memory in the traditional sense—it's intentionally caching without respecting Kubernetes limits. The framework assumes infinite RAM, infinite disk, and a long-running process. Kubernetes assumes the opposite.
Bridge the gap by explicitly managing that cache and correctly sizing the Node.js heap. Or move static pages to a CDN and keep only dynamic routes in the pod.
Next.js on K8s works beautifully—once you stop treating it like a static file server.
Let’s discuss: Have you seen Next.js OOMKilled in production? What was your fix? 👇