Why Puppeteer keeps timing out in production (and what to do instead)
Your Puppeteer screenshot works locally. Takes 2 seconds. You deploy to production.
Suddenly: timeouts. Every 3rd request fails. Your error logs are full of:
```
TimeoutError: Waiting for navigation to "https://example.com" failed: Timeout 30000ms exceeded
```
or
```
Error: Browser.newPage(): target page crashed
```
You're not alone. This is the #1 problem with running Puppeteer in production.
Why Puppeteer times out
1. Memory exhaustion
Each Puppeteer instance holds a browser process (~150MB base + page overhead). Under load, memory fills up fast.
```javascript
// This looks fine...
for (let i = 0; i < 1000; i++) {
  const page = await browser.newPage();
  await page.goto(url);
  const screenshot = await page.screenshot();
  // FORGOT TO CLOSE PAGE
  // await page.close(); // ← This line is missing
}
```
Forget to close pages and memory bloats, the browser slows down, and the next page times out.
Or page teardown itself leaks: pages pile up in memory even after .close().
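The fix for the leaky loop is to guarantee the close in a finally block, so the page is released even when navigation or the screenshot throws. A minimal sketch; the fakeBrowser helper here is a hypothetical stand-in for a real Puppeteer browser, used only so the pattern is runnable anywhere:

```javascript
// Run a task with a freshly opened page, guaranteeing page.close() runs
// even if the task throws. `browser` only needs newPage(), and the page
// only needs close(), so this works with the real Puppeteer API too.
async function withPage(browser, task) {
  const page = await browser.newPage();
  try {
    return await task(page);
  } finally {
    await page.close(); // always executed, success or failure
  }
}

// Stand-in browser that just counts open pages (swap in a real
// Puppeteer browser in production).
function fakeBrowser() {
  let open = 0;
  return {
    open: () => open,
    newPage: async () => {
      open++;
      return { close: async () => { open--; } };
    },
  };
}

// Usage: even when the task throws, no page is leaked.
const b = fakeBrowser();
withPage(b, async () => { throw new Error('boom'); })
  .catch(() => console.log('open pages after failure:', b.open())); // 0
```

In the original loop you would write `await withPage(browser, page => { ... })` per iteration instead of calling newPage() directly.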
2. Cold start penalty
Spawning a new browser process takes 5-15 seconds on first call:
```javascript
// First request to your server
const browser = await puppeteer.launch(); // ← 8 seconds
const page = await browser.newPage();
await page.goto(url, { timeout: 30000 }); // ← only 22 seconds of budget left
const screenshot = await page.screenshot();
```
Your timeout is 30 seconds total. You've burned 8 seconds just starting the browser. Network hiccup? Timeout.
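The usual mitigation is to launch the browser once and reuse it, so only the first request pays the cold start. A sketch of the lazy-singleton pattern; slowLaunch below is a stand-in for puppeteer.launch() so the example runs without Puppeteer:

```javascript
// Simulates an expensive one-time launch (stand-in for puppeteer.launch()).
let launches = 0;
async function slowLaunch() {
  launches++;
  await new Promise(r => setTimeout(r, 50)); // pretend this takes seconds
  return { id: launches };
}

// Lazy singleton: all callers share one launch promise, so concurrent
// requests don't each spawn a browser, and only the first caller waits.
let browserPromise = null;
function getBrowser() {
  if (!browserPromise) browserPromise = slowLaunch();
  return browserPromise;
}
```

In a request handler you would write `const browser = await getBrowser();` instead of launching per request. A production version would also reset browserPromise to null when the launch fails or the browser crashes, so the next request can retry.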
3. Single-page app rendering lag
Modern SPAs don't render on initial HTML. They load, fetch data, render.
```javascript
await page.goto(url, {
  waitUntil: 'networkidle2' // ← waits for the network to go quiet
});
```
If the SPA has a bug and keeps fetching data, networkidle2 never fires and the navigation waits until the timeout expires.
One bad third-party API call → entire screenshot times out.
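A defensive pattern is to impose your own hard deadline with Promise.race, so a never-settling networkidle2 wait can't consume the entire request budget. A sketch in plain JavaScript, nothing Puppeteer-specific, that you could race against page.goto:

```javascript
// Reject if `promise` hasn't settled within `ms`, and clear the timer
// either way so nothing keeps the event loop alive afterwards.
function withDeadline(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} exceeded ${ms}ms deadline`)),
      ms
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage would look like `await withDeadline(page.goto(url, { waitUntil: 'networkidle2' }), 10000, 'goto')`, falling back to a cheaper wait condition such as 'domcontentloaded' when the deadline fires.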
4. Resource exhaustion under concurrency
You have 10 concurrent screenshot requests. Each needs a browser process:
```
Request 1:  browser process #1 (150MB)
Request 2:  browser process #2 (150MB)
Request 3:  browser process #3 (150MB)
...
Request 10: out of memory, killed
```
Now requests 1-9 fail because the OS killed the browser.
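If you do self-host, the standard mitigation for this failure mode is to queue requests behind a concurrency cap instead of opening a page per request. A minimal limiter sketch in plain JavaScript (the cap value is something you would tune to your instance's memory, not a recommendation):

```javascript
// Returns a function that runs async tasks with at most `max` in flight;
// extra tasks wait in a FIFO queue for a slot to free up.
function createLimiter(max) {
  let active = 0;
  const queue = [];
  return async function run(task) {
    if (active >= max) {
      await new Promise(resolve => queue.push(resolve)); // wait for a slot
    } else {
      active++; // claim a free slot
    }
    try {
      return await task();
    } finally {
      if (queue.length > 0) queue.shift()(); // hand the slot to the next waiter
      else active--;
    }
  };
}
```

Usage: `const limitScreenshots = createLimiter(5);` then in the handler, `await limitScreenshots(() => takeScreenshot(url));` so the 10th request queues instead of spawning an 11th browser process.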
5. DNS and network issues
```javascript
await page.goto(url, { timeout: 30000 });
```
If DNS is slow (even 1 second extra) or the page takes 20 seconds to load, timeout.
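Transient DNS and network failures are the one class you can paper over with a retry. A hedged sketch of an exponential-backoff wrapper (the attempt count and base delay are illustrative defaults, not tuned values) that you could wrap around the whole screenshot call:

```javascript
// Retry an async operation on failure, doubling the delay between attempts.
async function withRetry(fn, attempts = 3, baseDelayMs = 200) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // 200ms, then 400ms, then 800ms, ... between attempts
        await new Promise(r => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError; // all attempts failed
}
```

Usage: `await withRetry(() => takeScreenshot(url))`. Note that retries multiply your worst-case latency, which is exactly the operational complexity the rest of this post is about.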
Why self-hosting Puppeteer is fragile at scale
You think the problem is your code. It's not. It's the architecture.
Puppeteer at scale requires:
- Browser pool management (pre-spawn, recycle, health checks)
- Memory monitoring (kill old processes before OOM)
- Timeout handling (retry logic, fallbacks)
- Load balancing (distribute across multiple instances)
- Logging/debugging (understand why timeouts happen)
One EC2 instance can handle ~5-10 concurrent Puppeteer requests. Scale to 100 concurrent? You need 10-20 instances.
Now you're managing:
- Kubernetes orchestration
- Session affinity (sticky sessions)
- Auto-scaling policies
- Health checks
- Cost (10-20 × $50/month = $500+/month)
And it's still fragile: one bad page can trigger a timeout cascade that ripples across every instance.
Solution: REST API (2-3 second latency, zero complexity)
Instead of managing browser pools, call an API:
```javascript
const response = await fetch('https://pagebolt.dev/api/v1/screenshot', {
  method: 'POST',
  headers: {
    'x-api-key': process.env.PAGEBOLT_API_KEY,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://example.com',
    format: 'png'
  })
});

const screenshot = Buffer.from(await response.arrayBuffer());
res.set('Content-Type', 'image/png');
res.send(screenshot);
```
That's it. No memory management. No timeout handling. No infrastructure.
If a request fails or times out, the managed service's retry logic handles it for you. Your code doesn't care.
Real comparison: Puppeteer vs API
Puppeteer (production headache)
```javascript
const puppeteer = require('puppeteer');

let browser;

async function init() {
  browser = await puppeteer.launch({
    args: ['--no-sandbox', '--disable-dev-shm-usage']
  });
}

async function takeScreenshot(url) {
  const page = await browser.newPage();
  try {
    await page.setViewport({ width: 1280, height: 720 });
    // Timeout wrapped in try-catch
    await Promise.race([
      page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 }),
      new Promise((_, reject) =>
        setTimeout(() => reject(new Error('Custom timeout')), 25000)
      )
    ]);
    return await page.screenshot({ type: 'png' });
  } catch (error) {
    console.error('Screenshot failed:', error);
    throw error; // Caller retries? Logs error? Your code handles this
  } finally {
    await page.close(); // must close even on failure, or pages leak
  }
}

app.get('/screenshot', async (req, res) => {
  try {
    const screenshot = await takeScreenshot(req.query.url);
    res.set('Content-Type', 'image/png');
    res.send(screenshot);
  } catch (error) {
    res.status(500).send('Screenshot failed: ' + error.message);
  }
});

// Handle browser crashes
process.on('SIGTERM', async () => {
  await browser.close();
  process.exit(0);
});

// Monitor memory, restart if needed
setInterval(async () => {
  const memory = process.memoryUsage();
  if (memory.heapUsed > 1e9) { // 1GB
    console.log('Memory high, restarting browser...');
    await browser.close();
    browser = await puppeteer.launch({...}); // same args as init()
  }
}, 60000);
```
Issues:
- ~50 lines of boilerplate, before you even add pooling or health checks
- Manual memory management
- Crash recovery logic
- Timeout handling logic
- Still fragile
REST API (a dozen-line solution)
```javascript
const fetch = require('node-fetch'); // Node 18+ has fetch built in

async function takeScreenshot(url) {
  const response = await fetch('https://pagebolt.dev/api/v1/screenshot', {
    method: 'POST',
    headers: {
      'x-api-key': process.env.PAGEBOLT_API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url, format: 'png' })
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return Buffer.from(await response.arrayBuffer());
}

app.get('/screenshot', async (req, res) => {
  try {
    const screenshot = await takeScreenshot(req.query.url);
    res.set('Content-Type', 'image/png');
    res.send(screenshot);
  } catch (error) {
    res.status(500).send('Screenshot failed');
  }
});
```
Advantages:
- A dozen lines of actual code
- No memory management
- No crash recovery logic (the managed service handles crashes)
- No timeout or retry logic (the API handles both)
Cost reality
| Item | Puppeteer | REST API |
|---|---|---|
| Infrastructure | $50-200/month | $0 |
| Scaling | 10 instances = $500/month | auto-scales, same cost |
| DevOps time | 5-10 hours/month | 0 hours |
| Timeout debugging | 10+ hours/month | 0 hours |
| Per-screenshot cost | $0.05-0.20 | $0.02-0.05 |
| Total effective cost (infra + engineer time) | $1,500-2,000/month | $50-100/month |
For most teams: REST API wins by 10-20x.
When to keep Puppeteer
Keep self-hosted Puppeteer if:
- ✅ Processing 10,000+ screenshots/day (economies of scale)
- ✅ Data residency requirement (EU data can't leave EU)
- ✅ Dedicated DevOps team already maintaining it
- ✅ Latency so critical that an extra network hop is unacceptable (rare)
For everyone else: use an API.
PDF generation: Same story
html-pdf, Puppeteer's page.pdf(), wkhtmltopdf: all have the same problems.
Same solution:
```javascript
const response = await fetch('https://pagebolt.dev/api/v1/pdf', {
  method: 'POST',
  headers: {
    'x-api-key': process.env.PAGEBOLT_API_KEY,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    html: '<h1>Invoice #123</h1><p>Total: $99.99</p>',
    format: 'A4'
  })
});

const pdf = Buffer.from(await response.arrayBuffer());
res.set('Content-Type', 'application/pdf');
res.send(pdf);
```
No wkhtmltopdf process management. No hangs. No timeouts.
Getting started
- Sign up at pagebolt.dev (free: 100 requests/month)
- Replace Puppeteer code with fetch() calls (5 minutes)
- Deploy and never think about Puppeteer timeouts again
Stop debugging Puppeteer crashes at 2 AM. Let a managed service handle it.
Start free — 100 screenshots/month, no credit card.