TLDR: Puppeteer works great in dev. It crashes, leaks memory, and hangs in production. Here's what you need to fix it, and when you should just use an API instead.
How It Started
needed screenshots for my SaaS. figured Puppeteer would be easy - every tutorial shows this:
const puppeteer = require('puppeteer');

async function screenshot(url) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);
  const screenshot = await page.screenshot();
  await browser.close();
  return screenshot;
}
ran it locally. worked great. shipped to production. worked great for about a week.
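one gap in the tutorial version worth spelling out: if page.goto() or page.screenshot() throws, browser.close() never runs and the chromium process stays alive. a minimal sketch of the same flow with try/finally. the `launch` parameter is a hypothetical injection point added for illustration (it lets you pass a fake browser in tests); by default it falls back to puppeteer.launch().

```javascript
// Sketch: same flow as the tutorial snippet, but close() always runs.
// `launch` is a hypothetical injection point, not part of Puppeteer's API.
async function screenshotSafely(url, launch = () => require('puppeteer').launch()) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.screenshot();
  } finally {
    // runs on success and on error, so a failed navigation can't orphan Chromium
    await browser.close();
  }
}
```

this doesn't cover every leak, but it removes the most obvious way to leave a browser behind.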
How It Went
month 1: server RAM hits 90%. process crashes. pm2 restarts it. happens every 12 hours. i figure pm2 is handling it so whatever. boss hasn't noticed yet.
month 2: crashes every 6 hours now. customers complaining. boss asking questions. can't ignore it anymore. turns out browsers aren't being garbage collected even though i'm calling browser.close(). spend my entire weekend implementing browser pooling. restart browsers every 100 screenshots. tell boss it's fixed.
month 3: it wasn't fixed. browsers hang. page.goto() times out but never releases the browser back to the pool. all 3 browsers hung. new requests just wait forever doing nothing. screenshot feature completely broken. boss schedules a "quick sync". i panic and add watchdog timers to force-kill hung browsers after 30 seconds. cancel the meeting. some slow pages fail now but at least it doesn't hang completely.
month 4: customer emails: "why are instagram screenshots blank?" cool cool cool. instagram detects headless browsers. spend 4 hours googling User-Agent spoofing instead of building the payment flow customers are waiting for. then twitter screenshots start failing. add cookie handling. cloudflare blocking us on random sites. add retry logic. boss asks why we're behind on the roadmap.
i'm debugging why instagram returns blank instead of shipping features people pay for.
40+ hours total
that's $2k at $50/hour. worse - it's 40 hours not building things customers wanted. things my boss kept bringing up in standups.
What You Actually Need (If You're Stubborn Like Me)
1. browser pooling (took me 8 hours)
const puppeteer = require('puppeteer');
const genericPool = require('generic-pool');

const factory = {
  // launch one headless browser per pool slot
  create: async () => {
    return await puppeteer.launch({
      headless: true,
      args: [
        '--no-sandbox',
        '--disable-setuid-sandbox',
        '--disable-dev-shm-usage',
        '--disable-gpu'
      ]
    });
  },
  destroy: async (browser) => {
    await browser.close();
  },
  // with testOnBorrow, this runs before a browser is handed out
  validate: async (browser) => {
    try {
      await browser.pages();
      return true;
    } catch {
      return false;
    }
  }
};

const pool = genericPool.createPool(factory, {
  min: 2,
  max: 5,
  testOnBorrow: true,
  evictionRunIntervalMillis: 60000
});
what this solves: creating browsers for every request kills performance
what this doesn't solve: hung browsers still exhaust the pool and everything breaks
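the "new requests wait forever" part can at least be bounded at the pool level. generic-pool accepts acquireTimeoutMillis and maxWaitingClients options. a sketch of the options object; the two new values are illustrative guesses, not tuned numbers:

```javascript
// Sketch: pool options that make waiting callers fail fast.
// The 10s timeout and the queue limit of 50 are assumptions for illustration.
const poolOptions = {
  min: 2,
  max: 5,
  testOnBorrow: true,
  evictionRunIntervalMillis: 60000,
  acquireTimeoutMillis: 10000, // acquire() rejects after 10s instead of waiting forever
  maxWaitingClients: 50        // further acquire() calls are rejected once 50 are queued
};

// const pool = genericPool.createPool(factory, poolOptions);
```

this doesn't un-hang a hung browser. it only turns a silent pile-up into errors you can see and retry.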
2. timeout logic (6 hours of my life i'll never get back)
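async function takeScreenshotWithTimeout(url, timeoutMs = 30000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('Timeout')), timeoutMs);
  });
  try {
    // losing the race rejects the caller but does not cancel takeScreenshot()
    return await Promise.race([takeScreenshot(url), timeout]);
  } finally {
    clearTimeout(timer); // don't leave the timer pending after a fast success
  }
}

async function takeScreenshotWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await takeScreenshotWithTimeout(url);
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      // exponential backoff: 1s, then 2s
      await new Promise(resolve =>
        setTimeout(resolve, 1000 * Math.pow(2, i))
      );
    }
  }
}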
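one caveat with the Promise.race approach: when the timer wins, the caller gets an error, but the underlying work keeps running and still holds its browser. a sketch of a timeout helper that also runs a cleanup hook when the timer fires. `withTimeout` and `onTimeout` are illustrative names, not library APIs:

```javascript
// Sketch: race `work()` against a timer, clear the timer when work finishes,
// and call an optional cleanup hook when the timer wins.
function withTimeout(work, timeoutMs, onTimeout = async () => {}) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      // run cleanup (e.g. destroy the hung browser), then reject either way
      Promise.resolve()
        .then(onTimeout)
        .catch(() => {})
        .then(() => reject(new Error('Timeout')));
    }, timeoutMs);

    Promise.resolve()
      .then(work)
      .then(resolve, reject)
      .finally(() => clearTimeout(timer));
  });
}
```

usage would look something like `withTimeout(() => capture(browser, url), 30000, () => pool.destroy(browser))`, where `capture` is a hypothetical function that does the goto and screenshot on an already-acquired browser.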
3. caching (4 hours because i'm smart enough to use redis)
const redis = require('redis');
const client = redis.createClient();
// node-redis v4+ needs an explicit connect() before commands can run
const ready = client.connect();

async function getScreenshot(url, options = {}) {
  await ready;
  const cacheKey = `screenshot:${url}:${JSON.stringify(options)}`;
  const cached = await client.get(cacheKey);
  if (cached) return Buffer.from(cached, 'base64');

  const screenshot = await takeScreenshotWithRetry(url);
  // cache for 24 hours (86400 seconds)
  await client.setEx(cacheKey, 86400, screenshot.toString('base64'));
  return screenshot;
}
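one thing to watch with that cache key: JSON.stringify(options) depends on key order, so { width, height } and { height, width } end up as two different cache entries. a sketch that sorts the option keys and hashes the result. `cacheKeyFor` is an illustrative helper and only handles flat option objects:

```javascript
const crypto = require('crypto');

// Sketch: build a cache key that ignores option key order.
function cacheKeyFor(url, options = {}) {
  const sorted = Object.keys(options)
    .sort()
    .map((key) => [key, options[key]]);
  const digest = crypto
    .createHash('sha256')
    .update(JSON.stringify([url, sorted]))
    .digest('hex');
  return `screenshot:${digest}`;
}
```

hashing also keeps the key length fixed no matter how long the url is.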
4. browser restart logic (2 hours, should've done this first)
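let screenshotCount = 0;
const MAX_SCREENSHOTS = 100;

async function takeScreenshot(url) {
  const browser = await pool.acquire();
  try {
    // same capture steps as the tutorial snippet, on a pooled browser
    const page = await browser.newPage();
    try {
      await page.goto(url);
      return await page.screenshot();
    } finally {
      await page.close();
    }
  } finally {
    screenshotCount++;
    if (screenshotCount >= MAX_SCREENSHOTS) {
      // retire this browser instead of releasing it; never do both
      screenshotCount = 0;
      await pool.destroy(browser);
    } else {
      await pool.release(browser);
    }
  }
}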
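one caveat with a single global counter: with several browsers in the pool, it retires whichever browser happens to be in hand on the 100th screenshot, not the one that has actually served 100 pages. a sketch of per-browser counting with a WeakMap. `recordUse` and `MAX_USES_PER_BROWSER` are illustrative names:

```javascript
// Sketch: count uses per browser instead of globally.
// WeakMap entries go away once a destroyed browser is garbage collected.
const useCounts = new WeakMap();
const MAX_USES_PER_BROWSER = 100; // same limit as above, applied per browser

// Record one use; returns true when this browser has reached the limit.
function recordUse(browser, limit = MAX_USES_PER_BROWSER) {
  const uses = (useCounts.get(browser) || 0) + 1;
  useCounts.set(browser, uses);
  return uses >= limit;
}
```

the idea would be to call recordUse(browser) after each screenshot and pick pool.destroy(browser) over pool.release(browser) when it returns true.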
5. monitoring (4 hours so you know WHEN it breaks, not IF)
// assumes an existing Express app and the pool from step 1
app.get('/health', async (req, res) => {
  const memUsage = process.memoryUsage();
  const health = {
    status: 'healthy',
    pool: {
      size: pool.size,
      available: pool.available,
      pending: pool.pending
    },
    memory: {
      heapUsed: `${Math.round(memUsage.heapUsed / 1024 / 1024)}MB`
    }
  };
  // flag when the node heap is more than 90% full
  if (memUsage.heapUsed / memUsage.heapTotal > 0.9) {
    health.status = 'warning';
  }
  res.json(health);
});
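worth knowing: process.memoryUsage() only covers the node process. chromium runs as separate processes, so its memory never shows up in heapUsed. a sketch that classifies system-wide memory with the os module instead. `systemMemoryStatus` is an illustrative helper, and the 0.9 threshold just mirrors the check above:

```javascript
const os = require('os');

// Sketch: classify memory pressure from system-wide numbers.
// `total` and `free` default to live readings; pass values to test the logic.
function systemMemoryStatus(threshold = 0.9, total = os.totalmem(), free = os.freemem()) {
  const usedRatio = (total - free) / total;
  return {
    usedPercent: Math.round(usedRatio * 100),
    status: usedRatio > threshold ? 'warning' : 'healthy'
  };
}
```

the result could be added to the health payload next to the pool stats.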
6. edge cases (this never ends)
if (url.includes('instagram.com')) {
  await page.setUserAgent(
    'Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X)...'
  );
}
if (url.includes('twitter.com')) {
  await page.setCookie({
    name: 'auth_token',
    value: process.env.TWITTER_AUTH_TOKEN,
    domain: '.twitter.com'
  });
}
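a small trap in those checks: url.includes('instagram.com') also matches urls that only mention the domain in a path or query string, and lookalike hosts such as notinstagram.com. a sketch that compares the parsed hostname instead. `isHost` is an illustrative helper name:

```javascript
// Sketch: match on the parsed hostname rather than a substring of the URL.
function isHost(rawUrl, domain) {
  let hostname;
  try {
    hostname = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return false; // not a parseable absolute URL
  }
  // exact match or any subdomain (www.instagram.com, m.instagram.com, ...)
  return hostname === domain || hostname.endsWith(`.${domain}`);
}
```

so the check becomes isHost(url, 'instagram.com') instead of url.includes('instagram.com').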
The Math
time spent:
- browser pooling: 8 hours
- timeout/retry logic: 6 hours
- caching: 4 hours
- browser restart: 2 hours
- monitoring: 4 hours
- debugging edge cases: 16+ hours
- total: 40+ hours
40 hours not building features customers asked for. features my boss kept asking about.
at $50/hour: $2k in wasted dev time
ongoing maintenance: 2-5 hours/month forever
server: $25/month
first 4 months: $2k dev time + ($150 × 4) maintenance + ($25 × 4) server = $2,700
screenshot API would've cost $29/month.
4 months = $116 total.
spent $2,700 and 40 hours to save $116.
boss would've been pissed if he knew.
When to Self-Host
puppeteer isn't bad. use it if:
- already running node infrastructure
- screenshots are your actual product
- need custom browser configs
- doing >100K/month where API costs add up
- boss is cool with you spending weeks on infrastructure
When to Use an API
use an API if:
- screenshots are a small feature
- want to ship this week
- your time costs more than $50/hour
- don't want to debug memory leaks
- doing <100K/month
- boss expects features not infrastructure work
break-even math:
for 10K screenshots/month:
- self-hosted first 4 months: $2,700
- screenshot API first 4 months: $116
the API is 23x cheaper. and i don't have to maintain it.
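the same numbers as arithmetic, plus an actual break-even calculation. a sketch: every input comes from the figures above except the $500/month API price, which is a made-up high-volume number for illustration.

```javascript
// Sketch: total cost over a period, and months until self-hosting wins.
function totalCost(upfront, monthly, months) {
  return upfront + monthly * months;
}

// Months until self-hosting becomes cheaper than the API; Infinity if it never does.
function monthsToBreakEven(upfront, selfHostedMonthly, apiMonthly) {
  if (apiMonthly <= selfHostedMonthly) return Infinity;
  return upfront / (apiMonthly - selfHostedMonthly);
}

const selfHosted = totalCost(2000, 150 + 25, 4); // 2700
const api = totalCost(0, 29, 4);                 // 116
const ratio = selfHosted / api;                  // about 23.3

const atFlatPlan = monthsToBreakEven(2000, 175, 29);    // Infinity
const atHighVolume = monthsToBreakEven(2000, 175, 500); // about 6.2 (hypothetical price)
```

with a $29/month plan, the self-hosted running cost ($150 maintenance + $25 server) is already higher every month, so there is no break-even point at all. one only appears once the API bill climbs past what self-hosting costs per month.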
What I Did
took all the puppeteer stuff i built and turned it into SnapCapture.
$5/month. you skip the 40 hours.
built it to solve my problem. if it covers costs i'm happy.
free tier: 100 screenshots/month. test it first.
Bottom Line
puppeteer works. making it production-ready sucks.
you can build it yourself. took me 40 hours.
question is: worth $5/month to skip that?
for me yeah.