Why Puppeteer keeps timing out in production (and what to do instead)
Your Puppeteer screenshot works locally. Takes 2 seconds. You deploy to production.
Suddenly: timeouts. Every 3rd request fails. Your error logs are full of:
```
TimeoutError: Waiting for navigation to "https://example.com" failed: Timeout 30000ms exceeded
```
or
```
Error: Browser.newPage(): target page crashed
```
You're not alone. This is the #1 problem with running Puppeteer in production.
Why Puppeteer times out
1. Memory exhaustion
Each Puppeteer instance holds a browser process (~150MB base + page overhead). Under load, memory fills up fast.
```javascript
// This looks fine...
for (let i = 0; i < 1000; i++) {
  const page = await browser.newPage();
  await page.goto(url);
  const screenshot = await page.screenshot();
  // FORGOT TO CLOSE PAGE
  // await page.close(); // ← This line is missing
}
```
Forget to close pages and memory bloats, the browser slows down, and the next page times out.
Or page teardown itself leaks: pages pile up in memory even after .close().
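The fix for the leaky loop is to guarantee the close in a finally block, so the page is released even when navigation or the screenshot throws. A minimal sketch; the fakeBrowser helper here is a hypothetical stand-in for a real Puppeteer browser, used only so the pattern is runnable anywhere:

```javascript
// Run a task with a freshly opened page, guaranteeing page.close() runs
// even if the task throws. `browser` only needs newPage(), and the page
// only needs close(), so this works with the real Puppeteer API too.
async function withPage(browser, task) {
  const page = await browser.newPage();
  try {
    return await task(page);
  } finally {
    await page.close(); // always executed, success or failure
  }
}

// Stand-in browser that just counts open pages (swap in a real
// Puppeteer browser in production).
function fakeBrowser() {
  let open = 0;
  return {
    open: () => open,
    newPage: async () => {
      open++;
      return { close: async () => { open--; } };
    },
  };
}

// Usage: even when the task throws, no page is leaked.
const b = fakeBrowser();
withPage(b, async () => { throw new Error('boom'); })
  .catch(() => console.log('open pages after failure:', b.open())); // 0
```

In the original loop you would write `await withPage(browser, page => { ... })` per iteration instead of calling newPage() directly.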
2. Cold start penalty
Spawning a new browser process takes 5-15 seconds on first call:
```javascript
// First request to your server
const browser = await puppeteer.launch(); // ← 8 seconds
const page = await browser.newPage();
await page.goto(url, { timeout: 30000 }); // ← only 22 seconds of budget left
const screenshot = await page.screenshot();
```
Your timeout is 30 seconds total. You've burned 8 seconds just starting the browser. Network hiccup? Timeout.
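The usual mitigation is to launch the browser once and reuse it, so only the first request pays the cold start. A sketch of the lazy-singleton pattern; slowLaunch below is a stand-in for puppeteer.launch() so the example runs without Puppeteer:

```javascript
// Simulates an expensive one-time launch (stand-in for puppeteer.launch()).
let launches = 0;
async function slowLaunch() {
  launches++;
  await new Promise(r => setTimeout(r, 50)); // pretend this takes seconds
  return { id: launches };
}

// Lazy singleton: all callers share one launch promise, so concurrent
// requests don't each spawn a browser, and only the first caller waits.
let browserPromise = null;
function getBrowser() {
  if (!browserPromise) browserPromise = slowLaunch();
  return browserPromise;
}
```

In a request handler you would write `const browser = await getBrowser();` instead of launching per request. A production version would also reset browserPromise to null when the launch fails or the browser crashes, so the next request can retry.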
3. Single-page app rendering lag
Modern SPAs don't render on initial HTML. They load, fetch data, render.
```javascript
await page.goto(url, {
  waitUntil: 'networkidle2' // ← waits for the network to go quiet
});
```
If the SPA has a bug and keeps fetching data, networkidle2 never fires and the navigation waits until the timeout expires.
One bad third-party API call → entire screenshot times out.
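A defensive pattern is to impose your own hard deadline with Promise.race, so a never-settling networkidle2 wait can't consume the entire request budget. A sketch in plain JavaScript, nothing Puppeteer-specific, that you could race against page.goto:

```javascript
// Reject if `promise` hasn't settled within `ms`, and clear the timer
// either way so nothing keeps the event loop alive afterwards.
function withDeadline(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} exceeded ${ms}ms deadline`)),
      ms
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage would look like `await withDeadline(page.goto(url, { waitUntil: 'networkidle2' }), 10000, 'goto')`, falling back to a cheaper wait condition such as 'domcontentloaded' when the deadline fires.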
4. Resource exhaustion under concurrency
You have 10 concurrent screenshot requests. Each needs a browser process:
```
Request 1:  browser process #1 (150MB)
Request 2:  browser process #2 (150MB)
Request 3:  browser process #3 (150MB)
...
Request 10: out of memory, killed
```
Now requests 1-9 fail because the OS killed the browser.
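If you do self-host, the standard mitigation for this failure mode is to queue requests behind a concurrency cap instead of opening a page per request. A minimal limiter sketch in plain JavaScript (the cap value is something you would tune to your instance's memory, not a recommendation):

```javascript
// Returns a function that runs async tasks with at most `max` in flight;
// extra tasks wait in a FIFO queue for a slot to free up.
function createLimiter(max) {
  let active = 0;
  const queue = [];
  return async function run(task) {
    if (active >= max) {
      await new Promise(resolve => queue.push(resolve)); // wait for a slot
    } else {
      active++; // claim a free slot
    }
    try {
      return await task();
    } finally {
      if (queue.length > 0) queue.shift()(); // hand the slot to the next waiter
      else active--;
    }
  };
}
```

Usage: `const limitScreenshots = createLimiter(5);` then in the handler, `await limitScreenshots(() => takeScreenshot(url));` so the 10th request queues instead of spawning an 11th browser process.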
5. DNS and network issues
```javascript
await page.goto(url, { timeout: 30000 });
```
If DNS is slow (even 1 second extra) or the page takes 20 seconds to load, timeout.
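Transient DNS and network failures are the one class you can paper over with a retry. A hedged sketch of an exponential-backoff wrapper (the attempt count and base delay are illustrative defaults, not tuned values) that you could wrap around the whole screenshot call:

```javascript
// Retry an async operation on failure, doubling the delay between attempts.
async function withRetry(fn, attempts = 3, baseDelayMs = 200) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // 200ms, then 400ms, then 800ms, ... between attempts
        await new Promise(r => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError; // all attempts failed
}
```

Usage: `await withRetry(() => takeScreenshot(url))`. Note that retries multiply your worst-case latency, which is exactly the operational complexity the rest of this post is about.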
Why self-hosting Puppeteer is fragile at scale
You think the problem is your code. It's not. It's the architecture.
Puppeteer at scale requires:
- Browser pool management (pre-spawn, recycle, health checks)
- Memory monitoring (kill old processes before OOM)
- Timeout handling (retry logic, fallbacks)
- Load balancing (distribute across multiple instances)
- Logging/debugging (understand why timeouts happen)
One EC2 instance can handle ~5-10 concurrent Puppeteer requests. Scale to 100 concurrent? You need 10-20 instances.
Now you're managing:
- Kubernetes orchestration
- Session affinity (sticky sessions)
- Auto-scaling policies
- Health checks
- Cost (10-20 × $50/month = $500+/month)
And it's still fragile: one bad page can trigger a timeout cascade that ripples across every instance.
Solution: REST API (2-3 second latency, zero complexity)
Instead of managing browser pools, call an API:
```javascript
const response = await fetch('https://pagebolt.dev/api/v1/screenshot', {
  method: 'POST',
  headers: {
    'x-api-key': process.env.PAGEBOLT_API_KEY,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://example.com',
    format: 'png'
  })
});

const screenshot = Buffer.from(await response.arrayBuffer());
res.set('Content-Type', 'image/png');
res.send(screenshot);
```
That's it. No memory management. No timeout handling. No infrastructure.
If a request fails or times out, the managed service's retry logic handles it for you. Your code doesn't care.
Real comparison: Puppeteer vs API
Puppeteer (production headache)
```javascript
const puppeteer = require('puppeteer');

let browser;

async function init() {
  browser = await puppeteer.launch({
    args: ['--no-sandbox', '--disable-dev-shm-usage']
  });
}

async function takeScreenshot(url) {
  const page = await browser.newPage();
  try {
    await page.setViewport({ width: 1280, height: 720 });
    // Timeout wrapped in try-catch
    await Promise.race([
      page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 }),
      new Promise((_, reject) =>
        setTimeout(() => reject(new Error('Custom timeout')), 25000)
      )
    ]);
    return await page.screenshot({ type: 'png' });
  } catch (error) {
    console.error('Screenshot failed:', error);
    throw error; // Caller retries? Logs error? Your code handles this
  } finally {
    await page.close(); // must close even on failure, or pages leak
  }
}

app.get('/screenshot', async (req, res) => {
  try {
    const screenshot = await takeScreenshot(req.query.url);
    res.set('Content-Type', 'image/png');
    res.send(screenshot);
  } catch (error) {
    res.status(500).send('Screenshot failed: ' + error.message);
  }
});

// Handle browser crashes
process.on('SIGTERM', async () => {
  await browser.close();
  process.exit(0);
});

// Monitor memory, restart if needed
setInterval(async () => {
  const memory = process.memoryUsage();
  if (memory.heapUsed > 1e9) { // 1GB
    console.log('Memory high, restarting browser...');
    await browser.close();
    browser = await puppeteer.launch({...}); // same args as init()
  }
}, 60000);
```
Issues:
- ~50 lines of boilerplate, before you even add pooling or health checks
- Manual memory management
- Crash recovery logic
- Timeout handling logic
- Still fragile
REST API (a dozen-line solution)
```javascript
const fetch = require('node-fetch'); // Node 18+ has fetch built in

async function takeScreenshot(url) {
  const response = await fetch('https://pagebolt.dev/api/v1/screenshot', {
    method: 'POST',
    headers: {
      'x-api-key': process.env.PAGEBOLT_API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url, format: 'png' })
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return Buffer.from(await response.arrayBuffer());
}

app.get('/screenshot', async (req, res) => {
  try {
    const screenshot = await takeScreenshot(req.query.url);
    res.set('Content-Type', 'image/png');
    res.send(screenshot);
  } catch (error) {
    res.status(500).send('Screenshot failed');
  }
});
```
Advantages:
- A dozen lines of actual code
- No memory management
- No crash recovery logic (the managed service handles crashes)
- No timeout or retry logic (the API handles both)
Cost reality
| Item | Puppeteer | REST API |
|---|---|---|
| Infrastructure | $50-200/month | $0 |
| Scaling | 10 instances = $500/month | auto-scales, same cost |
| DevOps time | 5-10 hours/month | 0 hours |
| Timeout debugging | 10+ hours/month | 0 hours |
| Per-screenshot cost | $0.05-0.20 | $0.02-0.05 |
| Total effective cost (infra + engineer time) | $1,500-2,000/month | $50-100/month |
For most teams: REST API wins by 10-20x.
When to keep Puppeteer
Keep self-hosted Puppeteer if:
- ✅ Processing 10,000+ screenshots/day (economies of scale)
- ✅ Data residency requirement (EU data can't leave EU)
- ✅ Dedicated DevOps team already maintaining it
- ✅ Latency so critical that an extra network hop is unacceptable (rare)
For everyone else: use an API.
PDF generation: Same story
html-pdf, Puppeteer's page.pdf(), wkhtmltopdf: all have the same problems.
Same solution:
```javascript
const response = await fetch('https://pagebolt.dev/api/v1/pdf', {
  method: 'POST',
  headers: {
    'x-api-key': process.env.PAGEBOLT_API_KEY,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    html: '<h1>Invoice #123</h1><p>Total: $99.99</p>',
    format: 'A4'
  })
});

const pdf = Buffer.from(await response.arrayBuffer());
res.set('Content-Type', 'application/pdf');
res.send(pdf);
```
No wkhtmltopdf process management. No hangs. No timeouts.
Getting started
- Sign up at pagebolt.dev (free: 100 requests/month)
- Replace Puppeteer code with fetch() calls (5 minutes)
- Deploy and never think about Puppeteer timeouts again
Stop debugging Puppeteer crashes at 2 AM. Let a managed service handle it.
Start free — 100 screenshots/month, no credit card.