Antonios Thanasis
How We Fixed a Puppeteer Memory Leak in a Laravel IoT App

We were running a Laravel IoT application on an Azure D2s_v3 VM (2 vCPUs, 8GB RAM) and kept hitting a frustrating problem: randomly, Docker commands would hang or fail entirely. No obvious reason. Just… dead.

After digging, we found the culprit: a Puppeteer scraper silently leaking Chromium processes every time it ran multiple scrapes.


πŸ” Investigation

First thing we checked was memory:

ps aux --sort=-%mem | head -20

We spotted multiple orphaned Chromium processes, each eating ~90MB:

green  10374  0.1  1.1  /usr/lib/chromium/chrome --type=renderer --headless ...
green  10367  0.1  1.1  /usr/lib/chromium/chrome --type=renderer --headless ...
green  10347  0.9  1.0  /usr/lib/chromium/chrome --headless --enable-automation ...

Docker stats confirmed the problem:

sudo docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}"
scraper_1       320.5MiB / 7.772GiB  ← 🚨
app_1           143.8MiB / 7.772GiB
queue_worker_1  120.5MiB / 7.772GiB

The scraper container was using much more memory than expected, and it kept growing over time.
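A quick way to keep an eye on this kind of accumulation (a sketch, not from the original setup; the bracket trick keeps grep from matching its own process):

```shell
# Count live Chromium processes; a count that keeps growing
# between scraper runs means processes are leaking
ps aux | grep -c '[c]hromium'
```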


πŸ•΅οΈ The Root Cause

The scraper visits router admin pages using Puppeteer to collect bandwidth data. Originally, the code launched one browser per device and only closed it on success. The consequences:

  • Sequential failures left Chromium processes orphaned.

  • Memory usage grew with each failed scrape.

  • Docker eventually ran out of memory or hung.

Original Problematic Pattern

// ❌ Launches a browser per URL
const browser = await puppeteer.launch();
const page = await browser.newPage();

try {
    await page.goto(url);
    // scrape logic
    await browser.close(); // only runs on success
} catch (err) {
    // on failure, browser.close() is never reached and the
    // Chromium process is orphaned
}
  • Each failure left a Chromium instance alive.

  • With 15+ routers, these quickly stacked up.


✅ The Fix: Single Browser, Multiple Pages

Instead of launching a browser per device, we:

  1. Launch one shared browser.

  2. Open/close a page per URL.

  3. Always close the browser in a finally block.

Runner (scraper logic)

const runner = async ({ url, browser }) => {
    const page = await browser.newPage();

    try {
        // Go to the router URL
        await page.goto(url, { waitUntil: 'load', timeout: 15000 });

        // Fill in login form
        await page.$eval('input[name="username"]', (el, v) => el.value = v, config.teltonika_username);
        await page.$eval('input[name="password"]', (el, v) => el.value = v, config.teltonika_password);
        await page.$eval('input[type="submit"]', btn => btn.click());

        // Wait for the dashboard to render after login (fixed delay;
        // waiting on a known selector like #lb_tx_sum would be more robust,
        // and waitForTimeout is deprecated in newer Puppeteer versions)
        await page.waitForTimeout(15000);

        // Scrape the values
        const sent = await page.$eval("#lb_tx_sum", el => el.textContent.trim());
        const received = await page.$eval("#lb_rx_sum", el => el.textContent.trim());
        const total_today_usage = await page.$eval("#lb_all_sum", el => el.textContent.trim());

        return { fields: { sent, received, total_today_usage } };

    } catch (err) {
        throw err; // let the caller handle errors
    } finally {
        // Always close the page to prevent leaks
        await page.close();
    }
};

Main Scraper Loop

const browser = await puppeteer.launch({
    executablePath: process.env.PUPPETEER_EXECUTABLE_PATH,
    args: ["--ignore-certificate-errors", "--no-sandbox", "--disable-dev-shm-usage"]
});

try {
    for (const dom of domains) {
        const results = await teltonika.exec({ id: dom.id, url: dom.url, browser });
        // send results to API
    }
} finally {
    await browser.close(); // ✅ guaranteed cleanup
}

📊 Results

Metric                        Before         After
Scraper container memory      320.5 MiB      34.77 MiB
Orphaned Chromium processes   3+ (growing)   0 ✅
VM RAM usage                  7/8 GB         Stable
Docker commands failing       Frequently     Never

That's nearly a 10x memory reduction from a single finally block.


🧠 Key Takeaways

  1. Use try/catch/finally with Puppeteer: it ensures browser cleanup even if an error occurs.

  2. Do not launch one browser per URL: reuse a single instance for multiple pages.

  3. Set timeouts for goto, waitForSelector, and waitForTimeout: this avoids hangs on unreachable hosts.

  4. Scrape sequentially for many devices: parallel Chromium launches explode memory usage.

  5. Monitor with ps aux | grep chromium: if processes accumulate, you have a leak.
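The cleanup guarantee in takeaways 1 and 2 is really just a property of try/finally. Here is a minimal, Puppeteer-free sketch (all names are hypothetical stand-ins) showing that every page gets closed even when a scrape throws:

```javascript
// Stand-in for a Puppeteer browser that tracks open pages,
// so we can verify that cleanup always runs.
function makeFakeBrowser() {
    let openPages = 0;
    return {
        newPage: async () => {
            openPages += 1;
            return { close: async () => { openPages -= 1; } };
        },
        openPages: () => openPages,
    };
}

// Same shape as the real scraper loop: one shared browser,
// one page per URL, page.close() guaranteed by finally.
async function scrapeAll(browser, urls) {
    const results = [];
    for (const url of urls) {
        const page = await browser.newPage();
        try {
            // Simulate a scrape that fails for one device
            if (url.includes('bad')) throw new Error(`unreachable: ${url}`);
            results.push({ url, ok: true });
        } catch (err) {
            results.push({ url, ok: false });
        } finally {
            await page.close(); // runs on success AND on failure
        }
    }
    return results;
}

const browser = makeFakeBrowser();
scrapeAll(browser, ['http://router-1', 'http://bad-router', 'http://router-2'])
    .then(results => {
        console.log(results.filter(r => r.ok).length); // 2
        console.log(browser.openPages());              // 0, no leaked pages
    });
```

Swap the fake browser for `puppeteer.launch()` and the pattern is identical to the runner above.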

🔧 Bonus: Quick Cleanup if You're Already Leaking

If you're in this situation right now and need to free memory immediately:

# Kill all orphaned Chromium
pkill -f chromium

# Free OS cache (safe)
sudo sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

# Add swap as a safety net
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
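To confirm the swap actually came up (a quick sanity check, not part of the original post):

```shell
# List active swap areas and show overall memory/swap usage
swapon --show
free -h
```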

This approach ensures Puppeteer scrapers are robust, memory-efficient, and production-safe, even when scraping dozens of devices sequentially.


Top comments (3)

Alex Serebriakov

memory leaks with puppeteer are a classic: browser.close() in finally blocks helps but zombie processes still accumulate over time in long-running services. if screenshot/PDF generation is the part that's leaking, moving it to an API call (like snapapi.pics) offloads the browser process lifecycle entirely. your app just fires HTTP requests, no chrome to leak

Antonios Thanasis

yeah totally agree, puppeteer leaks are kind of inevitable in long-running services 😅 even with finally blocks you still end up chasing zombie processes after a while

offloading screenshots/PDFs to an API is actually a really nice approach, especially if that's the main source of the leaks. not having to manage the browser lifecycle at all is a big win

in our case we wanted to keep everything self-contained, but yeah for a lot of setups that trade-off (less control vs more stability) is definitely worth it 👍

Alex Serebriakov

solid breakdown. puppeteer memory leaks in prod are a real pain, especially long-running workers

been using snapapi.pics to avoid running chromium altogether. REST API for screenshots, no browser processes to leak