DEV Community

Dennis

Profiling Puppeteer Memory Usage in Node.js


A Puppeteer memory leak shows up as RSS growing over hours until your container gets OOM-killed. Finding the actual source requires heap snapshots, not guesswork. This guide walks through profiling Puppeteer memory step by step: setting up monitoring, establishing a baseline, running load tests, capturing heap snapshots, comparing them to find retained objects, and identifying what's actually leaking.

Understanding Node.js Memory Metrics

Before profiling, you need to know what the numbers mean. Node.js exposes four memory metrics through process.memoryUsage():

| Metric | What It Measures | Puppeteer Relevance |
| --- | --- | --- |
| rss | Resident Set Size: total physical memory allocated to the process | Your actual memory footprint. Includes code, stack, heap, and buffers. |
| heapUsed | V8 heap memory currently in use | JavaScript objects. If this grows, you have a JS-level leak. |
| heapTotal | V8 heap memory allocated | V8 may allocate more than it uses. Growth here without heapUsed growth means fragmentation. |
| external | Memory used by C++ objects bound to JS objects | Buffers for screenshots, page content. Large screenshots spike this. |

Here's the thing most articles miss: Puppeteer's real memory usage is in Chrome's child processes, not in Node.js. The Node.js process just holds references. A "small" Node.js heap with 2GB of Chrome processes is still a 2GB problem.

function logMemory(label = '') {
  const mem = process.memoryUsage();
  const format = bytes => (bytes / 1024 / 1024).toFixed(1) + 'MB';
  console.log(
    `[${label}] rss=${format(mem.rss)} heap=${format(mem.heapUsed)}/${format(mem.heapTotal)} external=${format(mem.external)}`
  );
}

Step 1: Set Up Continuous Memory Monitoring

The first thing I do when investigating a Puppeteer memory leak is add time-series tracking. You need to see memory over time, not at a single point.

const fs = require('fs');

class MemoryTracker {
  constructor(logPath = './memory-profile.jsonl') {
    this.logPath = logPath;
    this.stream = fs.createWriteStream(logPath, { flags: 'a' });
    this.requestCount = 0;
    this.interval = null;
  }

  start(intervalMs = 5000) {
    this.interval = setInterval(() => this.record(), intervalMs);
    this.record(); // Capture initial state
  }

  stop() {
    if (this.interval) clearInterval(this.interval);
    this.stream.end();
  }

  incrementRequests() {
    this.requestCount++;
  }

  record() {
    const mem = process.memoryUsage();
    const entry = {
      ts: Date.now(),
      requests: this.requestCount,
      rss: mem.rss,
      heapUsed: mem.heapUsed,
      heapTotal: mem.heapTotal,
      external: mem.external,
    };
    this.stream.write(JSON.stringify(entry) + '\n');
  }
}

// Usage
const tracker = new MemoryTracker();
tracker.start(5000); // Log every 5 seconds

// In your request handler:
async function handleScreenshot(url) {
  tracker.incrementRequests();
  // ... capture logic ...
}

This writes JSONL that you can graph later. I usually pipe it through a quick script to generate a text chart, or import it into a spreadsheet.

Step 2: Establish a Baseline

Run your service with zero load for 2 minutes and record the memory. This is your baseline.

const puppeteer = require('puppeteer');

async function measureBaseline() {
  console.log('=== BASELINE: Before browser launch ===');
  logMemory('pre-launch');

  const browser = await puppeteer.launch({
    headless: 'new',
    args: ['--no-sandbox', '--disable-dev-shm-usage'],
  });

  console.log('=== BASELINE: After browser launch ===');
  logMemory('post-launch');

  // Open and close a single page to warm up
  const page = await browser.newPage();
  await page.goto('about:blank');
  await page.close();

  console.log('=== BASELINE: After warmup page ===');
  logMemory('post-warmup');

  return browser;
}

Typical baseline numbers:

| State | RSS | Heap Used |
| --- | --- | --- |
| Before launch | 50-80MB | 15-25MB |
| After launch | 150-250MB | 25-40MB |
| After warmup | 160-270MB | 30-45MB |

If your baseline is already high, check your launch args. Running with extensions, GPU enabled, or a non-headless mode uses significantly more memory.
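If the baseline looks bloated, a leaner config is worth trying. These are standard Chromium flags, but treat the exact set as a starting point to verify against your workload rather than a prescription:

```javascript
// Launch options trimmed for memory. Each flag trades a feature for footprint;
// drop any that your pages actually need.
const leanLaunchOptions = {
  headless: 'new',
  args: [
    '--no-sandbox',
    '--disable-dev-shm-usage',         // avoid the tiny /dev/shm in containers
    '--disable-gpu',                   // skip the GPU process in headless
    '--disable-extensions',            // no extension processes
    '--disable-background-networking', // skip update/safe-browsing traffic
    '--js-flags=--max-old-space-size=256', // cap each renderer's V8 heap at 256MB
  ],
};

// const browser = await puppeteer.launch(leanLaunchOptions);
```

Re-run the baseline after each flag change so you know which one actually moved the number.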

Step 3: Run a Controlled Load Test

Now stress the system with a known workload. The goal is to see how memory behaves under sustained load.

async function loadTest(browser, urls, concurrency = 3) {
  const tracker = new MemoryTracker('./load-test.jsonl');
  tracker.start(2000);

  let completed = 0;
  const startTime = Date.now();

  async function processUrl(url) {
    const page = await browser.newPage();
    try {
      await page.goto(url, {
        waitUntil: 'networkidle2',
        timeout: 15000,
      });
      await page.screenshot({ type: 'png' });
    } catch (err) {
      console.error(`Failed: ${url} - ${err.message}`);
    } finally {
      await page.close().catch(() => {});
      completed++;
      tracker.incrementRequests();
    }
  }

  // Process in batches
  for (let i = 0; i < urls.length; i += concurrency) {
    const batch = urls.slice(i, i + concurrency);
    await Promise.all(batch.map(url => processUrl(url)));

    // Log progress
    const elapsed = ((Date.now() - startTime) / 1000).toFixed(1);
    const mem = process.memoryUsage();
    console.log(
      `Completed ${completed}/${urls.length} in ${elapsed}s | RSS: ${(mem.rss / 1024 / 1024).toFixed(1)}MB`
    );
  }

  tracker.stop();
  return tracker.logPath;
}

// Test with 200 requests to the same page (run inside an async function)
const testUrls = Array(200).fill('https://example.com');
const logFile = await loadTest(browser, testUrls, 3);

What to look for in the output:

  • RSS grows, then plateaus: Normal. V8 allocates pools and reuses them.
  • RSS grows linearly and never stops: You have a Puppeteer memory leak. Every request adds retained memory.
  • RSS grows in steps: Chrome is spawning extra processes. Check if pages are actually being closed.
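Those three patterns can also be distinguished from the log programmatically. A rough heuristic sketch (the thresholds here, 5% plateau slack and 30%/60% jump shares, are guesses to tune against your data):

```javascript
// Classify an RSS time series as 'plateau', 'linear', or 'steps'.
function classifyRssTrend(samples) {
  const total = samples[samples.length - 1] - samples[0];
  // Negligible overall growth: V8 reusing its pools
  if (total <= samples[0] * 0.05) return 'plateau';
  const deltas = [];
  for (let i = 1; i < samples.length; i++) deltas.push(samples[i] - samples[i - 1]);
  // A few large jumps carrying most of the growth suggests step-wise
  // process spawning rather than per-request retention
  const bigJumps = deltas.filter(d => d > total * 0.3);
  const bigJumpTotal = bigJumps.reduce((a, b) => a + b, 0);
  if (bigJumps.length > 0 && bigJumpTotal > total * 0.6) return 'steps';
  return 'linear';
}
```

Feed it the `rss` values from the JSONL log once the run finishes; 'linear' is the result that warrants heap snapshots.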

Step 4: Capture Heap Snapshots with CDP

The Chrome DevTools Protocol gives you direct access to V8's heap profiler. This is where you find exactly what objects are being retained.

const fs = require('fs');

async function captureHeapSnapshot(page, filename) {
  const client = await page.target().createCDPSession();
  await client.send('HeapProfiler.enable');

  // Collect chunks of the heap snapshot as they stream in
  const chunks = [];
  client.on('HeapProfiler.addHeapSnapshotChunk', ({ chunk }) => {
    chunks.push(chunk);
  });

  // Force GC first so we only see truly retained objects
  await client.send('HeapProfiler.collectGarbage');

  // Take the snapshot; all chunks arrive before this resolves
  await client.send('HeapProfiler.takeHeapSnapshot', {
    reportProgress: false,
  });

  // Write to file (can be loaded in Chrome DevTools)
  const snapshot = chunks.join('');
  fs.writeFileSync(filename, snapshot);
  console.log(`Heap snapshot saved: ${filename} (${(snapshot.length / 1024 / 1024).toFixed(1)}MB)`);

  await client.detach();
}

Take snapshots at strategic points:

async function profileWithSnapshots(browser) {
  // Snapshot 1: Clean state
  const warmupPage = await browser.newPage();
  await warmupPage.goto('about:blank');
  await captureHeapSnapshot(warmupPage, 'heap-01-baseline.heapsnapshot');
  await warmupPage.close();

  // Process 50 pages
  for (let i = 0; i < 50; i++) {
    const page = await browser.newPage();
    await page.goto('https://example.com', { waitUntil: 'networkidle2' });
    await page.screenshot();
    await page.close();
  }

  // Snapshot 2: After 50 requests
  const checkPage1 = await browser.newPage();
  await checkPage1.goto('about:blank');
  await captureHeapSnapshot(checkPage1, 'heap-02-after50.heapsnapshot');
  await checkPage1.close();

  // Process 50 more
  for (let i = 0; i < 50; i++) {
    const page = await browser.newPage();
    await page.goto('https://example.com', { waitUntil: 'networkidle2' });
    await page.screenshot();
    await page.close();
  }

  // Snapshot 3: After 100 requests
  const checkPage2 = await browser.newPage();
  await checkPage2.goto('about:blank');
  await captureHeapSnapshot(checkPage2, 'heap-03-after100.heapsnapshot');
  await checkPage2.close();
}

Step 5: Compare Snapshots to Find Retained Objects

Load the .heapsnapshot files in Chrome DevTools (Memory tab > Load). Switch to "Comparison" view between snapshot 1 and snapshot 3.

Sort by "Size Delta" descending. The biggest growers are your leak suspects.

Common retained objects in Puppeteer leaks:

| Object Type | Likely Cause |
| --- | --- |
| JSArrayBuffer | Screenshot buffers not released |
| (string) growing | Console messages, page content retained |
| EventEmitter / listeners | Event handlers not removed |
| CDPSession | CDP sessions not detached |
| ExecutionContext | Page contexts surviving page.close() |
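The EventEmitter case is checkable without a snapshot at all: Puppeteer's Page and Browser objects are EventEmitters, so listener counts can be inspected directly. A small sketch (`listenerReport` is an illustrative helper, not a Puppeteer API):

```javascript
// List registered events and listener counts on any EventEmitter
// (Puppeteer's Page and Browser objects qualify).
function listenerReport(emitter) {
  return emitter.eventNames().map(name => ({
    event: String(name),
    count: emitter.listenerCount(name),
  }));
}

// If the count for e.g. 'console' or 'response' climbs with every request,
// a handler is being added without a matching removeListener.
// console.table(listenerReport(browser));
```

Log this every N requests during the load test; a monotonically growing count pinpoints the leaking handler faster than snapshot diffing.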

You can also do this programmatically if you want automated leak detection:

async function getHeapStats(page) {
  const client = await page.target().createCDPSession();
  await client.send('HeapProfiler.collectGarbage');

  const { result } = await client.send('Runtime.evaluate', {
    expression: `({
      jsHeapSizeLimit: performance.memory?.jsHeapSizeLimit,
      totalJSHeapSize: performance.memory?.totalJSHeapSize,
      usedJSHeapSize: performance.memory?.usedJSHeapSize,
    })`,
    returnByValue: true,
  });

  await client.detach();
  return result.value;
}
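To turn getHeapStats into an automated check, diff two readings taken some number of requests apart. A sketch (the 10MB default threshold is a guess; calibrate it against your baseline):

```javascript
// Compare two getHeapStats() readings and flag suspicious in-page heap growth.
function heapGrowth(before, after, thresholdBytes = 10 * 1024 * 1024) {
  const deltaBytes = after.usedJSHeapSize - before.usedJSHeapSize;
  return {
    deltaBytes,
    deltaMB: +(deltaBytes / 1024 / 1024).toFixed(1),
    suspicious: deltaBytes > thresholdBytes,
  };
}

// const before = await getHeapStats(page);
// // ... run 50 requests ...
// const after = await getHeapStats(page);
// if (heapGrowth(before, after).suspicious) console.warn('in-page heap is growing');
```

Wire the `suspicious` flag into your monitoring and you get leak detection without a human staring at DevTools.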

Step 6: Use Node.js --inspect for Live Profiling

For real-time investigation, start your process with the --inspect flag and connect Chrome DevTools:

node --inspect --max-old-space-size=512 server.js

Then open chrome://inspect in Chrome and click "inspect" next to your Node.js process.

In the Memory tab, you can:

  1. Take heap snapshots at any time
  2. Use "Allocation instrumentation on timeline" to see which functions allocate the most
  3. Use "Allocation sampling" for lower-overhead production profiling

The allocation timeline is particularly useful. Start recording, run 10-20 screenshot requests, stop recording. Blue bars are allocations that were garbage collected (normal). Red bars are allocations that are still alive (potential leaks).

Programmatic Heap Dumps with heapdump

For production systems where you can't attach a debugger, use the heapdump module:

const heapdump = require('heapdump');
const path = require('path');

// Dump heap on demand via signal
process.on('SIGUSR2', () => {
  const filename = path.join(
    '/tmp',
    `heapdump-${process.pid}-${Date.now()}.heapsnapshot`
  );
  heapdump.writeSnapshot(filename, (err, filepath) => {
    if (err) console.error('Heap dump failed:', err);
    else console.log('Heap dump written to', filepath);
  });
});

// Auto-dump when memory exceeds threshold
function watchMemory(thresholdMB = 500) {
  setInterval(() => {
    const { heapUsed } = process.memoryUsage();
    const usedMB = heapUsed / 1024 / 1024;

    if (usedMB > thresholdMB) {
      const filename = path.join(
        '/tmp',
        `heapdump-oom-${process.pid}-${Date.now()}.heapsnapshot`
      );
      heapdump.writeSnapshot(filename);
      console.warn(`Heap exceeded ${thresholdMB}MB, dumped to ${filename}`);
    }
  }, 30000);
}

Trigger a dump with kill -USR2 <pid>. Download the file and load it in Chrome DevTools.

When Growth Is Not a Leak

Not all memory growth is a Puppeteer memory leak. Some growth is expected:

V8 heap pool allocation: V8 allocates memory in chunks. After the first few hundred requests, it may grab 50-100MB more heap space. If heapUsed stays flat while heapTotal grows, V8 just pre-allocated room. This growth stops eventually.

Shared library loading: The first time Chrome encounters certain content (WebGL, video codecs, PDF rendering), it loads shared libraries. These stay loaded. This is one-time growth.

Disk cache warming: If you haven't disabled the disk cache, Chrome caches resources in memory before flushing to disk. This looks like temporary growth.

Buffer pool: Node.js maintains a pool of Buffer objects. The first batch of screenshots causes the pool to grow. It doesn't shrink, but it doesn't keep growing either.

How to tell the difference:

async function isLeaking(durationMinutes = 10, sampleIntervalMs = 10000) {
  const samples = [];

  return new Promise(resolve => {
    const interval = setInterval(() => {
      const mem = process.memoryUsage();
      samples.push({
        ts: Date.now(),
        rss: mem.rss,
        heapUsed: mem.heapUsed,
      });
    }, sampleIntervalMs);

    setTimeout(() => {
      clearInterval(interval);

      // Compare average RSS between the first and second halves of the window
      const n = samples.length;
      const half = Math.floor(n / 2);
      const firstHalfAvg =
        samples.slice(0, half).reduce((s, x) => s + x.rss, 0) / half;
      const secondHalfAvg =
        samples.slice(half).reduce((s, x) => s + x.rss, 0) / (n - half);

      const growthMB = (secondHalfAvg - firstHalfAvg) / 1024 / 1024;

      console.log(`First half avg RSS: ${(firstHalfAvg / 1024 / 1024).toFixed(1)}MB`);
      console.log(`Second half avg RSS: ${(secondHalfAvg / 1024 / 1024).toFixed(1)}MB`);
      console.log(`Growth: ${growthMB.toFixed(1)}MB`);

      // If growth is more than 50MB over the test period, likely a leak
      resolve({
        leaking: growthMB > 50,
        growthMB: growthMB,
        samples: samples.length,
      });
    }, durationMinutes * 60 * 1000);
  });
}

Full Profiling Session Example

Putting it all together. Here's a complete profiling script you can drop into your project:

const puppeteer = require('puppeteer');
const fs = require('fs');

const LOG_FILE = './memory-profile.jsonl';
const SNAPSHOT_DIR = './snapshots';

async function fullProfile() {
  fs.mkdirSync(SNAPSHOT_DIR, { recursive: true });
  const log = fs.createWriteStream(LOG_FILE, { flags: 'w' });

  function record(label, extra = {}) {
    const mem = process.memoryUsage();
    const entry = {
      ts: Date.now(),
      label,
      rss: mem.rss,
      heapUsed: mem.heapUsed,
      heapTotal: mem.heapTotal,
      external: mem.external,
      ...extra,
    };
    log.write(JSON.stringify(entry) + '\n');
    const mb = n => (n / 1024 / 1024).toFixed(1);
    console.log(
      `[${label}] RSS=${mb(mem.rss)}MB heap=${mb(mem.heapUsed)}MB ext=${mb(mem.external)}MB`
    );
  }

  record('start');

  const browser = await puppeteer.launch({
    headless: 'new',
    args: ['--no-sandbox', '--disable-dev-shm-usage'],
  });

  record('browser-launched');

  const testUrl = process.argv[2] || 'https://example.com';
  const iterations = parseInt(process.argv[3] || '100', 10);

  // Capture snapshots at 0%, 50%, 100%
  const snapshotAt = new Set([0, Math.floor(iterations / 2), iterations - 1]);

  for (let i = 0; i < iterations; i++) {
    const page = await browser.newPage();

    if (snapshotAt.has(i)) {
      await page.goto('about:blank');
      const snapshotFile = `${SNAPSHOT_DIR}/heap-${i}.heapsnapshot`;
      const client = await page.target().createCDPSession();
      const chunks = [];
      client.on('HeapProfiler.addHeapSnapshotChunk', ({ chunk }) => chunks.push(chunk));
      await client.send('HeapProfiler.collectGarbage');
      await client.send('HeapProfiler.takeHeapSnapshot', { reportProgress: false });
      fs.writeFileSync(snapshotFile, chunks.join(''));
      await client.detach();
      console.log(`Snapshot saved: ${snapshotFile}`);
    }

    try {
      await page.goto(testUrl, { waitUntil: 'networkidle2', timeout: 15000 });
      await page.screenshot({ type: 'png' });
    } catch (err) {
      console.error(`Request ${i} failed: ${err.message}`);
    } finally {
      await page.close().catch(() => {});
    }

    record(`request-${i}`, { requestNum: i });
  }

  await browser.close();
  record('done');
  log.end();

  console.log(`\nProfile complete. Data: ${LOG_FILE}`);
  console.log(`Snapshots: ${SNAPSHOT_DIR}/`);
  console.log('Load .heapsnapshot files in Chrome DevTools (Memory tab) to analyze.');
}

fullProfile().catch(console.error);

Run it:

node --max-old-space-size=512 profile.js https://your-target-site.com 200

Then load the heap snapshots in Chrome DevTools and compare them. The objects that grow between snapshots are your leak.

When Profiling Points to Infrastructure, Not Code

Sometimes the profiling shows that your code is clean. Pages are closed properly, no listeners are leaking, no closures retaining data. But RSS still grows because Chrome itself fragments memory over thousands of page loads.

At that point you have two options: browser recycling (restart Chrome every N requests) or offloading the problem entirely. Screenshot APIs like SnapRender handle browser lifecycle management on their infrastructure, which makes the Puppeteer memory leak their problem to profile, not yours. For high-volume workloads, that tradeoff often makes financial sense once you factor in the engineering hours spent on profiling sessions like this one.
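The recycling option can be as small as the sketch below. The launch function is injected so the class stays testable without Chrome; `maxRequests` is a knob to tune against your own fragmentation curve:

```javascript
// Relaunch the browser every maxRequests acquisitions to shed fragmented memory.
// launchFn: () => Promise<Browser>, e.g. () => puppeteer.launch(opts)
class RecyclingBrowser {
  constructor(launchFn, maxRequests = 100) {
    this.launchFn = launchFn;
    this.maxRequests = maxRequests;
    this.browser = null;
    this.count = 0;
  }

  async acquire() {
    // Relaunch on first use or once the request budget is spent
    if (!this.browser || this.count >= this.maxRequests) {
      if (this.browser) await this.browser.close().catch(() => {});
      this.browser = await this.launchFn();
      this.count = 0;
    }
    this.count++;
    return this.browser;
  }
}

// Usage:
// const pool = new RecyclingBrowser(() => puppeteer.launch({ headless: 'new' }), 100);
// const browser = await pool.acquire();
```

Note that in-flight pages die when the browser restarts, so in a real service you'd drain active requests before closing; that plumbing is omitted here.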

But for lower volumes or when you need full browser control, the profiling workflow above will find your leak. The key principle: measure, don't guess. Heap snapshots don't lie.
