
A default Puppeteer Docker image weighs 2.5GB, eats 500MB+ of RAM per browser instance, and crashes under load with cryptic SIGKILL errors. This guide covers the specific techniques I use to get Puppeteer Docker images down to 800MB, keep memory stable in production, and avoid the container-specific failure modes that don't show up in local development.
The Starting Point: Why Default Images Are Huge
When you npm install puppeteer inside a Docker container, it downloads a full Chromium binary (~170MB compressed, ~450MB on disk). Combine that with a node:latest base image (~350MB), development dependencies, npm cache, and build artifacts, and you're easily at 2-2.5GB.
Every megabyte matters. Larger images mean slower pulls, slower deployments, more registry storage costs, and longer cold starts. If you're running screenshots on Lambda or Cloud Run, image size directly impacts startup latency.
Let's fix each layer.
Image Size Reduction: 2.5GB to 800MB
Step 1: Use node:slim Instead of node:latest
The default Node.js image is based on Debian with a full set of tools and libraries you'll never use. The slim variant strips those out.
| Base Image | Size |
|---|---|
| node:20 | ~350MB |
| node:20-slim | ~75MB |
| node:20-alpine | ~50MB |
Alpine is tempting, but I don't recommend it for Puppeteer. Chrome depends on glibc, and Alpine uses musl. You can make it work with extra packages, but you'll fight compatibility issues. Stick with slim.
Step 2: Install Only Required Chromium Dependencies
Chromium needs specific system libraries for rendering, fonts, and GPU interaction. Here's the exact list for Debian-based images:
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
fonts-liberation \
fonts-noto-color-emoji \
libasound2 \
libatk-bridge2.0-0 \
libatk1.0-0 \
libcairo2 \
libcups2 \
libdbus-1-3 \
libdrm2 \
libgbm1 \
libglib2.0-0 \
libgtk-3-0 \
libnspr4 \
libnss3 \
libpango-1.0-0 \
libx11-6 \
libxcb1 \
libxcomposite1 \
libxdamage1 \
libxext6 \
libxfixes3 \
libxkbcommon0 \
libxrandr2 \
wget \
xdg-utils \
&& rm -rf /var/lib/apt/lists/*
The --no-install-recommends flag is critical. Without it, apt-get pulls in dozens of "recommended" packages you don't need. The rm -rf /var/lib/apt/lists/* clears the package cache from the layer.
Step 3: Skip Puppeteer's Bundled Download, Use System Chrome
By default, npm install puppeteer downloads its own Chromium. If you're installing Chrome yourself (for version control or size reasons), skip that download (recent Puppeteer versions also accept the shorter PUPPETEER_SKIP_DOWNLOAD):
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/google-chrome-stable
Then install Chrome directly:
RUN wget -q -O /tmp/chrome.deb \
https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb \
&& apt-get update \
&& apt-get install -y --no-install-recommends /tmp/chrome.deb \
&& rm /tmp/chrome.deb \
&& rm -rf /var/lib/apt/lists/*
This gives you a known, stable Chrome version instead of whatever Chromium revision Puppeteer pins to.
Step 4: Multi-Stage Build
The final optimization: use a multi-stage build to leave behind npm cache, build tools, and anything else that's not needed at runtime.
# Build stage
FROM node:20-slim AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
# Production stage
FROM node:20-slim
WORKDIR /app
# System dependencies for Chrome
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
fonts-liberation \
fonts-noto-color-emoji \
libasound2 \
libatk-bridge2.0-0 \
libatk1.0-0 \
libcairo2 \
libcups2 \
libdbus-1-3 \
libdrm2 \
libgbm1 \
libglib2.0-0 \
libgtk-3-0 \
libnspr4 \
libnss3 \
libpango-1.0-0 \
libx11-6 \
libxcb1 \
libxcomposite1 \
libxdamage1 \
libxext6 \
libxfixes3 \
libxkbcommon0 \
libxrandr2 \
wget \
xdg-utils \
&& rm -rf /var/lib/apt/lists/*
# Install Chrome
RUN wget -q -O /tmp/chrome.deb \
https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb \
&& apt-get update \
&& apt-get install -y --no-install-recommends /tmp/chrome.deb \
&& rm /tmp/chrome.deb \
&& rm -rf /var/lib/apt/lists/*
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/google-chrome-stable
# Copy only production node_modules
COPY --from=builder /app/node_modules ./node_modules
COPY . .
# Run as non-root
RUN groupadd -r pptruser && useradd -r -g pptruser pptruser \
&& mkdir -p /home/pptruser/Downloads \
&& chown -R pptruser:pptruser /home/pptruser /app
USER pptruser
CMD ["node", "server.js"]
Result: ~800MB-1.2GB depending on how many fonts you include. That's a 50-65% reduction from the naive approach.
Size Comparison Summary
| Approach | Image Size |
|---|---|
| node:20 + npm install puppeteer | ~2.5GB |
| node:20-slim + system Chrome | ~1.4GB |
| Multi-stage + slim + system Chrome | ~800MB-1.2GB |
| Multi-stage + slim + @sparticuz/chromium (no full Chrome) | ~400-500MB |
Memory Optimization
Image size affects deployment speed. Memory affects whether your containers stay alive under load. In my experience, memory issues cause 80% of production Puppeteer failures in Docker.
The /dev/shm Problem
This is the most common failure mode for Puppeteer in Docker and Lambda containers. Chrome uses /dev/shm (shared memory) for inter-process communication. In Docker, /dev/shm defaults to 64MB. Chrome routinely needs more than that.
Symptom: Error: Page crashed! or SIGBUS errors under load.
Fix option 1: Increase /dev/shm:
docker run --shm-size=1g my-puppeteer-image
Fix option 2: Disable shared memory usage entirely:
const browser = await puppeteer.launch({
args: ["--disable-dev-shm-usage"],
});
This flag makes Chrome write to /tmp instead of /dev/shm. It's slightly slower but eliminates the size constraint. I use this in every containerized Puppeteer deployment.
Critical Launch Arguments for Containers
const browser = await puppeteer.launch({
executablePath: "/usr/bin/google-chrome-stable",
args: [
"--disable-dev-shm-usage",
"--disable-gpu",
"--disable-software-rasterizer",
"--disable-extensions",
"--disable-background-networking",
"--disable-default-apps",
"--disable-sync",
"--no-first-run",
"--no-sandbox", // Most container runtimes lack the kernel capabilities Chrome's sandbox needs
"--disable-setuid-sandbox",
"--metrics-recording-only",
"--mute-audio",
"--hide-scrollbars",
],
headless: "new",
});
A note on --no-sandbox: Chrome's sandbox requires specific kernel capabilities that aren't available in most container runtimes. Running as a non-root user (as in the Dockerfile above) is the right security mitigation. Don't run as root with --no-sandbox in production.
--single-process for Low-Memory Environments
Chrome normally spawns multiple processes: a browser process, a GPU process, renderer processes for each tab, and utility processes. In a constrained container, this can mean 6-8 processes for a single page.
args: ["--single-process"]
This collapses everything into one process. Benefits: lower total memory usage, faster startup. Tradeoffs: a crash in one tab kills the whole browser, and some Chrome features don't work in single-process mode. For a screenshot service handling one page at a time, it's a good tradeoff.
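One way to apply this selectively is to build the args array based on the container's memory budget, so the same image runs multi-process on a 4GB node and single-process on a small Lambda. A sketch with illustrative thresholds (buildLaunchArgs is a hypothetical helper; --no-zygote is commonly paired with --single-process in containers):

```javascript
// Base flags are the container set from the section above.
// --single-process (plus --no-zygote) is added only below a memory threshold.
// Both threshold values are illustrative, not tuned recommendations.
function buildLaunchArgs({ memoryLimitMb = Infinity, lowMemoryThresholdMb = 1024 } = {}) {
  const args = [
    "--disable-dev-shm-usage",
    "--disable-gpu",
    "--no-sandbox",
    "--disable-setuid-sandbox",
  ];
  if (memoryLimitMb <= lowMemoryThresholdMb) {
    args.push("--single-process", "--no-zygote");
  }
  return args;
}
```

Feed the container's limit in via an environment variable you control, e.g. `buildLaunchArgs({ memoryLimitMb: Number(process.env.MEMORY_LIMIT_MB) })`.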
Page Lifecycle Discipline
The most impactful memory optimization isn't a Chrome flag. It's closing pages when you're done with them.
// Bad: pages accumulate, memory grows until OOM
async function captureScreenshot(url) {
const page = await browser.newPage();
await page.goto(url);
const screenshot = await page.screenshot();
return screenshot;
// page is never closed!
}
// Good: always close in a finally block
async function captureScreenshot(url) {
const page = await browser.newPage();
try {
await page.goto(url);
return await page.screenshot();
} finally {
await page.close();
}
}
Each unclosed page holds references to the DOM, JavaScript heap, and network state. Ten leaked pages can eat 1-2GB.
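The finally-block pattern above can be factored into a small wrapper so no call site can forget the close. A sketch (withPage is a hypothetical helper; it works with any object exposing newPage(), i.e. a Puppeteer Browser):

```javascript
// Every caller gets a fresh page and the close is guaranteed,
// even when fn throws. The .catch() on close avoids masking
// the original error with a secondary close failure.
async function withPage(browser, fn) {
  const page = await browser.newPage();
  try {
    return await fn(page);
  } finally {
    await page.close().catch(() => {});
  }
}
```

Usage: `const png = await withPage(browser, async (page) => { await page.goto(url); return page.screenshot(); });`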
Browser Recycling
Creating a new browser instance for every request is wasteful (5-8 seconds of startup each time). But keeping one browser alive forever leads to memory leaks from Chrome's internal state.
The sweet spot: recycle the browser after N requests.
const MAX_REQUESTS_PER_BROWSER = 50;
let browser = null;
let requestCount = 0;
async function getBrowser() {
if (!browser || !browser.isConnected() || requestCount >= MAX_REQUESTS_PER_BROWSER) {
if (browser) {
await browser.close().catch(() => {});
}
browser = await puppeteer.launch({
executablePath: "/usr/bin/google-chrome-stable",
args: [
"--disable-dev-shm-usage",
"--disable-gpu",
"--no-sandbox",
"--disable-setuid-sandbox",
],
headless: "new",
});
requestCount = 0;
}
requestCount++;
return browser;
}
I've found 50-100 requests per browser is a good balance. Below 20 and you're paying too much startup cost. Above 200 and memory creep becomes noticeable.
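Request count is a blunt proxy. You can also factor in the service's own footprint via process.memoryUsage(), keeping in mind that Chrome runs as separate processes, so the Node RSS is only a partial signal. A sketch with illustrative thresholds (shouldRecycle is a hypothetical helper):

```javascript
// Recycle when either the request budget is spent or the service's
// resident set crosses a threshold. Chrome's own memory lives in
// separate processes, so rssBytes only reflects the Node side.
// Both defaults are illustrative, not tuned values.
function shouldRecycle(
  { requestCount, rssBytes },
  { maxRequests = 50, maxRssBytes = 1.5 * 1024 ** 3 } = {}
) {
  return requestCount >= maxRequests || rssBytes >= maxRssBytes;
}
```

In getBrowser() above, the condition becomes `shouldRecycle({ requestCount, rssBytes: process.memoryUsage().rss })`.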
Performance Optimization
/dev/shm Sizing
If you're using --disable-dev-shm-usage, skip this section. If you'd rather keep shared memory for performance:
# docker-compose.yml
services:
screenshot:
shm_size: "2g"
Rule of thumb: allocate 256MB per concurrent page you plan to render. Four concurrent pages means 1GB minimum.
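That rule of thumb as a tiny calculation, useful when concurrency is configurable per deployment (shmSizeMb is a hypothetical helper; the 512MB floor is my assumption, not from the rule above):

```javascript
// ~256MB of /dev/shm per concurrent page, with a floor so a
// single-page service still gets a workable allocation.
function shmSizeMb(concurrentPages, mbPerPage = 256, floorMb = 512) {
  return Math.max(concurrentPages * mbPerPage, floorMb);
}
```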
CPU Allocation
Chrome rendering is CPU-bound. Skimping on CPU creates a bottleneck where pages take 10-15 seconds instead of 2-3 seconds.
services:
screenshot:
deploy:
resources:
limits:
cpus: "2.0"
memory: 4G
reservations:
cpus: "1.0"
memory: 2G
One full CPU core handles about 3-5 concurrent screenshot requests comfortably. Two cores handle 8-12.
Concurrent Page Limits
Don't let your service open unlimited pages. Each page consumes 100-300MB. Set a hard limit and queue requests beyond it.
const pLimit = require("p-limit");
const MAX_CONCURRENT_PAGES = 4;
const limit = pLimit(MAX_CONCURRENT_PAGES);
async function handleScreenshotRequest(url) {
return limit(async () => {
const page = await browser.newPage();
try {
await page.goto(url, { waitUntil: "networkidle2", timeout: 15000 });
return await page.screenshot({ type: "png" });
} finally {
await page.close();
}
});
}
Docker Compose for Local Development
A complete setup for local development with a screenshot service:
version: "3.8"
services:
screenshot-service:
build:
context: .
dockerfile: Dockerfile
ports:
- "3000:3000"
shm_size: "2g"
environment:
- NODE_ENV=development
- MAX_CONCURRENT_PAGES=4
- BROWSER_RECYCLE_AFTER=50
deploy:
resources:
limits:
cpus: "2.0"
memory: 4G
healthcheck:
test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 10s
restart: unless-stopped
redis:
image: redis:7-alpine
ports:
- "6379:6379"
volumes:
- redis-data:/data
volumes:
redis-data:
The Redis service is optional but useful for caching screenshots by URL hash. A simple SET url_hash screenshot_base64 EX 3600 gives you one-hour caching with almost no code.
Here's a minimal Express server that ties it together:
const express = require("express");
const puppeteer = require("puppeteer-core");
const pLimit = require("p-limit");
const app = express();
const PORT = process.env.PORT || 3000;
const MAX_PAGES = parseInt(process.env.MAX_CONCURRENT_PAGES || "4");
const limit = pLimit(MAX_PAGES);
let browser;
async function initBrowser() {
browser = await puppeteer.launch({
executablePath: process.env.PUPPETEER_EXECUTABLE_PATH || "/usr/bin/google-chrome-stable",
args: [
"--disable-dev-shm-usage",
"--disable-gpu",
"--no-sandbox",
"--disable-setuid-sandbox",
],
headless: "new",
});
}
app.get("/health", (req, res) => {
if (browser && browser.isConnected()) {
res.status(200).json({ status: "ok" });
} else {
res.status(503).json({ status: "unhealthy" });
}
});
app.get("/screenshot", async (req, res) => {
const { url, width, height, fullPage } = req.query;
if (!url) return res.status(400).json({ error: "url required" });
try {
const screenshot = await limit(async () => {
const page = await browser.newPage();
try {
await page.setViewport({
width: parseInt(width) || 1920,
height: parseInt(height) || 1080,
});
await page.goto(url, { waitUntil: "networkidle2", timeout: 20000 });
return await page.screenshot({
type: "png",
fullPage: fullPage === "true",
});
} finally {
await page.close();
}
});
res.set("Content-Type", "image/png");
res.send(screenshot);
} catch (err) {
res.status(500).json({ error: err.message });
}
});
initBrowser().then(() => {
app.listen(PORT, () => console.log(`Screenshot service on port ${PORT}`));
});
Kubernetes Deployment Considerations
Running Puppeteer containers in Kubernetes adds another layer of concerns.
Resource Limits
Be generous with resource limits and strict with requests. Chrome is spiky.
apiVersion: apps/v1
kind: Deployment
metadata:
name: screenshot-service
spec:
replicas: 3
selector:
matchLabels:
app: screenshot-service
template:
metadata:
labels:
app: screenshot-service
spec:
containers:
- name: screenshot
image: your-registry/screenshot-service:latest
ports:
- containerPort: 3000
resources:
requests:
cpu: "1000m"
memory: "2Gi"
limits:
cpu: "2000m"
memory: "4Gi"
livenessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 15
periodSeconds: 30
timeoutSeconds: 10
failureThreshold: 3
readinessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 2
volumeMounts:
- name: shm
mountPath: /dev/shm
volumes:
- name: shm
emptyDir:
medium: Memory
sizeLimit: 2Gi
Key points:
- The /dev/shm volume mount. Kubernetes doesn't have a shm_size equivalent. You mount an emptyDir with medium: Memory at /dev/shm. This uses RAM from the node, so account for it in your memory limits.
- Liveness probe with generous thresholds. Chrome can hang under memory pressure. The liveness probe catches this and restarts the pod. initialDelaySeconds: 15 gives Chrome time to start.
- Readiness probe. Prevents Kubernetes from sending traffic to pods where Chrome crashed and hasn't recovered yet.
Graceful Shutdown
When Kubernetes sends SIGTERM, you need to close Chrome cleanly or you'll leak zombie processes.
process.on("SIGTERM", async () => {
console.log("SIGTERM received, shutting down gracefully");
if (browser) {
await browser.close().catch(() => {});
}
process.exit(0);
});
process.on("SIGINT", async () => {
if (browser) {
await browser.close().catch(() => {});
}
process.exit(0);
});
Set terminationGracePeriodSeconds: 30 in your pod spec to give in-flight requests time to complete.
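If requests may still be in flight when SIGTERM arrives, drain them before closing the browser. A sketch using a simple in-flight counter (track and drain are hypothetical helpers; the 25-second default is chosen to fit inside a 30-second grace period):

```javascript
// Wrap each request handler in track() so drain() can see in-flight work.
let inFlight = 0;
const track = async (work) => {
  inFlight++;
  try { return await work(); } finally { inFlight--; }
};

// Poll until all tracked work finishes or the deadline passes.
// Returns false if the grace-period deadline was hit with work pending.
async function drain(timeoutMs = 25000, pollMs = 100) {
  const deadline = Date.now() + timeoutMs;
  while (inFlight > 0 && Date.now() < deadline) {
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  return inFlight === 0;
}
```

In the SIGTERM handler above, `await drain()` before `browser.close()`, so screenshots already rendering finish before Chrome goes away.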
When to Stop Optimizing and Use an API
There's a point where this optimization loop stops being productive. You've spent days getting the image small, the memory stable, and the concurrency right. Then Chrome updates and breaks something, or a new page type causes OOM crashes, and you're back to debugging.
If your core need is "turn URLs into images," a screenshot API removes the entire infrastructure layer. SnapRender, for instance, handles the Chrome management, caching, device emulation, and scaling internally. One GET request, one API key, done. You trade fine-grained control for not having to deal with any of this.
But if you need browser automation beyond screenshots (form filling, scraping, testing), or you need to run custom JavaScript on pages before capture, the Docker approach is the right one. Just budget time for ongoing maintenance. Chrome doesn't stand still, and neither will your container configuration.