Axel Navarro for Cloud(x);

How to use Puppeteer inside a Docker container

Introduction

Puppeteer is a Node.js library which provides a high-level API to control Chromium (or Firefox) browsers over the DevTools Protocol.

This guide shows how to run Puppeteer inside a Docker container based on the official Node.js image.

If we use the Docker image for Node.js v16 LTS (Gallium) and install the chromium package from apt, we get v90.0, which can have compatibility issues with the latest Puppeteer. This is because each Puppeteer release is tested against the latest stable Chromium release.

Selecting the correct image

Well... we want to run a web browser inside a container, so it's important to know the differences between the available image variants.

Alpine is enough but ...

Yeah, we can run Chromium using Alpine Linux, but we'll need a few extra steps to make it run. That's why we prefer Debian variants to make it easier.

Which distro?

The image for each major version of Node.js is built on a specific version of Debian, and that Debian version ships an old version of Chromium, which may not be compatible with the latest version of Puppeteer.

Node.js   Debian   Chromium
v14       9.13     73.0.3683.75
v16       10.9     90.0.4430.212
v17       11.2     99.0.4844.84

To quickly solve that issue we can use Google Chrome's Debian package, which always installs the latest stable version. Therefore, this Dockerfile is compatible with Node.js v14, v16, or any newer version.

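Why not the built-in Chromium?

The Chromium that Puppeteer downloads is only the browser binary; the system libraries it needs are not part of the Node.js image. When we install Google Chrome, apt will install all the dependencies for us.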

Dockerfile

FROM node:slim AS app

# We don't need the standalone Chromium
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true

# Install Google Chrome Stable and fonts
# Note: this installs the necessary libs to make the browser work with Puppeteer.
RUN apt-get update && apt-get install curl gnupg -y \
  && curl --location --silent https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
  && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
  && apt-get update \
  && apt-get install google-chrome-stable -y --no-install-recommends \
  && rm -rf /var/lib/apt/lists/*

# Install your app here...

💡 If you are on an ARM-based CPU like Apple M1, you should use the --platform argument when you build the Docker image, because the Chrome repository above is pinned to amd64.

docker build --platform linux/amd64 -t image-name .
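As a rough sketch, a small Node.js script could decide whether the --platform flag is needed by looking at the host architecture. The helper names `needsPlatformFlag` and `buildCommand` are made up for this example, and it assumes that `x64` is the only host architecture that can run the amd64 image natively.

```javascript
const os = require('os');

// Hypothetical helper: the Chrome repository in the Dockerfile is pinned to
// amd64, so any host that is not x64 needs the --platform flag (emulation).
function needsPlatformFlag(arch = os.arch()) {
  return arch !== 'x64';
}

// Hypothetical helper: builds the docker build command for a given image name.
function buildCommand(imageName, arch = os.arch()) {
  const platform = needsPlatformFlag(arch) ? ' --platform linux/amd64' : '';
  return `docker build${platform} -t ${imageName} .`;
}

console.log(buildCommand('image-name'));
```

Keep in mind that an emulated amd64 Chrome can be slow or unstable, so this only covers the build step.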

The code config

Remember to point Puppeteer at the installed browser instead of its built-in one in your app's code.

import puppeteer from 'puppeteer';

// ...

const browser = await puppeteer.launch({
  executablePath: '/usr/bin/google-chrome',
  args: [], // add Chrome flags here if we need them
});
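To avoid hard-coding the path in several places, the launch options could be built by a small helper. This is only a sketch: the names `resolveChromePath` and `buildLaunchOptions`, the extra candidate paths, and the use of the `PUPPETEER_EXECUTABLE_PATH` variable as an override are assumptions for illustration. Only `/usr/bin/google-chrome` comes from the Dockerfile in this guide.

```javascript
const fs = require('fs');

// Candidate locations, checked in order. Only the first entry comes from this
// guide; the other two are assumptions for images that install Chromium.
const CANDIDATES = [
  '/usr/bin/google-chrome',
  '/usr/bin/chromium',
  '/usr/bin/chromium-browser',
];

// Hypothetical helper: returns the override from the environment if set,
// otherwise the first candidate that exists, otherwise undefined.
function resolveChromePath(env = process.env, exists = fs.existsSync) {
  const override = env.PUPPETEER_EXECUTABLE_PATH;
  if (override) return override;
  return CANDIDATES.find((candidate) => exists(candidate));
}

// Hypothetical helper: builds the options object for puppeteer.launch().
function buildLaunchOptions(env = process.env, exists = fs.existsSync) {
  const executablePath = resolveChromePath(env, exists);
  if (!executablePath) {
    throw new Error('No Chrome or Chromium binary found in the container');
  }
  return { executablePath, args: [] };
}
```

It would then be used as `const browser = await puppeteer.launch(buildLaunchOptions());`, so a missing browser fails early with a clear message.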

Conclusion

The browser installation via apt will resolve the required dependencies to run a headless browser inside a Docker container without any manual intervention. These dependencies are not included in the Node.js Docker images by default.

The easiest way to use Puppeteer inside a Docker container is to install Google Chrome because, in contrast to the Chromium package offered by Debian, Google's repository always offers the latest stable version.

Update 2022-08-24

Here is a new version of the Dockerfile:

FROM node:slim

# We don't need the standalone Chromium
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true

# Install Google Chrome Stable and fonts
# Note: this installs the necessary libs to make the browser work with Puppeteer.
RUN apt-get update && apt-get install gnupg wget -y && \
  wget --quiet --output-document=- https://dl-ssl.google.com/linux/linux_signing_key.pub | gpg --dearmor > /etc/apt/trusted.gpg.d/google-archive.gpg && \
  sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' && \
  apt-get update && \
  apt-get install google-chrome-stable -y --no-install-recommends && \
  rm -rf /var/lib/apt/lists/*

It applies the following changes:

A. Removes the apt-key deprecation warning.

Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).

B. Uses wget instead of curl. wget is installed as a dependency of google-chrome-stable anyway, so not installing curl saves a few MiB.

Latest comments (39)

federicowoodward

Hi! I just wanted to share a new solution for this. I know it's a bit late, but maybe someone will find it useful.

Step 1:
Create a simple Express server that receives the HTML and returns the buffer.

const express = require("express");
const puppeteer = require("puppeteer");

const app = express();
app.use(express.json({ limit: "10mb" })); 

app.post("/generate", async (req, res) => {
  const { html } = req.body;
  if (!html) {
    return res.status(400).send('The request body must include an "html" property');
  }

  try {
    const browser = await puppeteer.launch({
      headless: true,
      args: [
        "--no-sandbox",
        "--disable-setuid-sandbox",
        "--disable-dev-shm-usage",
      ],
      protocolTimeout: 120000,
    });
    const page = await browser.newPage();
    await page.setContent(html, { waitUntil: "networkidle0" });
    const pdfBuffer = await page.pdf({ format: "A4", printBackground: true });
    const pdf = Buffer.from(pdfBuffer);
    await browser.close();

    res.set("Content-Type", "application/pdf");
    res.send(pdf);
  } catch (err) {
    console.error(err);
    res.status(500).send("Error generating the PDF");
  }
});

const port = process.env.PORT || 3002;
app.listen(port, () =>
  console.log(`PDF Generator running on port ${port}`)
);


Step 2:
Build the Docker image on Debian Bullseye (Puppeteer works better there) and install the dependencies for Chromium.

FROM node:18-bullseye

RUN apt-get update && apt-get install -y \
    chromium \
    fonts-liberation \
    libasound2 \
    libatk1.0-0 \
    libatk-bridge2.0-0 \
    libc6 \
    libcairo2 \
    libcups2 \
    libdbus-1-3 \
    libexpat1 \
    libfontconfig1 \
    libgbm1 \
    libgtk-3-0 \
    libnspr4 \
    libnss3 \
    libpango-1.0-0 \
    libpangocairo-1.0-0 \
    libx11-6 \
    libx11-xcb1 \
    libxcb1 \
    libxcomposite1 \
    libxcursor1 \
    libxdamage1 \
    libxext6 \
    libxfixes3 \
    libxi6 \
    libxrandr2 \
    libxrender1 \
    libxss1 \
    libxtst6 \
    --no-install-recommends && rm -rf /var/lib/apt/lists/*

ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium

ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true

WORKDIR /app

COPY package.json package-lock.json* ./
RUN npm install --production

COPY . .

EXPOSE 3002

CMD ["node", "pdf-generator.js"]

Step 3:
I won’t go into detail on this part, but you’ll need to create a docker-compose.yml with a network. Then, in your backend (for example, I use NestJS), you connect to the Puppeteer service to generate the PDF.

(In my use case, I return the response as a download link to the frontend.)
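For illustration, the backend side of Step 3 could look roughly like the sketch below. It assumes Node.js 18 or later (for the global `fetch`), and the service name `pdf-generator`, the `PDF_SERVICE_URL` variable, and the helper names are hypothetical; only the `/generate` route, the `html` property, and port 3002 come from Step 1.

```javascript
// Hypothetical service URL: "pdf-generator" stands for whatever name the
// service has in your docker-compose.yml; 3002 is the port from Step 1.
const PDF_SERVICE_URL = process.env.PDF_SERVICE_URL || 'http://pdf-generator:3002';

// Builds the URL and fetch options for the /generate endpoint from Step 1.
function buildGenerateRequest(html, baseUrl = PDF_SERVICE_URL) {
  return {
    url: new URL('/generate', baseUrl).toString(),
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ html }),
    },
  };
}

// Sends the HTML to the service and resolves with the PDF as a Buffer.
async function generatePdf(html, baseUrl = PDF_SERVICE_URL) {
  const { url, init } = buildGenerateRequest(html, baseUrl);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`PDF service responded with status ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());
}
```

Keeping the request building separate from the network call makes the payload easy to check without a running container.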

Hope this helps! 🚀

Rasmus Madsen • Edited

For anyone having trouble, here is my solution (it runs on my machine, at least).

Dockerfile

FROM node:20@sha256:48db4f6ea21d134be744207225753a1730c4bc1b4cdf836d44511c36bf0e34d7

# Configure default locale (important for chrome-headless-shell).
ENV LANG en_US.UTF-8

# Install fonts to support major charsets (Chinese, Japanese, Arabic, Hebrew, Thai and a few others) and dbus.
# Note: the libs needed by the bundled version of Chrome that Puppeteer installs
# are added further down by the --install-deps step.
RUN apt-get update \
    && apt-get install -y --no-install-recommends fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-khmeros \
    fonts-kacst fonts-freefont-ttf dbus dbus-x11

# Add pptruser.
RUN groupadd -r pptruser && useradd -rm -g pptruser -G audio,video pptruser

USER pptruser

WORKDIR /home/pptruser

ENV DBUS_SESSION_BUS_ADDRESS autolaunch:

# Install @puppeteer/browsers, puppeteer and puppeteer-core into /home/pptruser/node_modules.
RUN npm i puppeteer
RUN npm i @puppeteer/browsers@0.2.0
RUN npm i puppeteer-core

# Copy your script.js file into the container.
COPY script.js ./

# Install system dependencies as root.
USER root
RUN npx puppeteer browsers install chrome --install-deps

USER pptruser

# Run the script.js file using Node.js.
CMD ["node", "script.js"]

script.js

const puppeteer = require('puppeteer');

(async () => {
  // Launch a new browser instance
  const browser = await puppeteer.launch();

  // Open a new page
  const page = await browser.newPage();

  // Navigate to example.com
  await page.goto('https://example.com');

  // Take a screenshot and save it as 'example.png'
  await page.screenshot({ path: 'example.png' });

  // Close the browser so the Node.js process can exit
  await browser.close();
})();
Jhoan López

Thank you, after so much searching I found the correct solution.

Illés Tamás

What is your solution? Can you post it?

Puria Kordrostami

thanks for updating the article. 💗

Vivek

Getting this error:
Error: Failed to launch the browser process!
[74:122:0316/214552.693705:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
[0316/214552.821672:ERROR:scoped_ptrace_attach.cc(27)] ptrace: Function not implemented (38)
Assertion failed: p_rcu_reader->depth != 0 (/qemu/include/qemu/rcu.h: rcu_read_unlock: 102)

TROUBLESHOOTING: pptr.dev/troubleshooting

at ChildProcess.onClose (/work/node_modules/@puppeteer/browsers/lib/cjs/launch.js:277:24)
at ChildProcess.emit (node:events:530:35)
at ChildProcess._handle.onexit (node:internal/child_process:294:12)
{"success":false,"data":"{\"status\":\"\",\"headers\":[],\"content\":\"\",\"trace\":\"PUPPET_LOG: INPUT_JSON = {\"url\":\"example.com\",\"user_agent\"...} StartLoading > ERROR > Cannot read properties of undefined (reading 'newPage')\"}"}TypeError: Cannot read properties of undefined (reading 'close')
at closeBrowser (/work/download_page_html.js:330:24)
at killProcess (/work/download_page_html.js:345:8)
at outputResult (/work/download_page_html.js:378:3)
at /work/download_page_html.js:367:4
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Vivek

My Dockerfile looks like this:

FROM --platform=linux/amd64 node:20

ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true
ENV PUPPETEER_SKIP_DOWNLOAD true

RUN apt-get update && apt-get install curl gnupg -y \
&& curl --location --silent dl-ssl.google.com/linux/linux_sign... | apt-key add - \
&& sh -c 'echo "deb [arch=amd64] dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install google-chrome-stable -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /work

COPY package.json ./

RUN npm install

RUN npx puppeteer browsers install chrome

COPY app.js download_page_html.js crawler-browser.js start-crawler-browser.sh start-download-page-html.sh ./

EXPOSE 3000

And I am using an Apple M1 laptop.

Rajesh Pal

If you are working with Puppeteer and suffer from zombie processes, use the Dockerfile below; it will not create zombie processes.

FROM node:18-slim

RUN apt-get update
RUN apt-get upgrade -y

RUN apt-get update && apt-get install curl gnupg -y \
&& curl --location --silent dl-ssl.google.com/linux/linux_sign... | apt-key add - \
&& sh -c 'echo "deb [arch=amd64] dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install google-chrome-stable -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*

RUN apt-get update && \
apt-get upgrade -y && apt-get install -y vim

ADD ./puppetron.tar /usr/share/
WORKDIR /usr/share/puppetron

ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
ENV SERVICE_PATH=/usr/share/puppetron

CMD node main.js;

And the path of the browser:

executablePath: '/usr/bin/google-chrome',

Keith Waters

It is also VERY important to include --init when starting your container.
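The --init flag of docker run starts a small init process as PID 1 that forwards signals and reaps orphaned processes. On the application side, a complementary sketch is to close the browser when the container is stopped. The helper name `registerShutdown` is made up for this example, and it assumes `browser` is the object returned by `puppeteer.launch()`.

```javascript
// Hypothetical helper: closes the browser on SIGTERM/SIGINT so Chrome's
// child processes are shut down before the Node.js process exits.
function registerShutdown(browser, proc = process) {
  let closing = false;

  const shutdown = async () => {
    if (closing) return; // ignore repeated signals
    closing = true;
    try {
      await browser.close();
      proc.exit(0);
    } catch (err) {
      console.error('Failed to close the browser:', err);
      proc.exit(1);
    }
  };

  for (const signal of ['SIGTERM', 'SIGINT']) {
    proc.on(signal, () => shutdown());
  }
  return shutdown;
}
```

Calling `registerShutdown(browser)` right after launching the browser, together with `docker run --init image-name`, covers both sides: the init process reaps leftovers, and the app closes Chrome cleanly when it is asked to stop.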

septianmuh • Edited

This tutorial is great, but I have an issue. When I try to install Chromium in node:18-alpine3.16 with a command like this: RUN apk add --no-cache chromium
it still does not work; Chromium is not installed in the container. Does anyone have a tutorial or some advice?

Лев Хотылев

Can anyone share a Dockerfile that works with Puppeteer under Python? I can't find one that works correctly anywhere.

Tushar Khanka • Edited

Here is a working Dockerfile for an ARM-based Docker image.

FROM --platform=linux/arm64 ubuntu:22.04

ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update
RUN apt-get install -y gconf-service apt-transport-https ca-certificates libssl-dev wget libasound2 libatk1.0-0 libcairo2 libcups2 libfontconfig1 libgdk-pixbuf2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libxss1 fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils curl build-essential tar gzip findutils net-tools dnsutils telnet ngrep tcpdump
RUN apt-get install software-properties-common -y 
RUN add-apt-repository ppa:saiarcot895/chromium-dev

RUN apt update
RUN apt-get install -y chromium-browser


ENV NODE_VERSION 14.17.0

RUN ARCH=arm64 \
  && curl -fsSLO --compressed "https://nodejs.org/dist/v$NODE_VERSION/node-v$NODE_VERSION-linux-$ARCH.tar.xz" \
  && tar -xJf "node-v$NODE_VERSION-linux-$ARCH.tar.xz" -C /usr/local --strip-components=1 --no-same-owner \
  && rm "node-v$NODE_VERSION-linux-$ARCH.tar.xz" \
  && ln -s /usr/local/bin/node /usr/local/bin/nodejs \
  # smoke tests
  && node --version \
  && npm --version

RUN ARCH=arm64 \
  && npm install -g npm@7.20

ENTRYPOINT ["/bin/sh", "-c", "bash"]

Ivan Jijon • Edited

Wonderful, thanks!
I had to add ENV PUPPETEER_SKIP_DOWNLOAD true to the env variables.
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true wasn't enough.
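Since the variable name that takes effect seems to depend on the Puppeteer version, a startup check can at least report which of the two variables are set in the container. This is a sketch only: the helper name `skipDownloadVars` is made up, and treating `true` and `1` as the enabled values is an assumption, not a statement about how Puppeteer parses them.

```javascript
// The two variable names mentioned in the article and in this comment.
const SKIP_VARS = ['PUPPETEER_SKIP_DOWNLOAD', 'PUPPETEER_SKIP_CHROMIUM_DOWNLOAD'];

// Hypothetical helper: returns the names of the skip variables that are
// enabled in the given environment (assumes "true" or "1" means enabled).
function skipDownloadVars(env = process.env) {
  return SKIP_VARS.filter((name) => {
    const value = String(env[name] || '').toLowerCase();
    return value === 'true' || value === '1';
  });
}

console.log('Skip-download variables set:', skipDownloadVars());
```

Setting both variables in the Dockerfile, as described above, is the simplest way to stay compatible across versions.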

Ariel Erviti

Hi there, I know it's an old post, but it's still valid. Here is a config that works for the Red Hat-based oraclelinux image.

FROM oraclelinux:7-slim

RUN yum -y install oracle-nodejs-release-el7 oracle-instantclient-release-el7 wget unzip && \
    yum-config-manager --disable ol7_developer_nodejs\* && \
    yum-config-manager --enable ol7_developer_nodejs16 && \
    yum-config-manager --enable ol7_optional_latest && \
    yum -y install nodejs node-oracledb-node16 && \
    rm -rf /var/cache/yum/*

RUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm && \
    yum install -y google-chrome-stable_current_x86_64.rpm

WORKDIR /srv/app/

COPY . /srv/app/.

RUN npm install

EXPOSE 3006

CMD ["node", "index.js"]

And the launch:

        const browser = await puppeteer.launch({
            executablePath: '/usr/bin/google-chrome',
            args: [
                '--disable-gpu',
                '--disable-dev-shm-usage',
                '--disable-setuid-sandbox',
                '--no-sandbox'
            ]
        });