Introduction
Puppeteer is a Node.js library which provides a high-level API to control Chromium (or Firefox) browsers over the DevTools Protocol.
This guide shows how to use Puppeteer inside a Docker container based on the official Node.js image.
If we use the Docker image for Node.js v16 LTS (Gallium), the chromium package installed from apt will be v90.0, which can have compatibility issues with the latest Puppeteer. This is because each Puppeteer release is tested with the latest stable Chromium release.
Selecting the correct image
Well... we want to run a web browser inside a container, so it's important to know the differences between the available image variants.
Alpine is enough but ...
Yeah, we can run Chromium using Alpine Linux, but we'll need a few extra steps to make it run. That's why we prefer Debian variants to make it easier.
Which distro?
The image for every major version of Node.js is built on a specific version of Debian, and that Debian version ships an old version of Chromium, which might not be compatible with the latest version of Puppeteer.
Node.js | Debian | Chromium |
---|---|---|
v14 | 9.13 | 73.0.3683.75 |
v16 | 10.9 | 90.0.4430.212 |
v17 | 11.2 | 99.0.4844.84 |
To quickly solve that issue, we can use Google Chrome's Debian package, which always installs the latest stable version. Therefore, this Dockerfile is compatible with Node.js v14, v16, or any newer version.
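Since the table above shows how far the packaged Chromium can lag behind, it can help to check which browser version actually ended up in the image. The following is a minimal sketch, assuming the browser prints a version string such as "Google Chrome 104.0.5112.101" when run with --version. The helper name parseBrowserVersion and the sample string are illustrative and not part of Puppeteer.

```javascript
// Hypothetical helper: extracts the dotted version number from the text
// printed by `google-chrome --version` or `chromium --version`.
function parseBrowserVersion(versionOutput) {
  const match = /(\d+)\.(\d+)\.(\d+)\.(\d+)/.exec(versionOutput);
  if (!match) {
    return null; // no four-part version number found
  }
  const [full, major, minor, build, patch] = match;
  return {
    full,
    major: Number(major),
    minor: Number(minor),
    build: Number(build),
    patch: Number(patch),
  };
}

// Sample string (assumed output format), using the v16 row of the table.
const packaged = parseBrowserVersion('Chromium 90.0.4430.212');
console.log(packaged.major); // prints 90
```

Inside a running container, the real string could be obtained with Node's child_process.execFileSync('google-chrome', ['--version']) and passed to the same helper.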
Why not the built-in Chromium
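When we install Google Chrome, apt will install all the dependencies for us. The Chromium build that Puppeteer downloads on its own does not bring those system libraries with it.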
Dockerfile
FROM node:slim AS app
# We don't need the standalone Chromium
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true
# Install Google Chrome Stable and fonts
# Note: this installs the necessary libs to make the browser work with Puppeteer.
RUN apt-get update && apt-get install curl gnupg -y \
&& curl --location --silent https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install google-chrome-stable -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
# Install your app here...
💡 If you are on an ARM-based CPU like the Apple M1, you should use the --platform argument when you build the Docker image.
docker build --platform linux/amd64 -t image-name .
The code config
Remember to use the installed browser instead of Puppeteer's built-in one in your app's code.
import puppeteer from 'puppeteer';
...
const browser = await puppeteer.launch({
executablePath: '/usr/bin/google-chrome',
args: [...] // if we need them.
});
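A minimal sketch of how the launch options could be assembled so that the browser path is not hard-coded in several places. It assumes Chrome is installed at /usr/bin/google-chrome, as in the Dockerfile above. The CHROME_PATH and CHROME_NO_SANDBOX variables and the buildLaunchOptions helper are hypothetical names chosen for this example. --no-sandbox is a Chromium flag that is often needed when the container runs as root, at the cost of disabling the browser sandbox.

```javascript
// Hypothetical helper: builds the options object passed to puppeteer.launch().
// CHROME_PATH and CHROME_NO_SANDBOX are assumed, app-specific variables.
function buildLaunchOptions(env = process.env) {
  const args = [];
  // Assumption: the app opts in explicitly, because --no-sandbox disables
  // Chrome's sandbox and should only be used with trusted content.
  if (env.CHROME_NO_SANDBOX === 'true') {
    args.push('--no-sandbox');
  }
  return {
    executablePath: env.CHROME_PATH || '/usr/bin/google-chrome',
    args,
  };
}

console.log(buildLaunchOptions({}));
// In the app: const browser = await puppeteer.launch(buildLaunchOptions());
```

Keeping the path in one function means a different image (for example one that installs Chromium elsewhere) only needs a different environment variable, not a code change.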
Conclusion
The browser installation via apt will resolve the required dependencies to run a headless browser inside a Docker container without any manual intervention. These dependencies are not included in the Node.js Docker images by default.
The easiest way to use Puppeteer inside a Docker container is to install Google Chrome because, in contrast to the Chromium package offered by Debian, the Chrome repository always offers the latest stable version.
Update 2022-08-24
This new Dockerfile version
FROM node:slim
# We don't need the standalone Chromium
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true
# Install Google Chrome Stable and fonts
# Note: this installs the necessary libs to make the browser work with Puppeteer.
RUN apt-get update && apt-get install gnupg wget -y && \
wget --quiet --output-document=- https://dl-ssl.google.com/linux/linux_signing_key.pub | gpg --dearmor > /etc/apt/trusted.gpg.d/google-archive.gpg && \
sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' && \
apt-get update && \
apt-get install google-chrome-stable -y --no-install-recommends && \
rm -rf /var/lib/apt/lists/*
Applies the following changes:
A. Removes the apt-key deprecation warning.
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
B. Uses wget because it's installed by google-chrome-stable anyway, which saves a few MiB by not installing curl.
Latest comments (39)
Hi! I just wanted to share a new solution for this. I know it's a bit late, but maybe someone will find it useful.
Step 1:
Create a simple Express server that receives the HTML and returns the buffer.
Step 2:
Build the Docker image on Debian Bullseye (Puppeteer works better there) and install the dependencies for Chromium.
Step 3:
I won’t go into detail on this part, but you’ll need to create a docker-compose.yml with a network. Then, in your backend (for example, I use NestJS), you connect to the Puppeteer service to generate the PDF.
(In my use case, I return the response as a download link to the frontend.)
Hope this helps! 🚀
For anyone having troubles here is my solution (That runs on my machine lul)
Dockerfile
script.js
Thank you, after so much searching I found the correct solution.
what is your solution, can you post ?
thanks for updating the article. 💗
Getting this error:
Error: Failed to launch the browser process!
[74:122:0316/214552.693705:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
[0316/214552.821672:ERROR:scoped_ptrace_attach.cc(27)] ptrace: Function not implemented (38)
Assertion failed: p_rcu_reader->depth != 0 (/qemu/include/qemu/rcu.h: rcu_read_unlock: 102)
TROUBLESHOOTING: pptr.dev/troubleshooting
at ChildProcess.onClose (/work/node_modules/@puppeteer/browsers/lib/cjs/launch.js:277:24)
at ChildProcess.emit (node:events:530:35)
at ChildProcess._handle.onexit (node:internal/child_process:294:12)
{"success":false,"data":"{\"status\":\"\",\"headers\":[],\"content\":\"\",\"trace\":\"PUPPET_LOG: INPUT_JSON = {\"url\":\"example.com\",\"user_agent\"...} StartLoading > ERROR > Cannot read properties of undefined (reading 'newPage')\"}"}TypeError: Cannot read properties of undefined (reading 'close')
at closeBrowser (/work/download_page_html.js:330:24)
at killProcess (/work/download_page_html.js:345:8)
at outputResult (/work/download_page_html.js:378:3)
at /work/download_page_html.js:367:4
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
My Dockerfile looks like this:
FROM --platform=linux/amd64 node:20
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true
ENV PUPPETEER_SKIP_DOWNLOAD true
RUN apt-get update && apt-get install curl gnupg -y \
&& curl --location --silent dl-ssl.google.com/linux/linux_sign... | apt-key add - \
&& sh -c 'echo "deb [arch=amd64] dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install google-chrome-stable -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /work
COPY package.json ./
RUN npm install
RUN npx puppeteer browsers install chrome
COPY app.js download_page_html.js crawler-browser.js start-crawler-browser.sh start-download-page-html.sh ./
EXPOSE 3000
And I am using an Apple M1 laptop.
If you are working with Puppeteer and suffer from zombie processes, use the Dockerfile below; it will not create zombie processes.
FROM node:18-slim
RUN apt-get update && apt-get upgrade -y
RUN apt-get update && apt-get install curl gnupg -y \
&& curl --location --silent dl-ssl.google.com/linux/linux_sign... | apt-key add - \
&& sh -c 'echo "deb [arch=amd64] dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install google-chrome-stable -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
RUN apt-get update && \
apt-get upgrade -y && apt-get install -y vim
ADD ./puppetron.tar /usr/share/
WORKDIR /usr/share/puppetron
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
ENV SERVICE_PATH=/usr/share/puppetron
CMD node main.js;
And the path of the browser:
executablePath: '/usr/bin/google-chrome',
It is also VERY important to include --init when starting your container.
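Alongside --init, leftover browser processes are less likely when the app always closes the browser it launched, even after an error. A minimal sketch, assuming a hypothetical withBrowser helper that receives the launch function as a parameter (in a real app that would be something like () => puppeteer.launch({...})). The stub browser below exists only to make the example runnable without Chrome.

```javascript
// Hypothetical helper: runs `task` with a browser and always closes it.
async function withBrowser(launch, task) {
  let browser;
  try {
    browser = await launch();
    return await task(browser);
  } finally {
    // `browser` stays undefined if launch() itself failed, so guard the call.
    if (browser) {
      await browser.close();
    }
  }
}

// Stub standing in for puppeteer.launch(), so the sketch runs without Chrome.
const fakeLaunch = async () => ({
  closed: false,
  async close() {
    this.closed = true;
  },
});

withBrowser(fakeLaunch, async (browser) => 'done')
  .then((result) => console.log(result)); // prints done
```

The guard in the finally block also avoids calling close on an undefined browser when the launch itself fails.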
This tutorial is great, but I have an issue. When I try to install Chromium in node:18-alpine3.16, I add a command like this: RUN apk add --no-cache chromium
but it still does not work; Chromium is not installed in the container. Does anyone have a tutorial or some advice?
Can anyone share a Dockerfile that works with Puppeteer under Python? I can't find one that works correctly anywhere.
Here is a working docker file for an arm based docker image.
Wonderful, thanks!
I had to add ENV PUPPETEER_SKIP_DOWNLOAD true to the env variables. ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true wasn't enough.
Hi there, I know it's an old post, but it's still valid. I provide a config that works for the oraclelinux image, which is Red Hat based.
And the launch: