Setting up Puppeteer on Ubuntu 18.04 and Digital Ocean

#puppeteer #digitalocean #ubuntu

This is a tutorial on how to set up Puppeteer on Ubuntu 18.04 using a Digital Ocean droplet. While I’ll go through steps specific to Digital Ocean, most of them should work on any web server or Ubuntu Linux box.

This post closely mirrors what I wrote about setting up Puppeteer on Ubuntu 16.04.

Getting into it: after creating an account with Digital Ocean, create a new droplet. A droplet is just a VM that you can set up however you’d like. Selecting it takes you to the droplet creation screen. For our purposes, the smallest droplet, at only $5/mo, is enough.

If you are new to cloud hosting, one of its beauties is that the $5/mo is only what you pay if you leave the droplet running all month, and you can scale up its power at any time. This is ideal for web scraping. If we are only scraping something daily for 10–15 minutes, we can spin up the droplet for that time and destroy it when we are done (a powered-off droplet is still billed, so shutting it down is not enough). It’s pretty great and keeps things very economical.
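If you’d rather script that spin-up and tear-down than click through the dashboard, Digital Ocean’s doctl CLI can do it. This is a minimal sketch, assuming doctl is installed and authenticated with doctl auth init; the function names are hypothetical, the droplet name, region, and SSH key ID you pass in are placeholders, and the size slug s-1vcpu-1gb is assumed to be the $5/mo plan.

```bash
#!/usr/bin/env bash
# Sketch: create a droplet for a scraping run, then delete it afterwards.
# Assumes doctl is installed and authenticated (doctl auth init).
# Function names are hypothetical; all arguments are placeholders.

create_droplet() {
  local name="$1" region="$2" ssh_key="$3"
  # ubuntu-18-04-x64 is the Ubuntu 18.04 image slug;
  # s-1vcpu-1gb is assumed to be the $5/mo size.
  doctl compute droplet create "$name" \
    --image ubuntu-18-04-x64 \
    --size s-1vcpu-1gb \
    --region "$region" \
    --ssh-keys "$ssh_key" \
    --wait
}

destroy_droplet() {
  local name="$1"
  # Deleting the droplet (not just powering it off) is what stops billing.
  # --force skips the confirmation prompt.
  doctl compute droplet delete "$name" --force
}
```

A daily run would then be create_droplet scraper nyc1 YOUR_KEY_ID, the scrape itself, and destroy_droplet scraper.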

I strongly recommend using an SSH key to access your droplet. It’s more secure, and quicker since you don’t have to enter your password each time. I’m not going to cover how to set one up here, but Digital Ocean has a great article walking through it.

We will use a bash terminal to connect to our new droplet over SSH. Digital Ocean also recommends some basic server setup, and has another great article about it.
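For reference, the core of that setup looks roughly like the following. This is a sketch based on the steps in Digital Ocean’s initial server setup guide for Ubuntu 18.04, not a replacement for it; the function names are hypothetical, and the IP address and user name are placeholders.

```bash
# Connect from your own machine (the argument is your droplet's IP address).
connect_to_droplet() {
  ssh "root@$1"
}

# Run on the droplet as root: create a sudo user, allow SSH through the
# firewall, enable the firewall, and copy root's SSH keys to the new user.
basic_server_setup() {
  local user="$1"
  adduser "$user"              # prompts for a password and user details
  usermod -aG sudo "$user"     # grant sudo rights
  ufw allow OpenSSH            # allow SSH before enabling the firewall
  ufw --force enable           # enable without the interactive prompt
  rsync --archive --chown="$user:$user" ~/.ssh "/home/$user"
}
```

Allowing OpenSSH before enabling ufw matters: doing it the other way around can lock you out of the droplet.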

Cloning the repo

I’m using the Citadel packaging integration test example because I know it contains some good Puppeteer testing. Clone it with git clone git@github.com:aarmora/jordan-does-integration-tests-on-citadel-packaging.git.
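One catch: the git@github.com URL clones over SSH, which only works if the droplet itself has an SSH key registered with GitHub. Since the repo is public, cloning over HTTPS avoids that. A small sketch; the EXAMPLE_REPO variable and clone_example_repo function are just for illustration.

```bash
EXAMPLE_REPO="aarmora/jordan-does-integration-tests-on-citadel-packaging"

# Clone over HTTPS (no GitHub SSH key needed on the droplet for a public repo)
# and change into the cloned directory.
clone_example_repo() {
  git clone "https://github.com/${EXAMPLE_REPO}.git" && cd "${EXAMPLE_REPO#*/}"
}
```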

Next, I’m going to install Node.js on the droplet so we can run all of the JavaScript we need. Digital Ocean has yet another article on how to do this, but the commands you need to run are the following:

sudo apt-get update
sudo apt-get install build-essential libssl-dev

curl -sL https://raw.githubusercontent.com/creationix/nvm/v0.33.8/install.sh -o install_nvm.sh

bash install_nvm.sh
source ~/.profile

nvm install 10

At the time of writing, this installs Node.js 10.16.0, the latest 10.x release. You can verify it by running node -v. From here, just go into the project directory and run npm i.
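Because nvm install 10 picks whatever the newest 10.x release is, the exact patch version will drift over time. If you script the setup, checking only the major version is more robust than matching 10.16.0 exactly. A minimal sketch; require_node_major is a hypothetical helper name.

```bash
# Succeed only if `node -v` reports the expected major version
# (e.g. "v10.16.0" -> "10"); print a message to stderr otherwise.
require_node_major() {
  local want="$1" version major
  version="$(node -v)"     # e.g. "v10.16.0"
  major="${version#v}"     # strip the leading "v" -> "10.16.0"
  major="${major%%.*}"     # keep everything before the first dot -> "10"
  if [ "$major" != "$want" ]; then
    echo "expected Node.js ${want}.x, found ${version}" >&2
    return 1
  fi
}
```

In a setup script this would read require_node_major 10 && npm i, so dependencies are only installed on the expected Node line.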

If we were on a Windows (and probably macOS) machine, we’d be good to go at this point. On Ubuntu, though, even npm run start:ubuntu, the script meant to run Puppeteer in an Ubuntu-ready configuration, will still throw an error because Chromium is missing some shared libraries.

You will need to install some additional dependencies:

sudo apt-get install libx11-xcb1 libxcomposite1 libxi6 libxext6 libxtst6 libnss3 libcups2 libxss1 libxrandr2 libasound2 libpangocairo-1.0-0 libatk1.0-0 libatk-bridge2.0-0 libgtk-3-0
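If the error names a library that isn’t covered by that list, you can ask ldd which of Chromium’s shared libraries are missing. This sketch assumes Puppeteer downloaded Chromium under node_modules/puppeteer/.local-chromium/, as Puppeteer versions of that era did; the revision directory varies, hence the glob. Both function names are hypothetical.

```bash
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin.
missing_libs() {
  grep 'not found' | awk '{ print $1 }' | sort -u
}

# Run ldd against the bundled Chromium binary (run from the project directory).
check_chromium_libs() {
  ldd node_modules/puppeteer/.local-chromium/linux-*/chrome-linux/chrome | missing_libs
}
```

Empty output from check_chromium_libs means every library Chromium links against was found.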

Once those are installed, you should be good to go. Puppeteer is working like a charm!
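To confirm Chromium really launches outside of the example repo’s scripts, a throwaway smoke test is handy. This is a sketch, not part of the repo: puppeteer_smoke_test is a hypothetical name, it assumes puppeteer is installed in the current directory, and it passes --no-sandbox because Chromium won’t run as root without it (root is the default user on a fresh droplet).

```bash
# Launch headless Chromium through Puppeteer, load a page, and print its title.
# --no-sandbox is only needed when running as root.
puppeteer_smoke_test() {
  node -e "
    const puppeteer = require('puppeteer');
    (async () => {
      const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
      const page = await browser.newPage();
      await page.goto('https://example.com');
      console.log(await page.title());
      await browser.close();
    })().catch((err) => {
      console.error(err);
      process.exit(1);
    });
  "
}
```

Run from the project directory, it should print the page title (Example Domain); a missing-library error here points back to the dependency list.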