This is a tutorial on how to set up Puppeteer on Ubuntu 18.04 using a Digital Ocean droplet. While I will be going through specific steps for Digital Ocean, most of these steps should work for any web server or any other Ubuntu Linux box.
This post closely mirrors what I wrote about setting up Puppeteer on Ubuntu 16.04.
Getting into it: after creating an account with Digital Ocean, you will want to create a new droplet. A droplet in this case is just a VM where you can set up whatever you’d like. When you choose to create one, you should come to the droplet creation screen. For our purposes, we can use the smallest droplet at only $5/mo.
If you are unfamiliar with cloud hosting, one of its beauties is that this $5/mo is the cost only if you leave the droplet running all month. You can also scale up the power at any time. This is ideal for web scraping. If we are just scraping something daily for 10 to 15 minutes, we can spin up the droplet for that time and then spin it down when we are done. It’s pretty great and makes it super economical.
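To put a rough number on that, here is a minimal sketch of the arithmetic. The function name `estimate_monthly_cost` is my own, and the assumptions are loud ones: a 720-hour month, billing only for the hours the droplet exists, and each daily run rounded up to a whole hour. Check your provider's actual billing rules before relying on it.

```shell
# estimate_monthly_cost MONTHLY_PRICE MINUTES_PER_DAY [DAYS] [HOURS_IN_MONTH]
# Hypothetical helper: estimates the monthly bill for a droplet that only
# exists for a short run each day.
estimate_monthly_cost() {
  awk -v price="$1" -v minutes="$2" -v days="${3:-30}" -v month_hours="${4:-720}" 'BEGIN {
    # Assumption: every run is billed in whole hours, rounded up.
    hours_per_run = int((minutes + 59) / 60)
    # Assumption: the hourly rate is the monthly price spread over the month.
    printf "%.2f\n", price / month_hours * hours_per_run * days
  }'
}

# A $5/mo droplet used for 15 minutes a day.
estimate_monthly_cost 5 15
```

Under those assumptions a 15 minute daily scrape comes out to well under a dollar a month.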
I really feel that using an ssh key is the way to go for accessing your droplet. It’s a lot more secure and quicker since you don’t have to enter your password each time. I’m not going to go into how to do it here but Digital Ocean has a great article walking through it.
We will use a bash terminal to connect via ssh to our newly formed droplet. Some basic server setup is recommended, and Digital Ocean has another great article about it.
I’m using the Citadel packaging integration test example because I know it contains some good Puppeteer testing. We clone it with
git clone git@github.com:aarmora/jordan-does-integration-tests-on-citadel-packaging.git
Next, we install Node through nvm:
sudo apt-get update
sudo apt-get install build-essential libssl-dev
curl -sL https://raw.githubusercontent.com/creationix/nvm/v0.33.8/install.sh -o install_nvm.sh
bash install_nvm.sh
source ~/.profile
nvm install 10
This will install the latest Node 10 release, which was 10.16.0 at the time of writing. You can verify it by running
node -v. From here we just go into our scraper directory and run
npm install to pull in the project’s dependencies.
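If you would rather script the version check than eyeball it, something like the following sketch could work. The helper name `node_major` is my own invention, and the check assumes you want Node 10 as installed above.

```shell
# Extract the major version from a string such as "v10.16.0".
# node_major is a hypothetical helper, not part of node or nvm.
node_major() {
  version="${1#v}"        # drop the leading "v"
  echo "${version%%.*}"   # keep everything before the first dot
}

if command -v node >/dev/null 2>&1; then
  major="$(node_major "$(node -v)")"
  if [ "$major" -eq 10 ]; then
    echo "Node $major is active"
  else
    echo "Expected Node 10 but found $(node -v)" >&2
  fi
else
  # nvm only lands on PATH after the profile is re-read.
  echo "node is not on PATH; try: source ~/.profile" >&2
fi
```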
Now if we were on a Windows (and probably Mac?) machine, we’d be good to go. On Ubuntu, though, even running our
npm run start:ubuntu script, which should launch Puppeteer configured for Ubuntu, will still throw an error on launch.
You will need to install some additional dependencies:
sudo apt-get install libx11-xcb1 libxcomposite1 libxi6 libxext6 libxtst6 libnss3 libcups2 libxss1 libxrandr2 libasound2 libpangocairo-1.0-0 libatk1.0-0 libatk-bridge2.0-0 libgtk-3-0
and then you should be good to go! Puppeteer is working like a charm!
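If Puppeteer still refuses to launch, one way to see which shared libraries are missing is to ask ldd directly. This is only a sketch: the helper names `filter_missing` and `missing_libs` are mine, and the Chromium path below is an assumption about where Puppeteer keeps its bundled browser, so adjust it to match your install.

```shell
# Print the names of libraries that ldd reports as "not found".
filter_missing() {
  awk '/not found/ { print $1 }'
}

# List the shared libraries a binary needs but the system cannot find.
missing_libs() {
  ldd "$1" 2>/dev/null | filter_missing
}

# Assumed location of Puppeteer's bundled Chromium; the revision folder
# name varies, hence the glob. Run this from the project directory.
for chrome in node_modules/puppeteer/.local-chromium/linux-*/chrome-linux/chrome; do
  if [ -x "$chrome" ]; then
    missing_libs "$chrome"
  fi
done
```

An empty result means ldd found every library the binary links against. Anything it prints is a library you still need to find a package for.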