This is my first article in 2021. Recently, I got a task to capture screenshots of 300+ web pages, and while doing it I learned the Puppeteer APIs. In this article, I would like to share my experience with Puppeteer.
Before I start writing code, let me briefly explain what Puppeteer is.
What is Puppeteer?
Puppeteer is a Node library backed by Google. It provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol. This means that with Puppeteer we can capture screenshots and PDFs of web pages, run our e2e test cases, diagnose performance-related issues, and more.
Let's write some code...
Installation
To use Puppeteer, you need to install the Node.js module through npm or yarn.
npm i puppeteer
Note: When you install Puppeteer, it downloads a recent version of Chromium, so the installation may take a long time depending on your network speed.
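If the Chromium download is a problem, one option is the separate puppeteer-core package, which does not download a browser and expects you to point it at one that is already installed. The sketch below is only an illustration under that assumption: the helper name buildLaunchOptions and the CHROME_PATH environment variable are hypothetical names picked for this example, while executablePath is a standard launch option.

```javascript
// Build launch options for puppeteer-core, which needs an explicit browser path.
// CHROME_PATH is a hypothetical environment variable name chosen for this sketch.
function buildLaunchOptions(env = process.env) {
  const executablePath = env.CHROME_PATH;
  if (!executablePath) {
    throw new Error("Set CHROME_PATH to a local Chrome or Chromium binary");
  }
  return { headless: true, executablePath };
}

// Usage (assumes `npm i puppeteer-core` and a locally installed browser):
// const puppeteer = require("puppeteer-core");
// const browser = await puppeteer.launch(buildLaunchOptions());
```

The rest of the code in this article should work the same way once the browser is launched.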
Capture GitHub profile screenshot
Here is the bare minimum code for capturing a screenshot of my GitHub profile.
// require fs and puppeteer
const fs = require("fs");
const puppeteer = require("puppeteer");
async function captureScreenshot() {
  // create the screenshots directory if it does not exist
  if (!fs.existsSync("screenshots")) {
    fs.mkdirSync("screenshots");
  }
  let browser = null;
  try {
    // launch headless Chromium browser
    browser = await puppeteer.launch({ headless: true });
    // create new page object
    const page = await browser.newPage();
    // set viewport width and height
    await page.setViewport({ width: 1440, height: 1080 });
    await page.goto("https://github.com/sagar-gavhane");
    // capture screenshot and store it into screenshots directory.
    await page.screenshot({ path: `screenshots/github-profile.jpeg` });
    console.log(`\n🎉 GitHub profile screenshot captured.`);
  } catch (err) {
    console.log(`❌ Error: ${err.message}`);
  } finally {
    // browser is still null if the launch itself failed
    if (browser) {
      await browser.close();
    }
  }
}
captureScreenshot();
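The screenshot above only covers the 1440x1080 viewport, and it is taken as soon as goto() resolves. As a possible variation, the sketch below waits for the network to settle and captures the whole scrollable page. The helper names buildScreenshotOptions and captureFullPage and the default values are assumptions made for this example; fullPage, type, quality, and waitUntil are standard Puppeteer options.

```javascript
const path = require("path");

// Build the options object passed to page.screenshot().
// The helper name and its defaults are illustrative, not part of Puppeteer.
function buildScreenshotOptions(name, { dir = "screenshots", fullPage = true, quality = 80 } = {}) {
  return {
    path: path.join(dir, `${name}.jpeg`),
    type: "jpeg",
    quality, // only applies to jpeg screenshots
    fullPage, // capture the full scrollable page, not just the viewport
  };
}

// Navigate, wait until the network is mostly idle, then capture.
// `page` is expected to be a Puppeteer Page from browser.newPage().
async function captureFullPage(page, url, name) {
  await page.goto(url, { waitUntil: "networkidle2", timeout: 30000 });
  const options = buildScreenshotOptions(name);
  await page.screenshot(options);
  return options.path;
}
```

Inside captureScreenshot(), the goto() and screenshot() calls could then be replaced with a single call such as await captureFullPage(page, "https://github.com/sagar-gavhane", "github-profile").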
Capture multiple screenshots
What if you have to take screenshots of many web pages with Puppeteer? Below is a list of pages defined in the pages.json file.
[
  {
    "id": "c1472465-ede8-4376-853c-39274242aa69",
    "url": "https://github.com/microsoft/vscode",
    "name": "VSCode"
  },
  {
    "id": "6b08743e-9454-4829-ab3a-91ad2ce9a6ac",
    "url": "https://github.com/vuejs/vue",
    "name": "vue"
  },
  {
    "id": "08923d12-caf2-4d5e-ba41-3019a9afbf9b",
    "url": "https://github.com/tailwindlabs/tailwindcss",
    "name": "tailwindcss"
  },
  {
    "id": "daeacf42-1ab9-4329-8f41-26e7951b69cc",
    "url": "https://github.com/getify/You-Dont-Know-JS",
    "name": "You Dont Know JS"
  }
]
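With hundreds of entries, a typo in pages.json is easy to miss, and because each id becomes a file name, a duplicate id would silently overwrite an earlier screenshot. As a rough sketch, the list could be checked before the browser is launched. The function name validatePages and the error messages are assumptions for this example.

```javascript
// Check that every entry has an id, a name and an absolute http(s) URL,
// and that ids are unique, since each id is used as a screenshot file name.
function validatePages(pages) {
  const errors = [];
  const seenIds = new Set();
  pages.forEach((entry, index) => {
    const { id, name, url } = entry || {};
    if (typeof id !== "string" || id === "") {
      errors.push(`entry ${index}: missing id`);
    } else if (seenIds.has(id)) {
      errors.push(`entry ${index}: duplicate id ${id}`);
    } else {
      seenIds.add(id);
    }
    if (typeof name !== "string" || name === "") {
      errors.push(`entry ${index}: missing name`);
    }
    try {
      const parsed = new URL(url); // throws on missing or malformed URLs
      if (!["http:", "https:"].includes(parsed.protocol)) {
        errors.push(`entry ${index}: unsupported protocol ${parsed.protocol}`);
      }
    } catch {
      errors.push(`entry ${index}: invalid url`);
    }
  });
  return errors;
}
```

An empty array means the file looks usable; otherwise the messages can be printed and the run stopped before any page is visited.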
I just tweaked the captureScreenshot() function above to iterate over the pages array and, on every iteration, visit page.url and capture a screenshot. That's it.
const fs = require("fs");
const puppeteer = require("puppeteer");
const pages = require("./pages.json");
async function captureMultipleScreenshots() {
  // create the screenshots directory if it does not exist
  if (!fs.existsSync("screenshots")) {
    fs.mkdirSync("screenshots");
  }
  let browser = null;
  try {
    // launch headless Chromium browser
    browser = await puppeteer.launch({
      headless: true,
    });
    // create new page object
    const page = await browser.newPage();
    // set viewport width and height
    await page.setViewport({
      width: 1440,
      height: 1080,
    });
    // reuse the same page object for every URL
    for (const { id, name, url } of pages) {
      await page.goto(url);
      await page.screenshot({ path: `screenshots/${id}.jpeg` });
      console.log(`✅ ${name} - (${url})`);
    }
    // only reached when every page was captured
    console.log(`\n🎉 ${pages.length} screenshots captured.`);
  } catch (err) {
    console.log(`❌ Error: ${err.message}`);
  } finally {
    if (browser) {
      await browser.close();
    }
  }
}
captureMultipleScreenshots();
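One limitation of the loop above is that the try/catch wraps the whole run, so a single page that times out stops every page after it. A possible refinement, sketched here with a hypothetical helper named captureAll, is to catch errors per page and report the failures at the end.

```javascript
const path = require("path");

// Visit each page and capture a screenshot. A failure on one URL is recorded
// and the loop moves on to the next entry instead of aborting the whole run.
// `page` is expected to be a Puppeteer Page from browser.newPage().
async function captureAll(page, pages, dir = "screenshots") {
  const succeeded = [];
  const failed = [];
  for (const { id, name, url } of pages) {
    try {
      await page.goto(url, { waitUntil: "networkidle2" });
      await page.screenshot({ path: path.join(dir, `${id}.jpeg`) });
      succeeded.push(id);
    } catch (err) {
      failed.push({ id, name, url, reason: err.message });
    }
  }
  return { succeeded, failed };
}
```

In captureMultipleScreenshots(), the for loop could be swapped for const { succeeded, failed } = await captureAll(page, pages), with the final log line printing succeeded.length and listing any failed URLs so they can be retried.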
Discussion (4)
Make this into a serverless function and voilà! A nice and handy API to get screenshots.
Yes, I built a serverless function for my website, but running it takes time, so it would drastically increase the cost.
What about a $5 DigitalOcean droplet? How many users can it handle (concurrently or otherwise) if you optimize it for this purpose?
Yep, like the author says, it will take time and increase cost.