Puppeteer is a Node library that provides a high-level API to control Chromium, Chrome, or Firefox.
Use cases
- Automatic account registration
- Scraping information from sites of varying difficulty
- Generating screenshots and PDFs of pages
- Automated testing of sites
Puppeteer is very powerful: it can do almost anything a person can do in a browser. In this guide, though, we will only look at web scraping.
Installation
By default, Puppeteer comes with a bundled Chromium, but you can configure it to use another browser.
Create a folder for your project and move into it:
mkdir puppeteer
cd puppeteer
Initialize a Node project:
yarn init
Then install Puppeteer:
yarn add puppeteer
Puppeteer is now installed, and we are ready to code.
Example
Create the main source file, example.js, with the following content:
const puppeteer = require('puppeteer');
(async () => {
  // Create a browser instance. You can launch more browsers with
  // const browser2 = await puppeteer.launch();
  const browser = await puppeteer.launch({
    // Puppeteer runs headless by default; `headless: false` opens
    // a visible window so you can watch what the browser does
    headless: false,
    // By default Puppeteer emulates an 800x600 viewport; `null`
    // turns that off so pages use the browser window's own size
    defaultViewport: null,
  });
  // Create a page (tab). You can open more pages with
  // const page2 = await browser.newPage();
  const page = await browser.newPage();
  // Visit dev.to; the browser stays open so you can look around
  await page.goto('https://dev.to');
})();
Run it with node example. A Chromium window opens and loads dev.to.
But what are async and await? Most Puppeteer methods return promises, so you can also write the same script as a chain of .then() calls:
const puppeteer = require('puppeteer');
puppeteer
  .launch({
    headless: false,
    defaultViewport: null,
  })
  .then(browser => browser.newPage())
  .then(page => page.goto('https://dev.to'));
The first version, with async/await, is easier to read, and it is the one I prefer.
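To see that the two styles really do the same thing, here is a minimal sketch that runs in plain Node without a browser. The objects fakeLaunch, fakeBrowser, and fakePage are hypothetical stand-ins that only mimic the shape of the Puppeteer calls above (each method returns a promise); they are not Puppeteer's API.

```javascript
// Resolves with `value` after `ms` milliseconds.
const delay = (ms, value) =>
  new Promise(resolve => setTimeout(() => resolve(value), ms));

// Hypothetical stand-ins shaped like the Puppeteer calls used above.
// Each method returns a promise, just as the real ones do.
const fakePage = { goto: url => delay(10, `visited ${url}`) };
const fakeBrowser = { newPage: () => delay(10, fakePage) };
const fakeLaunch = () => delay(10, fakeBrowser);

// Style 1: a promise chain; each .then() receives the value
// the previous promise resolved with.
function withThen() {
  return fakeLaunch()
    .then(browser => browser.newPage())
    .then(page => page.goto('https://dev.to'));
}

// Style 2: the same steps with async/await, read top to bottom.
async function withAwait() {
  const browser = await fakeLaunch();
  const page = await browser.newPage();
  return page.goto('https://dev.to');
}

withThen().then(result => console.log('then:', result));
withAwait().then(result => console.log('await:', result));
```

Both calls end up with the string "visited https://dev.to"; the async/await version just avoids nesting callbacks.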
Find selectors
To find the selector you need, right-click the element and choose "Inspect". This requires basic knowledge of HTML and CSS. Alternatively, you can use Firefox with the SelectorsHub extension.
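In the Inspect panel an element's classes appear as a space-separated class attribute, while a CSS selector chains them with dots: an element with class="gLFyf gsfi" is matched by .gLFyf.gsfi. The hypothetical helper classSelector below sketches that conversion, assuming plain class names that need no CSS escaping. A selector built this way matches only elements that have all of the listed classes.

```javascript
// Hypothetical helper: turn the value of an element's class attribute,
// as shown in the "Inspect" panel, into a compound CSS class selector.
// Assumes class names that need no CSS escaping (letters, digits, "-", "_").
function classSelector(classAttr) {
  return classAttr
    .trim()
    .split(/\s+/)            // "gLFyf gsfi" -> ['gLFyf', 'gsfi']
    .filter(Boolean)         // drop the empty entry left by an empty string
    .map(name => `.${name}`) // -> ['.gLFyf', '.gsfi']
    .join('');               // -> '.gLFyf.gsfi' (element must have BOTH classes)
}

console.log(classSelector('gLFyf gsfi')); // .gLFyf.gsfi
```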
Type and click
OK, let's get our IP address from Google:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({ headless: false, defaultViewport: null });
  const page = await browser.newPage();
  // Visit google.com
  await page.goto('https://google.com');
  // Wait until the search box (matched by `.gLFyf.gsfi`) is in the page.
  // Google's class names change over time; update the selector if needed.
  await page.waitForSelector('.gLFyf.gsfi');
  // Type the query into the search box
  await page.type('.gLFyf.gsfi', 'what is my ip');
  // Press Enter to submit the search
  await page.keyboard.press('Enter');
  // Wait until the element holding the result is in the page
  await page.waitForSelector('span[style="font-size:20px"]');
  // Run `el.innerText` in the page on the first element matching
  // `span[style="font-size:20px"]` and return the text to Node
  const ip = await page.$eval('span[style="font-size:20px"]', el => el.innerText);
  console.log(ip);
  // Close the browser
  await browser.close();
})();
Save the file as ip-google.js and run it with node ip-google. A few seconds later, your IP address is printed to the console.
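One thing to watch in ip-google.js: browser.close() only runs if every step before it succeeds. If, say, waitForSelector times out because Google changed its markup, the script rejects and browser.close() is skipped, so cleanup depends on how the process happens to exit. A minimal sketch of a fix, using a hypothetical helper withBrowser (not part of Puppeteer), wraps the work in try/finally so the browser is closed either way:

```javascript
// Hypothetical helper (not part of Puppeteer): launch a browser, run a
// task with it, and close the browser even if the task throws.
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    // `return await` keeps a rejection from `task` inside try/finally,
    // so the browser is not closed before the task has settled
    return await task(browser);
  } finally {
    await browser.close();
  }
}

// Usage with Puppeteer (assumes it is installed as shown above):
// const puppeteer = require('puppeteer');
// withBrowser(() => puppeteer.launch(), async browser => {
//   const page = await browser.newPage();
//   await page.goto('https://google.com');
//   // ...type, wait, and read the result as in ip-google.js
// }).catch(err => {
//   console.error(err);
//   process.exitCode = 1;
// });
```

Taking launch as a function keeps the helper independent of launch options, so the same wrapper works with headless: false or the defaults.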
Bonus. Understanding (async () => {})()
My first reaction when I saw (async () => {})() was "wtf is this".
function someFunction() {}
//simple
Could it be shorter?
function () {}
//anonymous function
But how do we use await inside a function?
async function () {}
//async function
Could it be shorter?
async () => {}
//arrow function
Can we run it right away?
(async () => {})()
// define and immediately execute
This function is asynchronous, so it allows await, and it is executed immediately. That's all there is to it.
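This pattern is usually called an immediately invoked function expression (IIFE). A small experiment, runnable in plain Node, shows two things worth knowing: the body runs synchronously up to the first await, and the IIFE evaluates to a promise, so an error thrown inside it should be handled with .catch() (otherwise Node reports it as an unhandled rejection). The names order and result are just illustrative.

```javascript
// Records the order in which things happen.
const order = [];

// An async IIFE: defined and called in one expression.
const result = (async () => {
  order.push('start');       // runs immediately, synchronously
  await null;                // pauses here; the rest runs in a later microtask
  order.push('after await');
  return 42;                 // fulfils the promise stored in `result`
})();

// Runs before 'after await', because the IIFE paused at `await`.
order.push('after IIFE');

// An error thrown inside an async IIFE rejects its promise;
// .catch() handles it instead of leaving an unhandled rejection.
(async () => {
  throw new Error('boom');
})().catch(err => {
  order.push(`caught: ${err.message}`);
});
```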
Bonus. Repo with code
All code from this guide is hosted on GitHub.