Puppeteer is Google's free, open-source Node.js library for controlling Chrome and Chromium, and one of the most widely used tools for web scraping and browser automation.
Why Puppeteer?
- Official Google project: maintained by the Chrome DevTools team
- Headless and headed modes: run without a visible browser window for speed, or with one for debugging
- Full Chrome DevTools Protocol: the same low-level protocol that Chrome DevTools itself uses
- PDF generation: render any page to PDF with Chrome's print engine
- Screenshot API: full-page or element-level captures
- Network interception: modify requests and responses on the fly
Quick Start
# Install (downloads a compatible browser build automatically)
npm install puppeteer
# Or without a bundled browser (use your own Chrome)
npm install puppeteer-core
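With puppeteer-core there is no bundled browser, so you have to tell it where Chrome lives through the `executablePath` launch option. Here is a minimal sketch of one way to pick that path. The `resolveChromePath` helper, the `CHROME_PATH` environment variable, and the default paths are all assumptions for illustration; they are common install locations, not guarantees.

```javascript
// Hypothetical helper: choose a Chrome binary for puppeteer-core.
// These defaults are typical install locations and may not match your machine.
const DEFAULT_CHROME_PATHS = {
  darwin: "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
  linux: "/usr/bin/google-chrome",
  win32: "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
};

function resolveChromePath(env = process.env, platform = process.platform) {
  // An explicit CHROME_PATH variable (an assumed convention) always wins.
  if (env.CHROME_PATH) return env.CHROME_PATH;

  const fallback = DEFAULT_CHROME_PATHS[platform];
  if (!fallback) {
    throw new Error(`No default Chrome path known for platform "${platform}"`);
  }
  return fallback;
}

// Usage with puppeteer-core (not executed here):
//   import puppeteer from "puppeteer-core";
//   const browser = await puppeteer.launch({ executablePath: resolveChromePath() });
```

Keeping the lookup in its own function means you can override the path per machine without touching the scraping code.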
Web Scraping Example
import puppeteer from "puppeteer";
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto("https://news.ycombinator.com");
// Extract all story titles and links
const stories = await page.$$eval(".titleline > a", (links) =>
  links.map((a) => ({
    title: a.textContent,
    url: a.href,
  }))
);
console.log(`Found ${stories.length} stories`);
stories.forEach((s) => console.log(`${s.title}: ${s.url}`));
await browser.close();
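Scraped data is rarely clean on the first pass, so it can help to run the `stories` array through a small post-processing step before saving it. The sketch below is one possible cleanup, assuming the `{ title, url }` shape from the example above. `cleanStories` is a hypothetical helper, not part of Puppeteer.

```javascript
// Hypothetical cleanup step for the { title, url } objects scraped above.
// Trims titles, drops entries with no title or URL, and removes duplicate URLs.
function cleanStories(stories) {
  const seen = new Set();
  const cleaned = [];

  for (const { title, url } of stories) {
    const trimmed = (title ?? "").trim();
    if (!trimmed || !url || seen.has(url)) continue;
    seen.add(url);
    cleaned.push({ title: trimmed, url });
  }
  return cleaned;
}

// Usage: const unique = cleanStories(stories);
```

Because this runs in Node rather than inside the page, you can test it without launching a browser.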
PDF Generation (Free Alternative to Paid APIs)
import puppeteer from "puppeteer";
const browser = await puppeteer.launch();
const page = await browser.newPage();
// Render any HTML to PDF
await page.setContent(`
  <html>
    <style>
      body { font-family: Arial; padding: 40px; }
      h1 { color: #2563eb; }
      .invoice-table { width: 100%; border-collapse: collapse; }
      .invoice-table td { padding: 8px; border-bottom: 1px solid #eee; }
    </style>
    <h1>Invoice #1234</h1>
    <table class="invoice-table">
      <tr><td>Web Scraping Service</td><td>$500</td></tr>
      <tr><td>Data Cleaning</td><td>$200</td></tr>
      <tr><td><strong>Total</strong></td><td><strong>$700</strong></td></tr>
    </table>
  </html>
`);
await page.pdf({
  path: "invoice.pdf",
  format: "A4",
  printBackground: true,
  margin: { top: "20mm", bottom: "20mm", left: "15mm", right: "15mm" },
});
await browser.close();
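The invoice above is hard-coded, but in practice you would probably build the HTML from data and pass the result to `page.setContent`. Here is a rough sketch of that idea. `renderInvoiceHtml` and `escapeHtml` are hypothetical helpers, and the `{ label, amount }` item shape with whole-dollar amounts is an assumption.

```javascript
// Escape text so that item labels cannot inject markup into the invoice.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical template: items are assumed to look like { label, amount },
// with amount given as a whole-dollar number.
function renderInvoiceHtml(invoiceNumber, items) {
  const rows = items
    .map(({ label, amount }) => `<tr><td>${escapeHtml(label)}</td><td>$${amount}</td></tr>`)
    .join("\n");
  const total = items.reduce((sum, item) => sum + item.amount, 0);

  return `<html>
<style>
  body { font-family: Arial; padding: 40px; }
  .invoice-table { width: 100%; border-collapse: collapse; }
</style>
<h1>Invoice #${escapeHtml(invoiceNumber)}</h1>
<table class="invoice-table">
${rows}
<tr><td><strong>Total</strong></td><td><strong>$${total}</strong></td></tr>
</table>
</html>`;
}

// Usage (not executed here):
//   await page.setContent(renderInvoiceHtml("1234", items));
```

Escaping the labels matters as soon as the item names come from user input or scraped data.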
Screenshot Automation
import puppeteer from "puppeteer";
const browser = await puppeteer.launch();
const page = await browser.newPage();
// Set viewport for consistent screenshots
await page.setViewport({ width: 1920, height: 1080 });
await page.goto("https://example.com", { waitUntil: "networkidle2" });
// Full page screenshot
await page.screenshot({ path: "full-page.png", fullPage: true });
// Element screenshot (page.$ resolves to null when nothing matches the selector)
const element = await page.$("header");
if (element) await element.screenshot({ path: "header.png" });
// Specific area
await page.screenshot({
  path: "hero-section.png",
  clip: { x: 0, y: 0, width: 1920, height: 600 },
});
await browser.close();
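When the `clip` rectangle is computed rather than hard-coded, you may want to keep it inside some known bounds, such as the viewport size set above. The sketch below shows one way to do that by intersecting the two rectangles. `clampClip` is a hypothetical helper, and whether you want clamping at all depends on the page you are capturing.

```javascript
// Hypothetical helper: intersect a clip rectangle with a bounding size
// (for example the viewport) so the result never extends past the bounds.
function clampClip(clip, bounds) {
  const left = Math.max(clip.x, 0);
  const top = Math.max(clip.y, 0);
  const right = Math.min(clip.x + clip.width, bounds.width);
  const bottom = Math.min(clip.y + clip.height, bounds.height);

  const width = right - left;
  const height = bottom - top;
  if (width <= 0 || height <= 0) {
    throw new Error("Clip region lies entirely outside the bounds");
  }
  return { x: left, y: top, width, height };
}

// Usage (not executed here):
//   const clip = clampClip({ x: 0, y: 0, width: 2500, height: 600 }, { width: 1920, height: 1080 });
//   await page.screenshot({ path: "hero-section.png", clip });
```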
Form Automation
import puppeteer from "puppeteer";
const browser = await puppeteer.launch({ headless: false }); // Visible for demo
const page = await browser.newPage();
await page.goto("https://example.com/signup");
// Type with realistic delays
await page.type("#name", "John Doe", { delay: 50 });
await page.type("#email", "john@example.com", { delay: 50 });
await page.type("#password", "SecureP@ss123", { delay: 50 });
// Select dropdown
await page.select("#country", "US");
// Check checkbox
await page.click("#terms");
// Click submit and wait for navigation
await Promise.all([
  page.waitForNavigation(),
  page.click("#submit-btn"),
]);
console.log("Current URL:", page.url());
await browser.close();
Network Interception
import puppeteer from "puppeteer";
const browser = await puppeteer.launch();
const page = await browser.newPage();
// Enable request interception
await page.setRequestInterception(true);
page.on("request", (req) => {
  // Block images, stylesheets, and fonts for faster scraping
  if (["image", "stylesheet", "font"].includes(req.resourceType())) {
    req.abort();
  } else {
    req.continue();
  }
});
// Track API responses
page.on("response", async (res) => {
  if (res.url().includes("/api/")) {
    const data = await res.json().catch(() => null);
    if (data) console.log("API Response:", data);
  }
});
await page.goto("https://example.com");
await browser.close();
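As the blocking rules grow, it can be easier to pull the decisions out of the event handlers into plain functions that you can test without a browser. This is a minimal sketch of that refactor. `shouldBlock` and `isApiUrl` are hypothetical names, and `isApiUrl` is deliberately stricter than the `includes("/api/")` check above because it only looks at the URL path.

```javascript
// Resource types to drop, matching the list used in the handler above.
const BLOCKED_RESOURCE_TYPES = new Set(["image", "stylesheet", "font"]);

// Hypothetical predicate: true when a request of this type should be aborted.
function shouldBlock(resourceType, blocked = BLOCKED_RESOURCE_TYPES) {
  return blocked.has(resourceType);
}

// Hypothetical predicate: true when the URL path (not the query string)
// contains "/api/". Unparseable URLs are treated as non-API.
function isApiUrl(url) {
  try {
    return new URL(url).pathname.includes("/api/");
  } catch {
    return false;
  }
}

// Usage inside the Puppeteer handlers (not executed here):
//   page.on("request", (req) =>
//     shouldBlock(req.resourceType()) ? req.abort() : req.continue()
//   );
//   page.on("response", async (res) => {
//     if (isApiUrl(res.url())) console.log(await res.json().catch(() => null));
//   });
```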
Stealth Mode (Avoid Bot Detection)
import puppeteer from "puppeteer-extra";
import StealthPlugin from "puppeteer-extra-plugin-stealth";
puppeteer.use(StealthPlugin());
// Puppeteer v22+ runs the new headless mode with `true`; older releases used headless: "new"
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
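// Set a realistic user agent (example value; match it to a current Chrome release)
await page.setUserAgent(
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
);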
// Set realistic viewport
await page.setViewport({ width: 1366, height: 768 });
await page.goto("https://example.com");
await browser.close();
Puppeteer vs Playwright vs Selenium
| Feature | Puppeteer | Playwright | Selenium |
|---|---|---|---|
| Browsers | Chrome/Chromium (Firefox also supported) | Chromium, Firefox, WebKit | All major |
| Language | Node.js | Node/Python/Java/C# | All major |
| Speed | Fast | Fastest | Slowest |
| Auto-wait | Partial | Full | None |
| Stealth plugins | Yes | Limited | Limited |
| PDF generation | Excellent | Good | Basic (print-to-PDF in Selenium 4) |
| Maintained by | Google | Microsoft | Community |
Need to scrape data from any website and get it in structured JSON? Check out my web scraping tools on Apify — no coding required, results in minutes.
Have a custom data extraction project? Email me at spinov001@gmail.com — I build tailored scraping solutions for businesses.