Puppeteer Has a Free API — Browser Automation by Google

#puppeteer #javascript #webdev #scraping

Puppeteer is Google's free, open-source Node.js library for controlling Chrome and Chromium. It is one of the most widely used tools for web scraping and browser automation.

Why Puppeteer?

Official Google project: developed and maintained by the Chrome team
Headless and headed modes: run without a visible browser window for speed, or with one for debugging
Chrome DevTools Protocol access: reach low-level browser features that the high-level API does not wrap
PDF generation: render any page to PDF with Chrome's own print engine
Screenshot API: full-page or element-level captures
Network interception: block, modify, or mock requests and responses on the fly

Quick Start

# Install (downloads a compatible Chrome build automatically)
npm install puppeteer

# Or install without a bundled browser (you point it at your own Chrome)
npm install puppeteer-core
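
Because puppeteer-core downloads nothing, launch() has to be told which browser to start. Here is a minimal sketch of one way to do that. The helper name coreLaunchOptions and the CHROME_PATH environment variable are assumptions made for illustration; executablePath and channel are regular Puppeteer launch options.

```javascript
// Hypothetical helper: build launch options for puppeteer-core.
// An explicit path wins; otherwise fall back to the installed stable Chrome channel.
function coreLaunchOptions(env = process.env) {
  const executablePath = env.CHROME_PATH; // assumed variable name, not a Puppeteer convention
  return executablePath
    ? { headless: true, executablePath }
    : { headless: true, channel: "chrome" };
}

// Usage (requires `npm install puppeteer-core`):
//   import puppeteer from "puppeteer-core";
//   const browser = await puppeteer.launch(coreLaunchOptions());
```

Keeping the path in the environment lets the same script run on a laptop and in a container where Chrome lives somewhere else.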

Web Scraping Example

import puppeteer from "puppeteer";

const browser = await puppeteer.launch();
const page = await browser.newPage();

await page.goto("https://news.ycombinator.com");

// Extract all story titles and links
const stories = await page.$$eval(".titleline > a", (links) =>
  links.map((a) => ({
    title: a.textContent,
    url: a.href,
  }))
);

console.log(`Found ${stories.length} stories`);
stories.forEach((s) => console.log(`${s.title}: ${s.url}`));

await browser.close();
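
If page.goto or the extraction throws, the script above never reaches browser.close() and the Chrome process can linger. One possible way to guard against that is a small wrapper with try/finally. The name withPage is hypothetical, and the launcher argument is assumed to be anything with a launch() method, such as the puppeteer import.

```javascript
// Hypothetical helper: run `task` with a fresh page and always close the browser.
// `launcher` is anything with a launch() method, e.g. the puppeteer import.
async function withPage(launcher, task, launchOptions = {}) {
  const browser = await launcher.launch(launchOptions);
  try {
    const page = await browser.newPage();
    return await task(page);
  } finally {
    await browser.close(); // runs even if task throws
  }
}

// Usage with the scraping example above:
//   const stories = await withPage(puppeteer, async (page) => {
//     await page.goto("https://news.ycombinator.com");
//     return page.$$eval(".titleline > a", (links) =>
//       links.map((a) => ({ title: a.textContent, url: a.href })));
//   });
```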

PDF Generation (Free Alternative to Paid APIs)

import puppeteer from "puppeteer";

const browser = await puppeteer.launch();
const page = await browser.newPage();

// Render any HTML to PDF
await page.setContent(`
  <html>
    <style>
      body { font-family: Arial; padding: 40px; }
      h1 { color: #2563eb; }
      .invoice-table { width: 100%; border-collapse: collapse; }
      .invoice-table td { padding: 8px; border-bottom: 1px solid #eee; }
    </style>
    <h1>Invoice #1234</h1>
    <table class="invoice-table">
      <tr><td>Web Scraping Service</td><td>$500</td></tr>
      <tr><td>Data Cleaning</td><td>$200</td></tr>
      <tr><td><strong>Total</strong></td><td><strong>$700</strong></td></tr>
    </table>
  </html>
`);

await page.pdf({
  path: "invoice.pdf",
  format: "A4",
  printBackground: true,
  margin: { top: "20mm", bottom: "20mm", left: "15mm", right: "15mm" },
});

await browser.close();
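
In practice the invoice HTML usually comes from data instead of a hard-coded string. The sketch below shows one way to build it, assuming items shaped like { name, price } with whole-dollar prices. Both function names are hypothetical, and the result would be passed to page.setContent() as in the example above.

```javascript
// Escape text before placing it into HTML so item names cannot inject markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: items are { name, price } objects, price in whole dollars.
function renderInvoiceHtml(invoiceNumber, items) {
  const total = items.reduce((sum, item) => sum + item.price, 0);
  const rows = items
    .map((item) => `<tr><td>${escapeHtml(item.name)}</td><td>$${item.price}</td></tr>`)
    .join("\n");
  return `<html>
  <h1>Invoice #${escapeHtml(invoiceNumber)}</h1>
  <table class="invoice-table">
${rows}
<tr><td><strong>Total</strong></td><td><strong>$${total}</strong></td></tr>
  </table>
</html>`;
}

// Usage: await page.setContent(renderInvoiceHtml("1234", items));
```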

Screenshot Automation

import puppeteer from "puppeteer";

const browser = await puppeteer.launch();
const page = await browser.newPage();

// Set viewport for consistent screenshots
await page.setViewport({ width: 1920, height: 1080 });
await page.goto("https://example.com", { waitUntil: "networkidle2" });

// Full page screenshot
await page.screenshot({ path: "full-page.png", fullPage: true });

// Element screenshot (page.$ resolves to null when nothing matches)
const element = await page.$("header");
if (element) {
  await element.screenshot({ path: "header.png" });
}

// Specific area
await page.screenshot({
  path: "hero-section.png",
  clip: { x: 0, y: 0, width: 1920, height: 600 },
});

await browser.close();
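
A common follow-up is capturing the same page at several screen sizes. The sketch below loops over a list of viewports; the helper name captureAtViewports and the file naming scheme are assumptions, while setViewport, goto, and screenshot are the same calls used above.

```javascript
// Hypothetical helper: screenshot one URL at several viewport sizes.
// `page` is a Puppeteer Page; viewports are { name, width, height } objects.
async function captureAtViewports(page, url, viewports) {
  const paths = [];
  for (const { name, width, height } of viewports) {
    await page.setViewport({ width, height });
    // Reload after resizing so responsive layouts render for the new size.
    await page.goto(url, { waitUntil: "networkidle2" });
    const path = `${name}-${width}x${height}.png`;
    await page.screenshot({ path, fullPage: true });
    paths.push(path);
  }
  return paths;
}

// Usage:
//   await captureAtViewports(page, "https://example.com", [
//     { name: "desktop", width: 1920, height: 1080 },
//     { name: "mobile", width: 390, height: 844 },
//   ]);
```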

Form Automation

import puppeteer from "puppeteer";

const browser = await puppeteer.launch({ headless: false }); // Visible for demo
const page = await browser.newPage();

await page.goto("https://example.com/signup");

// Type with realistic delays
await page.type("#name", "John Doe", { delay: 50 });
await page.type("#email", "john@example.com", { delay: 50 });
await page.type("#password", "SecureP@ss123", { delay: 50 });

// Select dropdown
await page.select("#country", "US");

// Check checkbox
await page.click("#terms");

// Click submit and wait for navigation
await Promise.all([
  page.waitForNavigation(),
  page.click("#submit-btn"),
]);

console.log("Current URL:", page.url());
await browser.close();
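
When a form has many fields, the repeated page.type calls can be driven from a plain object. This is a rough sketch, assuming a map of selector to value; the name fillForm is hypothetical. It also waits for each field first, which helps on pages that render their forms late.

```javascript
// Hypothetical helper: type each value into its selector, waiting for the field first.
// `fields` maps CSS selectors to the text to type, e.g. { "#name": "John Doe" }.
async function fillForm(page, fields, delay = 50) {
  for (const [selector, value] of Object.entries(fields)) {
    await page.waitForSelector(selector); // rejects on timeout if the field never appears
    await page.type(selector, value, { delay });
  }
}

// Usage:
//   await fillForm(page, {
//     "#name": "John Doe",
//     "#email": "john@example.com",
//   });
```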

Network Interception

import puppeteer from "puppeteer";

const browser = await puppeteer.launch();
const page = await browser.newPage();

// Enable request interception
await page.setRequestInterception(true);

page.on("request", (req) => {
  // Block images, stylesheets, and fonts for faster scraping
  if (["image", "stylesheet", "font"].includes(req.resourceType())) {
    req.abort();
  } else {
    req.continue();
  }
});

// Track API responses
page.on("response", async (res) => {
  if (res.url().includes("/api/")) {
    const data = await res.json().catch(() => null);
    if (data) console.log("API Response:", data);
  }
});

await page.goto("https://example.com");
await browser.close();
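
Once more than one "request" listener is attached (a plugin can add its own), resolving the same request twice raises an error. The sketch below is one way to make the blocking handler defensive by checking isInterceptResolutionHandled() first. The names BLOCKED_TYPES and makeRequestHandler are hypothetical.

```javascript
const BLOCKED_TYPES = new Set(["image", "stylesheet", "font"]);

// Build a "request" handler that aborts blocked resource types and continues the rest.
function makeRequestHandler(blockedTypes = BLOCKED_TYPES) {
  return (req) => {
    // Another listener may already have resolved this request; leave it alone.
    if (req.isInterceptResolutionHandled()) return;
    if (blockedTypes.has(req.resourceType())) {
      req.abort();
    } else {
      req.continue();
    }
  };
}

// Usage:
//   await page.setRequestInterception(true);
//   page.on("request", makeRequestHandler());
```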

Stealth Mode (Avoid Bot Detection)

// Requires: npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
import puppeteer from "puppeteer-extra";
import StealthPlugin from "puppeteer-extra-plugin-stealth";

puppeteer.use(StealthPlugin());

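// On Puppeteer 22+, headless: true selects the new headless mode; "new" was only needed on earlier releases
const browser = await puppeteer.launch({ headless: true });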
const page = await browser.newPage();

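// Set a complete, realistic user agent (the Chrome version here is only an example)
await page.setUserAgent(
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
);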

// Set realistic viewport
await page.setViewport({ width: 1366, height: 768 });

await page.goto("https://example.com");

await browser.close();

Puppeteer vs Playwright vs Selenium

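Feature	Puppeteer	Playwright	Selenium
Browsers	Chrome/Chromium, Firefox	Chromium, Firefox, WebKit	All major browsers
Language	JavaScript/TypeScript (Node.js)	Node.js, Python, Java, C#	All major languages
Speed	Fast	Fast	Generally slower
Auto-wait	Partial (locators)	Built in	Manual (implicit/explicit waits)
Stealth plugins	Yes	Limited	Limited
PDF generation	Excellent	Good (Chromium only)	Basic (print-to-PDF in Selenium 4)
Maintained by	Google	Microsoft	Community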

Need to scrape data from any website and get it in structured JSON? Check out my web scraping tools on Apify — no coding required, results in minutes.

Have a custom data extraction project? Email me at spinov001@gmail.com — I build tailored scraping solutions for businesses.
