agenthustler
Web Scraping with Node.js: Cheerio, Puppeteer, and Playwright

Node.js has become a powerhouse for web scraping. This guide compares the three major tools — Cheerio, Puppeteer, and Playwright — with practical examples for each.

When to Use What

| Tool       | Best For              | Speed   | JS Rendering |
|------------|-----------------------|---------|--------------|
| Cheerio    | Static HTML parsing   | Fastest | No           |
| Puppeteer  | Chrome automation     | Medium  | Yes          |
| Playwright | Multi-browser testing | Medium  | Yes          |

Setup

npm init -y
npm install cheerio axios puppeteer playwright

Cheerio: Fast HTML Parsing

Cheerio is jQuery for the server. It parses static HTML without a browser.

const axios = require("axios");
const cheerio = require("cheerio");

async function scrapeWithCheerio(url) {
  const { data } = await axios.get(url, {
    headers: {
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    }
  });

  const $ = cheerio.load(data);
  const results = [];

  $("article.post").each((i, el) => {
    results.push({
      title: $(el).find("h2").text().trim(),
      link: $(el).find("a").attr("href"),
      summary: $(el).find(".summary").text().trim(),
      date: $(el).find("time").attr("datetime")
    });
  });

  return results;
}

// Usage — top-level await only works in ES modules, so in a
// CommonJS script call it from a promise chain or an async wrapper:
scrapeWithCheerio("https://example-blog.com")
  .then(posts => console.log(`Found ${posts.length} posts`));

When Cheerio Falls Short

Cheerio cannot execute JavaScript. If the page loads content dynamically (SPAs, infinite scroll, lazy loading), you need a browser-based tool.
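One way to decide is to fetch the raw HTML once and check whether your target content is already there. A minimal, dependency-free sketch — the marker string and the 50-character threshold are illustrative assumptions, not standards:

```javascript
// Heuristics for choosing between Cheerio and a browser tool.

// True if a known marker (a heading, CSS class, product name, etc.)
// already appears in the server-rendered HTML.
function isServerRendered(rawHtml, marker) {
  return rawHtml.includes(marker);
}

// True if the <body> is a near-empty SPA shell (e.g. <div id="root"></div>
// plus script tags) — a strong hint that a browser tool is needed.
function looksLikeSpaShell(rawHtml) {
  const body = (rawHtml.match(/<body[^>]*>([\s\S]*)<\/body>/i) || [null, ""])[1];
  const withoutScripts = body.replace(/<script[\s\S]*?<\/script>/gi, "");
  const visibleText = withoutScripts.replace(/<[^>]+>/g, "").trim();
  return visibleText.length < 50; // arbitrary threshold
}
```

Run the check once per site during development; if the marker is missing or the body is an empty shell, route that site to Puppeteer or Playwright instead.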

Puppeteer: Chrome Automation

Puppeteer controls a headless Chrome browser — perfect for JS-heavy sites.

const puppeteer = require("puppeteer");

async function scrapeWithPuppeteer(url) {
  const browser = await puppeteer.launch({
    headless: "new",
    args: ["--no-sandbox", "--disable-setuid-sandbox"]
  });

  const page = await browser.newPage();
  await page.setUserAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)");

  await page.goto(url, { waitUntil: "networkidle0" });

  // Wait for dynamic content to load
  await page.waitForSelector(".product-card", { timeout: 10000 });

  // Extract data from the rendered page
  const products = await page.evaluate(() => {
    const cards = document.querySelectorAll(".product-card");
    return Array.from(cards).map(card => ({
      name: card.querySelector(".title")?.textContent?.trim(),
      price: card.querySelector(".price")?.textContent?.trim(),
      rating: card.querySelector(".rating")?.textContent?.trim(),
      image: card.querySelector("img")?.src
    }));
  });

  await browser.close();
  return products;
}

// Handle infinite scroll
async function scrapeInfiniteScroll(url, scrollCount = 5) {
  const browser = await puppeteer.launch({ headless: "new" });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "networkidle0" });

  for (let i = 0; i < scrollCount; i++) {
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await new Promise(r => setTimeout(r, 2000));
  }

  const items = await page.evaluate(() => {
    return Array.from(document.querySelectorAll(".item")).map(el => ({
      text: el.textContent.trim()
    }));
  });

  await browser.close();
  return items;
}

Playwright: Multi-Browser Power

Playwright supports Chromium, Firefox, and WebKit (Safari's engine) with a cleaner API.

const { chromium } = require("playwright");

async function scrapeWithPlaywright(url) {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext({
    userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
  });
  const page = await context.newPage();

  // Block images and CSS for faster scraping
  await page.route("**/*.{png,jpg,jpeg,gif,css,woff2}", route => route.abort());

  await page.goto(url, { waitUntil: "domcontentloaded" });

  // Use locators (Playwright best practice)
  const items = await page.locator(".search-result").evaluateAll(nodes =>
    nodes.map(node => ({
      title: node.querySelector("h3")?.textContent?.trim(),
      price: node.querySelector(".price")?.textContent?.trim(),
      url: node.querySelector("a")?.href
    }))
  );

  await browser.close();
  return items;
}

// Screenshot + scrape combo
async function scrapeAndScreenshot(url) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url);

  await page.screenshot({ path: "page.png", fullPage: true });
  const title = await page.title();

  await browser.close();
  return { title, screenshot: "page.png" };
}

Building a Production Scraper

const axios = require("axios");
const cheerio = require("cheerio");
const { chromium } = require("playwright");

class SmartScraper {
  constructor(proxyKey = null) {
    this.proxyKey = proxyKey;
  }

  async scrape(url, options = {}) {
    const { jsRendering = false, selector = "body" } = options;

    if (jsRendering) {
      return this.browserScrape(url, selector);
    }
    return this.staticScrape(url, selector);
  }

  async staticScrape(url, selector) {
    const fetchUrl = this.proxyKey
      ? `http://api.scraperapi.com?api_key=${this.proxyKey}&url=${encodeURIComponent(url)}`
      : url;

    const { data } = await axios.get(fetchUrl);
    const $ = cheerio.load(data);
    return $(selector).html();
  }

  async browserScrape(url, selector) {
    const browser = await chromium.launch({ headless: true });
    try {
      const page = await browser.newPage();
      await page.goto(url, { waitUntil: "networkidle" });
      return await page.locator(selector).innerHTML();
    } finally {
      // Close the browser even if navigation or extraction fails
      await browser.close();
    }
  }
}

Scaling Node.js Scrapers

For production scraping:

  • ScraperAPI — proxy rotation and CAPTCHA solving, works with all three tools
  • ThorData — residential proxies for sites that block datacenter IPs
  • ScrapeOps — monitoring dashboard for your scraping pipeline
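Playwright can also route traffic through a proxy via its launch options, which works with providers like the ones above. A sketch — the server URL and credentials below are placeholders, not real endpoints:

```javascript
// Build Playwright launch options that route traffic through a proxy.
// Pass the result to chromium.launch(), e.g.:
//   const browser = await chromium.launch(
//     proxyLaunchOptions("http://proxy.example.com:8080", user, pass));
function proxyLaunchOptions(server, username, password) {
  return {
    headless: true,
    proxy: { server, username, password }
  };
}
```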

Performance Tips

  1. Use Cheerio when possible — parsing static HTML is typically an order of magnitude faster than driving a headless browser
  2. Block unnecessary resources in Puppeteer/Playwright
  3. Reuse browser instances instead of launching new ones
  4. Use connection pooling for concurrent requests
  5. Implement retry logic with exponential backoff
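Tip 5 can be sketched as a small helper that wraps any of the scrape functions above (the retry count and base delay are illustrative defaults):

```javascript
// Retry an async function with exponential backoff: waits
// baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ... between attempts,
// and rethrows the last error once retries are exhausted.
async function withRetry(fn, retries = 3, baseDelayMs = 500) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === retries) break;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}

// Usage:
// const posts = await withRetry(() => scrapeWithCheerio(url));
```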

Conclusion

Pick the right tool for the job: Cheerio for static pages, Puppeteer for Chrome-specific needs, Playwright for multi-browser support. Combine them in a smart scraper that adapts to each target site.


Follow for more Node.js scraping tutorials!

Top comments (0)