Alex Spinov

Crawlee Has a Free Framework — Here's How to Build Production Web Scrapers in Node.js

I've built scrapers with Puppeteer, Playwright, Cheerio, and Axios. Each time I'd rebuild the same things: request queue, retry logic, proxy rotation, error handling. Then I found Crawlee — all of that comes built-in.

What Crawlee Offers

Crawlee (open source, free):

  • Built-in request queue — persistent, resumable
  • Auto-retry — failed requests retry automatically
  • Proxy rotation — built-in proxy management
  • Multiple crawlers — HTTP (Cheerio), Playwright, Puppeteer
  • Auto-scaling — adjusts concurrency based on system load
  • Storage — datasets, key-value stores, request queues
  • TypeScript-first — full type safety
  • Apify integration — deploy to Apify with zero changes
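Most of that list is exactly the plumbing you'd otherwise hand-roll. For a sense of scale, here's a minimal sketch of just the retry-on-failure queue (the names `crawlWithRetries` and `Fetcher` are illustrative, not Crawlee APIs):

```typescript
// Minimal hand-rolled retry queue: the kind of plumbing Crawlee ships built in.
type Fetcher = (url: string) => Promise<string>;

async function crawlWithRetries(
  urls: string[],
  fetchPage: Fetcher,
  maxRetries = 3,
): Promise<Map<string, string>> {
  const results = new Map<string, string>();
  const queue = urls.map((url) => ({ url, attempts: 0 }));

  while (queue.length > 0) {
    const job = queue.shift()!;
    try {
      results.set(job.url, await fetchPage(job.url));
    } catch {
      job.attempts += 1;
      // Re-enqueue until the retry budget is spent
      if (job.attempts <= maxRetries) queue.push(job);
    }
  }
  return results;
}
```

And this version still has no persistence, no concurrency, and no backoff; Crawlee's queue survives restarts and scales workers automatically.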

Quick Start

npx crawlee create my-scraper
cd my-scraper
npm start

HTTP Scraper (Fast)

import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
  maxConcurrency: 10,
  maxRequestRetries: 3,

  async requestHandler({ request, $, enqueueLinks }) {
    // Extract data
    const title = $('h1').text().trim();
    const price = $('.price').text().trim();
    const description = $('meta[name=description]').attr('content');

    // Save to dataset
    await Dataset.pushData({
      url: request.url,
      title,
      price,
      description,
      scrapedAt: new Date().toISOString()
    });

    // Follow links (auto-queued)
    await enqueueLinks({
      selector: '.product-link',
      label: 'PRODUCT'
    });
  }
});

await crawler.run(['https://example-shop.com/products']);

// Export results
const dataset = await Dataset.open();
await dataset.exportToCSV('products');
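One gotcha with the `.price` field above: it comes back as raw text like `"$1,299.00"`. A small hypothetical helper to normalize it before pushing to the dataset (`parsePrice` is my name, not a Crawlee API):

```typescript
// Hypothetical helper: turn a scraped price string like "$1,299.00" into a number.
function parsePrice(raw: string): number | null {
  // Strip thousands separators and whitespace, then grab the first numeric run
  const match = raw.replace(/[,\s]/g, '').match(/\d+(\.\d+)?/);
  return match ? Number(match[0]) : null;
}
```

Returning `null` instead of `NaN` makes missing prices easy to filter out downstream.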

Browser Scraper (JavaScript Sites)

import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
  maxConcurrency: 5,
  launchContext: {
    launchOptions: { headless: true }
  },

  async requestHandler({ page, request, enqueueLinks }) {
    // Wait for dynamic content
    await page.waitForSelector('.product-card');

    // Extract data from rendered page
    const products = await page.$$eval('.product-card', cards =>
      cards.map(card => ({
        name: card.querySelector('h2')?.textContent?.trim(),
        price: card.querySelector('.price')?.textContent?.trim(),
        rating: card.querySelector('.stars')?.getAttribute('data-rating')
      }))
    );

    await Dataset.pushData(products);

    // Handle pagination; enqueueLinks is a no-op when the selector matches nothing
    await enqueueLinks({ selector: '.pagination .next a' });
  }
});

await crawler.run(['https://spa-shop.com/products']);
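Note that `$$eval` happily returns `undefined` fields when a card is missing a selector, so partial records can slip into the dataset. A hypothetical pre-push filter (the shape and names are illustrative):

```typescript
// Drop cards where a required selector didn't match before saving them.
interface ProductCard {
  name?: string;
  price?: string;
  rating?: string | null;
}

function dropIncomplete(cards: ProductCard[]): ProductCard[] {
  // Keep only records where the must-have fields actually matched
  return cards.filter((card) => Boolean(card.name && card.price));
}
```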

Proxy Rotation

import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfig = new ProxyConfiguration({
  proxyUrls: [
    'http://user:pass@proxy1.com:8080',
    'http://user:pass@proxy2.com:8080',
    'http://user:pass@proxy3.com:8080'
  ]
});

const crawler = new CheerioCrawler({
  proxyConfiguration: proxyConfig,
  // Crawlee auto-rotates proxies and retries on failure

  async requestHandler({ request, $ }) {
    // Scraping logic
  }
});
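For intuition, the core of rotation is just round-robin state over the pool. A sketch (not Crawlee's actual implementation, which also ties proxies to sessions and retires failing ones):

```typescript
// Round-robin proxy picker: a sketch of the rotation Crawlee manages for you.
class ProxyRotator {
  private index = 0;

  constructor(private readonly proxyUrls: string[]) {
    if (proxyUrls.length === 0) throw new Error('need at least one proxy URL');
  }

  // Hand out proxies in order, wrapping back to the start of the pool
  next(): string {
    const url = this.proxyUrls[this.index];
    this.index = (this.index + 1) % this.proxyUrls.length;
    return url;
  }
}
```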

Error Handling & Retries

const crawler = new CheerioCrawler({
  maxRequestRetries: 5,

  async requestHandler({ request, $ }) {
    const title = $('h1').text().trim();
    // Throwing marks the request as failed, so Crawlee retries it automatically
    if (!title) throw new Error('Missing title — will retry');
    await Dataset.pushData({ url: request.url, title });
  },

  // Crawlee passes the error as a second argument, not on the context object
  async failedRequestHandler({ request }, error) {
    console.log(`Failed after 5 retries: ${request.url}`);
    console.log(`Error: ${error.message}`);
    // Log failed URLs for manual review
  }
});
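Crawlers typically space those retries out with capped exponential backoff rather than hammering the server. An illustrative delay schedule (the constants are mine, not Crawlee's exact values):

```typescript
// Capped exponential backoff: delay doubles per attempt up to a ceiling.
function retryDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

Attempt 0 waits 500 ms, attempt 1 waits 1 s, and so on until the 30 s cap.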

Deploy to Apify (One Command)

# Your Crawlee scraper becomes an Apify Actor
apify init
apify push
# Now available on Apify Store with API, scheduling, and monitoring

Why Crawlee

| Crawlee | Raw Puppeteer/Playwright |
|---|---|
| Auto-retry built in | Build retry logic yourself |
| Persistent request queue | In-memory queue |
| Proxy rotation built in | Manual proxy handling |
| Auto-scaling | Manual concurrency tuning |
| Export to CSV/JSON | Build export logic |
| TypeScript types | Add types yourself |

Want ready-made scrapers? Check out my web scraping actors on Apify — pre-built for Google, Amazon, Reddit, and 70+ sites.

Need a custom scraper? Email me at spinov001@gmail.com — I build production scrapers with Crawlee.
