DEV Community

Alex Spinov

Crawlee Is a Free Web Scraping Framework: Build Reliable Scrapers with Auto-Retry and Proxy Rotation

Why Crawlee?

Crawlee (by Apify) is a web scraping framework with automatic retries, proxy rotation, request queuing, and both HTTP and browser-based scraping.

npx crawlee create my-scraper
cd my-scraper && npm start

HTTP Scraping (Fast)

import { CheerioCrawler } from 'crawlee'

const crawler = new CheerioCrawler({
  async requestHandler({ request, $ }) {
    const title = $('h1').text()
    const price = $('.price').text()
    console.log({ url: request.url, title, price })
  },
})

await crawler.run(['https://example.com/product/1', 'https://example.com/product/2'])
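
The text returned by `$('.price').text()` is a raw string, so it usually needs cleaning before it is stored or compared. A minimal sketch of such a helper follows. `parsePrice` is a hypothetical name, not part of Crawlee, and it assumes prices use a dot as the decimal separator and commas to group thousands.

```javascript
// Hypothetical helper (not part of Crawlee): turn scraped price text into a number.
// Assumes "." is the decimal separator and "," groups thousands, e.g. "$1,299.99".
function parsePrice(text) {
  if (typeof text !== 'string') return null
  // Drop thousands separators, then take the first run of digits with optional decimals
  const match = text.replace(/,/g, '').match(/\d+(?:\.\d+)?/)
  return match ? Number(match[0]) : null
}

console.log(parsePrice('  $1,299.99 \n')) // 1299.99
console.log(parsePrice('Out of stock')) // null
```

Inside the request handler this could be used as `parsePrice($('.price').text())`. Sites that format prices differently (for example `1.299,99`) would need a different rule.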

Browser Scraping (JavaScript-Heavy Sites)

import { PlaywrightCrawler } from 'crawlee'

const crawler = new PlaywrightCrawler({
  async requestHandler({ page, request }) {
    await page.waitForSelector('.product-list')
    const products = await page.$$eval('.product', (els) =>
      els.map((el) => ({
        name: el.querySelector('.name')?.textContent,
        price: el.querySelector('.price')?.textContent,
      }))
    )
    console.log(products)
  },
})

await crawler.run(['https://spa-site.com/products'])
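
Because the handler uses optional chaining, `name` or `price` is `undefined` when an element is missing, and `textContent` often carries surrounding whitespace. One way to tidy the array before logging or storing it is sketched below. `cleanProducts` is a hypothetical helper, and dropping items without a name is an assumption about what you want to keep.

```javascript
// Hypothetical helper: normalize the objects produced by page.$$eval above.
function cleanProducts(products) {
  return products
    .map(({ name, price }) => ({
      // textContent may be null/undefined when the element is missing
      name: (name ?? '').trim(),
      price: (price ?? '').trim() || null,
    }))
    // Assumption: an item without a name is not useful, so drop it
    .filter((product) => product.name !== '')
}

const raw = [
  { name: '  Widget \n', price: ' $9.99 ' },
  { name: undefined, price: '$1.00' },
  { name: 'Gadget', price: undefined },
]
console.log(cleanProducts(raw))
// [ { name: 'Widget', price: '$9.99' }, { name: 'Gadget', price: null } ]
```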

Auto-Retry + Proxy Rotation

import { CheerioCrawler, ProxyConfiguration } from 'crawlee'

const proxyConfig = new ProxyConfiguration({
  proxyUrls: ['http://proxy1:8080', 'http://proxy2:8080'],
})

const crawler = new CheerioCrawler({
  proxyConfiguration: proxyConfig,
  maxRequestRetries: 3,
  requestHandlerTimeoutSecs: 30,
  async requestHandler({ request, $ }) {
    // Auto-rotates proxies, auto-retries on failure
  },
})
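
To make `proxyConfiguration` and `maxRequestRetries` more concrete, here is a conceptual sketch of what retrying with a rotating proxy means. It is not Crawlee's implementation: `createRotator` and `runWithRetries` are hypothetical names, the rotation is plain round-robin, and the sketch is synchronous for brevity whereas real requests are asynchronous. It assumes `maxRetries` counts attempts after the first one, so 3 retries allow up to 4 attempts.

```javascript
// Conceptual sketch only; Crawlee handles rotation and retries internally.
function createRotator(proxyUrls) {
  let index = 0
  // Return the next proxy URL, wrapping around at the end of the list
  return () => proxyUrls[index++ % proxyUrls.length]
}

function runWithRetries(task, nextProxy, maxRetries) {
  let lastError
  // One initial attempt plus up to maxRetries further attempts
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const proxyUrl = nextProxy()
    try {
      return task(proxyUrl)
    } catch (error) {
      lastError = error // remember the failure and try the next proxy
    }
  }
  throw lastError
}

// Usage: a fake task that fails through proxy1 and succeeds through proxy2
const nextProxy = createRotator(['http://proxy1:8080', 'http://proxy2:8080'])
const result = runWithRetries(
  (proxyUrl) => {
    if (proxyUrl.includes('proxy1')) throw new Error('blocked')
    return `fetched via ${proxyUrl}`
  },
  nextProxy,
  3,
)
console.log(result) // fetched via http://proxy2:8080
```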

Save to Dataset

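import { CheerioCrawler, Dataset } from 'crawlee'

const crawler = new CheerioCrawler({
  async requestHandler({ request, $ }) {
    await Dataset.pushData({ title: $('h1').text(), price: $('.price').text(), url: request.url })
  },
})
// Auto-saves to ./storage/datasets/default/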

Crawlee vs Puppeteer vs Playwright

| Feature | Crawlee | Puppeteer | Playwright |
| --- | --- | --- | --- |
| Auto-retry | Yes | No | No |
| Proxy rotation | Yes | Manual | Manual |
| Request queue | Yes | No | No |
| Dataset storage | Yes | No | No |
| HTTP + Browser | Both | Browser | Browser |

Need to extract data from any website at scale? I build custom web scrapers — 77 production scrapers running on Apify Store. Email me at spinov001@gmail.com for a tailored solution.
