Why Crawlee?
Crawlee (by Apify) is a Node.js web scraping framework with built-in automatic retries, proxy rotation, request queuing, and support for both HTTP and browser-based scraping.
npx crawlee create my-scraper
cd my-scraper && npm start
HTTP Scraping (Fast)
import { CheerioCrawler } from 'crawlee'
const crawler = new CheerioCrawler({
  async requestHandler({ request, $ }) {
    const title = $('h1').text()
    const price = $('.price').text()
    console.log({ url: request.url, title, price })
  },
})
await crawler.run(['https://example.com/product/1', 'https://example.com/product/2'])
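The `.price` text scraped above comes back as a raw string. Below is a minimal sketch of turning it into a number before storing it, assuming US-style formatting such as `$1,299.00`. `parsePrice` is a hypothetical helper for illustration, not part of Crawlee.

```javascript
// Hypothetical helper (not a Crawlee API): converts scraped price text to a number.
// Assumes US-style formatting: "," as thousands separator, "." as decimal point.
function parsePrice(text) {
  // Drop everything except digits, the decimal point, and a minus sign.
  const cleaned = String(text ?? '').replace(/[^0-9.-]/g, '')
  const value = Number.parseFloat(cleaned)
  // Return null rather than NaN so missing prices are easy to detect downstream.
  return Number.isNaN(value) ? null : value
}
```

Inside the request handler this could be called as `parsePrice($('.price').text())`. Sites that use a comma as the decimal separator would need different cleaning rules.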
Browser Scraping (JavaScript-Heavy Sites)
import { PlaywrightCrawler } from 'crawlee'
const crawler = new PlaywrightCrawler({
  async requestHandler({ page, request }) {
    await page.waitForSelector('.product-list')
    const products = await page.$$eval('.product', (els) =>
      els.map((el) => ({
        name: el.querySelector('.name')?.textContent,
        price: el.querySelector('.price')?.textContent,
      }))
    )
    console.log(products)
  },
})
await crawler.run(['https://spa-site.com/products'])
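Because of the optional chaining in the handler above, `name` or `price` can come back missing when an element is absent, and `textContent` often carries surrounding whitespace. A minimal sketch of a cleanup step follows. `cleanProducts` is a hypothetical helper, not a Crawlee or Playwright API, and it assumes incomplete entries should simply be dropped.

```javascript
// Hypothetical post-processing step (not a Crawlee or Playwright API).
// Trims whitespace and drops entries where either field is missing or empty.
function cleanProducts(products) {
  return products
    .map(({ name, price }) => ({
      name: name?.trim() ?? null,
      price: price?.trim() ?? null,
    }))
    .filter((product) => product.name && product.price)
}
```

It could be applied right after the `$$eval` call, for example `console.log(cleanProducts(products))`.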
Auto-Retry + Proxy Rotation
import { CheerioCrawler, ProxyConfiguration } from 'crawlee'
const proxyConfig = new ProxyConfiguration({
  proxyUrls: ['http://proxy1:8080', 'http://proxy2:8080'],
})
const crawler = new CheerioCrawler({
  proxyConfiguration: proxyConfig,
  maxRequestRetries: 3,
  requestHandlerTimeoutSecs: 30,
  async requestHandler({ request, $ }) {
    // Auto-rotates proxies, auto-retries on failure
  },
})
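To make the idea behind these options concrete, here is a conceptual sketch of one simple strategy: round-robin proxy rotation combined with bounded retries, written in plain JavaScript. It is an illustration only and does not claim to mirror Crawlee's internal implementation. `createProxyRotator`, `runWithRetries`, and the `task` callback are hypothetical names.

```javascript
// Conceptual illustration only; Crawlee handles rotation and retries for you.
// Returns a function that cycles through the given proxy URLs in order.
function createProxyRotator(proxyUrls) {
  let index = 0
  return () => proxyUrls[index++ % proxyUrls.length]
}

// Calls task(proxyUrl) and retries with the next proxy when it throws.
// In this sketch maxRetries counts retries after the first attempt,
// so at most maxRetries + 1 attempts are made.
async function runWithRetries(task, nextProxy, maxRetries = 3) {
  let lastError
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await task(nextProxy())
    } catch (error) {
      lastError = error
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError
}
```

With Crawlee you do not write this yourself. The sketch only shows why a failed request can succeed on a later attempt through a different proxy.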
Save to Dataset
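import { Dataset } from 'crawlee'
// Call this inside a requestHandler, where title, price, and request are in scope
await Dataset.pushData({ title, price, url: request.url })
// Auto-saves each item as a JSON file under ./storage/datasets/default/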
Crawlee vs Puppeteer vs Playwright
| Feature | Crawlee | Puppeteer | Playwright |
|---|---|---|---|
| Auto-retry | Yes | No | No |
| Proxy rotation | Yes | Manual | Manual |
| Request queue | Yes | No | No |
| Dataset storage | Yes | No | No |
| HTTP + Browser | Both | Browser | Browser |
Need to extract data from any website at scale? I build custom web scrapers — 77 production scrapers running on Apify Store. Email me at spinov001@gmail.com for a tailored solution.