## TL;DR

Crawlee is an open-source web scraping framework by Apify that handles browser fingerprinting, proxy rotation, and request queuing automatically. It supports HTTP, Cheerio, Playwright, and Puppeteer crawlers — all with the same unified interface.
## What Is Crawlee?

Crawlee bundles the pieces most scraping stacks assemble by hand:
- Anti-bot protection — automatic fingerprinting, headers, proxy rotation
- Multiple crawlers — HTTP, Cheerio, Playwright, Puppeteer
- Request queue — handles millions of URLs with retry logic
- Auto-scaling — adjusts concurrency based on system resources
- Session management — rotates sessions to avoid blocks
- Storage — built-in dataset and key-value storage
- Free — Apache 2.0
## Quick Start

```bash
npx crawlee create my-scraper
cd my-scraper
npm start
```
## HTTP Crawler (Fastest)

```ts
import { HttpCrawler } from "crawlee";

const crawler = new HttpCrawler({
  maxRequestsPerCrawl: 100,
  async requestHandler({ request, body, log }) {
    log.info(`Processing ${request.url}`);
    // body is the raw response body (a string for HTML pages)
    console.log(body.toString().substring(0, 200));
  },
});

await crawler.run(["https://example.com"]);
```
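Under the hood, every crawler pulls from a request queue with built-in retry logic. A minimal sketch of that loop — hypothetical names, not Crawlee's actual internals — looks like this:

```typescript
// Simplified sketch of a crawl loop with retries and a request cap.
// Failed requests are re-enqueued until maxRequestRetries is exhausted.
type CrawlRequest = { url: string; retryCount: number };

function runQueue(
  urls: string[],
  handler: (url: string) => void, // throws on failure
  maxRequestsPerCrawl: number,
  maxRequestRetries = 3,
): { handled: string[]; failed: string[] } {
  const queue: CrawlRequest[] = urls.map((url) => ({ url, retryCount: 0 }));
  const handled: string[] = [];
  const failed: string[] = [];

  while (queue.length > 0 && handled.length < maxRequestsPerCrawl) {
    const req = queue.shift()!;
    try {
      handler(req.url);
      handled.push(req.url);
    } catch {
      if (req.retryCount < maxRequestRetries) {
        // Re-enqueue with an incremented retry counter.
        queue.push({ ...req, retryCount: req.retryCount + 1 });
      } else {
        failed.push(req.url); // give up after maxRequestRetries attempts
      }
    }
  }
  return { handled, failed };
}
```

The real queue is persisted to disk (or to the Apify platform), so a crashed crawl can resume where it left off.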
## Cheerio Crawler (HTML Parsing)

```ts
import { CheerioCrawler, Dataset } from "crawlee";

const crawler = new CheerioCrawler({
  async requestHandler({ $, request, enqueueLinks }) {
    // jQuery-like selectors via Cheerio
    const title = $("h1").text();
    const price = $(".price").text();
    const description = $(".description").text();

    // Save to the default dataset
    await Dataset.pushData({
      url: request.url,
      title,
      price,
      description,
    });

    // Follow links (de-duplicated automatically)
    await enqueueLinks({
      globs: ["https://example.com/products/*"],
    });
  },
  maxRequestsPerCrawl: 1000,
  maxConcurrency: 10,
});

await crawler.run(["https://example.com/products"]);

// Export results
const dataset = await Dataset.open();
await dataset.exportToCSV("results");
```
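The behavior behind `enqueueLinks` — glob filtering plus automatic de-duplication — can be sketched as a Set-keyed queue. This is a simplified illustration with hypothetical names, not Crawlee's real code:

```typescript
// Convert a glob like "https://example.com/products/*" into a RegExp:
// escape regex metacharacters, then turn "*" into ".*".
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`);
}

class SimpleRequestQueue {
  private seen = new Set<string>();
  public pending: string[] = [];

  // Returns how many URLs were actually added.
  enqueueLinks(urls: string[], globs: string[]): number {
    const patterns = globs.map(globToRegExp);
    let added = 0;
    for (const url of urls) {
      if (this.seen.has(url)) continue; // duplicate: skipped silently
      if (!patterns.some((p) => p.test(url))) continue; // glob mismatch
      this.seen.add(url);
      this.pending.push(url);
      added++;
    }
    return added;
  }
}
```

Because the queue remembers every URL it has seen, crawling a site with heavy cross-linking never processes the same page twice.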
## Playwright Crawler (JavaScript-Heavy Sites)

```ts
import { PlaywrightCrawler, Dataset } from "crawlee";

const crawler = new PlaywrightCrawler({
  launchContext: {
    launchOptions: { headless: true },
  },
  async requestHandler({ page, request, enqueueLinks }) {
    // Wait for dynamic content
    await page.waitForSelector(".product-card");

    // Extract data from the rendered page
    const products = await page.$$eval(".product-card", (cards) =>
      cards.map((card) => ({
        name: card.querySelector(".name")?.textContent?.trim(),
        price: card.querySelector(".price")?.textContent?.trim(),
        image: card.querySelector("img")?.src,
      }))
    );
    await Dataset.pushData(products);

    // Handle infinite scroll
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await page.waitForTimeout(2000);

    // Follow pagination
    await enqueueLinks({ selector: ".next-page" });
  },
});

await crawler.run(["https://example.com"]);
```
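A single scroll-and-wait, as above, only loads one extra batch. For true infinite scroll you keep scrolling until the page height stops growing. Here is a sketch of that loop with the page interactions abstracted into callbacks (the real Playwright version would `await page.evaluate()` between rounds; synchronous callbacks keep the logic easy to follow):

```typescript
// Scroll until the document height stops changing, or until maxRounds.
// Returns the number of rounds that loaded new content.
function scrollUntilStable(
  getHeight: () => number,      // e.g. () => document.body.scrollHeight
  scrollToBottom: () => void,   // triggers lazy loading in a real page
  maxRounds = 20,
): number {
  let lastHeight = getHeight();
  for (let round = 0; round < maxRounds; round++) {
    scrollToBottom();
    const height = getHeight();
    if (height === lastHeight) return round; // no new content appeared
    lastHeight = height;
  }
  return maxRounds;
}
```

The `maxRounds` cap matters: without it, a feed that loads content forever would never let the handler finish.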
## Proxy Rotation

```ts
import { CheerioCrawler, ProxyConfiguration } from "crawlee";

const proxyConfiguration = new ProxyConfiguration({
  proxyUrls: [
    "http://user:pass@proxy1.com:8080",
    "http://user:pass@proxy2.com:8080",
    "http://user:pass@proxy3.com:8080",
  ],
});

const crawler = new CheerioCrawler({
  proxyConfiguration,
  sessionPoolOptions: {
    maxPoolSize: 100,
    sessionOptions: {
      maxUsageCount: 50, // retire each session after 50 uses
    },
  },
  async requestHandler({ $ }) {
    // Proxies are rotated automatically per request
  },
});
```
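The idea behind `maxUsageCount` is simple: a session (a cookie jar tied to a proxy and fingerprint) is retired after a fixed number of uses, so no single identity hits the target too often. A simplified sketch of that rotation — illustrative only, not Crawlee's actual `SessionPool`:

```typescript
// A session becomes unusable once blocked or past its usage quota.
class Session {
  usageCount = 0;
  blocked = false;
  constructor(
    public readonly id: number,
    private maxUsageCount: number,
  ) {}
  get usable(): boolean {
    return !this.blocked && this.usageCount < this.maxUsageCount;
  }
}

class SessionPool {
  private sessions: Session[] = [];
  private nextId = 0;
  constructor(
    private maxPoolSize: number,
    private maxUsageCount: number,
  ) {}

  getSession(): Session {
    // Drop retired sessions, then reuse an existing one or mint a new one.
    this.sessions = this.sessions.filter((s) => s.usable);
    let session = this.sessions[0];
    if (!session) {
      if (this.sessions.length >= this.maxPoolSize) {
        throw new Error("session pool exhausted");
      }
      session = new Session(this.nextId++, this.maxUsageCount);
      this.sessions.push(session);
    }
    session.usageCount++;
    return session;
  }
}
```

Marking a session `blocked` on a 403 response has the same effect as exhausting its quota: the next request gets a fresh identity.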
## Crawlee vs Alternatives

| Feature | Crawlee | Scrapy | Puppeteer | Selenium |
|---|---|---|---|---|
| Language | TypeScript/JavaScript | Python | JavaScript | Multi-language |
| Anti-bot | Built-in | Manual | Manual | Manual |
| Browser support | Playwright + Puppeteer | Via plugins (e.g. Splash) | Chrome/Chromium | Multi-browser |
| Request queue | Built-in | Built-in | Manual | Manual |
| Auto-scaling | Yes | AutoThrottle extension | Manual | Manual |
| Session rotation | Built-in | Manual | Manual | Manual |
| Dataset storage | Built-in | Built-in | Manual | Manual |
## Deploy to Apify

```bash
# Deploy your Crawlee scraper to the Apify cloud
npx apify-cli push

# Run it in the cloud via the API
curl -X POST https://api.apify.com/v2/acts/YOUR_ACTOR/runs \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startUrls": [{"url": "https://example.com"}]}'
```
## Resources
- Crawlee Documentation
- GitHub Repository — 16K+ stars
- Examples
- Apify Platform — run Crawlee in the cloud
Ready to scrape at scale? Check out my production scrapers on Apify — built with Crawlee for reliable, scalable data extraction. Custom scraping solutions? Email spinov001@gmail.com