Crawlee is a production-ready web scraping and crawling library for Node.js.
What You Get for Free
- Multiple crawlers — Cheerio (fast HTML), Playwright (JS rendering), Puppeteer
- Auto-scaling — adjusts concurrency based on system resources
- Proxy rotation — built-in proxy management with session tracking
- Request queuing — persistent queue, handles millions of URLs
- Anti-blocking — fingerprint randomization, human-like behavior
- Data storage — built-in dataset and key-value store
- Error handling — automatic retries with exponential backoff
- TypeScript-first — full type safety
- Apify integration — deploy to Apify cloud with one command
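To make the "automatic retries with exponential backoff" item concrete, here is a minimal, self-contained sketch of the pattern. This is an illustrative helper, not Crawlee's actual implementation; the function name and parameters are invented for the example.

```typescript
// Retry an async operation, doubling the delay after each failure.
// Crawlee applies this kind of policy to failed requests automatically.
async function withRetries<T>(
    fn: () => Promise<T>,
    maxRetries = 3,
    baseDelayMs = 100,
): Promise<T> {
    let lastError: unknown;
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
        try {
            return await fn();
        } catch (err) {
            lastError = err;
            if (attempt === maxRetries) break;
            // Exponential backoff: 100 ms, 200 ms, 400 ms, ...
            const delay = baseDelayMs * 2 ** attempt;
            await new Promise((resolve) => setTimeout(resolve, delay));
        }
    }
    throw lastError;
}
```

With Crawlee you never write this yourself; failed requests are re-enqueued and retried for you.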
Quick Start
```bash
npx crawlee create my-crawler
cd my-crawler && npm start
```
```typescript
import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request, enqueueLinks }) {
        const title = await page.title();
        const price = await page.$eval('.price', (el) => el.textContent);

        // Save the scraped record to the default dataset
        await Dataset.pushData({ url: request.url, title, price });

        // Follow product links matching the glob pattern
        await enqueueLinks({ globs: ['https://example.com/products/*'] });
    },
    maxRequestsPerCrawl: 1000,
});

await crawler.run(['https://example.com/products']);
```
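If the target pages are static HTML, you can swap in a CheerioCrawler and skip the browser entirely; it parses raw HTML and is typically much faster. A minimal sketch, reusing the same hypothetical example.com URLs and selectors:

```typescript
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    // `$` is a Cheerio handle to the parsed HTML; no browser is launched
    async requestHandler({ $, request, enqueueLinks }) {
        const title = $('title').text();
        const price = $('.price').first().text();

        await Dataset.pushData({ url: request.url, title, price });
        await enqueueLinks({ globs: ['https://example.com/products/*'] });
    },
    maxRequestsPerCrawl: 1000,
});

await crawler.run(['https://example.com/products']);
```

Because both crawlers share the same request queue, storage, and `enqueueLinks` API, switching between them is mostly a matter of changing the class and the handler's page-access code.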
Why Developers Choose It Over Scrapy
Scrapy is Python-only and carries significant setup overhead. Crawlee offers:
- TypeScript — type safety, modern async/await
- Playwright built-in — no separate browser setup
- Anti-blocking — fingerprint rotation included
- Apify cloud — deploy and scale with one command
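The proxy rotation mentioned above is a single configuration object rather than hand-rolled middleware. A sketch with placeholder proxy URLs (the URLs are illustrative; plug in your own provider's endpoints):

```typescript
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';

// Crawlee rotates through these URLs and retires ones that get blocked
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy-1.example.com:8000',
        'http://proxy-2.example.com:8000',
    ],
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    useSessionPool: true, // keep each proxy tied to a consistent session
    async requestHandler({ page, request }) {
        // ... scrape as usual; the proxy is applied transparently
    },
});
```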
A data team maintained 15 Scrapy spiders with custom retry logic, proxy rotation, and error handling. They rewrote them in Crawlee — anti-blocking and retries are built in, proxy rotation is automatic, and TypeScript caught bugs at compile time.
Need Custom Data Solutions?
I build production-grade scrapers and data pipelines for startups, agencies, and research teams.
Browse 88+ ready-made scrapers on Apify: Reddit, HN, LinkedIn, Google, Amazon, and more.
Custom project? Email me: spinov001@gmail.com — fast turnaround, fair pricing.