I spent 3 months scraping data for market research projects. My client wanted pricing data from 50 e-commerce sites — updated daily.
Bright Data quoted me $500/month for residential proxies.
I'm a solo developer. That's my entire monthly tool budget.
So I built the entire pipeline with free tools. Here's exactly what I used, with code you can copy.
1. Crawlee — The Swiss Army Knife
Crawlee is what I wish existed 5 years ago. It handles anti-bot detection, proxy rotation, and browser fingerprinting — all built-in.
```javascript
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    maxRequestsPerCrawl: 100,
    async requestHandler({ page, request, enqueueLinks }) {
        const title = await page.title();
        const price = await page.$eval('.price', (el) => el.textContent);
        console.log(`${request.url} [${title}]: ${price}`);
        await enqueueLinks({ globs: ['https://example.com/product/*'] });
    },
});

await crawler.run(['https://example.com/products']);
```
Why it replaces Bright Data: Crawlee's built-in SessionPool rotates browser fingerprints automatically. For most sites, you don't need residential proxies at all — the fingerprint rotation alone gets you past basic bot detection.
Best for: JavaScript-rendered pages, sites with moderate anti-bot protection.
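To make the session-rotation idea concrete: the core trick is retiring a session (and its fingerprint) after too many uses or errors, so blocked identities never get reused. Here's a toy Python sketch of that idea — my own simplification, not Crawlee's actual SessionPool code:

```python
import itertools
import random


class ToySessionPool:
    """Toy sketch of session/fingerprint rotation (not Crawlee's real API)."""

    def __init__(self, size=10, max_uses=30, max_errors=3):
        self._ids = itertools.count()
        self.max_uses = max_uses
        self.max_errors = max_errors
        self.sessions = [self._new_session() for _ in range(size)]

    def _new_session(self):
        # In Crawlee each session carries a generated browser fingerprint;
        # here we just fake one with a random number.
        return {"id": next(self._ids), "fingerprint": random.random(),
                "uses": 0, "errors": 0}

    def get(self):
        session = random.choice(self.sessions)
        session["uses"] += 1
        self._maybe_retire(session)
        return session

    def mark_error(self, session):
        session["errors"] += 1
        self._maybe_retire(session)

    def _maybe_retire(self, session):
        # Retire burned sessions so future requests get a fresh identity.
        if session["uses"] >= self.max_uses or session["errors"] >= self.max_errors:
            if session in self.sessions:
                self.sessions.remove(session)
                self.sessions.append(self._new_session())
```

The real SessionPool also persists cookies per session and scores sessions by success rate, but the retire-and-replace loop above is the part that defeats per-identity blocking.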
2. Scrapy + scrapy-playwright — The Production Workhorse
If you're scraping at scale (10K+ pages/day), Scrapy is still king. Add scrapy-playwright for JavaScript-heavy sites.
```python
import scrapy


class PriceSpider(scrapy.Spider):
    name = 'prices'

    custom_settings = {
        'DOWNLOAD_HANDLERS': {
            'https': 'scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler',
        },
        # scrapy-playwright requires the asyncio reactor
        'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor',
        'CONCURRENT_REQUESTS': 16,
        'AUTOTHROTTLE_ENABLED': True,
    }

    def start_requests(self):
        with open('urls.txt') as f:
            urls = f.read().splitlines()
        for url in urls:
            yield scrapy.Request(url, meta={'playwright': True})

    def parse(self, response):
        yield {
            'url': response.url,
            'price': response.css('.price::text').get(),
            'name': response.css('h1::text').get(),
        }
```
Why it replaces Bright Data: Scrapy's AutoThrottle + concurrent requests handle rate limiting intelligently. Combined with free proxy lists from free-proxy-list.net, I scrape 50K pages/day without paying a cent.
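The "free proxy list" part deserves a sketch, because free proxies die constantly — you need rotation plus failure tracking, not just a static list. A minimal helper (class name and thresholds are my own, not a Scrapy API):

```python
import random


class ProxyRotator:
    """Rotate through a proxy list, dropping proxies that keep failing."""

    def __init__(self, proxies, max_failures=3):
        self.proxies = list(proxies)
        self.failures = {p: 0 for p in self.proxies}
        self.max_failures = max_failures

    def get(self):
        if not self.proxies:
            raise RuntimeError("all proxies exhausted - refresh the list")
        return random.choice(self.proxies)

    def report_failure(self, proxy):
        self.failures[proxy] += 1
        # A proxy that fails repeatedly is dead or banned; stop using it.
        if self.failures[proxy] >= self.max_failures and proxy in self.proxies:
            self.proxies.remove(proxy)


# Wiring it into Scrapy is one line per request:
# yield scrapy.Request(url, meta={'proxy': rotator.get(), 'playwright': True})
```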
Best for: Large-scale scraping, data pipelines, production systems.
3. Playwright Stealth — When Sites Fight Back
Some sites (think Amazon, LinkedIn, Zillow) use advanced bot detection. playwright-stealth patches Playwright to look like a real browser.
```python
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    stealth_sync(page)  # patch the fingerprint before navigating
    page.goto('https://example.com/protected-page')

    # Now you look like a real Chrome user
    for item in page.query_selector_all('.product-card'):
        print(item.inner_text())

    browser.close()
Why it replaces Bright Data: Bright Data's "unlocker" is essentially a managed version of stealth browsing + proxy rotation. With playwright-stealth, you get the stealth part for free. Combine with rotating free proxies and you cover 80% of use cases.
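What does stealth actually patch? Detectors look for classic headless tells: `navigator.webdriver` set to true, an empty plugin list, and a "HeadlessChrome" user agent string — playwright-stealth overrides exactly these signals. A toy illustration of the detection side (field names are my own simplification):

```python
def looks_like_bot(fingerprint: dict) -> bool:
    """Toy version of the checks a bot detector runs against a browser."""
    return (
        fingerprint.get("webdriver") is True           # automation flag set
        or fingerprint.get("plugin_count", 0) == 0     # headless ships no plugins
        or "HeadlessChrome" in fingerprint.get("user_agent", "")
    )
```

Real detectors (Cloudflare, DataDome and friends) check far more — canvas hashes, TLS fingerprints, timing — but a surprising number of sites stop at checks like these, which is why a stealth patch alone gets you in.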
Best for: Sites with fingerprinting detection, login-required scraping.
4. Apify Free Tier — Cloud Scraping Without Infrastructure
Apify gives you $5/month free credits — enough to run lightweight scrapers. I use it for scheduled daily scrapes that would otherwise need a VPS.
```javascript
import { Actor } from 'apify';
import { chromium } from 'playwright';

await Actor.init();

const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto('https://example.com');

const results = await page.$$eval('.item', (items) =>
    items.map((item) => ({
        name: item.querySelector('.name')?.textContent,
        price: item.querySelector('.price')?.textContent,
    }))
);

await Actor.pushData(results);
await browser.close();
await Actor.exit();
```
Why it replaces Bright Data: For small-to-medium projects, Apify's free tier + their proxy infrastructure handles everything. You get managed browser instances, automatic retries, and result storage — no server setup.
Best for: Scheduled scrapes, small datasets (<10K items), MVPs.
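One practical note before you push data anywhere: scraped prices arrive as strings like `"$1,299.99"`, and storing them raw makes every downstream comparison painful. I normalize first — a small helper of my own (nothing Apify-specific), shown in Python since that's where my pipelines live:

```python
import re


def parse_price(text):
    """Extract a float from a scraped price string, or None if absent."""
    if not text:
        return None
    # Grab the first number, allowing thousands separators and decimals.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group().replace(",", "")) if match else None
```

This also gives you a clean place to flag pages where the price selector matched nothing — `None` in the output means the site changed its markup.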
5. curl-impersonate — The Lightweight Option
Sometimes you don't need a browser at all. curl-impersonate makes HTTP requests that look identical to Chrome or Firefox.
```bash
# Install
brew install curl-impersonate

# Scrape like Chrome
curl_chrome116 'https://example.com/api/products' \
  -H 'Accept: application/json' | jq '.products[].price'
```
Or in Python:
```python
from curl_cffi import requests

response = requests.get(
    'https://example.com/api/products',
    impersonate='chrome',
)

for product in response.json()['products']:
    print(f"{product['name']}: ${product['price']}")
```
Why it replaces Bright Data: Many APIs that block Python's `requests` or Node's `axios` respond normally to curl-impersonate, because its TLS and HTTP/2 fingerprints match a real browser's. It's also orders of magnitude faster than browser scraping and uses almost no resources. For API-based sites, this is all you need.
Best for: API scraping, speed-critical tasks, resource-constrained environments.
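Whatever transport you use, free-tier scraping means flaky responses, so I wrap every call in a retry helper with exponential backoff. A generic sketch of my own (not part of curl-impersonate or curl_cffi):

```python
import time


def with_retry(fetch, retries=3, backoff=0.5):
    """Call fetch(); on exception, retry with exponential backoff."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts, surface the error
            time.sleep(backoff * (2 ** attempt))


# Usage with curl_cffi (illustrative):
# data = with_retry(lambda: requests.get(url, impersonate='chrome').json())
```

Passing a zero-argument callable keeps the helper transport-agnostic — the same wrapper works around curl_cffi, a Playwright page load, or anything else that can raise.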
When You Actually Need Bright Data
Let me be honest — these free tools don't cover everything:
- Massive scale (100K+ requests/day) → you'll eventually need paid proxies
- Sites with CAPTCHA on every request → no free solution handles this well
- Real-time data from heavily protected sites → residential proxies are the only reliable option
But for 80% of scraping projects? These five tools are all you need.
My Stack (What I Actually Use Daily)
| Task | Tool | Cost |
|---|---|---|
| JS-heavy sites | Crawlee | Free |
| Large datasets | Scrapy + scrapy-playwright | Free |
| Anti-bot sites | playwright-stealth | Free |
| Scheduled scrapes | Apify free tier | $0-5/mo |
| API scraping | curl-impersonate | Free |
| **Total** | | **$0-5/mo** |
Compare that to Bright Data's $500/month minimum.
Want a complete list of 100+ web scraping tools? I maintain an open-source collection: awesome-web-scraping-2026 — frameworks, proxies, anti-detection, and cloud platforms. All free.
What tools do you use for web scraping? Have you found anything better? Drop a comment below 👇
If you're building scraping pipelines and need help, I'm available for consulting — check my profile for contact details.