DEV Community

Alex Spinov

5 Free Web Scraping Tools That Replace Bright Data (With Real Code)

I spent 3 months scraping data for market research projects. My client wanted pricing data from 50 e-commerce sites — updated daily.

Bright Data quoted me $500/month for residential proxies.

I'm a solo developer. That's my entire monthly tool budget.

So I built the entire pipeline with free tools. Here's exactly what I used, with code you can copy.


1. Crawlee — The Swiss Army Knife

Crawlee is what I wish had existed 5 years ago. Session management, proxy rotation, and browser fingerprint generation are all built in.

import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
  maxRequestsPerCrawl: 100,
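  async requestHandler({ page, request, enqueueLinks }) {
    // Queue product links first so a page without a price still gets crawled
    await enqueueLinks({ globs: ['https://example.com/product/*'] });

    const title = await page.title();
    // $$eval returns [] instead of throwing when nothing matches '.price'
    const [price] = await page.$$eval('.price', els => els.map(el => el.textContent?.trim()));
    if (price) console.log(`${request.url} | ${title}: ${price}`);
  },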
});

await crawler.run(['https://example.com/products']);

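Why it replaces Bright Data: Crawlee's browser crawlers generate realistic browser fingerprints by default, and the built-in SessionPool rotates sessions (cookies and, if you configure them, proxies) and retires the ones that get blocked. For most sites, you don't need residential proxies at all: fingerprinting plus session rotation alone gets you past basic bot detection.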
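To make the session idea concrete, here is a toy Python model of a rotating session pool. It is only a sketch of the concept, not Crawlee's implementation: the class names, fields, and thresholds are hypothetical and chosen for illustration.

```python
import random
from dataclasses import dataclass, field
from itertools import count

# Toy model only: names and thresholds are hypothetical, not Crawlee's API.
_ids = count(1)


@dataclass
class Session:
    id: int = field(default_factory=lambda: next(_ids))
    cookies: dict = field(default_factory=dict)
    usage_count: int = 0
    error_score: int = 0


class SessionPool:
    def __init__(self, size=3, max_usage=50, max_errors=3, rng=None):
        self.max_usage = max_usage
        self.max_errors = max_errors
        self.rng = rng or random.Random()
        self.sessions = [Session() for _ in range(size)]

    def _usable(self, session):
        return (session.usage_count < self.max_usage
                and session.error_score < self.max_errors)

    def get(self):
        # Swap worn-out or blocked sessions for fresh ones, then pick one at random.
        self.sessions = [s if self._usable(s) else Session() for s in self.sessions]
        session = self.rng.choice(self.sessions)
        session.usage_count += 1
        return session

    def mark_bad(self, session):
        # Call when a response looks like a block (e.g. HTTP 403 or a CAPTCHA page).
        session.error_score += 1
```

The point of the design is that a blocked identity is discarded as a unit (cookies and all) rather than reused, so one bad session does not poison later requests.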

Best for: JavaScript-rendered pages, sites with moderate anti-bot protection.


2. Scrapy + scrapy-playwright — The Production Workhorse

If you're scraping at scale (10K+ pages/day), Scrapy is still king. Add scrapy-playwright for JavaScript-heavy sites.

import scrapy

class PriceSpider(scrapy.Spider):
    name = 'prices'

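    custom_settings = {
        # Route both schemes through Playwright
        'DOWNLOAD_HANDLERS': {
            'http': 'scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler',
            'https': 'scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler',
        },
        # scrapy-playwright needs the asyncio-based Twisted reactor
        'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor',
        'CONCURRENT_REQUESTS': 16,
        'AUTOTHROTTLE_ENABLED': True,
    }

    def start_requests(self):
        # One URL per line; skip blank lines
        with open('urls.txt') as f:
            urls = [line.strip() for line in f if line.strip()]
        for url in urls:
            yield scrapy.Request(url, meta={'playwright': True})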

    def parse(self, response):
        yield {
            'url': response.url,
            'price': response.css('.price::text').get(),
            'name': response.css('h1::text').get(),
        }

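Why it replaces Bright Data: Scrapy's AutoThrottle adjusts the download delay to each site's observed response latency, while CONCURRENT_REQUESTS caps how many requests run in parallel, so you rarely trip rate limits in the first place. Combined with free proxy lists from free-proxy-list.net, I scrape 50K pages/day without paying a cent.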
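For intuition, here is a simplified sketch of the delay update described in Scrapy's AutoThrottle documentation: the target delay is the latency divided by the target concurrency, the next delay is the average of the old delay and that target, non-200 responses may not lower the delay, and the result is clamped between a minimum and a maximum. The function name and parameters are my own, and the real extension has details this sketch leaves out.

```python
def next_delay(current_delay, latency, status=200, *,
               target_concurrency=1.0, min_delay=0.0, max_delay=60.0):
    """One AutoThrottle-style update step (simplified, hypothetical helper)."""
    target = latency / target_concurrency
    # Move halfway from the current delay toward the target delay.
    candidate = (current_delay + target) / 2.0
    # Keep the delay inside the configured bounds.
    candidate = min(max(candidate, min_delay), max_delay)
    # Error responses are often fast; don't let them speed the crawl up.
    if status != 200 and candidate < current_delay:
        return current_delay
    return candidate


# Starting from a 5 s delay, a site answering in 0.5 s pulls the delay down.
delay = 5.0
for latency in (0.5, 0.5, 0.5):
    delay = next_delay(delay, latency)
```

Because each step only moves halfway toward the target, one unusually fast or slow response cannot swing the crawl rate abruptly.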

Best for: Large-scale scraping, data pipelines, production systems.


3. Playwright Stealth — When Sites Fight Back

Some sites (think Amazon, LinkedIn, Zillow) use advanced bot detection. playwright-stealth patches Playwright to look like a real browser.

from playwright.sync_api import sync_playwright
# stealth_sync is the playwright-stealth 1.x API; newer releases may expose a different interface
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    stealth_sync(page)

    page.goto('https://example.com/protected-page')

    # The stealth patches hide common automation tells such as navigator.webdriver
    data = page.query_selector_all('.product-card')
    for item in data:
        print(item.inner_text())

    browser.close()

Why it replaces Bright Data: Bright Data's "unlocker" is essentially a managed version of stealth browsing plus proxy rotation. With playwright-stealth, you get the stealth part for free. Combine it with rotating free proxies and, in my experience, you cover roughly 80% of use cases.
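As a rough sketch of the "rotating free proxies" part, the helper below cycles through a proxy list and produces settings dicts in the shape Playwright's `proxy` option expects. The helper name is hypothetical, and the addresses are placeholders from a documentation-only IP range, not working proxies.

```python
from itertools import cycle


def proxy_configs(proxy_urls):
    """Return an endless round-robin iterator of Playwright-style proxy settings."""
    if not proxy_urls:
        raise ValueError("need at least one proxy URL")
    return ({"server": url} for url in cycle(proxy_urls))


# Placeholder addresses (documentation IP range), not real proxies.
configs = proxy_configs([
    "http://203.0.113.10:8080",
    "http://203.0.113.11:3128",
])
first, second, third = next(configs), next(configs), next(configs)
```

Each dict can then be passed when launching a browser, for example `p.chromium.launch(proxy=next(configs))`, so every new browser goes out through the next proxy in the list.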

Best for: Sites with fingerprinting detection, login-required scraping.


4. Apify Free Tier — Cloud Scraping Without Infrastructure

Apify gives you $5/month free credits — enough to run lightweight scrapers. I use it for scheduled daily scrapes that would otherwise need a VPS.

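import { Actor } from 'apify';
import { launchPlaywright } from 'crawlee';

await Actor.init();

// launchPlaywright() resolves to a Playwright Browser, so open a page explicitly
const browser = await launchPlaywright();
const page = await browser.newPage();
await page.goto('https://example.com');

const results = await page.$$eval('.item', items =>
  items.map(item => ({
    name: item.querySelector('.name')?.textContent?.trim(),
    price: item.querySelector('.price')?.textContent?.trim(),
  }))
);

// Store the rows in the run's default dataset
await Actor.pushData(results);
await browser.close();
await Actor.exit();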

Why it replaces Bright Data: For small-to-medium projects, Apify's free tier and its proxy infrastructure cover the whole pipeline: managed browser instances, automatic retries, and result storage, with no server setup.
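For comparison, this is roughly the retry logic you would otherwise write yourself on a VPS. It is a minimal sketch with exponential backoff, not how Apify implements retries; the function name and defaults are assumptions.

```python
import time


def with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**n seconds, then try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the helper can be exercised in tests without actually waiting.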

Best for: Scheduled scrapes, small datasets (<10K items), MVPs.


5. curl-impersonate — The Lightweight Option

Sometimes you don't need a browser at all. curl-impersonate is a curl build whose TLS and HTTP/2 handshakes match those of Chrome or Firefox, so its requests look like they come from a real browser.

# Install (macOS with Homebrew; depending on your setup the formula may come from a third-party tap)
brew install curl-impersonate

# Scrape like Chrome
curl_chrome116 'https://example.com/api/products' \
  -H 'Accept: application/json' | jq '.products[].price'

Or in Python:

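from curl_cffi import requests

response = requests.get(
    'https://example.com/api/products',
    impersonate='chrome',  # send Chrome's TLS and HTTP/2 fingerprint
    timeout=30,
)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page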

for product in response.json()['products']:
    print(f"{product['name']}: ${product['price']}")

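Why it replaces Bright Data: Many APIs that block Python's requests or axios respond normally to curl-impersonate, because the block is often keyed to the client's TLS fingerprint rather than its IP address. With no browser to start, it is also much faster than browser scraping and uses almost no resources. For API-based sites, this is often all you need.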
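The Python loop above assumes every product carries a clean numeric price. Real payloads tend to be messier, so here is a hedged sketch of a parser for the same assumed `{"products": [{"name": ..., "price": ...}]}` shape that normalizes prices to `Decimal` and skips unusable entries. The function name and the sample data are made up for illustration.

```python
import json
from decimal import Decimal, InvalidOperation


def parse_products(body):
    """Return (name, Decimal price) pairs from a {"products": [...]} JSON body."""
    payload = json.loads(body)
    rows = []
    for product in payload.get("products", []):
        # Strip currency symbols and thousands separators before converting.
        raw = str(product.get("price", "")).replace("$", "").replace(",", "").strip()
        try:
            price = Decimal(raw)
        except InvalidOperation:
            continue  # skip entries without a usable price
        rows.append((product.get("name", ""), price))
    return rows


# Made-up sample payload covering a string price, a float price, and a missing price.
sample = (
    '{"products": ['
    '{"name": "Widget", "price": "$1,299.50"}, '
    '{"name": "Gadget", "price": 19.99}, '
    '{"name": "Broken", "price": null}]}'
)
rows = parse_products(sample)
```

Using `Decimal` rather than `float` keeps prices exact, which matters when you diff daily snapshots to detect price changes.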

Best for: API scraping, speed-critical tasks, resource-constrained environments.


When You Actually Need Bright Data

Let me be honest — these free tools don't cover everything:

  • Massive scale (100K+ requests/day) → you'll eventually need paid proxies
  • Sites with CAPTCHA on every request → no free solution handles this well
  • Real-time data from heavily protected sites → residential proxies are the only reliable option

But for roughly 80% of the scraping projects I've worked on? These five tools are all you need.


My Stack (What I Actually Use Daily)

| Task | Tool | Cost |
| --- | --- | --- |
| JS-heavy sites | Crawlee | Free |
| Large datasets | Scrapy + scrapy-playwright | Free |
| Anti-bot sites | playwright-stealth | Free |
| Scheduled scrapes | Apify free tier | $0-5/mo |
| API scraping | curl-impersonate | Free |
| Total | | $0-5/mo |

Compare that to the $500/month Bright Data quoted me.


Want a complete list of 100+ web scraping tools? I maintain an open-source collection: awesome-web-scraping-2026 — frameworks, proxies, anti-detection, and cloud platforms. All free.

What tools do you use for web scraping? Have you found anything better? Drop a comment below 👇


If you're building scraping pipelines and need help, I'm available for consulting — check my profile for contact details.
