DEV Community

agenthustler


Web Scraping with Python: requests vs Playwright vs Scrapy — Which Should You Use?

Every Python web scraping tutorial starts with a different tool. Some use requests, others jump straight to Scrapy, and newer ones reach for Playwright. They're all valid — but they solve different problems.

I've used all three extensively. Here's when each one makes sense, where each one falls apart, and how to pick the right tool without over-engineering your project.

Quick Comparison

| Feature | requests + BS4 | Playwright | Scrapy |
| --- | --- | --- | --- |
| Learning curve | Easy | Medium | Steep |
| JavaScript support | No | Yes | No (without plugins) |
| Speed | Fast | Slow | Very fast |
| Memory usage | Low | High | Medium |
| Built-in concurrency | No | No | Yes |
| Best for | Simple pages | SPAs, interactive sites | Large-scale crawling |

Option 1: requests + BeautifulSoup

This is where everyone should start. It's the simplest approach and handles more sites than you'd expect.

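A minimal sketch of this approach. The URL and the `div.product` / `.price` selectors are hypothetical placeholders — swap in your target's actual markup:

```python
import requests
from bs4 import BeautifulSoup


def fetch(url: str) -> str:
    """Fetch a page; a realistic User-Agent avoids the default
    'python-requests' fingerprint that many sites block outright."""
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    resp.raise_for_status()
    return resp.text


def parse_products(html: str) -> list[dict]:
    """Assumes each product sits in <div class="product"> with an <h2>
    name and a .price child -- adjust the selectors to your page."""
    soup = BeautifulSoup(html, "html.parser")  # or "lxml" if installed
    items = []
    for product in soup.select("div.product"):
        name = product.select_one("h2")
        price = product.select_one(".price")
        if name and price:
            items.append({
                "name": name.get_text(strip=True),
                "price": price.get_text(strip=True),
            })
    return items


if __name__ == "__main__":
    sample = '<div class="product"><h2>Widget</h2><span class="price">$9.99</span></div>'
    print(parse_products(sample))  # -> [{'name': 'Widget', 'price': '$9.99'}]
```

Splitting fetch and parse like this also makes the parser testable against saved HTML, so you're not hitting the live site on every debug cycle.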

Pros:

  • Minimal dependencies (pip install requests beautifulsoup4)
  • Fast — no browser overhead, just HTTP requests
  • Low memory footprint
  • Easy to debug — you can inspect the raw HTML directly
  • Works with lxml parser for even better performance

Cons:

  • Can't handle JavaScript-rendered content
  • requests.Session covers cookies, but JavaScript-driven login flows are out of reach
  • You handle retries, rate limiting, and headers manually

Use it when:

  • The page content is in the HTML source (right-click → View Source → can you see the data?)
  • You're scraping fewer than 100 pages
  • Speed matters and the target is simple

Don't use it when:

  • Prices, reviews, or content load via JavaScript/AJAX
  • You need to click buttons, scroll, or interact with the page

Option 2: Playwright

Playwright runs a real browser. It's the nuclear option for sites that won't work with plain HTTP requests.


Pros:

  • Handles any JavaScript-rendered page
  • Can interact with pages: click buttons, fill forms, scroll
  • Built-in waiting mechanisms (wait_for_selector, wait_for_load_state)
  • Screenshots and PDF generation for debugging
  • Supports Chromium, Firefox, and WebKit

Cons:

  • Slow — launching a browser takes 1-3 seconds per instance
  • Memory hungry — each browser instance uses 100-300 MB
  • More complex setup (playwright install to download browser binaries)
  • Harder to run in CI/CD or minimal server environments

Use it when:

  • Content is rendered by JavaScript (React, Vue, Angular, Next.js)
  • You need to log in through an interactive form
  • You need to scroll to load infinite content
  • The site uses complex anti-bot measures that check for browser fingerprints

Don't use it when:

  • The data is available in the HTML source or via an API
  • You need to scrape thousands of pages quickly
  • You're running on a server with limited RAM

The Hidden API Trick

Before reaching for Playwright, check if the site has a hidden API. Open your browser's DevTools → Network tab → filter by XHR/Fetch. Many "JavaScript-rendered" sites actually load data from a JSON API. If you find it, use requests to call the API directly — it's faster, more reliable, and returns structured data.

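A sketch of the pattern. The endpoint is hypothetical (it stands in for whatever URL you found in the Network tab), and the `{"items": [...]}` response shape is assumed:

```python
import requests

# Hypothetical endpoint discovered via DevTools -> Network -> XHR/Fetch.
API_URL = "https://example.com/api/v1/products"


def fetch_page(page: int = 1) -> dict:
    """Call the site's internal JSON API directly."""
    resp = requests.get(
        API_URL,
        params={"page": page, "per_page": 50},
        headers={
            # Make the request look like it came from the site's own frontend.
            "User-Agent": "Mozilla/5.0",
            "Accept": "application/json",
            "Referer": "https://example.com/products",
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()


def extract_items(payload: dict) -> list[dict]:
    """Assumed response shape: {"items": [...], "total": N}."""
    return payload.get("items", [])
```

No HTML parsing at all — you get structured data back, and pagination is usually just a query parameter.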

This approach is underrated. I'd estimate 60% of the time people reach for Playwright, they could use requests against a JSON endpoint instead.

Option 3: Scrapy

Scrapy is a full framework, not just a library. It's built for crawling entire sites, not scraping individual pages.


Run it with: scrapy runspider myspider.py

Pros:

  • Built-in concurrency — scrapes multiple pages simultaneously
  • Automatic request queuing, deduplication, and retry logic
  • Pipeline system for processing/storing data
  • Middleware for proxies, headers, cookies
  • Handles pagination naturally with response.follow()
  • Built-in export to JSON, CSV, databases

Cons:

  • Steep learning curve — spiders, items, pipelines, middlewares, settings
  • No JavaScript support out of the box (need scrapy-playwright plugin)
  • Overkill for scraping a few pages
  • Harder to debug than simple scripts
  • The async architecture can be confusing for beginners

Use it when:

  • You're crawling hundreds or thousands of pages
  • You need to follow links across an entire site
  • You want built-in retries, rate limiting, and data export
  • You're building a scraping pipeline that runs regularly

Don't use it when:

  • You're scraping 5-10 specific URLs
  • You need heavy JavaScript interaction
  • You want quick results without learning a framework

When to Use a Scraping API Instead

All three tools share the same weakness: they don't handle anti-bot systems well on their own. If you're scraping sites that actively block scrapers (e-commerce, social media, search engines), you'll spend more time fighting blocks than extracting data.

Scraping APIs handle the hard parts — proxy rotation, CAPTCHA solving, browser fingerprinting — so you can focus on data extraction.

When a scraping API makes sense:

  • You're getting blocked more than 20% of the time
  • You're scraping sites with Cloudflare, DataDome, or PerimeterX
  • You need reliable data for a production system
  • Your time is worth more than the API cost

Recommended APIs I've tested:

  • ScraperAPI — best all-around option. Handles proxies, CAPTCHAs, and JS rendering. Start with 5,000 free credits to test it on your target site.
  • Scrape.do — competitive pricing, good JS rendering support, clean API design.
  • ScrapeOps — proxy aggregator and monitoring dashboard. Great if you want to compare proxy providers or track your scraper's health.

Using them is straightforward, and they work with any of the three tools above.

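As a sketch, using ScraperAPI's request-forwarding style endpoint (double-check their docs for current parameters; the key and target URL are placeholders):

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder


def fetch_via_api(url: str) -> str:
    """Forward the request through the scraping API, which handles
    proxy rotation, CAPTCHAs, and (optionally) JS rendering."""
    resp = requests.get(
        "https://api.scraperapi.com/",
        params={"api_key": API_KEY, "url": url, "render": "true"},
        timeout=60,  # rendered requests can take a while
    )
    resp.raise_for_status()
    return resp.text
```

The API only replaces the fetch step — the returned HTML goes into BeautifulSoup, a Scrapy pipeline, or whatever parsing code you already have.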

My Decision Framework

Here's how I choose for each project:

  1. Can I see the data in View Source? → Use requests + BS4
  2. Is there a hidden JSON API? → Use requests against the API
  3. Does the page need JavaScript to render? → Use Playwright
  4. Am I scraping hundreds+ of pages with pagination? → Use Scrapy
  5. Am I getting blocked? → Add ScraperAPI or Scrape.do to whatever tool I'm using

Most projects start at step 1 and move down the list only when they need to.

Want the Full Playbook?

I cover all three tools in depth — including advanced patterns like stealth configurations, proxy chains, and handling CAPTCHAs — in my web scraping ebook.

Get the Web Scraping Playbook — $9 on Gumroad

Includes code templates for each tool, anti-detection configs, and a decision tree for choosing the right approach.


Got a specific scraping problem? Reach me at hello@web-data-labs.com — happy to point you in the right direction.
