
Alex Spinov

The 4-Step Method I Use to Build Every Web Scraper

Every scraper I build follows the same 4-step process. It works for nearly any website.

Step 1: Check for a JSON API (5 min)

Open your browser's DevTools → Network tab → filter by XHR/Fetch. Browse the page and watch for JSON responses.

If found → use that endpoint. You're done. No HTML parsing needed.

Examples: Reddit (append .json to almost any URL), YouTube (the Innertube API), HN (the official Firebase API)
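Once DevTools reveals the endpoint, calling it is the whole scraper. A minimal sketch for Node 18+ — the Firebase URL is Hacker News's real public API, but the helper names are my own:

```javascript
// Sketch: once you've found a JSON endpoint, hit it directly.
// No HTML parsing — the response is already structured data.
const ITEM_URL = (id) => `https://hacker-news.firebaseio.com/v0/item/${id}.json`;

async function fetchItem(id) {
  const res = await fetch(ITEM_URL(id)); // built-in fetch, Node 18+
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json(); // { id, title, url, score, ... }
}

// Usage: fetchItem(8863).then((item) => console.log(item.title));
```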

Step 2: Check for RSS/Atom (2 min)

Look for <link rel="alternate" type="application/rss+xml"> in the page source.

If found → parse XML. Done.

Example: Google News
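Feed autodiscovery can be automated too. A dependency-free sketch — a regex is fragile on real-world HTML, but it shows the idea:

```javascript
// Sketch: scan page source for an RSS/Atom autodiscovery <link> tag.
// A production crawler should use a real HTML parser instead of regex.
function findFeedUrl(html) {
  const linkTags = html.match(/<link\b[^>]*>/gi) || [];
  for (const tag of linkTags) {
    if (/rel=["']alternate["']/i.test(tag) &&
        /type=["']application\/(rss|atom)\+xml["']/i.test(tag)) {
      const href = tag.match(/href=["']([^"']+)["']/i);
      if (href) return href[1]; // feed URL found → parse XML, done
    }
  }
  return null; // no feed advertised → move on to step 3
}

// findFeedUrl('<link rel="alternate" type="application/rss+xml" href="/feed.xml">')
// → '/feed.xml'
```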

Step 3: Check for JSON-LD (2 min)

Search for <script type="application/ld+json"> in page source.

If found → parse JSON. Contains structured product/review/organization data.

Example: Trustpilot
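Extracting JSON-LD is a one-liner plus error handling. A minimal sketch (the function name is my own):

```javascript
// Sketch: pull every <script type="application/ld+json"> block and parse it.
function extractJsonLd(html) {
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks = [];
  for (const match of html.matchAll(re)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // skip one malformed block instead of failing the whole page
    }
  }
  return blocks; // array of structured Product/Review/Organization objects
}
```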

Step 4: Last Resort — HTML Parsing

Only if steps 1-3 fail. Use Cheerio (fast, for static HTML) or Playwright (slower, but renders JavaScript).

This is the MOST COMMON approach but should be your LAST choice: CSS selectors break every time the markup changes, while APIs and feeds are far more stable.
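For illustration, here is a dependency-free stand-in for what Cheerio does — the markup (`<h2 class="title">`) is a hypothetical page structure, and with Cheerio installed the same extraction would be `$('h2.title').text()`:

```javascript
// Last-resort sketch: scrape data straight out of HTML.
// Regex stands in for a selector engine here; use Cheerio in practice.
// Assumed (hypothetical) markup: <h2 class="title">...</h2>
function extractTitles(html) {
  const re = /<h2[^>]*class=["']title["'][^>]*>([\s\S]*?)<\/h2>/gi;
  return [...html.matchAll(re)].map((m) => m[1].trim());
}

// extractTitles('<h2 class="title">First post</h2>') → ['First post']
```

Note how tightly this couples the scraper to one site's markup — exactly why it's the last resort.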

Apply This to Any Website

  1. DevTools → Network → look for JSON
  2. View source → search for RSS
  3. View source → search for JSON-LD
  4. Only then → Cheerio/Playwright
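The checklist above can be sketched as one decision function. Step 1 happens by hand in DevTools, so this covers steps 2-4 on raw page source (the function name and return labels are my own):

```javascript
// Sketch: the 4-step decision flow over fetched page source.
function pickStrategy(html) {
  if (/type=["']application\/(rss|atom)\+xml["']/i.test(html)) return 'rss';
  if (/type=["']application\/ld\+json["']/i.test(html)) return 'json-ld';
  return 'html-parsing'; // last resort: Cheerio or Playwright
}
```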

All 77 scrapers built with this method: GitHub

Custom scraping — $20: Order via Payoneer
