Why Most Web Scrapers Break (And the 4-Tier Fix)

#javascript #beginners #tutorial #webdev

Your scraper worked perfectly for 3 months. Then one morning, it returns empty data. The target site changed its HTML.

This happens because CSS selectors are fragile by design. They depend on class names, element hierarchy, and HTML structure — all of which change during routine redesigns.

After maintaining 77 production scrapers, here's what actually works long-term.

The 4 Reliability Tiers

Tier 1: Public JSON APIs (99.9% uptime)
Sites like Reddit, YouTube, and Hacker News expose JSON endpoints. These tend to be stable because the site's own web and mobile clients depend on them, so a breaking change would break the site's own apps too.
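As a rough illustration of Tier 1, here is a minimal sketch that reads Hacker News through its public Firebase API instead of parsing HTML. It assumes Node 18+ (for the global fetch); the function names itemUrl, toStory, and fetchTopStories are made up for this example.

```javascript
// Sketch only: reads Hacker News via its public Firebase API.
// Assumes Node 18+ (global fetch). All function names are illustrative.
const HN_BASE = "https://hacker-news.firebaseio.com/v0";

// Build the URL for a single item (story, comment, job, ...).
function itemUrl(id) {
  return `${HN_BASE}/item/${id}.json`;
}

// Keep only the fields we care about. Missing fields get a default,
// e.g. "Ask HN" posts have no external url.
function toStory(item) {
  return {
    id: item.id,
    title: item.title ?? null,
    url: item.url ?? null,
    score: item.score ?? 0,
  };
}

// Fetch the top `limit` stories. Non-2xx responses throw.
async function fetchTopStories(limit = 5) {
  const res = await fetch(`${HN_BASE}/topstories.json`);
  if (!res.ok) throw new Error(`HN API returned ${res.status}`);
  const ids = (await res.json()).slice(0, limit);

  const items = await Promise.all(
    ids.map(async (id) => {
      const r = await fetch(itemUrl(id));
      if (!r.ok) throw new Error(`HN API returned ${r.status} for item ${id}`);
      return r.json();
    })
  );

  // The API can return null for some ids, so drop those before mapping.
  return items.filter(Boolean).map(toStory);
}
```

Calling `fetchTopStories(3).then(console.log)` should print three story objects. No selector is involved, so a front-page redesign has nothing to break here.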

Tier 2: RSS Feeds (99% uptime)
Google News, blogs, and podcasts all publish RSS, a format that has barely changed in 20 years.
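To show how little code Tier 2 needs, here is a dependency-free sketch that pulls title, link, and pubDate out of RSS 2.0 items. It uses regular expressions, which is a shortcut that only works for simple, well-formed feeds; a real XML parser is the safer choice in production. The helper names (decodeEntities, tagText, parseRssItems, fetchFeed) are hypothetical, and fetchFeed assumes Node 18+.

```javascript
// Sketch only: regex-based RSS 2.0 item extraction for simple feeds.
// Prefer a proper XML parser for anything beyond a quick script.

// Decode the XML entities that commonly appear in feed text.
function decodeEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // last, to avoid double-decoding
}

// Return the text of the first <tag>...</tag> inside `xml`, or null.
function tagText(xml, tag) {
  const m = xml.match(new RegExp(`<${tag}(?:\\s[^>]*)?>([\\s\\S]*?)</${tag}>`));
  if (!m) return null;
  const raw = m[1].trim();
  // CDATA content is taken literally; everything else is entity-decoded.
  const cdata = raw.match(/^<!\[CDATA\[([\s\S]*?)\]\]>$/);
  return cdata ? cdata[1] : decodeEntities(raw);
}

// Split the feed into <item> blocks and read three common fields.
function parseRssItems(xml) {
  const items = xml.match(/<item(?:\s[^>]*)?>[\s\S]*?<\/item>/g) ?? [];
  return items.map((item) => ({
    title: tagText(item, "title"),
    link: tagText(item, "link"),
    pubDate: tagText(item, "pubDate"),
  }));
}

// Download a feed and parse it. `feedUrl` is whatever feed you target.
async function fetchFeed(feedUrl) {
  const res = await fetch(feedUrl);
  if (!res.ok) throw new Error(`Feed returned ${res.status}`);
  return parseRssItems(await res.text());
}
```

Because the element names come from the RSS format rather than the site's design, the same parser should work across unrelated feeds.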

Tier 3: JSON-LD Structured Data (95% uptime)
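Embedded in the page's HTML, inside script tags of type application/ld+json, so that search engines like Google can build rich results. It follows Schema.org standards, so changes are rare and usually backwards-compatible.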
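A minimal sketch of Tier 3, assuming you already have the page HTML as a string: find every application/ld+json script block, parse it, and filter by Schema.org type. The names extractJsonLd and findByType are invented for this example, and it does not flatten nested @graph containers, which some sites use.

```javascript
// Sketch only: pull JSON-LD blocks out of an HTML string.
// Does not unpack "@graph" containers; add that if your target uses them.

function extractJsonLd(html) {
  const pattern =
    /<script\b[^>]*type\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const results = [];
  for (const match of html.matchAll(pattern)) {
    try {
      const parsed = JSON.parse(match[1]);
      // A block may hold one object or an array of objects.
      results.push(...(Array.isArray(parsed) ? parsed : [parsed]));
    } catch {
      // Skip malformed blocks rather than failing the whole page.
    }
  }
  return results;
}

// Return the nodes whose "@type" equals (or includes) the given type.
function findByType(nodes, type) {
  return nodes.filter((node) => {
    if (!node || typeof node !== "object") return false;
    const t = node["@type"];
    return Array.isArray(t) ? t.includes(type) : t === type;
  });
}
```

For example, `findByType(extractJsonLd(html), "Product")` would return the product objects on a page that publishes them, regardless of which class names the visible markup uses.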

Tier 4: CSS Selectors (70-90% uptime)
The traditional approach. Breaks on every redesign. Should be your last resort.

Real Examples from My 77 Scrapers

Scraper	Method	Uptime	Last broken
Reddit	JSON API	100%	Never
YouTube Comments	Innertube API	100%	Never
Google News	RSS	100%	Never
Trustpilot	JSON-LD	100%	Never
Bluesky	AT Protocol	100%	Never
HN	Firebase API	100%	Never

Notice a pattern? None of these scrapers rely on CSS selectors, and none of them has ever broken.

How to Find Hidden APIs

1. Open browser DevTools → Network tab
2. Filter by XHR/Fetch requests
3. Look for JSON responses when you interact with the page
4. Note the URL pattern. It is usually consistent, sometimes documented, and otherwise often easy to reverse-engineer
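Once the Network tab has revealed an endpoint, the next step is to replay it from your own code. The sketch below uses Reddit's listing URLs with a .json suffix as the example. The response shape (data.children[].data) is what these listings return at the time of writing, and Reddit may rate-limit or reject unauthenticated requests, so treat this as an assumption to verify. The function names and the User-Agent string are placeholders; fetchPosts assumes Node 18+.

```javascript
// Sketch only: replay a JSON endpoint discovered in DevTools.
// Example target: Reddit listing URLs with a ".json" suffix.

// Build the listing URL from the pattern seen in the Network tab.
function listingUrl(subreddit, { sort = "hot", limit = 10 } = {}) {
  const url = new URL(
    `https://www.reddit.com/r/${encodeURIComponent(subreddit)}/${sort}.json`
  );
  url.searchParams.set("limit", String(limit));
  return url.toString();
}

// Flatten the listing into plain post objects.
// Returns [] if the response does not have the expected shape.
function extractPosts(listing) {
  const children = listing?.data?.children ?? [];
  return children.map(({ data }) => ({
    title: data.title,
    score: data.score,
    permalink: `https://www.reddit.com${data.permalink}`,
  }));
}

// Fetch and parse a listing. The User-Agent value is a placeholder;
// identify your own client honestly.
async function fetchPosts(subreddit, options) {
  const res = await fetch(listingUrl(subreddit, options), {
    headers: { "User-Agent": "example-scraper/0.1" },
  });
  if (!res.ok) throw new Error(`Reddit returned ${res.status}`);
  return extractPosts(await res.json());
}
```

Keeping the URL builder and the response parser separate from the network call makes both easy to test against a response you saved from DevTools.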

The Bottom Line

If your scraper uses CSS selectors, it will break. The question is when, not if.

Invest time upfront to find the JSON API, RSS feed, or structured data. Your future self will thank you.

All 77 scrapers (all using Tier 1-3 methods): GitHub

