DEV Community

Алексей Спинов


Web Scraping vs API: When to Scrape and When to Use the Official API

The biggest mistake in web scraping is scraping when an API exists.

Decision Tree

Does the site have an official API?
├── YES → Is it free or affordable?
│   ├── YES → USE THE API (always)
│   └── NO → Can you afford it?
│       ├── YES → USE THE API
│       └── NO → Is there a hidden JSON API?
│           ├── YES → Use it (see list below)
│           └── NO → Scrape HTML as last resort
└── NO → Is there a hidden JSON endpoint?
    ├── YES → Use it
    └── NO → Scrape with Cheerio/Playwright
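
The tree above can also be written as a small function. This is only a sketch: the `Target` class and its field names are hypothetical, invented here to mirror the questions in the tree, and the returned labels are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """Answers to the tree's questions for one site (hypothetical field names)."""
    has_official_api: bool
    api_free_or_affordable: bool = False
    can_afford_api: bool = False
    has_hidden_json: bool = False


def choose_strategy(t: Target) -> str:
    """Walk the decision tree top-down and return the recommended approach."""
    # An official API wins whenever it exists and the cost is acceptable.
    if t.has_official_api and (t.api_free_or_affordable or t.can_afford_api):
        return "official_api"
    # No usable official API: look for a JSON endpoint before touching HTML.
    if t.has_hidden_json:
        return "hidden_json"
    # Last resort: parse HTML (Cheerio/Playwright in the tree above).
    return "scrape_html"
```

For example, `choose_strategy(Target(has_official_api=False, has_hidden_json=True))` lands on the hidden JSON branch, which matches the bottom half of the tree.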

Sites With Hidden JSON APIs

Site      Endpoint          Auth
Reddit    .json suffix      None
YouTube   Innertube         None
Shopify   /products.json    None
HN        Algolia API       None
Bluesky   AT Protocol       None

Full list: 7 Sites That Return JSON
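
As a rough illustration of three rows from the table, here is a sketch that builds the JSON URLs for Reddit, Shopify, and HN using only the Python standard library. The helper names, the `limit` parameter default, the example User-Agent string, and the timeout are assumptions made for this example; YouTube and Bluesky are left out because their request formats are more involved.

```python
import json
from urllib.parse import urlencode, urlsplit, urlunsplit
from urllib.request import Request, urlopen


def reddit_json_url(page_url: str) -> str:
    """Append the .json suffix to the path of a Reddit page URL."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/") + ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def shopify_products_url(store_domain: str, limit: int = 50) -> str:
    """Build the /products.json URL for a Shopify storefront domain."""
    return f"https://{store_domain}/products.json?{urlencode({'limit': limit})}"


def hn_search_url(query: str) -> str:
    """Build a search URL for the HN Algolia API."""
    return "https://hn.algolia.com/api/v1/search?" + urlencode({"query": query})


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body (no retries or rate limiting)."""
    # Some sites reject requests without a User-Agent; this value is a placeholder.
    req = Request(url, headers={"User-Agent": "example-script/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

A call such as `fetch_json(hn_search_url("web scraping"))` would then return the parsed response as a Python dict, assuming the endpoint is reachable and still unauthenticated.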

The API-First Rule

After building 77 scrapers, this is how they break down:

  • 70% use JSON APIs (these rarely break, because they do not depend on page markup)
  • 25% use Cheerio HTML parsing
  • 5% need Playwright browser automation
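
One way to apply the API-first rule in code is to try the cheapest, most stable source first and fall back only on failure. The sketch below assumes each tier is wrapped in a zero-argument callable; `fetch_api_first` and the tier names are hypothetical and not taken from any library.

```python
from typing import Any, Callable, Sequence, Tuple

Fetcher = Callable[[], Any]


def fetch_api_first(strategies: Sequence[Tuple[str, Fetcher]]) -> Tuple[str, Any]:
    """Try each (name, fetcher) pair in order and return the first success."""
    errors = []
    for name, fetcher in strategies:
        try:
            return name, fetcher()
        except Exception as exc:  # broad on purpose: any failure means "try the next tier"
            errors.append(f"{name}: {exc}")
    # Every tier failed (or none were given): report what went wrong in each.
    raise RuntimeError("all strategies failed: " + "; ".join(errors))
```

In practice the list would be ordered as in the breakdown above: a JSON API call first, an HTML parser second, and browser automation last, so the expensive tiers run only when the cheaper ones fail.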



Not sure whether to scrape or use an API? I will assess your target and build the optimal solution. $20. Email: Spinov001@gmail.com | Hire me
