The biggest mistake in web scraping is scraping when an API exists.
Decision Tree
Does the site have an official API?
├── YES → Is it free or affordable?
│   ├── YES → USE THE API (always)
│   └── NO → Can you afford it?
│       ├── YES → USE THE API
│       └── NO → Is there a hidden JSON API?
│           ├── YES → Use it (see list below)
│           └── NO → Scrape HTML as last resort
└── NO → Is there a hidden JSON endpoint?
    ├── YES → Use it
    └── NO → Scrape with Cheerio/Playwright
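The tree above can be sketched as a small function. This is only an illustration: the names `Strategy` and `choose_strategy` and the boolean parameters are hypothetical, and answering each question for a real site is still a manual step.

```python
from enum import Enum


class Strategy(Enum):
    OFFICIAL_API = "use the official API"
    HIDDEN_JSON = "use the hidden JSON endpoint"
    SCRAPE_HTML = "scrape HTML (Cheerio/Playwright)"


def choose_strategy(
    has_official_api: bool,
    free_or_affordable: bool = False,
    can_afford: bool = False,
    has_hidden_json: bool = False,
) -> Strategy:
    """Walk the decision tree from top to bottom (hypothetical helper)."""
    # An official API wins whenever it is free, affordable, or within budget.
    if has_official_api and (free_or_affordable or can_afford):
        return Strategy.OFFICIAL_API
    # No usable official API: a hidden JSON endpoint is the next choice.
    if has_hidden_json:
        return Strategy.HIDDEN_JSON
    # Nothing structured is available, so HTML scraping is the last resort.
    return Strategy.SCRAPE_HTML


if __name__ == "__main__":
    print(choose_strategy(has_official_api=False, has_hidden_json=True).value)
```

Both branches of the tree end the same way once the official API is ruled out, which is why the function collapses them into one hidden-JSON check.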
Sites With Hidden JSON APIs
| Site | Endpoint | Auth |
|---|---|---|
| Reddit | .json suffix | None |
| YouTube | Innertube | None |
| Shopify | /products.json | None |
| HN | Algolia API | None |
| Bluesky | AT Protocol | None |
Full list: 7 Sites That Return JSON
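As a rough sketch of how some of these endpoints are addressed, the helpers below only build URLs and make no network requests. The function names are hypothetical, the `.json` suffix helper assumes Reddit-style page URLs, the store hostname in the usage line is a placeholder, and the Hacker News helper assumes the public Algolia search endpoint at `hn.algolia.com/api/v1/search`.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def reddit_json_url(page_url: str) -> str:
    """Append the .json suffix to a page URL, keeping any query string."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/") + ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def shopify_products_url(store_url: str) -> str:
    """Point at /products.json on the same host as any store page."""
    parts = urlsplit(store_url)
    return urlunsplit((parts.scheme, parts.netloc, "/products.json", "", ""))


def hn_search_url(query: str) -> str:
    """Build a Hacker News search URL (assumed Algolia endpoint)."""
    return "https://hn.algolia.com/api/v1/search?" + urlencode({"query": query})


if __name__ == "__main__":
    print(reddit_json_url("https://www.reddit.com/r/python/"))
    # Placeholder hostname, not a real store.
    print(shopify_products_url("https://example-store.myshopify.com/collections/all"))
    print(hn_search_url("web scraping"))
```

Fetching any of these URLs with an HTTP client and decoding the body as JSON would replace an HTML parsing step. Rate limits and terms of use still apply even where no auth is required.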
The API-First Rule
After building 77 scrapers:
- 70% use JSON APIs (never break)
- 25% use Cheerio HTML parsing
- 5% need Playwright browser automation
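One way to apply the rule in code is to try the structured path first and fall back to HTML parsing only when the response is not JSON. The sketch below is an assumption-laden illustration: `extract_titles`, the `items`/`title` payload shape, and the `<h2>` page layout are all hypothetical, and Python's standard-library `html.parser` stands in for Cheerio.

```python
import json
from html.parser import HTMLParser
from typing import List, Tuple


class TitleParser(HTMLParser):
    """Collects text inside <h2> tags (hypothetical page layout)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_h2 = False
        self.titles: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2 and data.strip():
            self.titles.append(data.strip())


def extract_titles(body: str) -> Tuple[str, List[str]]:
    """Return (source, titles), preferring JSON over HTML parsing."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON: fall back to the more fragile HTML path.
        parser = TitleParser()
        parser.feed(body)
        return "html", parser.titles
    # Assumed payload shape: {"items": [{"title": ...}, ...]}
    return "json", [item["title"] for item in payload["items"]]
```

The JSON branch depends only on field names, while the HTML branch depends on page markup, which is the reason the first tier tends to need less maintenance.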