I've been building web scrapers for years. Here's my controversial take: most web scraping tutorials teach you the wrong thing.
They teach you to parse HTML. To fight with selectors. To handle dynamic JavaScript rendering.
But in my experience, about 80% of the data you need is available through free public APIs that nobody talks about.
The APIs Nobody Knows About
- PyPI has a JSON API: https://pypi.org/pypi/{package}/json (no key, no auth).
- YouTube has Innertube. Internal API, no quotas, no key.
- arXiv has a free search API. 2M+ papers, structured XML.
- PubMed returns medical research data in JSON.
- GitHub gives you repo data without a token (unauthenticated requests are rate-limited).
- Crossref searches 130M+ research papers for free.
- WHOIS/RDAP returns domain registration data via REST.
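To make the first item in the list concrete, here is a minimal sketch of calling the PyPI JSON API with only the standard library. The helper names (`pypi_url`, `summarize`, `fetch_package`) are made up for illustration, and the three fields pulled from the `info` object are just a sample of what the response contains. Treat it as a starting point, not a finished client.

```python
import json
from urllib.request import urlopen


def pypi_url(package: str) -> str:
    """Build the PyPI JSON API URL for a package name."""
    return f"https://pypi.org/pypi/{package}/json"


def summarize(payload: dict) -> dict:
    """Pick a few fields out of a decoded PyPI JSON response.

    Missing fields come back as None instead of raising.
    """
    info = payload.get("info", {})
    return {
        "name": info.get("name"),
        "version": info.get("version"),
        "summary": info.get("summary"),
    }


def fetch_package(package: str, timeout: float = 10.0) -> dict:
    """Fetch and summarize one package (this makes a network request)."""
    with urlopen(pypi_url(package), timeout=timeout) as resp:
        return summarize(json.load(resp))
```

Calling `fetch_package("requests")` should return a small dict with the package name, latest version, and summary. No selectors, no HTML parsing, and nothing to break when the project page gets a redesign.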
I documented all of them in my free APIs list — 200+ APIs that need zero registration.
Why This Matters
Every time you write a BeautifulSoup selector, you're:
- Building something fragile (one HTML change = broken scraper)
- Fighting anti-bot systems unnecessarily
- Ignoring structured data that's already there
APIs don't change their response format every week. HTML does.
My Rule
Before scraping ANY website, I spend 5 minutes checking:
- Does it have a public API? (check /api, /graphql, or docs)
- Does it expose JSON in page source? (ytInitialData, __NEXT_DATA__)
- Does it have RSS/Atom feeds?
Only if all three fail do I touch the HTML.
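The second and third checks are easy to automate. Below is a rough sketch, assuming the page embeds its data in a `<script id="__NEXT_DATA__">` tag (the Next.js convention) and advertises feeds through `<link>` tags with an RSS or Atom `type`. The `PageProbe` class and `probe` function are hypothetical names of mine, and the sketch does not cover `ytInitialData`, which is assigned inside a script body and would need a different extraction step.

```python
import json
from html.parser import HTMLParser

# MIME types that sites commonly use to advertise feeds in <link> tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class PageProbe(HTMLParser):
    """Collects the __NEXT_DATA__ script body and any advertised feed URLs."""

    def __init__(self):
        super().__init__()
        self.feeds = []
        self._in_next_data = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("id") == "__NEXT_DATA__":
            self._in_next_data = True
        elif tag == "link" and attrs.get("type") in FEED_TYPES and attrs.get("href"):
            self.feeds.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_next_data = False

    def handle_data(self, data):
        # Script contents can arrive in several chunks, so accumulate them.
        if self._in_next_data:
            self._chunks.append(data)

    @property
    def next_data(self):
        """Return the embedded JSON as Python objects, or None if absent."""
        raw = "".join(self._chunks).strip()
        return json.loads(raw) if raw else None


def probe(html: str) -> PageProbe:
    """Parse an HTML string and return the populated probe."""
    parser = PageProbe()
    parser.feed(html)
    parser.close()
    return parser


sample = """<html><head>
<link rel="alternate" type="application/rss+xml" href="/feed.xml">
<script id="__NEXT_DATA__" type="application/json">{"props": {"pageProps": {"title": "Hello"}}}</script>
</head><body></body></html>"""

result = probe(sample)
print(result.feeds)
print(result.next_data["props"]["pageProps"]["title"])
```

If `next_data` comes back populated or `feeds` is non-empty, you already have structured data and can skip writing selectors for that page.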
What's your approach? Do you default to HTML scraping or APIs first? Have you discovered any hidden APIs that saved you hours of work?
I'm genuinely curious — drop your experience in the comments.