Alex Spinov
Unpopular Opinion: Stop Scraping HTML — Use These Free APIs Instead

I've been building web scrapers for years. Here's my controversial take: most web scraping tutorials teach you the wrong thing.

They teach you to parse HTML. To fight with selectors. To handle dynamic JavaScript rendering.

But 80% of the data you need is available through free public APIs that nobody talks about.

The APIs Nobody Knows About

  • PyPI has a JSON API. https://pypi.org/pypi/{package}/json — no key, no auth.
  • YouTube has Innertube. Internal API, no quotas, no key.
  • arXiv has a free search API. 2M+ papers, structured XML.
  • PubMed returns medical research data in JSON.
  • GitHub gives you repo data without a token (rate-limited, but no auth needed for public repos).
  • Crossref searches 130M+ research papers for free.
  • WHOIS/RDAP returns domain registration data via REST.
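To make this concrete, here's a minimal sketch of hitting the PyPI JSON API from the list above, using only the standard library. The endpoint shape (`https://pypi.org/pypi/{package}/json`) is real and needs no key; the helper names are my own.

```python
import json
import urllib.request


def pypi_url(package: str) -> str:
    """Build the no-auth PyPI JSON endpoint for a package."""
    return f"https://pypi.org/pypi/{package}/json"


def latest_version(package: str) -> str:
    """Fetch a package's metadata from PyPI and return its latest version string."""
    with urllib.request.urlopen(pypi_url(package)) as resp:
        data = json.load(resp)
    # The JSON payload nests the current version under info.version.
    return data["info"]["version"]
```

Calling `latest_version("requests")` returns something like `"2.32.3"` — no selectors, no headless browser, no parsing a rendered project page.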

I documented all of them in my free APIs list — 200+ APIs that need zero registration.

Why This Matters

Every time you write a BeautifulSoup selector, you're:

  1. Building something fragile (one HTML change = broken scraper)
  2. Fighting anti-bot systems unnecessarily
  3. Ignoring structured data that's already there

APIs don't change their response format every week. HTML does.

My Rule

Before scraping ANY website, I spend 5 minutes checking:

  1. Does it have a public API? (check /api, /graphql, or docs)
  2. Does it expose JSON in page source? (ytInitialData, __NEXT_DATA__)
  3. Does it have RSS/Atom feeds?

Only if all three fail do I touch the HTML.
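Check #2 above is often the quickest win. Here's a rough sketch of pulling an embedded `__NEXT_DATA__` payload out of raw page source with the standard library — the sample HTML below is fabricated for illustration, and real pages may need a more robust extraction than a regex.

```python
import json
import re

# Next.js embeds its page state in a script tag with this exact id.
NEXT_DATA_RE = re.compile(
    r'<script id="__NEXT_DATA__" type="application/json">(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html: str):
    """Return the parsed __NEXT_DATA__ payload, or None if the page has none."""
    m = NEXT_DATA_RE.search(html)
    return json.loads(m.group(1)) if m else None


# Fabricated sample page source for demonstration:
sample = (
    '<html><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"page": "/"}}'
    "</script></html>"
)
data = extract_next_data(sample)  # → {"props": {"page": "/"}}
```

If this returns a dict, you already have the page's structured data and can skip the HTML entirely.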


What's your approach? Do you default to HTML scraping or APIs first? Have you discovered any hidden APIs that saved you hours of work?

I'm genuinely curious — drop your experience in the comments.
