DEV Community

Alex Spinov


The Developer's Guide to Structured Data Extraction (JSON-LD, RSS, APIs)

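Three methods that make web scraping much more reliable than parsing HTML with selectors, because each one reads data the site already publishes in a structured form.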

1. JSON-LD (Schema.org)

JSON-LD is embedded in the page inside <script type='application/ld+json'> tags. It typically describes Schema.org entities such as reviews, products, and organizations.
Example: Trustpilot Scraper
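
A minimal sketch of pulling JSON-LD out of a page, using only the Python standard library. The sample HTML, the `JsonLdParser` class, and the `extract_json_ld` helper are made up for illustration; real pages may carry several blocks, or wrap their entities in a list or an `@graph` key.

```python
import json
from html.parser import HTMLParser

# Hypothetical page fragment, invented for this example.
SAMPLE_HTML = """
<html><head>
<script type='application/ld+json'>
{"@context": "https://schema.org", "@type": "Product", "name": "Example Widget",
 "aggregateRating": {"@type": "AggregateRating", "ratingValue": "4.5", "reviewCount": "12"}}
</script>
</head><body><h1>Example Widget</h1></body></html>
"""


class JsonLdParser(HTMLParser):
    """Collects the parsed contents of every JSON-LD <script> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script text can arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed blocks instead of failing the whole page.


def extract_json_ld(html):
    """Return a list with one parsed object per JSON-LD block in the HTML."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    for item in extract_json_ld(SAMPLE_HTML):
        print(item.get("@type"), item.get("name"))
```

Because the data is already JSON, there are no CSS selectors to break when the site changes its layout.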

2. RSS Feeds

RSS is a standard XML feed format. Google News, blogs, and podcasts publish it.
Example: Google News Scraper
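
A minimal sketch of reading an RSS 2.0 feed with the standard library's `xml.etree.ElementTree`. The feed below is invented for illustration and `parse_rss` is a hypothetical helper; real feeds often add namespaced elements (for example podcast or media tags) that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, invented for this example.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>https://example.com/</link>
    <item>
      <title>First story</title>
      <link>https://example.com/first</link>
      <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second story</title>
      <link>https://example.com/second</link>
      <pubDate>Tue, 02 Jan 2024 11:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return one dict per <item>, with title, link, and pubDate as strings."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss(SAMPLE_RSS):
        print(entry["pubDate"], entry["title"], entry["link"])
```

For feeds from untrusted sources, a hardened XML parser is a safer choice than the default one.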

3. Hidden JSON APIs

These are the internal endpoints a site's own frontend calls. Examples include Reddit's .json URLs and YouTube's Innertube API.
Example: Reddit Scraper
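
A minimal sketch of the Reddit .json approach, assuming the listing shape commonly seen in those responses (`data.children[].data` with `title`, `score`, and `permalink`). The helper names, the `limit` parameter, the User-Agent string, and the sample payload are illustrative assumptions; the endpoint is not a guaranteed contract and may be rate limited or changed.

```python
import json
import urllib.request


def listing_url(subreddit, limit=5):
    """Build the .json URL for a subreddit listing (assumed URL pattern)."""
    return f"https://www.reddit.com/r/{subreddit}.json?limit={limit}"


def fetch_json(url, user_agent="structured-data-demo/0.1"):
    """Fetch a URL and decode the body as JSON. Performs a real network call."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def parse_listing(payload):
    """Flatten a listing payload into a list of small post dicts."""
    posts = []
    for child in payload.get("data", {}).get("children", []):
        post = child.get("data", {})
        posts.append({
            "title": post.get("title"),
            "score": post.get("score"),
            "permalink": post.get("permalink"),
        })
    return posts


# Hypothetical payload in the assumed listing shape, so the parser can be
# tried without touching the network.
SAMPLE_PAYLOAD = {
    "kind": "Listing",
    "data": {
        "children": [
            {"kind": "t3", "data": {"title": "Hello", "score": 42,
                                    "permalink": "/r/example/comments/abc/hello/"}},
        ]
    },
}

if __name__ == "__main__":
    print(listing_url("python"))
    print(parse_listing(SAMPLE_PAYLOAD))
    # Live call, left commented out on purpose:
    # print(parse_listing(fetch_json(listing_url("python"))))
```

Keeping the fetch and the parse separate makes it easy to test the parser against saved responses and to swap in a different endpoint later.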

77 tools using these methods: GitHub

Custom extraction — $20: Payoneer
