Open any website. Right-click → View Source. Search for application/ld+json.
You'll find structured data that the website itself put there for Google to read. It's machine-readable, standardized, and practically begging to be extracted.
What Is JSON-LD?
JSON-LD (JavaScript Object Notation for Linked Data) is a way to embed structured, machine-readable data inside HTML pages. It follows the Schema.org vocabulary — the same standard Google, Bing, and Apple use to power rich search results.
When you see star ratings in Google search, product prices in shopping results, or event dates in Google Calendar — that's JSON-LD.
<script type="application/ld+json">
{
"@type": "Product",
"name": "Wireless Headphones XR-500",
"offers": {"@type": "Offer", "price": "79.99", "priceCurrency": "USD"},
"aggregateRating": {"@type": "AggregateRating", "ratingValue": "4.5", "reviewCount": "1234"}
}
</script>
Where To Find It
Almost every commercial website has JSON-LD:
| Website Type | What JSON-LD Contains |
|---|---|
| E-commerce | Product name, price, rating, review count |
| Review sites (Trustpilot) | Individual reviews with text, author, date, rating |
| Restaurants | Name, address, menu, hours, cuisine type |
| Events | Date, venue, ticket price, performers |
| Recipes | Ingredients, cook time, nutrition, ratings |
| Articles | Headline, author, date published, publisher |
| Local businesses | Address, phone, hours, rating |
Why This Beats Traditional Scraping
It never breaks. JSON-LD follows Schema.org standards. These standards have been stable since 2013. When a website redesigns their CSS, the JSON-LD stays exactly the same.
No JavaScript rendering. JSON-LD is in the raw HTML source. No headless browser, no Playwright, no waiting for JavaScript. A simple HTTP request + regex finds it.
It's legal. The website intentionally published this data for search engines. You're reading the same data Google reads.
It's structured. No CSS selectors, no DOM traversal, no brittle XPath. Just JSON.parse().
How To Extract It
const cheerio = require('cheerio');
const $ = cheerio.load(html);
const jsonLdScripts = $('script[type="application/ld+json"]');
jsonLdScripts.each((i, el) => {
const data = JSON.parse($(el).html());
console.log(data);
});
That's it. 5 lines of code. Works on any website with JSON-LD.
Real Examples
Trustpilot reviews: Every business page has AggregateRating + individual Review objects. My Trustpilot Scraper uses this exclusively — 100% reliability.
Amazon products: Product name, price, rating, description — all in JSON-LD.
Recipe sites: Full ingredient list, nutrition info, cook time — structured and standardized.
Tools
- Trustpilot Review Scraper — JSON-LD based, never breaks
- MCP Price Monitor — extracts prices from JSON-LD
- Meta Tags Extractor — all structured data from any URL
Part of 77 free tools on Apify.
Need structured data extracted from any website? $20 per dataset: Order via Payoneer
Top comments (0)