DEV Community

Alex Spinov

Every Website Has Hidden Structured Data — Here's How to Extract It

Open any website. Right-click → View Source. Search for application/ld+json.

You'll find structured data that the website itself put there for Google to read. It's machine-readable, standardized, and practically begging to be extracted.

What Is JSON-LD?

JSON-LD (JavaScript Object Notation for Linked Data) is a way to embed structured, machine-readable data inside HTML pages. On the web it is almost always used with the Schema.org vocabulary, the same standard Google, Bing, and Apple use to power rich search results.

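When you see star ratings in Google search, product prices in shopping results, or event dates pulled into a listing, the underlying data is very often JSON-LD. (Microdata and RDFa are the other formats search engines accept, but JSON-LD is the one Google recommends.)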

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Wireless Headphones XR-500",
  "offers": {"@type": "Offer", "price": "79.99", "priceCurrency": "USD"},
  "aggregateRating": {"@type": "AggregateRating", "ratingValue": "4.5", "reviewCount": "1234"}
}
</script>
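Once parsed, a block like the one above is just a plain object. One detail worth knowing is that prices and ratings are usually serialized as strings, so they need converting before you do any math on them. Here is a minimal sketch; `summarizeProduct` is a hypothetical helper name, and it assumes `offers` and `aggregateRating` are single objects as in the example (real pages may use arrays or omit fields).

```javascript
// Hypothetical helper: flattens a Schema.org Product object into plain fields.
// Assumes `offers` and `aggregateRating` are single objects, as in the example above.
function summarizeProduct(product) {
  const offer = product.offers || {};
  const rating = product.aggregateRating || {};
  // Numeric values usually arrive as strings, so convert them explicitly.
  const toNumber = (value) => (value !== undefined ? Number(value) : null);
  return {
    name: product.name,
    price: toNumber(offer.price),
    currency: offer.priceCurrency || null,
    rating: toNumber(rating.ratingValue),
    reviewCount: toNumber(rating.reviewCount),
  };
}

const product = JSON.parse(`{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Wireless Headphones XR-500",
  "offers": {"@type": "Offer", "price": "79.99", "priceCurrency": "USD"},
  "aggregateRating": {"@type": "AggregateRating", "ratingValue": "4.5", "reviewCount": "1234"}
}`);

console.log(summarizeProduct(product));
```

Missing fields come back as `null` instead of throwing, which makes it easier to run the same helper across many pages.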

Where To Find It

Many commercial websites publish JSON-LD, because it is how they qualify for rich search results:

| Website Type | What JSON-LD Contains |
| --- | --- |
| E-commerce | Product name, price, rating, review count |
| Review sites (Trustpilot) | Individual reviews with text, author, date, rating |
| Restaurants | Name, address, menu, hours, cuisine type |
| Events | Date, venue, ticket price, performers |
| Recipes | Ingredients, cook time, nutrition, ratings |
| Articles | Headline, author, date published, publisher |
| Local businesses | Address, phone, hours, rating |
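A single page can carry several of these types at once, and sites lay them out differently: as one object, as a top-level array, or wrapped in an `@graph` list. The sketch below shows one way to pick out the type you care about. `findByType` is a hypothetical helper name, the sample data is made up, and it only looks at top-level nodes (it does not recurse into nested objects).

```javascript
// Hypothetical helper: collects every top-level node of a given @type.
// Handles three common layouts: a single object, a top-level array,
// and an object that wraps its nodes in "@graph".
function findByType(parsed, typeName) {
  const roots = Array.isArray(parsed) ? parsed : [parsed];
  const matches = [];
  for (const root of roots) {
    if (!root || typeof root !== 'object') continue;
    const nodes = Array.isArray(root['@graph']) ? root['@graph'] : [root];
    for (const node of nodes) {
      if (!node || typeof node !== 'object') continue;
      // "@type" may be a string or an array of strings.
      const types = [].concat(node['@type'] || []);
      if (types.includes(typeName)) matches.push(node);
    }
  }
  return matches;
}

// Made-up sample data for illustration.
const sample = {
  '@context': 'https://schema.org',
  '@graph': [
    { '@type': 'Organization', name: 'Example Bistro Group' },
    { '@type': ['Restaurant', 'LocalBusiness'], name: 'Example Bistro', servesCuisine: 'French' },
  ],
};

console.log(findByType(sample, 'Restaurant').map((node) => node.name));
```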

Why This Beats Traditional Scraping

It rarely breaks. JSON-LD follows the Schema.org vocabulary, which launched in 2011 and has evolved without renaming its core types and properties. When a website redesigns its layout or CSS, the JSON-LD usually stays exactly the same.

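No JavaScript rendering. JSON-LD is usually in the raw HTML source, so there's no headless browser, no Playwright, and no waiting for JavaScript. A simple HTTP request plus a regex finds it. The exception is sites that inject their JSON-LD client-side, for example through a tag manager.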

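It's published on purpose. The website intentionally put this data there for search engines, so you're reading the same data Google reads. Whether you may reuse it still depends on the site's terms of service and your jurisdiction.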

It's structured. No CSS selectors, no DOM traversal, no brittle XPath. Just JSON.parse().
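The "HTTP request + regex" idea from the list above can be sketched with nothing but Node's standard library. The names `extractJsonLd` and `fetchJsonLd` are hypothetical, and the fetch wrapper assumes Node 18 or later, where `fetch` is a global. Treat this as a starting point, not a hardened scraper.

```javascript
// Matches <script type="application/ld+json"> ... </script> and captures the body.
// A regex is workable here because the body is JSON, not nested HTML.
const JSON_LD_PATTERN =
  /<script\b[^>]*type\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;

// Hypothetical helper: returns every JSON-LD block in `html` that parses cleanly.
function extractJsonLd(html) {
  const blocks = [];
  for (const match of html.matchAll(JSON_LD_PATTERN)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch (err) {
      // Some sites ship malformed JSON-LD; skip it instead of aborting.
    }
  }
  return blocks;
}

// Assumes Node 18+, where `fetch` is available globally.
async function fetchJsonLd(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Request failed with status ${response.status}`);
  return extractJsonLd(await response.text());
}

// Made-up page source: one valid block and one malformed block.
const sampleHtml = `
<html><head>
<script type="application/ld+json">{"@type": "Article", "headline": "Hello"}</script>
<script type="application/ld+json">{not valid json}</script>
</head><body></body></html>`;

console.log(extractJsonLd(sampleHtml));
```

Keeping the parsing step separate from the network call means you can test it against saved HTML files without hitting the site again.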

How To Extract It

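const cheerio = require('cheerio');

// `html` is the page source as a string, e.g. the body of an HTTP response.
const $ = cheerio.load(html);

// Select every JSON-LD script tag and parse its contents.
const jsonLdScripts = $('script[type="application/ld+json"]');
jsonLdScripts.each((i, el) => {
  try {
    const data = JSON.parse($(el).html());
    console.log(data);
  } catch (err) {
    // Malformed JSON-LD does occur; skip the block instead of crashing.
    console.error(`Skipping invalid JSON-LD block ${i}: ${err.message}`);
  }
});

That's it. A handful of lines of code, and it works on any page that embeds JSON-LD in its HTML.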

Real Examples

Trustpilot reviews: Every business page has AggregateRating + individual Review objects. My Trustpilot Scraper uses this exclusively — 100% reliability.

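Amazon products: Product name, price, rating, description, wherever the page exposes Product markup. Coverage varies from page to page, so check the source before relying on it.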

Recipe sites: Full ingredient list, nutrition info, cook time — structured and standardized.
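One practical detail with recipe data: Schema.org expresses durations such as `cookTime` in ISO 8601 format (`PT30M`, `PT1H15M`), so you will usually want to convert them. Here is a small sketch; `durationToMinutes` is a hypothetical helper, the recipe object is made-up sample data, and only the day, hour, minute, and second parts are handled.

```javascript
// Hypothetical helper: converts an ISO 8601 duration such as "PT1H15M" to minutes.
// Handles days, hours, minutes, and seconds; returns null for anything else.
function durationToMinutes(duration) {
  const pattern = /^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$/;
  const match = pattern.exec(String(duration));
  if (!match) return null;
  // Missing parts are undefined, so default each one to 0.
  const [days, hours, minutes, seconds] = match.slice(1).map((part) => Number(part || 0));
  return days * 1440 + hours * 60 + minutes + seconds / 60;
}

// Made-up sample data using real Schema.org property names.
const recipe = {
  '@type': 'Recipe',
  name: 'Example Tomato Soup',
  cookTime: 'PT1H15M',
  recipeIngredient: ['4 tomatoes', '1 onion', '500 ml stock'],
};

console.log(
  `${recipe.name}: ${durationToMinutes(recipe.cookTime)} min, ` +
    `${recipe.recipeIngredient.length} ingredients`
);
```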

Tools

Part of 77 free tools on Apify.


