Here is a JSON-LD block that looks fine to a human and is useless to an LLM extractor:
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Acme Analytics",
  "offers": {
    "@type": "Offer",
    "priceRange": "$"
  }
}
If you build product pages and care whether an AI assistant ever names your product in a discovery answer ("what's the best tool for X?"), this matters more than another homepage redesign. The block above fails for two concrete reasons: priceRange is not a valid property of Offer (it belongs on LocalBusiness), and there is no price or priceCurrency. An extractor asked "what does Acme cost?" has nothing to return, so the page is likely to be passed over for any pricing-shaped prompt.
Here is the same block, fixed:
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Acme Analytics",
  "applicationCategory": "BusinessApplication",
  "operatingSystem": "Web",
  "description": "Acme Analytics is a web analytics tool for indie SaaS teams. It tracks product events and reports retention. Plans start at $19/month.",
  "offers": {
    "@type": "Offer",
    "price": "19.00",
    "priceCurrency": "USD"
  }
}
Four fields carry the weight: name, description, and an offers object with price and priceCurrency. That is the minimum machine-readable contract for a SaaS page that wants to be in an AI answer.
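That contract is small enough to check mechanically. Here is a minimal Python sketch, not a full schema.org validator: the function name missing_contract_fields and the two sample dicts are illustrative, and it assumes the JSON-LD has already been parsed into a dict.

```python
# Minimal sketch: report which of the four contract fields are absent.
# Names here (missing_contract_fields, broken, fixed) are illustrative.
REQUIRED_TOP_LEVEL = ("name", "description", "offers")
REQUIRED_OFFER = ("price", "priceCurrency")


def missing_contract_fields(ld: dict) -> list[str]:
    """Return the names of minimum-contract fields that are absent or empty."""
    missing = [key for key in REQUIRED_TOP_LEVEL if not ld.get(key)]
    offers = ld.get("offers")
    # schema.org allows one Offer or a list of Offers; normalise to a list.
    offer_list = offers if isinstance(offers, list) else [offers] if offers else []
    for index, offer in enumerate(offer_list):
        for key in REQUIRED_OFFER:
            # Compare against None/"" so a free product ("price": "0") passes.
            if not isinstance(offer, dict) or offer.get(key) in (None, ""):
                missing.append(f"offers[{index}].{key}")
    return missing


broken = {
    "@type": "SoftwareApplication",
    "name": "Acme Analytics",
    "offers": {"@type": "Offer", "priceRange": "$"},
}
fixed = {
    "@type": "SoftwareApplication",
    "name": "Acme Analytics",
    "description": "Acme Analytics is a web analytics tool for indie SaaS teams.",
    "offers": {"@type": "Offer", "price": "19.00", "priceCurrency": "USD"},
}

print(missing_contract_fields(broken))
# ['description', 'offers[0].price', 'offers[0].priceCurrency']
print(missing_contract_fields(fixed))
# []
```

An empty list means the page meets the minimum; anything else names exactly what to add.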
Why this decides whether you get recommended
An assistant answering "what is the best X for Y" runs three stages: retrieve pages, extract facts, synthesize a shortlist. Structured data is what makes the extract stage succeed. A page with no schema, or broken schema, can survive retrieval and still get dropped before synthesis, because the model could not pull clean facts from it. The products that show up in the shortlist are the ones whose facts were extractable, not necessarily the ones with the best marketing. (This is a pattern worth internalizing if you've ever wondered why a worse-funded competitor keeps showing up in AI answers and you don't.)
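To make the extract-then-synthesize step concrete, here is a toy model in Python. It is an illustration of the argument only, not how any real assistant is implemented; extract_price, shortlist and the competitor "Rival Metrics" are all hypothetical.

```python
from typing import Optional


def extract_price(ld: dict) -> Optional[str]:
    """Toy 'extract' stage: return 'price currency', or None if not cleanly stated."""
    offers = ld.get("offers")
    if not isinstance(offers, dict):
        return None
    price, currency = offers.get("price"), offers.get("priceCurrency")
    if price is None or currency is None:
        return None
    return f"{price} {currency}"


def shortlist(pages: list[dict]) -> list[str]:
    """Toy 'synthesize' stage: only pages with an extractable price survive."""
    results = []
    for page in pages:
        price = extract_price(page)
        if price is not None:
            results.append(f"{page['name']}: {price}")
    return results


# Both pages were "retrieved"; only one yields a usable fact.
retrieved = [
    {"name": "Acme Analytics", "offers": {"@type": "Offer", "priceRange": "$"}},
    {"name": "Rival Metrics", "offers": {"@type": "Offer", "price": "29.00", "priceCurrency": "USD"}},
]
print(shortlist(retrieved))  # ['Rival Metrics: 29.00 USD']
```

Acme survives retrieval here and still disappears from the answer, which is the failure mode described above.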
The three breaks worth auditing
These pass loose validators and fail in practice:
- priceRange on Offer. Only valid on LocalBusiness. Use price + priceCurrency on Offer.
- Missing offers entirely. A SoftwareApplication with no offer cannot answer cost questions. Add it even for free products ("price": "0").
- Raw URL strings where an object is required. ListItem.item and InteractionCounter.interactionType should be a full Thing/IdReference, not a bare URL string. Google tolerates the loose form; strict parsers reject it. Also: ImageObject.width/height must be strings or QuantitativeValue, never raw numbers.
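The three breaks can be caught with a small recursive walk over the parsed JSON-LD. The Python below is a rough sketch under stated assumptions: the function name audit, the message wording and the sample image URL are made up for illustration, and it assumes each node's @type is a single string rather than a list.

```python
def audit(node, path="$"):
    """Walk a parsed JSON-LD value and return a list of 'path: problem' strings."""
    problems = []
    if isinstance(node, list):
        for i, child in enumerate(node):
            problems += audit(child, f"{path}[{i}]")
        return problems
    if not isinstance(node, dict):
        return problems

    node_type = node.get("@type")  # assumption: a single string, not a list
    # Break 1: priceRange on Offer.
    if node_type == "Offer" and "priceRange" in node:
        problems.append(f"{path}: priceRange is not valid on Offer")
    # Break 2: SoftwareApplication without offers.
    if node_type == "SoftwareApplication" and "offers" not in node:
        problems.append(f"{path}: SoftwareApplication has no offers")
    # Break 3: bare strings or raw numbers where stricter parsers expect more.
    if node_type == "ListItem" and isinstance(node.get("item"), str):
        problems.append(f"{path}.item: bare URL string, expected an object")
    if node_type == "InteractionCounter" and isinstance(node.get("interactionType"), str):
        problems.append(f"{path}.interactionType: bare URL string, expected an object")
    if node_type == "ImageObject":
        for dim in ("width", "height"):
            if isinstance(node.get(dim), (int, float)):
                problems.append(f"{path}.{dim}: raw number, expected a string or QuantitativeValue")

    # Recurse into nested objects and arrays.
    for key, child in node.items():
        problems += audit(child, f"{path}.{key}")
    return problems


sample = {
    "@type": "SoftwareApplication",
    "name": "Acme Analytics",
    "image": {
        "@type": "ImageObject",
        "url": "https://yoursite.com/logo.png",  # hypothetical URL
        "width": 1200,
        "height": "630",
    },
}
for problem in audit(sample):
    print(problem)
```

Run against the sample, this reports the missing offers and the numeric width, and leaves the string height alone.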
Validate with a strict, typed builder instead of eyeballing. In TypeScript, schema-dts makes the compiler reject hallucinated fields:
import type { WithContext, SoftwareApplication } from 'schema-dts';
const ld: WithContext<SoftwareApplication> = {
  '@context': 'https://schema.org',
  '@type': 'SoftwareApplication',
  name: 'Acme Analytics',
  applicationCategory: 'BusinessApplication',
  offers: { '@type': 'Offer', price: '19.00', priceCurrency: 'USD' },
};
If you put a field that does not belong on SoftwareApplication, this fails to compile. That is the point: it catches the made-up fields that loose JSON validators wave through. (This is the exact approach we landed on building product listings at PeerPush, after enough hand-written JSON-LD drifted out of spec.)
Test what actually ships, not what you think ships
Client-rendered JSON-LD can vanish from the served HTML, and CSS-driven spacing can mangle text nodes that an extractor reads. Check the raw response, not the browser:
curl -s https://yoursite.com/product/acme | grep -c 'application/ld+json'
curl -s https://yoursite.com/product/acme \
  | python3 -c "import sys,re,json; blocks = re.findall(r'<script type=\"application/ld\+json\">(.*?)</script>', sys.stdin.read(), re.S); [json.loads(b) for b in blocks]; print(len(blocks), 'JSON-LD block(s) parsed')"
If a block does not parse, it is invisible to extractors, which is worse than having none.
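The regex in the one-liner only matches a script tag written exactly as type="application/ld+json" with no other attributes. For CI, a slightly sturdier variant can use Python's standard-library HTML parser instead. This is a sketch, and the names JsonLdCollector and check_jsonld are illustrative:

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def check_jsonld(html: str) -> tuple[int, list[str]]:
    """Return (number of JSON-LD blocks found, list of parse error messages)."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    errors = []
    for index, raw in enumerate(collector.blocks):
        try:
            json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"block {index}: {exc.msg}")
    return len(collector.blocks), errors


page = """<html><head>
<script type="application/ld+json" id="product">{"name": "Acme Analytics"}</script>
<script type="application/ld+json">{"name": "Acme Analytics",}</script>
</head></html>"""
count, errors = check_jsonld(page)
print(count, errors)
```

Feed it the response body from the curl command above and fail the build when the count is zero or the error list is non-empty.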
Ship llms.txt
llms.txt is an emerging convention: a Markdown file served at /llms.txt in your site root that describes your site and lists key pages for AI retrieval. It is not yet required by any assistant, but it is cheap to ship and a reasonable hedge. The proposal expects each entry to be a Markdown link followed by an optional note:
# Acme Analytics
> Web analytics for indie SaaS teams. Plans from $19/mo.
## Key pages
- [Product](https://yoursite.com/product): what Acme does
- [Pricing](https://yoursite.com/pricing): plans and prices
- [Alternatives](https://yoursite.com/alternatives): how Acme compares
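If the page list already lives in code, the file can be generated at build time so it never drifts from the real routes. Here is a minimal Python sketch; render_llms_txt is a hypothetical helper, and it writes entries in the link-plus-note form that the llms.txt proposal describes.

```python
def render_llms_txt(title: str, summary: str, pages: list[tuple[str, str, str]]) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, one H2 link list.

    Each page is a (link text, URL, note) tuple.
    """
    lines = [f"# {title}", "", f"> {summary}", "", "## Key pages", ""]
    lines += [f"- [{name}]({url}): {note}" for name, url, note in pages]
    return "\n".join(lines) + "\n"


body = render_llms_txt(
    "Acme Analytics",
    "Web analytics for indie SaaS teams. Plans from $19/mo.",
    [
        ("Product", "https://yoursite.com/product", "what Acme does"),
        ("Pricing", "https://yoursite.com/pricing", "plans and prices"),
        ("Alternatives", "https://yoursite.com/alternatives", "how Acme compares"),
    ],
)
print(body)
```

Write the returned string to llms.txt in whatever directory your framework serves from the site root.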
Stable URLs are part of the contract
A slug that changes on a rebrand drops you out of the retrieval set until the new URL re-indexes. Assign slugs once, hold them forever, and 301 anything you must move, keeping the redirect alive indefinitely. Protect three URLs above all: your product page, /pricing, and any /alternatives or /vs page. Treat changes to those as migrations, not casual edits.
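One way to enforce this is a deploy-time check that compares the previous release's paths with the current ones and flags anything that vanished without a redirect. The sketch below assumes you can produce both path sets and a redirect map from your own routing config; unredirected_removals and the sample paths are illustrative.

```python
def unredirected_removals(
    previous: set[str], current: set[str], redirects: dict[str, str]
) -> list[str]:
    """Paths that existed before, are gone now, and do not redirect to a live page."""
    removed = previous - current
    return sorted(path for path in removed if redirects.get(path) not in current)


previous_paths = {"/product/acme", "/pricing", "/alternatives"}
current_paths = {"/product/acme-analytics", "/pricing", "/alternatives"}

# Rebrand shipped without a redirect: the old product URL is flagged.
print(unredirected_removals(previous_paths, current_paths, {}))
# ['/product/acme']

# Same rebrand with a 301 entry in place: nothing to report.
print(unredirected_removals(
    previous_paths, current_paths, {"/product/acme": "/product/acme-analytics"}
))
# []
```

This only checks that a redirect entry exists and points at a live path; confirming the server actually answers with a 301 still needs a request against the deployed site.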
A first paragraph that extracts
Extractors weight the opening of a page heavily. Open each page with a complete factual claim: what the product is, who it's for, what it costs, in plain English. Replace "Discover the future of analytics" with "Acme Analytics is a web analytics tool for indie SaaS teams; plans start at $19/month." Concrete sentences get quoted; vague ones get dropped.
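A crude lint can catch the worst offenders before they ship. The Python below is a heuristic sketch only: it checks that the opening paragraph names the product and states a price, and it cannot judge whether the audience is clear. The function name opening_gaps and the price pattern are assumptions made for illustration.

```python
import re

# Matches "$19", "$19.00", "€ 5", or the word "free"; deliberately simple.
PRICE_PATTERN = re.compile(r"[$€£]\s?\d+(?:\.\d{2})?|\bfree\b", re.IGNORECASE)


def opening_gaps(first_paragraph: str, product_name: str) -> list[str]:
    """Return which basic facts the opening paragraph fails to state."""
    gaps = []
    if product_name.lower() not in first_paragraph.lower():
        gaps.append("product name")
    if not PRICE_PATTERN.search(first_paragraph):
        gaps.append("price")
    return gaps


print(opening_gaps("Discover the future of analytics", "Acme Analytics"))
# ['product name', 'price']
print(opening_gaps(
    "Acme Analytics is a web analytics tool for indie SaaS teams; plans start at $19/month.",
    "Acme Analytics",
))
# []
```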
The checklist
- [ ] SoftwareApplication JSON-LD with name, description, and offers (price + priceCurrency).
- [ ] No priceRange on Offer; no raw URL strings where objects are required; image dims as strings.
- [ ] Validate with schema-dts (or the Schema.org validator) in CI.
- [ ] First paragraph of each page is a complete factual claim, not a tease.
- [ ] llms.txt at the root.
- [ ] Stable slugs; 301s held indefinitely for the product, pricing, and alternatives pages.
- [ ] curl | grep the served HTML to confirm the JSON-LD actually ships.
None of this is a product change. It is the difference between a page an assistant can quote and one it silently skips, and it's almost entirely under your control as the person who owns the markup.