I run an AI-tools review site, and last quarter I got obsessed with one question: why does ChatGPT cite some pages and completely ignore others that rank fine on Google?
Ranking and getting cited turn out to be different games. Google ranks documents; an LLM extracts passages. So I reverse-engineered what makes a passage extractable, turned it into a 10-signal rubric, and built a free checker around it. Here is the rubric so you can implement it yourself, no tool required.
I weight 10 signals out of 100:
1. Answer-first phrasing (15 pts). The first sentence under a heading should answer the heading's question directly. LLMs lift the lead sentence. "Semrush costs $139.95/month" beats "When considering pricing, there are several factors to weigh."
2. Question-shaped headings (12 pts). <h2>How much does X cost?</h2> maps one-to-one onto how people actually prompt. Statement headings like "Pricing" do not.
3. Extractable lists and tables (12 pts). A real <table> or <ul> is far easier to lift verbatim than a wall of prose. Comparisons especially.
4. FAQPage JSON-LD (10 pts). Generate it from your visible Q&A, never fake it. It is the schema engines lean on most for direct answers.
5. Factual specificity (10 pts). Numbers, dates, named entities, units. "$15/mo, 1,500 credits" is citable; "affordable pricing" is noise.
6. Organization schema plus sameAs (10 pts). A linked entity graph (who published this, plus verified social profiles) is how an engine decides you are a real source and not a content farm.
7. Named author and E-E-A-T (8 pts). A real Person author with a byline and an About page beats author: Organization.
8. Freshness (8 pts). datePublished and dateModified, in schema and visible on the page. Engines discount stale pages for fast-moving topics.
9. Authority outbound links (8 pts). Linking to primary sources (official docs, the vendor's own pricing page) is a trust signal, not a leak.
10. Meta summary (7 pts). A tight meta description that is a standalone answer. It is often what gets quoted in the preview.
A few things I learned the hard way:
- Walls kill citations. If the answer sits behind a signup or a JS-rendered tab, the crawler never sees it. Server-render the answer.
- One page, one question cluster. Pages that try to rank for everything get cited for nothing. Tight scope wins.
- Schema is the easy half. Valid JSON-LD gets you in the room; answer-first prose is what actually gets quoted.
I wired all 10 signals into a browser-only checker (paste your HTML, get a /100 score and the priority fixes) if you would rather not score by hand: https://aitoolsinsiderhq.com/ai-search-visibility-checker.html . No signup, runs entirely client-side.
If you implement even the top three (answer-first, question headings, FAQ schema), you will usually see a difference within a crawl cycle or two. Curious what signals other people track. What has worked for getting your own stuff cited?
Top comments (0)