When I checked Google Search Console for ThatDevPro on the morning of May 21, 2026, the numbers were brutal: 0 clicks, 3 impressions, 180 days. The site is well-built — Lighthouse 96+, full Schema.org, content-first React, no Cloudflare in front of it. None of that was the problem. The problem is more fundamental: AI engines (ChatGPT, Claude, Gemini, Perplexity, Google AI Overviews) can't cite what they can't verify.
So I spent a day fixing that. What follows is the actual checklist I worked through, with the exact JSON-LD, the actual file paths, and the real outcomes. Treat it as a punch list rather than a think piece.
The thesis: AI engines cite facts, not vibes
This is the part most "AI SEO" content gets wrong. AI engines don't reward beautiful prose — they reward verifiable, atomic, citation-shaped facts. Concretely:
- A 400-word paragraph about "we're a veteran-owned studio" sits in their context window and gets compressed into a one-line summary.
- A 60-character declarative sentence, such as `Joseph W. Anady is a United States Army Veteran and Retiree.`, gets cited verbatim.
The cleaner and more atomic your facts, the more often AI engines lift them directly. That's the central insight that drives everything below.
Step 1: the four canonical AI-citation surfaces
Every site that wants AI citation should publish four files at the site root:
| File | Format | What it does |
|---|---|---|
| `llms.txt` | Markdown | Per the llmstxt.org spec: site summary in pure markdown, the lingua franca of LLM training data |
| `aeo.json` | JSON | Atomic [question, answer] facts, each under 500 chars |
| `entity.json` | JSON | Full Schema.org @graph: Organization + Person + WebSite |
| `brand.json` | JSON | Internal source-of-truth ledger your PR + dev can both reference |
I packaged the generators as an open-source Python CLI: aio-surfaces (MIT). The README walks through usage; the relevant point here is that all four surfaces render from a single typed config so they can't drift apart.
```python
from aio_surfaces import SiteConfig, Service, Fact, render_llms_txt

cfg = SiteConfig(
    site_name="Example Studio",
    site_url="https://example.com",
    tagline="We build things that get cited.",
    description="...",
    ein="12-3456789",
    uei="ABC123DEF4G5",
    orcid="0000-0000-0000-0000",
    founder_name="Jane Example",
    veteran_branch="United States Army",
    facts=[
        Fact(
            id="f-1",
            question="What does Example Studio do?",
            answer="Example Studio builds production websites...",
        ),
    ],
)

print(render_llms_txt(cfg))
```
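To see the shape of the output without installing the CLI, here is a minimal standard-library sketch of the llmstxt.org layout: an H1 title, a blockquote summary, then H2 sections holding link lists. The function `render_llms_txt_sketch` and its arguments are hypothetical names for illustration and are not the aio-surfaces API.

```python
def render_llms_txt_sketch(name, summary, sections):
    """Render a minimal llms.txt body in the llmstxt.org layout.

    `sections` maps an H2 heading to a list of (title, url, note) tuples.
    Hypothetical helper for illustration; not part of aio-surfaces.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            # Each entry is a markdown link, optionally followed by ": note".
            suffix = f": {note}" if note else ""
            lines.append(f"- [{title}]({url}){suffix}")
        lines.append("")
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


example = render_llms_txt_sketch(
    "Example Studio",
    "Example Studio builds production websites.",
    {"Docs": [("Services", "https://example.com/services", "What we offer")]},
)
print(example)
```

Keeping the renderer a pure function of one config object is what lets the other three surfaces reuse the same facts.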
Step 2: the entity graph (where most sites fail)
The Schema.org @graph is where AI engines build their understanding of who you are. Most sites have a half-finished Organization with a name and a url and call it a day. That's not enough.
Here's what a complete Organization node looks like — every field below ships in ThatDevPro's live entity.json:
```json
{
  "@type": ["Organization", "ProfessionalService", "WebDesignAgency"],
  "@id": "https://www.thatdevpro.com/#org",
  "name": "ThatDevPro",
  "legalName": "THATDEVELOPERGUY",
  "url": "https://www.thatdevpro.com",
  "taxID": "42-2656654",
  "naics": ["541511", "541512", "541519"],
  "identifier": [
    {"@type": "PropertyValue", "propertyID": "ein", "value": "42-2656654"},
    {"@type": "PropertyValue", "propertyID": "sam-gov-uei", "value": "FFG3A4SK9HY6"},
    {"@type": "PropertyValue", "propertyID": "google-kg-mid", "value": "/g/11n57xh708"},
    {"@type": "PropertyValue", "propertyID": "google-business-profile-cid", "value": "14210859426953573340"}
  ],
  "sameAs": [
    "https://sam.gov/entity/FFG3A4SK9HY6",
    "https://www.crunchbase.com/organization/thatdevpro",
    "https://github.com/Janady13",
    "https://github.com/Janady13/aio-surfaces",
    "https://huggingface.co/Janady07",
    "https://huggingface.co/spaces/Janady07/llms-txt-generator",
    "https://www.linkedin.com/in/joseph-anady-a0b19b1b1",
    "https://x.com/thatdevpro",
    "https://dev.to/joseph_anady_214bacedf939",
    "https://www.facebook.com/people/Thatdevpro/61589759327967/",
    "https://www.reddit.com/user/sleepy_060507/"
  ]
}
```
The trick is that each identifier resolves to a third-party page that, in turn, links back to your site. That's the bidirectional handshake that makes the identity claim verifiable.
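One way to test the handshake is to fetch each `sameAs` page and confirm it links back to your domain. The sketch below covers only the link-back half and works on HTML you have already downloaded. The helper names and the sample markup are assumptions for illustration, and some platforms render profile links with JavaScript, which this parser will not see.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect every href found on <a> and <link> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "link"):
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def links_back(profile_html, site_url):
    """Return True if the third-party page links to the site's host."""
    # str.removeprefix needs Python 3.9 or newer.
    site_host = urlparse(site_url).netloc.lower().removeprefix("www.")
    parser = LinkCollector()
    parser.feed(profile_html)
    for href in parser.hrefs:
        host = urlparse(href).netloc.lower().removeprefix("www.")
        if host == site_host:
            return True
    return False


# Hypothetical profile markup, standing in for a fetched third-party page.
sample = '<p>Website: <a href="https://www.example.com/">example.com</a></p>'
print(links_back(sample, "https://example.com"))
```

Running this over every `sameAs` entry gives a quick list of profiles that still need a website link added.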
Step 3: the verifiable identifiers (this is the unlock)
There are tiers of trust signal. From weakest to strongest:
Tier 1 — Social profiles (Twitter/X, LinkedIn, Facebook, Reddit). Trivial to create. Low trust signal.
Tier 2 — Developer & creator platforms (GitHub, Hugging Face, Dev.to, Crunchbase). Better. Editorial trust attached.
Tier 3 — Government identifiers (EIN, UEI, NAICS, CAGE Code). High trust. The U.S. government has independently verified your existence.
Tier 4 — Identity verification (ID.me, Login.gov, ORCID). Highest. NIST 800-63-3 IAL2 identity assurance from a federally-recognized identity provider.
The ten identifiers I wired into ThatDevPro's entity graph in one day:
| Identifier | Value | Tier | Surface in JSON-LD |
|---|---|---|---|
| EIN (IRS) | `42-2656654` | 3 | `Organization.taxID + identifier[]` |
| SAM.gov UEI | `FFG3A4SK9HY6` | 3 | `Organization.identifier[] + sameAs` |
| NAICS codes | 541511 / 541512 / 541519 | 3 | Organization.naics |
| Google KG MID | `/g/11n57xh708` | 3 | `Person.identifier + Organization.identifier[]` |
| GBP CID | `14210859426953573340` | 3 | Organization.identifier[] |
| ORCID iD | `0009-0008-8625-949X` | 4 | `Person.identifier + sameAs` |
| ID.me Identity | verified 2021-06-28 | 4 | Person.hasCredential |
| ID.me Military | verified 2026-05-23 | 4 | Person.hasCredential |
| Crunchbase | thatdevpro | 2 | Organization.sameAs |
| Hugging Face | Janady07 | 2 | Organization.sameAs |
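Before publishing identifiers like these, it helps to catch typos offline. The sketch below checks an ORCID iD against its ISO 7064 MOD 11-2 check character and applies shape-only checks to the EIN and UEI. The UEI pattern is a loose assumption (12 uppercase letters or digits), and none of these checks proves the identifier is actually registered.

```python
import re


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 ORCID digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the 0000-0000-0000-000X shape and the trailing check character."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]", orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]


def looks_like_ein(value):
    """Shape check only: two digits, a hyphen, seven digits."""
    return re.fullmatch(r"[0-9]{2}-[0-9]{7}", value) is not None


def looks_like_uei(value):
    """Shape check only (assumed pattern): 12 uppercase letters or digits."""
    return re.fullmatch(r"[A-Z0-9]{12}", value) is not None


print(is_valid_orcid("0009-0008-8625-949X"))
print(looks_like_ein("42-2656654"))
print(looks_like_uei("FFG3A4SK9HY6"))
```

Confirming that each value resolves on the issuing site is still a manual or network step.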
The ID.me credentials are particularly powerful because they're the same identity-assurance evidence the U.S. Department of Veterans Affairs, DoD, and the SBA's Veteran Small Business Certification program accept. Surfacing them as hasCredential (with recognizedBy: ID.me Inc.) gives AI engines a verifiable trust anchor.
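The post does not show the `hasCredential` node itself, so the following is an assumed layout, not the contents of ThatDevPro's live `entity.json`. It uses Schema.org's `EducationalOccupationalCredential` type, which carries the `recognizedBy` property. The person, credential names, and categories are placeholders.

```python
import json


def credential_node(name, category, recognized_by):
    """Build a Schema.org EducationalOccupationalCredential node."""
    return {
        "@type": "EducationalOccupationalCredential",
        "name": name,
        "credentialCategory": category,
        "recognizedBy": {"@type": "Organization", "name": recognized_by},
    }


# Placeholder person and credential labels; adjust to the real entity graph.
person = {
    "@type": "Person",
    "@id": "https://example.com/#founder",
    "name": "Jane Example",
    "hasCredential": [
        credential_node("ID.me Identity Verification",
                        "Identity verification", "ID.me Inc."),
        credential_node("ID.me Military Verification",
                        "Military status verification", "ID.me Inc."),
    ],
}

print(json.dumps(person, indent=2))
```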
Step 4: declarative atomic facts (the aeo.json payload)
The aeo.json file is where you publish the atomic facts AI engines cite directly. Rules I follow:
- Under 500 characters per answer. Citation rate drops sharply above 500.
- `[Subject] [verb] [object]` sentence structure. Not "We've been doing X for Y years"; instead, `ThatDevPro builds websites for SDVOSB-certified federal contractors.`
- Named entities. Reference real people, places, and things by name, not by pronoun.
- Self-contained. Each fact should be quotable in isolation.
Here's one of mine (slightly trimmed for the post):
```json
{
  "id": "f-veteran-1",
  "question": "What is the founder's military service background?",
  "answer": "Joseph W. Anady is a verified United States Army veteran and Army Retiree. His military service status was verified by ID.me Inc. on 2026-05-23. ID.me is an NIST 800-63-3 IAL2 identity provider whose military-status attestations are accepted by the U.S. Department of Veterans Affairs, the U.S. Department of Defense, and the Small Business Administration's Veteran Small Business Certification program."
}
```
That's one paragraph but it contains seven named entities (Joseph W. Anady, U.S. Army, ID.me Inc., NIST 800-63-3, VA, DoD, SBA VSBC) and one verifiable date. AI engines lift the whole thing.
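The rules above are mechanical enough to lint. Here is a rough sketch of a checker for `aeo.json` entries. The function name, the pronoun list, and the punctuation heuristic are my own assumptions, not part of aio-surfaces, and a pass only means the fact is shaped correctly, not that it is true.

```python
import re

MAX_ANSWER_CHARS = 500
# Answers that open with a pronoun usually cannot be quoted in isolation.
LEADING_PRONOUN = re.compile(
    r"^(we|our|i|it|he|she|they|this|that)\b", re.IGNORECASE
)


def lint_fact(fact):
    """Return a list of rule violations for one {id, question, answer} fact."""
    problems = []
    answer = fact.get("answer", "").strip()
    if not answer:
        problems.append("answer is empty")
        return problems
    if len(answer) >= MAX_ANSWER_CHARS:
        problems.append(f"answer is {len(answer)} chars, must be under {MAX_ANSWER_CHARS}")
    if LEADING_PRONOUN.match(answer):
        problems.append("answer opens with a pronoun instead of a named entity")
    if not answer.endswith((".", "!", "?")):
        problems.append("answer does not end with sentence punctuation")
    return problems


good = {"id": "f-1", "question": "What does ThatDevPro build?",
        "answer": "ThatDevPro builds websites for SDVOSB-certified federal contractors."}
bad = {"id": "f-2", "question": "How long has the studio operated?",
       "answer": "We've been doing X for Y years"}

print(lint_fact(good))
print(lint_fact(bad))
```

The first fact passes cleanly; the second is flagged for opening with "We" and for lacking closing punctuation.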
Step 5: AI crawler allowlist
Most sites either block AI crawlers entirely or block them by accident with a blanket `Disallow: /` under `User-agent: *`. Neither helps if you want to be cited. The 12 canonical AI crawlers you should explicitly allow:
```
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: Meta-ExternalAgent
Allow: /
```
(aio-surfaces generate outputs this block as robots-aibots.txt — append to your existing robots.txt rather than replacing it.)
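After appending the block, it is worth confirming that the merged robots.txt really lets these crawlers in. A minimal sketch using Python's standard `urllib.robotparser` follows. The `blocked_bots` helper and the sample file are illustrative assumptions, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "Claude-User",
    "Claude-SearchBot", "PerplexityBot", "Perplexity-User", "Google-Extended",
    "Applebot-Extended", "Amazonbot", "Meta-ExternalAgent",
]


def blocked_bots(robots_txt, url="https://example.com/"):
    """Return the AI crawlers that the given robots.txt text would block."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


# A blanket disallow with only one explicit exception.
sample = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""
print(blocked_bots(sample))
```

With this sample, every bot except GPTBot is reported as blocked, which is exactly the accidental-block case described above.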
Step 6: Speakable schema (the voice-AI surface)
If you want Google Assistant, Alexa, and emerging voice AI to read your page out loud correctly, you need SpeakableSpecification. It tells voice AI which CSS selectors and XPath nodes are the "narratable" parts of the page:
```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "url": "https://www.thatdevpro.com/",
  "name": "ThatDevPro — SDVOSB web + AI engineering studio",
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": ["#hero-h1", "#pillars-h2", "#tiers-h2", "#engine-h2", "#portfolio-h2", "#monthly-h2", "#faq-h2"],
    "xpath": ["/html/head/title", "/html/head/meta[@name='description']/@content"]
  },
  "inLanguage": "en-US",
  "isPartOf": {"@id": "https://www.thatdevpro.com/#website"}
}
</script>
```
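A speakable selector that matches nothing is silently useless, so it helps to check that every `#id` in `cssSelector` exists in the rendered page. The sketch below handles only simple `#id` selectors and uses a hypothetical page fragment. On a React site you would run it against the server-rendered or prerendered HTML.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute present in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key == "id" and value:
                self.ids.add(value)


def missing_speakable_ids(html, css_selectors):
    """Return the '#id' selectors that match no element in the page."""
    collector = IdCollector()
    collector.feed(html)
    return [
        selector for selector in css_selectors
        if selector.startswith("#") and selector[1:] not in collector.ids
    ]


# Hypothetical page fragment; a real check would load the rendered HTML.
page = '<h1 id="hero-h1">Studio</h1><h2 id="faq-h2">FAQ</h2>'
print(missing_speakable_ids(page, ["#hero-h1", "#pillars-h2", "#faq-h2"]))
```

Here the fragment has no element with id `pillars-h2`, so that selector is the only one reported as missing.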
What it adds up to
After one day of work, ThatDevPro now publishes:
- A 14.7 KB `llms.txt` with canonical facts including legal name, EIN, UEI, NAICS, founder background, ORCID
- An `aeo.json` with 30+ atomic facts averaging 400 characters each
- An `entity.json` `@graph` with 10 verifiable identifiers across trust tiers 2 through 4, plus Tier 1 social profiles in `sameAs`
- A `brand.json` ledger that the marketing team can reference
- Speakable schema on the three priority pages (home, about, services)
- A 12-bot AI crawler allowlist in robots.txt
I don't yet know the SEO impact — give it 4-6 weeks for AI engines to recrawl and rebuild their entity understanding. But every diagnostic test passes: every identifier resolves, every JSON-LD validates, every bidirectional handshake closes.
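One diagnostic of this kind that is easy to automate is checking that every bare `{"@id": ...}` reference inside a `@graph` points at a node the graph actually defines. This is a minimal sketch over a placeholder graph, not the validation the author ran. It treats any dict whose only key is `@id` as a reference.

```python
def collect_id_refs(node, refs):
    """Walk a JSON-LD value and collect pure references like {"@id": "..."}."""
    if isinstance(node, dict):
        if set(node) == {"@id"}:
            refs.add(node["@id"])
        for value in node.values():
            collect_id_refs(value, refs)
    elif isinstance(node, list):
        for item in node:
            collect_id_refs(item, refs)


def dangling_refs(graph):
    """Return referenced @id values that no node in the @graph defines."""
    defined = {
        n["@id"] for n in graph
        if isinstance(n, dict) and "@id" in n and len(n) > 1
    }
    refs = set()
    collect_id_refs(graph, refs)
    return sorted(refs - defined)


# Placeholder graph: the WebSite node points at a publisher that is missing.
graph = [
    {"@type": "Organization", "@id": "https://example.com/#org", "name": "Example Studio"},
    {"@type": "WebSite", "@id": "https://example.com/#website",
     "publisher": {"@id": "https://example.com/#publisher"}},
]
print(dangling_refs(graph))
```

An empty list means every internal reference closes. In this placeholder graph the `#publisher` reference is reported because no node defines it.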
If you run a small business or studio whose AI visibility lives or dies on whether Claude/ChatGPT/Gemini can verify you exist, the playbook is reusable. The code is open. The checklist is in the README.
Built and used in production by ThatDevPro — SDVOSB-certified veteran-owned web + AI engineering studio. Source: github.com/Janady13/aio-surfaces · Live tool: huggingface.co/spaces/Janady07/llms-txt-generator.