Joseph Anady

Posted on • Originally published at thatdevpro.com

How I wired 10 verifiable identifiers into a one-day SDVOSB studio's entity graph (and why your site probably can't be cited)

When I checked Google Search Console for ThatDevPro on the morning of May 21, 2026, the numbers were brutal: 0 clicks, 3 impressions, 180 days. The site is well-built — Lighthouse 96+, full Schema.org, content-first React, no Cloudflare in front of it. None of that was the problem. The problem is more fundamental: AI engines (ChatGPT, Claude, Gemini, Perplexity, Google AI Overviews) can't cite what they can't verify.

So I spent a day fixing that. What follows is the actual checklist I worked through, with the exact JSON-LD, the actual file paths, and the real outcomes. Treat it as a punch list rather than a think piece.

The thesis: AI engines cite facts, not vibes

This is the part most "AI SEO" content gets wrong. AI engines don't reward beautiful prose — they reward verifiable, atomic, citation-shaped facts. Concretely:

  • A 400-word paragraph about "we're a veteran-owned studio" sits in their context window and gets compressed into a one-line summary.
  • A 60-character declarative sentence such as "Joseph W. Anady is a United States Army Veteran and Retiree." gets cited verbatim.

The cleaner and more atomic your facts, the more often AI engines lift them directly. That's the central insight that drives everything below.

Step 1: the four canonical AI-citation surfaces

Every site that wants AI citation should publish four files at the site root:

| File | Format | What it does |
| --- | --- | --- |
| llms.txt | Markdown | Per the llmstxt.org spec: site summary in pure markdown, the lingua franca of LLM training data |
| aeo.json | JSON | Atomic [question, answer] facts, each under 500 chars |
| entity.json | JSON | Full Schema.org @graph: Organization + Person + WebSite |
| brand.json | JSON | Internal source-of-truth ledger your PR + dev can both reference |

I packaged the generators as an open-source Python CLI: aio-surfaces (MIT). The README walks through usage; the relevant point here is that all four surfaces render from a single typed config so they can't drift apart.

from aio_surfaces import SiteConfig, Service, Fact, render_llms_txt

cfg = SiteConfig(
    site_name="Example Studio",
    site_url="https://example.com",
    tagline="We build things that get cited.",
    description="...",
    ein="12-3456789",
    uei="ABC123DEF4G5",
    orcid="0000-0000-0000-0000",
    founder_name="Jane Example",
    veteran_branch="United States Army",
    facts=[
        Fact(
            id="f-1",
            question="What does Example Studio do?",
            answer="Example Studio builds production websites...",
        ),
    ],
)

print(render_llms_txt(cfg))
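
To show why rendering every surface from one config keeps them from drifting, here is a minimal standard-library sketch. It does not use aio-surfaces: `StudioConfig`, `render_aeo`, and `render_llms` are hypothetical names, and the output shapes are assumptions rather than the package's actual formats.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    id: str
    question: str
    answer: str


@dataclass(frozen=True)
class StudioConfig:
    # Hypothetical config type, not the aio-surfaces SiteConfig.
    site_name: str
    site_url: str
    facts: tuple = ()


def render_aeo(cfg: StudioConfig) -> str:
    """Render the facts as a JSON document (assumed shape)."""
    payload = {
        "site": cfg.site_url,
        "facts": [
            {"id": f.id, "question": f.question, "answer": f.answer}
            for f in cfg.facts
        ],
    }
    return json.dumps(payload, indent=2)


def render_llms(cfg: StudioConfig) -> str:
    """Render the same facts as a small markdown summary (assumed shape)."""
    lines = [f"# {cfg.site_name}", "", f"Site: {cfg.site_url}", "", "## Facts", ""]
    lines += [f"- {f.question} {f.answer}" for f in cfg.facts]
    return "\n".join(lines) + "\n"


cfg = StudioConfig(
    site_name="Example Studio",
    site_url="https://example.com",
    facts=(
        Fact("f-1", "What does Example Studio do?",
             "Example Studio builds production websites."),
    ),
)
aeo_text = render_aeo(cfg)
llms_text = render_llms(cfg)
print(llms_text)
```

Because both renderers read the same `cfg.facts`, an edited answer changes in both outputs at once.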

Step 2: the entity graph (where most sites fail)

The Schema.org @graph is where AI engines build their understanding of who you are. Most sites have a half-finished Organization with a name and a url and call it a day. That's not enough.

Here's what a complete Organization node looks like — every field below ships in ThatDevPro's live entity.json:

{
  "@type": ["Organization", "ProfessionalService", "WebDesignAgency"],
  "@id": "https://www.thatdevpro.com/#org",
  "name": "ThatDevPro",
  "legalName": "THATDEVELOPERGUY",
  "url": "https://www.thatdevpro.com",
  "taxID": "42-2656654",
  "naics": ["541511", "541512", "541519"],
  "identifier": [
    {"@type": "PropertyValue", "propertyID": "ein", "value": "42-2656654"},
    {"@type": "PropertyValue", "propertyID": "sam-gov-uei", "value": "FFG3A4SK9HY6"},
    {"@type": "PropertyValue", "propertyID": "google-kg-mid", "value": "/g/11n57xh708"},
    {"@type": "PropertyValue", "propertyID": "google-business-profile-cid", "value": "14210859426953573340"}
  ],
  "sameAs": [
    "https://sam.gov/entity/FFG3A4SK9HY6",
    "https://www.crunchbase.com/organization/thatdevpro",
    "https://github.com/Janady13",
    "https://github.com/Janady13/aio-surfaces",
    "https://huggingface.co/Janady07",
    "https://huggingface.co/spaces/Janady07/llms-txt-generator",
    "https://www.linkedin.com/in/joseph-anady-a0b19b1b1",
    "https://x.com/thatdevpro",
    "https://dev.to/joseph_anady_214bacedf939",
    "https://www.facebook.com/people/Thatdevpro/61589759327967/",
    "https://www.reddit.com/user/sleepy_060507/"
  ]
}
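
The node above is one entry in a larger @graph. As a rough sketch of how the Organization, Person, and WebSite nodes can reference each other by @id, the following builds a tiny graph and checks that no reference dangles. The `#person` fragment name, the placeholder domain, and the choice of `founder`, `worksFor`, and `publisher` as linking properties are assumptions for illustration, not a copy of the live entity.json.

```python
import json

SITE = "https://www.example.com"  # placeholder domain


def build_graph(site: str, org_name: str, founder_name: str) -> dict:
    """Assemble a three-node @graph whose nodes point at each other by @id."""
    org_id = f"{site}/#org"
    person_id = f"{site}/#person"  # assumed fragment name
    website_id = f"{site}/#website"
    return {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "Organization", "@id": org_id, "name": org_name,
             "url": site, "founder": {"@id": person_id}},
            {"@type": "Person", "@id": person_id, "name": founder_name,
             "worksFor": {"@id": org_id}},
            {"@type": "WebSite", "@id": website_id, "url": site,
             "publisher": {"@id": org_id}},
        ],
    }


def dangling_refs(graph: dict) -> set:
    """Return @id references that do not match any node in the graph."""
    nodes = graph["@graph"]
    defined = {node["@id"] for node in nodes}
    referenced = {
        value["@id"]
        for node in nodes
        for value in node.values()
        if isinstance(value, dict) and set(value) == {"@id"}
    }
    return referenced - defined


graph = build_graph(SITE, "Example Studio", "Jane Example")
print(json.dumps(graph, indent=2))
print(dangling_refs(graph))
```

An empty set from `dangling_refs` means every cross-reference lands on a node that is actually defined.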

The trick is that each identifier resolves to a third-party page that, in turn, links back to your site. That's the bidirectional handshake that makes the identity claim verifiable.
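
The handshake can be spot-checked mechanically. The sketch below assumes you already have the HTML of a third-party profile page (fetching is left out on purpose) and only tests whether it contains a link to your host. `links_back` and `host_of` are hypothetical helper names.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect every href found on <a> and <link> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "link"):
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def host_of(url: str) -> str:
    """Lowercased hostname with a leading 'www.' removed ('' if none)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def links_back(profile_html: str, site_url: str) -> bool:
    """True if the third-party page links to the site's host."""
    collector = LinkCollector()
    collector.feed(profile_html)
    target = host_of(site_url)
    return any(host_of(href) == target for href in collector.hrefs)


sample = '<p>Website: <a href="https://example.com/about">example.com</a></p>'
print(links_back(sample, "https://www.example.com"))
```

Some platforms render profile links with JavaScript or wrap them in redirect URLs, so a False result here calls for a manual look, not a conclusion.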

Step 3: the verifiable identifiers (this is the unlock)

There are tiers of trust signal. From weakest to strongest:

Tier 1 — Social profiles (Twitter/X, LinkedIn, Facebook, Reddit). Trivial to create. Low trust signal.

Tier 2 — Developer & creator platforms (GitHub, Hugging Face, Dev.to, Crunchbase). Better. Editorial trust attached.

Tier 3 — Government identifiers (EIN, UEI, NAICS, CAGE Code). High trust. The U.S. government has independently verified your existence.

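Tier 4 — Identity verification (ID.me, Login.gov, ORCID). Highest. ID.me offers NIST 800-63-3 IAL2 identity assurance as a federally recognized identity provider. ORCID is a self-registered persistent identifier rather than an identity-proofing service, so it belongs here as a stable, resolvable ID and not as proof of identity.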

The ten identifiers I wired into ThatDevPro's entity graph in one day:

| Identifier | Value | Tier | Surface in JSON-LD |
| --- | --- | --- | --- |
| EIN (IRS) | 42-2656654 | 3 | Organization.taxID + identifier[] |
| SAM.gov UEI | FFG3A4SK9HY6 | 3 | Organization.identifier[] + sameAs |
| NAICS codes | 541511 / 541512 / 541519 | 3 | Organization.naics |
| Google KG MID | /g/11n57xh708 | 3 | Person.identifier + Organization.identifier[] |
| GBP CID | 14210859426953573340 | 3 | Organization.identifier[] |
| ORCID iD | 0009-0008-8625-949X | 4 | Person.identifier + sameAs |
| ID.me Identity | verified 2021-06-28 | 4 | Person.hasCredential |
| ID.me Military | verified 2026-05-23 | 4 | Person.hasCredential |
| Crunchbase | thatdevpro | 2 | Organization.sameAs |
| Hugging Face | Janady07 | 2 | Organization.sameAs |

The ID.me credentials are particularly powerful because they're the same identity-assurance evidence the U.S. Department of Veterans Affairs, DoD, and the SBA's Veteran Small Business Certification program accept. Surfacing them as hasCredential (with recognizedBy: ID.me Inc.) gives AI engines a verifiable trust anchor.
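
For readers who have not used hasCredential before, here is one plausible shape for those entries, built in Python. The `EducationalOccupationalCredential` type and its `recognizedBy` and `credentialCategory` properties are real Schema.org terms. The category label, the use of `dateCreated` for the verification date, and the placeholder person are assumptions, and the live entity.json may be structured differently.

```python
import json


def credential_node(name: str, issuer: str, verified_on: str) -> dict:
    """Build one hasCredential entry for a Person node."""
    return {
        "@type": "EducationalOccupationalCredential",
        "name": name,
        "credentialCategory": "Identity verification",  # assumed label
        "recognizedBy": {"@type": "Organization", "name": issuer},
        "dateCreated": verified_on,  # assumed property choice for the date
    }


person = {
    "@type": "Person",
    "name": "Jane Example",  # placeholder, not the real founder node
    "hasCredential": [
        credential_node("ID.me Identity", "ID.me Inc.", "2021-06-28"),
        credential_node("ID.me Military", "ID.me Inc.", "2026-05-23"),
    ],
}
print(json.dumps(person, indent=2))
```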

Step 4: declarative atomic facts (the aeo.json payload)

The aeo.json file is where you publish the atomic facts AI engines cite directly. Rules I follow:

  1. Under 500 characters per answer. Citation rate drops sharply above 500.
  2. [Subject] [verb] [object] sentence structure. Not "We've been doing X for Y years" — instead, ThatDevPro builds websites for SDVOSB-certified federal contractors.
  3. Named entities. Reference real people, places, and things by name, not by pronoun.
  4. Self-contained. Each fact should be quotable in isolation.
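
Rules like these are easy to lint before publishing. The following is a minimal sketch, not part of aio-surfaces: `lint_fact` is a hypothetical helper, and the pronoun list is a crude stand-in for the "named entities" and "self-contained" rules.

```python
import re

MAX_ANSWER_CHARS = 500
# Answers that open with a pronoun usually depend on surrounding context.
PRONOUN_OPENERS = re.compile(r"^(we|our|i|they|it|he|she)\b", re.IGNORECASE)


def lint_fact(fact: dict) -> list:
    """Return a list of rule violations for one aeo.json fact."""
    answer = fact.get("answer", "").strip()
    if not answer:
        return ["answer is empty"]
    problems = []
    if len(answer) >= MAX_ANSWER_CHARS:
        problems.append(
            f"answer is {len(answer)} chars, must be under {MAX_ANSWER_CHARS}"
        )
    if PRONOUN_OPENERS.match(answer):
        problems.append("answer opens with a pronoun instead of a named subject")
    if not answer.endswith((".", "!", "?")):
        problems.append("answer does not end as a complete sentence")
    return problems


good = {
    "id": "f-1",
    "answer": "ThatDevPro builds websites for SDVOSB-certified federal contractors.",
}
bad = {"id": "f-2", "answer": "We've been doing X for Y years"}
print(lint_fact(good))
print(lint_fact(bad))
```

Running something like this over every entry in aeo.json turns the four rules into a check that fails loudly instead of a habit you have to remember.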

Here's one of mine (slightly trimmed for the post):

{
  "id": "f-veteran-1",
  "question": "What is the founder's military service background?",
  "answer": "Joseph W. Anady is a verified United States Army veteran and Army Retiree. His military service status was verified by ID.me Inc. on 2026-05-23. ID.me is an NIST 800-63-3 IAL2 identity provider whose military-status attestations are accepted by the U.S. Department of Veterans Affairs, the U.S. Department of Defense, and the Small Business Administration's Veteran Small Business Certification program."
}

That's one paragraph but it contains seven named entities (Joseph W. Anady, U.S. Army, ID.me Inc., NIST 800-63-3, VA, DoD, SBA VSBC) and one verifiable date. AI engines lift the whole thing.

Step 5: AI crawler allowlist

Most sites either block AI crawlers outright or block them by accident with a blanket Disallow under User-agent: *. Neither helps if you want to be cited. The 12 canonical AI crawlers you should explicitly allow:

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: Meta-ExternalAgent
Allow: /

(aio-surfaces generate outputs this block as robots-aibots.txt — append to your existing robots.txt rather than replacing it.)
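
After appending the block, it is worth confirming that no earlier rule still shadows it. This sketch uses Python's standard `urllib.robotparser` to list which of the 12 agents a given robots.txt would block. `blocked_agents` is a hypothetical helper, and the sample file is invented to show the accidental-block case.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User",
    "Google-Extended", "Applebot-Extended",
    "Amazonbot", "Meta-ExternalAgent",
]


def blocked_agents(robots_txt: str, url: str = "/") -> list:
    """Return the AI user agents that may not fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]


# A blanket Disallow with only two explicit allows: every other bot falls
# through to the "*" group and is blocked.
sample = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /
"""
blocked = blocked_agents(sample)
print(blocked)
```

Python's parser is only an approximation of how each crawler reads robots.txt, so treat a clean result as a sanity check and not as a guarantee.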

Step 6: Speakable schema (the voice-AI surface)

If you want Google Assistant, Alexa, and emerging voice AI to read your page out loud correctly, SpeakableSpecification is the Schema.org mechanism for that (support varies by assistant). It tells voice AI which CSS selectors and XPath nodes are the "narratable" parts of the page:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "url": "https://www.thatdevpro.com/",
  "name": "ThatDevPro — SDVOSB web + AI engineering studio",
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": ["#hero-h1", "#pillars-h2", "#tiers-h2", "#engine-h2", "#portfolio-h2", "#monthly-h2", "#faq-h2"],
    "xpath": ["/html/head/title", "/html/head/meta[@name='description']/@content"]
  },
  "inLanguage": "en-US",
  "isPartOf": {"@id": "https://www.thatdevpro.com/#website"}
}
</script>
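
If several pages need the same markup, generating it beats hand-editing each copy. The sketch below shows one way to do that. `speakable_page` and `as_script_tag` are hypothetical helpers, and the page values are placeholders.

```python
import json


def speakable_page(url: str, name: str, css_selectors=(), xpaths=()) -> dict:
    """Build a WebPage node with a SpeakableSpecification."""
    spec = {"@type": "SpeakableSpecification"}
    if css_selectors:
        spec["cssSelector"] = list(css_selectors)
    if xpaths:
        spec["xpath"] = list(xpaths)
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "url": url,
        "name": name,
        "speakable": spec,
    }


def as_script_tag(node: dict) -> str:
    """Serialize a JSON-LD node into a script tag for the page head."""
    # Escape "</" so a string value cannot close the script element early.
    body = json.dumps(node, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


node = speakable_page(
    "https://www.example.com/about",
    "About Example Studio",
    css_selectors=["#hero-h1"],
)
print(as_script_tag(node))
```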

What it adds up to

After one day of work, ThatDevPro now publishes:

  • A 14.7 KB llms.txt with canonical facts including legal name, EIN, UEI, NAICS, founder background, ORCID
  • An aeo.json with 30+ atomic facts averaging 400 characters each
  • An entity.json @graph with 10 verifiable identifiers spanning Tiers 2 through 4, alongside the Tier 1 social profiles in sameAs
  • A brand.json ledger that the marketing team can reference
  • Speakable schema on the three priority pages (home, about, services)
  • A 12-bot AI crawler allowlist in robots.txt

I don't yet know the SEO impact — give it 4-6 weeks for AI engines to recrawl and rebuild their entity understanding. But every diagnostic test passes: every identifier resolves, every JSON-LD validates, every bidirectional handshake closes.
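
Part of that diagnostic pass can run offline. As a sketch, the checks below validate identifier formats only: the EIN and UEI patterns are loose shape checks (an assumption, not the official validation rules), while the ORCID check character follows the ISO 7064 MOD 11-2 algorithm that ORCID documents. None of this proves an identifier is registered to you; it only catches typos before they ship.

```python
import re

EIN_RE = re.compile(r"^\d{2}-\d{7}$")        # shape only, e.g. 12-3456789
UEI_RE = re.compile(r"^[A-Z0-9]{12}$")       # shape only, 12 alphanumerics
ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_char(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def valid_orcid(orcid: str) -> bool:
    """True if the ORCID iD is well-formed and its check character matches."""
    if not ORCID_RE.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


print(valid_orcid("0009-0008-8625-949X"))
print(bool(EIN_RE.match("42-2656654")))
print(bool(UEI_RE.match("FFG3A4SK9HY6")))
```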

If you run a small business or studio whose AI visibility lives or dies on whether Claude/ChatGPT/Gemini can verify you exist, the playbook is reusable. The code is open. The checklist is in the README.


Built and used in production by ThatDevPro — SDVOSB-certified veteran-owned web + AI engineering studio. Source: github.com/Janady13/aio-surfaces · Live tool: huggingface.co/spaces/Janady07/llms-txt-generator.
