How to check whether AI recommends your site — the honest AEO audit I run for clients

#ai #seo #webdev #devtools

Author: Alex Isa (Webappski). This is the dev-tutorial cut of a longer piece on the Webappski blog — terminal-first, fewer words on the why.

If a buyer asks ChatGPT "best CDN providers 2026" and your product is not in the answer, you lose the sale before you ever see the lead. The only honest way to know whether that is happening is to ask the engines the questions your buyers ask and read the raw answers — not trust a single dashboard score.

Here is the loop we at Webappski run for a client, using the open-source aeo-platform package (MIT, zero runtime deps), which provides the aeo-tracker command used below.

1. Install and point it at the client's domain

npm install -g aeo-platform
cd client-audit && aeo-tracker init

init writes a .aeo-tracker.json. The fields that matter:

{
  "brand": "Northwind CDN",         // illustrative, fictional brand
  "domain": "northwind.example",    // registrable domain — subdomains count, spoof hosts don't
  "engines": ["openai", "gemini", "anthropic"],   // ChatGPT, Gemini, Claude
  "queries": [
    "best CDN providers 2026",
    "best low-latency video streaming CDN 2026",
    "alternatives to the market-leading CDN 2026"
  ]
}

The questions ARE the audit. A basket of vanity phrases produces a flattering, useless number; a basket of the buyer's real decision questions produces a number that predicts revenue. Freeze the basket once it is chosen, so next month's run is comparable.
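One way to make "frozen" checkable is to fingerprint the basket and compare the fingerprint before each run. The sketch below is an assumption for illustration, not something aeo-tracker is known to do: basket_fingerprint is a hypothetical helper, and the extra query at the end is made up to show drift.

```python
import hashlib
import json


def basket_fingerprint(queries: list[str]) -> str:
    """Return a short, stable hash of the query basket.

    Queries are lower-cased, whitespace-collapsed, de-duplicated and sorted,
    so cosmetic edits and reordering do not change the fingerprint, while
    adding, dropping or rewording a query does.
    """
    normalized = sorted({" ".join(q.lower().split()) for q in queries})
    payload = json.dumps(normalized, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


queries = [
    "best CDN providers 2026",
    "best low-latency video streaming CDN 2026",
    "alternatives to the market-leading CDN 2026",
]

frozen = basket_fingerprint(queries)  # record this alongside the first run
print(frozen)

changed = queries + ["cheapest CDN 2026"]  # hypothetical extra query
print(basket_fingerprint(changed) == frozen)  # False: the basket drifted
```

If the fingerprint differs from the one stored with last month's run, the two scores answer different questions and should not be compared directly.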

2. Run it — sampled, not one noisy shot

AI answers are non-deterministic: ask the same question twice and you can get a different list. A single pass turns that noise into a fake-precise number. So run each cell several times and let the score carry a confidence interval instead of pretending one shot is the truth:

# plain single-shot run
aeo-tracker run

# sample each cell N times — the score comes back with a Wilson confidence interval
aeo-tracker run --samples=5

With --samples=5, every (query × engine) cell is queried five times; the headline presence rate is then reported as a Wilson interval, and small samples are flagged as small rather than sold as certainty. Cost scales with the multiplier: the config above has 3 × 3 = 9 cells, so five samples means 45 engine queries. The CLI tells you the new call count before it spends anything.
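For a feel of what a Wilson interval does to a small sample, here is a minimal sketch of the standard Wilson score formula. It assumes each answer is an independent yes/no trial and uses z = 1.96 (roughly 95%); the article does not say which z the tool uses or how it pools samples, so treat wilson_interval as an illustration rather than the tool's code.

```python
import math


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion of `hits` out of `n` trials."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= hits <= n:
        raise ValueError("hits must be between 0 and n")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


for hits, n in [(2, 5), (0, 5), (5, 18)]:
    low, high = wilson_interval(hits, n)
    print(f"{hits} of {n}: {low:.1%} to {high:.1%}")
```

Two hits out of five looks like "40%", but the interval runs from roughly 12% to 77%. Zero hits out of five still leaves an upper bound of about 43%, which is why a single empty cell says very little on its own.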

3. Read the report — every number is click-to-reveal

aeo-tracker report

report writes a single self-contained HTML file (and a markdown twin) into aeo-reports/<date>/ and opens the HTML in your browser. The point of the report is not the headline number but the matrix behind it: every cell is click-to-reveal, so opening one shows the engine's verbatim answer to that question. Every number has a receipt.

Here is an illustrative fragment showing the shape of the report (CDN buyer basket).

Illustrative example — synthetic data for a fictional brand (Northwind CDN, northwind.example), not a real run. Shown only to demonstrate the report's shape.

Report element	What the report shows
Headline fact	Named in 5 of 18 answers (28% presence) — illustrative
A hit, clicked open	Gemini, "best low-latency video streaming CDN 2026": "Northwind CDN: a strong choice for ultra-low-latency real-time streaming, with a large global edge network..." — illustrative, not a real engine response
A second hit, clicked open	Claude, "best CDN providers 2026": "...providers worth shortlisting include Northwind CDN for its edge footprint..." — illustrative, not a real engine response
A miss, clicked open	ChatGPT, "alternatives to the market-leading CDN 2026": names Vendor A, Vendor B, and a hyperscaler CDN; the brand is absent, a gap visible in the raw text (illustrative)
Citation match	Gemini cited the brand's own domain (northwind.example) on "best CDN providers 2026" — counted because it is the registrable domain
Competitors (two-model verified)	CDN A, CDN B, CDN C, a hyperscaler CDN, a niche CDN (illustrative generic labels)
Hit-rate by intent	Split into core / adjacent / aspirational — so a low overall % reads in context: strong in your core, still reaching where it's aspirational
Disclaimer (header)	API surface via your keys (ChatGPT, Gemini, Claude) — a proxy, not the consumer apps; excludes Google AI Overviews / Copilot (no query API) and Perplexity (manual paste only, not part of the reproducible API run)

In a real run, the 28% is not a number to take on faith — click it open and under it sits the exact Gemini sentence that names the brand and the exact ChatGPT answer that leaves it out. (The figures above are illustrative; on your own domain the receipts are your engines' real words.)
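The "Citation match" row depends on what counts as the brand's own domain. A minimal sketch of that check, assuming the configured domain is already the registrable domain: cites_domain is a hypothetical helper, the hosts other than northwind.example are made up, and this is not the tool's actual matcher.

```python
from urllib.parse import urlsplit


def cites_domain(url: str, domain: str) -> bool:
    """True if the URL's host is `domain` itself or one of its subdomains.

    URLs without a scheme have no parsed hostname and are treated as a miss.
    """
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


brand = "northwind.example"
for url in [
    "https://northwind.example/pricing",        # the domain itself
    "https://docs.northwind.example/edge",      # subdomain, counts
    "https://northwind.example.evil.test/",     # lookalike host, does not count
    "https://notnorthwind.example/",            # shared suffix only, does not count
    "https://northwind.example@evil.test/",     # userinfo trick, real host is evil.test
]:
    print(url, cites_domain(url, brand))
```

Comparing parsed hostnames on a dot boundary, rather than searching the URL string for the brand, is what keeps lookalike hosts out of the count. Deriving the registrable domain from an arbitrary host would additionally need the Public Suffix List, which this sketch skips.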

Why this beats a closed score

It states what it does not measure. API surface via your keys, not the consumer app; no AI Overviews / Copilot.
The score is re-derivable. The formula is open and runs over the saved answers, gathered with your own keys.
Competitors are dual-model verified — only brands two models both named, so the list does not hallucinate rivals.
A 0% is a hypothesis, not a verdict. Each cell is sampled N times and reported with a Wilson confidence interval; small samples are flagged as small.
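The dual-model idea boils down to an intersection: a rival stays on the list only if two independent extractions both named it. The sketch below assumes each model returns a plain list of brand names and that matching is case-insensitive; verified_competitors and "Phantom CDN" are hypothetical, and the tool's real matching rules may differ.

```python
def verified_competitors(named_by_a: list[str], named_by_b: list[str]) -> list[str]:
    """Keep only brands that both models named, compared case-insensitively.

    The order and spelling of the first list are preserved.
    """
    def key(name: str) -> str:
        return " ".join(name.lower().split())

    seen_by_b = {key(name) for name in named_by_b}
    kept: list[str] = []
    emitted: set[str] = set()
    for name in named_by_a:
        k = key(name)
        if k in seen_by_b and k not in emitted:
            kept.append(name)
            emitted.add(k)
    return kept


model_a = ["CDN A", "CDN B", "CDN C", "Phantom CDN"]  # "Phantom CDN": one model only
model_b = ["cdn b", "CDN A", "CDN C", "a niche CDN"]
print(verified_competitors(model_a, model_b))  # ['CDN A', 'CDN B', 'CDN C']
```

The trade-off is recall for precision: a real rival that only one model mentions is dropped, but so is a name that one model invented.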

One honest expectation: an audit is a readout, not a lever. Engines re-crawl on their own schedule, so a change you ship today usually shows up two to four weeks later.

Full version, the buyer walkthrough, and the engineering behind why the number is honest: on the Webappski blog. If you want it run on your site, Webappski does a free AEO audit.
