Author: Alex Isa (Webappski). This is the dev-tutorial cut of a longer piece on the Webappski blog — terminal-first, fewer words on the why.
If a buyer asks ChatGPT "best CDN providers 2026" and your product is not in the answer, you lose the sale before you ever see the lead. The only honest way to know whether that is happening is to ask the engines the questions your buyers ask and read the raw answers — not trust a single dashboard score.
Here is the loop we at Webappski run for a client, with the open-source tool aeo-platform (MIT, zero runtime deps).
1. Install and point it at the client's domain
npm install -g aeo-platform
cd client-audit && aeo-tracker init
init writes a .aeo-tracker.json. The fields that matter (the // comments are annotations for this post, since standard JSON has no comment syntax):
{
  "brand": "Northwind CDN", // illustrative, fictional brand
  "domain": "northwind.example", // registrable domain: subdomains count, spoof hosts don't
  "engines": ["openai", "gemini", "anthropic"], // ChatGPT, Gemini, Claude
  "queries": [
    "best CDN providers 2026",
    "best low-latency video streaming CDN 2026",
    "alternatives to the market-leading CDN 2026"
  ]
}
The questions ARE the audit. A basket of vanity phrases produces a flattering, useless number; a basket of the buyer's real decision questions produces a number that predicts revenue. Freeze it, so next month's run is comparable.
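The note on `domain` in the config (subdomains count, spoof hosts don't) can be sketched as a host check. This is a minimal illustration in Python, not aeo-platform's actual code: the function name `cites_domain` and the matching rule (exact host, or a host ending in `.` plus the domain) are assumptions made for this example.

```python
from urllib.parse import urlsplit


def cites_domain(url: str, domain: str) -> bool:
    """Return True if the URL's host is the domain itself or one of its subdomains.

    Hypothetical helper for illustration; the real tool's rule may differ.
    """
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    # Exact match, or a dot-separated suffix match. A plain substring test
    # would wrongly accept hosts like "northwind.example.evil.test".
    return host == domain or host.endswith("." + domain)


print(cites_domain("https://docs.northwind.example/guide", "northwind.example"))
print(cites_domain("https://northwind.example.evil.test/", "northwind.example"))
```

The first call matches a subdomain; the second is a spoof host that merely contains the domain as a prefix, so it is rejected. URLs without a scheme have no parsed hostname here and are treated as non-matches.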
2. Run it — sampled, not one noisy shot
AI answers are non-deterministic: ask the same question twice and you can get a different list. A single pass turns that noise into a fake-precise number. So run each cell several times and let the score carry a confidence interval instead of pretending one shot is the truth:
# plain single-shot run
aeo-tracker run
# sample each cell N times — the score comes back with a Wilson confidence interval
aeo-tracker run --samples=5
With --samples=5, every (query × engine) cell is queried five times; the headline presence rate is then reported as a Wilson interval, and small samples are flagged as small rather than sold as certainty. Cost scales linearly with the sample count (for the config above, 3 queries × 3 engines × 5 samples = 45 API calls), and the CLI tells you the new call count before it spends anything.
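For readers who want to see what a Wilson interval does to a small sample, here is a minimal Python sketch of the standard Wilson score interval for a proportion. It is not taken from aeo-platform's source; the function names, the 95% default (z = 1.96), and the call-count helper are assumptions for illustration.

```python
import math


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for hits out of n trials (z = 1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= hits <= n:
        raise ValueError("hits must be between 0 and n")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point drift at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


def planned_calls(queries: int, engines: int, samples: int) -> int:
    """API calls for one run: every (query, engine) cell is asked `samples` times."""
    return queries * engines * samples


low, high = wilson_interval(0, 5)
print(f"0 of 5 hits: {low:.2f} to {high:.2f}")
print("calls:", planned_calls(3, 3, 5))
```

With 0 hits in 5 samples the upper bound still sits at roughly 43%, which is why a 0% cell from a handful of samples is better read as a hypothesis than as a verdict.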
3. Read the report — every number is click-to-reveal
aeo-tracker report
report writes a single self-contained HTML file (and a markdown twin) into aeo-reports/<date>/ and opens the HTML in your browser. The point of the report is not the headline number — it is that every cell in the matrix is click-to-reveal: open it and you read the engine's verbatim answer for that question. Every number has a receipt; you see exactly how each engine answered, in its own words, behind the score.
Here is an illustrative fragment showing the shape of the report (CDN buyer basket).
Illustrative example — synthetic data for a fictional brand (Northwind CDN, northwind.example), not a real run. Shown only to demonstrate the report's shape.
| Report element | What the report shows |
|---|---|
| Headline fact | Named in 5 of 18 answers (28% presence) — illustrative |
| A hit, clicked open | Gemini, "best low-latency video streaming CDN 2026": "Northwind CDN: a strong choice for ultra-low-latency real-time streaming, with a large global edge network..." — illustrative, not a real engine response |
| A second hit, clicked open | Claude, "best CDN providers 2026": "...providers worth shortlisting include Northwind CDN for its edge footprint..." — illustrative, not a real engine response |
| A miss, clicked open | ChatGPT, "alternatives to the market-leading CDN 2026": names CDN A, CDN B, and a hyperscaler CDN; the brand is absent, a gap visible in the raw text (illustrative) |
| Citation match | Gemini cited the brand's own domain (northwind.example) on "best CDN providers 2026" — counted because it is the registrable domain |
| Competitors (two-model verified) | CDN A, CDN B, CDN C, a hyperscaler CDN, a niche CDN (illustrative generic labels) |
| Hit-rate by intent | Split into core / adjacent / aspirational — so a low overall % reads in context: strong in your core, still reaching where it's aspirational |
| Disclaimer (header) | API surface via your keys (ChatGPT, Gemini, Claude) — a proxy, not the consumer apps; excludes Google AI Overviews / Copilot (no query API) and Perplexity (manual paste only, not part of the reproducible API run) |
In a real run, the 28% is not a number to take on faith — click it open and under it sits the exact Gemini sentence that names the brand and the exact ChatGPT answer that leaves it out. (The figures above are illustrative; on your own domain the receipts are your engines' real words.)
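The "Hit-rate by intent" row can be re-derived from saved answers with a few lines of Python. This is a sketch under stated assumptions: the record shape (`intent` and `answer` keys) and the naive case-insensitive substring match are hypothetical, chosen for illustration, and are not the tool's documented storage format or matching rule.

```python
from collections import defaultdict


def presence_by_intent(records: list[dict], brand: str) -> dict[str, tuple[int, int]]:
    """Count (hits, total) per intent bucket; a hit is an answer naming the brand."""
    needle = brand.casefold()
    tally: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for record in records:
        cell = tally[record["intent"]]
        cell[1] += 1  # one more answer in this bucket
        if needle in record["answer"].casefold():
            cell[0] += 1  # the verbatim answer names the brand
    return {intent: (hits, total) for intent, (hits, total) in tally.items()}


# Synthetic answers for the fictional brand, for illustration only.
saved = [
    {"intent": "core", "answer": "Shortlist: Northwind CDN, CDN A"},
    {"intent": "core", "answer": "CDN A and CDN B lead here"},
    {"intent": "aspirational", "answer": "Vendor list without the brand"},
]
print(presence_by_intent(saved, "Northwind CDN"))
```

Because the count is computed straight from the answer text, each bucket's number can be checked against the same answers the report reveals on click.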
Why this beats a closed score
- It states what it does not measure. API surface via your keys, not the consumer app; no AI Overviews / Copilot.
- The score is re-derivable. Open formula over the saved answers, with your own keys.
- Competitors are dual-model verified — only brands two models both named, so the list does not hallucinate rivals.
- A 0% is a hypothesis. Sample each cell N times, report a Wilson confidence interval; small samples are flagged as small.
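The dual-model rule for competitors amounts to a set intersection: keep a name only if both models produced it. A minimal Python sketch follows; the function name and the case-insensitive comparison are assumptions for illustration, and how aeo-platform normalizes brand names internally is not described in this post.

```python
def verified_competitors(model_a: list[str], model_b: list[str]) -> list[str]:
    """Keep names that appear in both models' lists, in model_a's order."""
    named_by_b = {name.casefold() for name in model_b}
    kept: list[str] = []
    seen: set[str] = set()
    for name in model_a:
        key = name.casefold()
        if key in named_by_b and key not in seen:
            seen.add(key)  # avoid duplicates if model_a repeats a name
            kept.append(name)
    return kept


# Illustrative generic labels; "Phantom CDN" stands in for a one-model hallucination.
print(verified_competitors(["CDN A", "CDN B", "Phantom CDN"], ["cdn b", "CDN A", "CDN C"]))
```

A name that only one model produced, such as the stand-in "Phantom CDN", never reaches the competitor list.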
One honest expectation: an audit is a readout, not a lever. Engines re-crawl on their own schedule, so a change you ship today usually shows up two to four weeks later.
Full version, the buyer walkthrough, and the engineering behind why the number is honest: on the Webappski blog. If you want it run on your site, Webappski does a free AEO audit.