DEV Community

Watson Foglift

Every CLI Command in Our Own Blog Post Was Fabricated. Here's How We Caught Them.

Last week I ran every shell command from one of our own blog posts against our CLI. Three of them didn't exist. One referenced an npm package that had never been published. The JSON-LD FAQ schema on the page confidently told AI search engines how to install and authenticate with a subsystem that wasn't real.

The post had been live for weeks. Nobody had noticed, because nobody had actually tried to copy-paste the commands.

This is a short write-up of how we caught it, what was fake versus real, and why the fix had to include editing the structured-data schema — not just the rendered prose.

The post

The page in question is a ~3,500-word tutorial titled "API-First AI Monitoring" on our site. It was written by an AI content agent, like most of our programmatic content, and had been lightly reviewed by a human (me) before shipping.

The tutorial had a section called "CLI and MCP Connector for Developer Workflows." Under it were two code blocks:

  • A Quick Start block showing npx foglift auth, npx foglift geo https://example.com, npx foglift audit --depth=deep --wait.
  • A CI/CD Integration block showing how to wire the same CLI into a GitHub Actions job.

Below that was a subsection titled MCP Connector, describing how to install @foglift/mcp-server from npm and register it with Claude Desktop.

All of this reads plausibly. All of it is wrong.

What was actually broken

Our real CLI is published on npm as foglift-scan, not foglift. The binary is foglift. The subcommands are scan, scan batch, scan ai-check, scan results, scan sentiment, scan usage, scan history, scan prompts, scan models — structured as foglift scan <subcommand>, never as bare foglift <subcommand>.

Concretely:

  • Post claimed: npx foglift auth → Reality: no such subcommand. Auth is via the FOGLIFT_API_KEY env var.
  • Post claimed: npx foglift geo <url> → Reality: no such subcommand. The actual scan command is foglift scan <url>.
  • Post claimed: npx foglift audit --depth=deep --wait → Reality: no audit subcommand, and no --depth or --wait flags exist anywhere in the CLI.
  • Post claimed: npm install -g @foglift/mcp-server → Reality: the package does not exist on npm. 404.
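This class of fabrication is easy to screen for mechanically once you know the real command shape. Below is a minimal sketch of such a check; the subcommand list comes straight from this post's description of the real CLI, but the helper itself is illustrative, not part of our actual tooling.

```python
# Subcommands of the real CLI, per this post. The real shape is always
# `foglift scan <url>` or `foglift scan <subcommand>`, never bare
# `foglift <subcommand>`.
REAL_SUBCOMMANDS = {"batch", "ai-check", "results", "sentiment",
                    "usage", "history", "prompts", "models"}

def is_real_shape(command: str) -> bool:
    """True if a claimed invocation matches the real command shape."""
    parts = command.split()
    if parts and parts[0] == "npx":
        parts = parts[1:]
    # Bare `foglift <anything-but-scan>` never existed.
    if len(parts) < 3 or parts[0] != "foglift" or parts[1] != "scan":
        return False
    return parts[2].startswith("http") or parts[2] in REAL_SUBCOMMANDS

for claimed in ("npx foglift auth",
                "npx foglift audit --depth=deep --wait",
                "foglift scan https://example.com"):
    print(claimed, "->", "real shape" if is_real_shape(claimed) else "fabricated")
```

A shape check like this won't prove a command works, but it would have flagged every fabrication in the table above before publication.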

The fabrications share a pattern: they're plausible commands that a tool like ours would have, written in the idiom of well-known CLIs (gh auth, vercel deploy --prod, stripe listen --forward-to). They read like the AI had seen a lot of CLI documentation in training and was reasoning from prior structure rather than from the actual --help output.

How we caught it

Our dogfooding protocol says: every session, actually run the CLI against foglift.io and check whether the in-product recommendations line up with our own content. One week the agent running that protocol decided to also copy-paste the commands from the tutorial into a terminal, just to verify they worked.

The very first command — npx foglift auth — exited with unknown command. Five minutes later we had a list of six fabrications on that one page.

Copy-paste-and-run is the cheapest test I know of for AI-generated technical content. It catches the class of errors that linters, type checkers, grammar tools, and even human reviewers routinely miss: the commands read correctly, so unless you execute them, you don't see the problem.

This is the same class of failure documented in ACL 2023 research on hallucination in code generation: studies of Codex/GPT package-name completion found that models fabricate module references at rates between 5% and 22% depending on domain specificity, and that the fabrications are structurally indistinguishable from real references without external grounding.

For a tool whose entire pitch is "make your content trustworthy to AI search engines," having fake CLI commands on our own pages is a brand-damage problem, not just a correctness problem. We shipped the fix the same day.

The part most people miss: the schema has to be fixed too

Modern technical blog posts are not just prose. They ship with embedded JSON-LD — specifically FAQPage, HowTo, and Article schemas — that AI search engines ingest directly and re-serve as answers.

Our tutorial's FAQPage schema contained a question like "Does Foglift have a CLI?" with an acceptedAnswer.text that said:

"Yes. Install the Foglift CLI with npm install -g @foglift/cli. Authenticate with foglift auth. Run foglift audit <url> to audit a site..."

If we had fixed the rendered HTML but left the JSON-LD alone, the AI-search-facing copy would still have been poisoned. That's the surface the engines actually eat. AI crawlers (PerplexityBot, OAI-SearchBot, GPTBot, ClaudeBot, Google-Extended) parse structured data preferentially over rendered body text for entity grounding; there's public documentation from Google's structured-data team and multiple citation-pattern studies from Otterly.AI's 2025 analysis confirming this behavior.
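For concreteness, here is a minimal sketch of reading that surface the way a crawler does: extract the application/ld+json script blocks and parse them with the standard library. The HTML snippet is a hypothetical stand-in containing the fabricated answer.

```python
import json
import re

# Hypothetical page fragment carrying the poisoned FAQPage schema.
html = """
<script type="application/ld+json">
{"@type": "FAQPage", "mainEntity": [{"@type": "Question",
  "name": "Does Foglift have a CLI?",
  "acceptedAnswer": {"@type": "Answer",
    "text": "Install with npm install -g @foglift/cli."}}]}
</script>
"""

# Find every JSON-LD block and parse its body, as an ingesting crawler would.
blocks = re.findall(
    r'<script type="application/ld\+json">\s*(.*?)\s*</script>',
    html, re.DOTALL)
for block in blocks:
    data = json.loads(block)
    for q in data.get("mainEntity", []):
        print(q["name"], "->", q["acceptedAnswer"]["text"])
```

The point: the structured data is machine-readable by design, which also makes it machine-auditable. There's no excuse for checking only the rendered prose.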

So the fix was three coordinated edits:

  1. Rewrote the two code blocks against foglift-scan with real subcommands (foglift scan <url>, foglift scan ai-check --prompt "..." --domain foglift.io, etc.).
  2. Removed the "MCP Connector" subsection and the heading "CLI and MCP Connector for Developer Workflows" since no MCP server ships.
  3. Updated the acceptedAnswer.text in the FAQPage JSON-LD so both the rendered FAQ and the structured-data mirror tell the same true story.

Then we deployed, waited for the CDN cache to refresh, and re-grepped the live HTML for any residual fabrications. Zero matches across @foglift|foglift auth|foglift geo|foglift audit|mcp-server|--depth|--wait. Ten matches on the real names (foglift-scan, FOGLIFT_API_KEY, --threshold=80).
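The post-deploy check fits in a few lines. In this sketch the sample string stands in for the fetched page, and the pattern lists are abridged from the ones above.

```python
import re

# Fabricated strings that must have zero matches, and real names that
# must be present. Patterns abridged from the sets we actually grepped.
BANNED = r"@foglift|foglift auth|foglift geo|foglift audit|mcp-server|--depth|--wait"
EXPECTED = r"foglift-scan|FOGLIFT_API_KEY"

# Stand-in for the live HTML fetched after the CDN cache refreshed.
html = "Install with npm install -g foglift-scan and export FOGLIFT_API_KEY."

banned_hits = re.findall(BANNED, html)
expected_hits = re.findall(EXPECTED, html)
print("fabrications found:", banned_hits)
print("real names found:", len(expected_hits))
```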

Bonus: the copy-paste test also found two real CLI bugs

Going through the post command-by-command surfaced two bugs in the CLI itself, not the content.

  1. foglift scan prompts list and foglift scan prompts add fail with Error: workspace_id parameter required, even with a valid FOGLIFT_API_KEY exported. Other endpoints (scan results, scan sentiment, scan history) auto-resolve the workspace from the key; the prompts subcommand didn't. That was a server-side resolver gap, fixed in the next release cycle: the API now auto-resolves the workspace from the key, matching the other endpoints.
  2. foglift --version reports 1.0.0 while the npm package metadata says 1.0.1. Cosmetic, low priority — but the kind of thing that undermines trust in a "we're the honest-evidence source" pitch.

Neither bug would have been caught by unit tests because both subcommands exist and parse correctly; they just fail at runtime against real credentials. The copy-paste-and-run test is what surfaced them.

What I'd change going forward

Three things, in the order we're now adopting them:

1. Every technical tutorial has to pass copy-paste-execute before merge. Not "the build passes." Not "lint clean." Literally open a terminal, paste every command, and watch it succeed. This is now how I review PRs that touch pages with CLI instructions. The next automation on the list is a CI check that extracts fenced code blocks tagged bash and runs them in a throwaway environment where that's safe to do; I haven't built it yet.

2. JSON-LD has to pass the same truth test as the rendered HTML. The structured data is where AI engines form their beliefs about your product, and it's the easiest thing to forget when you're editing. Any content edit touching a HowTo, FAQPage, or Article block has to re-read the schema top-to-bottom. Better: move the structured data to be generated from the rendered content, so there's a single source of truth to audit.

3. Be suspicious of AI-generated content that sounds confident about specific strings. The commands in our tutorial were the most confident-sounding part of the post. Confidence plus specificity in AI-generated content is often a signal of fabrication, not accuracy — the model backfilled plausible-looking strings when it didn't actually have the facts. This pattern is reproducible enough that it shows up as a measurable signal in academic work (see Lin et al., "TruthfulQA", ACL 2022 and multiple follow-ups).
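Point 1's CI check doesn't exist yet, but here is roughly the shape it would take: extract every fenced bash block from a content file and run each line, collecting failures. The sample document is hypothetical, and in real CI this must run inside a throwaway container, since it executes whatever the content says.

```python
import re
import subprocess

FENCE = "`" * 3  # built programmatically to avoid literal backticks here

# Hypothetical content file with one good command and one fabrication.
doc = (
    "Some prose.\n\n"
    f"{FENCE}bash\n"
    "echo hello\n"
    "this-command-does-not-exist-xyz\n"
    f"{FENCE}\n"
)

failures = []
for block in re.findall(FENCE + r"bash\n(.*?)" + FENCE, doc, re.DOTALL):
    for line in block.strip().splitlines():
        # Run each command exactly as a reader would paste it.
        result = subprocess.run(line, shell=True, capture_output=True, text=True)
        print(f"[{'ok' if result.returncode == 0 else 'FAIL'}] {line}")
        if result.returncode != 0:
            failures.append(line)

if failures:
    print(f"{len(failures)} command(s) failed copy-paste-execute")
```

In CI the script would exit nonzero on any failure, blocking the merge the same way a failing test does.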
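Point 2's "generate the structured data from the rendered content" option can be sketched as: keep the Q/A pairs in one data structure and render both surfaces from it, so they can't drift apart. The helper names are hypothetical; the answer text is assembled from the real commands named in this post.

```python
import json

# Single source of truth for the FAQ content.
FAQS = [
    ("Does Foglift have a CLI?",
     "Yes. Install it with npm install -g foglift-scan, set FOGLIFT_API_KEY, "
     "and run foglift scan <url> against your site."),
]

def render_html(faqs):
    # The human-facing FAQ section.
    return "\n".join(f"<h3>{q}</h3>\n<p>{a}</p>" for q, a in faqs)

def render_jsonld(faqs):
    # The AI-search-facing FAQPage mirror, built from the same data.
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {"@type": "Question", "name": q,
             "acceptedAnswer": {"@type": "Answer", "text": a}}
            for q, a in faqs
        ],
    }, indent=2)

print(render_html(FAQS))
print(render_jsonld(FAQS))
```

With this layout, the copy-paste-execute check only needs to run against the source data once, and both surfaces inherit the fix.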

Closing

If you run AI-generated technical content on a site — and you probably do, because everyone does now — you almost certainly have fabricated commands, fake package names, or hallucinated API parameters somewhere in your content. The rendered prose will look fine. The schema will look fine. The build will pass. The readers who notice will mostly just close the tab instead of filing a bug.

The fix is boring: copy-paste the commands, run them, fix what breaks, and check the JSON-LD. But nobody is doing this, and AI search engines are ingesting the fabrications in the meantime.

We fixed ours in an afternoon. There's a good chance you can fix yours in one too.


We run Foglift — a GEO/AEO platform that audits sites for how AI search engines will interpret them. The CLI is npm install -g foglift-scan and runs foglift scan <url> against your site. (We checked.)
