Rob

Posted on • Originally published at vibescoder.dev

Your AI Strategy Has a Blind Spot: An SEO and AEO Audit of vibescoder.dev

I spend a lot of time thinking about how AI agents discover and consume content. I run a company that builds developer tools. I write a blog about building with AI agents. And most importantly, I'm married to a woman who runs an AI consulting practice. Through the home-office wall, I've heard her warn many a client that if they're a Cloudflare customer, they may have a silent suppressor in their content strategy. She recommends a site audit.

And she was right. Until this morning, every major AI crawler was blocked from reading my site.

Not by choice. Not by misconfiguration. By a Cloudflare setting I'd already turned off — that got silently re-enabled by a different setting I didn't know existed.

If you're a content creator, marketer, or engineer who cares about whether ChatGPT, Perplexity, Google AI Overviews, or Claude can find your work — read this. The infrastructure between your content and your audience may be working against you.

The TL;DR for Non-Technical Readers

If you don't want to read the whole audit, here's what matters:

  1. Cloudflare's free tier blocks AI search engines by default. If your site uses Cloudflare (and millions do), your content may be invisible to ChatGPT, Perplexity, Claude, and Google's AI features — even if you never asked for that.

  2. There are now two categories of discoverability. Traditional SEO (Google search results) and AEO — Answer Engine Optimization (AI-powered search and assistants). You need both. They require different things.

  3. The fix for Cloudflare takes 60 seconds — but you have to know it exists. Go to Security → Settings → "Manage your robots.txt" and switch from "Instruct AI bots to not scrape content" to either "Content Signals Policy" or "Disable robots.txt configuration."

  4. There's a new file called llms.txt that's becoming the robots.txt for AI. It tells AI agents what your site is, what it covers, and where to find content. If you don't have one, you're leaving discoverability on the table.

The TL;DR for Technical Readers

We ran a full SEO + AEO audit against vibescoder.dev and found 20 issues across 4 severity levels. The highlights:

  • 4 P0 (critical): Cloudflare's managed robots.txt was blocking GPTBot, ClaudeBot, Google-Extended, and 5 others. RSS feed had wrong URL prefix (15 broken links). Sitemap.xml was referenced but returned 404. Duplicate User-agent: * blocks in robots.txt.
  • 6 P1 (high): No JSON-LD structured data. No llms.txt. No canonical URLs. No heading anchor IDs. Missing article:author/tag meta. Homepage force-dynamic.
  • Everything was fixed in a single session — 17 files changed, 428 insertions, pushed and deployed.

The commit: SEO/AEO overhaul.

The Audit

I asked my Coder agent to evaluate vibescoder.dev on two dimensions: traditional search engine optimization (SEO) and Answer Engine Optimization (AEO) — making the site discoverable and citable by AI agents like ChatGPT Search, Perplexity, Google AI Overviews, and Claude.

The agent cloned the engine repo, crawled the live site, inspected every response header, parsed every meta tag, and cross-referenced the codebase against both SEO and AEO best practices.

The results were humbling.

The Cloudflare Gotcha (Yes, Again)

I wrote about Cloudflare's AI crawler settings two weeks ago. In that post, I specifically called out that Cloudflare's free tier has "Block AI bots" and "AI Labyrinth" turned on by default. I explicitly turned both off. I even wrote this:

"If your site exists for thought leadership, you want AI services to find, index, and cite your content. Blocking AI crawlers is blocking your distribution channel."

I was right. And I was still blocked.

The problem: Cloudflare has a separate setting called "Manage your robots.txt" under Security → Settings. It's not the same as "Block AI bots." It's a newer feature that injects directives directly into your robots.txt file at the edge — after your origin server responds.

Here's what the agent found when it compared my repo's robots.txt (100 bytes, 7 lines) to what Cloudflare was actually serving:

| Metric | Value |
| --- | --- |
| My robots.txt (origin) | 100 bytes, 7 lines |
| Content-Length header | 100 (Vercel's original) |
| Actual response body | 1,838 bytes, ~65 lines |

Cloudflare was prepending 1,738 bytes of content — including Disallow: / rules for ClaudeBot, GPTBot, Google-Extended, Amazonbot, CCBot, Bytespider, and meta-externalagent — without updating the Content-Length header. The setting responsible? "Instruct AI bots to not scrape content," which was selected by default.
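If you want to run the same check on your own domain, fetch the robots.txt you actually serve over HTTP (not the file in your repo) and look for user-agents that are fully disallowed. A minimal TypeScript sketch of that check (the parser is simplified, and the sample block below is illustrative):

```typescript
import { deepStrictEqual } from "node:assert";

// Simplified robots.txt scan: list user-agents that get "Disallow: /".
// Feed it the body you actually receive over HTTP, since a CDN may
// rewrite the file at the edge.
function fullyBlockedAgents(robotsTxt: string): string[] {
  const blocked: string[] = [];
  let agents: string[] = [];
  let inRules = false;
  for (const raw of robotsTxt.split("\n")) {
    const line = raw.split("#")[0].trim(); // strip comments
    if (!line) continue;
    const idx = line.indexOf(":");
    if (idx < 0) continue;
    const key = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (key === "user-agent") {
      // A new group starts once a run of rules has ended.
      if (inRules) {
        agents = [];
        inRules = false;
      }
      agents.push(value);
    } else if (key === "disallow" || key === "allow") {
      inRules = true;
      if (key === "disallow" && value === "/") {
        for (const a of agents) {
          if (!blocked.includes(a)) blocked.push(a);
        }
      }
    }
  }
  return blocked;
}

// Illustrative sample resembling an injected block:
const sample = [
  "User-agent: GPTBot",
  "User-agent: ClaudeBot",
  "Disallow: /",
  "",
  "User-agent: *",
  "Allow: /",
].join("\n");

console.log(fullyBlockedAgents(sample)); // logs GPTBot and ClaudeBot
```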

The fix: Security → Settings → "Manage your robots.txt" → select "Disable robots.txt configuration." This tells Cloudflare to stop modifying your robots.txt entirely. Your origin file gets served as-is.

Why "Disable" and not "Content Signals Policy"? The Content Signals option keeps a Content-Signal: ai-train=no directive, which tells AI crawlers not to use your content for model training. That sounds reasonable — but for a personal blog trying to maximize reach, being in the training corpus means AI models are more likely to know about you and reference your ideas. The risk it protects against (content absorbed without credit) is theoretical. The cost (reduced presence in AI systems) is concrete.

Gotcha #1: Cloudflare has three separate AI-related settings, and changing one doesn't affect the others. You need to check all three:

| Setting | Location | What It Does |
| --- | --- | --- |
| Block AI Bots | Security → Settings | Deploys firewall rules blocking AI training crawlers |
| AI Labyrinth | Security → Settings | Injects fake content links to trap non-compliant bots |
| Manage your robots.txt | Security → Settings | Modifies robots.txt at the edge to add AI crawler directives |

I had turned off #1 and #2 weeks ago. But #3 was still on — silently rewriting my robots.txt at the CDN layer.

Here's the full picture (screenshots omitted here): the Security Overview flagging the AI-related action items, and each of the three settings.

What Is AEO?

AEO — Answer Engine Optimization — is the practice of making your content discoverable and citable by AI agents. (You'll also see it referred to as AI Engine Optimization or Agentic Engine Optimization — the discipline is new enough that the name is still settling.) It's the emerging counterpart to SEO. Where SEO focuses on Google's traditional index, AEO targets the systems that power ChatGPT Search, Perplexity, Google AI Overviews, Claude, and whatever comes next.

The key differences:

| | SEO | AEO |
| --- | --- | --- |
| Primary consumer | Googlebot | GPTBot, ClaudeBot, PerplexityBot, Google-Extended |
| Content format | HTML with meta tags | Structured data (JSON-LD), plain text (llms.txt), RSS |
| Discovery mechanism | Sitemap, backlinks, crawling | Sitemap, RSS, llms.txt, structured data |
| Ranking signal | PageRank, content quality, Core Web Vitals | Authorship (Person schema + sameAs), recency, structured data |
| Citation style | Blue link with snippet | Inline citation with direct quote and link |
| Key enabler | Canonical URLs, meta descriptions | JSON-LD, llms.txt, heading anchors for deep linking |

You need both. Many of the improvements help both. But some are AEO-specific.

AEO-Specific Changes

These improvements specifically target AI agent discoverability:

llms.txt and llms-full.txt

llms.txt is an emerging convention — think of it as robots.txt for AI comprehension rather than crawling. It tells AI agents what your site is, what topics it covers, and where to find content.

We created two files:

  • /llms.txt — a structured summary: site description, author, topics, key posts, and links
  • /llms-full.txt — a dynamic route that serves every published post's full content as plain text

The full-content version is the important one. When an AI agent wants to cite your work, it needs the actual content — not just metadata. llms-full.txt is a single endpoint that gives it everything.
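For a sense of the shape, here is a minimal llms.txt in the emerging llmstxt.org style (the entries and post slug below are illustrative, not the actual file):

```markdown
# vibescoder.dev

> A blog about building software with AI coding agents, by Rob Whiteley.

## Posts

- [Your AI Strategy Has a Blind Spot](https://vibescoder.dev/posts/seo-aeo-audit): an SEO and AEO audit of this site
```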

Person Schema with sameAs

JSON-LD structured data tells AI engines who wrote something and where else that person exists online. The sameAs property connects identity across platforms:

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Rob Whiteley",
  "url": "https://vibescoder.dev/about",
  "jobTitle": "CEO",
  "sameAs": [
    "https://www.linkedin.com/in/rwhiteley",
    "https://github.com/carryologist",
    "https://x.com/rwhiteley0"
  ],
  "worksFor": {
    "@type": "Organization",
    "name": "Coder",
    "url": "https://coder.com"
  }
}
```

When ChatGPT or Perplexity decides whether to cite "Rob Whiteley, CEO of Coder" in a response about AI-assisted development, this structured data is what gives it confidence in the attribution.

Full-Content RSS

The existing RSS feed only had <description> (a short excerpt). AI agents that consume RSS — and Perplexity in particular indexes it — get significantly more context from full-content feeds. We added <content:encoded> with the full post body, plus <author> and <managingEditor> tags.
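A full-content item looks roughly like this (URLs and email are illustrative; note that content:encoded requires declaring the xmlns:content namespace on the rss element):

```xml
<item>
  <title>Your AI Strategy Has a Blind Spot</title>
  <link>https://vibescoder.dev/posts/seo-aeo-audit</link>
  <author>rob@vibescoder.dev (Rob Whiteley)</author>
  <description>Short excerpt for feed readers…</description>
  <content:encoded><![CDATA[<p>The full post body as HTML…</p>]]></content:encoded>
</item>
```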

Unblocking AI Crawlers

This is the Cloudflare fix described above — the single highest-impact AEO change, going from completely invisible to fully accessible.

SEO-Specific Changes

These target traditional Google search:

Sitemap.xml

robots.txt referenced it. It didn't exist. Every SEO tool and Google Search Console would flag this. We created src/app/sitemap.ts with dynamic generation — all posts, tags, and static pages with lastmod dates from the changelog.
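The route itself is just a function returning entries. A framework-agnostic sketch of the generation logic (the Post shape and slug are illustrative; the real file returns Next.js's MetadataRoute.Sitemap type):

```typescript
import { deepStrictEqual } from "node:assert";

// Build sitemap entries from post metadata (sketch).
type Post = { slug: string; lastModified: string };

function sitemapEntries(base: string, posts: Post[]) {
  // Homepage entry uses the newest post's date as a reasonable lastmod.
  const home = { url: `${base}/`, lastModified: posts[0]?.lastModified ?? "" };
  const postEntries = posts.map((p) => ({
    url: `${base}/posts/${p.slug}`,
    lastModified: p.lastModified,
  }));
  return [home, ...postEntries];
}

const entries = sitemapEntries("https://vibescoder.dev", [
  { slug: "seo-aeo-audit", lastModified: "2025-01-15" }, // illustrative
]);
console.log(entries.map((e) => e.url));
```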

Canonical URLs

No page had <link rel="canonical">. Without it, Google can treat URL variants (?utm_source=twitter, ?ref=hackernews) as separate pages. We added explicit canonical URLs to every page type — homepage, posts, about, tags, and individual tag pages.
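In the Next.js App Router, a canonical URL is one field in the Metadata API. A sketch for a post page (the URL is illustrative):

```typescript
// src/app/posts/[slug]/page.tsx (sketch): declare the canonical URL
export const metadata = {
  alternates: {
    canonical: "https://vibescoder.dev/posts/seo-aeo-audit",
  },
};
```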

Homepage Caching

The homepage was set to force-dynamic — every request hit the server with zero caching. For a blog that publishes daily at most, that's unnecessary. We switched to ISR with a 60-second revalidation window. (Vercel still serves it dynamically due to a cookies() call for admin detection — a future refactor.)
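In App Router terms, the change is a one-line segment config. A sketch:

```typescript
// src/app/page.tsx (sketch)
// Before: export const dynamic = "force-dynamic";
export const revalidate = 60; // ISR: regenerate cached HTML at most once per minute
```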

Custom 404 Page

The default Next.js 404 is a dead end. Our custom version shows recent posts and navigation links — keeping both users and crawlers moving through the site instead of bouncing.

Changes That Help Both

Most improvements benefit both SEO and AEO:

JSON-LD Structured Data

The single biggest miss. We added three schema types:

  • WebSite — site-level metadata with author info (every page)
  • BlogPosting — per-post schema with headline, dates, author, keywords, reading time (post pages)
  • BreadcrumbList — navigation hierarchy (post pages)

For SEO, this enables rich results in Google — article carousels, author info, breadcrumbs. For AEO, it's how AI engines understand content relationships and authorship with confidence.

Heading Anchor IDs

Added rehype-slug to the MDX pipeline. Every H2 and H3 now gets an auto-generated id attribute.

  • SEO: Google uses these for "jump to" links in search results and featured snippets.
  • AEO: AI agents cite specific sections via fragment URLs (#the-cloudflare-gotcha). Without heading IDs, citations can only link to the full page.
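rehype-slug generates GitHub-style ids from heading text. A rough approximation of the transformation (the real plugin uses github-slugger under the hood and also deduplicates collisions):

```typescript
import { strictEqual } from "node:assert";

// Approximate the id rehype-slug would assign to a heading (sketch).
function headingId(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^\w\s-]/g, "") // drop punctuation
    .trim()
    .replace(/\s+/g, "-"); // collapse whitespace to hyphens
}

console.log(headingId("The Cloudflare Gotcha (Yes, Again)"));
// → the-cloudflare-gotcha-yes-again
```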

RSS Feed Fix

Every link in the RSS feed was a 404. The feed used /blog/ as the URL prefix, but the actual routes use /posts/. All 15 posts were broken. One-line fix, massive impact — RSS is a primary discovery mechanism for both Google and AI agents.

Article Meta Tags

Added article:author, article:tag, article:modified_time, and og:site_name to post OpenGraph metadata. These help both Google and AI engines categorize and attribute content correctly.

Image Improvements

MDX images now render inside <figure> with <figcaption> elements, and images without explicit alt text get an auto-generated fallback from the filename. Both changes improve how crawlers — traditional and AI — understand image content.
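The fallback alt text is straightforward to derive. A sketch of the idea (the function name and path are illustrative):

```typescript
import { strictEqual } from "node:assert";

// Derive readable fallback alt text from an image filename (sketch).
function altFromFilename(src: string): string {
  const name = src.split("/").pop()?.replace(/\.[a-z0-9]+$/i, "") ?? "";
  return name.replace(/[-_]+/g, " ").trim();
}

console.log(altFromFilename("/images/cloudflare-robots-settings.png"));
// → cloudflare robots settings
```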

The Cloudflare Settings While We Were in the Dashboard

While fixing the robots.txt issue, we also reviewed three other Cloudflare settings:

  • Early Hints — enabled. Cloudflare sends 103 Early Hints responses from the edge, letting browsers start loading fonts and CSS before Vercel even responds.
  • Smart Tiered Caching — enabled. Cloudflare edge nodes share cached content with each other, reducing origin hits. Ready to deliver benefits once ISR caching is fully enabled.
  • AI Labyrinth — confirmed still off. This injects fake content links to trap AI crawlers — the opposite of what a content site wants.

The Complete Scorecard

Every change and its impact. (Each was classified as helping AEO, SEO, or both; the totals are below.)

| Change | Impact |
| --- | --- |
| Disable Cloudflare managed robots.txt | Critical — AI crawlers could not access the site |
| Fix RSS feed URLs (/blog/ → /posts/) | Critical — all 15 RSS links were 404s |
| Create sitemap.xml | Critical — referenced in robots.txt but returned 404 |
| Consolidate robots.txt (disable CF injection) | Critical — duplicate User-agent blocks caused ambiguity |
| Add JSON-LD structured data | High — zero structured data across entire site |
| Create llms.txt + llms-full.txt | High — no AI discovery files existed |
| Add canonical URLs to all pages | High — no page declared itself as canonical |
| Add heading anchor IDs (rehype-slug) | High — no deep linking possible |
| Add article:author, article:tag to OG meta | High — tags and author missing from metadata |
| Add Person schema with sameAs | High — no cross-platform identity linking |
| Switch homepage to ISR (revalidate: 60) | Medium — every request was a cold server render |
| Add RSS author + full content (content:encoded) | Medium — feed had excerpts only, no author |
| Add twitter:site and twitter:creator | Medium — social cards had no account attribution |
| Create custom 404 page | Medium — default 404 was a dead end |
| Wrap images in figure/figcaption | Low — bare img tags with no semantic context |
| Alt text fallback from filenames | Low — empty alt on content images |
| Remove x-powered-by header | Low — minor information disclosure |
| Add humans.txt | Low — minor authorship signal |
| Enable Cloudflare Early Hints | Low — browsers preload assets faster |
| Enable Smart Tiered Caching | Low — prepared for when ISR is fully active |

Total: 20 changes. 13 help AEO. 17 help SEO. 11 help both.

What I Learned

AEO is a real discipline now, not a buzzword. The gap between "my content exists on the internet" and "AI agents can find, understand, and cite my content" is significant. Structured data, llms.txt, full-content RSS, heading anchors — these aren't nice-to-haves. They're the difference between being in the AI conversation and being invisible to it.

Your CDN can silently undermine your content strategy. This is the one that stings. I knew about the Cloudflare AI bot setting. I wrote a blog post about turning it off. And a different setting — one I didn't know existed — was doing the same thing through a different mechanism. If you use Cloudflare, check your robots.txt right now. Not the file in your repo — the one Cloudflare is actually serving. curl https://yoursite.com/robots.txt and compare it to what you expect.

The audit paid for itself in the first finding. Everything else — the JSON-LD, the canonical URLs, the sitemap — those are incremental improvements that compound over time. But the Cloudflare fix was binary: invisible → visible. Every day that setting was on was a day ChatGPT Search, Perplexity, and Google AI Overviews couldn't index my content.

What's Next

The one thing we identified but didn't implement: FAQPage schema for how-to posts. Several posts follow a problem/solution pattern that could surface as direct answers in AI search. The frontmatter already has a type field distinguishing how-to from opinion — the infrastructure is there. That's next.

By the Numbers

  • 1,738 bytes of robots.txt injected by Cloudflare without updating Content-Length
  • 8 AI crawlers blocked (GPTBot, ClaudeBot, Google-Extended, Amazonbot, CCBot, Bytespider, Applebot-Extended, meta-externalagent)
  • 15 RSS feed links returning 404 — every single one
  • 3 JSON-LD schema types (WebSite, BlogPosting, BreadcrumbList)
  • 5 page types with canonical URLs
  • 17 files changed, 428 lines added
  • 3 Cloudflare settings that control AI crawlers — and you have to check all of them
  • 60 seconds to fix the Cloudflare setting that was blocking all AI visibility
  • ~2 hours for the full audit and implementation of all 20 changes
  • 1 blog post that I thought had solved this problem — it hadn't
