Rob

Posted on • Originally published at vibescoder.dev

Your AI Strategy Has a Blind Spot: An SEO and AEO Audit of vibescoder.dev

I spend a lot of time thinking about how AI agents discover and consume content. I run a company that builds developer tools. I write a blog about building with AI agents. And most importantly, I'm married to a woman who runs an AI consulting practice. Through the home-office wall, I've heard her warn many a client that if they're a Cloudflare customer, they may have a silent suppressor in their content strategy. She recommends a site audit.

And she was right. Until this morning, every major AI crawler was blocked from reading my site.

Not by choice. Not by misconfiguration. By a Cloudflare setting I'd already turned off — that got silently re-enabled by a different setting I didn't know existed.

If you're a content creator, marketer, or engineer who cares about whether ChatGPT, Perplexity, Google AI Overviews, or Claude can find your work — read this. The infrastructure between your content and your audience may be working against you.

The TL;DR for Non-Technical Readers

If you don't want to read the whole audit, here's what matters:

  1. Cloudflare's free tier blocks AI search engines by default. If your site uses Cloudflare (and millions do), your content may be invisible to ChatGPT, Perplexity, Claude, and Google's AI features — even if you never asked for that.

  2. There are now two categories of discoverability. Traditional SEO (Google search results) and AEO — Answer Engine Optimization (AI-powered search and assistants). You need both. They require different things.

  3. The fix for Cloudflare takes 60 seconds — but you have to know it exists. Go to Security → Settings → "Manage your robots.txt" and switch from "Instruct AI bots to not scrape content" to either "Content Signals Policy" or "Disable robots.txt configuration."

  4. There's a new file called llms.txt that's becoming the robots.txt for AI. It tells AI agents what your site is, what it covers, and where to find content. If you don't have one, you're leaving discoverability on the table.

The TL;DR for Technical Readers

We ran a full SEO + AEO audit against vibescoder.dev and found 20 issues across 4 severity levels. The highlights:

  • 4 P0 (critical): Cloudflare's managed robots.txt was blocking GPTBot, ClaudeBot, Google-Extended, and 5 others. RSS feed had wrong URL prefix (15 broken links). Sitemap.xml was referenced but returned 404. Duplicate User-agent: * blocks in robots.txt.
  • 6 P1 (high): No JSON-LD structured data. No llms.txt. No canonical URLs. No heading anchor IDs. Missing article:author/tag meta. Homepage force-dynamic.
  • Everything was fixed in a single session — 17 files changed, 428 insertions, pushed and deployed.

The commit: SEO/AEO overhaul.

The Audit

I asked my Coder agent to evaluate vibescoder.dev on two dimensions: traditional search engine optimization (SEO) and Answer Engine Optimization (AEO) — making the site discoverable and citable by AI agents like ChatGPT Search, Perplexity, Google AI Overviews, and Claude.

The agent cloned the engine repo, crawled the live site, inspected every response header, parsed every meta tag, and cross-referenced the codebase against both SEO and AEO best practices.

The results were humbling.

The Cloudflare Gotcha (Yes, Again)

I wrote about Cloudflare's AI crawler settings two weeks ago. In that post, I specifically called out that Cloudflare's free tier has "Block AI bots" and "AI Labyrinth" turned on by default. I explicitly turned both off. I even wrote this:

"If your site exists for thought leadership, you want AI services to find, index, and cite your content. Blocking AI crawlers is blocking your distribution channel."

I was right. And I was still blocked.

The problem: Cloudflare has a separate setting called "Manage your robots.txt" under Security → Settings. It's not the same as "Block AI bots." It's a newer feature that injects directives directly into your robots.txt file at the edge — after your origin server responds.

Here's what the agent found when it compared my repo's robots.txt (100 bytes, 7 lines) to what Cloudflare was actually serving:

| Metric | Value |
| --- | --- |
| My robots.txt (origin) | 100 bytes, 7 lines |
| Content-Length header | 100 (Vercel's original) |
| Actual response body | 1,838 bytes, ~65 lines |

Cloudflare was prepending 1,738 bytes of content — including Disallow: / rules for ClaudeBot, GPTBot, Google-Extended, Amazonbot, CCBot, Bytespider, and meta-externalagent — without updating the Content-Length header. The setting responsible? "Instruct AI bots to not scrape content," which was selected by default.
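If you want to run the same check on your own domain, fetch the robots.txt you actually serve over HTTP (not the file in your repo) and look for user-agents that are fully disallowed. A minimal TypeScript sketch of that check (the parser is simplified, and the sample block below is illustrative):

```typescript
import { deepStrictEqual } from "node:assert";

// Simplified robots.txt scan: list user-agents that get "Disallow: /".
// Feed it the body you actually receive over HTTP, since a CDN may
// rewrite the file at the edge.
function fullyBlockedAgents(robotsTxt: string): string[] {
  const blocked: string[] = [];
  let agents: string[] = [];
  let inRules = false;
  for (const raw of robotsTxt.split("\n")) {
    const line = raw.split("#")[0].trim(); // strip comments
    if (!line) continue;
    const idx = line.indexOf(":");
    if (idx < 0) continue;
    const key = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (key === "user-agent") {
      // A new group starts once a run of rules has ended.
      if (inRules) {
        agents = [];
        inRules = false;
      }
      agents.push(value);
    } else if (key === "disallow" || key === "allow") {
      inRules = true;
      if (key === "disallow" && value === "/") {
        for (const a of agents) {
          if (!blocked.includes(a)) blocked.push(a);
        }
      }
    }
  }
  return blocked;
}

// Illustrative sample resembling an injected block:
const sample = [
  "User-agent: GPTBot",
  "User-agent: ClaudeBot",
  "Disallow: /",
  "",
  "User-agent: *",
  "Allow: /",
].join("\n");

console.log(fullyBlockedAgents(sample)); // logs GPTBot and ClaudeBot
```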

The fix: Security → Settings → "Manage your robots.txt" → select "Disable robots.txt configuration." This tells Cloudflare to stop modifying your robots.txt entirely. Your origin file gets served as-is.

Why "Disable" and not "Content Signals Policy"? The Content Signals option keeps a Content-Signal: ai-train=no directive, which tells AI crawlers not to use your content for model training. That sounds reasonable — but for a personal blog trying to maximize reach, being in the training corpus means AI models are more likely to know about you and reference your ideas. The risk it protects against (content absorbed without credit) is theoretical. The cost (reduced presence in AI systems) is concrete.

Gotcha #1: Cloudflare has three separate AI-related settings, and changing one doesn't affect the others. You need to check all three:

| Setting | Location | What It Does |
| --- | --- | --- |
| Block AI Bots | Security → Settings | Deploys firewall rules blocking AI training crawlers |
| AI Labyrinth | Security → Settings | Injects fake content links to trap non-compliant bots |
| Manage your robots.txt | Security → Settings | Modifies robots.txt at the edge to add AI crawler directives |

I had turned off #1 and #2 weeks ago. But #3 was still on — silently rewriting my robots.txt at the CDN layer.

Here's the full picture (screenshots omitted here): the Security Overview flagging the AI-related action items, and each of the three settings.

What Is AEO?

AEO — Answer Engine Optimization — is the practice of making your content discoverable and citable by AI agents. (You'll also see it referred to as AI Engine Optimization or Agentic Engine Optimization — the discipline is new enough that the name is still settling.) It's the emerging counterpart to SEO. Where SEO focuses on Google's traditional index, AEO targets the systems that power ChatGPT Search, Perplexity, Google AI Overviews, Claude, and whatever comes next.

The key differences:

| | SEO | AEO |
| --- | --- | --- |
| Primary consumer | Googlebot | GPTBot, ClaudeBot, PerplexityBot, Google-Extended |
| Content format | HTML with meta tags | Structured data (JSON-LD), plain text (llms.txt), RSS |
| Discovery mechanism | Sitemap, backlinks, crawling | Sitemap, RSS, llms.txt, structured data |
| Ranking signal | PageRank, content quality, Core Web Vitals | Authorship (Person schema + sameAs), recency, structured data |
| Citation style | Blue link with snippet | Inline citation with direct quote and link |
| Key enabler | Canonical URLs, meta descriptions | JSON-LD, llms.txt, heading anchors for deep linking |

You need both. Many of the improvements help both. But some are AEO-specific.

AEO-Specific Changes

These improvements specifically target AI agent discoverability:

llms.txt and llms-full.txt

llms.txt is an emerging convention — think of it as robots.txt for AI comprehension rather than crawling. It tells AI agents what your site is, what topics it covers, and where to find content.

We created two files:

  • /llms.txt — a structured summary: site description, author, topics, key posts, and links
  • /llms-full.txt — a dynamic route that serves every published post's full content as plain text

The full-content version is the important one. When an AI agent wants to cite your work, it needs the actual content — not just metadata. llms-full.txt is a single endpoint that gives it everything.
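For a sense of the shape, here is a minimal llms.txt in the emerging llmstxt.org style (the entries and post slug below are illustrative, not the actual file):

```markdown
# vibescoder.dev

> A blog about building software with AI coding agents, by Rob Whiteley.

## Posts

- [Your AI Strategy Has a Blind Spot](https://vibescoder.dev/posts/seo-aeo-audit): an SEO and AEO audit of this site
```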

Person Schema with sameAs

JSON-LD structured data tells AI engines who wrote something and where else that person exists online. The sameAs property connects identity across platforms:

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Rob Whiteley",
  "url": "https://vibescoder.dev/about",
  "jobTitle": "CEO",
  "sameAs": [
    "https://www.linkedin.com/in/rwhiteley",
    "https://github.com/carryologist",
    "https://x.com/rwhiteley0"
  ],
  "worksFor": {
    "@type": "Organization",
    "name": "Coder",
    "url": "https://coder.com"
  }
}
```

When ChatGPT or Perplexity decides whether to cite "Rob Whiteley, CEO of Coder" in a response about AI-assisted development, this structured data is what gives it confidence in the attribution.

Full-Content RSS

The existing RSS feed only had <description> (a short excerpt). AI agents that consume RSS — and Perplexity in particular indexes it — get significantly more context from full-content feeds. We added <content:encoded> with the full post body, plus <author> and <managingEditor> tags.
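A full-content item looks roughly like this (URLs and email are illustrative; note that content:encoded requires declaring the xmlns:content namespace on the rss element):

```xml
<item>
  <title>Your AI Strategy Has a Blind Spot</title>
  <link>https://vibescoder.dev/posts/seo-aeo-audit</link>
  <author>rob@vibescoder.dev (Rob Whiteley)</author>
  <description>Short excerpt for feed readers…</description>
  <content:encoded><![CDATA[<p>The full post body as HTML…</p>]]></content:encoded>
</item>
```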

Unblocking AI Crawlers

This is the Cloudflare fix described above — the single highest-impact AEO change, going from completely invisible to fully accessible.

SEO-Specific Changes

These target traditional Google search:

Sitemap.xml

robots.txt referenced it. It didn't exist. Every SEO tool and Google Search Console would flag this. We created src/app/sitemap.ts with dynamic generation — all posts, tags, and static pages with lastmod dates from the changelog.
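The route itself is just a function returning entries. A framework-agnostic sketch of the generation logic (the Post shape and slug are illustrative; the real file returns Next.js's MetadataRoute.Sitemap type):

```typescript
import { deepStrictEqual } from "node:assert";

// Build sitemap entries from post metadata (sketch).
type Post = { slug: string; lastModified: string };

function sitemapEntries(base: string, posts: Post[]) {
  // Homepage entry uses the newest post's date as a reasonable lastmod.
  const home = { url: `${base}/`, lastModified: posts[0]?.lastModified ?? "" };
  const postEntries = posts.map((p) => ({
    url: `${base}/posts/${p.slug}`,
    lastModified: p.lastModified,
  }));
  return [home, ...postEntries];
}

const entries = sitemapEntries("https://vibescoder.dev", [
  { slug: "seo-aeo-audit", lastModified: "2025-01-15" }, // illustrative
]);
console.log(entries.map((e) => e.url));
```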

Canonical URLs

No page had <link rel="canonical">. Without it, Google can treat URL variants (?utm_source=twitter, ?ref=hackernews) as separate pages. We added explicit canonical URLs to every page type — homepage, posts, about, tags, and individual tag pages.
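In the Next.js App Router, a canonical URL is one field in the Metadata API. A sketch for a post page (the URL is illustrative):

```typescript
// src/app/posts/[slug]/page.tsx (sketch): declare the canonical URL
export const metadata = {
  alternates: {
    canonical: "https://vibescoder.dev/posts/seo-aeo-audit",
  },
};
```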

Homepage Caching

The homepage was set to force-dynamic — every request hit the server with zero caching. For a blog that publishes daily at most, that's unnecessary. We switched to ISR with a 60-second revalidation window. (Vercel still serves it dynamically due to a cookies() call for admin detection — a future refactor.)
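In App Router terms, the change is a one-line segment config. A sketch:

```typescript
// src/app/page.tsx (sketch)
// Before: export const dynamic = "force-dynamic";
export const revalidate = 60; // ISR: regenerate cached HTML at most once per minute
```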

Custom 404 Page

The default Next.js 404 is a dead end. Our custom version shows recent posts and navigation links — keeping both users and crawlers moving through the site instead of bouncing.

Changes That Help Both

Most improvements benefit both SEO and AEO:

JSON-LD Structured Data

The single biggest miss. We added three schema types:

  • WebSite — site-level metadata with author info (every page)
  • BlogPosting — per-post schema with headline, dates, author, keywords, reading time (post pages)
  • BreadcrumbList — navigation hierarchy (post pages)

For SEO, this enables rich results in Google — article carousels, author info, breadcrumbs. For AEO, it's how AI engines understand content relationships and authorship with confidence.

Heading Anchor IDs

Added rehype-slug to the MDX pipeline. Every H2 and H3 now gets an auto-generated id attribute.

  • SEO: Google uses these for "jump to" links in search results and featured snippets.
  • AEO: AI agents cite specific sections via fragment URLs (#the-cloudflare-gotcha). Without heading IDs, citations can only link to the full page.
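rehype-slug generates GitHub-style ids from heading text. A rough approximation of the transformation (the real plugin uses github-slugger under the hood and also deduplicates collisions):

```typescript
import { strictEqual } from "node:assert";

// Approximate the id rehype-slug would assign to a heading (sketch).
function headingId(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^\w\s-]/g, "") // drop punctuation
    .trim()
    .replace(/\s+/g, "-"); // collapse whitespace to hyphens
}

console.log(headingId("The Cloudflare Gotcha (Yes, Again)"));
// → the-cloudflare-gotcha-yes-again
```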

RSS Feed Fix

Every link in the RSS feed was a 404. The feed used /blog/ as the URL prefix, but the actual routes use /posts/. All 15 posts were broken. One-line fix, massive impact — RSS is a primary discovery mechanism for both Google and AI agents.

Article Meta Tags

Added article:author, article:tag, article:modified_time, and og:site_name to post OpenGraph metadata. These help both Google and AI engines categorize and attribute content correctly.

Image Improvements

MDX images now render inside <figure> with <figcaption> elements, and images without explicit alt text get an auto-generated fallback from the filename. Both changes improve how crawlers — traditional and AI — understand image content.
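The fallback alt text is straightforward to derive. A sketch of the idea (the function name and path are illustrative):

```typescript
import { strictEqual } from "node:assert";

// Derive readable fallback alt text from an image filename (sketch).
function altFromFilename(src: string): string {
  const name = src.split("/").pop()?.replace(/\.[a-z0-9]+$/i, "") ?? "";
  return name.replace(/[-_]+/g, " ").trim();
}

console.log(altFromFilename("/images/cloudflare-robots-settings.png"));
// → cloudflare robots settings
```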

The Cloudflare Settings While We Were in the Dashboard

While fixing the robots.txt issue, we also reviewed three other Cloudflare settings:

  • Early Hints — enabled. Cloudflare sends 103 Early Hints responses from the edge, letting browsers start loading fonts and CSS before Vercel even responds.
  • Smart Tiered Caching — enabled. Cloudflare edge nodes share cached content with each other, reducing origin hits. Ready to deliver benefits once ISR caching is fully enabled.
  • AI Labyrinth — confirmed still off. This injects fake content links to trap AI crawlers — the opposite of what a content site wants.

The Complete Scorecard

Every change and its impact. (Each was classified as helping AEO, SEO, or both; the totals are below.)

| Change | Impact |
| --- | --- |
| Disable Cloudflare managed robots.txt | Critical — AI crawlers could not access the site |
| Fix RSS feed URLs (/blog/ → /posts/) | Critical — all 15 RSS links were 404s |
| Create sitemap.xml | Critical — referenced in robots.txt but returned 404 |
| Consolidate robots.txt (disable CF injection) | Critical — duplicate User-agent blocks caused ambiguity |
| Add JSON-LD structured data | High — zero structured data across entire site |
| Create llms.txt + llms-full.txt | High — no AI discovery files existed |
| Add canonical URLs to all pages | High — no page declared itself as canonical |
| Add heading anchor IDs (rehype-slug) | High — no deep linking possible |
| Add article:author, article:tag to OG meta | High — tags and author missing from metadata |
| Add Person schema with sameAs | High — no cross-platform identity linking |
| Switch homepage to ISR (revalidate: 60) | Medium — every request was a cold server render |
| Add RSS author + full content (content:encoded) | Medium — feed had excerpts only, no author |
| Add twitter:site and twitter:creator | Medium — social cards had no account attribution |
| Create custom 404 page | Medium — default 404 was a dead end |
| Wrap images in figure/figcaption | Low — bare img tags with no semantic context |
| Alt text fallback from filenames | Low — empty alt on content images |
| Remove x-powered-by header | Low — minor information disclosure |
| Add humans.txt | Low — minor authorship signal |
| Enable Cloudflare Early Hints | Low — browsers preload assets faster |
| Enable Smart Tiered Caching | Low — prepared for when ISR is fully active |

Total: 20 changes. 13 help AEO. 17 help SEO. 11 help both.

What I Learned

AEO is a real discipline now, not a buzzword. The gap between "my content exists on the internet" and "AI agents can find, understand, and cite my content" is significant. Structured data, llms.txt, full-content RSS, heading anchors — these aren't nice-to-haves. They're the difference between being in the AI conversation and being invisible to it.

Your CDN can silently undermine your content strategy. This is the one that stings. I knew about the Cloudflare AI bot setting. I wrote a blog post about turning it off. And a different setting — one I didn't know existed — was doing the same thing through a different mechanism. If you use Cloudflare, check your robots.txt right now. Not the file in your repo — the one Cloudflare is actually serving. curl https://yoursite.com/robots.txt and compare it to what you expect.

The audit paid for itself in the first finding. Everything else — the JSON-LD, the canonical URLs, the sitemap — those are incremental improvements that compound over time. But the Cloudflare fix was binary: invisible → visible. Every day that setting was on was a day ChatGPT Search, Perplexity, and Google AI Overviews couldn't index my content.

What's Next

The one thing we identified but didn't implement: FAQPage schema for how-to posts. Several posts follow a problem/solution pattern that could surface as direct answers in AI search. The frontmatter already has a type field distinguishing how-to from opinion — the infrastructure is there. That's next.

By the Numbers

  • 1,738 bytes of robots.txt injected by Cloudflare without updating Content-Length
  • 8 AI crawlers blocked (GPTBot, ClaudeBot, Google-Extended, Amazonbot, CCBot, Bytespider, Applebot-Extended, meta-externalagent)
  • 15 RSS feed links returning 404 — every single one
  • 3 JSON-LD schema types (WebSite, BlogPosting, BreadcrumbList)
  • 5 page types with canonical URLs
  • 17 files changed, 428 lines added
  • 3 Cloudflare settings that control AI crawlers — and you have to check all of them
  • 60 seconds to fix the Cloudflare setting that was blocking all AI visibility
  • ~2 hours for the full audit and implementation of all 20 changes
  • 1 blog post that I thought had solved this problem — it hadn't
