The problem nobody warned me about
I shipped a single-HTML-file global conflict monitor (crisispulse.org) last week. Classic SPA: one index.html, D3 map, Netlify Functions for the backend, ~100KB total. It works beautifully in browsers.
Then I asked ChatGPT: "What does crisispulse.org do?"
"I don't have information about that specific website."
Perplexity: "I couldn't find reliable sources about crisispulse.org."
Claude: "This URL doesn't appear in my training data or available tools."
Three of the biggest answer engines in the world, and none of them could describe a site that literally exists and has a comprehensive `<title>`, meta description, and Open Graph tags. Why?
Because SPAs are invisible to AI crawlers in ways traditional SEO never had to worry about.
This post is a field report on what I changed, in what order, and what actually moved the needle. Everything here is open: you can diff the commit yourself, 2d94a57.
Why AI crawlers are different from Googlebot
Two inconvenient truths I had to internalize:
1. Most AI crawlers don't execute JavaScript
Googlebot renders JS (mostly). GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, CCBot: most of them just fetch the raw HTML, parse it, and move on. That means for a pure SPA:
<body>
<div id="app"></div>
<script src="bundle.js"></script>
</body>
…what the crawler sees is literally a div and a script tag. No content. No headings. No context. Zero signal to feed into an embedding.
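You can approximate the crawler's view with nothing but the standard library: feed the raw HTML to a parser that never executes scripts and see what text survives. (A sketch, not how any specific crawler works; `html.parser` stands in for the fetch-and-parse step.)

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text the way a fetch-only crawler might: no JS execution."""
    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if not self.in_script and data.strip():
            self.chunks.append(data.strip())

spa_shell = '<body><div id="app"></div><script src="bundle.js"></script></body>'
parser = TextExtractor()
parser.feed(spa_shell)
print(parser.chunks)  # [] -- nothing indexable at all
```

An empty list is exactly the "zero signal" problem: there is no text node anywhere in the document for an embedding to be built from.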
2. AI crawlers are citation-optimizing, not click-optimizing
Traditional SEO is a ranking game: show my link first. GEO (Generative Engine Optimization) is a citation game: when the model answers a user's question, have it actually name-check and link to my site.
The optimization targets are different. Google wants to show you the best ten results. ChatGPT wants to write one paragraph that correctly attributes sources. To get cited, you need to be:
- Easy to fetch (no JS, no login walls, no Cloudflare challenges on bot UAs)
- Easy to parse (structured data, headings, lists, not walls of text)
- Easy to summarize (short factual statements, not marketing prose)
- Easy to cite (a canonical URL and a clear "what this is in one sentence")
None of that happens by default on a modern SPA.
The five things I changed
Here's the stack, in order of ROI (highest first):
1. A `<noscript>` SEO block
The simplest, cheapest win. Right after `<body>`, before the app mount point, I added a block that only exists for non-JS crawlers:
<noscript>
<h1>Crisis Pulse – Global Conflict Monitor</h1>
<p>Crisis Pulse is a free, single-file web app that tracks 25+ active
global conflict zones in real time and generates a personalized
emergency supply list based on your location.</p>
<h2>Currently tracked conflicts</h2>
<ul>
<li>Russia-Ukraine war – Eastern Europe</li>
<li>Gaza / Israel-Hamas conflict – Middle East</li>
<li>Sudan civil war – North Africa</li>
<li>Myanmar civil war – Southeast Asia</li>
<!-- …21 more… -->
</ul>
<h2>Frequently asked questions</h2>
<h3>Is Crisis Pulse free?</h3>
<p>Yes, completely free and open source. No sign-up required.</p>
<!-- …7 more Q/A pairs… -->
</noscript>
Why this works: browsers with JS enabled never render it (users see your real UI). But every crawler that doesn't execute JS, which is most AI crawlers, gets a clean, structured, semantically tagged summary of what your site is: H1, H2, H3, lists, and eight FAQ pairs in plain HTML. That's eight potential citations every time a user asks a related question.
Time to implement: 20 minutes. Impact: massive.
Counter-argument I've seen: "But Google dings you for hidden content!" The `<noscript>` element is explicitly allowed; it's the standards-defined way to expose content to non-JS agents. Google has confirmed this repeatedly. Don't conflate it with `display:none` spam.
2. A rich JSON-LD @graph
The second-biggest lever is structured data. Not the tiny `WebSite` snippet most tutorials show, but a proper `@graph` with multiple linked entities:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@graph": [
{
"@type": "WebApplication",
"@id": "https://crisispulse.org/#webapp",
"name": "Crisis Pulse",
"applicationCategory": "NewsApplication",
"operatingSystem": "Any",
"offers": { "@type": "Offer", "price": "0" },
"featureList": [
"Real-time global conflict map",
"Daily intensity scoring (0-10)",
"Personalized emergency supply calculator",
"Bilingual EN/ZH support",
"25+ tracked conflict zones"
],
"inLanguage": ["en", "zh"],
"isAccessibleForFree": true,
"softwareVersion": "1.0.0"
},
{
"@type": "Organization",
"@id": "https://crisispulse.org/#org",
"name": "Crisis Pulse",
"url": "https://crisispulse.org",
"sameAs": [
"https://www.producthunt.com/products/crisis-pulse",
"https://dev.to/xkbear"
]
},
{
"@type": "FAQPage",
"@id": "https://crisispulse.org/#faq",
"mainEntity": [
{
"@type": "Question",
"name": "What is Crisis Pulse?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Crisis Pulse is a free, single-file web application that tracks 25+ active global conflicts in real time..."
}
}
// …7 more questions… (strip comments in the real file; JSON-LD must be valid JSON)
]
}
]
}
</script>
Three things to notice:
- `@graph` with multiple entities, not one lonely `WebApplication`. Each entity has its own `@id`, so crawlers can deduplicate across pages.
- `FAQPage` with 8 entries, directly in the schema, plus the same Q/A text in the `<noscript>` block. Belt and suspenders: the schema gives machine-readable intent, the noscript gives human-readable content. Both say the same thing.
- `sameAs` linking to Product Hunt and Dev.to. This is the identity graph; it tells AI models "this site, this Product Hunt listing, this Dev.to profile are all the same project." Worth the 30 seconds.
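One pitfall worth automating away: JSON-LD has to be strictly valid JSON (no `//` comments, no trailing commas), and duplicate `@id`s defeat the deduplication benefit. A small sanity check you could run in CI; the graph here is a trimmed stand-in for the real markup:

```python
import json

# Trimmed example graph; in CI you'd extract the real <script type="application/ld+json"> body
jsonld = """
{
  "@context": "https://schema.org",
  "@graph": [
    {"@type": "WebApplication", "@id": "https://crisispulse.org/#webapp", "name": "Crisis Pulse"},
    {"@type": "Organization", "@id": "https://crisispulse.org/#org", "name": "Crisis Pulse"},
    {"@type": "FAQPage", "@id": "https://crisispulse.org/#faq", "mainEntity": []}
  ]
}
"""

data = json.loads(jsonld)  # raises ValueError if comments or trailing commas sneak in
ids = [node["@id"] for node in data["@graph"]]
assert len(ids) == len(set(ids)), "duplicate @id in @graph"
print(len(ids), "entities, all with unique @id")
```

Two lines of validation, and it catches the most common way a hand-edited `@graph` silently stops being machine-readable.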
3. llms.txt and llms-full.txt
This is the one that surprised me by mattering. llms.txt is a proposed standard (by Jeremy Howard) that's rapidly becoming the robots.txt equivalent for AI crawlers: a single, human-readable markdown file at your root that tells LLMs exactly what you are, in the format they want to consume.
My /llms.txt is ~40 lines:
# Crisis Pulse
> A free, single-HTML-file global conflict monitor and emergency
> supply calculator. Tracks 25+ active conflict zones with daily
> intensity scoring and personalized prep recommendations.
## Core features
- Real-time global conflict map (D3 + TopoJSON)
- Daily intensity scoring algorithm (0-10 scale)
- Emergency supply calculator based on geolocation
- Bilingual English / Simplified Chinese
- 100% free, no sign-up, no tracking
## Key facts
- Launched: 2026
- Open source: yes
- Architecture: single HTML file + Netlify Functions
- ...
And then a /llms-full.txt with the long version: every tracked conflict listed, the intensity scoring methodology, the tech stack, the philosophy, the roadmap.
Why two files? The short one is for the model's system prompt injection; the long one is for the crawler's deep index. Both live at the root, both are text/plain, both are ~10KB total. Zero cost, enormous context gain.
Caveat: llms.txt isn't a W3C standard and not every AI crawler reads it (yet). But Anthropic, Perplexity, and Cursor have publicly committed. ChatGPT is "exploring." Google hasn't said. The downside of shipping it is basically zero; the upside if it becomes table stakes is large.
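Since the format is young and easy to drift from, a cheap guard is to assert the shape the llms.txt proposal describes: an H1 project name first, then a `>` blockquote summary. A minimal sketch (the content is inlined here; in CI you'd read the real file from disk):

```python
def check_llms_txt(text: str) -> bool:
    """Minimal structural check per the llmstxt.org proposal:
    first non-blank line is an H1, second is a blockquote summary."""
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    assert lines and lines[0].startswith("# "), "first non-blank line must be an H1"
    assert len(lines) > 1 and lines[1].startswith("> "), "H1 must be followed by a '>' summary"
    return True

sample = """# Crisis Pulse
> A free, single-HTML-file global conflict monitor and emergency
> supply calculator.

## Core features
- Real-time global conflict map
"""
print(check_llms_txt(sample))  # True
```

It checks only the two lines the format actually mandates; everything after the summary is free-form markdown, so there's nothing else worth asserting.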
4. A maximally permissive robots.txt for AI bots
By default, a lot of frameworks ship a robots.txt that's implicitly "crawl everything." But AI bots are increasingly checking for explicit allow directives because of the post-2023 opt-out backlash. If you want to be cited by ChatGPT, you probably want to go from "implicit allow" to "explicit allow."
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Claude-Web
Allow: /
User-agent: anthropic-ai
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Perplexity-User
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Applebot-Extended
Allow: /
User-agent: CCBot
Allow: /
User-agent: cohere-ai
Allow: /
# …and about ten more…
Sitemap: https://crisispulse.org/sitemap.xml
The list is long (I put 20+ bots in mine) because the AI crawler ecosystem is fragmented. Some bots fall back to the most restrictive directive they can match; an explicit Allow resolves ambiguity.
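Before deploying, it's worth confirming the file parses the way you intend; Python's stdlib `urllib.robotparser` is enough for a smoke test. A sketch against a trimmed version of the file:

```python
from urllib.robotparser import RobotFileParser

# Trimmed version of the deployed robots.txt
robots_txt = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Named bots match their own group; unlisted bots fall through to '*'
for ua in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    print(ua, rp.can_fetch(ua, "https://crisispulse.org/"))
```

This is also a quick way to catch the classic footgun: a stray `Disallow: /` in the `*` group that named-bot groups don't override for the crawlers you forgot to list.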
If you're watching your server logs, you'll see a surprising number of these UAs showing up the day after you deploy this.
5. Meta tags per bot
Small but free. In addition to the standard <meta name="robots">, you can ship bot-specific meta directives:
<meta name="robots" content="index, follow, max-image-preview:large, max-snippet:-1, max-video-preview:-1">
<meta name="GPTBot" content="index, follow">
<meta name="ChatGPT-User" content="index, follow">
<meta name="ClaudeBot" content="index, follow">
<meta name="PerplexityBot" content="index, follow">
<meta name="Google-Extended" content="index, follow">
<meta name="category" content="Geopolitics, Emergency Preparedness, News Monitoring, OSINT">
max-snippet:-1 is the one that matters most â it tells crawlers "feel free to quote the entire page in an answer," which is exactly the behavior you want for citation.
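If you want to guard these tags against regressions, a few lines of stdlib parsing will do. This sketch collects `<meta name=…>` pairs and checks the snippet directive is present; the HTML is inlined for illustration:

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Maps each <meta name="..."> to its content attribute."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if "name" in d:
                self.meta[d["name"]] = d.get("content", "")

head = '''
<meta name="robots" content="index, follow, max-image-preview:large, max-snippet:-1">
<meta name="GPTBot" content="index, follow">
'''
c = MetaCollector()
c.feed(head)
assert "max-snippet:-1" in c.meta["robots"]
print(sorted(c.meta))  # ['GPTBot', 'robots']
```

Note that `html.parser` lowercases attribute *names* but preserves their values, so the `name="GPTBot"` key survives with its original casing.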
What I skipped (and why)
A few things I chose not to do, and the reasoning:
- Pre-rendering / SSG. Would work, but introduces a build step and breaks the "one HTML file" constraint that defines the project. The `<noscript>` block covers 90% of the value.
- Separate `/about`, `/faq`, `/features` pages. Every "you need 10+ pages to rank" SEO guide recommends this. I disagree for AI crawlers: they prefer one authoritative page that's easy to cite over ten thin ones. One URL is one citation target.
- Paid AI SEO tools. There's a whole class of "GEO dashboards" emerging. They're mostly wrapping `curl` + prompt templates. Skip until you have organic signal worth measuring.
- Semantic HTML restructuring of the entire app. Diminishing returns. The `<noscript>` block gives crawlers what they need; the interactive app can stay divs-and-JS.
How I'm measuring this
Honest answer: it's early and the feedback loop is slow. Here's what I'm tracking:
- Log file analysis for the UAs listed in `robots.txt`. Did GPTBot show up? ClaudeBot? How often? Netlify makes this trivial.
- Manual citation checks every few days: ask ChatGPT, Perplexity, Claude, and You.com the same five questions about "global conflict tracker" and see whether crisispulse.org surfaces.
- Referral traffic from `chat.openai.com`, `perplexity.ai`, `you.com`, and `claude.ai`. These show up in analytics once you're indexed.
- Brand search growth in Google Search Console: the boring but reliable leading indicator.
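The log-analysis step can be as simple as tallying known bot UAs per deploy window. A sketch over inlined sample lines (the log format and UA strings are illustrative, not Netlify's exact output):

```python
from collections import Counter

BOT_UAS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot")

def count_bot_hits(log_lines):
    """Tallies hits per known AI-crawler UA substring, one UA per request line."""
    hits = Counter()
    for line in log_lines:
        for bot in BOT_UAS:
            if bot in line:
                hits[bot] += 1
                break
    return hits

sample = [
    '1.2.3.4 - - "GET / HTTP/1.1" 200 1234 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '5.6.7.8 - - "GET /llms.txt HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '9.9.9.9 - - "GET / HTTP/1.1" 200 1234 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
]
print(count_bot_hits(sample))
```

Run daily and diffed against the previous day, this is enough to answer "did GPTBot show up?" without any paid tooling.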
I'll write a follow-up in 30 days with actual numbers. No hype, whatever the result.
The meta point
SEO used to be about ranking in a list. GEO is about being the sentence the model writes. The optimization targets rhyme, but they're not the same, and they reward different trade-offs.
The encouraging thing is that most of it is boringly standards-compliant: use semantic HTML, ship structured data, respect noscript, write a clear one-sentence description of what your thing is. All of it is free, all of it takes a few hours, all of it is testable. You don't need an agency.
If you've shipped something and it's invisible to AI search, I'd genuinely love to hear what you tried â drop it in the comments and I'll diff our approaches.
Built with vanilla JS, shipped on Netlify, cited by (hopefully) an LLM near you. The whole thing is one HTML file on GitHub. If you want the full GEO commit, it's 2d94a57: 5 files, 369 insertions.
Top comments (1)
Author here — wanted to share the actual numbers behind the GEO changes since I made them yesterday.
What I deployed:
- `llms.txt` + `llms-full.txt` at site root (following the llmstxt.org standard)
- `@graph` JSON-LD with `WebApplication` + `Map` dual-type schema
- `<noscript>` block with full semantic HTML for non-JS crawlers
- explicit AI-crawler allow rules in `robots.txt`
- extra URL paths (`/crisis-map`, `/live-crisis-map`, etc.)

What I'm watching for:
Will report back with a data update. The commit is public if anyone wants to inspect: 2d94a57
Has anyone else tried llms.txt on a production site? Curious about your results.