<p>Your React Single Page Application (SPA) might look beautiful in the browser, but to AI web scrapers like <strong>GPTBot</strong> (OpenAI), <strong>ClaudeBot</strong> (Anthropic), <strong>Google-Extended</strong>, and <strong>PerplexityBot</strong>, it's a blank HTML shell with a single <code><div id="root"></div></code> and a JavaScript bundle URL.</p>
<p>These bots generally don't execute JavaScript; they read only your initial HTML response. If your content, meta tags, and structured data are injected client-side by React, <strong>AI models cannot index your content.</strong></p>
<h2>The Client-Side Rendering Problem</h2>
<p>A typical React SPA (built with Vite or Create React App) serves this initial HTML:</p>
<pre><code><!DOCTYPE html>
<html>
  <head>
    <title>My App</title>
  </head>
  <body>
    <div id="root"></div>
    <script src="/assets/main.abc123.js"></script>
  </body>
</html></code></pre>
<p>Everything else — page content, meta descriptions, JSON-LD, Open Graph tags — is injected by JavaScript after the bundle loads. AI scrapers receive only the empty shell above.</p>
<p>The result: your pages are unlikely to appear in AI-generated answers, AI crawlers have nothing to cite, and ChatGPT with browsing may report that your page has no relevant content.</p>
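<p>To check whether a page has this problem, you can inspect the raw HTML response the way a non-rendering crawler would. The following is a minimal sketch, assuming the mount node is <code>div#root</code> as in the example above; <code>isEmptySpaShell</code> is a hypothetical helper name, not part of any library.</p>

```typescript
// Heuristic: does the raw HTML contain an empty mount node?
// Assumes the app mounts into <div id="root">; pass another id if yours differs.
export function isEmptySpaShell(html: string, rootId = "root"): boolean {
  const pattern = new RegExp(
    `<div[^>]*\\bid=["']${rootId}["'][^>]*>\\s*</div>`,
    "i",
  );
  return pattern.test(html);
}

// The shell from the example above, and a pre-rendered variant for comparison.
const shell = '<html><body><div id="root"></div></body></html>';
const rendered = '<html><body><div id="root"><h1>My Article</h1></div></body></html>';

console.log(isEmptySpaShell(shell));    // true
console.log(isEmptySpaShell(rendered)); // false
```

<p>Feed it the body of a plain HTTP response (for example from <code>curl -A "GPTBot" https://your-site.example/</code>), not the DOM you see in browser dev tools, which already reflects JavaScript execution.</p>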
<h2>Solution 1: react-helmet-async for Meta Tag Management</h2>
<p><code>react-helmet-async</code> is a React library that manages the document <code><head></code>. While it still relies on client-side rendering, it ensures that meta tags are injected consistently and can be pre-rendered by server-side solutions.</p>
<p>Key implementation details:</p>
<ul>
<li>Wrap your app in <code><HelmetProvider></code></li>
<li>Helmet marks the tags it manages with <code>data-rh="true"</code> and uses that attribute to find and replace them, which prevents duplicate tags</li>
<li>Set the same <code>data-rh="true"</code> on your static HTML fallback tags so Helmet replaces them cleanly</li>
</ul>
<pre><code>import { Helmet } from 'react-helmet-async';

// Reusable head manager: render it once per page with that page's values.
const SEO = ({ title, description, canonicalUrl }) => (
  <Helmet>
    <title data-rh="true">{title}</title>
    <meta data-rh="true" name="description" content={description} />
    <link data-rh="true" rel="canonical" href={canonicalUrl} />
    {/* JSON-LD structured data describing the page */}
    <script type="application/ld+json">
      {JSON.stringify({
        "@context": "https://schema.org",
        "@type": "WebPage",
        "name": title,
        "description": description
      })}
    </script>
  </Helmet>
);

export default SEO;</code></pre>
<p><strong>Important:</strong> <code>react-helmet-async</code> alone does not solve the AI scraper problem because it still requires JavaScript execution. You need to pair it with one of the pre-rendering solutions below.</p>
<h2>Solution 2: Pre-rendering with Headless Browsers</h2>
<p>Pre-rendering services like <a href="https://prerender.io" target="_blank" rel="noopener">Prerender.io</a> or self-hosted Puppeteer/Playwright instances detect bot user agents and serve a fully rendered HTML snapshot instead of the SPA shell.</p>
<p>The flow works like this:</p>
<ol>
<li>A bot requests <code>/blog/my-article</code></li>
<li>Your server detects the bot user agent (GPTBot, ClaudeBot, Googlebot, etc.)</li>
<li>Instead of serving the SPA shell, the server sends the request to a headless browser</li>
<li>The headless browser renders the React app, waits for content, and captures the final HTML</li>
<li>The fully rendered HTML (with all meta tags, JSON-LD, and content) is returned to the bot</li>
</ol>
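<p>The five steps above can be sketched as a single request handler. This is a framework-agnostic illustration under stated assumptions: <code>respond</code>, <code>Renderer</code>, and <code>PrerenderOptions</code> are hypothetical names, and the <code>render</code> function stands in for whatever headless browser or pre-rendering service you use. A snapshot cache is included because rendering in a headless browser on every bot request is slow.</p>

```typescript
// A renderer returns fully rendered HTML for a URL. In practice it would wrap
// a headless browser (Puppeteer, Playwright) or a hosted pre-rendering service.
type Renderer = (url: string) => Promise<string>;

// Substrings that identify crawler user agents; extend for your own traffic.
const BOT_PATTERN = /GPTBot|ChatGPT-User|ClaudeBot|PerplexityBot|Googlebot|Bingbot/i;

interface PrerenderOptions {
  spaShell: string;           // the normal index.html served to browsers
  render: Renderer;           // hypothetical headless-browser wrapper
  cache: Map<string, string>; // rendered snapshots keyed by URL
}

export async function respond(
  url: string,
  userAgent: string | undefined,
  { spaShell, render, cache }: PrerenderOptions,
): Promise<string> {
  // Steps 1-2: anything that is not a known bot gets the SPA shell.
  if (!userAgent || !BOT_PATTERN.test(userAgent)) return spaShell;

  // Steps 3-4: render once per URL, then reuse the snapshot.
  const cached = cache.get(url);
  if (cached !== undefined) return cached;
  const html = await render(url);
  cache.set(url, html);

  // Step 5: the bot receives the fully rendered HTML.
  return html;
}
```

<p>A production setup would also need cache expiry and a timeout around <code>render</code>; both are left out here to keep the flow readable.</p>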
<h3>User Agent Detection</h3>
<p>The most common AI bot user agents to detect:</p>
<ul>
<li><code>GPTBot</code> — OpenAI's web crawler, used to collect model training data</li>
<li><code>ChatGPT-User</code> — ChatGPT browsing mode</li>
<li><code>ClaudeBot</code> — Anthropic's web crawler</li>
<li><code>PerplexityBot</code> — Perplexity AI's crawler</li>
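<li><code>Google-Extended</code> — Google's <code>robots.txt</code> token for AI training use; it is not sent as a request user agent, so control it in <code>robots.txt</code> rather than matching it in request headers</li>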
<li><code>Googlebot</code> — Standard Google search crawler</li>
<li><code>Bingbot</code> — Microsoft Bing crawler</li>
</ul>
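<p>A minimal sketch of matching these tokens against the <code>User-Agent</code> request header follows. <code>detectBot</code> is a hypothetical helper name. <code>Google-Extended</code> is left out of the list on purpose: it is a <code>robots.txt</code> control token rather than a string crawlers send in requests, so matching on it would never fire.</p>

```typescript
// Tokens that appear inside crawler User-Agent headers.
const BOT_TOKENS = [
  "GPTBot",
  "ChatGPT-User",
  "ClaudeBot",
  "PerplexityBot",
  "Googlebot",
  "Bingbot",
] as const;

type BotToken = (typeof BOT_TOKENS)[number];

// Returns the first matching token, or null for regular visitors.
// Matching is case-insensitive because header casing varies (e.g. "bingbot").
export function detectBot(userAgent: string | undefined): BotToken | null {
  if (!userAgent) return null;
  const ua = userAgent.toLowerCase();
  return BOT_TOKENS.find((token) => ua.includes(token.toLowerCase())) ?? null;
}
```

<p>User agents are trivially spoofed, so treat this as a routing hint for serving snapshots, not as authentication.</p>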
<h2>Solution 3: Edge Functions for Dynamic Meta Tags</h2>
<p>For hosting platforms like Firebase Hosting, Vercel, or Cloudflare Pages, you can use <strong>edge functions</strong> to intercept requests and inject meta tags and JSON-LD into the HTML response before it reaches the client.</p>
<p>On Firebase Hosting, this is done via <code>firebase.json</code> rewrites that route specific paths to a Cloud Function. The function reads the request path, looks up the page metadata, and injects it into the HTML template before returning the response.</p>
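<pre><code>// Firebase Cloud Function (simplified)
const functions = require('firebase-functions');

// baseHtml (the built index.html containing TITLE, DESCRIPTION, and JSONLD
// placeholders) and getMetadataForPath are assumed to be defined elsewhere.
exports.seo = functions.https.onRequest((req, res) => {
  const path = req.path;
  const metadata = getMetadataForPath(path);
  // A string pattern replaces only the first match, so each placeholder
  // must appear exactly once in the template. Values are inserted unescaped;
  // escape them if they can contain user-supplied text.
  const html = baseHtml
    .replace('TITLE', metadata.title)
    .replace('DESCRIPTION', metadata.description)
    .replace('JSONLD', JSON.stringify(metadata.jsonLd));
  res.status(200).send(html);
});</code></pre>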
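<p>The injection step deserves some care once metadata comes from a CMS or user input, because a stray quote or <code>&lt;/script&gt;</code> in a title can break the page. The sketch below shows one way to harden it, assuming the same <code>TITLE</code>, <code>DESCRIPTION</code>, and <code>JSONLD</code> placeholders; <code>injectMetadata</code>, <code>escapeHtml</code>, and <code>serializeJsonLd</code> are hypothetical helper names.</p>

```typescript
interface PageMetadata {
  title: string;
  description: string;
  jsonLd: Record<string, unknown>;
}

// Escape text for HTML element content and double-quoted attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serialize JSON-LD so a "</script>" inside a string cannot close the tag.
// "\u003c" is a valid JSON escape for "<", so parsers read the same value.
function serializeJsonLd(data: Record<string, unknown>): string {
  return JSON.stringify(data).replace(/</g, "\\u003c");
}

export function injectMetadata(template: string, meta: PageMetadata): string {
  const values: Record<string, string> = {
    TITLE: escapeHtml(meta.title),
    DESCRIPTION: escapeHtml(meta.description),
    JSONLD: serializeJsonLd(meta.jsonLd),
  };
  // One pass over the template: every occurrence is filled, and injected
  // values are never re-scanned for placeholders.
  return template.replace(
    /TITLE|DESCRIPTION|JSONLD/g,
    (token) => values[token] ?? token,
  );
}
```

<p>Using a replacer function also sidesteps the special meaning of <code>$</code> sequences in string replacements, which matters if a title ever contains something like <code>$&amp;</code>.</p>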
<h2>Solution 4: Static Site Generation (SSG) for Key Pages</h2>
<p>If your React SPA uses a build tool like Vite, you can pre-render critical pages at build time using plugins like Vike (formerly <code>vite-plugin-ssr</code>) or <code>vite-react-ssg</code>. Note that <code>vite-ssg</code> itself targets Vue rather than React. Pre-rendering generates static HTML files for your most important pages (homepage, blog posts, product pages) while keeping the SPA experience for dynamic routes.</p>
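<p>Whatever plugin you choose, the underlying build step has the same shape: render each important route to an HTML string and write it where the static host will look for it. The sketch below illustrates that loop under stated assumptions. <code>routeToFile</code> and <code>prerender</code> are hypothetical names, the <code>renderToHtml</code> and <code>writeFile</code> callbacks are supplied by the caller, and the <code>route/index.html</code> layout is a common static-hosting convention rather than something every plugin uses.</p>

```typescript
// Map a route to the static file a host would typically serve for it.
export function routeToFile(route: string): string {
  const clean = route.replace(/^\/+|\/+$/g, ""); // trim leading/trailing slashes
  return clean === "" ? "index.html" : `${clean}/index.html`;
}

// Render each route once at build time and hand the result to a writer.
export async function prerender(
  routes: string[],
  renderToHtml: (route: string) => Promise<string>,
  writeFile: (file: string, html: string) => Promise<void>,
): Promise<string[]> {
  const written: string[] = [];
  for (const route of routes) {
    const file = routeToFile(route);
    await writeFile(file, await renderToHtml(route));
    written.push(file);
  }
  return written;
}

console.log(routeToFile("/"));                // index.html
console.log(routeToFile("/blog/my-article")); // blog/my-article/index.html
```

<p>Routes left out of the list still work as before: the host falls back to the SPA shell for them.</p>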
<h2>Our Approach at AI Prompt Architect</h2>
<p>AI Prompt Architect is a React SPA built with Vite and <code>react-helmet-async</code>. We solve the AI scraper problem using a combination of:</p>
<ul>
<li><strong>Static fallback meta tags</strong> in <code>index.html</code> with <code>data-rh="true"</code> so they're available before JavaScript loads</li>
<li><strong>JSON-LD injection</strong> via our unified <code><SEO></code> component on every page</li>
<li><strong>A comprehensive <code>sitemap.xml</code></strong> with all 161+ URLs for crawler discovery</li>
<li><strong>Proper <code>robots.txt</code></strong> that allows GPTBot, ClaudeBot, and all major crawlers</li>
</ul>
<p>The result: our pages are cited in AI-generated search results and our structured data validates in Google's Rich Results Test. <a href="/signup">See it in action</a> — our entire platform is a living example of React SPA optimisation for AI scrapers.</p>
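<p>The <code>sitemap.xml</code> and <code>robots.txt</code> items in the list above are easy to generate at build time. The following is an illustrative sketch, not our production code: <code>buildRobotsTxt</code> and <code>buildSitemapXml</code> are hypothetical helpers, and the URLs are placeholders.</p>

```typescript
// Build a robots.txt that explicitly allows each listed crawler
// and points all of them at the sitemap.
export function buildRobotsTxt(allowedBots: string[], sitemapUrl: string): string {
  const groups = allowedBots.map((bot) => `User-agent: ${bot}\nAllow: /`);
  return [...groups, `Sitemap: ${sitemapUrl}`].join("\n\n") + "\n";
}

// Escape characters that are not allowed raw inside XML text nodes.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a minimal sitemap with one <url> entry per absolute URL.
export function buildSitemapXml(urls: string[]): string {
  const entries = urls
    .map((url) => `  <url><loc>${escapeXml(url)}</loc></url>`)
    .join("\n");
  return (
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    `${entries}\n` +
    "</urlset>\n"
  );
}

console.log(buildRobotsTxt(["GPTBot", "ClaudeBot"], "https://example.com/sitemap.xml"));
```

<p>Generating both files from the same route list that drives pre-rendering keeps the sitemap from drifting out of sync with the pages that actually exist.</p>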
This article was originally published with extended interactive STCO schemas on AI Prompt Architect.