DEV Community

New Way Capital Advisory

Google indexed 2 of our 9,200 pages. ChatGPT answered questions about 3 of them the same day

Last week I ran site:https://nwc-advisory.com on Google. It returned 2 results.

The same day, our nginx logs showed ChatGPT fetching 3 of our pages in real time to answer user questions. One of them was a Burnham-on-Sea postcode. I don't know who asked.

Here is what 15 hours of crawler traffic looked like across our 12 domains:

| Source | Hits | What it means |
|---|---|---|
| Googlebot | 298 | Crawling aggressively |
| Bingbot | 217 | Crawling aggressively |
| OAI-SearchBot | 20 | Building ChatGPT's search index |
| GPTBot | 14 | OpenAI training data |
| ClaudeBot | 10 | Anthropic crawler |
| ChatGPT-User | 3 | Real users getting our data via ChatGPT |
| PerplexityBot | 0 | |
| Applebot | 0 | |

Google crawled us 298 times in 15 hours and has indexed 2 of our 9,152 canonical pages in the six weeks since we launched. OpenAI's three bots together crawled us 37 times and served 3 live answers from our pages the same day. Anthropic's ClaudeBot added another 10. Perplexity and Apple did not show up at all.

For a small SaaS with no domain authority and few backlinks, these are not the same distribution channel anymore.

The receipts:

Three log lines (I have removed the client IPs but kept everything else):

  • [10/Apr/2026:08:03:43 +0000] "nwc-advisory.com"
    "GET / HTTP/2.0" 200 38145 "-"
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko);
    compatible; ChatGPT-User/1.0; +https://openai.com/bot"

  • [10/Apr/2026:09:06:21 +0000] "property.nwc-advisory.com"
    "GET /prices/sg7-5aa HTTP/2.0" 200 6831 "-"
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko);
    compatible; ChatGPT-User/1.0; +https://openai.com/bot"

  • [10/Apr/2026:09:46:47 +0000] "property.nwc-advisory.com"
    "GET /prices/ta8-1aa HTTP/2.0" 200 6816 "-"
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko);
    compatible; ChatGPT-User/1.0; +https://openai.com/bot"

OpenAI operates three distinct crawlers and our logs show all three in the same window:

  • GPTBot: training data collection
  • OAI-SearchBot: building the search index used inside ChatGPT
  • ChatGPT-User: real-time fetches triggered when a user asks ChatGPT a question

The first two are indexers, like Googlebot or Bingbot. The third is different. A ChatGPT-User hit means someone asked ChatGPT a question, and ChatGPT decided to fetch your page to answer them.

The three pages in our logs:

  • Homepage (https://property.nwc-advisory.com)
  • /prices/sg7-5aa -- Hertfordshire. 134 transactions last year, median £348,750.
  • /prices/ta8-1aa -- Burnham-on-Sea, Somerset. 223 transactions, median £275,000.

Someone asked ChatGPT about UK house prices in Baldock. Someone else asked about Burnham-on-Sea.

They got numbers. They never visited our site.

Why our pages are AI-readable

Nothing exotic. Three boring things.

  1. robots.txt whitelists AI bots

Most of the "how to block ChatGPT" posts got this backwards. We do the opposite:

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /v1/
Disallow: /api/

The API endpoints are protected. The landing pages are wide open. Block what is monetized, allow what you want distributed.
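A policy like this can be sanity-checked locally before deploying it, using Python's stdlib robots.txt parser. The rules below mirror ours; `SomeOtherBot` is a made-up agent that falls through to the default group:

```python
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Disallow: /v1/
Disallow: /api/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# AI bots get the landing pages; unnamed bots hit the default group's disallows.
rp.can_fetch("GPTBot", "/prices/sg7-5aa")      # allowed
rp.can_fetch("SomeOtherBot", "/v1/data")       # blocked
rp.can_fetch("SomeOtherBot", "/prices/sg7-5aa")  # allowed
```

Note that `can_fetch` uses the first user-agent group that matches, which is why the named AI-bot groups take priority over the `*` group.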

  2. Server-rendered HTML, no SPA

Every landing page is a complete HTML document at request time. No React, no Vue, no client-side fetch() for the data. ChatGPT-User does not execute JavaScript; if your data is populated client-side, it is invisible to the fetch. We use a Python script that renders Jinja2 templates into static HTML files at build time: 9,152 pages across 12 domains, all pre-rendered, all around 6KB each.
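A stripped-down sketch of that build step, using stdlib `string.Template` in place of Jinja2 to keep it dependency-free. The template, figures, and output paths are illustrative, not our real pipeline:

```python
from pathlib import Path
from string import Template

# Stand-in for the real Jinja2 template: a complete HTML document per page,
# with all the data baked in at build time.
PAGE = Template(
    "<!doctype html><html><head>"
    "<title>House prices in $postcode</title></head>"
    "<body><h1>$postcode</h1>"
    "<p>$transactions transactions last year, median £$median</p>"
    "</body></html>"
)

PAGES = [
    {"postcode": "SG7 5AA", "transactions": "134", "median": "348,750"},
    {"postcode": "TA8 1AA", "transactions": "223", "median": "275,000"},
]

def build(out_dir: Path) -> list[Path]:
    """Render every page dict into a static HTML file under out_dir/prices/."""
    written = []
    for page in PAGES:
        slug = page["postcode"].lower().replace(" ", "-")
        path = out_dir / "prices" / f"{slug}.html"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(PAGE.substitute(page), encoding="utf-8")
        written.append(path)
    return written
```

The point is the shape, not the templating engine: nginx serves the resulting files directly, so every bot sees the data in the first response.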

  3. IndexNow, not just Google Search Console

IndexNow is a free push protocol that notifies Bing, Yandex, and a few others instantly when you publish a URL. Bing's index directly feeds ChatGPT Search, Grok, and Copilot. We submit every new page via IndexNow within seconds of deploying it. Waiting for Googlebot to re-crawl is slower by orders of magnitude, and for a low-authority domain, Google may never index the page at all.
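The submission itself is a single JSON POST. A minimal sketch against the public IndexNow endpoint; host, key, and URLs are placeholders, and the protocol requires the key file to actually be served at `keyLocation`:

```python
import json
import urllib.request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"

def build_payload(host: str, key: str, urls: list[str]) -> dict:
    """IndexNow batch body: the key proves ownership via a file hosted on the domain."""
    return {
        "host": host,
        "key": key,
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }

def submit(host: str, key: str, urls: list[str]) -> int:
    body = json.dumps(build_payload(host, key, urls)).encode("utf-8")
    req = urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:  # 200 or 202 means accepted
        return resp.status
```

One call per deploy with the batch of new URLs is enough; the protocol shares submissions across the participating engines.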

Here is the Python snippet we use to classify bot traffic when we parse our nginx logs:

```python
AI_BOTS = {
    'ChatGPT-User': 'Live query - user asking ChatGPT now',
    'OAI-SearchBot': 'ChatGPT search index',
    'GPTBot': 'OpenAI training data',
    'ClaudeBot': 'Anthropic crawler',
    'PerplexityBot': 'Perplexity search',
    'Bytespider': 'ByteDance / TikTok',
    'Applebot': 'Apple / Siri',
}

SEARCH_BOTS = {
    'Googlebot': 'Google Search',
    'bingbot': 'Bing Search',
    'DuckDuckBot': 'DuckDuckGo',
}

def classify(user_agent: str) -> str:
    ua = user_agent.lower()
    for bot, purpose in AI_BOTS.items():
        if bot.lower() in ua:
            return f"AI: {purpose}"
    for bot, purpose in SEARCH_BOTS.items():
        if bot.lower() in ua:
            return f"Search: {purpose}"
    return "Human"
```

Run that across your access log and you will see AI traffic showing up in categories you do not have dashboards for yet.

What we built that Google ignored

Because I want to be precise about this: we did the SEO work. Not a rushed job. Over six weeks we shipped:

  • 9,152 canonical tags across 12 domains
  • hreflang tags linking en to fr on the bilingual property apps
  • Sitemaps on all 12 domains, 9,157 URLs total
  • IndexNow batch submissions, 9,157 URLs pushed to Bing and Yandex
  • Google Search Console verified on every domain, sitemaps submitted
  • Disallow: /v1/ and Disallow: /api/ on all property apps so crawlers do not waste their budget on API endpoints
  • FAQPage JSON-LD on every landing page for rich snippets
  • Open Graph and Twitter Card meta on every domain

Result after six weeks: 2 URLs indexed on Google. The homepage (nwc-advisory.com/) and one NYC property page (property-nyc.nwc-advisory.com/comps/11225). That is the entire Google index for our brand.

This is not a Google bug. It is a domain authority floor -- Google rate-limits indexing for new domains without backlinks. The old playbook was: wait three to six months, build backlinks, eventually get indexed. The new playbook is: while you are waiting for Google, OpenAI and Microsoft are already crawling you and serving your data to their users.

Six takeaways if you are building something data-heavy

  1. Allow AI bots in robots.txt. Block sensitive paths, not the crawlers themselves.
  2. Server-render every data page. ChatGPT-User does not execute JavaScript. If your data is in a fetch() call, it is invisible.
  3. Submit to IndexNow. Bing feeds ChatGPT Search, Grok, and Copilot. Waiting for Google is slower and less likely to work for new domains.
  4. Write long-tail pages for long-tail queries. ChatGPT's live fetches in our logs were for specific UK postcodes. One page per specific question.
  5. Parse your access logs weekly. Grep for ChatGPT-User, OAI-SearchBot, ClaudeBot, PerplexityBot. You will see AI distribution working before it shows up in any dashboard.
  6. Keep doing SEO for Google anyway. Do both. But calibrate your expectations for the first six months - if your domain is new, Google is not a near-term channel.
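Takeaway 5 in script form: a sketch that tallies AI-bot hits straight from an nginx access log. The two sample lines are shortened, invented stand-ins for the real entries above:

```python
import re
from collections import Counter

AI_BOT_NAMES = ["ChatGPT-User", "OAI-SearchBot", "GPTBot", "ClaudeBot", "PerplexityBot"]

# In the nginx "combined" log format, the user-agent is the last quoted field.
UA_RE = re.compile(r'"([^"]*)"\s*$')

def tally(log_lines):
    """Count hits per AI bot across an iterable of access-log lines."""
    counts = Counter()
    for line in log_lines:
        match = UA_RE.search(line)
        if not match:
            continue
        ua = match.group(1).lower()
        for bot in AI_BOT_NAMES:
            if bot.lower() in ua:
                counts[bot] += 1
                break
    return counts

SAMPLE = [
    '1.2.3.4 - - [10/Apr/2026:09:06:21 +0000] "GET /prices/sg7-5aa HTTP/2.0" 200 6831 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot"',
    '1.2.3.4 - - [10/Apr/2026:09:10:02 +0000] "GET / HTTP/2.0" 200 38145 "-" '
    '"Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
]
```

Pipe a week of logs through `tally` and you get the table at the top of this post for free.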

Top comments (1)

Adnan Hasan

This is an eye-opening comparison that really challenges the "Google-first" SEO mindset. Your observation that ChatGPT-User fetches are happening in real-time for specific long-tail queries, even while Google is still gated by domain authority floors, is a major shift in how we should think about distribution.

The technical breakdown—especially regarding server-rendered HTML and the IndexNow protocol—is incredibly practical. It's a clear reminder that the "invisible" web for client-side JS is becoming a bigger liability in an AI-driven search landscape.

Thanks for sharing the nginx log classification script too—that's a great tool for anyone wanting to see the "hidden" AI traffic they’re already getting!