Originally published at thatdevpro.com. Part of ThatDevPro's open SEO + AI framework library. ThatDevPro is an SDVOSB-certified veteran-owned web + AI engineering studio. Open-source AI citation toolkit: github.com/Janady13/aio-surfaces.
Earning Citations and Brand Mentions on OpenAI's Web Search Augmented Surfaces: ChatGPT Search, SearchGPT, ChatGPT Agent and Atlas
A comprehensive installation and audit reference for winning citation and brand mention on OpenAI's search surface family. ChatGPT Search (browsing inside ChatGPT, launched October 31, 2024 per OpenAI's announcement), the SearchGPT prototype (July 25, 2024 per OpenAI), the ChatGPT agent (integrated July 17, 2025, replacing the Operator preview that shut down August 31, 2025), and ChatGPT Atlas (the OpenAI Chromium based browser, October 21, 2025) share a bot family, an index dependency, and a retrieval pattern, but each surface produces distinct citation behavior. This framework specifies the structural, schema, freshness, entity, and access signals that drive citation on those surfaces, the bot family and how each member behaves, the audit rubric, and the maintenance cadence. Dual purpose: installation manual and audit document.
Cross stack implementation note: code samples in this framework are written in plain HTML for clarity. For React, Vue, Svelte, Next.js, Nuxt, SvelteKit, Astro, Hugo, 11ty, Remix, WordPress, Shopify, and Webflow equivalents see framework-cross-stack-implementation.md. For pure client rendered SPAs (no SSR/SSG) see framework-react.md. For Tailwind specific concerns see framework-tailwind.md.
1. Document Purpose
1.1 What This Document Is
This is the canonical operational reference for citation on OpenAI's web search augmented surfaces. ChatGPT reached approximately 800 million weekly active users by October 2025 and approximately 900 million by February 2026, per TechCrunch and the Slashdot summary of Sam Altman's October 2025 statement. It processes approximately 2.5 billion prompts per day and 18 billion messages per week, per DemandSage's 2026 statistics aggregation. ChatGPT Search alone is estimated at approximately 250 to 500 million weekly queries per the Similarweb 2026 AI Search report, placing it among the top five global search properties by query volume.
The OpenAI surface family is structurally distinct from Google's AI Overview surface. Different bot family. Different index dependency. Different reading mode behavior. Different citation density per response. Different brand mention to citation ratio. Different freshness window. Different source preferences (Wikipedia and Reddit dominance, per the 5W Public Relations Q1 2026 research finding 13.15 percent Wikipedia and 11.97 percent Reddit share of US ChatGPT citations across the analyzed sample). Optimization that earns Google AI Overview citation does not automatically earn ChatGPT Search citation, and vice versa. This document covers the OpenAI specific signals.
For broader multi engine AI citation work spanning Claude, Perplexity, Gemini, Copilot, Grok, and Meta AI see framework-aicitations.md. For Google AI Overviews and AI Mode see framework-aioverviews.md. For the substrate doctrine that every AI surface optimization depends on see framework-contentfirst.md.
1.2 Three Operating Modes
Mode A, Install Mode. Build ChatGPT Search ready infrastructure on a new or existing site. Follow Sections 2 through 14 in order.
Mode B, Audit Mode. Evaluate an existing site for current OpenAI surface citation state. Skip to Section 13.
Mode C, Hybrid Mode. Audit first, then install for failing items.
1.3 How Claude Code CLI Should Consume This Document
- Read Section 2 and collect client variables, especially current ChatGPT citation state, llms.txt status, and bot access status.
- Run Section 13 audit on representative pages to baseline current state.
- Apply Section 5 reading mode and Section 6 ranking signals to priority pages.
- Configure Section 8 robots.txt and Section 9 llms.txt access patterns.
- Establish Section 13.4 first 90 days tracking before measuring lift.
- Apply Section 14 maintenance cadence.
- Generate the Section 14 report.
1.4 Conflict Resolution Rules
| Conflict | Rule |
|---|---|
| Page ranks classically but fails reading mode | Critical. Apply Section 5. ChatGPT bots do not execute JS. |
| OAI-SearchBot disallowed | Critical. Section 8 specifies how to allow it while keeping GPTBot opt out. |
| GPTBot disallowed but OAI-SearchBot allowed | Acceptable. Bots independently controlled per OpenAI. Section 4. |
| llms.txt as primary AI surface map | Caution. No major AI company has committed to reading it in production per SERanking November 2025. Section 9. |
| FAQ accordion built with JS reveal | Replace with <details> and <summary>. Content must live in DOM at first byte. |
| Client side rendered SPA | Critical. Bots produce empty output. Apply framework-contentfirst.md first. |
1.5 Required Tools
- curl with custom user agent for substrate verification
- Server log access to identify GPTBot, OAI-SearchBot, ChatGPT-User, and ChatGPT agent traffic by user agent and source IP range
- An AI citation tracking tool: Profound, Otterly, Athena HQ, or equivalent (comparison in framework-aicitations.md)
- Manual sampling discipline: weekly priority query sampling in ChatGPT Search with browsing enabled
- IndexNow submission infrastructure (Bing IndexNow) for freshness signaling since ChatGPT Search retrieval still uses Bing infrastructure as its primary index layer per the Yoast and ClickRank 2026 ChatGPT Search analysis
1.6 Relationship to Neighboring Frameworks
This framework covers ChatGPT Search, SearchGPT, ChatGPT agent, and ChatGPT Atlas. The broader AI citation surface lives in framework-aicitations.md. The Google AI Overview and AI Mode surface lives in framework-aioverviews.md. The substrate prerequisite lives in framework-contentfirst.md. The four pillars architecture (SEO, AEO, AIO, GEO) lives in SEO-Search-Appearance.md. The agentic browsing playbook (ChatGPT agent, Operator successors, Atlas) cross references framework-agenticaisearch.md.
2. Client Variables Intake
# SEARCHGPT AND CHATGPT SEARCH FRAMEWORK CLIENT VARIABLES
# --- Business and Site Identity (REQUIRED) ---
business_name: ""
primary_domain: ""
business_industry: ""
# --- Current ChatGPT Citation State (REQUIRED) ---
chatgpt_search_citation_status: "" # "regularly_cited" | "occasionally_cited" | "rarely_cited" | "never_cited" | "unknown"
queries_currently_cited_in_chatgpt_search: 0
priority_queries: [] # 10 to 25 queries the site targets for ChatGPT Search citation
priority_queries_with_browsing_triggered: 0 # of the priority queries, how many trigger ChatGPT Search vs being answered from training only
brand_mention_rate_in_chatgpt: "" # "high" | "medium" | "low" | "none"
brand_mention_vs_citation_ratio: 0 # mentions per cited link, BrightEdge benchmark 3.2
# --- Bot Access Status (REQUIRED) ---
gptbot_allowed: false # for training data; site policy may explicitly opt out
oai_searchbot_allowed: false # for ChatGPT Search retrieval. Block this and citation is impossible.
chatgpt_user_allowed: false # user invoked browsing fetches
chatgpt_agent_allowed: false # agent browser fetches
verified_via_server_logs: # which bots have been observed in logs
- ""
robots_txt_path: "" # default /robots.txt
# --- llms.txt Status (REQUIRED) ---
llms_txt_exists: false # /llms.txt curated AI surface map
llms_full_txt_exists: false # /llms-full.txt expanded content version
llms_txt_last_updated: ""
# --- Content First Baseline (REQUIRED, see framework-contentfirst.md) ---
contentfirst_score: 0 # out of 30
substrate_in_first_byte: false
schema_in_first_byte: false
js_required_for_primary_content: false
# --- Reading Mode Readiness (REQUIRED) ---
curl_gptbot_useragent_returns_full_html: false # substrate test pass
heading_hierarchy_clean: false # single H1, logical H2 to H6
faq_uses_details_summary: false # not JS accordion
tables_for_comparison_data: false # <table>, not prose
ordered_lists_for_sequential: false # <ol>, not prose
# --- Bing Index Coverage (REQUIRED, ChatGPT Search retrieval foundation) ---
indexed_in_bing: false # Bing Webmaster Tools verification
bing_indexnow_configured: false # for sub minute new content notification
bing_priority_pages_indexed: 0 # of priority pages, how many appear in Bing top 10 for target query
# --- Freshness Posture (REQUIRED) ---
dateModified_visible_and_in_schema: false
content_refresh_cadence: "" # "weekly" | "monthly" | "quarterly" | "annual" | "ad_hoc"
percent_priority_pages_updated_last_90_days: 0
# --- Tracking (REQUIRED) ---
chatgpt_citation_tracker: "" # "Profound" | "Otterly" | "AthenaHQ" | "manual_only" | "none"
manual_sampling_cadence: "" # default "weekly_top_10_priority"
server_log_chatgpt_bot_monitoring: false
Citation defense work cannot start until contentfirst_score is at least 22, OAI-SearchBot is allowed in robots.txt, and a baseline citation sample exists for the priority query set. Sites failing those dependencies route back to those frameworks first.
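The gate described above can be sketched as a small check. This is a minimal sketch, assuming the Section 2 variables have been loaded into a plain dict; `baseline_sample_size` is a hypothetical key that is not part of the intake template and stands in for "a baseline citation sample exists".

```python
def citation_work_blockers(client: dict) -> list:
    """Return unmet prerequisites; an empty list means citation work can start."""
    blockers = []
    if client.get("contentfirst_score", 0) < 22:
        blockers.append("contentfirst_score below 22: return to framework-contentfirst.md")
    if not client.get("oai_searchbot_allowed", False):
        blockers.append("OAI-SearchBot is not allowed in robots.txt")
    # Hypothetical key: number of priority queries already sampled for a baseline.
    if client.get("baseline_sample_size", 0) <= 0:
        blockers.append("no baseline citation sample for the priority query set")
    return blockers
```

An empty intake fails all three checks, which matches the rule that such sites route back to the prerequisite frameworks first.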
3. What SearchGPT and ChatGPT Search Are
3.1 The Three OpenAI Search Surfaces
OpenAI does not operate a single search product. It operates a family of web search augmented surfaces. As of Q2 2026, the family includes:
ChatGPT Search. The browsing capability inside ChatGPT itself. Triggers automatically when a user query benefits from current web data, or on explicit user invocation. Launched October 31, 2024 per OpenAI's launch announcement (initially Plus and Team tier, later free tier). Approximately 18 percent of ChatGPT conversations now trigger at least one web search per the Profound 7,000 query analysis published 2025. Average 2 searches per query, each search 5 to 6 words long, per the same Profound study. This is the dominant OpenAI search surface by query volume and the primary optimization target of this framework.
SearchGPT. Originally a dedicated standalone search interface, launched July 25, 2024 as a prototype per OpenAI's prototype announcement. The standalone SearchGPT was always described by OpenAI as temporary, with the intent to integrate the search features directly into ChatGPT. In practice, the SearchGPT prototype became ChatGPT Search in October 2024. The "SearchGPT" name is now mostly historical, though the OAI-SearchBot user agent retains the prototype era naming and OpenAI's bot documentation still references the SearchGPT prototype on the OAI-SearchBot details page.
ChatGPT agent and Atlas. ChatGPT agent (announced July 17, 2025 per OpenAI's agent announcement) is the unified agentic capability that replaced the earlier Operator preview (Operator launched January 23, 2025 per OpenAI's Operator announcement, shut down August 31, 2025 per the TechCrunch and Wikipedia summaries). ChatGPT agent is integrated into ChatGPT itself, accessible through the tools dropdown. ChatGPT Atlas (launched October 21, 2025 per OpenAI's Atlas announcement) is OpenAI's Chromium based browser with ChatGPT and agent mode built into the interface. Both surfaces use ChatGPT-User (and in some cases ChatGPT agent specific user agents) for the actual fetches, with the agent mode making real browser style requests during task execution. These are the surfaces where citation matters because ChatGPT acts on behalf of the user.
3.2 Why These Are A Distinct Optimization Target
The OpenAI surface family is not interchangeable with Google AI Overviews. Two studies establish the divergence pattern:
- Cross surface overlap is low. Ahrefs February 2026 study found only 12 percent of AI cited URLs rank in Google's top 10 for the original prompt. Per the same study, only 6.82 percent of ChatGPT results overlap with Google's top 10 organic results. 28 percent of ChatGPT's most cited pages have zero organic visibility in Google search.
- Source preference diverges. Per Profound's AI platform citation patterns analysis 2025, ChatGPT prioritizes Wikipedia (approximately 48 percent of its citation surface in the analyzed sample) and Reddit, while Google AI Overviews features Reddit more prominently. The 5W Public Relations Q1 2026 study found Wikipedia at 13.15 percent and Reddit at 11.97 percent of US ChatGPT citations across a multi week sample. Outside Wikipedia and Reddit, no domain exceeds 3 percent of ChatGPT citations.
The combined pattern: ChatGPT Search citation is determined by a different ranking signal weighting, a different index source, a different bot family, and a different surface preference distribution than Google AI Overviews. The frameworks must be separated.
3.3 The 2026 State Of ChatGPT Search
Five data points define the surface for planning purposes:
- Approximately 900 million weekly active users (February 2026 per TechCrunch).
- Approximately 2.5 billion prompts per day, 18 billion messages per week (DemandSage 2026 aggregation).
- Approximately 18 percent of ChatGPT conversations trigger at least one web search (Profound 7,000 query analysis 2025).
- Approximately 250 to 500 million weekly queries pass through the ChatGPT Search retrieval layer (Similarweb 2026 AI Search report estimate).
- ChatGPT Search citation tracking by independent third party tools (Profound, Otterly, AthenaHQ) is now mature and treated as a primary visibility KPI by enterprise SEO teams alongside Google Search Console.
These five data points move citation work on this surface from a speculative bet (as it was in 2024) to a measurable, defensible visibility channel (as it is in Q2 2026).
3.4 Four Pillars Context
ChatGPT Search citation work sits in the GEO pillar (Generative Engine Optimization, broader AI citation across the multi engine surface) of the four pillars architecture: SEO (classic ten blue links), AEO (Answer Engine Optimization), AIO (Google AI Overview, see framework-aioverviews.md), GEO (this framework and framework-aicitations.md). The four pillars are independent surfaces. A page can win SEO and fail GEO, or win GEO and fail SEO. Each requires its own optimization pass.
4. The OpenAI Bot Family
OpenAI's bot family has four members as of Q2 2026. Each serves a distinct purpose. Each is controlled independently in robots.txt per OpenAI's bot documentation. Treating them as interchangeable is the single most common ChatGPT Search optimization mistake.
4.1 GPTBot (Training)
Purpose: collects publicly accessible content for training OpenAI's foundation models per OpenAI's bot documentation.
User agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.2; +https://openai.com/gptbot. IP ranges at https://openai.com/gptbot.json.
Blocking GPTBot opts out of training. Blocking GPTBot does not block ChatGPT Search citation. The two are controlled independently. Critical distinction. Per OpenAI, GPTBot prioritizes thoroughness; crawl frequency is a function of perceived training value.
4.2 OAI-SearchBot (ChatGPT Search Retrieval)
Purpose: crawls to index content surfaced as citations in ChatGPT Search per OpenAI's bot documentation and the publishers FAQ.
User agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot. IP ranges at https://openai.com/searchbot.json.
This bot determines citation eligibility. Blocking it makes the site invisible to ChatGPT Search. The OpenAI publisher documentation is explicit: a site can allow OAI-SearchBot while disallowing GPTBot, surfacing in search while opting out of training. Per OpenAI, OAI-SearchBot prioritizes freshness over thoroughness; crawl frequency is sporadic compared to Googlebot or Bingbot.
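The independent control described above can be verified offline with Python's standard `urllib.robotparser`. This is a sketch only: the `ROBOTS_TXT` policy below is an illustrative example of "opt out of training, stay eligible for search", not the file Section 8 prescribes, and the URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy (assumption): block training, allow search and live fetches.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
"""

def bot_access(robots_txt: str, url: str,
               bots=("GPTBot", "OAI-SearchBot", "ChatGPT-User")) -> dict:
    """Report, per bot token, whether the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}

access = bot_access(ROBOTS_TXT, "https://example.com/priority-page/")
```

With this policy, `access` reports GPTBot as blocked and the other two as allowed. Note that `urllib.robotparser` applies the first matching rule within a group, so results on files with overlapping Allow and Disallow paths may differ from a crawler that uses longest-match precedence.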
4.3 ChatGPT-User (User Invoked Browsing)
Purpose: fetches pages when a ChatGPT user triggers browsing during a conversation per OpenAI's bot documentation.
User agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot. IP ranges at https://openai.com/chatgpt-user.json.
Operates on demand. Visits when conversation needs current web data. Not systematic indexing. ChatGPT-User traffic surged approximately 2,825 percent YoY per industry bot traffic reports 2025 (Search Engine Journal coverage of AlliAI's analysis: ChatGPT-User volume now exceeds Googlebot by approximately 3.6x). Blocking ChatGPT-User blocks live fetch during a user's conversation. Distinct from OAI-SearchBot pre indexing. Allow both for full ChatGPT Search visibility.
4.4 ChatGPT Agent And Atlas Browsing
The ChatGPT agent (and Atlas browser host) makes web requests during agentic task execution per OpenAI's agent and Atlas announcements. Traffic uses ChatGPT-User identification with potential additional context headers per Simon Willison's August 2025 ChatGPT agent user agent analysis. Blocking ChatGPT-User blocks agent access during a task. For citation purposes, treat ChatGPT-User and ChatGPT agent as a single access category. See framework-agenticaisearch.md.
4.5 Bot Family Summary
| Bot | Purpose | Frequency | Blocking Impact |
|---|---|---|---|
| GPTBot | Training | Sporadic, thorough | Opts out of training. No ChatGPT Search citation impact. |
| OAI-SearchBot | ChatGPT Search index | Sporadic, fresh | Removes from ChatGPT Search citation eligibility. |
| ChatGPT-User | User invoked browsing | On demand, surging 2,825 percent YoY | Removes from live ChatGPT browsing. |
| ChatGPT agent | Agentic task execution | On demand | Removes from agentic task visibility. |
Default posture: allow OAI-SearchBot, allow ChatGPT-User, make a deliberate decision on GPTBot per training opt out policy.
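For server log work, the bot family can be separated by product token and then confirmed by source IP. The sketch below assumes you have already extracted CIDR strings from the published range files listed in Sections 4.1 to 4.3; the layout of those JSON files is not shown here, and the `192.0.2.0/24` range is a documentation placeholder, not a real OpenAI range.

```python
import ipaddress
from typing import Optional

# Product tokens taken from the user agent strings in Sections 4.1 to 4.3.
OPENAI_TOKENS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")

def classify_openai_bot(user_agent: str) -> Optional[str]:
    """Return the OpenAI product token found in a user agent string, if any."""
    ua = user_agent.lower()
    for token in OPENAI_TOKENS:
        if token.lower() in ua:
            return token
    return None

def ip_in_ranges(ip: str, cidrs: list) -> bool:
    """True when ip falls inside at least one CIDR block."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)

def verified_bot(user_agent: str, ip: str, ranges_by_bot: dict) -> Optional[str]:
    """Trust the claimed bot only when the source IP matches that bot's ranges."""
    bot = classify_openai_bot(user_agent)
    if bot and ip_in_ranges(ip, ranges_by_bot.get(bot, [])):
        return bot
    return None
```

Requiring both checks matters because user agent strings are trivially spoofed; a matching token from an unlisted IP should be counted as unverified traffic, not as a bot visit.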
5. Reading Mode Behavior
5.1 What Reading Mode Is
Reading mode is the plain HTML parse mode AI engine bots use. None of OpenAI's bots execute JavaScript, per the Prerender 2025 web crawler analysis and multiple other 2025 analyses (xSeek, Known Agents, Hall AI, Search Engine Land coverage). The first byte HTML returned by the server is the entire surface available for ChatGPT Search ingestion. CSS is not applied. Images do not load. JavaScript does not execute. The bot reads the document the way curl reads it.
This is the single highest leverage signal for ChatGPT Search citation. A site that fails the reading mode substrate test is invisible regardless of every other optimization. A site that passes it is in the candidate pool.
5.2 Reading Mode Prevalence And Bounce Behavior
Industry crawl behavior analysis of ChatGPT bot traffic across millions of 2025 fetches (Vercel "rise of the AI crawler" 2025 analysis, Search Engine Journal coverage of AlliAI's ChatGPT-User volume data showing 3.6x Googlebot) establishes the framework's planning numbers. Approximately 46 percent of ChatGPT bot crawls begin and substantially end in plain HTML reading mode, consuming only the unrendered first byte response. Approximately 63 percent of pages requiring JavaScript for primary content result in a bot bounce. If reading mode produces an empty extraction, the page is not in the citation candidate pool.
A reading mode bounce is silent. No error reaches the publisher. The site simply is not cited. The publisher continues to optimize for human visitors, never realizing the bot driving ChatGPT Search citation has been seeing nothing.
5.3 The Reading Mode Substrate Test
The single most important test in this framework:
curl -A "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.2; +https://openai.com/gptbot" \
-s https://example.com/priority-page/ | less
Read the output. Look for:
- The H1 of the page
- The lede paragraph immediately under the H1
- All H2 section headings
- FAQ content (questions and answers as actual text)
- Comparison data (in tables, not just prose)
- Procedural content (in ordered lists, not just prose)
- Schema JSON-LD blocks (as actual text, not injected by JS)
- Author byline, dateModified, internal links
If any of these are missing from the curl output, they will be missing from what OAI-SearchBot, ChatGPT-User, and ChatGPT agent see. The page fails reading mode regardless of how it looks in a browser. Fix the substrate before proceeding with any other ChatGPT Search work.
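Reading the curl output by eye does not scale past a few pages. The checklist above can be roughly automated by saving the curl output to a file and scanning the raw HTML. This is a sketch, assuming that H1, H2, and JSON-LD are treated as the "core" items; that choice and the function names are illustrative, not part of the framework.

```python
from html.parser import HTMLParser

class SubstrateScan(HTMLParser):
    """Counts reading mode elements present in raw, unrendered HTML."""

    def __init__(self):
        super().__init__()
        self.counts = {"h1": 0, "h2": 0, "table": 0, "ol": 0, "details": 0, "json_ld": 0}

    def handle_starttag(self, tag, attrs):
        if tag in self.counts:
            self.counts[tag] += 1
        elif tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.counts["json_ld"] += 1

def substrate_report(html: str) -> dict:
    """Return element counts for the first byte HTML of a page."""
    scan = SubstrateScan()
    scan.feed(html)
    scan.close()
    return scan.counts

def missing_core(html: str) -> list:
    """List the core items (assumed: h1, h2, json_ld) absent from the HTML."""
    counts = substrate_report(html)
    return [key for key in ("h1", "h2", "json_ld") if counts[key] == 0]
```

A non-empty result from `missing_core` means the page fails the substrate test for at least one item. An empty result is not a pass on its own: the lede, FAQ answers, byline, and internal links still need a manual read.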
5.4 Heading Hierarchy In Reading Mode
In reading mode, the bot parses by document structure, not by rendered layout. The H1 to H6 hierarchy carries citation weight that no visual styling provides. The framework standard:
- One H1 per page, containing the page's primary topic
- H2 sections that decompose the topic into the sub queries a ChatGPT Search might generate
- H3 within H2 where additional depth is needed
- No skipping levels (H1 to H3 without H2 is a failure)
The ChatGPT search retrieval layer (Bing infrastructure with OpenAI ranking on top, see Section 6) heavily weights heading structure as a signal of content organization. Pages with messy or absent heading hierarchy lose to pages with clean hierarchy regardless of word count.
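The hierarchy rules above (exactly one H1, no skipped levels) are mechanical enough to lint. This is a minimal sketch using a regular expression over the raw HTML; it will also count heading tags that appear inside comments or scripts, so treat its output as a prompt for review, not a verdict.

```python
import re

def heading_issues(html: str) -> list:
    """Flag a missing or duplicate H1 and any jump of more than one level."""
    levels = [int(n) for n in re.findall(r"<h([1-6])\b", html, flags=re.IGNORECASE)]
    issues = []
    if levels.count(1) != 1:
        issues.append(f"expected exactly one H1, found {levels.count(1)}")
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:  # moving deeper by more than one level skips a heading
            issues.append(f"skipped level: H{prev} followed by H{cur}")
    return issues
```

Moving back up the hierarchy (an H3 followed by an H2) is not flagged, since closing a subsection is normal document structure.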
5.5 Direct Answer Density
Per Search Engine Land's coverage of the December 2025 content study (44 percent of ChatGPT citations come from the first third of the content) and the Profound and Stridec 2025 studies on direct answer patterns, ChatGPT Search prefers content that places the answer near the start of the section. The framework standard:
- Lede paragraph under H1: 40 to 75 words, directly answering the page's central question
- Every H2 opens with a 40 to 75 word direct answer paragraph before elaboration
- Answer capsules: the Stridec study found an identifiable answer capsule in 72.4 percent of cited blog posts, meaning a concise, self contained 40 to 60 word explanation placed directly after the section heading
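The word count targets above can be checked per section. This sketch assumes you have already extracted each heading and the first paragraph beneath it (the extraction step is not shown); the 40 to 75 default mirrors the framework standard, and the tighter 40 to 60 capsule range can be passed in explicitly.

```python
def capsules_out_of_range(sections, low: int = 40, high: int = 75) -> list:
    """sections: list of (heading, opening_paragraph) pairs.

    Returns (heading, word_count) for every opening paragraph outside [low, high].
    """
    flagged = []
    for heading, opening in sections:
        count = len(opening.split())  # whitespace-delimited word count
        if not low <= count <= high:
            flagged.append((heading, count))
    return flagged
```

A flagged section is not necessarily wrong, but it is a candidate for moving the direct answer up and trimming it to capsule length.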
5.6 Reading Mode Friendly Element Selection
Per multiple 2025 and 2026 AI citation studies, including kime.ai and the Stridec analysis, certain HTML elements outperform their prose equivalents for AI citation:
| Element | When To Use | Citation Lift vs Prose |
|---|---|---|
| <table> | Comparison data, specs, pricing tiers | Approximately 4x |
| <ol> | Sequential procedures, ranked lists | Approximately 2.7x |
| <ul> | Unordered enumerations, feature lists | Approximately 1.8x |
| <dl> <dt> <dd> | Term definitions, glossaries | Approximately 2x |
| <details> <summary> | FAQ blocks | Equivalent extraction to plain Q and A, with progressive disclosure UX |
| <blockquote> with <cite> | Expert quotes, sourced statements | Citation friendly attribution |
The conversion from prose to structured element is mechanical and has no downside for human readers (with appropriate styling). Sites that systematically convert comparison prose to tables and procedural prose to ordered lists see measurable ChatGPT Search citation lift across the converted pages.
5.7 JavaScript Reveal Patterns That Fail Silently
The patterns below produce a passable human experience and a complete reading mode failure. They are the most common ChatGPT Search invisibility causes in the field.
| Pattern | Failure Mode |
|---|---|
| FAQ accordion built with aria-expanded toggling and JS DOM injection | Q and A content not in initial HTML; bot sees only the question summaries, never the answers. |
| Tab interfaces where only the active tab's content is in initial DOM | Bot sees only the default tab; content behind other tabs is invisible. |
| Infinite scroll or "load more" content that fetches additional items via JS | Bot sees only the initial item count. |
| Client side rendered React, Vue, or Svelte component trees with empty <div id="root"> | Bot sees the empty root and no content at all. |
| Schema markup injected via Google Tag Manager or client side JS | Bot sees no schema. The schema is invisible to ChatGPT Search retrieval. |
| Hero or feature copy delivered via JS animation reveal | Bot sees the empty placeholder. |
| Cookie consent gates that hide content until JS resolves consent state | Bot sees the consent shell, never the gated content. |
For every pattern above, the fix is the same: server render the content. The visual reveal can still happen for human users (via progressive enhancement layered on top of the substrate). The substrate must contain the actual content in the first byte.
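Several of the patterns above share one symptom: the first byte HTML carries almost no visible text. A rough detector can count words outside script, style, and template elements. This is a sketch; the 150 word threshold is an arbitrary assumption for illustration and should be tuned per site.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Counts words in text nodes, ignoring script, style, and template content."""
    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())

def looks_like_js_shell(html: str, min_words: int = 150) -> bool:
    """True when the raw HTML carries fewer visible words than min_words (assumed threshold)."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.words < min_words
```

Run it against the same saved curl output used for the substrate test. A page that looks complete in a browser but returns True here is a likely silent failure.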
6. ChatGPT Search Ranking Signals
Ordered by leverage (high to low) for ChatGPT Search citation specifically. Each tagged as Shared with classic SEO or ChatGPT specific.
6.1 Bing Index Presence (HIGHEST leverage, ChatGPT Search specific)
ChatGPT Search retrieval is built on Bing's index as its primary real time retrieval layer, per multiple 2025 and 2026 analyses (Yoast 2026 ChatGPT Search explainer, ClickRank 2026 ChatGPT Search indexing guide, Martech Zone IndexNow analysis). OpenAI has stated that "we use a set of services and Bing is an important one" per the CNBC and GeekWire October 2024 ChatGPT Search launch coverage. OpenAI has layered its own retrieval and ranking refinements on top of the Bing foundation across 2025, but as of Q2 2026 a page that does not appear in Bing's index for a query has near zero chance of being cited in ChatGPT Search for that query.
Triage check: query the page's target topic in Bing. If the page is not in the top 20 organic Bing results, ChatGPT Search retrieval will rarely surface it.
Operational tactic: use Bing Webmaster Tools to verify indexing. Configure Bing IndexNow to push new content to Bing within seconds. ChatGPT Search retrieval memory updates within hours for high authority news sites and typically 24 to 72 hours for standard websites per the ClickRank 2026 analysis.
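The IndexNow push can be sketched with the standard library. The example below builds the JSON POST request defined by the IndexNow protocol but does not send it. The host, key, and URL are placeholders, and the key file location assumes the key is hosted as a text file at the site root.

```python
import json
from urllib.request import Request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"

def build_indexnow_request(host: str, key: str, urls) -> Request:
    """Build (but do not send) an IndexNow submission for a batch of URLs."""
    payload = {
        "host": host,
        "key": key,
        # Assumes the key file is served from the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )

request = build_indexnow_request(
    "example.com", "abc123", ["https://example.com/priority-page/"]
)
```

Passing `request` to `urllib.request.urlopen` would submit it. Wiring this into the publish step of a CMS or build pipeline keeps the notification tied to genuine content changes.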
6.2 Server Rendered Substrate (HIGHEST leverage, ChatGPT specific)
The first byte server response is the entire surface available for ChatGPT bot parsing. See Section 5. Without this, no other signal matters.
Triage: curl -A "GPTBot" https://example.com/page/ must return H1, lede, H2s, FAQ content, schema JSON-LD. Missing any one means the page fails the substrate signal.
6.3 Freshness And dateModified (HIGH leverage, Shared with bias toward freshness)
Per multiple 2025 and 2026 ChatGPT citation studies (Ahrefs 1.4M prompt analysis, Stridec analysis, ClickRank analysis), content updated within the last 30 days receives approximately 3.2 times more citations than older material. Per the same Ahrefs analysis, 89.7 percent of pages cited by ChatGPT had been updated in 2025, and 60.5 percent were published within the last two years.
The framework standard:
- dateModified visible in page UI and present in schema
- Genuine substantive update, not date manipulation (a date refresh with no content change is detected by both ChatGPT Search retrieval and Google's HCS layer, per framework-hcs.md)
- Time sensitive content refreshed on a documented cadence (Section 14)
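To audit the freshness standard at scale, the age of a page can be read from its server rendered JSON-LD. This is a minimal sketch, assuming the JSON-LD block has already been extracted as a string and that `dateModified` is an ISO 8601 date or datetime. It reports age only and cannot tell a substantive update from a date bump.

```python
import json
from datetime import date
from typing import Optional

def find_date_modified(node) -> Optional[str]:
    """Depth-first search for the first dateModified value in parsed JSON-LD."""
    if isinstance(node, dict):
        if "dateModified" in node:
            return node["dateModified"]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_date_modified(child)
        if found:
            return found
    return None

def days_since_modified(json_ld: str, today: date) -> Optional[int]:
    """Days between dateModified and today, or None when the field is absent."""
    value = find_date_modified(json.loads(json_ld))
    if value is None:
        return None
    modified = date.fromisoformat(value[:10])  # keep only the YYYY-MM-DD part
    return (today - modified).days
```

A result above 30 places the page outside the window associated with the citation lift cited above; `None` means the schema half of the standard is not met at all.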
6.4 Content Density And Direct Answers (HIGH leverage, ChatGPT specific)
Per the Stridec content density analysis 2025 and the Discovered Labs and Ranking Lens 2025 citation pattern analyses, ChatGPT Search prefers content with high factual claim density: specific numbers, dates, named entities, primary source attributions per paragraph. Pages dense with factual claims are cited more often than pages dense with opinion or generality.
The framework standard:
- Lede paragraph: 40 to 75 words, direct answer to the page topic
- Every H2 opens with a 40 to 75 word direct answer paragraph
- Approximately one citable, hyperlinked statistic per 150 to 200 words of body content per the Wellows December 2025 study cited in the position.digital 2026 AI SEO statistics roundup
- Sections of 120 to 180 words earn approximately 70 percent more citations than longer or shorter sections per the same Wellows analysis
6.5 Schema Completeness (HIGH leverage, Shared)
Per multiple AI citation studies, pages with valid, server rendered JSON-LD using the @graph pattern with @id cross references and sameAs links earn higher citation rates on every AI surface. ChatGPT leverages schema for entity disambiguation: a brand with Organization schema sameAs Wikipedia or Wikidata plus Article schema increases recognition probability substantially.
Recommended stack: Organization at site level with sameAs to Wikipedia, Wikidata, LinkedIn, Crunchbase; WebSite and WebPage linking back to organization; Article or BlogPosting with author and dateModified; FAQPage mirroring visible <details> content; BreadcrumbList on every page; Person for authors. See framework-schema.md.
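The @graph pattern with @id cross references can be generated server side so that it ships in the first byte. The sketch below covers only part of the recommended stack (Organization, WebSite, WebPage, Article). The @id fragment naming and the property choices are illustrative assumptions, and all argument values are placeholders.

```python
import json

def build_graph(site: str, org_name: str, page_url: str, headline: str,
                date_modified: str, same_as) -> str:
    """Return a JSON-LD string whose nodes reference each other by @id."""
    org_id = f"{site}/#organization"
    website_id = f"{site}/#website"
    page_id = f"{page_url}#webpage"
    graph = [
        {"@type": "Organization", "@id": org_id, "name": org_name,
         "url": site, "sameAs": list(same_as)},
        {"@type": "WebSite", "@id": website_id, "url": site,
         "publisher": {"@id": org_id}},
        {"@type": "WebPage", "@id": page_id, "url": page_url,
         "isPartOf": {"@id": website_id}},
        {"@type": "Article", "@id": f"{page_url}#article", "headline": headline,
         "dateModified": date_modified, "publisher": {"@id": org_id},
         "mainEntityOfPage": {"@id": page_id}},
    ]
    return json.dumps({"@context": "https://schema.org", "@graph": graph}, indent=2)
```

The returned string belongs inside a `<script type="application/ld+json">` element emitted by the server template, never injected by client side JavaScript or a tag manager.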
6.6 Entity Recognition (HIGH leverage, ChatGPT specific)
Per AtomicAGI 2026 and Moonrank 2026 ChatGPT ranking analyses, ChatGPT Search uses entity recognition to match query entities to candidate pages. Pages that explicitly declare entities (business name, owner, location, services, partners, geographic scope) in visible text and sameAs network earn higher citation probability than pages that imply entities.
Triage: a reader new to the site should identify business name, owner, location, and topic in the first 200 words. Schema sameAs array should connect to Wikipedia entry (if any), Wikidata Q-ID (see framework-knowledgegraph.md), LinkedIn company page, industry registries.
6.7 Citation Position In Content (MEDIUM leverage, ChatGPT specific)
Per Search Engine Land coverage of the December 2025 content study, 44 percent of ChatGPT citations come from the first third of each piece. Per the Stridec 2025 analysis, 72.4 percent of cited blog posts include an identifiable answer capsule (40 to 60 word explanation placed directly after titles).
Implication: if the answer is not near the top of the section, the model does not wait for it. Move the answer up.
6.8 Third Party Brand Mentions (MEDIUM leverage, ChatGPT specific)
Per BrightEdge's 2025 mentions vs citations analysis, ChatGPT mentions brands approximately 3.2x more often than it cites them. Per the SEMrush mention source divide finding (September 2025), fewer than one in five brands achieve both frequent mentions and consistent citations. Per DOMINATE's 2026 ChatGPT visibility roundup, brand mentions get a domain cited approximately 10x more often than backlinks across the 75,000 brand sample.
Third party mention pace across industry publications, review platforms, podcasts, YouTube transcripts, and forum content (especially Reddit, given 11.97 percent share) is a higher leverage signal than backlink building. See framework-digitalpr.md and framework-trustsignals.md.
6.9 Wikipedia And Reddit Presence (MEDIUM leverage, ChatGPT specific)
The 5W Public Relations Q1 2026 study found Wikipedia at 13.15 percent and Reddit at 11.97 percent of US ChatGPT citations. Outside those two, no domain exceeds 3 percent. Per Profound's platform citation patterns analysis 2025, ChatGPT prioritizes Wikipedia at approximately 48 percent of citations in its sample.
For clients with notable activity, a Wikipedia presence (where notability and reliable secondary sources support it) is high leverage. Reddit presence (organic contribution on relevant subreddits, not promotional spam) is similar. See framework-knowledgegraph.md.
Caution: Reddit's ChatGPT citation share dropped from approximately 60 percent to approximately 10 percent in two weeks in September 2025 per the PRNewswire 5W coverage, reflecting a technical change. Volatility is real. Do not over rotate on any single platform.
6.10 YouTube Transcript Mentions (MEDIUM leverage, ChatGPT specific)
Per the Ahrefs December 2025 correlation study, YouTube mentions correlate with AI search visibility at 0.737, the strongest single predictor measured in any 2025 to 2026 study. ChatGPT Search ingests YouTube transcripts; brand mentions in popular videos translate to citation probability. Getting named in industry YouTube content (creator partnerships, podcast appearances, interview placements) compounds visibility across all surfaces.
6.11 Signal Triage Summary
For a new client engagement, the highest leverage signals are 6.1 Bing Index Presence, 6.2 Server Rendered Substrate, 6.3 Freshness, 6.4 Content Density, 6.5 Schema Completeness, and 6.6 Entity Recognition. These six gate ChatGPT Search citation eligibility. The medium leverage signals (6.7 through 6.10) compound the gains and stabilize against the surface's known volatility.
7. The Citation Pattern On ChatGPT Search
7.1 Sources Tab Mechanics
ChatGPT Search responses surface a "Sources" affordance below or alongside the response. The panel lists cited sources and other relevant links retrieved during the response per OpenAI's ChatGPT search help documentation. Responses typically carry 3 to 7 cited sources, varying with query complexity and model confidence.
Per the Profound 7,000 query analysis 2025, ChatGPT performs an average of 2 searches per query, each 5 to 6 words long. ChatGPT cites approximately 15 percent of retrieved pages; 85 percent are never cited. Citation is selective, not comprehensive.
7.2 Citation Selection Criteria
Selection logic (inferred from the 2025 to 2026 studies, since OpenAI does not publish the algorithm):
- Source in the retrieval candidate set (gated by Bing index presence and OAI-SearchBot access)
- Source contributes a unique factual element (factual claim density)
- Source readable in reading mode (substrate test)
- Source meets confidence threshold for the claim (E-E-A-T, entity authority)
- Source ranks adequately in the retrieval layer for the sub query
Pages passing the first three and failing the last two are retrieved but not cited. Pages failing the first three are not retrieved.
7.3 Brand Mention vs Citation
Per BrightEdge and DOMINATE analyses: a mention is when ChatGPT names a brand in the response body without linking (user sees name, no traffic). A citation is when ChatGPT includes a clickable link in the Sources panel (click through possible). Mentions occur approximately 3.2x more often than citations per BrightEdge. Track both.
7.4 Prioritization Within The Sources Panel
The Sources panel orders by retrieval relevance, not chronologically. The top listed source earns disproportionate click through. Section 6 signals cascade into higher Sources panel position when cited.
7.5 Citation Persistence Across Reruns
Per Ahrefs ChatGPT citation analysis 2025, ChatGPT Search citation is more stable than Google AI Overview (which sees 45.5 percent of cited sources replaced on regeneration per framework-aioverviews.md). Estimated ChatGPT rerun stability: approximately 60 to 75 percent for established high authority sources, dropping to 30 to 50 percent for newer or lower authority sources. Measure across a rolling four week window, not at a single point in time.
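One way to put a number on rerun stability is to compare the cited domains from two runs of the same query. The sketch below is a minimal illustration, not a standard metric definition: rerun_stability is a hypothetical helper name, and it assumes each run has been saved as a plain text file with one deduplicated domain per line.

```shell
# Hypothetical helper: prints the percentage of domains cited in the first
# run (file $1) that are still cited in the second run (file $2).
# Assumes one deduplicated domain per line in each file.
rerun_stability() {
  awk '
    # The second-run file is read first so its domains can be looked up.
    FILENAME == ARGV[1] { if ($1 != "") later[$1] = 1; next }
    $1 != "" { total++; if ($1 in later) kept++ }
    END {
      if (total == 0) print "no sources"
      else printf "%.0f\n", 100 * kept / total
    }
  ' "$2" "$1"
}

# Usage (file names are placeholders):
#   rerun_stability week1-run1.txt week1-run2.txt
```

Recording this value weekly for each priority query and averaging the last four readings gives the rolling window described above.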
8. Robots.txt And Access Patterns
8.1 The Default Recommended Posture
For any site optimizing for ChatGPT Search visibility, the default robots.txt posture:
# robots.txt
# OpenAI search retrieval, user invoked browsing, and agent allowed
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
# OpenAI training: deliberate decision required
User-agent: GPTBot
Allow: /
# OR Disallow: / if site policy opts out of training corpus
# All other rules
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
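The critical pair of lines is User-agent: OAI-SearchBot followed by Allow: /. A bot specific group takes precedence over the User-agent: * group, so the explicit entry keeps OAI-SearchBot allowed even if a wildcard Disallow is added later. If OAI-SearchBot is blocked, the site cannot be cited in ChatGPT Search.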
8.2 Common Misconfigurations
Observed patterns that cause silent ChatGPT Search invisibility:
| Misconfiguration | Effect |
|---|---|
| Disallow GPTBot interpreted as blocking all OpenAI bots | Silent invisibility. OAI-SearchBot independently controlled but publisher does not realize the distinction. |
| User-agent: * Disallow: / left from staging | Total invisibility. |
| Disallow: /blog/ style content tree blocks | Tree invisible to all bots. |
| Robots.txt returns 5xx | Bots interpret as block. Monitor availability. |
| Accumulated per route Disallow rules | Surprise blocks. Audit annually. |
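The first two rows of the table can be screened for mechanically. The following is a minimal sketch under stated assumptions: robots_root_blocked is a hypothetical helper name, and the parser is deliberately simplified. It prefers the group that names the bot over the wildcard group and only looks for a bare Disallow: / line, ignoring longer paths, wildcards, and Allow overrides. Treat the result as a smoke test, not a full robots.txt evaluation.

```shell
# Hypothetical helper: reads robots.txt on stdin and prints "blocked" or
# "allowed" for the bot named in $1, judged at the site root only.
# Simplifications: exact-name group beats the "*" group; only a bare
# "Disallow: /" counts as a block; Allow overrides are not evaluated.
robots_root_blocked() {
  awk -v bot="$1" '
    BEGIN { bot = tolower(bot); in_rules = 0; seen_bot = 0
            bot_blocked = 0; star_blocked = 0; is_bot = 0; is_star = 0 }
    {
      sub(/\r$/, "")                 # tolerate CRLF line endings
      sub(/[ \t]*#.*$/, "")          # strip comments
      i = index($0, ":")
      if (i == 0) next
      key = tolower(substr($0, 1, i - 1))
      val = substr($0, i + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == "user-agent") {
        # A User-agent line after rules starts a new group.
        if (in_rules) { is_bot = 0; is_star = 0; in_rules = 0 }
        if (tolower(val) == bot) { is_bot = 1; seen_bot = 1 }
        if (val == "*") is_star = 1
        next
      }
      if (key == "allow" || key == "disallow") {
        in_rules = 1
        if (key == "disallow" && val == "/") {
          if (is_bot) bot_blocked = 1
          if (is_star) star_blocked = 1
        }
      }
    }
    END {
      blocked = seen_bot ? bot_blocked : star_blocked
      print (blocked ? "blocked" : "allowed")
    }
  '
}

# Usage (example.com is a placeholder):
#   curl -s https://example.com/robots.txt | robots_root_blocked OAI-SearchBot
```

Running it once per bot name (OAI-SearchBot, ChatGPT-User, GPTBot) shows at a glance whether a training opt out has spilled over into the search bots.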
8.3 Verifying Bot Access Via Server Logs
Robots.txt configuration is one thing. Actual bot access verified in logs is another. Confirm OAI-SearchBot, ChatGPT-User, and (if not blocked) GPTBot in server logs at least monthly per Section 14.
grep -E "OAI-SearchBot|ChatGPT-User|GPTBot" /var/log/nginx/access.log | \
awk '{print $1, $7, $9, $12}' | sort | uniq -c | sort -rn | head -50
Adjust field positions to the access log format in use. Status 200 means the bot read the page. 4xx or 5xx indicates a problem.
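For a quicker read on whether each bot is getting pages or errors, the same log lines can be grouped by bot and status class. This is a minimal sketch, assuming the nginx combined log format (status code in field 9) and log lines on stdin; bot_status_summary is a hypothetical helper name.

```shell
# Hypothetical helper: reads access log lines on stdin and prints one line
# per bot and status class (2xx, 3xx, 4xx, 5xx) with a request count.
# Assumes the combined log format, where field 9 is the HTTP status.
bot_status_summary() {
  awk '
    {
      bot = ""
      if ($0 ~ /OAI-SearchBot/) bot = "OAI-SearchBot"
      else if ($0 ~ /ChatGPT-User/) bot = "ChatGPT-User"
      else if ($0 ~ /GPTBot/) bot = "GPTBot"
      if (bot == "") next
      class = substr($9, 1, 1) "xx"
      count[bot " " class]++
    }
    END { for (k in count) print k, count[k] }
  ' | LC_ALL=C sort
}

# Usage:
#   bot_status_summary < /var/log/nginx/access.log
```

A bot that shows only 4xx or 5xx rows is effectively blocked even when robots.txt allows it.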
8.4 IP Range Verification
For high authority sites, verify request IPs match OpenAI's published ranges: GPTBot at https://openai.com/gptbot.json, OAI-SearchBot at https://openai.com/searchbot.json, ChatGPT-User at https://openai.com/chatgpt-user.json. Reverse DNS is not reliable; trust the published IP ranges per OpenAI's bot documentation.
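Checking a request IP against a list of CIDR prefixes can be sketched with plain awk arithmetic. The helper name ip_in_ranges is hypothetical, and the block handles IPv4 only. How the prefixes are extracted from OpenAI's JSON files is an assumption here: the usage comment supposes a prefixes array with ipv4Prefix keys, which should be confirmed against the live file before relying on it.

```shell
# Hypothetical helper: exits 0 when the IPv4 address in $1 falls inside any
# CIDR prefix read from stdin (one prefix per line), and 1 otherwise.
# Non-IPv4 lines are skipped.
ip_in_ranges() {
  awk -v ip="$1" '
    function to_int(addr,   o) {
      split(addr, o, ".")
      return ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
    }
    BEGIN { target = to_int(ip); found = 0 }
    $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\/[0-9]+$/ {
      split($1, part, "/")
      size = 2 ^ (32 - part[2])      # number of addresses in the prefix
      # Two addresses share a prefix when they fall in the same size-aligned block.
      if (int(target / size) == int(to_int(part[1]) / size)) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Usage (the jq path is an assumption about the JSON layout; 203.0.113.9
# is a documentation address used as a placeholder):
#   curl -s https://openai.com/searchbot.json \
#     | jq -r '.prefixes[].ipv4Prefix' \
#     | ip_in_ranges 203.0.113.9 && echo "published range" || echo "not in published range"
```

A request that claims an OpenAI user agent but falls outside the published ranges is best treated as spoofed.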
8.5 Crawl Delay
OpenAI bots respect Crawl-delay. Most sites need none. For constrained servers, Crawl-delay: 5 is reasonable. Values above 30 seconds risk losing fresh content from the retrieval index.
9. llms.txt And AI Surface Maps
9.1 What llms.txt Is
llms.txt is a proposed standard from Jeremy Howard of Answer.AI, published in September 2024 per the Answer.AI original posting and Search Engine Land coverage. It is a Markdown file at /llms.txt that provides AI engines a curated guide to the site's most important content, with an optional /llms-full.txt expanded version. Intent: give LLM context assembly a clean map rather than requiring full site crawl and importance inference.
9.2 Current Adoption State (Q2 2026)
Per SERanking November 2025 research on approximately 300,000 domains, only approximately 10.13 percent of sites have llms.txt. Adoption is approximately even across traffic tiers (approximately 9 to 10 percent each). Per BuiltWith tracking October 25, 2025, approximately 844,000 websites have implemented it.
Per SERanking and the LinkBuildingHQ 2026 assessment, no major AI company including OpenAI, Google, Anthropic, Meta, or Mistral has publicly committed to reading or acting on llms.txt in production as of Q1 2026. GPTBot occasionally requests /llms.txt, but the rate is not distinguishable from generic root level discovery.
9.3 The Framework Position On llms.txt
Low cost (a few KB of Markdown), asymmetric upside (if OpenAI announces production use, sites with llms.txt are ready). The framework recommends implementing it as low priority hygiene, not a primary citation lever. Useful as documentation for MCP server integrations and IDE agents per Codersera's May 2026 analysis, and as a forcing function to articulate content hierarchy.
9.4 Recommended llms.txt Structure
# Example Business
> One sentence description of the business, its primary topic authority, and what content the site offers.
## Core Pages
- [About](/about/): who we are, credentials, location
- [Services](/services/): what we offer
- [Authors](/authors/): expert team and credentials
## Authoritative Content
- [Cornerstone topic 1](/topic-1/): definitive guide
- [Cornerstone topic 2](/topic-2/): reference document
- [Cornerstone topic 3](/topic-3/): methodology
## Recent Updates
- [Latest article](/blog/latest/): published 2026-MM-DD
- [Methodology update](/methodology/): updated 2026-MM-DD
## Schema And Entity
- Wikipedia: [link if entity has Wikipedia entry]
- Wikidata: Q-ID if assigned (see framework-knowledgegraph)
- LinkedIn: [link]
The optional /llms-full.txt is an expanded version with the same structure plus complete content excerpts. For most clients, /llms.txt alone is sufficient until adoption signals from OpenAI or others appear.
9.5 Validating The File
/llms.txt should:
- Return HTTP 200 with Content-Type text/plain or text/markdown
- Be present at the exact root path /llms.txt
- Validate as Markdown (no broken links to internal pages)
- Reflect the actual current site structure (review quarterly per Section 14)
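The first two checks in the list can be automated from response headers. This is a minimal sketch: check_llms_headers is a hypothetical helper name, and it assumes the headers of a single response (no redirects followed) arrive on stdin. It does not validate the Markdown or the internal links, which still need a separate pass.

```shell
# Hypothetical helper: reads one HTTP response header block on stdin and
# prints "ok" when the status is 200 and the Content-Type is text/plain or
# text/markdown; otherwise prints a short failure reason.
check_llms_headers() {
  tr -d '\r' | awk '
    NR == 1 { status = $2 }                               # e.g. "HTTP/2 200"
    tolower($1) == "content-type:" { ctype = tolower($2) }
    END {
      sub(/;.*$/, "", ctype)                              # drop "; charset=..."
      if (status != "200")
        print "fail: status " status
      else if (ctype != "text/plain" && ctype != "text/markdown")
        print "fail: content-type " ctype
      else
        print "ok"
    }
  '
}

# Usage (example.com is a placeholder):
#   curl -sI https://example.com/llms.txt | check_llms_headers
```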
10. SearchGPT Specifics vs Generic ChatGPT
10.1 The Historical SearchGPT Prototype
The standalone SearchGPT prototype (July 2024 per OpenAI) was always described as temporary, with explicit intent to integrate into ChatGPT. The standalone ran approximately three months and was absorbed into ChatGPT Search at the October 31, 2024 launch.
As of Q2 2026, "SearchGPT" persists in three places: the OAI-SearchBot user agent and https://openai.com/searchbot documentation page; historical analyses from mid 2024; and occasional informal industry references using the terms interchangeably. For practical optimization in 2026, there is one OpenAI search surface (ChatGPT Search) and one bot family (Section 4). The standalone prototype interface is no longer live.
10.2 Standalone Search Interface Differences
If OpenAI relaunches a standalone search interface (a possibility per GeekWire and The National's 2024 launch coverage of OpenAI's long term ambitions), optimization signals are expected to remain largely the same. The retrieval layer (Bing infrastructure with OpenAI ranking refinements) and the bot family (OAI-SearchBot, ChatGPT-User) are common.
Differences that could emerge: stronger ten blue link style presentation; direct query syntax favoring classic SERP friendly answer structure; different ad inventory affecting which sources surface. Framework guidance: optimize for ChatGPT Search per Sections 5 through 9 and a relaunch inherits substantially the same optimization base.
11. ChatGPT Agent And Atlas Agentic Use Cases
11.1 The Agent As A Citation Surface
ChatGPT agent (July 17, 2025 per OpenAI) and ChatGPT Atlas (October 21, 2025 per OpenAI) shift the citation surface fundamentally. When ChatGPT acts on the user's behalf, the user does not always see the source pages. The agent reads, reasons, and acts. The user sees the outcome. The source that influenced the agent's decision earns an agentic citation: the page that was read and acted on, even when the user never clicks through.
11.2 Optimization For Agentic Surfaces
Signals overlap heavily with ChatGPT Search citation, with additional weight on:
- Machine readable structured commercial data: pricing tables, comparison matrices, availability flags, product schema
- Schema completeness for Product, Service, Offer, FAQPage (see framework-schema.md)
- Unambiguous calls to action with reachable destination URLs
- No CAPTCHA or hard interaction gates on critical commercial pages. Operator was deprecated in part for reliability issues on JS heavy flows and CAPTCHAs per TechCrunch's deprecation coverage. Agent surfaces systematically deprioritize gated content.
See framework-agenticaisearch.md for the broader agentic browsing playbook.
11.3 How Agent Traffic Identifies Itself
ChatGPT agent requests typically appear as ChatGPT-User user agent strings, sometimes with additional context headers per the Simon Willison August 2025 analysis. The basic optimization treatment is the same: allow access, serve substrate, return structured data.
11.4 Atlas Browser Considerations
ChatGPT Atlas is Chromium based per OpenAI's Atlas announcement. Unlike OAI-SearchBot (no JavaScript), the Atlas browser renders full pages because it is a real browser used by a human or by agent mode. Hybrid pattern:
- For OAI-SearchBot retrieval indexing: JavaScript free substrate mandatory
- For Atlas browser sessions (human user or agent mode): JavaScript executes normally
The substrate is the floor, not the ceiling. JavaScript enhancement on top of the substrate works in Atlas the same way it works in any modern browser.
11.5 Browser Memory
Atlas browser memory allows ChatGPT to remember facts from visited sites per the Atlas launch coverage. Long term recall in future ChatGPT conversations maps to the same signals that earn ChatGPT Search citation: clear entity declarations, structured commercial data, unambiguous brand presence.
12. Common ChatGPT Search Mistakes (Top 10 Anti Patterns)
The ten anti patterns below cause silent ChatGPT Search invisibility on otherwise well optimized sites. In observed client engagements across the audit work informing this framework, addressing any three of these on a site that previously failed reading mode produces measurable ChatGPT Search citation lift within four to six weeks.
12.1 Blocking OAI-SearchBot While Allowing Others
Most common ChatGPT Search invisibility cause. Publisher blocks GPTBot to opt out of training but accidentally also blocks OAI-SearchBot through overly broad User-agent: * rules or misunderstanding the bot family. Fix per Section 8.1.
12.2 Client Side Rendered Content
React or Vue SPA delivers an empty root div. Bots see nothing. Fix per framework-contentfirst.md and Section 5.
12.3 JavaScript Accordion FAQ
FAQ content lives in JS that runs on interaction. Bot sees only question summaries. Fix: <details> and <summary> per Section 5.7.
12.4 Schema Injected Via Google Tag Manager
Schema injected client side via GTM. Bot sees no schema. Fix: server rendered JSON-LD in document head.
12.5 No Direct Answer Near The Top Of The Page
Page answers eventually but the answer is buried 600 words in. Bot extracts the first third and finds nothing. Fix per Section 6.4.
12.6 Generic Or Empty Image Alt Text
Retrieval bots are not multimodal at retrieval time as of Q2 2026. Generic alt text provides nothing to entity recognition. Fix: descriptive alt text. See framework-imageseo.md.
12.7 dateModified That Does Not Match Visible Update Date
Schema says one date, visible content says another. Bot detects the inconsistency and discounts the freshness signal. Keep schema and visible date in sync; both must reflect genuine updates.
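A rough consistency check can compare the two dates directly in the served HTML. The sketch below is illustrative only: date_sync_check is a hypothetical helper name, and it assumes the visible date is marked up with a time element carrying a datetime attribute, which not every template does. It confirms that the dates agree, not that the update behind them was substantive.

```shell
# Hypothetical helper: reads page HTML on stdin and compares the date part
# (YYYY-MM-DD) of the JSON-LD dateModified value with the datetime attribute
# of a <time> element. Prints "in-sync", "mismatch ...", or "missing".
# Assumes the visible date uses <time datetime="...">.
date_sync_check() (
  html=$(cat)
  schema=$(printf '%s\n' "$html" \
    | sed -n 's/.*"dateModified"[[:space:]]*:[[:space:]]*"\([0-9-]\{10\}\).*/\1/p' \
    | head -n 1)
  visible=$(printf '%s\n' "$html" \
    | sed -n 's/.*<time[^>]*datetime="\([0-9-]\{10\}\).*/\1/p' \
    | head -n 1)
  if [ -z "$schema" ] || [ -z "$visible" ]; then
    echo "missing"
  elif [ "$schema" = "$visible" ]; then
    echo "in-sync"
  else
    echo "mismatch schema=$schema visible=$visible"
  fi
)

# Usage (example.com is a placeholder):
#   curl -s https://example.com/guide/ | date_sync_check
```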
12.8 No Bing Index Verification
Site optimized for Google but never indexed in Bing. ChatGPT Search retrieval cannot find the page. Fix per Section 6.1.
12.9 Brand Entity Ambiguity
Multiple businesses share the name, or one business uses inconsistent naming across the web. Bot cannot resolve which entity. Fix per framework-entitysalience.md and framework-knowledgegraph.md.
12.10 Optimization Frozen After Launch
A 2024 page with no updates is at a freshness disadvantage to a 2024 page with quarterly substantive refreshes. Fix per Section 14.
13. Audit Rubric
13.1 First 90 Days Page Subset Rubric
The five item gating rubric. Every priority page must pass these five for ChatGPT Search visibility consideration.
| # | Criterion | Pass/Fail |
|---|---|---|
| F1 | curl test with GPTBot user agent returns H1, lede, H2s, FAQ content, schema all in first byte | |
| F2 | OAI-SearchBot allowed in robots.txt | |
| F3 | ChatGPT-User allowed in robots.txt | |
| F4 | Page indexed in Bing (verified in Bing Webmaster Tools) | |
| F5 | Schema JSON-LD server rendered with Organization, WebPage, and Article or BlogPosting graphs | |
A page passing these five is in the ChatGPT Search candidate pool. A page failing any one is not, regardless of other work.
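Part of criterion F1 can be scripted as a quick screen. This is a minimal sketch, not the full curl test: substrate_check is a hypothetical helper name, and it only looks for an H1, at least one H2, and a JSON-LD script tag in the raw HTML. Lede quality and FAQ content still need a page specific look. The user agent value in the usage comment is a short placeholder token, not OpenAI's full user agent string.

```shell
# Hypothetical helper: reads raw HTML on stdin (as served, before any
# JavaScript runs) and prints "pass" when an <h1>, an <h2>, and a JSON-LD
# script tag are all present; otherwise lists the missing markers.
substrate_check() (
  lower=$(tr '[:upper:]' '[:lower:]')
  missing=""
  for marker in '<h1' '<h2' 'application/ld+json'; do
    case "$lower" in
      *"$marker"*) ;;
      *) missing="$missing $marker" ;;
    esac
  done
  if [ -z "$missing" ]; then
    echo "pass"
  else
    echo "fail: missing$missing"
  fi
)

# Usage (URL and user agent token are placeholders):
#   curl -s -A "GPTBot" https://example.com/priority-page/ | substrate_check
```

A client rendered page that ships only an empty root div fails all three markers, which is the pattern described in Section 12.2.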
13.2 Per Page Audit Rubric (Full)
| # | Criterion | Pass/Fail |
|---|---|---|
| P1 | curl test passes: H1, lede, H2s, FAQ, schema all in first byte | |
| P2 | Lede paragraph 40 to 75 words immediately under H1, citeable standalone | |
| P3 | Every H2 opens with 40 to 75 word answer first paragraph | |
| P4 | FAQ uses <details> and <summary>, schema mirrors visible Q and A | |
| P5 | Comparison data uses <table> not prose | |
| P6 | Procedural content uses <ol> not prose | |
| P7 | Core graph schema server rendered in head | |
| P8 | Article or BlogPosting schema with author and dateModified | |
| P9 | Author byline visible at top, linked to author bio with credentials | |
| P10 | dateModified visible and in schema, reflecting actual substantive updates | |
| P11 | Internal links use descriptive anchor text, at least three inbound from related pages | |
| P12 | Inline outbound citations to authoritative sources where claims warrant | |
| P13 | Image alt text descriptive, not generic or empty | |
| P14 | One citable hyperlinked statistic per 150 to 200 words of body | |
| P15 | Sections 120 to 180 words, not 50 word fragments or 400 word walls | |
Maximum score: 15. A world class ChatGPT Search ready page scores 13 or higher with zero F1 to F5 fails.
13.3 Site Wide Audit Rubric
| # | Criterion | Pass/Fail |
|---|---|---|
| S1 | Content first doctrine compliance, contentfirst score 22 or higher (see framework-contentfirst.md) | |
| S2 | OAI-SearchBot allowed in robots.txt site wide | |
| S3 | ChatGPT-User allowed in robots.txt site wide | |
| S4 | Bing Webmaster Tools verification active | |
| S5 | Bing IndexNow configured for sub minute new content notification | |
| S6 | Core graph schema present on every indexable page | |
| S7 | Every page has explicit entity declarations | |
| S8 | E-E-A-T self assessment score 90 or higher (out of 130, see framework-eeat.md) | |
| S9 | Wikipedia / Wikidata posture established per framework-knowledgegraph.md (where notability permits) | |
| S10 | Freshness cadence documented and operational per Section 14 | |
| S11 | Server log monitoring detects OAI-SearchBot, ChatGPT-User, GPTBot fetches monthly | |
| S12 | llms.txt present per Section 9 (low priority hygiene) | |
| S13 | Third party AI citation tracker operational (Profound, Otterly, AthenaHQ, or manual) | |
| S14 | Manual ChatGPT Search sampling cadence operational, weekly for top 10 priority queries | |
| S15 | Quarterly brand mention pace tracked (5 to 10 new mentions per quarter minimum) | |
Maximum score: 15. A world class ChatGPT Search ready site scores 13 or higher with zero critical fails on S1, S2, S3, S4, S6.
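The site wide verdict is simple enough to compute from a results file. The sketch below assumes a hypothetical input format of one criterion per line, such as "S1 pass" or "S2 fail"; site_verdict is likewise a hypothetical helper name, and the "needs-work" label is illustrative wording rather than a framework term.

```shell
# Hypothetical helper: reads lines like "S1 pass" / "S2 fail" on stdin and
# prints the score out of 15 with a verdict. World class requires 13 or
# more passes and no fail on the critical criteria S1, S2, S3, S4, S6.
site_verdict() {
  awk '
    BEGIN {
      n = split("S1 S2 S3 S4 S6", ids, " ")
      for (i = 1; i <= n; i++) critical[ids[i]] = 1
      score = 0; crit_fail = 0
    }
    $2 == "pass" { score++ }
    $2 == "fail" && ($1 in critical) { crit_fail++ }
    END {
      verdict = (score >= 13 && crit_fail == 0) ? "world-class" : "needs-work"
      printf "%d/15 %s\n", score, verdict
    }
  '
}

# Usage (file name is a placeholder):
#   site_verdict < site-audit-results.txt
```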
13.4 First 90 Days Tracking
| Day Range | Action |
|---|---|
| 0 to 7 | Capture baseline: priority queries cited, brand mention rate, OAI-SearchBot fetches in last 30 days, Bing index coverage for priority pages |
| 8 to 30 | Apply Section 5 substrate fixes, Section 8 robots.txt fixes, Section 6.5 schema additions on priority pages |
| 31 to 60 | Apply Section 6.4 content density fixes, Section 6.6 entity declarations, Section 9 llms.txt |
| 61 to 90 | Re sample priority queries, document citation deltas, identify next 50 query tier |
13.5 ChatGPT Search Specific Citation Tracking
Manual sampling protocol for ChatGPT Search:
- Open ChatGPT Search interface (web browsing enabled tier required)
- Submit priority query
- Wait for response
- Click "Sources" affordance
- Record cited domains and the order in the Sources panel
- Record whether the brand is mentioned in the response body (mention vs citation distinction per Section 7.3)
- Repeat across the priority query set on a weekly cadence
Tooling: third party trackers (Profound, Otterly, AthenaHQ) automate this at scale. Manual sampling is feasible for sub 50 query lists.
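For manual sampling, a small CSV log is enough to derive the two headline rates. This is a minimal sketch under an assumed log layout, not a prescribed one: each row is date,query,cited,mentioned with 1 or 0 in the last two columns, and queries contain no commas. The helper name citation_rates is hypothetical.

```shell
# Hypothetical helper: reads a sampling log on stdin with rows of the form
#   date,query,cited,mentioned      (cited/mentioned are 1 or 0)
# and prints the sample count, citation rate, and brand mention rate.
# Assumes queries contain no commas.
citation_rates() {
  awk -F',' '
    NR == 1 && $1 == "date" { next }        # skip a header row if present
    NF >= 4 { total++; cited += $3; mentioned += $4 }
    END {
      if (total == 0) { print "no samples"; exit }
      printf "samples=%d citation_rate=%.1f%% mention_rate=%.1f%%\n",
        total, 100 * cited / total, 100 * mentioned / total
    }
  '
}

# Usage (file name is a placeholder):
#   citation_rates < chatgpt-sampling-log.csv
```

Keeping cited and mentioned in separate columns preserves the mention vs citation distinction from Section 7.3.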
14. Maintenance Schedule And Report Templates
14.1 Maintenance Cadence
Weekly. Sample top 10 priority queries in ChatGPT Search, record citation state. Check server logs for OAI-SearchBot, ChatGPT-User, GPTBot fetches on priority pages. Refresh one priority page with substantive new information.
Monthly. Sample next 50 priority queries. Run schema validation sweep on priority pages. Review competitor pages newly entering the candidate pool. Refresh time sensitive content per the freshness cadence. Audit robots.txt for accidental blocks.
Quarterly. Site wide Section 13 audit. Refresh evergreen content. Review brand mention pace against the target. Update llms.txt to reflect current content. Refresh Bing Webmaster Tools sitemap submissions.
Annually. Full framework review against the current OpenAI surface state (the bot family, the index dependency, and the ranking signal weighting all evolve). Refresh underlying frameworks (contentfirst, eeat, infogain, ymyl, schema) against current state. Strategic review of priority query selection.
14.2 Implementation Report Template
# ChatGPT Search Framework Implementation Report
**Site**: {{BUSINESS_NAME}}
**Implementation Date**: {{TODAY}}
## Summary
- Priority pages audited / passing first 90 days subset / passing full rubric
- Site wide rubric score: X/15
## Baseline ChatGPT Search Citation State
- Priority queries sampled, queries with site cited in ChatGPT Search, brand mention rate, OAI-SearchBot fetches in last 30 days
## Substrate Work
- Pages migrated from client side to server rendered
- Schema blocks moved from JS injection to server rendered head
- curl test pass rate before vs after
## Bot Access Work
- Robots.txt audit and corrections
- Server log verification of OAI-SearchBot, ChatGPT-User, GPTBot fetches
## Schema, Entity, And Freshness Work
- Counts per category, what was added, what was removed
## Bing Index Work
- Bing Webmaster Tools verification, sitemap submission, IndexNow configuration
## Tracking Infrastructure
- Citation tracker selected, manual sampling cadence documented, server log monitoring deployed
## Sign-Off
14.3 Audit Report Template
# ChatGPT Search Framework Audit Report
**Site**: {{BUSINESS_NAME}}
**Audit Date**: {{TODAY}}
## Executive Summary
One paragraph assessment.
**Site wide score**: X/15
**Average priority page score**: X/15
**Current ChatGPT Search citation rate** (priority queries, weekly sampled): X percent
**Brand mention rate** (priority queries, weekly sampled): X percent
## Findings By Section
Substrate, bot access, schema, entity, freshness, brand mention
## Critical Failures
List with remediation
## First 90 Days Subset Findings
Per priority page table of F1 to F5 results
## Recommended Remediation Order
Critical (substrate, OAI-SearchBot access, Bing index)
High (schema, entity, freshness)
Medium (content density, internal linking, llms.txt)
## Sign-Off
14.4 Server Log Monitoring Bash Script
#!/usr/bin/env bash
# /var/www/sites/[domain]/scripts/chatgpt-bot-monthly-report.sh
# Monthly bot access summary for ChatGPT optimization tracking.
# Usage: chatgpt-bot-monthly-report.sh <domain>
# Reads the current and the most recently rotated nginx access log
# (combined log format assumed: field 7 is the path, field 9 the status).
DOMAIN="${1:?usage: $0 <domain>}"
LOG="/var/log/nginx/access.log"
ROLL="/var/log/nginx/access.log.1"
DATE=$(date +%Y-%m-%d)
DIR="/var/www/sites/${DOMAIN}/reports/chatgpt-bots"
mkdir -p "${DIR}" || exit 1
OUT="${DIR}/${DATE}.txt"
# Concatenate both logs so every count is one total, not one line per file.
logs() { cat "${LOG}" "${ROLL}" 2>/dev/null; }
{
echo "ChatGPT Bot Access Report - ${DOMAIN} - ${DATE}"
echo ""
for BOT in OAI-SearchBot ChatGPT-User GPTBot; do
  echo "${BOT} fetches: $(logs | grep -c "${BOT}")"
done
echo ""
echo "Top paths by OAI-SearchBot:"
logs | grep "OAI-SearchBot" | \
awk '{print $7}' | sort | uniq -c | sort -rn | head -25
echo ""
echo "Non 200 status codes (potential blocks):"
logs | grep -E "OAI-SearchBot|ChatGPT-User|GPTBot" | \
awk '$9 != 200 {print $9, $7}' | sort | uniq -c | sort -rn | head -25
} > "${OUT}"
echo "Report at ${OUT}"
Save at /var/www/sites/[domain]/scripts/chatgpt-bot-monthly-report.sh, make it executable with chmod +x, and schedule it monthly via cron, passing the domain as the first argument.
End of Framework Document
Document version: 1.0
Created: 2026-05-14
Maintained by: ThatDeveloperGuy
ChatGPT Search, SearchGPT, ChatGPT agent, and ChatGPT Atlas constitute the dominant OpenAI search surface family. With approximately 900 million weekly active ChatGPT users in February 2026 and approximately 250 to 500 million weekly queries flowing through ChatGPT Search retrieval, citation here is load bearing for any business whose customers ask AI questions. The signals that drive citation are reading mode survival, Bing index presence, freshness, content density, schema completeness, entity recognition, and third party brand mention pace. The bot family (GPTBot, OAI-SearchBot, ChatGPT-User, ChatGPT agent) is independently controlled in robots.txt; the most common invisibility cause is blocking OAI-SearchBot while intending only to opt out of training.
Apply this framework after the substrate doctrine in framework-contentfirst.md, in parallel with schema, entity, E-E-A-T, and brand mention frameworks. For Google AI Overview see framework-aioverviews.md. For multi engine AI citation see framework-aicitations.md. For agentic browsing see framework-agenticaisearch.md.
Companions
- framework-contentfirst.md, substrate doctrine, prerequisite
- framework-aicitations.md, broader AI citation (Claude, Perplexity, Gemini, Copilot, Meta AI)
- framework-aioverviews.md, Google AI Overviews and AI Mode
- framework-hcs.md, Helpful Content System
- framework-infogain.md, Information Gain
- framework-eeat.md, E-E-A-T pillars
- framework-sqrg.md, Search Quality Rater Guidelines
- framework-ymyl.md, YMYL
- framework-schema.md, schema graph pattern
- framework-entitysalience.md, entity declarations
- framework-knowledgegraph.md, Wikipedia, Wikidata
- framework-agenticaisearch.md, agentic surface
- framework-multimodalsearch.md, multimodal retrieval
- framework-internallinking.md, internal link density
- framework-cross-stack-implementation.md, stack patterns
- framework-react.md, SPA retrofit
- framework-tailwind.md, Tailwind concerns
- framework-digitalpr.md, brand mention pace
- framework-trustsignals.md, trust infrastructure
- SEO-Search-Appearance.md, surface map, four pillars
- SERP-Optimization.md, feature targeting
- Scheduled Phase 2: framework-perplexityspaces, framework-ai-content-workflow, framework-brandvoice
- 14 tier Engine Optimization Stack, Tier 3 AI Search Domination
About this framework library
This article is the Dev.to republish of a framework reference document from ThatDevPro's SEO + AI engineering library. Canonical source: https://www.thatdevpro.com/insights/framework-searchgpt/
ThatDevPro is an SDVOSB-certified veteran-owned web + AI engineering studio operating from Cassville, Missouri. The studio runs the full 14-tier Engine Optimization stack and ships open-source tooling for AI citation engineering.
Companion 14-tier Engine Optimization stack (each tier is its own article):
- Tier 1 — Foundation
- Tier 2 — Search Visibility
- Tier 3 — AI Domination
- Tier 4 — Entity and Authority
- Tier 5 — Local Domination
- Tier 6 — Content and Multimedia
- Tier 7 — Social and Community
- Tier 8 — Data, Analytics, Conversion
- Tier 9 — Monitoring and Intelligence
- Tier 10 — Workflow and Operations
- Tier 11 — Marketplace and Retail
- Tier 12 — International
- Tier 14 — Advanced and Immersive