Joseph Anady

Posted on May 25

SEO-Search-appearance

#webdev #javascript #tutorial #productivity

Framework: SEO and Search Appearance Optimization

Comprehensive playbook for queries, pages, countries, devices, search appearance, keywords, keyword phrases, SEO, AEO, AIO, GEO, multi engine optimization, citation tracking, and audit. Single file, standalone, agent executable.

Document version: 2.0
Last updated: 2026-05-03
Maintained by: ThatDeveloperGuy
Authoring authority: Joseph Anady, SDVOSB, BA Computer Engineering CSU, MA Cybersecurity
File ID: framework-seo-search-appearance-v2
Supersedes: framework-searchappearance.md v1.0

0. Agent Operating Instructions

This file is designed to be read and executed by an AI agent (Claude Code, MEGAMIND, or any LLM with file system and shell access) AND read as reference by a human practitioner. Both audiences are first class.

0.1 How an Agent Reads This File

When an agent is handed this file with a build instruction, it walks the file in this order, never skipping ahead:

Read sections 0 through 3 fully (operating instructions, purpose, intake, theory).
Complete the intake in section 2 by gathering answers from the human operator or from prior context.
Run the stack selection decision tree in section 4 and record the assigned stack path.
Walk sections 5 through 8 (Phase 1 through Phase 4) in order. Do not advance until the completion gate at the end of each phase passes.
Jump to the assigned stack subsection in section 9 and execute the stack specific build steps.
Apply sections 10 through 14 in order (off page authority, crawler access, tracking, information gain, surface specific tuning).
Run the full audit in section 15. Output both markdown and JSON reports.
Schedule maintenance per section 16.
Reference appendices A through D as needed.

0.2 Phase Gates

Each phase ends with a completion gate. The gate is a list of conditions that must all be true before the next phase begins. The agent must not proceed when any gate condition fails. If a gate fails, the agent reports which condition failed, what the agent attempted, and what the operator must clarify or fix.
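
The gate behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the framework itself: the function name check_gate and the sample condition labels are assumptions made for the example.

```python
def check_gate(conditions):
    """Evaluate a phase gate.

    conditions maps a plain language condition to True or False.
    Returns (passed, failed), where failed lists every condition that is not True.
    """
    failed = [name for name, ok in conditions.items() if ok is not True]
    return (len(failed) == 0, failed)


# Hypothetical gate conditions, for illustration only.
passed, failed = check_gate({
    "exactly one stack path is assigned": True,
    "decision rationale is documented": False,
})
if not passed:
    print("Gate failed. Do not advance. Failed conditions:")
    for name in failed:
        print("  " + name)
```

Returning the failed conditions, rather than a bare boolean, gives the agent what it needs to report which condition failed.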

0.3 Standing Rules That Apply Everywhere

These rules apply to every phase, every stack, every output. Violating any of them fails the audit and must be remediated before sign off.

No dashes of any kind in written content. No em dashes, no en dashes, no hyphens used as punctuation. Use commas, periods, or rewrite the sentence. Compound modifiers in body copy are written without hyphens. Code, URLs, file names, and CSS class names retain hyphens because those are technical identifiers, not written content.
Static inline JSON LD only. No FastAPI sidecar. No external schema service. The MEGAMIND port 9090 sidecar pattern is deprecated for all client work.
No third party CDN or proxy in front of the site. No Cloudflare. No Akamai. No Fastly. All caching, performance, and HTTP/3 work happens at the Bubbles nginx layer.
Footer credit on every client site. The exact string "Crafted by ThatDeveloperGuy.com." appears in the footer of every client site Joseph builds.
Always validate and reload after Bubbles changes. Every nginx configuration change ends with nginx -t && systemctl reload nginx. Bulk HTML inserts use systemctl restart nginx.
All demo sites are static HTML, never JSX, never Python generated. Required demo tech stack is in section 9.1.
Pricing convention. Custom prices end in 7 (597, 797, 997, 1497, 1997, 2497, 2997). Monthly tiers are 250, 397, and 500. Client specified prices override the convention.
Use the right language for the job. HTML and CSS and JavaScript directly for web. Bash for scripting. Never use Python to generate HTML pages.
Email signature on outbound communication. Joseph Anady, ThatDeveloperGuy.com, admin@thatdeveloperguy.com, 505.512.3662.
Web development, SEO, AEO, AIO, GEO, and digital presence services only. Never mention computer repair in any client facing material.

0.4 Output Conventions

When this framework produces deliverables:

Markdown files use UTF 8 encoding, LF line endings, blank line at end of file.
JSON outputs are pretty printed with 2 space indentation.
JSON LD uses double quoted strings, no trailing commas, wrapped in <script type="application/ld+json"> for HTML embedding.
Filenames use lowercase letters, numbers, and hyphens. No spaces.
Audit reports use the naming convention audit-[domain]-[YYYYMMDD].md and audit-[domain]-[YYYYMMDD].json.
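
As a rough sketch of the naming and JSON conventions above, the following Python helpers build the two audit report names and serialize a report with 2 space indentation. The helper names are hypothetical. Replacing the dots in the domain with hyphens is an assumption made here so the base name stays within the lowercase, numbers, and hyphens rule; the framework does not say how dots in a domain are handled.

```python
import json
import re
from datetime import date


def audit_filenames(domain, on_date):
    """Return the markdown and JSON report names for a domain and a date."""
    # Assumption: any run of characters outside a-z and 0-9 becomes one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", domain.lower()).strip("-")
    stamp = on_date.strftime("%Y%m%d")
    base = f"audit-{slug}-{stamp}"
    return base + ".md", base + ".json"


def dump_report(report):
    """Serialize a report as pretty printed JSON with 2 space indentation."""
    # The trailing newline keeps the file ending on a line break.
    return json.dumps(report, indent=2, ensure_ascii=False) + "\n"


md_name, json_name = audit_filenames("Example.com", date(2026, 5, 3))
print(md_name)    # audit-example-com-20260503.md
print(json_name)  # audit-example-com-20260503.json
```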

0.5 When the Agent Lacks Information

If the agent encounters a required intake field with no answer, the agent stops, lists the missing fields, and asks the operator. The agent does not invent values, does not pick defaults silently, and does not proceed with placeholders that look real.

0.6 Audit Mode Versus Build Mode

This file supports two modes of operation:

Build mode: agent is constructing or upgrading a site. Agent walks sections 0 through 16 in order.
Audit mode: agent is evaluating an existing site, fully built or partially installed. Agent walks sections 0 through 4 to gather context, then jumps to section 15 (Audit) and runs the full or partial audit. Remediation steps point back to the relevant phase or stack section.

The operator declares the mode at the start of the engagement. If the mode is unclear, the agent asks.

1. Document Purpose

This framework is the operational standard for making a website visible, citable, and conversion ready across the full 2026 search and answer engine landscape. It covers the traditional Google SERP, Google AI Overviews, Google AI Mode, Bing, DuckDuckGo, Yandex, Baidu, Brave Search, ChatGPT search, Perplexity, Claude with web access, Bing Copilot, and the People Also Ask, Knowledge Panel, image carousel, video carousel, and local pack surfaces.

It produces four primary outputs:

A keyword to page map that eliminates cannibalization and assigns one primary intent per URL.
A pillar and cluster content architecture that wins query fan out coverage across AI search engines.
A schema and technical implementation that earns rich results where eligible and provides AI trust signals everywhere else.
A citation, mention, and visibility tracking system that proves the work is moving the right metrics.

The framework is built for use across Joseph's full client portfolio (currently 130 plus production websites on Bubbles, plus headless and CMS clients), and it is built to be applied to any site type from a one page local service business to a multi state e commerce platform.

The framework does not chase rankings as the only metric. In 2026, ranking and citation have decoupled. A December 2025 Surfer SEO study of 173,902 URLs found that 68 percent of pages cited in AI Overviews are NOT in the top 10 organic results. An Ahrefs February 2026 study of 863,000 keywords found that only 38 percent of pages cited in AI Overviews also rank in the top 10, down from 76 percent seven months earlier. The framework therefore optimizes both targets in parallel and tracks them independently.

2. Client Variables Intake

The agent fills in this YAML block before doing anything else. Every field has a definition. If the operator does not know a value, the agent asks. Defaults are listed where reasonable.

# ============================================
# SEO AND SEARCH APPEARANCE INTAKE
# ============================================

# Section A: Identity
business_name: ""
primary_domain: ""
all_owned_domains: []                     # primary plus all redirects and aliases
brand_terms: []                           # exact strings users search for the brand
service_or_product_lines: []              # 3 to 7 main offerings
geographic_service_area:
  primary_city: ""
  primary_state: ""
  service_radius_miles: 0
  additional_metros: []
  national_service: false
target_audience_personas: []              # 1 to 4 primary buyer personas

# Section B: Mode and Engagement
engagement_mode: ""                       # build, rebuild, audit, partial_audit
engagement_scope: ""                      # full_site, single_page, content_cluster, technical_only
client_tier: ""                           # 597, 797, 997, 1497, 1997, 2497, 2997, custom
monthly_tier: ""                          # 250, 397, 500, none
sdvosb_relevant: false                    # SDVOSB cert applicable to client work

# Section C: Tech Stack
tech_stack_current: ""                    # static_html, sveltekit, nextjs, astro, hugo, wordpress, shopify_headless, shopify_standard, custom, none
tech_stack_preferred: ""                  # if rebuild, what we are moving to
hosting_environment: ""                   # bubbles, third_party, shopify_managed, other
bubbles_subdomain_or_path: ""             # if hosted on Bubbles, where it lives
ssl_status: ""                            # active, missing, wildcard
http_version: ""                          # 1.1, 2, 3
cms_required: false
sso_required: false

# Section D: Existing SEO Posture
google_search_console_verified: false
gsc_property_url: ""
bing_webmaster_tools_verified: false
yandex_webmaster_verified: false
baidu_zhanzhang_verified: false
google_analytics_4_active: false
google_business_profile_active: false
gbp_categories: []
gbp_review_count: 0
gbp_average_rating: 0.0

# Section E: AI Citation Baseline
current_ai_overview_citation_rate: 0     # percent of target queries where domain is cited
current_chatgpt_citation_rate: 0
current_perplexity_citation_rate: 0
current_aimode_citation_rate: 0
current_brand_mentions_in_aio: 0          # times brand name appears without citation
current_brand_mentions_in_chatgpt: 0

# Section F: Query Fan Out and Coverage
sub_query_coverage_audit_complete: false
fan_out_gap_count: 0                      # sub queries domain does not yet answer
topical_authority_score: 0                # 0 to 100 internal score
known_pillar_topics: []
known_cluster_topics: []

# Section G: Content Eligibility Baseline
total_indexed_pages: 0
average_content_freshness_days: 0
pages_with_published_dates: 0
pages_with_lastmod_accurate: 0
pages_with_author_schema: 0
pages_with_first_party_data: 0            # original research, surveys, benchmarks
average_word_count_pillar_pages: 0
average_word_count_cluster_pages: 0

# Section H: Distribution Footprint
earned_media_count_12mo: 0
podcast_appearances_12mo: 0
linkedin_articles_12mo: 0
youtube_brand_mentions: 0                 # videos referencing brand (own or third party)
reddit_brand_mentions: 0
wikipedia_entity_status: ""               # none, draft, live, contested, deleted

# Section I: Entity Clarity
wikidata_q_id: ""
sameas_count: 0                           # cross platform identity links count
google_knowledge_panel_status: ""         # present, partial, missing
organization_schema_present: false
person_schema_for_authors_present: false

# Section J: AI Crawler Access Posture
allow_gptbot: true
allow_oai_searchbot: true
allow_chatgpt_user: true
allow_perplexitybot: true
allow_claudebot: true
allow_google_extended: true
allow_bingbot: true
allow_yandexbot: true
allow_baiduspider: true
allow_amazonbot: true
allow_applebot: true
allow_meta_externalagent: true
allow_bytespider: false                   # ByteDance, often blocked
robots_txt_path: "/robots.txt"
llms_txt_present: false                   # emerging standard

# Section K: Compliance and Constraints
ymyl_content_present: false               # Your Money Your Life category
hipaa_relevant: false
pci_relevant: false
sox_relevant: false
gdpr_relevant: false
ccpa_relevant: false
sec_disclosure_relevant: false
ada_compliance_required: true             # default true for all client work
target_languages: ["en"]
hreflang_required: false

# Section L: Reporting Cadence
reporting_to_client_cadence: ""           # weekly, biweekly, monthly, quarterly
client_dashboard_required: false
dashboard_format: ""                      # markdown_email, looker_studio, custom_html

The agent does not move past this section until every field has a value or an explicit "unknown" with a reason.
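
One way to enforce this rule is a small check that walks the parsed intake and lists what is still unanswered. The Python sketch below assumes the YAML has already been loaded into a dict. It also assumes a convention that is not defined by the framework: a value of "unknown" is accepted only when a sibling key named with the suffix _unknown_reason holds a reason. Numbers, booleans, and lists are treated as answered because the intake gives them defaults.

```python
def missing_intake_fields(intake, prefix=""):
    """Return dotted paths of fields that still need an answer from the operator."""
    missing = []
    for key, value in intake.items():
        path = prefix + key
        if isinstance(value, dict):
            # Recurse into nested blocks such as geographic_service_area.
            missing.extend(missing_intake_fields(value, path + "."))
        elif value == "":
            missing.append(path)
        elif value == "unknown" and not intake.get(key + "_unknown_reason"):
            # Hypothetical convention: "unknown" needs a stated reason.
            missing.append(path + " (unknown without a reason)")
    return missing


# Hypothetical partial intake, for illustration only.
intake = {
    "business_name": "Example Plumbing",
    "primary_domain": "",
    "geographic_service_area": {"primary_city": "", "national_service": False},
    "wikidata_q_id": "unknown",
    "ssl_status": "unknown",
    "ssl_status_unknown_reason": "operator has no hosting access yet",
}
for field in missing_intake_fields(intake):
    print("Missing:", field)
```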

3. 2026 Search and Answer Engine Theory

This section establishes the mental model. The strategy decisions in later sections only make sense if the agent and operator share this model.

3.1 The Three Surfaces Model

Search visibility in 2026 happens on three structurally different surfaces. Optimization rules differ across them.

Surface 1, Classic SERP. The original ten blue links, plus rich results, featured snippets, People Also Ask, image carousels, video carousels, local pack, knowledge panels, sitelinks, and direct answer boxes. This surface still exists on Google, Bing, DuckDuckGo, Yandex, Baidu, and Brave. It is increasingly compressed below AI Overviews on Google. CTR for the number one organic position has dropped up to 61 percent on queries that show AI Overviews.

Surface 2, AI Overviews and inline AI summaries. Google's AI Overviews appear at the top of about 48 percent of all search queries in Q1 2026, with coverage above 70 percent for informational and how to queries. Bing has similar inline AI summaries via Copilot. These overviews cite their sources with linked references, but 83 to 93 percent of queries showing AI Overviews end without a click. Sites cited in AI Overviews see roughly 35 percent more clicks than non cited top 10 results, and those visitors convert at about 23 times the rate of standard search traffic.

Surface 3, AI Mode and external answer engines. Standalone conversational interfaces with no blue links. Google AI Mode, ChatGPT, Perplexity, Claude with web access, Bing Copilot, Meta AI. Google AI Mode has run on Gemini 3 Pro since January 2026 and has 75 million daily active users. On these surfaces, you are either cited or invisible. There is no consolation prize for ranking.

Each surface has different volatility, different conversion economics, and different optimization priorities. Tracking and reporting must separate them.

3.2 Query Fan Out

The single most important 2026 concept. AI search engines do not search for the literal user query. They decompose the query into multiple parallel sub queries, run each one against the index simultaneously, then synthesize a single answer from the union of results.

Volumes by platform:

Google AI Mode: 9 to 16 sub queries per user prompt.
Google AI Overviews: 8 to 12 sub queries.
ChatGPT search: 2.3 to 2.8 sub queries.
Perplexity: 4 to 8 sub queries.
Claude with web access: 2 to 6 sub queries.

Volumes by industry (representative):

E commerce: 18 to 22 sub queries with 61 percent citation rate.
Healthcare: 22 to 28 sub queries with 48 percent citation rate (YMYL drag).
Finance: 16 to 20 sub queries with 52 percent citation rate.
Local services: 10 to 16 sub queries.

Strategic implication. A page that ranks moderately for ten related sub queries will outperform a page that ranks number one for the head term but appears nowhere for related queries. Brands optimizing only for traditional rankings miss roughly 88 to 90 percent of AI citation opportunities.

Practical method. For every primary keyword, the agent generates the fan out via three methods, in this order:

Manual inspection of Google AI Mode's exposed sub queries panel for the head term.
LLM prompting: "Generate 12 likely sub queries that an AI search system would run when given the prompt: [query]. Output as a JSON array. Include angle variations for cost, comparison, definition, how to, when, where, who, why, alternatives, recent changes, and pros and cons."
People Also Ask harvest from Google for the head term, recursively two levels deep.

The fan out becomes the input for cluster page topic assignment in Phase 2.

3.3 Citation Decoupling

In 2026, ranking on page one and being cited in AI answers are two separate goals. They overlap, but only partially.

38 percent of pages cited in AI Overviews also rank in the top 10 (down from 76 percent in mid 2025).
25 to 39 percent overlap between traditional Google rankings and AI search citations across platforms.
AI Mode and AI Overviews share only 13.7 percent citation overlap, even though both are Google products.
ChatGPT prefers focused shorter content; pages covering 26 to 50 percent of fan out sub queries get cited more than pages covering 100 percent.

The framework therefore tracks ranking and citation as distinct metrics and optimizes each independently.

3.4 Citation Volatility

Citations are probabilistic, not deterministic. The operator and the client must understand this from the start.

AI Overview content changes 70 percent of the time for the same query.
When AI Overviews regenerate, 45.5 percent of citations get replaced.
AI Mode self overlap on the same query run three times: 9.2 percent.
There is less than a 1 in 100 chance that ChatGPT or Google AI gives the same brand list twice across 100 runs.

The goal is to maximize citation probability across many queries and many runs, not to lock down a single result. The framework optimizes for citation rate, not citation in any single instance.
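
To make the idea of a citation rate concrete, here is a minimal Python sketch that pools every run of every tracked query and reports the share of runs in which a domain was cited. The data shape, the function name, and the sample observations are assumptions for illustration; the framework does not prescribe a storage format.

```python
def citation_rate(runs_by_query, domain):
    """Share of all runs, across all queries, in which the domain was cited.

    runs_by_query maps each tracked query to a list of runs. Each run is the
    collection of domains cited in that one generated answer.
    """
    total = sum(len(runs) for runs in runs_by_query.values())
    if total == 0:
        return 0.0
    cited = sum(domain in run for runs in runs_by_query.values() for run in runs)
    return cited / total


# Hypothetical observations: three runs of one query, two runs of another.
observations = {
    "emergency plumber cost": [{"example.com", "a.test"}, {"a.test"}, {"example.com"}],
    "how to fix a leaking tap": [{"b.test"}, {"example.com", "b.test"}],
}
rate = citation_rate(observations, "example.com")
print(f"Cited in {rate:.0%} of runs")
```

Because any single run can differ from the next, the rate only becomes meaningful once each query has been run several times.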

3.5 Mention Versus Citation

Two different visibility events. Both matter. Track both.

Citation: the AI system links to your URL as a source. AI Mode cites sources 76.3 percent of the time. AI Overviews cite 84.9 percent of the time.

Mention: the AI system names your brand in the answer text without linking. AI Mode mentions brands 37.6 percent of the time. AI Overviews mention 61 percent.

Mentions correlate strongly with off site brand authority signals: YouTube references, podcast appearances, Reddit threads, earned media. Citations correlate more with on site signals: schema accuracy, content extractability, freshness, and topical depth.
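
The two events can be told apart mechanically once an answer's text and its source links have been captured. The Python sketch below is one possible approach under stated assumptions: the function name is hypothetical, brand matching is a plain case insensitive substring test, and a source counts as owned when its host equals an owned domain or is a subdomain of one. A mention in the sense defined above is the case where the brand is named and the domain is not cited.

```python
from urllib.parse import urlparse


def classify_visibility(answer_text, source_urls, brand_terms, owned_domains):
    """Return (cited, named) for one AI answer."""
    owned = [d.lower() for d in owned_domains]
    cited = False
    for url in source_urls:
        host = (urlparse(url).hostname or "").lower()
        # Match the domain itself or any subdomain of it.
        if any(host == d or host.endswith("." + d) for d in owned):
            cited = True
            break
    text = answer_text.lower()
    named = any(term.lower() in text for term in brand_terms)
    return cited, named


# Hypothetical answer: the brand is named but another site is the linked source.
cited, named = classify_visibility(
    "Example Plumbing is often recommended for emergency calls.",
    ["https://other.test/guide"],
    ["Example Plumbing"],
    ["example.com"],
)
print("citation" if cited else "mention" if named else "not visible")
```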

3.6 SEO, AEO, AIO, GEO, LLMO Vocabulary

Multiple acronyms describe overlapping concepts. The framework uses the following definitions consistently:

SEO (Search Engine Optimization): the foundational discipline of making a website crawlable, indexable, relevant, authoritative, and fast. Still required. Without SEO, none of the others work.

AEO (Answer Engine Optimization): optimizing content to win direct answer slots. Featured snippets, People Also Ask, knowledge panels, voice assistant answers. Output is your page being the answer source on a classic SERP.

AIO (AI Overview Optimization): optimizing specifically for Google's AI Overviews. Heavily favors pages already ranking in the top 10 organic, plus pages with extractable structure and clear entity signals.

GEO (Generative Engine Optimization): optimizing to be cited by generative AI systems. Broader than AIO. Covers ChatGPT, Perplexity, Claude, Bing Copilot, Meta AI, Google AI Mode, and AI Overviews. GEO includes off site signals (earned media, brand mentions, YouTube presence) that AIO does not.

LLMO (Large Language Model Optimization): the technical subset of GEO that focuses on how LLMs retrieve, parse, and cite content. Robots.txt access, llms.txt, schema as trust signal, content structure for retrieval augmented generation pipelines.

Search Everywhere Optimization: the umbrella concept covering SEO plus AEO plus AIO plus GEO plus social search (Reddit, TikTok, YouTube, Pinterest) plus voice search. The framework's output is Search Everywhere Optimization in practice, even though the framework itself is structured around the more granular disciplines.

The operator does not need to memorize these distinctions to follow the framework. The phases below cover all of them. The vocabulary exists to align with industry terminology when communicating with clients and reading source material.

3.7 The 2026 Numbers Worth Memorizing

These statistics anchor every strategic decision in the framework. Sources noted in Appendix D.

AI Overviews appear on about 48 percent of all Google searches (Q1 2026).
AI Overviews appear on 70 plus percent of informational and how to queries.
Organic CTR drops up to 61 percent on AI Overview queries.
Cited pages get a 35 percent click lift versus non cited top 10 results.
Cited pages convert at about 23 times the rate of standard search visitors.
93 percent of AI Mode queries result in zero clicks.
91.8 percent of all searches are long tail (3 plus words).
58.5 percent of all searches now end without a click to any external site.
44.2 percent of all LLM citations come from the first 30 percent of a page's text.
Pages above 20,000 characters average about 10 AI citations each. Pages under 500 characters average 2.39.
AI cites pages that are 25.7 percent fresher than traditional search surfaces.
Earned media distribution can lift AI citations by up to 325 percent versus owned site only.
YouTube mentions and branded web mentions are the top correlated factors with AI brand visibility.
Long tail keywords convert at 2.5 times the rate of head terms.

3.8 Phase Gate for Section 3

Before moving to stack selection (section 4) and then Phase 1 (section 5), confirm:

The operator and the client both understand the three surfaces model.
The operator can explain query fan out in one paragraph.
The operator has set client expectations on citation volatility (citations are probabilistic).
The operator understands the SEO plus AEO plus GEO layered model.
The agent has read and internalized section 3.7's numbers.

If any condition fails, return to that subsection.

4. Stack Selection Decision Tree

The agent walks this tree at the start of every build engagement. The output is exactly one assigned stack path. The agent then jumps to the matching subsection of section 9.

4.1 Inputs

Pulled from section 2 intake:

engagement_mode
tech_stack_current
tech_stack_preferred
hosting_environment
cms_required
sso_required
engagement_scope
service_or_product_lines (does it include e commerce?)

4.2 The Decision Tree

START
  |
  +-- Q1: Is this an audit only engagement?
  |     YES -> Skip stack selection. Jump to section 15 (Audit Mode).
  |     NO  -> continue
  |
  +-- Q2: Is the client already on a stack we cannot replace?
  |     (existing client investment, contractual hosting, internal team requirement)
  |     YES -> Use the existing stack. Match it to section 9 subsection.
  |             - WordPress  -> 9.5
  |             - Shopify standard -> 9.7
  |             - Headless Shopify -> 9.6
  |             - Other custom -> 9.8
  |     NO  -> continue
  |
  +-- Q3: Is this primarily an e commerce site?
  |     YES -> continue to e commerce branch
  |     NO  -> continue to non e commerce branch
  |
  +-- E COMMERCE BRANCH
  |     Q4: Does the client have existing Shopify investment or specifically request it?
  |       YES, with custom frontend needs -> Headless Shopify (9.6)
  |       YES, simple catalog -> Standard Shopify (9.7)
  |       NO -> Static HTML on Bubbles with Stripe integration (9.1) for catalogs under 50 SKUs
  |             OR Headless Shopify (9.6) for catalogs over 50 SKUs
  |
  +-- NON E COMMERCE BRANCH
  |     Q5: Does the client need a CMS for non technical editors?
  |       YES -> WordPress (9.5)
  |       NO  -> continue
  |
  |     Q6: Does the client need server side rendering for personalization or auth?
  |       YES, modern app feel -> SvelteKit on Bubbles (9.2) or Next.js on Bubbles (9.3)
  |       NO  -> continue
  |
  |     Q7: Is the site primarily content (blog, documentation, knowledge base)?
  |       YES -> Hugo or Astro static (9.3 / 9.4)
  |       NO  -> Static HTML on Bubbles (9.1) [DEFAULT]
  |
END
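
The tree can also be expressed as a single function, which may help an agent apply it consistently. This Python sketch is an illustration only: the function and parameter names are assumptions, the returned strings copy the section labels as they are written in the tree, and a catalog of exactly 50 SKUs is assumed to fall on the larger side.

```python
def select_stack(audit_only=False, locked_stack=None, ecommerce=False,
                 shopify=False, custom_frontend=False, sku_count=0,
                 needs_cms=False, needs_ssr=False, content_site=False):
    """Walk Q1 through Q7 and return the section label named by the tree."""
    # Q1: audit only engagements skip stack selection.
    if audit_only:
        return "15 (Audit Mode)"
    # Q2: a stack the client cannot replace is used as is.
    if locked_stack is not None:
        locked = {"wordpress": "9.5", "shopify_standard": "9.7",
                  "shopify_headless": "9.6"}
        return locked.get(locked_stack, "9.8")   # anything else is other custom
    # Q3 and Q4: e commerce branch.
    if ecommerce:
        if shopify:
            return "9.6" if custom_frontend else "9.7"
        # Assumption: exactly 50 SKUs is treated as the larger catalog case.
        return "9.1" if sku_count < 50 else "9.6"
    # Q5 through Q7: non e commerce branch.
    if needs_cms:
        return "9.5"
    if needs_ssr:
        return "9.2 or 9.3"    # SvelteKit or Next.js, the tree leaves this open
    if content_site:
        return "9.3 / 9.4"     # Hugo or Astro, label copied as written
    return "9.1"               # default: static HTML on Bubbles


print(select_stack(ecommerce=True, sku_count=20))   # 9.1
print(select_stack(needs_cms=True))                 # 9.5
```

Where the function returns two sections, the operator still has to pick one so that exactly one stack path is assigned.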

4.3 Default and Tiebreaker

When the tree produces a tie or the client expresses no preference, the default is Static HTML on Bubbles (section 9.1). Reasons: maximum control, fastest performance, simplest schema injection, no build pipeline complexity, lowest hosting cost, no JavaScript framework version churn, ideal for AI crawler reading mode.

4.4 Document the Decision

The agent records the assigned stack path in the engagement notes:

stack_decision:
  assigned_path: ""                       # one of 9.1 through 9.8
  decision_rationale: ""                  # one paragraph explaining the tree path taken
  fallback_paths: []                      # alternates if the primary cannot be applied
  decided_by: ""                          # agent_auto, joseph_manual, client_required
  decided_at: ""                          # ISO 8601 timestamp

4.5 Phase Gate for Section 4

Before moving to Phase 1, confirm:

Exactly one stack path is assigned.
The decision rationale is documented in plain language.
The hosting environment matches the assigned stack.

5. Phase 1: Intent and Sub Query Research

Purpose: produce a complete, intent classified, sub query mapped keyword universe for the client.

5.1 Step 1, Seed Generation

The agent generates seed keywords from five sources, in this order:

Client interview answers. From section 2: brand_terms, service_or_product_lines, target_audience_personas. Each item becomes one or more seed keywords.
Existing GSC queries. If GSC is verified, export the last 16 months of query data. Every query with at least one impression is a candidate seed. Anonymized queries are excluded.
Competitor reverse engineering. Pick three to five direct competitors. Use Ahrefs, Semrush, or free alternatives (Ubersuggest, SERP Ninja) to extract their top ranking and top trafficked keywords.
Customer language harvest. Pull from sales transcripts, support tickets, contact form messages, Yelp and Google reviews of the client and competitors. Customer phrasing is the most accurate seed source for transactional and informational intent.
Industry vocabulary scan. Read the top three industry publications, the top two trade associations, and the top two regulatory bodies for the client's vertical. Extract terms of art that buyers and decision makers use.

Output: a seed list of 30 to 200 terms, captured in seeds.csv with columns term, source, intent_guess, notes.

5.2 Step 2, Expansion

For each seed, expand using:

Google Autocomplete Alphabet Soup. Type the seed plus each letter a through z. Capture every suggested completion. Repeat with the seed followed by question modifiers (who, what, when, where, why, how, can, does, is, are).
People Also Ask harvest. Search the seed in Google. Click each PAA box twice to expand the tree. Capture all questions revealed.
Related Searches. Capture the suggestions at the bottom of the SERP.
Google AI Mode sub query inspection. Run the seed in AI Mode. Inspect the sub queries panel. Capture all listed sub queries.
LLM expansion prompt. Send the following prompt to a strong LLM (Claude Opus, GPT 5, Gemini 3 Pro):

Generate 15 likely sub queries that an AI search system would run
when given the prompt: "[SEED]". Cover these angles when applicable:
cost, comparison, definition, how to, when, where, who, why,
alternatives, recent changes, pros and cons, common mistakes,
local variations, regulatory considerations, examples.
Output as a JSON array of strings. Plain text only, no commentary.

Reddit and Quora scrape. Search the seed on Reddit and Quora. Capture the literal question phrasings of top posts.

Output: an expanded keyword universe of 500 to 5,000 terms, captured in keywords-raw.csv with columns term, source, parent_seed, length_words, language.
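
The Autocomplete Alphabet Soup step is easy to prepare ahead of time. The Python sketch below only builds the list of strings to type for one seed; it does not call any search engine. The function name is hypothetical, and the question modifiers are appended after the seed, following a literal reading of the step.

```python
import string

QUESTION_MODIFIERS = ["who", "what", "when", "where", "why",
                      "how", "can", "does", "is", "are"]


def autocomplete_probes(seed):
    """Build the strings to type into the search box for one seed."""
    seed = seed.strip()
    # The seed plus each letter a through z.
    letter_probes = [f"{seed} {letter}" for letter in string.ascii_lowercase]
    # The seed followed by each question modifier.
    question_probes = [f"{seed} {word}" for word in QUESTION_MODIFIERS]
    return letter_probes + question_probes


probes = autocomplete_probes("roof repair")
print(len(probes))   # 36
print(probes[:3])    # ['roof repair a', 'roof repair b', 'roof repair c']
```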

5.3 Step 3, Intent Classification

Every keyword gets exactly one intent subtype label from the taxonomy below. The four type model (informational, navigational, commercial, transactional) is preserved as the parent classification, with twelve finer subtypes underneath. Local and YMYL are separate tags that are applied in addition to the intent label.

Informational (parent)

info_definition: what is X. who is X. define X.
info_howto: how to do X. step by step X.
info_explanation: why does X happen. how does X work.
info_comparison_neutral: X versus Y as a learning question, not a buying decision.

Navigational (parent)

nav_brand: brand name search, login pages, official site lookup.
nav_branded_product: brand plus specific product or service name.

Commercial (parent)

comm_research: best X for Y. top X. X reviews. X alternatives.
comm_comparison_buying: X versus Y when the searcher is choosing between two specific options.
comm_pricing: how much does X cost. X price. X cost in 2026.

Transactional (parent)

trans_buy: buy X. order X. X near me with intent to purchase.
trans_book: book X. schedule X. appointment for X.
trans_contact: hire X. contact X. quote for X.

Local (parent, can combine with the above)

local_modifier: any keyword with a city, neighborhood, ZIP, or "near me" modifier. Tag with the geographic modifier in a separate column.

YMYL (Your Money Your Life, parent flag)

ymyl: any keyword in finance, health, legal, safety, or major life decision categories. Tag separately. YMYL keywords have stricter E E A T expectations.

Output: keywords-classified.csv with columns term, intent, intent_subtype, ymyl_flag, geo_modifier, length_words, language, parent_seed.
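
A first pass at classification can be automated with simple pattern rules built from the example phrasings above. The Python sketch below is deliberately naive and is an assumption about how such a pass might look, not the framework's method. It covers only the subtypes with an obvious surface pattern. Comparison and navigational subtypes need brand lists and human judgment, so those keywords come back with the hypothetical label unclassified for manual review.

```python
import re

# Ordered rules: the first matching pattern wins.
INTENT_RULES = [
    ("trans_book", r"^(book|schedule)\b|\bappointment\b"),
    ("trans_contact", r"^(hire|contact)\b|\bquote for\b"),
    ("trans_buy", r"^(buy|order)\b"),
    ("comm_pricing", r"\bhow much\b|\b(price|cost)\b"),
    ("comm_research", r"^(best|top)\b|\b(reviews|alternatives)\b"),
    ("info_howto", r"^how to\b|\bstep by step\b"),
    ("info_definition", r"^(what is|who is|define)\b"),
    ("info_explanation", r"^(why does|how does)\b"),
]
PARENTS = {"info": "informational", "nav": "navigational",
           "comm": "commercial", "trans": "transactional"}


def classify_keyword(term):
    """Return a partial keywords-classified.csv row for one term."""
    t = term.lower().strip()
    subtype = "unclassified"
    for label, pattern in INTENT_RULES:
        if re.search(pattern, t):
            subtype = label
            break
    return {
        "term": term,
        "intent": PARENTS.get(subtype.split("_")[0], ""),
        "intent_subtype": subtype,
        "geo_modifier": "near me" if "near me" in t else "",
        "length_words": len(t.split()),
    }


for kw in ["how much does roof repair cost", "buy shingles near me", "roofing"]:
    print(classify_keyword(kw))
```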

5.4 Step 4, Volume and Difficulty Enrichment

For each classified keyword, enrich with:

Search volume estimate. From Ahrefs, Semrush, Ubersuggest, or Google Keyword Planner. Mark zero volume keywords explicitly. Zero volume does not mean zero value (see 5.6).
Keyword difficulty (KD). Provider scoring (Ahrefs KD, Semrush KD, Moz Difficulty). Normalize to a 0 to 100 scale.
CPC. Provider average cost per click in USD. Useful as a commercial intent proxy.
SERP feature presence. Featured snippet present yes/no, AI Overview present yes/no, People Also Ask present yes/no, video carousel present yes/no, image carousel present yes/no, local pack present yes/no, knowledge panel present yes/no.
Top 10 competitor domain list. Capture the ten domains currently ranking, plus their domain authority. This drives the citation worthiness score in 5.5.

Output: keywords-enriched.csv adding columns volume, difficulty, cpc, has_featured_snippet, has_ai_overview, has_paa, has_video_carousel, has_image_carousel, has_local_pack, has_knowledge_panel, top10_domains, top10_avg_da.

5.5 Step 5, Citation Worthiness Scoring

For each keyword, the agent computes a 0 to 100 citation worthiness score. This score predicts how likely the client can win a citation given current authority.

Components and weights:

Domain authority match (0 to 30): if client DA is within 10 points of the average top 10 DA, score 30. If within 20, score 20. If within 30, score 10. Beyond 30, score 0.
Content depth match (0 to 30): if the client's existing content on this topic is within 25 percent of the average top 10 word count, score 30. Within 50 percent, score 20. Within 75 percent, score 10. Beyond, score 0. New site: score 0.
Freshness match (0 to 20): if the client has updated content on this topic within the last 90 days, score 20. Within 180 days, score 15. Within 365 days, score 10. Older, score 0. No content yet, score 5 (greenfield is better than stale).
Entity clarity (0 to 20): if the client has a Wikidata Q ID, Organization schema, and at least three sameAs links, score 20. Each missing element subtracts 7 points, with a floor of 0 when all three are missing.

Composite score at or above 60 means pursue first. Scores 40 to 59 are second tier, pursue after the first tier is shipped. Scores below 40 are deferred until authority improves.

Output: keywords-prioritized.csv adding column citation_worthiness_score, sorted descending.
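
The scoring rules above translate directly into code. The Python sketch below is one reading of them, with three assumptions stated plainly: gaps are measured as shortfalls, so a client at or above the top 10 average counts as a gap of zero; the entity clarity component is floored at 0; and the average top 10 word count is greater than zero. The function and parameter names are illustrative.

```python
def band(gap, steps):
    """Return the score of the first (limit, score) step whose limit covers the gap."""
    for limit, score in steps:
        if gap <= limit:
            return score
    return 0


def citation_worthiness(client_da, top10_avg_da, client_words, top10_avg_words,
                        days_since_update, has_wikidata, has_org_schema, sameas_count):
    """Compute the 0 to 100 citation worthiness score for one keyword."""
    # Domain authority match, 0 to 30. Assumption: only a shortfall counts as a gap.
    authority = band(max(0, top10_avg_da - client_da), [(10, 30), (20, 20), (30, 10)])

    # Content depth match, 0 to 30. None means a new site with no content.
    if client_words is None:
        depth = 0
    else:
        shortfall = max(0, top10_avg_words - client_words) / top10_avg_words
        depth = band(shortfall, [(0.25, 30), (0.50, 20), (0.75, 10)])

    # Freshness match, 0 to 20. None means no content yet, which scores 5.
    if days_since_update is None:
        freshness = 5
    else:
        freshness = band(days_since_update, [(90, 20), (180, 15), (365, 10)])

    # Entity clarity, 0 to 20. Assumption: floored at 0.
    missing = [bool(has_wikidata), bool(has_org_schema), sameas_count >= 3].count(False)
    entity = max(0, 20 - 7 * missing)

    return authority + depth + freshness + entity


def tier(score):
    """Map a composite score to its pursuit tier."""
    if score >= 60:
        return "first"
    if score >= 40:
        return "second"
    return "deferred"


# Hypothetical client: DA 35 against 48, 1,200 words against 2,000,
# updated 120 days ago, no Wikidata Q ID, schema present, four sameAs links.
score = citation_worthiness(35, 48, 1200, 2000, 120, False, True, 4)
print(score, tier(score))   # 68 first
```

In this example the components are 20 for authority, 20 for depth, 15 for freshness, and 13 for entity clarity.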

5.6 Step 6, Zero Volume Keyword Treatment

Zero volume keywords are NOT discarded. In 2026, many high intent and AI critical queries register as zero volume in keyword tools because the tools cannot see the long tail and conversational queries that AI systems handle.

The agent retains zero volume keywords if any of the following are true:

The keyword is a literal sub query from a fan out for a higher volume head term.
The keyword matches the client's customer language harvest from 5.1.
The keyword is a long tail variation of a known commercial intent term.
The keyword has high specificity that signals purchase or contact intent (model numbers, locations, dated events).

Zero volume keywords get a flag zv_high_intent in the prioritized output and are mapped to cluster pages in Phase 2.
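
The retention rule can be applied as a filter over the enriched keyword rows. In the Python sketch below, the four boolean column names are hypothetical stand ins for the four tests listed above, and the sample rows are invented for illustration. Only the zv_high_intent flag name comes from the framework.

```python
RETENTION_TESTS = [
    "is_fanout_sub_query",       # literal sub query from a higher volume head term
    "in_customer_language",      # matches the customer language harvest from 5.1
    "long_tail_of_commercial",   # long tail variation of a commercial intent term
    "high_specificity",          # model numbers, locations, dated events
]


def flag_zero_volume(rows):
    """Return retained rows, each with a zv_high_intent flag added."""
    kept = []
    for row in rows:
        if row["volume"] > 0:
            kept.append({**row, "zv_high_intent": False})
        elif any(row.get(test, False) for test in RETENTION_TESTS):
            kept.append({**row, "zv_high_intent": True})
        # Zero volume rows that pass none of the four tests are left out.
    return kept


rows = [
    {"term": "roof repair", "volume": 5400},
    {"term": "model x200 roof vent replacement", "volume": 0, "high_specificity": True},
    {"term": "roof thing", "volume": 0},
]
result = flag_zero_volume(rows)
print([(r["term"], r["zv_high_intent"]) for r in result])
```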

5.7 Step 7, Sub Query Mapping

For each priority keyword from 5.5, the agent generates the fan out using the method in section 3.2. The fan out is recorded as:

keyword: "[primary keyword]"
intent: "[from 5.3]"
fanout:
  - sub_query: ""
    intent_subtype: ""
    answered_by_existing_url: ""        # path or empty
    target_url: ""                      # to be assigned in Phase 2
    answer_word_budget: 0               # estimated words needed for adequate answer

This file is the input for Phase 2.

5.8 Phase 1 Completion Gate

Before Phase 2, confirm:

keywords-prioritized.csv exists with at least 200 keywords (or all keywords for very small businesses).
Every keyword has an intent classification, a citation worthiness score, and a zero volume flag.
The top 30 priority keywords each have a fan out file.
The operator has reviewed the top 30 priority keywords and signed off.

If gate fails, identify which step's output is missing and complete that step.

6. Phase 2: Pillar Cluster Mapping with Cannibalization Detection

Purpose: assign every priority keyword and every fan out sub query to exactly one URL on the site, preventing cannibalization and building a topic cluster architecture that wins fan out coverage.

6.1 Step 1, Pillar Topic Identification

From the prioritized keyword list, identify pillar topics. A pillar topic is a head subject area with at least 8 supporting cluster topics. Typical client portfolio pillars:

Service line pillars (one per service_or_product_lines item).
Geographic pillars (one per major city or region served).
Audience pillars (one per primary persona).
Compliance or methodology pillars (when relevant: HIPAA, OSHA, SDVOSB contracting, etc.).

Rule: a pillar covers a topic broad enough to support 3,000 to 5,000 words at the head level, with enough sub topics to host 8 to 12 cluster pages.

For Joseph's typical client mix, expect 3 to 8 pillars per site. Heavy programmatic sites (real estate, legal directories) can have many more.

Output: pillars.yaml:

pillars:
  - id: ""
    title: ""
    primary_keyword: ""
    target_url: ""
    estimated_word_count: 0               # 3000 to 5000
    supporting_clusters: []               # IDs of clusters that point to this pillar
    related_pillars: []                   # IDs of sibling pillars

6.2 Step 2, Cluster Page Assignment

Each pillar gets a list of cluster pages. Each cluster page targets one primary sub query and 2 to 4 related secondary sub queries from the fan out.

Rules:

One primary keyword per cluster page.
One primary sub query per cluster page.
The cluster page primary keyword must NOT be the same as its pillar's primary keyword.
Each cluster page is 800 to 2,500 words depending on sub query depth.
Each cluster page links back to its pillar with descriptive anchor text including the pillar's primary keyword.

Output: clusters.yaml:

clusters:
  - id: ""
    pillar_id: ""
    title: ""
    primary_keyword: ""
    primary_sub_query: ""
    secondary_sub_queries: []
    target_url: ""
    estimated_word_count: 0
    intent: ""
    intent_subtype: ""
    citation_worthiness_score: 0

6.3 Step 3, Existing URL Reconciliation

For sites that are not greenfield, the agent maps every existing URL to a pillar, a cluster, or a "to deprecate" bucket.

For each existing URL:

Crawl the page and extract its actual primary keyword (from H1, title tag, and most prominent body text).
Match it to the pillar or cluster whose primary_keyword aligns most closely.
If no match within reasonable similarity, mark the URL for review: either re scope it to fit a cluster, consolidate it into another page, or deprecate it.

Output: url-reconciliation.csv with columns existing_url, current_primary_keyword, assigned_pillar_or_cluster_id, action (keep, rescope, consolidate, deprecate), redirect_target, notes.

6.4 Step 4, Cannibalization Detection

For every primary keyword in the prioritized list, the agent runs the cannibalization check.

Method A, GSC export based (preferred when GSC is available):

Export the last 90 days of GSC Performance data filtered to pages.
For each keyword, list all URLs that have at least one impression for that query.
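For each query with two or more competing URLs, compute the Herfindahl Hirschman Index (HHI) of the click distribution: the sum of the squared click shares of every URL receiving clicks for that query. With exactly two URLs, HHI ranges from 0.5 (even split) to 1.0 (one URL takes every click).
HHI below 0.5 (clicks fragmented across three or more URLs) flags the keyword as cannibalized.
HHI between 0.5 and 0.7 (for example, two URLs splitting clicks anywhere from 50/50 to roughly 80/20) is borderline, flag for review.
HHI above 0.7 (one URL dominates) is acceptable.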
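
As a rough illustration of Method A, the HHI for one query can be computed from a mapping of URL to clicks. This is a sketch under assumptions: the input shape and the names click_hhi and classify_hhi are hypothetical, and the GSC export is assumed to have been parsed already.

```python
def click_hhi(clicks_by_url: dict[str, int]) -> float:
    """Sum of squared click shares across every URL competing on one query."""
    total = sum(clicks_by_url.values())
    if total == 0:
        raise ValueError("no clicks recorded for this query")
    return sum((clicks / total) ** 2 for clicks in clicks_by_url.values())


def classify_hhi(hhi: float) -> str:
    """Apply the thresholds from Method A."""
    if hhi < 0.5:
        return "cannibalized"
    if hhi <= 0.7:
        return "review"
    return "acceptable"


# Example: three URLs share the clicks for one query.
status = classify_hhi(click_hhi({"/a/": 40, "/b/": 30, "/c/": 30}))
```

Note that two URLs with an exactly even split produce 0.5, which lands in the review band rather than the cannibalized band.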

Method B, manual SERP check (when GSC is unavailable):

Run a site:[domain] [keyword] search in Google.
If two or more URLs from the client's domain appear in the top 10, flag for review.
Inspect each result. If both URLs target the same primary keyword, cannibalization is confirmed.
If both URLs target the same query but for clearly different intents (one informational, one transactional), this is allowed and tracked as intentional dual targeting.

Method C, semantic similarity scan (preventive):

For every URL pair in the cluster map, compute embedding similarity using OpenAI or sentence transformer embeddings.
Pairs above 0.85 cosine similarity are flagged as semantically overlapping.
Investigate manually. Either differentiate the angle, consolidate, or accept and track.
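
The pairwise scan in Method C might look like the following. It is a sketch that assumes embeddings have already been produced by whichever provider is in use and are available as plain lists of floats keyed by URL; no embedding API is called here, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def overlapping_pairs(
    embeddings: dict[str, list[float]], threshold: float = 0.85
) -> list[tuple[str, str, float]]:
    """Return (url_a, url_b, similarity) for every pair above the threshold."""
    flagged = []
    for url_a, url_b in combinations(sorted(embeddings), 2):
        similarity = cosine(embeddings[url_a], embeddings[url_b])
        if similarity > threshold:
            flagged.append((url_a, url_b, round(similarity, 3)))
    return flagged
```

The comparison is quadratic in the number of URLs, which is fine for a cluster map of a few hundred pages; larger programmatic sites would need an approximate nearest neighbor index instead.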

Output: cannibalization-flags.csv with columns keyword, url_a, url_b, hhi, similarity, recommended_action, status.

6.5 Step 5, Cannibalization Remediation

For every flagged cannibalization, the agent applies one of these five fixes:

301 redirect consolidation. When two URLs target the same primary keyword and intent, merge content into the stronger page and 301 redirect the weaker URL. This is the default when in doubt.
Re optimization. When two URLs cover related but genuinely distinct intents, rewrite each to clearly own its sub intent. Update headings, primary keyword usage, and internal links.
Canonical tag. When both URLs must remain accessible (product variants, technical near duplicates), set rel=canonical on the weaker URL pointing to the stronger. Note: Google treats canonicals as hints, not directives. Use only when 301 is not viable.
Noindex. When a page must remain on the site for users (internal reference, form thank you page) but should not compete in search, add <meta name="robots" content="noindex,follow">. Verify in GSC after deployment.
Delete and redirect. When a page is thin, outdated, and has no remaining utility, delete it and 301 redirect to the most relevant surviving page. The GSC Removals tool can hide the old URL from results sooner, but that removal is temporary (about six months), so the redirect must stay in place.

Output: cannibalization-remediation-log.csv with columns flagged_keyword, action_taken, executed_at, by, notes.

6.6 Step 6, Sub Query Coverage Map

For every priority pillar, build a sub query coverage map showing which sub queries are answered by which URLs.

pillar_id: ""
total_sub_queries: 0
sub_queries_answered: 0
coverage_rate: 0.0                        # answered / total
sub_query_assignments:
  - sub_query: ""
    assigned_url: ""
    answer_strength: ""                   # full, partial, missing
    word_count_dedicated: 0
    last_updated: ""

Target: 70 percent or higher coverage rate for top 10 priority pillars before Phase 3 begins.
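
A minimal sketch of the coverage_rate calculation, assuming the sub_query_assignments list has been loaded into dictionaries with the field names shown above. Whether a partial answer counts as answered is not stated, so this sketch makes the stricter assumption that only "full" counts.

```python
def coverage_rate(assignments: list[dict]) -> float:
    """Share of sub queries whose answer_strength is "full" (assumption)."""
    if not assignments:
        return 0.0
    answered = sum(1 for item in assignments if item["answer_strength"] == "full")
    return answered / len(assignments)


def meets_target(assignments: list[dict], target: float = 0.70) -> bool:
    """True when the pillar reaches the coverage target for Phase 3."""
    return coverage_rate(assignments) >= target
```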

6.7 Step 7, Internal Linking Plan

The cluster architecture requires intentional internal linking. The agent produces an internal linking plan:

Every cluster page links to its pillar with descriptive anchor text including the pillar's primary keyword.
Every pillar links to every cluster page in its cluster, organized by sub topic.
Cluster pages within the same pillar link to each other where the topics relate.
Cluster pages link to clusters in sibling pillars where relevant.
The homepage links to every pillar.
The footer links to the top three to five pillars (mega footer pattern).
Breadcrumbs are present on every cluster page.

Output: internal-linking-plan.csv with columns from_url, to_url, anchor_text, link_type (pillar, cluster, sibling, footer, breadcrumb), priority.

6.8 Phase 2 Completion Gate

Before Phase 3, confirm:

pillars.yaml exists with at least one pillar.
clusters.yaml exists with at least 8 clusters per pillar.
url-reconciliation.csv has every existing URL mapped or marked for action.
cannibalization-flags.csv shows zero remaining unresolved flags.
Sub query coverage rate is 70 percent or higher across top 10 priority pillars.
internal-linking-plan.csv exists and has been reviewed.

If gate fails, identify the failed step and remediate.

7. Phase 3: Page Structure for Multi Surface Extraction

Purpose: every page is structured so a featured snippet, an AI Overview, an AI Mode citation, a People Also Ask answer, a voice assistant answer, or a knowledge panel can extract the right content cleanly.

7.1 The Mandatory Page Skeleton

Every content page (pillar or cluster) follows this skeleton. Deviations require justification.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>[Primary Keyword] | [Brand Name]</title>
  <meta name="description" content="[Direct answer in 150 to 160 characters. Includes primary keyword in first 90 characters.]">
  <link rel="canonical" href="[full URL]">

  <meta property="og:title" content="[Title]">
  <meta property="og:description" content="[Same as meta description or expanded variant]">
  <meta property="og:url" content="[URL]">
  <meta property="og:image" content="[Image URL]">
  <meta property="og:type" content="[website, article, product, etc.]">
  <meta property="og:site_name" content="[Brand Name]">

  <meta name="twitter:card" content="summary_large_image">
  <meta name="twitter:title" content="[Title]">
  <meta name="twitter:description" content="[Description]">
  <meta name="twitter:image" content="[Image URL]">

  <!-- Inline JSON-LD blocks here. See Appendix C. -->
</head>
<body>
  <header>...</header>

  <nav aria-label="Breadcrumb">
    <ol>
      <li><a href="/">Home</a></li>
      <li><a href="/[pillar]">[Pillar Name]</a></li>
      <li aria-current="page">[Cluster Name]</li>
    </ol>
  </nav>

  <main>
    <article>
      <h1>[Primary Keyword Phrased as a Headline]</h1>

      <p class="lede"><strong>[40 to 60 word direct answer.
      Stated immediately, no preamble, no "in this article we will discuss".
      Includes the primary keyword naturally.]</strong></p>

      <p>[2 to 3 sentence expansion of the answer with one supporting fact or data point.]</p>

      <h2>[Question matching primary sub query 1]</h2>
      <p>[40 to 60 word direct answer to that sub question.]</p>
      <p>[Supporting context, 100 to 200 words.]</p>

      <h2>[Question matching primary sub query 2]</h2>
      <ol>
        <li>[Step 1]</li>
        <li>[Step 2]</li>
        <li>[Step 3]</li>
      </ol>

      <h2>[Question matching primary sub query 3]</h2>
      <table>
        <thead>
          <tr><th>[Comparison Header]</th><th>Option A</th><th>Option B</th></tr>
        </thead>
        <tbody>
          <tr><td>[Feature]</td><td>[Value]</td><td>[Value]</td></tr>
        </tbody>
      </table>

      <h2>[Additional sub queries as H2]</h2>
      <p>[Self contained answer.]</p>

      <h2>Frequently Asked Questions</h2>
      <h3>[FAQ Question 1]</h3>
      <p>[40 to 60 word answer]</p>
      <h3>[FAQ Question 2]</h3>
      <p>[40 to 60 word answer]</p>

      <aside class="author-bio">
        <p>Written by [Author Name], [credentials, role].
        [One sentence biography establishing relevant experience.]
        Last updated [Month Day, Year].</p>
      </aside>
    </article>
  </main>

  <footer>
    <!-- Footer with Crafted by ThatDeveloperGuy.com link -->
  </footer>
</body>
</html>

7.2 The 40 to 60 Word Direct Answer

The lede paragraph is the single most important element on the page for AEO and AIO. Rules:

Between 40 and 60 words, inclusive. Count them.
States the answer to the primary keyword question completely.
Uses the primary keyword in the first sentence, naturally.
No preamble. No "this guide will cover". No "before we dive in". Start with the answer.
Wrapped in <p class="lede"><strong>...</strong></p> for visual emphasis and a parsing signal.

Example for "what is answer engine optimization":

Answer engine optimization (AEO) is the practice of structuring web content so that AI driven answer systems can extract, summarize, and cite it as the source of a direct answer. Unlike traditional SEO, which optimizes for ten blue links, AEO targets featured snippets, AI Overviews, voice assistants, and conversational AI engines that synthesize answers from multiple sources.

That is 57 words. It defines the term, distinguishes it from SEO, and lists the surfaces. An AI Overview can extract either sentence and have it work standalone.
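
Because the rule says to count, a tiny checker helps. This sketch counts whitespace separated tokens, so a hyphenated compound counts as one word and the same compound written with a space counts as two; the function names are hypothetical.

```python
def lede_word_count(text: str) -> int:
    """Count whitespace separated tokens in the lede."""
    return len(text.split())


def lede_in_range(text: str, low: int = 40, high: int = 60) -> bool:
    """True when the lede length falls inside the 40 to 60 word window."""
    return low <= lede_word_count(text) <= high
```

Run it against the text inside the lede paragraph only, after stripping the surrounding tags.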

7.3 The 30 Percent Rule

44.2 percent of all LLM citations come from the first 30 percent of a page's text. Whatever the operator most wants cited must appear in the first third.

Practical application:

For a 2,000 word cluster page, the first 600 words must contain the lede, the answer to the primary sub query, and at least one piece of original data or insight.
Save methodology, edge cases, and deep examples for the back half.
Never bury the primary answer below an introduction or context section.

7.4 Heading as Question Pattern

Every H2 is phrased as a literal natural language question that matches a known sub query.

Wrong: Pricing
Right: How much does a custom website cost in 2026?

Wrong: Benefits
Right: Why does answer engine optimization matter for small businesses?

Wrong: Implementation
Right: How do you install schema markup on a static HTML site?

This pattern dramatically increases the probability of extraction because AI systems match heading text to sub query phrasing. The H2 becomes the citable anchor.
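
One way to audit this pattern is to pull every H2 from a page and flag the ones that are not phrased as questions. The sketch below uses only the standard library parser. The question word list and the allowance for the "Frequently Asked Questions" heading from the page skeleton are assumptions, not part of the framework.

```python
from html.parser import HTMLParser

# Hypothetical starter list; extend it to match the client's sub query phrasing.
QUESTION_WORDS = {
    "how", "what", "why", "when", "where", "which", "who",
    "can", "do", "does", "is", "are", "should",
}


class H2Collector(HTMLParser):
    """Collect the text content of every <h2> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_h2 = False
        self._buffer: list[str] = []
        self.headings: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_h2:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self.headings.append("".join(self._buffer).strip())
            self._in_h2 = False


def non_question_h2s(
    html: str, allowed: tuple[str, ...] = ("Frequently Asked Questions",)
) -> list[str]:
    """Return H2 headings that are neither questions nor explicitly allowed."""
    collector = H2Collector()
    collector.feed(html)
    collector.close()
    flagged = []
    for heading in collector.headings:
        if heading in allowed:
            continue
        words = heading.split()
        is_question = (
            heading.endswith("?") and bool(words) and words[0].lower() in QUESTION_WORDS
        )
        if not is_question:
            flagged.append(heading)
    return flagged
```

A check like this only verifies the phrasing. Matching each heading to a known sub query from the fan out still needs the Phase 1 data.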

7.5 Self Contained Section Design

Every section must make sense without surrounding context. AI systems often extract individual sections from longer articles, so cross references break extraction.

Forbidden:

"As we discussed above..."
"See the next section for..."
"Building on the prior point..."

Required:

Each H2 section restates necessary context in one sentence if needed.
Definitions of jargon happen in the section that uses the jargon, not earlier.
Every paragraph could be quoted standalone and remain accurate.

7.6 Information Gain Elements

Every page needs at least one element that other pages on the same topic do not have. Without information gain, the page is a paraphrase of the competition and earns zero citations.

Acceptable information gain elements (pick at least one, ideally two):

Original survey or benchmark data. Run a survey through your client base or your own portfolio. Even 30 responses produce citable data nobody else has.
Internal case study with real numbers. "We optimized X for [Client]. CTR moved from 2.1 percent to 5.7 percent over 90 days." Specific, verifiable.
Proprietary framework or model. Joseph's SEO BUILD REFERENCE itself is an information gain asset. Frameworks become citable references.
Calculator or interactive tool. A pricing calculator, a savings estimator, a compatibility checker. Drives both citations and conversions.
Side by side comparison nobody has published. Compare three or more options on a dimension that prior articles ignored.
First hand experience narrative. "I tested X for 60 days. Here is what happened by week." This supplies the Experience signal in E E A T.
Data visualization built from public data. Take a public dataset and present it in a way nobody else has. Cited often by AI systems.

The information gain element appears in the first 30 percent of the page so it lands inside the citation extraction zone.

7.7 The Dual Extraction Layer

Provide critical answers in two formats simultaneously:

Prose for AI Overviews and conversational AI engines.
A list, table, or structured block for traditional featured snippets and rich results.

Example, "what are the steps to implement schema markup":

Prose first (40 to 60 word answer):

To implement schema markup, identify the schema type that matches your page (Article, Product, LocalBusiness, etc.), write the JSON LD and check it with the Schema.org validator, embed the JSON LD in the page head, validate with Google's Rich Results Test, then submit the URL to Search Console for re crawl.

Then numbered list:

Identify the schema type matching the page.
Write the JSON LD and check it with the Schema.org validator.
Embed the JSON LD in the page head as a script tag.
Validate with Google's Rich Results Test.
Submit the URL to Google Search Console for re crawl.

Both formats live on the same page. The prose wins AI Overview citations. The list wins featured snippets.

7.8 Author Attribution

Every content page has visible author attribution. Required elements:

Author full name as a clickable link to the author's bio page.
One sentence credential statement (degree, role, years of experience, certification).
Last updated timestamp in human readable format AND ISO 8601 format embedded in dateModified of Article schema.
Optional but recommended: photograph of the author, link to LinkedIn or other professional profile, link to Wikidata Q ID if available.

Joseph's author byline pattern:

Written by Joseph Anady, SDVOSB owner of ThatDeveloperGuy. BA Computer Engineering, Colorado State University. MA Cybersecurity. Service Disabled Veteran. Last updated [Month Day, Year]. Author profile.

7.9 Lists, Tables, and Semantic HTML

Use native HTML elements for structured content. Styled <div> blocks carry no list, step, or table semantics, so AI systems are far less likely to extract them as structured answers.

Ordered steps: <ol><li>...</li></ol>. Never <div class="step">.
Bullet lists: <ul><li>...</li></ul>.
Comparisons: <table><thead><tr><th>...</th></tr></thead><tbody>...</tbody></table>. Tables limited to 3 to 6 rows and 2 to 4 columns for snippet extraction.
Code: <pre><code>...</code></pre>.
Definitions: <dl><dt>Term</dt><dd>Definition</dd></dl>.
Quotes: <blockquote>...</blockquote> with <cite>Source</cite>.

7.10 Image and Video Treatment

Visual content earns image and video carousel placement when treated correctly:

Every image has descriptive alt text including relevant keyword in natural phrasing.
Filenames are kebab case and descriptive: quarterly-tax-payment-calendar-2026.png, not IMG_9374.png.
Width and height attributes on every <img> tag to prevent layout shift.
Loading lazy on below the fold images: loading="lazy".
Decoding async on all images: decoding="async".
For videos, use <video> element with poster image, or embed YouTube with VideoObject schema.
WebP or AVIF format with PNG or JPEG fallback.
ImageObject schema on hero images for the page.
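
Several of these checks can be automated per page. The sketch below audits each img tag for alt text, explicit dimensions, async decoding, and a kebab case filename using only the standard library. The class and function names are hypothetical, and lazy loading is left out because the parser cannot tell which images sit below the fold.

```python
import posixpath
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Lowercase alphanumeric segments joined by hyphens, then a file extension.
KEBAB_FILE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*\.[a-z0-9]+$")


class ImageAudit(HTMLParser):
    """Record (src, problem) pairs for every <img> that breaks a rule."""

    def __init__(self) -> None:
        super().__init__()
        self.problems: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src") or ""
        if not (attr.get("alt") or "").strip():
            self.problems.append((src, "missing alt text"))
        if not attr.get("width") or not attr.get("height"):
            self.problems.append((src, "missing width or height"))
        if attr.get("decoding") != "async":
            self.problems.append((src, "missing decoding=async"))
        if not KEBAB_FILE.match(posixpath.basename(urlparse(src).path)):
            self.problems.append((src, "filename is not kebab case"))


def audit_images(html: str) -> list[tuple[str, str]]:
    audit = ImageAudit()
    audit.feed(html)
    audit.close()
    return audit.problems
```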

7.11 Mobile First and Core Web Vitals

Every page must pass Core Web Vitals on mobile:

Largest Contentful Paint (LCP): under 2.5 seconds.
Interaction to Next Paint (INP): under 200 milliseconds.
Cumulative Layout Shift (CLS): under 0.1.

Joseph's standing infrastructure on Bubbles already handles much of this through nginx HTTP/3, Brotli compression, and proper caching headers. Phase 3 page work must not introduce regression. Validation: PageSpeed Insights, Chrome DevTools Lighthouse, real device testing on at least one mid range Android.

7.12 Phase 3 Completion Gate

Before Phase 4, confirm:

Every priority page has a 40 to 60 word lede in the first paragraph.
Every priority page has at least one information gain element in the first 30 percent.
Every H2 on every priority page is phrased as a question matching a sub query.
Every priority page has visible author attribution.
Every priority page passes Core Web Vitals on mobile.
No <div class="step"> or styled non semantic structures in extraction zones.

If gate fails, list which pages fail which criteria and remediate page by page.

8. Phase 4: Schema and Technical Implementation

Purpose: every page emits accurate, validated structured data that earns rich results where eligible and feeds AI systems with trust signals everywhere else.

8.1 The 2026 Schema Reality

By March 2026, the schema landscape has shifted significantly. The agent must internalize what works and what does not before writing any JSON LD.

Deprecated for rich results (still safe to keep but produces no visible result):

HowTo (deprecated September 2023, removed from desktop and mobile)
Practice Problem (deprecated January 2026)
Dataset for general search (now only serves Dataset Search)
Sitelinks Search Box (integrated into core search)
SpecialAnnouncement (COVID era, deprecated)
Q and A (deprecated January 2026)
Book Actions, Course Info, Claim Review, Estimated Salary, Learning Video, Vehicle Listing (all retired)

Restricted to specific verticals only:

FAQPage: rich results restricted primarily to government and authoritative health sites. Implementation still recommended for AI systems as a trust signal, just do not expect a SERP rich result for general business sites.

Demoted in March 2026 core update:

Review schema on editorial comparison posts (use only on pages where the review is the primary content).
FAQ on supplementary page sections (use only when FAQ is the primary purpose).

Fully supported and recommended:

Organization (foundational, always implement)
LocalBusiness (and its subtypes for service businesses)
Person (for authors and team members)
Article and NewsArticle and BlogPosting
Product with Offers and AggregateRating
Review (first party only, on primary content)
Event
Recipe (food only)
Video and VideoObject
BreadcrumbList (universal, always implement)
Service (for service businesses)
WebSite
WebPage
AboutPage, ContactPage, FAQPage (as page type signals)
ItemList (for category pages)
ImageObject (for hero images)

8.2 The Schema as Trust Signal Shift

Even when a schema type no longer triggers a rich result, the markup still matters. Google AI Mode reads structured data as a trust and entity verification signal during answer synthesis. Accurate schema increases citation probability independent of rich result eligibility.

Therefore: implement comprehensive, accurate schema on every page even when no rich result will display. Do not strip schema just because the rich result feature was deprecated. Strip only when the schema is inaccurate or misrepresentative.

8.3 The Mandatory Schema Set for Every Site

Every site Joseph builds emits, at minimum:

Organization (sitewide, in head of every page or at minimum the homepage).
WebSite (sitewide, with potentialAction for site search if applicable).
BreadcrumbList (every page below the homepage).
LocalBusiness or appropriate subtype (when the client has a physical service area).
Person (one block per author or team member, on the bio page and embedded in Article schema author field).
Article or BlogPosting or NewsArticle (every content page).
Service (every service page).
Product with Offers (every e commerce product page, when applicable).
WebPage type signal (every page, indicating AboutPage, ContactPage, FAQPage as appropriate).

Optional based on content:

Event when the page describes a dated event.
Recipe when the page is a recipe.
Review and AggregateRating when the page contains genuine first party reviews.
VideoObject when the page contains a hero video.
ImageObject for hero images.

8.4 JSON LD Format Standard

All schema is JSON LD, embedded as <script type="application/ld+json"> blocks: in the page head for sitewide schema, and in either the head or the body for page specific schema.

Format requirements:

Double quoted strings, no single quotes.
No trailing commas.
2 space indentation when human readable. Minified for production if performance dictates, but readability defaults to formatted.
UTF 8 encoding.
Use @graph arrays to combine multiple schema types in a single script block when they share context.

The MEGAMIND port 9090 sidecar pattern is deprecated. All schema is static inline JSON LD generated at build time or hand authored, never injected by a runtime sidecar.

8.5 Schema Nesting and the @graph Pattern

For pages that emit multiple schema types, use the @graph array pattern. Full code reference is in Appendix C, section 19.1. The pattern keeps related entities connected through @id references, which AI systems use for entity resolution.
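
To make the @id linking concrete, here is a build time sketch that assembles a small @graph and serializes it with the format rules from 8.4. It is not the Appendix C reference code. The @id fragment names (#organization, #website, #person, #article) and the function signature are assumptions chosen for illustration.

```python
import json


def build_graph(
    site_url: str,        # for example "https://example.com/"
    org_name: str,
    page_url: str,
    headline: str,
    author_name: str,
    author_url: str,      # the author's bio page
    date_published: str,  # ISO 8601
    date_modified: str,   # ISO 8601
) -> str:
    org_id = f"{site_url}#organization"
    person_id = f"{author_url}#person"
    graph = {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "Organization", "@id": org_id, "name": org_name, "url": site_url},
            {
                "@type": "WebSite",
                "@id": f"{site_url}#website",
                "url": site_url,
                "name": org_name,
                "publisher": {"@id": org_id},  # reference, not a duplicate block
            },
            {"@type": "Person", "@id": person_id, "name": author_name, "url": author_url},
            {
                "@type": "Article",
                "@id": f"{page_url}#article",
                "headline": headline,
                "author": {"@id": person_id},
                "publisher": {"@id": org_id},
                "datePublished": date_published,
                "dateModified": date_modified,
                "mainEntityOfPage": page_url,
            },
        ],
    }
    # json.dumps guarantees double quotes and no trailing commas.
    return json.dumps(graph, indent=2, ensure_ascii=False)
```

The output string goes inside a script tag of type application/ld+json. Each entity is declared once and referenced elsewhere by @id, which avoids the duplicate Organization block problem.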

8.6 The Validation Pipeline

Every schema deployment passes through this validation pipeline. No exceptions.

Schema.org validator (https://validator.schema.org/). Catches type and property errors.
Google Rich Results Test (https://search.google.com/test/rich-results). Catches rich result eligibility issues.
Content alignment check. Every value in the schema appears in the visible page content. Schema describes what is on the page, not what we wish were on the page. Misaligned schema can trigger manual actions.
Date freshness check. dateModified reflects the actual last meaningful content change. datePublished is the original publication date.
Quarterly deprecation recheck. Every quarter, review the structured data changes announced on the Google Search Central blog and rerun the full validation pipeline against them.

The agent runs the full pipeline on every page after deployment. Failures must be resolved before the page is considered complete.

8.7 Common Schema Mistakes to Avoid

Marking up content that is not visible on the page (FAQ answers in schema but not in body).
Using FAQ schema on pages where FAQ is a small supplementary section (will be ignored or manually demoted).
Stale dateModified after content changes.
Missing required properties (Article without author, Product without offers, LocalBusiness without address).
Mismatched data (schema says price 49, page shows 59).
Multiple @type Organization blocks on the same page without @id differentiation.
Ignoring sameAs (omitting sameAs weakens entity resolution for AI systems).
Using deprecated schema types (Practice Problem, Q and A, etc.) and expecting rich results.
Implementing partial Product schema (missing AggregateRating, missing Offers): produces zero rich result lift.
Referencing a Wikidata Q ID that does not exist or has been deleted.
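
A local pre check can catch two of these mistakes before the external validators run: malformed JSON LD and the missing required properties named above. This is a sketch, not a replacement for the Schema.org validator or the Rich Results Test. The REQUIRED table covers only the three examples from the list, and it inspects top level and @graph nodes only.

```python
import json
from html.parser import HTMLParser

# Only the examples named in the list above; extend as needed.
REQUIRED = {"Article": ["author"], "Product": ["offers"], "LocalBusiness": ["address"]}


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every application/ld+json script block."""

    def __init__(self) -> None:
        super().__init__()
        self._capture = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._capture = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._capture:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._capture = False


def iter_nodes(doc):
    """Yield each entity from a JSON LD document, flattening @graph arrays."""
    items = doc if isinstance(doc, list) else [doc]
    for item in items:
        if not isinstance(item, dict):
            continue
        if "@graph" in item:
            yield from iter_nodes(item["@graph"])
        else:
            yield item


def missing_required(html: str) -> list[tuple[str, str]]:
    """Return (type, property) pairs for required properties that are absent."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    problems = []
    for raw in extractor.blocks:
        # json.loads raises on single quotes or trailing commas.
        for node in iter_nodes(json.loads(raw)):
            types = node.get("@type", [])
            types = [types] if isinstance(types, str) else types
            for schema_type in types:
                for prop in REQUIRED.get(schema_type, []):
                    if prop not in node:
                        problems.append((schema_type, prop))
    return problems
```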

8.8 Robots.txt and AI Crawler Access

The robots.txt file controls which crawlers can access the site. Modern sites must explicitly grant access to AI search engine crawlers, not just traditional search bots. Full default robots.txt template is in Appendix C section 19.10.

Notes on the default policy:

ByteDance's Bytespider is commonly blocked due to aggressive crawl patterns and unclear data use.
Google-Extended controls Google's training data use. Allow it unless the client has a specific reason to opt out.
List each AI crawler explicitly. A blanket User-agent: * group applies only to bots that have no group of their own, and explicit entries make the policy for each bot visible and auditable.
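
The policy can be verified with the standard library robots.txt parser. In this sketch the crawler names in AI_CRAWLERS are examples only and do not reproduce the Appendix C list; substitute the full set from the template. The function name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Example tokens only (assumption); use the full list from the site's template.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_crawlers(
    robots_txt: str, crawlers: list[str] = AI_CRAWLERS, path: str = "/"
) -> list[str]:
    """Return the crawlers that the given robots.txt body denies for path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in crawlers if not parser.can_fetch(agent, path)]
```

An empty result means every listed crawler may fetch the path. Any name in the result should match a documented exclusion, or the Phase 4 gate fails.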

8.9 The llms.txt Standard

llms.txt is an emerging proposal (https://llmstxt.org) for explicitly signaling content available to LLMs. Place it at the root of the site as /llms.txt. The format is markdown. An example template is in section 11.3.

This standard is not universally adopted by every AI engine yet, but adoption is growing and the cost of implementation is trivial.

8.10 Sitemaps

Every site has at minimum a primary sitemap.xml at the root, submitted to Google Search Console and Bing Webmaster Tools.

For sites with images and videos, separate image and video sitemaps are recommended via a sitemap index:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-05-03</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-images.xml</loc>
    <lastmod>2026-05-03</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-videos.xml</loc>
    <lastmod>2026-05-03</lastmod>
  </sitemap>
</sitemapindex>

Each <lastmod> value must reflect actual last modification.
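
Generating the index at build time keeps lastmod honest. The sketch below builds the same structure from (loc, lastmod) pairs. Where the dates come from is left to the caller, for example the newest modification date among the URLs in each child sitemap; that source is an assumption, as is the function name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(entries: list[tuple[str, str]]) -> str:
    """Build a sitemap index from (loc, lastmod) pairs; lastmod is YYYY-MM-DD."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        node = ET.SubElement(root, "sitemap")
        ET.SubElement(node, "loc").text = loc
        ET.SubElement(node, "lastmod").text = lastmod
    ET.indent(root, space="  ")  # Python 3.9 or later
    data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
    return data.decode("utf-8")
```

Write the returned string to sitemap.xml as UTF-8 so the file matches its own declaration.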

8.11 Canonical URLs

Every page has a canonical URL. The canonical points to the preferred version of the URL when duplicates or near duplicates exist.

Joseph's standing decision on trailing slashes: trailing slash present on directories, absent on files. The rule is set sitewide and never mixed.

<link rel="canonical" href="https://example.com/services/web-development/">
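
The trailing slash rule can be enforced when canonical tags are generated. This sketch treats any final path segment containing a dot as a file and everything else as a directory. It also lowercases the host and drops the query string and fragment, which are assumptions beyond the stated rule.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Apply the sitewide rule: slash on directories, none on files."""
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = posixpath.basename(path)
    is_file = "." in last_segment
    if not is_file and not path.endswith("/"):
        path += "/"
    # Query and fragment are dropped from the canonical (assumption).
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, "", ""))
```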

8.12 Hreflang for Multi Language Sites

When the site serves multiple languages, every language version of a page declares its alternates:

<link rel="alternate" hreflang="en" href="https://example.com/services/web-development/">
<link rel="alternate" hreflang="es" href="https://example.com/es/servicios/desarrollo-web/">
<link rel="alternate" hreflang="x-default" href="https://example.com/services/web-development/">

Joseph's typical client base is monolingual English. This subsection applies only when target_languages in section 2 includes more than one entry.
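
When more than one language is in play, generating the tags from a single mapping keeps every language version declaring the same full set, including itself. A minimal sketch, with a hypothetical function name and the assumption that x-default points at one of the listed languages:

```python
from html import escape


def hreflang_links(alternates: dict[str, str], default_lang: str) -> list[str]:
    """Return the <link> tags to place in the head of every language version."""
    if default_lang not in alternates:
        raise KeyError(f"default language {default_lang!r} has no URL")
    links = [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}">'
        for lang, url in alternates.items()
    ]
    default_url = escape(alternates[default_lang])
    links.append(f'<link rel="alternate" hreflang="x-default" href="{default_url}">')
    return links
```

The same list is emitted on every alternate, so the annotations stay reciprocal without hand editing each page.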

8.13 Open Graph and Twitter Cards

Every page has Open Graph tags and Twitter Card tags. These do not affect Google ranking directly, but they do affect:

How the page renders when shared on Facebook, LinkedIn, Slack, Discord.
AI systems that consume social metadata for entity context.
Click through rates from social referrals.

Required Open Graph properties: og:title, og:description, og:url, og:image, og:type, og:site_name.
Required Twitter properties: twitter:card, twitter:title, twitter:description, twitter:image.

8.14 Phase 4 Completion Gate

Before stack specific build, confirm:

Every page passes Schema.org validator.
Every page passes Google Rich Results Test.
robots.txt allows the full AI crawler set per 8.8 (or has documented exclusions).
llms.txt is present at root.
sitemap.xml (or sitemap index) is present and submitted to GSC and Bing Webmaster Tools.
Every page has a canonical URL declared.
Open Graph and Twitter Card tags are present on every page.
The MEGAMIND sidecar (port 9090) is verified disabled on the host.

If gate fails, identify the failed step and remediate before proceeding.

9. Stack Specific Build Instructions

The agent applies exactly one of the following subsections, matching the stack assigned in section 4. Other subsections are reference only for that engagement.

9.1 Static HTML on Bubbles (DEFAULT PATH)

This is the default stack for the majority of Joseph's client work. Static HTML files served by nginx on the Bubbles server (Debian, public IP 169.155.162.118, Tailscale 100.90.97.104).

File layout:

/var/www/[client-domain]/
├── public_html/
│   ├── index.html
│   ├── services/
│   │   ├── index.html
│   │   ├── web-development/index.html
│   │   ├── seo/index.html
│   │   └── [other-service]/index.html
│   ├── pillars/
│   │   └── [pillar-slug]/index.html
│   ├── clusters/
│   │   └── [cluster-slug]/index.html
│   ├── about/index.html
│   ├── contact/index.html
│   ├── team/
│   │   └── [author-slug]/index.html
│   ├── assets/
│   │   ├── css/styles.css
│   │   ├── js/main.js
│   │   └── img/[images]
│   ├── robots.txt
│   ├── llms.txt
│   ├── sitemap.xml
│   └── favicon.ico
└── nginx.conf

Build steps:

Initialize directory structure:

   sudo mkdir -p /var/www/[client-domain]/public_html
   sudo chown -R www-data:www-data /var/www/[client-domain]

Create the page skeleton (per section 7.1) for each pillar and cluster page.
Inline JSON LD schema in the head of each page (per section 8). Use the @graph pattern. Reference Appendix C for the full code library.
Apply the marker based sub_filter pattern for any selective injection. The pattern uses HTML comments as triggers:

   <!-- TDG_SCHEMA_INSERT -->

This marker is the string that an nginx sub_filter directive replaces when a per page schema override must be injected. The default policy is no injection; schema is hand authored inline. The marker exists only as an escape hatch for emergency sitewide updates.

Configure nginx site:

   server {
       listen 80;
       listen [::]:80;
       server_name [client-domain] www.[client-domain];
       return 301 https://[client-domain]$request_uri;
   }

   server {
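       # nginx 1.25.1 and later: the http2 listen parameter is deprecated in favor of the http2 directive.
       listen 443 ssl;
       listen [::]:443 ssl;
       # reuseport may be set only once per address:port across all server blocks,
       # so keep it on a single site when several client domains share this host.
       listen 443 quic reuseport;
       listen [::]:443 quic reuseport;
       http2 on;
       http3 on;
       add_header Alt-Svc 'h3=":443"; ma=86400';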

       server_name [client-domain] www.[client-domain];

       ssl_certificate /etc/letsencrypt/live/[client-domain]/fullchain.pem;
       ssl_certificate_key /etc/letsencrypt/live/[client-domain]/privkey.pem;
       ssl_protocols TLSv1.2 TLSv1.3;

       root /var/www/[client-domain]/public_html;
       index index.html;

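       # Requires the third party ngx_brotli module; stock nginx packages do not include it.
       brotli on;
       brotli_comp_level 6;
       # text/html is always compressed by both modules, so it is omitted here
       # (gzip_types warns about a duplicate MIME type when it is listed).
       brotli_types text/css text/javascript application/javascript application/json application/ld+json image/svg+xml;

       gzip on;
       gzip_vary on;
       gzip_types text/css text/javascript application/javascript application/json application/ld+json image/svg+xml;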

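       # A location that declares any add_header stops inheriting the server level
       # add_header lines, so the headers needed on these responses are repeated here.
       location ~* \.(jpg|jpeg|png|webp|avif|gif|svg|ico|css|js|woff|woff2|ttf|otf)$ {
           expires 30d;
           add_header Cache-Control "public, immutable";
           add_header X-Content-Type-Options "nosniff" always;
           add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
           access_log off;
       }

       location ~* \.html$ {
           expires 1h;
           add_header Cache-Control "public, must-revalidate";
           add_header Alt-Svc 'h3=":443"; ma=86400';
           add_header X-Frame-Options "SAMEORIGIN" always;
           add_header X-Content-Type-Options "nosniff" always;
           add_header Referrer-Policy "strict-origin-when-cross-origin" always;
           add_header Permissions-Policy "geolocation=(), microphone=(), camera=()" always;
           add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
       }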

       rewrite ^/([^.]+[^/])$ /$1/ permanent;

       add_header X-Frame-Options "SAMEORIGIN" always;
       add_header X-Content-Type-Options "nosniff" always;
       add_header Referrer-Policy "strict-origin-when-cross-origin" always;
       add_header Permissions-Policy "geolocation=(), microphone=(), camera=()" always;
       add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;

       location / {
           try_files $uri $uri/ $uri.html =404;
       }

       location = /robots.txt {
           access_log off;
           log_not_found off;
       }

       location = /llms.txt {
           access_log off;
           log_not_found off;
       }

       location = /sitemap.xml {
           access_log off;
       }
   }

Validate and reload:

   sudo nginx -t && sudo systemctl reload nginx

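Issue or renew SSL certificate (on a first deployment, do this before enabling the 443 server block, because nginx -t fails while the referenced certificate files do not exist):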

   sudo certbot --nginx -d [client-domain] -d www.[client-domain]

For wildcard SSL on Joseph's primary domains, use the existing wildcard cert pattern.

Submit to search engines:
- Google Search Console: add property, verify via DNS or HTML file.
- Bing Webmaster Tools: add site, verify, import GSC settings.
- Yandex Webmaster: add site if Russian market relevant.
- Submit sitemap.xml to all three.

Footer credit: confirm the string "Crafted by ThatDeveloperGuy.com." appears in the footer of every page.

Post deployment validation:

curl -I https://[client-domain]/
curl https://[client-domain]/robots.txt
curl https://[client-domain]/sitemap.xml
curl https://[client-domain]/llms.txt

Demo site requirement:
Per Joseph's standing rules, demo sites are full static HTML, never JSX, never Python generated. Required tech stack:

Three.js for 3D and 4D effects.
GSAP plus ScrollTrigger for animations.
Anime.js for micro animations.
Canvas API for custom drawing.
SVG animations.
WebGL for advanced visuals.
CSS3 hardware accelerated animations.
Vanilla JavaScript only, no framework.
All loaded from CDN.

Demo sites must include 3D and 4D effects, custom cursor, magnetic buttons, parallax, 20 plus psychology tactics, 20 plus visual tactics, 20 plus marketing tactics. The STAGE-1-DEMO-BUILD-PROMPT and STAGE-2-PRODUCTION-BUILD-PROMPT files govern these specifications and supersede this section when both are present.

9.2 SvelteKit on Bubbles

Used when the client needs server side rendering, modern app feel, or significant interactivity. Reference build: Ernie Tackett (Globe Life AIL).

Initialize:

npx sv create [client-project]
cd [client-project]
npm install
npm install -D @sveltejs/adapter-node

Configure adapter for Bubbles deployment:
In svelte.config.js:

import adapter from '@sveltejs/adapter-node';

export default {
  kit: {
    adapter: adapter({
      out: 'build',
      precompress: true
    })
  }
};

Schema generation pattern:
Use server load functions to generate JSON LD at build or request time. Single source of truth per route in a +page.server.js file:

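// Assumed location of the project's data helper; adjust to the real module.
import { getPageData } from '$lib/server/pages';

export async function load({ params, url }) {
  const pageData = await getPageData(params.slug);

  const schema = {
    "@context": "https://schema.org",
    "@graph": [
      // Organization, WebSite, WebPage, BreadcrumbList, Article
      // generated from pageData
    ]
  };

  return {
    pageData,
    // Escape "<" so page data containing "</script>" cannot close the tag early.
    schemaJsonLd: JSON.stringify(schema).replace(/</g, '\\u003c')
  };
}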

In the corresponding +page.svelte:

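<script>
  // SvelteKit passes the load() result to the page as the data prop.
  // In Svelte 5 runes mode, use: let { data } = $props();
  export let data;
</script>

<svelte:head>
  <title>{data.pageData.title}</title>
  <meta name="description" content={data.pageData.description} />
  <link rel="canonical" href={data.pageData.canonicalUrl} />
  {@html `<script type="application/ld+json">${data.schemaJsonLd}</script>`}
</svelte:head>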

Sitemap generation:
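Create src/routes/sitemap.xml/+server.js that builds the sitemap from the route registry. Export const prerender = true from that file so the response is generated at build time; without it the endpoint runs on every request.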

Deploy to Bubbles:

npm run build
rsync -avz build/ root@[bubbles]:/var/www/[client-domain]/app/

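Run as a systemd service behind nginx reverse proxy. The adapter-node output also needs the project's package.json and production dependencies on the server: copy package.json and the lockfile alongside the build output, install with npm ci --omit=dev, and point the systemd unit at the build's index.js. nginx config: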

location / {
    proxy_pass http://127.0.0.1:[port];
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}

End with nginx -t && systemctl reload nginx.

9.3 Next.js or Astro on Bubbles

Used when the client needs static site generation with React or framework agnostic islands. Astro is preferred over Next.js for content heavy sites because of its zero JavaScript by default approach, which is closer to the static HTML ideal.

Astro initialization:

npm create astro@latest [client-project]
cd [client-project]
npx astro add sitemap

Schema component pattern:
Create src/components/SchemaJsonLd.astro:

---
const { schema } = Astro.props;
---
<script type="application/ld+json" set:html={JSON.stringify(schema)} />

Use in pages:

---
import SchemaJsonLd from '../components/SchemaJsonLd.astro';
const schema = {
  "@context": "https://schema.org",
  "@graph": [ /* ... */ ]
};
---
<html>
<head>
  <SchemaJsonLd schema={schema} />
</head>
</html>

Build and deploy:

npm run build
rsync -avz dist/ root@[bubbles]:/var/www/[client-domain]/public_html/

Then validate and reload nginx as in 9.1.

9.4 Hugo and Other SSG

Used for content heavy sites where the client wants markdown authoring and fast builds.

Hugo initialization:

hugo new site [client-project]
cd [client-project]
git init
git submodule add https://github.com/[theme-repo] themes/[theme-name]

Schema partial:
Create layouts/partials/schema.html:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "{{ .Site.BaseURL }}#organization",
      "name": "{{ .Site.Params.businessName }}",
      "url": "{{ .Site.BaseURL }}",
      "logo": "{{ .Site.BaseURL }}{{ .Site.Params.logo }}"
    }{{ if .IsPage }},
    {
      "@type": "Article",
      "headline": "{{ .Title }}",
      "datePublished": "{{ .Date.Format "2006-01-02T15:04:05Z07:00" }}",
      "dateModified": "{{ .Lastmod.Format "2006-01-02T15:04:05Z07:00" }}",
      "author": { "@type": "Person", "name": "{{ .Params.author }}" }
    }{{ end }}
  ]
}
</script>

Include in layouts/_default/baseof.html head:

{{ partial "schema.html" . }}

Deploy: hugo builds to public/, rsync to Bubbles, reload nginx.

9.5 WordPress

Used when the client requires a CMS for non technical editors, or already has substantial WordPress investment. The default plugin stack:

Yoast SEO or Rank Math Pro for SEO basics, schema generation, and sitemap.
WP Rocket for caching when WordPress is on a non Bubbles host. On Bubbles, use the nginx caching layer instead.
Imagify or ShortPixel for image optimization.
Wordfence for security baseline.

Theme requirements:

Block based theme or hybrid theme (avoid legacy classic themes for new builds).
Theme must support custom block patterns for the page skeleton in section 7.1.
Theme must allow inline JSON LD injection in the head on a per page basis.

Schema strategy:
Yoast and Rank Math both generate base Organization, WebSite, BreadcrumbList, and Article schema. Override and supplement when necessary using Code Snippets plugin or a child theme functions.php:

function add_custom_jsonld() {
    if (!is_singular('post') && !is_page()) {
        return;
    }
    // Skip pages without a service name so no empty Service entity is emitted.
    $service_name = get_post_meta(get_the_ID(), '_service_name', true);
    if (empty($service_name)) {
        return;
    }
    $custom_schema = array(
        '@context' => 'https://schema.org',
        '@type' => 'Service',
        'name' => $service_name,
        'provider' => array(
            '@type' => 'Organization',
            '@id' => home_url('/#organization')
        )
    );
    echo '<script type="application/ld+json">' . wp_json_encode($custom_schema) . '</script>' . "\n";
}
add_action('wp_head', 'add_custom_jsonld');

Caching on Bubbles WordPress:
nginx FastCGI cache configuration:

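# http context
fastcgi_cache_path /var/cache/nginx/fastcgi levels=1:2 keys_zone=WORDPRESS:100m max_size=1g inactive=60m;
fastcgi_cache_key "$scheme$request_method$host$request_uri";
# server context: never cache POST requests, admin pages, or logged in sessions
set $skip_cache 0;
if ($request_method = POST) { set $skip_cache 1; }
if ($request_uri ~* "/wp-admin/|/wp-login\.php") { set $skip_cache 1; }
if ($http_cookie ~* "wordpress_logged_in|comment_author") { set $skip_cache 1; }

location ~ \.php$ {
    fastcgi_cache WORDPRESS;
    fastcgi_cache_valid 200 60m;
    fastcgi_cache_bypass $skip_cache;
    fastcgi_no_cache $skip_cache;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/run/php/php8.2-fpm.sock;
}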

End with validate and reload.

9.6 Headless Shopify

Used for e commerce clients with custom frontend needs, large catalogs, or strong design requirements. Reference build: Sara White (Blue Paradise Dairy).

Architecture:

Shopify backend for product data, inventory, checkout, payments.
Custom frontend on Bubbles (typically SvelteKit or Astro) consuming Shopify Storefront API.
Schema injected at build time from Shopify product data.

Storefront API setup:

Generate Storefront API access token in Shopify admin.
Install @shopify/storefront-api-client in the frontend project.

Schema mapping:
Map Shopify metafields to Product schema fields. In the frontend product page:

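// Assumes product was flattened after the Storefront API query
// (images and variants as plain arrays, metafields keyed by name).
const productSchema = {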
  "@context": "https://schema.org",
  "@type": "Product",
  "name": product.title,
  "description": product.description,
  "image": product.images.map(img => img.url),
  "sku": product.variants[0].sku,
  "brand": {
    "@type": "Brand",
    "name": product.vendor
  },
  "offers": {
    "@type": "Offer",
    "url": canonicalUrl,
    "priceCurrency": product.priceRange.minVariantPrice.currencyCode,
    "price": product.priceRange.minVariantPrice.amount,
    "availability": product.availableForSale ? "https://schema.org/InStock" : "https://schema.org/OutOfStock"
  },
  "aggregateRating": product.metafields?.reviews ? {
    "@type": "AggregateRating",
    "ratingValue": product.metafields.reviews.average,
    "reviewCount": product.metafields.reviews.count
  } : undefined
};

Faceted navigation canonicals:
Faceted product list pages must canonicalize to the unfaceted version to prevent index bloat and cannibalization.

<link rel="canonical" href="https://example.com/collections/all-products" />
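
The canonical rule above can be sketched as a small helper that derives the unfaceted URL from a faceted one. This is a minimal illustration, not the framework's implementation: the function name and the facet parameter names are assumptions, and the real list should come from the storefront's own filter parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical facet parameters; replace with the ones the storefront really uses.
FACET_PARAMS = {"color", "size", "sort_by", "price_min", "price_max"}


def canonical_for(url, facet_params=FACET_PARAMS):
    """Return the URL with facet parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in facet_params
    ]
    # Non facet parameters (for example pagination) are preserved in order.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

The returned value is what goes into the href of the canonical link element on every faceted variant of the listing page.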

9.7 Standard Shopify

Used for e commerce clients who want the full Shopify experience without a custom frontend.

Theme:

Use a current generation theme (Dawn or a Dawn derivative).
Customize via theme editor and Liquid template overrides.

SEO apps:

JSON-LD for SEO for advanced schema beyond what Shopify generates by default.
Smart SEO for bulk meta tag management.

Critical Shopify caveats:

Shopify generates Product schema by default, but it is often incomplete (missing AggregateRating, missing Brand). Augment with custom schema via theme template.
Collection pages cannibalize each other in many themes. Audit and either differentiate, canonicalize, or noindex secondary collection pages.
Tag pages (/collections/all/tagged-X) should default to noindex unless specifically optimized.
Shopify's robots.txt has been editable since 2021 through the robots.txt.liquid theme template. Add custom directives there.

Custom domain and SSL:
Shopify managed. Joseph's role is configuring the domain, not the SSL.

9.8 Custom Backend

Used rarely. Documented for completeness when a client has an existing custom application that the framework must integrate into.

Approach:

Treat the custom backend as a content source.
Apply the schema, structure, and tracking patterns from Phases 3 and 4 to the rendered output.
If server side rendering is in use, inject schema in the head template.
If client side rendering only, prerender critical pages or use a hybrid approach (Next.js, Nuxt, etc.).

Crawlability check:
Verify that AI crawlers can read the rendered content. Test with:

curl -A "GPTBot" https://[domain]/[page-path]
curl -A "Mozilla/5.0 (compatible; Googlebot/2.1)" https://[domain]/[page-path]

If the response is only a JavaScript shell, the agent must implement server side rendering or static prerendering before the page can earn citations. ChatGPT bot visits begin in reading mode (plain HTML, no JavaScript) about 46 percent of the time.

10. Off Page Authority and Brand Mentions

Purpose: earned signals beyond the website that drive AI citation rates. Distribution can lift AI citations by up to 325 percent versus owned site only.

10.1 The 2026 Off Page Stack

Every off page authority program covers seven channels:

Earned media (PR). Press mentions in industry publications and local news.
Podcast appearances. Both as guest and as host.
YouTube presence. Own channel plus third party mentions.
Reddit and forum participation. Authentic, value first.
LinkedIn article publication. Long form thought leadership.
Industry directory citations. Local citations for local businesses; industry directories for vertical authority.
Wikipedia and Wikidata entity establishment. The strongest off page entity signal available.

YouTube mentions and branded web mentions are the top correlated factors with AI brand visibility per multiple 2025 to 2026 studies.

10.2 Earned Media Targets

For Joseph's typical SDVOSB and small business clients:

3 to 5 earned media mentions per 12 month period as a minimum.
Targets: local news (KY3, KOLR10, NWA Democrat Gazette, regional business journals), trade publications relevant to the client's vertical, podcast appearances at minimum quarterly.
HARO replacements (Featured, Qwoted, Help A B2B Writer) for source requests. Connectively, Cision's successor to HARO, shut down in December 2024.

10.3 YouTube Strategy

For every client where the budget supports it:

Own channel: minimum quarterly publication. Topic mix of educational, behind the scenes, and customer story.
Schema: every video gets VideoObject schema on the page where embedded.
Title and description optimized with primary keyword in first 60 characters of title and first 150 characters of description.
Transcript published on the host page for AI extraction.
Closed captions on every video.
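
The title and description rule above is easy to check mechanically before publishing. The sketch below assumes a plain case insensitive substring match and a hypothetical function name; it does not handle stemming or keyword variants.

```python
def keyword_placement(title, description, keyword):
    """Check that the keyword sits fully inside the first 60 characters of the
    title and the first 150 characters of the description."""
    needle = keyword.lower()
    return {
        "title_ok": needle in title[:60].lower(),
        "description_ok": needle in description[:150].lower(),
    }
```

A video whose keyword only appears late in the description returns description_ok as False, which flags it for a rewrite.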

Third party YouTube mentions:

Pitch industry channels for guest appearances or interviews.
Get clients featured in case study videos.
Sponsor relevant niche channels when budget supports.

10.4 Reddit Strategy

Reddit is heavily indexed by Google AI Overviews and ChatGPT search. Authentic, value first participation drives both citations and mentions.

Rules:

Never spam. Never drop links without context.
Comment helpfully on relevant subreddits for 2 to 4 weeks before sharing any branded content.
When sharing, include genuine context and acknowledge the connection to the brand.
Maintain a brand account that participates as a recognizable voice over months.

10.5 Wikipedia and Wikidata

The strongest entity authority signal available. Wikidata Q IDs are widely used by search engines and AI systems for entity resolution.

Joseph's existing Wikidata entities:

Joseph Anady: Q138610626
MEGAMIND: Q138610666

For client work:

Wikipedia article: requires notability. Possible for clients with clear notability (multi state operations, significant earned media, regulatory or community leadership). Most small businesses do not qualify.
Wikidata entity: lower bar than Wikipedia. Can be created for any business with verifiable identity. Provides Q ID that can be referenced in sameAs of Organization schema.

The agent does not create speculative Wikipedia articles (conflict of interest, will be deleted). The agent does help clients establish Wikidata entities where appropriate.

10.6 Citation Building

Local citation building for local businesses:

Google Business Profile (mandatory).
Apple Business Connect.
Bing Places.
Yelp.
Facebook Business Page.
Instagram Business Account.
LinkedIn Company Page.
BBB profile.
Industry specific directories (HomeAdvisor for home services, Avvo for legal, Healthgrades for medical, etc.).
Local chamber of commerce.

Each citation:

NAP (Name, Address, Phone) must be consistent across every directory.
Categories must match across directories.
Hours must match across directories.
Photos uploaded.
Description includes primary keyword and service area.
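
NAP consistency can be audited by normalizing each listing and comparing it to a reference listing. The following is a rough sketch under stated assumptions: US phone numbers, the first listing treated as the reference, and hypothetical function names. It ignores punctuation and spacing differences but still flags real differences such as "St" versus "Street".

```python
import re


def _clean(text):
    # Lowercase and collapse every run of non alphanumerics into one space.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def normalize_nap(name, address, phone):
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the US country code
    return (_clean(name), _clean(address), digits)


def inconsistent_listings(listings):
    """listings maps directory name to (name, address, phone).
    Returns the directories that differ from the first listing."""
    items = list(listings.items())
    reference = normalize_nap(*items[0][1])
    return [directory for directory, nap in items[1:] if normalize_nap(*nap) != reference]
```

Any directory returned by inconsistent_listings needs a manual correction at the source.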

For SDVOSB clients (Joseph's primary focus): also list on SAM.gov, the SBA Veteran Small Business Certification program (VetCert, which replaced the VA Vets First Verification Program in 2023), and SDVOSB specific directories.

10.7 Off Page Tracking

The agent tracks off page activity in a quarterly review:

off_page_quarter:
  quarter: "2026-Q2"
  earned_media_mentions:
    - publication: ""
      url: ""
      date: ""
      type: ""                            # quote, feature, byline, mention
  podcast_appearances:
    - podcast: ""
      episode_url: ""
      date: ""
  youtube_videos_published: 0
  youtube_third_party_mentions: 0
  reddit_threads_with_brand: 0
  linkedin_articles_published: 0
  new_directory_citations: 0
  wikipedia_status: ""
  wikidata_status: ""

11. AI Crawler Access and Governance

Detailed reference for the robots.txt and llms.txt patterns introduced in sections 8.8 and 8.9.

11.1 Known AI Crawlers (Q2 2026)

Crawler	Operator	Purpose	Default Policy
Googlebot	Google	Classic search index	Allow
Google-Extended	Google	Robots.txt control token for Gemini training and grounding (no separate user agent)	Allow
GoogleOther	Google	Various Google products	Allow
Bingbot	Microsoft	Bing search index	Allow
BingPreview	Microsoft	Bing preview generation	Allow
MSN Bot	Microsoft	Legacy MSN	Allow
GPTBot	OpenAI	ChatGPT training	Allow
OAI-SearchBot	OpenAI	ChatGPT live search	Allow
ChatGPT-User	OpenAI	User initiated browsing within ChatGPT	Allow
PerplexityBot	Perplexity	Perplexity index and answers	Allow
Perplexity-User	Perplexity	User initiated browsing	Allow
ClaudeBot	Anthropic	Claude training and search	Allow
Anthropic-AI	Anthropic	Anthropic services	Allow
Claude-User	Anthropic	User initiated browsing within Claude	Allow
Meta-ExternalAgent	Meta	Meta AI training and agent browsing	Allow
FacebookExternalHit	Meta	Open Graph fetch for sharing	Allow
Applebot	Apple	Siri, Spotlight, Apple Intelligence	Allow
Applebot-Extended	Apple	Apple AI training	Allow
Amazonbot	Amazon	Alexa	Allow
YandexBot	Yandex	Russian search index	Allow
Baiduspider	Baidu	Chinese search index	Allow
DuckDuckBot	DuckDuckGo	DuckDuckGo index supplement	Allow
Bytespider	ByteDance	TikTok and ByteDance products	Disallow (default)
Diffbot	Diffbot	Knowledge graph extraction	Allow
CCBot	Common Crawl	Open dataset	Allow (operator preference)

When the operator overrides the default, document the override in the engagement notes with reasoning.
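
The policy table can be turned into robots.txt groups mechanically. This is a minimal sketch that assumes a simple allow or disallow decision per crawler and a hypothetical function name; the authoritative robots.txt pattern remains the one in section 8.8.

```python
def build_robots_txt(policy, sitemap_url):
    """policy maps a crawler token to True (allow) or False (disallow)."""
    lines = []
    for agent, allowed in policy.items():
        lines.append(f"User-agent: {agent}")
        lines.append("Allow: /" if allowed else "Disallow: /")
        lines.append("")
    # Fallback group for crawlers that are not named explicitly.
    lines.extend(["User-agent: *", "Allow: /", ""])
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"
```

Keeping the policy in one dictionary makes an operator override a one line change that can be recorded alongside the engagement notes.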

11.2 The Reading Mode Reality

About 46 percent of ChatGPT bot visits begin in reading mode: a plain HTML version of the page with no images, no CSS, no JavaScript, and no client side rendered content. This means:

Critical content must render in server side HTML.
Schema in JSON LD must be present in the initial HTML response, not injected by JavaScript.
Navigation and internal linking must work without JavaScript.
Single page applications without server side rendering or prerendering will be invisible to AI crawlers nearly half the time.

For Joseph's static HTML default stack, this is automatic. For SvelteKit and Next.js, server side rendering must be enabled. For pure client side React or Vue applications, the agent flags this as a critical fix before proceeding.
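
One way to test a page against these requirements is to audit the raw HTML response exactly as delivered, before any JavaScript runs. The sketch below uses only the standard library; the class and function names are hypothetical, and the caller is assumed to supply the HTML string (for example from a curl request).

```python
import json
from html.parser import HTMLParser


class InitialHtmlAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.jsonld_blocks = []
        self.links = 0
        self.text_chars = 0
        self._skip = None        # "script" or "style" while inside one
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("script", "style"):
            self._skip = tag
            self._in_jsonld = tag == "script" and attributes.get("type") == "application/ld+json"
            self._buffer = []
        elif tag == "a" and attributes.get("href"):
            self.links += 1

    def handle_endtag(self, tag):
        if tag == self._skip:
            if self._in_jsonld:
                self.jsonld_blocks.append("".join(self._buffer))
            self._skip = None
            self._in_jsonld = False

    def handle_data(self, data):
        if self._skip:
            if self._in_jsonld:
                self._buffer.append(data)
        else:
            self.text_chars += len(data.strip())


def audit_initial_html(html):
    parser = InitialHtmlAudit()
    parser.feed(html)
    parser.close()
    valid = 0
    for block in parser.jsonld_blocks:
        try:
            json.loads(block)
            valid += 1
        except ValueError:
            pass  # malformed JSON LD does not count
    return {"valid_jsonld_blocks": valid, "links": parser.links, "text_chars": parser.text_chars}
```

A client side rendered shell typically reports zero JSON LD blocks, zero links, and almost no text, which is the signal to flag the page as a critical fix.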

11.3 llms.txt Implementation

Place at /llms.txt on the root of the site. The format is markdown.

Example for ThatDeveloperGuy.com:

# ThatDeveloperGuy

> Veteran owned web development and digital optimization agency based in
> Cassville Missouri serving Northwest Arkansas and Southwest Missouri.
> SDVOSB certified. Specializes in self hosted production websites with
> full SEO, AEO, AIO, and GEO stack implementation.

## Service Lines

- [Web Development Services](https://thatdeveloperguy.com/services/web-development/): Custom static, SvelteKit, and headless Shopify builds with full SEO and AI search optimization.
- [Search Engine Optimization](https://thatdeveloperguy.com/services/seo/): On page, technical, content, and off page SEO with quarterly reporting.
- [Answer Engine Optimization](https://thatdeveloperguy.com/services/aeo/): Featured snippet capture, People Also Ask wins, and voice assistant readiness.
- [AI Overview Optimization](https://thatdeveloperguy.com/services/aio/): Citation eligibility for Google AI Overviews and AI Mode.
- [Generative Engine Optimization](https://thatdeveloperguy.com/services/geo/): ChatGPT, Perplexity, Claude, and Bing Copilot citation building.

## Frameworks and References

- [SEO Search Appearance Framework](https://thatdeveloperguy.com/frameworks/seo-search-appearance/): The full 2026 framework this agency operates from.
- [SEO BUILD REFERENCE v2.4](https://thatdeveloperguy.com/frameworks/seo-build-reference/): The 14 tier Engine Optimization stack.
- [Stage 1 Demo Build Prompt](https://thatdeveloperguy.com/frameworks/stage-1-demo/): Cinematic single file demo specification.
- [Stage 2 Production Build Prompt](https://thatdeveloperguy.com/frameworks/stage-2-production/): Full production build specification.

## About

- [About Joseph Anady](https://thatdeveloperguy.com/about/): SDVOSB, BA Computer Engineering CSU, MA Cybersecurity, martial arts coach and competitor.
- [Contact](https://thatdeveloperguy.com/contact/): admin@thatdeveloperguy.com, 505.512.3662.

This file is regenerated whenever a major pillar is added or restructured. Quarterly review at minimum.
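
Because the file is regenerated on a schedule, it can help to build it from structured data instead of editing it by hand. A minimal sketch, assuming a hypothetical function name and a sections mapping of heading to (title, URL, note) tuples:

```python
def build_llms_txt(name, summary, sections):
    """Render the markdown layout shown above: title, blockquote summary,
    then one heading per section with a bulleted link list."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Feeding the function from the same pillar registry that drives the sitemap keeps the two files from drifting apart.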

11.4 Server Log Analysis

The agent periodically samples server logs to confirm AI crawlers are visiting and successful. Sample bash check:

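# Google-Extended is a robots.txt token and never appears as a user agent in logs.
# Matching is case insensitive because Bing identifies itself as "bingbot".
# Splitting on double quotes yields the request path, status, and full user agent
# (assumes the default combined log format).
sudo tail -n 10000 /var/log/nginx/access.log | \
  grep -iE "GPTBot|ClaudeBot|PerplexityBot|OAI-SearchBot|Googlebot|bingbot|YandexBot" | \
  awk -F'"' '{split($2, req, " "); split($3, st, " "); print req[2], st[1], $6}' | sort | uniq -c | sort -rn | head -50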

Review monthly. Anomalies (sudden spike, sudden drop, 404s on AI bot requests) trigger investigation.
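
For the monthly review, the same sampling can be done in Python so that counts per bot and status code are easy to compare between months. This sketch assumes the nginx combined log format and a hypothetical function name; the bot list mirrors the crawlers the framework samples.

```python
import re
from collections import Counter

BOT_PATTERN = re.compile(
    r"GPTBot|ClaudeBot|PerplexityBot|OAI-SearchBot|Googlebot|bingbot|YandexBot",
    re.IGNORECASE,
)
# Request line, status, bytes, referrer, user agent of the combined format.
LINE_PATTERN = re.compile(
    r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def bot_status_counts(lines):
    """Count requests per (bot name in lowercase, HTTP status)."""
    counts = Counter()
    for line in lines:
        match = LINE_PATTERN.search(line)
        if not match:
            continue
        bot = BOT_PATTERN.search(match.group("ua"))
        if bot:
            counts[(bot.group(0).lower(), match.group("status"))] += 1
    return counts
```

A rising count under any 404 key is the anomaly that should trigger an investigation.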

12. Citation and Mention Tracking

Purpose: measure visibility on each surface independently and report the right metrics to clients. This section also contains the deep dive on the Google Search Console Performance report dimensions (queries, pages, countries, devices, search appearance).

12.1 Tracking Surfaces

Track these surfaces separately:

Google classic SERP (rankings, impressions, clicks, CTR, average position).
Google AI Overviews (impressions, citations, mentions).
Google AI Mode (citations, mentions).
Bing classic SERP.
Bing Copilot.
ChatGPT search.
Perplexity.
Claude with web access.
DuckDuckGo (if relevant volume).
Yandex (if relevant market).
Baidu (if relevant market).
Brave Search.

12.2 Tracking Tools

Google Search Console (primary, free):

Performance report covers all five dimensions: Queries, Pages, Countries, Devices, Search Appearance.
AI Overview impressions are included in overall web search type as of 2024 update.
Daily data export via API or Looker Studio connector for historical archive (GSC retains 16 months in interface).

Bing Webmaster Tools (primary, free):

Equivalent dimensions and metrics.
Bing Copilot signals partially exposed in newer dashboard.
Import GSC settings during setup to skip duplicate verification work.

Yandex Webmaster (when relevant):

Russian language analytics.

Baidu Zhanzhang (when relevant):

Chinese search analytics. Requires ICP license for full functionality.

Manual sampling protocol (weekly, mandatory):

For each of the top 10 priority queries, run the query in: Google search, Google AI Mode, ChatGPT, Perplexity, Claude with web access, Bing Copilot.
Record citation status (cited, mentioned without citation, not present).
Record AI Overview presence.
Record changes from prior week.

Third party tools (when budget allows):

Semrush AI Toolkit: AI Overview tracking.
Ahrefs AI Search: citation tracking across multiple AI engines.
Profound: enterprise AI visibility platform.
Geoptie: citation tracking platform.

12.3 The GSC Performance Report Deep Dive

This is the most important free tool for tracking SEO and AEO performance. The framework requires the operator to read this subsection in full. The screenshot in the engagement intake shows the standard interface: Queries, Pages, Countries, Devices, Search Appearance, and Days tabs across the table view, with Search Type, Date Range, and Add Filter controls above.

The report has four metrics across five dimensions plus filters.

The four metrics:

Clicks. Number of times a user clicked through to the property from Google Search results. Going back to the SERP and clicking again counts as one click only. Clicks that stay inside Google Search (e.g., to the knowledge panel) do not count.
Impressions. Number of times the site appeared in Search results. The link must be scrolled into view or visible to count, depending on the type of search element. A knowledge graph with multiple aggregated data items is one impression at the property level.
CTR (Click Through Rate). Clicks divided by impressions. Reported as a percentage.
Average position. Average position of the topmost result from the property for that query, page, country, or device. Position 1 is the top organic result. Below position 30 is generally page 4 or worse. Note: AI Overview citations affect impressions and clicks but the position metric for AI Overview placement is not directly comparable to organic ranking position.

The five dimensions:

Queries. What users typed (or said via voice) into Google Search. Query data is anonymized when fewer than a few dozen users searched for it over two to three months. Anonymized queries are not shown in the Queries tab. Aggregation: by property.
Pages. Which canonical URL on the site appeared in results. The canonical is the URL Google chose to display, which may not be the URL the user lands on if there are redirects. Aggregation: by page.
Countries. Originating country of the searcher. Helps validate whether the property is reaching the intended geography. Aggregation: by property.
Devices. Desktop, Mobile, or Tablet. Mobile is dominant for most consumer sites. Aggregation: by property.
Search Appearance. The SERP feature the result appeared in. Examples: AMP_BLUE_LINK, FAQ_RICH_RESULT, JOB_LISTING, RECIPE, REVIEW_SNIPPET, SITELINKS_SEARCHBOX, VIDEO, WEBLITE, MERCHANT_LISTINGS, PRODUCT_SNIPPETS. The full list expands as Google adds features. Aggregation: by page. This dimension cannot be combined with any other dimension in a single API query; you must run a two step query.

Filters:

Search type. Web (default), Image, Video, News.
Date range. Up to 16 months historical. 24 hour, 7 day, 28 day, 3 month, 6 month, 12 month, 16 month, custom.
Query filter. Queries containing, queries not containing, exact query, custom (regex).
Page filter. URLs containing, URLs not containing, exact URL, custom (regex).
Country filter. Single or multi country.
Device filter. Single device.
Search appearance filter. Single or multi feature.

Aggregation rules:

Data grouped by Queries, Countries, Devices, or Dates is aggregated by property.
Data grouped by Pages or Search Appearance is aggregated by page.
This is why chart totals can differ from table totals when switching dimensions.
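
A toy model makes the difference concrete. The data below is hypothetical and the model is simplified: each entry is one search in which the property appeared, with every URL from the property that was shown.

```python
from collections import defaultdict

# Hypothetical data: two searches for the same query. In the first search,
# two URLs from the property were shown in the results.
searches = [
    {"query": "quarterly estimated taxes",
     "results": [("/taxes/quarterly/", 3), ("/blog/estimated-tax-faq/", 7)]},
    {"query": "quarterly estimated taxes",
     "results": [("/taxes/quarterly/", 4)]},
]


def impressions_by_property(searches):
    totals = defaultdict(int)
    for search in searches:
        totals[search["query"]] += 1  # one impression per search, however many URLs
    return dict(totals)


def impressions_by_page(searches):
    totals = defaultdict(int)
    for search in searches:
        for url, _position in search["results"]:
            totals[url] += 1  # every URL shown earns its own impression
    return dict(totals)
```

Aggregated by property the two searches yield 2 impressions; aggregated by page they yield 3, because the first search showed two URLs.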

Limits:

1,000 rows shown in the interface per query.
25,000 rows per API page request.
16 months historical retention.
Anonymized queries dropped when filtering by query.
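
The two step Search Appearance query and the 25,000 row page limit can be sketched as request bodies for the Search Analytics API. The function names are hypothetical, and fetch is an assumed callable supplied by the caller that sends one request body and returns the rows of that page; with the Google API Python client it would wrap a searchanalytics().query(...) call.

```python
def appearance_list_body(start_date, end_date):
    """Step 1: list the search appearance types that have data."""
    return {"startDate": start_date, "endDate": end_date, "dimensions": ["searchAppearance"]}


def appearance_detail_body(start_date, end_date, appearance):
    """Step 2: break one appearance type down by query and page.
    searchAppearance is used as a filter here, not as a dimension."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["query", "page"],
        "dimensionFilterGroups": [{
            "filters": [{
                "dimension": "searchAppearance",
                "operator": "equals",
                "expression": appearance,
            }]
        }],
    }


def fetch_all_rows(fetch, body, page_size=25000):
    """Page through results with startRow until a short page is returned."""
    rows, start = [], 0
    while True:
        page = fetch({**body, "rowLimit": page_size, "startRow": start})
        rows.extend(page)
        if len(page) < page_size:
            return rows
        start += page_size
```

Step 1 is run once per date range; step 2 is then run once for each appearance type it returned.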

12.4 Working the Queries Dimension

What it answers: which keywords are bringing the site to user attention?

Workflow:

Filter by date range (default: last 3 months).
Sort by impressions descending. Take top 50.
For each query: is the page that appeared in results the page we wanted to appear? If not, this is a sign of either cannibalization or content mis assignment. Flag for Phase 2 review.
Sort by clicks descending. Take top 50. These are the actual traffic drivers. Confirm they map to the keyword to page map.
Filter for queries with high impressions and low CTR (above 1,000 impressions, below 1.5 percent CTR). These are title and description optimization candidates.
Filter for queries with average position 8 to 20. These are quick win candidates: page two pushes that can move to page one with content depth and on page improvements.
Filter by query contains "?" or "what" or "how" or "why". These are AEO and AIO candidates.
Compare branded versus non branded query share. Branded share above 50 percent suggests the site needs more non branded discovery work. Branded share below 20 percent suggests brand building is needed.
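
The filters in this workflow can be applied to exported query rows in one pass. The sketch below assumes rows shaped like Search Analytics API output (ctr as a fraction between 0 and 1) and measures branded share by clicks, which is one reasonable reading of the workflow and not the only one. The function name and brand term list are hypothetical.

```python
import re

QUESTION_PATTERN = re.compile(r"\?|\b(what|how|why)\b", re.IGNORECASE)


def triage_queries(rows, brand_terms):
    """rows: dicts with query, clicks, impressions, ctr, position."""
    low_ctr = [r["query"] for r in rows if r["impressions"] > 1000 and r["ctr"] < 0.015]
    quick_wins = [r["query"] for r in rows if 8 <= r["position"] <= 20]
    questions = [r["query"] for r in rows if QUESTION_PATTERN.search(r["query"])]
    total_clicks = sum(r["clicks"] for r in rows)
    branded_clicks = sum(
        r["clicks"] for r in rows
        if any(term.lower() in r["query"].lower() for term in brand_terms)
    )
    branded_share = branded_clicks / total_clicks if total_clicks else 0.0
    return {
        "low_ctr": low_ctr,
        "quick_wins": quick_wins,
        "questions": questions,
        "branded_share": branded_share,
    }
```

The same query can land in several buckets; a question query at position 9 with a weak CTR is an AEO candidate, a quick win, and a snippet rewrite candidate at once.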

Worked example using the handledtax.com screenshot pattern:

The screenshot shows 8 queries on handledtax.com over 3 months, all variations of the same question about quarterly estimated taxes:

do estimated taxes have to be paid quarterly
what are quarterly estimated tax payments
are quarterly estimated taxes mandatory
do quarterly taxes have to be paid on time
do i need to pay quarterly taxes
do you have to pay estimated taxes quarterly
do i have to make estimated tax payments
do quarterly estimated taxes have to be equal

Every query has 1 impression and 0 clicks.

What this pattern means:

The same underlying question is being asked in 8 different ways. This is exactly what query fan out looks like at the user level.
Each variant has only 1 impression, meaning the site appeared exactly once for each phrasing. The site is on the edge of visibility for these questions.
Zero clicks across all variants means the position is too low or the snippet is not compelling enough to earn a click.
The 8 queries represent a single sub query intent: "are quarterly estimated tax payments required?" That sub query has clear search demand (otherwise GSC would not be capturing impressions even at 1 each).

Recommended action:

Build one comprehensive cluster page on the topic "quarterly estimated tax payments" under the relevant pillar (likely the small business or self employed tax pillar).
Page structure: 40 to 60 word lede directly answering "are estimated taxes required to be paid quarterly?", followed by H2 questions covering each of the 8 variants and their underlying nuances (timing, equal payments, mandatory thresholds, who is exempt).
Schema: Article with full author attribution per section 7.8. The author should have credentials in tax preparation (Amanda Emerdinger holds PTIN, which makes her a qualified author for this content).
Internal link from the homepage to this page, plus from the relevant service pages.
Submit URL via GSC URL Inspection tool after publication.
Re check in 30 days. The 8 variant queries should consolidate impressions onto the new page, and CTR should rise as ranking position improves.

This is exactly the workflow Phase 2 (cluster mapping) is designed to trigger from GSC data.

12.5 Working the Pages Dimension

What it answers: which canonical URLs are doing the work?

Workflow:

Sort by clicks descending. The top 20 pages drive most of the value. Keep them updated and well linked.
Sort by impressions descending. Pages with high impressions and low CTR are extraction targets: they are already showing up, just not earning the click.
Filter for pages not in your priority list. These are unintentional ranking pages. Sometimes they are golden (a thin blog post that accidentally ranks well, opportunity to expand). Sometimes they are noise (tag pages, search result pages, parameter URLs that should be canonicalized away).
Sort by average position ascending (best first). The pages closest to position 1 are defending. Confirm they have current information and strong linking.
Compare year over year for content decay detection. Pages with declining position over 6 to 12 months need content refreshes.
Cross reference with the Pillar and Cluster map from Phase 2. Every priority pillar and cluster page should appear in this report. If a priority page is absent, indexation is broken (check robots.txt, canonical tags, and Index Coverage report).
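
The decay and indexation checks in this workflow reduce to two comparisons on exported page data. A rough sketch, assuming dictionaries that map page URL to average position for two periods; the function names and the three position threshold are assumptions, not framework values.

```python
def decaying_pages(previous, current, min_drop=3.0):
    """Pages whose average position worsened by at least min_drop.
    A larger position number means a worse ranking."""
    flagged = []
    for page, old_position in previous.items():
        new_position = current.get(page)
        if new_position is not None and new_position - old_position >= min_drop:
            flagged.append((page, old_position, new_position))
    return flagged


def missing_priority_pages(priority_pages, current):
    """Priority pages absent from the report, a possible indexation problem."""
    return [page for page in priority_pages if page not in current]
```

Pages returned by decaying_pages go on the content refresh list; pages returned by missing_priority_pages go to the robots.txt, canonical, and Index Coverage checks.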

12.6 Working the Countries Dimension

What it answers: who geographically is finding the site?

Workflow:

Confirm the top countries match the intended service area. For Joseph's typical client, expect United States dominant with possibly Canada, Mexico, and a few English speaking countries showing minor traffic.
If high impressions are coming from countries the client does not serve, this can be an opportunity (international expansion) or noise (bot traffic, geographic anomalies). Investigate.
If the client serves multiple countries, segment performance by country to find regional weakness or strength.
Compare country distribution by language version of the site (when hreflang is in use).
For local service businesses (Joseph's NWA and SW Missouri client base), state level targeting matters more than country. The GSC interface reports geography at country level only; for state segmentation, use the region dimension in GA4 or IP geolocation analysis.

12.7 Working the Devices Dimension

What it answers: how does performance differ across mobile, desktop, and tablet?

Workflow:

Confirm mobile traffic is the majority for consumer sites. For B2B, desktop can be larger.
Compare mobile CTR versus desktop CTR. If mobile CTR is significantly lower, suspect mobile UX issues, slow load times, or mobile specific design problems.
Compare mobile average position versus desktop. They should be similar; large gaps suggest mobile usability issues affecting ranking.
Filter top queries by device. Are the same queries driving traffic on both, or are mobile and desktop workloads diverging?
Use the comparison filter to compare mobile current period versus prior period. Sudden mobile drops often correlate with Core Web Vitals regressions.

12.8 Working the Search Appearance Dimension

What it answers: which SERP features is the site earning, and how are they performing?

Workflow:

List all search appearances with at least one impression. Compare to the schema implementation. If a schema type is implemented but no corresponding search appearance is showing, investigate (validation failure, eligibility issue, or just not chosen by Google).
Compare search appearance CTR. Some appearances drive better CTR (Recipe, Video, Sitelinks, Merchant Listings) than others (basic blue link).
After implementing a new schema type, check the Search Appearance report two to four weeks later to confirm the feature is appearing.
Note that AMP_BLUE_LINK numbers should be near zero by 2026 (AMP is largely deprecated). If significant AMP impressions remain, the site is still serving AMP pages and needs a migration off AMP.
Note that some features inflate metrics. Google for Jobs counts both "job listing" and "job details" as clicks; only "job details" actually goes to the site.

To query Search Appearance via API: Search Appearance is not available alongside other dimensions in a single API request. Run a two step query: first request dimensions: [searchAppearance] only to see all appearance types with data; then run a second request filtering by a specific appearance type and adding any other dimensions.
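
The two step query can be sketched as a pair of request bodies for the Search Analytics query endpoint. This is a sketch under stated assumptions: the helper names are hypothetical, the dates are placeholders, and the code only builds the JSON bodies; authentication and the HTTP call are left to whatever client the agent already uses.

```python
from typing import Dict, List


def appearance_discovery_body(start_date: str, end_date: str) -> Dict:
    """Step 1: request searchAppearance as the only dimension."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["searchAppearance"],
    }


def appearance_detail_body(start_date: str, end_date: str, appearance: str,
                           dimensions: List[str]) -> Dict:
    """Step 2: filter on one appearance type and group by other dimensions."""
    if "searchAppearance" in dimensions:
        raise ValueError("searchAppearance cannot be combined with other dimensions")
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": dimensions,
        "dimensionFilterGroups": [{
            "filters": [{
                "dimension": "searchAppearance",
                "operator": "equals",
                "expression": appearance,
            }]
        }],
    }


# Placeholder dates; the appearance value would come from the step 1 response.
step_one = appearance_discovery_body("2026-04-01", "2026-04-28")
step_two = appearance_detail_body("2026-04-01", "2026-04-28",
                                  "AMP_BLUE_LINK", ["page", "query"])
```

The second body is run once per appearance type returned by the first request.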

12.9 Working the Days Dimension

The Days tab is for time series analysis. Workflow:

Identify volatility in clicks or impressions correlating with known events (Google core updates, content publication dates, schema deployment dates).
Use the comparison filter to compare current 28 day window to prior 28 day window. Highlight pages or queries with greater than 30 percent change.
For sites with strong seasonality (tax services like handledtax.com, retail), overlay year over year data rather than month over month.
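
The 28 day comparison above reduces to a percent change per page or query. This is a minimal sketch, assuming each window has been exported as a mapping from page (or query) to clicks. The function names are hypothetical, and items with no prior period data are skipped rather than flagged, which is an assumption of this sketch.

```python
from typing import Dict, Optional


def percent_change(current: float, prior: float) -> Optional[float]:
    """Percent change from prior to current; None when prior is zero."""
    if prior == 0:
        return None
    return (current - prior) * 100.0 / prior


def flag_large_changes(current: Dict[str, float], prior: Dict[str, float],
                       threshold: float = 30.0) -> Dict[str, float]:
    """Return items whose absolute change exceeds the threshold (30 percent by default)."""
    flagged = {}
    for key in sorted(set(current) | set(prior)):
        change = percent_change(current.get(key, 0), prior.get(key, 0))
        if change is not None and abs(change) > threshold:
            flagged[key] = change
    return flagged


# Hypothetical click counts for two consecutive 28 day windows.
current_window = {"/a": 140, "/b": 100, "/c": 60}
prior_window = {"/a": 100, "/b": 90, "/c": 100}
flagged = flag_large_changes(current_window, prior_window)
```

For seasonal sites, the same function applies with the prior window replaced by the same 28 days one year earlier.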

12.10 Reporting Cadence and Format

Per section 2 reporting_to_client_cadence, the agent produces a report at the agreed cadence.

Standard monthly report structure:

# [Client Name] SEO and Search Appearance Report
## [Reporting Period]

## Executive Summary
[3 to 5 bullet points covering: total clicks, total impressions, citation count, key wins, key risks]

## Traffic Performance
[Table: clicks, impressions, CTR, average position, with month over month and year over year deltas]

## Top Performing Queries
[Table of top 20 queries by clicks]

## Top Performing Pages
[Table of top 20 pages by clicks]

## Search Appearance Performance
[Breakdown by search appearance type]

## Country and Device Performance
[Breakdown by country, breakdown by device, with deltas]

## AI Citation Status
[Manual sampling results: which queries earned citations on which surfaces]

## Brand Mention Volume
[Off page mentions detected during the period]

## Wins This Period
[Specific accomplishments with before and after metrics]

## Recommendations for Next Period
[3 to 7 prioritized actions for the next reporting cycle]

## Appendix: Raw Data
[Links to GSC export, full citation log, etc.]

For clients with the $397 monthly Full Visibility Stack tier, the report includes the full raw data appendix. For lower tiers, the appendix is summary only.

13. Information Gain Production

Purpose: create the original assets that earn citations and differentiate the site from competitors.

13.1 Why Information Gain Matters

AI search engines are increasingly hostile to derivative content. When ten sites publish the same paraphrase of the same source, AI engines pick the original and ignore the rest. Information gain is the only durable moat in 2026 SEO.

Joseph's portfolio gives him unusual leverage here. With 130 plus production websites across multiple verticals, Joseph can produce benchmarks and surveys that nobody else can match. The framework requires using this leverage.

13.2 The Information Gain Asset Types

Asset types ranked by leverage and citation potential:

Annual benchmark report. Aggregate data across the portfolio (or across the client's vertical). Published once per year, refreshed annually. Highest citation magnet.
Original survey. Even 30 to 50 responses produce citable data. Quarterly cadence is reasonable.
Internal case study with verifiable numbers. Pull a real client engagement (with permission), document the before, the work, and the after with specific metrics.
Proprietary framework or methodology. This framework itself is an asset. Frameworks become reference resources.
Calculator or interactive tool. Hosted on the site, available without registration. Drives both citations and conversions.
First hand experience narrative. "I used X for 60 days, here is what I learned." Strongest E E A T signal possible.
Unique side by side comparison. Compare three or more options on a dimension nobody else has measured.
Public dataset visualization. Take a public dataset and present it in a way nobody else has. Government data, academic data, FOIA data.

13.3 Production Cadence

For a client on the $997 build plus $397 monthly tier:

One major information gain asset per quarter.
Two to three minor information gain elements per month embedded in cluster pages.
One annual flagship asset (benchmark or survey).

For a client on the $597 build plus $250 monthly tier:

One major information gain asset per year.
One minor information gain element per month embedded in cluster pages.

For a client on the $2,997 enterprise tier:

One major information gain asset per month.
One minor information gain element per cluster page.
Two annual flagship assets.

13.4 Promotion of Information Gain Assets

Producing the asset is half the work. Distribution is the other half.

For every major asset:

Press release to local and trade media.
LinkedIn article summarizing key findings.
Twitter or X thread with the top 5 takeaways.
Reddit submission to relevant subreddits with substantive context.
Email to the client's full subscriber list.
Outreach to 10 to 20 industry publications offering exclusive angles.
Social media graphics for each top finding.
Update llms.txt to point AI engines to the asset.

The asset should receive 3 to 5 earned media mentions in the first 90 days after publication. If it does not, the topic was wrong or the promotion was insufficient.

14. Surface Specific Optimization

Purpose: apply the surface specific tweaks that improve performance on each platform. This section covers Google's classic SERP, AI Overviews, AI Mode, plus Bing, DuckDuckGo, Yandex, Baidu, Brave Search, ChatGPT, Perplexity, Claude with web access, Bing Copilot, voice search, and image and video carousels.

14.1 Google Classic SERP

Google holds about 90 percent of global search market share. Every site optimizes for Google first, then layers other engines on top.

Specific tactics:

Title tag: 50 to 60 characters, primary keyword in first 30 characters, brand at end.
Meta description: 150 to 160 characters, includes primary keyword, ends with implicit or explicit call to action.
Heading hierarchy: one H1, multiple H2, H3 nested within H2, no H1 inside body content.
Internal linking density: every important page has at least 3 inbound internal links.
External linking: cite authoritative sources where they support claims. Do not link out gratuitously.
301 redirects on every URL change. Never 302 for permanent moves.
Submit URL changes via the URL Inspection tool.
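
The title and meta description rules above are mechanical enough to check in code. This is a minimal sketch with hypothetical function names. It assumes "primary keyword in first 30 characters" means the whole keyword fits inside the first 30 characters, and it does not attempt to detect a call to action.

```python
from typing import List


def check_title(title: str, primary_keyword: str, brand: str) -> List[str]:
    """Return a list of rule violations for a title tag (empty list means OK)."""
    issues = []
    if not 50 <= len(title) <= 60:
        issues.append("length")
    position = title.lower().find(primary_keyword.lower())
    # Assumption: the keyword must end within the first 30 characters.
    if position == -1 or position + len(primary_keyword) > 30:
        issues.append("keyword_position")
    if not title.rstrip().lower().endswith(brand.lower()):
        issues.append("brand_position")
    return issues


def check_meta_description(description: str, primary_keyword: str) -> List[str]:
    """Return a list of rule violations for a meta description."""
    issues = []
    if not 150 <= len(description) <= 160:
        issues.append("length")
    if primary_keyword.lower() not in description.lower():
        issues.append("keyword_missing")
    return issues


# Hypothetical title for illustration only.
good_title = "Quarterly Estimated Taxes: Who Must Pay Them | HandledTax"
title_issues = check_title(good_title, "estimated taxes", "HandledTax")
```

An empty list means the tag satisfies the length, keyword position, and brand position rules as interpreted here.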

14.2 Google AI Overviews

Heavily favors pages already ranking in the top 10 organic, but 38 percent of citations now come from outside the top 10. AI Overviews appear on about 48 percent of all queries in Q1 2026, and over 70 percent of informational and how to queries.

Tactics:

Strong Phase 3 page structure (lede, headings as questions, dual extraction layer).
Schema accuracy and completeness per Phase 4.
Information gain elements per section 13.
Author attribution and visible E E A T signals.
Avoid first person plural "we" voice when possible; institutional authority reads better as "the company" or third person.
Content update cadence: every cited page reviewed and updated at least every 90 days.

14.3 Google AI Mode

Favors broad topical coverage over single high performing URLs. Personal Intelligence integration since January 2026 means Gmail and Calendar context can shape responses. Runs on Gemini 3 Pro. Has 75 million daily active users.

Tactics:

Pillar and cluster architecture with 70 plus percent fan out coverage (per Phase 2).
Comprehensive coverage of edge cases and adjacent topics.
Strong entity signals (Wikidata, Organization schema, sameAs links).
Multi format content within the cluster (text, video, images).
Do not assume Personal Intelligence will favor your site. Build for the public retrieval path.

14.4 ChatGPT Search

Different ranking signals than Google. Specific findings:

Pages with semantically relevant title and URL slug are more likely to get cited.
Prefers focused shorter content. Pages covering 26 to 50 percent of fan out sub queries get cited more than pages covering 100 percent.
92 percent of the time, ChatGPT agents rely on the Bing Search API, so Bing visibility matters here specifically.
46 percent of ChatGPT bot visits begin in reading mode (plain HTML).
63 percent of ChatGPT agents leave immediately after landing (high bounce rate).

Tactics:

Bing Webmaster Tools verification mandatory.
Plain HTML rendering must work (no SPA without prerendering).
URL slugs include primary keyword.
Keep content focused per page; do not overload with unrelated subtopics.

14.5 Perplexity

Strong source citation, prefers academic and authoritative sources, favors recency.

Tactics:

Cite primary sources liberally (academic papers, government data, industry reports).
Date stamps on every published page.
Author credentials prominent.
Avoid affiliate heavy or thinly sourced content; Perplexity downranks it.

14.6 Claude with Web Access

Less publicly studied. Sample manually for the client's priority queries.

Tactics:

General GEO best practices apply.
Strong factual accuracy and citation discipline.
Avoid promotional language; Claude prefers analytical and factual phrasing.

14.7 Bing Copilot

Tied closely to Bing core ranking. Microsoft Copilot integration across Windows, Office, and Edge browser increases Bing's reach beyond raw search market share numbers.

Tactics:

Bing Webmaster Tools verification.
Bing prefers official site signals and social media presence (LinkedIn, X presence helps).
Bing has a stronger preference for institutional sources than Google.
Optimization for Bing carries over to DuckDuckGo (which uses Bing's index in part).

14.8 Bing Classic

Bing holds about 4.98 percent global market share, higher in US (about 7 percent), and is the default search on Windows and Edge browsers. When combined with Yahoo (which uses Bing) and DuckDuckGo (which partially uses Bing), the Bing ecosystem reaches about 13 percent of US search.

Specific Bing differences from Google:

Stronger weight on exact keyword matches in title and content.
Higher preference for institutional sources (.gov, .edu, .org).
Lower diversity of sources surfaced per query (fewer unique domains in top 10 versus Google).
Greater weight on social signals (LinkedIn, X presence).
Microsoft Advertising for paid search has typically lower CPCs than Google Ads.

Tactics:

Submit sitemaps to Bing Webmaster Tools.
Maintain active LinkedIn Company Page with regular posts.
Maintain active X (Twitter) presence.
Earn institutional citations (industry associations, .gov references when applicable, .edu mentions).

14.9 DuckDuckGo

DuckDuckGo holds about 0.76 percent globally and 1.84 percent in the US. About 100 million daily searches. Privacy focused with no user tracking or personalization.

DuckDuckGo pulls results from Bing's index plus its own crawler (DuckDuckBot) plus hundreds of other sources including Wikipedia and Wolfram Alpha. There is no separate DuckDuckGo SEO playbook beyond Bing optimization, but a few specifics matter:

Since DuckDuckGo does not personalize, on page relevance is more important than for personalized engines.
DuckDuckGo's own DuckDuckBot crawler must be allowed in robots.txt.
DuckDuckGo Lite (the text only version) is parsed by ChatGPT bots in reading mode, so Bing and DuckDuckGo visibility cascade into AI engine visibility.

14.10 Yandex

Yandex holds about 1.34 percent globally, 65 to 72 percent in Russia and several CIS countries.

Skip unless the client serves Russian language markets.

If relevant:

Yandex Webmaster verification.
Yandex Metrica analytics (separate from Google Analytics).
Russian language content with proper transliteration.
Local Yandex Maps presence for local businesses in Russia.
Yandex Direct for paid search (lower CPCs than Google in Russian markets).
Domain age, content freshness, and user behavior signals (CTR, dwell time) are weighted heavily in Yandex's ranking algorithm per the 2023 leaked ranking factors.

14.11 Baidu

Baidu holds about 0.55 percent globally, 53 plus percent in China.

Skip unless the client serves Chinese language markets.

If relevant:

Baidu Zhanzhang (Webmaster Tools) verification.
ICP license required for hosting in China for full Baidu support.
Simplified Chinese content.
Domain hosted on China based or China optimized infrastructure (latency from US hosted sites hurts Baidu rankings).
Baidu Tongji analytics.
Avoid heavy JavaScript reliance; Baidu's crawler is less capable than Googlebot at rendering JavaScript.

14.12 Brave Search

Brave Search has run on its own independent index of about 30 billion pages since 2023. About 50 million daily queries. Privacy focused, used heavily by Brave browser users.

Tactics:

Submit URLs via Brave Search Webmaster Tools.
General SEO best practices apply.
Smaller share but growing among privacy minded users, particularly in tech and crypto verticals.

14.13 Voice Search and Smart Speakers

Voice queries skew long tail and conversational. Optimizing for AEO directly addresses voice search.

Tactics:

Lede answer of 40 to 60 words that sounds natural when read aloud.
Avoid jargon and acronyms in the lede.
Use natural language throughout (do not write for the keyword stuffing spider).
Phone number and address visible in plain text on contact pages (voice assistants extract these directly).
LocalBusiness schema with accurate geo coordinates.

14.14 Image and Video Carousels

Both are search appearance types worth tracking separately.

For images:

ImageObject schema on every hero image.
Descriptive alt text and filename.
Width and height attributes.
Modern formats (WebP, AVIF) with fallbacks.
Image sitemap.

For videos:

VideoObject schema on every page that embeds video.
Hosting via YouTube for maximum AI engine visibility (multiple AI engines preferentially cite YouTube).
Self hosted video as a secondary option only.
Transcript published on the host page.
Closed captions in the video file.
Video sitemap.

14.15 Local Pack and Knowledge Panel

For local businesses, the Local Pack (the map of three local results) and the Knowledge Panel (the right side panel with business details) are the most valuable Google surfaces.

Tactics for Local Pack:

Google Business Profile complete and verified.
Categories aligned with primary services.
Hours accurate.
Photos updated quarterly.
Reviews actively requested and responded to.
LocalBusiness schema with geo coordinates matching GBP.
NAP consistency across all directory citations.
Service area defined accurately.

Tactics for Knowledge Panel:

Strong Organization schema with all properties filled.
Wikidata Q ID linked via sameAs.
Wikipedia article when notability allows.
Consistent name and branding across web (Wikipedia, social, GBP, Bing Places).
Structured social links.

15. Audit Mode

Purpose: evaluate any site (fully built, partially built, or freshly inherited) against the framework. Output: pass, partial, fail, or N/A on each criterion plus remediation steps.

15.1 When to Run Audit Mode

Initial engagement with a new client to establish baseline.
Quarterly health check on retainer clients.
Post Google core update verification (March, July, October, December cadence has been typical).
Pre handoff QA after a Phase 4 build.
Any time GSC shows a sudden 20 percent or worse drop in clicks or impressions.

15.2 Audit Inputs

Required:

Domain to audit.
GSC access (read at minimum).
Bing Webmaster Tools access (read at minimum).
File system access for self hosted sites (read at minimum).
Hosting environment information.

Optional but improves audit quality:

Google Analytics 4 access.
Server log access.
Ahrefs or Semrush access.
Direct access to the CMS or static site source.

15.3 The 50 Criterion Audit

The audit is organized into five pillars: Eligibility, Coverage, Extractability, Distribution, and Measurement. Each criterion produces one of four results: PASS, PARTIAL, FAIL, or N/A. Each criterion has remediation steps when it does not pass.

Pillar A: Eligibility (12 criteria)

A1. Robots.txt allows the full AI crawler set.
Method: curl https://[domain]/robots.txt. Confirm GPTBot, OAI-SearchBot, ChatGPT-User, PerplexityBot, ClaudeBot, Google-Extended, Bingbot, YandexBot, Baiduspider, Applebot, Amazonbot, Meta-ExternalAgent are not blocked.
Pass: all listed crawlers allowed.
Partial: most allowed, one or two missing or unintentionally blocked.
Fail: significant AI crawlers blocked.
Remediation: update robots.txt per section 8.8 and Appendix C 19.10.
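
The A1 check can be automated with Python's standard library robots.txt parser. This is a minimal sketch, assuming the robots.txt body has already been fetched (for example with the curl command above). The function names are hypothetical, and mapping "one or two blocked" to PARTIAL and three or more to FAIL is an assumed reading of the criteria.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Crawler tokens listed in criterion A1.
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot",
    "Google-Extended", "Bingbot", "YandexBot", "Baiduspider", "Applebot",
    "Amazonbot", "Meta-ExternalAgent",
]


def blocked_crawlers(robots_txt: str, path: str = "/") -> List[str]:
    """Return the crawlers from AI_CRAWLERS that may not fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, path)]


def a1_status(blocked: List[str]) -> str:
    """Assumed mapping: none blocked PASS, one or two PARTIAL, more FAIL."""
    if not blocked:
        return "PASS"
    return "PARTIAL" if len(blocked) <= 2 else "FAIL"


# Hypothetical robots.txt that blocks only GPTBot.
sample_robots = """User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
blocked = blocked_crawlers(sample_robots)
```

Checking only `/` is a simplification; a fuller audit would test each priority page path.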

A2. No CDN or proxy is interfering with crawler access.
Method: check curl -I https://[domain]/ for Cloudflare, Akamai, Fastly headers.
Pass: no third party CDN in front.
Fail: Cloudflare or other proxy detected (against Joseph's standing rules).
Remediation: remove proxy, route through Bubbles nginx directly.

A3. SSL is active and current.
Method: curl -vI https://[domain]/ 2>&1 | grep -E "subject:|start date:|expire date:".
Pass: valid certificate, expiring more than 30 days out.
Partial: valid but expiring within 30 days.
Fail: invalid, expired, or self signed.
Remediation: certbot renewal or new issuance.
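
The A3 thresholds can be applied programmatically once the certificate expiry date is known. This is a sketch under stated assumptions: `fetch_not_after` and `classify_certificate` are hypothetical helper names, and a failed TLS verification (invalid, expired, or self signed) is expected to surface as an exception from the standard library, which the caller would record as FAIL.

```python
import socket
import ssl
from datetime import datetime, timedelta, timezone


def fetch_not_after(host: str, port: int = 443) -> datetime:
    """Connect with certificate verification and return the expiry date (UTC)."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    seconds = ssl.cert_time_to_seconds(cert["notAfter"])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def classify_certificate(not_after: datetime, now: datetime) -> str:
    """Apply the A3 rule: more than 30 days left PASS, 30 or fewer PARTIAL."""
    if not_after <= now:
        return "FAIL"
    if not_after - now > timedelta(days=30):
        return "PASS"
    return "PARTIAL"


# Fixed reference date so the example is reproducible.
reference = datetime(2026, 5, 3, tzinfo=timezone.utc)
status_far = classify_certificate(reference + timedelta(days=45), reference)
status_near = classify_certificate(reference + timedelta(days=10), reference)
```

In a live audit, `classify_certificate(fetch_not_after(domain), datetime.now(timezone.utc))` would replace the fixed reference date.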

A4. HTTP/3 (QUIC) is supported.
Method: curl --http3 https://[domain]/ -I (requires curl with HTTP/3 support).
Pass: HTTP/3 negotiation succeeds.
Partial: HTTP/2 only.
Fail: HTTP/1.1 only.
Remediation: enable QUIC in nginx per section 9.1 config.

A5. Mobile renders correctly without JavaScript.
Method: curl -A "Mozilla/5.0 (Linux; Android 10) AppleWebKit/537.36" https://[domain]/[priority-page] | grep -i "<h1\|<main\|<article".
Pass: critical content present in initial HTML.
Fail: empty or shell only HTML.
Remediation: implement server side rendering or static prerendering.

A6. Core Web Vitals pass on mobile.
Method: PageSpeed Insights at https://pagespeed.web.dev/ using the priority page URL.
Pass: LCP under 2.5s, INP under 200ms, CLS under 0.1.
Partial: one metric in needs improvement zone.
Fail: any metric in poor zone.
Remediation: image optimization, render blocking script reduction, layout stability fixes.
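
The A6 result can be derived from the three metric values reported by PageSpeed Insights. This is a minimal sketch with hypothetical names. The "poor" boundaries (LCP above 4.0s, INP above 500ms, CLS above 0.25) are Google's published thresholds rather than values stated in this framework, and treating more than one metric in the needs improvement zone as PARTIAL is an assumption.

```python
from typing import Dict

# Upper bounds of the "good" zone, per criterion A6.
GOOD_LIMITS = {"lcp_s": 2.5, "inp_ms": 200, "cls": 0.1}
# Values above these are in the "poor" zone (Google's published thresholds).
POOR_LIMITS = {"lcp_s": 4.0, "inp_ms": 500, "cls": 0.25}


def classify_core_web_vitals(metrics: Dict[str, float]) -> str:
    """Return PASS, PARTIAL, or FAIL for a set of mobile field metrics."""
    if any(metrics[name] > POOR_LIMITS[name] for name in POOR_LIMITS):
        return "FAIL"
    if all(metrics[name] <= GOOD_LIMITS[name] for name in GOOD_LIMITS):
        return "PASS"
    # Assumption: any number of metrics in the needs improvement zone is PARTIAL.
    return "PARTIAL"


# Hypothetical measurements for one priority page.
cwv_status = classify_core_web_vitals({"lcp_s": 3.0, "inp_ms": 150, "cls": 0.05})
```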

A7. Sitemap.xml is present and current.
Method: curl https://[domain]/sitemap.xml | head -50.
Pass: valid XML, lastmod dates within last 90 days for active pages.
Partial: present but stale lastmod.
Fail: missing or invalid.
Remediation: regenerate sitemap; submit to GSC and Bing.
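
The lastmod freshness check in A7 can be scripted against the sitemap body. This is a minimal sketch, assuming a standard urlset sitemap (not a sitemap index) with lastmod values that start with a full YYYY-MM-DD date. The function name is hypothetical, and counting a URL with no lastmod as stale is an assumption.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta
from typing import List

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def stale_urls(sitemap_xml: str, today: date, max_age_days: int = 90) -> List[str]:
    """Return URLs whose lastmod is missing or older than max_age_days."""
    root = ET.fromstring(sitemap_xml.strip())
    stale = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        if not lastmod:
            stale.append(loc)  # assumption: missing lastmod counts as stale
            continue
        modified = date.fromisoformat(lastmod[:10])  # date part of a W3C datetime
        if today - modified > timedelta(days=max_age_days):
            stale.append(loc)
    return stale


# Hypothetical sitemap for illustration.
sample_sitemap = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc><lastmod>2026-04-20</lastmod></url>
  <url><loc>https://example.com/b</loc><lastmod>2025-11-01T10:00:00+00:00</lastmod></url>
  <url><loc>https://example.com/c</loc></url>
</urlset>
"""
stale = stale_urls(sample_sitemap, today=date(2026, 5, 3))
```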

A8. Llms.txt is present.
Method: curl -I https://[domain]/llms.txt.
Pass: returns 200 with valid content.
Partial: returns 200 but content is stale or thin.
Fail: returns 404.
Remediation: create llms.txt per section 11.3.

A9. Robots.txt is consistent with sitemap.
Method: cross reference robots.txt and sitemap.xml.
Pass: all sitemap URLs are crawlable per robots.txt.
Fail: sitemap includes URLs blocked by robots.txt.
Remediation: reconcile.

A10. Canonical tags are present and correct on every priority page.
Method: spot check 10 priority pages for <link rel="canonical" href="...">.
Pass: all 10 have correct self referencing canonical (or correct cross page canonical for variants).
Fail: missing or incorrect canonicals.
Remediation: add or fix canonicals.
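
The A10 spot check can be scripted with the standard library HTML parser. This is a minimal sketch with hypothetical names; it only distinguishes a missing canonical, multiple canonicals, a self referencing canonical, and a cross page canonical. Whether a cross page canonical is correct still needs human review.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collect href values from <link rel="canonical"> tags."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def canonical_status(html: str, page_url: str) -> str:
    """Return 'missing', 'multiple', 'self', or 'cross_page'."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonicals:
        return "missing"
    if len(parser.canonicals) > 1:
        return "multiple"
    # Trailing slashes are ignored when comparing; this is a simplification.
    same = parser.canonicals[0].rstrip("/") == page_url.rstrip("/")
    return "self" if same else "cross_page"


# Hypothetical page markup.
sample_html = '<head><link rel="canonical" href="https://example.com/taxes/"></head>'
status = canonical_status(sample_html, "https://example.com/taxes")
```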

A11. Author schema and visible attribution on content pages.
Method: spot check 10 content pages for visible author byline AND Article author field.
Pass: 8 of 10 or better have both.
Partial: byline present but no schema, or vice versa.
Fail: neither present.
Remediation: implement author bios and Person schema per section 7.8.

A12. Last updated dateModified is accurate.
Method: spot check 10 pages: compare dateModified in schema to actual last edit date.
Pass: dateModified within 30 days of actual edit date on 8 of 10.
Partial: dateModified within 90 days.
Fail: dateModified is stale on most pages.
Remediation: implement automated dateModified updates per stack.

Pillar B: Coverage (10 criteria)

B1. Pillar architecture exists.
Method: review pillars.yaml or equivalent documentation.
Pass: at least one pillar with 8 plus cluster pages.
Partial: pillar identified but cluster count low.
Fail: no pillar architecture.
Remediation: complete Phase 2.

B2. Each pillar has a 3,000 to 5,000 word pillar page.
Method: spot check word count on each pillar URL.
Pass: all pillars meet word count.
Partial: most do.
Fail: pillars are thin.
Remediation: expand pillar pages per Phase 3.

B3. Cluster pages are 800 to 2,500 words.
Method: spot check 10 cluster URLs.
Pass: 8 of 10 in range.
Partial: average is in range but variance high.
Fail: cluster pages are thin or bloated inappropriately.
Remediation: rewrite outliers.

B4. Sub query coverage rate is 70 percent or higher across top 10 priority pillars.
Method: review sub query coverage map.
Pass: 70 percent or higher.
Partial: 50 to 69 percent.
Fail: below 50 percent.
Remediation: add cluster pages for unanswered sub queries.

B5. No primary keyword is targeted by two pages.
Method: GSC export plus url to keyword map cross reference.
Pass: no cannibalization.
Partial: one or two flagged cases under remediation.
Fail: multiple unresolved cannibalization cases.
Remediation: complete Phase 2 cannibalization remediation.

B6. Every cluster page links back to its pillar.
Method: spot check 10 cluster pages for outbound link to pillar with descriptive anchor.
Pass: all 10.
Partial: most have link, anchor text generic.
Fail: cluster pages do not link to pillar.
Remediation: implement internal linking plan.

B7. Pillar links to all cluster pages.
Method: review pillar pages for outbound links.
Pass: pillar links to all clusters.
Partial: pillar links to most.
Fail: pillar links to few or none.
Remediation: expand pillar internal linking.

B8. Breadcrumbs are present on every page.
Method: spot check 10 pages for visible breadcrumb navigation AND BreadcrumbList schema.
Pass: 9 of 10 or better.
Fail: breadcrumbs missing on most pages.
Remediation: implement breadcrumbs per template.

B9. Each priority page includes information gain.
Method: spot check 10 priority pages for at least one information gain element in first 30 percent.
Pass: 7 of 10 or better.
Partial: 4 to 6 of 10.
Fail: 3 or fewer.
Remediation: build information gain assets per section 13.

B10. Internal linking density meets target.
Method: crawl the site and compute average inbound internal links per priority page.
Pass: average 3 plus inbound links per priority page.
Partial: average at least 2 but below 3.
Fail: average below 2.
Remediation: expand internal linking systematically.

Pillar C: Extractability (12 criteria)

C1. Every priority page has a 40 to 60 word lede.
Method: spot check 10 priority pages. Count words in the first paragraph.
Pass: 9 of 10 or better in range.
Partial: 7 or 8 of 10 in range.
Fail: most have preamble before answer, or lede is too short or too long.
Remediation: rewrite ledes per section 7.2.
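
The C1 word count can be scripted for the spot check. This is a minimal sketch, assuming the lede is the first `<p>` element in the page HTML, which will not hold for every template. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class FirstParagraph(HTMLParser):
    """Capture the text of the first <p> element only."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.done:
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_paragraph:
            self.in_paragraph = False
            self.done = True

    def handle_data(self, data):
        if self.in_paragraph:
            self.parts.append(data)


def lede_word_count(html: str) -> int:
    """Count whitespace separated words in the first paragraph."""
    parser = FirstParagraph()
    parser.feed(html)
    return len("".join(parser.parts).split())


def lede_in_range(html: str, low: int = 40, high: int = 60) -> bool:
    return low <= lede_word_count(html) <= high


# Hypothetical page with a 45 word lede.
sample_page = "<h1>Title</h1><p>" + " ".join(["word"] * 45) + "</p><p>More text.</p>"
count = lede_word_count(sample_page)
```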

C2. Lede contains the primary keyword in the first sentence.
Method: spot check 10 priority pages.
Pass: 9 of 10.
Fail: keyword absent from first sentence.
Remediation: rewrite ledes.

C3. Every H2 is phrased as a question matching a sub query.
Method: spot check 10 priority pages. Read H2 list.
Pass: 80 percent or more of H2s are questions.
Partial: 50 to 79 percent.
Fail: most H2s are nouns or generic labels.
Remediation: rewrite H2s per section 7.4.
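
The C3 ratio can be computed from page HTML. This is a minimal sketch with hypothetical names. It treats an H2 as a question when its text ends with a question mark, which is a rough proxy, and it returns N/A for pages with no H2 at all, which is an assumption. It does not check whether each question matches a mapped sub query.

```python
from html.parser import HTMLParser
from typing import List


class H2Collector(HTMLParser):
    """Collect the text content of every <h2> element."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2:
            self.headings[-1] += data


def h2_question_status(html: str) -> str:
    """Apply the C3 thresholds: 80 percent or more PASS, 50 to 79 PARTIAL."""
    collector = H2Collector()
    collector.feed(html)
    headings: List[str] = [h.strip() for h in collector.headings]
    if not headings:
        return "N/A"  # assumption: no H2s means the criterion cannot be scored
    questions = sum(1 for h in headings if h.endswith("?"))
    if questions * 100 >= 80 * len(headings):
        return "PASS"
    if questions * 100 >= 50 * len(headings):
        return "PARTIAL"
    return "FAIL"


# Hypothetical page with four question headings out of five.
sample_doc = (
    "<h2>Who must pay estimated taxes?</h2><h2>When are payments due?</h2>"
    "<h2>How much is owed?</h2><h2>Who is exempt?</h2><h2>Summary</h2>"
)
h2_status = h2_question_status(sample_doc)
```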

C4. Lists use semantic HTML, not styled divs.
Method: view source on 10 priority pages.
Pass: all lists are <ul> or <ol>.
Fail: lists are styled divs.
Remediation: refactor to semantic HTML.

C5. Tables use semantic HTML.
Method: view source.
Pass: all tabular content uses <table>, <thead>, <tbody>, <tr>, <th>, <td>.
Fail: tables are styled divs.
Remediation: refactor.

C6. The 30 percent rule is honored on priority pages.
Method: identify the citable answer on each priority page. Confirm it appears within the first third of the page text.
Pass: 8 of 10.
Fail: most pages bury the answer.
Remediation: restructure pages.

C7. Sections are self contained.
Method: read 10 random sections from priority pages out of context.
Pass: each section reads as a complete, standalone answer.
Fail: sections require context from earlier sections.
Remediation: rewrite to remove cross references.

C8. Schema is present on every priority page.
Method: spot check 10 priority pages. Look for <script type="application/ld+json">.
Pass: 10 of 10.
Fail: missing on some pages.
Remediation: implement schema per Phase 4.

C9. Schema validates against Schema.org and Google Rich Results Test.
Method: run Google Rich Results Test on 10 priority pages.
Pass: 9 of 10 pass with no errors.
Partial: warnings present but no errors.
Fail: errors present.
Remediation: fix schema errors.

C10. Schema content matches visible page content.
Method: spot check fields like price, name, address, dateModified against visible content.
Pass: full alignment on 9 of 10.
Fail: schema describes content not visible on page.
Remediation: align schema with visible content.

C11. No deprecated schema types present.
Method: scan pages for HowTo, Practice Problem, Q and A, Book Action, Course Info, Estimated Salary, Vehicle Listing, Special Announcement, Claim Review, Learning Video.
Pass: none present, OR present but documented as intentional.
Fail: deprecated types present and producing zero rich result lift.
Remediation: remove or replace deprecated schema.
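
The scan in C11 (and the presence check in C8) can be scripted by extracting JSON-LD blocks and collecting their `@type` values. This is a sketch under stated assumptions: the names are hypothetical, and `FLAGGED_TYPES` covers only four Schema.org type names that correspond to entries in the list above (HowTo, Q and A as QAPage, Special Announcement, Claim Review). The remaining entries are Google feature names whose Schema.org equivalents would need to be added by the auditor.

```python
import json
from html.parser import HTMLParser
from typing import List, Set

# Illustrative subset only; extend with the types relevant to the audit.
FLAGGED_TYPES = {"HowTo", "QAPage", "SpecialAnnouncement", "ClaimReview"}


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            self.blocks[-1] += data


def collect_types(node, found: Set[str]) -> None:
    """Walk nested JSON-LD and record every @type value."""
    if isinstance(node, dict):
        declared = node.get("@type")
        if isinstance(declared, str):
            found.add(declared)
        elif isinstance(declared, list):
            found.update(t for t in declared if isinstance(t, str))
        for value in node.values():
            collect_types(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_types(item, found)


def flagged_schema_types(html: str) -> List[str]:
    """Return the flagged types found in the page, sorted alphabetically."""
    collector = JsonLdCollector()
    collector.feed(html)
    found: Set[str] = set()
    for block in collector.blocks:
        try:
            collect_types(json.loads(block), found)
        except json.JSONDecodeError:
            continue  # invalid JSON-LD is a C9 problem, not a C11 one
    return sorted(found & FLAGGED_TYPES)


# Hypothetical page markup.
sample_markup = (
    '<script type="application/ld+json">{"@context": "https://schema.org", '
    '"@graph": [{"@type": "Article", "headline": "x"}, '
    '{"@type": "HowTo", "name": "y"}]}</script>'
)
flagged_types = flagged_schema_types(sample_markup)
```

An empty `collector.blocks` list on a priority page is itself a C8 failure.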

C12. Open Graph and Twitter Card tags present.
Method: spot check 10 priority pages.
Pass: og:title, og:description, og:url, og:image, twitter:card present on all.
Fail: missing on most pages.
Remediation: add tags.

Pillar D: Distribution (8 criteria)

D1. Earned media count meets target.
Method: review off page tracking log for last 12 months.
Pass: 3 plus earned media mentions.
Partial: 1 or 2.
Fail: 0.
Remediation: launch earned media outreach program.

D2. YouTube presence active.
Method: confirm own YouTube channel exists with content from last 90 days; confirm at least one third party mention.
Pass: own channel active plus third party mention.
Partial: one without the other.
Fail: no YouTube presence.
Remediation: launch YouTube content cadence.

D3. Reddit and forum mentions exist.
Method: search Reddit for client brand, capture mentions in last 12 months.
Pass: 3 plus authentic mentions.
Partial: 1 or 2.
Fail: 0.
Remediation: launch authentic Reddit participation.

D4. LinkedIn article cadence active.
Method: review LinkedIn for client author publishing.
Pass: at least 1 article per quarter.
Fail: no LinkedIn publishing.
Remediation: build LinkedIn editorial calendar.

D5. Wikipedia entity exists or has been formally proposed.
Method: search Wikipedia for the brand.
Pass: live Wikipedia article exists.
Partial: Wikipedia draft exists or notability is being built.
Fail: no Wikipedia presence and no plan.
Remediation: assess notability; if eligible, draft article.

D6. Wikidata entity exists.
Method: search Wikidata for the brand.
Pass: Wikidata Q ID assigned and linked from Organization schema sameAs.
Partial: Q ID exists but not linked from schema.
Fail: no Wikidata entity.
Remediation: create Wikidata entity per section 10.5.

D7. Local citation parity (NAP consistency) for local businesses.
Method: compare Google Business Profile, Apple Business Connect, Bing Places, Yelp, Facebook for NAP consistency.
Pass: full NAP match across all five.
Partial: minor differences (formatting, abbreviation).
Fail: significant differences (different addresses, different phone numbers).
Remediation: standardize NAP across all directories.
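
The D7 comparison can be approximated by normalizing each listing before comparing. This is a minimal sketch with hypothetical names and data. The abbreviation table is a small illustrative assumption (it would misread "St" as "street" in a name like "St Louis"), and phone handling assumes 10 digit US numbers.

```python
import re
from typing import Dict, Tuple

# Illustrative abbreviation table; extend per market.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road",
                 "ste": "suite", "blvd": "boulevard"}


def normalize_text(value: str) -> str:
    """Lowercase, drop punctuation, and expand common abbreviations."""
    words = re.sub(r"[^\w\s]", " ", value.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def normalize_phone(phone: str) -> str:
    """Keep the last 10 digits (assumes US numbers)."""
    return re.sub(r"\D", "", phone)[-10:]


def nap_status(listings: Dict[str, Tuple[str, str, str]]) -> str:
    """PASS on exact match, PARTIAL when only formatting differs, else FAIL."""
    if len(set(listings.values())) == 1:
        return "PASS"
    normalized = {
        (normalize_text(name), normalize_text(address), normalize_phone(phone))
        for name, address, phone in listings.values()
    }
    return "PARTIAL" if len(normalized) == 1 else "FAIL"


# Hypothetical listings: (name, address, phone) per directory.
sample_listings = {
    "gbp": ("Example Tax LLC", "123 Main St., Suite 4", "(479) 555-0100"),
    "bing": ("Example Tax LLC", "123 Main Street Ste 4", "479-555-0100"),
}
nap_result = nap_status(sample_listings)
```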

D8. Industry directory citations match the vertical.
Method: review citation profile against industry standard directory list.
Pass: client is listed on appropriate vertical specific directories.
Partial: some directories present, others missing.
Fail: no vertical specific citations.
Remediation: build out citation profile per vertical.

Pillar E: Measurement (8 criteria)

E1. Google Search Console verified.
Method: log in to GSC, confirm property exists.
Pass: verified.
Fail: not verified.
Remediation: verify GSC property.

E2. Bing Webmaster Tools verified.
Method: log in to BWT.
Pass: verified.
Fail: not verified.
Remediation: verify and import GSC settings.

E3. Google Analytics 4 active.
Method: confirm GA4 property exists with active data.
Pass: GA4 active with data flowing.
Partial: GA4 exists but data is sparse or misconfigured.
Fail: no GA4 or no data.
Remediation: implement or fix GA4.

E4. Google Business Profile active and verified.
Method: search GBP for the client.
Pass: verified, with hours, photos, and recent reviews.
Partial: verified but incomplete.
Fail: not claimed or not verified.
Remediation: claim and complete GBP.

E5. Manual citation sampling protocol active.
Method: confirm weekly sampling log exists.
Pass: sampling log active and current.
Partial: sampling log exists but is intermittent.
Fail: no sampling.
Remediation: implement weekly sampling.

E6. AI Overview tracking active.
Method: confirm GSC Performance reporting captures AI Overview impressions; confirm tracking tool (Semrush, Ahrefs, manual) is in use.
Pass: tracking active across at least Google AI Overviews and Bing Copilot.
Partial: GSC only.
Fail: no AI Overview tracking.
Remediation: set up tracking.

E7. Reporting cadence is on schedule.
Method: review reporting log against agreed cadence.
Pass: reports delivered on time for last 3 cycles.
Partial: 1 or 2 missed.
Fail: regular missed reports.
Remediation: re establish cadence.

E8. Citation rate is being measured and reported.
Method: review reports for citation rate metric.
Pass: citation rate appears in reports with month over month delta.
Partial: citation count appears but not rate.
Fail: no citation tracking in reports.
Remediation: add citation rate to reporting template.
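
A minimal sketch of the metric E8 asks for. It assumes citation rate means cited checks divided by total checks taken in the period (queries times surfaces); that definition and both function names are illustrative, not taken from the framework.

```python
def citation_rate(cited: int, sampled: int) -> float:
    """Share of sampled query/surface checks where the client URL was cited."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    if not 0 <= cited <= sampled:
        raise ValueError("cited must be between 0 and sampled")
    return cited / sampled


def month_over_month_delta(current_rate: float, previous_rate: float) -> float:
    """Change in citation rate, expressed in percentage points."""
    return (current_rate - previous_rate) * 100


# Example: 60 checks per month (10 priority queries x 6 surfaces).
april = citation_rate(cited=9, sampled=60)
may = citation_rate(cited=12, sampled=60)
print(f"April {april:.1%}, May {may:.1%}, "
      f"delta {month_over_month_delta(may, april):+.1f} points")
```

Reporting the delta in percentage points rather than as a relative change keeps small baselines from producing misleadingly large swings.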

15.4 Audit Output Format

The agent produces two outputs: a markdown report for the client, and a JSON report for programmatic handling.

Markdown report structure:

# SEO and Search Appearance Audit
## [Client Name]
## [Audit Date]

## Executive Summary
- Overall score: X of 50 PASS, X PARTIAL, X FAIL, X N/A
- Critical issues: [count and one line summary of the top 3]
- Quick wins: [count and one line summary of the top 3]

## Pillar Scores
- Eligibility: X of 12
- Coverage: X of 10
- Extractability: X of 12
- Distribution: X of 8
- Measurement: X of 8

## Detailed Findings
### Pillar A: Eligibility
[For each criterion: status, evidence, remediation steps]

### Pillar B: Coverage
[same pattern]

### Pillar C: Extractability
[same pattern]

### Pillar D: Distribution
[same pattern]

### Pillar E: Measurement
[same pattern]

## Prioritized Remediation Plan
[Top 10 actions in priority order, with effort estimate and expected impact]

## Appendix: Audit Methodology
[Brief explanation of the framework version and methods used]

JSON report structure:

{
  "audit_metadata": {
    "framework_version": "2.0",
    "domain": "",
    "audited_at": "",
    "auditor": "",
    "engagement_id": ""
  },
  "summary": {
    "total_criteria": 50,
    "pass": 0,
    "partial": 0,
    "fail": 0,
    "na": 0,
    "score_percent": 0
  },
  "pillar_scores": {
    "eligibility": { "max": 12, "pass": 0, "partial": 0, "fail": 0, "na": 0 },
    "coverage": { "max": 10, "pass": 0, "partial": 0, "fail": 0, "na": 0 },
    "extractability": { "max": 12, "pass": 0, "partial": 0, "fail": 0, "na": 0 },
    "distribution": { "max": 8, "pass": 0, "partial": 0, "fail": 0, "na": 0 },
    "measurement": { "max": 8, "pass": 0, "partial": 0, "fail": 0, "na": 0 }
  },
  "criteria": [
    {
      "id": "A1",
      "pillar": "eligibility",
      "name": "Robots.txt allows the full AI crawler set",
      "status": "PASS",
      "evidence": "",
      "remediation": [],
      "effort_hours": 0,
      "impact": "high"
    }
  ],
  "prioritized_remediation": []
}
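
The summary and pillar_scores blocks can be derived from the criteria list rather than filled in by hand. The sketch below shows one way to do that. The score_percent formula (PASS counts 1, PARTIAL counts 0.5, N/A is excluded from the denominator) and the "N/A" status string are assumptions; the structure above does not define either.

```python
from collections import Counter

PILLAR_MAX = {"eligibility": 12, "coverage": 10, "extractability": 12,
              "distribution": 8, "measurement": 8}
# ASSUMPTION: status strings other than "PASS" are not shown in the template.
STATUS_KEYS = {"PASS": "pass", "PARTIAL": "partial", "FAIL": "fail", "N/A": "na"}


def summarize(criteria: list[dict]) -> dict:
    """Build the summary and pillar_scores blocks from the criteria list."""
    pillar_scores = {
        name: {"max": limit, "pass": 0, "partial": 0, "fail": 0, "na": 0}
        for name, limit in PILLAR_MAX.items()
    }
    totals = Counter()
    for item in criteria:
        key = STATUS_KEYS[item["status"]]  # KeyError flags an unknown status
        pillar_scores[item["pillar"]][key] += 1
        totals[key] += 1

    # ASSUMPTION: PASS = 1 point, PARTIAL = 0.5, N/A left out of the denominator.
    scored = totals["pass"] + totals["partial"] + totals["fail"]
    points = totals["pass"] + 0.5 * totals["partial"]
    score_percent = round(100 * points / scored, 1) if scored else 0

    return {
        "summary": {
            "total_criteria": sum(PILLAR_MAX.values()),
            "pass": totals["pass"],
            "partial": totals["partial"],
            "fail": totals["fail"],
            "na": totals["na"],
            "score_percent": score_percent,
        },
        "pillar_scores": pillar_scores,
    }
```

Deriving the counts this way keeps the summary from drifting out of sync with the detailed findings.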

15.5 Partial Install Audit

When the operator declares the engagement as "partial install" (the framework was started but not completed by a prior team), the audit follows the same 50 criterion structure, but:

N/A is heavily used for criteria that depend on prior phases not yet complete.
The remediation plan is structured as a phased re completion plan rather than 50 independent fixes.
Output includes a clear "starting point" summary identifying which phase the client is currently in.

15.6 Audit Mode Phase Gate

Audit mode does not have a forward gate (the audit is the deliverable). It does have an exit criterion:

All 50 criteria evaluated.
Both markdown and JSON reports produced.
Prioritized remediation plan delivered.
Client has reviewed and accepted the audit.
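
An agent can check the exit criterion mechanically before closing the engagement. This is a rough sketch only: the function name, the client_accepted flag, and the status strings are hypothetical, and it reports unmet conditions in the spirit of the phase gate rules in 0.2.

```python
from pathlib import Path

# ASSUMPTION: the four status strings used across the audit.
VALID_STATUSES = {"PASS", "PARTIAL", "FAIL", "N/A"}


def audit_exit_failures(report: dict, markdown_path: str, json_path: str,
                        client_accepted: bool) -> list[str]:
    """Return the exit conditions not yet satisfied (empty list means done)."""
    failures = []

    # Count distinct criterion ids that carry a recognized status.
    evaluated = {
        c["id"] for c in report.get("criteria", [])
        if "id" in c and c.get("status") in VALID_STATUSES
    }
    if len(evaluated) != 50:
        failures.append(f"{len(evaluated)} of 50 criteria evaluated")

    for label, path in (("markdown", markdown_path), ("JSON", json_path)):
        if not Path(path).is_file():
            failures.append(f"{label} report missing: {path}")

    # Note: an audit with no findings would also trip this check.
    if not report.get("prioritized_remediation"):
        failures.append("prioritized remediation plan is empty")

    if not client_accepted:
        failures.append("client has not accepted the audit")
    return failures
```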

16. Maintenance Schedule

Purpose: the framework is not a one time implementation. Maintenance is what compounds the gains over time.

16.1 Weekly

Manual citation sampling on top 10 priority queries across Google AI Overviews, Google AI Mode, ChatGPT, Perplexity, Claude, Bing Copilot.
Spot check schema validity on any pages modified in the last 7 days.
Review GSC for new query opportunities (queries with 10 plus impressions and no current page assignment).
Review GSC for sudden CTR drops (greater than 30 percent week over week on any priority page).
Triage and respond to any Search Console manual actions or warnings.
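
The two GSC checks in the weekly list reduce to simple filters. The sketch below assumes GSC rows have already been loaded as dictionaries, that the keyword to page map is a dictionary keyed by query, and that "greater than 30 percent" means a relative drop; all names are illustrative.

```python
def unassigned_queries(rows, keyword_map, min_impressions=10):
    """Queries with enough impressions that no page in the keyword map targets yet."""
    return sorted(
        row["query"] for row in rows
        if row["impressions"] >= min_impressions and row["query"] not in keyword_map
    )


def ctr_drops(this_week, last_week, threshold=0.30):
    """Pages whose CTR fell by more than `threshold` (relative) week over week."""
    flagged = {}
    for page, old_ctr in last_week.items():
        new_ctr = this_week.get(page)
        if new_ctr is None or old_ctr <= 0:
            continue  # no comparable data for this page
        drop = (old_ctr - new_ctr) / old_ctr
        if drop > threshold:
            flagged[page] = round(drop, 3)
    return flagged


rows = [
    {"query": "estimated tax due dates", "impressions": 42},
    {"query": "quarterly tax penalty", "impressions": 4},
]
print(unassigned_queries(rows, keyword_map={}))
print(ctr_drops({"/pricing/": 0.03}, {"/pricing/": 0.05}))
```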

16.2 Monthly

Update keyword to page map with any new GSC queries that have hit 10 plus impressions during the month.
Run cannibalization audit against latest 30 days of GSC data.
Refresh dateModified on pages with substantive content updates during the month.
Check for newly deprecated schema properties announced in Search Central blog.
Server log analysis for AI crawler behavior anomalies.
Bing Webmaster Tools and Yandex Webmaster review (if applicable).
Off page activity log update.
Client report production per cadence.
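
For the monthly cannibalization audit, one simple heuristic is to flag queries where two or more pages each take a meaningful share of impressions. The sketch below assumes GSC rows exported with both the query and page dimensions; the 20 percent share threshold is an assumption, not a value from the framework.

```python
from collections import defaultdict


def cannibalized_queries(rows, min_share=0.2):
    """Map each query to the pages that each hold at least `min_share` of its impressions."""
    by_query = defaultdict(dict)
    for row in rows:
        pages = by_query[row["query"]]
        pages[row["page"]] = pages.get(row["page"], 0) + row["impressions"]

    flagged = {}
    for query, pages in by_query.items():
        total = sum(pages.values())
        if total == 0:
            continue
        competing = sorted(p for p, n in pages.items() if n / total >= min_share)
        if len(competing) >= 2:  # two or more pages compete for the same query
            flagged[query] = competing
    return flagged
```

Flagged queries are candidates for review, not automatic consolidation; two pages can legitimately share a query when they serve different intents.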

16.3 Quarterly

Full sub query coverage audit per pillar.
Topic cluster health review (consolidate, delete, or reroute thin pages).
Off page brand mention count refresh.
Page structure review of top 20 cited or near cited pages.
Wikidata entity refresh.
llms.txt content refresh.
Re run audit mode against all 50 criteria.
Internal linking density spot check.
Information gain asset publication (per tier cadence).

16.4 After Every Google Core Update

Re evaluate top 20 pages for citation rate changes.
Identify any pages that lost rich results to deprecation.
Check schema property updates announced in Search Central blog.
Compare ranking volatility to industry benchmarks.
Adjust strategy if a pattern emerges (E E A T strengthened, freshness weighted higher, etc.).

16.5 Annually

Full author bio and credential refresh across all content.
Domain pillar architecture review.
Annual flagship information gain asset publication.
Framework version bump if Google or major LLMs have shipped meaningful changes.
Client retainer pricing review.
Hosting and infrastructure capacity review.

17. Appendix A: Deprecated Schema Reference

Schema types deprecated as of March 2026. Listed for reference; do not implement on new pages.

Schema Type	Deprecated	Reason	Replacement
HowTo	September 2023	Rich result removed from desktop and mobile	Article with ordered list, no rich result expected
Practice Problem	January 2026	Limited adoption	Article or Quiz schema
Dataset (general search)	January 2026	Now only serves Dataset Search	Keep if relevant for Dataset Search
Sitelinks Search Box	January 2026	Integrated into core search	None needed
SpecialAnnouncement	January 2026	COVID era specific	Event schema if applicable
Q and A	January 2026	Overlap with FAQPage and Forum	FAQPage if appropriate vertical
Book Actions	January 2026	Low adoption	Product or Article schema
Course Info	January 2026	Low adoption	Article or Event schema
Claim Review	January 2026	Restricted to fact checkers	Article schema
Estimated Salary	January 2026	Low adoption	JobPosting with baseSalary
Learning Video	January 2026	Replaced by VideoObject	VideoObject
Vehicle Listing	January 2026	Low adoption	Product schema

Note: removing deprecated schema does not improve rankings (no penalty for keeping it). Removal is housekeeping. The action that matters is not implementing these on new pages.
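
A quick way to find deprecated types on existing pages is to scan each page's JSON-LD for its @type values. This is a sketch under assumptions: the table above lists feature names, so the set below maps only three rows to schema.org type names and should be extended per engagement.

```python
import json
import re

# ASSUMPTION: schema.org type names for three rows of the table above.
DEPRECATED_TYPES = {"HowTo", "SpecialAnnouncement", "ClaimReview"}

JSON_LD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def iter_types(node):
    """Yield every @type value found anywhere in a parsed JSON-LD structure."""
    if isinstance(node, dict):
        declared = node.get("@type", [])
        yield from ([declared] if isinstance(declared, str) else declared)
        for value in node.values():
            yield from iter_types(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_types(item)


def deprecated_types_in(html: str) -> set[str]:
    """Return the deprecated schema types present in a page's JSON-LD blocks."""
    found = set()
    for block in JSON_LD_RE.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # invalid JSON-LD is a separate audit finding
        found.update(t for t in iter_types(data) if t in DEPRECATED_TYPES)
    return found
```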

18. Appendix B: Sub Query Generation Templates

Reusable prompts for generating fan out sub queries.

18.1 General Purpose Sub Query Prompt

Generate 15 likely sub queries that an AI search system would run
when given the prompt: "[PRIMARY_KEYWORD]".

Cover these angles when applicable:
- Definition: what is X
- Cost or pricing: how much does X cost
- Comparison: X versus Y, X alternatives
- How to: how to do X, step by step X
- When: when should I do X, when does X happen
- Where: where do I get X, where can X be done
- Who: who needs X, who provides X
- Why: why does X matter, why does X happen
- Recent changes: latest X, X in 2026, X updates
- Pros and cons: benefits of X, downsides of X
- Common mistakes: X mistakes, errors with X
- Local variations: X near me, X in [location]
- Regulatory considerations: X law, X compliance
- Examples: X example, sample X
- Outcome and expectations: results of X, what to expect from X

Output as a JSON array of strings, one sub query per element.
Plain text only, no commentary, no explanation.

18.2 Local Service Business Variant

Generate 15 likely sub queries that an AI search system would run
when a user in [CITY], [STATE] asks about: "[PRIMARY_KEYWORD]".

Cover these angles:
- Local cost variations
- Local provider names and recommendations
- Local regulations and licensing
- Local hours and availability
- Local emergency or same day options
- Service area boundaries
- Insurance and payment specific to the region
- Local reviews and reputation
- Comparison to nearby cities
- Distance and travel considerations

Output as a JSON array of strings.

18.3 E Commerce Variant

Generate 15 likely sub queries that an AI search system would run
when a user is shopping for: "[PRODUCT_KEYWORD]".

Cover these angles:
- Best for use case: best X for [scenario]
- Comparison shopping: X vs Y, X alternatives
- Sizing and fit (when applicable)
- Material and construction
- Warranty and support
- Reviews and durability
- Where to buy: cheapest, fastest shipping, in stock
- Used or refurbished options
- Compatible accessories
- Common defects or known issues
- Return policy considerations
- Brand reputation
- Price drops or sales

Output as a JSON array of strings.

18.4 YMYL Variant (Legal, Medical, Financial)

Generate 15 likely sub queries that an AI search system would run
when a user asks about a YMYL topic: "[YMYL_KEYWORD]".

Cover these angles:
- Definition with disclaimers
- Symptoms, signs, or indicators (medical) or warning signs (legal, financial)
- Causes
- Treatment, remedies, or solutions
- Prevention
- When to consult a professional
- Costs and insurance
- Regulations and legal considerations
- Risks of self diagnosis or self help
- Reputable sources and second opinions
- Recent research or rulings
- Statistics and prevalence

Note: YMYL content requires strong E E A T signals.
Recommend professional consultation in answers.
Cite primary authoritative sources (NIH, AMA, IRS, court rulings, etc.).

Output as a JSON array of strings.

18.5 Tax and Accounting Variant (Worked Example for handledtax.com Pattern)

Generate 15 likely sub queries that an AI search system would run
when a user asks about: "[TAX_KEYWORD]".

Cover these angles:
- Definition and basic rules
- Who must comply (income thresholds, filing status)
- When payments or filings are due (specific dates and quarters)
- How payments are calculated
- Payment methods (electronic, check, IRS Direct Pay)
- Penalties for non compliance and how they accrue
- Safe harbor rules and exemptions
- State versus federal differences
- Self employed versus W-2 employee differences
- How to estimate when income is variable
- What forms are required (1040-ES, Schedule SE, etc.)
- Common filing mistakes
- When to consult a tax professional
- Recent IRS guidance and rule changes
- Examples for different income scenarios

Output as a JSON array of strings.
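
The templates above can be filled and their responses validated with two small helpers. This sketch does not call any model; it only substitutes the upper case placeholders and checks that a response is the JSON array the prompts ask for. Function names are illustrative.

```python
import json
import re

PLACEHOLDER_RE = re.compile(r"\[([A-Z_]+)\]")


def fill_template(template: str, **values: str) -> str:
    """Replace [UPPER_CASE] placeholders; lower case ones such as [location] stay as written."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for [{key}]")
        return values[key]
    return PLACEHOLDER_RE.sub(substitute, template)


def parse_sub_queries(response_text: str, expected: int = 15) -> list[str]:
    """Validate that the model returned a JSON array of `expected` non-empty strings."""
    data = json.loads(response_text)
    if not isinstance(data, list) or not all(
        isinstance(q, str) and q.strip() for q in data
    ):
        raise ValueError("response is not a JSON array of non-empty strings")
    if len(data) != expected:
        raise ValueError(f"expected {expected} sub queries, got {len(data)}")
    return [q.strip() for q in data]


prompt = fill_template(
    'when a user in [CITY], [STATE] asks about: "[PRIMARY_KEYWORD]".',
    CITY="Cassville", STATE="MO", PRIMARY_KEYWORD="estimated tax payments",
)
print(prompt)
```

Rejecting malformed responses at this step keeps commentary or partial lists from leaking into the keyword to page map.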

19. Appendix C: Code Snippet Library

Reusable code blocks referenced from sections 7 through 14. Copy and adapt; do not modify in place in this document.

19.1 Organization Schema with @graph Pattern (Universal)

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "https://example.com/#organization",
      "name": "Example Business Inc",
      "alternateName": "Example",
      "url": "https://example.com/",
      "logo": {
        "@type": "ImageObject",
        "url": "https://example.com/assets/img/logo.png",
        "width": 600,
        "height": 200
      },
      "description": "[One sentence description of the business]",
      "telephone": "+1-555-555-5555",
      "email": "info@example.com",
      "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Main Street",
        "addressLocality": "Cassville",
        "addressRegion": "MO",
        "postalCode": "65625",
        "addressCountry": "US"
      },
      "geo": {
        "@type": "GeoCoordinates",
        "latitude": 36.6781,
        "longitude": -93.8722
      },
      "sameAs": [
        "https://www.facebook.com/examplebiz",
        "https://www.linkedin.com/company/examplebiz",
        "https://twitter.com/examplebiz",
        "https://www.youtube.com/@examplebiz",
        "https://www.wikidata.org/wiki/QXXXXXXX"
      ],
      "founder": { "@type": "Person", "name": "Founder Name" },
      "foundingDate": "2020-01-15",
      "areaServed": [
        { "@type": "State", "name": "Missouri" },
        { "@type": "State", "name": "Arkansas" }
      ]
    },
    {
      "@type": "WebSite",
      "@id": "https://example.com/#website",
      "url": "https://example.com/",
      "name": "Example Business",
      "publisher": { "@id": "https://example.com/#organization" }
    },
    {
      "@type": "WebPage",
      "@id": "https://example.com/page-slug/#webpage",
      "url": "https://example.com/page-slug/",
      "name": "Page Title",
      "isPartOf": { "@id": "https://example.com/#website" },
      "datePublished": "2026-01-15T08:00:00-06:00",
      "dateModified": "2026-05-03T14:30:00-05:00"
    }
  ]
}
</script>
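
Because the snippets in this appendix link to each other through @id references, it helps to confirm that every reference resolves to a node defined somewhere on the page. The sketch below treats an object containing only @id as a reference and any object with additional keys as a definition; that convention and the function names are assumptions for illustration.

```python
def collect_ids(node, defined, referenced):
    """Sort @id usages into definitions (node has other keys) and bare references."""
    if isinstance(node, dict):
        if "@id" in node:
            target = referenced if set(node) == {"@id"} else defined
            target.add(node["@id"])
        for value in node.values():
            collect_ids(value, defined, referenced)
    elif isinstance(node, list):
        for item in node:
            collect_ids(item, defined, referenced)


def dangling_references(*documents) -> set[str]:
    """Return @id references that no parsed JSON-LD document on the page defines."""
    defined, referenced = set(), set()
    for document in documents:
        collect_ids(document, defined, referenced)
    return referenced - defined


graph = {
    "@graph": [
        {"@type": "Organization", "@id": "https://example.com/#organization"},
        {"@type": "WebSite", "@id": "https://example.com/#website",
         "publisher": {"@id": "https://example.com/#organization"}},
    ]
}
article = {
    "@type": "Article",
    "author": {"@id": "https://example.com/team/joseph-anady/#person"},
}
print(dangling_references(graph, article))
```

Pass every parsed JSON-LD block from the same page together, since a reference in one script tag can be satisfied by a definition in another.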

19.2 LocalBusiness Schema (Service Area Business)

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ProfessionalService",
  "@id": "https://example.com/#localbusiness",
  "name": "Example Professional Service",
  "image": "https://example.com/assets/img/office.jpg",
  "telephone": "+1-555-555-5555",
  "priceRange": "$$",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 Main Street",
    "addressLocality": "Cassville",
    "addressRegion": "MO",
    "postalCode": "65625",
    "addressCountry": "US"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": 36.6781,
    "longitude": -93.8722
  },
  "url": "https://example.com/",
  "openingHoursSpecification": [
    {
      "@type": "OpeningHoursSpecification",
      "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
      "opens": "08:00",
      "closes": "17:00"
    }
  ],
  "areaServed": [
    {
      "@type": "GeoCircle",
      "geoMidpoint": {
        "@type": "GeoCoordinates",
        "latitude": 36.6781,
        "longitude": -93.8722
      },
      "geoRadius": "80467"
    }
  ]
}
</script>
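
The geoRadius value is expressed in meters, and 80467 corresponds to roughly 50 miles. A small helper (name illustrative) makes the conversion explicit when adapting the snippet to a different service radius.

```python
METERS_PER_MILE = 1609.344


def geo_radius_meters(miles: float) -> str:
    """Convert a service radius in miles to the string value used for geoRadius."""
    return str(round(miles * METERS_PER_MILE))


print(geo_radius_meters(50))
```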

19.3 Person Schema (Author or Team Member)

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "https://example.com/team/joseph-anady/#person",
  "name": "Joseph Anady",
  "givenName": "Joseph",
  "familyName": "Anady",
  "url": "https://example.com/team/joseph-anady/",
  "image": "https://example.com/assets/img/team/joseph-anady.jpg",
  "jobTitle": "Founder and Lead Developer",
  "worksFor": { "@id": "https://example.com/#organization" },
  "alumniOf": [
    {
      "@type": "CollegeOrUniversity",
      "name": "Colorado State University"
    }
  ],
  "hasCredential": [
    {
      "@type": "EducationalOccupationalCredential",
      "credentialCategory": "degree",
      "name": "BA Computer Engineering"
    },
    {
      "@type": "EducationalOccupationalCredential",
      "credentialCategory": "degree",
      "name": "MA Cybersecurity"
    },
    {
      "@type": "EducationalOccupationalCredential",
      "credentialCategory": "certification",
      "name": "Service Disabled Veteran Owned Small Business (SDVOSB)"
    }
  ],
  "sameAs": [
    "https://www.linkedin.com/in/josephanady",
    "https://www.wikidata.org/wiki/Q138610626"
  ]
}
</script>

19.4 Article Schema (Content Page)

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "@id": "https://example.com/blog/article-slug/#article",
  "headline": "[Article Headline, under 110 characters]",
  "description": "[Article description, under 250 characters]",
  "image": ["https://example.com/assets/img/article-hero.jpg"],
  "datePublished": "2026-05-03T08:00:00-05:00",
  "dateModified": "2026-05-03T14:30:00-05:00",
  "author": { "@id": "https://example.com/team/joseph-anady/#person" },
  "publisher": { "@id": "https://example.com/#organization" },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://example.com/blog/article-slug/"
  },
  "articleSection": "[Section name]",
  "keywords": ["keyword one", "keyword two", "keyword three"],
  "wordCount": 2400,
  "inLanguage": "en-US"
}
</script>

19.5 Product Schema (E Commerce)

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "@id": "https://example.com/products/product-slug/#product",
  "name": "Product Name",
  "image": [
    "https://example.com/assets/img/product-1.jpg",
    "https://example.com/assets/img/product-2.jpg"
  ],
  "description": "[Product description]",
  "sku": "PROD-12345",
  "mpn": "MFR-67890",
  "brand": { "@type": "Brand", "name": "Brand Name" },
  "offers": {
    "@type": "Offer",
    "url": "https://example.com/products/product-slug/",
    "priceCurrency": "USD",
    "price": "997",
    "priceValidUntil": "2026-12-31",
    "availability": "https://schema.org/InStock",
    "itemCondition": "https://schema.org/NewCondition",
    "seller": { "@id": "https://example.com/#organization" }
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.8",
    "reviewCount": "47",
    "bestRating": "5",
    "worstRating": "1"
  }
}
</script>

19.6 Service Schema with Tier Catalog

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Service",
  "@id": "https://example.com/services/web-development/#service",
  "name": "Custom Website Development",
  "description": "[Description of the service]",
  "provider": { "@id": "https://example.com/#organization" },
  "areaServed": [
    { "@type": "State", "name": "Missouri" },
    { "@type": "State", "name": "Arkansas" }
  ],
  "serviceType": "Web Development",
  "offers": {
    "@type": "Offer",
    "priceCurrency": "USD",
    "price": "997"
  },
  "hasOfferCatalog": {
    "@type": "OfferCatalog",
    "name": "Web Development Tiers",
    "itemListElement": [
      {
        "@type": "Offer",
        "itemOffered": { "@type": "Service", "name": "Custom Website" },
        "price": "597",
        "priceCurrency": "USD"
      },
      {
        "@type": "Offer",
        "itemOffered": { "@type": "Service", "name": "Website plus SEO and AEO" },
        "price": "797",
        "priceCurrency": "USD"
      },
      {
        "@type": "Offer",
        "itemOffered": { "@type": "Service", "name": "Full Digital Presence" },
        "price": "997",
        "priceCurrency": "USD"
      }
    ]
  }
}
</script>

19.7 BreadcrumbList Schema

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://example.com/" },
    { "@type": "ListItem", "position": 2, "name": "Services", "item": "https://example.com/services/" },
    { "@type": "ListItem", "position": 3, "name": "Web Development" }
  ]
}
</script>

Note: the last item has no item property because it represents the current page.
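
When breadcrumbs are generated from a page's position in the site hierarchy, the same rule can be applied in code. A minimal sketch, assuming the trail is passed as (name, URL) pairs ordered from the home page to the current page; the function name is illustrative.

```python
import json


def breadcrumb_list(trail: list[tuple[str, str]]) -> dict:
    """Build BreadcrumbList JSON-LD; the final crumb is emitted without an item URL."""
    elements = []
    for position, (name, url) in enumerate(trail, start=1):
        element = {"@type": "ListItem", "position": position, "name": name}
        if position < len(trail):  # every crumb except the current page links out
            element["item"] = url
        elements.append(element)
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": elements,
    }


trail = [
    ("Home", "https://example.com/"),
    ("Services", "https://example.com/services/"),
    ("Web Development", "https://example.com/services/web-development/"),
]
print(json.dumps(breadcrumb_list(trail), indent=2))
```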

19.8 VideoObject Schema

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "VideoObject",
  "name": "[Video Title]",
  "description": "[Video Description]",
  "thumbnailUrl": "https://example.com/assets/img/video-thumb.jpg",
  "uploadDate": "2026-05-03T08:00:00-05:00",
  "duration": "PT5M30S",
  "contentUrl": "https://example.com/assets/video/video.mp4",
  "embedUrl": "https://www.youtube.com/embed/VIDEO_ID",
  "publisher": { "@id": "https://example.com/#organization" }
}
</script>
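
The duration property uses ISO 8601 duration notation, so PT5M30S means 5 minutes 30 seconds. When video length is only known in seconds, a helper along these lines (name illustrative) can produce the value.

```python
def iso8601_duration(total_seconds: int) -> str:
    """Format a length in whole seconds as an ISO 8601 duration such as PT5M30S."""
    if total_seconds < 0:
        raise ValueError("duration cannot be negative")
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    result = "PT"
    if hours:
        result += f"{hours}H"
    if minutes:
        result += f"{minutes}M"
    if seconds or result == "PT":  # always emit at least one component
        result += f"{seconds}S"
    return result


print(iso8601_duration(330))
```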

19.9 FAQPage Schema (Use Sparingly per 8.1)

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "[Question text]",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "[Answer text, full sentence answer]"
      }
    }
  ]
}
</script>

Reminder: as of March 2026, FAQ rich results are restricted primarily to government and authoritative health sites. Other sites still benefit from FAQPage schema as an AI trust signal, but should not expect a SERP rich result.

19.10 Robots.txt Default

# robots.txt for [domain]
# Last updated: [YYYY-MM-DD]

User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Bingbot
Allow: /

User-agent: DuckDuckBot
Allow: /

User-agent: YandexBot
Allow: /

User-agent: Baiduspider
Allow: /

User-agent: Applebot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Anthropic-AI
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Meta-ExternalAgent
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: CCBot
Allow: /

User-agent: Bytespider
Disallow: /

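# Crawlers named in a group above follow only that group and do not inherit
# the Disallow rules below. Repeat those rules in a named group where the
# paths must stay blocked for that crawler.
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Disallow: /checkout/
Disallow: /cart/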

Sitemap: https://[domain]/sitemap.xml
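
A deployed robots.txt can be checked against the required crawler set with a small parser. The sketch below is a simplified reading of the format, not a full matcher: it groups consecutive User-agent lines, then reports any required crawler that has no named group or is disallowed from the site root. Function names are illustrative.

```python
def parse_groups(robots_txt: str) -> list[tuple[list[str], list[tuple[str, str]]]]:
    """Parse robots.txt into (user_agents, rules) groups."""
    groups = []
    agents, rules = [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a rule line was seen, so this starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def blocked_or_missing(robots_txt: str, required: list[str]) -> dict[str, str]:
    """Report required crawlers with no named group or a Disallow on '/'."""
    problems = {}
    groups = parse_groups(robots_txt)
    for bot in required:
        named = [rules for agents, rules in groups if bot.lower() in agents]
        if not named:
            problems[bot] = "no named group"
        elif any(kind == "disallow" and path == "/"
                 for rules in named for kind, path in rules):
            problems[bot] = "disallowed from /"
    return problems


sample = """
User-agent: GPTBot
Allow: /

User-agent: Bytespider
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""
print(blocked_or_missing(sample, ["GPTBot", "ClaudeBot", "Bytespider"]))
```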

19.11 Llms.txt Template

# [Site Name]

> [One sentence description of the site and its core authority area]

## [Primary Section, e.g., Service Lines]

- [Title 1](https://[domain]/url-1/): [Brief description]
- [Title 2](https://[domain]/url-2/): [Brief description]

## [Secondary Section, e.g., Resources]

- [Title 3](https://[domain]/url-3/): [Brief description]

## About

- [About Page](https://[domain]/about/): [One sentence summary]
- [Contact](https://[domain]/contact/): [contact info]

19.12 Sitemap Index XML

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://[domain]/sitemap-pages.xml</loc>
    <lastmod>YYYY-MM-DD</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://[domain]/sitemap-images.xml</loc>
    <lastmod>YYYY-MM-DD</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://[domain]/sitemap-videos.xml</loc>
    <lastmod>YYYY-MM-DD</lastmod>
  </sitemap>
</sitemapindex>
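
For sites where the child sitemaps change often, the index can be generated instead of edited by hand. A minimal sketch using the standard library; the function name and the mapping of child filenames to last modified dates are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(domain: str, children: dict[str, date]) -> str:
    """Return a sitemap index document listing each child sitemap with its lastmod."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for filename, modified in children.items():
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = f"https://{domain}/{filename}"
        ET.SubElement(entry, "lastmod").text = modified.isoformat()
    ET.indent(root, space="  ")  # available in Python 3.9 and later
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


xml_text = build_sitemap_index(
    "example.com",
    {
        "sitemap-pages.xml": date(2026, 5, 3),
        "sitemap-images.xml": date(2026, 4, 28),
    },
)
print(xml_text)
```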

19.13 Standard HTML Page Template (Static Stack)

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">

  <title>[Primary Keyword] | [Brand Name]</title>
  <meta name="description" content="[150 to 160 character description with primary keyword in first 90 characters]">
  <link rel="canonical" href="https://[domain]/[path]/">

  <meta property="og:title" content="[Title]">
  <meta property="og:description" content="[Description]">
  <meta property="og:url" content="https://[domain]/[path]/">
  <meta property="og:image" content="https://[domain]/assets/img/og-image.jpg">
  <meta property="og:type" content="article">
  <meta property="og:site_name" content="[Brand Name]">

  <meta name="twitter:card" content="summary_large_image">
  <meta name="twitter:title" content="[Title]">
  <meta name="twitter:description" content="[Description]">
  <meta name="twitter:image" content="https://[domain]/assets/img/twitter-card.jpg">

  <!-- Sitewide Organization, WebSite, WebPage @graph (see 19.1) and BreadcrumbList (see 19.7) -->
  <!-- Page specific Article schema (see 19.4) -->

  <link rel="stylesheet" href="/assets/css/styles.css">
  <link rel="icon" type="image/x-icon" href="/favicon.ico">
</head>
<body>
  <header>
    <!-- Site nav -->
  </header>

  <nav aria-label="Breadcrumb">
    <ol>
      <li><a href="/">Home</a></li>
      <li><a href="/[pillar]/">[Pillar Name]</a></li>
      <li aria-current="page">[Current Page]</li>
    </ol>
  </nav>

  <main>
    <article>
      <h1>[Primary Keyword Phrased as a Headline]</h1>

      <p class="lede"><strong>[40 to 60 word direct answer]</strong></p>

      <p>[2 to 3 sentence expansion]</p>

      <h2>[Question matching sub query 1]</h2>
      <p>[40 to 60 word answer]</p>
      <p>[Supporting context]</p>

      <h2>[Question matching sub query 2]</h2>
      <ol>
        <li>[Step 1]</li>
        <li>[Step 2]</li>
        <li>[Step 3]</li>
      </ol>

      <h2>[Question matching sub query 3]</h2>
      <table>
        <thead>
          <tr><th>Feature</th><th>Option A</th><th>Option B</th></tr>
        </thead>
        <tbody>
          <tr><td>[Row]</td><td>[Value]</td><td>[Value]</td></tr>
        </tbody>
      </table>

      <h2>Frequently Asked Questions</h2>
      <h3>[FAQ 1]</h3>
      <p>[Answer]</p>
      <h3>[FAQ 2]</h3>
      <p>[Answer]</p>

      <aside class="author-bio">
        <p>Written by <a href="/team/[author-slug]/">[Author Name]</a>, [credentials].
        [One sentence biography]. Last updated [Month Day, Year].</p>
      </aside>
    </article>
  </main>

  <footer>
    <p>Crafted by <a href="https://thatdeveloperguy.com/">ThatDeveloperGuy.com</a>.</p>
  </footer>

  <script src="/assets/js/main.js" defer></script>
</body>
</html>
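
The numeric targets written into the template (a 150 to 160 character meta description and a 40 to 60 word lede) can be checked on a built page. The sketch below uses the standard library HTML parser; the class and function names are illustrative, and the single h1 check is an assumption drawn from the template having exactly one.

```python
from html.parser import HTMLParser


class PageFacts(HTMLParser):
    """Collect the few facts the template checks depend on."""

    def __init__(self):
        super().__init__()
        self.description = None
        self.canonical = None
        self.h1_count = 0
        self.lede_words = 0
        self._in_lede = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content") or ""
        elif tag == "link" and attrs.get("rel") == "canonical":
            self.canonical = attrs.get("href")
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "p" and "lede" in (attrs.get("class") or "").split():
            self._in_lede = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_lede = False

    def handle_data(self, data):
        if self._in_lede:
            self.lede_words += len(data.split())


def template_issues(html: str) -> list[str]:
    """Return the template targets the page misses (empty list means all met)."""
    facts = PageFacts()
    facts.feed(html)
    issues = []
    if facts.description is None or not 150 <= len(facts.description) <= 160:
        issues.append("meta description missing or outside 150 to 160 characters")
    if not facts.canonical:
        issues.append("canonical link missing")
    if facts.h1_count != 1:  # ASSUMPTION: one h1 per page, as in the template
        issues.append(f"expected exactly one h1, found {facts.h1_count}")
    if not 40 <= facts.lede_words <= 60:
        issues.append(f"lede has {facts.lede_words} words, expected 40 to 60")
    return issues
```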

19.14 Audit Bash Helpers

#!/bin/bash
# Quick eligibility audit script for any domain
DOMAIN="$1"

if [ -z "$DOMAIN" ]; then
  echo "Usage: $0 <domain>"
  exit 1
fi

echo "=== Robots.txt check ==="
curl -s "https://$DOMAIN/robots.txt" | head -50

echo ""
echo "=== Llms.txt check ==="
curl -sI "https://$DOMAIN/llms.txt" | head -3

echo ""
echo "=== Sitemap check ==="
curl -sI "https://$DOMAIN/sitemap.xml" | head -3

echo ""
echo "=== Canonical check (homepage) ==="
curl -s "https://$DOMAIN/" | grep -i 'rel="canonical"' | head -3

echo ""
echo "=== Schema presence check (homepage) ==="
# Count occurrences rather than lines so minified single line HTML is counted correctly
SCHEMA_COUNT=$(curl -s "https://$DOMAIN/" | grep -o 'application/ld+json' | wc -l)
echo "Schema script tags found: $SCHEMA_COUNT"

echo ""
echo "=== HTTP version check ==="
curl -sI --http2 "https://$DOMAIN/" | head -1
curl -sI --http3 "https://$DOMAIN/" 2>/dev/null | head -1

echo ""
echo "=== Mobile rendering check ==="
curl -s -A "Mozilla/5.0 (Linux; Android 10) AppleWebKit/537.36" "https://$DOMAIN/" | \
  grep -ic '<h1\|<main\|<article'

echo ""
echo "=== AI crawler reading mode test (GPTBot) ==="
curl -s -A "GPTBot" "https://$DOMAIN/" | head -100 | grep -ic '<h1\|<main\|<article'

echo ""
echo "=== Server log AI crawler activity (last 10000 lines) ==="
# Assumes the default nginx "combined" log format, where the user agent
# is the sixth field when the line is split on double quotes.
LOG_FILE="/var/log/nginx/access.log"
if [ -r "$LOG_FILE" ]; then
  tail -10000 "$LOG_FILE" | \
    grep -E "GPTBot|ClaudeBot|PerplexityBot|OAI-SearchBot|Google-Extended|Bingbot" | \
    awk -F'"' '{print $6}' | sort | uniq -c | sort -rn | head -20
else
  echo "Log file not readable from this context"
fi
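
For the monthly server log analysis, per crawler hit counts are easier to trend than raw user agent strings. The sketch below assumes the nginx "combined" log format, where the user agent is the last quoted field on each line, and reuses the crawler names from the script above; the function name is illustrative.

```python
import re
from collections import Counter

AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot",
               "OAI-SearchBot", "Google-Extended", "Bingbot")

# Combined log format: the user agent is the last double-quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def crawler_hits(lines) -> Counter:
    """Count requests per known crawler from an iterable of access log lines."""
    counts = Counter()
    for line in lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue  # line does not end with a quoted user agent
        user_agent = match.group(1).lower()
        for bot in AI_CRAWLERS:
            if bot.lower() in user_agent:
                counts[bot] += 1
                break
    return counts


sample = [
    '1.2.3.4 - - [03/May/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
    '1.2.3.5 - - [03/May/2026:10:00:01 +0000] "GET /a/ HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '1.2.3.6 - - [03/May/2026:10:00:02 +0000] "GET /b/ HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
print(crawler_hits(sample))
```

To run it against a real log, pass an open file object, for example `crawler_hits(open(path, encoding="utf-8", errors="replace"))`.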

19.15 Citation Sampling Script

#!/bin/bash
# Manual citation sampling helper
# Run this weekly with a list of priority queries

QUERIES_FILE="$1"

if [ -z "$QUERIES_FILE" ] || [ ! -r "$QUERIES_FILE" ]; then
  echo "Usage: $0 <queries.txt>"
  exit 1
fi

OUTPUT_FILE="citation-sample-$(date +%Y%m%d).md"

cat > "$OUTPUT_FILE" << EOF
# Citation Sample
## $(date +%Y-%m-%d)

For each query, manually run on each surface and record:
- Cited (URL appears as a source)
- Mentioned (brand name appears in answer text without link)
- Not present

| Query | Google AI Overview | Google AI Mode | ChatGPT | Perplexity | Claude | Bing Copilot |
|-------|--------------------|----------------|---------|------------|--------|--------------|
EOF

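# The "|| [ -n ... ]" clause keeps a final line that has no trailing newline
while IFS= read -r query || [ -n "$query" ]; do
  # Skip blank lines so they do not create empty table rows
  [ -z "$query" ] && continue
  echo "| $query | | | | | | |" >> "$OUTPUT_FILE"
done < "$QUERIES_FILE"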

echo "Created $OUTPUT_FILE. Fill in manually and commit to engagement notes."

19.16 GSC Performance Export Script

#!/bin/bash
# GSC export helper using Google Search Console API
# Requires gcloud (authenticated), jq, and curl, with the Search Console API enabled

PROPERTY="$1"
START_DATE="$2"
END_DATE="$3"

if [ -z "$PROPERTY" ] || [ -z "$START_DATE" ] || [ -z "$END_DATE" ]; then
  echo "Usage: $0 <property-url> <start-date> <end-date>"
  echo "Example: $0 'https://example.com/' 2026-04-01 2026-04-30"
  exit 1
fi

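# Get OAuth token. The token must carry a Search Console scope
# (https://www.googleapis.com/auth/webmasters.readonly); if the API returns 403,
# authenticate again with that scope before rerunning.
TOKEN=$(gcloud auth print-access-token)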

# Export queries dimension
curl -s -X POST \
  "https://searchconsole.googleapis.com/webmasters/v3/sites/$(printf %s "$PROPERTY" | jq -sRr @uri)/searchAnalytics/query" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"startDate\": \"$START_DATE\",
    \"endDate\": \"$END_DATE\",
    \"dimensions\": [\"query\"],
    \"rowLimit\": 25000
  }" > "gsc-queries-$START_DATE-to-$END_DATE.json"

# Export pages dimension
curl -s -X POST \
  "https://searchconsole.googleapis.com/webmasters/v3/sites/$(printf %s "$PROPERTY" | jq -sRr @uri)/searchAnalytics/query" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"startDate\": \"$START_DATE\",
    \"endDate\": \"$END_DATE\",
    \"dimensions\": [\"page\"],
    \"rowLimit\": 25000
  }" > "gsc-pages-$START_DATE-to-$END_DATE.json"

# Export search appearance dimension (cannot combine with other dimensions per API constraint)
curl -s -X POST \
  "https://searchconsole.googleapis.com/webmasters/v3/sites/$(printf %s "$PROPERTY" | jq -sRr @uri)/searchAnalytics/query" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"startDate\": \"$START_DATE\",
    \"endDate\": \"$END_DATE\",
    \"dimensions\": [\"searchAppearance\"],
    \"rowLimit\": 25000
  }" > "gsc-search-appearance-$START_DATE-to-$END_DATE.json"

echo "Exports written to current directory."
echo "Note: Search Appearance must be queried separately and cannot be combined with other dimensions."
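
The exported files hold the raw API response, where each entry in rows carries a keys list plus clicks, impressions, ctr, and position. A short sketch for flattening a single dimension export into CSV for the keyword to page map; the function name is illustrative.

```python
import csv
import io
import json


def rows_to_csv(response_json: str, dimension: str) -> str:
    """Flatten a single dimension Search Analytics response into CSV text."""
    payload = json.loads(response_json)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([dimension, "clicks", "impressions", "ctr", "position"])
    # "rows" is absent when the query matched no data.
    for row in payload.get("rows", []):
        writer.writerow([
            row["keys"][0],
            row["clicks"],
            row["impressions"],
            row["ctr"],
            row["position"],
        ])
    return buffer.getvalue()


example = json.dumps({
    "rows": [
        {"keys": ["estimated tax payments"], "clicks": 3,
         "impressions": 120, "ctr": 0.025, "position": 7.4},
    ]
})
print(rows_to_csv(example, "query"))
```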

20. Appendix D: 2026 Data Citations

The framework's strategic decisions are anchored in measured industry data. Sources are listed here for verification and update. Refresh this section quarterly as new studies are published.

20.1 AI Overview and AI Mode Coverage

AI Overviews appear on about 48 percent of all Google searches (Q1 2026). Source: Digital Applied, BrightEdge AI Overview Impact Report.
AI Overviews appear on 70 plus percent of informational and how to queries. Source: BrightEdge.
Google AI Mode launched limited availability May 2025, expanded globally through 2025, 75 million daily active users by January 2026, processing over 1 billion queries per month. Source: Digital Applied, ALM Corp.
AI Mode runs on Gemini 3 Pro with Personal Intelligence integration since January 22, 2026. Source: ALM Corp.

20.2 Click and Conversion Behavior

93 percent of AI Mode queries result in zero clicks. Source: Seer Interactive 25.1 million impression study.
Organic CTR drops up to 61 percent on queries with AI Overviews. Source: Seer Interactive (1.76 percent baseline to 0.61 percent with AI Overview).
Sites cited in AI Overviews see 35 percent more clicks than non cited top 10 results. Source: Seer Interactive.
Cited visitors convert at about 23 times the rate of standard search visitors. Source: Seer Interactive, Alhena, GeoLikeAPro.
58.5 percent of all searches now end without a click. Source: SparkToro/Datos Q2 2025 zero click study.

20.3 Citation Decoupling

68 percent of pages cited in AI Overviews are NOT in the top 10 organic results. Source: Surfer SEO December 2025 study of 173,902 URLs across 10,000 keywords.
Only 38 percent of pages cited in AI Overviews also rank in the top 10, down from 76 percent seven months earlier. Source: Ahrefs February 2026 study of 863,000 keywords.
25 to 39 percent overlap between traditional Google rankings and AI search citations. Source: Mike King, SparkToro Office Hours, January 2026.
Only 13.7 percent citation overlap between AI Overviews and AI Mode. Source: Ahrefs December 2025.
Brands relying solely on traditional SEO miss 87.5 to 89.8 percent of AI citation opportunities. Source: Ekamoira Topical Coverage Gap research, synthesizing Mike King and Surfer SEO data.

20.4 Citation Volatility

AI Overview content changes 70 percent of the time for the same query. Source: Ahrefs November 2025.
45.5 percent of AI Overview citations get replaced when the answer regenerates. Source: Ahrefs November 2025.
AI Mode self overlap on the same query run three times: 9.2 percent. Source: SE Ranking August 2025.
Less than 1 in 100 chance ChatGPT or Google AI returns the same brand list twice across 100 runs. Source: SparkToro January 2026.

20.5 Mention Versus Citation

AI Mode cites sources 76.3 percent of the time, mentions brands 37.6 percent. Source: Growth Memo April 2026.
AI Overviews cite sources 84.9 percent of the time, mention brands 61 percent. Source: Growth Memo April 2026.
AI systems use content aggregators (Medium, Wikipedia, Wired) as sources but rarely mention them. Source: Growth Memo April 2026.

20.6 Query Fan Out

Google AI Mode fires 9 to 11 parallel sub queries per user prompt; some studies measure up to 16. Source: upGrowth, SE Ranking, SEO.com.
ChatGPT runs 2.3 to 2.8 sub queries per prompt. Source: upGrowth.
E commerce: 18 to 22 sub queries per prompt with 61 percent citation rate. Source: Go Fish Digital cited via Wellows.
Healthcare: 22 to 28 sub queries with 48 percent citation rate. Source: Wellows.
Finance: 16 to 20 sub queries with 52 percent citation rate. Source: Wellows.

20.7 Content Format and Citation Patterns

44.2 percent of all LLM citations come from the first 30 percent of a page's text. Source: Position Digital April 2026.
Pages above 20,000 characters average about 10 AI citations each. Pages under 500 characters average 2.39. Source: Digital Applied.
ChatGPT prefers focused shorter content; pages covering 26 to 50 percent of fan out sub queries get cited more than pages covering 100 percent. Source: Growth Memo April 2026.
Pages with semantically relevant title and URL slug are more likely to be cited by ChatGPT. Source: Ahrefs April 2026.
AI engines cite pages that are 25.7 percent fresher than the pages surfaced by traditional search. Source: Ahrefs.
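
The first statistic in this subsection (citations concentrating in the first 30 percent of a page's text) can be approximated on your own pages. The sketch below is a rough illustration, assuming you have the extracted page text and the exact snippet an AI engine quoted. Exact substring matching and the names `citation_position` and `share_in_leading_fraction` are assumptions; real citations are often paraphrased and would need fuzzy matching.

```python
from typing import Optional


def citation_position(page_text: str, snippet: str) -> Optional[float]:
    """Return where the snippet starts, as a fraction of the page length.

    Returns None when the snippet is not found verbatim.
    """
    if not page_text or not snippet:
        return None
    index = page_text.find(snippet)
    if index == -1:
        return None
    return index / len(page_text)


def share_in_leading_fraction(pairs, fraction: float = 0.30) -> float:
    """Share of located snippets that start within the leading fraction."""
    positions = [citation_position(text, snippet) for text, snippet in pairs]
    found = [p for p in positions if p is not None]
    if not found:
        return 0.0
    return sum(p < fraction for p in found) / len(found)


# Hypothetical pages of 100 characters each: one early quote, one late quote.
early_page = "x" * 10 + "answer" + "y" * 84
late_page = "x" * 80 + "late" + "y" * 16
pairs = [(early_page, "answer"), (late_page, "late")]

print(f"share in first 30 percent: {share_in_leading_fraction(pairs):.0%}")
```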

20.8 Off Page and Distribution

Earned media distribution can lift AI citations by up to 325 percent versus owned site only. Source: Stacker December 2025.
YouTube mentions and branded web mentions are the top correlated factors with AI brand visibility across ChatGPT, AI Mode, and AI Overviews. Source: Ahrefs December 2025.
92 percent of the time, ChatGPT agents rely on the Bing Search API. Source: Search Engine Land October 2025.
46 percent of ChatGPT bot visits begin in reading mode (plain HTML). Source: Search Engine Land October 2025.
63 percent of ChatGPT agents leave immediately after landing. Source: Search Engine Land October 2025.

20.9 Long Tail and Keyword Statistics

91.8 percent of all searches are long tail (3 plus words). Source: Whitehat SEO B2B Guide, multiple corroborating studies.
Long tail keywords convert at 2.5 times the rate of head terms. Source: Yotpo, W3era, multiple e commerce studies.
AI search intent breakdown: Informational 34.28 percent, Comparative/Selection 23.82 percent, Acquisition 16.44 percent. Source: Ignite Visibility benchmarks.

20.10 Schema and Rich Results

HowTo rich results deprecated September 2023, removed from desktop and mobile. Source: Google Search Central.
11 schema types deprecated January 2026: Practice Problem, Dataset (general search), Sitelinks Search Box, SpecialAnnouncement, Q and A, Book Actions, Course Info, Claim Review, Estimated Salary, Learning Video, Vehicle Listing. Source: Google Search Central blog November 2025 announcement, ALM Corp coverage.
March 2026 core update: FAQ rich result impressions dropped nearly 50 percent. HowTo rich results disappeared from supplementary content. Review schema demoted on editorial comparison posts. Source: Digital Applied March 2026.
FAQPage rich results restricted primarily to government and authoritative health sites since 2023. Source: Google Search Central.
Pages with structured data earn 35 percent higher CTR from rich results when displayed. Source: Digital Applied.
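
During an audit, the deprecations and restrictions above can be turned into a simple check on a page's JSON-LD. The sketch below is illustrative only. The `FLAGGED_TYPES` set is a hand mapping from the labels in this subsection to schema.org type names and is an assumption (for example, "Q and A" is mapped to `QAPage`); verify it against current Google Search Central documentation before relying on it. The function names are hypothetical.

```python
import json

# Assumption: schema.org names mapped by hand from the labels in 20.10.
# Only types with an unambiguous schema.org name are included.
FLAGGED_TYPES = {
    "HowTo",                # rich result deprecated
    "FAQPage",              # rich result restricted
    "SpecialAnnouncement",  # listed as deprecated
    "ClaimReview",          # listed as deprecated
    "QAPage",               # assumed mapping for "Q and A"
}


def iter_types(node):
    """Yield every @type value found anywhere in a parsed JSON-LD tree."""
    if isinstance(node, dict):
        declared = node.get("@type")
        if isinstance(declared, str):
            yield declared
        elif isinstance(declared, list):
            yield from (t for t in declared if isinstance(t, str))
        for value in node.values():
            yield from iter_types(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_types(item)


def flagged_types(jsonld_text: str):
    """Return the sorted flagged types present in a JSON-LD string."""
    found = set(iter_types(json.loads(jsonld_text)))
    return sorted(found & FLAGGED_TYPES)


# Hypothetical JSON-LD as it might appear in a script tag.
sample = """
{
  "@context": "https://schema.org",
  "@graph": [
    {"@type": "Article", "headline": "Example"},
    {"@type": ["HowTo", "WebPage"]},
    {"@type": "FAQPage", "mainEntity": [{"@type": "Question"}]}
  ]
}
"""

print(flagged_types(sample))
```

Flagged markup is not necessarily harmful to leave in place. The check only identifies types that, per the entries above, may no longer produce a rich result.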

20.11 Search Engine Market Share (March 2026)

Google: 90.01 percent global. Source: StatCounter March 2026.
Bing: 4.98 percent global, about 7 percent US. Source: StatCounter, Microsoft Q2 FY26 earnings.
Yandex: 1.34 percent global, 65 to 72 percent in Russia. Source: StatCounter.
Yahoo: 1.39 percent global (uses Bing index). Source: StatCounter.
DuckDuckGo: 0.76 percent global, 1.84 percent US. Source: StatCounter, DuckDuckGo Traffic.
Baidu: 0.55 percent global, 53 percent in China. Source: StatCounter.
Brave Search: independent 30 billion page index, 50 million daily queries. Source: Brave.

20.12 AI Engine Usage

ChatGPT: 800 million weekly active users, 2 to 2.5 billion daily prompts, 65 percent of which qualify as search. Source: OpenAI public reporting, Jasper.
Perplexity: 33 million plus monthly active users. Source: Perplexity public reporting.
Total search usage (traditional search engines plus LLM search combined) is up 26 percent worldwide and 16 percent in the US. Source: Graphite March 2026.

End of Framework Document

This document is version 2.0, last updated 2026-05-03. The next scheduled review is 2026-08-03 (quarterly cadence). Framework version bumps occur when Google or major LLM providers ship meaningful changes to ranking, citation, or schema behavior.

For corrections, additions, or vertical specific extensions, contact admin@thatdeveloperguy.com.

Crafted by ThatDeveloperGuy.com.