DEV Community

Apex Stack


I'm Optimizing 100,000 Pages for AI Search Engines, Not Just Google. Here's My Playbook.

Google AI Overviews now appear in more than 1 out of every 4 search results. For long-tail queries — the exact type that programmatic SEO sites target — that number jumps above 50%.

I run StockVS.com, a programmatic SEO site with 100,000+ pages covering stock analysis, sector breakdowns, and ETF data across 12 languages. When I started building it, I optimized exclusively for traditional Google rankings. Page titles, meta descriptions, schema markup, internal linking — the standard playbook.

That playbook is no longer enough. AI search engines like Google's AI Overviews, ChatGPT with browsing, Perplexity, and others are reshaping how people find information. They don't just rank pages — they synthesize answers. And if your content isn't structured to be cited by these systems, you're invisible in the new search landscape.

Here's how I'm adapting 100,000 pages for a world where AI does the reading first.

The Problem: AI Overviews Eat Your Click

Here's what happens now when someone searches "NVDA stock analysis 2026":

  1. Google shows an AI Overview at the top — a synthesized answer pulling from multiple sources
  2. Below that, maybe some People Also Ask boxes
  3. Then the traditional blue links

The AI Overview answers the question well enough that many users never scroll down. For programmatic SEO sites that depend on long-tail traffic, this is an existential shift. You can rank on page 1 and still get zero clicks because the AI summary already gave the user what they needed.

I noticed this pattern in my own Search Console data. Impressions were climbing in certain query buckets, but click-through rates were declining. People were seeing my pages in search results but not clicking through — because the AI Overview had already answered their question.

GEO: The New Optimization Layer

The SEO community has started calling this "Generative Engine Optimization" or GEO. It's the practice of structuring your content so that AI systems are more likely to cite it when generating answers.

This isn't about tricking AI. It's about making your data so clear, so structured, and so authoritative that when an AI needs to answer a financial question, your page becomes the obvious source to cite.

Here's what I've changed across my 100,000+ pages.

1. Structured Data Becomes Non-Negotiable

I was already using schema markup — FinancialProduct, FAQPage, BreadcrumbList. But for AI search, I've gone deeper.

Every stock page on StockVS now includes:

{
  "@context": "https://schema.org",
  "@type": "FinancialProduct",
  "name": "AAPL Stock Analysis",
  "description": "Apple Inc. stock analysis with key financials, valuation metrics, and sector comparison",
  "provider": {
    "@type": "Organization",
    "name": "StockVS"
  },
  "offers": {
    "@type": "Offer",
    "price": "0",
    "priceCurrency": "USD"
  }
}

But I've also added explicit dateModified timestamps to every page, author markup linking to the site's editorial policy, and about schema connecting each stock page to its sector and industry.
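As a sketch, those extra trust signals can be attached to each page's JSON-LD at build time. The field layout is my illustration of the approach, not StockVS's actual markup, and the editorial-policy URL is hypothetical:

```python
import json
from datetime import datetime, timezone

def build_stock_schema(ticker: str, company: str, sector: str, industry: str) -> str:
    """Extend the base FinancialProduct JSON-LD with the trust-signal
    fields: dateModified, author, and about."""
    schema = {
        "@context": "https://schema.org",
        "@type": "FinancialProduct",
        "name": f"{ticker} Stock Analysis",
        "description": f"{company} stock analysis with key financials and sector comparison",
        "provider": {"@type": "Organization", "name": "StockVS"},
        # Freshness signal: stamped on every data refresh cycle
        "dateModified": datetime.now(timezone.utc).strftime("%Y-%m-%d"),
        # Author markup linking to the editorial policy (URL is hypothetical)
        "author": {
            "@type": "Organization",
            "name": "StockVS",
            "url": "https://stockvs.com/editorial-policy",
        },
        # "about" connects the page to its sector and industry entities
        "about": [
            {"@type": "Thing", "name": sector},
            {"@type": "Thing", "name": industry},
        ],
    }
    return json.dumps(schema, indent=2)

print(build_stock_schema("AAPL", "Apple Inc.", "Technology", "Consumer Electronics"))
```

Generating this per page from the same database rows that feed the visible content keeps the markup and the on-page data in lockstep.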

Why? AI systems use structured data as a trust signal. When Perplexity or Google's AI Overview needs to decide which source to cite for "NVDA P/E ratio," the page with clean, machine-readable schema wins.

2. Direct-Answer Formatting

AI Overviews pull from content that directly answers questions. I restructured every stock page to lead with key metrics in a scannable format before diving into analysis.

Instead of:

"Apple Inc. (AAPL) is a technology company that designs, manufactures, and markets smartphones..."

I now start with:

AAPL Key Metrics (March 2026)

  • Market Cap: $3.2T
  • P/E Ratio: 28.4
  • Dividend Yield: 0.54%
  • 52-Week Range: $169.21 – $260.10
  • Sector: Technology

Then the analysis follows. This formatting makes it trivial for AI systems to extract and cite specific data points. Every page becomes a structured data card that AI can pull from.
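A minimal sketch of rendering that metrics card from a yfinance-style info dict. The dict keys mirror yfinance's field naming, but the exact keys, units, and formatting helpers here are my assumptions for illustration:

```python
def format_key_metrics(ticker: str, info: dict, as_of: str) -> str:
    """Render the above-the-fold metrics card from a yfinance-style
    info dict (keys and units are illustrative assumptions)."""
    def fmt_cap(value: float) -> str:
        # Compress market cap into the $3.2T / $180.5B style
        for unit, div in (("T", 1e12), ("B", 1e9), ("M", 1e6)):
            if value >= div:
                return f"${value / div:.1f}{unit}"
        return f"${value:,.0f}"

    lines = [
        f"{ticker} Key Metrics ({as_of})",
        "",
        f"  • Market Cap: {fmt_cap(info['marketCap'])}",
        f"  • P/E Ratio: {info['trailingPE']:.1f}",
        f"  • Dividend Yield: {info['dividendYield']:.2f}%",
        f"  • 52-Week Range: ${info['fiftyTwoWeekLow']:.2f} – ${info['fiftyTwoWeekHigh']:.2f}",
        f"  • Sector: {info['sector']}",
    ]
    return "\n".join(lines)

aapl = {
    "marketCap": 3.2e12, "trailingPE": 28.4, "dividendYield": 0.54,
    "fiftyTwoWeekLow": 169.21, "fiftyTwoWeekHigh": 260.10, "sector": "Technology",
}
print(format_key_metrics("AAPL", aapl, "March 2026"))
```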

3. FAQ Sections That AI Actually Cites

I've always had FAQ schema on my pages, but I reworked the questions to match how people actually query AI assistants.

Old approach:

  • "What is the P/E ratio of AAPL?"
  • "Is AAPL a good investment?"

New approach:

  • "How does AAPL's valuation compare to the Technology sector average?"
  • "What are the key risks for Apple stock in 2026?"
  • "Should I buy AAPL at its current price?"

The difference: the new questions match how people phrase queries to ChatGPT, Perplexity, and Google's conversational search. When an AI system encounters these questions in your FAQ schema, it's more likely to cite your answer in its generated response.
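A sketch of how those conversational questions could be templated into FAQPage markup per ticker. The question templates and placeholder answers are illustrative; in the real pipeline the answers come from the generation step:

```python
import json

def build_faq_schema(ticker: str, sector: str, year: int) -> str:
    """Generate FAQPage JSON-LD from conversational question templates
    (templates and placeholder answers are illustrative)."""
    qa = [
        (f"How does {ticker}'s valuation compare to the {sector} sector average?",
         "[comparison filled in from live sector data]"),
        (f"What are the key risks for {ticker} stock in {year}?",
         "[risk summary generated per ticker]"),
        (f"Should I buy {ticker} at its current price?",
         "[valuation assessment refreshed each data cycle]"),
    ]
    schema = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa
        ],
    }
    return json.dumps(schema, indent=2)

print(build_faq_schema("AAPL", "Technology", 2026))
```

Note the deliberate cap at a handful of questions per page; as covered below in what's not working, more FAQs did not mean more citations.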

4. Unique Data Points as Citation Magnets

Here's the biggest insight: AI systems preferentially cite pages that contain unique, quantitative data that other sources don't have.

Every stock page on StockVS generates unique analysis using financial data from yfinance combined with a local Llama 3 model. This means my AAPL page doesn't just repeat the same data as Yahoo Finance — it includes proprietary analysis, custom comparisons within the sector, and valuation assessments that exist nowhere else on the web.

For AI citation, unique data is the moat. If your page says the same thing as 50 other pages, the AI has no reason to cite you specifically. If your page contains a unique analysis or data point, the AI must cite you to reference that information.

5. Cross-Language as an AI Advantage

One of the most underappreciated aspects of AI search: multilingual content creates citation opportunities across language barriers.

When someone asks ChatGPT about a stock in German, the AI pulls from German-language sources. Most financial analysis sites are English-only. StockVS covers stocks, sectors, and ETFs across 12 languages — and in several of those languages, we're one of very few sources with comprehensive financial analysis.

My Search Console data confirms this: Dutch and German pages consistently generate more impressions per page than English ones. In AI search, this advantage compounds — there are simply fewer high-quality German-language financial analysis pages for AI to cite.

The Technical Implementation

Here's what the pipeline looks like for optimizing 100,000 pages for AI search:

Data Layer (Supabase PostgreSQL)
→ 8,000+ ticker records with live financial data from yfinance
→ Sector/industry/ETF relationship mappings
→ Historical price data and calculated metrics

Content Generation (Local Llama 3)
→ Template-based analysis with unique data-driven insights per ticker
→ FAQ generation matching conversational search patterns
→ Cross-language content with localized financial terminology

Schema Layer (Astro Static Build)
→ JSON-LD structured data generated at build time
→ FinancialProduct, FAQPage, BreadcrumbList, Organization
→ dateModified auto-updated per data refresh cycle

Delivery (Cloudflare CDN)
→ Sub-second page loads globally
→ Edge-cached static HTML — no JavaScript rendering required
→ This matters because many AI crawlers don't execute JavaScript and deprioritize slow pages
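The four layers above can be sketched as a single refresh pass per page. Every helper below is a tiny illustrative stub standing in for the real stage (the yfinance fetch, the Llama 3 generation, the Astro build), not the actual StockVS code:

```python
# Illustrative stand-ins for the four pipeline layers; the real
# implementations plug in behind the same shape.

def fetch_ticker_data(ticker: str) -> dict:
    """Data layer: yfinance -> Supabase (stubbed with fixed values)."""
    return {"ticker": ticker, "pe": 28.4, "sector": "Technology"}

def generate_analysis(data: dict) -> str:
    """Content layer: local Llama 3 in production (stubbed as a template)."""
    return f"{data['ticker']} trades at {data['pe']}x earnings in {data['sector']}."

def build_schema(data: dict) -> dict:
    """Schema layer: JSON-LD emitted at build time (truncated here)."""
    return {"@type": "FinancialProduct", "name": f"{data['ticker']} Stock Analysis"}

def render_page(ticker: str) -> str:
    """Delivery: static HTML, edge-cached on the CDN."""
    data = fetch_ticker_data(ticker)
    return f"<article>{generate_analysis(data)}</article>"

print(render_page("AAPL"))
```

The design point is that everything is resolved at build time: by the time a crawler arrives, the page is plain HTML with the schema baked in.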

What I'm Measuring

Traditional SEO metrics don't capture the full picture anymore. Here's what I'm tracking:

  1. CTR at position — If I'm ranking position 5-10 and CTR drops, it likely means an AI Overview is absorbing clicks
  2. Impression-to-click ratio by query type — Long-tail financial queries vs. branded queries
  3. Schema validation rate — Percentage of pages passing Google's Rich Results test
  4. AI citation monitoring — Searching key queries in Perplexity and ChatGPT to check if StockVS pages are cited
  5. Language-specific AI visibility — Checking AI responses in German, Dutch, Polish for stock queries

I don't have a perfect measurement system for AI citations yet — nobody does. But directionally, I can see which optimizations move the needle.
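Metrics 1 and 2 can be approximated from a plain Search Console query export. A minimal sketch, assuming a CSV with Query/Clicks/Impressions columns and a naive branded-vs-long-tail bucketing rule (both assumptions, not the exact export format or my real classifier):

```python
import csv
import io
from collections import defaultdict

def ctr_by_bucket(gsc_csv: str) -> dict:
    """Aggregate a Search Console query export into per-bucket CTR (%),
    to spot buckets where impressions climb but clicks don't — a likely
    sign an AI Overview is absorbing the click. Column names and the
    bucketing rule are illustrative assumptions."""
    stats = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for row in csv.DictReader(io.StringIO(gsc_csv)):
        # Naive split: queries mentioning the brand vs. everything else
        bucket = "branded" if "stockvs" in row["Query"].lower() else "long-tail"
        stats[bucket]["impressions"] += int(row["Impressions"])
        stats[bucket]["clicks"] += int(row["Clicks"])
    return {
        bucket: round(100 * s["clicks"] / s["impressions"], 2)
        for bucket, s in stats.items() if s["impressions"]
    }

sample = """Query,Clicks,Impressions
nvda stock analysis 2026,1,400
aapl vs msft comparison,0,310
stockvs nvda page,5,40
"""
print(ctr_by_bucket(sample))
```

Run weekly, the interesting signal is the trend per bucket rather than any absolute CTR number.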

What's Not Working

Transparency time: some things I've tried haven't panned out.

Aggressive FAQ expansion didn't help as much as I expected. Adding 20 FAQs per page didn't increase AI citations — having 5 really well-structured, data-rich FAQs performed better.

Trying to "game" AI Overviews by stuffing exact-match questions into headings backfired. The content felt unnatural and the quality signals degraded.

Over-optimizing meta descriptions for AI was a waste. AI systems read the full page content, not just the meta description. The meta description matters for traditional CTR, not for AI citation.

The Playbook Summary

If you're running a programmatic SEO site, here's the minimum viable GEO stack for 2026:

  1. Schema everything — dateModified, author, about, domain-specific types. Make your data machine-readable.
  2. Lead with data — Put key facts and metrics above the fold in scannable formats. AI extracts from the top of your content first.
  3. Match conversational queries — Rewrite FAQs to match how people ask AI assistants, not how they type into Google.
  4. Generate unique analysis — Original data points are your citation moat. If your page says what everyone else's says, AI won't cite you.
  5. Go multilingual — Non-English AI search is wide open. If your data works in other languages, translate and localize it.
  6. Measure AI visibility — Check Perplexity, ChatGPT, and Google AI Overviews manually for your key queries. The tools for automated tracking are coming but aren't reliable yet.

What's Next

I'm building automated monitoring that checks whether StockVS pages appear in AI-generated answers for a rotating set of financial queries across all 12 languages. It's basically an "AI SERP tracker" — and I'll share the results when I have enough data.

The shift from "rank on page 1" to "get cited by AI" is the biggest change to SEO since mobile-first indexing. For programmatic SEO at scale, it's both a threat and an opportunity. The sites that adapt their content structure for AI consumption first will have a massive head start.


I'm building StockVS and documenting the entire journey. If you're into programmatic SEO, AI-powered content generation, or building data-driven web properties, I write about the real numbers — what works, what fails, and what the data actually says.


Top comments (6)

DevGab

Great write-up — the failed experiments section alone makes this worth reading. The "unique data as citation moat" point particularly resonates; that feels like the only strategy with real staying power as AI models get better at synthesising commodity content.

Genuine question though: have you been able to quantify whether AI citations actually drive meaningful click-throughs? You mention impressions climbing while CTR drops on long-tail queries, and the pivot to GEO as the response — but I'd love to know if being cited in an AI Overview or a Perplexity answer actually recovers that lost traffic, or if users just consume the synthesised answer and move on.

Basically: is "get cited by AI" a genuine traffic strategy, or more of a brand visibility play where you're trading clicks for mentions? Curious what your data shows on that front.

Apex Stack

Honest answer: right now it's more brand visibility than a traffic driver, and I think that's actually the correct framing for most sites at this stage.

Here's what my data shows after ~10 days of tracking: GSC impressions are climbing on long-tail queries (2,180 over 3 months, 834 unique queries), but clicks are still in single digits. The CTR drop you're describing is real — when an AI Overview synthesizes the answer, users often don't click through. My avg position is 52.5, so I'm not even in the "could have gotten the click" zone yet for most queries.

What I can't measure yet (and this is the gap in the whole GEO narrative): whether being structured for AI citation actually causes the impression in the first place, or whether it's just correlation with good SEO fundamentals. Schema markup, clear data tables, FAQ sections — those help both traditional and AI search.

My working hypothesis is that GEO is a hedge, not a replacement. The sites that will win are the ones with unique data that AI models can't synthesize from commodity sources. For a financial data site with live calculations across 8,000+ tickers, that's the moat — Perplexity can summarize a stock overview, but it can't run a custom peer comparison with yesterday's data.

So to directly answer your question: treat AI citations as a brand awareness channel (like being quoted in a news article) rather than a direct traffic channel. The ROI comes from authority building, not click recovery. At least that's the honest picture at this scale — would love to hear if anyone with more traffic sees different numbers.

DevGab

Really useful data, thanks for sharing actual numbers. The single-digit clicks on 2,180 impressions confirms what I've been suspecting — AI citation is brand awareness at best, not a traffic strategy.

I think the deeper issue is that purely informational content is becoming commodity. If an LLM can synthesise the answer, the user has no reason to click through. The sites that will keep getting real visits are the ones where the value is the interaction — calculators, personalised data, transactions, community. Things that can't be accurately reproduced by an LLM scraping static content. "Unique data" is part of it, but unique functionality is the real moat.

Apex Stack

You nailed the core tension. The commodity risk is real — if an LLM can synthesize the same answer from 10 sources, why would anyone click through to source #7?

What I'm betting on is that structured, queryable data beats prose. A stock page with live P/E ratios, dividend history, and sector comparisons isn't something an LLM can easily replicate from its training data — it needs current numbers. The play is making your content the source that AI platforms cite, not the summary they replace.

The CTR problem you spotted is exactly right though. 2,180 impressions with 3 clicks means Google is showing our pages but users aren't compelled to click. That's a title tag and meta description problem more than a content quality problem — working on making those more specific and action-oriented rather than generic.

Gavin Cettolo

This is a really strong and practical playbook. Thanks for sharing it.

What stood out to me is how clearly this captures the shift from ranking to being extracted. The line that stuck with me is essentially: if your answer isn’t easy for an AI to lift, it doesn’t exist. That aligns with what we’re seeing more broadly: AI systems don’t reward “good content” in the traditional sense, they reward structured, quotable content.

I also like how your approach turns that into something operational at scale. Things like:

  • leading with key metrics
  • increasing “direct answer density”
  • treating schema as a trust layer, not a checkbox

…these aren’t just optimizations, they’re constraints that shape how content is written. And that’s the interesting part: we’re not just changing distribution (SEO → GEO), we’re changing the unit of content itself, from narrative pages to what you might call “answer capsules.”

The point about unique data as a “citation moat” is probably the most important takeaway. A lot of teams are trying to adapt to AI search by reformatting existing content, but as you show, structure alone isn’t enough. If the underlying information isn’t differentiated, AI has no reason to pick you over 50 equivalent sources. That lines up with broader findings too, originality and information gain are now key signals for visibility.

Your example of CTR dropping despite impressions going up is also a subtle but critical signal. It highlights how misleading traditional SEO metrics have become in this new landscape. Visibility without clicks used to be an edge case, now it’s becoming the default because AI answers intercept intent before the user ever reaches your page.

One thing your article made me think about: at this scale (100K+ pages), this starts to look less like “content optimization” and more like information architecture design for machines. You’re essentially building a system where:

  • humans read for depth
  • AI reads for extraction
  • and both need to succeed simultaneously

That’s not trivial, especially when AI-generated content itself risks lowering the very clarity and signal these systems depend on.

Overall, this feels like one of the clearest real-world bridges between programmatic SEO and AI-native distribution I’ve seen so far. Really valuable to see it grounded in actual data and constraints, not just theory.

Apex Stack

Really appreciate the depth of this comment, Gavin. You nailed something I've been wrestling with — the distinction between "content optimization" and "information architecture design for machines" is exactly right. At 100K+ pages, you stop thinking about individual articles and start thinking about how your entire data model maps to extraction patterns.

The "answer capsule" framing is spot on. I've noticed that the pages getting cited by Bing Copilot on my site aren't necessarily the longest or most detailed — they're the ones where the key data point is immediately parseable within a structured context. Leading with the metric, wrapping it in schema, and making the surrounding text reinforce (not dilute) the answer.

And yeah, the CTR paradox is real. Impressions up 3x month over month but clicks barely moved. Traditional SEO would call that a failure, but in the GEO world it means AI systems are reading your content even if humans aren't clicking through. The question is whether that visibility compounds into brand recognition over time — still too early to tell on my end.

The tension you described between human depth and AI extraction is the hardest part. I'm experimenting with a layered approach: structured summary block at the top (for machines), then deeper narrative analysis below (for humans who do click through). Early data suggests it works for both, but it's definitely an evolving playbook.