Apex Stack

Google Only Indexed 2% of My 100,000-Page Site. Here's What I'm Doing About It.

I've been building StockVS, a multilingual stock analysis platform with over 100,000 pages covering 8,000+ US tickers across 12 languages. It's a programmatic SEO play — templatized pages generated from financial data and local LLM analysis.

Here's the problem: Google has only indexed about 1,920 of those pages. That's roughly 2%.

And it's getting worse.

The Numbers Don't Lie

Every few days I check Google Search Console and pull the indexing report. Here's what I'm looking at right now:

Status                      Pages     What It Means
Crawled — not indexed       51,061    Google visited, read the page, and said "no thanks"
Discovered — not indexed    28,016    Google knows the URL exists but won't even bother crawling it
Indexed                     1,920     The 2% that made the cut
Redirects                   2,648     Pages I intentionally removed

The most painful line is "Crawled — not indexed." That means Googlebot actually spent its crawl budget visiting 51,000 pages, processed them, and decided they weren't worth indexing. That's not a discovery problem. That's a quality problem.

Even worse: my indexed count dropped from 2,246 to 1,920 in one week. Google is actively de-indexing pages it previously accepted.
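
If you'd rather spot-check index status from a script than click through GSC one URL at a time, the URL Inspection API covers it. Here's a minimal sketch using google-api-python-client; the service-account path and the stockvs.example URLs are placeholders, not the real property:

```python
# Spot-check index status for a sample of URLs via the URL Inspection API.
# Assumes a service account added as a user on the Search Console property;
# the JSON path and URLs below are placeholders.
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]
creds = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES
)
gsc = build("searchconsole", "v1", credentials=creds)

sample_urls = [
    "https://stockvs.example/en/stocks/AAPL",
    "https://stockvs.example/nl/stocks/AAPL",
]

for url in sample_urls:
    resp = gsc.urlInspection().index().inspect(body={
        "inspectionUrl": url,
        "siteUrl": "sc-domain:stockvs.example",
    }).execute()
    status = resp["inspectionResult"]["indexStatusResult"]
    # coverageState is the human-readable verdict, e.g.
    # "Crawled - currently not indexed"
    print(url, "->", status.get("coverageState"),
          "| last crawl:", status.get("lastCrawlTime"))
```

The API is rate-limited to roughly 2,000 inspections per property per day, so it's for sampling a template bucket, not sweeping all 100,000 URLs.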

Why Google Is Rejecting 98% of My Pages

After months of analyzing GSC data and reading every Google documentation page about indexing, I've identified three root causes.

1. Thin Content Signals

My stock pages were originally 200-300 words of templated analysis. For a site with zero authority, that's not enough to convince Google that each page adds unique value. When Google's AI-first indexing systems see thousands of pages with similar structure and shallow content, they treat the whole domain as low-quality.

I've since expanded pages to 600-800 words with more specific financial analysis per ticker, but the reputation damage takes time to reverse.

2. Zero Domain Authority

Here's a stat that surprised me: one audit I read about described a site with 8 million discovered pages but only 650,000 indexed. The reason? Google scales its indexing generosity with your domain's trust signals. No backlinks, no authority, no index.

StockVS has zero backlinks registered in any tool I've checked. I've published five articles across Medium, Dev.to, and Hashnode to start building links, but that's a drop in the ocean compared to what's needed.

3. Crawl Budget Economics

Google allocates crawl budget based on a site's perceived importance. When you have 100,000+ URLs competing for attention on a domain Google doesn't trust yet, you're burning crawl budget on pages Google will never index. It's a vicious cycle: low authority means less crawling, which means fewer indexed pages, which means less traffic, which means less authority.

What I'm Actually Doing About It

I'm not giving up on programmatic SEO. The math still works — if even 10% of my pages start ranking, that's 10,000 pages pulling organic traffic. But the approach has to change.

Strategy 1: Kill the Weakest Pages

I already removed all comparison pages — they were the thinnest content on the site and were actively diluting crawl budget. Those 2,648 redirects in my GSC report are the evidence. Sometimes the best SEO move is subtraction.

The principle: fewer, better pages > more pages.
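
Mechanically, the subtraction is two steps: drop the dead URLs from the sitemap and 301 each one somewhere sensible. Here's a rough sketch of that, assuming a /compare/AAPL-vs-MSFT style URL pattern (a guess at the structure for illustration) and an nginx-style redirect map:

```python
# Prune removed comparison pages from an existing sitemap and emit a
# redirect map. Paths and the "/compare/" pattern are assumptions about
# the URL structure, not the real StockVS layout.
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
ET.register_namespace("", NS["sm"])

tree = ET.parse("sitemap.xml")
root = tree.getroot()

redirects = []
for url in list(root.findall("sm:url", NS)):
    loc = url.find("sm:loc", NS).text
    if "/compare/" in loc:                     # the thin pages being killed
        root.remove(url)                       # out of the sitemap entirely
        ticker = loc.rstrip("/").split("/")[-1].split("-vs-")[0]
        redirects.append((urlparse(loc).path, f"/stocks/{ticker}"))

tree.write("sitemap.pruned.xml", xml_declaration=True, encoding="utf-8")

# nginx-style map fragment; adapt to whatever server fronts the site
with open("redirects.map", "w") as f:
    for src, dst in redirects:
        f.write(f"{src} {dst};\n")
```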

Strategy 2: Thicken Content Where It Matters

I'm adding unique sections to every stock page that aren't just reformatted data points. Things like related news, analyst ratings, earnings timelines, and market context sections that actually require specific data per ticker.

The goal is to make Google's quality classifiers see each page as a legitimate analysis rather than a data table with a paragraph bolted on.
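
To make that enforceable rather than aspirational, the template can gate indexability on how many sections are actually backed by ticker-specific data. Here's a sketch of that gate; the section names, thresholds, and data shape are all illustrative assumptions, not the real pipeline:

```python
# Quality gate at generation time: a ticker page only goes out indexable
# if it has enough genuinely unique material. Section names, thresholds,
# and the PageData shape are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class PageData:
    ticker: str
    analysis_words: int = 0
    news_items: list = field(default_factory=list)
    analyst_ratings: list = field(default_factory=list)
    earnings_dates: list = field(default_factory=list)

def unique_sections(page: PageData) -> int:
    """Count sections backed by ticker-specific data, not boilerplate."""
    return sum([
        page.analysis_words >= 600,      # expanded written analysis
        len(page.news_items) >= 3,       # recent related news
        len(page.analyst_ratings) >= 1,  # at least one analyst rating
        len(page.earnings_dates) >= 1,   # an earnings timeline entry
    ])

def robots_meta(page: PageData) -> str:
    # Pages that can't earn their place stay out of the index until they can.
    return "index,follow" if unique_sections(page) >= 3 else "noindex,follow"
```

Pages that fail the gate still exist and still pass link equity (follow); they just stay out of the index until the data catches up.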

Strategy 3: Internal Linking Architecture

Programmatic SEO sites often have flat architecture where every page is equally connected (or disconnected). I'm building internal linking widgets — "Related Stocks," "Popular in This Sector," cross-links between stock pages, sector pages, and ETF pages.

This helps Google understand the topical relationships between pages and distributes whatever authority the domain has more effectively.
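
The widget logic doesn't need to be clever to do this job. Here's a sketch of how the "Related Stocks in This Sector" picks could work, assuming each ticker carries sector and market-cap metadata:

```python
# Pick "Related Stocks in This Sector" links for a given page. The data
# shape (sector + market cap per ticker) is an assumption; ranking by
# market cap keeps the picks stable between builds, which avoids churning
# the internal link graph on every regeneration.
def related_in_sector(ticker: str, universe: dict, k: int = 6) -> list[str]:
    sector = universe[ticker]["sector"]
    peers = [
        t for t, meta in universe.items()
        if meta["sector"] == sector and t != ticker
    ]
    peers.sort(key=lambda t: universe[t]["market_cap"], reverse=True)
    return peers[:k]

universe = {
    "AAPL": {"sector": "Technology", "market_cap": 3.4e12},
    "MSFT": {"sector": "Technology", "market_cap": 3.1e12},
    "NVDA": {"sector": "Technology", "market_cap": 3.0e12},
    "XOM":  {"sector": "Energy",     "market_cap": 0.5e12},
}

print(related_in_sector("AAPL", universe))  # ['MSFT', 'NVDA']
```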

Strategy 4: Build Backlinks Through Content Marketing

Five articles are live across three platforms. Each one tells part of the StockVS story and naturally links back to the site. I'm planning more — the key is writing about the journey of building a large-scale site, which resonates with the developer and SEO communities.

This article you're reading is part of that strategy.

Strategy 5: Focus on What's Working

Here's an interesting signal from my GSC data: non-English pages are getting more impressions than English ones. Dutch pages lead impressions, followed by German and Polish. The competition for "[ticker] analyse" in Dutch is dramatically lower than "[ticker] analysis" in English.

This validates the multilingual approach. Instead of fighting for English keywords against sites with DR 70+, I can win in languages where the SERP is wide open.
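
That signal is easy to quantify from the Search Analytics API by bucketing impressions on the URL's language prefix. A sketch, assuming /nl/-style locale paths; the property name, dates, and credentials file are placeholders:

```python
# Aggregate GSC impressions by language prefix (/en/, /nl/, /de/, ...).
# Assumes path-based locales; dates and the property name are placeholders.
from collections import Counter
from urllib.parse import urlparse

from google.oauth2 import service_account
from googleapiclient.discovery import build

creds = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
)
gsc = build("searchconsole", "v1", credentials=creds)

resp = gsc.searchanalytics().query(
    siteUrl="sc-domain:stockvs.example",
    body={
        "startDate": "2026-01-01",
        "endDate": "2026-01-31",
        "dimensions": ["page"],
        "rowLimit": 25000,  # API maximum per request
    },
).execute()

impressions_by_lang = Counter()
for row in resp.get("rows", []):
    path = urlparse(row["keys"][0]).path
    lang = path.split("/")[1] if path.count("/") >= 2 else "root"
    impressions_by_lang[lang] += int(row["impressions"])

for lang, imp in impressions_by_lang.most_common():
    print(f"{lang:>5}: {imp:,} impressions")
```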

The Uncomfortable Truth About Programmatic SEO in 2026

Google's indexing bar has never been higher. Their AI-first systems are more aggressive about filtering out content that doesn't demonstrate unique value. If you're generating thousands of pages, each one needs to earn its place in the index individually.

The old playbook of "spin up 100k pages, submit sitemap, wait for traffic" doesn't work anymore. You need:

  • Content depth that goes beyond what any template can auto-generate
  • Authority signals (backlinks, brand searches, engagement metrics) that tell Google your domain is trustworthy
  • Technical hygiene — clean sitemaps, proper canonicals, fast load times, no crawl traps
  • Patience — Google's re-evaluation cycle for domains isn't fast, especially when you're recovering from thin content signals

What's Next

I'm tracking this weekly. Every GSC report tells me whether the content thickening and backlink building are moving the needle. The leading indicators I'm watching (there's a small tracking sketch after the list):

  • Crawled-not-indexed count — if content quality is improving, this should decrease
  • Indexed page count — the north star metric
  • Impressions on non-English pages — the multilingual arbitrage play
  • Domain Rating in Ahrefs — currently zero, need to see movement
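
The tracking itself is nothing fancy; the sketch below appends a weekly row to a CSV. The coverage counts are still copied from the GSC UI by hand (as far as I know, the coverage report has no public API), and the example numbers marked as placeholders are exactly that:

```python
# Append a weekly snapshot of the leading indicators to a CSV.
# Indexed / crawled-not-indexed counts are copied from the GSC coverage
# report by hand; non-English impressions could come from the Search
# Analytics sketch above.
import csv
from datetime import date
from pathlib import Path

LOG = Path("indexing_log.csv")
FIELDS = ["week", "indexed", "crawled_not_indexed",
          "non_en_impressions", "domain_rating"]

def log_week(indexed: int, crawled_not_indexed: int,
             non_en_impressions: int, domain_rating: int) -> None:
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(FIELDS)
        writer.writerow([date.today().isoformat(), indexed,
                         crawled_not_indexed, non_en_impressions,
                         domain_rating])

# This week's counts from the report above; the impressions figure is a
# placeholder, not a real number.
log_week(indexed=1920, crawled_not_indexed=51061,
         non_en_impressions=40000, domain_rating=0)
```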

If you're building a programmatic SEO site and hitting the same indexing wall, I'd love to hear what's working for you. The old "just build more pages" approach is dead. In 2026, it's about building pages that deserve to be indexed.


I write about building large-scale SEO sites, AI-powered content generation, and the tools I use to manage it all. If you're into programmatic SEO, check out my Programmatic SEO Blueprint — it covers the architecture, data pipelines, and multilingual strategy I use for StockVS.

For AI-powered SEO workflows, I've also built a set of Claude Skills that handle everything from content auditing to cross-platform publishing.

Top comments (4)

Apogee Watcher

The “crawled, not indexed” volume does point more to a quality/differentiation problem than a discovery one. One practical check I’d add is clustering pages by template and comparing a sample of indexed vs non-indexed URLs side by side: intro uniqueness, internal link depth, title/meta distinctiveness, and whether each page adds something genuinely different beyond the same scaffold. On sites this large, fixing the template usually matters more than touching individual pages.

Apex Stack

This is a really useful framing — and you're right that the template is the lever at this scale, not individual pages.

I actually ran exactly that comparison last week. Pulled a sample of ~200 indexed stock pages and ~200 crawled-not-indexed ones. The pattern was pretty clear: the indexed pages tended to have stronger intro paragraphs with unique analysis angles, while the rejected ones had more generic openings that read like any other stock screener. Title/meta distinctiveness was another big gap — too many pages following the exact same "[Ticker] Stock Analysis" pattern without differentiation.
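
If anyone wants to run the same comparison, here's a simplified sketch of the idea (not the exact script): the intros and titles are stubbed in, and difflib stands in for a real similarity measure.

```python
# Compare an indexed sample vs a crawled-not-indexed sample on two
# signals: intro uniqueness and title distinctiveness. Replace the stub
# data with intros/titles fetched from each URL sample; difflib is a
# crude stand-in for a proper similarity measure.
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean

def avg_pairwise_similarity(texts: list[str]) -> float:
    """Mean similarity across all pairs; higher = more templated."""
    return mean(SequenceMatcher(None, a, b).ratio()
                for a, b in combinations(texts, 2))

def title_duplication(titles: list[str]) -> float:
    """Share of titles that collapse to one pattern once the ticker is masked."""
    masked = [t.split(" ", 1)[-1] for t in titles]  # crude ticker strip
    return 1 - len(set(masked)) / len(masked)

# Stubs; swap in scraped data from the two samples
indexed_intros = ["AAPL's services margin story...", "MSFT's cloud mix..."]
rejected_intros = ["XYZ is a stock. Here is its data.",
                   "ABC is a stock. Here is its data."]

print("intro similarity, indexed: ", avg_pairwise_similarity(indexed_intros))
print("intro similarity, rejected:", avg_pairwise_similarity(rejected_intros))
print("title duplication:", title_duplication(
    ["AAPL Stock Analysis", "MSFT Stock Analysis"]))  # 0.5: both share one pattern
```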

The internal link depth point is one I'm actively working on. Most of the non-indexed pages were 3+ clicks from any page with authority. Adding "Related Stocks in This Sector" and "Popular in This Industry" widgets to bring those down to 2 clicks max.

Your point about fixing the template cascading across all pages is exactly why I'm prioritizing the template-level changes (section reordering, adding news sections, strengthening intros) over per-page content tweaks. One template fix = 8,000+ pages improved simultaneously.

Have you dealt with similar indexing challenges on large programmatic sites? Curious if you've seen a threshold where Google's crawl behavior shifts noticeably.

Apogee Watcher

Yes, in my experience, it’s less about hitting a magic page-count threshold and more about Google seeing enough template-level quality improvement to treat the section differently. Better intros, more differentiated titles/meta, and shallower internal linking usually work together.

Apex Stack

That tracks with what I'm seeing. The "treat the section differently" framing is key — it's almost like Google evaluates template types as a cohort rather than individual URLs. Once enough pages in a template bucket cross a quality bar, the whole bucket starts getting indexed more aggressively.

The shallower internal linking piece is the one I underestimated the most. Flattening from 3+ clicks to 2 max with sector/industry cross-links is my current sprint. I suspect that alone will change crawl patterns more than the content changes, because it directly affects how quickly Googlebot can reach deep pages.

Appreciate you sharing the experience — it's validating to hear the same signals from someone else working at scale.