30,000 pages "crawled — not indexed"? Here's what Google actually decided.

#seo #webperf #nextjs #tutorial

Last week I ran a full crawl topology audit on a client's Next.js ecommerce site. Their sitemap listed 38,000 URLs. Googlebot had crawled 31,000 of them and indexed 4,200.

Not a content problem. Not a duplicate content problem. The content was fine — product pages with original descriptions, category guides, blog posts with actual research. Googlebot found every page. It just decided most of them weren't worth the index space.

This is the single most misunderstood SEO problem in 2026.

What "crawled — not indexed" actually means

When you see that status in Google Search Console (the report labels it "Crawled - currently not indexed"), your first instinct is "my content isn't good enough." That's the wrong instinct.

What actually happened: Googlebot arrived at the page, parsed it, checked its crawl queue, and made a cost-benefit decision. "This page would take N compute units to render and store. The expected search value is lower than the other 30,000 URLs waiting in line. Skip it."

These are budget decisions, not quality judgments.

I wrote the full breakdown on Medium — the three-layer architecture of crawl topology, how sitemap tiering works, and the two internal linking patterns that actually move pages from "crawled" to "indexed":

Google Stopped Indexing Your Pages? It's Probably Your Crawl Topology, Not Your Content →

The short version for devs

Three things determine whether Googlebot prioritizes a page:

1. URL depth from the homepage
Every click deeper reduces crawl priority. Pages at depth 3+ get crawled last, if the budget holds. A product page at /shop/category/subcategory/product (depth 4) competes with 15,000 other URLs for the last 20% of crawl budget.

2. Internal link equity distribution
Most large sites have a power-law link distribution: the homepage and the blog each receive hundreds of internal links, while product pages receive 2-3. Googlebot follows links, so it spends proportionally more budget on the pages with more inbound links. If your product pages are link-poor, they're crawl-poor.

3. Orphaned pages
These are pages that exist in the sitemap but have zero inbound internal links. Googlebot finds them through the sitemap, crawls them, and promptly forgets them because nothing else on the site points to them. No context means no index priority.
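
If you already have a crawl export, all three signals can be checked in a few lines. Here is a minimal Python sketch, assuming you can load your internal links into a dict that maps each page to the pages it links to. The function name `audit_link_graph`, the sample URLs, and the data shape are all made up for illustration, and "depth" here means click depth (fewest clicks from the homepage), which may differ from the number of path segments in the URL.

```python
from collections import deque


def audit_link_graph(links, sitemap_urls, home="/"):
    """Compute click depth, inbound link counts, and orphans.

    links: dict mapping a page URL to the internal URLs it links to
           (hypothetical shape; adapt to your crawler's export).
    sitemap_urls: list of URLs listed in the sitemap.
    """
    # 1. Click depth: breadth-first search starting at the homepage.
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)

    # 2. Inbound links: count distinct linking pages per sitemap URL,
    #    ignoring links a page makes to itself.
    inbound = {url: 0 for url in sitemap_urls}
    for source, targets in links.items():
        for target in set(targets):
            if target != source and target in inbound:
                inbound[target] += 1

    # 3. Orphans: in the sitemap, but no other page links to them.
    orphans = sorted(u for u in sitemap_urls if u != home and inbound[u] == 0)
    return depth, inbound, orphans


# Toy data, purely illustrative.
links = {
    "/": ["/shop", "/blog"],
    "/shop": ["/shop/shoes", "/"],
    "/shop/shoes": ["/shop/shoes/trail", "/shop"],
    "/shop/shoes/trail": ["/shop/shoes/trail/runner-x"],
    "/blog": ["/", "/shop"],
}
sitemap_urls = [
    "/",
    "/shop",
    "/blog",
    "/shop/shoes",
    "/shop/shoes/trail",
    "/shop/shoes/trail/runner-x",
    "/shop/clearance/old-boot",
]

depth, inbound, orphans = audit_link_graph(links, sitemap_urls)
print("deep pages:", sorted(u for u, d in depth.items() if d >= 3))
print("orphans:", orphans)
```

On this toy graph the product page sits four clicks from the homepage with a single inbound link, and `/shop/clearance/old-boot` shows up as an orphan because it is only reachable through the sitemap.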

The fix in one paragraph

Segment your sitemap by conversion tier, not content type. Make sure every product page has at least 3 internal links from category or related-product pages. Run a crawl audit (I use the tool I built at outboundautonomy.com — free, no account) and look for pages that are in your sitemap but have zero internal links. Those are your crawl budget leaks.
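
To make that concrete, here is a rough Python sketch of both steps: writing one sitemap file per tier, and flagging pages that fall under the 3-link floor. The tier names (`high`, `low`), the file naming scheme, the `example.com` URLs, and both function names are assumptions for illustration, not a prescribed setup. The link check also simplifies things by counting every inbound internal link, not only those from category or related-product pages.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_tiered_sitemaps(urls_by_tier):
    """Return {filename: xml_string} with one <urlset> per tier.

    urls_by_tier: dict of tier name -> list of absolute URLs
                  (tier names are whatever you choose; these are hypothetical).
    """
    files = {}
    for tier, urls in urls_by_tier.items():
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in urls:
            entry = ET.SubElement(urlset, "url")
            ET.SubElement(entry, "loc").text = url
        files[f"sitemap-{tier}.xml"] = ET.tostring(urlset, encoding="unicode")
    return files


def underlinked(inbound_counts, min_links=3):
    """Return URLs whose inbound internal link count is below min_links."""
    return sorted(u for u, n in inbound_counts.items() if n < min_links)


# Toy data, purely illustrative.
files = build_tiered_sitemaps({
    "high": ["https://example.com/products/runner-x"],
    "low": ["https://example.com/products/old-boot"],
})
needs_links = underlinked({
    "https://example.com/products/runner-x": 1,
    "https://example.com/products/trail-pro": 4,
    "https://example.com/products/old-boot": 0,
})
print(sorted(files))
print(needs_links)
```

In a real setup you would write each string to disk and reference the files from a sitemap index. The inbound counts could come from whatever crawl audit you already run.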

Why this matters right now

Google's May 2026 Core Update is rolling out (May 21 through ~Jun 4). Early reports in the SEO community show a sharp increase in pages shifted to "crawled — not indexed" status. This update appears to be tightening crawl budget allocation based on site-level crawl topology quality — not content quality.

If you've seen a drop in indexed pages over the last two weeks, check your crawl topology before you rewrite a single sentence.