Most SEO tools can crawl thousands (or millions) of URLs.
But after working with large websites, I realized something important:
crawling a site is not the same as understanding crawlability.
In this post, I want to share what I learned while analyzing crawl depth and internal linking patterns at scale, and why many SEO issues don’t come from content, but from structure.
Crawling vs real reachability
A crawler can technically discover a URL, but that doesn’t mean the URL is:
- Easy to reach through internal links
- Close enough to the homepage
- Likely to be crawled frequently by search engines
On large websites, I repeatedly found URLs that:
- Were technically crawlable
- Appeared in sitemaps
- Yet had no meaningful internal paths pointing to them
They existed, but they were structurally isolated.
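To make that gap concrete, here is a minimal sketch in Python of how you might compare the two sets, assuming you already have an internal link graph from a crawl and a list of sitemap URLs (the URLs and structures below are hypothetical):

```python
from collections import deque

# Hypothetical inputs: in practice these would come from your crawler
# and from parsing the XML sitemap.
internal_links = {
    "https://example.com/": ["https://example.com/category/a", "https://example.com/blog/"],
    "https://example.com/category/a": ["https://example.com/product/1"],
    "https://example.com/blog/": [],
}
sitemap_urls = {
    "https://example.com/product/1",
    "https://example.com/product/2",  # listed in the sitemap, but no internal path leads here
}

def reachable_from(start, link_graph):
    """Breadth-first traversal over internal links only."""
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target in link_graph.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

reachable = reachable_from("https://example.com/", internal_links)
isolated = sitemap_urls - reachable
print(isolated)  # URLs that exist in the sitemap but are structurally isolated
```

URLs that end up in `isolated` are exactly the ones described above: they exist and are listed, but nothing meaningful links to them.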
Crawl depth is not just a number
Crawl depth is often treated as a simple metric:
“This URL is at depth 6, that one at depth 3.”
In practice, depth behaves more like a distribution problem:
- A small percentage of URLs concentrate most internal links
- Large clusters sit far away from strong linking hubs
- Navigation patterns matter more than raw depth values
Two URLs at the same depth can have very different crawl probabilities depending on how they’re linked.
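A rough way to see this is to compute depth and inlink counts together rather than depth alone. The sketch below (hypothetical link graph again) does a breadth-first pass for depth and a simple count of internal links pointing at each URL:

```python
from collections import deque, Counter

# Hypothetical adjacency list of internal links, as produced by a crawl.
internal_links = {
    "https://example.com/": ["https://example.com/hub", "https://example.com/misc"],
    "https://example.com/hub": ["https://example.com/page-a", "https://example.com/page-b"],
    "https://example.com/misc": ["https://example.com/page-b"],
    "https://example.com/page-a": [],
    "https://example.com/page-b": [],
}

def crawl_depths(start, link_graph):
    """BFS distance (in clicks) from the start URL."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target in link_graph.get(url, []):
            if target not in depths:
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths

depths = crawl_depths("https://example.com/", internal_links)
inlinks = Counter(t for targets in internal_links.values() for t in targets)

for url, depth in sorted(depths.items(), key=lambda x: x[1]):
    print(depth, inlinks.get(url, 0), url)
```

In this toy graph, page-a and page-b sit at the same depth, but page-b receives twice as many internal links, which is exactly the kind of difference a raw depth value hides.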
Internal linking patterns repeat themselves
After analyzing multiple sites, some patterns showed up again and again:
- Pagination creates long, weak crawl paths
- Faceted navigation generates depth without authority
- Orphan-like URLs exist even when “everything is linked”
- Sitemaps give a false sense of coverage
Most of these issues are invisible if you only look at:
- Indexation reports
- URL counts
- Crawl totals
You need to look at how URLs are actually connected.
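One lightweight way to do that is to measure how internal links are distributed across URLs instead of just counting URLs. A sketch, again over a hypothetical edge list extracted from a crawl:

```python
from collections import Counter

# Hypothetical (source, target) pairs of internal links.
edges = [
    ("/", "/hub"), ("/", "/blog"),
    ("/hub", "/product-1"), ("/hub", "/product-2"), ("/blog", "/hub"),
    ("/category?page=2", "/category?page=3"),  # pagination chain: depth without authority
    ("/category?page=3", "/deep-product"),
]

inlinks = Counter(target for _, target in edges)
total_links = sum(inlinks.values())

# Share of all internal links captured by the most-linked 10% of URLs.
ranked = inlinks.most_common()
top_slice = max(1, len(ranked) // 10)
top_share = sum(count for _, count in ranked[:top_slice]) / total_links

# "Orphan-like" URLs: present in the graph but with at most one inlink.
orphan_like = [url for url, count in inlinks.items() if count <= 1]

print(f"Top {top_slice} URL(s) receive {top_share:.0%} of internal links")
print("Orphan-like URLs:", orphan_like)
```

Findings like "a handful of URLs absorb most internal links" or "most URLs have one inlink at best" surface structural problems that indexation reports and crawl totals never show.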
Why this matters for SEO
Search engines don’t crawl websites randomly.
They follow links, prioritize paths, and allocate attention unevenly.
If internal linking doesn’t reflect real priorities:
- Important pages may be under-crawled
- Indexation becomes inconsistent
- Crawl budget is wasted on low-value paths
This is especially critical for large or complex websites.
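One rough proxy for that uneven attention is PageRank computed over the internal link graph alone. It is not how any search engine actually allocates crawl budget, but it highlights where link equity pools. A sketch assuming networkx is available and using hypothetical URLs:

```python
import networkx as nx  # assumes networkx is installed

# Hypothetical internal link graph built from crawl data.
G = nx.DiGraph()
G.add_edges_from([
    ("/", "/category"), ("/", "/blog"),
    ("/category", "/product-1"),
    ("/blog", "/post-1"), ("/blog", "/post-2"),
    ("/post-1", "/post-2"), ("/post-2", "/post-1"),
])

# PageRank over internal links only, as a rough proxy for crawl priority.
scores = nx.pagerank(G, alpha=0.85)

# Pages the business actually cares about (hypothetical list).
priority_pages = {"/product-1"}

for url in priority_pages:
    print(url, round(scores.get(url, 0.0), 4))
```

If the pages you care about score lower than incidental clusters, such as blog posts that mostly link to each other, the structure is telling crawlers the wrong story.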
A note on tooling
While working on a side project focused on analyzing crawl depth and internal linking (SEODataReport),
I started noticing how often structural issues were being overlooked in favor of surface-level metrics.
That experience reinforced one idea:
technical SEO problems are usually architectural, not cosmetic.
More info here:
https://seodatareport.com
Final thoughts
If you work with large websites, I’d encourage you to look beyond:
- Total URLs
- Indexation percentages
- Crawl stats alone
Instead, focus on:
- Real crawl paths
- Link concentration
- Structural distance from key hubs
Understanding how a site is connected often explains SEO issues long before content or keywords come into the picture.