CBG

What I learned analyzing crawl depth and internal linking at scale

Most SEO tools can crawl thousands (or millions) of URLs.

But after working with large websites, I realized something important:
crawling a site is not the same as understanding crawlability.

In this post, I want to share what I learned while analyzing crawl depth and internal linking patterns at scale, and why many SEO issues don’t come from content, but from structure.


Crawling vs real reachability

A crawler can technically discover a URL, but that doesn’t mean the URL is:

  • Easy to reach through internal links
  • Close enough to the homepage
  • Likely to be crawled frequently by search engines

On large websites, I repeatedly found URLs that:

  • Were technically crawlable
  • Appeared in sitemaps
  • Had no meaningful internal paths pointing to them

They existed, but they were structurally isolated.
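
One way I like to make this visible is to compare the URLs listed in a sitemap against the URLs actually reachable from the homepage through internal links. Here's a minimal sketch in Python, assuming you've already exported the internal link graph as an adjacency mapping (the `link_graph` and `sitemap_urls` values below are made-up placeholders):

```python
from collections import deque

def reachable_from(start, link_graph):
    """Breadth-first search over internal links starting at `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target in link_graph.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Placeholder data: an adjacency mapping built from a crawl export.
link_graph = {
    "/": ["/category-a", "/category-b"],
    "/category-a": ["/product-1", "/product-2"],
    "/category-b": ["/product-3"],
}
sitemap_urls = {"/product-1", "/product-2", "/product-3", "/product-4"}

reachable = reachable_from("/", link_graph)
isolated = sitemap_urls - reachable  # in the sitemap, but no internal path leads there
print(sorted(isolated))  # ['/product-4']
```

Anything left in `isolated` exists and is declared in the sitemap, but a crawler that only follows internal links would never get to it.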


Crawl depth is not just a number

Crawl depth is often treated as a simple metric:

“This URL is at depth 6, that one at depth 3.”

In practice, depth behaves more like a distribution problem:

  • A small percentage of URLs attracts most of the internal links
  • Large clusters sit far away from strong linking hubs
  • Navigation patterns matter more than raw depth values

Two URLs at the same depth can have very different crawl probabilities depending on how they’re linked.
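
To treat depth as a distribution rather than a single number, it helps to compute the BFS depth of every URL together with its inlink count, then summarize per depth level. Here's a rough sketch along the same lines as the previous one (all URLs are hypothetical):

```python
from collections import deque, defaultdict

def depths_and_inlinks(start, link_graph):
    """Return BFS depth from `start` and inlink count for every discovered URL."""
    depth = {start: 0}
    inlinks = defaultdict(int)
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target in link_graph.get(url, []):
            inlinks[target] += 1
            if target not in depth:
                depth[target] = depth[url] + 1
                queue.append(target)
    return depth, inlinks

link_graph = {
    "/": ["/hub", "/page-a"],
    "/hub": ["/page-a", "/page-b", "/page-c"],
    "/page-a": ["/page-b"],
    "/page-b": [],
    "/page-c": [],
}

depth, inlinks = depths_and_inlinks("/", link_graph)

# Group URLs by depth and show how unevenly inlinks are spread at each level.
by_depth = defaultdict(list)
for url, d in depth.items():
    by_depth[d].append(inlinks[url])
for d in sorted(by_depth):
    counts = sorted(by_depth[d], reverse=True)
    print(f"depth {d}: {len(counts)} URLs, inlinks per URL = {counts}")
```

Even in this toy graph, the two URLs at depth 2 end up with different inlink counts; on a real site, that spread within each depth level is exactly what a single depth number hides.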


Internal linking patterns repeat themselves

After analyzing multiple sites, some patterns showed up again and again:

  • Pagination creates long, weak crawl paths
  • Faceted navigation generates depth without authority
  • Orphan-like URLs exist even when “everything is linked”
  • Sitemaps give a false sense of coverage

Most of these issues are invisible if you only look at:

  • Indexation reports
  • URL counts
  • Crawl totals

You need to look at how URLs are actually connected.
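
A simple way to start is to flag URLs that are technically linked, but only barely: sitemap URLs whose inlinks all come from pagination pages, or URLs with a single internal inlink. Here's a sketch; the pagination regex and the threshold are assumptions you'd adapt to the site:

```python
import re
from collections import defaultdict

PAGINATION = re.compile(r"[?&]page=\d+|/page/\d+")  # assumed pagination URL pattern

def flag_weakly_linked(link_graph, sitemap_urls, max_inlinks=1):
    """Flag sitemap URLs with very few internal inlinks, or inlinks only from pagination."""
    sources = defaultdict(list)
    for source, targets in link_graph.items():
        for target in targets:
            sources[target].append(source)

    flagged = {}
    for url in sitemap_urls:
        inlinks = sources.get(url, [])
        only_pagination = bool(inlinks) and all(PAGINATION.search(s) for s in inlinks)
        if len(inlinks) <= max_inlinks or only_pagination:
            flagged[url] = inlinks
    return flagged

# Toy data: everything is "linked", but one URL hangs off pagination only.
link_graph = {
    "/": ["/category", "/category?page=2", "/hub"],
    "/hub": ["/product-1", "/product-2"],
    "/category": ["/product-1", "/product-2"],
    "/category?page=2": ["/product-9"],
}
sitemap_urls = {"/product-1", "/product-2", "/product-9"}

for url, inlinks in flag_weakly_linked(link_graph, sitemap_urls).items():
    print(url, "<-", inlinks)  # /product-9 <- ['/category?page=2']
```

In this toy graph, `/product-9` is "linked", but only from page 2 of a category, which is exactly the kind of orphan-like URL that indexation reports and crawl totals won't surface.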


Why this matters for SEO

Search engines don’t crawl websites randomly.
They follow links, prioritize paths, and allocate attention unevenly.

If internal linking doesn’t reflect real priorities:

  • Important pages may be under-crawled
  • Indexation becomes inconsistent
  • Crawl budget is wasted on low-value paths

This is especially critical for large or complex websites.


A note on tooling

While working on a side project focused on analyzing crawl depth and internal linking (SEODataReport), I started noticing how often structural issues were being overlooked in favor of surface-level metrics.

That experience reinforced one idea:
technical SEO problems are usually architectural, not cosmetic.

More info here:
https://seodatareport.com


Final thoughts

If you work with large websites, I’d encourage you to look beyond:

  • Total URLs
  • Indexation percentages
  • Crawl stats alone

Instead, focus on:

  • Real crawl paths
  • Link concentration (a rough sketch follows below)
  • Structural distance from key hubs
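
For link concentration specifically, one rough measure is the share of all internal inlinks received by the most-linked fraction of URLs. A small sketch over a placeholder link graph:

```python
from collections import Counter

def inlink_concentration(link_graph, top_fraction=0.1):
    """Share of all internal inlinks received by the most-linked `top_fraction` of URLs."""
    inlinks = Counter()
    for targets in link_graph.values():
        inlinks.update(targets)
    counts = sorted(inlinks.values(), reverse=True)
    top_n = max(1, int(len(counts) * top_fraction))
    return sum(counts[:top_n]) / sum(counts)

# Placeholder graph: 50 pages that all link to one hub and to the next page in a chain.
link_graph = {f"/page-{i}": ["/hub", f"/page-{(i + 1) % 50}"] for i in range(50)}
print(round(inlink_concentration(link_graph), 2))  # 0.54 -> top 10% of URLs get 54% of inlinks
```

In this synthetic graph, the top 10% of URLs receive just over half of all internal links, which is the same pattern described earlier: a small set of URLs concentrates most of the internal linking.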

Understanding how a site is connected often explains SEO issues long before content or keywords come into play.

