Most SEO tools can crawl thousands (or millions) of URLs.
But after working with large websites, I realized something important:
crawling a site is not the same as understanding crawlability.
In this post, I want to share what I learned while analyzing crawl depth and internal linking patterns at scale, and why many SEO issues don’t come from content, but from structure.
Crawling vs real reachability
A crawler can technically discover a URL, but that doesn’t mean the URL is:
- Easy to reach through internal links
- Close enough to the homepage
- Likely to be crawled frequently by search engines
On large websites, I repeatedly found URLs that:
- Were technically crawlable
- Appeared in sitemaps
- Yet had no meaningful internal paths pointing to them
They existed, but they were structurally isolated.
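To make that gap concrete, here is a minimal sketch in Python of how you might compare the two sets, assuming you already have an internal link graph from a crawl and a list of sitemap URLs (the URLs and structures below are hypothetical):

```python
from collections import deque

# Hypothetical inputs: in practice these would come from your crawler
# and from parsing the XML sitemap.
internal_links = {
    "https://example.com/": ["https://example.com/category/a", "https://example.com/blog/"],
    "https://example.com/category/a": ["https://example.com/product/1"],
    "https://example.com/blog/": [],
}
sitemap_urls = {
    "https://example.com/product/1",
    "https://example.com/product/2",  # listed in the sitemap, but no internal path leads here
}

def reachable_from(start, link_graph):
    """Breadth-first traversal over internal links only."""
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target in link_graph.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

reachable = reachable_from("https://example.com/", internal_links)
isolated = sitemap_urls - reachable
print(isolated)  # URLs that exist in the sitemap but are structurally isolated
```

URLs that end up in `isolated` are exactly the ones described above: they exist and are listed, but nothing meaningful links to them.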
Crawl depth is not just a number
Crawl depth is often treated as a simple metric:
“This URL is at depth 6, that one at depth 3.”
In practice, depth behaves more like a distribution problem:
- A small percentage of URLs concentrate most internal links
- Large clusters sit far away from strong linking hubs
- Navigation patterns matter more than raw depth values
Two URLs at the same depth can have very different crawl probabilities depending on how they’re linked.
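A rough way to see this is to compute depth and inlink counts together rather than depth alone. The sketch below (hypothetical link graph again) does a breadth-first pass for depth and a simple count of internal links pointing at each URL:

```python
from collections import deque, Counter

# Hypothetical adjacency list of internal links, as produced by a crawl.
internal_links = {
    "https://example.com/": ["https://example.com/hub", "https://example.com/misc"],
    "https://example.com/hub": ["https://example.com/page-a", "https://example.com/page-b"],
    "https://example.com/misc": ["https://example.com/page-b"],
    "https://example.com/page-a": [],
    "https://example.com/page-b": [],
}

def crawl_depths(start, link_graph):
    """BFS distance (in clicks) from the start URL."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target in link_graph.get(url, []):
            if target not in depths:
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths

depths = crawl_depths("https://example.com/", internal_links)
inlinks = Counter(t for targets in internal_links.values() for t in targets)

for url, depth in sorted(depths.items(), key=lambda x: x[1]):
    print(depth, inlinks.get(url, 0), url)
```

In this toy graph, page-a and page-b sit at the same depth, but page-b receives twice as many internal links, which is exactly the kind of difference a raw depth value hides.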
Internal linking patterns repeat themselves
After analyzing multiple sites, some patterns showed up again and again:
- Pagination creates long, weak crawl paths
- Faceted navigation generates depth without authority
- Orphan-like URLs exist even when “everything is linked”
- Sitemaps give a false sense of coverage
Most of these issues are invisible if you only look at:
- Indexation reports
- URL counts
- Crawl totals
You need to look at how URLs are actually connected.
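One lightweight way to do that is to measure how internal links are distributed across URLs instead of just counting URLs. A sketch, again over a hypothetical edge list extracted from a crawl:

```python
from collections import Counter

# Hypothetical (source, target) pairs of internal links.
edges = [
    ("/", "/hub"), ("/", "/blog"),
    ("/hub", "/product-1"), ("/hub", "/product-2"), ("/blog", "/hub"),
    ("/category?page=2", "/category?page=3"),  # pagination chain: depth without authority
    ("/category?page=3", "/deep-product"),
]

inlinks = Counter(target for _, target in edges)
total_links = sum(inlinks.values())

# Share of all internal links captured by the most-linked 10% of URLs.
ranked = inlinks.most_common()
top_slice = max(1, len(ranked) // 10)
top_share = sum(count for _, count in ranked[:top_slice]) / total_links

# "Orphan-like" URLs: present in the graph but with at most one inlink.
orphan_like = [url for url, count in inlinks.items() if count <= 1]

print(f"Top {top_slice} URL(s) receive {top_share:.0%} of internal links")
print("Orphan-like URLs:", orphan_like)
```

Findings like "a handful of URLs absorb most internal links" or "most URLs have one inlink at best" surface structural problems that indexation reports and crawl totals never show.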
Why this matters for SEO
Search engines don’t crawl websites randomly.
They follow links, prioritize paths, and allocate attention unevenly.
If internal linking doesn’t reflect real priorities:
- Important pages may be under-crawled
- Indexation becomes inconsistent
- Crawl budget is wasted on low-value paths
This is especially critical for large or complex websites.
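One rough proxy for that uneven attention is PageRank computed over the internal link graph alone. It is not how any search engine actually allocates crawl budget, but it highlights where link equity pools. A sketch assuming networkx is available and using hypothetical URLs:

```python
import networkx as nx  # assumes networkx is installed

# Hypothetical internal link graph built from crawl data.
G = nx.DiGraph()
G.add_edges_from([
    ("/", "/category"), ("/", "/blog"),
    ("/category", "/product-1"),
    ("/blog", "/post-1"), ("/blog", "/post-2"),
    ("/post-1", "/post-2"), ("/post-2", "/post-1"),
])

# PageRank over internal links only, as a rough proxy for crawl priority.
scores = nx.pagerank(G, alpha=0.85)

# Pages the business actually cares about (hypothetical list).
priority_pages = {"/product-1"}

for url in priority_pages:
    print(url, round(scores.get(url, 0.0), 4))
```

If the pages you care about score lower than incidental clusters, such as blog posts that mostly link to each other, the structure is telling crawlers the wrong story.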
A note on tooling
While working on a side project focused on analyzing crawl depth and internal linking (SEODataReport),
I started noticing how often structural issues were being overlooked in favor of surface-level metrics.
That experience reinforced one idea:
technical SEO problems are usually architectural, not cosmetic.
More info here:
https://seodatareport.com
Final thoughts
If you work with large websites, I’d encourage you to look beyond:
- Total URLs
- Indexation percentages
- Crawl stats alone
Instead, focus on:
- Real crawl paths
- Link concentration
- Structural distance from key hubs
Understanding how a site is connected often explains SEO issues long before content or keywords come into the picture.