🔥 TL;DR: Want the full technical playbook? This article covers the core concepts. The complete guide includes GSC audit templates, robots.txt patterns for 12 CMS setups, and the implementation calendar that took us from 15% to 94% indexation.
→ Fix Your Crawl Budget: The Complete Playbook · €12, instant PDF · 30-day refund
We had 800 pages. Google was indexing 120.
That's 85% of our content invisible to search, despite being technically sound, well-written, and internally linked.
The problem wasn't content quality. It was crawl budget waste.
Here's what we found, what we fixed, and what actually moved the needle.
What Is Crawl Budget (And Why It Matters at Scale)
Google doesn't crawl your entire site on every pass. It has a budget, shaped by your domain's authority and popularity (crawl demand) and your server's response time (crawl capacity).
For small sites (<1,000 pages): rarely a problem.
For large sites (10,000+ pages): it becomes the bottleneck between publishing and ranking.
The part most guides miss: crawl budget is wasted on pages that will never rank.
Every paginated URL, filter combination, or session-parameterized URL Googlebot visits is a crawl NOT spent on your actual content.
Our Crawl Budget Audit
Pull your server access logs, filter them to Googlebot requests, and cross-reference with the GSC "Not indexed" report.
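If you'd rather script the bucketing than do it in a spreadsheet, here's a minimal sketch, assuming a combined-format access log at access.log; the path and the URL patterns are placeholders you'd adapt to your own site structure:

```python
import re
from collections import Counter

# Placeholder path and patterns: adjust to your own log location and URL structure.
LOG_PATH = "access.log"
BUCKETS = {
    "pagination": re.compile(r"/page/\d+"),
    "facets": re.compile(r"\?(?:[^ ]*&)?(?:sort|color|size)="),
    "session/tracking": re.compile(r"\?(?:[^ ]*&)?(?:session_id|ref|utm_source)="),
}

counts = Counter()
with open(LOG_PATH) as log:
    for line in log:
        # Crude user-agent filter; verify Googlebot IPs separately if you need rigor.
        if "Googlebot" not in line:
            continue
        match = re.search(r'"(?:GET|HEAD) (\S+)', line)
        if not match:
            continue
        url = match.group(1)
        for name, pattern in BUCKETS.items():
            if pattern.search(url):
                counts[name] += 1
                break
        else:
            counts["real content"] += 1

for bucket, hits in counts.most_common():
    print(f"{bucket}: {hits} Googlebot requests")
```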
What was eating our budget:
| Issue | URLs Wasted |
|---|---|
| Pagination (/page/2, /page/3...) | 340 |
| Faceted navigation (?sort=price&color=red) | 180 |
| Session parameters in URLs | 90 |
| Thin tag/category pages | 70 |
That's 680 URLs consuming crawl budget daily, with zero ranking potential.
Fix those, and Google reallocates crawl capacity to your real content.
The 5 Fixes That Worked
1. Noindex Pagination (Except Page 1)
<!-- On /blog/page/2 and all subsequent pages -->
<meta name="robots" content="noindex, follow">
Keep follow so link equity still passes. Just stop indexation of content-free pages.
2. Canonical Tags on Filter/Facet Pages
<!-- On /shoes?color=red&size=42 -->
<link rel="canonical" href="https://example.com/shoes/">
Consolidates crawl budget and ranking signals on the base page, and prevents duplicate-content issues.
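To spot-check that facet URLs actually carry the canonical, here's a small fetch-and-parse sketch; the URLs are illustrative, so swap in real filter combinations pulled from your logs:

```python
import requests
from bs4 import BeautifulSoup

# Illustrative facet URLs: replace with real filter combinations from your crawl log.
FACET_URLS = [
    "https://example.com/shoes?color=red&size=42",
    "https://example.com/shoes?sort=price",
]

for url in FACET_URLS:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    canonicals = [
        link.get("href")
        for link in soup.find_all("link")
        if "canonical" in (link.get("rel") or [])
    ]
    print(f"{url}\n  -> canonical: {canonicals[0] if canonicals else 'MISSING'}")
```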
3. Block Session Parameters in robots.txt
# robots.txt
Disallow: /*?session_id=
Disallow: /*?ref=email
Disallow: /*?utm_source=
Better fix: sessions in cookies, not URLs. But robots.txt works in the short term.
4. Internal Links to Underlinked Pages
Pages with zero internal links barely get crawled, even if they're technically indexed.
Use Screaming Frog to find pages with <2 internal links. Add links from your high-traffic content.
Rule of thumb: No content page should have fewer than 3 relevant internal links pointing to it.
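A minimal sketch of that inlink check, assuming you've exported Screaming Frog's "All Inlinks" report to CSV; the filename and the "Type" / "Source" / "Destination" column names are assumptions, so verify them against your export:

```python
import csv
from collections import Counter

# Assumed export: Screaming Frog "All Inlinks" as CSV; adjust filename/columns to match yours.
inlink_counts = Counter()
with open("all_inlinks.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        if row.get("Type", "Hyperlink") != "Hyperlink":
            continue  # ignore images, canonicals, hreflang, etc.
        if row["Source"] != row["Destination"]:
            inlink_counts[row["Destination"]] += 1

# Pages appearing as a destination fewer than 3 times need more internal links.
# Note: true orphans (0 inlinks) won't appear here at all; diff this against your
# full crawl URL list to catch them.
for url, count in sorted(inlink_counts.items(), key=lambda kv: kv[1]):
    if count < 3:
        print(f"{count:>2} inlinks  {url}")
```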
5. Server Response Time Under 200ms
Googlebot throttles itself to what your server can handle. Slow responses = fewer crawls per day.
Target <200ms TTFB. CDN for static assets. Aggressive page-level caching.
Going from 800ms to 180ms TTFB alone increased our daily crawl rate by ~40%.
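If you want a rough TTFB number without a full monitoring stack, here's a minimal sketch using requests; the URLs are placeholders, sample several page templates, and ideally run it from a server rather than your laptop so latency is closer to what Googlebot sees:

```python
import requests

# Placeholder URLs: sample a few templates (home, category, article), not just the homepage.
URLS = [
    "https://example.com/",
    "https://example.com/blog/some-article/",
]

for url in URLS:
    # stream=True returns as soon as headers arrive, so r.elapsed approximates TTFB.
    r = requests.get(url, stream=True, timeout=10)
    print(f"{int(r.elapsed.total_seconds() * 1000):>5} ms  {r.status_code}  {url}")
    r.close()
```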
The Results (6 Weeks Later)
| Metric | Before | After |
|---|---|---|
| Indexed pages | 120 | 748 |
| GSC daily crawl rate | 340/day | 1,200/day |
| Organic impressions | 12k/mo | 47k/mo |
Not all indexed pages ranked. But you can't rank what Google hasn't seen.
Ongoing Monitoring
Weekly GSC audit habit:
- "Not indexed" count โ should decrease weekly
- "Crawled but not indexed" โ content quality issue
- "Discovered but not crawled" โ internal linking issue
Set up a Google Sheet pulling GSC data weekly. Flag anything that changes by >10%.
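If you'd rather script the flag than build it in Sheets, here's a minimal sketch comparing two weekly snapshots; the CSV filenames and the "Status" / "Pages" columns are assumptions about how you export the counts, not a GSC API call:

```python
import csv

# Assumed format: one row per GSC indexing status, with "Status" and "Pages" columns,
# saved weekly as gsc_last_week.csv / gsc_this_week.csv.
def load(path):
    with open(path, newline="", encoding="utf-8") as f:
        return {row["Status"]: int(row["Pages"]) for row in csv.DictReader(f)}

last_week = load("gsc_last_week.csv")
this_week = load("gsc_this_week.csv")

for status, current in this_week.items():
    previous = last_week.get(status)
    if not previous:
        continue
    change = (current - previous) / previous
    flag = "  <-- FLAG" if abs(change) > 0.10 else ""
    print(f"{status}: {previous} -> {current} ({change:+.0%}){flag}")
```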
🔥 The Complete Crawl Budget Playbook
This article is the theory. The full playbook includes:
- GSC audit template (CSV with formulas to identify high-priority gaps)
- robots.txt patterns for 12 common CMS setups (WordPress, Webflow, Next.js, Shopify...)
- Internal linking matrix for content cluster optimization
- Server configuration checklist for Nginx, Apache, and Vercel
- 30-day implementation calendar with weekly checkpoints
→ 800 Pages, 120 Indexed: Fix Your Crawl Budget · €12, instant PDF download
30-day money-back guarantee. No questions asked.