You've spent weeks building your Next.js site. You've deployed to Vercel. Everything looks beautiful. There's just one problem — Google doesn't seem to know your site exists.
You check Search Console. It says "Discovered – currently not indexed" on half your pages. Some pages have "Crawled – currently not indexed" with zero explanation. You Google your own brand name and get nothing.
Sound familiar? You're not alone. I've been there, and honestly, it drove me mad.
The silent killers nobody warns you about
Here's what most Next.js tutorials won't tell you: there are at least a dozen ways your perfectly working site can be completely invisible to search engines. And the worst part? None of them show any visible symptoms in your browser.
The 308 trap. Set trailingSlash: true in your next.config.js and Next.js starts returning 308 permanent redirects. Googlebot follows redirects, but chains of them waste your crawl budget. I've seen sites where a single page visit triggered 3 redirects before landing — that's crawl budget down the drain.
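For context, the setting in question is a one-liner; a minimal next.config.js sketch (not taken from any real project):

```typescript
// next.config.js — minimal sketch, not from any real project.
// With trailingSlash enabled, a request for /about is answered with a
// 308 permanent redirect to /about/ (and the reverse when disabled),
// so every internal link to the non-canonical form costs an extra hop.
module.exports = {
  trailingSlash: true,
};
```

The setting itself isn't wrong; the trap is combining it with internal links, rewrites, or http→https redirects that each add another hop.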
The middleware ghost. Next.js middleware can rewrite URLs, redirect requests, or modify headers. The problem? Middleware that branches on the user agent can serve bots something completely different from what your browser sees. So you test your site, everything works, but Googlebot is getting served a different response.

The canonical mismatch. Your page lives at https://example.com but the canonical tag points to https://www.example.com. You've just told Google to ignore half your site.
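If you're on the App Router, the canonical is typically set through Next.js's Metadata API; a minimal sketch (the URL here is a placeholder — the fix is simply making it match the host you actually serve):

```typescript
// app/page.tsx — sketch only; the canonical host must match the host
// the page is actually served from (no www./non-www. mismatch).
import type { Metadata } from 'next';

export const metadata: Metadata = {
  alternates: {
    canonical: 'https://example.com',
  },
};
```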
The missing robots.txt. No robots.txt means no sitemap directive, no crawl guidance, nothing. Google will figure it out eventually, but "eventually" could be months.
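On the App Router, Next.js can generate robots.txt for you from an app/robots.ts file; a minimal sketch with placeholder URLs:

```typescript
// app/robots.ts — Next.js serves /robots.txt from this export (App Router).
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: { userAgent: '*', allow: '/' },
    // The Sitemap directive is the crawl guidance the paragraph above
    // is talking about — point it at your real sitemap URL.
    sitemap: 'https://example.com/sitemap.xml',
  };
}
```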
These aren't edge cases. They're incredibly common on Next.js/Vercel deployments. I kept running into the same issues across different projects, so I built a tool to catch them all in one go.
Introducing vercel-seo-audit
npx vercel-seo-audit https://your-site.com
That's it. One command. It takes about 2-3 seconds and tells you exactly what's wrong, why it matters, and how to fix it.
Here's what the output actually looks like:
SEO Audit Report for https://example.com/
Completed in 1483ms
Summary:
✖ 1 error
⚠ 3 warnings
ℹ 2 info
✔ 4 passed
REDIRECTS
────────────────────────────────────────
✖ [ERROR] Redirect chain detected (3 hops)
→ Reduce to a single redirect: http://example.com → https://example.com/
METADATA
────────────────────────────────────────
⚠ [WARNING] Canonical URL mismatch
→ Canonical points to https://www.example.com/ but page is https://example.com/
Every finding comes with three things: what's wrong, why it matters for SEO, and a concrete suggestion to fix it. No vague "something might be off" messages.
What it actually checks
The tool runs 11 audit modules in parallel. Here's the full list:
The basics that everyone forgets:
- robots.txt — missing, blocking Googlebot, missing Sitemap directive
- sitemap.xml — missing, redirected, empty, broken URLs, robots.txt cross-check
- Favicon — missing entirely, missing HTML link tags, conflicting declarations
The metadata that makes or breaks your rankings:
- Canonical URL presence and mismatches
- noindex directives (both meta tags and X-Robots-Tag headers — yes, they can exist in headers too)
- Missing title, description, charset, viewport
- Open Graph tags (og:title, og:description, og:image) with broken image detection
- Twitter Card validation (twitter:card, twitter:image)
The Next.js/Vercel-specific gotchas:
- Trailing slash 308 redirect traps
- Middleware rewrite/redirect detection
- Vercel deployment fingerprinting
The stuff that separates good SEO from great:
- Structured data / JSON-LD validation (checks for Article, FAQPage, Product, Organization and more)
- Internationalisation / hreflang tags (self-reference, x-default, reciprocal links)
- Image SEO (missing alt text, not using next/image, missing lazy loading, oversized images, layout shift)
- Security headers (HSTS, X-Content-Type-Options, X-Frame-Options, Referrer-Policy)
And with --crawl, it fetches every URL from your sitemap and audits each page individually.
Real-world examples
Let me show you what this looks like on actual sites.
A well-configured site
Running it against a properly configured Next.js portfolio site:
Summary:
⚠ 1 warning
ℹ 3 info
✔ 3 passed
One warning about missing width/height on some images, a few informational notes. Basically healthy.
A site with problems
Running it against a site that hadn't thought about SEO:
Summary:
⚠ 4 warnings
ℹ 8 info
ROBOTS — robots.txt not found
SITEMAP — sitemap.xml not found
METADATA — Canonical URL missing, og:image missing
TWITTER — twitter:card missing
SECURITY — HSTS missing, X-Content-Type-Options missing
STRUCTURED-DATA — No JSON-LD found
That's 12 actionable findings from a single command. Each one with a clear explanation and fix.
CI integration — catch regressions before they ship
This is where it gets really useful. Add it to your GitHub Actions pipeline:
name: SEO Audit
on:
  push:
    branches: [main]
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: JosephDoUrden/vercel-seo-audit@v1
        with:
          url: https://your-site.com
          strict: true
          report: json
With --strict, any warning fails your build. Merge a PR that accidentally adds noindex to your homepage? CI catches it before it reaches production.
You can also use --diff to compare against a previous audit:
# Save today's report
vercel-seo-audit https://your-site.com --report json
# Next week, compare
vercel-seo-audit https://your-site.com --diff report.json
It'll tell you exactly which issues are new, which were resolved, and which are unchanged.
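Under the hood, a report diff boils down to a set comparison on stable finding IDs. A simplified sketch (my illustration, not the tool's actual code; the ID format is hypothetical):

```typescript
// Compare two audit runs by finding ID (e.g. "metadata/canonical-mismatch").
// Anything only in the current run is new; anything only in the previous
// run was resolved; the intersection is unchanged.
function diffFindings(previous: string[], current: string[]) {
  const prev = new Set(previous);
  const curr = new Set(current);
  return {
    added: current.filter((id) => !prev.has(id)),
    resolved: previous.filter((id) => !curr.has(id)),
    unchanged: current.filter((id) => prev.has(id)),
  };
}
```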
Config file for teams
Tired of typing the same flags every time? Create a .seoauditrc.json in your project root:
{
  "url": "https://your-site.com",
  "strict": true,
  "userAgent": "googlebot",
  "pages": ["/docs", "/pricing", "/about"],
  "report": "json",
  "timeout": 15000
}
Then just run vercel-seo-audit with no arguments. CLI flags always override the config, so individual devs can still customise their local runs.
How it works under the hood
If you're curious about the architecture, the tool runs in phases:
Phase 1 runs robots.txt and redirect checks in parallel. These produce prerequisite data (like the robots.txt content and response headers) that other modules need.
Phase 2 runs everything else in parallel — sitemap, metadata, favicon, Next.js detection, structured data, i18n, images, and security headers. Each module gets an AuditContext with shared data from Phase 1.
Phase 3 (optional, with --crawl) fetches every URL from your sitemap and runs a subset of checks against each page, with configurable concurrency.
Every module uses Promise.allSettled(), so if one check fails (say, a timeout on sitemap.xml), the rest still complete. You always get results, even if your site is partially unreachable.
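The failure-isolation pattern described above can be sketched in a few lines (simplified, not the tool's actual code):

```typescript
// Run independent checks concurrently; a rejection in one never aborts
// the others — Promise.allSettled records each outcome separately, so a
// single timeout just becomes one failed finding among the results.
type Finding = { module: string; status: 'ok' | 'failed'; detail: string };

async function runChecks(
  checks: Record<string, () => Promise<string>>
): Promise<Finding[]> {
  const names = Object.keys(checks);
  const settled = await Promise.allSettled(names.map((name) => checks[name]()));
  return settled.map((result, i) => ({
    module: names[i],
    status: result.status === 'fulfilled' ? 'ok' : 'failed',
    detail:
      result.status === 'fulfilled' ? result.value : String(result.reason),
  }));
}
```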
The whole thing is written in TypeScript, runs on Node.js 18+, and has just four runtime dependencies: chalk (colours), cheerio (HTML parsing), commander (CLI), and fast-xml-parser (sitemap parsing). No headless browser. No Puppeteer. Just HTTP requests and HTML analysis, which is why it finishes in 2-3 seconds.
The checks I wish someone had told me about
Let me highlight a few non-obvious checks that have saved me personally:
X-Robots-Tag header. You can have a perfectly clean HTML page with no noindex meta tag, but if your middleware or CDN is adding an X-Robots-Tag: noindex header, Google won't index it. This one is genuinely invisible unless you check response headers manually. The tool catches it automatically.
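A header check like this is also easy to script yourself. A small sketch (mine, not the tool's code) that flags noindex in an X-Robots-Tag value:

```typescript
// X-Robots-Tag carries comma-separated directives, optionally scoped to
// a user agent ("googlebot: noindex"). This flags noindex either way.
function hasNoindexHeader(xRobotsTag: string | null): boolean {
  if (!xRobotsTag) return false;
  return xRobotsTag
    .split(',')
    .map((directive) => directive.trim().toLowerCase())
    .some(
      (directive) =>
        directive === 'noindex' ||
        directive.endsWith(': noindex') ||
        directive.endsWith(':noindex')
    );
}
```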
Sitemap/robots.txt cross-reference. Your sitemap.xml says URLs live at https://example.com/blog/... but your robots.txt Sitemap: directive points to https://www.example.com/sitemap.xml. Subtle, but it confuses crawlers.
hreflang reciprocal links. If page A says "my French version is page B", but page B doesn't say "my English version is page A", Google might ignore both hreflang declarations entirely. The tool checks for this.
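Reciprocity is just a graph property: every alternate link needs a return edge. A simplified sketch of the check (my illustration; the data shape is hypothetical, not the tool's internal model):

```typescript
// hreflang declarations as a map: page URL -> { lang -> alternate URL }.
// Page A's link to B only counts if B also lists A as one of its alternates.
type HreflangMap = Record<string, Record<string, string>>;

function missingReciprocals(pages: HreflangMap): string[] {
  const problems: string[] = [];
  for (const [page, alternates] of Object.entries(pages)) {
    for (const target of Object.values(alternates)) {
      const back = pages[target];
      const pointsBack = back !== undefined && Object.values(back).includes(page);
      if (target !== page && !pointsBack) {
        problems.push(`${page} -> ${target} has no return link`);
      }
    }
  }
  return problems;
}
```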
Relative og:image URLs. Many social media crawlers don't resolve relative URLs. Your og:image of /images/preview.png might work in a browser but show a broken preview on Twitter. The tool flags this specifically.
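If you emit og:image yourself, resolving it against the page URL sidesteps the problem entirely; the standard URL constructor does the resolution:

```typescript
// Social crawlers often take og:image literally, so always emit an
// absolute URL. new URL(relative, base) resolves a relative path
// against the page's own URL; absolute inputs pass through unchanged.
function absoluteOgImage(ogImage: string, pageUrl: string): string {
  return new URL(ogImage, pageUrl).toString();
}
```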
Getting started
Install and run:
npx vercel-seo-audit https://your-site.com
Or install globally:
npm i -g vercel-seo-audit
vercel-seo-audit https://your-site.com --verbose
Useful flags to know:
# Audit as Googlebot
vercel-seo-audit https://your-site.com --user-agent googlebot
# Check specific pages for redirect issues
vercel-seo-audit https://your-site.com --pages /docs,/pricing,/about
# Full sitemap crawl (default: 50 pages)
vercel-seo-audit https://your-site.com --crawl
# JSON output for scripting
vercel-seo-audit https://your-site.com --json
The project is open source — github.com/JosephDoUrden/vercel-seo-audit. Contributions are welcome, and there are several issues tagged good first issue if you want to get involved.
If your Next.js site isn't showing up in Google, there's almost certainly a technical reason. Don't spend weeks guessing — run the audit and find out in seconds.