Ted

Posted on May 26 • Originally published at tedagentic.com

A Vercel Catch-All Rewrite Caused 190 Pages to Canonicalize to the Homepage

#seo #vercel #react #webdev

I run a React/Vite SPA deployed on Vercel. The site had been live for months. GSC was showing 190+ pages in the "Discovered — currently not indexed" bucket. Not penalised, not crawled and rejected — just never indexed.

The cause turned out to be one line in vercel.json.

How a catch-all rewrite breaks indexing

Vercel needs to know what to serve when someone hits a client-side route like /city/denver directly. Since there's no dist/city/denver/index.html, the usual SPA setup is a catch-all rule in vercel.json that rewrites every unmatched path to dist/index.html, the homepage shell.

{
  "rewrites": [{ "source": "/(.*)", "destination": "/index.html" }]
}

The homepage shell has a canonical tag:

<link rel="canonical" href="https://example.com/" />

So when Googlebot hits /city/denver, it receives the homepage HTML and reads canonical: https://example.com/. On a low-trust domain, Google appeared to deprioritize further crawling of those routes — treating them as duplicates of the homepage rather than returning to index them independently.

┌──────────────────────────────────────────────────────────┐
│              WHAT GOOGLEBOT SAW                          │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  GET /city/denver                                        │
│       │                                                  │
│       ▼                                                  │
│  Vercel catch-all: serves dist/index.html               │
│       │                                                  │
│       ▼                                                  │
│  <link rel="canonical" href="https://example.com/" />   │
│       │                                                  │
│       ▼                                                  │
│  Google: signals duplicate of homepage                  │
│          deprioritizes further crawling                  │
│                                                          │
│  Result: 190 pages sitting in "Discovered, not indexed" │
└──────────────────────────────────────────────────────────┘

The fix is to generate a real dist/[path]/index.html for every route before Vercel deploys. Vercel checks the filesystem before it applies rewrites, so the catch-all never fires for known routes and Vercel serves the real file.

The prerender system

The build command:

"build": "vite build && node scripts/prerender.mjs"

prerender.mjs runs after Vite. It walks STATIC_ROUTES — a flat array of every known path — and writes a proper dist/[path]/index.html for each one. Each file gets:

A correct <title> for the route
<meta name="description"> with real content
<link rel="canonical"> pointing to the actual URL
Injected body content inside #root so Googlebot sees real text without executing JS

For detail pages, the script fetches live data from the database at build time and writes real names, ratings, addresses, and descriptions into the HTML.

If a path is in STATIC_ROUTES, Vercel finds the pre-built file and serves it directly. The catch-all only fires for paths that genuinely don't exist.
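The route-to-file mapping described above can be sketched roughly as follows. This is a minimal illustration, not the actual prerender.mjs: it assumes STATIC_ROUTES is an array of objects with a `path` field, and `outputFileFor`, `planPrerender`, and the `renderPage` callback are hypothetical names.

```javascript
// Hypothetical sketch: map each known route to the file Vercel will serve for it.
// "/" stays dist/index.html; "/city/denver" becomes dist/city/denver/index.html.
function outputFileFor(routePath, distDir = "dist") {
  const trimmed = routePath.replace(/^\/+|\/+$/g, "");
  return trimmed === ""
    ? `${distDir}/index.html`
    : `${distDir}/${trimmed}/index.html`;
}

// Build the list of files to write. renderPage is an assumed callback that
// returns the complete HTML document for one route entry.
function planPrerender(staticRoutes, renderPage, distDir = "dist") {
  return staticRoutes.map((route) => ({
    file: outputFileFor(route.path, distDir),
    html: renderPage(route),
  }));
}
```

Each planned entry can then be written with Node's fs.mkdirSync (with `recursive: true`) and fs.writeFileSync. Keeping the planning step pure makes it easy to test without touching the disk.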

The 4-commit audit

Once I confirmed the mechanism, I audited every route type and ran four commits in sequence.

Commit 1 — Blog slugs

Seven blog posts were missing from STATIC_ROUTES. They had been published directly without updating the prerender list. Every one of them was being served as the homepage.

Added the slugs and rewrote thin bodies on three entries using GSC impression data: queries with 60–70 impressions and zero clicks, which suggested Google was matching the topic but landing on content that couldn't hold it.
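A small guard can catch this kind of drift before it ships. The sketch below assumes you can list published blog slugs from wherever posts live; `findMissingRoutes`, `publishedSlugs`, and `staticRoutePaths` are hypothetical names, not part of the real script.

```javascript
// Hypothetical sketch: report published slugs that have no prerender entry.
// Any path returned here would fall through to the catch-all and be served
// as the homepage shell.
function findMissingRoutes(publishedSlugs, staticRoutePaths, prefix = "/blog/") {
  const known = new Set(staticRoutePaths);
  return publishedSlugs
    .map((slug) => `${prefix}${slug}`)
    .filter((path) => !known.has(path));
}
```

Failing the build when this returns a non-empty array would turn a silent indexing problem into a visible one.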

Commit 2 — Crawl budget

The site had out-of-state listing pages — pages for locations outside the primary geography with no real search demand. They weren't indexed but they were being crawled on every Googlebot pass.

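Added support for a noindex: true flag to the HTML generator and set seven of these pages to noindex. Googlebot still has to fetch a page to see the tag, but it tends to recrawl noindexed pages less often over time, which frees crawl budget for the pages that matter.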

function generateHtml({ path, title, description, canonical, body, noindex = false }) {
  const robotsMeta = noindex
    ? '<meta name="robots" content="noindex, nofollow" />'
    : '<meta name="robots" content="index, follow" />';
  // ...
}

Eight additional ghost-town pages were also noindexed — locations with no tourism infrastructure, no search volume, and no database data to populate them with.
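For illustration, the per-route head tags could be assembled along these lines. This is a sketch under assumptions, not the elided body of generateHtml: `renderHeadTags` and `escapeHtml` are hypothetical helpers, and the escaping is there because titles and descriptions may come from a database.

```javascript
// Escape text before placing it in HTML attributes or element content.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: build the head tags for one route, including the
// robots directive controlled by the noindex flag.
function renderHeadTags({ title, description, canonical, noindex = false }) {
  const robots = noindex ? "noindex, nofollow" : "index, follow";
  return [
    `<title>${escapeHtml(title)}</title>`,
    `<meta name="description" content="${escapeHtml(description)}" />`,
    `<link rel="canonical" href="${escapeHtml(canonical)}" />`,
    `<meta name="robots" content="${robots}" />`,
  ].join("\n");
}
```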

Commit 3 — City pages

29 city-level pages existed in STATIC_ROUTES but had generic one-sentence bodies:

Browse listings in Denver.

Replaced each with unique content — elevation, specific listing names, local context, tourism details. Not template-swapped: each city got its own paragraph based on what made it distinct.

┌─────────────────────────────────────────────────────────┐
│                BEFORE / AFTER                            │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  BEFORE                                                  │
│  "Browse listings in Denver."                           │
│                                                          │
│  AFTER                                                   │
│  "Denver sits at 5,280 feet. [Listing A] in RiNo and   │
│  [Listing B] near Capitol Hill are highest-rated.       │
│  Most visitors from sea level report altitude effects   │
│  within the first day — start low."                     │
│                                                          │
└─────────────────────────────────────────────────────────┘

Commit 4 — Detail page enrichment

Individual listing pages were getting auto-generated one-liners. The prerender already had the database connection — it just wasn't using it for detail pages.

Added two enrichment loops that pull real data per slug:

for (const [slug, data] of Object.entries(dispensariesBySlug)) {
  const path = `/listing/${slug}`;
  routes[path] = {
    title: `${data.name} — Directory`,
    description: `${data.name} in ${data.city}. Rating: ${data.rating}/5.`,
    body: buildDetailHtml(data),
  };
}

Every matched detail page now has real name, rating, review count, type, address, and description injected at build time.
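A buildDetailHtml along these lines would produce that body. This is a hypothetical version: only name, city, and rating appear in the loop above, so the other field names (`address`, `reviewCount`, `description`) are assumptions about the database row.

```javascript
// Escape database text before injecting it into the page.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical sketch of buildDetailHtml. Missing fields are skipped rather
// than rendered as "undefined".
function buildDetailHtml(data) {
  const lines = [`<h1>${escapeHtml(data.name)}</h1>`];
  if (data.address) {
    lines.push(`<p>${escapeHtml(data.address)}, ${escapeHtml(data.city)}</p>`);
  }
  if (data.rating != null) {
    const reviews =
      data.reviewCount != null ? ` (${escapeHtml(data.reviewCount)} reviews)` : "";
    lines.push(`<p>Rating: ${escapeHtml(data.rating)}/5${reviews}</p>`);
  }
  if (data.description) {
    lines.push(`<p>${escapeHtml(data.description)}</p>`);
  }
  return lines.join("\n");
}
```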

What got submitted to GSC

After the four commits deployed, I submitted blog URLs via GSC's URL Inspection tool and queued the rest for the following day after hitting the daily limit. City and detail pages surface through the sitemap on the next crawl cycle.
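If the sitemap is generated from the same route table, it is worth leaving noindexed routes out so the two signals agree. A minimal sketch, assuming `routes` maps each path to an object with an optional `noindex` flag; `buildSitemap` is a hypothetical name and the post does not say how the real sitemap is produced.

```javascript
// Hypothetical sketch: build a sitemap from the route table, skipping noindexed paths.
function buildSitemap(origin, routes) {
  const urls = Object.entries(routes)
    .filter(([, meta]) => !meta.noindex)
    .map(([path]) => {
      // Ampersands must be escaped inside XML text nodes.
      const loc = `${origin}${path}`.replace(/&/g, "&amp;");
      return `  <url><loc>${loc}</loc></url>`;
    });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}
```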

┌─────────────────────────────────────────────────────────┐
│                  ROUTE STATUS AFTER AUDIT                │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  Blog pages              →  correct canonical + body    │
│  Out-of-state listings   →  noindexed                   │
│  Ghost town pages        →  noindexed                   │
│  City pages (29)         →  unique content injected     │
│  Detail pages            →  real DB data injected       │
│                                                          │
│  Catch-all fires only for paths that don't exist        │
└─────────────────────────────────────────────────────────┘

The check that catches it

After any prerender change, before pushing:

npm run build
npx vite preview        # in a second terminal; serves dist/ on port 4173 by default
curl -s http://localhost:4173/your-route | grep -i 'canonical'

If you see canonical: / on a page that isn't the homepage, the catch-all is winning. A route is missing from STATIC_ROUTES or the prerender write failed silently.
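The same check can be automated over every prerendered file. The sketch below works on an HTML string so it can run against files read from dist/ or against fetched responses; `extractCanonical` and `canonicalMatches` are hypothetical helpers, and the regex assumes rel comes before href, as in the tag shown earlier.

```javascript
// Pull the href out of the first <link rel="canonical"> tag, or null if absent.
function extractCanonical(html) {
  const match = html.match(
    /<link[^>]*rel=["']canonical["'][^>]*href=["']([^"']+)["']/i
  );
  return match ? match[1] : null;
}

// True when the page's canonical matches the route that was requested.
// Trailing slashes are ignored so "/" and "" compare equal.
function canonicalMatches(html, origin, routePath) {
  const canonical = extractCanonical(html);
  if (canonical === null) return false;
  const strip = (url) => url.replace(/\/+$/, "");
  return strip(canonical) === strip(`${origin}${routePath}`);
}
```

A route whose file returns false here is either missing from STATIC_ROUTES or was written with the homepage's head.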

The other check is in Vercel build logs — the prerender should log a row count for every data fetch. If it logs zero or logs nothing, the database connection failed and detail pages are running on fallback content. Treat zero as a failure, not a warning.
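One way to make zero a hard failure is to wrap every fetch result in a guard. This is a sketch with a hypothetical `assertRows` helper; the log format is illustrative, not what the real script prints.

```javascript
// Hypothetical guard: log the row count and stop the build when a data fetch
// returns nothing, instead of silently writing fallback content.
function assertRows(label, rows) {
  const count = Array.isArray(rows) ? rows.length : 0;
  console.log(`[prerender] ${label}: ${count} rows`);
  if (count === 0) {
    throw new Error(
      `[prerender] ${label} returned zero rows; refusing to write fallback content`
    );
  }
  return rows;
}
```

Because the build script chains with &&, an uncaught error here exits non-zero and the Vercel deploy fails rather than going green.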

Why this is easy to miss

The system behaved correctly at every layer. Vite built the bundle. The prerender ran. Vercel deployed green. The site worked perfectly in a browser — React hydrated immediately and the correct content appeared.

Only a raw HTTP request exposed the problem:

curl -s "https://example.com/city/denver" | grep canonical
# <link rel="canonical" href="https://example.com/" />

Googlebot's first pass doesn't execute JavaScript. It reads what the server sends, and rendering is deferred to a later queue that a low-priority page may sit in for a long time. Everything the human-facing monitoring stack measured was post-hydration. The failure existed entirely in the gap between what the server sent and what the browser rendered, a gap that only matters to crawlers.

DEV Community