I run a side project: a gas price finder that's mostly programmatic content. Roughly 33,620 ZIP code pages plus a few hundred state and city pages, all built on Next.js 15 with ISR. It runs on Vercel, fronted by Cloudflare for DNS.
For about eight months it was indexing fine. Traffic was small but growing. Then on April 11 I checked Google Search Console and saw something I'd never seen at this scale: 87 URLs flagged as "not found (404)," 61 flagged as "soft 404," and a chunk of others sitting in "crawled, currently not indexed."
I'm a data analyst. This is my side project, not my day job. So I had a weekend, a coffee subscription, and the GSC export. Here's the recovery, phase by phase: what each phase actually fixed, and the things I'd do differently.
What the damage actually was
GSC's URL Inspection tool is the only way to figure out what Google thinks of any specific URL. The 87 hard 404s broke down into two groups when I sampled them:
- Stale city slugs. URLs like /austin or /dallas that I had in my sitemap historically but removed when I migrated to disambiguated slugs (/austin-tx, /dallas-tx). Google still had the old slugs cached and was hitting 404s when it tried to refresh.
- Typo URLs in third-party referrers. A handful of blog posts I'd written had link typos to my own site. Google followed those, hit 404s, and now thought those URLs were canonical.
The 61 soft 404s were a different beast. These were real ZIP code pages that returned 200 OK but Google decided didn't have enough content to be "real" pages. Looking at the SSR output, I could see why: when a ZIP wasn't cached in Redis (which was most of the long tail), the page rendered a hero, a search box, and a thin "search results loading" placeholder. About 640 visible words in the SSR HTML. From Google's perspective: this is a glorified 404 wearing a 200 costume.
The "crawled, currently not indexed" pile was the soft-404 pile's pre-stage. Google had crawled them, decided they weren't worth indexing, but hadn't formally classified them as soft 404s yet.
That's the diagnosis. Now the fixes.
Phase 1: The boring redirect fix
This phase is unglamorous and important. For each of the 87 hard 404s, I had to decide: does this URL have a clear successor, and if so, where do I redirect it?
I did this in next.config.mjs:
const STALE_CITY_REDIRECTS = [
  { source: '/austin', destination: '/austin-tx', permanent: true },
  { source: '/dallas', destination: '/dallas-tx', permanent: true },
  // ...64 more
];

export default {
  async redirects() {
    return STALE_CITY_REDIRECTS;
  },
};
permanent: true emits a 308. That tells Google "this URL has moved permanently, transfer signals to the destination." A 301 would also work; I went with 308 because Next.js's defaults align with that and I didn't want to fight the framework.
The unglamorous part: most of the 87 URLs didn't have an obvious destination. Some pointed to cities I no longer covered. Some pointed to ZIPs that didn't exist. For those, I redirected to the parent state page (e.g. /austin-foo → /texas) so the user lands somewhere useful, and the link equity isn't dropped on the floor.
Total output of Phase 1: 66 redirects, deployed in one commit. Pushed it. Watched GSC for a week.
Result: about half the hard 404s validated. The other half stuck. Why?
Phase 1.5: The Cloudflare redirect chain I didn't know I had
When I inspected one of the still-failing URLs in GSC, the URL Inspection tool said "page with redirect" and showed a chain. Google was hitting http://gas-price-check.com/austin, getting redirected to https://www.gas-price-check.com/austin, and then redirected again to https://www.gas-price-check.com/austin-tx.
Two hops. Google does not like two hops.
I had Cloudflare in front of Vercel as a DNS proxy. Cloudflare was handling the http → https and apex → www redirect. Vercel was handling the slug → slug redirect. Each layer worked. Together they made a chain.
The fix was a single Cloudflare Redirect Rule that does both transforms in one hop:
If: hostname matches "gas-price-check.com" AND scheme is http
Then: 308 to https://www.gas-price-check.com/$path
After that rule landed, the chain collapsed from 2 hops to 1. Google reprocessed and started clearing the rest of the hard 404s. Lesson I should have known: when you have multiple layers (CDN, edge, framework), each one defaulting to "I'll handle the redirect" stacks. Audit the hop count with curl -IL <url> and look for chains.
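If you want that audit as a script rather than a curl habit, here's a minimal sketch assuming Node 18+ (global fetch); the filename, starting URL, and hop cap are placeholders, not something from my repo:

// count-hops.mjs — rough sketch of a redirect-chain audit
const start = process.argv[2] ?? 'http://gas-price-check.com/austin';

let current = start;
let hops = 0;

while (hops < 5) {
  const res = await fetch(current, { redirect: 'manual' });
  if (res.status < 300 || res.status >= 400) break; // reached a non-redirect response
  current = new URL(res.headers.get('location'), current).href;
  hops++;
  console.log(`${res.status} -> ${current}`);
}

console.log(`${hops} hop(s) before ${current} answered directly`);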
Phase 2: The real work (thin SSR content)
The 61 soft 404s couldn't be redirected away. These were real pages I wanted indexed. Google just thought they were thin.
The diagnosis was straightforward once I started reading my own SSR output. When a ZIP wasn't cached, the page rendered:
- A hero with the ZIP, city, state
- A search box
- A "loading prices..." placeholder (waiting for client-side fetch)
- A footer
That's it. Maybe 640 visible words, of which 400 were the footer and global nav. The actual page-specific content was a hero header and a placeholder.
The fix had three sub-components.
2a. Server-render the nearby ZIP grid
I had a helper called getNearbyZips(zip, radius) that returned ZIP codes within a given mile radius. I'd been using it on the client. I moved it to the server component so the SSR HTML included an actual grid of "nearby ZIP codes" with links.
This added about 80 words of unique-per-ZIP content (different neighbors for each ZIP). More importantly, it added 8-12 internal links per page, which gave Google more signal about the URL's place in the site graph.
// Before: client-side, invisible to Google
const nearby = useNearbyZips(zip);

// After: server-rendered, visible to Google
const nearby = await getNearbyZips(zip, 25);
return (
  <section>
    <h2>Nearby ZIP codes</h2>
    <ul>
      {nearby.map(z => <li key={z}><Link href={`/${z}`}>{z}</Link></li>)}
    </ul>
  </section>
);
2b. Unconditional save tips
I added a hand-written "How to save on gas in {city}" section that rendered regardless of cache state. About 120 words of static-but-locally-relevant content per city. This is templated, but with enough variable interpolation that no two pages have identical text.
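In shape it's something like the sketch below; the helper name and the interpolated variables are illustrative stand-ins, not my exact copy:

// saveTips.ts — illustrative sketch; the real section is longer and hand-edited
export function getSaveTips(city: string, state: string, avgPrice: number): string {
  return [
    `Gas prices in ${city} can swing 20-40 cents between stations, so comparing before you fill up pays off.`,
    `The average around ${city}, ${state} is roughly $${avgPrice.toFixed(2)}/gal; warehouse clubs and grocery-chain stations usually sit below that.`,
    `Early-week fill-ups tend to beat Friday-afternoon prices in most ${state} metros.`,
  ].join(' ');
}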
2c. State backlink with name fallback
Every ZIP page already had a "Back to {state} state guide" link, but it relied on a state abbreviation lookup that returned null for some edge cases. So those pages were rendering "Back to undefined state guide" or worse, no link at all. Fixed it with a fallback:
const stateName = getStateByAbbr(state) ?? getStateByName(state) ?? state;
Small fix, but it meant every ZIP page now had a working internal link to its parent state, which closes a major site-graph gap.
After Phase 2, my SSR word count went from 640 to about 890 on previously-thin pages. That's the threshold I cared about. Google's "soft 404" verdict is based on relative content depth, not an absolute word count, but more depth is always better than less.
Phase 3: The geocoding gap I didn't know I had
While I was at it, I noticed something weird. Some ZIP pages were rendering with lat/lng of (0, 0). This made the distance calculations on the page nonsensical ("nearest gas station: 8,247 miles away"). It also meant the "nearby ZIPs" grid was showing up empty for those pages.
The cause: my ZIP-to-lat/lng resolver had a single source: zippopotam.us. It's free, fast, and most of the time correct. But for some valid US ZIPs (75072 in McKinney TX, for one), it returns a 404.
I rebuilt the resolver as a 4-tier fallback chain:
1. zipContent.json, a static file with 33,620 ZIPs and pre-resolved coords
2. Redis cache (per-request resolved coords)
3. zippopotam.us API
4. Nominatim API (slower but covers the gaps)
If all four tiers miss, the resolver falls back to a placeholder (0, 0) with degraded behavior.
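The shape of the chain, as a rough sketch — the function signature, the Redis client, and the error handling here are stand-ins, not the production module:

// resolveZipCoords.ts — illustrative sketch of the tiered lookup, not the real code
import zipContent from './zipContent.json';

const zips: Record<string, { lat: number; lng: number }> = zipContent;

export async function resolveZipCoords(zip: string, redis: any) {
  // Tier 1: static file shipped with the build
  if (zips[zip]) return zips[zip];

  // Tier 2: Redis cache of coords resolved on earlier requests
  const cached = await redis.get(`coords:${zip}`);
  if (cached) return JSON.parse(cached);

  // Tier 3: zippopotam.us
  const zp = await fetch(`https://api.zippopotam.us/us/${zip}`);
  if (zp.ok) {
    const place = (await zp.json()).places[0];
    const coords = { lat: Number(place.latitude), lng: Number(place.longitude) };
    await redis.set(`coords:${zip}`, JSON.stringify(coords));
    return coords;
  }

  // Tier 4: Nominatim (slower, but covers the ZIPs zippopotam misses)
  const nm = await fetch(
    `https://nominatim.openstreetmap.org/search?postalcode=${zip}&countrycodes=us&format=json`,
    { headers: { 'User-Agent': 'gas-price-check.com coords resolver' } }
  );
  const results = nm.ok ? await nm.json() : [];
  if (results.length) return { lat: Number(results[0].lat), lng: Number(results[0].lon) };

  // Last resort: placeholder coords; callers render degraded distance UI
  return { lat: 0, lng: 0 };
}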
I'll write up the chain in detail in a separate post. The point for this post: when GSC flagged these pages as soft 404s, the broken geocoding was part of the picture even though it wasn't the headline issue.
Phase 4: The second sweep
Three days after deploying Phases 1 through 3, I ran the GSC validation again. About 80% of the URLs had cleared. Some hadn't. So I wrote a script (find-redirect-candidates.js) that programmatically tested every plausible city slug variant against the live site:
const variants = [
  `/${city}`,
  `/${city}-${state}`,
  `/${city.replace(' ', '-')}`,
  `/cheap-gas-${city}`,
  // ...
];

for (const v of variants) {
  const res = await fetch(`https://www.gas-price-check.com${v}`);
  if (res.status === 404) console.log(v);
}
This caught 35 more 404s I hadn't found in the GSC export. Stale links in old blog posts I'd written, third-party links from a directory submission I'd forgotten about, typo URLs in my own social media posts. Each one got a redirect.
Phase 4 added 35 new redirects, bringing the total to 101. I deployed those, and the second GSC validation came back clean.
Phase 5: Per-state context (the depth fix that actually moved the needle)
After all the redirects and SSR enrichment, I still had some pages stuck in "crawled, currently not indexed." Word count was up. Internal links were up. But Google was still skeptical.
The thing I hadn't done: make the templated content actually different across pages. My "save tips" section was different by city, but my page-level content above the fold was nearly identical. A page about ZIPs in California and a page about ZIPs in Maine had no state-specific context.
I built a stateContext.ts module:
const STATE_CONTEXT: Record<string, string> = {
  CA: "California's gas prices are shaped by the state's unique CARB...",
  TX: "Texas typically has some of the lowest gas prices in the country...",
  // ...15 hand-written for top-traffic states
  // ...35 generated from a parameterized template for the rest
};

export function getStateContext(state: string): string {
  return STATE_CONTEXT[state] ?? defaultContext(state);
}
The 15 hand-written paragraphs are 80 to 120 words each. They explain the state's gas tax, refinery capacity, regulatory regime, and seasonal pricing patterns. These are the things you'd say to a friend if they asked "why are gas prices weird in California?"
The other 35 states get a templated paragraph with state-specific variables (avg price, neighboring states, gas tax rate). Templated, but with enough variation that each is unique.
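For the templated side, defaultContext (from the snippet above) looks roughly like this — the getStateStats lookup and its fields are hypothetical stand-ins for my real data layer:

// Illustrative only: the data source below is a stand-in, not real code from the repo
function defaultContext(state: string): string {
  const name = getStateByAbbr(state) ?? state;
  const { avgPrice, gasTaxCentsPerGal, neighbors } = getStateStats(state); // hypothetical lookup
  return (
    `${name} drivers currently pay around $${avgPrice.toFixed(2)}/gal on average, ` +
    `with a state gas tax of roughly ${gasTaxCentsPerGal}¢ per gallon. ` +
    `Prices tend to track neighboring ${neighbors.join(' and ')}, ` +
    `so checking ZIP codes just across the state line is sometimes worth it.`
  );
}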
After deploying Phase 5, the previously-stuck pages started getting indexed within 10 days. Not all of them. But enough that I stopped worrying about them.
Final SSR word count on a representative ZIP page (77386, Spring TX), measured post-deploy: 810 visible words, up from 635 before Phase 5 alone. The whole journey took it from 640 to 810, a +170-word lift made up mostly of per-state context plus the SSR enrichment.
The diagnostic that lied to me
One quick aside that became its own post. While diagnosing the soft 404s, I wrote a script to grep the SSR output for content markers ("does this page have an EIA average price rendered?"). The script reported zero matches across 14 URLs. I spent an hour debugging the data layer before realizing the script was broken.
The cause: React inserts HTML comments (<!-- -->) between adjacent text expressions during SSR. My regex was using [^<]+ which fails immediately when the next character is <. The data was rendering correctly the whole time. My detector was the bug.
I wrote that one up separately. The summary: strip HTML comments before any content matching on Next.js SSR output.
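A minimal illustration of the failure mode and the fix (the marker regex here is a toy, not my actual detector):

// React SSR inserts comment separators between adjacent text expressions:
const ssrHtml = '<span>Avg price: <!-- -->$3.19<!-- -->/gal</span>';

// Broken: [^<]+ needs at least one non-'<' character, but the comment starts immediately
const broken = /Avg price: [^<]+\$\d/.test(ssrHtml);   // false

// Fix: strip HTML comments before matching
const cleaned = ssrHtml.replace(/<!--[\s\S]*?-->/g, '');
const fixed = /Avg price: \$\d/.test(cleaned);          // true

console.log({ broken, fixed });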
The numbers
Before:
- 87 hard 404s in GSC
- 61 soft 404s in GSC
- ~640 SSR words on cache-miss ZIP pages
- 1 source of truth for ZIP geocoding (and gaps)
- 2-hop redirect chain from http://apex/* to https://www/*
After:
- 0 hard 404s (101 redirects in next.config.mjs, 1 Cloudflare Redirect Rule)
- 0 soft 404s after Phase 5 (per latest GSC validation)
- ~810 SSR words on the same pages
- 4-tier geocoding chain with 100% coverage of 33,620 ZIPs
- 1-hop redirect
Total commits: 7. Total deploy waves: 3 (April 17, April 23, April 27). Total weekend hours: I stopped counting somewhere around 14.
Things I'd do differently
If I were starting this side project again:
- Set up GSC URL Inspection alerts before launch. I had GSC connected but wasn't watching it daily. The 404 problem accumulated for weeks before I noticed.
- Add a curl -IL check to my deploy script. A redirect chain check would have caught the http + https + slug double-hop before it became a Google problem.
- Server-render the unique parts of every templated page from day one. Anything that varies per page should be in the SSR HTML, not the client bundle. Loading states are the enemy of programmatic SEO.
- Hand-write the top 10-20% of templated content. The remaining 80% can be generated, but the long-tail-of-the-long-tail is where Google will smell templating and dock you. Hand-writing the highest-traffic variants is high leverage.
- Keep a diff log of redirect rules. Mine grew to 101 entries in next.config.mjs and I'd already lost track of which ones I added when. A separate JSON file with timestamps would have been smarter; a rough sketch of what I mean follows this list.
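Something like this is what I have in mind — the filename and fields are hypothetical, not something that exists in the repo today:

// redirect-log.ts (hypothetical) — one entry per redirect, with provenance
export const REDIRECT_LOG = [
  {
    source: '/austin',
    destination: '/austin-tx',
    permanent: true,
    added: 'YYYY-MM-DD',
    reason: 'slug disambiguation migration',
  },
  // ...one entry appended in the same commit that ships the rule
];

// next.config.mjs could then build its redirects() array by mapping over REDIRECT_LOG
// instead of maintaining a second copy by hand.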
What I learned about Google's verdict mechanism
Three things I genuinely didn't know before this:
Soft 404 is sticky. Once Google decides a page is a soft 404, fixing the page doesn't immediately clear the verdict. You have to ask GSC to re-validate, wait 1-4 weeks for re-crawl, and accept that some of those URLs will not come back even after you fix them. The verdict has memory.
"Crawled, currently not indexed" is the bench. Google has a finite indexation budget per site. Pages they don't think are worth indexing go on the bench. You can move pages off the bench by improving them, but it's not automatic and it's not fast.
Internal redirect hops add up. Each hop is a small signal loss for Google. If your CDN does one redirect and your framework does another, you've built a chain even though each layer is individually correct. Audit your hop count.
If you've been through a similar programmatic-SEO recovery, I'd love to hear what your phase breakdown looked like. Mine was five phases over two weekends. The longest phase by far wasn't the bulk redirects (that was a few hours). It was Phase 5: the part where I had to admit my templated content was actually pretty thin and rewrite the per-state context by hand.
Templating gets you to 33,620 pages fast. Earning the right to keep those pages indexed takes longer.