DEV Community

kavela

Posted on • Originally published at beverageindex.com

Why Every Internal Link On Our Site Was a 404 (And the Smart-301 That Fixed It)

BeverageIndex.com is a programmatic SEO site indexing drink prices across 98 cities and 30 drinks. 3,728 indexable URLs. We launched it in March, submitted the sitemap to Google Search Console, and watched indexing flatline at zero for two full months.

When we finally crawled our own site with a friendly bot, the cause was embarrassingly simple: every drink and city link in our internal navigation pointed to a URL that returned 404.

The setup

bi-engine is a virtual-routing WordPress plugin (no wp_posts) backed by these tables:

Table        | Rows  | What it holds
wp_bi_drinks | 30    | id=slug; category enum: coffee/beer/wine/spirits/tea/juice/specialty/soft_drinks
wp_bi_cities | 98    | id=slug, country, currency, lat/lng
wp_bi_prices | 2,940 | drink × city, USD-normalized
wp_bi_routes | 3,727 | pre-computed /{drink}/{city}/ paths

The theme is unusual for a pSEO site: a single SPA terminal template (template-terminal.php) hydrates from a JSON payload + /wp-json/bi/v1/enriched. Country hubs, city hubs, drink hubs, category hubs, and city × drink price grids all render through the same client-side template, distinguished by URL pattern.

The URL patterns

The canonical paths are namespaced:

  • /category/{slug}/ — coffee, beer, wine, etc.
  • /city/{slug}/ — amsterdam, tokyo, istanbul…
  • /drink/{slug}/ — americano, cappuccino, coca-cola…
  • /country/{slug}/ — by name slug or ISO2
  • /{drink}/{city}/ — 3,727 price grids (e.g. /americano/amsterdam/)

The last pattern is the workhorse. It's also where the sitemap gets most of its URLs.
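To make the patterns concrete, here is a minimal sketch of how a path could be classified against them. The function name bi_classify_path and the slug regex are assumptions for illustration, not the plugin's actual routing code. Note that a slug-only path such as /americano/ matches none of the patterns.

```php
<?php
// Hypothetical helper: classify a request path against the URL patterns above.
// Returns null when no pattern matches.
function bi_classify_path(string $path): ?array
{
    // Assumed slug shape: lowercase letters/digits separated by single hyphens.
    $slug = '[a-z0-9]+(?:-[a-z0-9]+)*';

    // Namespaced hubs are checked first so /city/tokyo/ is never read as a price grid.
    if (preg_match("#^/(category|city|drink|country)/($slug)/$#", $path, $m)) {
        return ['type' => $m[1], 'slug' => $m[2]];
    }

    // Price grid: /{drink}/{city}/
    if (preg_match("#^/($slug)/($slug)/$#", $path, $m)) {
        return ['type' => 'price', 'drink' => $m[1], 'city' => $m[2]];
    }

    return null; // includes slug-only paths like /americano/
}

var_dump(bi_classify_path('/drink/americano/'));
var_dump(bi_classify_path('/americano/amsterdam/'));
var_dump(bi_classify_path('/americano/'));
```

Checking the namespaced prefixes before the two-segment pattern is a design choice in this sketch: it only works as long as no drink slug is literally "category", "city", "drink", or "country".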

The bug

Look at any internal hub page on the site. The HTML is full of links like:

<a href="/americano/">Americano</a>
<a href="/amsterdam/">Amsterdam</a>
<a href="/coffee/">Coffee</a>

Not /drink/americano/. Not /city/amsterdam/. Slug-only.

And slug-only URLs had no route. Googlebot crawled the homepage, followed every drink link, and got a 404. On a city hub, the drink × city links worked because /{drink}/{city}/ was registered, but the hub anchors themselves were dead.
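A check like this could have surfaced the problem much earlier. The sketch below scans rendered HTML for single-segment hrefs. The function name bi_find_slug_only_links is hypothetical, and the regex assumes double-quoted, root-relative hrefs like the ones shown above.

```php
<?php
// Hypothetical audit helper: list slug-only internal links (/something/) in a page.
// Prefixed hubs (/drink/x/) and price grids (/x/y/) have a second segment and are skipped.
function bi_find_slug_only_links(string $html): array
{
    preg_match_all('#<a\s[^>]*href="(/[a-z0-9-]+/)"#i', $html, $m);
    return array_values(array_unique($m[1]));
}

$html = '<a href="/americano/">Americano</a>'
      . '<a href="/amsterdam/">Amsterdam</a>'
      . '<a href="/drink/americano/">Americano</a>'
      . '<a href="/americano/amsterdam/">Americano in Amsterdam</a>';

print_r(bi_find_slug_only_links($html)); // the two slug-only hrefs
```

Each href it returns can then be requested and its status code compared against the expected 200.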

We had two reasonable choices:

  1. Rewrite all the internal links to use the canonical /category/, /drink/, /city/, /country/ prefixes. Hundreds of edits across the SPA template and JSON payload generators.
  2. Make slug-only URLs work, by 301-redirecting them to the canonical prefixed form.

We picked (2) because it's also good externally — third parties linking to beverageindex.com/cappuccino/ will resolve correctly forever, even if our internal nav drifts.

The smart-301

We added a single rewrite rule, bi_slug_only, plus a handler that does a 4-way disambiguation:

public function maybe_slug_redirect($slug) {
    // Lookup order is the precedence: drink, then city, then category, then country.
    // 1. Drink?
    if ($this->db->drink_exists($slug)) {
        wp_safe_redirect(home_url("/drink/{$slug}/"), 301); exit;
    }
    // 2. City?
    if ($this->db->city_exists($slug)) {
        wp_safe_redirect(home_url("/city/{$slug}/"), 301); exit;
    }
    // 3. Category? (URL hyphen ↔ DB underscore)
    $cat_db = str_replace('-', '_', $slug);
    if (in_array($cat_db, $this->categories, true)) {
        wp_safe_redirect(home_url("/category/{$slug}/"), 301); exit;
    }
    // 4. Country? (ISO2 or full-name slug)
    if ($country = $this->db->country_lookup($slug)) {
        wp_safe_redirect(home_url("/country/{$country->slug}/"), 301); exit;
    }
    // Unknown slug: explicit 404 rendered with the theme's template
    global $wp_query;
    $wp_query->set_404();
    status_header(404);
    nocache_headers();
    // get_404_template() returns an empty string when the theme has no 404 template
    if ($template = get_404_template()) {
        include $template;
    }
    exit;
}

The ordering matters: the first match wins, so drinks shadow cities, cities shadow categories, and categories shadow countries. We had two potential collisions to handle: coffee and soft-drinks could plausibly be both drink slugs and category slugs. They're categories, and the lookup order resolves them correctly: the drink table is checked first, but neither coffee nor soft_drinks is in wp_bi_drinks, so both fall through to the category check, where the hyphen in soft-drinks is mapped to the underscore in soft_drinks.
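The precedence is easier to test when the decision is separated from the redirect. Here is a minimal sketch of the same lookup order as a pure function. The name bi_resolve_slug, the array-based lookups, and the sample data are assumptions standing in for the plugin's database calls.

```php
<?php
// Hypothetical pure version of the disambiguation: returns the canonical path
// for a slug, or null if nothing matches. First match wins.
function bi_resolve_slug(string $slug, array $drinks, array $cities, array $categories, array $countries): ?string
{
    if (in_array($slug, $drinks, true)) {
        return "/drink/{$slug}/";
    }
    if (in_array($slug, $cities, true)) {
        return "/city/{$slug}/";
    }
    // URL slugs use hyphens; the category enum uses underscores.
    if (in_array(str_replace('-', '_', $slug), $categories, true)) {
        return "/category/{$slug}/";
    }
    // $countries maps both ISO2 codes and name slugs to the canonical name slug.
    if (isset($countries[$slug])) {
        return "/country/{$countries[$slug]}/";
    }
    return null;
}

// Illustrative sample data only, not the real tables.
$drinks     = ['americano', 'cappuccino', 'coca-cola'];
$cities     = ['amsterdam', 'tokyo', 'istanbul'];
$categories = ['coffee', 'beer', 'wine', 'spirits', 'tea', 'juice', 'specialty', 'soft_drinks'];
$countries  = ['nl' => 'netherlands', 'netherlands' => 'netherlands'];

var_dump(bi_resolve_slug('soft-drinks', $drinks, $cities, $categories, $countries));
var_dump(bi_resolve_slug('nl', $drinks, $cities, $categories, $countries));
```

A test suite over this function can assert that coffee and soft-drinks resolve to categories, and will fail loudly if a future drink row ever introduces a real collision.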

The other 301 we didn't notice

While fixing this, we caught a second crawl-budget leak. The sitemap index bi-sitemap.xml and its five children (bi-sitemap-cities.xml, bi-sitemap-drinks.xml, bi-sitemap-categories.xml, bi-sitemap-countries.xml, bi-sitemap-prices-1.xml) were all 301-redirecting to a trailing-slash variant before serving. Every Googlebot fetch was burning two requests instead of one.

WordPress's redirect_canonical() was the culprit. The fix hooks the filter of the same name and returns false for sitemap URLs, which cancels the redirect:

add_filter('redirect_canonical', [$this, 'skip_canonical_for_sitemap'], 10, 2);

public function skip_canonical_for_sitemap($redirect_url, $requested_url) {
    if (strpos($requested_url, 'bi-sitemap') !== false) {
        return false;
    }
    return $redirect_url;
}

Now all six sitemap files return 200 directly.
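The substring check above also matches any other URL that happens to contain bi-sitemap, such as a future blog post slug. A tighter variant, sketched here as an assumption rather than what the plugin ships, matches only the sitemap filenames listed earlier and assumes the site is installed at the domain root. The function name bi_is_sitemap_request is hypothetical.

```php
<?php
// Hypothetical stricter matcher: true only for /bi-sitemap.xml and
// children such as /bi-sitemap-cities.xml or /bi-sitemap-prices-1.xml.
function bi_is_sitemap_request(string $requested_url): bool
{
    $path = parse_url($requested_url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // malformed URL or no path component
    }
    return preg_match('#^/bi-sitemap(?:-[a-z]+(?:-\d+)?)?\.xml$#', $path) === 1;
}

var_dump(bi_is_sitemap_request('https://beverageindex.com/bi-sitemap-prices-1.xml'));
var_dump(bi_is_sitemap_request('https://beverageindex.com/blog/bi-sitemap-tips/'));
```

Inside the filter callback, the strpos test could be swapped for a call to this function without changing anything else.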

IndexNow: not all engines treat all domains equally

We push fresh slugs to IndexNow on publish. beverageindex's IndexNow key (0047e09268ff…ccb) is happily accepted by both Yandex (HTTP 202) and api.indexnow.org / Bing (HTTP 200).

Meanwhile, our sister site assetvs.com gets 403 UserForbiddedToAccessSite from api.indexnow.org, while Yandex still accepts. Same key format, same payload, different domain reputation — assetvs probably tripped a Bing trust signal during its 2-month zero-indexing window. The lesson: if you operate a network, don't assume IndexNow status from one domain transfers to another. Test each.
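For per-domain testing, it helps to build the submission body in one place and judge each endpoint's response separately. The sketch below follows the JSON shape described in the IndexNow documentation (host, key, keyLocation, urlList). The function names, the EXAMPLE_KEY placeholder, and the assumption that the key file sits at the site root are illustrative, not taken from our plugin.

```php
<?php
// Hypothetical helper: build the JSON body for an IndexNow POST.
// Assumes the key file is served at https://{host}/{key}.txt.
function indexnow_payload(string $host, string $key, array $urls): string
{
    return json_encode([
        'host'        => $host,
        'key'         => $key,
        'keyLocation' => "https://{$host}/{$key}.txt",
        'urlList'     => array_values($urls),
    ], JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR);
}

// Treat 200 and 202 as accepted; anything else (such as 403) needs a per-domain look.
function indexnow_accepted(int $status): bool
{
    return $status === 200 || $status === 202;
}

echo indexnow_payload('beverageindex.com', 'EXAMPLE_KEY', [
    'https://beverageindex.com/drink/americano/',
]);
```

Running the same payload builder against every domain and every endpoint, and logging the status per pair, is what exposes a difference like the one between beverageindex and assetvs.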

Where indexing actually stands

Fix shipped. Sitemap re-submitted via the GSC Sitemaps API (PUT returns 204). 3,728 URLs are now linkable from anywhere on the site. Whether that converts into indexing in the next two weeks is the real test.
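For reference, the resubmission is a single authenticated PUT. The sketch below only builds the request URL, assuming the Search Console API's sitemaps endpoint under webmasters/v3 and a URL-prefix property. The function name gsc_sitemap_submit_url is hypothetical, and OAuth handling is left out.

```php
<?php
// Hypothetical helper: build the Search Console sitemap-submit URL.
// Both the property URL and the sitemap URL are path segments, so each is percent-encoded.
function gsc_sitemap_submit_url(string $site_url, string $sitemap_url): string
{
    return 'https://www.googleapis.com/webmasters/v3/sites/'
        . rawurlencode($site_url)
        . '/sitemaps/'
        . rawurlencode($sitemap_url);
}

echo gsc_sitemap_submit_url(
    'https://beverageindex.com/',
    'https://beverageindex.com/bi-sitemap.xml'
);
```

If the property is registered as a domain property rather than a URL prefix, the first argument would take the sc-domain: form instead.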

The broader lesson is one we've now repeated across multiple network properties: for a new pSEO domain, the technical indexing tax shows up as silent 4xx noise, not as a single dramatic failure. A trailing-slash 301 here, a 404'd internal link there, a missing FX table on another property. None of it is loud. All of it compounds.

Live site: beverageindex.com
