DEV Community

kavela
kavela

Posted on • Originally published at evrangeguide.com

Pruning a Sitemap From 327,621 URLs to 2,483 (And Why That Was the Right Call)

evrangeguide.com is a programmatic SEO site for EV charging routes — 27 countries, 370 cities, 24 vehicles, and over two thousand canonical inter-city routes. When we totaled the URL space across every page-type combination, we got 327,621 URLs. We listed every one of them in the sitemap.

Google's response was decisive: it indexed almost none of them, and the crawl-budget metrics in Search Console got worse week over week.

This is the story of cutting that sitemap by 99.2% — down to 2,483 URLs — and why scarcity beat coverage.

The architecture

ev-engine is the same family of virtual-routing plugin we use across the network. No wp_posts. The data lives in custom tables:

Table Rows Holds
wp_ev_routes 2,044 inter-city routes (from-to pairs)
wp_ev_cities 370 city metadata + lat/lng
wp_ev_countries 27 country hubs
wp_ev_vehicles 24 EV models with battery + range specs
wp_ev_route_stations ~2.36M charger stops along each route
wp_ev_route_vehicles 147K per-vehicle stop plans (route × vehicle)

From these, the router (EV_Router) projects URL patterns:

  • /route/{from}-to-{to}/ — 2,044 base routes
  • /route/{from}-to-{to}/{vehicle}/ — 2,044 × 24 = ~49K route-vehicle pages
  • /route/{from}-to-{to}/{tier}/ — long/mid/short-range tier views
  • /city/{slug}/, /country/{slug}/, /ev/{vehicle}/, /glamour/{slug}/
  • Index pages: /routes/, /cities/, /countries/, /vehicles/, /glamour/
  • Multilingual prefixes for tr/de/fr/es (UI-translated, body still EN)

Multiplied across language prefixes and tier slices, the total grew past 327K.

Why we listed all of them

The naive pSEO instinct: more URLs is more surface area. If a long-tail searcher types "Tesla Model 3 Berlin to Munich charging stops," we wanted a page tuned to that exact query. So we generated /route/berlin-to-munich/tesla-model-3/ and 49K of its siblings, then submitted them all.

Google's reality: for a new EV-niche domain with low brand authority, indexing budget is allocated as a function of trust × content uniqueness × discovery cost. We were maximizing the third variable while having close to zero of the first.

What the data showed

Two months in, GSC had:

  • ~327K URLs discovered but flagged "Crawled – currently not indexed" or "Discovered – currently not indexed"
  • A handful of routes indexed; most of the indexed pages were the small set of unique hubs (countries, vehicles)
  • Crawl rate stuck low because Google was sampling thousands of near-duplicate route-vehicle and tier slices

The route-vehicle pages were structurally near-duplicate: same route metadata, same map, same station list, with one paragraph of vehicle-specific stop math. The tier pages (long/mid/short-range) were even worse — they sliced the same route by an arbitrary battery-range threshold.

We had built thousands of pages that, from Google's perspective, were dilutions of one good page.

The prune

We kept what was structurally unique and dropped the rest from the sitemap:

Page type Kept in sitemap Behavior
Routes 2,044 indexable, in sitemap
Cities 370 indexable, in sitemap
Countries 27 indexable, in sitemap
Vehicles 24 indexable, in sitemap
Glamour pages 18 curated highlights, in sitemap
Route × tier 0 noindex, follow
Route × vehicle 0 noindex, follow
Multilingual (non-EN) 0 noindex (UI translation only, body EN)

The templates that emit noindex, follow (route-range.php, route-vehicle-weather.php) keep crawl flow going — internal links from these pages still pass discovery to the canonical route page — but the pages themselves do not compete for index slots. New sitemap total: 2,483 URLs.

The single-DB migration that made this possible

A quick aside, because it's part of the same story. Until early May, the EV data tables (wp_ev_*) lived on a separate Hostinger Business database, while the WP core tables lived on contabo. The plugin had a custom ev_data_db() function that returned a second wpdb connection.

This worked for a year, then quietly broke after a Hostinger network change: cross-host queries started timing out under crawler load. The fix:

  1. Whitelist contabo's IP on Hostinger Remote MySQL (92.113.22.128, IPv4-forced — their host is IPv4-only).
  2. Dump the EV tables (88MB gzipped) and import into wp_evrangeguide on contabo's local MariaDB.
  3. Patch ev_data_db() to return $wpdb directly. Single connection from then on.
public function ev_data_db() {
    global $wpdb;
    return $wpdb;
}
Enter fullscreen mode Exit fullscreen mode

While we were in there we caught one more bug: a related-routes query that was casting distance into a BIGINT UNSIGNED and underflowing on "distance less than threshold" comparisons. Fix:

WHERE CAST(r.distance_km AS SIGNED) < {threshold}
Enter fullscreen mode Exit fullscreen mode

Without that cast, MariaDB was returning empty result sets for any negative comparison (which is half of all related-route queries).

IndexNow + GSC API

We re-submit the pruned sitemap via the GSC Sitemaps API after each rebuild. The shared service account (nererdenizlenir@dekorfiyat.iam.gserviceaccount.com) is siteOwner on every property in our network, so a single key lets us PUT to:

https://www.googleapis.com/webmasters/v3/sites/{site}/sitemaps/{feedpath}
Enter fullscreen mode Exit fullscreen mode

Returns 204; Google re-fetches the sitemap within 24h. We push fresh URL changes via IndexNow (key at the webroot, batched to api.indexnow.org). Google's old /ping?sitemap= endpoint is fully deprecated since 2023, so the GSC API path is the only programmatic re-submit channel.

What we'd tell our March-self

For a new pSEO domain, the right shape of the sitemap is:

  1. Hubs first. Cities, countries, vehicles, categories — these are the pages with strongest internal-link equity and the easiest to make uniquely useful.
  2. One canonical layer per long-tail query. Routes go in; route-vehicle and route-tier slices stay live but noindex, follow. Internal navigation is the discovery path; the sitemap is the indexing intent.
  3. Cap multilingual to noindex until you have the translation budget for genuine localization. Body-language mismatch (English body under a de/ URL) is worse than no localization at all.
  4. Resubmit via the GSC API after every prune. It's a 204 response. Don't wait for Googlebot to discover the change.
  5. Watch "Discovered – currently not indexed" as a primary metric, not a footnote. If it's growing faster than "Indexed," you're polluting the surface area.

When you're sitting on 327K URLs and Google is indexing a few hundred, the fastest path to better indexing isn't more content. It's less surface area, sharper hubs, and patience.

Live site: evrangeguide.com

Top comments (0)