DEV Community

kavela
kavela

Posted on • Originally published at gdpindex.org

Building a 32-URL economy microsite on top of a 754,000-row SQLite dataset

Building a 32-URL economy microsite on top of a 754,000-row SQLite dataset

We launched GDPIndex, a reference site for the twenty G20 economies. The numeric data sits in a 152 MB SQLite file (216 countries × 102 indicators × 754,809 rows of annual country-data). The interesting part of the build is not the dataset itself but how we exposed only a tiny, focused slice of it — and why we resisted the urge to publish thousands of pages.

This post is about the WordPress side of that decision: how the site routes 32 URLs against a half-million-row store without going near WP_Query, and how we split structured numbers from country-specific editorial narrative so each page has a different skeleton.

Why 32 URLs

The same SQLite already powers a much larger reference site — HistorySaid — with about 600 indexed URLs across 216 countries and 9 economic themes. GDPIndex is a deliberate subset:

  • 20 country pages at /{country}/economy/ (G20: 19 nations + EU)
  • 9 ranking pages at /rankings/{indicator}/ covering the GDP-family series
  • 3 static (about, methodology, privacy, terms)

That cap is the point. Google now treats large auto-generated reference sites with suspicion, and we already had the per-country data — what was missing was editorial reason for each page to exist. Twenty G20 economies, each with a structural story that has been the subject of decades of literature, is a defensible scope.

Virtual routing without WP_Query

WordPress's rewrite system is great for posts, taxonomies, and the bits of the URL space that map onto database records. It is terrible for serving 30 URLs whose content is computed at request time from data in a sibling table. Every workaround we have seen — custom post types, ACF-driven templates, rewrite endpoints — ends up either creating 30 dummy posts to "host" the URLs or fighting the parse_request lifecycle.

A cleaner approach: never let WordPress think about these URLs at all. Hook template_redirect at priority 0, parse $_SERVER['REQUEST_URI'] yourself, and short-circuit the response if the path matches one of your virtual routes.

class GI_Router {
    public function __construct(GI_DB $db, $engine) {
        add_filter('wp_sitemaps_enabled', '__return_false');
        add_action('parse_request', [$this, 'early_intercept'], 0);
        add_action('template_redirect', [$this, 'handle_request'], 0);
    }

    public function handle_request(): void {
        $path = trim(strtok($_SERVER['REQUEST_URI'], '?'), '/');
        if ($path === '') return;
        $segs = array_values(array_filter(explode('/', $path)));
        $route = $this->resolve($segs);
        if (!$route) return;

        global $hs_route;
        $hs_route = $route;

        status_header(200);
        global $wp_query;
        $wp_query->is_404 = false;
        $wp_query->is_page = true;

        get_header();
        include get_stylesheet_directory() . '/templates/' . $route['template'] . '.php';
        get_footer();
        exit;
    }
    // ...
}
Enter fullscreen mode Exit fullscreen mode

The resolver is a single switch on path shape:

private function resolve(array $segs): ?array {
    $g20 = gi_g20_map();          // ['turkey' => 'turkiye', ...]
    $gdp_inds = gi_gdp_indicators();

    // /{country}/economy
    if (count($segs) === 2 && $segs[1] === 'economy' && isset($g20[$segs[0]])) {
        return [
            'type' => 'country_economy',
            'template' => 'country-economy',
            'url_slug' => $segs[0],
            'db_slug'  => $g20[$segs[0]],
            // ...
        ];
    }

    // Bare /{country}/ → 301 to /economy
    if (count($segs) === 1 && isset($g20[$segs[0]])) {
        return ['redirect' => '/' . $segs[0] . '/economy/'];
    }

    // /rankings/{indicator}
    if (count($segs) === 2 && $segs[0] === 'rankings' && in_array($segs[1], $gdp_inds, true)) {
        // ...
    }

    return null; // not our URL — let WP handle it
}
Enter fullscreen mode Exit fullscreen mode

Returning null is critical: anything we don't claim falls through to WordPress, so /wp-admin/, /wp-json/, the favicon, and robots.txt all keep working.

We also intercept sitemap.xml, robots.txt, and llms.txt on parse_request (earlier than template_redirect), because WordPress 5.5+ ships a core sitemap at /wp-sitemap.xml and will 301 you there before your template hook runs. wp_sitemaps_enabledfalse disables that core handler.

Two slugs per country

The dataset and the URL space don't agree on slugs. SQLite has russian-federation, turkiye, korea, european-union. The URLs we want are russia, turkey, south-korea, eu. The router holds a single canonical map:

function gi_g20_map(): array {
    return [
        'argentina'      => 'argentina',
        'australia'      => 'australia',
        // ...
        'russia'         => 'russian-federation',
        'south-korea'    => 'korea',
        'turkey'         => 'turkiye',
        'united-kingdom' => 'united-kingdom',
        'united-states'  => 'united-states',
    ];
}
Enter fullscreen mode Exit fullscreen mode

Every layer that touches a country goes through this map: the router resolves URL → DB slug, the data layer queries by DB slug, the page renderer flips back to the URL slug when emitting internal links, and the JSON-LD BreadcrumbList uses URL slugs end-to-end. Putting the map in one function and refusing to allow ad-hoc string substitution anywhere else is what keeps the site consistent.

Two custom tables: structure plus narrative

The page template is shared across all 20 countries (hero + KPI strip + chart-bearing sections + FAQ + peer-rank table), but the content of each section is different per country. The "middle-income trap" section appears on Türkiye, China and Brazil; it does not appear on Germany. So we cannot store sections as a fixed schema — we store them as JSON.

CREATE TABLE wp_gi_country (
    slug VARCHAR(64) NOT NULL PRIMARY KEY,
    name VARCHAR(128) NOT NULL,
    iso3 CHAR(3),
    headline_question TEXT,
    intro_html LONGTEXT,
    sections_json LONGTEXT,   -- [{heading, body_html, charts:[...], tables:[...]}]
    faq_json TEXT,            -- [{q, a}]
    word_count INT DEFAULT 0,
    generated_at DATETIME,
    locked TINYINT(1) DEFAULT 0
);

CREATE TABLE wp_gi_ranking (
    slug VARCHAR(64) NOT NULL PRIMARY KEY,
    name VARCHAR(128) NOT NULL,
    analysis_html LONGTEXT,
    word_count INT DEFAULT 0,
    generated_at DATETIME
);
Enter fullscreen mode Exit fullscreen mode

A single PHP page template iterates sections_json and renders each section with its own chart and table specs. Chart specs reference the indicator slug (gdp, gdp-per-capita, gdp-growth, …); the template loads the time-series from SQLite at render time and ships a Chart.js config to the browser as a data-config attribute.

This separation also matters for editorial workflow: numbers come from SQLite (refreshed nightly from the source feed), and narrative lives in MySQL alongside the rest of WordPress. We can re-render the narrative without rebuilding the data, and we can refresh the data without touching the narrative.

The page-level "linker matrix"

To make 20 country pages a useful cluster instead of 20 dead ends, the template emits the same internal-link block on every page, but with the current country removed and a peer-selection step on top:

$peers = array_diff(array_keys(gi_g20_map()), [$url_slug]);
shuffle($peers);
$sample = array_slice($peers, 0, 5);
Enter fullscreen mode Exit fullscreen mode

Plus a static link to /rankings/gdp/ from every country page, and a <th>Country</th> cell on every ranking row that links back to /{country}/economy/. Every node in the 29-page graph (20 country + 9 ranking) has at least eight outbound internal links, and every node has at least nine inbound links from peer pages.

Sitemap and JSON-LD discipline

With 32 URLs and a small fixed shape, the sitemap is a 75-line PHP function:

public function urls(): array {
    $urls = [
        ['loc' => home_url('/'), 'changefreq' => 'weekly', 'priority' => '1.0'],
        ['loc' => home_url('/about/'), 'changefreq' => 'yearly', 'priority' => '0.4'],
        // ...
    ];
    foreach (array_keys(gi_g20_map()) as $slug) {
        $urls[] = ['loc' => home_url('/' . $slug . '/economy/'),
                   'changefreq' => 'monthly', 'priority' => '0.9'];
    }
    foreach (gi_gdp_indicators() as $ind) {
        $urls[] = ['loc' => home_url('/rankings/' . $ind . '/'),
                   'changefreq' => 'monthly', 'priority' => '0.7'];
    }
    return $urls;
}
Enter fullscreen mode Exit fullscreen mode

Each country page emits an @graph JSON-LD payload with five nodes: Article, Country, one StatisticalDataset per data table cited, FAQPage, and BreadcrumbList. Ranking pages add ItemList whose itemListElement[].url points back at the country profile. Inspecting any page with Google's Rich Results Test surfaces all five entity types correctly.

What we kept out

A few things we explicitly did not build because they would have undermined the focus:

  • No author pages or post archives. This is a reference site; it has no blog.
  • No tag taxonomy. Each country has a fixed analytical angle; adding a "tags" axis would dilute the structure.
  • No automatic monthly republish. Data refreshes nightly, but the narrative is only regenerated when an inflection year appears in the underlying series or when a major policy event materially changes the structural picture.
  • No comment system. It is a citation surface, not a forum.

What's published

Live at gdpindex.org. Twenty country profiles, nine ranking pages, three static pages. Total page count: 32. Average word count per country profile: ~4,500.

The same pattern (custom router → custom tables → bespoke per-instance schema) generalises to any focused reference site sitting on top of structured data. If you find yourself fighting WP_Query to expose computed content, take five hours and write the router instead. WordPress is happy to step out of the way once you ask it firmly.


Originally published at gdpindex.org.

Top comments (0)