DEV Community

kavela

Turning World Bank Data Into 50K+ Searchable Pages with WordPress

What if you could make decades of World Bank and IMF economic data actually accessible and browsable - not buried in spreadsheets and PDF reports that nobody reads?

That's what we built with historysaid.com: a programmatic SEO site that transforms raw international development data into 50,000+ structured, searchable pages. Every country, every indicator, every year - all queryable, all browsable, all indexed by Google.

This post covers the architectural thinking behind it and what we learned building it.

The Data Problem

The World Bank and IMF publish some of the richest economic datasets on the planet:

  • GDP, inflation, trade balances, debt levels for 200+ countries
  • Time series spanning 60+ years (some indicators go back to the 1960s)
  • Hundreds of unique economic indicators covering everything from agricultural output to internet penetration rates
  • Regular updates as new data gets published quarterly or annually

But the official portals are designed for researchers and economists who already know what they're looking for. You need to know specific indicator codes and use clunky query builders to extract data into spreadsheets.

There's no way to just... explore. To browse. To stumble upon interesting economic stories by clicking around.

We wanted to change that.

Why Not Just Build a Dashboard?

We considered building a single-page dashboard app with interactive charts and filters. But dashboards have a fundamental SEO problem: they're one URL. Google can't index the state of your filters. If someone searches "Turkey GDP growth history", a dashboard app won't rank because that specific view doesn't have its own URL.

Programmatic SEO solves this. Each unique combination of country + indicator gets its own page, its own URL, its own title, and its own meta description. Google can index all 50K of them.

We chose WordPress for the same reasons we used it for startup-cost.com (see our previous post): cheap hosting, familiar ecosystem, and a powerful rewrite engine that nobody uses to its full potential.

The Architecture - Overview

We built a custom WordPress plugin that handles everything from data ingestion to page rendering.

Data Pipeline

The data flows through several stages:

World Bank API --> Fetch & Parse --> Validate --> Normalize --> MySQL
IMF Data Portal --> Fetch & Parse --> Validate --> Normalize --> MySQL
                                                                  |
MySQL --> Virtual URL Routing --> Template Engine --> HTML Page

Each data source has its own ingestion logic because the API formats differ significantly. The World Bank provides a well-documented REST API with JSON responses, while IMF data comes in a different structure. We wrote adapters that normalize both into a common internal format.
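To make the adapter idea concrete, here is a minimal Python sketch, not the plugin's actual code. The `DataPoint` shape, the function names, and the nested IMF-style input are assumptions made for illustration. The World Bank adapter assumes the two-element `[metadata, rows]` JSON response the v2 API returns for annual data. All numbers are placeholders, not real figures.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class DataPoint:
    """Common internal format shared by every source (hypothetical shape)."""
    country: str               # ISO 3166-1 alpha-3 code
    indicator: str             # indicator code as published by the source
    year: int
    value: Optional[Decimal]   # None means "not reported", never zero
    source: str


def _to_decimal(raw) -> Optional[Decimal]:
    # Go through str() so a JSON float such as 123.5 keeps its printed digits.
    return None if raw is None else Decimal(str(raw))


def from_world_bank(payload: list) -> list[DataPoint]:
    """Adapt a World Bank v2 JSON response: [page_metadata, [rows...]]."""
    _meta, rows = payload
    return [
        DataPoint(
            country=row["countryiso3code"],
            indicator=row["indicator"]["id"],
            year=int(row["date"]),          # assumes annual data, e.g. "2021"
            value=_to_decimal(row["value"]),
            source="worldbank",
        )
        for row in rows or []
    ]


def from_imf(payload: dict) -> list[DataPoint]:
    """Adapt a nested {indicator: {country: {year: value}}} mapping.

    This input shape is an assumption for illustration only.
    """
    return [
        DataPoint(country, indicator, int(year), _to_decimal(value), "imf")
        for indicator, by_country in payload.items()
        for country, by_year in by_country.items()
        for year, value in by_year.items()
    ]


# Placeholder payloads; the values are made up.
wb_sample = [
    {"page": 1, "pages": 1, "per_page": 50, "total": 2},
    [
        {"indicator": {"id": "NY.GDP.MKTP.CD", "value": "GDP (current US$)"},
         "countryiso3code": "TUR", "date": "2021", "value": 123.5},
        {"indicator": {"id": "NY.GDP.MKTP.CD", "value": "GDP (current US$)"},
         "countryiso3code": "TUR", "date": "2020", "value": None},
    ],
]
imf_sample = {"EXAMPLE_IND": {"TUR": {"2021": 1.5, "2020": None}}}

points = from_world_bank(wb_sample) + from_imf(imf_sample)
```

Everything downstream of the adapters only ever sees `DataPoint` objects, so adding a third source means writing one more function rather than touching the storage or rendering code.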

The pipeline runs on a scheduled basis. When new data is published by either source, our next run picks it up automatically and updates the relevant records.

Data Quality Challenges

Working with international economic data is messier than you'd expect:

  • Missing values everywhere - Some countries don't report certain indicators for certain years. We handle nulls gracefully rather than showing zeros.
  • Delayed reporting - Some nations publish data 2-3 years late. Our pages show the most recent available data and clearly indicate the time period.
  • Unit inconsistency - Some values are in current USD, some in constant USD, some in percentages. Each indicator carries its unit metadata.
  • Country code mismatches - The World Bank uses ISO 3166-1 alpha-3 codes, the IMF sometimes uses its own codes. Our normalization layer handles the mapping.
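A rough Python sketch of how these four cases could be handled. The mapping table, the unit names, and the function names are hypothetical, and the source-specific codes (`X01`, `X02`) are made up rather than taken from any real code list.

```python
from typing import Optional

# Hypothetical mapping from a source-specific code to ISO 3166-1 alpha-3.
SOURCE_TO_ISO3 = {"X01": "TUR", "X02": "NGA"}


def to_iso3(code: str) -> Optional[str]:
    """Return the ISO alpha-3 code, or None if the code cannot be mapped."""
    code = code.strip().upper()
    if code in SOURCE_TO_ISO3:
        return SOURCE_TO_ISO3[code]
    # Pass through anything that already looks like an alpha-3 code.
    return code if len(code) == 3 and code.isalpha() else None


def format_value(value, unit: str) -> str:
    """Format a value using its unit metadata; missing data is never shown as 0."""
    if value is None:
        return "n/a"
    if unit == "percent":
        return f"{value:.1f}%"
    if unit in ("current_usd", "constant_usd"):
        return f"${value:,.0f}"
    return f"{value:,}"


def latest_available(series: dict) -> Optional[tuple]:
    """Return (year, value) for the most recent year that has a reported value."""
    reported = [(year, value) for year, value in series.items() if value is not None]
    return max(reported, key=lambda pair: pair[0]) if reported else None
```

The important distinction is between `None` and `0`: a genuine zero is formatted like any other number, while a missing value renders as "n/a" and is skipped when picking the latest year.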

Database Design Principles

We use custom MySQL tables (not wp_posts) following the same pattern from our startup-cost.com engine. The key design decisions:

  • Proper normalization - Countries, indicators, and data points are separate tables with foreign key relationships
  • Appropriate data types - We use high-precision decimal types for economic values because the data ranges from tiny percentages to trillion-dollar GDP figures. Floating point would introduce precision errors that data-savvy users would notice.
  • Strategic indexing - Our most common query patterns (all data for a country+indicator, all countries for an indicator+year) each have compound indexes that resolve in single-digit milliseconds
  • Scale - The main table holds roughly 4 million data points, and indexed lookups still return in under 10ms
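The schema below is a sketch of these principles, not the production schema. It uses Python's built-in SQLite as a stand-in for MySQL so the example runs anywhere. Table names, column names, and index names are assumptions, and the values are placeholders. SQLite has no fixed-point type, so values are stored as decimal strings here; in MySQL the same column would be a `DECIMAL`.

```python
import sqlite3
from decimal import Decimal

SCHEMA = """
CREATE TABLE countries (
    id   INTEGER PRIMARY KEY,
    iso3 TEXT NOT NULL UNIQUE,
    name TEXT NOT NULL
);
CREATE TABLE indicators (
    id   INTEGER PRIMARY KEY,
    code TEXT NOT NULL UNIQUE,
    unit TEXT NOT NULL
);
CREATE TABLE data_points (
    country_id   INTEGER NOT NULL REFERENCES countries(id),
    indicator_id INTEGER NOT NULL REFERENCES indicators(id),
    year         INTEGER NOT NULL,
    value        TEXT  -- decimal string, NULL when not reported
);
-- One compound index per common query pattern.
CREATE UNIQUE INDEX idx_country_indicator_year
    ON data_points (country_id, indicator_id, year);
CREATE INDEX idx_indicator_year
    ON data_points (indicator_id, year);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO countries VALUES (1, 'TUR', 'Turkey')")
conn.execute("INSERT INTO indicators VALUES (1, 'NY.GDP.MKTP.CD', 'current_usd')")
conn.executemany(
    "INSERT INTO data_points VALUES (1, 1, ?, ?)",
    [(2020, "100.10"), (2021, "100.20"), (2022, None)],
)


def series(country_id: int, indicator_id: int) -> list:
    """All data for one country + indicator, oldest year first."""
    rows = conn.execute(
        "SELECT year, value FROM data_points "
        "WHERE country_id = ? AND indicator_id = ? ORDER BY year",
        (country_id, indicator_id),
    )
    return [(year, None if value is None else Decimal(value)) for year, value in rows]


# Ask the planner which index serves the country + indicator lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT year, value FROM data_points "
    "WHERE country_id = 1 AND indicator_id = 1 ORDER BY year"
).fetchall()
```

Checking the query plan, with `EXPLAIN` in MySQL or `EXPLAIN QUERY PLAN` here, is a quick way to confirm that each page type's query is served by an index rather than a table scan.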

Page Types

Our routing creates four types of pages:

  1. Country pages - Overview of all available indicators for a country. Shows key stats, latest values, and links to explore each indicator in depth.
  2. Country + Indicator pages - The core of the site. Detailed time series data with charts, data tables, summary statistics, and trend analysis. This is the bulk of our 50K pages.
  3. Indicator pages - Global comparison view. Shows all countries ranked by a specific indicator, with the ability to see how they compare.
  4. Comparison pages - Side-by-side country comparisons for a given indicator. Perfect for searches like "Japan vs South Korea GDP."

Zero rows in wp_posts. Everything is computed from data tables on each request (with caching for popular pages).
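The routing idea can be illustrated outside WordPress with a few regular expressions. This Python sketch is language-agnostic on purpose, and the URL scheme in it is hypothetical since the site's real paths are not described here. In WordPress itself, patterns like these would typically be registered through the rewrite API (for example `add_rewrite_rule()`) and mapped to query variables.

```python
import re
from typing import Optional

SLUG = r"[a-z0-9-]+"

# Hypothetical URL scheme: one pattern per page type.
ROUTES = [
    ("country_indicator",
     re.compile(rf"^/country/(?P<country>{SLUG})/(?P<indicator>{SLUG})/$")),
    ("country",
     re.compile(rf"^/country/(?P<country>{SLUG})/$")),
    ("indicator",
     re.compile(rf"^/indicator/(?P<indicator>{SLUG})/$")),
    ("comparison",
     re.compile(rf"^/compare/(?P<indicator>{SLUG})/(?P<a>{SLUG})/(?P<b>{SLUG})/$")),
]


def resolve(path: str) -> Optional[tuple[str, dict]]:
    """Map a request path to (page_type, params), or None for a 404."""
    for page_type, pattern in ROUTES:
        match = pattern.match(path)
        if match:
            return page_type, match.groupdict()
    return None
```

For example, `resolve("/compare/gdp/japan/south-korea/")` yields the `comparison` page type with the two country slugs as parameters. The resolved parameters then drive a database query and a template, so no page needs to exist as a stored post.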

Charts and Data Display

Each data page includes an interactive chart (using a lightweight client-side charting library) and a full data table. The chart data is embedded as JSON in the page - fast, cacheable, and SEO-friendly since the actual values are also present in the HTML table.
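As an illustration of that dual output, here is a small Python sketch that renders the same series twice: once as an HTML table for crawlers and readers, once as embedded JSON for the chart. The function name, the element `id`, and the markup are assumptions, not the site's actual template.

```python
import html
import json


def render_data_block(series: list) -> str:
    """Return an HTML table plus the same data as embedded JSON for the chart."""
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(
            year, "n/a" if value is None else html.escape(str(value))
        )
        for year, value in series
    )
    # Escape "</" so the payload can never close the script element early.
    payload = json.dumps(
        [{"year": year, "value": value} for year, value in series]
    ).replace("</", "<\\/")
    return (
        f"<table><tbody>\n{rows}\n</tbody></table>\n"
        f'<script type="application/json" id="chart-data">{payload}</script>'
    )
```

A client-side script can then read the element with `id="chart-data"`, parse it, and hand the points to whichever charting library is in use. Missing years stay `null` in the JSON so the chart can show a gap instead of a zero.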

We also calculate and display summary statistics: latest value, historical min/max, average, and trend direction. These make each page genuinely informative rather than just a raw data dump.
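A minimal Python sketch of such a summary. The trend rule used here (compare the latest reported value with the earliest one) is an assumption chosen for simplicity; a real implementation might use a regression slope or a recent-years window instead.

```python
from statistics import mean
from typing import Optional


def summarize(series: dict) -> Optional[dict]:
    """Summary statistics over the reported (non-null) values of a series."""
    reported = sorted((year, value) for year, value in series.items()
                      if value is not None)
    if not reported:
        return None
    values = [value for _, value in reported]
    first = reported[0][1]
    latest_year, latest = reported[-1]
    if latest > first:
        trend = "up"
    elif latest < first:
        trend = "down"
    else:
        trend = "flat"
    return {
        "latest_year": latest_year,
        "latest": latest,
        "min": min(values),
        "max": max(values),
        "average": mean(values),
        "trend": trend,
    }
```

Years with no reported value are excluded before any statistic is computed, so a gap in the data does not drag the average or the minimum toward zero.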

SEO Strategy

Every page gets unique, data-driven SEO elements:

  • Dynamic titles that include the country name and indicator name
  • Meta descriptions that include actual data values ("Turkey's GDP was $X in 2024. Explore the full trend from 1960 to 2024...")
  • Schema.org Dataset markup so Google understands these are data pages with temporal and spatial coverage
  • Breadcrumb navigation for clear site hierarchy

Each meta description contains real numbers from the data, so no two pages end up with the same description.
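Here is a hedged Python sketch of how those elements could be generated from a single series. The function name, the title format, and the `$` prefix are assumptions. `temporalCoverage` and `spatialCoverage` are real Schema.org properties available on `Dataset`. The numbers in any example call are placeholders.

```python
import json


def seo_elements(country: str, indicator: str, series: dict,
                 unit_prefix: str = "$") -> dict:
    """Build a title, meta description and Dataset JSON-LD from real values."""
    reported = {year: value for year, value in series.items() if value is not None}
    if not reported:
        raise ValueError("cannot describe a series with no reported values")
    first_year, latest_year = min(reported), max(reported)
    latest = reported[latest_year]

    title = f"{country} {indicator} ({first_year}-{latest_year})"
    description = (
        f"{country}'s {indicator} was {unit_prefix}{latest:,} in {latest_year}. "
        f"Explore the full trend from {first_year} to {latest_year}."
    )
    json_ld = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "description": description,
        "temporalCoverage": f"{first_year}/{latest_year}",
        "spatialCoverage": {"@type": "Place", "name": country},
    }
    return {"title": title, "description": description,
            "json_ld": json.dumps(json_ld)}
```

Because the coverage years and the latest value come straight from the data, the title, description, and structured data stay in sync with the table on the page after every pipeline run.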

Caching

With 50K+ pages, not everything can be pre-cached. We use a tiered approach:

  • Popular combinations (major countries + major indicators) get pre-cached with longer TTLs
  • Medium-traffic pages are cached on demand
  • Long-tail pages have shorter TTLs and are generated fresh when needed

We also cache aggregated data (like country rankings and regional averages) at the query level since multiple pages reference the same aggregations.
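The tiering logic can be sketched in a few lines of Python. The thresholds, TTL values, and class name below are hypothetical, and an in-memory dictionary stands in for whatever cache backend is actually used (in WordPress this would more likely be transients or an object cache).

```python
import time

# Hypothetical TTLs in seconds; real values depend on traffic and update cadence.
TTL_BY_TIER = {"popular": 24 * 3600, "medium": 6 * 3600, "long_tail": 3600}


def tier_for(daily_views: int) -> str:
    """Pick a cache tier from recent traffic (thresholds are assumptions)."""
    if daily_views >= 100:
        return "popular"
    if daily_views >= 5:
        return "medium"
    return "long_tail"


class TTLCache:
    """Tiny in-memory cache where every entry carries its own expiry time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}

    def get_or_render(self, key: str, ttl: int, render):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                      # still fresh
        value = render()                       # miss or expired: rebuild
        self._store[key] = (now + ttl, value)
        return value
```

Pre-caching the popular tier then amounts to calling `get_or_render` for a known list of country and indicator pairs from a scheduled job, while everything else fills the cache lazily on first request.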

Internal Linking

Strong internal linking is essential for a site this large. Without it, search engines would struggle to discover most of the pages:

  • Each country page links to all available indicators for that country
  • Each indicator page links to top countries for that indicator
  • Breadcrumbs on every page create clear hierarchy
  • Related content suggestions based on geographic and thematic proximity
  • Comparison links suggest relevant country pairs

The internal link graph ensures that any page on the site is reachable within 3-4 clicks from the homepage.
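One way to verify a claim like this is a breadth-first search over the link graph. The Python sketch below assumes the graph is available as a dictionary mapping each URL to the URLs it links to; the paths in the example are hypothetical.

```python
from collections import deque


def click_depths(links: dict, start: str = "/") -> dict:
    """Breadth-first search: minimum number of clicks from `start` to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical miniature link graph.
links = {
    "/": ["/country/turkey/", "/indicator/gdp/"],
    "/country/turkey/": ["/country/turkey/gdp/"],
    "/indicator/gdp/": ["/country/turkey/gdp/", "/compare/gdp/japan/south-korea/"],
    "/orphan/": [],
}
depths = click_depths(links)
max_depth = max(depths.values())
orphans = set(links) - set(depths)
```

Pages that never appear in the result are orphans that no internal link reaches, and the maximum depth shows whether the "3-4 clicks" target actually holds.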

Results After 12 Months

  • 50,000+ pages indexed by Google, as reported in Search Console
  • Average TTFB: ~150ms on shared hosting
  • Database queries consistently under 10ms
  • Zero editorial work - the site runs itself, updated automatically from source data
  • Growing organic traffic from long-tail searches like "Nigeria inflation rate 2015" or "Vietnam GDP per capita history"

The Reusable Pattern

This is the same architectural pattern we use across multiple sites at Kavela:

  1. Find an interesting, structured dataset
  2. Design custom tables optimized for the specific data model
  3. Build a data pipeline that keeps the database fresh
  4. Use WordPress virtual routing to create SEO-friendly URLs
  5. Render pages dynamically from the data
  6. Generate chunked sitemaps and build strong internal links
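Step 6 mentions chunked sitemaps; here is a minimal Python sketch of that step. The sitemap protocol caps each file at 50,000 URLs, so a site of this size needs several files plus an index. The file naming (`sitemap-1.xml`, ...) and function names are assumptions.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def chunked(urls: list, size: int = MAX_URLS_PER_SITEMAP) -> list:
    """Split the full URL list into sitemap-sized chunks."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def sitemap_xml(urls: list) -> str:
    """Render one sitemap file for a single chunk of URLs."""
    entries = "\n".join(f"  <url><loc>{escape(url)}</loc></url>" for url in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )


def sitemap_index_xml(base_url: str, chunk_count: int) -> str:
    """Render the index file that points at every chunk (hypothetical naming)."""
    entries = "\n".join(
        f"  <sitemap><loc>{escape(base_url)}/sitemap-{n}.xml</loc></sitemap>"
        for n in range(1, chunk_count + 1)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</sitemapindex>\n"
    )
```

In practice a smaller chunk size, such as a few thousand URLs per file, keeps each file quick to generate from the database, and the index file is the only URL that needs to be submitted to search engines.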

The key is making sure every page offers genuine value - real data, real calculations, real insights. Template spam with swapped-out city names won't work. Search engines are smart enough to detect that. But if every page genuinely answers a different question with different data, you've built something valuable.

Explore it yourself: historysaid.com


Built by Kavela Ltd - turning data into discoverable web experiences.
