"Do countries with higher GDP per capita also have longer life expectancy?" I built a tool that lets you explore questions like that across 48 countries by picking any two of five metrics as scatter-plot axes. Two implementation hinges: (1) metrics that span orders of magnitude (population: Singapore 5.6M to India 1,417M, a 250× range) must be plotted and correlated on a log scale or every point collapses to one corner, and (2) a hand-rolled Pearson correlation coefficient recomputed live as you change axes. Vanilla JS, no chart library, with 34 Node tests on the computation layer.
🌐 Demo: https://sen.ltd/portfolio/global-stats/
📦 GitHub: https://github.com/sen-ltd/global-stats
The data model
48 countries × 5 metrics (population, GDP per capita, life expectancy, CO2 per capita, area):
{ name: "Japan", code: "JP", region: "アジア",                    // region label: "Asia"
  population: 125.1, gdpPerCapita: 33800, lifeExpectancy: 84.5,  // millions, USD, years
  co2PerCapita: 8.5, area: 378 },                                // tonnes per person, thousand km²
Metric definitions live in a separate table with a log flag:
export const METRICS = [
  { key: "population", label: "...", log: true },
  { key: "gdpPerCapita", label: "...", log: true },
  { key: "lifeExpectancy", label: "...", log: false },
  { key: "co2PerCapita", label: "...", log: true },
  { key: "area", label: "...", log: true },
];
Only life expectancy is log: false. That distinction does real work.
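The snippets in this post call getMetric(key), which is never shown. A minimal sketch, assuming it is nothing more than a lookup over METRICS; throwing on an unknown key is an assumption for illustration, and the labels are left out:

```javascript
// Hypothetical sketch: getMetric() is used in this post but not shown.
// Only `key` and `log` come from the METRICS table; labels are omitted.
const METRICS = [
  { key: "population", log: true },
  { key: "gdpPerCapita", log: true },
  { key: "lifeExpectancy", log: false },
  { key: "co2PerCapita", log: true },
  { key: "area", log: true },
];

function getMetric(key) {
  const metric = METRICS.find((m) => m.key === key);
  // Assumption: fail loudly rather than return undefined.
  if (!metric) throw new Error(`Unknown metric: ${key}`);
  return metric;
}

console.log(getMetric("gdpPerCapita").log);   // true
console.log(getMetric("lifeExpectancy").log); // false
```

Keeping the flag on the metric, not on the chart, means every consumer (scaling, correlation, tests) reads the same switch.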
Why log scale is non-negotiable
Plot "population vs GDP" on linear axes and it's a disaster. Population spans 250× (Singapore to India); GDP per capita spans 100× (Ethiopia $1,030 to Norway $106,150). On linear axes:
- nearly every point collapses into the bottom-left corner
- China and India alone stick to the right edge
- the correlation coefficient gets dragged around by the big outliers
The fix is log transformation — equal spacing per order of magnitude, so countries of wildly different size share one viewport. Linear metrics like life expectancy (52–85 years, a mere 1.6× range) stay linear.
export function normalize(value, metric, domainMin, domainMax) {
  if (metric.log) {
    const lv = Math.log10(value);
    const lmin = Math.log10(domainMin);
    const lmax = Math.log10(domainMax);
    if (lmax === lmin) return 0.5;
    return (lv - lmin) / (lmax - lmin);
  }
  if (domainMax === domainMin) return 0.5;
  return (value - domainMin) / (domainMax - domainMin);
}
Tested by asserting the geometric midpoint maps to center:
test("log: geometric midpoint → 0.5", () => {
  const m = getMetric("gdpPerCapita");
  // domain 1000..100000, geometric mean = 10000 → 0.5
  assert.ok(Math.abs(normalize(10000, m, 1000, 100000) - 0.5) < 1e-9);
});
Linear would put 50500 at the center; log puts the geometric mean 10000 there. That difference is what "thinking in orders of magnitude" means.
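To make the "collapse" concrete, here is a rough sketch that places three populations on a 0..1 axis both ways. The values are the figures quoted in this post; using Singapore and India as the domain bounds is an assumption for illustration, and linearPos / logPos are hypothetical helper names, not functions from the repo.

```javascript
// Populations in millions, from the figures quoted in this post.
const populations = { Singapore: 5.6, Japan: 125.1, India: 1417 };

// Assumption: the axis domain runs from the smallest to the largest country.
const min = 5.6;
const max = 1417;

// Position on a 0..1 axis, linear vs log10.
const linearPos = (v) => (v - min) / (max - min);
const logPos = (v) =>
  (Math.log10(v) - Math.log10(min)) / (Math.log10(max) - Math.log10(min));

for (const [name, value] of Object.entries(populations)) {
  console.log(name, linearPos(value).toFixed(3), logPos(value).toFixed(3));
}
```

On the linear axis Japan sits in the first tenth of the plot, squeezed against Singapore; on the log axis it lands a little past the middle.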
Hand-rolled Pearson correlation
Pick two axes, get a coefficient r. Straight from the definition:
export function pearson(xs, ys) {
  const n = xs.length;
  if (n < 2 || ys.length !== n) return null;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0, denX = 0, denY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX, dy = ys[i] - meanY;
    num += dx * dy; denX += dx * dx; denY += dy * dy;
  }
  const den = Math.sqrt(denX * denY);
  if (den === 0) return null; // zero variance → undefined
  return num / den;
}
Returning null for zero variance matters: 0/0 = NaN would otherwise leak into the axis labels downstream. Handle the mathematically undefined case explicitly.
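For reference, this is the textbook definition the function follows, written out with the same pieces the code accumulates: the numerator is num, and the two sums under the square root are denX and denY.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

When every x_i (or every y_i) is identical, one of the sums in the denominator is zero and r is undefined, which is the case the null return covers.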
test("perfect positive correlation = 1", () => {
  assert.ok(Math.abs(pearson([1, 2, 3], [2, 4, 6]) - 1) < 1e-9);
});
test("no correlation ≈ 0", () => {
  // a symmetric V-like curve (here y = x²) has zero LINEAR correlation
  assert.ok(Math.abs(pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])) < 1e-9);
});
test("zero variance → null", () => {
  assert.equal(pearson([5, 5, 5], [1, 2, 3]), null);
});
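The V-shape test is the important one: r = 0 means zero linear correlation, not "no relationship." The test data is actually the parabola y = x² sampled symmetrically around its vertex, so y is fully determined by x and Pearson r is still exactly 0. The test documents that limitation.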
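The zero depends on the symmetry, not on the curve itself. A small sketch, using a compact copy of pearson() so it runs on its own; the one-sided sample is made-up illustration data:

```javascript
// Compact copy of pearson() from this post so the snippet is self-contained.
function pearson(xs, ys) {
  const n = xs.length;
  if (n < 2 || ys.length !== n) return null;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0, denX = 0, denY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX, dy = ys[i] - meanY;
    num += dx * dy; denX += dx * dx; denY += dy * dy;
  }
  const den = Math.sqrt(denX * denY);
  return den === 0 ? null : num / den;
}

const square = (x) => x * x;

// Same curve y = x², sampled two ways (illustrative data).
const symmetricXs = [-2, -1, 0, 1, 2]; // both arms of the parabola
const rightHalfXs = [0, 1, 2, 3, 4];   // rising arm only

const rSymmetric = pearson(symmetricXs, symmetricXs.map(square));
const rRightHalf = pearson(rightHalfXs, rightHalfXs.map(square));

console.log(rSymmetric); // 0: the two arms cancel out
console.log(rRightHalf); // close to 1: on one arm the curve looks nearly linear
```

Same function, same relationship, very different r. That is why a low coefficient in the tool is a prompt to look at the scatter plot, not a verdict.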
Correlate in log space too
The key insight: if you display on a log scale, you must correlate on log-transformed values to match. Power-law relationships (y = ax^b) become straight lines in log-log space (log y = b·log x + log a), so Pearson on the logs measures how tightly the data follows that line:
export function metricCorrelation(keyX, keyY, pool) {
  const mx = getMetric(keyX), my = getMetric(keyY);
  const xs = [], ys = [];
  for (const c of pool) {
    let x = c[keyX], y = c[keyY];
    if (mx.log) x = Math.log10(x); // power-law → linear
    if (my.log) y = Math.log10(y);
    xs.push(x); ys.push(y);
  }
  return pearson(xs, ys);
}
test("GDP vs life expectancy is a strong positive correlation", () => {
  // pool is a required argument in the signature above, so pass the full list
  assert.ok(metricCorrelation("gdpPerCapita", "lifeExpectancy", COUNTRIES) > 0.5);
});
The actual value comes out at r ≈ 0.84 — the famous Preston curve (income vs longevity) reproduced from the data. GDP is log, life expectancy is linear, so it's a semi-log correlation, which matches the economics finding that life expectancy scales with the logarithm of income.
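To see the log-log effect in isolation, here is a sketch on synthetic data: an exact power law y = 100·x⁻¹ (made up for illustration, not from the country table), correlated raw and then on the logs. It carries its own compact copy of pearson() so it runs standalone.

```javascript
// Compact copy of pearson() from this post so the snippet is self-contained.
function pearson(xs, ys) {
  const n = xs.length;
  if (n < 2 || ys.length !== n) return null;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0, denX = 0, denY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX, dy = ys[i] - meanY;
    num += dx * dy; denX += dx * dx; denY += dy * dy;
  }
  const den = Math.sqrt(denX * denY);
  return den === 0 ? null : num / den;
}

// Synthetic, exact power law: y = a * x^b with a = 100, b = -1.
const xs = [1, 2, 4, 8, 16, 32];
const ys = xs.map((x) => 100 / x);

const rRaw = pearson(xs, ys);
const rLog = pearson(
  xs.map((v) => Math.log10(v)),
  ys.map((v) => Math.log10(v)),
);

console.log(rRaw.toFixed(2)); // about -0.67: raw Pearson understates a perfect relationship
console.log(rLog.toFixed(2)); // -1.00: a straight line in log-log space
```

The relationship is deterministic in both cases; only the log version reports it as such.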
Y-axis inversion
SVG's origin is top-left, so "bigger value = higher up" needs a Y flip:
return pool.map((c) => ({
  country: c,
  cx: normalize(c[keyX], mx, dx.min, dx.max),
  cy: 1 - normalize(c[keyY], my, dy.min, dy.max), // invert
}));
Guarded by a test:
test("y is inverted: highest life-expectancy country has smallest cy", () => {
  // pts: points from the mapping above with lifeExpectancy on the Y axis (setup omitted in this excerpt)
  const top = pts.reduce((a, b) => (b.country.lifeExpectancy > a.country.lifeExpectancy ? b : a));
  for (const p of pts) {
    if (p.country.code !== top.country.code) assert.ok(p.cy >= top.cy - 1e-9);
  }
});
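Because cx and cy are already normalized and flipped, turning them into SVG pixels is a plain scale-and-offset. A sketch, assuming a fixed plot size with padding; toPixels and the 640×400 / 40px numbers are hypothetical, not taken from the project:

```javascript
// Hypothetical helper: maps a normalized point (cx, cy in 0..1) to SVG pixels.
// Width, height and padding are illustrative defaults, not the project's values.
function toPixels(point, width = 640, height = 400, padding = 40) {
  const innerW = width - 2 * padding;
  const innerH = height - 2 * padding;
  return {
    x: padding + point.cx * innerW,
    y: padding + point.cy * innerH, // cy is already inverted, so 0 = top edge
  };
}

console.log(toPixels({ cx: 0, cy: 1 })); // { x: 40, y: 360 }  smallest x, smallest y value: bottom-left
console.log(toPixels({ cx: 1, cy: 0 })); // { x: 600, y: 40 }  largest x, largest y value: top-right
```

Doing the flip once in the normalized layer keeps the pixel mapping free of axis logic, and keeps the flip testable without a DOM.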
Data integrity tests
Hardcoded public data deserves integrity checks — and for a log-scale tool, "all metrics positive" is a precondition, not a nicety (log10(0) = -∞, log10(negative) = NaN):
test("no duplicate ISO codes", () => { /* ... */ });
test("every metric field is present and positive", () => {
  for (const c of COUNTRIES)
    for (const m of METRICS)
      assert.ok(typeof c[m.key] === "number" && c[m.key] > 0);
});
test("life expectancy in a sane range (40-90)", () => { /* ... */ });
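A quick sketch of what goes wrong without that precondition. safeLog10 is a hypothetical guard for illustration (the project relies on the data test instead), and the sample values are made up:

```javascript
// What Math.log10 does outside its domain.
console.log(Math.log10(0));  // -Infinity
console.log(Math.log10(-5)); // NaN

// One zero is enough to poison a mean of logs (illustrative values).
const logs = [10, 100, 0].map((v) => Math.log10(v));
const mean = logs.reduce((a, b) => a + b, 0) / logs.length;
console.log(mean); // -Infinity

// Hypothetical guard: fail at the source instead of propagating bad values.
function safeLog10(value, label = "value") {
  if (!(typeof value === "number" && value > 0)) {
    throw new RangeError(`${label} must be a positive number, got ${value}`);
  }
  return Math.log10(value);
}

console.log(safeLog10(1000, "gdpPerCapita")); // 3
```

Once a -Infinity or NaN enters a mean, every deviation and the final r are garbage too, so a single bad row would silently break every correlation that uses that metric.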
Architecture
data.js ← 48 countries × 5 metrics (World Bank / UN / OWID ~2022)
core.js ← pearson, normalize (log-aware), scatter scaling, region aggregation (DOM-free, 34 tests)
app.js ← SVG scatter + sortable table
Try it
Set the axes to "CO2 vs GDP" for a clear positive correlation (richer = more emissions). Set "population vs life expectancy" for near-zero (big and small countries live equally long). Colors encode region.
Takeaways
- Order-of-magnitude metrics (population, GDP, area) need log scales or points collapse. Linear metrics (life expectancy) stay linear. A per-metric log flag toggles both.
- Pearson is implementable from the definition. Return null for zero variance; don't leak NaN into the view.
- Display on log → correlate on log. Power laws straighten out in log-log space.
- Pearson measures linear correlation only. A V-shape test documents that.
- With log scales, "all values positive" is a precondition, not a check. Test it.
- GDP vs life expectancy gives r ≈ 0.84 — the Preston curve, straight from the data.
This is OSS portfolio #262 from SEN LLC (Tokyo). https://sen.ltd/portfolio/
