"Have you ever seen Japan's ageing?" — I knew the data, but I'd never watched the shape of the country's population pyramid actually deform in front of me. So I built a 250-line page that does it: year slider plus auto-play, 1950 to 2070, with the first baby-boom cohort visibly rising from the bottom of the chart all the way to the top.
🌐 Demo: https://sen.ltd/portfolio/jp-population-pyramid/
📦 GitHub: https://github.com/sen-ltd/jp-population-pyramid
Hit the ▶ button and watch the chart morph at 120 ms per year. The first baby-boomers (born 1947-49) start as the broad base of a true triangle in 1950, become a fat bulge that walks up the chart through the decades, hit the very top of the projection in the 2030s, and disappear off the top by 2050. The two-bulge shape of 2020 turns into the inverted lopsided shape of 2070. Median age goes from 20 to 56, share aged 65+ from 7% to 38%.
## Why this is harder than it looks
Drawing a single population pyramid in Plotly or D3 is a 30-line job. The interesting questions are upstream:
- Where do the numbers come from, and how do you cover both 70 years of historical data and 50 years of projection in one consistent dataset?
- How do you keep the shape evolving smoothly as a user drags the slider, when the source data is naturally 5- or 10-year snapshots?
Both of those have non-obvious answers, and one of them I got wrong on the first pass.
## The data: emit it from a function, then nail the totals
The dataset that ships in `data.json` is generated by `generate-data.py` — a small Python script that exposes the demographic model as four functions:
- `annual_births(year)` — single-year births in thousands, 1850 → 2070. Two Gaussian bumps for the first and second baby booms, plus a piecewise-linear trend that drops through the late 20th century.
- `survival(age, sex, birth_year)` — share of a cohort still alive at the given exact age. More on this below.
- `cohort_size(birth_year, age, sex)` — multiply the two together with a sex-split factor at birth.
- `bin_population(year, bin_idx, sex)` — sum five single-age cohorts to make a 5-year bin.
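The exact coefficients live in the script, but the flavour of the birth model is easy to sketch. Here is a toy JavaScript version — the Gaussian centres, widths, and trend levels below are illustrative guesses, not the values in `generate-data.py`:

```javascript
// Toy sketch of the two-bump birth model (illustrative numbers only).
function gaussian(x, mean, sd) {
  return Math.exp(-((x - mean) ** 2) / (2 * sd * sd));
}

function annualBirths(year) {
  // Piecewise-linear baseline: slow rise to mid-century, decline after 1975.
  let trend;
  if (year < 1950) trend = 1500 + (year - 1850) * 5;
  else if (year < 1975) trend = 2000;
  else trend = Math.max(500, 2000 - (year - 1975) * 15);
  // Bumps for the first (1947-49) and second (1971-74) baby booms.
  const boom1 = 600 * gaussian(year, 1948, 1.5);
  const boom2 = 200 * gaussian(year, 1972.5, 2);
  return trend + boom1 + boom2; // births in thousands
}
```

The point of the shape: a boom is a *bump on top of a trend*, so the cohort it creates stays identifiable as it ages.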
Once those are in place, building a snapshot for any calendar year is just calling `bin_population` 21 times for each sex.
The last thing the script does is calibrate:
```python
def snapshot(year):
    raw_male = [bin_population(year, i, "M") for i in range(N_BINS)]
    raw_female = [bin_population(year, i, "F") for i in range(N_BINS)]
    raw_total = sum(raw_male) + sum(raw_female)
    target = TARGET_TOTALS[year]  # IPSS / UN WPP value
    scale = target / raw_total
    male = [round(m * scale) for m in raw_male]
    female = [round(f * scale) for f in raw_female]
    return {"year": year, "male": male, "female": female}
```
Even with carefully tuned birth and survival functions, the unscaled total comes out 92-105% of the actual figure. Rather than fight the model into perfect alignment, I let it generate the shape and then scale each year's totals to the published numbers (1950 = 83.2 M, 2020 = 126.3 M, 2070 = 87.0 M, etc.). The shape lives in the model, the size is anchored to reality.
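The calibrate-by-scaling idea is easy to check in isolation: scaling preserves the ratios between bins while pinning the total. A minimal sketch (hypothetical helper, not code from the repo):

```javascript
// Scale raw model output so the total matches a published figure.
// Bin-to-bin ratios (the shape) survive; the total (the size) is pinned.
function calibrate(raw, target) {
  const total = raw.reduce((sum, v) => sum + v, 0);
  const scale = target / total;
  return raw.map((v) => Math.round(v * scale));
}

calibrate([900, 600, 300], 1200); // → [600, 400, 200]: 3:2:1 shape intact
```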
## Where I got it wrong: 1950 wasn't a pyramid
The first version of `survival()` took only two arguments — `age` and `sex` — and used a stylized contemporary Japanese life table:

```python
def survival(age, sex):
    if sex == "F":
        return math.exp(-((age / 90) ** 7))
    return math.exp(-((age / 85) ** 7))
```
Plug this into the 1950 generator. Anyone in the chart who's age 75 in 1950 was born in 1875, and the function predicts that 66% of their cohort is still alive: exp(-(75/85)^7) = exp(-0.42) ≈ 0.66. That's roughly plausible for someone born in 1945, but for someone born in 1875 it's wildly optimistic. Tuberculosis, infant mortality, two wars — real cohort survival to age 75 for 1875-born Japanese was something like 5-10%.
The result is a 1950 pyramid that looks like a slightly-tapered trapezoid, with a median age of 36.9 and an aging ratio of 18.6%. Both numbers are way off from reality (median age in 1950 Japan was about 22; aging ratio about 5%). The shape isn't a pyramid at all.
The fix is to make `survival` take `birth_year` as a parameter and attenuate the modern curve for older cohorts:

```python
def cohort_factor(birth_year):
    if birth_year >= 1945: return 1.0
    if birth_year >= 1925: return 0.55 + 0.45 * (birth_year - 1925) / 20
    if birth_year >= 1890: return 0.25 + 0.30 * (birth_year - 1890) / 35
    return 0.20

def survival(age, sex, birth_year):
    # Divide age by the sex-specific scale (90 for F, 85 for M).
    # The conditional must be parenthesised, or Python reads it as
    # `(age / 90) if sex == "F" else 85`.
    base = math.exp(-((age / (90 if sex == "F" else 85)) ** 7))
    return base * cohort_factor(birth_year)
```
Now the 1875-born cohort's survival to 75 drops to 0.66 × 0.20 ≈ 13%, much closer to the right neighbourhood. Median age in 1950 is 20.4, aging ratio 7.2%. The pyramid actually looks like a pyramid.
The lesson here is one any demographer would tell you immediately: mortality is cohort-specific, not just age-specific. If your survival function takes only (age, sex), you're applying today's life table to 19th-century births. The correction is one extra parameter and a piecewise-linear adjustment factor.
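For a quick numeric check, the cohort-aware model transcribes directly to JavaScript (the divisor is the sex-specific scale: 90 for females, 85 for males):

```javascript
// JS transcription of the cohort-aware survival model, for checking values.
function cohortFactor(birthYear) {
  if (birthYear >= 1945) return 1.0;
  if (birthYear >= 1925) return 0.55 + 0.45 * (birthYear - 1925) / 20;
  if (birthYear >= 1890) return 0.25 + 0.30 * (birthYear - 1890) / 35;
  return 0.20;
}

function survival(age, sex, birthYear) {
  const scale = sex === "F" ? 90 : 85; // sex-specific longevity scale
  return Math.exp(-((age / scale) ** 7)) * cohortFactor(birthYear);
}

survival(75, "M", 1945); // ≈ 0.66 — the raw modern curve
survival(75, "M", 1875); // ≈ 0.13 — attenuated by the 0.20 cohort factor
```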
## Linear interpolation between snapshots
The dataset only carries 13 snapshots — 1950, 1960, ..., 2070. When the user drags the slider to 2005, we blend the 2000 and 2010 snapshots 50/50:
```javascript
export function interpolateSnapshots(a, b, year) {
  const t = (year - a.year) / (b.year - a.year);
  return {
    year,
    male: interpolateArrays(a.male, b.male, t),
    female: interpolateArrays(a.female, b.female, t),
  };
}

export function getSnapshot(snapshots, year) {
  if (year <= snapshots[0].year) return clone(snapshots[0]);
  if (year >= snapshots.at(-1).year) return clone(snapshots.at(-1));
  for (let i = 0; i < snapshots.length - 1; i++) {
    const a = snapshots[i], b = snapshots[i + 1];
    if (year >= a.year && year <= b.year) return interpolateSnapshots(a, b, year);
  }
}
```
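`interpolateArrays` is the element-wise blend those two calls rely on; a minimal version would look like this (my sketch — the repo's helper may differ in rounding):

```javascript
// Element-wise linear blend of two equal-length arrays: t=0 → a, t=1 → b.
function interpolateArrays(a, b, t) {
  return a.map((v, i) => Math.round(v + (b[i] - v) * t));
}

interpolateArrays([0, 100], [10, 200], 0.5); // → [5, 150]
```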
Linear blending of bin counts is not what real demographic dynamics do — cohorts move discretely between bins as they age, with deaths and births perturbing the totals. But for a draggable visualization at one-year resolution, the lie is small enough not to matter; nothing on screen jumps. If you wanted to show actual cohort flow you'd need a continuous-time integrator, which is a different tool with a different scope.
## SVG diverging bars with one global scale
The chart is a vertical stack of 21 horizontal bars, with a centre column for the age labels. Males extend leftward from the centre, females rightward.
```javascript
function renderBars(snapshot, globalMax) {
  const halfPlotW = (VIEW_W - PAD_LEFT - PAD_RIGHT - CENTER_GAP) / 2;
  const cx = VIEW_W / 2;
  for (let i = 0; i < snapshot.male.length; i++) {
    const m = snapshot.male[i], f = snapshot.female[i];
    const mw = (m / globalMax) * halfPlotW;
    const fw = (f / globalMax) * halfPlotW;
    // Male bars grow leftward, so both x and width move per frame.
    barCache.male[i].setAttribute("x", cx - CENTER_GAP / 2 - mw);
    barCache.male[i].setAttribute("width", mw);
    // Female bars grow rightward from a fixed x, so only width changes.
    barCache.female[i].setAttribute("width", fw);
  }
}
```
Two design choices worth calling out:
- `globalMax` is the largest single bin across all snapshots, not per-frame. If you re-normalise per-frame, the boomer bulge appears stationary in the chart while everything around it grows and shrinks; the eye-catching effect — the bulge visibly rising up the pyramid as cohorts age — disappears. With a fixed scale you keep the right answer.
- CSS transitions on `<rect>` `width` and `x`. The `setAttribute` calls are direct, no animation library, but `.bar-male { transition: x 0.15s, width 0.15s; }` makes every per-year update interpolate smoothly. Drag the slider and the bars glide. Hit play and you get a film.
The `<rect>` elements are pooled at first render and reused on every year update. No DOM churn, no React.
## Computing stats live
Median age, aging ratio, and working-age ratio are recomputed in JS on every render. With 21 bins and 13 snapshots there's nothing to optimise — just walk the cumulative sum.
```javascript
export function medianAge(snapshot, binWidth = 5) {
  const total = totalPopulation(snapshot);
  const half = total / 2;
  let cumulative = 0;
  for (let i = 0; i < snapshot.male.length; i++) {
    const binSize = snapshot.male[i] + snapshot.female[i];
    const next = cumulative + binSize;
    if (next >= half) {
      const fraction = binSize === 0 ? 0 : (half - cumulative) / binSize;
      return i * binWidth + fraction * binWidth;
    }
    cumulative = next;
  }
  return (snapshot.male.length - 1) * binWidth;
}
```
The unit tests pin a tiny synthetic dataset that lets you verify the median by hand:
```javascript
const TINY = [
  { year: 2000, male: [400, 300, 200, 100], female: [400, 300, 200, 100] },
  ...
];

// Bin sizes (M+F): 800, 600, 400, 200. Cumulative: 800, 1400, 1800, 2000.
// Half-total = 1000 → falls in bin 1. Need 200 more out of 600 → 1/3 of the bin.
// median ≈ 5 + (1/3)*5 = 6.667 years.
assert.ok(Math.abs(medianAge(TINY[0]) - 6.667) < 0.01);
```
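The other stats follow the same cumulative-walk pattern. A sketch of the 65+ share — my version, assuming 5-year bins so old age starts at bin index 13; the repo's actual helper may differ:

```javascript
// Share of the population aged 65+, computed from 5-year bins.
function agingRatio(snapshot, binWidth = 5) {
  const firstOldBin = Math.ceil(65 / binWidth); // index 13 for 5-year bins
  let old = 0, total = 0;
  for (let i = 0; i < snapshot.male.length; i++) {
    const bin = snapshot.male[i] + snapshot.female[i];
    total += bin;
    if (i >= firstOldBin) old += bin;
  }
  return old / total;
}
```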
## The Japanese number formatting trap
Population in this codebase is in thousands, because that's what the source data looks like. Rendering "1.26億" or "8,320万" requires getting the unit conversions exactly right, and I shipped two different bugs before settling:
- 1万 = 10 千, so `man = thousands / 10`. I once wrote `Math.round(thousands / 10)` followed by `"0万"`, which appended an extra zero and rendered 83,202 千 (= 83.2 M people = 8,320 万) as `"83200万"` (= 832 M people, off by a factor of 10).
- 1億 = 1万 × 1万 = 100,000 千. I once divided by `10_000` instead of `100_000` and rendered 128,097 千 as `"12.81億"` (off by a factor of 10).
The fix is one careful function with three branches and boundary-value tests for all three:
```javascript
export function formatJpPopulation(thousands) {
  if (thousands >= 100_000) return `${(thousands / 100_000).toFixed(2)}億`;
  if (thousands >= 10) return `${Math.round(thousands / 10).toLocaleString("en-US")}万`;
  return `${thousands}千`;
}

assert.equal(formatJpPopulation(83_202), "8,320万");
assert.equal(formatJpPopulation(128_097), "1.28億");
assert.equal(formatJpPopulation(5), "5千");
```
The asymmetric scaling (× 10 between 千 and 万, × 10,000 between 万 and 億) makes this one of those functions where boundary-value tests pay off enormously.
## Try it
- Demo: https://sen.ltd/portfolio/jp-population-pyramid/
- Code: https://github.com/sen-ltd/jp-population-pyramid
Hit ▶ once and let it cycle. Even if you already know the headline numbers — Japan ages, Japan shrinks — watching the boomer bulge walk up the chart while the base thins out is a different way of having that fact in your head.
The data is a stylized model. To swap in real numbers from UN WPP, IPSS, or e-Stat, you only need to replace `data.json` — the schema (`{ageBins, snapshots: [{year, male[], female[]}]}`) is the simplest plausible JSON shape, and the renderer doesn't care where the numbers came from.
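For reference, a tiny dataset in that schema looks like this (three bins and two snapshots with made-up values; the real file has 21 bins and 13 snapshots):

```javascript
const data = {
  ageBins: ["0-4", "5-9", "10-14"],
  snapshots: [
    { year: 1950, male: [5200, 4800, 4300], female: [5000, 4600, 4200] },
    { year: 1960, male: [4000, 5100, 4700], female: [3900, 4900, 4500] },
  ],
};
```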
MIT, ~250 lines of JS, ~120 lines of Python generator, 16 unit tests, no build step.
🛠 Built by SEN LLC as part of an ongoing series of small, focused developer tools. Browse the full portfolio for more.
