Building a Monte Carlo Retirement Simulator in Python

#investing #finance #beginners #productivity

A retirement projection built on a single number — "assume 7% per year" — answers the wrong question. It tells you what happens on one perfectly smooth path that the market has never actually delivered. Real returns arrive jagged: a strong decade, a flat stretch, a crash three years before you plan to stop working. A Monte Carlo simulation replaces that single path with thousands of them, and the output stops being a number and becomes a distribution. This walks through building one in Python and, more importantly, understanding what it can and cannot tell you.

Why a Single Average Return Lies to You

The standard projection uses the deterministic compound-growth formula: start with a balance, multiply by (1 + r) every year, subtract withdrawals. It is easy to put in a spreadsheet and it produces a clean curve. The problem is that the clean curve corresponds to a sequence of identical annual returns, and no such sequence has ever occurred.

The full set of returns over 30 years does determine your ending balance if you never withdraw — multiplication is commutative. But the moment you take money out, the order of returns matters. A bad year early in retirement forces you to sell more shares to fund the same withdrawal, and those shares are gone before any recovery can compound them. Two retirees with the identical 30-year average can end up with wildly different outcomes depending on when the bad years landed. A single-average projection cannot see this. It computes one path and hides the distribution of paths around it.

What a Monte Carlo Simulation Actually Does

The technique is mechanically simple. You assume a distribution for annual returns — for example, a mean and standard deviation estimated from long-run historical data. Then you:

Draw a random return for each year of the projection horizon.
Run your withdrawal logic year by year on that random sequence.
Record whether the portfolio survived and what it ended at.
Repeat thousands of times.

The result is not "your portfolio will be worth X." It is "across 10,000 simulated futures, the portfolio survived in 87% of them, and here is the spread of ending balances." That success rate, and the shape of the distribution, is the actual deliverable.

A Minimal Simulator in Python

The core is about 20 lines of NumPy. Everything else is interpretation.

import numpy as np

def simulate(start, withdrawal, years, mean, std, trials=10_000):
    # Draw a (trials x years) matrix of random annual returns
    returns = np.random.normal(mean, std, size=(trials, years))
    balances = np.full(trials, float(start))
    survived = np.ones(trials, dtype=bool)

    for y in range(years):
        balances = balances * (1 + returns[:, y]) - withdrawal
        survived &= balances > 0
        balances = np.clip(balances, 0, None)

    return {
        "success_rate": survived.mean(),
        "p10": np.percentile(balances, 10),
        "p50": np.percentile(balances, 50),
        "p90": np.percentile(balances, 90),
    }

result = simulate(
    start=1_000_000, withdrawal=45_000, years=30,
    mean=0.05, std=0.12,            # illustrative real-return assumptions
)
print(result)

The numbers passed in here are illustrative, not a recommendation. A real run should use return and volatility assumptions you can defend, expressed in real (inflation-adjusted) terms so the withdrawal stays in constant dollars.

Working in real returns — subtracting expected inflation from the nominal mean — lets you hold the withdrawal constant across years. The alternative is to model nominal returns and inflate the withdrawal each year. Pick one convention and apply it consistently; mixing them is the most common bug in homemade simulators.

Reading the Output: Percentiles, Not Averages

The instinct is to look at the median ending balance and call it the answer. Resist it. The two outputs that matter are the success rate and the lower percentiles.

The success rate is the fraction of simulated paths where the portfolio never hit zero. A 95% success rate means 1 in 20 simulated futures ran out of money. Whether that is acceptable is a personal judgment, not a technical one — but you cannot make the judgment without the number.

The lower percentiles describe the bad-but-not-impossible cases. The P10 ending balance is the outcome that 90% of paths beat. If the P10 is a comfortable cushion, the plan is robust. If the P10 is zero, the plan depends on the market cooperating, and the median balance is cold comfort. This is the same reason you look at P99 latency instead of average latency when you operate a service: the average hides the tail, and the tail is what hurts.

Run the simulation twice with the withdrawal as the only variable. The gap between, say, a 4% and a 4.5% withdrawal rate often moves the success rate more than any plausible change in your return assumptions. Spending is the lever you actually control.

Where the Model Is Wrong

A simulator is only as honest as its assumptions, and the minimal version above makes several that are convenient rather than accurate.

Normal distributions understate tail risk. Real market returns have fatter tails than a normal distribution — extreme years happen more often than the bell curve predicts. Drawing from np.random.normal produces crashes that are too mild and too rare. Using a Student's t-distribution, or resampling from actual historical years (bootstrap), gives a more realistic tail.

Returns are not independent. The model treats each year as an independent draw. Real markets show some mean reversion over long horizons and volatility clustering over short ones. Independent draws miss both.

Fixed withdrawals are unrealistic. Almost nobody spends an identical inflation-adjusted amount through a 25% drawdown. Modeling a flexible rule — spending less in bad years — changes the success rate substantially and reflects how people actually behave.

Taxes and fees are missing. The minimal model ignores both. A 0.5% annual fee and the tax drag on a taxable account are real subtractions from the return every single year.

A Monte Carlo simulator produces precise-looking numbers from imprecise inputs. An 87.3% success rate carries false authority — the third digit is noise. Treat the output as a way to compare scenarios and stress-test assumptions, not as a forecast. The value is in the relative comparison, not the absolute figure.

The point of building your own simulator is not that it beats established tools like cFIREsim or the research behind the 4% rule. It is that writing the loop forces you to make every assumption explicit. You cannot hand-wave the return distribution when you have to type it into np.random. That explicitness — seeing exactly which inputs the answer depends on — is worth more than the answer itself.

Originally published at pickuma.com. Subscribe to the RSS or follow @pickuma.bsky.social for new reviews.