Ryan Qi

Your risk model passes all its tests. It will still blow up in a crisis.

It's rough. There are probably bugs. But if I wait until it's perfect, I'll never post it.

So here it is.


"Black swans being unpredictable, we need to adjust to their existence rather than naively try to predict them." (Nassim Nicholas Taleb, The Black Swan)


The thing that annoyed me

I was writing a portfolio risk model a few months back. Linters happy, tests passing, everything looked fine. Ran it against historical data, still fine.

Then I started wondering: what actually happens to this thing during a liquidity crisis? Like, what if correlations spike hard and vol goes through the roof at the same time? Does the math still hold?

I couldn't find a clean way to answer that. Backtesting libraries exist, sure, but I wanted something that would look at the code itself and tell me where it breaks and why. Not just that it produced a wrong number. The exact line. The exact input. The chain of variables that led there.

I couldn't find that tool so I started building it. That became BlackSwan.


What it actually does

You point it at a Python function containing financial or numerical logic, pick a stress scenario, and it hammers the function with thousands of perturbed inputs drawn from realistic market conditions: liquidity crash, vol spike, correlation breakdown, rate shock, missing data.

When it finds failures, it tells you the exact line, how often it happened, which input caused it, and traces the causal chain back to the root variable. It even suggests a fix.
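To make that concrete, here's the rough shape of a function you'd point it at. This is a hypothetical sketch (the names and numbers are mine, not the actual models/risk.py): a parametric VaR calculation that builds a covariance matrix from a single correlation parameter.

```python
import numpy as np

def portfolio_var(weights, vols, correlation, confidence_z=1.645):
    """Illustrative target function: equicorrelated parametric VaR."""
    n = len(vols)
    corr_matrix = np.full((n, n), correlation)   # one rho for every pair
    np.fill_diagonal(corr_matrix, 1.0)
    cov_matrix = corr_matrix * np.outer(vols, vols)
    port_var = weights @ cov_matrix @ weights    # portfolio variance
    return confidence_z * np.sqrt(port_var)      # one-tail parametric VaR

print(portfolio_var(np.array([0.5, 0.3, 0.2]),
                    np.array([0.2, 0.3, 0.25]), correlation=0.4))
```

BlackSwan's job is to answer what happens to a function like this when `correlation` is pushed toward 1 or `vols` is multiplied by a crisis factor.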

$ python -m blackswan test models/risk.py --scenario liquidity_crash
{
  "shatter_points": [
    {
      "line": 36,
      "severity": "critical",
      "failure_type": "non_psd_matrix",
      "message": "Covariance matrix loses positive semi-definiteness when pairwise correlation exceeds 0.91.",
      "frequency": "847 / 5000 iterations (16.9%)",
      "causal_chain": [
        { "line": 8,  "variable": "correlation", "role": "root_input" },
        { "line": 31, "variable": "corr_matrix",  "role": "intermediate" },
        { "line": 36, "variable": "cov_matrix",   "role": "failure_site" }
      ],
      "fix_hint": "Apply nearest-PSD correction (Higham 2002) after correlation perturbation, or clamp eigenvalues to epsilon."
    }
  ]
}

Line 36. 16.9% failure rate under a liquidity crash scenario. That covariance matrix was quietly breaking almost 1 in 6 times and nothing in my normal workflow would have caught it. The model would have just... silently produced garbage.
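You can reproduce that failure mode without BlackSwan at all. Below is a standalone sketch: a stressed correlation matrix whose pairwise values are no longer mutually consistent, plus the eigenvalue-clamping repair the fix hint points at. The numbers are mine, picked to trigger the failure.

```python
import numpy as np

def clamp_to_psd(corr, eps=1e-8):
    """Nearest-PSD-style repair: clamp negative eigenvalues to eps,
    then re-normalize the diagonal back to 1."""
    vals, vecs = np.linalg.eigh(corr)
    fixed = vecs @ np.diag(np.clip(vals, eps, None)) @ vecs.T
    d = np.sqrt(np.diag(fixed))
    return fixed / np.outer(d, d)

# Pairwise correlations that cannot all hold at once: A-B and B-C are
# strongly positive while A-C is strongly negative.
corr = np.array([
    [ 1.00, 0.95, -0.60],
    [ 0.95, 1.00,  0.95],
    [-0.60, 0.95,  1.00],
])
print(np.linalg.eigvalsh(corr).min())                # negative: not PSD
print(np.linalg.eigvalsh(clamp_to_psd(corr)).min())  # non-negative: repaired
```

Cholesky factorization, portfolio-variance calculations, and anything else downstream that assumes PSD will misbehave on the unrepaired matrix.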

That's the gap BlackSwan is trying to fill. Standard testing tools understand code structure. They don't understand mathematical behavior under stress.


How far along is it

Genuinely early. I want to be upfront about that.

It's on PyPI (pip install blackswan) and it works, but the scope is narrow by design. Right now it's focused on portfolio risk, covariance/correlation analysis, and VaR-style models using NumPy and Pandas. If you try to throw a random Python file at it, it'll reject it rather than silently produce garbage output. I think that's the right call for now.

There's also a VS Code extension if you'd rather not live in the terminal. Click Run BlackSwan above a function, pick a scenario from a dropdown, and failures show up as red squiggles with hover tooltips. That part was genuinely fun to build.

The ambition outpaces where it is right now. But it's real and it's useful.


The interesting parts (if you want to go deeper)

This section is for people who want to know how it actually works under the hood.

Detectors

Every iteration runs up to 8 detectors concurrently on your function's output:

NaNInfDetector: any computation producing NaN or Inf
MatrixPSDDetector: covariance matrix losing positive semi-definiteness
ConditionNumberDetector: ill-conditioned matrices before inversion
DivisionStabilityDetector: denominators approaching zero
ExplodingGradientDetector: output growing 100x relative to input perturbation
RegimeShiftDetector: structural breaks in the output distribution across iterations
BoundsDetector: outputs exceeding configurable plausible bounds
LogicalInvariantDetector: user-defined assertions (e.g. portfolio weights must sum to 1)

These get auto-tagged to relevant source lines via AST analysis. No instrumentation, no decorators, no changes to your code.
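To give a feel for what a detector is, here's roughly what the simplest one reduces to. This is an illustrative sketch with made-up class and method names, not BlackSwan's actual detector interface:

```python
import numpy as np

class NaNInfDetector:
    """Hypothetical sketch of the simplest detector: flag any NaN or
    Inf anywhere in the function's output for this iteration."""
    name = "nan_inf"

    def check(self, output):
        arr = np.asarray(output, dtype=float)
        bad = ~np.isfinite(arr)
        if bad.any():
            return {"failure_type": "nan_or_inf", "count": int(bad.sum())}
        return None  # no finding this iteration

detector = NaNInfDetector()
print(detector.check([1.0, 2.0, 3.0]))              # clean output: None
print(detector.check([1.0, float("inf"), float("nan")]))  # a finding
```

The others follow the same shape: take an output (or a window of outputs, for the distributional ones), return a structured finding or nothing.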

Adversarial mode

Standard Monte Carlo samples perturbations randomly. Adversarial mode runs a genetic algorithm that evolves stress parameters toward worst-case inputs instead:

python -m blackswan test models/risk.py --scenario liquidity_crash --adversarial

It maintains a population of parameter sets, scores each by the severity of the failures it induces, and breeds the highest-scoring (most damaging) sets across generations. A HardnessAdaptor automatically increases perturbation intensity when no failures are found, so it doesn't stall on robust code. It's slower but it finds things random sampling misses.
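Stripped of the details, the loop looks something like this. Everything here (the function names, the toy severity function, the parameters) is illustrative, not BlackSwan's actual code:

```python
import random

def evolve(score, population, generations=20, keep=4, mut=0.1):
    """Toy genetic loop over perturbation-parameter dicts: keep the sets
    that induce the most severe failures, then crossover + mutate."""
    pop = list(population)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)          # most severe first
        elite = pop[:keep]
        children = []
        while len(children) < len(pop) - keep:
            a, b = random.sample(elite, 2)
            children.append({k: random.choice([a[k], b[k]])
                                + random.gauss(0, mut)   # crossover + mutation
                             for k in a})
        pop = elite + children
    return max(pop, key=score)

# Hypothetical severity function: failures get worse as correlation
# stress and the vol multiplier grow together.
severity = lambda p: max(0.0, p["corr"] * p["vol"] - 0.8)
init = [{"corr": random.uniform(0.0, 1.0), "vol": random.uniform(0.0, 1.0)}
        for _ in range(12)]
best = evolve(severity, init)
```

Because the elite survive every generation, the best score never regresses; the interesting work is in the mutation schedule, which is what a hardness adaptor would tune.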

Causal chains

The output you saw above with root_input, intermediate, failure_site comes from building a dependency graph of your function via AST analysis. When a failure is detected, BlackSwan walks that graph backwards from the failure site to find the root cause. This is the part I'm most proud of and also the part most likely to have bugs, so if the causal chains look wrong on your code please tell me.
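The core idea fits in a few lines using Python's ast module. This is a simplified sketch of the technique, not the real implementation: map each assigned variable to the names it reads, then walk backwards from the failure-site variable.

```python
import ast

def dependency_chain(source, target):
    """Trace `target` back to its root inputs via assignment statements."""
    deps = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Name):
            reads = {n.id for n in ast.walk(node.value)
                     if isinstance(n, ast.Name)}
            deps[node.targets[0].id] = (node.lineno, reads)
    chain, frontier, seen = [], {target}, set()
    while frontier:                     # walk the graph backwards
        name = frontier.pop()
        if name in seen or name not in deps:
            continue                    # root input, or already visited
        seen.add(name)
        line, reads = deps[name]
        chain.append((line, name))
        frontier |= reads
    return sorted(chain)

src = """
correlation = base_corr * stress
corr_matrix = build(correlation)
cov_matrix = corr_matrix * vols
"""
print(dependency_chain(src, "cov_matrix"))
```

The real version has to handle augmented assignments, tuple unpacking, attribute access, and function boundaries, which is exactly why I expect the causal chains to be the buggiest part.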

ReproducibilityCard

Every run emits a machine-readable provenance record with exact BlackSwan version, Python version, NumPy version, scenario hash, seed, and a ready-to-paste replay command. If you share a finding with a colleague and they can't reproduce it, the card tells you exactly why.
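A card like that is cheap to build. Here's an illustrative sketch; the field names, the schema, and the --seed flag in the replay command are my guesses at the shape, not the actual format:

```python
import dataclasses
import hashlib
import json
import platform
import numpy as np

@dataclasses.dataclass
class ReproducibilityCard:
    """Hypothetical provenance record; field names are illustrative."""
    tool_version: str
    python_version: str
    numpy_version: str
    scenario_hash: str
    seed: int

    def replay_command(self, target, scenario):
        # Guessed flag shape; the real CLI may differ.
        return (f"python -m blackswan test {target} "
                f"--scenario {scenario} --seed {self.seed}")

def make_card(scenario_params, seed):
    # Hash the scenario config so two runs are comparable only when
    # they used byte-identical stress parameters.
    digest = hashlib.sha256(
        json.dumps(scenario_params, sort_keys=True).encode()).hexdigest()[:12]
    return ReproducibilityCard("0.1.0", platform.python_version(),
                               np.__version__, digest, seed)

card = make_card({"vol_mult": 3.0, "corr_shift": 0.4}, seed=1234)
print(card.replay_command("models/risk.py", "liquidity_crash"))
```

The point is that mismatches become diffable: if a colleague's replay disagrees, comparing two cards immediately shows whether the culprit is the seed, the scenario, or a library version.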


Try it

pip install blackswan
python -m blackswan test your_model.py --scenario vol_spike

Source on GitHub.


If you try it and something breaks or doesn't make sense, open an issue. I'd much rather hear that it failed on your code than not hear anything at all. And if you work in this space and think I'm solving the wrong problem entirely, I'd genuinely like to know that too.

