Ryan Qi

Your risk model passes all its tests. It will still blow up in a crisis.

It's rough. There are probably bugs. But if I wait until it's perfect, I'll never post it.

So here it is.


"Black swans being unpredictable, we need to adjust to their existence rather than naively try to predict them." (Nassim Nicholas Taleb, The Black Swan)


The thing that annoyed me

I was writing a portfolio risk model a few months back. Linters happy, tests passing, everything looked fine. Ran it against historical data, still fine.

Then I started wondering: what actually happens to this thing during a liquidity crisis? Like, what if correlations spike hard and vol goes through the roof at the same time? Does the math still hold?

I couldn't find a clean way to answer that. Backtesting libraries exist, sure, but I wanted something that would look at the code itself and tell me where it breaks and why. Not just that it produced a wrong number. The exact line. The exact input. The chain of variables that led there.

I couldn't find that tool so I started building it. That became BlackSwan.


What it actually does

You point it at a Python function containing financial or numerical logic, pick a stress scenario, and it hammers the function with thousands of perturbed inputs drawn from realistic market conditions: liquidity crash, vol spike, correlation breakdown, rate shock, missing data.

When it finds failures, it tells you the exact line, how often it happened, which input caused it, and traces the causal chain back to the root variable. It even suggests a fix.
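To make that concrete, here's the rough shape of a function you'd point it at. This is a hypothetical sketch (the names and numbers are mine, not the actual models/risk.py): a parametric VaR calculation that builds a covariance matrix from a single correlation parameter.

```python
import numpy as np

def portfolio_var(weights, vols, correlation, confidence_z=1.645):
    """Illustrative target function: equicorrelated parametric VaR."""
    n = len(vols)
    corr_matrix = np.full((n, n), correlation)   # one rho for every pair
    np.fill_diagonal(corr_matrix, 1.0)
    cov_matrix = corr_matrix * np.outer(vols, vols)
    port_var = weights @ cov_matrix @ weights    # portfolio variance
    return confidence_z * np.sqrt(port_var)      # one-tail parametric VaR

print(portfolio_var(np.array([0.5, 0.3, 0.2]),
                    np.array([0.2, 0.3, 0.25]), correlation=0.4))
```

BlackSwan's job is to answer what happens to a function like this when `correlation` is pushed toward 1 or `vols` is multiplied by a crisis factor.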

$ python -m blackswan test models/risk.py --scenario liquidity_crash
{
  "shatter_points": [
    {
      "line": 36,
      "severity": "critical",
      "failure_type": "non_psd_matrix",
      "message": "Covariance matrix loses positive semi-definiteness when pairwise correlation exceeds 0.91.",
      "frequency": "847 / 5000 iterations (16.9%)",
      "causal_chain": [
        { "line": 8,  "variable": "correlation", "role": "root_input" },
        { "line": 31, "variable": "corr_matrix",  "role": "intermediate" },
        { "line": 36, "variable": "cov_matrix",   "role": "failure_site" }
      ],
      "fix_hint": "Apply nearest-PSD correction (Higham 2002) after correlation perturbation, or clamp eigenvalues to epsilon."
    }
  ]
}

Line 36. 16.9% failure rate under a liquidity crash scenario. That covariance matrix was quietly breaking almost 1 in 6 times and nothing in my normal workflow would have caught it. The model would have just... silently produced garbage.
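You can reproduce that failure mode without BlackSwan at all. Below is a standalone sketch: a stressed correlation matrix whose pairwise values are no longer mutually consistent, plus the eigenvalue-clamping repair the fix hint points at. The numbers are mine, picked to trigger the failure.

```python
import numpy as np

def clamp_to_psd(corr, eps=1e-8):
    """Nearest-PSD-style repair: clamp negative eigenvalues to eps,
    then re-normalize the diagonal back to 1."""
    vals, vecs = np.linalg.eigh(corr)
    fixed = vecs @ np.diag(np.clip(vals, eps, None)) @ vecs.T
    d = np.sqrt(np.diag(fixed))
    return fixed / np.outer(d, d)

# Pairwise correlations that cannot all hold at once: A-B and B-C are
# strongly positive while A-C is strongly negative.
corr = np.array([
    [ 1.00, 0.95, -0.60],
    [ 0.95, 1.00,  0.95],
    [-0.60, 0.95,  1.00],
])
print(np.linalg.eigvalsh(corr).min())                # negative: not PSD
print(np.linalg.eigvalsh(clamp_to_psd(corr)).min())  # non-negative: repaired
```

Cholesky factorization, portfolio-variance calculations, and anything else downstream that assumes PSD will misbehave on the unrepaired matrix.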

That's the gap BlackSwan is trying to fill. Standard testing tools understand code structure. They don't understand mathematical behavior under stress.


How far along is it

Genuinely early. I want to be upfront about that.

It's on PyPI (pip install blackswan) and it works, but the scope is narrow by design. Right now it's focused on portfolio risk, covariance/correlation analysis, and VaR-style models using NumPy and Pandas. If you try to throw a random Python file at it, it'll reject it rather than silently produce garbage output. I think that's the right call for now.

There's also a VS Code extension if you'd rather not live in the terminal. Click Run BlackSwan above a function, pick a scenario from a dropdown, and failures show up as red squiggles with hover tooltips. That part was genuinely fun to build.

The ambition outpaces where it is right now. But it's real and it's useful.


The interesting parts (if you want to go deeper)

This section is for people who want to know how it actually works under the hood.

Detectors

Every iteration runs up to 8 detectors concurrently on your function's output:

NaNInfDetector: any computation producing NaN or Inf
MatrixPSDDetector: covariance matrix losing positive semi-definiteness
ConditionNumberDetector: ill-conditioned matrices before inversion
DivisionStabilityDetector: denominators approaching zero
ExplodingGradientDetector: output growing 100x relative to input perturbation
RegimeShiftDetector: structural breaks in the output distribution across iterations
BoundsDetector: outputs exceeding configurable plausible bounds
LogicalInvariantDetector: user-defined assertions (e.g. portfolio weights must sum to 1)

These get auto-tagged to relevant source lines via AST analysis. No instrumentation, no decorators, no changes to your code.
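To give a feel for what a detector is, here's roughly what the simplest one reduces to. This is an illustrative sketch with made-up class and method names, not BlackSwan's actual detector interface:

```python
import numpy as np

class NaNInfDetector:
    """Hypothetical sketch of the simplest detector: flag any NaN or
    Inf anywhere in the function's output for this iteration."""
    name = "nan_inf"

    def check(self, output):
        arr = np.asarray(output, dtype=float)
        bad = ~np.isfinite(arr)
        if bad.any():
            return {"failure_type": "nan_or_inf", "count": int(bad.sum())}
        return None  # no finding this iteration

detector = NaNInfDetector()
print(detector.check([1.0, 2.0, 3.0]))              # clean output: None
print(detector.check([1.0, float("inf"), float("nan")]))  # a finding
```

The others follow the same shape: take an output (or a window of outputs, for the distributional ones), return a structured finding or nothing.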

Adversarial mode

Standard Monte Carlo samples perturbations randomly. Adversarial mode runs a genetic algorithm that evolves stress parameters toward worst-case inputs instead:

python -m blackswan test models/risk.py --scenario liquidity_crash --adversarial

It maintains a population of parameter sets, scores each by the severity of the failures it induces, and breeds the highest-scoring (most damaging) sets across generations. A HardnessAdaptor automatically increases perturbation intensity when no failures are found, so it doesn't stall on robust code. It's slower but it finds things random sampling misses.
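Stripped of the details, the loop looks something like this. Everything here (the function names, the toy severity function, the parameters) is illustrative, not BlackSwan's actual code:

```python
import random

def evolve(score, population, generations=20, keep=4, mut=0.1):
    """Toy genetic loop over perturbation-parameter dicts: keep the sets
    that induce the most severe failures, then crossover + mutate."""
    pop = list(population)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)          # most severe first
        elite = pop[:keep]
        children = []
        while len(children) < len(pop) - keep:
            a, b = random.sample(elite, 2)
            children.append({k: random.choice([a[k], b[k]])
                                + random.gauss(0, mut)   # crossover + mutation
                             for k in a})
        pop = elite + children
    return max(pop, key=score)

# Hypothetical severity function: failures get worse as correlation
# stress and the vol multiplier grow together.
severity = lambda p: max(0.0, p["corr"] * p["vol"] - 0.8)
init = [{"corr": random.uniform(0.0, 1.0), "vol": random.uniform(0.0, 1.0)}
        for _ in range(12)]
best = evolve(severity, init)
```

Because the elite survive every generation, the best score never regresses; the interesting work is in the mutation schedule, which is what a hardness adaptor would tune.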

Causal chains

The output you saw above with root_input, intermediate, failure_site comes from building a dependency graph of your function via AST analysis. When a failure is detected, BlackSwan walks that graph backwards from the failure site to find the root cause. This is the part I'm most proud of and also the part most likely to have bugs, so if the causal chains look wrong on your code please tell me.
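The core idea fits in a few lines using Python's ast module. This is a simplified sketch of the technique, not the real implementation: map each assigned variable to the names it reads, then walk backwards from the failure-site variable.

```python
import ast

def dependency_chain(source, target):
    """Trace `target` back to its root inputs via assignment statements."""
    deps = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Name):
            reads = {n.id for n in ast.walk(node.value)
                     if isinstance(n, ast.Name)}
            deps[node.targets[0].id] = (node.lineno, reads)
    chain, frontier, seen = [], {target}, set()
    while frontier:                     # walk the graph backwards
        name = frontier.pop()
        if name in seen or name not in deps:
            continue                    # root input, or already visited
        seen.add(name)
        line, reads = deps[name]
        chain.append((line, name))
        frontier |= reads
    return sorted(chain)

src = """
correlation = base_corr * stress
corr_matrix = build(correlation)
cov_matrix = corr_matrix * vols
"""
print(dependency_chain(src, "cov_matrix"))
```

The real version has to handle augmented assignments, tuple unpacking, attribute access, and function boundaries, which is exactly why I expect the causal chains to be the buggiest part.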

ReproducibilityCard

Every run emits a machine-readable provenance record with exact BlackSwan version, Python version, NumPy version, scenario hash, seed, and a ready-to-paste replay command. If you share a finding with a colleague and they can't reproduce it, the card tells you exactly why.
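A card like that is cheap to build. Here's an illustrative sketch; the field names, the schema, and the --seed flag in the replay command are my guesses at the shape, not the actual format:

```python
import dataclasses
import hashlib
import json
import platform
import numpy as np

@dataclasses.dataclass
class ReproducibilityCard:
    """Hypothetical provenance record; field names are illustrative."""
    tool_version: str
    python_version: str
    numpy_version: str
    scenario_hash: str
    seed: int

    def replay_command(self, target, scenario):
        # Guessed flag shape; the real CLI may differ.
        return (f"python -m blackswan test {target} "
                f"--scenario {scenario} --seed {self.seed}")

def make_card(scenario_params, seed):
    # Hash the scenario config so two runs are comparable only when
    # they used byte-identical stress parameters.
    digest = hashlib.sha256(
        json.dumps(scenario_params, sort_keys=True).encode()).hexdigest()[:12]
    return ReproducibilityCard("0.1.0", platform.python_version(),
                               np.__version__, digest, seed)

card = make_card({"vol_mult": 3.0, "corr_shift": 0.4}, seed=1234)
print(card.replay_command("models/risk.py", "liquidity_crash"))
```

The point is that mismatches become diffable: if a colleague's replay disagrees, comparing two cards immediately shows whether the culprit is the seed, the scenario, or a library version.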


Try it

pip install blackswan
python -m blackswan test your_model.py --scenario vol_spike

Source on GitHub.


If you try it and something breaks or doesn't make sense, open an issue. I'd much rather hear that it failed on your code than not hear anything at all. And if you work in this space and think I'm solving the wrong problem entirely, I'd genuinely like to know that too.

