Building a regime-switching particle filter in Rust — from Kalman 1960 to rayon-parallelized Monte Carlo
A Hungarian mathematician's 1960 invention, three British researchers' 1993 extension, and a Rust rewrite that eliminates 30 seconds of JIT warmup. Here's the story of state estimation under regime switches.
60 years of hidden state estimation
1960 — Rudolf Kálmán publishes "A New Approach to Linear Filtering and Prediction Problems." A single paper that would guide Apollo spacecraft to the Moon, enable GPS, and become the backbone of every control system. But it has a limitation: it assumes linear dynamics and Gaussian noise.
1968 — Extended Kalman Filter (EKF) linearizes nonlinear systems via Taylor expansion. Works well enough for slightly nonlinear systems, fails catastrophically for highly nonlinear ones.
1993 — Gordon, Salmond, and Smith publish the bootstrap particle filter. Instead of assuming a distribution shape, they represent the belief as a cloud of weighted samples (particles). Each particle is a hypothesis about the hidden state. Propagate, weight, resample. Repeat. No linearity assumptions. No Gaussian requirements.
Present — Regime-switching extensions add a second layer: each particle carries both a continuous state (position, velocity) AND a discrete mode (regime). The system can switch between fundamentally different behaviors — trending, mean-reverting, or chaotic — and the filter tracks which regime is active.
The problem: 19 functions × 2-5s JIT each = pain
My trading system uses a particle filter to track price regimes in real time. Three modes:
- RANGE: Mean-reverting. Price oscillates around equilibrium. velocity' = 0.5 × velocity + noise
- TREND: Directional. Price follows order flow imbalance. velocity' = velocity + 0.3 × (gain × imbalance - velocity) × dt + noise
- PANIC: High volatility. Random walk with large noise.
The Python+Numba implementation had 19 JIT-compiled functions. Every process restart: 30+ seconds of JIT compilation before the first estimate. In live trading, 30 blind seconds means missed regime transitions, delayed signals, frozen risk management.
And Numba holds the GIL. While 500 particles propagate through nonlinear dynamics, the entire Python runtime blocks.
19 functions in safe Rust
I rewrote everything in Rust with PyO3 bindings. Six categories:
Core particle filter
predict_particles → Rayon-parallelized regime-specific propagation
update_weights → Bayesian weight update via Gaussian likelihood
transition_regimes → Markov chain mode switching (3×3 matrix; sketched below)
systematic_resample → O(N) two-pointer resampling
effective_sample_size → Degeneracy diagnostic (ESS = 1/Σwᵢ²)
estimate → Weighted mean + regime probabilities
Kalman smoothing
kalman_update → 2D level/slope tracker
slope_confidence_interval → 95% CI for slope
is_slope_significant → Directional significance test
kalman_slope_acceleration → Second derivative for early trend entry
Signal processing
calculate_vwap_bands → Volume-weighted price with σ-bands
calculate_momentum_score → Normalized momentum in [-1, +1]
rolling_kurtosis → Fat-tail detection (excess kurtosis)
Statistical tests
hurst_exponent → R/S analysis: trending vs mean-reverting
cusum_test → Page's CUSUM: structural break detection
volatility_compression → Range squeeze (short/long vol ratio)
Extended
particle_price_variance → Weighted variance of particle cloud
ess_and_uncertainty_margin → Combined ESS + regime dominance
adaptive_vwap_sigma → Kurtosis-adapted VWAP band width
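Before the math, here is what the regime-switching step could look like. This is a minimal sketch, not the library's actual `transition_regimes` implementation: it assumes a flat slice of regime indices, a row-stochastic 3×3 transition matrix, and one pre-drawn uniform per particle, consistent with the no-internal-randomness design described below.

```rust
/// Hypothetical sketch of Markov regime switching: each particle's regime
/// is resampled from the row of a 3x3 transition matrix belonging to its
/// current regime, using a pre-drawn U(0,1) per particle (no internal RNG).
fn transition_regimes_sketch(
    regimes: &mut [usize],
    transition: &[[f64; 3]; 3], // row r = P(next regime | current regime r), rows sum to 1
    uniforms: &[f64],           // one pre-drawn U(0,1) per particle
) {
    for (r, &u) in regimes.iter_mut().zip(uniforms) {
        // Inverse-CDF sample from the current regime's transition row.
        let row = &transition[*r];
        let mut acc = 0.0;
        for (next, &p) in row.iter().enumerate() {
            acc += p;
            if u < acc {
                *r = next;
                break;
            }
        }
        // If rounding leaves u >= acc, the particle keeps its current regime.
    }
}
```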
The math that matters
Particle propagation (the regime-specific part)
Each particle i carries state [xᵢ, vᵢ] (log-price, velocity) and regime rᵢ ∈ {0, 1, 2}.
RANGE (r=0):
xᵢ' = xᵢ + vᵢ·dt + σ_pos[0]·√dt·εₓ
vᵢ' = 0.5·vᵢ + σ_vel[0]·√dt·εᵥ
The 0.5·vᵢ term pulls velocity toward zero — mean reversion.
TREND (r=1):
xᵢ' = xᵢ + vᵢ·dt + σ_pos[1]·√dt·εₓ
vᵢ' = vᵢ + 0.3·(G·imbalance - vᵢ)·dt + σ_vel[1]·√dt·εᵥ
Velocity tracks G × imbalance — the order flow signal.
PANIC (r=2):
xᵢ' = xᵢ + vᵢ·dt + σ_pos[2]·√dt·εₓ
vᵢ' = vᵢ + σ_vel[2]·√dt·εᵥ
Pure random walk with high noise.
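Here is a sketch of how this propagation step can be implemented, with rayon doing the per-particle parallelism. The function name, argument layout, and interleaved particle buffer are assumptions for illustration rather than the library's exact API; the regime branching and noise terms follow the equations above.

```rust
use rayon::prelude::*;

/// Sketch of regime-specific propagation (assumed signature, not the
/// library's exact API). `particles` is a flat [x0, v0, x1, v1, ...] buffer;
/// each particle is advanced under its regime's dynamics using pre-drawn
/// standard-normal noise, in parallel via rayon.
fn predict_particles_sketch(
    particles: &mut [f64],  // interleaved [log-price, velocity] pairs
    regimes: &[usize],
    sigma_pos: &[f64; 3],
    sigma_vel: &[f64; 3],
    imbalance: f64,
    gain: f64,
    dt: f64,
    noise_pos: &[f64],      // pre-drawn N(0,1), one per particle
    noise_vel: &[f64],
) {
    let sqrt_dt = dt.max(1e-8).sqrt(); // dt guard, see "Numerical stability"
    particles
        .par_chunks_mut(2)
        .zip(regimes.par_iter())
        .zip(noise_pos.par_iter().zip(noise_vel.par_iter()))
        .for_each(|((p, &r), (&ex, &ev))| {
            let (x, v) = (p[0], p[1]);
            // Position update is the same in every regime.
            p[0] = x + v * dt + sigma_pos[r] * sqrt_dt * ex;
            // Velocity update branches on the regime.
            p[1] = match r {
                0 => 0.5 * v + sigma_vel[0] * sqrt_dt * ev,                               // RANGE
                1 => v + 0.3 * (gain * imbalance - v) * dt + sigma_vel[1] * sqrt_dt * ev, // TREND
                _ => v + sigma_vel[2] * sqrt_dt * ev,                                     // PANIC
            };
        });
}
```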
Bayesian weight update
After observing the actual price z:
wᵢ' = wᵢ × exp(-0.5·(z - xᵢ)²/σ²_price[rᵢ]) × exp(-0.5·(vᵢ - G·imb)²/σ²_vel[rᵢ])
Particles close to the observation get high weight. Particles far away get low weight. Normalize. Resample when weights degenerate (ESS < N/2).
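A sketch of the weight update and ESS computation under the same assumptions (interleaved particle buffer, hypothetical function name). It returns the ESS so the caller can trigger systematic resampling when ESS drops below N/2.

```rust
/// Sketch of the Bayesian weight update (assumed names, not the library's
/// exact API). Weights are multiplied by Gaussian likelihoods of the
/// observed log-price and the implied velocity, then normalized with an
/// underflow guard; the returned ESS drives the resampling decision.
fn update_weights_sketch(
    weights: &mut [f64],
    particles: &[f64],       // interleaved [log-price, velocity] pairs
    regimes: &[usize],
    z: f64,                  // observed log-price
    imbalance: f64,
    gain: f64,
    sigma_price: &[f64; 3],
    sigma_vel: &[f64; 3],
) -> f64 {
    for (i, w) in weights.iter_mut().enumerate() {
        let (x, v) = (particles[2 * i], particles[2 * i + 1]);
        let r = regimes[i];
        let price_ll = -0.5 * (z - x).powi(2) / sigma_price[r].powi(2);
        let vel_ll = -0.5 * (v - gain * imbalance).powi(2) / sigma_vel[r].powi(2);
        *w *= (price_ll + vel_ll).exp();
    }
    // Normalize with an underflow guard so a degenerate cloud never divides by zero.
    let sum: f64 = weights.iter().sum::<f64>() + 1e-300;
    weights.iter_mut().for_each(|w| *w /= sum);
    // ESS = 1 / sum(w_i^2); the caller resamples when this falls below N/2.
    1.0 / (weights.iter().map(|w| w * w).sum::<f64>() + 1e-12)
}
```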
Why log-price space?
All operations use log(price). This makes the filter scale-invariant — the same parameters and noise levels work identically for a $0.50 penny stock and a $50,000 Bitcoin. Multiplicative price noise becomes additive in log space.
Engineering decisions
Rayon only where it matters
predict_particles is the only parallelized function. It's O(N) with substantial per-particle computation (regime branching, noise injection). The other 18 functions are either memory-bound (weight updates, resampling) or operate on small arrays (Kalman 2×2 matrices, signal windows).
Adding rayon to memory-bound functions would increase latency from thread pool overhead.
No internal randomness
The library takes pre-generated random arrays as input. This gives the caller:
- Full reproducibility (same seed = same output)
- Choice of RNG (numpy default, PCG64, whatever)
- Ability to inject structured noise for testing
Numerical stability
- Weight normalization: a `+1e-300` guard prevents underflow to zero
- ESS denominator: `+1e-12` prevents division by zero
- Kalman covariance: symmetrized after every predict and update step (sketch below)
- dt guard: `max(dt, 1e-8).sqrt()` prevents noise explosion at tiny timesteps
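The dt guard and the `1e-300`/`1e-12` guards appear in the propagation and weight-update sketches above. The covariance symmetrization for the 2D Kalman tracker could look like this (a sketch, not the library's code):

```rust
/// Sketch of the covariance symmetrization guard for a 2x2 Kalman tracker:
/// after each predict or update step, P is replaced by (P + P^T) / 2 so that
/// floating-point drift never produces an asymmetric covariance matrix.
fn symmetrize_2x2(p: &mut [[f64; 2]; 2]) {
    let off = 0.5 * (p[0][1] + p[1][0]);
    p[0][1] = off;
    p[1][0] = off;
}
```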
Hurst exponent — my favorite function
The Hurst exponent tells you if a price series is:
- H > 0.5: Trending (persistent — ups followed by ups)
- H = 0.5: Random walk (no memory)
- H < 0.5: Mean-reverting (anti-persistent — ups followed by downs)
Computed via R/S (Rescaled Range) analysis:
- For each window size `n` in `[min_window, max_window]`:
  - Split the series into blocks of size `n`
  - For each block: compute the range of cumulative deviations from the block mean, divided by the block's standard deviation
  - Average the R/S values across blocks
- Fit `log(R/S) = H × log(n) + c` via least squares
- `H` is the slope
This single number tells you whether to use trend-following or mean-reversion strategies. Combined with the particle filter's regime probabilities, you get a multi-scale view of market behavior.
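A sketch of the R/S computation following the steps above, assuming a plain slice of prices and a caller-supplied set of window sizes; the library's `hurst_exponent` signature may differ.

```rust
/// Sketch of Hurst estimation via rescaled-range (R/S) analysis
/// (illustrative, not the library's exact implementation).
fn hurst_exponent_sketch(series: &[f64], window_sizes: &[usize]) -> f64 {
    let mut log_n = Vec::new();
    let mut log_rs = Vec::new();

    for &n in window_sizes {
        let mut rs_values = Vec::new();
        for block in series.chunks_exact(n) {
            let mean = block.iter().sum::<f64>() / n as f64;
            // Range of cumulative deviations from the block mean.
            let mut cum = 0.0;
            let (mut min, mut max) = (f64::INFINITY, f64::NEG_INFINITY);
            let mut var = 0.0;
            for &x in block {
                cum += x - mean;
                min = min.min(cum);
                max = max.max(cum);
                var += (x - mean).powi(2);
            }
            let std = (var / n as f64).sqrt();
            if std > 1e-12 {
                rs_values.push((max - min) / std);
            }
        }
        if !rs_values.is_empty() {
            let mean_rs = rs_values.iter().sum::<f64>() / rs_values.len() as f64;
            log_n.push((n as f64).ln());
            log_rs.push(mean_rs.ln());
        }
    }

    // Least-squares slope of log(R/S) against log(n) is the Hurst exponent.
    let k = log_n.len() as f64;
    let (mx, my) = (log_n.iter().sum::<f64>() / k, log_rs.iter().sum::<f64>() / k);
    let num: f64 = log_n.iter().zip(&log_rs).map(|(x, y)| (x - mx) * (y - my)).sum();
    let den: f64 = log_n.iter().map(|x| (x - mx).powi(2)).sum();
    num / den
}
```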
Before and after
| Dimension | Python + Numba | Rust + PyO3 |
|---|---|---|
| Cold start (19 functions) | 30-90s JIT warmup | Zero |
| GIL | Held during all compute | Released |
| Parallelism | prange (limited) | Rayon work-stealing |
| Memory safety | Runtime bounds checks | Compile-time guarantees |
| Dependency weight | ~150 MB | ~2 MB single .so |
| Reproducibility | JIT varies by LLVM version | Deterministic binary |
Try it
```bash
pip install particle-filter-rs
```

```python
import numpy as np
from particle_filter_rs import (
    predict_particles, update_weights, systematic_resample,
    effective_sample_size, estimate, transition_regimes,
    kalman_update, hurst_exponent, cusum_test,
)

# Initialize 500 particles in log-price space
N = 500
particles = np.column_stack([
    np.full(N, np.log(100.0)),  # log-price
    np.zeros(N),                # velocity
])
regimes = np.zeros(N, dtype=np.int64)
weights = np.full(N, 1.0 / N)

# One filter step
rng = np.random.default_rng(42)
particles = predict_particles(
    particles, regimes,
    np.array([0.001, 0.002, 0.005]),  # process noise per regime
    np.array([0.01, 0.02, 0.05]),
    imbalance=0.3, dt=1.0, vel_gain=0.1,
    random_pos=rng.standard_normal(N),
    random_vel=rng.standard_normal(N),
)
```
GitHub: https://github.com/RMANOV/particle-filter-rs
What I learned
- Rayon's overhead is real — for memory-bound functions, single-threaded Rust beats rayon-parallelized Rust
- Log-space is non-negotiable — a particle filter in linear price space needs different noise parameters for every asset. Log-space is universal.
- Deterministic filters are testable filters — external RNG makes every test reproducible, every bug reproducible
- The Kalman complement matters — particle filters alone give noisy estimates. A parallel Kalman provides smooth baseline + confidence intervals. Best of both worlds.
- 60 years of math still works — Kálmán's 1960 insight (recursive Bayesian estimation) is alive in every function in this library
Built with Rust, PyO3 0.27, rayon 1.10, and respect for 60 years of state estimation research.