Kyoko

Posted on Jan 26

How I Made My Python Backtester 56x Faster with Rust

#rust #python #performance #finance

Stack

Python — ML pipeline, orchestration, data handling
Rust + PyO3 — Backtesting engine, indicators, grid search
Rayon — Parallel execution

Before: pure Python

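def backtest(prices, signals, stop_loss, take_profit, initial_capital=10_000.0):
    # initial_capital default is a placeholder value; calculate_metrics is defined elsewhere
    equity = [initial_capital]
    for i in range(1, len(prices)):
        # lots of loops and branching (entry/exit checks per bar)
        equity.append(equity[-1])  # placeholder: carry equity forward
    return calculate_metrics(equity)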

# Grid search: 1000 combinations
results = [
    backtest(prices, signals, sl, tp)
    for sl, tp in param_grid
]  # ~7.5 minutes

After: Rust + PyO3

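use numpy::PyReadonlyArray1;
use pyo3::prelude::*;

// Exported to Python as `run_backtest`; the `_py` suffix keeps it
// distinct from the pure-Rust `run_backtest` it wraps.
#[pyfunction]
#[pyo3(name = "run_backtest")]
fn run_backtest_py(
    py: Python,
    prices: PyReadonlyArray1<f64>,
    signals: PyReadonlyArray1<i8>,
    stop_loss: f64,
    take_profit: f64,
) -> PyResult<PyObject> {
    // Borrow the NumPy buffers as plain Rust slices (no copy)
    let prices = prices.as_slice()?;
    let signals = signals.as_slice()?;

    let result = run_backtest(prices, signals, stop_loss, take_profit);

    Ok(result.into_py(py))
}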

Python stays clean:

from hyprl_supercalc import run_backtest

results = [
    run_backtest(prices, signals, sl, tp)
    for sl, tp in param_grid
]  # ~8 seconds

Same call shape on the Python side. The 1000-combination grid drops from about 7.5 minutes to about 8 seconds.
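The wrapper above hands off to a plain-Rust `run_backtest` that isn't shown in this post. As a rough idea of what such a core can look like, here is a minimal sketch. It assumes a long-only strategy, a signal of 1 meaning "enter", and `stop_loss` / `take_profit` given as fractional moves from the entry price. The `BacktestResult` fields are hypothetical, and the real engine in the repo tracks more than this.

```rust
/// Hypothetical result type for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct BacktestResult {
    pub final_equity: f64,
    pub trades: usize,
}

/// Long-only sketch: enter when signal == 1, exit once the price has moved
/// `stop_loss` below or `take_profit` above the entry (both as fractions).
/// A position still open at the last bar is left unrealized.
pub fn run_backtest(
    prices: &[f64],
    signals: &[i8],
    stop_loss: f64,
    take_profit: f64,
) -> BacktestResult {
    let mut equity = 1.0_f64; // unit starting capital (assumption)
    let mut entry: Option<f64> = None;
    let mut trades = 0;

    for (&price, &signal) in prices.iter().zip(signals) {
        if let Some(entry_price) = entry {
            // In a position: check both exit conditions.
            let ret = price / entry_price - 1.0;
            if ret <= -stop_loss || ret >= take_profit {
                equity *= 1.0 + ret;
                trades += 1;
                entry = None;
            }
        } else if signal == 1 {
            // Flat and the signal fires: open a position at this bar's price.
            entry = Some(price);
        }
    }

    BacktestResult { final_equity: equity, trades }
}
```

The whole loop works on borrowed slices and a handful of stack variables, which is the kind of branch-heavy per-bar code that is slow in Python and cheap in Rust.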

Key optimizations

1. Zero-copy NumPy access

use numpy::PyReadonlyArray1;

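let prices = prices.as_slice()?;  // Borrows NumPy's buffer directly; errors if the array is not contiguous

No data is duplicated between Python and Rust, provided the array is contiguous. A strided view such as prices[::2] makes as_slice() return an error instead of silently copying.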

2. Parallel grid search with Rayon

use rayon::prelude::*;

pub fn run_grid_search(params: &[Params]) -> Vec<BacktestResult> {
    params
        .par_iter()
        .map(run_single_backtest)
        .collect()
}

Rayon's default thread pool sizes itself to the number of available cores, so there is no manual thread management.
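To make it clearer what a parallel map over the grid does, here is a std-only sketch that splits the grid into chunks and runs each chunk on a scoped thread. This is a swapped-in technique, not Rayon: Rayon's work-stealing pool balances uneven workloads far better. `Params`, the placeholder scoring function, and the `n_threads` argument are all hypothetical.

```rust
use std::thread;

/// Hypothetical parameter set for one grid point.
#[derive(Debug, Clone, Copy)]
pub struct Params {
    pub stop_loss: f64,
    pub take_profit: f64,
}

/// Placeholder standing in for a real backtest; returns a dummy score.
fn run_single_backtest(p: &Params) -> f64 {
    p.take_profit - p.stop_loss
}

/// Chunked parallel map over the grid. Output order matches input order.
pub fn run_grid_search(params: &[Params], n_threads: usize) -> Vec<f64> {
    let n = n_threads.max(1);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let chunk_len = ((params.len() + n - 1) / n).max(1);

    thread::scope(|scope| {
        // Spawn every worker first so the chunks really run concurrently.
        let handles: Vec<_> = params
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(run_single_backtest).collect::<Vec<f64>>())
            })
            .collect();

        // Join in spawn order to keep results aligned with the input grid.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Each grid point is independent and only reads shared price data, which is why this kind of workload parallelizes so cleanly.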

3. Reusing buffers

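// Bad: a fresh Vec every iteration, which reallocates as soon as it is filled
for _ in 0..n {
    let mut temp: Vec<f64> = Vec::new();
    // ... fill and use temp ...
}

// Good: allocate once, reuse the capacity
let mut buffer: Vec<f64> = Vec::with_capacity(n);
for _ in 0..n {
    buffer.clear();  // resets the length, keeps the allocation
    // ... fill and use buffer ...
}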

Avoiding repeated allocations made a measurable difference.
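Here is the same pattern in a slightly more concrete setting: a sketch that computes the max drawdown for many return series while reusing one equity-curve buffer. The function name and the choice of metric are assumptions for illustration, not code from the repo. A real backtester would typically keep the curve around for several metrics, which is why it is materialized here at all.

```rust
/// Max drawdown (as a fraction of the running peak) for each return series,
/// reusing a single equity buffer across all series.
pub fn max_drawdowns(return_series: &[Vec<f64>]) -> Vec<f64> {
    let longest = return_series.iter().map(Vec::len).max().unwrap_or(0);
    let mut equity: Vec<f64> = Vec::with_capacity(longest); // allocated once
    let mut out = Vec::with_capacity(return_series.len());

    for returns in return_series {
        equity.clear(); // drops old values, keeps the capacity

        // Build the equity curve from unit starting capital.
        let mut level = 1.0_f64;
        for &r in returns {
            level *= 1.0 + r;
            equity.push(level);
        }

        // Scan the curve for the largest peak-to-trough drop.
        let mut peak = 1.0_f64;
        let mut worst = 0.0_f64;
        for &e in &equity {
            peak = peak.max(e);
            worst = worst.max((peak - e) / peak);
        }
        out.push(worst);
    }
    out
}
```

Because the buffer is sized for the longest series up front, the inner `push` calls never trigger a reallocation.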

Benchmarks

Operation	Python	Rust	Speedup
ATR (10k bars)	45 ms	1.2 ms	37×
Single backtest	450 ms	12 ms	37×
Grid search (1000)	7.5 min	8 sec	56×
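For context on the ATR row, this is roughly what an ATR routine looks like in Rust. It is a sketch of the textbook definition (true range with Wilder smoothing), not necessarily the implementation that was benchmarked; the function names and the choice to seed with a simple average are assumptions.

```rust
/// True range per bar. The first bar has no previous close, so it uses high - low.
pub fn true_range(high: &[f64], low: &[f64], close: &[f64]) -> Vec<f64> {
    let n = high.len().min(low.len()).min(close.len());
    (0..n)
        .map(|i| {
            let hl = high[i] - low[i];
            if i == 0 {
                hl
            } else {
                let prev_close = close[i - 1];
                hl.max((high[i] - prev_close).abs())
                    .max((low[i] - prev_close).abs())
            }
        })
        .collect()
}

/// Wilder-smoothed ATR. Yields one value per bar starting at index `period - 1`;
/// returns an empty Vec if there are fewer than `period` bars.
pub fn atr(high: &[f64], low: &[f64], close: &[f64], period: usize) -> Vec<f64> {
    let tr = true_range(high, low, close);
    if period == 0 || tr.len() < period {
        return Vec::new();
    }

    let p = period as f64;
    let mut out = Vec::with_capacity(tr.len() - period + 1);

    // Seed with the simple average of the first `period` true ranges.
    let mut prev = tr[..period].iter().sum::<f64>() / p;
    out.push(prev);

    // Wilder smoothing: ATR_t = (ATR_{t-1} * (period - 1) + TR_t) / period
    for &t in &tr[period..] {
        prev = (prev * (p - 1.0) + t) / p;
        out.push(prev);
    }
    out
}
```

A tight loop over contiguous `f64` slices like this is easy for the compiler to optimize, which fits with the per-indicator speedup in the table.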

The hard parts

Lifetimes & NumPy views: Getting Rust’s borrow checker to accept NumPy-backed slices took trial and error.
Cross-platform builds: Maturin helps, but testing on Linux / macOS / Windows is still required.
Debugging across the boundary: Rust panics triggered from Python don’t produce great stack traces.
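One way to soften the debugging problem is to validate inputs up front and return an error instead of letting the core panic on bad data. A minimal sketch, assuming the backtest expects equally long, non-empty, finite inputs. The `InputError` type and `validate_inputs` function are hypothetical.

```rust
use std::fmt;

/// Hypothetical error type describing why the inputs were rejected.
#[derive(Debug, PartialEq)]
pub enum InputError {
    Empty,
    LengthMismatch { prices: usize, signals: usize },
    NonFinitePrice { index: usize },
}

impl fmt::Display for InputError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            InputError::Empty => write!(f, "prices must not be empty"),
            InputError::LengthMismatch { prices, signals } => {
                write!(f, "length mismatch: {prices} prices vs {signals} signals")
            }
            InputError::NonFinitePrice { index } => {
                write!(f, "non-finite price at index {index}")
            }
        }
    }
}

/// Check the inputs before running the hot loop, so bad data becomes a
/// readable error rather than a panic deep inside the engine.
pub fn validate_inputs(prices: &[f64], signals: &[i8]) -> Result<(), InputError> {
    if prices.is_empty() {
        return Err(InputError::Empty);
    }
    if prices.len() != signals.len() {
        return Err(InputError::LengthMismatch {
            prices: prices.len(),
            signals: signals.len(),
        });
    }
    if let Some(index) = prices.iter().position(|p| !p.is_finite()) {
        return Err(InputError::NonFinitePrice { index });
    }
    Ok(())
}
```

In a PyO3 wrapper, an error like this could be mapped to a Python exception, for example with `PyValueError::new_err(e.to_string())`, so the caller sees a normal `ValueError` with a useful message.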

Was it worth it?

Yes.

Iteration speed increased by 56×
I can explore much larger parameter spaces
The Rust core ended up cleaner than the original Python code

Try it yourself

Source code:

https://github.com/Kacawaiii/HyprL/tree/main/native

If you're hitting performance limits in Python, rewriting only the hot path in Rust is often enough.
PyO3 makes the integration surprisingly painless.

Questions and feedback welcome in the comments.
