DEV Community

Kyoko

How I Made My Python Backtester 56x Faster with Rust

Stack

  • Python — ML pipeline, orchestration, data handling
  • Rust + PyO3 — Backtesting engine, indicators, grid search
  • Rayon — Parallel execution

Before: pure Python

def backtest(prices, signals, stop_loss, take_profit):
    equity = [initial_capital]
    for i in range(len(prices)):
        ...  # lots of loops and branching (elided)
    return calculate_metrics(equity)

# Grid search: 1000 combinations
results = [
    backtest(prices, signals, sl, tp)
    for sl, tp in param_grid
]  # ~7.5 minutes

After: Rust + PyO3

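use numpy::PyReadonlyArray1;
use pyo3::prelude::*;

// Exposed to Python as `run_backtest`, matching the import used on the Python side.
#[pyfunction]
#[pyo3(name = "run_backtest")]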
fn run_backtest_py(
    py: Python,
    prices: PyReadonlyArray1<f64>,
    signals: PyReadonlyArray1<i8>,
    stop_loss: f64,
    take_profit: f64,
) -> PyResult<PyObject> {
    let prices = prices.as_slice()?;
    let signals = signals.as_slice()?;

    let result = run_backtest(prices, signals, stop_loss, take_profit);

    Ok(result.into_py(py))
}

Python stays clean:

from hyprl_supercalc import run_backtest

results = [
    run_backtest(prices, signals, sl, tp)
    for sl, tp in param_grid
]  # ~8 seconds

Same API. Drastically faster.
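The PyO3 wrapper only converts arguments and delegates to a pure-Rust `run_backtest`, which the post does not show. As a rough illustration of what such a core could look like, here is a minimal long-only sketch. The `BacktestResult` fields, the normalised starting capital, and the entry/exit rules are all assumptions made for the example, not the project's actual logic.

```rust
/// Hypothetical result type; field names are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct BacktestResult {
    pub final_equity: f64,
    pub trades: usize,
}

/// Long-only sketch: enter on a positive signal, exit once the return since
/// entry falls to `-stop_loss` or rises to `take_profit` (both fractions).
pub fn run_backtest(
    prices: &[f64],
    signals: &[i8],
    stop_loss: f64,
    take_profit: f64,
) -> BacktestResult {
    let mut equity = 1.0_f64; // normalised starting capital (assumption)
    let mut entry: Option<f64> = None; // entry price of the open position
    let mut trades = 0;

    for (&price, &signal) in prices.iter().zip(signals) {
        match entry {
            // Flat and the signal says buy: open a position at this bar.
            None if signal > 0 => entry = Some(price),
            // In a position: check the stop-loss / take-profit thresholds.
            Some(entry_price) => {
                let ret = price / entry_price - 1.0;
                if ret <= -stop_loss || ret >= take_profit {
                    equity *= 1.0 + ret;
                    trades += 1;
                    entry = None;
                }
            }
            // Flat and no buy signal: nothing to do.
            None => {}
        }
    }
    BacktestResult { final_equity: equity, trades }
}
```

Because the core takes plain slices and returns a plain struct, it can be unit-tested with `cargo test` without involving Python at all.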


Key optimizations

1. Zero-copy NumPy access

use numpy::PyReadonlyArray1;

let prices = prices.as_slice()?;  // Direct pointer into NumPy memory

No data duplication between Python and Rust, as long as the array is contiguous in memory; `as_slice()` returns an error for non-contiguous arrays instead of copying them.
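To make the zero-copy idea concrete without PyO3, the std-only sketch below borrows a slice from an owned buffer and checks that both point at the same memory. The `Vec` stands in for the NumPy-owned buffer, and the helper names `mean` and `shares_memory` are made up for this example.

```rust
/// Computes a mean over a borrowed slice; no elements are copied.
pub fn mean(prices: &[f64]) -> f64 {
    if prices.is_empty() {
        return 0.0;
    }
    prices.iter().sum::<f64>() / prices.len() as f64
}

/// Returns true when both slices start at the same address,
/// i.e. `view` is a borrow of `owner` rather than a copy.
pub fn shares_memory(owner: &[f64], view: &[f64]) -> bool {
    std::ptr::eq(owner.as_ptr(), view.as_ptr())
}
```

Calling `shares_memory(&data, &data[..])` on a `Vec<f64>` returns `true`, while comparing `data` against `data.clone()` returns `false`, because the clone owns a separate allocation.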


2. Parallel grid search with Rayon

use rayon::prelude::*;

pub fn run_grid_search(params: &[Params]) -> Vec<BacktestResult> {
    params
        .par_iter()
        .map(run_single_backtest)
        .collect()
}

Rayon's global thread pool defaults to one worker per logical CPU, so the grid is spread across all cores without any manual thread management.
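For readers who want to see roughly what this kind of data parallelism involves, here is a std-only sketch that swaps Rayon for `std::thread::scope` with manual chunking. It is not how Rayon works internally (Rayon uses work stealing), and `Params` and the toy `run_single_backtest` are hypothetical stand-ins.

```rust
use std::thread;

/// Hypothetical parameter set for one grid point.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Params {
    pub stop_loss: f64,
    pub take_profit: f64,
}

/// Stand-in for a real backtest: a cheap, deterministic score.
pub fn run_single_backtest(p: &Params) -> f64 {
    p.take_profit - p.stop_loss
}

/// Splits the grid into one chunk per available core and runs the chunks
/// on scoped threads. Results come back in the original order.
pub fn run_grid_search(params: &[Params]) -> Vec<f64> {
    if params.is_empty() {
        return Vec::new();
    }
    let workers = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    // Ceiling division so every parameter lands in some chunk.
    let chunk_len = (params.len() + workers - 1) / workers;

    thread::scope(|s| {
        // Spawn every worker first, then join them in order.
        let handles: Vec<_> = params
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    chunk.iter().map(run_single_backtest).collect::<Vec<f64>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Static chunking like this leaves cores idle when some parameter sets take longer than others, which is one reason to prefer Rayon's `par_iter()` in real code.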


3. Reusing buffers

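// Bad: a fresh Vec on every iteration
for _ in 0..n {
    let temp: Vec<f64> = Vec::new();
    // ... fill and use temp ...
}

// Good: allocate once, then reuse the same capacity
let mut buffer: Vec<f64> = Vec::with_capacity(n);
for _ in 0..n {
    buffer.clear();  // resets the length to 0 but keeps the allocation
    // ... fill and use buffer ...
}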

Avoiding repeated allocations made a measurable difference.
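As one concrete case where a scratch buffer is genuinely needed, the sketch below computes a rolling median: each window has to be copied and sorted, and a single reused `Vec` serves every window. The function name and the choice of indicator are assumptions for illustration; the post does not say where buffers were reused.

```rust
/// Rolling median over `window` bars, reusing one scratch buffer.
/// For even window sizes this returns the upper of the two middle values.
pub fn rolling_median(prices: &[f64], window: usize) -> Vec<f64> {
    let mut out = Vec::new();
    if window == 0 || prices.len() < window {
        return out;
    }
    out.reserve(prices.len() - window + 1);

    // Allocated once, outside the loop.
    let mut buffer: Vec<f64> = Vec::with_capacity(window);
    for w in prices.windows(window) {
        buffer.clear(); // length back to 0, capacity untouched
        buffer.extend_from_slice(w); // fits the existing capacity
        buffer.sort_by(|a, b| a.total_cmp(b));
        out.push(buffer[window / 2]);
    }
    out
}
```

`Vec::clear` drops the elements but keeps the capacity, so after the initial `with_capacity` call the loop does no further heap allocation for the scratch space.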


Benchmarks

Operation            Python     Rust      Speedup
ATR (10k bars)       45 ms      1.2 ms    37×
Single backtest      450 ms     12 ms     37×
Grid search (1000)   7.5 min    8 sec     56×
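The speedup column is simply the Python time divided by the Rust time. The small sketch below reproduces that arithmetic from the table's own numbers; the helper names are made up for the example.

```rust
/// Speedup factor: how many times faster the optimized run is.
pub fn speedup(baseline_secs: f64, optimized_secs: f64) -> f64 {
    baseline_secs / optimized_secs
}

/// Estimated wall time for `runs` back-to-back runs of `per_run_secs` each.
pub fn sequential_estimate(runs: u32, per_run_secs: f64) -> f64 {
    f64::from(runs) * per_run_secs
}
```

With the figures above, `speedup(0.045, 0.0012)` and `speedup(0.450, 0.012)` both come out at about 37.5, and `speedup(450.0, 8.0)` at 56.25, so the table rounds down. `sequential_estimate(1000, 0.450)` gives 450 s, which matches the 7.5-minute Python grid search, while `sequential_estimate(1000, 0.012)` gives 12 s for Rust, somewhat above the measured 8 s.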

The hard parts

  • Lifetimes & NumPy views
    Convincing Rust’s borrow checker with NumPy-backed slices took trial and error.

  • Cross-platform builds
    Maturin helps, but testing Linux / macOS / Windows is still required.

  • Debugging across the boundary
    Rust crashes called from Python don’t produce great stack traces.
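One way to make failures near the boundary easier to read is to turn panics into ordinary error values before they leave Rust. PyO3 already converts an unwinding panic into a Python exception, so the std-only sketch below is just an illustration of the idea, with a hypothetical `checked_first_return` helper that panics on short input.

```rust
use std::any::Any;
use std::panic;

/// Extracts a readable message from a panic payload.
fn panic_message(payload: Box<dyn Any + Send>) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "unknown panic".to_string()
    }
}

/// Return of the second bar relative to the first. Indexing panics when
/// fewer than two prices are given; the panic is caught and reported as Err.
pub fn checked_first_return(prices: &[f64]) -> Result<f64, String> {
    panic::catch_unwind(|| prices[1] / prices[0] - 1.0).map_err(panic_message)
}
```

This only helps with unwinding panics. Aborts, segfaults in `unsafe` code, and builds compiled with `panic = "abort"` still take the whole Python process down.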


Was it worth it?

Yes.

  • Grid-search turnaround dropped from ~7.5 minutes to ~8 seconds (56×)
  • I can explore much larger parameter spaces
  • The Rust core ended up cleaner than the original Python code

Try it yourself

Source code:

https://github.com/Kacawaiii/HyprL/tree/main/native

If you're hitting performance limits in Python, rewriting only the hot path in Rust is often enough.
PyO3 makes the integration surprisingly painless.


Questions and feedback welcome in the comments.

