Stack
- Python — ML pipeline, orchestration, data handling
- Rust + PyO3 — Backtesting engine, indicators, grid search
- Rayon — Parallel execution
Before: pure Python
```python
def backtest(prices, signals, stop_loss, take_profit):
    equity = [initial_capital]  # initial_capital is defined elsewhere
    for i in range(len(prices)):
        ...  # lots of per-bar loops and branching
    return calculate_metrics(equity)


# Grid search: 1000 (stop_loss, take_profit) combinations
results = [
    backtest(prices, signals, sl, tp)
    for sl, tp in param_grid
]  # ~7.5 minutes
```
After: Rust + PyO3
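```rust
use numpy::PyReadonlyArray1;
use pyo3::prelude::*;

#[pyfunction]
#[pyo3(name = "run_backtest")] // exposed to Python as `run_backtest`
fn run_backtest_py(
    py: Python<'_>,
    prices: PyReadonlyArray1<f64>,
    signals: PyReadonlyArray1<i8>,
    stop_loss: f64,
    take_profit: f64,
) -> PyResult<PyObject> {
    // Borrow NumPy's buffers as plain Rust slices (no copy)
    let prices = prices.as_slice()?;
    let signals = signals.as_slice()?;
    // The pure-Rust core knows nothing about Python
    let result = run_backtest(prices, signals, stop_loss, take_profit);
    Ok(result.into_py(py))
}
```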
Python stays clean:
```python
from hyprl_supercalc import run_backtest

results = [
    run_backtest(prices, signals, sl, tp)
    for sl, tp in param_grid
]  # ~8 seconds
```
Same API, about 56× faster (7.5 minutes down to 8 seconds).
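The wrapper only converts types. The actual work happens in a plain Rust `run_backtest` over slices, which the post doesn't show. Here is a minimal sketch of what such a core can look like. It assumes that a signal of `1` opens a long position, that `stop_loss` and `take_profit` are fractional distances from the entry price (0.02 = 2%), and that the result is only final equity plus a trade count. `BacktestResult` and its fields are hypothetical, not HyprL's. To be returned through the wrapper above, the real result type would also need PyO3's conversion to a Python object, which this sketch leaves out.

```rust
/// Hypothetical result type; a real engine would return richer metrics.
#[derive(Debug, Clone, PartialEq)]
pub struct BacktestResult {
    pub final_equity: f64,
    pub trades: usize,
}

/// Minimal long-only backtest sketch (assumed semantics, not HyprL's):
/// - a signal of 1 opens a long position at that bar's price when flat
/// - the position closes once price is `stop_loss` below entry or
///   `take_profit` above it (both fractional, e.g. 0.02 = 2%)
/// - a position still open at the last bar is ignored
pub fn run_backtest(prices: &[f64], signals: &[i8], stop_loss: f64, take_profit: f64) -> BacktestResult {
    let mut equity = 1.0_f64; // start with 1 unit of capital
    let mut entry: Option<f64> = None; // entry price of the open position
    let mut trades = 0;

    for (&price, &signal) in prices.iter().zip(signals) {
        match entry {
            Some(e) => {
                let ret = price / e - 1.0;
                if ret <= -stop_loss || ret >= take_profit {
                    equity *= 1.0 + ret; // realise the trade's return
                    entry = None;
                    trades += 1;
                }
            }
            None if signal == 1 => entry = Some(price),
            None => {}
        }
    }
    BacktestResult { final_equity: equity, trades }
}
```

Keeping this function free of any PyO3 types means it can be unit-tested and benchmarked in pure Rust, while the `#[pyfunction]` stays a thin adapter.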
Key optimizations
1. Zero-copy NumPy access
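```rust
use numpy::PyReadonlyArray1;

// Borrows NumPy's buffer directly: a pointer and a length, no copy
let prices = prices.as_slice()?;
```
No data is duplicated between Python and Rust. One caveat: `as_slice()` only succeeds for contiguous arrays, so a strided view (for example `arr[::2]`) returns an error, which `?` turns into a Python exception.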
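Once the data is a `&[f64]`, the zero-copy property carries through the rest of the core. Sub-slices and iterators such as `windows` hand out borrowed views of the same buffer instead of copies. A small illustration, using a hypothetical simple moving average as the indicator:

```rust
/// Simple moving average over `period` bars (illustrative helper).
/// `prices.windows(period)` yields overlapping borrowed sub-slices of the
/// caller's buffer, so nothing is copied; only the output Vec is allocated.
pub fn sma(prices: &[f64], period: usize) -> Vec<f64> {
    assert!(period > 0, "period must be positive");
    prices
        .windows(period)
        .map(|w| w.iter().sum::<f64>() / period as f64)
        .collect()
}
```

Each `w` points into `prices` itself. Re-summing every window costs O(n·period) and is kept here for clarity; a running sum would bring it down to O(n).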
2. Parallel grid search with Rayon
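```rust
use rayon::prelude::*;

// One independent backtest per parameter set, spread across worker threads;
// `collect()` keeps results in the same order as `params`.
pub fn run_grid_search(params: &[Params]) -> Vec<BacktestResult> {
    params
        .par_iter()
        .map(run_single_backtest)
        .collect()
}
```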
Rayon's global thread pool starts one worker per logical CPU by default, so every core is used without any manual thread management.
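To make it concrete what `par_iter().map(...).collect()` does for you, here is a rough std-only equivalent built on scoped threads with static chunking. This is a swapped-in technique for illustration, not how Rayon works internally: Rayon uses work stealing, which balances uneven workloads much better than fixed chunks. `Params` and `run_single_backtest` are hypothetical stand-ins, and each run returns a single score to keep it short.

```rust
use std::thread;

/// Hypothetical parameter set for one run.
#[derive(Debug, Clone, Copy)]
pub struct Params {
    pub stop_loss: f64,
    pub take_profit: f64,
}

/// Hypothetical stand-in for one backtest; returns a single score.
pub fn run_single_backtest(p: &Params) -> f64 {
    p.take_profit - p.stop_loss
}

/// Static-chunked parallel map: one scoped thread per chunk.
/// Joining handles in spawn order keeps results aligned with `params`.
pub fn run_grid_search(params: &[Params]) -> Vec<f64> {
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let chunk_size = params.len().div_ceil(workers).max(1); // chunks(0) would panic

    thread::scope(|s| {
        let handles: Vec<_> = params
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(run_single_backtest).collect::<Vec<_>>()))
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}
```

The scoped threads can borrow `params` directly because `thread::scope` guarantees they finish before the function returns, which is the same reason Rayon can hand out `&Params` without copying.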
3. Reusing buffers
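```rust
// Bad: a fresh heap allocation on every iteration
// (note that `Vec::new()` alone does not allocate; pushing into it does)
for _ in 0..n {
    let mut temp: Vec<f64> = Vec::with_capacity(n);
    // ... fill and use temp ...
}

// Good: allocate once, then reuse the same capacity
let mut buffer: Vec<f64> = Vec::with_capacity(n);
for _ in 0..n {
    buffer.clear(); // drops the contents, keeps the allocation
    // ... fill and use buffer ...
}
```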
Avoiding repeated allocations made a measurable difference.
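In a grid search the same idea applies one level up. Instead of each run allocating its own scratch vectors, the caller owns one buffer and every run writes into it. A minimal sketch, assuming the per-run scratch data is an equity curve scaled by a hypothetical leverage parameter and the score is maximum drawdown; all three function names are illustrative:

```rust
/// Fills `out` with a compounded equity curve for one parameter value,
/// reusing `out`'s existing capacity instead of allocating a new Vec.
pub fn equity_curve_into(returns: &[f64], leverage: f64, out: &mut Vec<f64>) {
    out.clear(); // drops old contents, keeps the allocation
    let mut equity = 1.0_f64;
    out.extend(returns.iter().map(|r| {
        equity *= 1.0 + leverage * r;
        equity
    }));
}

/// Maximum drawdown of an equity curve, as a fraction of the running peak.
pub fn max_drawdown(curve: &[f64]) -> f64 {
    let mut peak = f64::MIN;
    let mut worst = 0.0_f64;
    for &e in curve {
        peak = peak.max(e);
        worst = worst.max((peak - e) / peak);
    }
    worst
}

/// Scores every leverage value using a single scratch buffer.
pub fn drawdowns(returns: &[f64], leverages: &[f64]) -> Vec<f64> {
    let mut scratch = Vec::with_capacity(returns.len()); // allocated once
    leverages
        .iter()
        .map(|&lev| {
            equity_curve_into(returns, lev, &mut scratch);
            max_drawdown(&scratch)
        })
        .collect()
}
```

Passing `&mut Vec<f64>` rather than returning a new `Vec` is what lets the allocation outlive a single run.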
Benchmarks
| Operation | Python | Rust | Speedup |
|---|---|---|---|
| ATR (10k bars) | 45 ms | 1.2 ms | 37× |
| Single backtest | 450 ms | 12 ms | 37× |
| Grid search (1000) | 7.5 min | 8 sec | 56× |
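The ATR row measures an indicator rather than a backtest, and the post doesn't include that code. A common definition is Wilder's ATR. The true range is TR_t = max(high_t - low_t, |high_t - close_(t-1)|, |low_t - close_(t-1)|). The ATR is seeded with the mean of the first `period` true ranges and then smoothed as ATR_t = (ATR_(t-1)·(period - 1) + TR_t) / period. Below is a single-pass sketch under that assumption; the signature and output alignment are hypothetical choices.

```rust
/// Wilder's Average True Range (sketch; signature is hypothetical).
/// Returns one value per bar from index `period` onward: `len - period` values.
pub fn atr(high: &[f64], low: &[f64], close: &[f64], period: usize) -> Vec<f64> {
    let n = high.len();
    assert!(low.len() == n && close.len() == n, "input lengths differ");
    assert!(period > 0, "period must be positive");
    if n <= period {
        return Vec::new();
    }
    // True range for bar i >= 1 (bar 0 has no previous close).
    let tr = |i: usize| {
        let prev = close[i - 1];
        (high[i] - low[i])
            .max((high[i] - prev).abs())
            .max((low[i] - prev).abs())
    };
    let mut out = Vec::with_capacity(n - period); // single allocation
    // Seed: simple mean of the first `period` true ranges (bars 1..=period).
    let mut value = (1..=period).map(&tr).sum::<f64>() / period as f64;
    out.push(value);
    // Wilder smoothing for the remaining bars.
    for i in period + 1..n {
        value = (value * (period - 1) as f64 + tr(i)) / period as f64;
        out.push(value);
    }
    out
}
```

Everything is computed in one pass over borrowed slices with a preallocated output, which is the kind of tight loop where the Rust speedup shows up most.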
The hard parts
Lifetimes & NumPy views
Convincing Rust’s borrow checker with NumPy-backed slices took trial and error.
Cross-platform builds
Maturin helps with packaging, but builds still have to be tested on Linux, macOS and Windows.
Debugging across the boundary
Panics in Rust code called from Python don't produce useful stack traces.
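One habit that helps here is to check inputs at the entry point and return a descriptive error, rather than letting an out-of-bounds index panic deep inside the bar loop. PyO3 does catch Rust panics at the boundary and re-raises them as a Python `PanicException`, but that message rarely says which input was wrong. Below is a std-only sketch of such a check; the function and its rules are hypothetical. In the PyO3 wrapper, its `Err` could be mapped to a Python `ValueError` with `pyo3::exceptions::PyValueError::new_err`.

```rust
/// Validates backtest inputs up front so bad data yields a clear message
/// instead of a panic somewhere inside the bar loop (hypothetical rules).
pub fn validate_inputs(prices: &[f64], signals: &[i8], stop_loss: f64, take_profit: f64) -> Result<(), String> {
    if prices.len() != signals.len() {
        return Err(format!("prices has {} bars but signals has {}", prices.len(), signals.len()));
    }
    // NaN, infinities and non-positive prices would poison every return calculation.
    if let Some(i) = prices.iter().position(|p| !p.is_finite() || *p <= 0.0) {
        return Err(format!("price at index {i} is not a positive finite number: {}", prices[i]));
    }
    // Written as a negated `>` check so NaN thresholds are rejected too.
    if !(stop_loss > 0.0 && take_profit > 0.0) {
        return Err(format!("stop_loss and take_profit must be > 0 (got {stop_loss}, {take_profit})"));
    }
    Ok(())
}
```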
Was it worth it?
Yes.
- Grid searches run 56× faster, so each research iteration is far shorter
- I can explore much larger parameter spaces
- The Rust core ended up cleaner than the original Python code
Try it yourself
Source code:
https://github.com/Kacawaiii/HyprL/tree/main/native
If you're hitting performance limits in Python, rewriting only the hot path in Rust is often enough.
PyO3 makes the integration surprisingly painless.
Questions and feedback welcome in the comments.