Why I rewrote 11 trading indicators from Python to Rust (and got bit-exact parity)
A Japanese newspaper reporter spent 30 years perfecting a trading system by hand. I rewrote it in Rust. Here's the full story — the history, the math, and the engineering.
The problem: Numba's cold start kills live trading
My Python trading system relied on Numba-JIT compiled Ichimoku Cloud calculations. Numba is excellent — until your process restarts.
Every cold start: 2-5 seconds of JIT compilation per function. In a live trading loop that restarts on errors, those seconds mean missed signals. And unless a function is compiled with nogil=True, Numba holds the GIL during execution, blocking every other Python thread.
I needed:
- Zero startup latency
- GIL-free execution
- Bit-exact results (no behavioral changes)
- Single-file deployment (no LLVM runtime)
Rust + PyO3 checked every box.
A brief detour: the man on the mountain
Before we get to code, the history matters — because it explains why Ichimoku is designed the way it is.
Goichi Hosoda was a Japanese newspaper reporter who began developing a trading system in the 1930s. His pen name was Ichimoku Sanjin (一目山人) — literally "a glance from a man on a mountain." His goal: a single chart that shows support, resistance, trend, momentum, and future projections — all at one glance.
He enlisted teams of university students to manually compute and backtest the system across decades of Japanese stock and commodity data. No computers. Just pencils, paper, and price tables.
He published Ichimoku Kinko Hyo (一目均衡表 — "one-glance equilibrium chart") in 1968, after 30 years of development. The parameters 9, 26, 52 weren't arbitrary — they mapped to the Japanese trading calendar: 9 trading days (1.5 weeks), 26 days (1 month), 52 days (2 months).
The system remained almost exclusively Japanese until the internet era. Western traders discovered it in the 2000s and recognized its power: not just an indicator, but a complete trading framework.
The five classical components
| Component | Japanese | Formula | Purpose |
|---|---|---|---|
| Conversion Line | Tenkan-sen | (highest high + lowest low) / 2 over short period | Short-term equilibrium |
| Base Line | Kijun-sen | (highest high + lowest low) / 2 over medium period | Primary signal line |
| Leading Span A | Senkou Span A | (Tenkan + Kijun) / 2 | Front cloud edge |
| Leading Span B | Senkou Span B | (highest high + lowest low) / 2 over long period | Back cloud edge |
| Lagging Span | Chikou Span | Close shifted back N periods | Trend confirmation |
The area between Senkou Span A and B forms the cloud (kumo). Price above cloud = bullish. Below = bearish. Inside = transitioning. Cloud thickness = support/resistance strength.
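To make the table concrete, here is a minimal pure-Python sketch of the classical lines. The function names, the NaN warm-up convention, and the absence of any forward displacement of the spans are assumptions made for illustration; the published package may handle each of these differently.

```python
import math

def midpoint_line(high, low, period):
    """(highest high + lowest low) / 2 over a rolling window; NaN until the window fills."""
    out = [math.nan] * len(high)
    for i in range(period - 1, len(high)):
        window_high = max(high[i - period + 1 : i + 1])
        window_low = min(low[i - period + 1 : i + 1])
        out[i] = (window_high + window_low) / 2
    return out

def classic_components(high, low, short=9, medium=26, long=52):
    """Tenkan, Kijun, Senkou A, Senkou B, all aligned to the current bar (no displacement)."""
    tenkan = midpoint_line(high, low, short)
    kijun = midpoint_line(high, low, medium)
    # NaN in either input propagates into Senkou A during warm-up.
    senkou_a = [(t + k) / 2 for t, k in zip(tenkan, kijun)]
    senkou_b = midpoint_line(high, low, long)
    return tenkan, kijun, senkou_a, senkou_b

def cloud_position(price, span_a, span_b):
    """Classify a price against the cloud; assumes both spans are finite numbers."""
    top, bottom = max(span_a, span_b), min(span_a, span_b)
    if price > top:
        return "above"   # bullish
    if price < bottom:
        return "below"   # bearish
    return "inside"      # transitioning

high = [10.0, 11.0, 12.0, 13.0, 14.0]
low = [8.0, 9.0, 10.0, 11.0, 12.0]
print(midpoint_line(high, low, 3))  # [nan, nan, 10.0, 11.0, 12.0]
```

The midpoint only looks at the two extremes of each window, which is why every bar strictly between them has no influence on the line.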
The key innovation: Hull Moving Average
Classic Ichimoku uses (max + min) / 2, so a line only moves when a new extreme enters the window or an old one drops out of it. This creates stepped, laggy lines.
Alan Hull (2005) solved the fundamental lag-vs-smoothness tradeoff with an algebraic trick:
HMA(n) = WMA(sqrt(n), 2 * WMA(n/2) - WMA(n))
Why it works:
- WMA(n) (slow) lags a steady trend by ~n/3 bars
- WMA(n/2) (fast) lags by ~n/6 bars
- 2 * fast - slow extrapolates ahead, compensating the slow line's lag
- Final WMA(sqrt(n)) smoothing adds back only ~sqrt(n)/3 bars of lag
Result: ~50% lag reduction with smooth output.
I applied this to Ichimoku by replacing the midpoint calculation with Hull MA of (high + low) / 2. Same cloud structure, faster reaction, smoother boundaries.
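The formula and its application to the (high + low) / 2 series can be sketched in a few lines of pure Python. This is an illustrative reference, not the crate's code: the names wma, hma and hull_midline, the integer rounding of n/2 and sqrt(n), and the NaN warm-up handling are all assumptions.

```python
import math

def wma(values, period):
    """Linearly weighted moving average: the newest bar gets weight `period`, the oldest gets 1."""
    out = [math.nan] * len(values)
    denom = period * (period + 1) / 2
    for i in range(period - 1, len(values)):
        window = values[i - period + 1 : i + 1]
        # A NaN anywhere in the window makes the whole sum NaN, so warm-up propagates.
        out[i] = sum(w * v for w, v in enumerate(window, start=1)) / denom
    return out

def hma(values, period):
    """Hull MA: WMA over sqrt(n) bars of (2 * WMA(n/2) - WMA(n))."""
    half = max(period // 2, 1)               # assumed rounding: floor
    root = max(round(math.sqrt(period)), 1)  # assumed rounding: nearest integer
    fast = wma(values, half)
    slow = wma(values, period)
    raw = [2 * f - s for f, s in zip(fast, slow)]  # extrapolated, noisier series
    return wma(raw, root)                          # final smoothing pass

def hull_midline(high, low, period):
    """Hull-smoothed replacement for the classic midpoint line."""
    mid = [(h + l) / 2 for h, l in zip(high, low)]
    return hma(mid, period)

ramp = [float(i) for i in range(100)]
print(wma(ramp, 16)[50])  # 45.0
print(hma(ramp, 16)[50])
```

On this straight ramp WMA(16) trails the price by several bars, while the HMA output stays within one bar of it, which is the lag cancellation the bullet points describe.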
The Rust implementation
Architecture
Python layer
│
▼
advanced_ichimoku_cloud (Rust, PyO3)
├── hull.rs → wma, hullma (+ inner functions)
├── hull_signals.rs → trend, pullback, bounce detection
├── ichimoku.rs → classic Ichimoku
├── ichimoku_hull.rs → Hull-enhanced Ichimoku
└── indicators.rs → ema, atr
Key design: inner functions
Every computation exists as a plain fn (no PyO3 overhead). The #[pyfunction] wrappers just handle NumPy conversion and delegate:
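```rust
use numpy::{PyArray1, PyReadonlyArray1};
use pyo3::prelude::*;

// Used by ichimoku_hull.rs without FFI cost
pub(crate) fn hullma_inner(data: &[f64], period: usize) -> Vec<f64> {
    // Pure computation, no Python types (body omitted here)
}

// Thin wrapper: NumPy conversion in, delegation, NumPy conversion out
#[pyfunction]
fn hullma(py: Python<'_>, prices: PyReadonlyArray1<'_, f64>, period: usize) -> Py<PyArray1<f64>> {
    // Borrows the NumPy buffer without copying; panics if the array is not contiguous
    let slice = prices.as_slice().unwrap();
    let result = hullma_inner(slice, period);
    PyArray1::from_vec(py, result).into()
}
```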
This enables cross-module reuse: ichimoku_hull.rs calls hull::hullma_inner() directly, with zero FFI overhead.
Zero-copy I/O
- Input: as_slice().unwrap() borrows the NumPy buffer directly, with no copying and no allocation (the array must be contiguous, otherwise the unwrap panics)
- Output: PyArray1::from_vec allocates once in Rust, then transfers ownership to Python
GIL release
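PyO3 does not release the GIL on its own: a #[pyfunction] runs with the GIL held unless the wrapper hands the computation to py.allow_threads(...) (renamed py.detach(...) in PyO3 0.26). Because the inner functions touch no Python objects, they can run inside that closure, and other Python threads (WebSocket handlers, order management) then run freely while indicators compute.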
Proving parity: 25+ assertions at 1e-12 tolerance
The test suite implements every function in pure Python, generates identical random data (seed=42, N=200), and asserts:
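```python
np.testing.assert_allclose(rust_result, python_result, rtol=0, atol=1e-12)
```

All 11 functions. All edge cases (NaN propagation, initial positions, backfill behavior). If Rust disagrees with Python by more than 1e-12, the test fails. The rtol=0 matters: assert_allclose defaults to rtol=1e-7, which would otherwise swamp the 1e-12 absolute bound for prices of this size.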
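The shape of such a parity check can be sketched without NumPy. Everything here is hypothetical scaffolding: assert_parity is an illustrative helper, and because the Rust extension cannot be imported in a standalone snippet, two WMA variants that differ only in summation order stand in for the "Rust" and "Python" sides.

```python
import math
import random

def assert_parity(actual, expected, atol=1e-12):
    """Fail unless both sequences agree within `atol` and have NaNs at the same indices."""
    if len(actual) != len(expected):
        raise AssertionError(f"length mismatch: {len(actual)} vs {len(expected)}")
    for i, (a, e) in enumerate(zip(actual, expected)):
        if math.isnan(a) or math.isnan(e):
            if not (math.isnan(a) and math.isnan(e)):
                raise AssertionError(f"NaN mismatch at index {i}: {a!r} vs {e!r}")
        elif abs(a - e) > atol:
            raise AssertionError(f"index {i}: {a!r} vs {e!r}")

def wma_oldest_first(values, period):
    """Stand-in implementation A: accumulates from the oldest bar to the newest."""
    out = [math.nan] * len(values)
    denom = period * (period + 1) / 2
    for i in range(period - 1, len(values)):
        total = 0.0
        for weight in range(1, period + 1):
            total += weight * values[i - period + weight]
        out[i] = total / denom
    return out

def wma_newest_first(values, period):
    """Stand-in implementation B: same math, opposite summation order."""
    out = [math.nan] * len(values)
    denom = period * (period + 1) / 2
    for i in range(period - 1, len(values)):
        total = 0.0
        for weight in range(period, 0, -1):
            total += weight * values[i - period + weight]
        out[i] = total / denom
    return out

rng = random.Random(42)  # fixed seed so both sides see identical data
prices = [50 + 100 * rng.random() for _ in range(200)]
assert_parity(wma_newest_first(prices, 9), wma_oldest_first(prices, 9))
```

Checking NaN positions separately from values is what lets a test like this catch warm-up and propagation differences that a plain numeric comparison would hide.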
============================================================
Parity Tests: advanced-ichimoku-cloud
============================================================
PASS wma
PASS hullma
PASS hullma_trend
PASS hullma_pullback
PASS hullma_bounce
PASS ichimoku_line
PASS ichimoku_components
PASS ichimoku_line_hull
PASS ichimoku_components_hull
PASS ema
PASS atr
============================================================
ALL 11 FUNCTIONS PASS PARITY TESTS
============================================================
Before and after
| Dimension | Python + Numba | Rust + PyO3 |
|---|---|---|
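| First-call latency | 2-5s JIT warmup | None (compiled ahead of time) |
| GIL | Held during execution unless nogil=True | Released around the computation |
| Memory safety | No bounds checks by default in compiled code | Ownership checked at compile time, indexing bounds-checked at run time |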
| Dependency weight | ~150 MB (numba + llvmlite) | ~2 MB single .so |
| Reproducibility | JIT varies across LLVM versions | Deterministic binary |
Try it
```shell
pip install advanced-ichimoku-cloud
```

```python
import numpy as np
from advanced_ichimoku_cloud import (
    ichimoku_components,       # classic cloud
    ichimoku_components_hull,  # Hull-enhanced cloud
    hullma, wma, ema, atr,     # individual indicators
)

high = np.random.rand(200) * 100 + 50
low = high - np.random.rand(200) * 5
tenkan, kijun, senkou_a, senkou_b = ichimoku_components(high, low, 9, 26, 52)
```
GitHub: https://github.com/RMANOV/advanced-ichimoku-cloud
What I learned
- rust-numpy's as_slice() is the killer feature: zero-copy NumPy access makes Rust competitive even for small arrays
- Inner function pattern is essential: without it, cross-module reuse requires double FFI
- Bit-exact parity testing catches subtle issues (NaN propagation order, integer division rounding) that benchmarks miss
- The history of your domain matters — understanding why Hosoda chose those parameters helped me design better enhanced variants
Built with Rust, PyO3 0.27, and a deep appreciation for a journalist who spent 30 years perfecting a chart.