DEV Community

Ruslan Manov

Rust + PyO3 Enhanced Ichimoku Cloud with Hull MA Smoothing

Why I rewrote 11 trading indicators from Python to Rust (and got bit-exact parity)

A Japanese newspaper reporter spent 30 years perfecting a trading system by hand. I rewrote it in Rust. Here's the full story — the history, the math, and the engineering.


The problem: Numba's cold start kills live trading

My Python trading system relied on Numba-JIT compiled Ichimoku Cloud calculations. Numba is excellent — until your process restarts.

Every cold start costs 2-5 seconds of JIT compilation per function. In a live trading loop that restarts on errors, those seconds mean missed signals. Numba also holds the GIL during execution unless a function is compiled with nogil=True, which blocks every other Python thread.

I needed:

  • Zero startup latency
  • GIL-free execution
  • Bit-exact results (no behavioral changes)
  • Single-file deployment (no LLVM runtime)

Rust + PyO3 checked every box.


A brief detour: the man on the mountain

Before we get to code, the history matters — because it explains why Ichimoku is designed the way it is.

Goichi Hosoda was a Japanese newspaper reporter who began developing a trading system in the 1930s. His pen name was Ichimoku Sanjin (一目山人) — literally "a glance from a man on a mountain." His goal: a single chart that shows support, resistance, trend, momentum, and future projections — all at one glance.

He enlisted teams of university students to manually compute and backtest the system across decades of Japanese stock and commodity data. No computers. Just pencils, paper, and price tables.

He published Ichimoku Kinko Hyo (一目均衡表 — "one-glance equilibrium chart") in 1968, after 30 years of development. The parameters 9, 26, 52 weren't arbitrary — they mapped to the Japanese trading calendar: 9 trading days (1.5 weeks), 26 days (1 month), 52 days (2 months).

The system remained almost exclusively Japanese until the internet era. Western traders discovered it in the 2000s and recognized its power: not just an indicator, but a complete trading framework.


The five classical components

| Component | Japanese | Formula | Purpose |
|---|---|---|---|
| Conversion Line | Tenkan-sen | (highest high + lowest low) / 2 over short period | Short-term equilibrium |
| Base Line | Kijun-sen | Same formula, medium period | Primary signal line |
| Leading Span A | Senkou Span A | (Tenkan + Kijun) / 2 | Front cloud edge |
| Leading Span B | Senkou Span B | Same formula, long period | Back cloud edge |
| Lagging Span | Chikou Span | Close shifted back N periods | Trend confirmation |
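
The midpoint formula in the table can be sketched in pure Python as follows. This is an illustrative reference version, not the library's code: the names `midpoint_line` and `classic_lines` are hypothetical, and leaving the warm-up positions as NaN is an assumption (the package may backfill them instead). The forward shift of the spans and the Chikou Span are left out for brevity.

```python
import math


def midpoint_line(high, low, period):
    """(highest high + lowest low) / 2 over a rolling window of `period` bars."""
    out = [math.nan] * len(high)  # assumption: warm-up bars stay NaN
    for i in range(period - 1, len(high)):
        start = i - period + 1
        out[i] = (max(high[start:i + 1]) + min(low[start:i + 1])) / 2
    return out


def classic_lines(high, low, short=9, medium=26, long=52):
    """Return (tenkan, kijun, senkou_a, senkou_b), all unshifted."""
    tenkan = midpoint_line(high, low, short)
    kijun = midpoint_line(high, low, medium)
    senkou_a = [(t + k) / 2 for t, k in zip(tenkan, kijun)]
    senkou_b = midpoint_line(high, low, long)
    return tenkan, kijun, senkou_a, senkou_b


high = [10.0, 12.0, 11.0, 15.0, 14.0]
low = [8.0, 9.0, 10.0, 11.0, 12.0]
print(midpoint_line(high, low, 3))  # [nan, nan, 10.0, 12.0, 12.5]
```

Note how the line only moves when a new extreme enters or leaves the window, which is where the stepped look of the classic lines comes from.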

The area between Senkou Span A and B forms the cloud (kumo). Price above cloud = bullish. Below = bearish. Inside = transitioning. Cloud thickness = support/resistance strength.
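
These reading rules reduce to a comparison against the two cloud edges. A minimal sketch, using the hypothetical helper names `cloud_state` and `cloud_thickness` and assuming that a price exactly on an edge counts as inside the cloud:

```python
def cloud_state(price, span_a, span_b):
    """Classify price relative to the cloud bounded by Senkou Span A and B."""
    top, bottom = max(span_a, span_b), min(span_a, span_b)
    if price > top:
        return "bullish"
    if price < bottom:
        return "bearish"
    return "transitioning"  # inside the cloud, edges included (assumption)


def cloud_thickness(span_a, span_b):
    """Distance between the edges; thicker suggests stronger support/resistance."""
    return abs(span_a - span_b)


print(cloud_state(105.0, 100.0, 98.0))  # bullish
print(cloud_thickness(100.0, 98.0))     # 2.0
```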


The key innovation: Hull Moving Average

Classic Ichimoku uses (max + min) / 2 — it only reacts when a new extreme appears in the window. This creates stepped, laggy lines.

Alan Hull (2005) solved the fundamental lag-vs-smoothness tradeoff with an algebraic trick:

HMA(n) = WMA(sqrt(n),  2 * WMA(n/2) - WMA(n))

Why it works:

  1. WMA(n) (slow) lags a steady trend by about n/3 bars
  2. WMA(n/2) (fast) lags by about n/6 bars, half as much
  3. 2 * fast - slow extrapolates ahead by the gap between the two, cancelling most of the slow line's lag
  4. Final WMA(sqrt(n)) smoothing adds back only about sqrt(n)/3 bars of lag

Result: ~50% lag reduction with smooth output.

I applied this to Ichimoku by replacing the midpoint calculation with Hull MA of (high + low) / 2. Same cloud structure, faster reaction, smoother boundaries.
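
A pure-Python sketch of the Hull formula is a handy reference when checking a faster implementation. The rounding conventions here are assumptions (`n // 2` for the half period and `int(sqrt(n))` for the final window; implementations differ on this), as are the NaN warm-up and the helper name `hull_midpoint_line`. The ramp at the end measures lag directly: on a straight line rising one unit per bar, the lag in bars is simply the gap between input and output.

```python
import math


def wma(data, period):
    """Linearly weighted MA: the newest bar gets weight `period`, the oldest 1."""
    denom = period * (period + 1) / 2
    out = [math.nan] * len(data)  # assumption: warm-up bars stay NaN
    for i in range(period - 1, len(data)):
        window = data[i - period + 1:i + 1]
        out[i] = sum(w * x for w, x in zip(range(1, period + 1), window)) / denom
    return out


def hma(data, period):
    """HMA(n) = WMA(sqrt(n)) applied to 2 * WMA(n/2) - WMA(n); needs period >= 2."""
    fast = wma(data, period // 2)
    slow = wma(data, period)
    diff = [2 * f - s for f, s in zip(fast, slow)]  # NaN carries through warm-up
    return wma(diff, int(math.sqrt(period)))


def hull_midpoint_line(high, low, period):
    """Hull-smoothed stand-in for the classic (max + min) / 2 line."""
    mid = [(h + l) / 2 for h, l in zip(high, low)]
    return hma(mid, period)


ramp = [float(t) for t in range(100)]
print(ramp[-1] - wma(ramp, 16)[-1])  # lag of WMA(16) in bars
print(ramp[-1] - hma(ramp, 16)[-1])  # lag of HMA(16) in bars
```

On this idealized ramp WMA(16) trails the input by (16 - 1) / 3 = 5 bars, while HMA(16) trails by roughly two thirds of a bar. Real price series are noisier, so the gain there is smaller.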


The Rust implementation

Architecture

Python layer
    │
    ▼
advanced_ichimoku_cloud (Rust, PyO3)
    ├── hull.rs          → wma, hullma (+ inner functions)
    ├── hull_signals.rs  → trend, pullback, bounce detection
    ├── ichimoku.rs      → classic Ichimoku
    ├── ichimoku_hull.rs → Hull-enhanced Ichimoku
    └── indicators.rs    → ema, atr

Key design: inner functions

Every computation exists as a plain fn (no PyO3 overhead). The #[pyfunction] wrappers just handle NumPy conversion and delegate:

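use numpy::{PyArray1, PyReadonlyArray1};
use pyo3::prelude::*;

// Used by ichimoku_hull.rs without FFI cost
pub(crate) fn hullma_inner(data: &[f64], period: usize) -> Vec<f64> {
    // Pure computation on plain slices, no Python types (body omitted here)
    todo!()
}

#[pyfunction]
fn hullma(py: Python, prices: PyReadonlyArray1<f64>, period: usize) -> Py<PyArray1<f64>> {
    // Borrow the NumPy buffer as a slice; this panics if the array is not contiguous
    let slice = prices.as_slice().unwrap();
    let result = hullma_inner(slice, period);
    // Hand the Vec's buffer over to a new NumPy array
    PyArray1::from_vec(py, result).into()
}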

This enables cross-module reuse: ichimoku_hull.rs calls hull::hullma_inner() directly, with zero FFI overhead.

Zero-copy I/O

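  • Input: as_slice() borrows the NumPy buffer directly, with no copying and no allocation. It returns an error for non-contiguous arrays (strided views, for example), so the unwrap() panics on those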
  • Output: PyArray1::from_vec allocates once in Rust, transfers ownership to Python

GIL release

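PyO3 does not release the GIL on its own: a #[pyfunction] holds it for the whole call unless the computation is wrapped in py.allow_threads(...) (renamed Python::detach in recent PyO3 releases). Because the inner functions take plain slices and touch no Python objects, they are safe to run inside that closure, and other Python threads (WebSocket handlers, order management) then run freely while indicators compute.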


Proving parity: 25+ assertions at 1e-12 tolerance

The test suite implements every function in pure Python, generates identical random data (seed=42, N=200), and asserts:

# rtol defaults to 1e-7; set it to 0 so atol alone bounds the error
np.testing.assert_allclose(rust_result, python_result, rtol=0, atol=1e-12)

All 11 functions. All edge cases (NaN propagation, initial positions, backfill behavior). If Rust disagrees with Python by more than 1e-12, the test fails.
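
What such a parity assertion checks can be spelled out in a few lines of standard-library Python. This sketch is not the project's test code, and `assert_parity` is a hypothetical helper. It requires NaNs to sit at the same positions in both outputs and every other pair of values to differ by at most the absolute tolerance, with no relative term.

```python
import math


def assert_parity(actual, expected, atol=1e-12):
    """Fail unless both sequences agree elementwise within `atol`, NaNs included."""
    assert len(actual) == len(expected), "length mismatch"
    for i, (a, e) in enumerate(zip(actual, expected)):
        if math.isnan(a) or math.isnan(e):
            # A NaN must appear at the same index in both outputs
            assert math.isnan(a) and math.isnan(e), f"NaN mismatch at index {i}"
        else:
            assert abs(a - e) <= atol, f"index {i}: |{a} - {e}| > {atol}"


# Passes: NaN positions match and the only difference is about 1e-13
assert_parity([math.nan, 1.0, 2.0], [math.nan, 1.0, 2.0 + 1e-13])
```

Checking NaN positions separately matters for indicators, because a warm-up region that is one bar too long or too short is a real behavioral difference even when every finite value matches.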

============================================================
  Parity Tests: advanced-ichimoku-cloud
============================================================
  PASS  wma
  PASS  hullma
  PASS  hullma_trend
  PASS  hullma_pullback
  PASS  hullma_bounce
  PASS  ichimoku_line
  PASS  ichimoku_components
  PASS  ichimoku_line_hull
  PASS  ichimoku_components_hull
  PASS  ema
  PASS  atr
============================================================
  ALL 11 FUNCTIONS PASS PARITY TESTS
============================================================

Before and after

| Dimension | Python + Numba | Rust + PyO3 |
|---|---|---|
| First-call latency | 2-5s JIT warmup | Zero |
| GIL | Held during execution (unless nogil=True) | Released inside py.allow_threads |
| Memory safety | Runtime bounds checks | Compile-time guarantees |
| Dependency weight | ~150 MB (numba + llvmlite) | ~2 MB single .so |
| Reproducibility | JIT varies across LLVM versions | Deterministic binary |

Try it

pip install advanced-ichimoku-cloud

Then, from Python:

from advanced_ichimoku_cloud import (
    ichimoku_components,       # classic cloud
    ichimoku_components_hull,  # Hull-enhanced cloud
    hullma, wma, ema, atr,    # individual indicators
)

import numpy as np
high = np.random.rand(200) * 100 + 50
low = high - np.random.rand(200) * 5

tenkan, kijun, senkou_a, senkou_b = ichimoku_components(high, low, 9, 26, 52)

GitHub: https://github.com/RMANOV/advanced-ichimoku-cloud


What I learned

  1. rust-numpy's as_slice() is the killer feature: zero-copy access to contiguous NumPy arrays makes Rust competitive even for small arrays
  2. Inner function pattern is essential — without it, cross-module reuse requires double FFI
  3. Bit-exact parity testing catches subtle issues (NaN propagation order, integer division rounding) that benchmarks miss
  4. The history of your domain matters — understanding why Hosoda chose those parameters helped me design better enhanced variants

Built with Rust, PyO3 0.27, and a deep appreciation for a journalist who spent 30 years perfecting a chart.
