<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alex</title>
    <description>The latest articles on DEV Community by Alex (@ab_as_62eafcb6a433008952b).</description>
    <link>https://dev.to/ab_as_62eafcb6a433008952b</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4010844%2F77a8c5ad-5c81-4cc1-9dc9-c779f0a67f0d.png</url>
      <title>DEV Community: Alex</title>
      <link>https://dev.to/ab_as_62eafcb6a433008952b</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ab_as_62eafcb6a433008952b"/>
    <language>en</language>
    <item>
      <title>From Python/Pandas to Rust/C++: taking our tick simulation from 140ms to microseconds per window</title>
      <dc:creator>Alex</dc:creator>
      <pubDate>Wed, 01 Jul 2026 11:51:26 +0000</pubDate>
      <link>https://dev.to/ab_as_62eafcb6a433008952b/from-pythonpandas-to-rustc-taking-our-tick-simulation-from-140ms-to-microseconds-per-window-4e8f</link>
      <guid>https://dev.to/ab_as_62eafcb6a433008952b/from-pythonpandas-to-rustc-taking-our-tick-simulation-from-140ms-to-microseconds-per-window-4e8f</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;We're a small ML lab building alpha models for a handful of partners. Our market &lt;strong&gt;simulation&lt;/strong&gt; loop, the part that keeps you honest about look-ahead bias, took 900–1300 ms per window in Python/pandas, which made every experiment a 6–20 hour run. We went &lt;strong&gt;pandas → numpy → hand-written Rust features + C++ models&lt;/strong&gt;: numpy got us to ~140 ms per window, and the Rust/C++ rewrite landed at 1–5 ms per window on a cheap cloud box (4–40 µs on a high-clock CPU). This is the honest engineering story, with an actual question at the end for anyone who does HFT/MM.&lt;/p&gt;

&lt;p&gt;Not a pitch — I'll explain why at the bottom.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem: simulation, not latency
&lt;/h2&gt;

&lt;p&gt;Our whole training stack is Python: feature engineering → targets → training → backtests → and the one that actually matters, &lt;strong&gt;simulation&lt;/strong&gt; (strict, no look-ahead).&lt;/p&gt;

&lt;p&gt;Simulation is brutal on compute. On 1m/5m bars over years of history, a single run on a normal workstation took &lt;strong&gt;6–20 hours&lt;/strong&gt;. For each window we compute several hundred features, then run inference. The full cycle for one window (data → features → inference) took &lt;strong&gt;900–1300 ms&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We never cared about that latency for trading. We cared because &lt;strong&gt;every experiment took a day&lt;/strong&gt;, and I had a backlog of hypotheses to test.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: pandas → numpy
&lt;/h2&gt;

&lt;p&gt;Being Python people, the first move was obvious: rip pandas out of the hot path and go numpy. It was a real win: from 900–1300 ms down to &lt;strong&gt;~140 ms/window&lt;/strong&gt;. We could finally evaluate models across more angles.&lt;/p&gt;

&lt;p&gt;But rolling-window recomputation and allocation churn were still the ceiling, and 140 ms only let me run the basic experiments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: accepting the language was the wall
&lt;/h2&gt;

&lt;p&gt;My friend has written Rust for years and never shut up about it: &lt;em&gt;"your Python is nonsense, rewrite it in Rust."&lt;/em&gt; We argued for years about whether Rust is always worth it.&lt;/p&gt;

&lt;p&gt;This time I got it: no matter what CPU I throw at it, the GIL and Python's overhead cap me. There was no way up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Rust + C++
&lt;/h2&gt;

&lt;p&gt;The rewrite was neither quick nor easy: we rewrote &lt;strong&gt;every feature in Rust&lt;/strong&gt;, with &lt;strong&gt;O(1) incremental state per tick&lt;/strong&gt; instead of recomputing rolling windows. That single change killed both the allocation churn and the latency variance. Then we converted the &lt;strong&gt;models to a C++ engine AOT-compiled for the target CPU&lt;/strong&gt;, called over FFI.&lt;/p&gt;
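
&lt;p&gt;To make the O(1) incremental-state idea concrete, here is a minimal sketch in Python (for readability; our production features are Rust). It is not our actual feature code: the &lt;code&gt;RollingMean&lt;/code&gt; class and its names are hypothetical, and a rolling mean stands in for the several hundred real features. Each tick adds the new value to a running sum and evicts the oldest one, so nothing is re-summed or re-allocated per window.&lt;/p&gt;

```python
from collections import deque


class RollingMean:
    """Mean of the last `window` values, updated in O(1) per tick.

    Hypothetical illustration only, not the production feature code.
    """

    def __init__(self, window: int) -> None:
        if not window >= 1:
            raise ValueError("window must be at least 1")
        self.window = window
        self.values = deque()  # values currently inside the window
        self.total = 0.0       # running sum of those values

    def update(self, x: float) -> float:
        # Once the window is full, evict the oldest value from the running
        # sum instead of re-summing the whole window.
        if len(self.values) == self.window:
            self.total -= self.values.popleft()
        self.values.append(x)
        self.total += x
        return self.total / len(self.values)


# Made-up prices, window of 3 ticks.
feature = RollingMean(3)
means = [feature.update(p) for p in [10.0, 11.0, 12.0, 13.0]]
print(means)  # [10.0, 10.5, 11.0, 12.0]
```

&lt;p&gt;One caveat with this pattern: a running sum accumulates floating-point error over long streams, so a real implementation would likely want compensated summation or a periodic full recompute.&lt;/p&gt;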

&lt;p&gt;Results, full cycle, one window:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Latency/window&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Python / pandas&lt;/td&gt;
&lt;td&gt;900–1300 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python / numpy&lt;/td&gt;
&lt;td&gt;~140 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rust + C++, cheap cloud box (vCPU)&lt;/td&gt;
&lt;td&gt;1–5 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rust + C++, high-clock AMD test rig&lt;/td&gt;
&lt;td&gt;4–40 µs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Simulations that took hours now take minutes. The memory-leak whack-a-mole is gone.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part I didn't expect
&lt;/h2&gt;

&lt;p&gt;The interesting outcome wasn't prod speed; it was the experiments this &lt;strong&gt;unlocked&lt;/strong&gt;. We can now run real &lt;strong&gt;tick-level simulation&lt;/strong&gt; (not a backtest) to test ideas we simply couldn't touch before, including some inspired by Michael Levin's work (bioelectric / collective-behavior stuff that turns out to be useful well beyond biology). In Python that was infeasible; in Rust it's basically bounded only by infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verify it yourself (no cherry-picked CSVs)
&lt;/h2&gt;

&lt;p&gt;We stream raw live signals to a public board. Every signal is written to public S3 at generation time, immutable, with a microsecond timestamp — so you can confirm there's &lt;strong&gt;no look-ahead&lt;/strong&gt;: &lt;code&gt;signal_gen_time &amp;gt; bar_time&lt;/code&gt;, for every single one. The demo box also reports its real inference latency (you'll see ms, not µs — cheap silicon, honest number).&lt;/p&gt;
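
&lt;p&gt;If you want to run that check yourself, a sketch along these lines should do. The record layout is an assumption for illustration (a list of dicts with timezone-aware timestamps), not the actual S3 schema, and the function name and sample rows are made up; only the field names &lt;code&gt;signal_gen_time&lt;/code&gt; and &lt;code&gt;bar_time&lt;/code&gt; come from the rule above.&lt;/p&gt;

```python
from datetime import datetime, timezone


def find_lookahead_violations(signals):
    """Return every signal that was NOT generated strictly after its bar time.

    Assumes each signal is a dict holding two timezone-aware datetimes,
    "bar_time" and "signal_gen_time" (a hypothetical layout).
    """
    return [s for s in signals if not s["signal_gen_time"] > s["bar_time"]]


# Made-up sample rows: the first is fine, the second would be a violation.
signals = [
    {
        "bar_time": datetime(2026, 7, 1, 11, 50, tzinfo=timezone.utc),
        "signal_gen_time": datetime(2026, 7, 1, 11, 50, 0, 1500, tzinfo=timezone.utc),
    },
    {
        "bar_time": datetime(2026, 7, 1, 11, 51, tzinfo=timezone.utc),
        "signal_gen_time": datetime(2026, 7, 1, 11, 50, 59, tzinfo=timezone.utc),
    },
]

violations = find_lookahead_violations(signals)
print(len(violations))  # 1
```

&lt;p&gt;An empty result over the whole archive is what "no look-ahead" should look like; any row returned is worth flagging.&lt;/p&gt;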

&lt;h2&gt;
  
  
  Where we're NOT flexing
&lt;/h2&gt;

&lt;p&gt;We have real data-feed latency and &lt;strong&gt;zero colocation / kernel-bypass / exchange adjacency&lt;/strong&gt;. This is fast &lt;em&gt;compute&lt;/em&gt;, not a colocated HFT desk. Not pretending otherwise.&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest question
&lt;/h2&gt;

&lt;p&gt;If anyone here actually runs &lt;strong&gt;HFT / market-making&lt;/strong&gt; in production: given fast compute but no colo (real feed latency), is any of this usable in prod? Our only idea so far is &lt;strong&gt;adverse-selection defense for market-making&lt;/strong&gt; — skew/pull quotes ahead of a microstructure move. We might be completely wrong. I'd love a reality check from someone who's actually done it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this isn't an ad
&lt;/h2&gt;

&lt;p&gt;We don't sell to retail, and I doubt there are buyers for this among readers here. I'm writing it because this community appreciates a real Rust-rewrite story and will tear bad engineering apart — which is exactly what I want.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rust is cool. That's the post.
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Links&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Live board (raw signals + real inference latency): &lt;a href="https://livefinai.synlabs.pro/" rel="noopener noreferrer"&gt;https://livefinai.synlabs.pro/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>rust</category>
      <category>performance</category>
      <category>showdev</category>
      <category>python</category>
    </item>
  </channel>
</rss>
