Blockchain Rust Engineer

Posted on Jun 23

Building a Low-Latency Trading Bot in Rust (From 48ms to 800µs)

#rust #webassembly #trading #beginners

How I rewrote my Polymarket trading bot in Rust and got a 70x latency improvement. Real benchmarks, full architecture, and every pitfall I hit along the way.

My Python trading bot had a 48ms decision loop. After rewriting it in Rust: 800 microseconds.

That's not a typo. Going by the medians in the benchmarks below (48 ms against roughly 654 µs), that's about a 70x improvement - and in a prediction market like Polymarket, that gap is the difference between filling an order at a good price and arriving after everyone else has already moved the book.

This post walks through exactly how I built it: architecture decisions, async I/O, order book design, WebSocket reconnection, and the mistakes I made so you don't have to.

Why Rust? (The Real Answer)

I know, I know. "Why not Go?" I tried Go first. It was fine. But two things kept bothering me:

GC pauses. Go's garbage collector is good, but "good" still means occasional multi-millisecond pauses. In a trading context that's unacceptable.
interface{} type erasure. Writing a generic order book without proper generics felt like fighting the language.

Rust gives you:

✅ No garbage collector - memory is released deterministically, when its owner goes out of scope
✅ async/await with Tokio - genuinely great async runtime
✅ The borrow checker catches data races at compile time
✅ Zero-cost abstractions - generics are monomorphized at compile time, so they cost nothing extra at runtime compared with hand-specialized code

The learning curve is real. The borrow checker will fight you. But for a latency-sensitive system, it pays off.

Architecture in One Diagram

┌──────────────────────────────────────────────────┐
│                  Trading Bot                      │
│                                                   │
│  ┌─────────────┐    ┌──────────────────────────┐ │
│  │ WebSocket   │───▶│  Market Data Handler     │ │
│  │ Feed        │    │  (order book updates)    │ │
│  └─────────────┘    └────────────┬─────────────┘ │
│                                  │                │
│                                  ▼                │
│  ┌─────────────┐    ┌──────────────────────────┐ │
│  │ REST API    │◀───│  Strategy Engine         │ │
│  │ Client      │    │  (signal generation)     │ │
│  └─────────────┘    └────────────┬─────────────┘ │
│                                  │                │
│                                  ▼                │
│                     ┌──────────────────────────┐ │
│                     │  Risk Manager            │ │
│                     └──────────────────────────┘ │
└──────────────────────────────────────────────────┘

Each component runs as an async Tokio task. They communicate via mpsc channels - not shared mutable state. No locks on the hot path.
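
To make that wiring concrete, here is a minimal sketch of the same shape. It swaps Tokio tasks for std threads and tokio's mpsc for std::sync::mpsc::sync_channel, purely so it runs with no dependencies; BookEvent, Signal, run_pipeline, and the integer tick prices are hypothetical names for illustration, not the bot's real types.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical event emitted by the feed stage (price in integer ticks).
#[derive(Debug)]
pub struct BookEvent {
    pub mid_ticks: u64,
}

/// Hypothetical signal emitted by the strategy stage.
#[derive(Debug, PartialEq)]
pub enum Signal {
    Buy { price_ticks: u64 },
}

/// Runs feed -> strategy -> collector. Each stage owns its own state and
/// values move through bounded channels, so no stage shares mutable state.
pub fn run_pipeline(mids: Vec<u64>, buy_below: u64) -> Vec<Signal> {
    let (event_tx, event_rx) = sync_channel::<BookEvent>(64);
    let (signal_tx, signal_rx) = sync_channel::<Signal>(64);

    // Feed stage: stands in for the WebSocket task.
    let feed = thread::spawn(move || {
        for mid_ticks in mids {
            // send() blocks while the channel is full: that is the backpressure.
            if event_tx.send(BookEvent { mid_ticks }).is_err() {
                break; // receiver dropped
            }
        }
        // event_tx is dropped here, which ends the strategy loop below.
    });

    // Strategy stage: decision logic only, no I/O.
    let strategy = thread::spawn(move || {
        for event in event_rx {
            if event.mid_ticks < buy_below {
                let signal = Signal::Buy { price_ticks: event.mid_ticks };
                if signal_tx.send(signal).is_err() {
                    break; // collector dropped
                }
            }
        }
    });

    // Collector: stands in for the risk manager and order client.
    let signals: Vec<Signal> = signal_rx.into_iter().collect();
    feed.join().expect("feed thread panicked");
    strategy.join().expect("strategy thread panicked");
    signals
}
```

Calling run_pipeline(vec![50, 40, 60, 30], 45) yields buy signals for 40 and 30, in that order. The Tokio version has the same shape, with tokio::spawn in place of thread::spawn and an awaited send in place of the blocking one.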

Project Setup

# Cargo.toml
[dependencies]
tokio = { version = "1", features = ["full"] }
tokio-tungstenite = { version = "0.21", features = ["native-tls"] }
futures-util = "0.3"
reqwest = { version = "0.12", features = ["json"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
rust_decimal = "1"
tracing = "0.1"
tracing-subscriber = "0.3"
anyhow = "1"

# This matters more than most people think
[profile.release]
opt-level = 3
lto = true
codegen-units = 1
panic = "abort"

That [profile.release] block is doing real work:

lto = true - link-time optimization, LLVM sees your whole codebase
codegen-units = 1 - slower compile, faster binary
panic = "abort" - removes the unwinding machinery; the trade-off is that any panic, including one inside a spawned task, terminates the whole process

The Order Book

Never use f64 for prices. Floating-point rounding errors silently corrupt your PnL over time. Always use rust_decimal.

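use rust_decimal::Decimal;

/// Which side of the book an update applies to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Side {
    Bid,
    Ask,
}
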
#[derive(Debug, Clone)]
pub struct PriceLevel {
    pub price: Decimal,
    pub size: Decimal,
}

#[derive(Debug, Default)]
pub struct OrderBook {
    pub bids: Vec<PriceLevel>, // sorted descending
    pub asks: Vec<PriceLevel>, // sorted ascending
}

impl OrderBook {
    pub fn best_bid(&self) -> Option<&PriceLevel> {
        self.bids.first()
    }

    pub fn best_ask(&self) -> Option<&PriceLevel> {
        self.asks.first()
    }

    pub fn mid_price(&self) -> Option<Decimal> {
        let bid = self.best_bid()?.price;
        let ask = self.best_ask()?.price;
        Some((bid + ask) / Decimal::from(2))
    }

    pub fn spread(&self) -> Option<Decimal> {
        Some(self.best_ask()?.price - self.best_bid()?.price)
    }

    /// Apply a delta update - never rebuild from scratch
    pub fn apply_update(&mut self, side: Side, price: Decimal, size: Decimal) {
        let levels = match side {
            Side::Bid => &mut self.bids,
            Side::Ask => &mut self.asks,
        };

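        if size.is_zero() {
            // A size of zero means the level was removed.
            levels.retain(|l| l.price != price);
        } else if let Some(level) = levels.iter_mut().find(|l| l.price == price) {
            // Existing level: overwrite the size; the ordering is unchanged.
            level.size = size;
        } else {
            // New level: append, then restore the ordering for this side.
            levels.push(PriceLevel { price, size });
            match side {
                Side::Bid => levels.sort_by(|a, b| b.price.cmp(&a.price)),
                Side::Ask => levels.sort_by(|a, b| a.price.cmp(&b.price)),
            }
        }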
    }
}

Key design decision: delta updates, not full snapshots. Rebuilding the entire book on every WebSocket message means reallocating and re-sorting every level each time, which adds up quickly on a busy market. Apply diffs instead.
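
If the linear scan and re-sort ever show up in a profile, one alternative is to key the levels by price in a BTreeMap, which keeps them ordered and makes each delta O(log n). The sketch below is an assumption-laden variant, not the book used above: TickBook is a hypothetical name, and it stores prices and sizes as integer ticks (say 1 tick = 0.01) instead of Decimal so it has no dependencies.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Side {
    Bid,
    Ask,
}

/// Hypothetical book keyed by price in integer ticks (e.g. 1 tick = 0.01).
#[derive(Debug, Default)]
pub struct TickBook {
    bids: BTreeMap<u64, u64>, // price ticks -> size
    asks: BTreeMap<u64, u64>, // price ticks -> size
}

impl TickBook {
    /// Apply one delta: size 0 removes the level, anything else upserts it.
    pub fn apply_update(&mut self, side: Side, price: u64, size: u64) {
        let levels = match side {
            Side::Bid => &mut self.bids,
            Side::Ask => &mut self.asks,
        };
        if size == 0 {
            levels.remove(&price);
        } else {
            levels.insert(price, size);
        }
    }

    /// Highest bid: the last key in ascending order.
    pub fn best_bid(&self) -> Option<(u64, u64)> {
        self.bids.iter().next_back().map(|(p, s)| (*p, *s))
    }

    /// Lowest ask: the first key in ascending order.
    pub fn best_ask(&self) -> Option<(u64, u64)> {
        self.asks.iter().next().map(|(p, s)| (*p, *s))
    }

    /// Spread in ticks; None if either side is empty or the book is crossed.
    pub fn spread(&self) -> Option<u64> {
        let (bid, _) = self.best_bid()?;
        let (ask, _) = self.best_ask()?;
        ask.checked_sub(bid)
    }
}
```

For the handful of levels a typical prediction market shows, the sorted Vec may well be just as fast because it is contiguous in memory, so this is worth measuring before switching.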

WebSocket Feed Handler

use tokio::sync::mpsc;
use tokio_tungstenite::{connect_async, tungstenite::Message};
use futures_util::{SinkExt, StreamExt};

#[derive(Debug, serde::Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum FeedEvent {
    BookUpdate {
        market_id: String,
        bids: Vec<[String; 2]>,
        asks: Vec<[String; 2]>,
        timestamp: u64,
    },
    Trade {
        market_id: String,
        price: String,
        size: String,
        side: String,
    },
}

pub async fn connect_feed(
    url: &str,
    market_ids: Vec<String>,
    tx: mpsc::Sender<FeedEvent>,
) -> anyhow::Result<()> {
    let (ws_stream, _) = connect_async(url).await?;
    let (mut write, mut read) = ws_stream.split();

    // Subscribe
    let sub = serde_json::json!({
        "type": "subscribe",
        "market_ids": market_ids
    });
    write.send(Message::Text(sub.to_string())).await?;

    while let Some(msg) = read.next().await {
        match msg? {
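            Message::Text(text) => match serde_json::from_str::<FeedEvent>(&text) {
                Ok(event) => {
                    if tx.send(event).await.is_err() {
                        break; // receiver dropped
                    }
                }
                // Don't swallow malformed or unknown messages silently.
                Err(e) => tracing::debug!("Skipping unparseable feed message: {e}"),
            },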
            Message::Ping(data) => {
                write.send(Message::Pong(data)).await?; // MUST handle pings
            }
            Message::Close(_) => break,
            _ => {}
        }
    }
    Ok(())
}

The WebSocket task owns nothing except the connection. It parses → forwards. The strategy task lives entirely on the other side of the channel.

Auto-Reconnect Logic

Connections drop. Networks blip. This is non-optional:

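pub async fn run_feed_with_reconnect(
    url: String,
    market_ids: Vec<String>,
    tx: mpsc::Sender<FeedEvent>,
) {
    let initial_backoff = std::time::Duration::from_millis(500);
    let max_backoff = std::time::Duration::from_secs(30);
    let mut backoff = initial_backoff;

    loop {
        let attempt_started = std::time::Instant::now();
        match connect_feed(&url, market_ids.clone(), tx.clone()).await {
            Ok(_) => {
                tracing::info!("Feed disconnected cleanly");
                backoff = initial_backoff;
            }
            Err(e) => tracing::error!("Feed error: {e}"),
        }

        // The receiver is gone, so there is nobody left to feed: stop retrying.
        if tx.is_closed() {
            return;
        }

        // A session that outlived the maximum backoff counts as healthy, so an
        // error after hours of uptime does not inherit an old, inflated delay.
        if attempt_started.elapsed() > max_backoff {
            backoff = initial_backoff;
        }

        tokio::time::sleep(backoff).await;
        backoff = (backoff * 2).min(max_backoff); // exponential with ceiling
    }
}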
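
The delay schedule is easier to reason about (and to unit test) when it's pulled out of the async loop. A small dependency-free sketch, where next_backoff and backoff_schedule are hypothetical helper names rather than part of the code above:

```rust
use std::time::Duration;

/// Doubles the delay, capped at `max`. saturating_mul avoids overflow panics.
pub fn next_backoff(current: Duration, max: Duration) -> Duration {
    current.saturating_mul(2).min(max)
}

/// Returns the first `attempts` delays produced starting from `initial`.
pub fn backoff_schedule(initial: Duration, max: Duration, attempts: usize) -> Vec<Duration> {
    let mut delays = Vec::with_capacity(attempts);
    let mut delay = initial.min(max);
    for _ in 0..attempts {
        delays.push(delay);
        delay = next_backoff(delay, max);
    }
    delays
}
```

With a 500 ms start and a 30 s ceiling, the first eight delays are 0.5 s, 1 s, 2 s, 4 s, 8 s, 16 s, 30 s, 30 s. If many bots reconnect at once after an outage, adding a little random jitter to each delay keeps them from all retrying in lockstep.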

The Strategy Engine (Where Alpha Lives)

The strategy is a pure function of events + internal state. No I/O. No async. No locks. Fast.

use std::collections::VecDeque;
use rust_decimal::Decimal;

pub enum Signal {
    Buy { market_id: String, price: Decimal, size: Decimal },
    Sell { market_id: String, price: Decimal, size: Decimal },
}

pub struct StrategyEngine {
    book: OrderBook,
    mid_history: VecDeque<Decimal>,
    lookback: usize,
}

impl StrategyEngine {
    pub fn new(lookback: usize) -> Self {
        Self {
            book: OrderBook::default(),
            mid_history: VecDeque::with_capacity(lookback),
            lookback,
        }
    }

    pub fn on_book_update(
        &mut self,
        bids: Vec<[String; 2]>,
        asks: Vec<[String; 2]>,
    ) -> Option<Signal> {
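        // A malformed price or size aborts this update via `?`;
        // levels applied before it stay in the book.
        for [p, s] in bids {
            self.book.apply_update(Side::Bid, p.parse().ok()?, s.parse().ok()?);
        }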
        for [p, s] in asks {
            self.book.apply_update(Side::Ask, p.parse().ok()?, s.parse().ok()?);
        }

        let mid = self.book.mid_price()?;
        let spread = self.book.spread()?;

        if self.mid_history.len() >= self.lookback {
            self.mid_history.pop_front();
        }
        self.mid_history.push_back(mid);

        if self.mid_history.len() < self.lookback {
            return None;
        }

        let avg = self.mid_history.iter().sum::<Decimal>()
            / Decimal::from(self.lookback);
        let threshold = Decimal::new(2, 2); // 0.02

        // Mean-reversion: buy when price dips below rolling average
        if mid < avg - threshold && spread < Decimal::new(3, 2) {
            Some(Signal::Buy {
                market_id: "example".to_string(),
                price: self.book.best_ask()?.price,
                size: Decimal::new(10, 0),
            })
        } else {
            None
        }
    }
}
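
The engine above re-sums the whole window on every update. That's fine for short lookbacks, but a running sum makes the average O(1) regardless of window size. Here is a minimal sketch of that idea, assuming prices in integer ticks so it needs no crates; RollingMean and dips_below_mean are hypothetical names, and the same approach should carry over to Decimal.

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-window mean over prices in integer ticks.
pub struct RollingMean {
    window: VecDeque<i64>,
    sum: i64,
    lookback: usize,
}

impl RollingMean {
    /// Returns None for a zero lookback, which would make the mean undefined.
    pub fn new(lookback: usize) -> Option<Self> {
        if lookback == 0 {
            return None;
        }
        Some(Self {
            window: VecDeque::with_capacity(lookback),
            sum: 0,
            lookback,
        })
    }

    /// Adds a value, evicting the oldest one once the window is full.
    /// Returns the window sum when exactly `lookback` values are present.
    pub fn push(&mut self, value: i64) -> Option<i64> {
        if self.window.len() == self.lookback {
            if let Some(oldest) = self.window.pop_front() {
                self.sum -= oldest;
            }
        }
        self.window.push_back(value);
        self.sum += value;
        (self.window.len() == self.lookback).then_some(self.sum)
    }

    /// True when `value` sits more than `threshold` ticks below the window mean.
    /// Compares value * n < sum - threshold * n, so no division is needed.
    pub fn dips_below_mean(&self, value: i64, threshold: i64) -> bool {
        let n = self.lookback as i64;
        self.window.len() == self.lookback && value * n < self.sum - threshold * n
    }
}
```

With a lookback of 3 and mids of 50, 50, 44, the window sum is 144 (mean 48), so a mid of 44 counts as a dip of more than 2 ticks while 47 does not.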

Benchmarks

Measured on a $20/month VPS hosted close to Polymarket's infrastructure:

Component	Median	p95	p99
WebSocket parse + send	12 µs	18 µs	31 µs
Order book update	4 µs	9 µs	22 µs
Strategy evaluation	18 µs	27 µs	44 µs
REST order placement	620 µs	890 µs	1.4 ms
Total loop	~654 µs	~944 µs	~1.5 ms

Python baseline on the same machine: 48 ms median, 120 ms p99.

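The REST call dominates, which is expected: it accounts for roughly 95% of the median loop. The Total row is the sum of the per-component figures, so treat its p95 and p99 as estimates rather than measured end-to-end percentiles. To shrink the loop much further you'd need a venue with a WebSocket order API - but for prediction markets, ~800µs is very competitive.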
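
For anyone reproducing numbers like these, percentiles are cheap to compute from raw samples. Below is a minimal nearest-rank sketch; percentile is a hypothetical helper (not the tooling used for the table above), and it assumes latencies recorded as whole microseconds.

```rust
/// Nearest-rank percentile over latency samples in microseconds.
/// `pct` must be in 1..=100; returns None for an empty slice or a bad `pct`.
pub fn percentile(samples: &[u64], pct: usize) -> Option<u64> {
    if samples.is_empty() || pct == 0 || pct > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest rank = ceil(pct * n / 100), computed in integers.
    let rank = (pct * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}
```

Given ten samples where nine sit between 600 and 650 µs and one is 1400 µs, the median is 620 µs but p95 and p99 both land on 1400 µs. That is exactly the kind of tail a plain average smooths over.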

5 Pitfalls That Will Ruin Your Day

1. Using f64 for prices
Floating-point rounding errors compound. You won't notice until your PnL is wrong at 3am. Use rust_decimal.

2. Holding async locks across .await points
If you hold a tokio::sync::Mutex guard while awaiting I/O, every other task that needs the lock waits for that I/O to finish. Pass ownership through channels instead.

3. Unbounded channels
Tokio's mpsc::unbounded_channel() buffers without limit if the consumer is slow, so memory grows until something falls over. Use mpsc::channel(N) and handle the backpressure.

4. Forgetting --release
Rust debug builds are often an order of magnitude slower than release builds. Always benchmark with --release.

5. Not handling WebSocket pings
The exchange will close your connection if you stop responding. The handler above answers every Ping with a Pong - don't skip it.
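
Pitfall 1 is easy to reproduce in a few lines. The sketch below adds ten fills of 0.10 with f64 and compares that against the same sum in integer ticks; the function names and the 1 tick = 0.01 convention are assumptions for illustration, with integer ticks standing in for rust_decimal so the example needs no crates.

```rust
/// Adds `count` fills of 0.10 using f64, one at a time.
pub fn sum_f64(count: usize) -> f64 {
    let mut total = 0.0_f64;
    for _ in 0..count {
        total += 0.10;
    }
    total
}

/// Adds `count` fills of 10 ticks (1 tick = 0.01). Integer addition is exact.
pub fn sum_ticks(count: usize) -> u64 {
    let mut total = 0_u64;
    for _ in 0..count {
        total += 10;
    }
    total
}

/// True if the f64 total for `count` fills differs from the exact value.
pub fn f64_total_drifted(count: usize) -> bool {
    sum_f64(count) != (sum_ticks(count) as f64) / 100.0
}
```

Ten fills are already enough: f64_total_drifted(10) is true because the f64 total comes out just under 1.0, while the tick total is exactly 100.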

What's Next

Once the baseline is running:

Multiple markets - spawn one feed task per market, aggregate signals
Backtesting harness - replay recorded WebSocket messages through the strategy
Risk manager - position limits, max drawdown, per-market exposure caps
Lock-free data structures - crossbeam crate when even Mutex is too much
Co-location - physical proximity to the exchange often matters more than code
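
As a starting point for the risk manager item, here is a rough sketch of a pre-trade check with a per-market position limit and a total exposure cap. Everything in it is hypothetical (the RiskManager and RiskDecision names, positions counted in whole contracts, buys only); a real one would also need sells, fills, and drawdown tracking.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum RiskDecision {
    Approved,
    Rejected(&'static str),
}

/// Hypothetical pre-trade gate: positions are tracked in whole contracts.
pub struct RiskManager {
    max_per_market: i64,
    max_total: i64,
    positions: HashMap<String, i64>,
}

impl RiskManager {
    pub fn new(max_per_market: i64, max_total: i64) -> Self {
        Self {
            max_per_market,
            max_total,
            positions: HashMap::new(),
        }
    }

    /// Approves a buy only if both caps still hold afterwards, then records it.
    pub fn check_buy(&mut self, market_id: &str, size: i64) -> RiskDecision {
        if size <= 0 {
            return RiskDecision::Rejected("size must be positive");
        }
        let current = self.positions.get(market_id).copied().unwrap_or(0);
        let total: i64 = self.positions.values().sum();
        if current + size > self.max_per_market {
            return RiskDecision::Rejected("per-market position limit exceeded");
        }
        if total + size > self.max_total {
            return RiskDecision::Rejected("total exposure cap exceeded");
        }
        *self.positions.entry(market_id.to_string()).or_insert(0) += size;
        RiskDecision::Approved
    }
}
```

Like the strategy engine, this is plain synchronous state with no I/O, so it can sit behind a channel as its own task and be unit tested without a network.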

Final Thought

Rust's reputation for difficulty is earned. But so is its reputation for performance. For latency-sensitive systems, the borrow checker is a feature - it rejects an entire class of concurrency bugs (data races in safe code) at compile time, before the program ever runs.

If you're coming from Python or JS: the hardest part is thinking about ownership before logic. Once that clicks, the rest falls into place surprisingly fast.

Found this useful? Drop a ❤️ or share it. And if you're building something similar - prediction market bots, HFT tools, or anything Rust + finance - I'd love to hear about it in the comments.

Follow me for more posts on Rust, system design, and algorithmic trading.

Top comments (3)

Valentyn Kit • Jun 26

The headline 70x is mostly Python-to-native; Go would get you into low-ms too. The Rust-specific win is the tail, not the median: no GC means p99 doesn't randomly spike mid-decision-loop, and in a prediction market the tail is what costs you the fill. What's your p99, not just the average?

Blockchain Rust Engineer • Jun 26

tbh I haven't instrumented p99 separately. i've been tracking averages. that's the gap I need to close.

Hiren Kava • Jun 23

I really enjoyed reading your write-up on rebuilding the Polymarket trading bot in Rust. What stood out to me wasn't just the 70x latency improvement, but how much attention you gave to production concerns like async architecture, backpressure, and fault tolerance instead of focusing only on raw benchmarks.