Sumana

Posted on Mar 22

Building a Low-Latency Trading Engine in Rust

#rust #architecture #tokio #webdev

When I started building a perpetual futures engine in Rust, the requirements looked straightforward:

Binance streams price updates
Users constantly check balances
Liquidations happen automatically

All of this needs to run at the same time, with low latency and zero mistakes.

Naturally, I started with a Mutex.
It felt like the safe, obvious choice.

But that assumption didn’t last long.

The First Problem: Reads Were Slower Than They Should Be

In this system, most operations are reads:

balance checks
position queries

Writes (like price updates) are much less frequent.

But a Mutex admits only one holder at a time, so every operation queues behind every other one, reads included.

So even simple reads were waiting behind writes and, worse, sometimes behind network delays.

That’s when it became clear:

I was treating all operations equally, even though they aren’t.

Switching to RwLock

The first real improvement was replacing Mutex with RwLock.

let engine = Arc::new(RwLock::new(Engine::new(1000.0)));

Now:

Multiple readers can access the state at the same time
Writers still get exclusive access

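// Read: shared access, any number of tasks can hold a read guard at once
let state = engine.read().await;

// Write (in another scope or task): exclusive access,
// waits until every other guard has been dropped
let mut state = engine.write().await;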

This change alone removed the biggest bottleneck.

Reads stopped blocking each other, and latency dropped significantly.
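Here is a minimal sketch of that reader/writer behaviour. It uses std::sync::RwLock rather than an async lock so it runs without any dependencies, and the Engine struct and probe_lock function are hypothetical names for illustration, not the real engine.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for the article's engine state.
struct Engine {
    balance: f64,
}

/// Probes the lock while read guards are held.
/// Returns (balance seen by reader 1, balance seen by reader 2, write_ok).
fn probe_lock() -> (Option<f64>, Option<f64>, bool) {
    let engine = RwLock::new(Engine { balance: 1000.0 });

    // First reader takes a shared guard and keeps it alive.
    let first = engine.try_read().ok();
    // A second reader is admitted while the first guard is still held.
    let second = engine.try_read().ok();
    // A writer is refused for as long as any read guard exists.
    let write_ok = engine.try_write().is_ok();

    (
        first.map(|guard| guard.balance),
        second.map(|guard| guard.balance),
        write_ok,
    )
}
```

Calling probe_lock() should show both readers seeing the balance while the write attempt is turned away. With a Mutex, the second read would have been refused as well.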

The Next Issue: Blocking vs Yielding

Even with RwLock, there’s another subtle issue.

If you use a blocking lock such as std::sync::RwLock, the entire OS thread waits.

That means while one task is waiting:

no other task scheduled on that worker thread can run
the thread is effectively idle

With an async lock such as tokio::sync::RwLock:

let mut engine = engine.write().await;

The task yields to the runtime instead of blocking the thread.

That allows:

other API calls to run
other tasks to progress

This is what makes it possible to handle a large number of concurrent tasks efficiently.

Network I/O Was Still a Problem

Even after fixing locks, something still felt off.

The engine was directly tied to the WebSocket feed.

And network behavior is unpredictable:

sometimes fast
sometimes slow
sometimes delayed

That means your core engine inherits that unpredictability.

Decoupling with MPSC

The fix here was to separate concerns using a channel.

Instead of processing prices directly:

WebSocket → Channel → Engine

let (tx, mut rx) = tokio::sync::mpsc::channel(100);

The WebSocket task just sends updates
The engine processes them independently

This removes network jitter from the critical path.

The engine becomes more predictable, even if the network isn’t.
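A rough sketch of the same shape, using std::sync::mpsc::sync_channel and threads in place of tokio's channel and tasks so it runs without dependencies. The run_pipeline function and the use of u64 prices are assumptions for illustration.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Feeds `prices` through a bounded channel and returns what the
/// engine side observed, in arrival order.
fn run_pipeline(prices: Vec<u64>) -> Vec<u64> {
    // Capacity 100 mirrors the channel(100) above.
    let (tx, rx) = sync_channel::<u64>(100);

    // Stand-in for the WebSocket task: it only forwards updates.
    let feed = thread::spawn(move || {
        for price in prices {
            // send waits when the buffer is full (backpressure) and
            // fails only if the receiving side has gone away.
            if tx.send(price).is_err() {
                break;
            }
        }
        // tx is dropped here, which closes the channel.
    });

    // Stand-in for the engine processor: drains updates at its own pace
    // and stops once the channel is closed and empty.
    let mut seen = Vec::new();
    for price in rx {
        seen.push(price);
    }

    let _ = feed.join();
    seen
}
```

The producer never touches engine state, so a slow or bursty feed only changes when items arrive, not how they are processed.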

Sharing State Across Tasks

At this point, multiple parts of the system needed access:

WebSocket task
API handlers
liquidation logic

Every value in Rust has exactly one owner by default, so shared ownership has to be explicit.

That’s where Arc comes in:

let engine = Arc::new(RwLock::new(Engine::new(1000.0)));

Each task gets a clone:

let engine_clone = Arc::clone(&engine);

Now everything shares the same state safely.
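As a small illustration of that sharing, here is a sketch with plain threads and std::sync::RwLock standing in for tokio tasks and an async lock. The Engine field and the share_across_threads function are hypothetical.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical engine state with a single counter-like field.
struct Engine {
    price: u64,
}

/// Hands one Arc clone to each worker and returns
/// (owner count while the clones exist, final price).
fn share_across_threads(workers: usize) -> (usize, u64) {
    let engine = Arc::new(RwLock::new(Engine { price: 0 }));

    // Each clone is a new handle to the same allocation, not a copy of Engine.
    let clones: Vec<_> = (0..workers).map(|_| Arc::clone(&engine)).collect();
    let owners = Arc::strong_count(&engine); // original + one per worker

    let handles: Vec<_> = clones
        .into_iter()
        .map(|shared| {
            thread::spawn(move || {
                // Skip the update if the lock was poisoned by a panic elsewhere.
                let mut guard = match shared.write() {
                    Ok(guard) => guard,
                    Err(_) => return,
                };
                guard.price += 1;
            })
        })
        .collect();

    for handle in handles {
        let _ = handle.join();
    }

    let price = match engine.read() {
        Ok(guard) => guard.price,
        Err(_) => 0,
    };
    (owners, price)
}
```

Every worker increments the same Engine, which is why the final price equals the number of workers.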

Financial Accuracy: Why f64 Doesn’t Work

This is one of those things that seems small but isn’t.

Using f64:

1000.0 - 0.1 - 0.1 - 0.1
= 999.6999999999999

That error might look tiny, but in a trading system:

it accumulates
it affects PnL
it can break liquidation logic

So instead:

use rust_decimal::Decimal;

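Decimal stores amounts in base 10, so values like 0.1 are represented exactly, and addition and subtraction do not drift.

No binary rounding surprises, though division still rounds to a finite number of digits.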
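To see the difference without pulling in a crate, here is a sketch that swaps rust_decimal for plain integers counted in tenths. That is a different technique (fixed-point integers), used only because it needs no dependencies; the scale and function names are assumptions.

```rust
/// Subtracts 0.1 three times using binary floating point.
fn balance_f64() -> f64 {
    1000.0_f64 - 0.1 - 0.1 - 0.1
}

/// The same arithmetic with amounts stored as integer tenths
/// (hypothetical scale: 1 unit = 0.1).
fn balance_tenths() -> u64 {
    let start: u64 = 10_000; // 1000.0
    let fee: u64 = 1; // 0.1
    start - fee - fee - fee
}

/// Formats a tenths amount as a decimal string without going through f64.
fn format_tenths(tenths: u64) -> String {
    format!("{}.{}", tenths / 10, tenths % 10)
}
```

The integer version lands on exactly 999.7, while the f64 version does not compare equal to 999.7. A decimal type gives the same exactness with a flexible scale instead of a fixed one.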

Handling Errors Without Crashing

One last thing: reliability.

In a system like this, a panic isn’t just a bug—it’s downtime.

So instead of:

.unwrap()

Every fallible operation returns a Result<T, E>:

let position = self.positions.get(&id)
    .ok_or("Position not found")?;

Errors are propagated to the caller and handled there, rather than crashing the process.
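A slightly fuller sketch of the same pattern, with a typed error instead of a string so callers can match on it. EngineError, the positions map layout, and demo_lookup are all hypothetical names, not the real engine's API.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical error type for engine lookups.
#[derive(Debug, PartialEq)]
enum EngineError {
    PositionNotFound(u64),
}

impl fmt::Display for EngineError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EngineError::PositionNotFound(id) => write!(f, "position {} not found", id),
        }
    }
}

impl std::error::Error for EngineError {}

/// Hypothetical engine holding position sizes keyed by id.
struct Engine {
    positions: HashMap<u64, i64>,
}

impl Engine {
    /// Returns the position size, or an error the caller can act on.
    fn position_size(&self, id: u64) -> Result<i64, EngineError> {
        let size = self
            .positions
            .get(&id)
            .ok_or(EngineError::PositionNotFound(id))?;
        Ok(*size)
    }
}

/// Builds a one-position engine and looks up `id`.
fn demo_lookup(id: u64) -> Result<i64, EngineError> {
    let mut positions = HashMap::new();
    positions.insert(1, 250);
    let engine = Engine { positions };
    engine.position_size(id)
}
```

An API handler could turn PositionNotFound into a 404-style response, while the engine itself keeps running.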

Putting It Together

At a high level, the system looks like this:

WebSocket → MPSC Channel → Engine Processor
                                ↓
                         Arc<RwLock<Engine>>
                                ↓
                  API + Reads + Writes + Liquidations

Each part solves a specific problem:

RwLock → efficient reads
async/await → no thread blocking
MPSC → isolates network behavior
Arc → shared ownership
Decimal → exact calculations
Result<T, E> → reliability
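To make the diagram concrete, here is a compact sketch of the whole flow using only the standard library: threads instead of tokio tasks, sync_channel instead of tokio's mpsc, and std::sync::RwLock instead of an async lock. The Engine fields and run_engine are assumptions for illustration.

```rust
use std::sync::mpsc::sync_channel;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical engine state: the latest price and how many updates were applied.
#[derive(Default)]
struct Engine {
    last_price: u64,
    updates: usize,
}

/// Runs feed -> channel -> processor -> shared state, then reads a snapshot.
fn run_engine(prices: Vec<u64>) -> (u64, usize) {
    let engine = Arc::new(RwLock::new(Engine::default()));
    let (tx, rx) = sync_channel::<u64>(100);

    // Feed: stands in for the WebSocket task and only forwards prices.
    let feed = thread::spawn(move || {
        for price in prices {
            if tx.send(price).is_err() {
                break;
            }
        }
    });

    // Processor: the only writer, takes the write lock once per update.
    let writer = Arc::clone(&engine);
    let processor = thread::spawn(move || {
        for price in rx {
            let mut state = match writer.write() {
                Ok(guard) => guard,
                Err(_) => break, // lock poisoned, stop processing
            };
            state.last_price = price;
            state.updates += 1;
        }
    });

    let _ = feed.join();
    let _ = processor.join();

    // Reader: stands in for an API handler taking a shared read lock.
    let snapshot = match engine.read() {
        Ok(state) => (state.last_price, state.updates),
        Err(_) => (0, 0),
    };
    snapshot
}
```

For example, run_engine(vec![100, 105, 98]) applies three updates and leaves 98 as the last price.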

Final Thought

None of these choices are “fancy.”

They’re just responses to real constraints:

high read volume
unpredictable I/O
strict correctness requirements

Once those constraints are clear, the architecture almost designs itself.

DEV Community