Bill Tu
From O(n log n) to O(log p): Rebuilding the Order Book for Performance

When we first built MatchEngine, the order book used a sorted slice. Every time an order was inserted, the entire slice was re-sorted. It was simple, correct, and easy to understand. It was also O(n log n) per insertion, which means it got slower with every order added to the book.

This article walks through the redesign: replacing the sorted slice with a price-level map backed by binary search insertion. The result is O(log p) insertion where p is the number of distinct price levels — typically 50-200 even when the book holds tens of thousands of orders.

The Original Design

The order book stored all orders in a flat slice, sorted by price-time priority:

type OrderBook struct {
    Bids []*model.Order  // highest price first
    Asks []*model.Order  // lowest price first
}

Insertion appended the order and re-sorted the entire side (ask side shown):

func (ob *OrderBook) AddOrder(order *model.Order) {
    ob.Asks = append(ob.Asks, order)
    sort.SliceStable(ob.Asks, func(i, j int) bool {
        if ob.Asks[i].Price.Equal(ob.Asks[j].Price) {
            return ob.Asks[i].Timestamp.Before(ob.Asks[j].Timestamp)
        }
        return ob.Asks[i].Price.LessThan(ob.Asks[j].Price)
    })
}

sort.SliceStable is O(n log n). For a book with 10,000 orders, every single insertion triggers a comparison-sort across all 10,000 elements. The 10,001st order is just as expensive to insert as rebuilding the entire book from scratch.

Cancellation was O(n) too — a linear scan to find the order by ID:

func removeFromSlice(orders []*model.Order, id string) []*model.Order {
    for i, o := range orders {
        if o.ID == id {
            return append(orders[:i], orders[i+1:]...)
        }
    }
    return orders
}

This was fine for a prototype. But the issue was clear: the data structure didn't match the access pattern.

The Access Pattern

A matching engine's order book has a very specific access pattern:

  1. Orders cluster at price levels. A book with 10,000 orders might have only 100 distinct prices.
  2. The most frequent operations are insert and best-price lookup.
  3. Orders at the same price are processed in FIFO order.
  4. Removal by ID needs to be fast (for cancellation).

The sorted slice treats every order as an independent element. It doesn't exploit the fact that orders group by price. It re-sorts everything even when the new order's price already exists in the book.

The New Design: Price-Level Map

The new architecture has three layers:

OrderBook
├── bids: orderSide (highest price first)
├── asks: orderSide (lowest price first)
└── orders: map[string]*Order (ID index)

orderSide
├── prices: []decimal.Decimal (sorted, binary search)
├── levels: map[string]*priceLevel (price -> queue)
└── count: int

priceLevel
└── orders: []*Order (FIFO queue)

Each side of the book (orderSide) maintains:

  • A sorted slice of distinct price keys
  • A map from price string to a priceLevel
  • A total order count

Each priceLevel is a simple FIFO queue — a slice of orders at that exact price, in arrival order.

Why This Structure?

It matches the access pattern exactly:

  • Insert at an existing price: O(1) map lookup + O(1) append to the queue. No sorting.
  • Insert at a new price: O(log p) binary search to find the insertion point in the price slice + O(p) shift to make room. This happens rarely compared to inserts at existing prices.
  • Best price: O(1) — it's prices[0].
  • Cancel by ID: O(1) map lookup to find the order, then O(k) scan within its price level (where k is typically small).

The Implementation

priceLevel: The FIFO Queue

The simplest component. Each price level holds orders in arrival order:

type priceLevel struct {
    orders []*model.Order
}

func newPriceLevel() *priceLevel {
    return &priceLevel{orders: make([]*model.Order, 0, 4)}
}

func (pl *priceLevel) append(order *model.Order) {
    pl.orders = append(pl.orders, order)
}

func (pl *priceLevel) front() *model.Order {
    if len(pl.orders) == 0 {
        return nil
    }
    return pl.orders[0]
}

The initial capacity of 4 is a small optimization — most price levels have a handful of orders, and this avoids the first few slice growths.

The removeFilled method uses an in-place compaction pattern to avoid allocating a new slice:

func (pl *priceLevel) removeFilled() int {
    n := 0
    for _, o := range pl.orders {
        if !o.IsFilled() {
            pl.orders[n] = o
            n++
        }
    }
    pl.orders = pl.orders[:n]
    return n
}

This overwrites filled orders with unfilled ones from left to right, then truncates. Zero allocations, single pass.

orderSide: Sorted Prices + Level Map

This is where the interesting logic lives. Each side of the book is an orderSide:

type orderSide struct {
    prices []decimal.Decimal
    levels map[string]*priceLevel
    less   func(a, b decimal.Decimal) bool
    count  int
}

The less function determines sort order — GreaterThan for bids (highest first), LessThan for asks (lowest first). This lets both sides share the same code with different comparators.

Insertion: Binary Search

The core of the optimization. When an order arrives:

func (s *orderSide) insert(order *model.Order) {
    key := order.Price.String()
    pl, exists := s.levels[key]
    if !exists {
        pl = newPriceLevel()
        s.levels[key] = pl
        idx := sort.Search(len(s.prices), func(i int) bool {
            return s.less(order.Price, s.prices[i]) || order.Price.Equal(s.prices[i])
        })
        // Insert price at idx
        s.prices = append(s.prices, decimal.Zero)
        copy(s.prices[idx+1:], s.prices[idx:])
        s.prices[idx] = order.Price
    }
    pl.append(order)
    s.count++
}

Two paths:

Path 1: Price level exists (common case). Map lookup is O(1). Append to the queue is O(1) amortized. Total: O(1).

Path 2: New price level (rare case). sort.Search does a binary search over the price slice — O(log p). Then we insert the price at the found index using the standard Go slice insertion pattern (append + copy). The copy is O(p) in the worst case, but p is the number of distinct prices, not the number of orders.

In a real order book, most orders arrive at prices that already exist. The common path is O(1). The rare path is O(log p + p). Compared to the old O(n log n) on every insert, this is a massive improvement.

Best Price: O(1)

func (s *orderSide) best() *model.Order {
    if len(s.prices) == 0 {
        return nil
    }
    key := s.prices[0].String()
    return s.levels[key].front()
}

The best price is always prices[0]. The best order at that price is the front of the FIFO queue. Two lookups, both O(1).

Removal

When an order is cancelled, the engine already knows the order (via the order ID index). The orderbook looks up the price level and removes the order from the queue:

func (s *orderSide) remove(order *model.Order) {
    key := order.Price.String()
    pl, exists := s.levels[key]
    if !exists {
        return
    }
    if pl.remove(order.ID) {
        s.count--
        if pl.len() == 0 {
            s.removePrice(key, order.Price)
        }
    }
}

If the price level becomes empty after removal, it is cleaned up — deleted from the map and removed from the price slice. This keeps the data structure compact.

Cleanup: removeFilled

After each matching cycle, filled orders need to be removed. The new implementation scans each price level, compacts it in-place, and collects empty levels for removal:

func (s *orderSide) removeFilled() []*model.Order {
    var filled []*model.Order
    pricesToRemove := make([]int, 0)

    for i, price := range s.prices {
        key := price.String()
        pl := s.levels[key]
        for _, o := range pl.allOrders() {
            if o.IsFilled() {
                filled = append(filled, o)
                s.count--
            }
        }
        remaining := pl.removeFilled()
        if remaining == 0 {
            delete(s.levels, key)
            pricesToRemove = append(pricesToRemove, i)
        }
    }

    // Remove empty price levels in reverse order to preserve indices
    for i := len(pricesToRemove) - 1; i >= 0; i-- {
        idx := pricesToRemove[i]
        s.prices = append(s.prices[:idx], s.prices[idx+1:]...)
    }

    return filled
}

The reverse-order removal of empty price indices is a subtle but important detail. Removing index 3 before index 7 would shift index 7 to 6, invalidating the stored index. Removing in reverse (7 first, then 3) avoids this.

The OrderBook: Putting It Together

The public OrderBook struct wires the two sides together:

type OrderBook struct {
    bids   *orderSide
    asks   *orderSide
    orders map[string]*model.Order
}

func New() *OrderBook {
    return &OrderBook{
        bids: newOrderSide(func(a, b decimal.Decimal) bool {
            return a.GreaterThan(b)  // highest first
        }),
        asks: newOrderSide(func(a, b decimal.Decimal) bool {
            return a.LessThan(b)     // lowest first
        }),
        orders: make(map[string]*model.Order),
    }
}

The orders map provides O(1) lookup by ID, used by RemoveOrder (cancellation) and HasOrder (duplicate detection).
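
A sketch of how that index supports duplicate detection, with a deliberately simplified Order type (the real one lives in the model package and carries price, quantity, and timestamp):

```go
package main

import "fmt"

// Simplified stand-in for model.Order.
type Order struct{ ID string }

type OrderBook struct {
	orders map[string]*Order // ID index: one map lookup per query
}

func (ob *OrderBook) HasOrder(id string) bool {
	_, ok := ob.orders[id]
	return ok
}

func main() {
	ob := &OrderBook{orders: map[string]*Order{"a1": {ID: "a1"}}}
	fmt.Println(ob.HasOrder("a1"), ob.HasOrder("zzz")) // true false
}
```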

One notable change: Bids and Asks are no longer public fields. They are now methods that return flattened slices:

func (ob *OrderBook) Bids() []*model.Order {
    return ob.bids.flatten()
}

This is a deliberate API change. The internal representation is no longer a flat slice, so exposing it as one would either leak the abstraction or require materializing the slice on every access. Making it a method makes the cost explicit.

Complexity Comparison

Operation                   Sorted Slice (before)   Price-Level Map (after)
AddOrder (existing price)   O(n log n)              O(1)
AddOrder (new price)        O(n log n)              O(log p + p)
RemoveOrder                 O(n)                    O(1) + O(k)
BestBid / BestAsk           O(1)                    O(1)
RemoveFilled                O(n)                    O(n)
Snapshot                    O(n)                    O(n)

Variables: n = total orders, p = distinct price levels, k = orders at a given price.

The key insight is that p is much smaller than n. A liquid market might have 10,000 resting orders spread across 100 price levels. The old design paid O(10000 × log 10000) ≈ 130,000 comparisons per insert. The new design pays O(1) for the common case (existing price) or O(log 100 + 100) ≈ 107 operations for a new price level.

What Didn't Change

The matching algorithm is completely untouched. It still calls BestAsk(), checks the price, calls executeTrade(), and loops. The engine code changed only in one place — cleanFilled now calls book.Bids() and book.Asks() methods instead of accessing public fields.

The test suite passes without modification to test logic (only field access syntax changed to method calls). This is the benefit of a clean abstraction boundary — the order book's public API (AddOrder, RemoveOrder, BestBid, BestAsk, Depth, Spread) stayed the same.

Trade-offs

Nothing is free:

Memory: The price-level map uses more memory than a flat slice. Each price level has its own slice header, and the map has per-bucket overhead. For a book with 100 price levels and 10,000 orders, the overhead is roughly 100 × (slice header + map entry) ≈ a few KB. Negligible.

Flatten cost: Bids() and Asks() now allocate a new slice and copy all orders into it. The old design returned the slice directly (O(1)). This matters for Snapshot(), which calls flatten() twice. For a 10,000-order book, that is two allocations of 10,000 pointers. Acceptable for a snapshot operation that is already O(n).

Complexity: The code is more complex. Three types (orderSide, priceLevel, OrderBook) instead of one. The binary search insertion with slice shifting is trickier than sort.SliceStable. But the complexity is localized — the matching engine doesn't know or care about price levels.

Benchmarks

We added three benchmarks to measure the impact:

func BenchmarkInsertOnly(b *testing.B) {
    e := New(WithMaxTradeLog(0))
    price := decimal.NewFromInt(100)
    qty := decimal.NewFromInt(1)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        id := fmt.Sprintf("s%d", i)
        e.SubmitOrder("BTC", model.NewLimitOrder(id, model.Sell, price, qty))
    }
}
  • BenchmarkInsertOnly: All orders at the same price. Tests the O(1) common path.
  • BenchmarkInsertAndMatch: Alternating buy/sell at the same price. Tests insert + match + cleanup.
  • BenchmarkInsertMultiplePriceLevels: Orders spread across 100 price levels. Tests the O(log p) path.

Run with:

go test -bench=. -benchmem ./pkg/engine/

The single-price insert benchmark is the most dramatic improvement. With the old sorted slice, each insert re-sorted the growing slice. With the new design, each insert is a map lookup + slice append — constant time regardless of book depth.

Conclusion

The optimization boils down to one idea: exploit the structure of the data. Orders in a matching engine cluster by price. A flat sorted list ignores this structure and pays O(n log n) to maintain global order. A price-level map respects the structure and pays O(1) for the common case.

The change touched two files (orderbook.go rewritten, pricelevel.go added) and required only syntax changes in consumers (book.Bids → book.Bids()). The matching algorithm, the engine, and all existing tests remained functionally identical.

For most open-source matching engines, this is the right level of optimization. It eliminates the obvious bottleneck without introducing the complexity of a red-black tree or skip list. If profiling later shows that the O(p) slice shift during new-price insertion is a problem, the prices slice can be replaced with a balanced tree — but that is a localized change within orderSide, invisible to the rest of the system.

GitHub: https://github.com/iwtxokhtd83/MatchEngine
Release: v0.3.0
Issue: #4 — Order book uses O(n log n) sort on every insert
