<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: hpc group</title>
    <description>The latest articles on DEV Community by hpc group (@hpc_group_b579dc28b930e08).</description>
    <link>https://dev.to/hpc_group_b579dc28b930e08</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3858344%2F0507cdf5-0758-44e3-bcad-4a462ce49e48.png</url>
      <title>DEV Community: hpc group</title>
      <link>https://dev.to/hpc_group_b579dc28b930e08</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hpc_group_b579dc28b930e08"/>
    <language>en</language>
    <item>
      <title>How We Built a Sub-Millisecond Crypto Market Data Feed in C++</title>
      <dc:creator>hpc group</dc:creator>
      <pubDate>Tue, 14 Apr 2026 13:10:05 +0000</pubDate>
      <link>https://dev.to/hpc_group_b579dc28b930e08/how-we-built-a-sub-millisecond-crypto-market-data-feed-in-c-1oal</link>
      <guid>https://dev.to/hpc_group_b579dc28b930e08/how-we-built-a-sub-millisecond-crypto-market-data-feed-in-c-1oal</guid>
      <description>&lt;p&gt;Every crypto exchange speaks its own dialect. Binance sends &lt;code&gt;depthUpdate&lt;/code&gt; messages with &lt;code&gt;"b"&lt;/code&gt; and &lt;code&gt;"a"&lt;/code&gt; arrays. Coinbase wraps updates in a &lt;code&gt;channel&lt;/code&gt;/&lt;code&gt;type&lt;/code&gt; envelope. OKX gzip-compresses its WebSocket frames. Bybit uses a different snapshot synchronization protocol than any of them. If you want to build anything that consumes order book data from multiple exchanges, you are stuck writing and maintaining a bespoke parser for each one, each with its own reconnect logic, snapshot sync state machine, and symbol naming convention.&lt;/p&gt;

&lt;p&gt;We built &lt;a href="https://microversesystems.com" rel="noopener noreferrer"&gt;Microverse&lt;/a&gt; to solve this problem: a single C++ pipeline that normalizes real-time order book data from 20 exchanges into a uniform stream, and serves it over a free WebSocket API. This article walks through the architecture, the hard engineering problems we hit, and the techniques we used to keep end-to-end latency under one millisecond.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pipeline at a Glance
&lt;/h2&gt;

&lt;p&gt;The data path from exchange to client has seven stages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Exchange WS → WS Driver → Parser → Book → SHM Ring → mdf_server → Gateway → Client
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each exchange runs as its own handler process. The handler connects to the exchange, parses messages, maintains a local order book, and writes normalized updates into a shared-memory ring buffer. A central &lt;code&gt;mdf_server&lt;/code&gt; process reads from all 20 ring buffers and distributes updates to downstream consumers: a web dashboard, a WebSocket gateway for external clients, and internal analytics. No message broker. No serialization framework. Just lock-free shared memory and TCP.&lt;/p&gt;

&lt;p&gt;Let us walk through each stage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 1: WebSocket Driver
&lt;/h2&gt;

&lt;p&gt;Each handler spawns an SSL WebSocket connection to its exchange. The driver (&lt;code&gt;mcast_websocket.cpp&lt;/code&gt;) handles the full lifecycle: TLS handshake, WebSocket frame decoding, ping/pong keepalives, and transparent decompression for exchanges that gzip their payloads (HTX, OKX, and others).&lt;/p&gt;

&lt;p&gt;When a complete text frame arrives, the driver writes the raw JSON into a buffer and tags it with a port number: &lt;code&gt;1&lt;/code&gt; for incremental depth updates, &lt;code&gt;2&lt;/code&gt; for snapshots. Every message is also written to a binary capture file (24-byte header plus JSON payload) so we can replay production traffic through the pipeline deterministically during development.&lt;/p&gt;

&lt;p&gt;The driver also runs a separate snapshot thread. On initial subscription or when a sequence gap is detected, it makes an HTTPS REST call to fetch a full book snapshot and pushes it into an internal message queue, which the main &lt;code&gt;recv()&lt;/code&gt; loop picks up on the next iteration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 2: Parser (simdjson)
&lt;/h2&gt;

&lt;p&gt;Each exchange has a dedicated parser class (e.g., &lt;code&gt;BinanceParser&lt;/code&gt;, &lt;code&gt;CoinbaseParser&lt;/code&gt;, &lt;code&gt;KrakenParser&lt;/code&gt;) that implements a common interface: &lt;code&gt;processPacket(buffer, len, channel)&lt;/code&gt;. The parser's job is to extract price level updates from exchange-specific JSON and translate them into uniform &lt;code&gt;levelAdd&lt;/code&gt; / &lt;code&gt;levelDelete&lt;/code&gt; calls on the book.&lt;/p&gt;

&lt;p&gt;We use &lt;a href="https://simdjson.org/" rel="noopener noreferrer"&gt;simdjson&lt;/a&gt; for JSON parsing. It processes JSON at gigabytes per second using SIMD instructions, which matters when you are parsing hundreds of thousands of messages per second across all exchanges. One critical lesson we learned the hard way: &lt;strong&gt;simdjson's on-demand parser modifies the buffer in-place&lt;/strong&gt; during string unescaping. The escaped bytes &lt;code&gt;\"bids\"&lt;/code&gt; get rewritten to &lt;code&gt;bids\0...&lt;/code&gt;, destroying the structural quote characters. A second &lt;code&gt;iterate()&lt;/code&gt; call over the same buffer silently returns zero results. Every parser must do exactly one parse pass per buffer.&lt;/p&gt;

&lt;p&gt;The parser also handles the trickiest part of exchange integration: price normalization. All prices are converted to fixed-point integers with 8 decimal places. The string &lt;code&gt;"98500.12"&lt;/code&gt; becomes the integer &lt;code&gt;9850012000000&lt;/code&gt;. This eliminates floating-point comparison issues entirely: two prices belong to the same level exactly when their integers are equal, so book lookups need no tolerance checks.&lt;/p&gt;
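
&lt;p&gt;A minimal sketch of that conversion, written in JavaScript like the client example later in this article. The helper name &lt;code&gt;toFixed8&lt;/code&gt; and the use of &lt;code&gt;BigInt&lt;/code&gt; are assumptions for illustration; the production parser is C++ and its code is not shown here. The point is that the string is split at the decimal point and padded, never routed through a float:&lt;/p&gt;

```javascript
// Hypothetical helper: convert a decimal price string to a fixed-point
// integer with 8 decimal places, without going through floating point.
const SCALE_DIGITS = 8;

function toFixed8(text) {
  const negative = text.startsWith('-');
  const unsigned = negative ? text.slice(1) : text;
  const [whole, frac = ''] = unsigned.split('.');
  if (frac.length > SCALE_DIGITS) {
    throw new RangeError(`more than ${SCALE_DIGITS} decimals: ${text}`);
  }
  // Right-pad the fraction so "12" becomes "12000000".
  const digits = whole + frac.padEnd(SCALE_DIGITS, '0');
  const value = BigInt(digits); // throws SyntaxError on non-numeric input
  return negative ? -value : value;
}

console.log(toFixed8('98500.12')); // 9850012000000n
```

&lt;p&gt;Rejecting inputs with more than 8 decimals, rather than rounding them, is a choice made for this sketch only.&lt;/p&gt;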

&lt;h2&gt;
  
  
  Stage 3: Snapshot Synchronization
&lt;/h2&gt;

&lt;p&gt;Every exchange uses a variant of the same pattern: subscribe to a WebSocket stream of incremental updates, fetch a REST snapshot to establish a baseline, then apply only those incremental updates whose sequence numbers come after the snapshot.&lt;/p&gt;

&lt;p&gt;The devil is in the details. Binance puts update IDs on both snapshots and deltas: the snapshot carries &lt;code&gt;lastUpdateId&lt;/code&gt;, and each delta carries a first and a final update ID (&lt;code&gt;U&lt;/code&gt; and &lt;code&gt;u&lt;/code&gt;). You buffer deltas until the snapshot arrives, discard any with &lt;code&gt;u &amp;lt;= snapshot.lastUpdateId&lt;/code&gt;, and apply the rest in order. If a delta does not start where the previous one ended (&lt;code&gt;U != previous u + 1&lt;/code&gt;), you have a gap and need to re-snapshot. OKX uses a &lt;code&gt;checksum&lt;/code&gt; field you can validate against. Coinbase has a completely different sequencing model.&lt;/p&gt;

&lt;p&gt;Each parser maintains per-symbol sync state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;SymbolState&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;snapshot_synced&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;needs_resnapshot&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;uint64_t&lt;/span&gt; &lt;span class="n"&gt;seq_last_applied&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;PendingUpdate&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pending_updates&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the parser detects a gap or stale data, it sets &lt;code&gt;needs_resnapshot = true&lt;/code&gt;. The handler's main loop polls for this via &lt;code&gt;popResnapshot()&lt;/code&gt; and triggers a new REST snapshot fetch. Until the snapshot arrives and sync is re-established, all incremental updates for that symbol are silently dropped. This is a deliberate design choice: we would rather show stale data for a fraction of a second than apply updates to a book that is out of sync, which would produce silently wrong prices.&lt;/p&gt;
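
&lt;p&gt;The buffer, discard, apply, and re-snapshot rules above can be sketched as a small state machine. This is an illustration in JavaScript, not the production C++ parser: the function names, the delta shape &lt;code&gt;{ firstId, lastId, levels }&lt;/code&gt;, and the &lt;code&gt;applyLevels&lt;/code&gt; callback are all assumptions. An exchange with a single sequence number per delta fits the same shape with &lt;code&gt;firstId&lt;/code&gt; equal to &lt;code&gt;lastId&lt;/code&gt;.&lt;/p&gt;

```javascript
// Hypothetical per-symbol state, mirroring the SymbolState fields above.
function newSymbolState() {
  return {
    snapshotSynced: false,
    needsResnapshot: false,
    seqLastApplied: 0,
    pendingUpdates: [],
  };
}

// Apply one delta if it continues the sequence; flag a gap otherwise.
// A delta is assumed to look like { firstId, lastId, levels }.
function applyInOrder(state, delta, applyLevels) {
  if (state.seqLastApplied >= delta.lastId) return; // already covered
  if (delta.firstId > state.seqLastApplied + 1) {
    // Gap: stop trusting the book until a fresh snapshot arrives.
    state.snapshotSynced = false;
    state.needsResnapshot = true;
    return;
  }
  applyLevels(delta.levels);
  state.seqLastApplied = delta.lastId;
}

function onDelta(state, delta, applyLevels) {
  if (state.snapshotSynced) {
    applyInOrder(state, delta, applyLevels);
  } else if (!state.needsResnapshot) {
    state.pendingUpdates.push(delta); // initial sync: buffer until baseline
  }
  // While a re-snapshot is outstanding, the delta is dropped.
}

function onSnapshot(state, snapshotSeq, applyLevels) {
  state.seqLastApplied = snapshotSeq;
  state.snapshotSynced = true;
  state.needsResnapshot = false;
  const buffered = state.pendingUpdates;
  state.pendingUpdates = [];
  for (const delta of buffered) {
    if (!state.snapshotSynced) break; // a gap was found while replaying
    applyInOrder(state, delta, applyLevels);
  }
}
```

&lt;p&gt;In this sketch the caller is expected to poll &lt;code&gt;needsResnapshot&lt;/code&gt; and fetch a new snapshot, which plays the role the article gives to &lt;code&gt;popResnapshot()&lt;/code&gt;.&lt;/p&gt;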

&lt;h2&gt;
  
  
  Stage 4: The Order Book
&lt;/h2&gt;

&lt;p&gt;The book (&lt;code&gt;mdf_book.h&lt;/code&gt;) stores price levels in a sorted linked list per side (bid/ask). When a parser calls &lt;code&gt;levelAdd&lt;/code&gt;, the book finds or inserts the price level, updates its quantity, and calls a virtual &lt;code&gt;priceLevelChanged()&lt;/code&gt; callback. When &lt;code&gt;levelDelete&lt;/code&gt; is called (quantity goes to zero), the level is removed from the list and the same callback fires.&lt;/p&gt;

&lt;p&gt;The linked list uses a slab allocator (&lt;code&gt;SlabbedVector&lt;/code&gt;) rather than &lt;code&gt;std::vector&lt;/code&gt; to avoid pointer invalidation on growth. Slabs are allocated in fixed-size chunks (128 elements) and never freed until the container is destroyed. This gives us O(1) allocation, zero reallocation copies, and stable pointers.&lt;/p&gt;
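
&lt;p&gt;To make the &lt;code&gt;levelAdd&lt;/code&gt; / &lt;code&gt;levelDelete&lt;/code&gt; behaviour concrete, here is a hedged sketch of one book side as a sorted linked list. It is JavaScript rather than the C++ of &lt;code&gt;mdf_book.h&lt;/code&gt;, the class and helper names are invented for illustration, and it allocates plain objects instead of using a slab allocator. The &lt;code&gt;onChange&lt;/code&gt; callback stands in for &lt;code&gt;priceLevelChanged()&lt;/code&gt;.&lt;/p&gt;

```javascript
// One side of a book as a sorted singly linked list, best price first.
// Names are illustrative, not the article's mdf_book.h API.
class BookSide {
  constructor(isBid, onChange) {
    this.isBid = isBid;       // bids sort high-to-low, asks low-to-high
    this.onChange = onChange; // stand-in for priceLevelChanged()
    this.head = null;
  }

  // True when price a sorts strictly ahead of price b on this side.
  ahead(a, b) {
    return this.isBid ? a > b : b > a;
  }

  // Walk to the first node that does not sort ahead of the given price.
  find(price) {
    let prev = null;
    let node = this.head;
    while (node !== null) {
      if (!this.ahead(node.price, price)) break;
      prev = node;
      node = node.next;
    }
    return { prev, node };
  }

  levelAdd(price, qty) {
    const { prev, node } = this.find(price);
    const exists = node !== null ? node.price === price : false;
    if (exists) {
      node.qty = qty; // existing level: update quantity in place
    } else {
      const fresh = { price, qty, next: node };
      if (prev === null) this.head = fresh;
      else prev.next = fresh;
    }
    this.onChange(price, qty);
  }

  levelDelete(price) {
    const { prev, node } = this.find(price);
    const exists = node !== null ? node.price === price : false;
    if (!exists) return;
    if (prev === null) this.head = node.next;
    else prev.next = node.next;
    this.onChange(price, 0);
  }

  // Prices in list order, for inspection.
  prices() {
    const out = [];
    for (let n = this.head; n !== null; n = n.next) out.push(n.price);
    return out;
  }
}
```

&lt;p&gt;Insertion is a linear walk from the best price, which is usually short because most updates land near the top of the book.&lt;/p&gt;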

&lt;h2&gt;
  
  
  Stage 5: Shared-Memory Ring Buffers
&lt;/h2&gt;

&lt;p&gt;This is where the latency story gets interesting. Each handler writes to a shared-memory ring buffer mapped at &lt;code&gt;/dev/shm/&amp;lt;exchange&amp;gt;_response&lt;/code&gt;. The ring is a single-producer, single-consumer (SPSC) lock-free queue implemented with two cache-aligned atomic counters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;Offset&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;Offset&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="n"&gt;atomic&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;   &lt;span class="c1"&gt;// reader position  (CACHE_ALIGNED)&lt;/span&gt;
&lt;span class="n"&gt;Offset&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;atomic&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;   &lt;span class="c1"&gt;// writer position   (CACHE_ALIGNED)&lt;/span&gt;
&lt;span class="n"&gt;Offset&lt;/span&gt; &lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="o"&gt;+:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;variable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;length&lt;/span&gt; &lt;span class="n"&gt;MDFMsg&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reader and writer positions sit on separate 64-byte-aligned cache lines to eliminate false sharing. The writer publishes its new position with &lt;code&gt;store(release)&lt;/code&gt;, and the reader observes it with &lt;code&gt;load(acquire)&lt;/code&gt;. There are no locks and no syscalls in the hot path. A Linux futex is used only when the reader has no data and wants to sleep rather than spin.&lt;/p&gt;
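
&lt;p&gt;The two-counter scheme can be sketched in a few lines. This is a simplified JavaScript illustration under stated assumptions, not the production ring: slots are fixed-size 32-bit integers rather than variable-length &lt;code&gt;MDFMsg&lt;/code&gt; records, the counters are 32-bit, the offsets only mimic the cache-line spacing, and JavaScript &lt;code&gt;Atomics&lt;/code&gt; are sequentially consistent, which is stronger than the release/acquire pair described above. All names are hypothetical.&lt;/p&gt;

```javascript
// Minimal SPSC ring over a SharedArrayBuffer. Sizes and offsets are
// assumptions for illustration, not the article's shared-memory layout.
const CAPACITY = 8; // number of slots
const R = 0;        // Int32 index of the reader counter (byte offset 0)
const W = 16;       // Int32 index of the writer counter (byte offset 64)
const DATA = 32;    // Int32 index of the first slot (byte offset 128)

function createRing() {
  return new Int32Array(new SharedArrayBuffer((DATA + CAPACITY) * 4));
}

// Producer side: returns false when the ring is full.
function tryWrite(ring, value) {
  const w = Atomics.load(ring, W);
  const r = Atomics.load(ring, R);
  if (w - r === CAPACITY) return false;
  ring[DATA + (w % CAPACITY)] = value; // write the payload first
  Atomics.store(ring, W, w + 1);       // then publish it to the reader
  return true;
}

// Consumer side: returns undefined when the ring is empty.
function tryRead(ring) {
  const r = Atomics.load(ring, R);
  const w = Atomics.load(ring, W);
  if (r === w) return undefined;
  const value = ring[DATA + (r % CAPACITY)];
  Atomics.store(ring, R, r + 1);       // hand the slot back to the producer
  return value;
}
```

&lt;p&gt;Only the producer ever stores to the writer counter and only the consumer stores to the reader counter, which is what lets the queue work without a lock. A long-running ring would need 64-bit counters so they never wrap.&lt;/p&gt;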

&lt;p&gt;Messages are variable-length and written directly as packed C structs. A price level change is 48 bytes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;MDFPriceLevelChangeMsg&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;uint16_t&lt;/span&gt;  &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;// message size&lt;/span&gt;
    &lt;span class="kt"&gt;uint8_t&lt;/span&gt;   &lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;// 42&lt;/span&gt;
    &lt;span class="n"&gt;secid_t&lt;/span&gt;   &lt;span class="n"&gt;secid&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;      &lt;span class="c1"&gt;// symbol ID&lt;/span&gt;
    &lt;span class="n"&gt;side_t&lt;/span&gt;    &lt;span class="n"&gt;side&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;// BID=0, ASK=1&lt;/span&gt;
    &lt;span class="n"&gt;price_t&lt;/span&gt;   &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;      &lt;span class="c1"&gt;// fixed-point, 8 decimals&lt;/span&gt;
    &lt;span class="kt"&gt;int64_t&lt;/span&gt;   &lt;span class="n"&gt;shares&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;// quantity&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt;       &lt;span class="n"&gt;num_orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// order count at level&lt;/span&gt;
    &lt;span class="n"&gt;timestamp_t&lt;/span&gt; &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// nanoseconds&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No serialization, no deserialization. The &lt;code&gt;mdf_server&lt;/code&gt; reads the struct directly out of shared memory: the bytes written by the handler are the exact layout read by the server, so there is no encode or decode step on either side.&lt;/p&gt;

&lt;p&gt;The handler also batches writes to amortize the cost of the atomic store. It accumulates messages in a local buffer and flushes to the shared-memory ring when a threshold is reached or the main loop goes idle.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 6: mdf_server (Aggregator)
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;mdf_server&lt;/code&gt; process attaches to all 20 handler ring buffers and runs a tight poll loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for each ring:
    while ring has data:
        read MDFMsg from ring
        route to subscribed clients via TCP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It maintains a subscription table mapping symbol IDs to connected clients. When a web dashboard or gateway subscribes to &lt;code&gt;"binance:BTCUSDT"&lt;/code&gt;, the server writes a subscription request into the handler's &lt;em&gt;request&lt;/em&gt; ring (&lt;code&gt;/dev/shm/binance_request&lt;/code&gt;). The handler picks it up, fetches a snapshot, builds the initial book, and writes a full &lt;code&gt;MDFRefreshMsg&lt;/code&gt; (containing all bid and ask levels) back through the response ring. From that point on, incremental updates flow automatically.&lt;/p&gt;

&lt;p&gt;The server also handles heartbeats, connection management, and a subscription protocol that lets clients dynamically add and remove symbols.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 7: WebSocket Gateway
&lt;/h2&gt;

&lt;p&gt;The gateway (&lt;code&gt;mdf_gateway.cpp&lt;/code&gt;) connects to &lt;code&gt;mdf_server&lt;/code&gt; as an internal TCP client, maintains its own in-memory copy of every book it subscribes to, and serves external clients over WebSocket with JSON payloads. It supports per-exchange subscriptions, consolidated cross-exchange views, and top-of-book snapshots.&lt;/p&gt;

&lt;p&gt;The gateway includes an embedded HTML test page, so you can point a browser at it and immediately see live order books rendered with a cyberpunk-themed dashboard. But more practically, you can connect with any WebSocket client and get structured JSON updates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance Characteristics
&lt;/h2&gt;

&lt;p&gt;The pipeline achieves sub-millisecond end-to-end latency from exchange WebSocket receipt to client delivery. Here is where the time goes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SSL read + WebSocket decode&lt;/strong&gt;: ~50-100us&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;simdjson parse + book update&lt;/strong&gt;: ~10-30us&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SHM ring write + read&lt;/strong&gt;: ~1-5us&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TCP send to gateway/viewer&lt;/strong&gt;: ~50-200us&lt;/li&gt;
&lt;/ul&gt;

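&lt;p&gt;Summed, the stages above come to roughly 110-335us, which leaves headroom inside the one-millisecond budget. The key design decisions that keep latency low:&lt;/p&gt;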

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No serialization layer.&lt;/strong&gt; Messages are packed C structs written directly to shared memory and read directly by the consumer. No protobuf, no flatbuffers, no JSON encoding between internal components.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SPSC lock-free rings.&lt;/strong&gt; The only synchronization primitive in the hot path is a pair of atomic load/store operations on cache-aligned counters. No mutexes, no condition variables.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Slab allocation.&lt;/strong&gt; The order book never calls &lt;code&gt;malloc&lt;/code&gt; or &lt;code&gt;free&lt;/code&gt; in the hot path. Price levels are allocated from pre-allocated slabs that grow but never shrink.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fixed-point arithmetic.&lt;/strong&gt; All prices and quantities are 64-bit integers. No floating-point comparison, no rounding issues, no &lt;code&gt;epsilon&lt;/code&gt; checks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Per-exchange process isolation.&lt;/strong&gt; Each handler is a separate OS process. A crash or hang in the Kraken parser does not affect Binance. The &lt;code&gt;mdf_server&lt;/code&gt; simply stops seeing updates on that ring until the watchdog restarts the handler.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  20 Exchanges, One API
&lt;/h2&gt;

&lt;p&gt;The system currently normalizes data from: Ascendex, Binance, BingX, Bitfinex, Bitget, Bitmart, Bybit, Coinbase, CoinEx, Crypto.com, Gate.io, Gemini, HTX, Kraken, KuCoin, LBank, MEXC, OKX, Phemex, and Upbit. Each required writing a dedicated parser, figuring out its snapshot sync protocol, handling its compression scheme, and mapping its symbol naming convention to our normalized format.&lt;/p&gt;

&lt;p&gt;Adding a new exchange typically takes a day of work: study the WebSocket API docs, write the parser class, add snapshot sync logic, test against captures, and deploy. The driver, book, ring buffer, and distribution layers are all reusable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;The WebSocket API is free and requires no authentication. Connect and subscribe to any symbol across any supported exchange:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ws&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;WebSocket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;wss://api.microversesystems.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onopen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Subscribe to BTC books: BTC/USDT on Binance, BTC/USD on Coinbase&lt;/span&gt;
  &lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;op&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;subscribe&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;symbols&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;binance:BTCUSDT&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;coinbase:BTC-USD&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onmessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;book&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exchange&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;symbol&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`  Best bid: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bids&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="s2"&gt; @ &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bids&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`  Best ask: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;asks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="s2"&gt; @ &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;asks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see live order books from all 20 exchanges on the &lt;a href="https://data.microversesystems.com" rel="noopener noreferrer"&gt;dashboard&lt;/a&gt;, read the &lt;a href="https://microversesystems.com/docs" rel="noopener noreferrer"&gt;API documentation&lt;/a&gt;, or learn more at &lt;a href="https://microversesystems.com" rel="noopener noreferrer"&gt;microversesystems.com&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Building a low-latency market data feed is not about any single optimization. It is about eliminating unnecessary work at every stage: no serialization overhead, no lock contention, no memory allocation in the hot path, no floating-point arithmetic. Each decision compounds.&lt;/p&gt;

&lt;p&gt;The hardest part was not the C++ or the performance work. It was the exchange integration: 20 different WebSocket APIs, 20 different snapshot sync protocols, 20 different ways of naming &lt;code&gt;BTC/USDT&lt;/code&gt;. That is the real engineering work, and it is the reason we built this as a service rather than a library. You should not have to reverse-engineer Phemex's sequence number semantics just to get a clean order book.&lt;/p&gt;

&lt;p&gt;If you are building trading systems, analytics, or dashboards that need real-time crypto data, give the API a try. It is free, it is fast, and it covers 20 exchanges with a single WebSocket connection.&lt;/p&gt;

</description>
      <category>cryptocurrency</category>
      <category>marketdata</category>
      <category>l2</category>
      <category>api</category>
    </item>
    <item>
      <title>How We Built a Sub-Millisecond Crypto Feed in C++</title>
      <dc:creator>hpc group</dc:creator>
      <pubDate>Thu, 02 Apr 2026 21:13:22 +0000</pubDate>
      <link>https://dev.to/hpc_group_b579dc28b930e08/how-we-built-a-sub-millisecond-crypto-feed-in-c-57ml</link>
      <guid>https://dev.to/hpc_group_b579dc28b930e08/how-we-built-a-sub-millisecond-crypto-feed-in-c-57ml</guid>
      <description>&lt;p&gt;Most crypto market data APIs give you top-of-book prices with 100ms+ latency. We wanted full L2 order books from 21 exchanges, all normalized into a single WebSocket stream, at sub-millisecond speed. So we built it.&lt;/p&gt;

&lt;p&gt;This post covers the core engineering decisions behind &lt;a href="https://microversesystems.com" rel="noopener noreferrer"&gt;Microverse Systems&lt;/a&gt; — a free, real-time order book API that aggregates depth-of-market data across major crypto exchanges.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;If you're building a trading bot, an arbitrage scanner, or even a simple price dashboard, you hit the same wall: every exchange has its own WebSocket protocol, its own message format, its own rate limits. Binance sends JSON. Bybit sends JSON but structures it differently. Some exchanges batch updates, others stream individual changes.&lt;/p&gt;

&lt;p&gt;Normalizing all of this in Python or Node means you're spending more time parsing messages than actually using the data. And if latency matters to your strategy, the language overhead alone puts you at a disadvantage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why C++
&lt;/h2&gt;

&lt;p&gt;We went with C++ for the core feed handler, not because we enjoy debugging segfaults, but because it was the only way to hit our latency targets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero-copy message parsing&lt;/strong&gt;: Incoming WebSocket frames are parsed in-place using pointer arithmetic rather than deserializing into intermediate objects. This avoids heap allocations on the hot path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lock-free order book structures&lt;/strong&gt;: Each exchange's order book is maintained in a lock-free data structure that allows readers (subscriber threads) to access snapshots without blocking the writer (the feed handler thread).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kernel bypass networking&lt;/strong&gt;: On our production boxes, we use DPDK to bypass the kernel's TCP/IP stack entirely. This shaves off ~15 microseconds per packet compared to standard socket reads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is internal tick-to-publish latency under 50 microseconds for most exchanges. The bottleneck is almost always the exchange's own WebSocket server, not our processing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Exchange WS Feeds ──► C++ Feed Handlers ──► Normalized Book Builder
                                                    │
                                                    ▼
                                            Snapshot Cache (shared memory)
                                                    │
                                            ┌───────┴───────┐
                                            ▼               ▼
                                    WebSocket Gateway   REST API
                                    (user-facing)       (historical)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each exchange gets its own feed handler process. These are independent — if Bybit's feed dies, it doesn't take down Binance. The handlers write normalized book updates into a shared-memory ring buffer that the WebSocket gateway reads from.&lt;/p&gt;

&lt;p&gt;The gateway fans out to subscribers. When a new client connects and requests, say, BTC/USDT on Binance, it gets an immediate full-depth snapshot from the cache, then a stream of incremental updates.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Normalization Layer
&lt;/h2&gt;

&lt;p&gt;This is where most of the complexity lives. Every exchange represents order books slightly differently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Binance&lt;/strong&gt; sends an initial snapshot + diff updates with &lt;code&gt;firstUpdateId&lt;/code&gt; / &lt;code&gt;lastUpdateId&lt;/code&gt; for sequencing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bybit&lt;/strong&gt; sends periodic snapshots + delta updates with a sequence number&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OKX&lt;/strong&gt; batches multiple instruments in a single message with checksums&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kraken&lt;/strong&gt; uses a completely different depth model with &lt;code&gt;republish&lt;/code&gt; flags&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our normalization layer maintains a state machine per exchange per instrument. It handles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Initial sync (requesting a snapshot, buffering diffs until the snapshot arrives)&lt;/li&gt;
&lt;li&gt;Sequence validation (detecting gaps and re-syncing)&lt;/li&gt;
&lt;li&gt;Cross-exchange normalization (converting all prices and quantities to the same decimal format)&lt;/li&gt;
&lt;/ol&gt;
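
&lt;p&gt;The first two steps can be sketched as a small state machine. This is a generic simplification, assuming each diff covers a contiguous range of update IDs in the style of Binance's &lt;code&gt;firstUpdateId&lt;/code&gt; / &lt;code&gt;lastUpdateId&lt;/code&gt;. The &lt;code&gt;Diff&lt;/code&gt; and &lt;code&gt;BookSync&lt;/code&gt; names are hypothetical, the price levels a real diff carries are omitted, and each exchange's exact sequencing rules differ in the details.&lt;/p&gt;

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical diff header: this message covers update ids [first_id, last_id].
struct Diff {
    std::uint64_t first_id;
    std::uint64_t last_id;
};

enum class SyncState { AwaitingSnapshot, Synced };

class BookSync {
public:
    // Feed every streamed diff here. Returns true if the caller should
    // apply the diff to the book right now.
    bool on_diff(const Diff& d) {
        assert(d.first_id <= d.last_id);
        if (state_ == SyncState::AwaitingSnapshot) {
            pending_.push_back(d);              // buffer until the snapshot arrives
            return false;
        }
        if (d.last_id <= last_applied_) return false;  // already covered, skip
        if (d.first_id > last_applied_ + 1) {          // gap: ids were missed
            state_ = SyncState::AwaitingSnapshot;      // caller must re-request
            pending_.clear();
            pending_.push_back(d);
            return false;
        }
        last_applied_ = d.last_id;
        return true;
    }

    // Call when a snapshot arrives. Returns the buffered diffs that must be
    // replayed on top of the snapshot, in order.
    std::vector<Diff> on_snapshot(std::uint64_t snapshot_last_id) {
        last_applied_ = snapshot_last_id;
        state_ = SyncState::Synced;
        std::deque<Diff> buffered;
        buffered.swap(pending_);
        std::vector<Diff> to_apply;
        for (const Diff& d : buffered) {
            if (on_diff(d)) to_apply.push_back(d);
        }
        return to_apply;
    }

    SyncState state() const { return state_; }

private:
    SyncState state_ = SyncState::AwaitingSnapshot;
    std::uint64_t last_applied_ = 0;
    std::deque<Diff> pending_;
};
```

&lt;p&gt;If the snapshot turns out to be older than the first buffered diff, the replay itself hits a gap and the state drops back to &lt;code&gt;AwaitingSnapshot&lt;/code&gt;, so the caller should check &lt;code&gt;state()&lt;/code&gt; after &lt;code&gt;on_snapshot&lt;/code&gt; and request a fresh snapshot if needed.&lt;/p&gt;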

&lt;p&gt;We checksum the book state after every update and compare it against exchange-provided checksums where available (OKX, Kraken). If there's a mismatch, we force a full re-sync.&lt;/p&gt;
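
&lt;p&gt;A check of this kind might look like the sketch below. The CRC-32 routine is the standard IEEE variant; everything else is an assumption for illustration. In particular, the &lt;code&gt;checksum_input&lt;/code&gt; string layout (bid and ask levels interleaved as &lt;code&gt;price:qty&lt;/code&gt;) is hypothetical, since each exchange defines its own input format, depth, and number formatting in its documentation.&lt;/p&gt;

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
std::uint32_t crc32(const std::string& data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; xor in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Price and quantity kept as the exact strings the exchange sent, because
// checksums are computed over the textual representation.
using Level = std::pair<std::string, std::string>;

// Hypothetical layout: "bid1:qty:ask1:qty:bid2:qty:..." over the top `depth`
// levels of each side. Check the exchange docs for the real format.
std::string checksum_input(const std::vector<Level>& bids,
                           const std::vector<Level>& asks,
                           std::size_t depth) {
    std::string out;
    auto append = [&out](const Level& level) {
        if (!out.empty()) out += ':';
        out += level.first;
        out += ':';
        out += level.second;
    };
    for (std::size_t i = 0; i < depth; ++i) {
        if (i < bids.size()) append(bids[i]);
        if (i < asks.size()) append(asks[i]);
    }
    return out;
}

// True when the local book agrees with the exchange-provided checksum.
// A false result is the signal to force a full re-sync.
bool book_matches(const std::vector<Level>& bids,
                  const std::vector<Level>& asks,
                  std::size_t depth,
                  std::uint32_t expected) {
    return crc32(checksum_input(bids, asks, depth)) == expected;
}
```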

&lt;h2&gt;
  
  
  What We Ship to Users
&lt;/h2&gt;

&lt;p&gt;The API is intentionally simple. Connect via WebSocket, send a subscribe message:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"subscribe"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"exchange"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"binance"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"symbol"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"BTC/USDT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"depth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get back a full snapshot, then a stream of incremental updates. Every exchange uses the same message format, so there is no need to learn 21 different APIs.&lt;/p&gt;
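
&lt;p&gt;On the consuming side, a snapshot followed by incremental updates is usually folded into a local book. The sketch below shows one way to do that. It assumes prices and quantities have already been parsed into fixed-point integers and that an update with quantity zero removes the level, which is a common convention but not something this post specifies. The &lt;code&gt;LocalBook&lt;/code&gt; and &lt;code&gt;Side&lt;/code&gt; names are hypothetical.&lt;/p&gt;

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>

enum class Side { Bid, Ask };

class LocalBook {
public:
    // Bids sorted high to low, asks low to high, so begin() is the best level.
    using Bids = std::map<std::int64_t, std::int64_t, std::greater<std::int64_t>>;
    using Asks = std::map<std::int64_t, std::int64_t>;

    // Call before loading a snapshot so stale levels do not survive.
    void clear() {
        bids_.clear();
        asks_.clear();
    }

    // Used for both snapshot levels and incremental updates.
    // Assumption: qty == 0 means "remove this price level".
    void apply(Side side, std::int64_t price, std::int64_t qty) {
        assert(qty >= 0);
        if (side == Side::Bid) set_level(bids_, price, qty);
        else                   set_level(asks_, price, qty);
    }

    const Bids& bids() const { return bids_; }
    const Asks& asks() const { return asks_; }

private:
    template <class Levels>
    static void set_level(Levels& levels, std::int64_t price, std::int64_t qty) {
        if (qty == 0) levels.erase(price);
        else          levels[price] = qty;
    }

    Bids bids_;
    Asks asks_;
};
```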

&lt;p&gt;No API key required. No rate limits on the WebSocket stream. We want this to be the easiest way to get institutional-grade market data without paying institutional-grade prices (it's free).&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Shared memory is underrated.&lt;/strong&gt; We initially tried passing data between the feed handlers and the gateway over Unix sockets. Switching to &lt;code&gt;mmap&lt;/code&gt;-backed ring buffers cut our internal latency by 10x and eliminated a whole class of backpressure issues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exchange WebSocket connections are fragile.&lt;/strong&gt; We've seen Binance silently stop sending updates without closing the connection. We now have heartbeat monitors on every feed that force a reconnect if no message arrives within 2x the expected interval.&lt;/p&gt;
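
&lt;p&gt;The heartbeat rule is simple enough to sketch in a few lines. This is an illustrative version, not the production monitor: the &lt;code&gt;FeedWatchdog&lt;/code&gt; name is hypothetical, and the caller passes in the current time so the logic stays independent of any particular event loop.&lt;/p&gt;

```cpp
#include <cassert>
#include <chrono>

class FeedWatchdog {
public:
    using Clock = std::chrono::steady_clock;  // monotonic, immune to wall-clock jumps

    // expected_interval: how often the feed normally delivers a message.
    FeedWatchdog(std::chrono::milliseconds expected_interval, Clock::time_point now)
        : timeout_(2 * expected_interval), last_message_(now) {
        assert(expected_interval.count() > 0);
    }

    // Call for every message received on the feed, including pings.
    void on_message(Clock::time_point now) { last_message_ = now; }

    // True once more than 2x the expected interval has passed in silence.
    bool should_reconnect(Clock::time_point now) const {
        return now - last_message_ > timeout_;
    }

private:
    std::chrono::milliseconds timeout_;
    Clock::time_point last_message_;
};
```

&lt;p&gt;A timer in the feed handler would call &lt;code&gt;should_reconnect(Clock::now())&lt;/code&gt; periodically and tear down the connection when it returns &lt;code&gt;true&lt;/code&gt;, which catches the silent-stall case where the socket never reports a close.&lt;/p&gt;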

&lt;p&gt;&lt;strong&gt;Don't trust exchange timestamps.&lt;/strong&gt; Some exchanges report timestamps in seconds, some in milliseconds, some with timezone offsets, some without. We stamp everything with our own receive time and treat exchange timestamps as advisory.&lt;/p&gt;
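
&lt;p&gt;One way to keep both timestamps side by side is sketched below. The unit detection is a magnitude heuristic and an assumption of this example, not a description of the production code: it guesses seconds, milliseconds, microseconds, or nanoseconds from the size of the raw epoch value. The &lt;code&gt;StampedUpdate&lt;/code&gt; and &lt;code&gt;to_epoch_nanos&lt;/code&gt; names are hypothetical, and string timestamps with timezone offsets are out of scope here.&lt;/p&gt;

```cpp
#include <cassert>
#include <cstdint>

// Guess the unit of a raw epoch timestamp from its magnitude and convert it
// to nanoseconds. The thresholds correspond to roughly the year 2128 in each
// unit, which also keeps every multiplication inside the int64 range.
std::int64_t to_epoch_nanos(std::int64_t raw) {
    assert(raw >= 0);
    if (raw < 5000000000LL)       return raw * 1000000000LL;  // seconds
    if (raw < 5000000000000LL)    return raw * 1000000LL;     // milliseconds
    if (raw < 5000000000000000LL) return raw * 1000LL;        // microseconds
    return raw;                                               // nanoseconds
}

struct StampedUpdate {
    std::int64_t recv_ns;      // our own receive time: the authoritative clock
    std::int64_t exchange_ns;  // exchange-reported time: advisory only
};

// recv_ns is taken by the caller at the moment the frame is read.
StampedUpdate stamp(std::int64_t raw_exchange_ts, std::int64_t recv_ns) {
    return StampedUpdate{recv_ns, to_epoch_nanos(raw_exchange_ts)};
}
```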

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;The API is live now at &lt;a href="https://microversesystems.com" rel="noopener noreferrer"&gt;microversesystems.com&lt;/a&gt;. The &lt;a href="https://microversesystems.com/docs" rel="noopener noreferrer"&gt;docs&lt;/a&gt; have code samples for Python, Node, and Rust. There's also a &lt;a href="https://data.microversesystems.com" rel="noopener noreferrer"&gt;live dashboard&lt;/a&gt; where you can see the order books updating in real time.&lt;/p&gt;

&lt;p&gt;If you're building anything that needs crypto market data — trading bots, analytics dashboards, academic research — give it a shot. We'd love feedback.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by &lt;a href="https://microversesystems.com" rel="noopener noreferrer"&gt;Microverse Systems&lt;/a&gt;. Questions? Drop a comment or open an issue on our &lt;a href="https://github.com/microversesystems" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>crypto</category>
      <category>websocket</category>
      <category>api</category>
    </item>
  </channel>
</rss>
