TL;DR: Crypto arbitrage windows on liquid pairs now close in under 100 ms. A REST polling loop typically takes 1–1.5 seconds round-trip. WebSocket delivers the same data in 20–100 ms. If you're still polling REST endpoints for orderbook data in 2026, you're missing the majority of opportunities — not because your strategy is wrong, but because your data plane is fundamentally too slow.
This post walks through the math, shows a benchmark I ran on a handful of major exchanges, and provides production-grade Python code for a WebSocket client that handles reconnects, heartbeats, and orderbook reconstruction.
1. The numbers that broke REST polling
When I started writing crypto arbitrage bots a few years ago, polling Binance's REST API every 500 ms was perfectly acceptable. Spreads were wide, arbitrage windows lasted multiple seconds, and the orderbook for BTCUSDT moved slowly enough that a half-second-old snapshot was still tradeable.
In 2026, the same approach doesn't work. Here are the numbers as they stand today:
| Metric | Value |
|---|---|
| Median crypto arbitrage window on liquid pairs | 30–80 ms |
| Window closes in under 100 ms | ~90% of cases |
| REST round-trip latency (request → response → JSON parse) | 1.0–1.5 seconds |
| WebSocket update delivery latency (push from exchange to client) | 20–100 ms |
The math is brutal. A 100 ms window cannot be caught by a 1500 ms poll: by the time your REST response arrives, the orderbook you're reading can be up to 15 window-lengths old (1500 ms / 100 ms). You're not "slow"; you're not even in the same temporal universe as the event you're trying to react to.
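To make that arithmetic concrete, here is a back-of-the-envelope sketch. It assumes a deliberately simple model of my own, not anything an exchange documents: windows open at random times relative to the poll clock, and a poll can only observe a window if its snapshot is taken while the window is open. The function names are illustrative.

```python
def windows_elapsed(latency_ms: float, window_ms: float) -> float:
    """How many window-lengths fit into one round trip."""
    return latency_ms / window_ms


def chance_snapshot_sees_window(poll_interval_ms: float, window_ms: float) -> float:
    """Probability that a snapshot is taken while a window is open.

    Simplifying assumption: snapshots are instantaneous and the window's
    start time is uniformly random relative to the poll clock.
    """
    return min(1.0, window_ms / poll_interval_ms)


print(windows_elapsed(1500, 100))             # 15.0
print(chance_snapshot_sees_window(500, 100))  # 0.2
```

Under this model a 500 ms poll samples only about one window in five, and even those are gone before the response is parsed, because the round trip alone outlasts them.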
2. Why REST is fundamentally slow
REST APIs over HTTPS carry overhead that adds up:
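- TCP handshake: three packets, which means one full round trip before any data can flow, typically 50–150 ms on intercontinental hops.
- TLS handshake: at least one more round trip (one with TLS 1.3, two with TLS 1.2), 30–100 ms.
- HTTP request/response: the actual data exchange.
- JSON parse: depending on payload size, 5–50 ms.
- Rate-limit budget: many exchanges cap REST at roughly 10–20 requests per second per IP (Binance, for one, meters by request weight rather than a flat count). Polling faster gets you banned.
Yes, modern clients use HTTP keep-alive to skip the TCP and TLS handshakes on most requests. But you still pay them periodically, whenever the connection is recycled. And rate limits are the real killer: even if you could parse responses in 1 ms, an exchange with a 20-requests-per-second cap holds you to that ceiling, which puts a 50 ms floor on your polling interval before any network latency is counted.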
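The overhead items above can be turned into a rough latency budget. This is a sketch under stated assumptions, not a measurement: it treats the TCP handshake as one round trip, TLS as one or two, the HTTP exchange as one more, and ignores server processing time. The function and parameter names are mine.

```python
def request_cost_ms(rtt_ms: float, parse_ms: float, reuse_connection: bool,
                    tls_round_trips: int = 1) -> float:
    """Rough latency budget for one REST call.

    Assumptions: TCP handshake = 1 round trip, TLS = `tls_round_trips`
    (1 for TLS 1.3, 2 for TLS 1.2), HTTP request/response = 1 round trip.
    """
    handshake_round_trips = 0 if reuse_connection else 1 + tls_round_trips
    return (handshake_round_trips + 1) * rtt_ms + parse_ms


def min_poll_interval_ms(max_requests_per_second: float) -> float:
    """Fastest polling interval a flat per-second rate limit allows."""
    return 1000.0 / max_requests_per_second


print(request_cost_ms(100, 10, reuse_connection=False))  # 310
print(request_cost_ms(100, 10, reuse_connection=True))   # 110
print(min_poll_interval_ms(20))                          # 50.0
```

With a 100 ms round trip, a cold connection costs roughly three times what a warm one does, and neither fits inside a 100 ms window.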
```python
# This is what every REST polling loop looks like.
# Every. Single. Iteration. Pays full round-trip cost.
import time

import requests


def poll_orderbook(url, interval_ms=500):
    while True:
        start = time.perf_counter()
        # A bare requests.get() opens a fresh connection on each call, so
        # every iteration also pays the TCP and TLS handshakes.
        response = requests.get(url, timeout=2)
        response.raise_for_status()
        book = response.json()
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"Got {len(book['bids'])} bids in {elapsed_ms:.0f} ms")
        # Sleep only for what is left of the polling interval.
        time.sleep(max(0, (interval_ms - elapsed_ms) / 1000))
```
Running this against https://api.binance.com/api/v3/depth?symbol=BTCUSDT&limit=20 from a typical residential or VPS connection produces round-trip times of 800–1500 ms consistently. Best case: maybe 600 ms from a co-located server. Still 6× too slow for a 100 ms window.
3. WebSocket: push, not poll
WebSocket inverts the model. Instead of the client asking "what's the orderbook now?" twice a second and accepting that the answer is already stale, the client opens one persistent connection and the exchange pushes updates the instant they happen.
Concretely:
- One TCP/TLS handshake at connection time. Amortised across thousands of messages.
- One subscription message declaring what streams you want.
- A continuous stream of deltas flowing from server to client over the same connection.
- No rate limit on inbound messages (the exchange controls the rate).
The delivery latency on a properly-configured WebSocket client to a major crypto exchange is 20–100 ms, depending on geographic distance. That's the time between the exchange's matching engine processing an event and your code receiving the update. There is no polling overhead because there is no polling.
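The subscription step in the list above can be sketched as follows. The message shape (method, params, id) follows Binance's documented SUBSCRIBE request as I understand it; other exchanges use different formats, and the helper name build_subscribe_message is my own.

```python
import json


def build_subscribe_message(streams: list[str], request_id: int = 1) -> str:
    """Serialise a Binance-style SUBSCRIBE request for the given stream names."""
    if not streams:
        raise ValueError("at least one stream name is required")
    return json.dumps({
        "method": "SUBSCRIBE",
        "params": [s.lower() for s in streams],  # stream names are lowercase
        "id": request_id,
    })


message = build_subscribe_message(["btcusdt@depth20@100ms", "ethusdt@depth20@100ms"])
print(message)
```

You would send this with `await ws.send(message)` on a connection opened to the base endpoint (wss://stream.binance.com:9443/ws), instead of baking a single stream name into the URL.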
Here's the bare-minimum Python client for Binance's depth stream:
```python
import asyncio
import json

import websockets


async def stream_depth(symbol="btcusdt"):
    uri = f"wss://stream.binance.com:9443/ws/{symbol}@depth20@100ms"
    async with websockets.connect(uri) as ws:
        async for raw in ws:
            update = json.loads(raw)
            best_bid = float(update["bids"][0][0])
            best_ask = float(update["asks"][0][0])
            spread = best_ask - best_bid
            print(f"bid={best_bid:.2f} ask={best_ask:.2f} spread={spread:.2f}")


asyncio.run(stream_depth())
```
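Run this and you'll get updates every 100 ms, which is the fastest tier for this stream: Binance documents its spot depth streams at 1000 ms by default, with @100ms as the faster option. Each update arrives with the current top 20 levels of the book. No polling. No rate-limit risk on the data path. The connection stays open until the network or the exchange drops it (Binance documents a 24-hour limit per connection), which is what the reconnect logic in section 6 is for.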
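The bare-minimum client indexes straight into bids[0] and asks[0], which raises on an empty side or on a message that isn't a depth update. A slightly more defensive parser might look like this. It is a sketch, assuming the payload shape used above (lists of [price, quantity] string pairs, best level first); the sample values are made up.

```python
from decimal import Decimal
from typing import Optional


def top_of_book(update: dict) -> Optional[tuple[Decimal, Decimal]]:
    """Return (best_bid, best_ask), or None if either side is missing or empty."""
    bids = update.get("bids") or []
    asks = update.get("asks") or []
    if not bids or not asks:
        return None
    # Each level is a [price, quantity] pair of strings; Decimal keeps the
    # exchange's exact decimal representation instead of rounding to a float.
    return Decimal(bids[0][0]), Decimal(asks[0][0])


sample = {
    "lastUpdateId": 160,
    "bids": [["64000.10", "0.5"], ["64000.00", "1.2"]],
    "asks": [["64000.50", "0.3"], ["64001.00", "2.0"]],
}
bid, ask = top_of_book(sample)
print(ask - bid)  # 0.40
```

Whether Decimal is worth its cost on a hot path is a judgment call; floats are faster, Decimal avoids surprises when comparing prices across venues.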
4. A simple benchmark
Here's a script that measures REST round-trip vs WebSocket inter-message arrival time for the same orderbook data. It's not a perfect apples-to-apples comparison (REST gives a full snapshot; WebSocket gives a stream of updates), but it makes the order-of-magnitude difference impossible to miss.
```python
import asyncio
import statistics
import time

import requests
import websockets


# ----------------------------------------------------------------
# REST: measure round-trip for orderbook snapshot
# ----------------------------------------------------------------
def benchmark_rest(url, samples=50):
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        # No Session is used, so each call includes a fresh TCP + TLS handshake.
        r = requests.get(url, timeout=5)
        r.json()
        elapsed_ms = (time.perf_counter() - start) * 1000
        latencies.append(elapsed_ms)
        time.sleep(0.1)  # respect rate limits
    return {
        "median_ms": round(statistics.median(latencies), 1),
        "p90_ms": round(statistics.quantiles(latencies, n=10)[8], 1),
        # 50 samples cannot resolve a true p99, so the maximum stands in for it.
        "p99_ms": round(max(latencies), 1),
    }


# ----------------------------------------------------------------
# WebSocket: measure time between pushed updates
# ----------------------------------------------------------------
async def benchmark_websocket(uri, samples=50):
    gaps = []
    async with websockets.connect(uri) as ws:
        last = time.perf_counter()
        for _ in range(samples + 1):  # +1 to discard the first
            await ws.recv()
            now = time.perf_counter()
            gaps.append((now - last) * 1000)
            last = now
    gaps = gaps[1:]  # discard first (it measures connect time, not a gap)
    return {
        "median_ms": round(statistics.median(gaps), 1),
        "p90_ms": round(statistics.quantiles(gaps, n=10)[8], 1),
        "p99_ms": round(max(gaps), 1),  # maximum, as above
    }


# ----------------------------------------------------------------
# Run both
# ----------------------------------------------------------------
if __name__ == "__main__":
    rest_url = "https://api.binance.com/api/v3/depth?symbol=BTCUSDT&limit=20"
    ws_uri = "wss://stream.binance.com:9443/ws/btcusdt@depth20@100ms"
    print("REST:", benchmark_rest(rest_url, samples=50))
    print("WebSocket:", asyncio.run(benchmark_websocket(ws_uri, samples=50)))
```
A representative run from a European VPS to Binance:
REST: {'median_ms': 920.4, 'p90_ms': 1180.5, 'p99_ms': 1485.2}
WebSocket: {'median_ms': 100.2, 'p90_ms': 105.1, 'p99_ms': 142.8}
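The WebSocket median is the throttle setting (@100ms), not the underlying delivery latency, which is lower. The REST median is genuine round-trip cost. From the figures above, that is a gap of about 9× on the median, 11× at p90, and 10× at the tail (the "p99" key actually holds the maximum of the 50 samples).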
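A note on the tail figure: the benchmark stores the maximum under the p99 key. With only 50 samples that is also what a nearest-rank percentile returns, as this small helper shows. The helper and the synthetic data are mine, for illustration only.

```python
import math


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


latencies = [float(i) for i in range(1, 51)]   # 50 synthetic samples: 1.0 .. 50.0
print(percentile_nearest_rank(latencies, 50))  # 25.0
print(percentile_nearest_rank(latencies, 90))  # 45.0
print(percentile_nearest_rank(latencies, 99))  # 50.0, i.e. the maximum
```

A tail percentile that means something needs a few hundred samples at least; raising the samples argument is the simple fix.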
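One caveat on going faster: on Binance spot, the plain btcusdt@depth stream is documented as a 1000 ms diff stream rather than an unthrottled one, so @100ms is already the fastest depth tier. For event-driven top-of-book updates the real-time option is btcusdt@bookTicker, whose inter-message gaps follow market activity instead of a throttle. On a busy pair that widens the gap to REST further.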
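Inter-message gaps say little about delivery latency itself. One rough way to estimate that is to compare the exchange's event timestamp with the local receive time. The sketch below assumes the payload carries an "E" field holding the event time in milliseconds since the epoch, as Binance's trade and diff-depth payloads do; not every stream includes one. It also assumes your clock is synchronised (NTP or better), otherwise the number is dominated by clock skew. The sample message is made up.

```python
import json
import time
from typing import Optional


def delivery_latency_ms(raw: str, recv_time_ms: Optional[float] = None,
                        clock_offset_ms: float = 0.0) -> Optional[float]:
    """Estimate exchange-to-client latency from a message's event timestamp.

    Returns None when the message has no "E" field. `clock_offset_ms` is
    (local clock - exchange clock), if you have measured it.
    """
    msg = json.loads(raw)
    event_time_ms = msg.get("E")
    if event_time_ms is None:
        return None
    if recv_time_ms is None:
        recv_time_ms = time.time() * 1000
    return recv_time_ms - clock_offset_ms - event_time_ms


sample = json.dumps({"e": "trade", "E": 1_700_000_000_000, "s": "BTCUSDT"})
print(delivery_latency_ms(sample, recv_time_ms=1_700_000_000_042))  # 42.0
```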
5. The architectural shift
Moving from REST polling to WebSocket isn't just changing a library — it changes the architecture of your bot.
Before (REST polling):
```
┌─────────────────┐      ┌──────────────────┐
│ Poll loop       │  ──  │ Strategy engine  │
│ every 500 ms    │      │ runs on snapshot │
└─────────────────┘      └──────────────────┘
```
A single thread asks for state on a timer, hands the snapshot to the strategy, repeats. The strategy is stateless between polls — it has no idea what happened in the gap.
After (WebSocket event-driven):
```
┌──────────────────┐      ┌──────────────────┐      ┌──────────────────┐
│ WS connection    │  ──  │ Local orderbook  │  ──  │ Strategy engine  │
│ pushes deltas    │      │ kept current     │      │ reacts to events │
└──────────────────┘      └──────────────────┘      └──────────────────┘
```
Now the client maintains a local replica of the orderbook, applying deltas as they arrive. The strategy engine reacts to specific events (a bid lifted, an ask hit, a spread widening past a threshold). State is continuous, not sampled.
This is more code. It's also the only way to react inside a 100 ms window.
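The "local orderbook kept current" box can be sketched in a few lines. This is a minimal illustration, assuming the common delta convention where each level carries an absolute quantity and a quantity of zero removes the level; it deliberately skips snapshot seeding and sequence checks. The class name and sample levels are mine.

```python
from decimal import Decimal


class LocalOrderBook:
    """Price-level book kept current by applying deltas."""

    def __init__(self):
        self.bids: dict[Decimal, Decimal] = {}  # price -> quantity
        self.asks: dict[Decimal, Decimal] = {}

    @staticmethod
    def _apply_side(side: dict, levels: list) -> None:
        for price, qty in levels:
            price, qty = Decimal(price), Decimal(qty)
            if qty == 0:
                side.pop(price, None)  # zero quantity removes the level
            else:
                side[price] = qty      # otherwise set the new absolute quantity

    def apply_delta(self, bids: list, asks: list) -> None:
        self._apply_side(self.bids, bids)
        self._apply_side(self.asks, asks)

    def best_bid(self):
        return max(self.bids) if self.bids else None

    def best_ask(self):
        return min(self.asks) if self.asks else None

    def spread(self):
        bid, ask = self.best_bid(), self.best_ask()
        return None if bid is None or ask is None else ask - bid


book = LocalOrderBook()
book.apply_delta(bids=[["100.0", "2"], ["99.5", "1"]], asks=[["100.5", "3"]])
book.apply_delta(bids=[["100.0", "0"]], asks=[])  # best bid is pulled
print(book.best_bid(), book.spread())             # 99.5 1.0
```

The max/min scans are linear in the number of levels, which is fine for a sketch; a sorted container would be the usual choice once the book gets deep.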
6. Production-grade WebSocket client
The bare-minimum example earlier works for a demo. For a real arbitrage bot, you need:
- Automatic reconnect on disconnect
- Heartbeat / ping-pong to detect dead connections faster than the OS will
- Sequence number validation to detect dropped messages (most exchanges include a sequence ID)
- Local orderbook state that applies deltas correctly
- Backoff on reconnects to avoid hammering the exchange after an outage
Here's a more robust skeleton (Binance-style stream, simplified):
```python
import asyncio
import json
import logging
import random

import websockets

log = logging.getLogger("ws_client")


class CryptoOrderbookClient:
    def __init__(self, uri, on_update):
        self.uri = uri
        self.on_update = on_update
        self._stop = False
        self._backoff_s = 1

    async def run(self):
        while not self._stop:
            try:
                async with websockets.connect(
                    self.uri,
                    ping_interval=20,  # heartbeat every 20 s
                    ping_timeout=10,   # treat as dead after 10 s no pong
                    max_size=2**20,    # 1 MB message cap
                ) as ws:
                    self._backoff_s = 1  # reset on successful connect
                    log.info("WS connected to %s", self.uri)
                    await self._consume(ws)
            except (websockets.ConnectionClosed, OSError) as e:
                log.warning("WS disconnected: %s; backing off %ss", e, self._backoff_s)
                await asyncio.sleep(self._backoff_s + random.uniform(0, 1))
                self._backoff_s = min(self._backoff_s * 2, 30)  # exponential up to 30s

    async def _consume(self, ws):
        async for raw in ws:
            try:
                msg = json.loads(raw)
                await self.on_update(msg)
            except json.JSONDecodeError:
                log.warning("non-JSON message dropped")
            except Exception:
                log.exception("handler crashed; continuing")

    def stop(self):
        self._stop = True


async def my_handler(msg):
    bid = float(msg["bids"][0][0])
    ask = float(msg["asks"][0][0])
    # ... your strategy logic ...
    print(f"bid={bid:.2f} ask={ask:.2f}")


client = CryptoOrderbookClient(
    "wss://stream.binance.com:9443/ws/btcusdt@depth20@100ms",
    my_handler,
)
asyncio.run(client.run())
```
Notes on what this gives you:
- ping_interval=20, ping_timeout=10 is the single most important pair of settings. Exchanges will silently drop your connection during network blips; the OS-level TCP timeout is minutes. Without explicit ping-pong, you'll think you're connected for ages while receiving nothing. With it, you detect the dead connection in ~30 s and reconnect.
- Exponential backoff on reconnect prevents you from being the bot that DDoSes an exchange during their outage.
- Catch all handler exceptions at the top level. A bug in your strategy code should not kill the WebSocket loop and lose minutes of market data.
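The backoff schedule in the skeleton (start at 1 s, double each time, cap at 30 s, add up to 1 s of jitter) is easier to reason about when pulled out into a generator. This is just a restatement of that logic as a sketch; the name backoff_delays and its parameters are mine.

```python
import random
from itertools import islice


def backoff_delays(base_s=1.0, cap_s=30.0, jitter_s=1.0, rng=None):
    """Yield reconnect delays: exponential growth up to a cap, plus random jitter."""
    rng = rng or random.Random()
    delay = base_s
    while True:
        yield delay + rng.uniform(0, jitter_s)
        delay = min(delay * 2, cap_s)


# With jitter switched off the underlying schedule is visible:
print([round(d) for d in islice(backoff_delays(jitter_s=0.0), 7)])
# [1, 2, 4, 8, 16, 30, 30]
```

The jitter matters more than it looks: without it, every client that lost its connection in the same outage retries at the same instants.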
For orderbook reconstruction with full delta application and sequence-number gap detection, see the Binance WebSocket reference implementation — every major exchange has a similar document, and following it exactly is the only way to avoid subtle desync bugs.
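The gap-detection part is small enough to sketch here. The version below assumes each update carries a first and a final update ID (Binance's diff-depth stream calls them U and u) and that a REST snapshot supplies the starting lastUpdateId; other exchanges use different fields and rules, so treat this as an illustration and follow the exchange's own document for the real thing. The class names are mine.

```python
class SequenceGap(Exception):
    """Raised when an update does not follow on from the previous one."""


class SequenceTracker:
    def __init__(self, snapshot_last_update_id: int):
        self.last_id = snapshot_last_update_id

    def accept(self, first_id: int, final_id: int) -> bool:
        """Return True if the update should be applied, False if it is stale.

        Raises SequenceGap if one or more updates were missed.
        """
        if final_id <= self.last_id:
            return False  # already covered by the state we hold
        if first_id > self.last_id + 1:
            raise SequenceGap(f"expected {self.last_id + 1}, got {first_id}")
        self.last_id = final_id
        return True


tracker = SequenceTracker(snapshot_last_update_id=100)
print(tracker.accept(95, 100))   # False: older than the snapshot
print(tracker.accept(99, 103))   # True: straddles the snapshot
print(tracker.accept(104, 106))  # True: continues from 103
try:
    tracker.accept(110, 112)     # 107..109 never arrived
except SequenceGap as exc:
    print("resync needed:", exc)
```

On a gap, the usual recovery is to discard the local book, fetch a fresh snapshot, and start tracking again from its update ID.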
7. What still needs REST
WebSocket replaces REST for market data. It does not replace REST for everything. Things that still belong on REST:
- Order placement and cancellation on most exchanges (some have WebSocket order entry; coverage is uneven).
- Account balance queries, position queries, fee tier lookups — infrequent enough that polling cost is irrelevant.
- Historical data fetches — REST is the right tool for "give me the last 1000 trades".
- One-shot administrative calls — withdrawals, API key management, etc.
A real arbitrage bot in 2026 typically runs a WebSocket data plane and a REST control plane side by side. Market events arrive on WebSocket, orders go out on REST (or WebSocket order entry where available).
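The side-by-side arrangement can be sketched with two asyncio tasks joined by a queue. Everything here is a stand-in: fake_feed replaces the WebSocket stream, fake_place_order replaces an HTTP call to an order endpoint, and the "signal when the spread is wide" rule is a placeholder, not a strategy.

```python
import asyncio


async def data_plane(events, signals: asyncio.Queue, min_spread: float) -> None:
    """Consume market events and enqueue a signal when the spread is wide enough."""
    async for event in events:
        if event["ask"] - event["bid"] >= min_spread:
            await signals.put({"side": "buy", "price": event["bid"]})
    await signals.put(None)  # sentinel: the feed has ended


async def control_plane(signals: asyncio.Queue, place_order) -> int:
    """Send one order per signal through the (REST-style) order callable."""
    sent = 0
    while (signal := await signals.get()) is not None:
        await place_order(signal)
        sent += 1
    return sent


async def fake_feed():
    # Stand-in for a WebSocket stream; the numbers are made up.
    for bid, ask in [(100.0, 100.1), (100.0, 100.6), (100.2, 100.9)]:
        yield {"bid": bid, "ask": ask}


async def fake_place_order(order):
    # Stand-in for an HTTP POST to the exchange's order endpoint.
    print("order sent:", order)


async def main() -> int:
    signals: asyncio.Queue = asyncio.Queue()
    feed_task = asyncio.create_task(data_plane(fake_feed(), signals, min_spread=0.5))
    sent = await control_plane(signals, fake_place_order)
    await feed_task
    return sent


orders_sent = asyncio.run(main())
```

The queue is the design choice worth keeping: a slow order call never blocks the market-data task, so the local view of the book stays current while an order is in flight.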
8. What happens to retail bots that don't make this transition
A polling-based crypto arbitrage bot in 2026 isn't broken — it just runs into a degraded version of the problem:
- Signals fire slower because the bot only sees market state 2× per second.
- Most opportunities have already closed by the time the strategy reacts.
- Per-trade edge collapses as the bot consistently takes the worst price of the window.
- Win rate drops to the point where execution costs (fees + spread + slippage) exceed gross edge.
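The last point is plain arithmetic, and it helps to see how little slippage it takes to flip the sign. The numbers below are hypothetical and chosen only to illustrate the mechanism; the function name and parameters are mine.

```python
def net_edge_bps(gross_edge_bps: float, fee_bps_per_leg: float,
                 spread_and_slippage_bps: float, legs: int = 2) -> float:
    """Edge left after costs, in basis points of notional."""
    return gross_edge_bps - legs * fee_bps_per_leg - spread_and_slippage_bps


# Hypothetical numbers, for illustration only.
print(net_edge_bps(gross_edge_bps=30, fee_bps_per_leg=10, spread_and_slippage_bps=2))   # 8
print(net_edge_bps(gross_edge_bps=30, fee_bps_per_leg=10, spread_and_slippage_bps=15))  # -5
```

Fees are fixed per trade; the slippage term is the one that grows with how late the fill is relative to the signal.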
The strategy logic might be perfect. The execution layer is what kills it.
This is the same dynamic that broke retail latency arbitrage on forex brokers a decade ago — except in crypto the resolution is happening over months, not years, because every major exchange now offers WebSocket and the technical bar is lower. The asymmetry will only get worse: traders running on WebSocket are pulling away from traders running on REST.
9. Where I came from on this
For context: I'm one of the developers behind BJF Trading Group's crypto arbitrage software. We migrated the entire market-data path from REST polling to WebSocket through late 2025 and Q1 2026. Internal measurements showed that signal-to-fill rate improved by roughly an order of magnitude on liquid pairs, and that strategies which had become marginal under polling (especially cross-exchange Hedge and intra-exchange Latency) were viable again under WebSocket.
We rolled the lessons into a focused product configuration — SharpTrader Crypto — built specifically around WebSocket-native execution for crypto exchanges, with Latency and Hedge strategy modules. It also integrates with US-accepting exchanges (Coinbase, Kraken, Gemini, Bitstamp), which mattered to us because most retail crypto arbitrage tools rely on Binance/Bybit and geo-block US residents.
If you're maintaining your own bot, this article covers what you need to make the transition yourself. If you'd rather skip the connection-management plumbing and use something off-the-shelf, that's what we built.
10. TL;DR for the impatient
- Crypto arbitrage windows close in under 100 ms on liquid pairs.
- REST polling takes 1–1.5 seconds round-trip. You will miss most opportunities.
- WebSocket pushes updates in 20–100 ms. You will catch most opportunities.
- Migration is not optional in 2026.
- The minimum reliable client needs: ping_interval, ping_timeout, exponential backoff, exception isolation, sequence-number validation. Skip any of these and the bot will silently lose minutes of market data when the connection blips.
- Keep REST for order placement, balances, history.
- The strategy logic in your bot is probably fine. The data plane is what's killing it.
Further reading
- Why latency arbitrage backtests don't survive in production — the broader execution-time gap problem, applied to forex but the same logic carries over.
- BEQI: Open-source toolkit to audit broker execution quality — five-dimension measurement of execution quality, originally for forex but the same five dimensions matter on crypto.
- Forex pairs trading and statistical arbitrage explained — pairs trading pillar; the leg-risk dynamics it describes apply directly to spot–futures pair trading on crypto.
If you want to compare notes on connection management or have a horror story about a dropped WebSocket that silently cost you a session of fills, drop a comment below — these stories are how everyone in this niche gets better at it.
