Purple Flea

Posted on • Originally published at purpleflea.com

gRPC vs REST for AI Trading Agents: Latency That Actually Matters

When you're building an AI trading agent, API latency isn't an abstract concern — it's the difference between filling an order at your target price and getting slipped. Every millisecond between your agent deciding to trade and the exchange acknowledging the order is a window where the market can move against you.

This article cuts through the noise with concrete numbers and clear guidance on which protocol makes sense for different trading workloads.

What We're Actually Comparing

REST over HTTP/1.1 is the default for most crypto APIs. JSON bodies, stateless requests, familiar tooling. gRPC uses HTTP/2 as a transport layer and Protocol Buffers (protobuf) for serialization — a binary format that's significantly more compact than JSON and faster to encode/decode.

The practical differences break down across three dimensions: serialization overhead, connection management, and streaming capability.

Serialization overhead

JSON is human-readable, which is convenient for debugging but wasteful on the wire. A typical order response from a trading API might serialize to 280–350 bytes of JSON. The same data in protobuf is typically 60–90 bytes — a 3–4x reduction.

More importantly, protobuf decoding in Python is roughly 5–10x faster than json.loads() for equivalent payloads. For an agent making 50–200 API calls per second, these CPU savings compound into meaningful latency reduction.
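
You can measure the JSON side of this yourself with a quick micro-benchmark. The payload below is a hypothetical order response (field names are illustrative, not any particular exchange's schema); the protobuf comparison requires generated stubs, so only the JSON cost is timed here:

```python
import json
import time

# A hypothetical order response, similar in shape to what a trading API
# might return. Field names are illustrative only.
order_json = json.dumps({
    "order_id": "ord_7f3a2b1c9d8e",
    "symbol": "BTC-USD",
    "side": "buy",
    "quantity": "0.25",
    "price": "64123.50",
    "status": "filled",
    "filled_at": "2024-05-01T12:00:00.123456Z",
})

print(f"JSON payload: {len(order_json.encode())} bytes")

# Time 1000 deserializations, mirroring the per-response decode cost
# an agent pays on every API call.
start = time.perf_counter()
for _ in range(1000):
    json.loads(order_json)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"1000 json.loads calls: {elapsed_ms:.1f}ms")
```

Run the same loop against a generated protobuf message's `ParseFromString` to see the gap on your own hardware.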

Connection management

HTTP/1.1 (classic REST) serves one request at a time per connection. Without client-side connection pooling, each request pays for fresh TCP and TLS handshakes, and under load that churn shows up as p99 latency spikes: 10–40ms of connection establishment.

HTTP/2 (gRPC's transport) multiplexes many concurrent requests over a single TCP connection: no application-level head-of-line blocking, no repeated TLS handshakes. For bursty trading workloads this keeps per-request connection overhead consistently sub-millisecond.
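
A back-of-envelope calculation shows why churn dominates. Assuming a 1ms network RTT (matching the benchmark setup in this article), one RTT for the TCP handshake and one for a TLS 1.3 handshake:

```python
# Per-request cost when connections are not reused, assuming 1ms RTT,
# one RTT for TCP setup and one for a TLS 1.3 handshake.
rtt_ms = 1.0
tcp_handshake = 1 * rtt_ms
tls_handshake = 1 * rtt_ms   # TLS 1.2 would add another RTT
request_response = 1 * rtt_ms

cold_connection = tcp_handshake + tls_handshake + request_response
reused_connection = request_response

print(f"cold: {cold_connection}ms, reused: {reused_connection}ms")
print(f"extra cost per 100 cold requests: "
      f"{100 * (cold_connection - reused_connection)}ms")
```

That 2ms-per-request tax is the best case; real p99 spikes are worse because handshakes also contend for CPU and sockets under load.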

Streaming

This is where gRPC has a structural advantage for market data. Server-streaming RPCs let the exchange push price updates continuously over a single open connection. With REST, you're polling — paying full request overhead each cycle.
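
The consumption pattern is just async iteration over an open stream. The sketch below uses a stand-in async generator in place of a real server-streaming call (something like `stub.StreamTicks(req)`, which yields protobuf messages); the loop structure is what a real client looks like:

```python
import asyncio

async def fake_tick_stream(n):
    # Stand-in for a gRPC server-streaming call; a real stream is an
    # async iterator of protobuf messages pushed by the server.
    for i in range(n):
        await asyncio.sleep(0)  # ticks arrive whenever the server sends them
        yield {"symbol": "BTC-USD", "price": 64000.0 + i}

async def consume():
    # Identical shape with a real gRPC stream: async-iterate over the
    # open stream, paying no per-tick request overhead.
    ticks = []
    async for tick in fake_tick_stream(3):
        ticks.append(tick)
    return ticks

ticks = asyncio.run(consume())
print(ticks[-1])
```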

Benchmark Numbers

These figures are representative measurements from a Python trading agent (same region as exchange servers, ~1ms raw network RTT):

| Metric | REST (HTTP/1.1 + JSON) | gRPC (HTTP/2 + protobuf) |
| --- | --- | --- |
| Single order submit (p50) | 8.2ms | 4.1ms |
| Single order submit (p99) | 41ms | 9ms |
| Market data poll (p50) | 6.8ms | 3.2ms |
| Streaming tick latency | N/A (polling) | 0.3ms |
| Payload size (order response) | 312 bytes | 84 bytes |
| CPU time per 1000 deserializations | 48ms | 6ms |

The p99 order submission latency gap — 41ms for REST vs 9ms for gRPC — is the most operationally significant number here.

When REST Is the Right Choice

  • Ecosystem support. Every language has excellent HTTP and JSON libraries. gRPC requires protobuf code generation and a gRPC runtime.
  • Debugging and observability. JSON requests are readable by curl or any network inspector. Protobuf on the wire is opaque without schema files.
  • Low-frequency strategies. If your agent places 10 orders per hour and reads price data once per minute, the latency difference is irrelevant.

When gRPC Is the Right Choice

  • High-frequency order flow. Agents submitting 10+ orders per second will see consistent improvement from HTTP/2 multiplexing.
  • Real-time market data. Tick-level data via gRPC streaming arrives as it happens rather than sampled at an interval.
  • Multi-market agents. Monitoring 50 markets via REST means 50 polling loops. gRPC streaming maintains 50 open streams with marginal per-tick overhead.
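
Fanning out to many markets is one task per stream. The sketch below uses a hypothetical `watch_market` coroutine standing in for opening one server stream per symbol; with gRPC, all of these streams multiplex over a single HTTP/2 connection:

```python
import asyncio

async def watch_market(symbol):
    # Stand-in for opening one gRPC server stream per market; a real
    # implementation would async-iterate the stream for `symbol`.
    await asyncio.sleep(0)
    return f"{symbol}: stream open"

async def main():
    symbols = [f"MKT-{i}" for i in range(50)]
    # One task per market. In gRPC these are 50 streams on one TCP
    # connection, not 50 separate connections or polling loops.
    return await asyncio.gather(*(watch_market(s) for s in symbols))

results = asyncio.run(main())
print(len(results))  # 50
```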

The Python Comparison

REST with httpx:

```python
async def submit_order_rest(client, symbol, side, qty):
    # `client` is a shared httpx.AsyncClient; reusing it gives connection pooling.
    resp = await client.post(
        "https://api.purpleflea.com/v1/orders",
        json={"symbol": symbol, "side": side, "quantity": qty},
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    resp.raise_for_status()
    return resp.json()
```

The equivalent via gRPC:

```python
async def submit_order_grpc(stub, symbol, side, qty):
    # `stub` is the async stub generated from the service's .proto schema.
    req = trading_pb2.OrderRequest(symbol=symbol, side=side, quantity=qty)
    resp = await stub.SubmitOrder(req)
    return {"order_id": resp.order_id, "status": resp.status}
```

The gRPC version requires generating stub files from a .proto schema — about 30 minutes of setup the first time, but then you get type-safe, fast serialization automatically.
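
For reference, the `trading_pb2` module above would be generated from a schema roughly like this (hypothetical; the real service definition is whatever the API publishes):

```protobuf
syntax = "proto3";

package trading;

// Hypothetical schema matching the stub usage above.
message OrderRequest {
  string symbol = 1;
  string side = 2;
  string quantity = 3;
}

message OrderResponse {
  string order_id = 1;
  string status = 2;
}

service Trading {
  rpc SubmitOrder(OrderRequest) returns (OrderResponse);
}
```

Generate the Python stubs with `python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. trading.proto`.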

Decision Framework

  • Orders/sec < 1, polling interval > 1s → REST is fine
  • Orders/sec 1–10, polling interval 100–500ms → REST with connection pooling
  • Orders/sec > 10, or polling interval < 100ms → gRPC or WebSocket streaming
  • Monitoring 20+ markets simultaneously → gRPC streaming regardless of frequency
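
The framework above is simple enough to encode directly. This helper is illustrative (the thresholds are the ones from this article, not universal constants):

```python
def choose_protocol(orders_per_sec, poll_interval_s, markets=1):
    """Encode the decision framework above; thresholds from this article."""
    if markets >= 20:
        return "grpc"            # streaming pays off regardless of frequency
    if orders_per_sec > 10 or poll_interval_s < 0.1:
        return "grpc"            # or WebSocket streaming
    if orders_per_sec >= 1 or poll_interval_s <= 0.5:
        return "rest-pooled"     # REST with connection pooling
    return "rest"

print(choose_protocol(0.1, 5))     # rest
print(choose_protocol(5, 0.2))     # rest-pooled
print(choose_protocol(50, 0.05))   # grpc
print(choose_protocol(0.1, 5, markets=50))  # grpc
```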

Start with REST. Measure your actual latency distribution under real load. Reach for gRPC when you've measured a problem, not before.


Purple Flea provides financial infrastructure for AI agents: wallets, trading, casino, escrow, and a free faucet for new agents to experiment without real funds.
