Jonathan Peterson
Building a Real-Time Oracle Latency Bot for Polymarket with Python and asyncio

I recently open-sourced a trading bot that exploits a timing gap on Polymarket's 15-minute crypto markets. I am not going to rehash the strategy here (there is a full writeup in the repo if you are curious), but I do want to talk about the engineering side, because I ran into some fun problems.

What the bot actually needs to do

So here is the list of things happening at the same time:

  1. Hold open a WebSocket to a price oracle that pushes updates every fraction of a second
  2. Keep track of roughly 16 overlapping 15-minute markets at once
  3. Check trading signals on every single price tick
  4. Place orders, watch for fills, handle settlement
  5. Never drop a price update, even while placing an order
  6. If it crashes, pick up where it left off

And all of this runs in one process.

Why I went with asyncio

My first thought was threads. But then I actually looked at what the bot does and almost all of it is just waiting. Waiting for WebSocket frames, waiting for HTTP responses, waiting on timers. There is basically no CPU work happening.

asyncio fits this perfectly. One event loop, no locks, no thread safety headaches.

The main loop kicks off about 7 tasks that all run at the same time:

```python
tasks = [
    oracle.run(shutdown),           # WebSocket price feed
    market_lifecycle_loop(...),     # market tracking
    signal_evaluation_loop(...),    # trade signals
    telegram.run(shutdown),         # notifications
    state_persist_loop(...),        # crash recovery
    redeem_loop(...),               # auto-redeem winners
    sanity_check_loop(...),         # API health checks
]
await asyncio.gather(*tasks)
```

They are all independent. The oracle task fires up first and then the market and signal loops wait a few seconds before starting so the price buffer has some data in it.
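That warm-up stagger is just a sleep at the top of the late-starting loops. A minimal sketch of the idea, where `WARMUP_SECONDS` and the loop body are my illustrative names, not the repo's:

```python
import asyncio

WARMUP_SECONDS = 3  # illustrative; the article only says "a few seconds"

async def signal_evaluation_loop(buffer, shutdown_event: asyncio.Event) -> None:
    # Give the oracle task a head start so the price buffer is not empty
    # when the first signal evaluation runs.
    await asyncio.sleep(WARMUP_SECONDS)
    while not shutdown_event.is_set():
        # ... evaluate signals against `buffer` here ...
        await asyncio.sleep(0.1)
```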

Detecting zombie connections

This one took me a while to figure out. The WebSocket connection stays alive, ping pong works fine, everything looks healthy. But no actual price data is coming through. The upstream just quietly stops sending.

A simple recv() timeout does not catch this because you still get heartbeat frames. The connection is technically alive, just useless.

So I track the last time a real price came in using a monotonic clock and check it on every timeout:

```python
while not shutdown_event.is_set():
    try:
        raw = await asyncio.wait_for(ws.recv(), timeout=30)
    except asyncio.TimeoutError:  # alias of builtin TimeoutError on 3.11+
        since_last = time.monotonic() - self._last_price_ts
        if since_last > self.STALL_TIMEOUT:
            # connected but no real data, force reconnect
            break
        continue
```

Sounds obvious in hindsight but I spent an embarrassing amount of time debugging "why did the bot just sit there doing nothing for 20 minutes" before I added this.
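The other half of the trick is only bumping the timestamp when a real price frame arrives, never on heartbeats. A stripped-down sketch; the class name, message shape, and timeout value are illustrative, not the repo's actual types:

```python
import json
import time

class OracleFeed:
    """Sketch of the staleness bookkeeping (names are illustrative)."""

    STALL_TIMEOUT = 120  # seconds without a real price before forcing a reconnect

    def __init__(self):
        self._last_price_ts = time.monotonic()

    def on_message(self, raw: str) -> None:
        msg = json.loads(raw)
        # Heartbeats keep the socket "alive" but must NOT reset the stall timer;
        # only genuine price updates count.
        if msg.get("type") == "price":
            self._last_price_ts = time.monotonic()

    def is_stalled(self) -> bool:
        return time.monotonic() - self._last_price_ts > self.STALL_TIMEOUT
```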

Two oracle backends, same interface

The bot can pull prices from two different sources:

ChainlinkOracle connects directly to Chainlink Data Streams. HMAC auth, raw binary price reports. This is the "real" source.

PolymarketOracle uses Polymarket's public RTDS relay. No authentication, JSON messages. Free, and good enough for both demo mode and live trading.

Both write into the same OracleBuffer. The signal evaluation loop has no idea which one is running and does not care. You switch between them with one env var.
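To make that concrete, here is roughly the shape of the shared buffer and the backend switch. The env var name `ORACLE_BACKEND` and all method names are my guesses for illustration; check the repo for the real interface:

```python
import os
from collections import deque

class OracleBuffer:
    """Per-asset ring buffer of (timestamp, price) pairs (illustrative shape)."""

    def __init__(self, maxlen: int = 10_000):
        self._buffers = {}  # asset symbol -> deque of (ts, price)
        self._maxlen = maxlen

    def push(self, asset: str, ts: float, price: float) -> None:
        self._buffers.setdefault(asset, deque(maxlen=self._maxlen)).append((ts, price))

    def latest(self, asset: str):
        buf = self._buffers.get(asset)
        return buf[-1] if buf else None

def make_oracle(buffer):
    # ChainlinkOracle / PolymarketOracle are the two backends described above;
    # the env var name here is hypothetical.
    backend = os.environ.get("ORACLE_BACKEND", "polymarket")
    if backend == "chainlink":
        return ChainlinkOracle(buffer)
    return PolymarketOracle(buffer)
```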

One quirk with the RTDS source: it needs you to send a PING string every 5 seconds. Not a WebSocket level ping, an actual text message that says PING. So I spin up a separate task just for that:

```python
ping_task = asyncio.create_task(self._ping_loop(ws, shutdown_event))
try:
    ...  # main recv loop runs here
finally:
    ping_task.cancel()
```

Verifying credentials before anything starts

This was born out of pure frustration. I would start the bot, everything looks great, logs are clean. Then 10 minutes later I realize my Telegram token was wrong because a notification silently failed and I had no idea.

Now the bot hits the Telegram API with getMe and sends a test message before it even enters the main loop. Same thing for Polymarket API keys. If something is off you know immediately:

```
ERROR: Telegram chat_id=123456 is invalid. Response: Forbidden: bots can't send messages to bots
Tip: send any message to @your_bot first, then use getUpdates to find your chat_id.
```

Tiny change but it saved me so much time after that.
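The check itself is simple. A sketch of the getMe preflight; the function names and error text are mine, but the Bot API's getMe response really does have the `{"ok": ..., "result": {...}}` shape:

```python
import json
import urllib.request

def check_getme_payload(payload: dict) -> str:
    """Validate a Bot API getMe response; return the bot's username or raise."""
    if not payload.get("ok"):
        raise RuntimeError(f"Telegram token rejected: {payload.get('description')}")
    return payload["result"]["username"]

def verify_telegram_token(token: str) -> str:
    # One real HTTP call at startup: a bad token fails loudly here
    # instead of silently ten minutes into a session.
    url = f"https://api.telegram.org/bot{token}/getMe"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return check_getme_payload(json.load(resp))
```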

The bisect trick for the price buffer

Prices are stored in a per asset circular buffer (just a deque with maxlen). The signal loop constantly needs to look up "what was the oracle price when this market opened?" which means searching by timestamp.

The obvious tool is bisect_right, but it compares whole elements, so on a deque of (timestamp, price) tuples it would compare full tuples instead of just the timestamp (bisect's key= parameter only arrived in Python 3.10). I could copy the timestamps into a list on every lookup, but that felt gross on a hot path.

Instead I wrote a tiny wrapper:

```python
class _DequeTimestampKey:
    """Read-only sequence view over just the timestamps in the buffer."""
    __slots__ = ("_buf",)

    def __init__(self, buf):
        self._buf = buf

    def __len__(self):
        return len(self._buf)

    def __getitem__(self, idx):
        return self._buf[idx][0]  # timestamp component of (ts, price)
```

O(log n) comparisons, zero copies. Probably my favorite 10 lines in the whole codebase.
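Usage looks like this; the lookup helper is my own wrapper around the class above, which I repeat here so the snippet runs standalone:

```python
import bisect
from collections import deque

class _DequeTimestampKey:
    """Read-only sequence view over just the timestamps in the buffer."""
    __slots__ = ("_buf",)

    def __init__(self, buf):
        self._buf = buf

    def __len__(self):
        return len(self._buf)

    def __getitem__(self, idx):
        return self._buf[idx][0]  # timestamp component of (ts, price)

def price_at_or_before(buf: deque, ts: float):
    """Latest price with timestamp <= ts, or None if the buffer starts later."""
    idx = bisect.bisect_right(_DequeTimestampKey(buf), ts)
    return buf[idx - 1][1] if idx else None
```

bisect only needs `__len__` and `__getitem__` on its argument, which is exactly what the wrapper provides.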

Crash recovery

State gets dumped to a JSON file every 30 seconds. Open positions, risk counters, daily P&L, kill switch status. When the bot restarts it reads that file and picks up where it left off. Any markets that expired while it was down get resolved on the next cycle.

Nothing fancy but it means I can restart the bot mid session without losing track of anything.
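A sketch of what such a persist loop can look like. The temp-file-plus-rename dance is my own addition for crash safety during the write itself (the repo may simply write the file directly), and the file path is illustrative:

```python
import asyncio
import json
import os

STATE_PATH = "state.json"  # illustrative path

def save_state(state: dict, path: str = STATE_PATH) -> None:
    # Write to a temp file and rename it into place, so a crash mid-write
    # never leaves a truncated JSON file behind (os.replace is atomic).
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

async def state_persist_loop(get_state, shutdown_event: asyncio.Event,
                             interval: float = 30.0) -> None:
    while not shutdown_event.is_set():
        save_state(get_state())  # positions, risk counters, P&L, kill switch
        await asyncio.sleep(interval)
```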

How it performed

61.4% win rate over 5,017 backtested trades, consistent across BTC, ETH, XRP, and SOL. The backtest includes 7 falsification tests and a 60/40 in-sample/out-of-sample split by date.

Full source is here: https://github.com/JonathanPetersonn/oracle-lag-sniper

I would genuinely love to hear how other people have structured similar systems. The "bunch of long lived concurrent tasks sharing state" pattern comes up a lot but I have not found many open source projects that do it well to learn from.
