How to backtest a Polymarket strategy with free 15-minute historical data

#trading #python #datascience #data

Most people backtest prediction-market strategies wrong, and it's not their fault: the data to do it right is annoying to assemble. You need a time series per contract (not just the final resolution), aligned to a clock, with an outcome label attached so you know which side won. Polymarket's API gives you the live order book, but the moment a market resolves, that history is gone from where most people look.

So here's a clean recipe. Free data, ~40 lines of pandas, and the caveats that separate a backtest you can trust from one that lies to you.

0. The data

You need a price history per market at a fixed interval. I've been archiving Polymarket at a 15-minute cadence since late March — 18.6M+ price snapshots across 22,410 markets, 92 days of history as of this writing. The live market index is free to browse at protodex.io.

One honest caveat before you write a line of code: this is price history, not officially labeled outcomes. Polymarket's public API does not hand you a clean post-hoc resolved_yes flag, so you derive the outcome from where the price ends up. I measured it: 94.6% of ended markets close at ≥0.95 or ≤0.05. The remaining ~5% never converge decisively, so they stay ambiguous and must be dropped, not guessed. That limitation is the single most important thing to understand before you trust any backtest built on this (or any price-only prediction-market dataset).

You can roll your own collector against the Polymarket CLOB + Gamma APIs (the snapshot loop is maybe 60 lines), or skip the three-month wait and grab the parquet bundle — link at the end. Either way, the analysis below is identical.
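If you do roll your own, the part worth getting right is clock alignment: every snapshot should carry the same 15-minute tick regardless of when the request actually landed. A minimal sketch follows. It deliberately makes no real API calls: `fetch_prices` is a hypothetical callable you would write yourself against the CLOB/Gamma endpoints, assumed here to return a `{market_id: yes_price}` dict.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict

import pandas as pd

INTERVAL = timedelta(minutes=15)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def floor_to_interval(ts: datetime, interval: timedelta = INTERVAL) -> datetime:
    """Round a UTC timestamp down to the nearest interval boundary."""
    steps = (ts - EPOCH) // interval  # whole intervals since the epoch
    return EPOCH + steps * interval


def take_snapshot(fetch_prices: Callable[[], Dict[str, float]],
                  now: datetime) -> pd.DataFrame:
    """One snapshot: every market's YES price, stamped with the aligned tick."""
    tick = floor_to_interval(now)
    prices = fetch_prices()  # hypothetical fetcher, not a real client call
    rows = [{"market_id": m, "timestamp": tick, "price": float(p)}
            for m, p in prices.items()]
    return pd.DataFrame(rows, columns=["market_id", "timestamp", "price"])


# Usage with a stand-in fetcher; a request landing at 12:07:33 is stamped 12:00.
snap = take_snapshot(lambda: {"mkt-a": 0.08, "mkt-b": 0.91},
                     datetime(2025, 6, 1, 12, 7, 33, tzinfo=timezone.utc))
```

Appending each `snap` to a parquet file on a scheduler gives you the three-column layout used in the rest of this post.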

1. Load it

Assume a parquet with columns market_id, timestamp, and price (the YES probability, 0–1). There is no ground-truth resolved_yes column — you derive a proxy outcome from each market's terminal price, and you throw away anything that didn't converge.

import pandas as pd

df = pd.read_parquet("polymarket_history.parquet")
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)
df = df.sort_values(["market_id", "timestamp"])

# Derive a PROXY outcome from the terminal price — there is no official label.
# End >= 0.95 -> treat YES, <= 0.05 -> treat NO; anything in between never
# converged, so it's unscoreable and gets dropped (NOT guessed at 0.5).
last = df.groupby("market_id").tail(1)[["market_id", "price"]]
last = last.rename(columns={"price": "final_price"})
last["resolved_yes"] = float("nan")  # float column (not object), so .mean() is safe later
last.loc[last["final_price"] >= 0.95, "resolved_yes"] = 1.0
last.loc[last["final_price"] <= 0.05, "resolved_yes"] = 0.0
df = df.merge(last[["market_id", "resolved_yes"]], on="market_id", how="left")

# Keep only markets that converged decisively — you can't score a coin-flip
resolved = df.dropna(subset=["resolved_yes"]).copy()
print(resolved["market_id"].nunique(), "decisively-converged markets")

Be honest with yourself about what this proxy is. The obvious worry is circularity: you're using the late price to label the outcome and the early price as the signal. As long as your entry snapshot is well before convergence (see §2), the label isn't leaking your signal. The proxy also assumes the last snapshot in the file is the end of the market; a market that is still open and trading at 3¢ would be mislabeled NO, so restrict the file to ended markets before labeling. And it is still a convergence proxy, not an official settlement, and it silently excludes the ~5% messy markets. State that in any result you publish.
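One way to make the circularity check concrete is to measure, per market, how long before its final snapshot the price had already settled on the winning side. The sketch below is an assumption-laden helper, not part of the original recipe: `convergence_lead` and its column names are made up for illustration, and it reuses the 0.95/0.05 thresholds from above. Note that the 2–15¢ longshot band in §2 overlaps the ≤0.05 label zone, so a market sitting at 4¢ a day out and ending at 2¢ will be flagged here.

```python
import pandas as pd


def convergence_lead(df: pd.DataFrame, entry_hours: float = 24.0,
                     hi: float = 0.95, lo: float = 0.05) -> pd.DataFrame:
    """Hours each market spent continuously on its final side before the last snapshot."""
    df = df.sort_values(["market_id", "timestamp"])
    rows = []
    for market_id, g in df.groupby("market_id"):
        final = g["price"].iloc[-1]
        if final >= hi:
            same_side = g["price"] >= hi
        elif final <= lo:
            same_side = g["price"] <= lo
        else:
            continue  # never converged, so there is no label to leak
        end = g["timestamp"].iloc[-1]
        open_ts = g.loc[~same_side, "timestamp"]
        if open_ts.empty:
            start = g["timestamp"].iloc[0]  # converged for the whole archive
        else:
            # first snapshot after the last time the price was still undecided
            start = g.loc[g["timestamp"] > open_ts.iloc[-1], "timestamp"].iloc[0]
        hours = (end - start) / pd.Timedelta(hours=1)
        rows.append({"market_id": market_id,
                     "hours_converged_before_end": hours,
                     "entry_already_converged": hours >= entry_hours})
    return pd.DataFrame(rows)


# Tiny demo: "a" converges 1h before the end, "b" was at 2-3 cents throughout,
# "c" never converges and is skipped.
t0 = pd.Timestamp("2025-06-01", tz="UTC")
demo = pd.DataFrame({
    "market_id": ["a"] * 4 + ["b"] * 3 + ["c"] * 2,
    "timestamp": [t0 + pd.Timedelta(hours=h) for h in (0, 24, 48, 49, 0, 30, 60, 0, 10)],
    "price": [0.50, 0.50, 0.97, 0.99, 0.03, 0.03, 0.02, 0.40, 0.50],
})
report = convergence_lead(demo)
```

Markets with `entry_already_converged` set are ones where the entry price and the label are close to the same observation; it is worth reporting results with and without them.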

2. The strategy: fade the longshots

The favorite-longshot bias is one of the most-documented inefficiencies in betting markets: longshots (low-probability contracts) are systematically over-priced, favorites slightly under-priced. Translated to Polymarket, the hypothesis is that a contract trading at 8¢ resolves YES less than 8% of the time on average. If that holds, the naive edge is to short the longshots (buy NO when YES is cheap).
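The arithmetic behind "buy NO when YES is cheap" is short enough to write down. Buying NO costs roughly 1 minus the YES price and pays $1 if YES fails, so the expected profit per share works out to the YES price minus the true YES probability, minus whatever you pay in costs. The sketch below uses a hypothetical helper name and made-up numbers (8¢ price, 6% true rate) purely for illustration.

```python
def no_side_ev(yes_price: float, true_yes_prob: float, cost: float = 0.0) -> float:
    """Expected profit per NO share bought at (1 - yes_price) + cost."""
    no_price = 1.0 - yes_price + cost
    win_prob = 1.0 - true_yes_prob
    # Win: collect $1 minus what you paid. Lose: forfeit what you paid.
    return win_prob * (1.0 - no_price) - true_yes_prob * no_price


edge_before_costs = no_side_ev(0.08, 0.06)            # about +0.02 per share
edge_after_costs = no_side_ev(0.08, 0.06, cost=0.03)  # about -0.01 per share
```

The second number is the whole story of §3 in miniature: a 2¢ mispricing does not survive 3¢ of trading costs.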

Let's test whether that holds in the snapshots. Take each market's price at a fixed point before its final snapshot, keep the ones that fall in a chosen band, and compare the average price to the realized outcome.

# For each market, grab the last snapshot at least 24h before its final timestamp.
# Vectorized on purpose: returning rows from groupby.apply upcasts them to object dtype.
end = resolved.groupby("market_id")["timestamp"].transform("max")
pre = resolved[resolved["timestamp"] <= end - pd.Timedelta(hours=24)]
picks = pre.groupby("market_id").tail(1)  # rows are already sorted by timestamp

# Longshot band: YES priced 2–15¢
band = picks[(picks["price"] >= 0.02) & (picks["price"] <= 0.15)]
implied = band["price"].mean()          # what the market said
realized = band["resolved_yes"].mean()  # what actually happened
print(f"implied YES {implied:.3f} vs realized YES {realized:.3f}")

If realized is below implied by more than sampling noise, the longshots were overpriced in this sample: the bias is present and shorting them has positive expectancy before costs.
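Whether a gap between implied and realized means anything depends on how many markets are in the band. A rough check is a binomial standard error around the implied rate. This is a sketch under two loud assumptions: that every market in the band has the same true YES probability (they don't, since prices range from 2¢ to 15¢), and that markets resolve independently (often false when many contracts hang off one event). The function name and the example numbers are illustrative only.

```python
import math


def longshot_gap(implied: float, realized: float, n: int) -> dict:
    """Implied-minus-realized YES rate with a binomial standard error and z-score."""
    # Standard error of the realized rate if the market's implied rate were true
    se = math.sqrt(implied * (1.0 - implied) / n)
    gap = implied - realized
    return {"gap": gap, "se": se, "z": gap / se}


# Hypothetical numbers: 400 markets in the band, priced at 8% on average,
# resolving YES 6% of the time.
check = longshot_gap(implied=0.08, realized=0.06, n=400)
```

With these made-up inputs the z-score comes out under 2, so a two-point gap on 400 markets is suggestive rather than conclusive. Plug in `implied`, `realized`, and `len(band)` from the snippet above to see where your own sample lands.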

3. The part most backtests skip — costs and survivorship

A backtest without these three corrections is marketing, not research.

1. Spread + fees. You don't trade at the mid. On thin Polymarket longshots the bid/ask can be 2–4¢ wide. A 1¢ edge on a 6¢ contract evaporates the moment you cross a 3¢ spread. Always score P&L at a realistic fill price (the mid worsened by the spread you cross, plus any fees), never at the mid itself.

2. Survivorship / resolution timing. If your archive only kept markets that resolved cleanly, you've dropped the messy ones (extended, disputed, voided), and those aren't random. The convergence filter in §1 does the same thing to roughly 5% of markets. Start from every market that hit your entry filter, not just the tidy winners, and report how many of those you could not score.

3. Liquidity ceiling. A 20¢ edge on a market with $300 of depth is a $60 edge, not a strategy. Weight every backtested position by the order-book depth at entry, or you'll "discover" an edge you can't actually fill. (This is exactly why snapshot data needs the order book, not just last-price — depth is the difference between a paper edge and a real one.)
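Corrections 1 and 3 can be folded into the scoring step. The sketch below assumes two columns that are not in the three-column parquet described in §1: `spread` (quoted bid/ask width in dollars) and `depth_usd` (dollars available at entry). Both names are hypothetical, as is the `max_stake_usd` cap, and fees are left out because they depend on the venue's current schedule.

```python
import pandas as pd


def score_no_trades(band: pd.DataFrame, max_stake_usd: float = 100.0) -> pd.DataFrame:
    """P&L of buying NO on each row, after a half-spread cost and a depth cap."""
    out = band.copy()
    # Pay the NO mid plus half the quoted spread (assumes a symmetric book)
    out["no_fill"] = (1.0 - out["price"]) + out["spread"] / 2.0
    # Never stake more than the book could absorb at entry
    out["stake_usd"] = out["depth_usd"].clip(upper=max_stake_usd)
    out["shares"] = out["stake_usd"] / out["no_fill"]
    payoff = 1.0 - out["resolved_yes"]  # a NO share pays $1 when YES fails
    out["pnl_usd"] = out["shares"] * (payoff - out["no_fill"])
    return out


# Two illustrative trades at 10 cents with a 2-cent spread: a win on a deep
# book, and a loss on a book with only $50 of depth.
trades = pd.DataFrame({
    "price": [0.10, 0.10],
    "spread": [0.02, 0.02],
    "depth_usd": [300.0, 50.0],
    "resolved_yes": [0, 1],
})
scored = score_no_trades(trades)
```

Summing `pnl_usd` gives a dollar figure you could plausibly have captured, which is a different and usually smaller number than the raw implied-versus-realized gap.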

Do these three and the favorite-longshot edge usually survives — but smaller than the raw number, and only in the deeper markets. That gap is the finding.
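For the survivorship correction, the unconverged markets can't be scored, but they can be counted and bounded. A minimal sketch, assuming you apply the entry filter to the full frame before dropping unlabeled markets, so that `resolved_yes` is NaN for the ones that never converged. The helper name and output keys are illustrative.

```python
import pandas as pd


def survivorship_report(entries: pd.DataFrame) -> dict:
    """Count unscoreable entries and bound the realized YES rate around them."""
    n_total = len(entries)
    scored = entries["resolved_yes"].dropna()
    n_dropped = n_total - len(scored)
    yes = float(scored.sum())
    return {
        "n_total": n_total,
        "n_dropped": n_dropped,
        "dropped_share": n_dropped / n_total,
        "realized_scored_only": yes / len(scored),
        # Bounds: pretend every dropped market went one way, then the other
        "realized_if_dropped_all_no": yes / n_total,
        "realized_if_dropped_all_yes": (yes + n_dropped) / n_total,
    }


# Five markets hit the entry filter; one never converged.
entries = pd.DataFrame({"resolved_yes": [0, 0, 0, 1, float("nan")]})
summary = survivorship_report(entries)
```

If the implied price sits between the two bounds, the dropped markets alone could explain the apparent edge, and that belongs in the write-up.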

4. Why 15-minute snapshots specifically

Tick data is overkill for this and a nightmare to store; daily closes are too coarse to catch the late convergence where most of the price action lives (markets snap toward 0/1 in the final hours). 15 minutes is the sweet spot: dense enough to study convergence and intraday moves, sparse enough that three months fits in a few hundred MB of parquet.
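To see what a coarser cadence throws away, downsample a 15-minute series to daily closes and compare the largest single-step move. The series below is synthetic (flat at 30¢, then a linear climb to 98¢ over the last three hours) and exists only to illustrate the effect; it is not drawn from the archive.

```python
import pandas as pd


def downsample(prices: pd.Series, rule: str) -> pd.Series:
    """Last observed price per bucket, mimicking a coarser collector."""
    return prices.resample(rule).last().dropna()


# Three days of 15-minute snapshots for one synthetic market
idx = pd.date_range("2025-06-01", periods=4 * 24 * 3, freq="15min", tz="UTC")
prices = pd.Series(0.30, index=idx)
prices.iloc[-12:] = [0.30 + 0.68 * (i + 1) / 12 for i in range(12)]

daily = downsample(prices, "1D")

max_step_15m = prices.diff().abs().max()   # convergence seen as many small steps
max_step_daily = daily.diff().abs().max()  # convergence seen as one jump
```

At 15 minutes the final move shows up as a dozen steps of a few cents each; at daily resolution it is a single jump from 0.30 to 0.98, with no way to tell when in the day it happened.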

If you'd rather not run a collector for three months before you can test a single idea, I've packaged the full archive as a one-time purchase: parquet + CSV with the derived convergence-proxy labels included, plus a ready-to-run Python notebook that runs the exact favorite-longshot backtest above.

👉 Polymarket Quant Toolkit — 18.6M-snapshot dataset + analysis notebook ($49)

It's price history with the convergence-proxy approach above already wired into the notebook — no over-promised resolution labels, just the honest version of the backtest. And the live market index is free to poke at first: protodex.io.

What strategy would you test first — favorite-longshot, late-convergence momentum, or cross-market arbitrage? Drop it in the comments and I'll point you at the columns you'd need.