What 91 days of Polymarket price data can — and can't — tell you

#datasets #data #trading #opensource

Most "dataset" posts oversell. This one tells you the limits first.

For 91 straight days (28 Mar → 27 Jun 2026) I collected Polymarket prices on one Mac, no cloud bill. The frozen export is:

18,591,646 price snapshots
22,362 markets (19,819 of them with an actual price series — avg 938 snapshots per market)
1,854,788 orderbook snapshots

Category mix: sports 12,258 · other 4,689 · crypto 2,514 · politics 1,441 · geopolitics 685 · science/tech 241 · economics 224 · entertainment 169 · weather 137.

Here's what you can actually do with that — and three things you can't.

What it's good for

1. Price-path / convergence studies. Of the 6,776 markets that ended inside the window, 6,551 (96.7%) closed decisively — last Yes-price above 0.95 or below 0.05. Only 3.3% were still mushy in the middle at the end. So the data is clean enough to study how and when a market makes up its mind: most of the information arrives well before close, and you can measure the shape of that arrival per category. (Crypto markets snap late; politics drifts.)

2. Mean-reversion / momentum backtests on the spread. With ~938 snapshots/market you have enough intraday resolution to test "does a 10-point move in 6 hours revert or continue?" across thousands of contracts and the nine categories listed above.

3. Behavioral / microstructure features. Volume, 24h volume, liquidity, best bid/ask, spread, and last-trade are all captured per snapshot, so you can build features without re-hitting the API roughly 18.6 million times (once per price snapshot).
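The "does a 10-point move in 6 hours revert or continue?" test from item 2 can be sketched in a few lines. This is a minimal illustration, not the method behind any number in this post. It assumes ts is stored as Unix seconds and prices sit on a 0-1 scale (so 10 points = 0.10); both are assumptions to check against the real schema. The function name classify_moves and the sample path are made up for the example.

```python
from bisect import bisect_left

WINDOW_S = 6 * 3600   # 6 hours, ASSUMING ts is Unix seconds
THRESHOLD = 0.10      # a "10-point" move, ASSUMING a 0-1 price scale
EPS = 1e-9            # tolerance for float rounding right at the threshold


def classify_moves(path, window_s=WINDOW_S, threshold=THRESHOLD):
    """Count what follows each large move in one market's price path.

    path: list of (ts, price) tuples sorted by ts.
    For each snapshot i, j is the first snapshot at least one window later,
    and k is the first snapshot at least one window after j.
    """
    times = [ts for ts, _ in path]
    counts = {"continue": 0, "revert": 0, "flat": 0}
    for i, (t0, p0) in enumerate(path):
        j = bisect_left(times, t0 + window_s)
        if j >= len(path):
            break  # no snapshot a full window ahead; later i won't have one either
        move = path[j][1] - p0
        if abs(move) < threshold - EPS:
            continue  # not a large enough move
        k = bisect_left(times, times[j] + window_s)
        if k >= len(path):
            break  # no follow-on window left in the data
        follow = path[k][1] - path[j][1]
        if follow == 0:
            counts["flat"] += 1
        elif (follow > 0) == (move > 0):
            counts["continue"] += 1
        else:
            counts["revert"] += 1
    return counts


# Synthetic path: up 15 points in 6 hours, then down 10 in the next 6.
example = [(0, 0.50), (21600, 0.65), (43200, 0.55)]
result = classify_moves(example)
print(result)
```

Two simplifications to keep in mind: every snapshot is treated as a possible starting point, so events overlap and are not independent, and a long gap between snapshots stretches the effective window beyond 6 hours.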

What it can NOT do (the part most listings hide)

1. True calibration. The dataset has no resolution labels. Every market in the export carries a price series, not the final settled outcome. So you can study price convergence, but you cannot compute real calibration (predicted probability vs realized result) without joining external resolution data yourself. If a vendor shows you a "calibration curve" derived purely from prices, they're measuring the market against itself, not against reality. I'm not going to pretend otherwise.

2. Deep order-book research. The orderbook table is large (1.85M rows), but only a single-digit percentage of those rows are genuine two-sided quotes; the rest are placeholder or one-sided rows. The price series is the real product here, not the book. Buy it for price history; don't buy it expecting an L2 reconstruction.

3. Sub-minute tick data. This is snapshot cadence, not a trade-by-trade feed. Great for hourly/daily strategy research, wrong tool for HFT.
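To make the calibration point in item 1 concrete, here is roughly what the missing join looks like once you bring your own resolution labels. This is a sketch under stated assumptions: the labels dict (market_id to 1 for Yes, 0 for No) is hypothetical external data that is not in the export, the forecast dict stands in for whichever Yes-price you choose as the prediction (for example, the price 24 hours before close), and all market ids and values are invented.

```python
def calibration_table(pairs, n_bins=10):
    """Bin (predicted_prob, outcome) pairs and compare prediction to reality.

    Returns a list of (bin_index, count, mean_predicted, realized_rate),
    skipping empty bins.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in pairs:
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, y))
    rows = []
    for idx, items in enumerate(bins):
        if not items:
            continue
        mean_p = sum(p for p, _ in items) / len(items)
        realized = sum(y for _, y in items) / len(items)
        rows.append((idx, len(items), mean_p, realized))
    return rows


# HYPOTHETICAL inputs: forecast prices come from the export,
# labels must come from an external resolution source.
forecast = {"m1": 0.72, "m2": 0.78, "m3": 0.75, "m4": 0.71, "m5": 0.12, "m6": 0.18, "m7": 0.40}
labels = {"m1": 1, "m2": 1, "m3": 1, "m4": 0, "m5": 0, "m6": 0}  # m7 has no label

# The join: only markets with an external label can be scored.
pairs = [(forecast[m], labels[m]) for m in forecast if m in labels]
rows = calibration_table(pairs)
for row in rows:
    print(row)
```

Markets without a label (m7 here) simply drop out of the join, which is why calibration coverage depends entirely on the quality of the external resolution data.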

Why collect it at all

Polymarket's API is free but rate-limited and ephemeral — query it today and last month's prices are gone. The value of a frozen 91-day export is that the history exists and is queryable in one SQLite file (prices, markets, orderbooks, market_features, …), indexed on (market_id, ts). Pull a market's full path in one query instead of paginating a live endpoint.

SELECT ts, price
FROM prices
WHERE market_id = ? AND outcome = 'Yes'
ORDER BY ts;
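From Python, the query above needs nothing beyond the standard-library sqlite3 module. The sketch below runs it against a tiny in-memory stand-in so it is runnable as-is; the column types, the index name, and the sample rows are assumptions for illustration, and with the real export you would connect to the downloaded file instead of ":memory:". It also applies the 0.95 / 0.05 "closed decisively" rule from earlier in the post to the last price in the path.

```python
import sqlite3

QUERY = """
SELECT ts, price
FROM prices
WHERE market_id = ? AND outcome = 'Yes'
ORDER BY ts
"""


def yes_path(conn, market_id):
    """Return one market's Yes-price path as a list of (ts, price) rows."""
    return conn.execute(QUERY, (market_id,)).fetchall()


def closed_decisively(path, hi=0.95, lo=0.05):
    """True if the last Yes-price is above hi or below lo."""
    if not path:
        return False
    last_price = path[-1][1]
    return last_price > hi or last_price < lo


# Stand-in database. Column types and index name are ASSUMED, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (market_id TEXT, ts INTEGER, outcome TEXT, price REAL)")
conn.execute("CREATE INDEX idx_prices_market_ts ON prices (market_id, ts)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?)",
    [
        ("m1", 1, "Yes", 0.40),
        ("m1", 3, "Yes", 0.97),
        ("m1", 2, "Yes", 0.70),
        ("m1", 2, "No", 0.30),
        ("m2", 1, "Yes", 0.55),
    ],
)

path_m1 = yes_path(conn, "m1")
path_m2 = yes_path(conn, "m2")
print(path_m1, closed_decisively(path_m1))
print(path_m2, closed_decisively(path_m2))
```

Passing market_id as a bound parameter (the ? placeholder) keeps the query reusable across markets and avoids string-building the SQL.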

That's the whole pitch: clean, indexed, honestly-scoped prediction-market price history you can backtest against tonight.

If that's the shape of data you need, the full export (and a free sample tier) is here:
👉 Polymarket Historical Price Dataset

Questions about schema or coverage before you buy — ask in the comments and I'll answer with a real query against the file.

Counts above are computed directly from the delivered export (the exact file you download), not the live collector, so what you read is what you get.

DEV Community
