What 86 Days of Clean Prediction-Market History Actually Shows

#datascience #trading #python #machinelearning

Everyone has an opinion about whether prediction markets are "smart." Almost nobody has the data to check.

I do. For 86 days straight, a single Mac mini has been polling Polymarket's public API and writing every price snapshot it sees to a local SQLite file. As of this morning:

20,585 markets tracked
17,256,332 price snapshots
1,721,218 order-book snapshots
86 continuous days (2026-03-28 → 2026-06-21, inclusive)
$0/month in infrastructure — no cloud, no cluster, one always-on machine

(Those are the exact counts in the downloadable export. The live recorder keeps running — the local feed is already past 18.0M snapshots across 19,084 markets — but a buyer should know precisely what's in the file, so these are the file's numbers, not the live feed's.)

That's not a toy sample. It's enough to ask the only question that matters about a prediction market: when the crowd says 70%, does the thing happen 70% of the time?

Why this is hard to get anywhere else

Polymarket's API serves you the current price. It does not hand you history. If you want to know what a market was trading at three weeks ago, which is what any backtest needs, you have to have been recording it at the time. There's no rewind button.

So the dataset isn't valuable because the data is secret. It's valuable because it's time-stamped and continuous. You can't reconstruct it after the fact at any price. You either started the recorder in March or you didn't.
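For a sense of what "starting the recorder" involves, here is a minimal sketch of a polling loop that appends timestamped snapshots to SQLite. It is not the recorder that produced this dataset: the `fetch_prices` callable, the `prices` column layout, and the 60-second interval are all assumptions for illustration, and the actual API call is deliberately left out.

```python
import sqlite3
import time


def init_db(con):
    # Assumed schema for illustration; the real export's columns may differ.
    con.execute(
        """CREATE TABLE IF NOT EXISTS prices (
               ts REAL NOT NULL,
               market_id TEXT NOT NULL,
               outcome TEXT NOT NULL,
               price REAL NOT NULL
           )"""
    )


def record_once(con, fetch_prices, now=None):
    """Write one snapshot per (market_id, outcome, price) tuple; return the row count."""
    ts = time.time() if now is None else now
    rows = [(ts, market_id, outcome, price) for market_id, outcome, price in fetch_prices()]
    with con:  # commits on success, rolls back if the insert raises
        con.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def run_forever(con, fetch_prices, interval_s=60.0):
    """Poll on a fixed interval; one failed poll should not stop the recorder."""
    while True:
        try:
            record_once(con, fetch_prices)
        except Exception as exc:
            print(f"poll failed: {exc}")
        time.sleep(interval_s)


# Demo with a stand-in fetcher (hypothetical values, no network access).
con = sqlite3.connect(":memory:")
init_db(con)
fake_fetch = lambda: [("m1", "Yes", 0.42), ("m1", "No", 0.58)]
print(record_once(con, fake_fetch, now=0.0))
```

The code is the easy part. What matters is that every row carries the time it was observed, and that the loop keeps running through failed polls.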

The calibration question, concretely

Take every market that resolved. Sort its recorded prices into ten equal-width buckets: the 0–10% bucket, the 10–20% bucket, and so on. For each bucket, compute the fraction of those snapshots whose market actually resolved YES.

A perfectly calibrated market draws a straight 45° line: the 30% bucket resolves YES ~30% of the time, the 90% bucket ~90%, etc.
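To make the 45° line concrete, here is a small sketch that builds the bucket table from a handful of (price, resolved) pairs and boils the deviation down to one number. The toy values are invented purely for illustration and are not from the dataset; `calibration_table` is a hypothetical helper name, and the count-weighted mean absolute gap is one common summary, not the only one.

```python
import pandas as pd


def calibration_table(prices, resolved_yes):
    """Bucket prices into ten equal-width bins and compare price to outcome rate."""
    df = pd.DataFrame({"price": prices, "resolved_yes": resolved_yes})
    # Bucket 0 is 0-10%, bucket 9 is 90-100%; a price of exactly 1.0 is clipped into 9.
    df["bucket"] = (df["price"] * 10).astype(int).clip(0, 9)
    table = df.groupby("bucket").agg(
        n=("price", "size"),
        mean_price=("price", "mean"),
        yes_rate=("resolved_yes", "mean"),
    )
    # Zero gap in every bucket is the straight 45-degree line.
    table["gap"] = table["yes_rate"] - table["mean_price"]
    return table


# Invented toy data, NOT results from the archive.
toy_prices = [0.05, 0.08, 0.32, 0.35, 0.38, 0.91, 0.95]
toy_resolved = [0, 0, 0, 1, 0, 1, 1]

toy = calibration_table(toy_prices, toy_resolved)
# Count-weighted mean absolute gap across buckets.
ece = (toy["gap"].abs() * toy["n"]).sum() / toy["n"].sum()
print(toy)
print("weighted mean |gap|:", round(ece, 4))
```

A negative gap means the bucket resolved YES less often than its price implied, and a positive gap means more often.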

When I ran exactly this earlier in June, the broad finding held up better than the cynics expect. The crowd is roughly honest in the middle of the distribution, and the interesting distortions sit in the tails (very cheap longshots and very expensive favorites), which is where the favorite-longshot bias typically shows up in betting markets. I wrote that up with the bucket-by-bucket table in "When Polymarket says 70%, does it happen 70% of the time?"

The point of this post isn't to re-paste that table. It's to show you the query is trivial once you have the history:

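import sqlite3

import pandas as pd

con = sqlite3.connect("market_universe.db")

# Every recorded YES-side price for resolved markets, joined to the outcome.
# The inner join drops markets with no row in market_outcomes (still unresolved).
df = pd.read_sql("""
  SELECT p.market_id, p.price, mo.outcome_label
  FROM prices p
  JOIN market_outcomes mo
    ON p.market_id = mo.market_id
  WHERE p.outcome = 'Yes'
""", con)
con.close()

# Ten equal-width buckets: 0 is 0-10%, 9 is 90-100% (a price of exactly 1.0 lands in 9)
df["bucket"] = (df.price * 10).astype(int).clip(0, 9)
# Assumption: outcome_label holds the winning outcome as text, 'Yes' for a YES resolution
df["resolved_yes"] = (df.outcome_label == "Yes").astype(int)

# Note: this weights by snapshot, so heavily polled markets count more
calib = df.groupby("bucket").agg(
    n=("price", "size"),
    mean_price=("price", "mean"),
    yes_rate=("resolved_yes", "mean"),
)
print(calib)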

Five lines of pandas. The hard part — having 17 million rows of honest, timestamped history to run it against — is the part that took 86 days.
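One caveat with a snapshot-level group-by like the one above: a market polled thousands of times counts thousands of times, so long-lived markets dominate each bucket. A hedged alternative, sketched here on assumed columns (`market_id`, `price`, and a 0/1 `resolved_yes`), collapses each market to one row per bucket it visited and adds a rough binomial standard error so thin buckets look as noisy as they are. The demo rows are made up.

```python
import numpy as np
import pandas as pd


def market_level_calibration(df):
    """Calibration table where each market counts once per bucket it traded in."""
    df = df.copy()
    df["bucket"] = (df["price"] * 10).astype(int).clip(0, 9)
    # One row per (bucket, market): average price while the market sat in that bucket.
    per_market = (
        df.groupby(["bucket", "market_id"])
        .agg(mean_price=("price", "mean"), resolved_yes=("resolved_yes", "first"))
        .reset_index()
    )
    out = per_market.groupby("bucket").agg(
        n_markets=("market_id", "size"),
        mean_price=("mean_price", "mean"),
        yes_rate=("resolved_yes", "mean"),
    )
    # Simple binomial standard error; crude for small n or rates near 0 or 1.
    out["std_err"] = np.sqrt(out["yes_rate"] * (1 - out["yes_rate"]) / out["n_markets"])
    return out


# Invented rows: market "a" was polled three times, "b" and "c" once each.
demo = pd.DataFrame({
    "market_id": ["a", "a", "a", "b", "c"],
    "price": [0.72, 0.74, 0.76, 0.71, 0.78],
    "resolved_yes": [1, 1, 1, 0, 1],
})
result = market_level_calibration(demo)
print(result)
```

In the demo, weighting by snapshot would put the 70–80% bucket at 4 out of 5 YES, while weighting by market gives 2 out of 3. Which weighting is right depends on the question you are asking, so it is worth reporting both.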

What you can build on top of it

Calibration audits — is a specific market category (politics? sports? crypto?) better calibrated than another?
Longshot fade backtests — systematically short the sub-10% bucket and measure the edge after fees.
Mean-reversion / momentum studies on the minute-level price path.
Event-study windows — how fast does a market re-price around a news shock?

All of it needs the same thing: a continuous price record you didn't have to be there to capture.
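As an illustration of the longshot-fade idea, here is a rough per-share P&L sketch. It rests on several assumptions that a real backtest would need to replace: one entry per market, a NO share costing exactly 1 minus the YES price (no spread or slippage), and a flat `fee_rate` on the amount paid, which is a placeholder rather than any venue's actual fee schedule. The input rows are invented.

```python
import pandas as pd


def longshot_fade_pnl(entries, max_price=0.10, fee_rate=0.0):
    """Fade YES prices below max_price by buying NO; return per-share P&L rows."""
    picks = entries[entries["price"] < max_price].copy()
    cost = 1.0 - picks["price"]            # assumed price of one NO share
    payout = 1.0 - picks["resolved_yes"]   # NO pays 1 if the market resolves NO, else 0
    fees = fee_rate * cost                 # hypothetical flat fee on the amount paid
    picks["pnl"] = payout - cost - fees
    return picks


# Invented entries, one per market; "d" is above the threshold and gets skipped.
entries = pd.DataFrame({
    "market_id": ["a", "b", "c", "d"],
    "price": [0.04, 0.08, 0.05, 0.30],
    "resolved_yes": [0, 0, 1, 0],
})
trades = longshot_fade_pnl(entries, fee_rate=0.01)
print(trades[["market_id", "price", "pnl"]])
print("total pnl per share:", round(trades["pnl"].sum(), 4))
```

The toy rows show the shape of the trade: many small wins and an occasional loss of nearly the full stake. That is why the measured YES rate in the sub-10% bucket, net of fees, decides whether there is any edge at all.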

Get the data

Free sample + schema to kick the tires: comment and I'll point you at it.
Full archive (17.26M snapshots, one-time download): on Gumroad.
Monthly refresh (the recorder keeps running; you get the new days): ask in the comments — I'm pricing it for the people who actually backtest.

The crowd is mostly calibrated. The edge is in knowing exactly where it isn't — and that only shows up if someone was recording. Someone was.