94.6% of ended prediction markets converge to near-certainty — and why I still can't tell you if they were right

#datascience #dataset #python #trading

I run a collector that snapshots every active Polymarket market's price every 15 minutes. After 92 days it's 18.6 million price points across 22,410 markets. People keep asking the same question, and the honest answer is more interesting than the marketing one.
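Here is a rough sanity check on the scale. It assumes the collector captured every 15-minute snapshot for all 92 days. On that basis the totals work out to about 2,100 markets priced per snapshot on average. That is far below the 22,410 markets seen overall, presumably because markets keep opening and closing during the window:

```python
# Back-of-envelope check using only the figures quoted in the post.
# Assumption: one snapshot every 15 minutes for all 92 days, with no gaps.
DAYS = 92
SNAPSHOTS_PER_DAY = 24 * 60 // 15  # 96 snapshots per day
PRICE_POINTS = 18_600_000          # "18.6 million", already rounded

snapshots = DAYS * SNAPSHOTS_PER_DAY
avg_markets_per_snapshot = PRICE_POINTS / snapshots

print(f"{snapshots} snapshots, ~{avg_markets_per_snapshot:,.0f} markets priced per snapshot")
```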

"Is Polymarket well-calibrated? When it says 70%, does the thing happen 70% of the time?"

I can't answer that from this data. But I can show you something adjacent and real — and, more usefully, I can show you exactly where the wall is, because most "prediction markets are smart" posts walk straight through it without noticing.

The thing I CAN measure: convergence

Take every market that has ended (end_date in the past) and look at its last traded YES price. On the frozen export (2026-03-28 → 2026-06-28):

SELECT
  COUNT(*) AS ended,
  ROUND(100.0*SUM(CASE WHEN last_trade_price>=0.95 OR last_trade_price<=0.05
        THEN 1 ELSE 0 END)/COUNT(*),1) AS pct_decisive,
  ROUND(100.0*SUM(CASE WHEN last_trade_price>0.40 AND last_trade_price<0.60
        THEN 1 ELSE 0 END)/COUNT(*),1) AS pct_coinflip
FROM markets
WHERE end_date < date('now') AND end_date != '' AND last_trade_price IS NOT NULL;

ended    pct_decisive   pct_coinflip
19402    94.6           0.9

94.6% of ended markets had their price collapse to ≥0.95 or ≤0.05 by close. Only 0.9% were still a coin-flip (0.40–0.60). The crowd makes up its mind almost every time before the window shuts.
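The bucketing rules are easy to get subtly wrong at the edges. Below is a minimal pure-Python mirror of the query above, which you can use to unit-test them. It assumes you have already filtered down to ended markets. convergence_summary is a hypothetical helper, not part of any shipped tooling. It keeps the query's inclusive decisive bands and its exclusive coin-flip band:

```python
# Pure-Python mirror of the SQL convergence query (hypothetical helper).
# Decisive is inclusive (>= 0.95 or <= 0.05); coin-flip is exclusive (0.40 < p < 0.60).
def convergence_summary(final_prices):
    """Return (ended, pct_decisive, pct_coinflip) for last YES prices of ended markets."""
    prices = [p for p in final_prices if p is not None]  # like IS NOT NULL
    ended = len(prices)
    if ended == 0:
        return 0, 0.0, 0.0
    decisive = sum(1 for p in prices if p >= 0.95 or p <= 0.05)
    coinflip = sum(1 for p in prices if 0.40 < p < 0.60)
    return (
        ended,
        round(100.0 * decisive / ended, 1),
        round(100.0 * coinflip / ended, 1),
    )


print(convergence_summary([0.99, 0.01, 0.95, 0.05, 0.55, 0.70, None]))  # -> (6, 66.7, 16.7)
```

Note that 0.95 and 0.05 themselves count as decisive, exactly as in the SQL.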

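That's a genuinely useful fact if you're building anything time-aware. Only 5.4% of ended markets finished outside the ≥0.95/≤0.05 bands, and just 0.9% finished as a coin-flip. A market still sitting at 0.55 two days before close is therefore behaving unlike almost everything else in the dataset. Markets that stay undecided that late are where the trades live.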
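A time-aware version of that check needs the price series, not just the last trade. The post doesn't describe the price table's schema. This sketch therefore assumes you've pulled one market's snapshots into a list of (timestamp, price) tuples sorted by time. price_as_of and undecided_before_close are hypothetical names. The two-day lead and the 0.40–0.60 band come from the text above:

```python
from datetime import datetime, timedelta, timezone


# Hypothetical helpers: snapshots are (datetime, price) tuples for ONE market,
# sorted by time; the real price table's schema is not assumed here.
def price_as_of(snapshots, when):
    """Last snapshot price at or before `when`, or None if none exists yet."""
    last = None
    for ts, price in snapshots:
        if ts > when:
            break
        last = price
    return last


def undecided_before_close(snapshots, end_date, lead=timedelta(days=2),
                           low=0.40, high=0.60):
    """True if the market was still priced as a coin-flip `lead` before end_date."""
    p = price_as_of(snapshots, end_date - lead)
    return p is not None and low < p < high


end = datetime(2026, 6, 1, tzinfo=timezone.utc)
series = [(end - timedelta(days=d), p) for d, p in [(5, 0.62), (3, 0.55), (1, 0.91)]]
print(undecided_before_close(series, end))  # True: 0.55 was the latest price 2 days out
```

A linear scan is fine for one market's 92 days of 15-minute snapshots (under 9,000 points). For bulk scans across all markets, bisect on the timestamps or a SQL window query would be the natural upgrade.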

The thing I CANNOT measure: was it right?

Here's where the wall is. Convergence is not correctness. A market going to 0.97 tells you traders agreed. It does not tell you the YES outcome happened. To know that, you need the resolved outcome — and this dataset doesn't have it:

SELECT resolved, COUNT(*) FROM markets GROUP BY resolved;
-- 0 | 22410   (every single row)

Every one of the 22,410 markets has resolved = 0 and an empty resolved_outcome. The collector reads live prices from Polymarket's Gamma/CLOB APIs every 15 minutes and never joins the on-chain resolution feed. So calibration curves, favorite-longshot bias, Brier scores, and anything else that needs "which side won" cannot be computed from this file alone.
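For reference, here is roughly what the missing labels would unlock. Treat it as a sketch, not anything the dataset supports today. It assumes a hypothetical outcomes list of 0/1 values taken from an independent resolution source, with None for unknown outcomes. Every row in this file is unknown, so brier_score refuses to run rather than silently returning a number:

```python
# Sketch only: `outcomes` must come from an independent resolution source,
# never from the prices themselves. None = outcome unknown.
def brier_score(prices, outcomes):
    """Mean squared error between forecast P(YES) and the 0/1 resolved outcome."""
    pairs = [(p, o) for p, o in zip(prices, outcomes) if o is not None]
    if not pairs:
        raise ValueError("no resolved outcomes: calibration is not computable")
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


def calibration_table(prices, outcomes, bins=10):
    """Per price bucket: (bucket_low, mean forecast, observed YES rate, count)."""
    buckets = {}
    for p, o in zip(prices, outcomes):
        if o is None:
            continue
        b = min(int(p * bins), bins - 1)  # a price of exactly 1.0 lands in the top bucket
        buckets.setdefault(b, []).append((p, o))
    table = []
    for b in sorted(buckets):
        rows = buckets[b]
        mean_p = sum(p for p, _ in rows) / len(rows)
        hit_rate = sum(o for _, o in rows) / len(rows)
        table.append((b / bins, mean_p, hit_rate, len(rows)))
    return table


# With this dataset as shipped, every outcome is None, so this raises ValueError:
# brier_score([0.97, 0.02], [None, None])
print(calibration_table([0.72, 0.75, 0.1], [1, 0, 0]))
```

A well-calibrated market is one where, in each bucket, the mean forecast and the observed YES rate roughly match.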

You could approximate the outcome by assuming the final price equals the truth (price ≥ 0.95 → "YES happened"). But that's circular: you'd be grading the market against its own last guess, then announcing it's well-calibrated. It's the most common mistake I see in prediction-market blog posts.
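To make the circularity concrete, here is a toy simulation on synthetic numbers, not Polymarket data. Every final price is decisive. As a deliberate worst case, every real outcome goes against the final price. Grading against the price-derived pseudo-labels still looks near-perfect, because each error is at most 0.05² = 0.0025:

```python
import random


# The circular shortcut: derive the "outcome" from the final price itself.
def pseudo_label(p):
    """Treat a decisive final price as the outcome; anything else is unknown."""
    if p >= 0.95:
        return 1
    if p <= 0.05:
        return 0
    return None


def brier(prices, labels):
    """Mean squared error over rows that have a label."""
    pairs = [(p, y) for p, y in zip(prices, labels) if y is not None]
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


random.seed(0)
# Synthetic decisive final prices (illustration only, not real market data).
finals = [random.choice([random.uniform(0.95, 1.0), random.uniform(0.0, 0.05)])
          for _ in range(1000)]
# Adversarial "truth": reality went against the final price in every market.
truth = [0 if p >= 0.95 else 1 for p in finals]

self_graded = brier(finals, [pseudo_label(p) for p in finals])
real_graded = brier(finals, truth)
print(f"graded against its own final price: {self_graded:.4f}")
print(f"graded against (adversarial) truth: {real_graded:.4f}")
```

The self-graded score can only measure how close the final price sat to 0 or 1. That is the convergence statistic again under a different name.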

Why ship a dataset and tell you what it can't do?

Because the alternative is a refund and a 1-star review. The price series is the dense, reliable layer: 15-minute snapshots over 92 days, no gaps, no scraping (straight from the public Gamma + CLOB APIs). That's worth paying for if you're backtesting price-based strategies, training a short-horizon predictor, or studying microstructure. It is not worth paying for if you want a calibration study, and I'd rather you know that before you click.

(Same reason I published the audit showing ~94% of the order-book rows are thin-market placeholders. Price = the product. Order book and resolution labels = honest caveats.)

Reproduce it in 30 seconds

Free 1-day sample (no signup): Hugging Face

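import sqlite3
import pandas as pd

con = sqlite3.connect("polymarket.db")
# Same filter as the SQL above, including end_date != '' (in SQLite, '' sorts
# before every date string, so without it undated markets would count as "ended").
m = pd.read_sql("SELECT last_trade_price p FROM markets "
                "WHERE end_date < date('now') AND end_date != '' "
                "AND last_trade_price IS NOT NULL", con)
con.close()
decisive = ((m.p >= 0.95) | (m.p <= 0.05)).mean()
print(f"{decisive:.1%} of {len(m)} ended markets converged")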
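One reproducibility caveat concerns date('now'), which moves. If you rerun the query against the frozen export after its 2026-06-28 end, it also counts markets whose end_date falls after the last snapshot. Their last recorded price therefore predates their close. Passing the cutoff in as a parameter keeps the result stable. EXPORT_END (assumed here to be the export's last day) and convergence_as_of are hypothetical names. The in-memory demo table carries only the two columns the query needs:

```python
import sqlite3

# Assumption: the frozen export's last day, taken from the date range in the post.
EXPORT_END = "2026-06-28"

# The post's convergence query, with date('now') replaced by a bound parameter.
QUERY = """
SELECT
  COUNT(*) AS ended,
  ROUND(100.0*SUM(CASE WHEN last_trade_price>=0.95 OR last_trade_price<=0.05
        THEN 1 ELSE 0 END)/COUNT(*),1) AS pct_decisive,
  ROUND(100.0*SUM(CASE WHEN last_trade_price>0.40 AND last_trade_price<0.60
        THEN 1 ELSE 0 END)/COUNT(*),1) AS pct_coinflip
FROM markets
WHERE end_date < ? AND end_date != '' AND last_trade_price IS NOT NULL
"""


def convergence_as_of(con, cutoff=EXPORT_END):
    """Run the convergence query with a fixed cutoff date instead of date('now')."""
    return con.execute(QUERY, (cutoff,)).fetchone()


# Demo on a tiny in-memory table with only the columns the query touches.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE markets (end_date TEXT, last_trade_price REAL)")
demo.executemany("INSERT INTO markets VALUES (?, ?)", [
    ("2026-05-01T00:00:00Z", 0.98),  # ended, decisive
    ("2026-06-10T12:00:00Z", 0.50),  # ended, coin-flip
    ("", 0.99),                      # no end_date: excluded
    ("2026-07-15T00:00:00Z", 0.70),  # ends after the export: excluded
])
print(convergence_as_of(demo))  # -> (2, 50.0, 50.0)
```

Against the real file, the equivalent call would be convergence_as_of(sqlite3.connect("polymarket.db")).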

Full dataset, $19 one-time: Gumroad. Live auto-refreshing API: api.protodex.io.

Open question for the comments: if you had the resolution labels joined in, what's the first thing you'd test — calibration, or fading the longshots? That's the next build and I'll prioritize by what people actually want.