Free Polymarket Dataset on Kaggle: 13,963 Active Markets + 100K Price Sample (May 2026)

#polymarket #dataset #kaggle #trading

I just pushed a fresh sample of the Polymarket dataset to Kaggle. Free download, Apache 2.0, no email gate.

Updated 2026-06-21: stats refreshed against the live database — 17,256,332 price snapshots across 20,585 markets (2026-03-28 → 2026-06-21, ~85 days, every 15 min).

👉 https://www.kaggle.com/datasets/luciferforge/polymarket-markets-prices-sample-2026

What's in it

Two CSV files:

File	Rows	What it gives you
`markets.csv`	20,585	Every tracked Polymarket market with question, category, volume, liquidity, status, end date, slug
`prices_sample.csv`	100,000	The most recent 15-minute price snapshots across the universe — preview of the full 17.2M+ snapshot corpus

8 MB compressed. Loads instantly in pandas, DuckDB, Polars, or any spreadsheet.
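Before writing any analysis, it helps to confirm what columns the files actually contain. The following is a minimal, dependency-free sketch that reads the header and counts the data rows; the function name `describe_csv` is hypothetical and not part of the dataset.

```python
import csv


def describe_csv(lines):
    """Return (column_names, data_row_count) for CSV text.

    `lines` can be an open text file or any iterable of CSV lines.
    """
    reader = csv.reader(lines)
    columns = next(reader)           # first row is the header
    row_count = sum(1 for _ in reader)
    return columns, row_count


# Usage against the downloaded files (paths are assumptions):
# with open("markets.csv", newline="", encoding="utf-8") as f:
#     print(describe_csv(f))
```

Running it on both files shows the real header names, which is worth doing before relying on any column name guessed from this post.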

What you can build in 20 minutes

A screener. Filter markets by category × volume × spread × days-to-resolution.
A volume leaderboard. markets.csv has volume, volumeNum, volume24hr — rank, sort, group by category.
A "lottery ticket" finder. Find low-price (<$0.10) outcomes with non-trivial liquidity. There are hundreds.
A category dashboard. What's the average spread in sports vs politics vs crypto? The CSV has the data.
A correlation map. Even with 100K snapshots, you can compute simple correlations between sub-markets in the same event (semi-final A vs final winner, for example).
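As a starting point for the first three ideas, here is a rough pandas sketch. The column names `market_id`, `timestamp`, `price`, and `endDate` are assumptions, as is the exact spelling of `category` and `liquidity` in the header; check them against the real CSV headers and adjust the constants. The spread filter is left out because the post does not say which column holds it, and the liquidity threshold is an arbitrary illustration.

```python
import pandas as pd

# Assumed column names; adjust to match the real CSV headers.
ID_COL = "market_id"
TS_COL = "timestamp"
PRICE_COL = "price"
CATEGORY_COL = "category"
VOLUME_COL = "volumeNum"
LIQUIDITY_COL = "liquidity"
END_DATE_COL = "endDate"


def screen_markets(markets, category, min_volume, max_days, now=None):
    """Screener: one category, a volume floor, and a cap on days to resolution.

    `now` must be a timezone-aware pd.Timestamp; it defaults to the current UTC time.
    """
    now = pd.Timestamp.now(tz="UTC") if now is None else now
    end = pd.to_datetime(markets[END_DATE_COL], utc=True, errors="coerce")
    days_left = (end - now).dt.total_seconds() / 86400
    mask = (
        (markets[CATEGORY_COL] == category)
        & (markets[VOLUME_COL] >= min_volume)
        & days_left.between(0, max_days)  # unparseable dates become NaN and drop out
    )
    return markets[mask].sort_values(VOLUME_COL, ascending=False)


def volume_by_category(markets):
    """Leaderboard: total volume per category, largest first."""
    return markets.groupby(CATEGORY_COL)[VOLUME_COL].sum().sort_values(ascending=False)


def lottery_tickets(markets, prices, max_price=0.10, min_liquidity=1000.0):
    """Markets whose latest sampled price is below max_price with some liquidity."""
    latest = prices.sort_values(TS_COL).groupby(ID_COL, as_index=False).last()
    merged = markets.merge(latest[[ID_COL, PRICE_COL]], on=ID_COL, how="inner")
    mask = (merged[PRICE_COL] < max_price) & (merged[LIQUIDITY_COL] >= min_liquidity)
    return merged[mask].sort_values(LIQUIDITY_COL, ascending=False)


# Usage (file paths are assumptions):
# markets = pd.read_csv("markets.csv")
# prices = pd.read_csv("prices_sample.csv")
# print(lottery_tickets(markets, prices).head(20))
```

The lottery-ticket function joins the two files because, going by the column list above, prices live in `prices_sample.csv` rather than `markets.csv`. If the real join key is the slug rather than an ID, change `ID_COL` accordingly.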

What it's NOT

This is the sample, not the full historical corpus.

If you need:

Every 15-minute snapshot since March 2026 (17.2M+ price rows)
Orderbook depth (1.5M+ snapshots)
The full SQLite database for joins
Continuous updates

...that's the $49 Polymarket Quant Toolkit on Gumroad — the full 17.2M-snapshot SQLite corpus plus an analysis notebook. The Kaggle sample is the on-ramp.

How it was collected

Automated pipeline running 24/7 since March 2026:

Gamma API → market metadata + prices for every active market
CLOB API → orderbook depth (top 10 levels) for top 200 markets by volume
SQLite storage with indexed timestamps for analytical queries
Sample CSVs regenerated when the dataset is republished
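To make the storage step concrete, here is a minimal sketch of what "SQLite storage with indexed timestamps" could look like, using only Python's standard `sqlite3` module. The table name, column names, and helper functions are all hypothetical; this is not the collector's actual schema, and the API-fetching side is omitted.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a snapshot store with an index on the timestamp column."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS price_snapshots (
               market_id TEXT    NOT NULL,
               ts        INTEGER NOT NULL,  -- Unix seconds, UTC
               price     REAL    NOT NULL
           )"""
    )
    # The index keeps time-range queries from scanning the whole table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_snapshots_ts ON price_snapshots (ts)"
    )
    return conn


def record_snapshot(conn, ts, prices):
    """Store one collection run; `prices` maps market_id -> price."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO price_snapshots (market_id, ts, price) VALUES (?, ?, ?)",
            [(market_id, ts, price) for market_id, price in prices.items()],
        )


def snapshots_between(conn, start_ts, end_ts):
    """Return rows with start_ts <= ts < end_ts, ordered by time then market."""
    cur = conn.execute(
        "SELECT market_id, ts, price FROM price_snapshots "
        "WHERE ts >= ? AND ts < ? ORDER BY ts, market_id",
        (start_ts, end_ts),
    )
    return cur.fetchall()
```

Storing timestamps as integers keeps range comparisons cheap and avoids string-format ambiguity; a 15-minute cadence simply means `ts` advances by 900 between runs.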

Source: protodex.io — the MCP-server directory with security scores. I built the Polymarket collector to feed a separate trading bot project; the dataset is the byproduct.

Why a Kaggle release at all

Two reasons:

It's where the prediction-market quant crowd lives. Kaggle has a dedicated "prediction markets" search niche. The other dataset I have there (polymarket-historical-prices) has reached 68 downloads with zero promotion. Putting structured data where researchers already look has been higher-leverage than waiting for Reddit posts to go viral.
Sample-to-paid funnels work. If you find an edge in the free sample, paying $49 for the full corpus + notebook is an easy yes. If the sample is no use to you, you would never have bought the full corpus anyway. Either way, incentives stay aligned.

Limitations to know

The 100K-row price sample is the most recent 100K snapshots only — not a uniform random sample across the full 85-day window.
"Active markets" here means markets that were live as of the last collection run. Resolved markets aren't included in markets.csv.
Polymarket removed a few markets between the snapshot and publication. If a market URL returns a 404, cross-check it by slug.
The dataset is updated only when I refresh it, not in real time. For live data, hit the Gamma API directly (it's free and generously rate-limited).
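Because the price sample is the most recent 100K rows rather than a uniform draw, it is worth measuring how much time it actually spans before running any time-series analysis. The sketch below does that with the standard library. The column names `timestamp` and `market_id` and the ISO 8601 timestamp format are assumptions; adjust them to the real header.

```python
from collections import Counter
from datetime import datetime

TS_COL = "timestamp"   # assumed column name
ID_COL = "market_id"   # assumed column name


def parse_ts(value):
    """Parse ISO 8601 text; a trailing 'Z' is rewritten for older Python versions."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def sample_coverage(rows):
    """Summarize the time window and per-market depth of the price sample.

    `rows` is a list of dicts, e.g. list(csv.DictReader(open_file)).
    """
    stamps = [parse_ts(r[TS_COL]) for r in rows]
    per_market = Counter(r[ID_COL] for r in rows)
    return {
        "first": min(stamps),
        "last": max(stamps),
        "span": max(stamps) - min(stamps),
        "markets": len(per_market),
        "max_rows_per_market": max(per_market.values()),
    }


# Usage (file path is an assumption):
# import csv
# with open("prices_sample.csv", newline="", encoding="utf-8") as f:
#     print(sample_coverage(list(csv.DictReader(f))))
```

If `span` comes back as hours rather than days, treat the sample as a cross-section of current prices, not as a history.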

License

Apache 2.0. Use commercially, modify, redistribute. Credit Protodex if you publish derived work; no obligation otherwise.

What I'd love feedback on

If you download this and build something — even a half-finished notebook — drop a comment with what you tried. I'm watching which use cases come up so I know what the V2 corpus should prioritize (more orderbook depth? more historical reach? resolved-market history? a labeled crash-event dataset?).

The dataset is live now: Polymarket Markets + Price Sample (2026) on Kaggle.

If it's useful, an upvote on Kaggle helps it surface in the prediction-markets search — that's the only ask.