Prediction markets are supposed to be efficient. Prices should reflect true probabilities. But with 7,500+ active markets on Polymarket, some contracts stay mispriced for hours — sometimes days.
I built a scanner that finds these mispricings automatically using 6M+ historical price points. Here's exactly how it works, and what patterns it surfaces.
What "Mispriced" Actually Means
A binary contract on Polymarket has YES and NO shares. In a perfectly efficient market, YES + NO = $1.00. In practice, you see:
- Spread inefficiency: YES at $0.62, NO at $0.35. That's $0.97 — a 3-cent gap where the market disagrees with itself.
- Stale pricing: A market hasn't moved in 48 hours despite a major news event. The information hasn't been incorporated.
- Liquidity-driven mispricing: A large sell pushes YES from $0.70 to $0.55 in a thin market. The "true" price is probably $0.65.
Each pattern requires a different detection approach.
The Architecture
import sqlite3


class MispricingScanner:
    def __init__(self, db_path: str):
        # 6M+ price points, 7,500 markets
        # Collected every 4 minutes from Polymarket Gamma API
        self.db = sqlite3.connect(db_path)

    def find_spread_inefficiencies(self, min_gap: float = 0.03):
        """Markets where YES + NO < 0.97 (3%+ gap)"""
        query = """
            SELECT market_id, yes_price, no_price,
                   (1.0 - yes_price - no_price) AS gap,
                   volume_24h
            FROM latest_prices
            WHERE (1.0 - yes_price - no_price) > ?
              AND volume_24h > 1000
            ORDER BY gap DESC
        """
        return self.db.execute(query, (min_gap,)).fetchall()

    def find_stale_markets(self, hours: int = 24):
        """Markets with no price movement despite active trading"""
        # Assumes timestamps are stored as UTC text ('YYYY-MM-DD HH:MM:SS'),
        # the same format datetime('now', ...) returns.
        query = """
            SELECT market_id,
                   MAX(price) - MIN(price) AS price_range,
                   COUNT(*) AS data_points,
                   SUM(volume) AS total_volume
            FROM prices
            WHERE timestamp > datetime('now', ?)
            GROUP BY market_id
            HAVING price_range < 0.01 AND total_volume > 5000
        """
        return self.db.execute(query, (f'-{hours} hours',)).fetchall()

    def find_crash_rebounds(self, drop_pct: float = 0.15):
        """Markets where price dropped >15% over the last 6 samples (~24 minutes)"""
        # LAG() is a window function and needs SQLite 3.25 or newer.
        # DROP is a reserved word in SQLite, so the alias is drop_frac.
        query = """
            WITH price_changes AS (
                SELECT market_id,
                       price,
                       LAG(price, 6) OVER (
                           PARTITION BY market_id
                           ORDER BY timestamp
                       ) AS price_24min_ago
                FROM prices
                WHERE timestamp > datetime('now', '-6 hours')
            )
            SELECT market_id, price, price_24min_ago,
                   (price_24min_ago - price) / price_24min_ago AS drop_frac
            FROM price_changes
            WHERE price_24min_ago > 0
              AND (price_24min_ago - price) / price_24min_ago > ?
            ORDER BY drop_frac DESC
        """
        return self.db.execute(query, (drop_pct,)).fetchall()
This runs against a SQLite database that's been collecting every Polymarket price, every 4 minutes, since mid-March. That continuous collection is what makes mispricing detection possible — you can't spot stale markets or crash rebounds without historical context.
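The queries above read from two tables, prices and latest_prices, but the schema itself isn't shown. Here is a minimal sketch inferred only from the column names in those queries. The column types, the index, and the helper names init_db and upsert_latest are assumptions, not the actual collector's layout.

```python
import sqlite3

# Hypothetical schema: table and column names come from the scanner queries,
# everything else (types, defaults, index) is an assumption.
SCHEMA = """
CREATE TABLE IF NOT EXISTS prices (
    market_id TEXT NOT NULL,
    timestamp TEXT NOT NULL,          -- UTC, 'YYYY-MM-DD HH:MM:SS'
    price     REAL NOT NULL,          -- YES price in dollars
    volume    REAL NOT NULL DEFAULT 0
);
CREATE INDEX IF NOT EXISTS idx_prices_market_time
    ON prices (market_id, timestamp);

CREATE TABLE IF NOT EXISTS latest_prices (
    market_id  TEXT PRIMARY KEY,
    yes_price  REAL NOT NULL,
    no_price   REAL NOT NULL,
    volume_24h REAL NOT NULL DEFAULT 0
);
"""


def init_db(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the tables if they don't exist."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def upsert_latest(conn: sqlite3.Connection, market_id: str,
                  yes_price: float, no_price: float, volume_24h: float) -> None:
    """Keep exactly one row per market in latest_prices."""
    conn.execute(
        "INSERT OR REPLACE INTO latest_prices "
        "(market_id, yes_price, no_price, volume_24h) VALUES (?, ?, ?, ?)",
        (market_id, yes_price, no_price, volume_24h),
    )
    conn.commit()
```

The index on (market_id, timestamp) is there because both the staleness and crash queries group or partition by market and filter by time. Whether it helps on a 6M-row table is worth checking with EXPLAIN QUERY PLAN.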
Pattern 1: Spread Inefficiencies
When the spread (the gap between $1.00 and YES + NO) is wider than 3%, there's an opportunity. The scanner found that on average, 8-12% of active markets have spreads over 3% at any given time. Most of these are low-volume markets, but about 15-20 per day have meaningful volume (>$5K/24h).
Here's what the output looks like:
SPREAD INEFFICIENCIES (gap > 3%, vol > $5K)
-------------------------------------------
Market: "Will ETH hit $5,000 by June?"
YES: $0.23 NO: $0.71 Gap: $0.06 Vol: $12,400
Market: "Fed rate cut in May?"
YES: $0.41 NO: $0.52 Gap: $0.07 Vol: $8,200
Market: "Bitcoin above $100K on April 30?"
YES: $0.58 NO: $0.36 Gap: $0.06 Vol: $31,000
A 6-7 cent gap in a market with $10K+ daily volume is tradeable. With a 6-cent gap, you buy both sides for $0.94 total, and one side will pay $1.00. That's $0.06 of profit on $0.94 of capital, a 6.4% return locked in at resolution, provided you can fill both sides at the displayed prices.
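The arithmetic behind that return is simple enough to put in a helper. This is a small sketch. The function name and the fee_per_pair parameter are my own additions for illustration, and it ignores slippage and partial fills.

```python
def both_sides_return(yes_price: float, no_price: float,
                      fee_per_pair: float = 0.0) -> float:
    """Return on capital from buying one YES and one NO share and holding
    to resolution, when exactly one of them pays $1.00.

    fee_per_pair is a hypothetical all-in cost per YES+NO pair (assumption).
    """
    cost = yes_price + no_price + fee_per_pair
    if cost <= 0:
        raise ValueError("total cost must be positive")
    return (1.0 - cost) / cost


# The Bitcoin market from the output above: YES $0.58, NO $0.36
print(f"{both_sides_return(0.58, 0.36):.1%}")  # 6.4%
# The Fed market: YES $0.41, NO $0.52
print(f"{both_sides_return(0.41, 0.52):.1%}")  # 7.5%
```

A negative result means the two sides cost more than $1.00 together, so there is nothing to capture.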
Pattern 2: News-Driven Stale Pricing
The most profitable pattern. A major event happens, but a related Polymarket contract doesn't move because:
- The market is small and most participants haven't checked it
- It's a less popular category (science, entertainment) where watchers are fewer
- Weekend/holiday timing when active traders are offline
The scanner cross-references price staleness with volume changes. If volume spikes but the price doesn't move, it may mean someone knows something that the price hasn't absorbed yet.
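One way to express the "volume spikes but price doesn't move" check is sketched below. It is not the scanner's actual logic: the function name, the 15-sample window (about an hour at 4-minute intervals), the 3x spike ratio, and the 1-cent range are all assumed values chosen for illustration.

```python
from statistics import mean
from typing import Sequence, Tuple


def volume_spike_without_move(
    samples: Sequence[Tuple[float, float]],
    recent_n: int = 15,        # assumed: ~1 hour of 4-minute samples
    spike_ratio: float = 3.0,  # assumed: recent volume vs. baseline
    max_range: float = 0.01,   # assumed: price range that counts as "flat"
) -> bool:
    """samples is a chronological list of (price, volume) for one market.

    True when average volume over the last recent_n samples is at least
    spike_ratio times the earlier average while the price stayed flat.
    """
    if len(samples) < 2 * recent_n:
        return False  # not enough history to form a baseline
    baseline, recent = samples[:-recent_n], samples[-recent_n:]
    baseline_vol = mean(v for _, v in baseline)
    if baseline_vol <= 0:
        return False  # no meaningful baseline to compare against
    recent_vol = mean(v for _, v in recent)
    recent_prices = [p for p, _ in recent]
    price_range = max(recent_prices) - min(recent_prices)
    return recent_vol >= spike_ratio * baseline_vol and price_range < max_range
```

Comparing against each market's own baseline matters here, because a fixed dollar threshold would flag busy markets constantly and quiet ones never.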
Pattern 3: Crash Rebounds
This is the pattern I've traded most aggressively. When a market drops 15%+ in under an hour on Polymarket, it rebounds 60-70% of the time within 6 hours. Why? Because most crashes are:
- Single large sellers liquidating positions (not new information)
- Panic cascades in thin orderbooks
- Misinterpretation of ambiguous news
My crash-fade bot has processed 120 closed trades with an 85.8% win rate and $97.78 paper P&L using exactly this pattern. The scanner identifies the crash, checks orderbook depth to confirm it's liquidity-driven (not information-driven), and triggers a buy signal.
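The depth check can be sketched roughly as follows. The criteria the bot actually uses aren't stated here, so treat everything in this block as an assumption: the function names, the 5-cent window below the best bid, and the $500 depth threshold are placeholders to show the shape of the idea, which is that a large drop into a thin bid side is more likely a liquidity event than new information.

```python
from typing import Sequence, Tuple

Level = Tuple[float, float]  # (price, size in shares)


def is_thin_book(bids: Sequence[Level], window: float = 0.05,
                 min_depth_usd: float = 500.0) -> bool:
    """True if dollar depth within `window` of the best bid is below
    min_depth_usd. Both thresholds are assumed values."""
    if not bids:
        return True  # an empty bid side is as thin as it gets
    best_bid = max(price for price, _ in bids)
    depth = sum(price * size for price, size in bids
                if price >= best_bid - window)
    return depth < min_depth_usd


def crash_fade_signal(price_before: float, price_now: float,
                      bids: Sequence[Level], drop_pct: float = 0.15) -> bool:
    """Buy signal: the price fell by more than drop_pct AND the book is thin."""
    if price_before <= 0:
        return False
    drop = (price_before - price_now) / price_before
    return drop > drop_pct and is_thin_book(bids)


# YES fell from $0.70 to $0.55 with only ~$266 of bids near the top
print(crash_fade_signal(0.70, 0.55, [(0.55, 200), (0.52, 300)]))  # True
```

A real filter would likely also look at how the depth changed during the drop, which is where the orderbook snapshots come in.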
The Database That Makes This Work
All of this depends on having granular historical data. Polymarket's Gamma API only gives you the current state — no historical prices, no orderbook history, no spread changes over time.
I built a collector that stores everything:
| Metric | Count |
|---|---|
| Price points | 6,091,088 |
| Orderbook snapshots | 585,745 |
| Markets tracked | 7,500+ |
| Collection runs | 1,514 |
| Update frequency | Every 4 minutes |
| Database growth | ~250K rows/day |
You can explore this data live on PolyScope — a free dashboard I built to visualize market-wide patterns, spreads, and price movements across all 7,500+ Polymarket contracts.
Building Your Own Scanner
If you want to build something similar, you need:
- A data collector running continuously against the Gamma API
- At minimum 1 week of data before mispricing detection becomes reliable (you need baseline volatility per market)
- Volume filtering: ignore markets under $1K daily volume. The spreads are wide but untradeable
- News correlation: the hardest part. Matching external events to specific contracts is what separates good calls from random noise
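As a starting point for the first item, here is a minimal collector loop sketch. It deliberately does not call the Gamma API: fetch_markets is a hypothetical callable you would write yourself, and the dict keys market_id, yes_price, and volume are assumed names, not the API's response fields. The 240-second interval mirrors the 4-minute cadence described above.

```python
import sqlite3
import time
from datetime import datetime, timezone
from typing import Callable, Iterable, Optional


def collect_once(conn: sqlite3.Connection,
                 fetch_markets: Callable[[], Iterable[dict]]) -> int:
    """Store one snapshot of every market. Returns the number of rows written.

    fetch_markets is hypothetical: it should yield dicts with the assumed
    keys 'market_id', 'yes_price' and 'volume'.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices ("
        "market_id TEXT, timestamp TEXT, price REAL, volume REAL)"
    )
    # UTC text in the same format SQLite's datetime('now') produces
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    rows = [(m["market_id"], ts, m["yes_price"], m["volume"])
            for m in fetch_markets()]
    conn.executemany(
        "INSERT INTO prices (market_id, timestamp, price, volume) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


def run_collector(conn: sqlite3.Connection,
                  fetch_markets: Callable[[], Iterable[dict]],
                  interval_s: float = 240.0,
                  max_runs: Optional[int] = None,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Run collect_once every interval_s seconds; max_runs=None runs forever."""
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            collect_once(conn, fetch_markets)
        except Exception as exc:  # one bad run shouldn't kill the collector
            print(f"collection run failed: {exc}")
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_s)
    return runs
```

Passing fetch_markets and sleep in as arguments keeps the loop testable without network access or real waiting.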
For connecting to external APIs and building these kinds of data pipelines, I use a modular API connector pattern. If you want a pre-built version: API Connector Builder — Claude Code Skill ($7) handles auth, rate limiting, pagination, and error handling out of the box.
What I Learned From 3 Weeks of Scanning
- Weekends are gold. Spreads widen 40-60% on Saturdays. Fewer traders = more inefficiency.
- New markets misprice for 2-3 hours after creation. Early liquidity providers set prices loosely.
- Crash rebounds are the highest-EV pattern but require speed. The rebound window is typically 30 minutes to 2 hours.
- Correlated markets often misprice independently. If "Will X happen by June?" moves but "Will X happen by December?" doesn't, that's an edge.
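The correlated-markets point has a hard logical constraint behind it: if X happens by June, it has also happened by December, so the earlier-deadline contract should never be priced above the later one. A small sketch of that check follows. The function name, the tuple layout, and the 2-cent tolerance are assumptions for illustration.

```python
from datetime import date
from typing import List, Sequence, Tuple

Market = Tuple[str, date, float]  # (label, deadline, YES price)


def calendar_violations(markets: Sequence[Market],
                        tolerance: float = 0.02) -> List[Tuple[str, str, float]]:
    """Given contracts on the SAME event with different deadlines, return
    (earlier_label, later_label, excess) for every pair where the earlier
    deadline is priced more than `tolerance` above the later one."""
    ordered = sorted(markets, key=lambda m: m[1])
    violations = []
    for i, (label_a, deadline_a, price_a) in enumerate(ordered):
        for label_b, deadline_b, price_b in ordered[i + 1:]:
            if deadline_a < deadline_b and price_a - price_b > tolerance:
                violations.append((label_a, label_b, round(price_a - price_b, 4)))
    return violations


pairs = calendar_violations([
    ("X by December", date(2030, 12, 31), 0.35),
    ("X by June", date(2030, 6, 30), 0.40),
])
print(pairs)  # [('X by June', 'X by December', 0.05)]
```

This only catches outright violations. The softer case described above, where one deadline moves and the other lags, needs the price history as well.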
Try It Yourself
PolyScope — free, no signup. Browse all 7,500+ Polymarket markets, check spreads, and view price history.
Want the raw data? The Polymarket Historical Price Dataset ($1 on Gumroad) includes all 6M+ price points with a Jupyter notebook to get started.
I write about prediction markets, quantitative trading, and building with AI. Follow for more data-driven trading content.