Prediction markets are one of the few areas where AI automation can generate real alpha — not because AI is smarter at predicting events, but because it can analyze structural mispricings across thousands of markets simultaneously.
I built a scanner that does exactly this. Here's the technical breakdown.
## The Problem
Polymarket has 750+ active events and 11,000+ markets. No human can monitor all of them. But three types of structural mispricings can be detected programmatically:
### 1. Outcome Sum Errors
For mutually exclusive events ("Who wins the election?"), the Yes prices across all outcomes must sum to 1.0. Any deviation is, in principle, guaranteed profit.
```python
from polymarket_scanner import PolymarketClient, ArbitrageScanner

client = PolymarketClient()
scanner = ArbitrageScanner(client)

events = client.get_events(max_pages=15)
opportunities = scanner.scan_exclusive_outcomes(events)

for opp in opportunities:
    if opp.confidence == "high":
        print(f"{opp.description}")
        print(f"  Profit: {opp.estimated_profit_pct}%")
```
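To make the arithmetic concrete, here's a standalone sketch (not part of the toolkit) of the payoff from buying one Yes share in every outcome of a mutually exclusive event, ignoring fees and slippage:

```python
def exclusive_outcome_arb(yes_prices, fee=0.0):
    """Guaranteed payoff from buying one Yes share in every
    outcome of a mutually exclusive event.

    Exactly one outcome resolves Yes, so the position pays $1
    no matter what happens; anything below $1 spent is profit.
    """
    cost = sum(yes_prices) * (1 + fee)
    return 1.0 - cost  # profit per full "set" of shares

# Yes prices summing to 0.97 lock in ~3 cents per set
print(round(exclusive_outcome_arb([0.40, 0.35, 0.22]), 2))  # -> 0.03
```

The same logic runs in reverse when the sum exceeds 1.0: buy every No instead.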
### 2. Ladder Contradictions
Threshold markets must be monotone: for thresholds K1 < K2, P("BTC > K1") >= P("BTC > K2").
Violations are logical impossibilities, and they appear more often than you'd think: I found 45 in a single scan.
```python
ladders = scanner.scan_ladder_contradictions(events)

for opp in ladders:
    print(f"{opp.description}")
    print(f"  Strategy: {opp.strategy}")
    print(f"  Deviation: {opp.deviation_pct}%")
```
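The detection itself reduces to pairwise comparisons over sorted thresholds. A minimal sketch using assumed `(threshold, yes_price)` tuples, not the toolkit's actual internals:

```python
def find_ladder_violations(ladder, tolerance=0.01):
    """Given [(threshold, yes_price), ...] for the same underlying,
    flag pairs where a HIGHER threshold is priced ABOVE a lower one.
    `tolerance` filters out bid-ask spread noise.
    """
    ladder = sorted(ladder)  # ascending by threshold
    violations = []
    for (k_lo, p_lo), (k_hi, p_hi) in zip(ladder, ladder[1:]):
        if p_hi > p_lo + tolerance:  # logically impossible gap
            violations.append((k_lo, k_hi, p_hi - p_lo))
    return violations

# "BTC > 110k" priced above "BTC > 100k" is a contradiction
print(find_ladder_violations([(100_000, 0.30), (110_000, 0.38)]))
```

The tradable response is to sell the overpriced higher threshold and buy the underpriced lower one.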
### 3. Cross-Market Implications
"Trump wins" implies "Republican wins". If P(Trump) > P(Republican), that's a contradiction.
Finding these requires comparing markets across different events — an O(n²) problem that becomes tractable with keyword matching or embeddings.
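A minimal keyword-based sketch of the idea, using a hypothetical hand-written rule table rather than the toolkit's actual matcher:

```python
# Hypothetical implication rules: (specific phrase, general phrase).
# If a market matches the first phrase and another matches the
# second, then P(specific) must not exceed P(general).
IMPLICATIONS = [
    ("trump wins", "republican wins"),
]

def find_implication_violations(markets, tolerance=0.01):
    """markets: list of (question, yes_price) tuples.
    Brute-force O(rules * n^2) pairwise scan.
    """
    hits = []
    for specific_kw, general_kw in IMPLICATIONS:
        for q1, p1 in markets:
            if specific_kw not in q1.lower():
                continue
            for q2, p2 in markets:
                if general_kw in q2.lower() and p1 > p2 + tolerance:
                    hits.append((q1, q2, p1 - p2))
    return hits

markets = [
    ("Trump wins the election", 0.54),
    ("Republican wins the election", 0.51),
]
print(find_implication_violations(markets))
```

Embeddings replace the rule table with semantic similarity, which scales far better than hand-maintained keyword pairs.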
## Architecture Decisions
### Why dataclasses, not dicts?
Every market and opportunity is a typed dataclass. This makes the code self-documenting and lets static type checkers catch errors at development time rather than at runtime.
```python
from dataclasses import dataclass

@dataclass
class Market:
    question: str
    yes_price: float
    no_price: float
    liquidity: float
    volume_24h: float
    condition_id: str

@dataclass
class Opportunity:
    type: str
    description: str
    markets: list
    deviation_pct: float
    estimated_profit_pct: float
    min_liquidity: float
    strategy: str
    confidence: str  # "high", "medium", "low"
```
### Why confidence levels?
Not all "mispricings" are real. "Which teams make the playoffs?" has 30 markets summing to 16 — because 16 teams qualify. Without confidence filtering, you'd trade on noise.
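One plausible heuristic for this (an illustration, not the toolkit's actual logic) is to check whether an event's Yes-price sum sits near 1.0 or near some other integer winner count:

```python
def classify_sum_confidence(yes_prices, deviation_threshold=0.02):
    """Heuristic: a mutually exclusive event should sum to ~1.0.
    If the sum is near some OTHER integer, multiple outcomes
    likely win (e.g. 16 playoff teams), so the apparent
    'mispricing' is structural, not tradable.
    """
    total = sum(yes_prices)
    if abs(total - 1.0) <= deviation_threshold:
        return "none"  # efficiently priced
    nearest = round(total)
    if abs(total - nearest) <= deviation_threshold and nearest != 1:
        return "low"   # probably a multi-winner event
    return "high"      # genuine deviation from 1.0

print(classify_sum_confidence([0.40, 0.35, 0.22]))  # sums to 0.97
```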
### Why minimal external dependencies?
Fewer dependencies mean faster setup, a smaller attack surface, and easier auditing. The Gamma API returns simple JSON, so there's no need for heavy frameworks.
## Real Results from Today's Scan
| Metric | Value |
|---|---|
| Events scanned | 750 |
| Markets scanned | 11,821 |
| High-confidence exclusive outcome mispricings | 31 |
| Ladder contradictions | 45 |
| Cross-market implications | 20 |
Most high-liquidity markets are efficiently priced (< 0.5% deviation). The edges are in:
- Lower-liquidity niche markets
- Newly opened markets (before bots adjust)
- Complex multi-leg logical relationships
## Running Continuously
```python
import time

from polymarket_scanner import PolymarketClient, ArbitrageScanner

client = PolymarketClient()
scanner = ArbitrageScanner(client)

while True:
    results = scanner.full_scan(max_pages=5)
    for opp in results["ladder_contradictions"]:
        if opp["deviation_pct"] > 5:
            send_alert(opp)  # your notification method
    time.sleep(300)
```
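`send_alert` is left as a placeholder. One minimal option, using only the standard library and a placeholder webhook URL, is to format the opportunity and POST it as JSON:

```python
import json
import urllib.request

WEBHOOK_URL = "https://example.com/your-webhook"  # placeholder

def format_alert(opp):
    """Render an opportunity dict into a one-line alert message."""
    return (f"[{opp['type']}] {opp['description']} "
            f"(deviation {opp['deviation_pct']:.1f}%)")

def send_alert(opp):
    """POST the alert to a webhook (Slack, Discord, etc.)."""
    body = json.dumps({"text": format_alert(opp)}).encode()
    req = urllib.request.Request(
        WEBHOOK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=10)
```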
## Get the Full Toolkit
The complete scanner, including all three analysis modules, the API client, and examples, is available here:
Polymarket Scanner Toolkit on Gumroad
Python 3.10+. A single dependency. Works out of the box.
What approaches are you using for prediction market analysis? I am especially interested in hearing about embedding-based semantic matching for cross-market analysis.