I Scanned 11,000+ Prediction Markets for Arbitrage — Here's What I Found
Prediction markets like Polymarket have exploded in 2026, with 50M+ combined users and billions in monthly volume. Where there's volume, there should be inefficiencies. I built a Python scanner to find them.
The Hypothesis
In theory, prediction markets should have three types of exploitable mispricings:
- Outcome sum errors — mutually exclusive outcomes that don't sum to 100%
- Ladder contradictions — threshold markets that violate monotonicity
- Cross-market conflicts — related markets across events with inconsistent pricing
The Stack
Minimal by design:
import requests

GAMMA_API = "https://gamma-api.polymarket.com"

# That's it. No API key needed.
events = requests.get(
    f"{GAMMA_API}/events",
    params={"closed": "false", "limit": 50},
    timeout=30,  # don't hang the scanner on a slow response
).json()
Polymarket's Gamma API is public and returns everything: event metadata, market prices, liquidity, volume.
Results: Scanning 750 Events, 11,000+ Markets
Exclusive Outcome Markets (The "Guaranteed Profit" Test)
For events like "Who will win the 2028 Presidential Election?", exactly one outcome is true. The sum of all "Yes" prices should be 1.0.
2028 Presidential Election:
35 valid markets, Sum(Yes) = 0.9955
Deviation from 1.0: -0.45%
Verdict: Efficiently priced. The 0.45% gap is well below Polymarket's 2% fee. No free lunch here.
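The check itself is one line of arithmetic, plus a fee filter so near-misses like the 0.45% gap above get discarded. A minimal sketch (function names are mine, not from the toolkit):

```python
def exclusive_outcome_deviation(yes_prices: list[float]) -> float:
    """Signed gap between the summed Yes prices and the ideal 1.0.

    For truly mutually exclusive outcomes, a negative value means the
    book is underpriced (buy every Yes), positive means overpriced.
    """
    return sum(yes_prices) - 1.0

def is_actionable(yes_prices: list[float], fee: float = 0.02) -> bool:
    """Only flag gaps big enough to survive the ~2% fee."""
    return abs(exclusive_outcome_deviation(yes_prices)) > fee
```

On the election numbers above, the deviation is -0.0045, so `is_actionable` correctly returns `False`.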
Ladder Contradictions (Logical Impossibilities)
If "BTC hits $100K" costs X, then "BTC hits $90K" should cost ≥ X (hitting 100K implies hitting 90K). Found 45 violations across the dataset.
Most were in low-liquidity sports prop markets where the spread barely covers transaction costs. But a few crypto threshold markets had meaningful deviations.
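Detecting these is a monotonicity check over the sorted ladder. A sketch of how I'd express it (a simplified stand-in for the scanner's version):

```python
def ladder_violations(
    markets: list[tuple[float, float]],
) -> list[tuple[float, float]]:
    """Given (threshold, yes_price) pairs for markets like 'BTC hits $X',
    return the (lower, higher) threshold pairs where the HIGHER threshold
    is priced ABOVE the lower one -- a logical impossibility, since
    hitting the higher level implies hitting the lower.
    """
    ordered = sorted(markets)  # ascending by threshold
    violations = []
    for (lo_t, lo_p), (hi_t, hi_p) in zip(ordered, ordered[1:]):
        if hi_p > lo_p:
            violations.append((lo_t, hi_t))
    return violations
```

For example, "$100K" trading at 0.65 while "$90K" trades at 0.60 is flagged; a properly decreasing ladder returns an empty list.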
The Trap: Non-Exclusive Outcomes
The biggest gotcha: most Polymarket events are not mutually exclusive. "Which teams make the NBA Playoffs?" has 30 markets, but 16 teams advance — so the sum should be 16, not 1.
My scanner initially flagged hundreds of "opportunities" that were just this misunderstanding. Correctly classifying exclusive vs. independent outcomes is critical.
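One cheap sanity check before flagging anything: in an efficient book the Yes prices sum to roughly the number of outcomes that will resolve Yes (1 for a single winner, 16 for the playoffs example). This heuristic is my own sketch, not a full classifier, and real classification should also use event metadata:

```python
def infer_winner_count(yes_prices: list[float]) -> int:
    """Heuristic: round the Yes-price sum to estimate how many outcomes
    the market expects to resolve Yes. Clamped to at least 1.
    """
    return max(1, round(sum(yes_prices)))

def looks_mutually_exclusive(yes_prices: list[float]) -> bool:
    """Treat the event as single-winner only if prices imply one winner."""
    return infer_winner_count(yes_prices) == 1
```

Thirty playoff markets priced around 0.53 each sum to ~15.9, so the heuristic infers 16 winners and refuses to treat the event as mutually exclusive.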
Architecture
from dataclasses import dataclass

@dataclass
class Market:
    question: str
    yes_price: float
    no_price: float
    liquidity: float
    volume_24h: float
    condition_id: str

class ArbitrageScanner:
    def scan_exclusive_outcomes(self, events) -> list[Opportunity]: ...
    def scan_ladder_contradictions(self, events) -> list[Opportunity]: ...
    def scan_cross_market(self, events) -> list[Opportunity]: ...
    def full_scan(self) -> dict: ...
Each scan returns Opportunity objects with confidence levels ("high" for true exclusive outcomes, "low" for potentially non-exclusive).
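The Opportunity type isn't shown above; one plausible shape, inferred from that description (field names are my guesses, not the toolkit's):

```python
from dataclasses import dataclass

@dataclass
class Opportunity:
    kind: str          # "exclusive_sum" | "ladder" | "cross_market"
    event_title: str
    deviation: float   # signed gap from the no-arbitrage price
    confidence: str    # "high" for true exclusive outcomes, "low" otherwise

opp = Opportunity(
    kind="exclusive_sum",
    event_title="2028 Presidential Election",
    deviation=-0.0045,
    confidence="high",
)
```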
What I Learned
- High-liquidity markets are efficient. Simple arbitrage is basically dead.
- The edge is in analysis speed, not execution speed. Finding logical contradictions across 11,000 markets is where AI/automation adds value.
- Non-exclusive outcome classification is the hardest part. Getting this wrong means you're trading on false signals.
- Infrastructure compounds. The real value isn't one scan — it's running this 24/7 and being first to spot new mispricings as markets open.
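The 24/7 part can be as simple as a fixed-cadence loop around the scanner. A minimal sketch (the `max_cycles` parameter is mine, added so the loop is testable; pass `None` for a true daemon):

```python
import time
from typing import Callable, Optional

def run_forever(
    scan_fn: Callable[[], object],
    interval_s: float = 300.0,
    max_cycles: Optional[int] = None,
) -> int:
    """Run a scan function every `interval_s` seconds; return cycle count."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        scan_fn()  # e.g. ArbitrageScanner().full_scan()
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_s)
    return cycles
```

In production you'd want error handling and backoff around `scan_fn`, but the shape is the same: the scan that runs every five minutes beats the one that runs once.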
Get the Code
The full toolkit (scanner + API client + examples) is available on Gumroad ($29).
Or build your own — the Gamma API is free and well-documented.