DEV Community

vesper_finch

I Scanned 11,000+ Prediction Markets for Arbitrage — Here's What I Found

Prediction markets like Polymarket have exploded in 2026, with 50M+ combined users and billions in monthly volume. Where there's volume, there should be inefficiencies. I built a Python scanner to find them.

The Hypothesis

In theory, prediction markets should have three types of exploitable mispricings:

  1. Outcome sum errors — mutually exclusive outcomes that don't sum to 100%
  2. Ladder contradictions — threshold markets that violate monotonicity
  3. Cross-market conflicts — related markets across events with inconsistent pricing

The Stack

Minimal by design:

```python
# That's it. No API key needed.
import requests

GAMMA_API = "https://gamma-api.polymarket.com"
events = requests.get(
    f"{GAMMA_API}/events",
    params={"closed": "false", "limit": 50},
    timeout=10,
).json()
```

Polymarket's Gamma API is public and returns everything: event metadata, market prices, liquidity, volume.

Results: Scanning 750 Events, 11,000+ Markets

Exclusive Outcome Markets (The "Guaranteed Profit" Test)

For events like "Who will win the 2028 Presidential Election?", exactly one outcome is true. The sum of all "Yes" prices should be 1.0.

```
2028 Presidential Election:
  35 valid markets, Sum(Yes) = 0.9955
  Deviation from 1.0: -0.45%
```

Verdict: Efficiently priced. The 0.45% gap is well below Polymarket's 2% fee. No free lunch here.
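The sum test itself reduces to a one-liner. `fee` below is a stand-in for the roughly 2% cost mentioned above, and the 1.0 payout only holds if the outcomes really are mutually exclusive:

```python
def exclusive_sum_edge(yes_prices: list[float], fee: float = 0.02) -> float:
    """Buying one share of every 'Yes' in a mutually exclusive event costs
    sum(prices) and pays exactly 1.0 at resolution, so the edge is
    1.0 - cost - fee. A positive result is a theoretical guaranteed profit."""
    return 1.0 - sum(yes_prices) - fee
```

Plugging in the election scan's numbers: a 0.9955 sum gives 1 − 0.9955 − 0.02 = −0.0155, i.e. the fee swamps the 0.45% gap.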

Ladder Contradictions (Logical Impossibilities)

If "BTC hits $100K" costs X, then "BTC hits $90K" should cost ≥ X (hitting 100K implies hitting 90K). Found 45 violations across the dataset.

Most were in low-liquidity sports prop markets where the spread barely covers transaction costs. But a few crypto threshold markets had meaningful deviations.
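The monotonicity check is a pairwise comparison over sorted thresholds. A sketch of that rule in my own formulation, with made-up quotes:

```python
def ladder_violations(
    rungs: list[tuple[float, float]],
) -> list[tuple[float, float]]:
    """rungs: (threshold, yes_price) pairs for one ladder.

    Hitting a higher threshold implies hitting every lower one, so a lower
    rung's 'Yes' price must be >= any higher rung's price. Returns the
    threshold pairs where adjacent rungs violate this."""
    ordered = sorted(rungs)  # ascending by threshold
    return [
        (lo, hi)
        for (lo, p_lo), (hi, p_hi) in zip(ordered, ordered[1:])
        if p_hi > p_lo  # higher threshold priced above a lower one
    ]

# Hypothetical BTC quotes: $100K priced above $90K is a contradiction.
btc = [(90_000, 0.55), (100_000, 0.60), (110_000, 0.20)]
```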

The Trap: Non-Exclusive Outcomes

The biggest gotcha: most Polymarket events are not mutually exclusive. "Which teams make the NBA Playoffs?" has 30 markets, but 16 teams advance — so the sum should be 16, not 1.

My scanner initially flagged hundreds of "opportunities" that were just this misunderstanding. Correctly classifying exclusive vs. independent outcomes is critical.
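To illustrate the shape of the problem, here is a crude keyword heuristic (entirely my own sketch, not the scanner's actual logic) for sorting event titles into exclusive vs. independent:

```python
# Hypothetical hint lists; a real classifier needs far more coverage
# and should fall back to "unknown" rather than guess.
EXCLUSIVE_HINTS = ("who will win", "which candidate", "winner of")
INDEPENDENT_HINTS = ("which teams", "who will make", "how many")

def classify_event(title: str) -> str:
    """Label an event title as 'exclusive', 'independent', or 'unknown'."""
    t = title.lower()
    if any(hint in t for hint in INDEPENDENT_HINTS):
        return "independent"
    if any(hint in t for hint in EXCLUSIVE_HINTS):
        return "exclusive"
    return "unknown"
```

Anything labeled "unknown" should be excluded from the sum test rather than assumed exclusive; that single choice eliminated most of my false positives.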

Architecture

```python
from dataclasses import dataclass

@dataclass
class Market:
    question: str
    yes_price: float
    no_price: float
    liquidity: float
    volume_24h: float
    condition_id: str

class ArbitrageScanner:
    def scan_exclusive_outcomes(self, events) -> list["Opportunity"]: ...
    def scan_ladder_contradictions(self, events) -> list["Opportunity"]: ...
    def scan_cross_market(self, events) -> list["Opportunity"]: ...
    def full_scan(self) -> dict: ...
```

Each scan returns Opportunity objects with confidence levels ("high" for true exclusive outcomes, "low" for potentially non-exclusive).
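The `Opportunity` type isn't shown above; a minimal version consistent with the confidence levels described might look like this (field names are my guesses, not the toolkit's actual definition):

```python
from dataclasses import dataclass, field

@dataclass
class Opportunity:
    kind: str        # e.g. "exclusive_sum" | "ladder" | "cross_market"
    description: str
    edge: float      # theoretical profit before fees, as a fraction
    confidence: str  # "high" for true exclusive outcomes, "low" otherwise
    markets: list[str] = field(default_factory=list)  # condition_ids involved
```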

What I Learned

  1. High-liquidity markets are efficient. Simple arbitrage is basically dead.
  2. The edge is in analysis speed, not execution speed. Finding logical contradictions across 11,000 markets is where AI/automation adds value.
  3. Non-exclusive outcome classification is the hardest part. Getting this wrong means you're trading on false signals.
  4. Infrastructure compounds. The real value isn't one scan — it's running this 24/7 and being first to spot new mispricings as markets open.
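Point 4 in practice is just a polling loop around a scanner exposing a `full_scan()` like the one above. The interval, the `"opportunities"` result key, and deduplicating on the description string are all arbitrary choices of mine:

```python
import time

def scan_once(scanner, seen: set[str]) -> list:
    """Run one scan and return only opportunities not reported before,
    keyed on the description string (a simplistic dedupe)."""
    fresh = []
    for opp in scanner.full_scan().get("opportunities", []):
        if opp.description not in seen:
            seen.add(opp.description)
            fresh.append(opp)
    return fresh

def run_forever(scanner, interval_s: int = 60) -> None:
    """Rescan on a fixed interval; a real version wants alerting and backoff."""
    seen: set[str] = set()
    while True:
        try:
            for opp in scan_once(scanner, seen):
                print(f"[{opp.confidence}] {opp.description}")
        except Exception as exc:  # API hiccups shouldn't kill the loop
            print(f"scan failed: {exc}")
        time.sleep(interval_s)
```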

Get the Code

The full toolkit (scanner + API client + examples) is available on Gumroad ($29).

Or build your own — the Gamma API is free and well-documented.
