MeridianEdge

Building a Real-Time Consensus Engine for Prediction Markets — Architecture Deep Dive

Prediction markets are now a $1.3 trillion industry. But if you want to know the consensus probability on an event — a basketball game, a Fed rate decision, an election — you have to manually check a dozen different platforms.

I built Meridian Edge to solve this. It's a REST API that aggregates prediction market data from regulated sources and computes a real-time consensus probability for each event.

This post walks through the architecture.

The problem

Prediction market data is fragmented. Each platform has its own API (if it has one at all), its own data format, and its own pricing. There's no unified view.

If you're a researcher studying probability calibration, a developer building an AI agent that needs real-time event probabilities, or an analyst covering multiple categories — you're stuck doing manual work.

The data pipeline

The system processes data in three stages:

Stage 1: Collection
Snapshot jobs run every 14 seconds, pulling current state from each tracked source. We normalize everything into a common schema: event, outcome, probability, source, timestamp.
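As an illustration, the common schema and one per-source adapter could look like the sketch below. The raw field names (`title`, `price`) are hypothetical; every real source needs its own mapping:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Snapshot:
    event: str
    outcome: str
    probability: float
    source: str
    timestamp: str

def normalize(raw: dict, source: str) -> Snapshot:
    # Map a source-specific payload onto the common schema.
    # Field names are illustrative; each platform gets its own adapter.
    return Snapshot(
        event=raw["title"].strip(),
        outcome=raw["outcome"],
        probability=float(raw["price"]),  # many platforms quote price == implied probability
        source=source,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
```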

As of today, we're tracking 27,000+ active markets across five categories: sports (NBA, NHL, NFL, MLS, FIFA World Cup), politics (US Midterms, state races), economics (Fed rates, CPI, GDP), crypto, and weather.

Stage 2: Consensus computation
Raw probabilities from individual sources go through an ML-based consensus engine. The models retrain every 2 hours on fresh data. We use ensemble methods to weight sources based on historical accuracy, liquidity, and recency.

The output is a single consensus probability per event, plus divergence metrics showing where sources agree and disagree.
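A toy version of that weighting, assuming per-source accuracy and liquidity scores and a 10-minute recency half-life (the real ensemble is trained on historical data, not hand-tuned like this):

```python
import math
import time

def consensus(quotes, now=None):
    """Weighted consensus over per-source quotes.

    quotes: list of dicts with keys probability, accuracy (0-1),
    liquidity, ts (unix seconds). The weight formula and half-life
    are illustrative stand-ins for the trained ensemble.
    """
    now = now or time.time()
    half_life = 600.0  # assumed 10-minute recency half-life
    weights, probs = [], []
    for q in quotes:
        recency = 0.5 ** ((now - q["ts"]) / half_life)
        w = q["accuracy"] * math.log1p(q["liquidity"]) * recency
        weights.append(w)
        probs.append(q["probability"])
    p = sum(w * x for w, x in zip(weights, probs)) / sum(weights)
    spread = max(probs) - min(probs)  # simple cross-source divergence metric
    return round(p, 4), round(spread, 4)
```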

Stage 3: Storage and serving
Everything lands in PostgreSQL (currently 20GB and growing). The API serves requests through a Python backend running on EC2 with 110+ cron jobs keeping the pipeline running.

Over 100 million data points are stored with full historical coverage.
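A minimal sketch of the snapshot table (column names are assumptions; sqlite3 is used here only so the example is self-contained, while production runs PostgreSQL):

```python
import sqlite3

# In-memory stand-in for the Postgres snapshot table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE snapshots (
        event       TEXT NOT NULL,
        outcome     TEXT NOT NULL,
        probability REAL NOT NULL CHECK (probability BETWEEN 0 AND 1),
        source      TEXT NOT NULL,
        ts          TEXT NOT NULL
    )
""")
# Historical time-series queries filter on (event, ts), so index that pair.
conn.execute("CREATE INDEX idx_event_ts ON snapshots (event, ts)")
conn.execute(
    "INSERT INTO snapshots VALUES (?, ?, ?, ?, ?)",
    ("NBA: BOS vs CHA", "BOS", 0.59, "sourceA", "2026-03-29T15:30:00Z"),
)
row = conn.execute("SELECT probability FROM snapshots").fetchone()
```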

The API

Straightforward REST API. JSON responses. Example:

```json
{
  "event": "NBA: BOS vs CHA",
  "consensus_probability": 0.59,
  "sources": 2,
  "spread": 0.18,
  "updated_at": "2026-03-29T15:30:00Z",
  "category": "sports"
}
```

Endpoints cover:

  • Current consensus by category or event
  • Historical time series
  • Cross-source divergence
  • Settlement outcomes
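A minimal client sketch, assuming a hypothetical base URL and endpoint path (check the actual API docs for the real ones):

```python
import json
from urllib.request import urlopen

BASE = "https://api.example.com/v1"  # placeholder, not the real base URL

def fetch_consensus(category: str) -> list:
    # Hypothetical path and query parameter.
    with urlopen(f"{BASE}/consensus?category={category}") as resp:
        return json.load(resp)

def high_divergence(events: list, threshold: float = 0.15) -> list:
    # Pure helper: pick out events where sources disagree the most,
    # using the "spread" field from responses like the example above.
    return [e["event"] for e in events if e["spread"] >= threshold]
```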

What I learned building this

Normalization is 80% of the work. Every source structures its data differently. Event naming is inconsistent. Timestamps come in different formats. Building the normalization layer took longer than the ML engine.
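To make the event-naming problem concrete, here's an illustrative canonicalizer. The alias table and the hardcoded league prefix are toy assumptions; the real layer has per-source adapters per category:

```python
import re

# The same game arrives as "BOS vs CHA", "Celtics @ Hornets",
# "Boston Celtics vs. Charlotte Hornets", etc.
TEAM_ALIASES = {  # assumption: maintained per league
    "boston celtics": "BOS", "celtics": "BOS", "bos": "BOS",
    "charlotte hornets": "CHA", "hornets": "CHA", "cha": "CHA",
}

def canonical_event(raw: str) -> str:
    # Split on common separators, map aliases, sort for order-independence.
    parts = re.split(r"\s+(?:vs\.?|@|at)\s+", raw.strip(), flags=re.I)
    teams = [TEAM_ALIASES.get(p.strip().lower(), p.strip()) for p in parts]
    return "NBA: " + " vs ".join(sorted(teams))  # league prefix hardcoded for brevity
```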

Real-time is relative. 14-second snapshots with 10-minute consensus updates strike a good balance for prediction markets. These aren't high-frequency price feeds: probability estimates move on news cycles (minutes to hours), not microseconds.

110 cron jobs is a lot of cron jobs. Monitoring and alerting became critical once the pipeline grew beyond ~30 jobs. I built a watchdog system that checks every subsystem and reports twice daily.
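The core of such a watchdog is simple: compare each job's last successful run against how stale it's allowed to be. Job names and thresholds below are illustrative:

```python
import time

def stale_jobs(last_success: dict, max_age: dict, now=None) -> list:
    """Return job names whose last success is older than allowed.

    last_success: job name -> unix time of last successful run.
    max_age: job name -> allowed staleness in seconds (assumed per-job).
    """
    now = now or time.time()
    return sorted(
        job for job, ts in last_success.items()
        if now - ts > max_age.get(job, 3600)  # default: alert after 1h silence
    )
```

The returned list is what gets rolled into the twice-daily report.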

Who's using it

Early adopters fall into three buckets:

  1. AI agent developers — using prediction market probabilities as real-time context for LLM-based systems via MCP
  2. Researchers — studying probability calibration and market efficiency across platforms
  3. Data teams — building dashboards and analysis tools on top of the consensus data

Try it

Plans start at $29/mo for 1,000 API calls/day. Built for research teams, developers, and anyone who needs structured prediction market data.

I'm a solo founder in California. Happy to answer questions about the architecture, data pipeline, or the prediction market data landscape.

For informational purposes only. Not investment advice. Participation in prediction markets involves risk of loss.
