RisingWave Labs

Building a Polymarket-Style Prediction Engine with RisingWave

Polymarket and similar platforms have proved something simple and powerful: price behaves like probability. Traders want to bet on elections, sports, and crypto outcomes with the speed and responsiveness of a modern exchange.

While the user experience feels simple, a prediction market is actually a distributed system dealing with two conflicting engineering constraints. First, you have the need for real-time price discovery. You must continuously update odds, order books, and liquidity as trades stream in at high volume. Second, you face the "fan-out" settlement problem. When a real-world event resolves (e.g., "Did Candidate X win?"), a single message from an oracle must instantly settle hundreds of thousands of open positions and update user balances without race conditions.

Most teams attack this with a fragmented stack: Kafka for ingestion, Flink for windowing, Redis for hot state, and Postgres for the ledger. This requires extensive glue code to orchestrate, and often forces the application layer to handle the complex logic of settlement fan-out.

RisingWave offers a different path.

It is a Postgres-compatible streaming database. Instead of writing glue code, you describe your entire backend logic (from live odds to settlement and balances) as Materialized Views in SQL. RisingWave maintains these views incrementally, effectively turning your database into a reactive engine that solves the "fan-out" problem automatically.

In this post, we will walk through how to build a production-grade prediction market architecture, moving from raw streams to instant settlement.

1. The Anatomy of a Prediction Market

To understand the architecture, it helps to first understand the two data streams that drive a prediction market.

Trading Stream: All the Bets

The first is the Trading Stream. Every time a user places a bet, whether matched via an AMM (Automated Market Maker) or a CLOB (Central Limit Order Book), the system appends a record like:

{
  "user_id": 12345,
  "market_id": 42,
  "side": "YES",
  "size": 100,
  "price": 0.57,
  "timestamp": "2025-06-01T12:34:56Z"
}

This stream is high volume and continuous. It is what drives the live price you see on the frontend.

Oracle Stream: The Final Answer

The second is the Oracle Stream. It is low volume but high impact. At some point, each market receives a resolution event, for example:

{
  "market_id": 42,
  "outcome": "YES",
  "resolved_at": "2025-06-02T03:00:00Z"
}

This single message decides who won and who lost in that market.

Where the Hard Part Lives: When Streams Meet

The core engineering challenge is what happens when these two streams meet.

An oracle event arrives for market_id = 42. The system must:

  • Find every trade for that market
  • Compute Profit and Loss (PnL) for each user based on their entry price
  • Update user balances safely and atomically

In a traditional architecture, this often triggers a large batch job or a heavy, locking SQL update once the market resolves.

With a streaming database, you can instead treat this as a continuous join between the Trading Stream and the Oracle Stream. When the oracle event appears, the fan-out settlement is performed automatically as part of the normal stream processing.
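To make the fan-out concrete, here is a minimal Python sketch of what a continuous join does conceptually. The in-memory index and function names are purely illustrative (a stand-in for the join state a streaming database maintains internally, not RisingWave's actual implementation): one oracle message matches against every buffered trade for that market in a single pass.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for streaming join state:
# open trades indexed by market_id.
open_trades = defaultdict(list)

def on_trade(trade):
    """Trading stream: index each incoming trade by its market."""
    open_trades[trade["market_id"]].append(trade)

def on_oracle(event):
    """Oracle stream: one resolution fans out to every open trade."""
    settlements = []
    for t in open_trades.pop(event["market_id"], []):
        won = t["side"] == event["outcome"]
        # Winning shares redeem at $1.00; losers forfeit their principal.
        pnl = (1 - t["price"]) * t["size"] if won else -t["price"] * t["size"]
        settlements.append({"user_id": t["user_id"], "pnl": pnl})
    return settlements

on_trade({"user_id": 1, "market_id": 42, "side": "YES", "size": 100, "price": 0.57})
on_trade({"user_id": 2, "market_id": 42, "side": "NO", "size": 50, "price": 0.43})
result = on_oracle({"market_id": 42, "outcome": "YES"})
# A single oracle message settled both positions.
```

The streaming database performs this matching as part of normal dataflow execution, so there is no separate "settlement job" to schedule.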

2. RisingWave: Streaming SQL with Tiered Storage

RisingWave looks and feels like Postgres, but it processes data as it arrives. You connect it to sources like Kafka or database CDC, define Materialized Views (MVs) to express your logic, and then query those views directly.

Crucially for this use case, RisingWave offloads its internal state to object storage (S3). This "Tiered Storage" architecture means you can maintain the state of millions of open positions or historical trades without being limited by RAM or managing complex RocksDB instances.

For our market, we will define four core views to handle prices, settlement, risk, and the ledger.

3. Real-Time Market State

The most immediate requirement is that when a user buys "YES", the price on the dashboard should tick from 0.50 to 0.51 instantly.

We start by ingesting the raw trades from Kafka.

CREATE SOURCE trades (
  trade_id   BIGINT,
  user_id    BIGINT,
  market_id  BIGINT,
  side       VARCHAR, -- 'YES' or 'NO'
  size       NUMERIC,
  price      NUMERIC,
  ts         TIMESTAMPTZ
) WITH (
  connector = 'kafka',
  topic = 'trades',
  properties.bootstrap.server = '...'
) FORMAT PLAIN ENCODE JSON;

To drive the UI, we create a materialized view that calculates the latest price and 24-hour volume. We use last_value ordered by the timestamp to ensure the price is deterministic.

CREATE MATERIALIZED VIEW market_prices AS
SELECT
  market_id,
  last_value(price ORDER BY ts) AS last_price,
  AVG(price)                    AS avg_price,
  SUM(size)                     AS total_volume_24h
FROM trades
WHERE ts > now() - interval '24 hours'
GROUP BY market_id;

Because this view is materialized, your API can serve the frontend by simply querying SELECT * FROM market_prices WHERE market_id = 42. The result is always fresh, eliminating the need to sync data to a separate Redis cache.
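For intuition, the aggregates this view maintains are equivalent to the following Python over one market's windowed trade history. This is an illustrative batch re-computation only; RisingWave maintains the same result incrementally as trades arrive.

```python
def market_snapshot(trades):
    """Mirror the aggregates in market_prices for a single market:
    last_value(price ORDER BY ts), AVG(price), SUM(size)."""
    ordered = sorted(trades, key=lambda t: t["ts"])
    prices = [t["price"] for t in ordered]
    return {
        "last_price": prices[-1],
        "avg_price": sum(prices) / len(prices),
        "total_volume_24h": sum(t["size"] for t in ordered),
    }

snap = market_snapshot([
    {"ts": 1, "price": 0.50, "size": 100},
    {"ts": 2, "price": 0.51, "size": 40},
])
# snap["last_price"] ticks to 0.51 after the second trade.
```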

4. Instant Settlement: The Streaming Join

This is where the architecture diverges from the traditional approach. Instead of writing a batch script that runs "after the game" to calculate winners, we define settlement as a continuous join between the trades stream and the oracle_feed.

First, we define the oracle source:

CREATE SOURCE oracle_feed (
  market_id   BIGINT,
  outcome     VARCHAR,
  resolved_at TIMESTAMPTZ
) WITH (
  connector = 'kafka',
  topic = 'oracle_feed',
  properties.bootstrap.server = '...'
) FORMAT PLAIN ENCODE JSON;

Next, we create the settlement engine. RisingWave maintains the state of all open trades. When an outcome message arrives, it immediately matches against that index and emits the calculated PnL for every affected trade.

CREATE MATERIALIZED VIEW settled_trades AS
SELECT
  t.trade_id,
  t.user_id,
  t.market_id,
  -- A winning share redeems at $1.00, so PnL is (1 - price) * size.
  -- A losing position forfeits its principal (price * size).
  CASE
    WHEN o.outcome = 'YES' AND t.side = 'YES' THEN (1 - t.price) * t.size
    WHEN o.outcome = 'NO'  AND t.side = 'NO'  THEN (1 - t.price) * t.size
    ELSE -1 * (t.price * t.size)
  END AS pnl,
  o.resolved_at
FROM trades t
JOIN oracle_feed o
  ON t.market_id = o.market_id;

This view is the engine of the platform. It handles the 1-to-N fan-out automatically. There is no loop logic to write and no state consistency to manage manually. As soon as the oracle message hits the Kafka topic, the PnL is calculated and ready for the ledger.
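The CASE expression encodes the standard binary-market payoff: a winning share redeems at $1.00, so profit is (1 − entry price) × size, while a losing position forfeits its principal. A small Python equivalent of that payoff rule (illustrative, with hypothetical function names):

```python
def trade_pnl(side, outcome, price, size):
    """Binary-market payoff mirroring the CASE in settled_trades:
    winners earn (1 - price) per share, losers forfeit price per share."""
    if side == outcome:
        return (1 - price) * size
    return -price * size

# Bought 100 YES shares at $0.57; market resolved YES -> ~$43 profit.
win = trade_pnl("YES", "YES", 0.57, 100)
# Bought 100 NO shares at $0.43; market resolved YES -> ~$43 loss.
loss = trade_pnl("NO", "YES", 0.43, 100)
```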

5. User Balances: The Ledger Pattern

A common anti-pattern in SQL is to attempt joining trades, transfers, and settlements all at once to calculate a user's balance. In a streaming context, this is inefficient.

A more robust approach is the Ledger Pattern. We treat every financial event—whether it's a deposit, a withdrawal, or a trading profit—as a line item in a unified ledger. We can then use a UNION ALL to combine these streams into a single flow of balance changes.

CREATE MATERIALIZED VIEW balance_ledger AS
-- 1. Cash movements from the transfers stream (a separate source for
--    deposits and withdrawals, defined the same way as trades)
SELECT user_id, amount AS change, ts FROM transfers
UNION ALL
-- 2. Realized PnL from the settled_trades view we created above
SELECT user_id, pnl AS change, resolved_at AS ts FROM settled_trades;

With the streams unified, calculating the live balance becomes a simple aggregation:

CREATE MATERIALIZED VIEW user_balances AS
SELECT
  user_id,
  SUM(change) AS current_balance
FROM balance_ledger
GROUP BY user_id;
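The ledger pattern boils down to a fold: every event is a signed line item, and a balance is the running sum per user. A Python sketch (the event shapes below are hypothetical, matching the ledger columns above):

```python
def apply_ledger(entries):
    """Fold a unified ledger (deposits, withdrawals, realized PnL) into
    per-user balances, mirroring SUM(change) GROUP BY user_id."""
    balances = {}
    for e in entries:
        balances[e["user_id"]] = balances.get(e["user_id"], 0) + e["change"]
    return balances

balances = apply_ledger([
    {"user_id": 1, "change": 500.0},   # deposit from transfers
    {"user_id": 1, "change": 43.0},    # realized PnL from a settlement
    {"user_id": 2, "change": -21.5},   # losing settlement
])
```

Because the ledger is append-only, the balance view is fully auditable: any balance can be re-derived by replaying its line items.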

Finally, to show users their active risk, we can aggregate the trades that have not yet settled to show their current exposure:

CREATE MATERIALIZED VIEW user_positions AS
SELECT
  user_id,
  market_id,
  SUM(CASE WHEN side = 'YES' THEN size ELSE -size END) AS net_exposure
FROM trades
GROUP BY user_id, market_id;
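Net exposure is a signed sum: YES size counts positive, NO size negative, so offsetting positions in the same market cancel. Illustrative Python for the same aggregation:

```python
def net_exposure(trades):
    """Signed position per (user, market): YES adds, NO subtracts,
    mirroring the CASE/SUM in user_positions."""
    positions = {}
    for t in trades:
        key = (t["user_id"], t["market_id"])
        delta = t["size"] if t["side"] == "YES" else -t["size"]
        positions[key] = positions.get(key, 0) + delta
    return positions

pos = net_exposure([
    {"user_id": 1, "market_id": 42, "side": "YES", "size": 100},
    {"user_id": 1, "market_id": 42, "side": "NO", "size": 30},
])
# User 1 is net long 70 YES shares in market 42.
```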

6. End-to-End Architecture

By pushing the complexity down into the database, the resulting architecture is remarkably clean.

In a typical stack, the box labeled MV: settled_trades would be a complex microservice or a heavy Spark/Flink job responsible for the fan‑out settlement logic. Here, that complexity is absorbed by the database’s join operator.

Similarly, the box labeled MV: user_balances replaces what is usually a fragile orchestration of updates between Redis and a permanent data store.

Because RisingWave offloads the internal state of these joins to S3, you can scale to millions of open positions without managing expensive infrastructure. The result is a system where correctness is derived from the stream itself, providing built‑in auditability through the immutable MV: balance_ledger pattern.

7. Conclusion

To build a prediction market like Polymarket, you need streaming performance for the UI and relational correctness for the money. RisingWave bridges this gap by allowing you to ingest raw streams, define complex business logic using standard SQL, and serve the results directly to your application.

If you are designing a high-frequency trading application or a prediction market, consider pushing the "hard parts" (state management and stream joining) down into the database. It allows you to focus on the market mechanics rather than the plumbing.

Get Started with RisingWave
