How prediction monitoring differs from single-shot forecasting

#machinelearning #predictions #ai #productivity

Three distinct layers for "predicting the future" now coexist, and they solve fundamentally different problems. We have been building Watching Agents at Inithouse, a platform for continuous prediction monitoring, and the most common question we get is: "How is this different from forecasting APIs or prediction markets?"

Short answer: they are complementary. Here is how.

Layer 1: Single-shot forecasting APIs

A forecasting API takes a question about a future event and returns a calibrated probability. You ask, it answers, done.

Foresight by Lightning Rod is a strong example. It is an OpenAI-compatible API that returns scored, calibrated forecasts. At the time of writing, their Foresight-v3 model ranks first on ProphetArena (an independent benchmark from UChicago) by Brier score, outperforming GPT-5, Gemini 3 Pro, and other frontier models. Their training approach uses reinforcement learning with time as a verifiable reward signal, which they credit with letting even their smaller models outpredict much larger ones.
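For readers who have not met the Brier score: for yes/no questions it is the mean squared difference between the forecast probability and the outcome (1 if the event happened, 0 if not), so lower is better. The sketch below is the textbook binary form in Python. We are not claiming it reproduces ProphetArena's exact scoring pipeline.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Binary Brier score: mean squared gap between forecast and outcome.

    forecasts: probabilities in [0, 1]; outcomes: 1 if the event happened, else 0.
    Lower is better; 0.0 is a perfect score.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("need at least one resolved question")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Three resolved questions that came out yes, no, no.
score = brier_score([0.9, 0.2, 0.7], [1, 0, 0])
```

Those three forecasts score about 0.18. A forecaster who always says 50% scores 0.25 on every question, which makes 0.25 a handy baseline to beat.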

This is powerful for developers who need a probability estimate inside a workflow. Feed it a question, get a number, make a decision. The interaction is stateless: one question, one answer, move on.
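In code, that stateless pattern is a function call followed by a branch. In this minimal sketch, `forecast` is a hypothetical stand-in for whatever client wraps the API (Foresight's actual request and response format is not reproduced here), and `decide`, `fake_forecast`, and the 0.6 threshold are illustrative names and values, not part of any real SDK.

```python
from typing import Callable

# Hypothetical signature for a forecasting client: question text in, probability out.
# A real client would send the question over HTTP and parse the probability it returns.
ForecastFn = Callable[[str], float]


def decide(question: str, forecast: ForecastFn, threshold: float = 0.6) -> str:
    """One question in, one probability out, one decision. Nothing is remembered."""
    p = forecast(question)
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"forecast must be a probability, got {p}")
    return "proceed" if p >= threshold else "hold"


def fake_forecast(question: str) -> float:
    # Fixed value purely for illustration; it ignores the question entirely.
    return 0.62


print(decide("Will the EU regulation pass by Q4 2026?", fake_forecast))  # proceed
```

Nothing survives between calls: asking the same question next month starts from scratch, which is the gap the monitoring layer described later is meant to fill.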

Layer 2: Prediction markets and community forecasting

Platforms like Polymarket and Metaculus aggregate human judgment into probability estimates, but through different mechanisms.

Polymarket uses real-money trading on a crypto-native exchange. Prices reflect market consensus, updated by every trade. Metaculus runs a non-monetary community where forecasters submit probability estimates and the aggregate tracks close to perfect calibration across thousands of resolved questions (events predicted at 70% actually occur about 70% of the time).
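The calibration claim can be checked mechanically: bucket resolved forecasts by stated probability and compare each bucket's average forecast with how often the event actually happened. A minimal Python sketch, assuming you already have the forecasts and their 0/1 outcomes; the ten-bin default is an arbitrary choice, and this is not Metaculus's own tooling.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes, bins=10):
    """Return (mean forecast, observed frequency, count) for each non-empty bin."""
    buckets = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"forecast must be a probability, got {p}")
        # min(...) keeps p == 1.0 in the top bin instead of creating an extra one.
        idx = min(int(p * bins), bins - 1)
        buckets[idx].append((p, o))
    table = []
    for idx in sorted(buckets):
        pairs = buckets[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(o for _, o in pairs) / len(pairs)
        table.append((mean_p, freq, len(pairs)))
    return table
```

A well-calibrated source shows the first two numbers close together in every well-populated bin, which is what "events predicted at 70% occur about 70% of the time" describes. Sparse bins are noisy, so the count is returned alongside.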

Both are excellent at capturing collective intelligence on well-defined binary or multi-choice questions. They struggle more with open-ended monitoring where the question itself might evolve as new information surfaces. And they require active participation: someone has to trade or update their forecast. If nobody is paying attention to a question, the signal stalls.

Layer 3: Continuous prediction monitoring

This is the layer we work in with Watching Agents. Instead of answering a question once, an AI agent watches it continuously. It builds hypotheses, tracks evidence as it appears in real time, and sends alerts when something changes.

The difference is structural, not just about update frequency. A forecasting API does not remember what it told you yesterday. A prediction market reacts to trades but does not explain why the probability shifted. A monitoring agent maintains context: it knows what evidence it collected last week, what hypotheses it has been tracking, and what specifically changed today.

We built Watching Agents because we kept running into situations where the interesting part was not "what is the probability right now" but "what just changed, what caused it, and should we care." That question requires persistent state, evidence tracking, and the ability to surface signals proactively rather than waiting for someone to ask.
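To make "persistent state" concrete, here is a minimal sketch of what a monitoring agent might carry between runs: named hypotheses with current probabilities, a dated evidence log, and a diff that answers "what changed since the last snapshot". All names (`WatchState`, `Evidence`, `what_changed`, `evidence_since`) and the 0.05 change threshold are hypothetical illustrations, not the Watching Agents data model.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Evidence:
    seen_on: date
    summary: str
    hypothesis: str  # name of the hypothesis this item bears on


@dataclass
class WatchState:
    """Hypothetical state a monitoring agent keeps between runs."""
    question: str
    hypotheses: dict[str, float] = field(default_factory=dict)  # name -> probability
    evidence: list[Evidence] = field(default_factory=list)

    def record(self, item: Evidence, new_probability: float) -> None:
        # Log the evidence and store the revised probability it led to.
        self.evidence.append(item)
        self.hypotheses[item.hypothesis] = new_probability

    def evidence_since(self, cutoff: date) -> list[Evidence]:
        # "What did we collect since last week?"
        return [e for e in self.evidence if e.seen_on >= cutoff]


def what_changed(before: dict[str, float], after: dict[str, float], min_delta: float = 0.05):
    """Hypotheses that are new or whose probability moved by at least min_delta."""
    changes = {}
    for name, p_after in after.items():
        p_before = before.get(name)
        if p_before is None or abs(p_after - p_before) >= min_delta:
            changes[name] = (p_before, p_after)
    return changes
```

Snapshot the probabilities with `dict(state.hypotheses)` before a run, record new evidence, then call `what_changed` on the two snapshots; pairing the result with `evidence_since` gives both the shift and the items that explain it.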

Where each layer fits

The three layers map to different jobs:

Forecasting APIs work best when you need a probability estimate inside code. Building a trading bot? Scoring deal risk? Feed the question to an API, get a number, branch your logic. Foresight and similar tools excel here because they are fast, cheap per call, and integrate into existing developer workflows.

Prediction markets and community platforms work best for calibrated consensus on high-profile questions. "Will X happen by Y date?" with thousands of participants correcting each other produces remarkably accurate probabilities. Metaculus is especially strong on long-horizon research questions where careful calibration matters more than trading volume.
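The simplest way to turn many individual forecasts into one number is to take their median; another common choice is to average in log-odds space, which treats a move from 90% to 99% as a bigger change than one from 50% to 59%. The sketch below shows both in Python. Neither is claimed to be Metaculus's actual aggregation method.

```python
import math
import statistics


def median_aggregate(forecasts: list[float]) -> float:
    # Robust to a few extreme forecasters.
    return statistics.median(forecasts)


def log_odds_aggregate(forecasts: list[float]) -> float:
    """Average forecasts in log-odds space, then map back to a probability."""
    if any(not 0.0 < p < 1.0 for p in forecasts):
        raise ValueError("log-odds need probabilities strictly between 0 and 1")
    logits = [math.log(p / (1 - p)) for p in forecasts]
    mean_logit = sum(logits) / len(logits)
    return 1 / (1 + math.exp(-mean_logit))
```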

Continuous monitoring works best when you care about a topic over time and want to be alerted when conditions shift. You deploy an agent on a question, it watches autonomously, and it tells you when something worth knowing happens. Our agents at Watching Agents maintain public pages showing their hypotheses, evidence, and watch signals, so you can inspect the reasoning at any point.

A concrete example

Say you want to track whether a specific regulation will pass in the EU by Q4 2026.

With a forecasting API, you call it today, get "62% likely," and use that number in your planning model. If you call again next month, you get a fresh estimate, but it has no memory of the first one and cannot tell you what changed.

With a prediction market, you watch the price move as traders react to news. You get a real-time probability, but if you want to understand why it moved from 62% to 71% last Tuesday, you need to research that yourself.

With a monitoring agent, you deploy it on the question once. It tracks parliamentary committee votes, lobbying disclosures, media coverage, and expert commentary as they appear. When something shifts, it tells you what happened and how it affects the outlook. You do not need to remember to check.
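The post does not say how the agent turns new evidence into a revised outlook, so the following is only one simple possibility: a Bayesian update in odds form, where each evidence item carries a likelihood ratio, plus an alert that fires when the probability moves by more than a threshold. The likelihood ratio of 1.5 for a committee vote and the 5-point alert threshold are made-up values chosen to reproduce the 62% to 71% move in the example above.

```python
def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Update a probability with one evidence item, in odds form.

    likelihood_ratio = P(evidence | event) / P(evidence | no event).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)


def maybe_alert(prior: float, evidence: str, likelihood_ratio: float, threshold: float = 0.05):
    """Return (posterior, message); message is None when the move is below threshold."""
    posterior = bayes_update(prior, likelihood_ratio)
    if abs(posterior - prior) < threshold:
        return posterior, None
    direction = "up" if posterior > prior else "down"
    return posterior, f"{evidence}: outlook moved {direction} from {prior:.0%} to {posterior:.0%}"


posterior, message = maybe_alert(0.62, "Committee vote", 1.5)
```

Here `message` reads "Committee vote: outlook moved up from 62% to 71%". A weak signal such as an op-ed with a likelihood ratio of 1.05 moves the estimate by about one point and stays silent, which is the point of the threshold: you hear about changes, not noise.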

These are genuinely different tools for different workflows, not competing products on a feature matrix.

They are complementary

A calibrated forecasting API could serve as one input signal feeding into a monitoring agent. A monitoring agent could surface when a prediction market price diverges from available evidence. Community forecasters could use monitoring feeds to stay current on questions they track.
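The first two combinations can be sketched in a few lines. Here an API forecast and an evidence-based estimate are merged with a weighted average (a linear opinion pool, one of the simplest ways to combine probability signals), and the result is compared against a market price. The signal names, equal weights, and 10-point tolerance are hypothetical choices for illustration.

```python
def blend(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Linear opinion pool: weighted average of the available probability signals."""
    total = sum(weights[name] for name in signals)
    return sum(signals[name] * weights[name] for name in signals) / total


def divergence_flag(market_price: float, evidence_estimate: float, tolerance: float = 0.10):
    """Describe the gap if the market sits more than `tolerance` away from the estimate."""
    gap = evidence_estimate - market_price
    if abs(gap) <= tolerance:
        return None
    side = "underprices" if gap > 0 else "overprices"
    return f"market {side} the event by {abs(gap) * 100:.0f} points relative to tracked evidence"


estimate = blend({"api": 0.70, "evidence": 0.80}, {"api": 1.0, "evidence": 1.0})
flag = divergence_flag(0.55, estimate)
```

With a market at 55% against a blended estimate of 75%, `flag` reports a 20-point underpricing; a market at 70% would fall inside the tolerance and return `None`.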

We think the prediction space is expanding, not consolidating. Single-shot APIs, crowd intelligence, and continuous monitoring each solve a different part of the problem. Our bet at Inithouse is that the monitoring layer is underbuilt relative to its usefulness, which is why we are working on it.

If you are building something in the prediction or forecasting space, we would be curious to hear which layer you find yourself reaching for most.