DEV Community

Crene
# I track 420 prediction sources with AI. Here's the open-source framework.

Everyone makes predictions. Almost nobody tracks them.
Elon Musk has said "full self-driving next year" five years in a row. Jim Cramer's stock picks are famously inverse-correlated with outcomes. Media outlets make bold forecasts and quietly move on when they're wrong.
I built Crene — a platform that uses 4 LLMs (Claude, GPT-4, Gemini, Grok) to track predictions from 420+ sources across tech, finance, politics, and geopolitics. Today I'm open-sourcing the core framework.
## The problem
There's no standard infrastructure for prediction tracking. LangChain exists for agents. HuggingFace exists for models. Supabase exists for backends. But nothing exists for:
> "Who predicted what, when did they say it, and were they right?"
That's what Signal Tracker solves.
## Install

```bash
pip install signal-tracker
```

Zero dependencies. Stdlib only. Python 3.10+.
## 5-minute walkthrough

### Track sources and claims

```python
from signal_tracker import SignalTracker
from datetime import date

tracker = SignalTracker()

# Add sources
elon = tracker.add_source("Elon Musk", source_type="person", category="tech")
cramer = tracker.add_source("Jim Cramer", source_type="person", category="finance")
imf = tracker.add_source("IMF", source_type="institution", category="economics")

# Add predictions
claim1 = tracker.add_claim(
    source=elon,
    text="Tesla will achieve full self-driving by end of 2025",
    target_date=date(2025, 12, 31),
)

claim2 = tracker.add_claim(
    source=cramer,
    text="Netflix will hit $800 by Q2 2025",
    target_date=date(2025, 6, 30),
)

claim3 = tracker.add_claim(
    source=imf,
    text="Global GDP growth will reach 3.2% in 2025",
    target_date=date(2025, 12, 31),
)
```
### Verify when outcomes are known

```python
tracker.verify(claim1, outcome="wrong", reasoning="FSD not achieved by deadline")
tracker.verify(claim2, outcome="correct", reasoning="Netflix reached $820 in May")
tracker.verify(claim3, outcome="partial", reasoning="GDP grew 2.9%, close but below target")
```
### Build leaderboards

```python
board = tracker.leaderboard(min_claims=3)

for entry in board.top_accurate:
    print(f"{entry.rank}. {entry.source.name}: {entry.score.accuracy_score}%")
```

Also available:

```python
board.worst_accurate   # Bottom performers
board.biggest_risers   # Improving fast
board.biggest_fallers  # Getting worse
board.notable_wrongs   # High-profile misses
```
## The scoring system

### Accuracy scoring

Simple percentage-based accuracy, but with nuance:

- **Partial correctness weighting**: configurable (default 0.5 — a partial hit counts as half)
- **Minimum claim threshold**: sources need at least 3 resolved claims for a meaningful score
- **Time-windowed scoring**: calculate accuracy for 30d, 90d, 12mo, and all-time separately

```python
windows = tracker.accuracy_scorer.score_windowed(claims, source_id=source.id)
for period, snapshot in windows.items():
    print(f"  {period}: {snapshot.accuracy_score}%")
```
### Recency-weighted scoring

More recent predictions matter more. Uses exponential decay with a configurable half-life:

```python
from signal_tracker.scoring import AccuracyConfig

config = AccuracyConfig(recency_half_life_days=90)
tracker = SignalTracker(accuracy_config=config)
```

With a 90-day half-life, a prediction from last week carries roughly 15x the weight of one from a year ago. This catches sources who were historically good but have recently fallen off.
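The decay itself is one line. A sketch of the weight function, assuming pure exponential decay with no floor (the library's exact formula may differ):

```python
def recency_weight(age_days: float, half_life_days: float = 90.0) -> float:
    """Exponential decay: a claim's weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

print(recency_weight(7))    # ~0.95 — last week, near full weight
print(recency_weight(365))  # ~0.06 — a year old, mostly discounted
```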
### Claim quality scoring

Not all predictions are created equal. "Things will get better eventually" is not the same as "Bitcoin will reach $150k by Q4 2025."
The quality scorer rates each claim 0-100 based on:

| Factor | Weight | What it checks |
| --- | --- | --- |
| Time-bound | 30% | Has a specific deadline? |
| Measurable | 30% | Has numeric targets? |
| Falsifiable | 20% | Clear success/failure criteria? |
| Recency | 20% | How recent is the claim? |
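The composite is just a weighted sum of those four subscores. A sketch (the subscore names here are illustrative, not the library's actual fields):

```python
WEIGHTS = {"time_bound": 0.30, "measurable": 0.30, "falsifiable": 0.20, "recency": 0.20}

def quality_score(subscores: dict[str, float]) -> float:
    """Combine 0-100 factor subscores using the weights above."""
    return round(sum(WEIGHTS[k] * subscores[k] for k in WEIGHTS), 1)

print(quality_score({"time_bound": 100, "measurable": 100,
                     "falsifiable": 75, "recency": 50}))
# 85.0 — i.e. 30 + 30 + 15 + 10
```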
```python
from signal_tracker import QualityScorer

scorer = QualityScorer()
score = scorer.score(claim)  # 87.5 — highly trackable

# Filter your dataset to meaningful claims only
high_quality = [c for c in claims if scorer.is_high_quality(c)]
```
The scorer uses regex patterns to detect prediction language, dollar amounts, percentages, date references, and hedge words. Vague language ("might", "could", "eventually") gets penalized.
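As a rough illustration of that rule-based approach, here is a toy version of the pattern matching (these regexes are my own, not the library's):

```python
import re

PREDICTION = re.compile(r"\b(will|predicts?|forecasts?|expects?)\b", re.I)
NUMERIC = re.compile(r"\$\d[\d,]*(\.\d+)?[kKmMbB]?|\d+(\.\d+)?%")
DEADLINE = re.compile(r"\b(by|before|within)\b.*\b(20\d{2}|Q[1-4]|months?|weeks?)\b", re.I)
HEDGE = re.compile(r"\b(might|could|eventually|someday|possibly)\b", re.I)

def looks_trackable(text: str) -> bool:
    """Crude filter: prediction language plus a number or deadline, minus hedging."""
    return (bool(PREDICTION.search(text))
            and bool(NUMERIC.search(text) or DEADLINE.search(text))
            and not HEDGE.search(text))

print(looks_trackable("Bitcoin will reach $150k by Q4 2025"))  # True
print(looks_trackable("Things might get better eventually"))   # False
```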
## Extracting predictions from text

This is where it gets interesting. Feed it a transcript, article, or tweet and it pulls out the predictions.

### Rule-based (fast, no API calls)

```python
text = """
In his latest interview, the CEO predicted that revenue would
exceed $10 billion by Q2 2025. He also forecast that the company
would reach 100 million users within 18 months.
"""

claims = tracker.extract_claims(text, source=ceo)
for claim in claims:
    print(f"  {claim.text}")
    print(f"  Target: {claim.target_date}")
    print(f"  Category: {claim.category}")
    print(f"  Quality: {claim.quality_score}")
```
### LLM-powered (more accurate)

Bring your own LLM function:

```python
import anthropic

client = anthropic.Anthropic()

def my_llm(prompt: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

tracker = SignalTracker(llm_fn=my_llm)
claims = tracker.extract_claims(transcript, source=analyst, use_llm=True)
```

The LLM integration is completely model-agnostic. Any function with signature `(str) -> str` works — OpenAI, Anthropic, Gemini, local models, whatever.
## Multi-model consensus verification

This is how we verify claims in production at Crene. Instead of trusting one model, run multiple:

```python
tracker.verify_with_consensus(claim, [
    {"outcome": "correct", "verifier": "ai:claude", "confidence": 0.9},
    {"outcome": "correct", "verifier": "ai:gpt-4", "confidence": 0.85},
    {"outcome": "wrong", "verifier": "ai:gemini", "confidence": 0.6},
])
# Result: "correct" — weighted consensus wins
```

Outcomes are weighted by confidence scores. If three models agree with high confidence and one disagrees with low confidence, the consensus still holds.
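The aggregation rule can be sketched as a confidence-weighted vote (a simplification of mine; the library may also track disagreement or apply thresholds):

```python
def weighted_consensus(votes: list[dict]) -> str:
    """Sum confidence per outcome and return the heaviest outcome."""
    totals: dict[str, float] = {}
    for vote in votes:
        totals[vote["outcome"]] = totals.get(vote["outcome"], 0.0) + vote["confidence"]
    return max(totals, key=totals.get)

result = weighted_consensus([
    {"outcome": "correct", "confidence": 0.9},
    {"outcome": "correct", "confidence": 0.85},
    {"outcome": "wrong", "confidence": 0.6},
])
print(result)  # "correct" (1.75 vs 0.6 total confidence)
```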
## Tamper detection

Every claim gets a SHA-256 hash at creation time:

```python
claim = tracker.add_claim(source, "Bitcoin to $200k by 2025")
print(claim.content_hash)  # a1b2c3d4...

# Later, verify nothing was changed
claim.verify_integrity()  # True

# If someone modifies the text...
claim.text = "I never said that"
claim.verify_integrity()  # False — hash mismatch
```
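The mechanism is a few lines of stdlib. A sketch, assuming the hash covers the claim text (the library's actual hash input may include more fields, e.g. the creation timestamp):

```python
import hashlib

def content_hash(text: str) -> str:
    """SHA-256 of the claim text, hex-encoded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

original = content_hash("Bitcoin to $200k by 2025")
print(content_hash("Bitcoin to $200k by 2025") == original)  # True
print(content_hash("I never said that") == original)         # False
```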
## Persistence

### JSON (simple)

```python
tracker.save("my_tracker.json")
tracker = SignalTracker.load("my_tracker.json")
```

### SQLite (for larger datasets)

```python
from signal_tracker.storage import SQLiteBackend

backend = SQLiteBackend("tracker.db")
backend.save_source(source)
backend.save_claim(claim)

# Query
all_claims = backend.list_claims(source_id="elon-musk")
```
## Architecture

```
signal-tracker/
├── tracker.py      # SignalTracker — main interface
├── models.py       # Source, Claim, Verification, ScoreSnapshot
├── scoring.py      # AccuracyScorer, QualityScorer
├── extractors.py   # ClaimExtractor (rules + LLM)
├── leaderboard.py  # Leaderboard engine
└── storage.py      # SQLiteBackend
```

Design principles:

- **Zero required dependencies** — stdlib only for core
- **Bring your own LLM** — any provider works
- **Pluggable storage** — JSON, SQLite, or build your own
- **Plain dataclasses** — no ORM dependency anywhere

## What's next

The roadmap depends on what the community wants:

- **v0.2** — REST API server (FastAPI)
- **v0.3** — Auto-ingest from RSS, Twitter, YouTube transcripts
- **v0.4** — Dashboard UI (React)
- **v0.5** — Prediction market integrations (Polymarket, Kalshi)
- **v0.6** — Blockchain anchoring for tamper-proof records

## Try it

```bash
pip install signal-tracker
```

- GitHub: github.com/Creneinc/signal-tracker
- PyPI: pypi.org/project/signal-tracker
- Production version: crene.com (see the Signals tab)

40 tests passing. MIT licensed. Contributions welcome.
The framework is free. The data is the moat.
