DEV Community

greymoth

Building an Inference OS: deterministic-first router for prediction markets

Building an Inference OS for prediction markets

Most AI agent stacks default to "throw the prompt at GPT-4o and hope for the best." For prediction markets that is both expensive and wrong: most market questions don't need a paid LLM at all. Here's how we built a 6-hook, deterministic-first inference router on top of Kairon Forge.

The 6 hooks (in priority order)

  1. Market Regime classifier: 5 deterministic regimes (whale_dominant / meme_volatile / macro_anchored / panic_liquidation / dead_liquidity). A confident classification short-circuits the entire router with zero LLM calls.
  2. Anomaly detector: 3σ price spike plus sentiment divergence. A confident anomaly forces Tier-2 (paid Claude/Anthropic), bypassing the viability cost cap for rare-and-important markets.
  3. Time-to-Resolution decay: exponential confidence decay against the event horizon. Low decayed confidence forces Tier-1 (Haiku-only).
  4. Persona overlay: 5 archetype priors (calibrated_researcher / whale_mimic / panic_seller / momentum_trader / contrarian) adjust the baseline confidence.
  5. Panic mode circuit breaker: burn-rate σ over a rolling 60 s window. A burn rate more than 2σ from baseline forces Ollama-only.
  6. Economic Viability Filter: per-tier hard cost cap (Free $0.05 / Pro $0.50 / Elite $5 / Enterprise $100). Requests over the cap are rejected with 402 quotaExhausted.
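
To make the priority ordering concrete, here is a minimal TypeScript sketch of three of the hooks above (anomaly, time decay, viability cap) evaluated first-match-wins. It is an illustration under assumptions, not the Kairon Forge code: the names `zScore`, `decayedConfidence`, `route`, and `RouteInput`, the 24-hour half-life, and the 0.5 confidence floor are all made up for this example, and the sentiment-divergence half of hook 2 is left out. Only the 3σ threshold, the tier caps, and the `402 quotaExhausted` response come from the list.

```typescript
// Hypothetical sketch, not the actual Kairon Forge router.
type Plan = "free" | "pro" | "elite" | "enterprise";

type Decision =
  | { kind: "tier2"; reason: string }
  | { kind: "tier1"; reason: string }
  | { kind: "rejected"; status: 402; error: "quotaExhausted" }
  | { kind: "proceed" };

// Per-tier hard cost caps in USD, as listed in the post.
const COST_CAP_USD: Record<Plan, number> = {
  free: 0.05,
  pro: 0.5,
  elite: 5,
  enterprise: 100,
};

// z-score of the latest price against a window of past prices
// (population standard deviation).
function zScore(history: number[], latest: number): number {
  if (history.length < 2) return 0; // not enough data to judge
  const mean = history.reduce((a, b) => a + b, 0) / history.length;
  const variance =
    history.reduce((a, b) => a + (b - mean) * (b - mean), 0) / history.length;
  const std = Math.sqrt(variance);
  if (std === 0) return latest === mean ? 0 : Math.sign(latest - mean) * Infinity;
  return (latest - mean) / std;
}

// Exponential decay of confidence with distance to resolution.
// Assumption: the further away the event, the lower the confidence.
function decayedConfidence(
  base: number,
  hoursToResolution: number,
  halfLifeHours: number,
): number {
  return base * Math.pow(0.5, hoursToResolution / halfLifeHours);
}

interface RouteInput {
  plan: Plan;
  priceHistory: number[];
  latestPrice: number;
  baseConfidence: number;
  hoursToResolution: number;
  estimatedCostUsd: number;
}

function route(input: RouteInput): Decision {
  // Hook 2: a 3-sigma spike forces Tier-2 and skips the cost cap.
  if (Math.abs(zScore(input.priceHistory, input.latestPrice)) >= 3) {
    return { kind: "tier2", reason: "anomaly" };
  }
  // Hook 3: low decayed confidence forces Tier-1 (assumed floor: 0.5).
  const conf = decayedConfidence(input.baseConfidence, input.hoursToResolution, 24);
  if (conf < 0.5) {
    return { kind: "tier1", reason: "low decayed confidence" };
  }
  // Hook 6: hard cost cap for the caller's plan.
  if (input.estimatedCostUsd > COST_CAP_USD[input.plan]) {
    return { kind: "rejected", status: 402, error: "quotaExhausted" };
  }
  return { kind: "proceed" };
}
```

Because the anomaly check returns before the cap is consulted, a spiking market on the Free plan still reaches Tier-2 in this sketch, which mirrors the "bypassing the viability cost cap" behaviour described in hook 2.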

Cost-aware Cognition

Before every paid call, the router runs an EIG/cost ratio gate, shouldEscalate(eig, cost, threshold=0.5): expected information gain (EIG) divided by inference cost. If the ratio falls below the threshold, the request collapses to Tier-1 and a budget consumption note is attached.
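
The gate described above could look roughly like this. Only the name `shouldEscalate`, its argument order, and the default threshold of 0.5 come from this post; the units of `eig` and `cost`, the handling of non-positive cost, and the `gatePaidCall` wrapper with its `GateResult` shape are assumptions for illustration.

```typescript
// Hypothetical sketch of the EIG / cost gate.
function shouldEscalate(eig: number, cost: number, threshold = 0.5): boolean {
  // Assumption: a call with no cost cannot fail the ratio test.
  if (cost <= 0) return true;
  return eig / cost >= threshold;
}

interface GateResult {
  tier: "tier1" | "tier2";
  note?: string; // budget consumption note, set only on collapse
}

// Assumed wrapper: decide the tier for a call that would otherwise be paid.
function gatePaidCall(eig: number, cost: number, threshold = 0.5): GateResult {
  if (shouldEscalate(eig, cost, threshold)) {
    return { tier: "tier2" };
  }
  const ratio = (eig / cost).toFixed(2);
  return {
    tier: "tier1",
    note: `EIG/cost ${ratio} below threshold ${threshold}; collapsed to Tier-1`,
  };
}
```

For example, `gatePaidCall(0.01, 0.1)` has a ratio of about 0.1, so it collapses to Tier-1 with a note, while `gatePaidCall(0.3, 0.2)` has a ratio of 1.5 and escalates.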

Test coverage

More than 350 inference tests cover the router's decision boundaries. Components under test: budget consumption gate, complexity classifier (trivial / medium / rare_hard), Tier-2 dispatch, recursion-depth and context-bloat guards, reflection-loop and duplicate-prompt detection.
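
Guards like the recursion-depth and duplicate-prompt checks are deterministic, which is what makes boundary tests cheap to write. Below is a minimal sketch of such a guard. The class name `LoopGuard`, the reason strings, and the whitespace/case normalisation are assumptions for illustration, not the project's actual implementation.

```typescript
// Hypothetical guard; names and limits are illustrative only.
class LoopGuard {
  private readonly seen = new Set<string>();
  private readonly maxDepth: number;

  constructor(maxDepth: number) {
    this.maxDepth = maxDepth;
  }

  // Returns a reason string when the call should be blocked, otherwise null.
  check(prompt: string, depth: number): string | null {
    if (depth > this.maxDepth) return "recursion_depth_exceeded";
    // Normalise case and whitespace so trivially different prompts still match.
    const key = prompt.trim().toLowerCase().replace(/\s+/g, " ");
    if (this.seen.has(key)) return "duplicate_prompt";
    this.seen.add(key);
    return null;
  }
}
```

A boundary test then only needs to pin the edges: depth equal to the limit passes, depth one above it is blocked, and a prompt that differs only in spacing or case counts as a duplicate.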

Why this matters

Cursor's silent auto-upgrade on quota exhaustion triggered a viral brand backlash and US state class-action allegations. We engineered a structural answer: tier caps, panic mode, and no auto-charge, all enforced at the router layer.

Source: github.com/greymoth-jp · Live: kairon.trade


This is part of the API Kernel work at services/kairon-guardian/. Happy to answer architecture questions.
