Look-ahead bias is quietly breaking your AI agent backtests

deepak deepak — Sun, 07 Jun 2026 16:01:45 +0000

You can't trust a backtest that can see the future. In 2026 everyone is wiring LLM agents into trading and decision loops and backtesting them - and almost every setup leaks future data into a 'historical' decision. A backtest with look-ahead bias looks incredible and loses money live.

What look-ahead bias actually is

At time t, your agent should only see data up to and including bar t. The moment it can peek at bar t+1 - or computes an indicator over the entire series, or normalizes using stats from the whole dataset - its decisions are cheating and the equity curve is fiction.
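To make the leak concrete, here is a minimal sketch using only the standard library. The function names and price series are made up for illustration. It compares a z-score normalized with whole-series statistics against one that only uses bars 0..t, then changes nothing but the future:

```python
from statistics import mean, pstdev

def zscore_leaky(closes, t):
    """Normalize bar t with stats from the WHOLE series (look-ahead)."""
    return (closes[t] - mean(closes)) / pstdev(closes)

def zscore_point_in_time(closes, t):
    """Normalize bar t with stats from bars 0..t only."""
    visible = closes[: t + 1]
    spread = pstdev(visible)
    return 0.0 if spread == 0 else (closes[t] - mean(visible)) / spread

history = [100.0, 101.0, 99.0, 102.0, 103.0, 104.0]
calm_future = history + [104.0, 105.0]   # same past, quiet future
wild_future = history + [150.0, 40.0]    # same past, violent future

t = 5
# The point-in-time value at t is identical whatever happens later.
same = zscore_point_in_time(calm_future, t) == zscore_point_in_time(wild_future, t)
# The leaky value at t moves even though only the future changed.
leaks = zscore_leaky(calm_future, t) != zscore_leaky(wild_future, t)
print(same, leaks)
```

If a feature at bar t changes when you edit bars after t, that feature is reading the future.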

It is the single most common way backtests lie.

Why AI-agent frameworks make it worse

CrewAI, LangGraph, AutoGen and friends are built for conversations and tasks, not point-in-time simulation. The path of least resistance is to hand the agent the whole dataframe (or a tool that can query anything), and nothing stops it from reading tomorrow. The leak is invisible: your tests pass, your backtest rips, and live trading quietly bleeds.

The fix: make look-ahead a hard error

Instead of hoping your agent does not peek, make peeking impossible. bar-by-bar feeds your agent a frozen, point-in-time view that only exposes bars up to t. Read the future and you get an exception, not a silent bug:

LookaheadError: look-ahead blocked: requested bar 6 but only bars 0..5 are visible at t=5
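One way such a guard could work is sketched below. This is an illustration of the idea, not bar-by-bar's actual implementation: the class name PointInTimeView and its constructor are assumptions, and only the error message format is taken from the output above.

```python
class LookaheadError(IndexError):
    """Raised when an agent requests a bar that is not visible yet."""


class PointInTimeView:
    """Read-only window over bars 0..t (illustrative, not bar-by-bar's own class)."""

    def __init__(self, bars, t):
        if not 0 <= t < len(bars):
            raise ValueError(f"t={t} is outside the series")
        self._bars = tuple(bars[: t + 1])  # copy only what exists at time t
        self._t = t

    def __len__(self):
        return len(self._bars)

    def __getitem__(self, i):
        if i > self._t:
            raise LookaheadError(
                f"look-ahead blocked: requested bar {i} but only "
                f"bars 0..{self._t} are visible at t={self._t}"
            )
        if i < 0:
            raise IndexError("negative bar indices are not supported in this sketch")
        return self._bars[i]


view = PointInTimeView([10.0, 10.5, 10.2, 10.8, 11.0, 11.3, 12.0], t=5)
print(view[5])  # latest visible bar
try:
    view[6]     # bar 6 is in the future at t=5
except LookaheadError as err:
    print(f"LookaheadError: {err}")
```

Because the view copies only bars 0..t, the future is not reachable through it at all; the explicit check exists to turn an attempted peek into a clear, loud message.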

bar-by-bar in 60 seconds

A framework-agnostic harness: feed any agent point-in-time bars, it returns a Decision per bar, and you get PnL / Sharpe / Sortino / max-drawdown - with look-ahead impossible by construction.

from bar_by_bar import Harness, momentum_agent, synthetic_series

result = Harness().run(synthetic_series(), momentum_agent)
print(result.metrics)   # total_return, sharpe, max_drawdown, ...

git clone https://github.com/Viprasol-Tech/bar-by-bar
cd bar-by-bar && pip install -e .
bar-by-bar run --agent momentum
bar-by-bar lookahead-demo     # watch the guard block a cheating agent

Any agent works - wrap your LLM or strategy as a callable that takes the frozen view and returns a Decision. If it tries to look ahead, the run fails loudly instead of lying.
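For intuition, the core replay loop can be sketched in a few lines of plain Python. Everything here is hypothetical and independent of bar-by-bar's real API: the agent is any callable that receives a frozen tuple of closes up to t and returns a position of -1, 0, or +1, and that position earns the return from bar t to bar t+1.

```python
def run_bar_by_bar(closes, agent):
    """Replay closes one bar at a time; the agent only ever receives bars 0..t."""
    equity = [1.0]
    for t in range(len(closes) - 1):
        visible = tuple(closes[: t + 1])   # frozen copy: nothing past t exists here
        position = agent(visible)          # -1 short, 0 flat, +1 long (assumed convention)
        bar_return = closes[t + 1] / closes[t] - 1.0
        equity.append(equity[-1] * (1.0 + position * bar_return))
    return equity

def max_drawdown(equity):
    """Largest peak-to-trough loss, as a fraction of the peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, 1.0 - value / peak)
    return worst

def momentum(visible):
    """Go long if the last close is above the previous one, else stay flat."""
    return 1 if len(visible) >= 2 and visible[-1] > visible[-2] else 0

closes = [100.0, 102.0, 101.0, 103.0, 106.0, 104.0]
equity = run_bar_by_bar(closes, momentum)
print(f"total_return={equity[-1] - 1.0:.4f}  max_drawdown={max_drawdown(equity):.4f}")
```

The decision at bar t is made before the return to bar t+1 is computed, so the agent cannot profit from information it has not seen.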

Why it matters

If you are putting an LLM agent anywhere near money, an honest backtest is the difference between a real edge and an expensive illusion. Make look-ahead a loud error, not a hope.

It is MIT and open source: https://github.com/Viprasol-Tech/bar-by-bar

Not financial advice - backtests are not guarantees. If it helps you ship a more honest backtest, a star means a lot. Built by Viprasol Tech.

The math of finding edge in prediction markets (and the open-source tools I built)

deepak deepak — Sun, 07 Jun 2026 14:42:54 +0000

Prediction markets (Kalshi, Polymarket) went mainstream in 2026. What most people miss is that the edge is real and measurable — but it's scattered, and the fees quietly kill the naive version of it.

I build trading systems, and I kept running into the same three problems. So I open-sourced two tools to solve them. This post is the math behind them.

Problem 1: "Free money" that isn't (fee-adjusted arbitrage)

A binary market has a YES and a NO contract. If they pay $1 on resolution, then in an efficient market:

YES_price + NO_price = $1.00

Sometimes they don't. If you can buy YES at $0.48 and NO at $0.49, that's $0.97 for a guaranteed $1.00 payout — a locked-in $0.03. Free money!

Except… fees. Kalshi's trading fee is roughly:

fee = ceil(0.07 * contracts * price * (1 - price) * 100) / 100   # price and fee in dollars, rounded up to the next cent

Run the numbers and a 3-cent gross "arbitrage" can become a loss after both legs' fees. The naive scanners that just check YES + NO < 1 will happily hand you negative-EV trades.
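Here is that arithmetic as a small sketch. It assumes the approximate fee above (7% of contracts * price * (1 - price), rounded up to the next cent and charged on each leg); the helper names are hypothetical, and the real fee schedule may differ by market. Prices are integer cents so the rounding is exact.

```python
def kalshi_fee_cents(contracts, price_cents):
    """Approximate fee in cents: 0.07 * contracts * p * (1 - p), rounded up to a cent.

    With p = price_cents / 100 this is 7 * contracts * price_cents * (100 - price_cents) / 10000.
    Integer math avoids float surprises right at a cent boundary.
    """
    numerator = 7 * contracts * price_cents * (100 - price_cents)
    return -(-numerator // 10_000)  # ceiling division

def net_arb_cents(contracts, yes_cents, no_cents):
    """Guaranteed payout minus cost minus both legs' fees, in cents."""
    gross = contracts * (100 - yes_cents - no_cents)
    fees = kalshi_fee_cents(contracts, yes_cents) + kalshi_fee_cents(contracts, no_cents)
    return gross - fees

print(net_arb_cents(1, 48, 49))     # 3c gross, 2c + 2c fees  -> -1
print(net_arb_cents(100, 48, 49))   # 300c gross, 175c + 175c -> -50
```

Under these assumptions the YES at $0.48 plus NO at $0.49 trade loses money at both sizes, which is exactly the case a YES + NO < 1 check would flag as an opportunity.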

The fix is to compute everything fee-adjusted:

from edgehunt.arbitrage import single_market_arbitrage
# assumes `market` is already loaded and KalshiFeeModel is imported from edgehunt
op = single_market_arbitrage(market, fee_model=KalshiFeeModel())
print(op.fee_adjusted_profit)

Problem 2: the same event, two prices (cross-venue arbitrage)

The same event can trade at 47c YES on Kalshi and imply 51c on Polymarket. Buy YES on the cheaper venue and NO on the dearer one (47c + 49c = 96c for a guaranteed $1.00 payout) and, after fees, you can lock in a spread regardless of outcome. Doing this by hand across venues in real time is impractical; it's a scan-and-rank problem:

from edgehunt.scanner import Scanner
board = Scanner(feeds=[kalshi, polymarket]).scan()
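The scan-and-rank step can be sketched independently of edgehunt. The quotes, event names, and flat fee below are placeholders, and the sketch assumes NO costs exactly 100 minus the YES price on each venue, which real order books with bid/ask spreads will not give you.

```python
def locked_spread_cents(yes_a, yes_b, fees=0):
    """Profit in cents per $1 pair: buy YES where it is cheaper, NO where YES is dearer."""
    cheap_yes = min(yes_a, yes_b)
    no_on_dear_venue = 100 - max(yes_a, yes_b)
    return 100 - (cheap_yes + no_on_dear_venue) - fees

# Hypothetical YES quotes in cents: (venue A, venue B)
quotes = {
    "event-1": (47, 51),
    "event-2": (30, 31),
    "event-3": (62, 62),
}
FEES_PER_PAIR = 2  # placeholder total fees in cents for both legs

ranked = sorted(
    ((locked_spread_cents(a, b, FEES_PER_PAIR), name) for name, (a, b) in quotes.items()),
    reverse=True,
)
for profit, name in ranked:
    print(f"{name}: {profit:+d}c per pair")
```

Only event-1 survives the fee in this toy data; the one-cent gap on event-2 turns negative once fees are subtracted.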

Problem 3: you have an edge - how much do you bet? (EV + Kelly)

Your model says an outcome is 60% likely but the market prices it at 47c. Expected value (fee-adjusted) tells you whether to bet; the Kelly criterion tells you how much, so you grow long-run without going bust:

from edgehunt.ev import expected_value
from edgehunt.kelly import kelly_fraction, stake

ev = expected_value(price=0.47, edge_prob=0.60)
f  = kelly_fraction(price=0.47, edge_prob=0.60)
size = stake(f * 0.5, bankroll=1000)   # half-Kelly is safer

Half-Kelly is usually the sane default: full Kelly maximizes expected long-run growth but is brutally volatile, and it is only as good as your probability estimate.
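The formulas behind those numbers are short enough to write out. This is a sketch of the textbook binary-contract case, not edgehunt's implementation: the function names are hypothetical and fees are ignored unless passed in. Buying a $1-payout contract at price p with win probability q gives EV = q - p per contract and a Kelly fraction of (q - p) / (1 - p).

```python
def binary_ev(price, prob, fee=0.0):
    """Expected profit in dollars from buying one $1-payout contract at `price`."""
    return prob * 1.0 - price - fee

def binary_kelly(price, prob):
    """Kelly fraction of bankroll for a binary contract, ignoring fees.

    Net odds are b = (1 - p) / p, so f* = (b*q - (1 - q)) / b = (q - p) / (1 - p).
    Clamped at zero: no edge means no bet.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    return max(0.0, (prob - price) / (1.0 - price))

ev = binary_ev(price=0.47, prob=0.60)
f = binary_kelly(price=0.47, prob=0.60)
half_kelly_stake = 0.5 * f * 1000

print(f"EV per contract: ${ev:.2f}")                 # $0.13
print(f"full Kelly fraction: {f:.4f}")               # 0.2453
print(f"half-Kelly stake: ${half_kelly_stake:.2f}")  # $122.64
```

So on a $1,000 bankroll the 60% vs 47c example sizes to roughly $123 at half-Kelly before fees; subtracting fees from the edge shrinks both the EV and the stake.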

Putting it together: edgehunt scan

edgehunt ties all of this into one command — a live terminal dashboard that ranks every opportunity, fee-adjusted, with Kelly sizing:

              edgehunt :: ranked opportunities  (bankroll $1,000)
 # | Venue             | Market                       | Type          | Edge/EV   | Fee-adj profit | Kelly stake
 1 | kalshi/polymarket | incumbent wins popular vote? | ARB (x-venue) | $0.131/pr |        +15.1%  |       $500
 2 | kalshi            | BTC above $100k on Dec 31?   | ARB (1-mkt)   | $0.030/pr |         +3.1%  |       $500
 3 | kalshi            | Fed cuts rates in March?     | MISPRICE      |  +22.0pts |        +60.6%  |       $164

It runs offline in 60 seconds with a built-in fake feed (no API keys); wire a real Kalshi/Polymarket feed with a tiny adapter. Pure Python, MIT, 85 tests.

git clone https://github.com/Viprasol-Tech/edgehunt
cd edgehunt && pip install -e .
edgehunt scan

Bonus: give your AI agent the same edge (marketsmith)

The obvious next step in 2026 was to expose this to AI agents over the Model Context Protocol. marketsmith is an MCP server with tools like search_markets, get_odds, find_arbitrage, compute_ev, and suggest_kelly — so you can ask Claude or Cursor to find a fee-adjusted arb and it actually can.

A note on honesty

These are analytics tools, not execution bots, and nothing here is financial advice. Backtests and scans are not guarantees — fees, slippage, and resolution risk are real. Start read-only and understand every number before risking capital.

Both are MIT and open source:

edgehunt: https://github.com/Viprasol-Tech/edgehunt
marketsmith: https://github.com/Viprasol-Tech/marketsmith

If the math or the tools help you find an edge, a star genuinely helps me prioritize what to build next. Built by Viprasol Tech.

DEV Community: deepak deepak