DEV Community: Matthieu David

Your backtest is lying to you: 6 ways future data leaks in

Matthieu David — Fri, 22 May 2026 14:51:30 +0000

If you have ever built a strategy with a gorgeous equity curve that fell apart the moment you traded it live, you probably did not have a bad strategy. You had a leak.

Look-ahead bias is the bug that makes a backtest lie. Future information sneaks into a decision the strategy could not have known at the time, the historical results look incredible, and live trading is where reality collects the bill. ML people call it data leakage. Traders call it repainting. Same disease.

I build a backtesting engine, so I have stepped on every one of these. Here are the six ways future data leaks in, from the obvious to the ones that quietly destroy you.

1. Signaling on the bar that has not closed yet

The classic. You evaluate a condition on close[0], the current bar, and act on it. But in real time that bar is not finished. Its close is still moving. You are effectively deciding with information from the future of that candle.

The fix is boring and absolute: evaluate on close[1], the last confirmed bar. Every signal, every indicator input. If your backtest uses the forming bar anywhere, the numbers are fiction.
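
Here is a minimal sketch of that rule in plain Python. It assumes `closes` is a list whose last element is the still-forming bar; the function names and the moving-average condition are hypothetical, picked only to have something to evaluate.

```python
def sma(values, length):
    """Simple moving average of the last `length` values, or None if too short."""
    if len(values) < length:
        return None
    return sum(values[-length:]) / length


def long_signal(closes, length=3):
    """Return True if the last CONFIRMED close is above its own SMA.

    closes[-1] plays the role of close[0] (forming bar) and is never read.
    closes[-2] plays the role of close[1] (last confirmed bar).
    """
    confirmed = closes[:-1]            # drop the forming bar entirely
    average = sma(confirmed, length)
    if average is None:
        return False                   # not enough confirmed history yet
    return confirmed[-1] > average


# The forming bar spikes to 200, but the decision only sees bars that closed.
print(long_signal([10, 11, 12, 9, 200]))
```

Slicing the forming bar off before any indicator runs is the point: the rest of the function cannot read it by accident.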

2. Filling inside the bar you used to decide

Subtler. Say your rule is "enter when price breaks the high of the previous range." You detect the break using a bar's high, then you fill the entry at a price inside that same bar. But you only knew the high existed because the bar already printed it. In live trading you would have been filled on the way up, at a worse price, or not at all.

A candle gives you O, H, L, C and nothing about the path between them. If your entry and your detection both depend on the same candle's extremes, you are reading the future. Decide on the closed bar, fill on the next bar's open, and the lie disappears.
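
A rough sketch of "decide on the closed bar, fill on the next bar's open". The `Bar` type and `breakout_entry` function are hypothetical, and the breakout test uses the bar's close as a simplifying assumption.

```python
from typing import NamedTuple


class Bar(NamedTuple):
    open: float
    high: float
    low: float
    close: float


def breakout_entry(bars, level):
    """Return (fill_index, fill_price) for a long breakout above `level`.

    Detection reads a bar only once it has closed. The fill is the open of
    the following bar, the first price that exists after the decision.
    Returns None if nothing breaks out or there is no next bar to fill on.
    """
    for i in range(len(bars) - 1):
        if bars[i].close > level:      # decision made at the close of bar i
            return i + 1, bars[i + 1].open
    return None


bars = [
    Bar(99.0, 100.0, 98.0, 99.5),
    Bar(99.5, 103.0, 99.0, 102.0),     # closes above 100: signal
    Bar(102.5, 104.0, 101.0, 103.0),   # entry fills at this open
]
print(breakout_entry(bars, level=100.0))
```

The fill lands at 102.5, not at 100.0. That gap is the part of the equity curve an optimistic same-bar fill would have invented.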

3. Repainting indicators

Some indicators revise their own history. A value that showed X two bars ago now shows Y because new data arrived. The usual suspects are ZigZag, certain pivot detectors, anything that anchors to a later confirmed point, and a lot of the flashier "smart money" tools.

If you backtest against the repainted (final) values, you are testing on data that did not exist at decision time. The indicator looks psychic because it is. Test only against the value the indicator would have shown live, on the bar it would have shown it.
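
To make "the bar it would have shown it" concrete, here is a small sketch of a pivot-high detector that records when each pivot became knowable. The function name and the `left`/`right` parameters are assumptions for illustration, not any particular indicator's definition.

```python
def confirmed_pivot_highs(highs, left=2, right=2):
    """Return a list of (known_at, pivot_index) pairs.

    A pivot high at index p must be the unique maximum of the `left` bars
    before it and the `right` bars after it. It therefore cannot be known
    before index p + right, which is the earliest bar a live strategy
    could act on it.
    """
    events = []
    for p in range(left, len(highs) - right):
        window = highs[p - left : p + right + 1]
        if highs[p] == max(window) and window.count(highs[p]) == 1:
            events.append((p + right, p))
    return events


highs = [1, 2, 5, 3, 2, 4, 1]
for known_at, pivot in confirmed_pivot_highs(highs):
    print(f"pivot at bar {pivot} was only knowable at bar {known_at}")
```

A backtest that acts at `pivot` instead of `known_at` is trading on the repainted value.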

4. Normalizing with statistics from the whole series

This is the one that catches the ML crowd. You z-score your feature: subtract the mean, divide by the standard deviation. Easy. Except you computed the mean and standard deviation over the entire dataset, including data that comes after each point.

Every normalized value before today now carries a whisper of the future. Your model trains on it and looks brilliant in backtest. The fix is rolling, point-in-time statistics: at each bar, use only what was known up to that bar. Slower, correct.
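
A sketch of the difference, using only the standard library. The point-in-time version below uses an expanding window (all history up to and including the current confirmed bar); a fixed-length rolling window would work the same way. Function names are hypothetical.

```python
import statistics


def full_series_zscore(values):
    """Leaky: mean and std come from the whole series, future included."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def point_in_time_zscore(values, min_history=2):
    """Causal: each value is scaled with statistics known at that bar only."""
    out = []
    for i, v in enumerate(values):
        history = values[: i + 1]               # nothing after bar i
        if len(history) < min_history:
            out.append(None)
            continue
        std = statistics.pstdev(history)
        if std == 0:
            out.append(None)                    # flat history, z-score undefined
        else:
            out.append((v - statistics.fmean(history)) / std)
    return out


values = [1.0, 2.0, 3.0, 100.0]
# Causal values for the first three bars do not change when bar four arrives.
print(point_in_time_zscore(values)[:3] == point_in_time_zscore(values[:3]))
# The leaky version rewrites them.
print(full_series_zscore(values)[:3] == full_series_zscore(values[:3]))
```

The quick test for a leak is exactly that comparison: append future data and check whether any past value moved. This naive loop is quadratic; a running mean and variance would be the obvious optimization.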

5. Optimizing on the same data you report

You try 500 parameter combinations on 2016 to 2026, pick the best Sharpe, and report that Sharpe. You did not find an edge. You found the combination that best fit the noise of that exact decade. Run it forward and it reverts to mediocre.

This is just overfitting wearing a backtest costume. Walk-forward analysis is the honest answer: optimize on a window, test on the next unseen window, roll forward, and report only the out-of-sample results.
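
The windowing itself is simple. A minimal sketch, assuming bars are addressed by integer index and that the test window rolls forward by its own length so out-of-sample segments never overlap (the function name is hypothetical):

```python
def walk_forward_windows(n_bars, train_size, test_size):
    """Return a list of (train_range, test_range) index pairs.

    Each test window starts right after its train window, and the whole
    pair then rolls forward by `test_size`.
    """
    windows = []
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        windows.append((train, test))
        start += test_size
    return windows


for train, test in walk_forward_windows(n_bars=10, train_size=4, test_size=2):
    # Optimize parameters on `train`, then record performance on `test` only.
    print(f"optimize on {train.start}..{train.stop - 1}, "
          f"report on {test.start}..{test.stop - 1}")
```

The number you report is the concatenation of the `test` segments. The per-window best Sharpe from `train` never leaves the loop.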

6. Survivorship bias in the universe

You backtest a stock strategy on the instruments that exist today. But the ones that delisted, went to zero, or got acquired are missing. Your universe is pre-filtered for survivors, so your results skew up. You need point-in-time universe data, the list as it was on each date, not the list as it is now.
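
A sketch of what "the list as it was on each date" means in code. The listing records are made up for illustration, and the `universe_on` helper is hypothetical; real point-in-time data would come from a vendor that keeps delisted symbols.

```python
from datetime import date

# Made-up records: (symbol, listed_on, delisted_on or None if still trading).
LISTINGS = [
    ("AAA", date(2010, 1, 4), None),
    ("BBB", date(2012, 6, 1), date(2019, 3, 15)),   # gone today
    ("CCC", date(2020, 9, 1), None),
]


def universe_on(day, listings=LISTINGS):
    """Return the symbols that were tradable on `day`, sorted by name."""
    return sorted(
        symbol
        for symbol, listed_on, delisted_on in listings
        if listed_on <= day and (delisted_on is None or day < delisted_on)
    )


print(universe_on(date(2015, 1, 2)))   # includes the name that later delisted
print(universe_on(date(2021, 1, 4)))   # today's survivors
```

A backtest that uses the 2021 list for 2015 never holds BBB, so it never takes whatever loss BBB delivered on the way out.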

Why you cannot fix this with discipline

The tempting conclusion is "just be careful." That fails, because look-ahead bias is the default, not the exception. Every shortcut leaks. The current bar is right there. The full-series mean is one function call away. Careful breaks down the moment you are tired or moving fast.

The only thing that holds is architecture. In the engine I work on, a strategy literally cannot reference the forming bar. The evaluation step is causal by construction: it hands a block only the data that existed at that timestamp, so there is nothing future to leak. Anti-repainting is not a checkbox you tick; it is a property of the system, one you make impossible to violate. That is also why the no-code backtesting tool I am building enforces close[1] at the engine level rather than trusting the user to remember.
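
One way such a boundary could look, as a sketch only. This is not the engine's actual code; the `ConfirmedBars` class and its indexing convention are assumptions made for illustration.

```python
class ConfirmedBars:
    """Read-only view that holds confirmed closes and nothing else.

    Indexing follows the close[n] convention: view[1] is the last confirmed
    bar, view[2] the one before it. view[0] (the forming bar) is rejected,
    and later bars are not even stored in the view.
    """

    def __init__(self, closes, now):
        # `now` is the index of the forming bar; keep only what closed before it.
        self._closes = tuple(closes[:now])

    def __getitem__(self, lookback):
        if not isinstance(lookback, int) or lookback < 1:
            raise LookupError("only confirmed bars (lookback >= 1) are readable")
        if lookback > len(self._closes):
            raise LookupError("not enough history")
        return self._closes[-lookback]


closes = [10, 11, 12, 13]
view = ConfirmedBars(closes, now=3)    # bar 3 is still forming
print(view[1], view[2])                # last two confirmed closes
```

A strategy block that receives only `view` has no way to express the bug, which is a stronger guarantee than a code review.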

If you take one thing from this: a backtest is only as honest as the information boundary it respects. Make that boundary impossible to cross, and the curve you see is the curve you can actually trade.

What is the worst leak you have shipped to production? Mine was the z-score one. Looked like genius for a week.

The hardest part of building a no-code backtester wasn't the backtest. It was the export.

Matthieu David — Fri, 22 May 2026 14:42:47 +0000

Two backtest engines will never agree by default. I learned that the slow way.

I build a no-code backtester. You drag blocks (indicators, conditions, entries, exits), hit run, and get an equity curve in about 30 seconds on 10 years of minute data. The fun part was never the backtest itself. The part that ate months was the export: turning that visual strategy into Pine Script that, when you paste it into TradingView, produces the same trades my own engine did.

It almost never did at first. Same rules, same data, two completely different equity curves. If you have ever ported a strategy from one platform to another and watched the numbers fall apart, you already know the feeling.

Here is what actually causes the divergence, and how I got it under 2%.

Why two engines disagree

The naive assumption is that a strategy is a set of rules, and rules are rules. They are not. A strategy is rules plus an execution model, and the execution model is full of decisions nobody writes down.

1. Bar timing (this is the big one). When does a signal fire? On the close of the current, still-forming bar, or on the close of the last confirmed bar? If you evaluate close[0] (the current bar), your backtest looks amazing and your live results are garbage, because in real time that bar is not closed yet. This is repainting. The only honest answer is to evaluate on close[1], the previous confirmed bar. If your engine uses close[0] and your Pine code uses close[1] (or the reverse), the two will diverge on every single signal.

2. Indicator warmup and seeding. An EMA needs a starting value. Some implementations seed it with the first price, some with an SMA of the first N bars, some with zero. RSI has the same issue with its first average gain and loss. Run the same EMA(200) through two libraries and the first few hundred bars will not match. On a 10-year backtest that warmup stretch is a small fraction of the data, but if your entries cluster early it matters.

3. Order fill assumptions. A market order fired on a bar close: does it fill at that close, or at the next bar open? When stop loss and take profit sit inside the same candle, which one fills first? The candle only gives you O, H, L, C, so you cannot know the real path. You have to pick a rule (worst case: assume the stop hits first) and apply the exact same rule in both engines. Pick differently and your win rate shifts by several points.

4. Floating point and rounding. Tick size, price rounding, position sizing rounded to lots. Tiny per-trade, but it compounds across thousands of trades.

5. Sessions and gaps. Where does a daily bar start in UTC? How do you handle the weekend gap in forex? A one-hour offset in session boundaries silently shifts every intraday signal.
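
Point 5 is easy to underestimate, so here is a small sketch of how a session-start choice decides which daily bar a minute belongs to. The `daily_bar_date` helper and the convention of labeling a daily bar by the date its session opened are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def daily_bar_date(ts_utc, session_start_hour):
    """Return the date label of the daily bar that contains `ts_utc`.

    The trading day is assumed to start at `session_start_hour` UTC, and the
    bar is labeled with the calendar date on which that session opened.
    """
    return (ts_utc - timedelta(hours=session_start_hour)).date()


ts = datetime(2026, 5, 22, 21, 30, tzinfo=timezone.utc)
for start_hour in (0, 21, 22):
    print(f"day starts {start_hour:02d}:00 UTC -> bar {daily_bar_date(ts, start_hour)}")
```

With a 21:00 start the 21:30 minute opens a new daily bar; with a 22:00 start it is still inside the previous one. Every daily high, low, and close shifts accordingly.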

None of these are bugs. They are unwritten choices, and two engines made them independently.
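
Seeding (point 2) is the easiest of these choices to see in numbers. A sketch with two of the seeds mentioned above, first price and SMA of the first N bars; the `ema` function and its `seed` parameter are hypothetical and do not reproduce any specific library.

```python
def ema(values, length, seed="first"):
    """Exponential moving average with a selectable seed.

    seed="first": start from the first price.
    seed="sma":   start from the SMA of the first `length` prices, so the
                  first `length - 1` outputs are None.
    """
    alpha = 2 / (length + 1)
    if seed == "first":
        current = values[0]
        out = [current]
        rest = values[1:]
    elif seed == "sma":
        current = sum(values[:length]) / length
        out = [None] * (length - 1) + [current]
        rest = values[length:]
    else:
        raise ValueError(f"unknown seed: {seed}")
    for v in rest:
        current = alpha * v + (1 - alpha) * current
        out.append(current)
    return out


prices = [10.0, 20.0, 12.0, 13.0, 14.0, 15.0]
a = ema(prices, 3, seed="first")
b = ema(prices, 3, seed="sma")
for i in range(2, len(prices)):
    print(i, a[i], b[i], abs(a[i] - b[i]))
```

Both series follow the same recurrence, so the gap shrinks by a factor of `1 - alpha` per bar and never quite reaches zero. With a length of 200, `alpha` is small and the mismatch lingers for hundreds of bars.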

The fix: stop trying to match Pine, and define one model both can express

My first instinct was to make my Python engine reproduce TradingView exactly. Wrong direction. Pine is a black box that changes, and you cannot unit test someone else's cloud.

What worked was the opposite. I defined a single execution model that is the lowest common denominator both engines can express without ambiguity, then forced both sides to obey it:

Signals evaluate on confirmed bars only. close[1], never close[0]. The engine enforces this at the system level, so a strategy literally cannot reference the forming bar. Repainting stops being a discipline problem and becomes impossible.
Market orders fill at the next bar open. No "fill at this close" shortcut.
Same-candle stop and target resolve worst case first.
One indicator implementation, with the warmup recurrence written to match Pine's documented behavior (not a generic library default).
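
The same-candle rule from the list above can be sketched in a few lines. The `resolve_exit` function is hypothetical, and it ignores gaps (a bar that opens beyond the stop or target), which a real engine would have to handle separately.

```python
def resolve_exit(bar_high, bar_low, stop, target, side="long"):
    """Resolve a bar against a stop and a target, worst case first.

    Returns ("stop", price), ("target", price), or None if neither level
    was touched. When both sit inside the candle, the stop wins, because
    O, H, L, C alone cannot say which was hit first.
    """
    if side == "long":
        stop_hit = bar_low <= stop
        target_hit = bar_high >= target
    elif side == "short":
        stop_hit = bar_high >= stop
        target_hit = bar_low <= target
    else:
        raise ValueError(f"unknown side: {side}")

    if stop_hit:                      # checked first on purpose
        return ("stop", stop)
    if target_hit:
        return ("target", target)
    return None


# Long from 100 with stop 96 and target 104; the candle spans both levels.
print(resolve_exit(bar_high=105.0, bar_low=95.0, stop=96.0, target=104.0))
```

The rule is pessimistic on purpose. What matters for parity is that the exported script encodes the same ordering.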

Once the model is pinned down, the export becomes a compiler problem instead of a guessing game.

Deterministic codegen, not string templating

Each visual block maps to a small, pure Pine fragment. An RSI block is the same Pine every time. A "crosses below" condition is the same Pine every time. Strategy export is just composing those fragments in topological order and wiring the inputs. No branching on "what did the user probably mean," because the model already removed the ambiguity.
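
A toy version of that composition step, assuming a block graph stored as a dict. The block schema, the `FRAGMENTS` table, and `export_pine` are all hypothetical; the only Pine calls used are `ta.rsi` and `ta.crossunder`, and a real export would also need the script header and strategy calls that this sketch omits.

```python
from graphlib import TopologicalSorter

# One fixed fragment per block type: same block, same Pine, every time.
FRAGMENTS = {
    "rsi": "{out} = ta.rsi(close, {length})",
    "const": "{out} = {value}",
    "crosses_below": "{out} = ta.crossunder({a}, {b})",
}


def export_pine(blocks):
    """Emit one Pine line per block, dependencies first.

    `blocks` maps a block name to {"type", "params", "inputs"}, where
    `inputs` maps a fragment argument to the name of an upstream block.
    Names are sorted before building the graph so that the output does not
    depend on dict or set iteration order.
    """
    graph = {
        name: sorted(blocks[name].get("inputs", {}).values())
        for name in sorted(blocks)
    }
    lines = []
    for name in TopologicalSorter(graph).static_order():
        spec = blocks[name]
        fields = {"out": name}
        fields.update(spec.get("params", {}))
        fields.update(spec.get("inputs", {}))
        lines.append(FRAGMENTS[spec["type"]].format(**fields))
    return "\n".join(lines)


blocks = {
    "signal": {"type": "crosses_below", "inputs": {"a": "rsi14", "b": "level"}},
    "rsi14": {"type": "rsi", "params": {"length": 14}},
    "level": {"type": "const", "params": {"value": 30}},
}
print(export_pine(blocks))
```

Because the fragments are fixed and the ordering is a function of the graph alone, two exports of the same strategy are byte-identical, which is what lets a harness diff them.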

That determinism is what makes the result testable, which leads to the only part that actually guarantees anything.

The parity harness is the whole product

A "2% divergence guarantee" means nothing if you measure it once by hand. So the real work was a test harness:

Generate a batch of strategies covering the block library (trend, mean reversion, SMC patterns, multi-condition).
Run each through my engine on a fixed dataset.
Export each to Pine, run it on the same symbol and timeframe in TradingView, pull the results.
Assert that trade count, win rate, and final equity diverge by less than 2%. If any strategy breaks the threshold, the build fails.
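
The assertion step might look like the sketch below. How "divergence" is measured is an assumption here (absolute difference relative to the larger of the two values), as are the function names and the shape of the stats dicts; the TradingView side is represented by a plain dict because pulling those results is outside the scope of a snippet.

```python
def relative_divergence(a, b):
    """Absolute difference relative to the larger magnitude (0.0 if both are 0)."""
    scale = max(abs(a), abs(b))
    return 0.0 if scale == 0 else abs(a - b) / scale


def parity_failures(engine_stats, pine_stats, threshold=0.02,
                    metrics=("trade_count", "win_rate", "final_equity")):
    """Return {metric: divergence} for every metric at or above `threshold`."""
    failures = {}
    for metric in metrics:
        divergence = relative_divergence(engine_stats[metric], pine_stats[metric])
        if divergence >= threshold:
            failures[metric] = divergence
    return failures


engine = {"trade_count": 200, "win_rate": 0.55, "final_equity": 12500.0}
pine = {"trade_count": 198, "win_rate": 0.552, "final_equity": 12430.0}

failures = parity_failures(engine, pine)
# In CI this is the line that fails the build.
assert not failures, f"parity broken: {failures}"
```

Returning the failing metrics instead of a bare boolean makes it easier to trace a red build back to the unwritten assumption behind it.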

Most of the early failures were not in the codegen. They were in the model assumptions above. Every time a strategy blew past 2%, it pointed at one more unwritten decision I had not pinned down. The harness was less a test and more a way to find the assumptions I did not know I was making.

What I would tell anyone building cross-engine anything

The rules are the easy part. The execution model is the product.
Anti-repainting is not a feature you add. Repainting is the default, and you have to actively forbid it. Enforce close[1] at the system level or your users will footgun themselves.
Do not chase parity by reverse engineering the other engine. Define a model both can express and make both obey it.
If you claim a number, gate it in CI. A guarantee you cannot reproduce automatically is marketing, not engineering.

I am building this as a visual tool so traders who cannot code can still get an honest backtest and walk out with clean Pine Script. If you want to see the export side of it, take a look at the visual backtester I work on. But the parity lessons above apply whether you use it or roll your own.

If you have shipped cross-platform strategy export and solved the same-candle stop/target problem differently, I would genuinely like to hear how. That one still keeps me up.

Why I Built a No-Code Backtesting Engine (And What I Learned About Anti-Repainting)

Matthieu David — Thu, 02 Apr 2026 14:20:43 +0000

I'm a trader and full-stack dev. I spent 5 years trading Forex, got funded on FTMO, lost it all when market conditions changed. The one thing that could have saved me? Proper backtesting before going live.

The problem is, backtesting tools suck for non-coders. TradingView requires Pine Script. MetaTrader requires MQL. QuantConnect requires Python/C#. If you're a visual trader using SMC/ICT concepts, you're stuck manually scrolling through charts bar by bar.

So I built Backtrex - a no-code backtesting platform where you drag visual blocks instead of writing code.

The anti-repainting problem nobody talks about

Here's something I discovered while building the backtesting engine: most indicators repaint and nobody warns you about it.

Repainting means an indicator changes its past values based on future data. Your backtest shows a perfect entry signal, but in live trading that signal never existed at that candle. Your backtest is lying to you.

Common offenders:

ZigZag indicators - they redraw when a new pivot forms
Renko charts - bar boundaries shift as new data arrives
Community Pine Script indicators - many use close (current bar) instead of close[1] (confirmed bar)

Our engine enforces a simple rule: every indicator only uses close[1] (the previous confirmed bar). Current bar data is never used in signal generation. This alone eliminates the most common source of false backtest results.

The architecture decision that saved us

We considered two approaches for the backtest engine:

Event-driven (process each tick/bar sequentially)
Vectorized (compute all signals at once using numpy arrays)

We went with event-driven even though it is slower, because it accurately models what happens in live trading. You can't know the next bar's close when you're deciding to enter on the current bar. Vectorized backtests often introduce subtle lookahead bias, because the whole series is already in memory and a single misaligned index reads a future bar.

The tradeoff: we had to optimize heavily to keep backtests under 30 seconds on 10 years of M1 data. Cython for hot paths, pre-computed indicator caches, and a custom candle aggregation pipeline.
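
As an illustration of the aggregation step only, not of the optimized pipeline, here is a naive sketch that folds M1 bars into larger candles. It assumes the input is contiguous and starts on a candle boundary, and it drops a trailing incomplete group so that a half-built candle is never exposed. The `aggregate` function is hypothetical.

```python
def aggregate(bars, size):
    """Fold (open, high, low, close) tuples into candles of `size` bars.

    Only complete groups are returned. The unfinished group at the end is
    the forming higher-timeframe candle and is left out on purpose.
    """
    candles = []
    for start in range(0, len(bars) - size + 1, size):
        chunk = bars[start : start + size]
        candles.append((
            chunk[0][0],                  # open of the first bar
            max(b[1] for b in chunk),     # highest high
            min(b[2] for b in chunk),     # lowest low
            chunk[-1][3],                 # close of the last bar
        ))
    return candles


m1 = [
    (1.0, 2.0, 0.5, 1.5),
    (1.5, 3.0, 1.0, 2.0),
    (2.0, 2.5, 1.8, 2.2),
    (2.2, 2.4, 2.0, 2.1),
    (2.1, 5.0, 2.0, 4.0),   # fifth bar: its 2-bar candle is not complete yet
]
print(aggregate(m1, size=2))
```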

What I'd do differently

Start with fewer indicators. We built 50+ blocks. 80% of users use the same 10. Ship the core fast, add the rest later.
Don't underestimate Pine Script export. Users want to deploy on TradingView after validating in our tool. The export parity guarantee (<2% divergence) took 3x longer than expected.
Build for the community you're in. Our best early users are SMC/ICT traders because nobody else builds native Order Block and FVG detection. Find your niche before going broad.

If you're interested, the landing page is at backtrex.com. Still in early access, feedback welcome.

Building in public. AMA in the comments.