You’re Probably Backtesting Forex with Too Little History: Here’s How We Verify

We’re a brokerage advisory team, and we spend a lot of time stress-testing forex strategies for our clients. If there’s one silent killer we’ve identified over the years, it’s this: forex API data history that’s too short.

Let’s walk through how we detect this problem and how we now structure our validation process.

What Our Clients Want

Traders who come to us want strategies that hold up in live conditions, not just in a perfect backtest. They need to know whether a model can survive a flash crash, a central bank surprise, or a prolonged low-volatility grind.

When a client shows us a strategy with stellar metrics, our immediate question is: How many years of data did you use? If the answer is two or three, we suspect the strategy hasn’t been stress-tested enough.

Where We Got Burned

We’ve been on both sides of the table. In our early days, we built strategies on APIs that provided only a few years of minute bars. The backtests were beautiful. When we later plugged in a decade of data, the performance imploded. That taught us the hard way: history length is not a detail, it’s a pillar of robustness. Short data windows give you the illusion of consistency by hiding the ugly parts.

The Invisible Differences Between APIs

Even when APIs claim “historical data,” the offerings differ in subtle ways:

Some provide data only from 2018, others from 2000;
Mixing tick data with K-line (candlestick) bars can distort entry/exit simulations;
High-volatility periods are often trimmed or smoothed;
The way the mid-price is calculated affects spread modeling.

These silent variations change your backtest distribution without ever throwing an error.
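Because none of these differences raise an error, it helps to measure what a feed actually delivered before trusting it. The following is a minimal sketch, not any provider's API: `audit_coverage`, `expected_step`, and `gap_factor` are hypothetical names, and the bars are synthetic. It reports the first and last timestamp, the span in years, and any stretch where bars are missing.

```python
from datetime import datetime, timedelta


def audit_coverage(timestamps, expected_step, gap_factor=3):
    """Summarize how much history a bar series actually covers.

    timestamps: sorted list of datetime objects, one per bar.
    expected_step: nominal bar spacing as a timedelta.
    gap_factor: spacing above expected_step * gap_factor counts as a gap.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    threshold = expected_step * gap_factor
    # Collect every pair of neighbouring bars that are too far apart
    gaps = [
        (prev, cur)
        for prev, cur in zip(timestamps, timestamps[1:])
        if cur - prev > threshold
    ]
    span_years = (timestamps[-1] - timestamps[0]).days / 365.25
    return {
        "first": timestamps[0],
        "last": timestamps[-1],
        "span_years": span_years,
        "gaps": gaps,
    }


# Synthetic hourly bars with one day removed to mimic a trimmed session
start = datetime(2020, 1, 1)
bars = [start + timedelta(hours=h) for h in range(24 * 10)]
bars = [t for t in bars if t.date() != datetime(2020, 1, 5).date()]

report = audit_coverage(bars, expected_step=timedelta(hours=1))
print(report["first"], report["last"], len(report["gaps"]))
```

On real forex data the weekend close is itself a gap, so the threshold would need to sit above it, or weekends would need to be filtered out first.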

What Happens to Your Backtest

When we lengthen the history, we routinely observe:

The equity curve changes shape: smooth stretches turn jagged;
Maximum drawdown grows, often doubling;
Win rate drops;
Trade-frequency and slippage assumptions calibrated on the short window stop holding.
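The drawdown effect is easy to reproduce on paper. Here is a small sketch with made-up equity values; `max_drawdown` and `win_rate` are illustrative helpers, not part of any backtesting library. It measures the same curve once over its calm recent stretch and once over the full history.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def win_rate(trade_pnls):
    """Share of trades with strictly positive profit."""
    return sum(1 for p in trade_pnls if p > 0) / len(trade_pnls)


# Made-up equity curve: a rough early stretch followed by a calm recent one
equity_long = [100, 110, 80, 90, 120, 125, 123, 130, 134]
equity_short = equity_long[-5:]  # only the calm recent stretch

print(max_drawdown(equity_short))
print(max_drawdown(equity_long))
```

The short window reports a drawdown of about 1.6%, the full curve about 27%. For a fixed equity curve, adding earlier history can only keep the maximum drawdown the same or raise it, which is why a short window flatters this metric.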

If your strategy is short-term or high-frequency, insufficient history makes it memorize one specific micro-regime. Out of sample means out of luck.

Our Go-To Verification: Time Slicing

We now segment the historical data into trailing windows of 1, 3, and 5 years, and run the strategy on each. A strategy that thrives only in the 1-year window is considered regime-dependent.

To gather the raw ticks for this, we’ve used interfaces like AllTick API, which provide long tick histories via WebSocket. We store the data and then slice it. The core code snippet we use is:

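import json

import pandas as pd
import websocket  # provided by the websocket-client package

MAX_TICKS = 100_000  # illustrative cap so the collection loop terminates
data = []

def on_message(ws, message):
    msg = json.loads(message)
    data.append({
        "time": msg["ts"],
        "price": msg["price"],
        "volume": msg["volume"],
    })
    # run_forever() blocks until the socket closes, so close it once enough ticks are stored
    if len(data) >= MAX_TICKS:
        ws.close()

ws = websocket.WebSocketApp(
    "wss://stream.alltick.co/ws",
    on_message=on_message,
)
ws.run_forever()  # returns after ws.close() or a disconnect

df = pd.DataFrame(data)
# Parse timestamps so comparisons are chronological, not string-based;
# pass unit="ms" (or "s") here if "ts" arrives as an epoch number
df["time"] = pd.to_datetime(df["time"])

# Slice by different time windows for backtesting, anchored to the latest tick
end = df["time"].max()
df_1y = df[df["time"] > end - pd.DateOffset(years=1)]
df_3y = df[df["time"] > end - pd.DateOffset(years=3)]
df_5y = df[df["time"] > end - pd.DateOffset(years=5)]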
Seeing the performance divergence across these slices has saved us — and our clients — from deploying fragile strategies.
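To make that comparison concrete, here is a minimal sketch of summarizing the same strategy over several trailing windows and flagging divergence. It uses plain lists of (day, return) pairs instead of a DataFrame, the daily returns are synthetic, and `slice_since`, `summarize`, and `is_regime_dependent` are hypothetical names. The flagging rule is one possible choice, not a standard.

```python
from datetime import date, timedelta


def slice_since(dated_returns, end, years):
    """Keep (day, return) pairs from the last `years` years before `end`."""
    cutoff = end - timedelta(days=round(365.25 * years))
    return [(d, r) for d, r in dated_returns if d > cutoff]


def summarize(dated_returns):
    """Compound the returns and track the max drawdown along the way."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for _, r in dated_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return {"total_return": equity - 1.0, "max_drawdown": worst}


def is_regime_dependent(results, short_key, long_key):
    """Hypothetical rule: profitable on the short window, losing on the long one."""
    return results[short_key]["total_return"] > 0 >= results[long_key]["total_return"]


# Synthetic daily strategy returns: losing until 2024, winning during 2024
end = date(2024, 12, 31)
day = date(2020, 1, 1)
history = []
while day <= end:
    history.append((day, 0.001 if day.year == 2024 else -0.001))
    day += timedelta(days=1)

results = {y: summarize(slice_since(history, end, y)) for y in (1, 3, 5)}
for years, stats in results.items():
    print(years, round(stats["total_return"], 3), round(stats["max_drawdown"], 3))
print(is_regime_dependent(results, 1, 5))
```

In this toy data the 1-year window looks flawless while the 3-year and 5-year windows lose money, so the strategy is flagged. In practice the rule would likely compare several metrics with tolerances rather than a single sign check.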

Our Advisory Upgrade

Our selection criteria for forex data sources have shifted from latency-first to depth-first. We care about how many market cycles are in the data, not just how recent it is. Short history can validate a strategy in a greenhouse, but only long history tests it in the wild.

If a strategy shines only in a narrow historical window, we tag it as a curve-fit artifact, not a real trading solution. That’s the standard we now hold for every strategy that carries a client’s capital.