I. Core Value of Historical Data API in Quantitative Backtesting
In the world of quantitative trading, “data is the foundation, speed is the lifeline.” Especially in 2026, as quantitative strategies evolve toward higher frequency and multi-asset approaches, the choice of data source directly determines the accuracy of backtesting results and the feasibility of strategy deployment.
One often-overlooked fact is that a significant portion of the profit deviation between live trading and backtesting stems from data source issues — millisecond-level delays, timestamp misalignment, and data gaps. These technical details frequently become critical bottlenecks restricting strategy profitability.
For minute-level statistical arbitrage strategies, a delay of a few seconds may still be acceptable. However, for high-frequency market-making or cross-exchange arbitrage strategies, even a microsecond difference can determine success or failure.
Moreover, the cost of switching data sources is extremely high. Once a strategy becomes deeply coupled with a specific data source’s field definitions, timestamp formats, and error-handling logic, migrating to a new data source can require weeks or even months of refactoring.
Therefore, selecting a high-quality data API at the project initiation stage delivers far greater value than fixing issues later.
For quantitative developers, an ideal backtesting data API should meet the following core requirements:
- Historical Data Completeness: Covers a sufficiently long time span and supports multiple granularities, including daily, minute, and even Tick-level data.
- Data Consistency: Historical data and real-time market data maintain the same structure, reducing adaptation costs when switching between backtesting and live trading.
- Ease of Use: Provides standard REST API or WebSocket interfaces that integrate seamlessly with Python/Pandas and other quantitative toolkits.
- Controllable Cost: Offers reasonable free quotas and transparent pay-as-you-go pricing.
In actual technical selection, the iTick API provides a highly valuable solution. It covers major global markets including Hong Kong stocks, U.S. stocks, A-shares, forex, futures, and cryptocurrencies. It supports both RESTful and WebSocket interfaces to meet different scenario needs — REST API is suitable for batch historical data queries, while WebSocket provides low-latency data streams for real-time trading scenarios. Its historical data backtracking function supports up to 15 years of daily-level data downloads, providing reliable support for strategy backtesting.
II. Quick Overview of API Technical Capabilities
Before diving into code practice, here is a quick look at several core technical indicators of this API to help you determine whether it suits your strategy scenario.
Market Coverage: Covers multiple major global markets, including forex (GB market), stocks (Hong Kong HK, Shenzhen SZ, Shanghai SH, U.S. US, etc.), futures (US, HK, CN), and funds (US, etc.). A single API can meet the data needs of multi-asset strategies.
Data Granularity: Supports full-granularity K-line data, from tick-by-tick transaction (Tick) data up through minute, hour, daily, weekly, and monthly bars, satisfying needs ranging from high-frequency backtesting to long-term trend strategies.
Real-time Performance: In WebSocket real-time push mode, forex data latency is as low as 30ms, and major market quotes are kept within 100ms. Combined with global node acceleration networks, stable data transmission is maintained even in cross-market scenarios.
Interface Protocols: Provides both RESTful HTTP GET requests and WebSocket real-time push. REST API is suitable for batch historical data queries, while WebSocket is ideal for low-latency real-time data stream subscriptions. Both use the same authentication system and data format, resulting in very low switching costs.
III. Code Practice for Backtesting Data Pipeline
Below, we use the iTick API as an example to build a complete quantitative backtesting data pipeline — covering historical data acquisition, local caching, and integration with backtesting frameworks. All code is written in Python and can be copied and modified according to your needs.
3.1 Basic REST API: Retrieving Historical K-line Data
The REST API is the most commonly used method for acquiring historical data and is suitable for batch downloads and offline backtesting scenarios.
```python
import requests
import pandas as pd
import time
from typing import Optional, List


class HistoricalDataClient:
    """
    Historical data client based on the iTick REST API
    Documentation: https://itick.org
    """

    def __init__(self, api_token: str, base_url: str = "https://api.itick.org"):
        self.api_token = api_token
        self.base_url = base_url
        self.session = requests.Session()
        self.session.headers.update({
            "accept": "application/json",
            "token": api_token
        })

    def get_kline(
        self,
        symbol: str,
        region: str,
        ktype: str = "8",
        limit: int = 100,
        end_time: Optional[int] = None,
        max_retries: int = 3
    ) -> pd.DataFrame:
        """
        Get historical K-line data

        Args:
            symbol: Stock/forex code (e.g. "AAPL", "EURUSD")
            region: Market region (e.g. "US", "HK", "SH", "SZ", "GB")
            ktype: K-line type, "1"-"10" represent 1/5/10/30 min, 1/2/4 hour, daily/weekly/monthly
            limit: Number of K-lines to retrieve
            end_time: End timestamp (milliseconds), defaults to current time
            max_retries: Maximum retry attempts

        Returns:
            pandas DataFrame containing OHLCV data
        """
        endpoint = f"{self.base_url}/stock/kline"
        if end_time is None:
            end_time = int(time.time() * 1000)
        params = {
            "region": region,
            "code": symbol,
            "kType": ktype,
            "limit": limit,
            "et": end_time
        }
        for attempt in range(max_retries):
            try:
                resp = self.session.get(endpoint, params=params, timeout=30)
                resp.raise_for_status()
                data = resp.json()
                if data.get("code") != 0:
                    raise RuntimeError(f"API error: {data.get('msg')}")
                candles = data.get("data", [])
                if not candles:
                    return pd.DataFrame()
                df = pd.DataFrame(candles)
                # iTick returns the timestamp field as 't' (milliseconds)
                df["datetime"] = pd.to_datetime(df["t"], unit="ms")
                df.set_index("datetime", inplace=True)
                # Rename columns to standard OHLCV naming
                df.rename(columns={
                    "o": "open",
                    "h": "high",
                    "l": "low",
                    "c": "close",
                    "v": "volume"
                }, inplace=True)
                return df[["open", "high", "low", "close", "volume"]]
            except requests.exceptions.RequestException as e:
                if attempt == max_retries - 1:
                    raise RuntimeError(f"Failed to fetch data: {e}")
                time.sleep(2 ** attempt)  # Exponential backoff


# Usage example
client = HistoricalDataClient(api_token="{{YOUR_API_TOKEN}}")
df = client.get_kline(
    symbol="AAPL",
    region="US",
    ktype="8",   # Daily
    limit=200    # Last 200 daily bars
)
print(f"Retrieved {len(df)} daily bars")
print(df.head())
```
3.2 Batch Retrieval and Local Caching
For large-scale backtesting, frequent API calls not only run into rate limits but also slow down the backtest itself. Implementing a local data caching layer is strongly recommended:
```python
import sqlite3
from io import StringIO
from pathlib import Path


class CachedDataClient(HistoricalDataClient):
    """Enhanced client with local SQLite caching"""

    def __init__(self, api_token: str, cache_dir: str = "./data_cache"):
        super().__init__(api_token)
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(exist_ok=True)
        self._init_db()

    def _init_db(self):
        """Initialize the SQLite database"""
        self.db_path = self.cache_dir / "market_data.db"
        with sqlite3.connect(self.db_path) as conn:
            conn.execute("""
                CREATE TABLE IF NOT EXISTS kline_cache (
                    symbol TEXT NOT NULL,
                    region TEXT NOT NULL,
                    ktype TEXT NOT NULL,
                    end_time INTEGER NOT NULL,
                    limit_num INTEGER NOT NULL,
                    data_json TEXT NOT NULL,
                    cached_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                    PRIMARY KEY (symbol, region, ktype, end_time, limit_num)
                )
            """)

    def get_kline_cached(
        self,
        symbol: str,
        region: str,
        ktype: str = "8",
        limit: int = 100,
        end_time: Optional[int] = None,
        force_refresh: bool = False
    ) -> pd.DataFrame:
        """Check the cache first; call the API and cache the result on a miss"""
        if end_time is None:
            end_time = int(time.time() * 1000)
        # Cache key
        cache_key = (symbol, region, ktype, end_time, limit)
        # 1. Check the cache (unless a refresh is forced)
        if not force_refresh:
            with sqlite3.connect(self.db_path) as conn:
                cursor = conn.execute(
                    """SELECT data_json FROM kline_cache
                       WHERE symbol=? AND region=? AND ktype=?
                         AND end_time=? AND limit_num=?""",
                    cache_key
                )
                row = cursor.fetchone()
                if row:
                    print(f"✅ Cache hit: {symbol} ({region})")
                    # Wrap in StringIO: pandas 2.1+ deprecates passing
                    # a literal JSON string to read_json
                    return pd.read_json(StringIO(row[0]), orient="split")
        # 2. Cache miss: call the API
        print(f"⏳ Calling API: {symbol} ({region})")
        df = super().get_kline(symbol, region, ktype, limit, end_time)
        if df.empty:
            return df
        # 3. Save to cache
        with sqlite3.connect(self.db_path) as conn:
            conn.execute(
                """INSERT OR REPLACE INTO kline_cache
                   (symbol, region, ktype, end_time, limit_num, data_json)
                   VALUES (?, ?, ?, ?, ?, ?)""",
                (*cache_key, df.to_json(orient="split"))
            )
        return df


# Usage example
cached_client = CachedDataClient(api_token="{{YOUR_API_TOKEN}}")
df = cached_client.get_kline_cached("600519", "SH", ktype="8", limit=100)
print(f"Retrieved {len(df)} daily bars")
```
3.3 Integration with Backtesting Frameworks
After obtaining the data, it needs to be integrated into a quantitative backtesting framework. Here is an example using Backtrader:
```python
import backtrader as bt


class PandasDataFeed(bt.feeds.PandasData):
    """Convert a pandas DataFrame into a Backtrader data feed"""
    params = (
        ('datetime', None),
        ('open', 'open'),
        ('high', 'high'),
        ('low', 'low'),
        ('close', 'close'),
        ('volume', 'volume'),
        ('openinterest', -1),
    )


def run_backtest_with_iTick_data():
    """Run a backtest using iTick historical data"""
    cerebro = bt.Cerebro()
    # Initialize the client
    client = CachedDataClient(api_token="{{YOUR_API_TOKEN}}")
    # Fetch historical data for multiple stocks
    symbols = [
        {"symbol": "AAPL", "region": "US", "name": "Apple"},
        {"symbol": "MSFT", "region": "US", "name": "Microsoft"},
        {"symbol": "600519", "region": "SH", "name": "Kweichow Moutai"}
    ]
    for item in symbols:
        df = client.get_kline_cached(
            symbol=item["symbol"],
            region=item["region"],
            ktype="8",   # Daily
            limit=500    # Roughly two years of data
        )
        if not df.empty:
            data = PandasDataFeed(dataname=df)
            cerebro.adddata(data, name=item["name"])
            print(f"✅ Loaded {item['name']} data, {len(df)} bars")
    # Set initial capital
    cerebro.broker.setcash(100000.0)
    print(f"Initial capital: {cerebro.broker.getvalue():.2f}")
    # Add a strategy (the built-in simple moving-average crossover as an example)
    cerebro.addstrategy(bt.strategies.SMA_CrossOver)
    # Run the backtest
    results = cerebro.run()
    print(f"Capital after backtest: {cerebro.broker.getvalue():.2f}")


if __name__ == "__main__":
    run_backtest_with_iTick_data()
```
3.4 WebSocket Real-time Data Subscription
For scenarios requiring real-time strategy signal validation, iTick’s WebSocket interface provides low-latency data push capabilities. In real tests, forex data latency is as low as 30ms, and stock quotes stay within 100ms.
```python
import websocket
import json
import threading
import time


class iTickWebSocketClient:
    """iTick WebSocket real-time market data client"""

    def __init__(self, api_token: str):
        self.api_token = api_token
        self.ws_url = "wss://api.itick.org/sws"
        self.ws = None
        self.is_connected = False
        self.subscribed_symbols = set()
        self.on_quote_callback = None

    def connect(self):
        """Establish the WebSocket connection"""
        self.ws = websocket.WebSocketApp(
            self.ws_url,
            on_open=self._on_open,
            on_message=self._on_message,
            on_error=self._on_error,
            on_close=self._on_close,
            header={"token": self.api_token}
        )
        threading.Thread(target=self.ws.run_forever, daemon=True).start()

    def _on_open(self, ws):
        print("✅ WebSocket connection established")
        self.is_connected = True
        # Send the authentication message
        auth_msg = {"ac": "auth", "params": self.api_token}
        ws.send(json.dumps(auth_msg))

    def _on_message(self, ws, message):
        """Process received market data"""
        try:
            data = json.loads(message)
            if self.on_quote_callback:
                self.on_quote_callback(data)
        except json.JSONDecodeError:
            print(f"⚠️ Failed to parse message: {message}")

    def _on_error(self, ws, error):
        print(f"❌ WebSocket error: {error}")

    def _on_close(self, ws, close_status_code, close_msg):
        print("🔌 WebSocket connection closed")
        self.is_connected = False

    def subscribe(self, symbols: list, data_types: list = None):
        """
        Subscribe to real-time market data

        Args:
            symbols: List of symbols, e.g. ["600519$SH", "AAPL$US", "EURUSD$GB"]
            data_types: Data types: "quote", "depth", or "tick"; defaults to ["quote"]
        """
        if not self.is_connected:
            raise RuntimeError("WebSocket not connected. Please call connect() first.")
        if data_types is None:
            data_types = ["quote"]
        params = ",".join(symbols)
        types = ",".join(data_types)
        subscribe_msg = {
            "ac": "subscribe",
            "params": params,
            "types": types
        }
        self.ws.send(json.dumps(subscribe_msg))
        print(f"📡 Subscribed to: {params}")


# Usage example
def on_quote_received(data):
    """Market data callback function"""
    if data.get("ac") == "quote":
        print(f"Received quote: {data}")


# Create the client and subscribe
ws_client = iTickWebSocketClient(api_token="{{YOUR_API_TOKEN}}")
ws_client.on_quote_callback = on_quote_received
ws_client.connect()
# Wait for the connection to establish
time.sleep(2)
# Subscribe to real-time quotes for multiple instruments
ws_client.subscribe([
    "600519$SH",  # Kweichow Moutai (A-share)
    "AAPL$US",    # Apple (U.S. stock)
    "EURUSD$GB"   # EUR/USD (forex)
])
```
IV. Three Major Technical Considerations for Backtesting Data Quality
4.1 Look-Ahead Bias Protection
One of the most common mistakes when writing trading models is look-ahead bias — accidentally using future data in the code. For example, using the current day’s closing price when calculating technical indicators, while in real trading the closing price is unknown before the market closes. When using historical data APIs, always ensure the timestamp of the data represents the moment the trade occurred, not the moment the data was published. The K-line data returned by iTick includes millisecond-level timestamps, helping developers precisely control time logic in backtesting frameworks.
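The standard defense is mechanical: compute every indicator only from bars available at time t, then shift the resulting signal forward one bar so simulated fills happen at t+1. A minimal pandas sketch of the pattern (the prices are synthetic and the 3-bar window is arbitrary):

```python
import pandas as pd

# Synthetic daily closes for illustration
df = pd.DataFrame({"close": [100.0, 102.0, 101.0, 105.0, 107.0, 104.0]})

# 3-bar moving average; rolling() only sees closes up to and including bar t
df["sma"] = df["close"].rolling(3).mean()

# WRONG: trading at bar t's close on a condition that requires bar t's
# close uses information that is unknown before the session ends.
# RIGHT: shift the signal one bar so the order is placed on bar t+1.
df["signal"] = (df["close"] > df["sma"]).shift(1)

print(df)
```

The shift costs one bar of responsiveness, which is exactly the information delay a live system would face anyway.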
4.2 Adjusted Data Handling
Stock dividends, splits, and rights issues directly affect the continuity of price series. Using unadjusted historical prices for backtesting may generate false trading signals — for instance, a sudden “halving” of price after a split may trigger erroneous stop-loss signals. Professional-grade APIs usually provide adjustment options. When using them, confirm whether the data has been adjusted and understand the exact adjustment methodology.
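The effect is easy to demonstrate with a back-adjustment sketch. The split date and the 2-for-1 ratio below are hypothetical; in practice the adjustment factors would come from the data vendor rather than be hard-coded:

```python
import pandas as pd

# Illustrative closes with a 2-for-1 split between bar 2 and bar 3;
# the raw price "halves" even though no shareholder value was lost.
raw_close = pd.Series([200.0, 204.0, 206.0, 103.0, 104.0])

# Back-adjustment: scale pre-split prices by the split ratio (0.5 here)
# so the series is continuous in return space.
factor = pd.Series([0.5, 0.5, 0.5, 1.0, 1.0])
adj_close = raw_close * factor

# Raw series shows a spurious -50% "move" at the split; adjusted shows 0%.
print(raw_close.pct_change().iloc[3], adj_close.pct_change().iloc[3])
```

A stop-loss rule backtested on `raw_close` would fire at the split bar; on `adj_close` it correctly sees no price move at all.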
4.3 Cross-Market Timestamp Alignment
In cross-exchange arbitrage strategies, timestamp deviations between different exchanges’ data can lead to “false arbitrage signals.” For example, if the quote time difference for the same instrument between Market A and Market B exceeds 50ms, the price spread observed in backtesting may not exist in actual trading. iTick achieves millisecond-level synchronization of multi-market data through its global node acceleration network — something free data sources rarely guarantee.
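On the consumer side, a defensive alignment step helps regardless of the data source: join the two quote streams on timestamp with an explicit tolerance, and treat quote pairs further apart than the tolerance as unmatched rather than simultaneous. A sketch using `pandas.merge_asof` (the quotes and the 50ms tolerance are illustrative):

```python
import pandas as pd

# Hypothetical quote streams from two venues, millisecond timestamps
a = pd.DataFrame({
    "ts": pd.to_datetime([1000, 1100, 1200], unit="ms"),
    "price_a": [10.00, 10.02, 10.01],
})
b = pd.DataFrame({
    "ts": pd.to_datetime([1010, 1190, 1460], unit="ms"),
    "price_b": [10.05, 10.03, 10.08],
})

# For each venue-A quote, take the latest venue-B quote that is no more
# than 50 ms older; anything further apart stays NaN instead of being
# treated as a simultaneous observation.
aligned = pd.merge_asof(a, b, on="ts", direction="backward",
                        tolerance=pd.Timedelta("50ms"))
spread = aligned["price_b"] - aligned["price_a"]
print(aligned)
```

Spreads computed only over matched rows are the ones that could plausibly have been traded; the NaN rows are exactly the "false arbitrage signals" described above.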
V. Summary and Recommendations
For quantitative developers, the following path is recommended for efficient use of historical data APIs:
Validation Phase: Start with the free data quota to quickly validate strategy logic without early cost commitment. iTick’s free plan already includes basic real-time quotes and historical K-line queries, sufficient for initial strategy validation.
Optimization Phase: Once the strategy performs stably on basic data, use the high-quality historical data provided by the API for refined backtesting. Pay special attention to switching data granularities — upgrading from daily to minute or even Tick-level backtesting often reveals issues hidden at the daily level.
Live Trading Preparation: Ensure the historical data used in backtesting has exactly the same data structure, timestamp format, and field definitions as the live market data. iTick’s REST and WebSocket interfaces use the same authentication system and data format, making the transition from backtesting to live trading nearly seamless.
Cost Control: Make full use of the local caching mechanism to avoid repeatedly downloading the same data. After a one-time batch download and storage in SQLite, subsequent backtests can read directly from local storage — saving both cost and time. For long-term quantitative projects, it is recommended to build an automated data update script to periodically pull incremental data.
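An incremental update job needs one building block: a watermark query that reports the newest data already cached, so each scheduled run fetches only what is missing. A sketch against the `kline_cache` table from section 3.2 (the helper name `last_cached_ms` is ours, not part of any API):

```python
import sqlite3


def last_cached_ms(db_path: str, symbol: str, region: str, ktype: str) -> int:
    """Return the newest cached end_time (ms) for this series, or 0 if
    nothing is cached yet. Assumes the kline_cache schema from section 3.2."""
    with sqlite3.connect(db_path) as conn:
        row = conn.execute(
            """SELECT MAX(end_time) FROM kline_cache
               WHERE symbol=? AND region=? AND ktype=?""",
            (symbol, region, ktype)
        ).fetchone()
    return row[0] or 0


# A cron-scheduled job would compare this watermark with the current time,
# request only the bars published since, and insert them into the cache.
```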
Reference Documents:
https://blog.itick.org/itick-ema12-strategy-backtesting-tutorial
GitHub: https://github.com/itick-org/