San Si wu

Posted on Jun 6

Quant Backtesting: Stock/Forex APIs with iTick

#tutorial #python #api

I. From "Beautiful Backtests" to "Live Trading Failures": The First Lesson in Quantitative Data Interfaces

A common dilemma in quantitative hedge fund development is that strategies perform brilliantly in backtesting but frequently fail in live trading. Severe slippage, signal lag, and even complete divergence between backtest curves and live trading returns—these issues are not uncommon in the industry. According to industry observations, over 85% of quantitative strategy failures stem primarily from delayed market data or unstable API interfaces. More bluntly put, candlestick bars are merely sampled price data, while tick data represents the market's holographic recording. Candlesticks are "result-oriented" data, reflecting only the price range within a specific time period; whereas tick data is "process-oriented," clearly displaying the complete price fluctuation process from open to close, including intermediate price movements, trading concentration zones, and buy-sell order book dynamics.

For quantitative hedge funds, data source selection for backtesting systems is not simply about "choosing an API"—it determines the credibility of strategy validation and the migration cost from research to production. The switching cost of data sources is extremely high—once strategies are deeply coupled with field definitions from a particular data source, migration means weeks or even months of refactoring work.

This article approaches historical data API selection from a quantitative development perspective, decomposes evaluation criteria, compares mainstream solutions, and provides complete Python integration code examples using iTick API to help you avoid pitfalls in building backtesting systems.

II. Selection Criteria: Five Core Dimensions for Evaluating Historical Data APIs

2.1 Data Granularity

Different strategies have vastly different requirements for data granularity. Daily-level historical backtesting versus tick-level live high-frequency strategies impose dramatically different demands on API design. A quantitative hedge fund's backtesting system should at least cover the following granularities:

Tick-level data: Trade-by-trade records for high-frequency strategy validation and order flow analysis;
Minute-level candlesticks: Foundation for intraday strategy backtesting;
Daily-level and longer periods: Medium-to-low frequency strategy validation and factor mining.

When selecting, confirm whether the API natively supports tick-level data downloads and the maximum historical lookback period available.
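
To make the relationship between ticks and candlesticks concrete, here is a minimal sketch that folds trade ticks into 1-minute OHLCV bars. The tick layout (millisecond timestamp, price, volume) and the function name are assumptions chosen for illustration, not a format defined by any particular API.

```python
from collections import OrderedDict

def ticks_to_minute_bars(ticks):
    """Aggregate (timestamp_ms, price, volume) ticks into 1-minute OHLCV bars.

    The tuple layout is hypothetical; adapt it to your data source.
    """
    bars = OrderedDict()
    for ts_ms, price, volume in sorted(ticks, key=lambda t: t[0]):
        bucket = ts_ms - ts_ms % 60_000  # start of the minute, in ms
        bar = bars.get(bucket)
        if bar is None:
            # First tick of the minute sets open/high/low/close
            bars[bucket] = {"timestamp": bucket, "open": price, "high": price,
                            "low": price, "close": price, "volume": volume}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # last tick seen so far in this minute
            bar["volume"] += volume
    return list(bars.values())

sample_ticks = [
    (1_700_000_040_000, 100.0, 5),
    (1_700_000_050_000, 101.5, 2),
    (1_700_000_070_000, 99.0, 1),
    (1_700_000_100_000, 100.5, 4),  # falls into the next minute
]
minute_bars = ticks_to_minute_bars(sample_ticks)
for bar in minute_bars:
    print(bar)
```

The aggregation only goes one way: the path the price took inside each minute cannot be recovered from the bar, which is why tick-level strategies need tick-level history.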

2.2 Data Quality and Completeness

Poor-quality data can be more harmful than no data at all: 0.1% data loss can distort backtest results, causing traders to deploy erroneous strategies in live trading. Data quality issues mainly include:

Gaps and missing data: Self-recorded data often has gaps due to network fluctuations;
Adjustments and corporate actions: Handling historical contract rollovers is cumbersome; inaccurate adjustments for dividends and stock splits can severely distort backtests;
Inconsistent formats: Field definitions vary wildly across different exchanges.

Therefore, choosing API providers with direct exchange connections or official authorization, validated by the market, is crucial.
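
Whatever the provider, it is worth measuring gaps yourself before trusting a backtest. The following is a rough sketch of a gap check over bar timestamps; the function name and the millisecond-timestamp input are assumptions for illustration.

```python
def find_gaps(timestamps_ms, expected_step_ms):
    """Return (gap_start, gap_end, missing_bars) for each hole in a bar series."""
    gaps = []
    ordered = sorted(set(timestamps_ms))
    for prev, curr in zip(ordered, ordered[1:]):
        # Number of whole bars that should sit between prev and curr
        missing = (curr - prev) // expected_step_ms - 1
        if missing > 0:
            gaps.append((prev, curr, missing))
    return gaps

# Minute bars with two bars missing between 120000 and 300000
bar_times = [0, 60_000, 120_000, 300_000, 360_000]
gaps = find_gaps(bar_times, expected_step_ms=60_000)
print(gaps)
```

For markets with fixed sessions, overnight and weekend breaks will also show up as gaps, so the result needs to be filtered against the trading calendar before it is read as data loss.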

2.3 Multi-Asset Coverage and Unified Interface

Quantitative hedge funds often run multiple strategies across asset classes, with equity, forex, and commodity futures strategies operating in parallel. If each asset class requires a different API, you end up managing a separate key, client, and data format for every one before any strategy code is written. Field mapping alone can take days.

Therefore, APIs supporting cross-asset unified access can significantly reduce system complexity. Ideally, equities, forex, futures, and precious metals should be accessible through the same interface specification, with unified field definitions and millisecond-precision timestamps, eliminating the need for extensive data cleansing and alignment code.

2.4 Historical Data Depth and Lookback Period

Backtesting systems have hard requirements for historical data coverage. Daily-level data needs at least 10+ years to cover complete bull-bear cycles, minute-level data should have 3+ years, and tick-level depends on strategy frequency. When selecting, be sure to confirm the maximum lookback period supported by the API and the per-request record limit.

2.5 Protocol and Integrability

In 2026, any financial API that doesn't support WebSocket can basically be excluded from serious trading. For historical data backtesting, REST API is sufficient for batch queries; but for real-time validation and subsequent live deployment, WebSocket's millisecond-level push capability is indispensable. When selecting, pay attention to whether the API provides both REST and WebSocket protocols, and whether it has a comprehensive Python SDK.

III. Horizontal Comparison of Mainstream Equity/Forex Historical Data APIs

The following is a core comparison of mainstream financial data APIs in the market as of 2026 (synthesized from multiple review sources):

API	Data Coverage	Latency	Historical Data	Protocol Support	Use Cases
iTick	US/HK/A-shares/Forex/Crypto/Indices/Futures/Funds, 27,000+ instruments	<50ms (WebSocket)	Up to 15 years daily, Tick support	REST + WebSocket	Multi-asset unified access, quant backtesting
Polygon.io	Primarily US equities, tick-level data	<20ms (WebSocket)	Decades of historical ticks	REST + WebSocket	Pure US equity high-frequency strategies
Tushare	Primarily A-shares, limited HK coverage	200-500ms	Long history, points-based	REST primarily	A-share medium-low frequency research, educational scenarios
Alpha Vantage	Equities/Forex/Cryptocurrency	100-300ms (REST polling)	Long history	REST primarily	AI financial assistants, learning backtesting

IV. Complete Practice: From REST to WebSocket

iTick offers real-time and historical market data covering major global markets, supporting multiple asset types including forex, equities, futures, precious metals, and funds. Its interfaces are primarily RESTful APIs, supplemented by WebSocket real-time streaming. The free tier is sufficient for basic strategy building, real-time data integration, and historical backtesting.

4.1 Preparation and Authentication

The free tier supports basic real-time quotes and historical candlestick data from minute-level to daily-level across multiple markets including A-shares, HK stocks, and US equities, with a rate limit of 10 requests/minute, which is friendly for quantitative teams just getting started. After obtaining your Token, set up authentication in code as follows:

import requests
import pandas as pd
import json

# Configure authentication information
API_TOKEN = "your_api_token_here"  # Replace with your actual Token
BASE_URL = "https://api.itick.org"
HEADERS = {
    "accept": "application/json",
    "token": API_TOKEN
}
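
With a limit of 10 requests/minute, bulk downloads need pacing on the client side. Below is a minimal sketch of a throttle that spaces calls evenly; the `Throttle` class is a hypothetical helper written for this article, not part of iTick's API or SDK, and the injectable `clock` and `sleep` exist only to make it easy to test.

```python
import time

class Throttle:
    """Space calls at least 60 / max_calls_per_minute seconds apart."""

    def __init__(self, max_calls_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_calls_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now

# Usage sketch: create one throttle and call wait() before every request
# throttle = Throttle(max_calls_per_minute=10)
# throttle.wait()
# response = requests.get(url, headers=HEADERS)
```

Even spacing is the simplest policy; it gives up burst capacity in exchange for never tripping the limit.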

4.2 Forex Historical Data Retrieval

The forex market trades continuously around the clock, spanning four major trading sessions (Sydney, Tokyo, London, and New York), so a forex API must be able to serve data at any hour. The forex API focuses on major currency pairs like EUR/USD and GBP/USD, with the market code fixed as GB.

Real-time trade data (Tick-level):

def get_forex_tick(code):
    """Get real-time forex trade data"""
    url = f"{BASE_URL}/forex/tick?region=GB&code={code}"
    response = requests.get(url, headers=HEADERS)
    if response.status_code == 200:
        data = response.json()
        if data.get("code") == 0:
            tick_data = data.get("data", {})
            return {
                "symbol": tick_data.get("s"),
                "latest_price": tick_data.get("ld"),
                "volume": tick_data.get("v"),
                "timestamp": tick_data.get("t")
            }
    return None

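# Example: Get EURUSD real-time trade data
eurusd_tick = get_forex_tick("EURUSD")
if eurusd_tick:  # get_forex_tick returns None on any failure
    print(f"EURUSD latest price: {eurusd_tick['latest_price']}")
else:
    print("Failed to retrieve EURUSD tick data")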

Historical candlestick data batch retrieval:

def get_forex_kline(code, ktype=1, limit=100):
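    """
    Get historical forex candlestick data
    ktype: 1-minute, 2-5min, 3-15min, 4-30min,
           5-60min, 6-daily, 7-weekly, 8-monthly
    Note: the stock kline docstring later in this article lists 8/9/10 for
    daily/weekly/monthly, so confirm the forex mapping in the iTick docs.
    """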
    url = f"{BASE_URL}/forex/kline?region=GB&code={code}&kType={ktype}&limit={limit}"
    response = requests.get(url, headers=HEADERS)
    if response.status_code == 200:
        data = response.json()
        if data.get("code") == 0:
            kline_data = data.get("data", [])
            # Convert to DataFrame for analysis
            df = pd.DataFrame(kline_data)
            if not df.empty:
                df.rename(columns={
                    't': 'timestamp', 'o': 'open', 'h': 'high',
                    'l': 'low', 'c': 'close', 'v': 'volume'
                }, inplace=True)
                df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')
                return df
    return pd.DataFrame()

# Example: Get the latest 200 daily bars for EURUSD
eurusd_df = get_forex_kline("EURUSD", ktype=6, limit=200)
print(eurusd_df.head())

Code highlights: The ld field in the response is the latest price, t is a millisecond-precision timestamp, and v is volume. Each candlestick item includes open (o), high (h), low (l), and close (c) prices.

4.3 Equity Historical Data Retrieval

The stock endpoints support multiple global exchanges, selected with the region parameter: US (US equities), HK (HK stocks), SH/SZ (A-shares), and others.

Get single stock real-time quote:

def get_stock_quote(region, code):
    """Get real-time stock quote"""
    url = f"{BASE_URL}/stock/quote?region={region}&code={code}"
    response = requests.get(url, headers=HEADERS)
    if response.status_code == 200:
        data = response.json()
        if data.get("code") == 0:
            quote = data.get("data", {})
            return {
                "symbol": quote.get("s"),
                "latest_price": quote.get("ld"),
                "open": quote.get("o"),
                "high": quote.get("h"),
                "low": quote.get("l"),
                "prev_close": quote.get("p"),
                "volume": quote.get("v"),
                "timestamp": quote.get("t")
            }
    return None

# Example: Get Apple Inc. (US equity) real-time quote
aapl_quote = get_stock_quote("US", "AAPL")
# Skip the change calculation if the previous close is missing or zero
if aapl_quote and aapl_quote["prev_close"]:
    change_pct = (aapl_quote["latest_price"] - aapl_quote["prev_close"]) / aapl_quote["prev_close"] * 100
    print(f"AAPL latest price: {aapl_quote['latest_price']}, Change: {change_pct:.2f}%")

Batch historical candlestick data retrieval (core backtesting logic):

def get_stock_kline(region, code, ktype=1, limit=500, end_timestamp=None):
    """
    Get historical stock candlestick data
    region: US/HK/SH/SZ
    code: Stock ticker symbol
    ktype: 1-minute, 2-5min, 3-15min, 4-30min,
           5-60min, 8-daily, 9-weekly, 10-monthly
    """
    url = f"{BASE_URL}/stock/kline"
    params = {
        "region": region,
        "code": code,
        "kType": ktype,
        "limit": limit
    }
    if end_timestamp:
        params["et"] = end_timestamp

    response = requests.get(url, headers=HEADERS, params=params)
    if response.status_code == 200:
        data = response.json()
        if data.get("code") == 0:
            kline_data = data.get("data", [])
            df = pd.DataFrame(kline_data)
            if not df.empty:
                df.rename(columns={
                    't': 'timestamp', 'o': 'open', 'h': 'high',
                    'l': 'low', 'c': 'close', 'v': 'volume'
                }, inplace=True)
                df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')
                return df
    return pd.DataFrame()

def batch_get_multi_stock_kline(stock_list, ktype=8, limit=500):
    """Batch retrieve historical candlestick data for multiple stocks"""
    results = {}
    for region, code in stock_list:
        df = get_stock_kline(region, code, ktype, limit)
        if not df.empty:
            results[f"{region}:{code}"] = df
    return results

# Example: Batch retrieve historical daily data for multiple stocks for backtesting
# ktype=8 is daily according to the get_stock_kline docstring above
stocks = [("US", "AAPL"), ("US", "MSFT"), ("HK", "00700")]
historical_data = batch_get_multi_stock_kline(stocks, ktype=8, limit=1000)
for symbol, df in historical_data.items():
    print(f"{symbol}: {len(df)} daily bars")

Code highlights: The kType parameter supports various timeframes from 1 (minute bars) to 10 (monthly bars), limit controls the number of returned records, and et optionally specifies an end timestamp (milliseconds).

4.4 WebSocket Real-Time Streaming (High-Precision Backtest Validation)

For strategies requiring high-frequency data validation, WebSocket is essential. WebSocket supports subscribing to quote, depth (order book depth), and tick (trade) data types. After establishing a WebSocket connection, data is actively pushed by the server, which not only improves retrieval efficiency compared to HTTP polling but also significantly reduces server request pressure.

import websocket
import json
import threading
import time

WS_URL = "wss://api.itick.org/stock"  # Stock WebSocket address
API_TOKEN = "your_api_token_here"

def on_message(ws, message):
    """Callback for receiving pushed messages"""
    data = json.loads(message)

    # Handle successful connection response
    if data.get("code") == 1 and data.get("msg") == "Connected Successfully":
        print("WebSocket connection successful")
        # Authenticate
        auth_msg = {
            "cmd": "auth",
            "token": API_TOKEN
        }
        ws.send(json.dumps(auth_msg))

    # Handle successful authentication
    elif data.get("resAc") == "auth" and data.get("code") == 1:
        print("Authentication successful")
        subscribe_data(ws)

    # Handle market data push
    elif data.get("data"):
        market_data = data["data"]
        # Fall back to "unknown" so a missing type field cannot raise on .upper()
        data_type = market_data.get("type") or "unknown"
        symbol = market_data.get("s")
        print(f"{data_type.upper()} data [{symbol}]: {market_data}")

def subscribe_data(ws):
    """Subscribe to multi-asset real-time quotes"""
    subscribe_msg = {
        "cmd": "subscribe",
        "data": {
            "symbol_list": [
                {"region": "US", "code": "AAPL"},      # Apple stock
                {"region": "GB", "code": "EURUSD"},    # EUR/USD forex
                {"region": "US", "code": "MSFT"}       # Microsoft stock
            ],
            "type": ["quote", "tick"]                  # Subscribe to quotes and trade data
        }
    }
    ws.send(json.dumps(subscribe_msg))
    print("Subscribed to multi-asset real-time data")

def on_error(ws, error):
    print(f"WebSocket error: {error}")

def on_close(ws, close_status_code, close_msg):
    print("WebSocket connection closed")

def on_open(ws):
    print("WebSocket connection opened")

# Start WebSocket connection
ws = websocket.WebSocketApp(
    WS_URL,
    on_open=on_open,
    on_message=on_message,
    on_error=on_error,
    on_close=on_close
)

# Run WebSocket in background thread
wst = threading.Thread(target=ws.run_forever)
wst.daemon = True
wst.start()

# Keep connection running for a period
time.sleep(60)

4.5 Batch Operation Interface (Multi-Asset Data Synchronization for Backtesting)

In backtesting scenarios, requesting historical data for multiple instruments in a single request can significantly improve efficiency.

def batch_get_forex_klines(codes, ktype=6, limit=200):
    """
    Batch retrieve historical candlestick data for multiple forex currency pairs
    codes: Currency pair list, e.g., ["EURUSD", "GBPUSD", "USDJPY"]
    """
    codes_str = ",".join(codes)
    url = f"{BASE_URL}/forex/klines?region=GB&codes={codes_str}&kType={ktype}&limit={limit}"
    response = requests.get(url, headers=HEADERS)
    if response.status_code == 200:
        data = response.json()
        if data.get("code") == 0:
            batch_data = data.get("data", {})
            results = {}
            for code, kline_list in batch_data.items():
                if kline_list:
                    df = pd.DataFrame(kline_list)
                    df.rename(columns={
                        't': 'timestamp', 'o': 'open', 'h': 'high',
                        'l': 'low', 'c': 'close', 'v': 'volume'
                    }, inplace=True)
                    df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')
                    results[code] = df
            return results
    return {}

# Example: Batch retrieve historical daily data for multiple forex pairs
forex_pairs = ["EURUSD", "GBPUSD", "USDJPY", "AUDUSD"]
forex_data = batch_get_forex_klines(forex_pairs, ktype=6, limit=500)
for code, df in forex_data.items():
    print(f"{code}: {len(df)} records")

4.6 Simplify Integration Using Official Python SDK (itick-sdk)

iTick officially provides a fully-featured Python SDK that encapsulates underlying logic such as authentication, request retry, and data parsing, allowing developers to focus more on the strategy itself. Compared to directly calling REST APIs, the SDK provides simpler interfaces, automated connection management, and built-in WebSocket support.

Install SDK

pip install itick-sdk

Or clone from the official GitHub repository and install from source:

git clone https://github.com/itick-org/itick-sdk-python.git
cd itick-sdk-python
pip install -e .

Initialize Client

from itick.sdk import Client

# Initialize client with your API Token
token = "your_api_token_here"
client = Client(token)

Retrieve Forex Data

# Get forex real-time trade (market code fixed as GB)
tick = client.get_forex_tick("GB", "EURUSD")
print("Forex Tick:", tick)

# Get forex historical candlestick (ktype: 2=5min bars, limit=10)
kline = client.get_forex_kline("GB", "EURUSD", 2, 10)
print("Forex Kline:", kline)

Retrieve Equity Data

# Get stock real-time quote (region: US/HK/SH/SZ)
quote = client.get_stock_quote("US", "AAPL")
print("Stock Quote:", quote)

# Get stock historical candlestick
kline = client.get_stock_kline("US", "AAPL", 2, 10)
print("Stock Kline:", kline)

WebSocket Real-Time Subscription (SDK Encapsulated Version)

The SDK fully encapsulates WebSocket connections, with built-in automatic reconnection and heartbeat keepalive mechanisms, eliminating the need to manually handle low-level protocols.

# Set message and error handlers
def on_message(message):
    print(f"Received WebSocket message: {message}")

def on_error(error):
    print(f"WebSocket error: {error}")

client.set_message_handler(on_message)
client.set_error_handler(on_error)

# Connect to WebSocket and subscribe to data
client.connect_forex_websocket()          # Forex WebSocket
client.send_websocket_message('{"action": "subscribe", "codes": ["EURUSD"]}')

# Check connection status
if client.is_websocket_connected():
    print("WebSocket is connected")

# Close connection when program ends
# client.close_websocket()

V. Data Alignment and Cleansing in Backtesting System Architecture

After retrieving historical data for multiple assets, the next challenge is data alignment. Different markets have different trading sessions. Expressed in Beijing time (UTC+8): A-shares 9:30-15:00, HK stocks 9:30-16:00, US equities 21:30-4:00 the next day (during US daylight saving time), and forex 24-hour continuous trading. In cross-asset backtesting, timestamps must be converted to one time zone and aligned to a common frequency.
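
Session boundaries are easiest to reason about when every timestamp is stored in UTC and converted only for the check. As a small sketch, the helper below tests whether a UTC millisecond timestamp falls inside the A-share session quoted above. The function name is hypothetical, and the check deliberately ignores the midday break and exchange holidays.

```python
from datetime import datetime, time as dtime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # fixed offset: China does not observe DST

def in_a_share_session(ts_ms, open_t=dtime(9, 30), close_t=dtime(15, 0)):
    """True if a UTC millisecond timestamp falls inside the A-share session."""
    local = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).astimezone(BEIJING)
    if local.weekday() >= 5:  # Saturday or Sunday
        return False
    return open_t <= local.time() <= close_t

# 2024-01-03 02:00 UTC is 10:00 in Beijing on a Wednesday
ts = int(datetime(2024, 1, 3, 2, 0, tzinfo=timezone.utc).timestamp() * 1000)
print(in_a_share_session(ts))
```

Markets that do observe daylight saving time, such as US equities, need a real time zone database rather than a fixed offset.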

Here is a general multi-asset data alignment example:

def align_multi_asset_data(dataframes, freq="1h"):
    """
    Align timestamps across multi-asset data, unify sampling frequency
    dataframes: dict {asset_name: df}, each df must contain timestamp column
    freq: pandas offset alias (lowercase "h"; uppercase "H" is deprecated in pandas 2.2+)
    """
    aligned = {}
    for name, df in dataframes.items():
        df_copy = df.copy()
        df_copy['timestamp'] = pd.to_datetime(df_copy['timestamp'])
        df_copy.set_index('timestamp', inplace=True)
        # Resample to unified frequency, then forward fill missing values
        aligned[f"{name}_close"] = df_copy['close'].resample(freq).last().ffill()

    if not aligned:
        return pd.DataFrame()

    # Outer-join on the union of all timestamps; assigning columns one by one
    # to an empty DataFrame would silently keep only the first asset's index
    merged = pd.concat(aligned, axis=1).sort_index()

    # Remove rows that are entirely empty
    merged.dropna(how='all', inplace=True)
    return merged

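# Usage example: Align EUR/USD forex and Apple stock data
# eurusd_df comes from get_forex_kline above; fetch AAPL daily bars the same way
aapl_df = get_stock_kline("US", "AAPL", ktype=8, limit=200)
data_dict = {
    "EURUSD": eurusd_df,
    "AAPL": aapl_df
}
# Both inputs are daily bars, so align them on a daily grid
aligned = align_multi_asset_data(data_dict, freq="1D")
print(aligned.head())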

Tick data cleansing highlights: Tick data, due to its fine-grained recording dimension, can easily reach millions of records per day, making it highly susceptible to various quality issues. Key considerations in backtesting:

Deduplication: Multiple duplicate records with the same timestamp need to be removed;
Outlier filtering: Ticks with prices outside reasonable ranges should be discarded;
Temporal alignment: Timestamp precision may vary across different data sources and needs to be unified to the same precision (millisecond-level).
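
The three steps above can be sketched in a few lines of plain Python. The field names (`t`, `p`, `v`), the 5% deviation threshold, the 20-tick window, and the seconds-versus-milliseconds heuristic are all illustrative assumptions, not values from any provider's specification.

```python
from collections import deque
from statistics import median

def to_millis(ts):
    """Treat values below 1e11 as epoch seconds, otherwise as milliseconds."""
    return int(ts * 1000) if ts < 1e11 else int(ts)

def clean_ticks(ticks, max_deviation=0.05, window=20):
    """Normalize timestamps to ms, drop exact duplicates and price outliers."""
    normalized = [{**t, "t": to_millis(t["t"])} for t in ticks]
    normalized.sort(key=lambda t: t["t"])
    seen, recent, cleaned = set(), deque(maxlen=window), []
    for tick in normalized:
        key = (tick["t"], tick["p"], tick["v"])
        if key in seen:
            continue  # exact duplicate record
        seen.add(key)
        if recent:
            ref = median(recent)
            if abs(tick["p"] - ref) / ref > max_deviation:
                continue  # price too far from the recent median
        recent.append(tick["p"])
        cleaned.append(tick)
    return cleaned

raw_ticks = [
    {"t": 1_700_000_000_000, "p": 100.0, "v": 1},
    {"t": 1_700_000_000_000, "p": 100.0, "v": 1},  # duplicate
    {"t": 1_700_000_001, "p": 100.2, "v": 2},      # timestamp in seconds
    {"t": 1_700_000_002_000, "p": 250.0, "v": 1},  # outlier
    {"t": 1_700_000_003_000, "p": 100.1, "v": 3},
]
cleaned_ticks = clean_ticks(raw_ticks)
print([(t["t"], t["p"]) for t in cleaned_ticks])
```

Two caveats: genuinely distinct trades with identical timestamp, price, and volume are dropped as duplicates, so prefer a trade ID when the feed provides one; and the filter is seeded by the first tick, so the opening price should be validated separately.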

VI. Selection Decision Guide: Key Questions from Requirements to Implementation

Before starting the selection process, quantitative teams should answer the following four questions:

1. What data granularity does the strategy require?

Daily-level strategies only → Most API daily data is usable, prioritize coverage and cost;
Intraday strategies (minute-level) → Need to support both minute-level candlestick retrieval and low-latency real-time streaming;
High-frequency/order flow strategies → Must natively support tick-level data, WebSocket push latency < 50ms, market depth at least 5-10 levels.

2. Which asset classes are involved?

Single market only (e.g., pure A-shares) → Consider specialized A-share data sources like Tushare;
Cross-market multi-asset (A-shares + HK stocks + US equities + forex) → Prioritize multi-asset APIs providing unified interfaces to avoid adaptation costs from stitching multiple data sources.

3. What are the requirements for historical data coverage period?

Long-period backtesting (10+ years daily) → Confirm whether the API provides ultra-long historical data archives;
Short-period strategy validation (2-3 years minute-level) → Free tiers of most commercial APIs are sufficient.

4. What are the team's technical capabilities and budget?

Individual/small team starting out → Free tier APIs (such as iTick free tier, Alpha Vantage free tier) are sufficient for prototype validation;
Institutional production environment → Require paid commercial APIs, focusing on SLA guarantees, multi-node deployment, data redundancy, and other enterprise-grade capabilities.

If the team needs to run strategies across different exchanges and asset classes, using a single API to cover multiple markets is currently the most cost-effective solution—a unified interface multi-asset approach can reduce data integration module development workload by over 60%, significantly shortening the iteration cycle from strategy research to backtesting.

VII. Summary and Recommendations

For quantitative hedge funds, the value of a backtesting system lies not in how beautiful its UI is, but in how faithfully it reflects actual market conditions. The data interface is the most fundamental component of this mirror. A solid selection path recommendation is as follows:

Phase	Core Tasks	Recommended Solutions
Early R&D	Strategy prototype validation, historical backtesting	Use free tiers like iTick/Tushare to quickly validate strategy logic
Factor Mining Phase	Large-scale historical data retrieval	Upgrade to paid plans, leverage batch APIs and WebSocket to accelerate factor computation
Live/Simulated Trading	Real-time market monitoring, signal generation	Switch to WebSocket real-time streaming, ensure data latency < 50ms
Multi-Strategy Parallel	Multi-asset unified access and monitoring	Consider unified API architecture to cover all asset types with a single interface

There is no "best" API for data selection, only the one most "suitable" for your strategy and data requirements. As the industry saying goes: Don't waste time on low-value labor like data cleansing—mature API services can smooth out differences across markets. Whether US equities or forex, you receive uniformly formatted data, which provides great convenience for cross-market strategy migration. For quantitative hedge funds, choosing stable, standardized data interfaces is the first step from "data-driven" to "strategy-driven."

Reference Documentation: https://docs.itick.org/sdk/python-sdk
GitHub: https://github.com/itick-org/
