San Si wu
How to Build a Multi-Market Unified Financial Data API: A Step-by-Step Practical Guide

Introduction: Why Can't a Single-Market API Solve Everything?

Suppose you are developing a quantitative trading system that needs to monitor A-shares, Hong Kong stocks, U.S. stocks, and cryptocurrencies simultaneously. Your code might look something like this:

import requests
import time

# The messy U.S. stock part
response = requests.get("https://us-market.com/quote?symbol=AAPL")
price_us = response.json()["last"]

# The inconsistent Hong Kong stock part
response = requests.get("https://hk-market.com/api/stock_detail?code=00700")
price_hk = response.json()["current_price"]  # Field names are completely different

# The A-share part with its mysterious errors
response = requests.get("http://a-share.cn/query?ticker=600036")
if response.status_code == 429:
    time.sleep(5)  # Why 5 seconds? Nobody knows
    retry()        # Some hand-rolled retry helper defined elsewhere

Each market uses a different API, different field names, different error codes, and different rate-limiting policies. The codebase balloons to three times its size, and maintenance costs grow exponentially. And that is the best-case scenario: when one market's API goes down entirely, your entire strategy collapses with it.

This is not an isolated case but a widespread pain point the industry has faced for a long time. For every multi-market quantitative developer, "fragmentation" is the word that stings the most.

Industry Pain Points: Overlooked Data Silos

In the global financial markets, developers never lack data—they lack integration capabilities. Data sources across markets use different data structures, timestamp precisions, and symbol naming conventions. This directly leads to backtesting distortions, model signal biases, and a series of systemic risks.

Specifically, fragmentation creates three typical problems that torment developers every day:

1. Inconsistent data formats make Schema adaptation maddening. Each API has its own design philosophy: U.S. stocks use the field symbol, while Hong Kong stocks might use code; A-shares call the price field close, but Hong Kong stocks call it current_price. Even a simple real-time quote query can fill an entire file with format adaptations for different markets.

2. Error handling logic is difficult to unify. A 429 from one market's API means "rate limit," while the same code from another API might mean "account expired." Some WebSocket connections auto-reconnect after disconnection, while others silently fail without any indication. Different error codes require different handling strategies, resulting in code full of conditional branches and extremely low maintainability.

3. Connection management complexity spirals out of control. WebSocket keep-alives, HTTP rate-limit controls, exponential backoff retries—these infrastructure-level issues should be generic, but each data source requires separate implementation. This leads to repeated code and easy omissions.
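To make the schema-adaptation pain concrete, here is a minimal sketch of the mapping layer developers end up hand-writing for every market. All field names and market codes below are illustrative, not taken from any real vendor API:

```python
# Illustrative per-market field maps: every vendor names the same
# concepts differently, so each market needs its own translation table.
FIELD_MAPS = {
    "US": {"symbol": "symbol", "price": "last"},
    "HK": {"symbol": "code", "price": "current_price"},
    "CN": {"symbol": "ticker", "price": "close"},
}

def normalize_quote(market, raw):
    """Map a market-specific payload onto one unified schema."""
    mapping = FIELD_MAPS[market]
    return {
        "market": market,
        "symbol": raw[mapping["symbol"]],
        "price": float(raw[mapping["price"]]),  # some vendors return strings
    }

# Three differently-shaped payloads all come out in one shape
print(normalize_quote("HK", {"code": "00700", "current_price": "385.2"}))
```

The table grows with every new market and every vendor quirk, which is exactly the maintenance burden a unified API is meant to remove.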

The Solution: Unified Interface + Standard Data Model

Solving the fragmentation problem with a single data source is not feasible. The truly effective approach is to introduce a middleware architecture—an access layer with a unified multi-market data structure that ensures every piece of data flowing into the database has a completely consistent Schema.

In terms of architectural design, the core value of a unified financial data API is to converge the access logic of multiple markets into a single entry point:

  • Standardized symbols: instrument identifiers from different markets are normalized into a unified naming system.
  • Unified data structures: core data types such as real-time quotes, K-lines, and order books adopt a consistent JSON schema.
  • Built-in connection management: WebSocket heartbeats, rate-limit controls, exponential backoff retries, and other underlying mechanisms are handled uniformly by the platform, so developers do not need to reinvent the wheel.

As modern financial systems evolve, APIs are no longer mere access points; they are integration layers within financial systems. Replacing independent ETL jobs for each market with a single unified data access layer that normalizes symbols, timestamps, and quote formats is therefore the superior engineering choice.

The power of this architecture becomes even clearer when expressed in code. Assuming you use a cross-market unified API, fetching real-time quotes for A-shares, Hong Kong stocks, and U.S. stocks becomes extremely simple:

import requests

# The same set of headers configuration; one token works across all markets
API_TOKEN = "your_api_token_here"
headers = {
    "accept": "application/json",
    "token": API_TOKEN
}
BASE_URL = "https://api.itick.org"

# Get U.S. stock real-time quote
url = f"{BASE_URL}/stock/quote?region=US&code=AAPL"
response = requests.get(url, headers=headers)
if response.status_code == 200:
    quote_us = response.json().get("data", {})
    print(f"Apple latest price: {quote_us.get('ld')}")

# Get Hong Kong stock real-time quote — completely identical interface, just change the region parameter
url = f"{BASE_URL}/stock/quote?region=HK&code=00700"
response = requests.get(url, headers=headers)
quote_hk = response.json().get("data", {})

# Get A-share real-time quote
url = f"{BASE_URL}/stock/quote?region=SH&code=600036"
response = requests.get(url, headers=headers)
quote_cn = response.json().get("data", {})

# The returned data structures for all three markets are completely consistent; the data field contains core fields such as ld (latest price), v (volume), etc.

In the traditional approach, three markets required three completely different sets of code. Now, only one set is needed. This unification not only reduces code volume but also fundamentally lowers architectural complexity. A single data pipeline can handle market data from different sources without distinction. Strategy code no longer needs to care about the data source—only about business logic.

From "Usable" to "Well-Usable": Key Design Elements of Production-Grade Architecture

A truly production-ready multi-market unified data platform must address the following key issues at the underlying architecture level, beyond API-level unification.

3.1 Connection Keep-Alive: WebSocket Must Not "Silently Die"

WebSocket long connections are the core method for real-time quote access, but their biggest problem is "appearing alive while dead"—the connection looks open, but no data is actually being received. In quantitative trading scenarios, once a silent disconnection occurs, the strategy may make decisions based on stale data for a period of time, with disastrous consequences.

Solving this requires implementing an application-layer heartbeat mechanism on the client side. After the WebSocket connection is established, the client periodically sends Ping frames, and the server replies with Pong frames. If no Pong is received within the timeout window, the connection is immediately deemed abnormal and reconnection is triggered. At the same time, the heartbeat interval should not be fixed but dynamically adjusted according to network conditions—relax the frequency during stable periods to reduce bandwidth overhead, and tighten it during volatile periods to detect faults quickly.
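The logic above can be sketched as a small client-side monitor. The interval, timeout, and adaptation rule below are illustrative defaults, not values from any particular platform:

```python
import time

class HeartbeatMonitor:
    """Client-side heartbeat state: track the last Pong, adapt the Ping interval."""

    def __init__(self, base_interval=15.0, timeout=45.0):
        self.interval = base_interval   # seconds between Pings
        self.timeout = timeout          # max silence before declaring the link dead
        self.last_pong = time.monotonic()

    def on_pong(self):
        # Called from the WebSocket library's Pong callback
        self.last_pong = time.monotonic()

    def is_dead(self, now=None):
        # "Silently dead": the socket looks open, but no Pong arrived in the window
        now = time.monotonic() if now is None else now
        return now - self.last_pong > self.timeout

    def adjust(self, network_stable):
        # Relax pings on a stable link, tighten them on a shaky one
        if network_stable:
            self.interval = min(self.interval * 1.5, 60.0)
        else:
            self.interval = max(self.interval / 2, 5.0)
```

In a `websocket-client` event loop, the Pong callback would feed `on_pong` while a background timer sends Pings every `interval` seconds and triggers a reconnect whenever `is_dead()` returns True.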

3.2 Intelligent Retry: Exponential Backoff + Jitter

Rate limiting is another common but often overlooked issue. When the API server returns HTTP 429 (Too Many Requests), it indicates that the request frequency has exceeded the limit. If the client continues to retry at fixed intervals, it will only exacerbate the conflict and may even lead to IP bans.

The correct approach is to implement an exponential backoff + jitter retry strategy:

import time
import random
import requests

def fetch_with_retry(url, max_retries=5):
    for i in range(max_retries):
        response = requests.get(url)
        if response.status_code == 200:
            return response.json()

        if response.status_code == 429:
            # Prefer the wait time suggested by the server, if provided
            retry_after = response.headers.get("Retry-After")
            if retry_after is not None:
                wait_time = float(retry_after)  # header value is a string
            else:
                # Exponential backoff + random jitter, capped at 60s
                wait_time = min(2 ** i + random.uniform(0, 1), 60)
            time.sleep(wait_time)
            continue

        # Other errors: brief exponential backoff before the next attempt
        time.sleep(min(2 ** i, 10))

    raise Exception("Max retries exceeded")

The key points: prioritize the wait time specified in the server's Retry-After header; if it is absent, fall back to exponential backoff, where retry intervals grow geometrically, and add random jitter to avoid "retry storms", in which many clients retry at the same moments and cause a secondary traffic spike.

3.3 WebSocket Real-Time Subscription: The Key Channel for Reducing Latency

In addition to the above fault-tolerance mechanisms, another core capability of a unified financial data API is WebSocket real-time push. Compared to HTTP polling, WebSocket uses a persistent full-duplex channel that allows the server to proactively push quote updates to the client in real time, eliminating the overhead of repeated requests. It is the preferred solution for high-frequency strategies to obtain tick-by-tick trades and order book depth.

Here is a standard WebSocket real-time subscription example, demonstrating the subscription and reception logic for cross-market quotes:

import websocket  # websocket-client package
import json

API_KEY = "your_api_key_here"
ws_url = "wss://api.itick.org/stock"

# Authentication message (sent after connection succeeds)
auth_message = {
    "ac": "auth",
    "params": API_KEY
}

# Subscription message: params field format is "code$market", multiple separated by commas
# types specifies subscription types: depth (order book depth), quote (real-time quote)
subscribe_message = {
    "ac": "subscribe",
    "params": "AAPL$US,00700$HK,600036$SH",
    "types": "depth,quote"
}

def on_message(ws, message):
    data = json.loads(message)
    # Unified push format; data contains fields such as s (code), ld (latest price), t (timestamp), v (volume), etc.
    print(f"Received push: {data.get('data', {})}")

def on_error(ws, error):
    print(f"WebSocket error: {error}")

def on_close(ws, close_status_code, close_msg):
    print("Connection closed")

def on_open(ws):
    # Send authentication first after connection succeeds
    ws.send(json.dumps(auth_message))
    # Then send subscription request
    ws.send(json.dumps(subscribe_message))

if __name__ == "__main__":
    ws = websocket.WebSocketApp(
        ws_url,
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close
    )
    ws.run_forever()

Through this design, HTTP snapshot retrieval and WebSocket real-time push complement each other: the former is suitable for timed batch pulls of historical data, while the latter handles real-time quote subscriptions. Working together under a unified data structure, they cover the full data requirements from backtesting to live trading.
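One way to see how the two channels complement each other is a minimal local sketch: seed in-memory state from an HTTP snapshot, then layer WebSocket pushes on top while discarding stale ones. The payload dicts below are hypothetical stand-ins for real responses, reusing the s / ld / t field names from the examples above:

```python
# Seed state from an HTTP snapshot, then apply WebSocket pushes on top.
# The payload dicts are hypothetical stand-ins for real API responses.
book = {}

def seed_from_snapshot(snapshot):
    # Snapshot rows: s = symbol, ld = latest price, t = timestamp
    for q in snapshot:
        book[q["s"]] = {"price": q["ld"], "ts": q["t"]}

def apply_push(push):
    # Drop out-of-order pushes older than the state we already hold
    cur = book.get(push["s"])
    if cur is None or push["t"] >= cur["ts"]:
        book[push["s"]] = {"price": push["ld"], "ts": push["t"]}

seed_from_snapshot([{"s": "AAPL", "ld": 190.0, "t": 1000}])
apply_push({"s": "AAPL", "ld": 190.5, "t": 1001})  # newer: applied
apply_push({"s": "AAPL", "ld": 189.0, "t": 999})   # stale: dropped
print(book["AAPL"]["price"])  # 190.5
```

The timestamp check is what makes the combination safe: the REST snapshot and the WebSocket stream can race, but only newer data ever overwrites the in-memory book.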

Practical Scenarios: Applications of a Multi-Market Unified Data API

Scenario 1: Cross-Market Factor Calculation

The core of multi-factor strategies is extracting unified features from different markets. With a unified interface, factor calculation becomes exceptionally simple:

import requests
import pandas as pd

API_TOKEN = "your_api_token_here"
headers = {"accept": "application/json", "token": API_TOKEN}
BASE_URL = "https://api.itick.org"

def compute_momentum(symbols, market, lookback=20):
    # Use the same API to fetch historical K-lines for symbols from different markets
    # kType: 1-10, corresponding to minute to monthly K-lines (1=1min, 2=5min, 3=10min, 4=30min, 5=1h, 6=2h, 7=4h, 8=1day, 9=1week, 10=1month)
    results = {}
    for symbol in symbols:
        url = f"{BASE_URL}/stock/kline?region={market}&symbol={symbol}&kType=8&limit={lookback + 1}"
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            data = response.json().get("data", [])
            df = pd.DataFrame(data)
            # Each item in the data array contains fields such as o (open), h (high), l (low), c (close), etc.
            if len(df) > lookback:
                df["momentum"] = df["c"] / df["c"].shift(lookback) - 1
            results[symbol] = df
    return results

# Get 20-day momentum for China Merchants Bank (A-share, 600036)
momentum = compute_momentum(["600036"], "SH", lookback=20)

Scenario 2: Cross-Market Arbitrage Monitoring

Arbitrage strategies have extremely high requirements for data synchronization. The unified interface ensures that quote data from different markets uses the same time benchmark, avoiding false arbitrage signals caused by timestamp alignment issues:

import requests

API_TOKEN = "your_api_token_here"
headers = {"accept": "application/json", "token": API_TOKEN}
BASE_URL = "https://api.itick.org"

def monitor_arbitrage(us_symbol, hk_symbol, fx_rate=7.8, threshold=0.01):
    # Fetch real-time quotes from the two markets separately
    us_url = f"{BASE_URL}/stock/quote?region=US&code={us_symbol}"
    hk_url = f"{BASE_URL}/stock/quote?region=HK&code={hk_symbol}"

    us_response = requests.get(us_url, headers=headers)
    hk_response = requests.get(hk_url, headers=headers)

    if us_response.status_code == 200 and hk_response.status_code == 200:
        us_price = us_response.json().get("data", {}).get("ld")  # ld = latest deal price
        hk_price = hk_response.json().get("data", {}).get("ld")

        if us_price and hk_price:
            spread = us_price * fx_rate - hk_price
            if abs(spread) > threshold:
                print(f"Arbitrage opportunity! Spread between {us_symbol} and {hk_symbol}: {spread:.4f}")
                return spread
    return None

# Example call: the two symbols here are placeholders, not an actual dual-listed pair
monitor_arbitrage("AAPL", "00700")

Data Source Selection Reference

Various data service providers in the market offer multi-market coverage with different emphases. For quantitative teams, the following core dimensions should be focused on during selection:

  • Data coverage breadth: Does it simultaneously cover mainstream markets such as A-shares, Hong Kong stocks, U.S. stocks, futures, and forex?
  • Data granularity and precision: Does it support tick-level trade data? How far back does the historical data go?
  • Interface specification uniformity: Are the return structures consistent across different markets? Is the documentation complete?
  • Real-time assurance: Can the latency meet strategy requirements? Does it provide both REST and WebSocket access methods?
  • Connection management capabilities: Does it include built-in production-grade features such as WebSocket keep-alives, intelligent reconnection, and rate-limit handling?

From industry trends, the global financial data API market is expanding at a reported compound annual growth rate of 6.09%, with the market size projected to reach roughly USD 61.9 billion by 2026. At the same time, financial market data consumption is shifting from traditional post-close reporting to real-time analysis and AI-driven intelligence. This means that reliable, unified, low-latency multi-market data infrastructure is no longer a "nice-to-have" for quantitative trading but a "must-have."

Conclusion: Unification Is the Trend, but Implementation Methods Can Vary

From an architectural design perspective, establishing a unified multi-market data access layer has become an irreversible trend in quantitative development. Whether choosing a mature full-featured data service provider or self-developing a middleware layer that meets business needs, the core objective remains the same: free developers from the tedious work of adaptation and allow them to focus on strategy logic itself.

I have always believed that good technical architecture is unobtrusive. It should silently handle all the dirty and heavy work in the background, enabling developers to concentrate on what truly creates value—designing better strategies and discovering more market opportunities.

Reference documentation: https://blog.itick.org/market-data/free-stock-price-api-comparison

GitHub: https://github.com/itick-org/
