Millisecond Latency, Ten-Level Order Books, Tick-by-Tick Trades: An In-Depth Look at the Technical Architecture and Access Practices of Level-2 Market Data
In quantitative trading, data is the soul of a strategy. If Level-1 market data shows you the "surface" of the market, Level-2 market data gives you insight into its "pulse" – the interplay of every order, the depth at each price level, and the active direction of every trade. This article systematically introduces the technical characteristics, access options, and practical applications of the A-share Level-2 market data API, helping developers build a solid data foundation for quantitative trading.
I. Level-2 Market Data: A Microscopic Perspective Beyond Traditional Market Data
1.1 What is Level-2 Market Data?
Level-2 market data is the most complete and fine-grained trading information available in China's domestic securities market to date. Compared with traditional Level-1 market data (which displays only the best five bid and ask price levels), Level-2 provides richer data dimensions, mainly in the following categories:
- Order Book Snapshot (updated every 3 seconds): Provides ten levels of bid and ask order books, revealing market depth and allowing investors to see the full order prices and quantities from bid 1 to bid 10 and ask 1 to ask 10.
- Order Queue (updated every 3 seconds): Displays the first 50 order details at the bid 1 and ask 1 positions, enabling observation of whether large orders are concentrated at the head of the queue to identify potential support or resistance levels.
- Tick-by-Tick Transactions (real-time, millisecond-level): Records the price, volume, transaction time, and active trading direction of each transaction, serving as core data for analyzing the movement of major capital flows.
- Tick-by-Tick Orders (real-time, millisecond-level): Records the order placement and cancellation information of each order, enabling tracking of the intention behind capital order placement and order cancellation behaviors.
- Minute K-line (1-minute interval): Advanced K-line data containing the number of transactions, providing more information on trading activity than ordinary K-lines.
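The categories above can be sketched as simple record types. This is an illustrative layout only: the field names below are assumptions for the sketch, and real payloads vary by vendor.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class OrderBookSnapshot:
    """Ten-level order book snapshot (updated every ~3 seconds)."""
    symbol: str
    timestamp: int                  # exchange time, milliseconds
    bids: List[Tuple[float, int]]   # up to 10 levels of (price, volume), best bid first
    asks: List[Tuple[float, int]]   # up to 10 levels of (price, volume), best ask first

@dataclass
class Tick:
    """Tick-by-tick transaction (real-time, millisecond-level)."""
    symbol: str
    timestamp: int   # milliseconds
    price: float
    volume: int
    side: str        # "B" = active buy, "S" = active sell

snap = OrderBookSnapshot("600519.SH", 1700000000000,
                         bids=[(1700.0, 200)], asks=[(1700.5, 300)])
print(snap.bids[0])   # best bid as (price, volume)
```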
In terms of data volume, the daily increment of A-share Level-2 market data is approximately 30-45GB, and historical data can reach the 10TB level. This means that processing Level-2 data requires not only an efficient API access solution but also powerful data storage and computing capabilities.
1.2 Core Application Scenarios of Level-2 Data
Level-2 data has a wide range of applications in quantitative trading:
- Order Flow Analysis: Identify the active buying and selling directions of major capital through tick-by-tick transaction data to judge the balance of buying and selling power.
- Market Microstructure Research: Analyze indicators such as bid-ask spread, order book slope, and order imbalance to understand the dynamic game process of the market.
- High-Frequency Signal Triggering: Generate trading signals within milliseconds based on order book changes to capture short-term trading opportunities.
- Algorithmic Trading Execution: Optimize order-splitting strategies based on ten-level order book information to reduce the market impact cost of large orders.
- Intraday T+0 Trading: Capture spread opportunities using order book depth information to improve capital utilization efficiency.
II. Selection Guide for Level-2 Market Data API
2.1 Mainstream Data Sources
Based on market research, mainstream Level-2 data sources can be categorized into several types, each with distinct characteristics:
Brokerage Official API: Latency is usually within 100 milliseconds, covering the entire market. The use of such APIs typically requires opening an account with the brokerage and meeting certain asset thresholds, making them suitable for live trading scenarios.
iFinD (Hithink RoyalFlush): Latency ranges from 100 to 200 milliseconds, covering full-market A-share and Hong Kong stock data. It uses an annual-fee model, offers stable data quality and rich functionality, and is suited to professional investment institutions.
Tushare Pro: Latency is approximately 500 milliseconds, covering A-share and Hong Kong stock markets. Adopting a pay-per-call pricing model with free quotas for developers to trial, it is the preferred entry choice for individual developers.
iTick API: Provides millisecond-level real-time data and supports multi-market access. It offers free packages, suitable for quantitative researchers to develop and test strategies.
For individual developers, Tushare Pro and iTick are cost-effective options; for institutional users, professional providers such as Hithink RoyalFlush and Wind offer more comprehensive data support and compliance guarantees.
2.2 Data Access Methods
Level-2 data access mainly adopts two protocols:
RESTful API is suitable for historical data queries and low-frequency data acquisition. Developers fetch market data for specific time ranges and specific stocks through HTTP requests, which is simple to operate and easy to integrate. For example, to obtain historical minute K-line data, it is only necessary to construct a GET request containing the stock code, bar interval, and start/end dates.
WebSocket Connection is suitable for real-time market data push. By establishing a long connection, the server can actively push real-time data to the client with extremely low latency. When using WebSocket, identity authentication is usually required first, followed by subscribing to interested stock codes, after which real-time data such as tick-by-tick transactions and order book snapshots can be received continuously.
2.3 Data Differences Between Shanghai and Shenzhen Stock Exchanges
It is important to note that the Level-2 data structures of the Shanghai Stock Exchange (SSE) and Shenzhen Stock Exchange (SZSE) have differences, and developers need to implement differentiated processing when designing data processing logic:
- SZSE: Tick-by-tick transaction data includes complete order placement and cancellation information. When encountering market orders, their prices are marked as 0, requiring special handling during data processing.
- SSE: Market order information is only stored in the tick-by-tick transaction table, with no market order records in the tick-by-tick order table. This means that analyzing only tick-by-tick order data may lead to missing market order information.
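The SZSE price-0 convention can be handled with a small cleaning step. The sketch below assumes a pandas DataFrame of SZSE tick-by-tick orders (column names are illustrative): market orders are flagged explicitly and their sentinel price is replaced with NaN so it cannot contaminate price-based statistics such as VWAP.

```python
import pandas as pd
import numpy as np

# Hypothetical SZSE tick-by-tick order frame; on SZSE, market orders carry price 0.
orders = pd.DataFrame({
    "symbol": ["000001.SZ"] * 4,
    "price":  [10.50, 0.0, 10.52, 0.0],   # 0.0 marks a market order
    "volume": [100, 500, 200, 300],
})

# Flag market orders, then replace the 0.0 sentinel with NaN so that
# price-based aggregates (e.g. VWAP) skip them automatically.
orders["is_market_order"] = orders["price"] == 0.0
orders["price"] = orders["price"].replace(0.0, np.nan)

limit_volume = orders.loc[orders["price"].notna(), "volume"].sum()
vwap = (orders["price"] * orders["volume"]).sum() / limit_volume
print(orders)
print(f"VWAP over limit orders only: {vwap:.4f}")
```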
III. Storage and Processing Practices for Level-2 Data
3.1 Storage Challenges of Massive Data
Level-2 data adds more than 30GB per day, and traditional relational databases struggle to cope. Production environments typically adopt a combined solution of time-series databases and columnar storage.
Taking DolphinDB as an example, query performance can be optimized through partitioned storage. The partitioning strategy usually adopts a combination of "partitioning by trading date + hash partitioning by stock code". Partitioning by date facilitates archiving and cleaning historical data, while hash partitioning by stock code enables parallel query of multiple stocks, improving computing efficiency.
When creating an order book snapshot table, the trading date and stock code can be used as partition keys, and data can be sorted by stock code and trading time. This design offers multiple advantages: multiple partitions can participate in calculations simultaneously to achieve parallel queries; columnar storage significantly reduces storage costs and improves compression rates; precise positioning of data by stock and time achieves high filtering efficiency.
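The "date + symbol-hash" composite partition scheme can be illustrated independently of any particular database. DolphinDB implements this natively; the sketch below (with an assumed bucket count) just computes which partition a record would land in, showing why same-stock-same-day records co-locate while different stocks spread across buckets for parallel scans.

```python
from zlib import crc32

N_HASH_BUCKETS = 16  # assumed bucket count; tune to cluster size

def partition_key(trade_date: str, symbol: str) -> tuple:
    """Return (date, bucket) — the composite partition a record belongs to."""
    bucket = crc32(symbol.encode()) % N_HASH_BUCKETS
    return (trade_date, bucket)

# Records for the same stock on the same day always map to the same partition;
# partitioning by date also makes archiving/cleaning old data a simple drop.
print(partition_key("2024-05-10", "600519.SH"))
print(partition_key("2024-05-10", "000001.SZ"))
```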
3.2 Downsampling from High-Frequency to Low-Frequency Data
Quantitative strategies often need to run on different time scales, and generating minute-level data from Level-2 high-frequency data is a typical ETL process. In practice, the time field of tick-by-tick transaction data is first converted to a datetime type and set as the index, then resampled by minute. During resampling, the opening price, highest price, lowest price, closing price, total volume, and total number of transactions are computed separately for each minute interval, and the results are merged into a complete minute K-line DataFrame.
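The steps above can be sketched with pandas on a few synthetic ticks:

```python
import pandas as pd

# Synthetic tick-by-tick transactions for the downsampling sketch.
ticks = pd.DataFrame({
    "time":   ["2024-05-10 09:30:01", "2024-05-10 09:30:40",
               "2024-05-10 09:31:05", "2024-05-10 09:31:50"],
    "price":  [10.00, 10.05, 10.02, 10.10],
    "volume": [100, 200, 150, 300],
})
ticks["time"] = pd.to_datetime(ticks["time"])
ticks = ticks.set_index("time")

# Resample to 1-minute bars with named aggregations per column.
bars = ticks.resample("1min").agg(
    open=("price", "first"),
    high=("price", "max"),
    low=("price", "min"),
    close=("price", "last"),
    volume=("volume", "sum"),
    n_trades=("price", "count"),   # transaction count per bar
)
print(bars)
```

The `n_trades` column is what distinguishes these "advanced" minute bars from ordinary K-lines built from snapshot data alone.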
3.3 Example of Factor Calculation Based on Level-2 Data
The order book imbalance factor, a commonly used microstructure indicator, can be calculated based on order book snapshots. The calculation logic is as follows: first calculate the total bid volume and total ask volume for the first ten price levels respectively, then calculate the imbalance rate as the difference between bid volume and ask volume divided by their sum. When bid volume is significantly greater than ask volume, the imbalance rate is positive, indicating a potential price increase; conversely, it is negative, indicating a potential price decrease.
Furthermore, the weighted imbalance rate can be calculated by assigning higher weights to price levels closer to the order book. For example, set decreasing weight coefficients in the order of price levels, calculate the local imbalance rate for the bid and ask volume of each level, multiply by the weight of that level, and finally normalize the weighted results of all levels. This weighting method can more sensitively reflect changes in order pressure at the nearest price levels of the order book.
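Both the plain and the weighted imbalance factors described above can be sketched on synthetic ten-level volumes. The decreasing-weight scheme below (1, 1/2, ..., 1/10, normalized) is one plausible choice, not a standard; any decaying sequence works.

```python
import numpy as np

# Synthetic ten-level volumes; index 0 is level 1 (nearest the touch).
bid_vols = np.array([500, 400, 350, 300, 250, 200, 150, 100, 80, 60], dtype=float)
ask_vols = np.array([200, 250, 300, 300, 280, 260, 240, 220, 200, 180], dtype=float)

# Plain imbalance: (sum(bid) - sum(ask)) / (sum(bid) + sum(ask)), in [-1, 1].
imbalance = (bid_vols.sum() - ask_vols.sum()) / (bid_vols.sum() + ask_vols.sum())

# Weighted imbalance: per-level imbalance weighted by decreasing, normalized
# coefficients, so the levels nearest the touch dominate the signal.
weights = 1.0 / np.arange(1, 11)   # assumed decay: 1, 1/2, ..., 1/10
weights /= weights.sum()
level_imb = (bid_vols - ask_vols) / (bid_vols + ask_vols)
weighted_imbalance = float((weights * level_imb).sum())

print(f"imbalance={imbalance:.4f}, weighted={weighted_imbalance:.4f}")
```

In this example the totals are nearly balanced, but the near levels are bid-heavy, so the weighted version turns clearly positive — exactly the sensitivity to near-touch pressure the text describes.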
IV. Hands-On: Building a Real-Time Quantitative Trading System
4.1 System Architecture Design
A complete Level-2 quantitative trading system usually adopts a microservice architecture, mainly including the following layers:
Data Collection Layer: Receives Level-2 real-time market data through WebSocket connections, pushes raw data to the Kafka message queue, and then the data cleaning service performs exception filtering and format standardization.
Computing Engine Layer: This layer contains multiple parallel computing services. The factor calculation service computes microstructure indicators such as order book imbalance and capital flow direction in real time; the signal generation service produces buy/sell signals from these factors; the risk monitoring service evaluates strategy exposure and account risk in real time.
Execution Layer: After receiving trading instructions from the signal layer, the order management service generates specific orders, which are submitted to the exchange through the brokerage trading API after passing the risk control interceptor check.
4.2 Code Examples for Real-Time Market Data Access
The following provides complete access examples for iTick API, including both REST API and WebSocket methods, covering markets such as A-shares, Hong Kong stocks, and US stocks.
Environment Preparation
First, install the necessary Python libraries:
```shell
pip install requests websocket-client pandas
```
REST API Example: Suitable for Historical Data Query and Low-Frequency Acquisition
- Obtain Real-Time A-share Quotes
```python
import requests

# Configuration information
API_TOKEN = "your_token_here"  # Replace with your actual token
BASE_URL = "https://api.itick.org"

headers = {
    "accept": "application/json",
    "token": API_TOKEN
}

def get_stock_quote(region, code):
    """
    Obtain real-time stock quotes.
    region: SH (Shanghai), SZ (Shenzhen), HK (Hong Kong), US (US stocks)
    code: stock code, e.g. 600519, 000001, 700, AAPL
    """
    url = f"{BASE_URL}/stock/quote"
    params = {
        "region": region,
        "code": code
    }
    try:
        response = requests.get(url, params=params, headers=headers)
        response.raise_for_status()
        data = response.json()
        if data.get("code") == 0:
            quote = data["data"]
            print(f"Stock: {quote['s']}")
            print(f"Last Price: {quote['ld']}")
            print(f"Change Percentage: {quote['chp']}%")
            print(f"Volume: {quote['v']}")
            print(f"Highest Price: {quote['h']}")
            print(f"Lowest Price: {quote['l']}")
            return quote
        else:
            print(f"API Error: {data.get('msg')}")
            return None
    except Exception as e:
        print(f"Request Failed: {e}")
        return None

# Call example: obtain real-time market data for Kweichow Moutai
get_stock_quote("SH", "600519")
# Obtain real-time market data for Tencent Holdings
get_stock_quote("HK", "700")
# Obtain real-time market data for Apple Inc.
get_stock_quote("US", "AAPL")
```
- Obtain Historical K-line Data
```python
import pandas as pd

def get_stock_kline(region, code, ktype, limit=100):
    """
    Obtain historical K-line data.
    ktype: 1-1min, 2-5min, 3-15min, 4-30min, 5-60min, 8-daily, 9-weekly, 10-monthly
    limit: number of K-line bars to return
    """
    url = f"{BASE_URL}/stock/kline"
    params = {
        "region": region,
        "code": code,
        "kType": ktype,
        "limit": limit
    }
    try:
        response = requests.get(url, params=params, headers=headers)
        response.raise_for_status()
        data = response.json()
        if data.get("code") == 0:
            klines = data["data"]
            print(f"Obtained {len(klines)} K-line entries")
            # Convert to a DataFrame for easier analysis
            df = pd.DataFrame(klines)
            df.columns = ["timestamp", "open", "high", "low", "close", "volume", "turnover"]
            df["datetime"] = pd.to_datetime(df["timestamp"], unit="ms")
            return df
        else:
            print(f"API Error: {data.get('msg')}")
            return None
    except Exception as e:
        print(f"Request Failed: {e}")
        return None

# Call example: obtain the last 100 daily K-lines for Kweichow Moutai
df = get_stock_kline("SH", "600519", ktype=8, limit=100)
if df is not None:
    print(df.head())
```
- Obtain Level-2 Ten-Level Order Book Data
```python
def get_stock_depth(region, code):
    """Obtain ten-level order book data for a stock"""
    url = f"{BASE_URL}/stock/depth"
    params = {
        "region": region,
        "code": code
    }
    try:
        response = requests.get(url, params=params, headers=headers)
        response.raise_for_status()
        data = response.json()
        if data.get("code") == 0:
            depth = data["data"]
            print("Bid Book (Top 5 Levels):")
            for i, bid in enumerate(depth.get("b", [])[:5]):
                print(f"  Bid {i+1}: Price {bid['p']}  Volume {bid['v']}")
            print("Ask Book (Top 5 Levels):")
            for i, ask in enumerate(depth.get("a", [])[:5]):
                print(f"  Ask {i+1}: Price {ask['p']}  Volume {ask['v']}")
            return depth
        else:
            print(f"API Error: {data.get('msg')}")
            return None
    except Exception as e:
        print(f"Request Failed: {e}")
        return None

# Call example
get_stock_depth("SH", "600519")
```
WebSocket Real-Time Push: Suitable for High-Frequency Trading Scenarios
WebSocket is the optimal way to obtain real-time market data, with latency controllable at the millisecond level. The following code implements complete connection management, heartbeat keep-alive, and automatic reconnection mechanisms.
```python
import websocket
import json
import threading
import time

# WebSocket configuration
WS_URL = "wss://api.itick.org/stock"  # Stock market; use wss://api.itick.org/future for futures
API_TOKEN = "your_token_here"

# Subscription configuration
SUBSCRIBE_SYMBOLS = "600519$SH,000001$SZ"  # Multiple symbols separated by commas, format: code$market
DATA_TYPES = "quote,tick,depth"            # Subscription types: quote, tick, depth

def on_message(ws, message):
    """Process received messages"""
    try:
        data = json.loads(message)
        # Connection success prompt
        if data.get("code") == 1 and data.get("msg") == "Connected Successfully":
            print("✅ Connection successful, waiting for authentication...")
        # Authentication result handling
        elif data.get("resAc") == "auth":
            if data.get("code") == 1:
                print("✅ Authentication passed, subscribing to data...")
                subscribe(ws)
            else:
                print(f"❌ Authentication failed: {data.get('msg')}")
                ws.close()
        # Subscription result handling
        elif data.get("resAc") == "subscribe":
            if data.get("code") == 1:
                print(f"✅ Subscription successful! Symbols: {SUBSCRIBE_SYMBOLS}, Types: {DATA_TYPES}")
            else:
                print(f"❌ Subscription failed: {data.get('msg')}")
        # Real-time data processing
        elif data.get("data"):
            market_data = data["data"]
            data_type = market_data.get("type")
            symbol = market_data.get("s")
            if data_type == "quote":
                # Quote data
                print(f"📊 {symbol} Quote - Last Price: {market_data.get('ld')}, "
                      f"Change Percentage: {market_data.get('chp')}%")
            elif data_type == "tick":
                # Tick-by-tick transaction data
                print(f"💹 {symbol} Trade - Price: {market_data.get('p')}, "
                      f"Volume: {market_data.get('v')}, Side: {market_data.get('side')}")
            elif data_type == "depth":
                # Order book depth data
                print(f"📚 {symbol} Order Book - Bid 1: {market_data.get('b')[0] if market_data.get('b') else 'N/A'}, "
                      f"Ask 1: {market_data.get('a')[0] if market_data.get('a') else 'N/A'}")
    except json.JSONDecodeError as e:
        print(f"❌ Failed to parse message: {e}")

def on_error(ws, error):
    """Error handling"""
    print(f"❌ Connection error: {error}")

def on_close(ws, close_status_code, close_msg):
    """Reconnect automatically when the connection is closed"""
    print("🔌 Connection closed, reconnecting in 3 seconds...")
    time.sleep(3)
    start_websocket()

def on_open(ws):
    """Triggered when the connection is established"""
    print("🔗 WebSocket connection opened")

def subscribe(ws):
    """Send a subscription request"""
    subscribe_msg = {
        "ac": "subscribe",
        "params": SUBSCRIBE_SYMBOLS,
        "types": DATA_TYPES
    }
    ws.send(json.dumps(subscribe_msg))

def send_ping(ws):
    """Send a heartbeat packet every 30 seconds to keep the connection alive"""
    while True:
        time.sleep(30)
        try:
            ping_msg = {
                "ac": "ping",
                "params": str(int(time.time() * 1000))
            }
            ws.send(json.dumps(ping_msg))
            # print("📡 Heartbeat sent")  # Uncomment for debugging
        except Exception as e:
            print(f"❌ Failed to send heartbeat: {e}")

def start_websocket():
    """Start the WebSocket connection"""
    ws = websocket.WebSocketApp(
        WS_URL,
        header={"token": API_TOKEN},
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close
    )
    # Start the heartbeat thread
    ping_thread = threading.Thread(target=send_ping, args=(ws,))
    ping_thread.daemon = True
    ping_thread.start()
    # Run the connection loop
    ws.run_forever()

if __name__ == "__main__":
    print("🚀 Starting real-time market data receiver...")
    start_websocket()
```
A complete WebSocket market data client needs to implement functions such as connection management, message processing, and subscription management. After the connection is established, a login message is first sent for identity authentication. Upon successful authentication, a subscription message is sent to specify the list of stock codes for which market data is to be received.
In the message processing function, messages are dispatched by type. For tick-by-tick transaction data, extract the stock code, transaction price, transaction volume, and active direction (active buy or active sell). From this information, net capital inflow can be computed in real time: an active buy adds the transaction amount to the net inflow, while an active sell subtracts it. This running net-inflow indicator directly reflects the movement of major capital.
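The net-inflow accumulation just described can be sketched as a small handler. The field names ("p", "v", "side") mirror the tick payload used in the WebSocket example but are vendor-specific assumptions.

```python
net_inflow = 0.0

def on_tick(tick: dict) -> float:
    """Update and return the running net capital inflow for one tick."""
    global net_inflow
    amount = tick["p"] * tick["v"]
    if tick["side"] == "B":        # active buy: add to net inflow
        net_inflow += amount
    elif tick["side"] == "S":      # active sell: subtract from net inflow
        net_inflow -= amount
    return net_inflow

# Synthetic tick stream
ticks = [
    {"p": 1700.0, "v": 100, "side": "B"},
    {"p": 1700.5, "v": 300, "side": "S"},
    {"p": 1701.0, "v": 200, "side": "B"},
]
for t in ticks:
    on_tick(t)
print(f"Net inflow: {net_inflow:.2f}")
```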
4.3 Risk Control Mechanism Design
A quantitative system driven by Level-2 data needs to establish a multi-level risk control system:
Data Layer Risk Control: Monitor the packet loss rate of market data. When the packet loss rate exceeds a preset threshold (e.g., 1%), it indicates potential issues with data quality, and the system should trigger a circuit breaker to automatically switch to a backup data source.
Signal Layer Risk Control: Set a confidence threshold for each trading signal. When the confidence of the signal output by the model is lower than 0.7, even if the signal is triggered, opening new positions should be prohibited to avoid trading based on ambiguous signals.
Order Layer Risk Control: Restrict the single order size to no more than 5% of the account equity. For large orders exceeding the limit, they can be automatically split into multiple small orders for submission or directly rejected for execution.
Account Layer Risk Control: Monitor the daily maximum drawdown of the account. When the drawdown exceeds 10%, initiate a forced liquidation process and suspend all subsequent trading operations to prevent further losses.
System Layer Risk Control: Monitor the response latency of the trading API. When the response time exceeds 2 seconds, it indicates potential issues with the primary trading channel, and an immediate switch to the backup trading channel should be made.
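The five layers above can be wired together as a set of threshold checks. This is an illustrative sketch, not a full risk engine: the numbers are those quoted in the text, and the function covers only the pre-trade checks (signal, order, and account layers).

```python
RISK_LIMITS = {
    "max_packet_loss": 0.01,       # data layer: >1% loss -> switch data source
    "min_signal_confidence": 0.7,  # signal layer: below this, no new positions
    "max_order_pct_equity": 0.05,  # order layer: single order <= 5% of equity
    "max_daily_drawdown": 0.10,    # account layer: >10% drawdown -> liquidate
    "max_api_latency_s": 2.0,      # system layer: >2s -> backup trading channel
}

def check_order(order_value: float, equity: float,
                confidence: float, drawdown: float) -> bool:
    """Return True only if an order passes the signal/order/account checks."""
    if confidence < RISK_LIMITS["min_signal_confidence"]:
        return False  # signal too ambiguous to act on
    if order_value > RISK_LIMITS["max_order_pct_equity"] * equity:
        return False  # order too large; split or reject
    if drawdown > RISK_LIMITS["max_daily_drawdown"]:
        return False  # account in forced-liquidation state
    return True

print(check_order(order_value=4_000, equity=100_000, confidence=0.8, drawdown=0.02))
print(check_order(order_value=8_000, equity=100_000, confidence=0.8, drawdown=0.02))
```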
V. Compliance and Thresholds for Level-2 Data API
5.1 Access Thresholds
According to the regulations of multiple brokerages and platforms, Level-2 data interfaces usually have the following restrictions:
Asset Threshold: Investors need to meet the condition of having an account asset of no less than 100,000 RMB to be eligible to apply for Level-2 market data access rights.
Permission Application: Level-2 permissions need to be applied for separately through channels provided by the brokerage official or data service providers. The application process and requirements may vary slightly among different brokerages.
Compliance Requirements: Level-2 data is for internal use by individuals or institutions only and shall not be privately forwarded, resold, or provided to third parties, otherwise legal liabilities may be incurred.
5.2 Notes on Data Usage
Data Timeliness: Level-2 data has strong timeliness, and real-time data for the day is usually automatically cleared by the system after the market closes. If post-market review and analysis are required, data caching and storage must be actively performed during the trading day.
Historical Data Completion: If historical Level-2 data needs to be queried, it is usually necessary to wait until T+1 to download the complete end-of-day data file. This means that the day's data cannot be queried historically on the same day.
Multi-Instance Deployment: When multiple processes need to be deployed to receive market data simultaneously, each process must initialize the connection independently. Reusing the same connection or establishing too many connections at the same time may trigger port conflict restrictions on the server side.
VI. Practical Recommendations for Developers
6.1 Progressive Development Path
For developers new to Level-2 data, it is recommended to follow this progressive path:
Phase 1: Use free data sources (such as Tushare Pro's free quota) to familiarize yourself with the basic data structure of Level-2, and build a local time-series database (such as DolphinDB or InfluxDB) for data storage practice.
Phase 2: Develop low-frequency strategies based on minute K-lines to verify the stability and accuracy of data access. This phase mainly tests the reliability of the data pipeline.
Phase 3: Introduce order book snapshot data, develop order book-based factors (such as order book imbalance, depth slope, etc.), and learn how to extract valuable information from order book data.
Phase 4: Access tick-by-tick transaction data, implement high-frequency signals and algorithmic trading, and deeply understand the dynamic changes of market microstructure.
6.2 Technical Selection Recommendations
In terms of technical component selection, the following are mature solutions:
- Time-Series Database: DolphinDB or InfluxDB are recommended, both supporting columnar storage and vectorized computing, capable of efficiently processing massive time-series data.
- Message Queue: Kafka is the industry standard, with high throughput, data persistence, and good ecological support.
- Programming Language: Python can be used for prototype development and strategy research, and key performance-sensitive modules can be extended with C++ to improve execution efficiency, balancing development efficiency and runtime performance.
- Containerization: Package applications using Docker and orchestrate them with Kubernetes to achieve elastic scaling and convenient deployment.
- Monitoring System: Prometheus combined with Grafana is a mature monitoring solution that can real-time observe key indicators such as system operation status, data latency, and error rate.
VII. Conclusion
The A-share Level-2 market data API opens a door to market microstructure for quantitative developers. From in-depth analysis of ten-level order books to tracking capital flows through tick-by-tick transactions, Level-2 data carries far richer signal content than traditional market data. Of course, the ability to process massive data volumes, a stable system architecture, and strict risk control mechanisms are all challenges that must be overcome along the way.
Whether you are an individual quantitative enthusiast or a developer in a professional institution, it is hoped that this article will help you establish a complete understanding of the Level-2 data API and enable you to walk more steadily and further on the path of quantitative trading.
Reference Documentation: https://docs.itick.org/rest-api/stocks/stock-depth
GitHub: https://github.com/itick-org/