DEV Community

didi yang

Why Do Stock Market APIs Return Duplicate Candlestick Data? A High-Frequency Trader’s Practical Breakdown

Having spent years building and maintaining real-time market data pipelines for my personal high-frequency trading systems, I’ve grown extremely familiar with one confusing quirk of stock data APIs: repeated candlestick records.
When I first noticed this issue, I immediately suspected network instability, client-side cache delays, or faulty API responses. I spent hours troubleshooting my connection and local logic before I fully unpacked the entire market data delivery chain. What I finally discovered surprised me: duplicate K-line entries are not an API error. They are a standard behavioral feature of real-time streaming market data.

Trading Scenarios Where Duplicate Candle Data Causes Problems

If you’re only doing daily market analysis or low-frequency strategy research, repeated candlestick data is basically harmless. A handful of duplicate entries won’t affect your overall trend judgment or backtest results.
But for high-frequency traders running intraday automated strategies, this minor data quirk becomes a critical bug. Unfiltered duplicate candles bloat local data arrays, trigger redundant trading signals, create inconsistent backtest and live trading results, and even cause unnecessary memory pressure on lightweight trading clients. This is why cleaning up duplicate candle data is a foundational step for stable HFT system operation.

How Trading APIs Generate Real-Time Candlestick Bars

Most new quantitative developers make a mistaken assumption: that each candlestick is a fixed, finalized piece of data generated once per time window. In reality, all K-line metrics are aggregated dynamically from raw tick-by-tick transaction data.
Different data providers aggregate at different points in the chain. Some compute candles on the exchange side, some during middle-tier distribution, and others re-aggregate at the final API service layer. For my daily real-time strategy development, I rely on AllTick API for stable and low-latency market stream delivery.
The core rule every trader needs to know is straightforward: any unclosed candlestick timeframe is dynamically updating.
Take a 1-minute candle as an example: the open is fixed by the first trade in the window, but the high, low, close and volume keep changing with each new trade until the minute closes. To keep the data current, API servers continuously push updated snapshots of the active time window. The duplicate records we observe on the client side are not multiple independent candles; they are iterative state updates of a single unfinished candlestick.
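To make this concrete, here is a minimal sketch of how a server might build candle snapshots from ticks. It is not any provider's actual implementation: the `Candle` fields, the `aggregate` function and the sample ticks are all hypothetical, and it assumes a single symbol with epoch-second timestamps.

```python
from dataclasses import dataclass, replace


@dataclass
class Candle:
    symbol: str
    open_time: int  # start of the window, epoch seconds
    open: float
    high: float
    low: float
    close: float


def aggregate(ticks, timeframe=60):
    """Yield one candle snapshot per tick (single-symbol stream assumed)."""
    current = None
    for symbol, ts, price in ticks:
        open_time = ts - ts % timeframe  # floor to the start of the window
        if current is None or current.open_time != open_time:
            # First trade of a new window fixes the open.
            current = Candle(symbol, open_time, price, price, price, price)
        else:
            current.high = max(current.high, price)
            current.low = min(current.low, price)
            current.close = price
        # Emit a copy so later updates do not mutate earlier snapshots.
        yield replace(current)


T0 = 1_700_000_040  # a timestamp that falls exactly on a minute boundary
ticks = [
    ("AAPL", T0 + 1, 190.0),
    ("AAPL", T0 + 20, 190.5),
    ("AAPL", T0 + 45, 189.8),
    ("AAPL", T0 + 61, 190.1),  # first trade of the next minute
]
snapshots = list(aggregate(ticks))
for snap in snapshots:
    print(snap)
```

Four ticks produce four snapshots, but the first three share the same `open_time`. A client that simply appends every push would store three "candles" for one minute.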

Three Key Reasons for Repeated Candlestick Delivery

Based on my long-term live debugging and pipeline optimization experience, there are three core factors that lead to duplicate K-line data in trading APIs:
1. Time desynchronization across data nodes
The full market data link includes three separate time sources: exchange server time, API backend time, and local client device time. Tiny millisecond-level time offsets are unavoidable in network transmission. These subtle deviations can cause the client system to misjudge updated candle data as brand-new records, resulting in visible duplication.
2. Hybrid convergence of multiple data streams
To ensure high availability and fault tolerance, mainstream market data systems adopt dual-stream deployment, synchronizing real-time original tick streams and cached backup streams simultaneously. Without unified server-side deduplication rules, updated candlestick data will be pushed repeatedly through different data channels.
3. Incremental updates for unfinished candles
This is the most common cause of apparent duplicates. Every trade in an open time window can revise the candle's high, low and close values (the open stays fixed once the first trade arrives). The server pushes the latest state each time the data changes, producing a series of near-identical records on the consumer end that look like duplicates.
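For the first cause, one mitigation is to derive the candle's identity from the event timestamp floored to the bar boundary, rather than from local receive time. The sketch below assumes the payload carries an epoch timestamp in milliseconds; `bar_open_ms` is a hypothetical helper name, not part of any API.

```python
def bar_open_ms(timestamp_ms: int, timeframe_s: int) -> int:
    """Floor a millisecond epoch timestamp to the start of its bar."""
    width_ms = timeframe_s * 1000
    return timestamp_ms - timestamp_ms % width_ms


# Two pushes for the same 1-minute bar, stamped a few hundred ms apart.
first = bar_open_ms(1_700_000_040_012, 60)
second = bar_open_ms(1_700_000_040_487, 60)
print(first, second, first == second)
```

Both pushes resolve to the same bar start, so they map to the same record. Flooring cannot help when an offset pushes a timestamp across a window boundary, so if the API supplies the bar's own open time, that field is the safer basis for the key.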

Client-Side Deduplication Solution for HFT Systems

Instead of fixating on optimizing server-side logic (which we cannot control as API users), the most efficient and reliable approach is to build complete deduplication rules on the data consumption side.
A common approach is to build a unique identifier from trading symbol + candle open timestamp + timeframe. This composite key identifies exactly one candlestick bar, so every update to that bar maps to the same key.
I always replace the traditional data appending logic with overwrite storage logic. Whenever new data carrying the same unique key is received, it overwrites the old local record. No matter how many updates the server pushes, only the latest and most accurate candle state is retained locally, ensuring single stable data output for each time window.
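A minimal sketch of that overwrite logic, assuming each update arrives as a plain dict and that the bar's open time is available in epoch seconds. `CandleStore`, `upsert` and `series` are illustrative names of my own, not part of any API.

```python
from typing import Dict, List, Tuple

# (symbol, timeframe, bar open time in epoch seconds)
CandleKey = Tuple[str, str, int]


class CandleStore:
    """Keeps only the latest state of each candle, keyed by a composite key."""

    def __init__(self) -> None:
        self._bars: Dict[CandleKey, dict] = {}

    def upsert(self, symbol: str, timeframe: str, open_time: int, bar: dict) -> bool:
        """Overwrite the stored bar; return True if this key was not seen before."""
        key = (symbol, timeframe, open_time)
        is_new = key not in self._bars
        self._bars[key] = bar
        return is_new

    def series(self, symbol: str, timeframe: str) -> List[dict]:
        """Return the stored bars for one symbol and timeframe, oldest first."""
        keys = sorted(k for k in self._bars if k[0] == symbol and k[1] == timeframe)
        return [self._bars[k] for k in keys]


candles = CandleStore()
candles.upsert("AAPL", "1m", 1_700_000_040, {"close": 190.0})
candles.upsert("AAPL", "1m", 1_700_000_040, {"close": 190.5})  # same bar, newer state
candles.upsert("AAPL", "1m", 1_700_000_100, {"close": 190.1})  # next bar
print(candles.series("AAPL", "1m"))
```

The return value of `upsert` is a cheap way to tell "a new bar started" apart from "the current bar was revised", which is useful if signals should only fire once per bar.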
For ultra-high-frequency tick streaming scenarios, adding a short-term cache filtering layer can effectively block burst repeated pushes within a short period, preventing local array expansion and reducing program runtime overhead.
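One way such a filtering layer could look, as a sketch only: it drops a push when an identical payload for the same key was seen within a short window. The `RecentFilter` class, its `ttl` and `max_size` defaults, and the assumption of flat payload dicts are all mine; tune them to your own stream.

```python
import time
from collections import OrderedDict


class RecentFilter:
    """Flags pushes whose payload repeats the last one seen for the same key."""

    def __init__(self, ttl=1.0, max_size=10_000, clock=time.monotonic):
        self.ttl = ttl            # seconds during which a repeat counts as a duplicate
        self.max_size = max_size  # upper bound on remembered keys
        self.clock = clock        # injectable so the filter can be tested
        self._seen = OrderedDict()  # key -> (payload fingerprint, time seen)

    def is_duplicate(self, key, payload: dict) -> bool:
        now = self.clock()
        # Flat payloads with hashable values are assumed here.
        fingerprint = tuple(sorted(payload.items()))
        previous = self._seen.get(key)
        self._seen[key] = (fingerprint, now)
        self._seen.move_to_end(key)
        if len(self._seen) > self.max_size:
            self._seen.popitem(last=False)  # evict the least recently seen key
        return (
            previous is not None
            and previous[0] == fingerprint
            and now - previous[1] <= self.ttl
        )
```

In a message handler, a call like `if recent.is_duplicate(key, tick): return` placed before the store write would skip identical bursts, while any push whose content actually changed still passes through.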

Practical Implementation Code

The following WebSocket code subscribes to a real-time stream and applies the overwrite logic on the client side. Note that this example keys tick pushes by symbol and tick time; for candlestick data, the key would also include the timeframe:

import websocket  # provided by the websocket-client package
import json
import uuid

store = {}

def on_message(ws, message):
    data = json.loads(message)
    if data.get("cmd_id") == 22998:  # tick data push
        tick = data["data"]
        key = f"{tick['code']}_{tick['tick_time']}"
        store[key] = tick  # overwrite any earlier record with the same key
        print(key, tick["price"])

def on_open(ws):
    req = {
        "cmd_id": 22004,
        "seq_id": 1,
        "trace": str(uuid.uuid4()),
        "data": {
            "symbol_list": [{"code": "AAPL"}, {"code": "TSLA"}]
        }
    }
    ws.send(json.dumps(req))

ws = websocket.WebSocketApp(
    "wss://stream.alltick.co/v1/stock",
    on_message=on_message,
    on_open=on_open
)

ws.run_forever()

With this lightweight overwrite strategy, repeated updates carrying the same key no longer accumulate. Your local dataset always holds the latest market state, which is what automated trading and quantitative analysis need for stable operation.

Final Thoughts & Trading Insights

After years of practicing quantitative trading and market data development, I’ve completely redefined my understanding of “duplicate candle data”.
Unclosed candlesticks are dynamic, evolving data structures rather than static fixed records. Every push from the server is a real-time market snapshot that records the latest price changes, not invalid redundant data.
The core of data processing is not simply avoiding repetition, but reconstructing the complete evolution track of each candlestick through standardized unique identification and timestamp calibration.
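If you do want that evolution track and not just the final state, a small revision log next to the overwrite store is enough. This is a sketch under my own naming (`history`, `latest`, `record`), assuming bars arrive as plain dicts.

```python
from collections import defaultdict

history = defaultdict(list)  # key -> every distinct state, in arrival order
latest = {}                  # key -> most recent state only


def record(key, bar: dict) -> None:
    """Store the latest bar and log it only if it differs from the previous state."""
    if latest.get(key) != bar:
        history[key].append(dict(bar))  # copy so later mutation cannot alter the log
    latest[key] = bar


key = ("AAPL", "1m", 1_700_000_040)
record(key, {"high": 190.0, "close": 190.0})
record(key, {"high": 190.0, "close": 190.0})  # identical repeat, not logged again
record(key, {"high": 190.5, "close": 190.5})
print(len(history[key]), latest[key])
```

Here `latest` serves live trading logic, while `history` keeps the sequence of distinct states for replay or debugging.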
Once you master this logic, you will no longer treat duplicate candles as system errors. Instead, you’ll recognize them as real-time feedback of market volatility — subtle changes that precisely reflect the continuous breathing and movement of the financial market.
