DEV Community

Prashant Iyenga

Posted on • Originally published at Medium

A Technical Guide to Downloading and Managing Binance Historical Crypto Market Data

Introduction

This article provides a technical guide to accessing and utilizing historical OHLCV (Open, High, Low, Close, Volume) and trade-level data for cryptocurrencies via Binance’s public data infrastructure. This documentation is intended for quantitative analysts, algorithmic traders, and developers seeking robust, reproducible workflows for crypto market data ingestion and analysis.

Binance Public Data Repository

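Binance makes its market data publicly accessible via two channels: the data.binance.vision download site, which hosts the archive files, and the binance-public-data GitHub repository, which provides helper scripts for fetching them.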

All data is provided in two granularities:

  • Daily files (new files appear each day for the previous day’s data)
  • Monthly files (new files appear on the first Monday of each month and contain all days of the previous month)

Both daily and monthly files are available for the Kline, Trade, and AggTrade datasets; Kline archives are additionally split by interval (e.g., 1m, 5m, 1h, 1d). You can therefore download either daily or monthly Kline archives for any supported interval. For efficient data management, use monthly files for historical periods (they consolidate all daily data for the month) and supplement them with daily files for the most recent days not yet included in the latest monthly archive.
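The monthly-plus-daily strategy described above can be sketched as a small planning function. This is a minimal illustration, not code from Binance's tooling: the function name plan_downloads is hypothetical, and it assumes a month's archive becomes available on the first Monday of the following month, as stated in the list above.

```python
from datetime import date, timedelta


def plan_downloads(start: date, end: date, today: date) -> tuple[list[str], list[str]]:
    """Split [start, end] into monthly keys (YYYY-MM) and daily keys (YYYY-MM-DD).

    Assumption: a monthly archive is usable only when the whole month lies
    inside the requested range and the first Monday of the following month
    has already passed. Everything else falls back to daily files.
    """
    monthly, daily = [], []
    day = start
    while day <= end:
        month_start = day.replace(day=1)
        next_month = (month_start + timedelta(days=32)).replace(day=1)
        month_end = next_month - timedelta(days=1)
        # Monday is weekday 0, so this offset lands on the first Monday.
        first_monday = next_month + timedelta(days=(7 - next_month.weekday()) % 7)
        if day == month_start and month_end <= end and today >= first_monday:
            monthly.append(f"{day:%Y-%m}")
            day = next_month
        else:
            daily.append(f"{day:%Y-%m-%d}")
            day += timedelta(days=1)
    return monthly, daily
```

For a range from 2025-06-01 to 2025-07-03 evaluated on 2025-07-10, this plans one monthly archive for June and three daily files for July.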


Data Types: Kline, Trade, and AggTrade

1. Kline (Candlestick) Data

Kline files correspond to Binance’s /api/v3/klines REST endpoint and provide OHLCV for fixed time intervals. Each record includes:

Field                          Description
open_time                      Start timestamp of the interval
open, high, low, close         Price metrics
volume                         Base-asset volume during the interval
close_time                     End timestamp of the interval
quote_asset_volume             Quote-asset volume during the interval
num_trades                     Number of trades in the interval
taker_buy_base_asset_volume    Volume bought by takers
taker_buy_quote_asset_volume   Quote volume bought by takers
ignore                         Unused
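As a rough illustration of how these fields map onto a file, the sketch below parses Kline CSV text using only the standard library. The column order follows the field list above. The helper names are hypothetical, and the millisecond/microsecond detection is an assumption worth checking against the files you actually download, since the timestamp unit may differ between archives.

```python
import csv
import io
from datetime import datetime, timezone

KLINE_COLUMNS = [
    "open_time", "open", "high", "low", "close", "volume",
    "close_time", "quote_asset_volume", "num_trades",
    "taker_buy_base_asset_volume", "taker_buy_quote_asset_volume", "ignore",
]


def to_utc(ts: int) -> datetime:
    # Assumption: values of 10**15 or more are microseconds, otherwise milliseconds.
    divisor = 1_000_000 if ts >= 10**15 else 1_000
    return datetime.fromtimestamp(ts / divisor, tz=timezone.utc)


def parse_klines(text: str) -> list[dict]:
    """Turn Kline CSV text into dicts with typed timestamps and prices."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record or not record[0].isdigit():
            continue  # skip blank lines and a header row, if one is present
        row = dict(zip(KLINE_COLUMNS, record))
        row["open_time"] = to_utc(int(row["open_time"]))
        row["close_time"] = to_utc(int(row["close_time"]))
        for key in ("open", "high", "low", "close", "volume"):
            row[key] = float(row[key])
        row["num_trades"] = int(row["num_trades"])
        rows.append(row)
    return rows


# Made-up sample values, for illustration only.
sample = "1751328000000,107000.0,107500.5,106800.0,107200.0,123.45,1751331599999,13200000.0,4567,60.0,6400000.0,0\n"
candles = parse_klines(sample)
```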

All common intervals (1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 8h, 12h, 1d, 3d, 1w, 1mo, etc.) are supported. (github.com)

2. Raw Trade Data

Trade files come directly from /api/v3/historicalTrades. Each row is a single trade execution, including price, quantity, timestamp, and maker/taker flags. Use raw trades when you need every individual execution event for tick-by-tick backtesting and slippage modeling. (github.com, quantifiedstrategies.com)

3. Aggregate Trade (aggTrade) Data

AggTrades are derived from /api/v3/aggTrades. They bundle together consecutive trades at the same price into one record, reducing data volume while preserving essential trade information. (developers.binance.com)

Field                       Description
aggregateTradeId            Internal ID for the aggregated record
price                       Price at which these trades occurred
quantity                    Total quantity across bundled trades
firstTradeId, lastTradeId   Range of original trade IDs
timestamp                   Timestamp when the last trade in the bundle occurred
isBuyerMaker                Whether the buyer of the last trade was maker
isBestMatch                 Whether the last trade matched at best price
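To make the bundling idea concrete, here is a simplified sketch that merges consecutive trades sharing a price and maker side. It follows the simplified description in the text only. Binance's own aggregation rule is stricter (its documentation describes fills from the same taker order at the same price and time), so treat this as an approximation. The Trade and AggTrade classes are hypothetical containers, not Binance types.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    trade_id: int
    price: float
    quantity: float
    timestamp: int
    is_buyer_maker: bool


@dataclass
class AggTrade:
    price: float
    quantity: float
    first_trade_id: int
    last_trade_id: int
    timestamp: int
    is_buyer_maker: bool


def bundle_trades(trades: list[Trade]) -> list[AggTrade]:
    """Merge runs of consecutive trades with the same price and maker side."""
    bundles: list[AggTrade] = []
    for t in trades:
        last = bundles[-1] if bundles else None
        if last and last.price == t.price and last.is_buyer_maker == t.is_buyer_maker:
            last.quantity += t.quantity
            last.last_trade_id = t.trade_id
            last.timestamp = t.timestamp  # timestamp of the last trade in the bundle
        else:
            bundles.append(AggTrade(t.price, t.quantity, t.trade_id,
                                    t.trade_id, t.timestamp, t.is_buyer_maker))
    return bundles
```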

Which Data to Use for Backtesting vs. Live Trading

Choosing the right dataset depends on your system requirements:

  • Backtesting (Historical Simulation)

    • Interval-based strategies: Kline data is sufficient for time-frame strategies (e.g., hourly breakouts) because it provides OHLCV in fixed windows and is lightweight to process. (logicinv.com)
    • Tick-level accuracy: Use raw trade data if you need to simulate order execution, slippage, and order book impact at the individual trade level. This ensures your backtest closely mirrors real-world fills. (quantifiedstrategies.com)
  • Live Trading (Real-time Execution)

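    • Efficiency: Subscribe to the aggTrade WebSocket stream (<symbol>@aggTrade) for a balance between granularity and bandwidth. It bundles trades by price; the update speed depends on the market (Binance documents the spot stream as real-time and the USDⓈ-M futures stream as pushing every 100 ms). (developers.binance.com)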
    • Full detail: If your strategy requires every individual fill (e.g., ultra–high-frequency strategies), connect to the raw trade stream (<symbol>@trade). Be prepared for higher message rates and processing overhead. (developers.binance.com)

Note: Aggregated trades can sometimes exhibit small discrepancies in volumes or trade counts compared to raw trades or klines; always validate critical metrics against a secondary source. (dev.binance.vision)
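One simple way to act on this note is to reconcile the summed trade quantities for an interval against the volume reported by the matching Kline. The helper below is a hypothetical sketch; the tolerance is an arbitrary choice that you would tune to your data.

```python
import math


def volumes_match(kline_volume: float, trade_quantities: list[float],
                  rel_tol: float = 1e-9) -> tuple[bool, float]:
    """Compare summed trade quantities with a Kline's reported volume.

    Returns (matches, difference), where difference is total minus kline_volume.
    """
    total = math.fsum(trade_quantities)  # fsum limits floating-point drift
    return math.isclose(total, kline_volume, rel_tol=rel_tol), total - kline_volume
```

A non-zero difference does not tell you which source is wrong, only that the interval deserves a closer look.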


Downloading Data

Data files are hosted on data.binance.vision. The general URL pattern for daily spot Klines is:

https://data.binance.vision/data/spot/daily/klines/{symbol}/{interval}/{symbol}-{interval}-{YYYY-MM-DD}.zip

For example, to download BTCUSDT 1-hour data for July 13, 2025:

curl -s "https://data.binance.vision/data/spot/daily/klines/BTCUSDT/1h/BTCUSDT-1h-2025-07-13.zip" -o BTCUSDT-1h-2025-07-13.zip

Monthly data uses a similar pattern under data/spot/monthly/klines/.... (github.com)
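The URL patterns can be wrapped in two small helpers. This sketch follows the path order used in the curl example above (symbol first, then interval). The monthly file name ending in YYYY-MM is an assumption based on the "similar pattern" mentioned in the text, and the function names are hypothetical.

```python
from datetime import date

BASE_URL = "https://data.binance.vision/data/spot"


def daily_kline_url(symbol: str, interval: str, day: date) -> str:
    """Build the URL of a daily spot Kline archive."""
    name = f"{symbol}-{interval}-{day:%Y-%m-%d}.zip"
    return f"{BASE_URL}/daily/klines/{symbol}/{interval}/{name}"


def monthly_kline_url(symbol: str, interval: str, year: int, month: int) -> str:
    """Build the URL of a monthly spot Kline archive (assumed YYYY-MM suffix)."""
    name = f"{symbol}-{interval}-{year:04d}-{month:02d}.zip"
    return f"{BASE_URL}/monthly/klines/{symbol}/{interval}/{name}"
```

Calling daily_kline_url("BTCUSDT", "1h", date(2025, 7, 13)) reproduces the URL used in the curl command.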


Example: Fetching and Parsing with Python

Binance provides an official Python utility for downloading and parsing historical spot data (Klines, Trades, AggTrades) from their public dataset. The scripts are available in the binance-public-data/python repository.

Prerequisites

Clone the repository to access the scripts:

git clone https://github.com/binance/binance-public-data.git
cd binance-public-data/python

Then install the required dependencies from the requirements.txt in that directory:

pip install -r requirements.txt

Or, if you only need the basics:

pip install requests tqdm

Downloading Data

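The commands below use a script named download_data.py, which supports downloading Klines, Trades, and AggTrades for any symbol, interval, and date range. Script names and flags can differ between versions of the repository (it also ships per-dataset scripts such as the download_kline.py mentioned later), so confirm them with --help before running.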

Usage

python download_data.py --help

Key options include:

  • --market-type (spot, um, cm)
  • --data-type (klines, trades, aggTrades)
  • --symbol (e.g., BTCUSDT)
  • --interval (for klines, e.g., 1m, 1h)
  • --start-date and --end-date (YYYY-MM-DD)
  • --frequency (daily, monthly)
  • --out-dir (output directory)
  • --workers (number of files fetched concurrently across all requested symbols, intervals, and dates; higher values speed up bulk downloads but may trigger rate limiting)

Example Commands

Download daily 1h Klines for BTCUSDT from July to August 2025:

python download_data.py \
  --market-type spot \
  --data-type klines \
  --symbol BTCUSDT \
  --interval 1h \
  --start-date 2025-07-01 \
  --end-date 2025-08-31 \
  --frequency daily \
  --out-dir ./ohlcv_1h \
  --workers 2

Download daily raw Trades for ETHUSDT for June 2025:

python download_data.py \
  --market-type spot \
  --data-type trades \
  --symbol ETHUSDT \
  --start-date 2025-06-01 \
  --end-date 2025-06-30 \
  --frequency daily \
  --out-dir ./trades_ethusdt

Example: Smart Downloader and Merger for Binance OHLCV Data

The smart_binance_downloader.py script provides an intelligent solution for downloading, extracting, and merging Binance OHLCV (Kline) data into consolidated CSV files. It's built on modified versions of download_kline.py and utility.py from the Binance Public Data repository with several enhancements for efficiency and reliability.

Key Features

  • Smart Data Acquisition Strategy: Intelligently uses monthly downloads for historical periods and daily downloads for recent days not yet available in monthly archives
  • Incremental Updates: Maintains a single merged CSV file per {symbol}_{interval} combination
  • Delta Downloads: Analyzes existing data to identify and download only missing date ranges
  • Rate Limit Handling: Implements exponential backoff for API rate limits with configurable parameters
  • Auto-extraction: Unzips downloaded files and merges them into consolidated CSVs
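The extraction and merge steps listed above might look roughly like the following. This is a standard-library sketch, not the script's actual code: both function names are hypothetical, and it assumes the first CSV column is the open_time used as the merge key.

```python
import csv
import io
import zipfile


def read_zip_rows(zip_bytes: bytes) -> list[list[str]]:
    """Read every CSV inside a downloaded archive into a list of rows."""
    rows: list[list[str]] = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".csv"):
                continue
            text = archive.read(name).decode("utf-8")
            # Keep data rows only; a header row would not start with digits.
            rows.extend(r for r in csv.reader(io.StringIO(text))
                        if r and r[0].isdigit())
    return rows


def merge_rows(existing: list[list[str]], new: list[list[str]]) -> list[list[str]]:
    """Merge rows keyed by open_time; new rows replace older duplicates."""
    by_open_time = {int(r[0]): r for r in existing}
    by_open_time.update({int(r[0]): r for r in new})
    return [by_open_time[key] for key in sorted(by_open_time)]
```

Keying on open_time keeps the merged file sorted and free of duplicates even when a daily file overlaps a monthly archive.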

Core Function: Rate Limit Handling

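import time

def download_with_backoff(download_func, rate_limit_sleep, max_backoff, *args, **kwargs):
    """Call download_func, retrying with exponential backoff while rate limited."""
    sleep_time = rate_limit_sleep
    while True:
        try:
            # Return the wrapped function's result so callers can use it
            return download_func(*args, **kwargs)
        except Exception as e:
            # Treat HTTP 429 or a "rate limit" message as a retryable error
            if "429" in str(e) or "rate limit" in str(e).lower():
                print(f"Rate limited. Sleeping for {sleep_time} seconds...")
                time.sleep(sleep_time)
                # Double the delay for the next attempt, capped at max_backoff
                sleep_time = min(sleep_time * 2, max_backoff)
            else:
                raise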

This function wraps the original Binance download functions with retry logic that uses exponential backoff when encountering rate limits.
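Because the loop above retries indefinitely, one possible refinement is to cap the number of attempts. The variant below is a sketch and not part of the script: the name retry_with_backoff, the max_retries parameter, and the injectable sleep argument (handy for testing without real delays) are all assumptions introduced for illustration.

```python
import time


def retry_with_backoff(func, initial_sleep=2.0, max_backoff=32.0,
                       max_retries=5, sleep=time.sleep):
    """Call func(), retrying on rate-limit errors a bounded number of times."""
    delay = initial_sleep
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as exc:
            message = str(exc)
            rate_limited = "429" in message or "rate limit" in message.lower()
            # Re-raise unrelated errors, or give up once retries are exhausted.
            if not rate_limited or attempt == max_retries:
                raise
            sleep(delay)
            delay = min(delay * 2, max_backoff)  # exponential growth with a cap
```

With initial_sleep=2, max_backoff=5 and max_retries=3, a call that keeps failing waits 2, 4, and 5 seconds before the final error is raised.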

Command Line Interface

python smart_binance_downloader.py \
  --symbol BTCUSDT \
  --interval 1h \
  --start-date 2023-01-01 \
  --end-date 2023-02-28 \
  --rate-limit-sleep 2 \
  --max-backoff 32 \
  --data-dir ./custom_data_folder

Implementation Notes

The script improves upon Binance's original tools by:

  1. Advanced Path Management: Creates required directories and handles file paths intelligently
  2. Data Deduplication: Tracks timestamps already in merged CSV to avoid redundant downloads
  3. Error Recovery: Graceful handling of network issues and rate limits
  4. File Organization: Creates a clean, organized directory structure for downloaded files
  5. Multi-format Support: Handles both daily and monthly download formats seamlessly

The underlying code leverages modified versions of Binance's download_monthly_klines and download_daily_klines functions, adapting them to work with a more intelligent file management system. This ensures you always maintain a single, up-to-date CSV file for each {symbol}_{interval} pair, simplifying data management for backtesting and analysis.
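The idea of downloading only what is missing can be illustrated with a short date-gap check. This is a simplified sketch under stated assumptions: open_time values are in milliseconds, and a date counts as covered as soon as one candle exists for it, which the real script may handle more strictly. The function name missing_dates is hypothetical.

```python
from datetime import date, datetime, timedelta, timezone


def missing_dates(open_times_ms: list[int], start: date, end: date) -> list[date]:
    """List the dates in [start, end] that have no candle in the merged data."""
    covered = {
        datetime.fromtimestamp(ts / 1000, tz=timezone.utc).date()
        for ts in open_times_ms
    }
    gaps = []
    day = start
    while day <= end:
        if day not in covered:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps
```

The resulting list can then be turned into daily or monthly download requests.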

In the backoff function shown above, --rate-limit-sleep sets the initial delay after an HTTP 429 response. The delay doubles on each consecutive retry until it reaches the --max-backoff cap, which keeps the script within the CDN’s limits.

The complete code for the smart downloader, including enhancements and CLI usage, can be found in the pyVision/ai-invest repository under src/crypto_bot/smart_binance_downloader.py.

Conclusion

Accessing and managing Binance’s historical OHLCV and trade-level data is essential for robust quantitative research, backtesting, and live trading in the crypto markets. By leveraging the public data repository, official scripts, and enhanced tools like smart_binance_downloader.py, users can efficiently acquire, update, and maintain high-quality datasets tailored to their strategy requirements. Always validate your data sources, handle rate limits responsibly, and choose the appropriate data granularity for your use case to ensure reliable and reproducible results.
