Introduction
This article provides a technical guide to accessing and utilizing historical OHLCV (Open, High, Low, Close, Volume) and trade-level data for cryptocurrencies via Binance’s public data infrastructure. This documentation is intended for quantitative analysts, algorithmic traders, and developers seeking robust, reproducible workflows for crypto market data ingestion and analysis.
Binance Public Data Repository
Binance makes its market data publicly accessible via two channels:
- Website download at data.binance.vision
- GitHub repo containing helper scripts and documentation: binance-public-data (github.com)
All data is provided in two granularities:
- Daily files (new files appear each day for the previous day’s data)
- Monthly files (new files appear on the first Monday of each month and contain all days in that month)
Both daily and monthly files are available for all supported intervals (e.g., `1m`, `5m`, `1h`, `1d`, etc.) across Kline, Trade, and AggTrade datasets. This means you can download either daily or monthly archives for any interval, depending on your needs. For efficient data management, it’s recommended to use monthly files for historical periods (as they consolidate all daily data for the month), and supplement with daily files for the most recent days not yet included in the latest monthly archive.
Data Types: Kline, Trade, and AggTrade
1. Kline (Candlestick) Data
Kline files correspond to Binance’s `/api/v3/klines` REST endpoint and provide OHLCV for fixed time intervals. Each record includes:
| Field | Description |
|---|---|
| `open_time` | Start timestamp of the interval |
| `open`, `high`, `low`, `close` | Price metrics |
| `volume` | Base-asset volume during the interval |
| `close_time` | End timestamp of the interval |
| `quote_asset_volume` | Quote-asset volume during the interval |
| `num_trades` | Number of trades in the interval |
| `taker_buy_base_asset_volume` | Base-asset volume bought by takers |
| `taker_buy_quote_asset_volume` | Quote-asset volume bought by takers |
| `ignore` | Unused |
All common intervals (`1m`, `3m`, `5m`, `15m`, `30m`, `1h`, `2h`, `4h`, `6h`, `8h`, `12h`, `1d`, `3d`, `1w`, `1mo`, etc.) are supported. (github.com)
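The column layout above maps directly onto the CSV rows inside the zip archives, which ship without a header row. A minimal loading sketch with pandas, assuming the standard 12-column order and epoch-millisecond timestamps (verify the timestamp unit for your archive vintage):

```python
import pandas as pd

KLINE_COLUMNS = [
    "open_time", "open", "high", "low", "close", "volume",
    "close_time", "quote_asset_volume", "num_trades",
    "taker_buy_base_asset_volume", "taker_buy_quote_asset_volume", "ignore",
]

def load_klines(path: str) -> pd.DataFrame:
    # Archives have no header row; apply the documented column order.
    df = pd.read_csv(path, header=None, names=KLINE_COLUMNS)
    # Assumed unit: epoch milliseconds (check your files before relying on this).
    df["open_time"] = pd.to_datetime(df["open_time"], unit="ms")
    df["close_time"] = pd.to_datetime(df["close_time"], unit="ms")
    return df.drop(columns=["ignore"]).set_index("open_time")
```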
2. Raw Trade Data
Trade files come directly from `/api/v3/historicalTrades`. Each row is a single trade execution, including price, quantity, timestamp, and maker/taker flags. Use raw trades when you need every individual execution event for tick-by-tick backtesting and slippage modeling. (github.com, quantifiedstrategies.com)
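As a toy illustration of slippage modeling (not part of Binance's tooling), the average fill price of a market order can be estimated by walking the trade tape until the order size is consumed:

```python
def estimate_fill_price(trades: list[tuple[float, float]], qty: float) -> float:
    """trades: chronological (price, quantity) tuples from the raw trade file.
    Returns the volume-weighted average fill price for a market order of size qty."""
    filled = 0.0
    cost = 0.0
    for price, available in trades:
        take = min(available, qty - filled)
        filled += take
        cost += take * price
        if filled >= qty:
            return cost / qty
    raise ValueError("not enough traded volume to fill the order")

# Filling 2 units against 1 @ 100 and 2 @ 101 averages to 100.5.
print(estimate_fill_price([(100.0, 1.0), (101.0, 2.0)], 2.0))  # → 100.5
```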
3. Aggregate Trade (aggTrade) Data
AggTrades are derived from `/api/v3/aggTrades`. They bundle together consecutive trades at the same price into one record, reducing data volume while preserving essential trade information. (developers.binance.com)
| Field | Description |
|---|---|
| `aggregateTradeId` | Internal ID for the aggregated record |
| `price` | Price at which these trades occurred |
| `quantity` | Total quantity across bundled trades |
| `firstTradeId`, `lastTradeId` | Range of original trade IDs |
| `timestamp` | Timestamp when the last trade in the bundle occurred |
| `isBuyerMaker` | Whether the buyer of the last trade was maker |
| `isBestMatch` | Whether the last trade matched at best price |
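Because each aggTrade row carries a price, quantity, and timestamp, candles of any interval can be rebuilt from them, which makes a useful cross-check against the published klines. A sketch with pandas, assuming the 8-column aggTrade CSV layout above (no header row, millisecond timestamps):

```python
import pandas as pd

AGG_COLUMNS = [
    "aggregateTradeId", "price", "quantity", "firstTradeId",
    "lastTradeId", "timestamp", "isBuyerMaker", "isBestMatch",
]

def aggtrades_to_ohlcv(path: str, rule: str = "1h") -> pd.DataFrame:
    """Rebuild OHLCV bars of the given pandas resample rule from an aggTrade CSV."""
    df = pd.read_csv(path, header=None, names=AGG_COLUMNS)
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms")
    df = df.set_index("timestamp").sort_index()
    ohlc = df["price"].resample(rule).ohlc()
    ohlc["volume"] = df["quantity"].resample(rule).sum()
    # Drop empty bars where no trades occurred.
    return ohlc.dropna(subset=["open"])
```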
Which Data to Use for Backtesting vs. Live Trading
Choosing the right dataset depends on your system requirements:
- Backtesting (Historical Simulation)
  - Interval-based strategies: Kline data is sufficient for time-frame strategies (e.g., hourly breakouts) because it provides OHLCV in fixed windows and is lightweight to process. (logicinv.com)
  - Tick-level accuracy: Use raw trade data if you need to simulate order execution, slippage, and order book impact at the individual trade level. This ensures your backtest closely mirrors real-world fills. (quantifiedstrategies.com)
- Live Trading (Real-time Execution)
  - Efficiency: Subscribe to the `aggTrade` WebSocket stream (`<symbol>@aggTrade`) for a balance between granularity and bandwidth. It updates every 100 ms and bundles trades by price. (developers.binance.com)
  - Full detail: If your strategy requires every individual fill (e.g., ultra–high-frequency strategies), connect to the raw trade stream (`<symbol>@trade`). Be prepared for higher message rates and processing overhead. (developers.binance.com)
Note: Aggregated trades can sometimes exhibit small discrepancies in volumes or trade counts compared to raw trades or klines; always validate critical metrics against a secondary source. (dev.binance.vision)
Downloading Data
Data files are hosted on `data.binance.vision`. The general URL pattern for daily spot Klines is:

```
https://data.binance.vision/data/spot/daily/klines/{symbol}/{interval}/{symbol}-{interval}-{YYYY-MM-DD}.zip
```

For example, to download BTCUSDT 1-hour data for July 13, 2025:

```shell
curl -s "https://data.binance.vision/data/spot/daily/klines/BTCUSDT/1h/BTCUSDT-1h-2025-07-13.zip" -o BTCUSDT-1h-2025-07-13.zip
```
Monthly data uses a similar pattern under `data/spot/monthly/klines/...`. (github.com)
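Putting the URL pattern together, fetching and extracting an archive needs only the standard library. The helper names below are hypothetical; the URL builder mirrors the pattern above exactly:

```python
import io
import urllib.request
import zipfile

def daily_kline_url(symbol: str, interval: str, day: str) -> str:
    """Build the data.binance.vision URL for one daily kline archive (day: YYYY-MM-DD)."""
    return (f"https://data.binance.vision/data/spot/daily/klines/"
            f"{symbol}/{interval}/{symbol}-{interval}-{day}.zip")

def fetch_and_extract(url: str, out_dir: str = ".") -> list[str]:
    """Download a zip archive and extract its CSV(s) into out_dir.
    Raises urllib.error.HTTPError (404) if the file is not published yet."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        payload = resp.read()
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        zf.extractall(out_dir)
        return zf.namelist()
```

Usage: `fetch_and_extract(daily_kline_url("BTCUSDT", "1h", "2025-07-13"), "./ohlcv_1h")` reproduces the `curl` command above and additionally unpacks the CSV.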
Example: Fetching and Parsing with Python
Binance provides an official Python utility for downloading and parsing historical spot data (Klines, Trades, AggTrades) from their public dataset. The scripts are available in the binance-public-data/python repository.
Prerequisites
Install the required dependencies:

```shell
pip install -r requirements.txt
```

Or, if you only need the basics:

```shell
pip install requests tqdm
```

Clone the repository to access the scripts:

```shell
git clone https://github.com/binance/binance-public-data.git
cd binance-public-data/python
```
Downloading Data
The main script is `download_data.py`, which supports downloading Klines, Trades, and AggTrades for any symbol, interval, and date range.
Usage
```shell
python download_data.py --help
```
Key options include:

- `--market-type` (spot, um, cm)
- `--data-type` (klines, trades, aggTrades)
- `--symbol` (e.g., BTCUSDT)
- `--interval` (for klines, e.g., 1m, 1h)
- `--start-date` and `--end-date` (YYYY-MM-DD)
- `--frequency` (daily, monthly)
- `--out-dir` (output directory)
- `--workers` (number of files fetched concurrently, whether across symbols, intervals, or dates; higher values speed up bulk downloads but may trigger rate limiting)
Example Commands
Download daily 1h Klines for BTCUSDT from July to August 2025:
```shell
python download_data.py \
  --market-type spot \
  --data-type klines \
  --symbol BTCUSDT \
  --interval 1h \
  --start-date 2025-07-01 \
  --end-date 2025-08-31 \
  --frequency daily \
  --out-dir ./ohlcv_1h \
  --workers 2
```
Download daily raw Trades for ETHUSDT for June 2025:
```shell
python download_data.py \
  --market-type spot \
  --data-type trades \
  --symbol ETHUSDT \
  --start-date 2025-06-01 \
  --end-date 2025-06-30 \
  --frequency daily \
  --out-dir ./trades_ethusdt
```
Example: Smart Downloader and Merger for Binance OHLCV Data
The `smart_binance_downloader.py` script provides an intelligent solution for downloading, extracting, and merging Binance OHLCV (Kline) data into consolidated CSV files. It's built on modified versions of `download_kline.py` and `utility.py` from the Binance Public Data repository, with several enhancements for efficiency and reliability.
Key Features
- Smart Data Acquisition Strategy: Intelligently uses monthly downloads for historical periods and daily downloads for recent days not yet available in monthly archives
- Incremental Updates: Maintains a single merged CSV file per `{symbol}_{interval}` combination
- Delta Downloads: Analyzes existing data to identify and download only missing date ranges
- Rate Limit Handling: Implements exponential backoff for API rate limits with configurable parameters
- Auto-extraction: Unzips downloaded files and merges them into consolidated CSVs
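The incremental merge step can be sketched with the standard library alone. This is a hypothetical helper (not the script's actual code) that appends rows from extracted CSVs into the consolidated file, deduplicating on `open_time` (the first column); sorting the merged file by `open_time` afterwards is left to the caller:

```python
import csv
import os

def merge_kline_csvs(sources: list[str], merged_path: str) -> int:
    """Append rows from `sources` into `merged_path`, skipping any row whose
    open_time (first column) is already present. Returns the number of rows added."""
    seen = set()
    if os.path.exists(merged_path):
        with open(merged_path, newline="") as f:
            seen = {row[0] for row in csv.reader(f) if row}
    added = 0
    with open(merged_path, "a", newline="") as out:
        writer = csv.writer(out)
        for src in sources:
            with open(src, newline="") as f:
                for row in csv.reader(f):
                    if row and row[0] not in seen:
                        writer.writerow(row)
                        seen.add(row[0])
                        added += 1
    return added
```

Re-running the merge on the same sources is a no-op, which is what makes incremental daily updates safe to repeat.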
Core Function: Rate Limit Handling
```python
import time

def download_with_backoff(download_func, rate_limit_sleep, max_backoff, *args, **kwargs):
    """Retry download_func with exponential backoff on rate-limit errors."""
    sleep_time = rate_limit_sleep
    while True:
        try:
            download_func(*args, **kwargs)
            return
        except Exception as e:
            # Check for HTTP 429 or rate limit in error message
            if "429" in str(e) or "rate limit" in str(e).lower():
                print(f"Rate limited. Sleeping for {sleep_time} seconds...")
                time.sleep(sleep_time)
                sleep_time = min(sleep_time * 2, max_backoff)
            else:
                raise
```
This function wraps the original Binance download functions with retry logic that uses exponential backoff when encountering rate limits.
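To see the retry logic in action without touching the network, you can wrap a stub that raises a simulated HTTP 429 once before succeeding. The stub is purely illustrative; the wrapper is repeated here so the snippet runs standalone:

```python
import time

def download_with_backoff(download_func, rate_limit_sleep, max_backoff, *args, **kwargs):
    # Same logic as above, minus the print, copied so this snippet is self-contained.
    sleep_time = rate_limit_sleep
    while True:
        try:
            download_func(*args, **kwargs)
            return
        except Exception as e:
            if "429" in str(e) or "rate limit" in str(e).lower():
                time.sleep(sleep_time)
                sleep_time = min(sleep_time * 2, max_backoff)
            else:
                raise

calls = {"n": 0}

def flaky_download(symbol):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("HTTP 429: Too Many Requests")  # simulated rate limit

# First attempt fails with a 429, backoff kicks in, second attempt succeeds.
download_with_backoff(flaky_download, 0.01, 1, "BTCUSDT")
```

Any non-rate-limit exception (a 404 for an unpublished file, say) is re-raised immediately rather than retried, so genuine errors surface quickly.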
Command Line Interface
```shell
python smart_binance_downloader.py \
  --symbol BTCUSDT \
  --interval 1h \
  --start-date 2023-01-01 \
  --end-date 2023-02-28 \
  --rate-limit-sleep 2 \
  --max-backoff 32 \
  --data-dir ./custom_data_folder
```
Implementation Notes
The script improves upon Binance's original tools by:
- Advanced Path Management: Creates required directories and handles file paths intelligently
- Data Deduplication: Tracks timestamps already in merged CSV to avoid redundant downloads
- Error Recovery: Graceful handling of network issues and rate limits
- File Organization: Creates a clean, organized directory structure for downloaded files
- Multi-format Support: Handles both daily and monthly download formats seamlessly
The underlying code leverages modified versions of Binance's `download_monthly_klines` and `download_daily_klines` functions, adapting them to work with a more intelligent file management system. This ensures you always maintain a single, up-to-date CSV file for each `{symbol}_{interval}` pair, simplifying data management for backtesting and analysis.
The script uses an initial sleep (`--rate-limit-sleep`) between requests and an exponential backoff capped at `--max-backoff` when encountering HTTP 429, to respect the CDN’s limits.
The complete code for the smart downloader, including enhancements and CLI usage, can be found in the pyVision/ai-invest repository under `src/crypto_bot/smart_binance_downloader.py`.
Conclusion
Accessing and managing Binance’s historical OHLCV and trade-level data is essential for robust quantitative research, backtesting, and live trading in the crypto markets. By leveraging the public data repository, official scripts, and enhanced tools like `smart_binance_downloader.py`, users can efficiently acquire, update, and maintain high-quality datasets tailored to their strategy requirements. Always validate your data sources, handle rate limits responsibly, and choose the appropriate data granularity for your use case to ensure reliable and reproducible results.