Introduction
This article provides a technical guide to accessing and utilizing historical OHLCV (Open, High, Low, Close, Volume) and trade-level data for cryptocurrencies via Binance’s public data infrastructure. This documentation is intended for quantitative analysts, algorithmic traders, and developers seeking robust, reproducible workflows for crypto market data ingestion and analysis.
Binance Public Data Repository
Binance makes its market data publicly accessible via two channels:
- Website download at data.binance.vision
- GitHub repo containing helper scripts and documentation: binance-public-data (github.com)
All data is provided in two granularities:
- Daily files (new files appear each day for the previous day’s data)
- Monthly files (new files appear on the first Monday of each month and contain all days in that month)
Both daily and monthly files are available for all supported intervals (e.g., `1m`, `5m`, `1h`, `1d`, etc.) across Kline, Trade, and AggTrade datasets. This means you can download either daily or monthly archives for any interval, depending on your needs. For efficient data management, it’s recommended to use monthly files for historical periods (as they consolidate all daily data for the month), and supplement with daily files for the most recent days not yet included in the latest monthly archive.
Data Types: Kline, Trade, and AggTrade
1. Kline (Candlestick) Data
Kline files correspond to Binance’s `/api/v3/klines` REST endpoint and provide OHLCV for fixed time intervals. Each record includes:
| Field | Description |
|---|---|
| `open_time` | Start timestamp of the interval |
| `open`, `high`, `low`, `close` | Price metrics |
| `volume` | Base-asset volume during the interval |
| `close_time` | End timestamp of the interval |
| `quote_asset_volume` | Quote-asset volume during the interval |
| `num_trades` | Number of trades in the interval |
| `taker_buy_base_asset_volume` | Base-asset volume bought by takers |
| `taker_buy_quote_asset_volume` | Quote-asset volume bought by takers |
| `ignore` | Unused |
All common intervals (`1m`, `3m`, `5m`, `15m`, `30m`, `1h`, `2h`, `4h`, `6h`, `8h`, `12h`, `1d`, `3d`, `1w`, `1mo`, etc.) are supported. (github.com)
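The column layout above maps directly onto the CSV rows inside the zip archives, which ship without a header row. A minimal loading sketch with pandas, assuming the standard 12-column order and epoch-millisecond timestamps (verify the timestamp unit for your archive vintage):

```python
import pandas as pd

KLINE_COLUMNS = [
    "open_time", "open", "high", "low", "close", "volume",
    "close_time", "quote_asset_volume", "num_trades",
    "taker_buy_base_asset_volume", "taker_buy_quote_asset_volume", "ignore",
]

def load_klines(path: str) -> pd.DataFrame:
    # Archives have no header row; apply the documented column order.
    df = pd.read_csv(path, header=None, names=KLINE_COLUMNS)
    # Assumed unit: epoch milliseconds (check your files before relying on this).
    df["open_time"] = pd.to_datetime(df["open_time"], unit="ms")
    df["close_time"] = pd.to_datetime(df["close_time"], unit="ms")
    return df.drop(columns=["ignore"]).set_index("open_time")
```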
2. Raw Trade Data
Trade files come directly from `/api/v3/historicalTrades`. Each row is a single trade execution, including price, quantity, timestamp, and maker/taker flags. Use raw trades when you need every individual execution event for tick-by-tick backtesting and slippage modeling. (github.com, quantifiedstrategies.com)
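As a toy illustration of slippage modeling (not part of Binance's tooling), the average fill price of a market order can be estimated by walking the trade tape until the order size is consumed:

```python
def estimate_fill_price(trades: list[tuple[float, float]], qty: float) -> float:
    """trades: chronological (price, quantity) tuples from the raw trade file.
    Returns the volume-weighted average fill price for a market order of size qty."""
    filled = 0.0
    cost = 0.0
    for price, available in trades:
        take = min(available, qty - filled)
        filled += take
        cost += take * price
        if filled >= qty:
            return cost / qty
    raise ValueError("not enough traded volume to fill the order")

# Filling 2 units against 1 @ 100 and 2 @ 101 averages to 100.5.
print(estimate_fill_price([(100.0, 1.0), (101.0, 2.0)], 2.0))  # → 100.5
```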
3. Aggregate Trade (aggTrade) Data
AggTrades are derived from `/api/v3/aggTrades`. They bundle together consecutive trades at the same price into one record, reducing data volume while preserving essential trade information. (developers.binance.com)
| Field | Description |
|---|---|
| `aggregateTradeId` | Internal ID for the aggregated record |
| `price` | Price at which these trades occurred |
| `quantity` | Total quantity across bundled trades |
| `firstTradeId`, `lastTradeId` | Range of original trade IDs |
| `timestamp` | Timestamp when the last trade in the bundle occurred |
| `isBuyerMaker` | Whether the buyer of the last trade was maker |
| `isBestMatch` | Whether the last trade matched at best price |
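Because each aggTrade row carries a price, quantity, and timestamp, candles of any interval can be rebuilt from them, which makes a useful cross-check against the published klines. A sketch with pandas, assuming the 8-column aggTrade CSV layout above (no header row, millisecond timestamps):

```python
import pandas as pd

AGG_COLUMNS = [
    "aggregateTradeId", "price", "quantity", "firstTradeId",
    "lastTradeId", "timestamp", "isBuyerMaker", "isBestMatch",
]

def aggtrades_to_ohlcv(path: str, rule: str = "1h") -> pd.DataFrame:
    """Rebuild OHLCV bars of the given pandas resample rule from an aggTrade CSV."""
    df = pd.read_csv(path, header=None, names=AGG_COLUMNS)
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms")
    df = df.set_index("timestamp").sort_index()
    ohlc = df["price"].resample(rule).ohlc()
    ohlc["volume"] = df["quantity"].resample(rule).sum()
    # Drop empty bars where no trades occurred.
    return ohlc.dropna(subset=["open"])
```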
Which Data to Use for Backtesting vs. Live Trading
Choosing the right dataset depends on your system requirements:
- Backtesting (Historical Simulation)
  - Interval-based strategies: Kline data is sufficient for time-frame strategies (e.g., hourly breakouts) because it provides OHLCV in fixed windows and is lightweight to process. (logicinv.com)
  - Tick-level accuracy: Use raw trade data if you need to simulate order execution, slippage, and order book impact at the individual trade level. This ensures your backtest closely mirrors real-world fills. (quantifiedstrategies.com)
- Live Trading (Real-time Execution)
  - Efficiency: Subscribe to the `aggTrade` WebSocket stream (`<symbol>@aggTrade`) for a balance between granularity and bandwidth. It updates every 100 ms and bundles trades by price. (developers.binance.com)
  - Full detail: If your strategy requires every individual fill (e.g., ultra–high-frequency strategies), connect to the raw trade stream (`<symbol>@trade`). Be prepared for higher message rates and processing overhead. (developers.binance.com)
Note: Aggregated trades can sometimes exhibit small discrepancies in volumes or trade counts compared to raw trades or klines; always validate critical metrics against a secondary source. (dev.binance.vision)
Downloading Data
Data files are hosted on `data.binance.vision`. The general URL pattern for daily spot Klines is:

```
https://data.binance.vision/data/spot/daily/klines/{symbol}/{interval}/{symbol}-{interval}-{YYYY-MM-DD}.zip
```

For example, to download BTCUSDT 1-hour data for July 13, 2025:

```shell
curl -s "https://data.binance.vision/data/spot/daily/klines/BTCUSDT/1h/BTCUSDT-1h-2025-07-13.zip" -o BTCUSDT-1h-2025-07-13.zip
```
Monthly data uses a similar pattern under `data/spot/monthly/klines/...`. (github.com)
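Putting the URL pattern together, fetching and extracting an archive needs only the standard library. The helper names below are hypothetical; the URL builder mirrors the pattern above exactly:

```python
import io
import urllib.request
import zipfile

def daily_kline_url(symbol: str, interval: str, day: str) -> str:
    """Build the data.binance.vision URL for one daily kline archive (day: YYYY-MM-DD)."""
    return (f"https://data.binance.vision/data/spot/daily/klines/"
            f"{symbol}/{interval}/{symbol}-{interval}-{day}.zip")

def fetch_and_extract(url: str, out_dir: str = ".") -> list[str]:
    """Download a zip archive and extract its CSV(s) into out_dir.
    Raises urllib.error.HTTPError (404) if the file is not published yet."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        payload = resp.read()
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        zf.extractall(out_dir)
        return zf.namelist()
```

Usage: `fetch_and_extract(daily_kline_url("BTCUSDT", "1h", "2025-07-13"), "./ohlcv_1h")` reproduces the `curl` command above and additionally unpacks the CSV.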
Example: Fetching and Parsing with Python
Binance provides an official Python utility for downloading and parsing historical spot data (Klines, Trades, AggTrades) from their public dataset. The scripts are available in the binance-public-data/python repository.
Prerequisites
Install the required dependencies:

```shell
pip install -r requirements.txt
```

Or, if you only need the basics:

```shell
pip install requests tqdm
```

Clone the repository to access the scripts:

```shell
git clone https://github.com/binance/binance-public-data.git
cd binance-public-data/python
```
Downloading Data
The main script is `download_data.py`, which supports downloading Klines, Trades, and AggTrades for any symbol, interval, and date range.
Usage
```shell
python download_data.py --help
```
Key options include:

- `--market-type` (spot, um, cm)
- `--data-type` (klines, trades, aggTrades)
- `--symbol` (e.g., BTCUSDT)
- `--interval` (for klines, e.g., 1m, 1h)
- `--start-date` and `--end-date` (YYYY-MM-DD)
- `--frequency` (daily, monthly)
- `--out-dir` (output directory)
- `--workers` (number of files fetched concurrently, whether across symbols, intervals, or dates; higher values speed up bulk downloads but may trigger rate limiting)
Example Commands
Download daily 1h Klines for BTCUSDT from July to August 2025:
```shell
python download_data.py \
  --market-type spot \
  --data-type klines \
  --symbol BTCUSDT \
  --interval 1h \
  --start-date 2025-07-01 \
  --end-date 2025-08-31 \
  --frequency daily \
  --out-dir ./ohlcv_1h \
  --workers 2
```
Download daily raw Trades for ETHUSDT for June 2025:
```shell
python download_data.py \
  --market-type spot \
  --data-type trades \
  --symbol ETHUSDT \
  --start-date 2025-06-01 \
  --end-date 2025-06-30 \
  --frequency daily \
  --out-dir ./trades_ethusdt
```
Example: Smart Downloader and Merger for Binance OHLCV Data
The `smart_binance_downloader.py` script provides an intelligent solution for downloading, extracting, and merging Binance OHLCV (Kline) data into consolidated CSV files. It's built on modified versions of `download_kline.py` and `utility.py` from the Binance Public Data repository, with several enhancements for efficiency and reliability.
Key Features
- Smart Data Acquisition Strategy: Intelligently uses monthly downloads for historical periods and daily downloads for recent days not yet available in monthly archives
- Incremental Updates: Maintains a single merged CSV file per `{symbol}_{interval}` combination
- Delta Downloads: Analyzes existing data to identify and download only missing date ranges
- Rate Limit Handling: Implements exponential backoff for API rate limits with configurable parameters
- Auto-extraction: Unzips downloaded files and merges them into consolidated CSVs
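The incremental merge step can be sketched with the standard library alone. This is a hypothetical helper (not the script's actual code) that appends rows from extracted CSVs into the consolidated file, deduplicating on `open_time` (the first column); sorting the merged file by `open_time` afterwards is left to the caller:

```python
import csv
import os

def merge_kline_csvs(sources: list[str], merged_path: str) -> int:
    """Append rows from `sources` into `merged_path`, skipping any row whose
    open_time (first column) is already present. Returns the number of rows added."""
    seen = set()
    if os.path.exists(merged_path):
        with open(merged_path, newline="") as f:
            seen = {row[0] for row in csv.reader(f) if row}
    added = 0
    with open(merged_path, "a", newline="") as out:
        writer = csv.writer(out)
        for src in sources:
            with open(src, newline="") as f:
                for row in csv.reader(f):
                    if row and row[0] not in seen:
                        writer.writerow(row)
                        seen.add(row[0])
                        added += 1
    return added
```

Re-running the merge on the same sources is a no-op, which is what makes incremental daily updates safe to repeat.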
Core Function: Rate Limit Handling
```python
import time

def download_with_backoff(download_func, rate_limit_sleep, max_backoff, *args, **kwargs):
    """Retry download_func with exponential backoff on rate-limit errors."""
    sleep_time = rate_limit_sleep
    while True:
        try:
            download_func(*args, **kwargs)
            return
        except Exception as e:
            # Check for HTTP 429 or rate limit in error message
            if "429" in str(e) or "rate limit" in str(e).lower():
                print(f"Rate limited. Sleeping for {sleep_time} seconds...")
                time.sleep(sleep_time)
                sleep_time = min(sleep_time * 2, max_backoff)
            else:
                raise
```
This function wraps the original Binance download functions with retry logic that uses exponential backoff when encountering rate limits.
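To see the retry logic in action without touching the network, you can wrap a stub that raises a simulated HTTP 429 once before succeeding. The stub is purely illustrative; the wrapper is repeated here so the snippet runs standalone:

```python
import time

def download_with_backoff(download_func, rate_limit_sleep, max_backoff, *args, **kwargs):
    # Same logic as above, minus the print, copied so this snippet is self-contained.
    sleep_time = rate_limit_sleep
    while True:
        try:
            download_func(*args, **kwargs)
            return
        except Exception as e:
            if "429" in str(e) or "rate limit" in str(e).lower():
                time.sleep(sleep_time)
                sleep_time = min(sleep_time * 2, max_backoff)
            else:
                raise

calls = {"n": 0}

def flaky_download(symbol):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("HTTP 429: Too Many Requests")  # simulated rate limit

# First attempt fails with a 429, backoff kicks in, second attempt succeeds.
download_with_backoff(flaky_download, 0.01, 1, "BTCUSDT")
```

Any non-rate-limit exception (a 404 for an unpublished file, say) is re-raised immediately rather than retried, so genuine errors surface quickly.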
Command Line Interface
```shell
python smart_binance_downloader.py \
  --symbol BTCUSDT \
  --interval 1h \
  --start-date 2023-01-01 \
  --end-date 2023-02-28 \
  --rate-limit-sleep 2 \
  --max-backoff 32 \
  --data-dir ./custom_data_folder
```
Implementation Notes
The script improves upon Binance's original tools by:
- Advanced Path Management: Creates required directories and handles file paths intelligently
- Data Deduplication: Tracks timestamps already in merged CSV to avoid redundant downloads
- Error Recovery: Graceful handling of network issues and rate limits
- File Organization: Creates a clean, organized directory structure for downloaded files
- Multi-format Support: Handles both daily and monthly download formats seamlessly
The underlying code leverages modified versions of Binance's `download_monthly_klines` and `download_daily_klines` functions, adapting them to work with a more intelligent file management system. This ensures you always maintain a single, up-to-date CSV file for each `{symbol}_{interval}` pair, simplifying data management for backtesting and analysis.
The script uses an initial sleep (`--rate-limit-sleep`) between requests and an exponential backoff capped at `--max-backoff` when encountering HTTP 429, to respect the CDN’s limits.
The complete code for the smart downloader, including enhancements and CLI usage, can be found in the pyVision/ai-invest repository under `src/crypto_bot/smart_binance_downloader.py`.
Conclusion
Accessing and managing Binance’s historical OHLCV and trade-level data is essential for robust quantitative research, backtesting, and live trading in the crypto markets. By leveraging the public data repository, official scripts, and enhanced tools like `smart_binance_downloader.py`, users can efficiently acquire, update, and maintain high-quality datasets tailored to their strategy requirements. Always validate your data sources, handle rate limits responsibly, and choose the appropriate data granularity for your use case to ensure reliable and reproducible results.