Ingesting, processing, and analyzing data from external sources has become a common task in software development, and it is mostly done through APIs. In this guide, we'll walk through building a basic but powerful data ingestion script in Python.
We'll fetch cryptocurrency pair prices from the publicly available Binance API and keep only the pairs we're interested in. To do this, we'll use two popular Python libraries: pandas for data manipulation and requests for HTTP communication.
The Complete Python Script
import pandas as pd
import requests

# Configuration: Define constants for maintainability
BASE_URL = 'https://api.binance.com'
TARGET_PAIRS = ['BTCUSDT', 'ETHBTC', 'ETHUSDT', 'SOLUSDT']


def get_latest_prices():
    """
    Fetches, parses, and filters price data from the Binance API.
    """
    # 1. Construct the full API endpoint URL
    endpoint = f'{BASE_URL}/api/v3/ticker/price'

    # 2. Execute the HTTP GET request
    response = requests.get(endpoint)
    # For production, add error handling: response.raise_for_status()

    # 3. Deserialize the JSON response into a Python object
    data = response.json()

    # 4. Load the raw data into a pandas DataFrame
    price_df = pd.DataFrame(data)

    # 5. Filter the DataFrame using boolean masking
    filtered_df = price_df[price_df['symbol'].isin(TARGET_PAIRS)]

    print(filtered_df)
    return filtered_df


# 6. Define the script's entry point
if __name__ == "__main__":
    get_latest_prices()
Now let's break down the script step by step.
1. Configuration and Structure
BASE_URL = 'https://api.binance.com'
TARGET_PAIRS = ['BTCUSDT','ETHBTC','ETHUSDT','SOLUSDT']
We define BASE_URL and TARGET_PAIRS at the top to separate configuration from logic. This makes the code easier to read and lets you change the URL or the list of pairs without touching the core function.
2. Interfacing with the API Using requests
The requests library handles the HTTP transaction with the Binance API.
endpoint = f'{BASE_URL}/api/v3/ticker/price'
response = requests.get(endpoint)
requests.get() sends an HTTP GET request and packages the server's reply (status code, headers, and data payload) into a single Response object, which we store in the response variable.
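The Response object can be inspected before its payload is trusted. The following is a minimal sketch rather than part of the original script: the helper names parse_price_response and fetch_prices and the 10-second timeout are illustrative assumptions, while status_code, headers, raise_for_status(), and json() are standard parts of the requests Response object.

```python
import requests

BASE_URL = 'https://api.binance.com'


def parse_price_response(response):
    """Validate a requests.Response and return its decoded JSON payload.

    Hypothetical helper for illustration; not part of the original script.
    """
    # Raises requests.HTTPError for 4xx/5xx status codes, does nothing otherwise
    response.raise_for_status()
    print('Status code:', response.status_code)
    print('Content type:', response.headers.get('Content-Type'))
    return response.json()


def fetch_prices(timeout=10):
    """Fetch all ticker prices; the timeout value is an assumed default."""
    endpoint = f'{BASE_URL}/api/v3/ticker/price'
    # Without a timeout, requests can wait indefinitely on a stalled connection
    response = requests.get(endpoint, timeout=timeout)
    return parse_price_response(response)
```

Calling fetch_prices() should return the same list that response.json() produces in the main script, but a failed request surfaces as an exception instead of being parsed as if it were price data.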
3. Deserializing the JSON Payload
The response payload from most modern APIs is formatted as JSON (JavaScript Object Notation).
Although JSON is text-based and human-readable, it is not a native Python object. It has to be parsed ("deserialized") into one, which is what response.json() does.
data = response.json()
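To see what deserialization does without calling the API, the sketch below parses a small JSON string with the standard-library json module, which is roughly what response.json() does with the response body. The sample text is made up to mirror the shape of the endpoint's output; the prices are placeholders, not real market data.

```python
import json

# Made-up sample in the shape the endpoint returns; the prices are not real data
raw_text = '[{"symbol": "BTCUSDT", "price": "100.50"}, {"symbol": "ETHBTC", "price": "0.05"}]'

# json.loads turns JSON text into native Python lists, dicts, and strings
data = json.loads(raw_text)

print(type(data).__name__)              # list
print(type(data[0]).__name__)           # dict
print(data[0]['symbol'])                # BTCUSDT
print(type(data[0]['price']).__name__)  # str
```

Note that in this sample the price is a JSON string, so it stays a Python str after parsing and is not yet a number.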
4. Structuring Data with pandas
The data variable is now a large list of dictionaries, one per trading pair.
We use pandas to transform this raw data into an efficient tabular structure called a DataFrame.
price_df = pd.DataFrame(data)
A DataFrame is an in-memory, two-dimensional table with labeled axes (rows and columns).
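As a small illustration, the sketch below builds a DataFrame from a hand-written list of dictionaries and inspects its shape and column labels. The rows are made-up placeholders. It also assumes the prices arrive as strings, as in this sample, and converts them with pd.to_numeric so they can be used in arithmetic; that conversion is an optional extra step, not part of the original script.

```python
import pandas as pd

# Made-up sample rows; the prices are placeholders, not real market data
data = [
    {'symbol': 'BTCUSDT', 'price': '100.50'},
    {'symbol': 'ETHBTC', 'price': '0.05'},
    {'symbol': 'DOGEUSDT', 'price': '0.10'},
]

# Each dictionary becomes a row; the dictionary keys become column labels
price_df = pd.DataFrame(data)
print(price_df.shape)          # (3, 2)
print(list(price_df.columns))  # ['symbol', 'price']

# Assumption: prices are strings, so convert them before doing arithmetic
price_df['price'] = pd.to_numeric(price_df['price'])
print(price_df['price'].max())  # 100.5
```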
5. Filtering with Boolean Masking
filtered_df = price_df[price_df['symbol'].isin(TARGET_PAIRS)]
price_df['symbol']: First, you select the symbol column of the DataFrame. This returns a pandas Series, which is a single column of the DataFrame.
.isin(TARGET_PAIRS): You then call the .isin() method on this Series. The method performs a fast membership check on every element and returns a Series of Boolean values: True means the symbol in that row is in the TARGET_PAIRS list, and False means it is not.
price_df[...]: Finally, you use this Boolean Series as a mask to index the original price_df.
This is known as boolean masking. pandas evaluates the mask and returns a new DataFrame containing only the rows where the mask value is True.
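The three steps can be made visible by storing the mask in its own variable. This is a minimal sketch on made-up sample rows (placeholder prices, not real data); the variable names mask and rest_df are introduced here for illustration.

```python
import pandas as pd

TARGET_PAIRS = ['BTCUSDT', 'ETHBTC', 'ETHUSDT', 'SOLUSDT']

# Made-up sample rows; the prices are placeholders, not real market data
price_df = pd.DataFrame({
    'symbol': ['BTCUSDT', 'DOGEUSDT', 'ETHBTC'],
    'price': ['100.50', '0.10', '0.05'],
})

# Steps 1 and 2: select the column and test each element for membership
mask = price_df['symbol'].isin(TARGET_PAIRS)
print(mask.tolist())  # [True, False, True]

# Step 3: index the DataFrame with the mask to keep only the True rows
filtered_df = price_df[mask]
print(filtered_df['symbol'].tolist())  # ['BTCUSDT', 'ETHBTC']

# The ~ operator inverts the mask, selecting the rows that were filtered out
rest_df = price_df[~mask]
print(rest_df['symbol'].tolist())  # ['DOGEUSDT']

# Filtering keeps the original row labels (0 and 2 here)
print(filtered_df.index.tolist())  # [0, 2]
```

Because the original row labels are kept, you may want to call reset_index(drop=True) on the result if you need a clean 0-based index afterwards.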
We have now implemented a simple data ingestion pipeline. As next steps, you can add robust error handling and automation, for example by running the script on a schedule with cron to build a historical price log.
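One possible way to build such a log is to append each filtered snapshot to a CSV file with a timestamp. The sketch below is only one option under stated assumptions: the function name append_snapshot, the fetched_at column, and the choice of CSV as the storage format are all hypothetical and not part of the original script.

```python
from datetime import datetime, timezone
from pathlib import Path

import pandas as pd


def append_snapshot(df, path):
    """Append df to a CSV log, stamping every row with the current UTC time.

    Hypothetical helper for illustration; not part of the original script.
    """
    path = Path(path)
    # Work on a copy so the caller's DataFrame is left unchanged
    snapshot = df.copy()
    snapshot['fetched_at'] = datetime.now(timezone.utc).isoformat()
    # Write the header only when the file is first created
    snapshot.to_csv(path, mode='a', header=not path.exists(), index=False)
    return snapshot
```

In the main script, this could be called with the DataFrame returned by get_latest_prices() and a log file path of your choosing, so that each scheduled run adds one batch of rows to the same file.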