DEV Community

Austin Oketch

A Beginner's Guide to Python, APIs, and Pandas

Ingesting, processing, and analyzing data from external sources has become common in software development. This is mainly achieved through APIs. In this guide, we'll walk through building a basic but powerful data ingestion script in Python.

We'll obtain cryptocurrency pair prices from the publicly available Binance API and keep only the pairs we're interested in. To do this, we'll use two popular Python libraries: requests for handling HTTP communication and pandas for data manipulation.

The Complete Python Script

import pandas as pd
import requests

# Configuration: Define constants for maintainability
BASE_URL = 'https://api.binance.com'
TARGET_PAIRS = ['BTCUSDT', 'ETHBTC', 'ETHUSDT', 'SOLUSDT']

def get_latest_prices():
    """
    Fetches, parses, and filters price data from the Binance API.
    """
    # 1. Construct the full API endpoint URL
    endpoint = f'{BASE_URL}/api/v3/ticker/price'

    # 2. Execute the HTTP GET request
    response = requests.get(endpoint)
    # For production, add error handling: response.raise_for_status()

    # 3. Deserialize the JSON response into a Python object
    data = response.json()

    # 4. Load the raw data into a pandas DataFrame
    price_df = pd.DataFrame(data)

    # 5. Filter the DataFrame using boolean masking
    filtered_df = price_df[price_df['symbol'].isin(TARGET_PAIRS)]

    print(filtered_df)
    return filtered_df

# 6. Define the script's entry point
if __name__ == "__main__":
    get_latest_prices()

Now let's break down the script step by step.

1. Configuration and Structure

BASE_URL = 'https://api.binance.com'
TARGET_PAIRS = ['BTCUSDT', 'ETHBTC', 'ETHUSDT', 'SOLUSDT']

We define BASE_URL and TARGET_PAIRS at the top to separate configuration from logic, which makes the code easier to read and update without touching the core function.

2. Interfacing with the API Using requests

The requests library handles the HTTP transaction with the Binance API.

endpoint = f'{BASE_URL}/api/v3/ticker/price'
response = requests.get(endpoint)

requests.get() sends the HTTP GET request and packages the server's reply (status code, headers, and data payload) into a single Response object, which we store in the response variable.
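To make the Response object more concrete, here is a minimal sketch of how the request could be wrapped with a timeout and a status check. The helper name fetch_json, the 10-second timeout, and the http_get parameter (which exists only so the function can be exercised without a network connection) are illustrative assumptions and are not part of the original script.

```python
import requests


def fetch_json(url, timeout=10, http_get=requests.get):
    """Send a GET request and return the decoded JSON body.

    fetch_json, the timeout value, and the http_get hook are hypothetical
    names chosen for this sketch; the original script calls requests.get()
    directly.
    """
    # A timeout stops the script from hanging forever on a stalled connection.
    response = http_get(url, timeout=timeout)

    # Raises requests.HTTPError if the status code is 4xx or 5xx.
    response.raise_for_status()

    # status_code and headers are attributes of the Response object.
    print(response.status_code, response.headers.get("Content-Type"))

    return response.json()
```

With a helper like this, calling fetch_json(f'{BASE_URL}/api/v3/ticker/price') could stand in for steps 2 and 3 of the script.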

3. Deserializing the JSON Payload

The response payload from most modern APIs is formatted as JSON (JavaScript Object Notation).

Even though JSON is text-based and human-readable, it is not a native Python object. It therefore needs to be parsed, or "deserialized", which response.json() does for us.

data = response.json()
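As a small illustration of what deserialization does, the sketch below parses a hand-written JSON string with the standard library's json module, which is broadly what response.json() does with the response body. The sample payload and its price values are made up for the example. It assumes the endpoint returns a list of objects with symbol and price keys, with prices encoded as strings.

```python
import json

# Hypothetical payload: the shape is assumed and the prices are invented.
raw_text = '[{"symbol": "BTCUSDT", "price": "50000.00"}, {"symbol": "ETHBTC", "price": "0.05"}]'

# Before parsing, the payload is just a str.
print(type(raw_text).__name__)

# json.loads turns the JSON array into a list and each JSON object into a dict.
data = json.loads(raw_text)
print(type(data).__name__, type(data[0]).__name__)

# Values keep their JSON types, so a quoted price is still a string here.
first_price = data[0]["price"]
print(first_price, type(first_price).__name__)
```

If prices do arrive as strings, they need converting to numbers before any arithmetic is done on them.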

4. Structuring Data with pandas

The data variable is now a large list of dictionaries, one per trading pair.
We use pandas to transform this raw data into an optimized, tabular structure called a DataFrame.

price_df = pd.DataFrame(data)

A DataFrame is an in-memory, two-dimensional table with labeled axes (rows and columns).
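To see how a list of dictionaries maps onto rows and columns, here is a short sketch on a hand-made sample. The three rows and their prices are invented, and the conversion with pd.to_numeric is an optional extra step that the original script does not perform.

```python
import pandas as pd

# Hypothetical sample in the same list-of-dictionaries shape as `data`.
sample = [
    {"symbol": "BTCUSDT", "price": "50000.00"},
    {"symbol": "ETHBTC", "price": "0.05"},
    {"symbol": "DOGEUSDT", "price": "0.10"},
]

# Each dictionary becomes a row; each key becomes a column label.
sample_df = pd.DataFrame(sample)
print(sample_df.shape)          # (rows, columns)
print(list(sample_df.columns))  # column labels
print(list(sample_df.index))    # row labels (a default integer range)

# String prices can be converted to floats before doing arithmetic on them.
sample_df["price"] = pd.to_numeric(sample_df["price"])
print(sample_df.dtypes)
```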

5. Filtering with Boolean Masking

filtered_df = price_df[price_df['symbol'].isin(TARGET_PAIRS)]
This one line performs three steps:

  • price_df['symbol']: First, you select the symbol column of the DataFrame. This returns a pandas Series, which is a single column of the DataFrame.

  • .isin(TARGET_PAIRS): You then call the .isin() method on this Series. The method performs a fast membership check on each element and returns a Series of Boolean values: True means the symbol in that row exists in the TARGET_PAIRS list, and False means it does not.

  • price_df[...]: Finally, you use this Boolean Series as a mask to index the original price_df.

This is known as boolean masking. pandas evaluates the mask and returns a new DataFrame containing only the rows where the mask value is True.
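The three steps can be written out separately to make the intermediate mask visible. This is only a sketch on a tiny hand-made DataFrame: the rows and prices are invented, while TARGET_PAIRS matches the list used in the script.

```python
import pandas as pd

# Hypothetical rows; TARGET_PAIRS matches the list used in the script.
TARGET_PAIRS = ["BTCUSDT", "ETHBTC", "ETHUSDT", "SOLUSDT"]
price_df = pd.DataFrame(
    {
        "symbol": ["BTCUSDT", "DOGEUSDT", "ETHUSDT"],
        "price": ["50000.00", "0.10", "3000.00"],
    }
)

# Step 1: select one column, which gives a Series.
symbols = price_df["symbol"]

# Step 2: build the Boolean mask, one True/False per row.
mask = symbols.isin(TARGET_PAIRS)
print(mask.tolist())

# Step 3: index the DataFrame with the mask to keep only the True rows.
filtered_df = price_df[mask]
print(filtered_df)
```

Note that the filtered rows keep their original index labels. Calling reset_index(drop=True) on the result renumbers them if that is preferred.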

We have now implemented a basic data ingestion pipeline. From here, you can add robust error handling and automation, for example using a scheduler like cron to build a historical price log.
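As one possible starting point for the historical price log, the sketch below appends each fetched DataFrame to a CSV file with a timestamp. The function name append_snapshot, the fetched_at column, and the choice of CSV as the storage format are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path

import pandas as pd


def append_snapshot(df, log_path):
    """Append one timestamped snapshot of prices to a CSV log.

    append_snapshot and the fetched_at column are hypothetical names
    used only for this sketch.
    """
    log_path = Path(log_path)

    # Work on a copy so the caller's DataFrame is left unchanged.
    snapshot = df.copy()
    snapshot["fetched_at"] = datetime.now(timezone.utc).isoformat()

    # Write the header only when the file is first created.
    snapshot.to_csv(log_path, mode="a", header=not log_path.exists(), index=False)
    return snapshot
```

Calling append_snapshot(get_latest_prices(), 'prices.csv') from a script that cron runs at a fixed interval would build up the log over time.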
