Dorin Jerotich

Extracting data from API using python

In today’s data-driven world, APIs play a vital role in connecting services and applications. APIs allow access to real-time information from anywhere on the web.

We will learn how to extract data from an API using Python with the powerful requests library.


💡 What Is an API?

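An API (Application Programming Interface) is a defined way for one program to request data or services from another. When we say "extracting data from an API," we typically mean making HTTP requests to an endpoint provided by a service and getting back structured data, usually as JSON.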


1. 📦 Getting Started: Installing the requests Library

pip install requests

2. Python Script to Fetch Data

# Import the requests library
import requests

# Define the URL to fetch data from
url = "https://jsonplaceholder.typicode.com/posts"

# Make the GET request to the URL defined above
response = requests.get(url)

# Check whether the request was successful
if response.status_code == 200:
    data = response.json()  # parse the JSON body into Python objects
    print(f"Retrieved {len(data)} records")
else:
    print("Failed to retrieve data:", response.status_code)


Understanding the Code
requests.get(url) – Sends an HTTP GET request to the specified URL.

response.status_code – Shows the result of the request (200 means OK).

response.json() – Parses the JSON response into a Python list/dictionary.
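
To make the parsing step concrete, here is a small sketch that uses only the standard library. The sample_body string is made up for illustration (it imitates the shape of the posts returned by the URL above), and json.loads stands in for what response.json() roughly does with the response text.

```python
import json

# Hypothetical response body, shaped like the posts endpoint used above.
sample_body = (
    '[{"userId": 1, "id": 1, "title": "first post"},'
    ' {"userId": 1, "id": 2, "title": "second post"}]'
)

# json.loads turns JSON text into Python objects:
# a JSON array becomes a list, a JSON object becomes a dict.
posts = json.loads(sample_body)

# Once parsed, the data can be used like any other list of dictionaries.
titles = [post["title"] for post in posts]
print(type(posts).__name__, type(posts[0]).__name__)
print(titles)
```

Whether you get a list or a dictionary back depends on the top-level JSON value the API returns.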

When we send requests to an API, the status code tells us how each request turned out.

Here are some common HTTP status codes:

| Status Code | Meaning | Description |
|---|---|---|
| 200 | OK | The request was successful and the server returned the data. |
| 400 | Bad Request | The server could not understand the request due to invalid syntax. |
| 403 | Forbidden | You do not have permission to access the resource. |
| 404 | Not Found | The requested resource does not exist. |
| 500 | Internal Server Error | The server encountered an unexpected condition. |
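
The codes above fall into ranges: 2xx means success, 4xx means a problem with the request, and 5xx means a problem on the server. As a rough sketch, the helper below (describe_status is a name made up for this example) uses Python's built-in http.HTTPStatus to look up the standard phrase for a code and label its range.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short label such as '404 Not Found (client error)'."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    if 200 <= code < 300:
        kind = "success"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "other"
    return f"{code} {phrase} ({kind})"


for code in (200, 400, 403, 404, 500):
    print(describe_status(code))
```

In a real script you would pass response.status_code to a helper like this when logging a failed request.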

Handling Query Parameters

Many APIs require you to send additional information as query parameters in your URL. For example, let’s say we have an API that lets us search for users, and we can specify the user’s name as a query parameter, like this: https://api.example.com/users?name=john.
Here’s how you can do it with requests:

import requests
url = "https://api.example.com/users"
params = {'name': 'john'}
response = requests.get(url, params=params)
print(response.json())
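
To see what the params dictionary turns into, the sketch below builds the query string by hand with the standard library's urlencode. It is only an illustration of the encoding step (requests handles this for you), and the URL is the same placeholder as above, so nothing is actually sent.

```python
from urllib.parse import urlencode

url = "https://api.example.com/users"  # placeholder URL from the example above
params = {"name": "john"}

# Build the query string by hand to see what ends up in the URL.
full_url = f"{url}?{urlencode(params)}"
print(full_url)

# Special characters are escaped, so values with spaces are safe to pass.
encoded = urlencode({"name": "john doe", "page": 2})
print(encoded)
```

Passing a dictionary instead of gluing strings together means you do not have to escape spaces or other special characters yourself.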

Handling Errors

Things can go wrong: the server might be down, the network might time out, or the API might reject the request. It's always good practice to include error handling:

try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # Raise exception for bad status codes
    data = response.json()
except requests.exceptions.RequestException as e:
    print("An error occurred:", e)
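
If a failure is temporary, trying again after a short pause is often enough. The sketch below is one possible way to do that and is not part of the requests library: call_with_retries and flaky_fetch are names invented for this example, and flaky_fetch only simulates a server that fails twice before answering. In real use you would pass something like lambda: requests.get(url, timeout=10) and catch requests.exceptions.RequestException.

```python
import time


def call_with_retries(func, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call func() until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as error:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            print(f"Attempt {attempt} failed ({error}); retrying in {delay}s")
            time.sleep(delay)
            delay *= 2  # wait longer before each new attempt


# Stand-in for a flaky request: fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("server unavailable")
    return {"status": "ok"}


result = call_with_retries(
    flaky_fetch, attempts=3, delay=0.01, exceptions=(ConnectionError,)
)
print(result)
```

Doubling the delay between attempts gives a struggling server more time to recover before the next request.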

Fetching Stock Data from Alpha Vantage

Alpha Vantage provides real-time and historical financial market data through a set of data APIs and spreadsheets.
For this project we will use its free API to fetch daily, weekly, and monthly stock data and turn it into a clean, usable format.

We will define a Python function, get_time_series, that connects to the API, requests stock data for a given symbol, formats the data into a pandas DataFrame, and returns the result.

import requests
import pandas as pd

API_KEY = 'your_api_key_here'  # Alpha Vantage API key
BASE_URL = 'https://www.alphavantage.co/query'  # Alpha Vantage API endpoint

def get_time_series(symbol, function):
    """Fetch a time series for one symbol and return it as a DataFrame."""
    params = {
        'function': function,
        'symbol': symbol,
        'apikey': API_KEY,
        'datatype': 'json'
    }

    response = requests.get(BASE_URL, params=params, timeout=10)
    response.raise_for_status()  # Raise an exception for bad status codes
    data = response.json()

    # Check for any API errors reported in the JSON body
    if 'Error Message' in data or 'Note' in data:
        raise ValueError(f"Error fetching data for {symbol}: {data.get('Error Message', data.get('Note', 'Unknown error'))}")

    # Determine the key for the time series data
    time_series_key = next((k for k in data if 'Time Series' in k), None)
    if time_series_key is None:
        raise ValueError(f"No time series data returned for {symbol}")

    df = pd.DataFrame.from_dict(data[time_series_key], orient='index')
    df.index = pd.to_datetime(df.index)
    df = df.sort_index(ascending=False)  # Most recent dates first
    return df

🧑‍💻 Function Breakdown: get_time_series(symbol, function)

The function get_time_series(symbol, function) does the following:

  1. Takes Two Parameters:

    • symbol: The stock ticker (e.g., 'AAPL' for Apple).
    • function: The type of time series data (e.g., 'TIME_SERIES_DAILY', 'TIME_SERIES_WEEKLY').
  2. Sends a Request to Alpha Vantage:

    • Uses the provided symbol and function to request data from Alpha Vantage’s API.
  3. Extracts the Time Series Data:

    • Looks through the API response to find the key that holds the time series data (like "Time Series (Daily)").
  4. Converts to a pandas DataFrame:

    • The raw data from the API is turned into a DataFrame, which makes it easier to manipulate and analyze.
    • The DataFrame is sorted so that the most recent data comes first.
  5. Returns the Data:

    • The function returns the cleaned-up DataFrame containing the stock's time series data.

In essence, this function grabs stock data from Alpha Vantage, converts it to a readable format, and sorts it for easy use!
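
The key lookup and sorting steps can also be sketched without pandas or a network call. Everything in the sample below is an assumption made for illustration: the payload only imitates the shape of a daily time series reply, the prices are invented, and extract_rows is a hypothetical helper, not part of any library.

```python
# Hypothetical payload with invented prices, shaped like a daily time series reply.
sample = {
    "Meta Data": {"2. Symbol": "AAPL"},
    "Time Series (Daily)": {
        "2024-01-02": {"1. open": "100.0", "4. close": "101.5"},
        "2024-01-04": {"1. open": "102.0", "4. close": "103.5"},
        "2024-01-03": {"1. open": "101.0", "4. close": "102.5"},
    },
}


def extract_rows(payload):
    """Return (date, values) pairs, newest first, with numeric values."""
    # Find the key that holds the series, whatever its exact name is.
    key = next((k for k in payload if "Time Series" in k), None)
    if key is None:
        raise ValueError("No time series found in response")
    rows = []
    for date, values in payload[key].items():
        # Values in this sample are strings, so convert them to floats.
        rows.append((date, {name: float(v) for name, v in values.items()}))
    # ISO-formatted dates sort correctly as plain strings.
    return sorted(rows, key=lambda row: row[0], reverse=True)


rows = extract_rows(sample)
for date, values in rows:
    print(date, values["4. close"])
```

Searching for the key instead of hard-coding it is what lets one function handle daily, weekly, and monthly replies, whose key names differ.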

Here’s how you’d use the function:

apple_data = get_time_series('AAPL', 'TIME_SERIES_DAILY')
print(apple_data.head())

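This will show you Apple’s most recent daily stock data. The columns keep the API's own labels (1. open, 2. high, 3. low, 4. close, and 5. volume), and the values arrive as strings, so convert them with pd.to_numeric before doing any arithmetic.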

🧹 Conclusion

Extracting data from APIs using Python is an essential skill for developers, data scientists, and data engineers. With just a few lines of code using the requests library, you can tap into vast amounts of real-time information from across the internet.

Whether you're building something big or just experimenting, APIs open a gateway to endless possibilities.
