<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Terer K. Gilbert</title>
    <description>The latest articles on DEV Community by Terer K. Gilbert (@kiprotichterer).</description>
    <link>https://dev.to/kiprotichterer</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1024870%2Fcd1e5de6-3334-40da-a46e-50544e421e64.jpg</url>
      <title>DEV Community: Terer K. Gilbert</title>
      <link>https://dev.to/kiprotichterer</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kiprotichterer"/>
    <language>en</language>
    <item>
      <title>Extracting Data from an API using Python (requests)</title>
      <dc:creator>Terer K. Gilbert</dc:creator>
      <pubDate>Tue, 20 May 2025 16:18:27 +0000</pubDate>
      <link>https://dev.to/kiprotichterer/extracting-data-from-an-api-using-python-requests-4h5</link>
      <guid>https://dev.to/kiprotichterer/extracting-data-from-an-api-using-python-requests-4h5</guid>
      <description>&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;br&gt;
In today's data-driven world, APIs(Application Programming Interfaces) play a central role in allowing applications and services to interact and share information. Whether you are building a data dashboard, training machine learning models, or automating reporting process, chances are you will need to retrieve data from an API at some point.&lt;/p&gt;

&lt;p&gt;This article provides a guide to extracting data from an API using Python's &lt;em&gt;requests&lt;/em&gt; library - a popular library and the de facto standard for making HTTP requests in Python. We'll explore what APIs are, how HTTP requests work, and then walk through the step-by-step process of sending a request and handling the response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is an API?&lt;/strong&gt;&lt;br&gt;
An &lt;strong&gt;API&lt;/strong&gt; acts as a bridge between two systems, allowing them to communicate. In the context of web APIs, this typically involves sending HTTP requests (like GET or POST) to a server and receiving responses in return, often in JSON format.&lt;br&gt;
APIs are commonly used to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Fetch real-time data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Interact with third-party platforms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Perform actions remotely.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Getting Started With Python's Requests&lt;/strong&gt;&lt;br&gt;
Unlike many popular Python libraries, &lt;em&gt;requests&lt;/em&gt; is not included in Python's standard library, so you have to install it before you can use it in your code. You can install it into your virtual environment (good practice), or into your global environment if you want to use it across multiple projects.&lt;/p&gt;

&lt;p&gt;To install &lt;em&gt;requests&lt;/em&gt;, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once &lt;em&gt;pip&lt;/em&gt; has finished installing &lt;em&gt;requests&lt;/em&gt;, you can use it in your application.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step-by-Step Guide: Extracting Data from an API&lt;/strong&gt;&lt;br&gt;
After installing requests into your environment, let's walk through how to extract data from a RESTful API using &lt;em&gt;requests&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;1.&lt;strong&gt;Choose an API to Work With&lt;/strong&gt; &lt;br&gt;
Before you can start extracting data, you need to choose an API that provides the kind of information you are looking for. APIs are available in almost every field. Some APIs are open and free to use, i.e., you can send a request and get data without any setup.&lt;br&gt;
Other APIs require you to register and use an API key - a unique code that identifies you and your usage. This helps the API provider manage traffic and prevent abuse.&lt;/p&gt;
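&lt;p&gt;For illustration, an API key is commonly passed either as a request header or as a query parameter. The endpoint and header name below are hypothetical, so check your provider's documentation; the sketch builds prepared requests to show where the key goes without actually sending anything:&lt;/p&gt;

```python
import requests

# Hypothetical endpoint and key - substitute your provider's real values
url = "https://api.example.com/v1/data"
api_key = "YOUR_API_KEY"

# Many APIs accept the key as a request header:
with_header = requests.Request(
    "GET", url, headers={"Authorization": f"Bearer {api_key}"}
).prepare()
print(with_header.headers["Authorization"])  # Bearer YOUR_API_KEY

# Others expect it as a query parameter:
with_param = requests.Request("GET", url, params={"api_key": api_key}).prepare()
print(with_param.url)  # https://api.example.com/v1/data?api_key=YOUR_API_KEY
```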

&lt;p&gt;2.&lt;strong&gt;Import the &lt;em&gt;requests&lt;/em&gt; Library&lt;/strong&gt;&lt;br&gt;
The process begins by importing the required library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3.&lt;strong&gt;Make a GET Request&lt;/strong&gt;&lt;br&gt;
HTTP methods, e.g., GET and POST, determine which action you are trying to perform when making an HTTP request.&lt;br&gt;
One of the most common HTTP methods is GET. The GET method indicates that you are trying to retrieve data from a certain resource. To make a GET request using &lt;em&gt;requests&lt;/em&gt;, you invoke &lt;em&gt;requests.get()&lt;/em&gt;.&lt;br&gt;
For example, we can make a GET request to JSONPlaceholder by calling &lt;em&gt;get()&lt;/em&gt; with its URL as follows.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;url = '(https://jsonplaceholder.typicode.com/posts)'
requests.get(url)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point you have made a request.&lt;/p&gt;

&lt;p&gt;4.&lt;strong&gt;The Response&lt;/strong&gt;&lt;br&gt;
The response is a powerful object that you can use to inspect the results of the request you made. Here we store the return value in a variable so that we can look at what we get.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;url = '(https://jsonplaceholder.typicode.com/posts)'
response = requests.get(url)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point we have stored the value of our request in a variable called &lt;em&gt;response&lt;/em&gt;. We will use &lt;em&gt;response&lt;/em&gt; to see the results of our GET request.&lt;/p&gt;

&lt;p&gt;5.&lt;strong&gt;Check the Response Status&lt;/strong&gt;&lt;br&gt;
The first piece of information you can gather from the response is the status code. A status code informs you of the status of the request.&lt;br&gt;
For example, a 200 OK status means that your request was successful, while a 404 NOT FOUND status means that the resource you were looking for wasn't found. There are other codes that can be displayed.&lt;br&gt;
This can be accessed by using .status_code:&lt;br&gt;
&lt;code&gt;response.status_code&lt;/code&gt;&lt;br&gt;
Sometimes we can make this check part of our code so that we automatically know whether our request was successful or not.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if response.status_code == 200:
    print("Request was successful!")
else:
    print(f"Failed to retrieve data. Status code: {response.status_code}")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;6.&lt;strong&gt;Parse the JSON Response&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After the request is successful, we can parse the response body as JSON using &lt;em&gt;.json()&lt;/em&gt;, so that we can use it in our Python application. Most APIs return data in JSON format. This is a lightweight, human-readable format that Python can easily handle.&lt;br&gt;
To parse the JSON response, we use the .json() method as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data = response.json()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This converts the JSON string into a Python data structure - usually a dictionary or a list of dictionaries. From this point we can print the data to inspect it and work with it further.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt; &lt;br&gt;
Working with APIs using Python's &lt;em&gt;requests&lt;/em&gt; library opens the door to vast amounts of dynamic, real-time data. To summarize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Use &lt;em&gt;requests.get()&lt;/em&gt; to fetch data from an API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Check the response status using .status_code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Parse the returned data using .json().&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
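&lt;p&gt;Putting the steps above together, a minimal end-to-end sketch (using the public JSONPlaceholder API from the examples, wrapped in error handling in case the network is unavailable) looks like this:&lt;/p&gt;

```python
import requests

url = 'https://jsonplaceholder.typicode.com/posts'

def fetch_posts(url):
    # Step 1: make the GET request (a timeout avoids hanging forever)
    response = requests.get(url, timeout=10)
    # Step 2: check the status before using the body
    if response.status_code == 200:
        # Step 3: parse the JSON body into Python objects
        return response.json()
    print(f"Failed to retrieve data. Status code: {response.status_code}")
    return []

try:
    posts = fetch_posts(url)
    print(f"Retrieved {len(posts)} posts")
except requests.exceptions.RequestException as exc:
    print(f"Network error: {exc}")
```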

</description>
      <category>python</category>
      <category>api</category>
      <category>dataengineering</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Introduction to Data Version Control (DVC)</title>
      <dc:creator>Terer K. Gilbert</dc:creator>
      <pubDate>Wed, 29 Mar 2023 15:21:42 +0000</pubDate>
      <link>https://dev.to/kiprotichterer/introduction-to-data-version-control-dvc-k12</link>
      <guid>https://dev.to/kiprotichterer/introduction-to-data-version-control-dvc-k12</guid>
      <description>&lt;h2&gt;
  Data Version Control (DVC)
&lt;/h2&gt;

&lt;p&gt;Data Version Control (DVC) is an open-source version control tool specifically designed for machine learning (ML) projects. It allows data scientists and ML engineers to efficiently track changes in their data, code, and models throughout the development process. In this article, we will discuss what DVC is, how it works, and why it is essential for ML projects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is Data Version Control (DVC)?&lt;/strong&gt;&lt;br&gt;
Data Version Control (DVC) is a tool that provides version control for data science and machine learning projects. It allows you to track the changes made to your data, code, and models over time, just like version control systems such as Git do for software development.&lt;/p&gt;

&lt;p&gt;DVC provides a simple command-line interface that allows you to manage your data versioning and collaborate with your team. DVC is designed to work with Git, so it can integrate seamlessly with existing Git repositories.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does DVC work?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;DVC is based on a few core concepts that make it easy to use and understand. The first concept is data versioning. DVC tracks changes to your data by creating a version control system that stores all the changes made to your data. Each version is stored in a separate file, making it easy to compare different versions of your data.&lt;/p&gt;

&lt;p&gt;The second concept is data pipelines. A data pipeline is a set of steps that transform raw data into a form that can be used by ML models. DVC allows you to create and manage data pipelines, making it easy to track changes to your data processing code and ensure that your models are trained on the correct data.&lt;/p&gt;
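&lt;p&gt;As an illustrative sketch, a DVC pipeline is typically described in a &lt;em&gt;dvc.yaml&lt;/em&gt; file; the stage name, script, and file paths below are hypothetical:&lt;/p&gt;

```yaml
stages:
  prepare:                      # hypothetical stage name
    cmd: python prepare.py      # command DVC runs to reproduce this stage
    deps:                       # tracked inputs; changes trigger a re-run
      - prepare.py
      - data/raw.csv
    outs:                       # outputs DVC versions and caches
      - data/prepared.csv
```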

&lt;p&gt;The third concept is model versioning. DVC allows you to track changes to your ML models by creating a version control system that stores all the changes made to your models. Each version is stored in a separate file, making it easy to compare different versions of your models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why is DVC important for ML projects?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data Version Control (DVC) is an essential tool for machine learning projects for several reasons. First, it provides version control for your data, code, and models, making it easy to track changes and collaborate with your team. Second, it allows you to create and manage data pipelines, ensuring that your models are trained on the correct data. Third, it allows you to track changes to your ML models, making it easy to compare different versions and understand how they are performing.&lt;/p&gt;

&lt;p&gt;In addition, DVC helps to improve the reproducibility of your ML experiments. By tracking changes to your data, code, and models, you can ensure that your results are reproducible, even if you make changes to your code or data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data Version Control (DVC) is a powerful tool that provides version control for machine learning projects. It allows you to track changes to your data, code, and models, create and manage data pipelines, and track changes to your ML models. By using DVC, you can improve the reproducibility of your experiments and collaborate more effectively with your team.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>dvc</category>
    </item>
    <item>
      <title>Getting Started With Sentiment Analysis</title>
      <dc:creator>Terer K. Gilbert</dc:creator>
      <pubDate>Thu, 23 Mar 2023 07:27:48 +0000</pubDate>
      <link>https://dev.to/kiprotichterer/getting-started-with-sentiment-analysis-4ad5</link>
      <guid>https://dev.to/kiprotichterer/getting-started-with-sentiment-analysis-4ad5</guid>
      <description>&lt;p&gt;&lt;strong&gt;Introduction to Sentiment Analysis&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sentiment analysis is a technique used to determine the emotional tone of a piece of text. It is commonly used in natural language processing (NLP) and is used in a variety of applications, including social media monitoring, customer feedback analysis, and brand reputation management.&lt;/p&gt;

&lt;p&gt;In this article, we will walk you through the basics of sentiment analysis, including how to get started with your own sentiment analysis project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Choose a Programming Language and Framework&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To begin your sentiment analysis project, you will first need to choose a programming language and a framework. Some of the most popular programming languages for sentiment analysis include Python, R, and Java.&lt;/p&gt;

&lt;p&gt;Python is a popular choice for sentiment analysis due to its simplicity, ease of use, and the availability of many useful libraries such as NLTK, spaCy, and TextBlob. R is another popular language for sentiment analysis due to its strong data analysis capabilities and the availability of many useful libraries such as tm, tidytext, and syuzhet.&lt;/p&gt;

&lt;p&gt;Once you have chosen your programming language, you will need to choose a framework or library to work with. There are many frameworks and libraries available for sentiment analysis, including NLTK, TextBlob, VADER, and spaCy. Each framework has its own strengths and weaknesses, so it's important to choose the one that best fits your project's needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Collect Data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once you have chosen your programming language and framework, you will need to collect data for your sentiment analysis project. There are many sources of data that can be used for sentiment analysis, including social media, customer reviews, news articles, and blogs.&lt;/p&gt;

&lt;p&gt;It's important to ensure that your data is clean and well-formatted before you begin your analysis. This may involve removing duplicates, correcting spelling errors, and removing irrelevant or non-textual data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Preprocess the Data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before you can begin your sentiment analysis, you will need to preprocess your data. This involves cleaning and transforming your data to make it easier to work with. Preprocessing techniques may include tokenization, stemming, and stopword removal.&lt;/p&gt;

&lt;p&gt;Tokenization is the process of breaking your text data into individual words or phrases, known as tokens. Stemming is the process of reducing words to their base form, such as converting "running" to "run". Stopword removal involves removing common words such as "the", "a", and "an" from your text data, as they do not typically provide much meaning.&lt;/p&gt;
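&lt;p&gt;As a minimal sketch of these preprocessing steps in plain Python (real projects would typically use a library such as NLTK or spaCy; the stopword list and stem lookup here are tiny hypothetical ones):&lt;/p&gt;

```python
# A tiny hypothetical stopword list; libraries like NLTK ship full lists
STOPWORDS = {"the", "a", "an", "is", "in"}
# Naive stemming via a tiny lookup; real stemmers use algorithms like Porter's
STEMS = {"running": "run"}

def preprocess(text):
    # Tokenization: lowercase the text and split it into individual words
    tokens = text.lower().split()
    # Stopword removal: drop common words that carry little meaning
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Stemming: reduce words to their base form
    return [STEMS.get(t, t) for t in tokens]

print(preprocess("The cat is running in the garden"))  # ['cat', 'run', 'garden']
```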

&lt;p&gt;&lt;strong&gt;Step 4: Perform Sentiment Analysis&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once you have preprocessed your data, you can begin your sentiment analysis. There are many techniques and algorithms that can be used for sentiment analysis, including rule-based approaches, machine learning approaches, and hybrid approaches.&lt;/p&gt;

&lt;p&gt;Rule-based approaches involve defining a set of rules or heuristics that can be used to classify text as positive, negative, or neutral. Machine learning approaches involve training a model on a labeled dataset of text and sentiment values, and using the model to predict sentiment values for new text data. Hybrid approaches combine elements of both rule-based and machine learning approaches.&lt;/p&gt;
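&lt;p&gt;To make the rule-based idea concrete, here is a minimal sketch that scores text against a tiny hypothetical lexicon of positive and negative words (real rule-based tools such as VADER ship much larger lexicons and handle negation, punctuation, and intensity):&lt;/p&gt;

```python
# Tiny hypothetical lexicons; real tools ship thousands of scored words
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}

def classify(text):
    words = text.lower().split()
    # Score = number of positive hits minus number of negative hits
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("I love this great product"))   # positive
print(classify("terrible service, I hate it")) # negative
```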

&lt;p&gt;&lt;strong&gt;Step 5: Evaluate and Refine Your Model&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After you have performed your sentiment analysis, you will need to evaluate and refine your model. This may involve comparing your model's predicted sentiment values to known ground truth values, or using techniques such as cross-validation or holdout testing to assess the accuracy of your model.&lt;/p&gt;

&lt;p&gt;If your model's accuracy is not satisfactory, you may need to refine your model by adjusting its parameters or using a different algorithm or approach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sentiment analysis is a powerful technique for analyzing the emotional tone of text data. By following the steps outlined in this article, you can choose the right tools, collect and preprocess your data, and build, evaluate, and refine your own sentiment analysis model.&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Essential SQL Commands For Data Science.</title>
      <dc:creator>Terer K. Gilbert</dc:creator>
      <pubDate>Tue, 14 Mar 2023 06:03:52 +0000</pubDate>
      <link>https://dev.to/kiprotichterer/essential-sql-commands-for-data-science-4ape</link>
      <guid>https://dev.to/kiprotichterer/essential-sql-commands-for-data-science-4ape</guid>
      <description>&lt;p&gt;Structured Query Language (SQL) is a programming language used to manage and manipulate data stored in relational database management systems (RDBMS). As a data scientist, mastering SQL is an essential skill to extract, transform and load data from databases. In this article, we'll cover some of the essential SQL commands for data science.&lt;/p&gt;

&lt;p&gt;SELECT&lt;br&gt;
The SELECT command is used to retrieve data from one or more tables. It is the most commonly used command in SQL. The basic syntax for SELECT command is:&lt;br&gt;
             SELECT column1, column2, …&lt;br&gt;
             FROM table_name;&lt;/p&gt;

&lt;p&gt;For example, to retrieve all the data from a table called "customers," you would use the following command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;           SELECT *
           FROM customers;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;WHERE&lt;br&gt;
The WHERE command is used to filter data based on certain conditions. The basic syntax for WHERE command is:&lt;br&gt;
               SELECT column1, column2, …&lt;br&gt;
               FROM table_name&lt;br&gt;
               WHERE condition;&lt;/p&gt;

&lt;p&gt;For example, to retrieve all the data from a table called "customers" where the country is 'USA,' you would use the following command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;          SELECT *
          FROM customers
          WHERE country = 'USA';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;GROUP BY&lt;br&gt;
The GROUP BY command is used to group the result set based on one or more columns. The basic syntax for GROUP BY command is:&lt;br&gt;
              SELECT column1, column2, …, &lt;br&gt;
              FROM table_name&lt;br&gt;
              GROUP BY column1, column2, …;&lt;/p&gt;

&lt;p&gt;For example, to retrieve the count of customers by country from a table called "customers," you would use the following command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;         SELECT country, COUNT(*)
         FROM customers
         GROUP BY country;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;JOIN&lt;br&gt;
The JOIN command is used to combine two or more tables based on a related column. The basic syntax for JOIN command is:&lt;br&gt;
                SELECT column1, column2, …&lt;br&gt;
                FROM table1&lt;br&gt;
                JOIN table2&lt;br&gt;
                ON table1.column = table2.column;&lt;/p&gt;

&lt;p&gt;For example, to retrieve the customer information along with their order information from two tables called "customers" and "orders" where the common column is "customer_id," you would use the following command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;            SELECT *
            FROM customers
            JOIN orders
            ON customers.customer_id = orders.customer_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;ORDER BY&lt;br&gt;
The ORDER BY command is used to sort the result set in ascending or descending order based on one or more columns. The basic syntax for ORDER BY command is:&lt;br&gt;
                 SELECT column1, column2, …&lt;br&gt;
                 FROM table_name&lt;br&gt;
                 ORDER BY column1 ASC|DESC, column2 ASC|DESC, …;&lt;/p&gt;

&lt;p&gt;For example, to retrieve the customer information from a table called "customers" sorted by the customer's name in ascending order, you would use the following command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;              SELECT *
              FROM customers
              ORDER BY customer_name ASC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;LIMIT&lt;br&gt;
The LIMIT command is used to limit the number of rows returned by the SELECT command. The basic syntax for LIMIT command is:&lt;br&gt;
                SELECT column1, column2, …&lt;br&gt;
                FROM table_name&lt;br&gt;
                LIMIT number_of_rows;&lt;/p&gt;

&lt;p&gt;For example, to retrieve the top 10 customers from a table called "customers," you would use the following command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;            SELECT *
            FROM customers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;LIMIT 10;&lt;/p&gt;
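&lt;p&gt;You can try all of these commands without a database server by using Python's built-in &lt;em&gt;sqlite3&lt;/em&gt; module; the table and sample rows below are hypothetical:&lt;/p&gt;

```python
import sqlite3

# In-memory database with a hypothetical "customers" table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER, customer_name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Alice", "USA"), (2, "Bob", "Kenya"), (3, "Carol", "USA")],
)

# WHERE: filter rows by a condition
usa = conn.execute("SELECT * FROM customers WHERE country = 'USA'").fetchall()
print(len(usa))  # 2

# GROUP BY: count customers per country
counts = conn.execute(
    "SELECT country, COUNT(*) FROM customers GROUP BY country ORDER BY country"
).fetchall()
print(counts)  # [('Kenya', 1), ('USA', 2)]

# ORDER BY + LIMIT: first customer by name
top = conn.execute(
    "SELECT customer_name FROM customers ORDER BY customer_name ASC LIMIT 1"
).fetchone()
print(top[0])  # Alice
```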

&lt;p&gt;In conclusion, SQL is a powerful tool for data manipulation and is a must-have skill for data scientists. Understanding and mastering these essential SQL commands will allow data scientists to effectively retrieve and analyze data from relational databases.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>sql</category>
    </item>
    <item>
      <title>Exploratory Data Analysis Ultimate Guide</title>
      <dc:creator>Terer K. Gilbert</dc:creator>
      <pubDate>Sat, 25 Feb 2023 15:43:20 +0000</pubDate>
      <link>https://dev.to/kiprotichterer/exploratory-data-analysis-ultimate-guide-mon</link>
      <guid>https://dev.to/kiprotichterer/exploratory-data-analysis-ultimate-guide-mon</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5l6bfab5ilolvu142712.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5l6bfab5ilolvu142712.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Exploratory Data Analysis (&lt;em&gt;EDA&lt;/em&gt;) is one of the most important initial actions in the data analysis process. This step helps you understand the data that you have and make initial decisions that will make the process easier and enable you to reach your specific goal. &lt;/p&gt;

&lt;p&gt;EDA is applied so that you can investigate the data for any anomalies which may affect your data analysis process. It gives the analyst an overview of the data: its distribution, null values and much more.&lt;/p&gt;

&lt;p&gt;The basic EDA process involves the following general steps:&lt;/p&gt;

&lt;p&gt;1. Loading data&lt;br&gt;
2. Viewing data&lt;br&gt;
3. Cleaning data&lt;br&gt;
4. Analyzing data&lt;/p&gt;

&lt;p&gt;1.&lt;em&gt;LOADING DATA&lt;/em&gt;&lt;br&gt;
This is an important step in the EDA process. In this step, you find and load your dataset into the platform where you are going to perform the EDA. In this case Python will be used as an example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#import libraries
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#define working directory
os.chdir("C:/Users/LENOVO/Desktop/bootcamp")

#import dataset
salary=pd.read_csv("ITSalarySurveyEU2020.csv")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the code above, the libraries that will be used in the process are imported, the location of the dataset is defined, and then the dataset is loaded. This step brings the dataset into the platform so that it can be accessed and EDA can be performed.&lt;/p&gt;

&lt;p&gt;2.&lt;em&gt;VIEWING DATA&lt;/em&gt;&lt;br&gt;
In this step, you try to get an overview of your dataset. This may include:&lt;br&gt;
The size of your dataset in terms of the number of rows and columns.&lt;br&gt;
&lt;em&gt;#data info&lt;br&gt;
salary.info()&lt;/em&gt;&lt;br&gt;
RangeIndex: 1253 entries, 0 to 1252&lt;br&gt;
Data columns (total 23 columns)&lt;br&gt;
You may also check the first rows of the dataset. This enables you to see an overview of what the dataset contains.&lt;br&gt;
&lt;em&gt;#top five obs&lt;br&gt;
salary.head()&lt;/em&gt;&lt;br&gt;
You may also check the last rows of your dataset.&lt;br&gt;
&lt;em&gt;#bottom five obs&lt;br&gt;
salary.tail()&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;3.&lt;em&gt;CLEANING DATA&lt;/em&gt;&lt;br&gt;
This step involves finding and cleaning any anomalies in your dataset. This may include finding duplicate data, missing data and nulls in the dataset.&lt;br&gt;
&lt;em&gt;Finding duplicate data.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;salary.duplicated().sum()&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;If you have duplicated data in your dataset, you can decide to delete the duplicates as follows:&lt;br&gt;
&lt;em&gt;salary.drop_duplicates(keep='first')&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;4.&lt;em&gt;ANALYZING DATA&lt;/em&gt;&lt;br&gt;
Once the dataset has been cleaned, it's time to understand your dataset well. This involves exploring the dataset. Exploration may include the following:&lt;br&gt;
You may decide to explore a specific column in your dataset to look at the amount of data in that column, its mean, standard deviation, datatype and even maximum.&lt;br&gt;
Example.&lt;br&gt;
&lt;em&gt;salary.Age.describe()&lt;/em&gt;&lt;br&gt;
count    1226.000000&lt;br&gt;
mean       32.509788&lt;br&gt;
std         5.663804&lt;br&gt;
min        20.000000&lt;br&gt;
25%        29.000000&lt;br&gt;
50%        32.000000&lt;br&gt;
75%        35.000000&lt;br&gt;
max        69.000000&lt;br&gt;
Name: Age, dtype: float64&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;          *salary.Gender.describe()*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;count     1243&lt;br&gt;
unique       3&lt;br&gt;
top       Male&lt;br&gt;
freq      1049&lt;br&gt;
Name: Gender, dtype: object&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;         *salary.City.describe()*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;count       1253&lt;br&gt;
unique       119&lt;br&gt;
top       Berlin&lt;br&gt;
freq         681&lt;br&gt;
Name: City, dtype: object&lt;/p&gt;
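&lt;p&gt;Checking for missing values is another common part of this exploration. A minimal sketch using a small hypothetical DataFrame (standing in for the salary dataset, which is not reproduced here):&lt;/p&gt;

```python
import pandas as pd
import numpy as np

# Small hypothetical DataFrame standing in for the salary dataset
df = pd.DataFrame({
    "Age": [32, 29, np.nan, 35],
    "City": ["Berlin", None, "Berlin", "Munich"],
})

# Count missing values per column
print(df.isnull().sum())

# describe() summarizes a numeric column, skipping NaNs
print(df["Age"].describe()["count"])  # 3.0
```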

&lt;p&gt;After exploring the data, you can also plot graphs of the data. This will help you see if there are any outliers in your dataset.&lt;br&gt;
Example.&lt;br&gt;
      &lt;em&gt;salary.Age.plot()&lt;/em&gt;&lt;br&gt;
  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1j4fam8qeb4youb9f7to.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1j4fam8qeb4youb9f7to.png" alt=" " width="543" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These steps provide a straightforward way to undertake EDA on a dataset.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>SQL101:INTRODUCTION TO SQL FOR DATA ANALYSIS</title>
      <dc:creator>Terer K. Gilbert</dc:creator>
      <pubDate>Sun, 19 Feb 2023 14:57:01 +0000</pubDate>
      <link>https://dev.to/kiprotichterer/sql101introduction-to-sql-for-data-analysis-2f64</link>
      <guid>https://dev.to/kiprotichterer/sql101introduction-to-sql-for-data-analysis-2f64</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftdjc7zbq0nqjc1fkbmqv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftdjc7zbq0nqjc1fkbmqv.png" alt=" " width="330" height="153"&gt;&lt;/a&gt;&lt;br&gt;
SQL stands for &lt;em&gt;Structured Query Language&lt;/em&gt;. It is a programming language used to access and manipulate data in databases. SQL was developed in the 1970s, but over time it has become one of the most widely used query languages.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;Why SQL?&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
The reasons why SQL has become one of the most common languages for programming and accessing data in databases are as follows:&lt;br&gt;
1. SQL is standard and well documented, so people can easily learn the language.&lt;br&gt;
2. SQL can analyze data of any size; be it small or large stacks of data, SQL can easily analyze them.&lt;br&gt;
3. The language is also simple and non-procedural.&lt;/p&gt;

&lt;p&gt;Some of the basic commands in SQL are as follows.&lt;br&gt;
&lt;em&gt;SELECT&lt;/em&gt; - this command extracts data from a database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fabfkq4ta8btd61ik9qfk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fabfkq4ta8btd61ik9qfk.png" alt=" " width="454" height="189"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;DELETE&lt;/em&gt; - this command deletes data from a database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc76m4fqxis7fwnoxcl5v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc76m4fqxis7fwnoxcl5v.png" alt=" " width="800" height="844"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;UPDATE&lt;/em&gt; - this command updates data in a database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ud21c93mvng5logeo9m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ud21c93mvng5logeo9m.png" alt=" " width="605" height="308"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;INSERT INTO&lt;/em&gt; - this command inserts new data into a database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fus2htqeuvpmdulezphq6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fus2htqeuvpmdulezphq6.png" alt=" " width="800" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ALTER&lt;/em&gt; - this command is used to modify a table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;SQL SYNTAX&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Syntax is the structure of statements in a computer language. Since SQL is a computer language, there are specific sets of rules that are followed when writing it. They include:&lt;br&gt;
1. SQL keywords can be written in either uppercase or lowercase; the language is not case sensitive.&lt;br&gt;
2. An SQL statement can be written on a single line or across multiple lines.&lt;/p&gt;
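&lt;p&gt;A quick way to see these rules in action is SQLite, available through Python's built-in &lt;em&gt;sqlite3&lt;/em&gt; module; the table below is hypothetical:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('Ada')")

# Rule 1: keywords are not case sensitive - lowercase works the same
rows_lower = conn.execute("select name from users").fetchall()

# Rule 2: a statement may span multiple lines
rows_multi = conn.execute("""
    SELECT name
    FROM users
""").fetchall()

print(rows_lower == rows_multi)  # True
```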

&lt;p&gt;SQL can be used to access and manipulate data in all relational databases, which include:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbsy4kk06oyc26yy63y1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbsy4kk06oyc26yy63y1.png" alt=" " width="735" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;-MySQL&lt;br&gt;
-Oracle&lt;br&gt;
-PostgreSQL&lt;br&gt;
-MSSQL, etc. &lt;/p&gt;

&lt;p&gt;Therefore, SQL is a language that one should know in order to be successful as a data analyst, data scientist, data engineer, or in many other data careers. &lt;/p&gt;

</description>
      <category>sql</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
