Pandas is an open-source library that helps you analyze and manipulate data.
Pandas is a tool you will almost certainly use if you're getting into machine learning and data science.
In this article, you will learn how to use basic Pandas commands in Jupyter Notebook for data analysis and manipulation.
Everything I share in this blog post reflects my own interpretation of my journey toward becoming a data scientist and machine learning expert, using the tools that will make me proficient.
Review the code for this project.
Why Pandas?
Here is why you should consider using Pandas:
- Simple to use: you apply functions to transform your data into a usable form
- Integrated with many other data science and ML Python tools
- It helps you get your data ready for machine learning
Installing and using Pandas
Using an environment like Conda will make Pandas and other packages available. Check this resource to get your computer ready with Conda.
In my previous post, I covered an introduction to machine learning.
Let’s begin.
How to Import Pandas
To get started with Pandas, first import it in your Jupyter notebook using the command:
import pandas as pd
pd is an alias for the pandas package, so you can reach all of its functionality through a short prefix, such as pd.Series() or pd.read_csv().
To confirm Pandas availability, check its version:
# print the version
print(f'Pandas version: {pd.__version__}')
Pandas version: 2.1.1
To learn more about Pandas and read its documentation inside Jupyter, type:
# pandas documentation
pd?
Data Types
Pandas has two main data types:
- Series - a 1-dimensional column of data
- DataFrame - a 2-dimensional table of data with rows and columns, which is the most common
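To make the 1-dimensional versus 2-dimensional distinction concrete, here is a minimal sketch that builds one of each and inspects them with the standard ndim and shape attributes. The variable names and values are illustrative only; it also shows that selecting one column of a DataFrame gives you back a Series.

```python
import pandas as pd

# A Series is one-dimensional: a single column of values with an index
colors = pd.Series(["Red", "Yellow", "Blue"])

# A DataFrame is two-dimensional: rows plus named columns
car_data = pd.DataFrame({"Car type": ["Mercedes", "Toyota", "Dodge"],
                         "Color": ["Red", "Yellow", "Blue"]})

print(colors.ndim)      # 1
print(car_data.ndim)    # 2
print(car_data.shape)   # (3, 2): 3 rows, 2 columns

# Selecting a single column from a DataFrame returns a Series
print(type(car_data["Color"]))
```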
Let’s create some data using these data types.
Series
You can create a Series using pd.Series() and passing in a Python list.
# Creating a series of the primary colors
colors = pd.Series(["Red", "Yellow", "Blue"])
colors
0 Red
1 Yellow
2 Blue
dtype: object
# Creating a Series of branded cars
cars = pd.Series(["Mercedes", "Toyota", "Dodge"])
cars
DataFrame: You create a DataFrame by passing a Python dictionary to pd.DataFrame(). Each key becomes a column name, and each value becomes that column's data.
# Creating a DataFrame of the cars and colors
car_data = pd.DataFrame({"Car type": cars, "Color": colors})
car_data
The command above combines the two Series into a single DataFrame, using each dictionary key as a column name:
Car type Color
0 Mercedes Red
1 Toyota Yellow
2 Dodge Blue
Note: You are not limited to using only text; you can use any data type in your DataFrame, like integers, floats, dates, and more.
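As a quick illustration of the note above, this sketch builds a small DataFrame whose columns hold text, integers, floats, and dates. The column names and values are made up for the example; pd.to_datetime() converts the date strings into real datetime values, and the dtypes attribute shows that each column keeps its own type.

```python
import pandas as pd

# Illustrative data mixing several column types
car_sales_sample = pd.DataFrame({
    "Make": ["Toyota", "Honda"],                               # text
    "Doors": [4, 2],                                           # integers
    "Price": [15000.50, 12000.00],                             # floats
    "Sold on": pd.to_datetime(["2023-01-15", "2023-02-20"]),   # dates
})

# Each column keeps its own dtype
print(car_sales_sample.dtypes)
```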
Importing Data
In a work environment, you will usually import data from a .csv (comma-separated values) file, a spreadsheet, or another source such as an SQL database.
Pandas provides functions for this, such as pd.read_csv() for CSV files and pd.read_excel() for Microsoft Excel files.
Download the car sales CSV data and save it in the same folder as your notebook, so that the relative file name below resolves.
# import the car sales data
car_sales = pd.read_csv('car_sales.csv')
car_sales
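If you want to try read_csv() before downloading the file, this sketch feeds it CSV text through io.StringIO, since read_csv() accepts any file-like object as well as a path. The CSV content below is hypothetical and is not the real car_sales.csv; head() and shape are standard ways to take a first look at what you loaded.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the downloaded car_sales.csv
csv_text = """Make,Color,Doors,Price
Toyota,White,4,4000
Honda,Red,4,5000
BMW,Black,5,22000
"""

# read_csv accepts a file path, a URL, or a file-like object such as StringIO
sample_sales = pd.read_csv(io.StringIO(csv_text))

print(sample_sales.head())   # first rows of the table
print(sample_sales.shape)    # (3, 4): 3 rows, 4 columns
```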
Note: read_csv() can also read data directly from a URL.
Anatomy of a DataFrame
As shown above, the row index of a DataFrame starts at 0 by default. Rows run along axis 0 and columns along axis 1, which matters when you want to remove a column from the table using .drop(): you must pass axis=1, or Pandas will look for a row with that label instead. Be careful when performing this action. The values stored in the table's cells are the data itself.
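Here is a minimal sketch of how the axis argument changes what .drop() removes, reusing the car and color data from earlier. It also shows why care is needed in a different sense: drop() returns a new DataFrame and leaves the original untouched unless you reassign the result or pass inplace=True.

```python
import pandas as pd

car_data = pd.DataFrame({"Car type": ["Mercedes", "Toyota", "Dodge"],
                         "Color": ["Red", "Yellow", "Blue"]})

# axis=1 targets columns: drop the "Color" column
without_color = car_data.drop("Color", axis=1)

# axis=0 (the default) targets rows: drop the row whose index label is 0
without_first_row = car_data.drop(0, axis=0)

# drop() returns a new DataFrame; car_data itself is unchanged
print(without_color.columns.tolist())     # ['Car type']
print(without_first_row.index.tolist())   # [1, 2]
print(car_data.shape)                     # (3, 2)
```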
Alternatives to Pandas
These two tools are worth checking out.
- Polars: "DataFrames for the new era", a fast DataFrame library written in Rust with a Python API.
- Ibis: The portable Python dataframe library
Conclusion
This article showed you the basic commands for using Pandas. To learn more about the possibilities of Pandas, check out this repository with the other code samples to get familiar with using this tool for your data.