DEV Community

Diana Wachenje


# A BEGINNER'S JOURNEY INTO PANDAS

Introduction

Pandas is an open-source Python library for data cleaning, manipulation, analysis and visualization. It lets users work efficiently with structured data such as tables, spreadsheets and database records.
The name Pandas is derived from "panel data", a term used in statistics and econometrics for datasets that track the same subjects over multiple time periods.

Importance of Pandas

Organizations produce huge amounts of data every day. Pandas helps professionals to:

. Clean messy data
. Analyze trends and patterns
. Calculate statistics
. Merge datasets
. Prepare data for machine learning and visualization

Main Data Structures in Pandas

1). Series
A Series is like a single column in a table. It is a one-dimensional labeled array that can hold data of any type.

Example:

Python

import pandas as pd
# Build a Series from a list; pandas assigns the index 0, 1, 2, 3 automatically
ages = pd.Series([15, 25, 35, 40])
print(ages)

Output
Plain text
0    15
1    25
2    35
3    40
dtype: int64

2). DataFrame
A DataFrame is a two-dimensional data structure, like a table with rows and columns. Each column of a DataFrame is a Series.

Example:

Python

import pandas as pd

# Each dictionary key becomes a column name; each list holds that column's values
lecturers = {
    "Name": ["Harun", "Bridgit", "Navas", "Emmanuel"],
    "Age": [30, 24, 22, 25],
}
df = pd.DataFrame(lecturers)
print(df)

Output

Plain text
       Name  Age
0     Harun   30
1   Bridgit   24
2     Navas   22
3  Emmanuel   25
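
To connect the two data structures, here is a small sketch that reuses the lecturers data from the example above. It shows that selecting one column of a DataFrame gives back a Series, and that `shape` reports the number of rows and columns.

```python
import pandas as pd

# Same illustrative data as the DataFrame example above.
lecturers = {
    "Name": ["Harun", "Bridgit", "Navas", "Emmanuel"],
    "Age": [30, 24, 22, 25],
}
df = pd.DataFrame(lecturers)

# Selecting a single column returns a Series.
ages = df["Age"]
print(type(ages).__name__)  # Series

# shape is a tuple: (number of rows, number of columns).
print(df.shape)  # (4, 2)
```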

Real-Life Uses of Pandas

1). Education
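Schools and colleges can analyze student performance, for example by selecting the students who scored 80 or above.

Example:
Python
# Assumes df is a DataFrame of students with a "score" column
top_students = df[df["score"] >= 80]
print(top_students)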
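
The filter above needs a DataFrame that has a "score" column. A self-contained sketch could look like the following; the student names and scores are made up purely for illustration.

```python
import pandas as pd

# Hypothetical student data, invented for this example.
students = pd.DataFrame({
    "name": ["Amina", "Brian", "Cheptoo", "David"],
    "score": [85, 72, 90, 64],
})

# The comparison produces a boolean Series (True where score >= 80).
is_top = students["score"] >= 80

# Indexing with that boolean Series keeps only the rows marked True.
top_students = students[is_top]
print(top_students)
```

This pattern is often called boolean indexing: the condition is evaluated for every row first, and the result is then used to pick rows.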

2). Data Science and Machine Learning
Data scientists can clean, analyze and prepare data before building models.

Example:

Python
df.dropna(inplace=True)
This removes every row that contains at least one missing value and modifies df in place.
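
As a rough sketch of what `dropna` does, the example below builds a tiny DataFrame with one missing score (the data is invented), counts the missing values, and then drops the incomplete row. It uses the non-inplace form, which returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical data with one missing score, invented for illustration.
df = pd.DataFrame({
    "name": ["Amina", "Brian", "Cheptoo"],
    "score": [85.0, float("nan"), 90.0],
})

# Count the missing values in each column before cleaning.
missing_per_column = df.isna().sum()
print(missing_per_column)

# dropna() without inplace=True returns a cleaned copy; df itself is unchanged.
cleaned = df.dropna()
print(cleaned)
```

Checking how much data is missing before dropping rows is a sensible habit, because `dropna` can silently discard a large part of a dataset.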

3). Healthcare
Hospitals can analyze patient records and use the results to improve their services.

Example:
Python
# Assumes patients is a DataFrame with an "age" column
average_age = patients["age"].mean()
print(average_age)

Healthcare professionals can also study disease patterns and patient demographics.
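
Here is a minimal, self-contained sketch of the same idea. The patient records and the "diagnosis" column are hypothetical and exist only to make the example runnable.

```python
import pandas as pd

# Hypothetical patient records, invented for illustration.
patients = pd.DataFrame({
    "age": [34, 58, 45, 23],
    "diagnosis": ["malaria", "diabetes", "malaria", "flu"],
})

# Average age across all patients.
average_age = patients["age"].mean()
print(average_age)  # 40.0

# How many patients have each diagnosis, most common first.
diagnosis_counts = patients["diagnosis"].value_counts()
print(diagnosis_counts)
```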

4). Business and Sales Analysis
Companies can analyze sales performance and make better-informed decisions.

Example:
Python
sales=pd.read_csv("sales.csv")
total_sales=sales["amount"].sum()
print(total_sales)

This kind of analysis helps businesses determine:
. Best-selling products
. Total revenue
. Monthly sales trends
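
The three questions in the list can be sketched with `groupby`. Since the contents of sales.csv are not shown, the example below reads a small in-memory CSV instead; the "date" and "product" columns and all the values are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for sales.csv; column names are assumed.
csv_text = """date,product,amount
2024-01-05,Laptop,1200
2024-01-20,Phone,800
2024-02-03,Laptop,1200
2024-02-14,Tablet,500
"""
sales = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Total revenue across all rows.
total_sales = sales["amount"].sum()
print(total_sales)  # 3700

# Revenue per product, largest first (best-selling products by revenue).
by_product = sales.groupby("product")["amount"].sum().sort_values(ascending=False)
print(by_product)

# Revenue per calendar month (monthly sales trend).
by_month = sales.groupby(sales["date"].dt.to_period("M"))["amount"].sum()
print(by_month)
```

With a real file, replacing `io.StringIO(csv_text)` with the file path should be enough, provided the file has matching column names.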

Common Errors in Pandas

Beginners working with Pandas often make these mistakes:

1). Forgetting to import pandas
Python
df=pd.read_csv("data.csv")
NameError: name 'pd' is not defined

The fix is to add import pandas as pd at the top of the script.

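2). Forgetting Parentheses with Methods
Writing df.head instead of df.head() returns the method itself instead of calling it.

3). Confusing loc and iloc
. loc selects by label (row and column names)
. iloc selects by integer position (0, 1, 2, ...)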
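
Mistakes 2 and 3 are easier to see in a short sketch. The DataFrame below reuses names from the earlier example as row labels, which is an assumption made here so that labels and positions look different.

```python
import pandas as pd

# Row labels are names, so label-based and position-based access differ visibly.
df = pd.DataFrame(
    {"Age": [30, 24, 22]},
    index=["Harun", "Bridgit", "Navas"],
)

# Mistake 2: without parentheses you get the method object, not its result.
print(callable(df.head))  # True, df.head is a method waiting to be called
first_two = df.head(2)    # with parentheses the method runs
print(first_two)

# Mistake 3: loc uses labels, iloc uses integer positions.
by_label = df.loc["Bridgit", "Age"]  # row labeled "Bridgit", column "Age"
by_position = df.iloc[1, 0]          # second row, first column
print(by_label, by_position)  # 24 24
```

Both lookups return the same cell here; they only differ in how the row and column are named.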

# Conclusion

Pandas is one of the most powerful Python libraries for working with data.
It is commonly used in healthcare, finance, business, education and data science to clean, analyze and interpret data for better decision making. It is an essential tool for anyone learning data science and data analysis.
