justkmike

Introduction to Pandas: Python Pandas library for data science (Part 1)

What is Pandas?
Pandas is a Python library designed for data manipulation and analysis. It simplifies various data-related tasks, making them more efficient and accessible. Whether you're working with datasets, performing data cleaning, exploration, or statistical analysis, Pandas provides the tools to help you achieve your goals.

Why Use Pandas?
Pandas offers numerous advantages for data scientists and analysts:

  • Data Analysis: Pandas simplifies data analysis by providing powerful data structures and functions.
  • Data Cleaning: It offers tools for cleaning and preprocessing data, such as handling missing values and outliers.
  • Data Manipulation: Pandas allows you to reshape and transform data, making it suitable for your specific analysis needs.
  • Readability: It enhances data readability through structured data frames and series.
  • Simplified Workflow: Pandas streamlines data-related tasks, saving time and effort in data projects. For more in-depth information, you can explore resources like W3Schools and GeeksforGeeks.

How to Install Pandas:
You can easily install Pandas using the Python package manager, pip. Open your command prompt or terminal and run the following command:

pip install pandas

Usage:
Once Pandas is installed, you can import it into your Python script or notebook using the alias 'pd':

import pandas as pd

You can check the installed Pandas version with:

print(pd.__version__)

Pandas Series:
A Pandas Series is a one-dimensional data structure, comparable to a single column in a DataFrame. It is homogeneous, meaning its elements share one data type (dtype), and each element has a label (index). Here's an example:

import pandas as pd

my_list = [30, 20, 23, 34]
my_series = pd.Series(my_list)
print(my_series)

Pandas Labels:
By default, Pandas assigns integer labels from 0 to n-1, where n is the length of the series. However, you can supply your own index, as long as it has one label per element. Here's an example:

custom_index = ['a', 'b', 'c', 'd']  # one label per element of my_list
my_new_series = pd.Series(my_list, index=custom_index)
print(my_new_series)

You can access a series item using its label, like this:

print(my_new_series['a'])

Key-Value Objects in Pandas Series:
If you have a dictionary with key-value pairs, you can transform it into a Pandas Series. The keys will become the labels for the series.
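
As a minimal sketch of the idea above, the snippet below builds a Series from a dictionary. The dictionary contents and the variable names (calories, calorie_series, subset) are made up for illustration and are not from the original article.

```python
import pandas as pd

# Hypothetical example data: the dictionary keys become the index labels.
calories = {"day1": 420, "day2": 380, "day3": 390}
calorie_series = pd.Series(calories)
print(calorie_series)

# Passing index= keeps only the listed keys, in the order given.
subset = pd.Series(calories, index=["day1", "day3"])
print(subset)
```

Items in such a Series can then be read by key, for example calorie_series["day2"].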

DataFrames:
Pandas DataFrames are two-dimensional tables with rows and columns. They can be thought of as collections of Pandas Series, and they are commonly used for structured data. Here's an example of creating a DataFrame from a dictionary:

my_dict = {
    "name": ["Mike", "John"],
    "age": [12, 23]
}
new_df = pd.DataFrame(my_dict)

You can access specific rows using .loc[]. For example:

new_df.loc[0]  # 0 is the label of the first row under the default index

To access multiple rows, you can pass a list of indices:

new_df.loc[[0, 1]]

You can also specify named indexes when creating the DataFrame by providing a list of indexes to the index argument.
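
A short sketch of named indexes, reusing the dictionary from earlier. The labels "row1" and "row2" and the variable names are hypothetical, chosen only for this example.

```python
import pandas as pd

my_dict = {
    "name": ["Mike", "John"],
    "age": [12, 23]
}

# "row1" and "row2" are hypothetical labels, one per row.
named_df = pd.DataFrame(my_dict, index=["row1", "row2"])
print(named_df)

# .loc with a single label returns that row as a Series.
first_row = named_df.loc["row1"]
print(first_row)

# .loc with a list of labels returns a DataFrame.
both_rows = named_df.loc[["row1", "row2"]]
print(both_rows)
```

With named indexes in place, .loc looks rows up by these labels rather than by the default integers.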

When you need to load data from sources like CSV files, Excel files, or JSON files into a DataFrame, Pandas provides built-in functions like pd.read_csv(), pd.read_excel(), and pd.read_json() to simplify the process.
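
As a quick preview, here is a minimal sketch of pd.read_csv(). It assumes a tiny made-up CSV held in a string, with io.StringIO standing in for a real file; in practice you would usually pass a file path instead.

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file on disk.
csv_text = "name,age\nMike,12\nJohn,23\n"

# The first line of the CSV is used as the column names.
csv_df = pd.read_csv(io.StringIO(csv_text))
print(csv_df)
```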

I will show you how to use these functions in the next article. Feel free to go ahead and do some research on your own. See you in the next one.
