likhitha manikonda

Posted on Oct 25

Pandas DataFrame And Series

🐼 Pandas is a powerful Python library for data analysis and manipulation. It helps you work with structured data such as tables (rows and columns), much like a spreadsheet in Excel or a table in SQL.

📦 Installing Pandas

To use Pandas, you first need to install it.

pip install pandas

Then, import it in your Python code:

import pandas as pd

We use pd as a shortcut name for Pandas.
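
To confirm that the install worked, one quick check is to print the installed version. This is only a sanity check, and the version number you see depends on your environment:

```python
import pandas as pd

# __version__ is a string such as '2.x.y'; the exact value depends on your install
version = pd.__version__
print(version)
```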

If you use the Anaconda distribution or Miniconda, you can install Pandas with conda instead of pip:

🛠️ Installing Pandas with Conda

Open your terminal or Anaconda Prompt and run:

conda install pandas

This command will:

Install the latest compatible version of Pandas.
Automatically handle dependencies (like NumPy).
Ensure compatibility with your current environment.

✅ Optional: Create a New Environment (Recommended)

Creating a new environment helps avoid conflicts between packages.

conda create --name pandas_env python=3.10
conda activate pandas_env
conda install pandas

📝 Explanation:

conda create --name pandas_env python=3.10: Creates a new environment named pandas_env with Python 3.10.
conda activate pandas_env: Activates the new environment.
conda install pandas: Installs Pandas in that environment.

📊 What is a DataFrame?

A DataFrame is like a table with rows and columns. Each column can hold a different type of data (numbers, text, dates, etc.), similar to a sheet in Excel or a table in SQL.

✅ Creating a DataFrame from a Dictionary

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London']
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age      City
0    Alice   25  New York
1      Bob   30     Paris
2  Charlie   35    London

📝 Explanation:

We created a dictionary (data) with keys as column names.
Each key has a list of values.
pd.DataFrame(data) converts the dictionary into a table.
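
After building a DataFrame, it can help to check its size and column names. A minimal sketch using the same data as above:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London']
}
df = pd.DataFrame(data)

# shape is a (rows, columns) tuple
print(df.shape)

# columns holds the column labels, taken from the dictionary keys
print(list(df.columns))
```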

📈 What is a Series?

A Series is like a single column of data. It’s a one-dimensional array whose values each have a label; together the labels are called the index. Each column of a DataFrame is a Series.

✅ Creating a Series from a List

import pandas as pd

data = [10, 20, 30, 40]
series = pd.Series(data)
print(series)

Output:

0    10
1    20
2    30
3    40
dtype: int64

📝 Explanation:

pd.Series(data) creates a Series from a list.
The numbers on the left (0, 1, 2, 3) are the default index.
You can customize the index too.

✅ Creating a Series with Custom Index

data = [10, 20, 30, 40]
series = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(series)

Output:

a    10
b    20
c    30
d    40
dtype: int64

📝 Explanation:

We added custom labels (a, b, c, d) as the index.
Now you can access values using labels like series['b'].
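
A small sketch of label-based and position-based access on the same Series. The variable names here are chosen only for illustration:

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# Look up a value by its label
by_label = series['b']

# Look up a value by its 0-based position, regardless of the labels
by_position = series.iloc[1]

# Select several labels at once; the result is a smaller Series
subset = series[['a', 'd']]

print(by_label, by_position)
print(subset)
```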

🔍 Accessing Data

✅ Accessing Data in DataFrame

print(df['Name'])  # Access the 'Name' column
print(df.loc[1])   # Access row with index 1
print(df.iloc[2])  # Access row at position 2

Sample DataFrame input:

      Name  Age      City
0    Alice   25  New York
1      Bob   30     Paris
2  Charlie   35    London

Output:

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object

Name      Bob
Age        30
City    Paris
Name: 1, dtype: object

Name    Charlie
Age          35
City     London
Name: 2, dtype: object
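
With the default index (0, 1, 2), loc and iloc look alike. The difference shows up when the row labels are not the same as the positions. The sketch below uses made-up row labels (10, 20, 30) purely to illustrate this:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],  # hypothetical row labels chosen for illustration
)

# loc selects by row label
row_by_label = df.loc[20]

# iloc selects by 0-based position
row_by_position = df.iloc[1]

# loc also accepts a column label to pick out a single value
age = df.loc[20, 'Age']

print(row_by_label['Name'], row_by_position['Name'], age)
```

Here df.loc[20] and df.iloc[1] both return Bob's row, because the label 20 sits at position 1.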

🔄 Modifying Data

✅ Adding a New Column

df['Country'] = ['USA', 'France', 'UK']
print(df)

Output:

      Name  Age      City Country
0    Alice   25  New York     USA
1      Bob   30     Paris  France
2  Charlie   35    London      UK
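
A new column can also be computed from an existing one. A minimal sketch, where the column names 'Age in 5 Years' and 'Over 28' are invented for the example:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# Arithmetic on a column is applied to every row at once
df['Age in 5 Years'] = df['Age'] + 5

# A comparison produces a True/False column
df['Over 28'] = df['Age'] > 28

print(df)
```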

📁 Reading Data from CSV

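You can load data from a CSV file. Here data.csv stands for the path to your own file:

# Stored under a separate name so the df built above stays available
csv_df = pd.read_csv('data.csv')
print(csv_df.head())  # Shows first 5 rows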

📝 Explanation:

read_csv() loads data from a file.
head() shows the first 5 rows by default; pass a number, such as head(10), to show a different count.
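
To try read_csv() without creating a file, one option is to read from text held in memory. This sketch uses io.StringIO from the standard library and made-up sample rows:

```python
import io
import pandas as pd

# Stand-in for the contents of a CSV file (made-up sample rows)
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Paris
Charlie,35,London
"""

# read_csv accepts a file path or a file-like object such as StringIO
csv_df = pd.read_csv(io.StringIO(csv_text))

print(csv_df.head(2))  # first 2 rows instead of the default 5
```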

📐 Summary Statistics

print(df.describe())

📝 Explanation:

describe() gives quick statistics for the numeric columns: count, mean, standard deviation, min, quartiles (25%, 50%, 75%), and max.
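
The result of describe() is itself a DataFrame, so individual statistics can be looked up by name. A minimal sketch on a small two-column table:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# By default only the numeric column (Age) is summarized
stats = df.describe()

# Rows are labeled by statistic, columns by the original column name
print(stats.loc['mean', 'Age'])
print(stats.loc['max', 'Age'])
```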

🔍 Filtering Data

# Show rows where Age > 28
print(df[df['Age'] > 28])

Output:

      Name  Age    City Country
1      Bob   30   Paris  France
2  Charlie   35  London      UK
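
Filtering works in two steps, which the one-line version hides. The sketch below splits them apart; the names mask and older are chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# Step 1: the comparison gives one True/False value per row
mask = df['Age'] > 28
print(mask)

# Step 2: indexing with the mask keeps only the rows marked True
older = df[mask]
print(older)
```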

📌 Conclusion

Series = One column of data.
DataFrame = Table with rows and columns.
Pandas makes data analysis easy and powerful.

🧮 Data Types and Conversion

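Understanding data types is crucial in Pandas. The examples in the rest of this post start again from the original three-column DataFrame (Name, Age, City).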

print(df.dtypes)  # Check data types of each column

# Convert column to a different type
df['Age'] = df['Age'].astype(float)
print(df.dtypes)

Output:

Name    object
Age      int64
City    object
dtype: object

Name     object
Age     float64
City     object
dtype: object

📝 Explanation: astype() is used to change the data type of a column. This is useful when preparing data for analysis.
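
astype() raises an error if a value cannot be converted. When a column of numbers arrives as text and may contain bad entries, pd.to_numeric() with errors='coerce' is one alternative. A sketch with a made-up column:

```python
import pandas as pd

# Made-up column where numbers arrived as text, with one bad entry
raw = pd.Series(['25', '30', 'unknown'])

# errors='coerce' turns values that cannot be parsed into NaN
ages = pd.to_numeric(raw, errors='coerce')

print(ages)
```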

🧹 Handling Missing Data

Real-world data often has missing values.

import numpy as np

df.loc[1, 'Age'] = np.nan  # Add a missing value
print(df)

# Check for missing values
print(df.isnull())

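# Fill missing values (assign the result back to the column;
# calling fillna(..., inplace=True) on df['Age'] may not update df in newer Pandas versions)
df['Age'] = df['Age'].fillna(0)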
print(df)

Output:

      Name   Age      City
0    Alice  25.0  New York
1      Bob   NaN     Paris
2  Charlie  35.0    London

    Name    Age   City
0  False  False  False
1  False   True  False
2  False  False  False

      Name   Age      City
0    Alice  25.0  New York
1      Bob   0.0     Paris
2  Charlie  35.0    London

📝 Explanation: isnull() checks for missing values. fillna() replaces them with a default value (like 0).
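
Filling with 0 is not always the right choice, since it changes averages. Two other common options are dropping the incomplete rows or filling with the column mean. A minimal sketch of both:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25.0, np.nan, 35.0]})

# Count missing values per column
missing_counts = df.isnull().sum()
print(missing_counts)

# Option 1: drop rows that contain a missing value
dropped = df.dropna()

# Option 2: fill with the mean of the known ages (mean() skips NaN by default)
filled = df['Age'].fillna(df['Age'].mean())

print(dropped)
print(filled)
```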

📌 Sorting Data

# Sort by Age
sorted_df = df.sort_values(by='Age')
print(sorted_df)

Output:

      Name   Age      City
1      Bob   0.0     Paris
0    Alice  25.0  New York
2  Charlie  35.0    London

📝 Explanation: sort_values() sorts the DataFrame by a column. You can also sort in descending order using ascending=False.
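
A short sketch of descending order and of sorting by more than one column, using the same values as the table above:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25.0, 0.0, 35.0],
    'City': ['New York', 'Paris', 'London'],
})

# Largest Age first
oldest_first = df.sort_values(by='Age', ascending=False)

# Sort by City, then by Age within each city
by_city_then_age = df.sort_values(by=['City', 'Age'])

print(oldest_first)
print(by_city_then_age)
```

sort_values() returns a new DataFrame, so df itself keeps its original row order.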

🔢 Filtering with Multiple Conditions

# Filter rows where Age > 20 and City is 'London'
filtered = df[(df['Age'] > 20) & (df['City'] == 'London')]
print(filtered)

Output:

      Name   Age    City
2  Charlie  35.0  London

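📝 Explanation: Use & for AND and | for OR when combining conditions. Wrap each condition in parentheses, because & and | are evaluated before comparisons like > and ==.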
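
A sketch of the OR case, plus ~ for NOT, on the same values. The variable names either and not_london are chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25.0, 0.0, 35.0],
    'City': ['New York', 'Paris', 'London'],
})

# OR: rows where Age > 30 or City is 'Paris'
either = df[(df['Age'] > 30) | (df['City'] == 'Paris')]

# NOT: rows where City is anything except 'London'
not_london = df[~(df['City'] == 'London')]

print(either)
print(not_london)
```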

📊 GroupBy and Aggregation

# Group by City and calculate average Age
grouped = df.groupby('City')['Age'].mean()
print(grouped)

Output:

City
London      35.0
New York    25.0
Paris        0.0
Name: Age, dtype: float64

📝 Explanation: groupby() groups data by a column, and you can apply functions like mean(), sum(), etc.
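
Grouping is easier to see when a group has more than one row. The sketch below uses made-up data with a repeated city and applies several functions at once with agg():

```python
import pandas as pd

# Made-up data with a repeated city so each group has something to combine
df = pd.DataFrame({
    'City': ['Paris', 'Paris', 'London'],
    'Age': [20, 40, 35],
})

# agg() applies several functions to each group in one call
summary = df.groupby('City')['Age'].agg(['mean', 'min', 'max', 'count'])

print(summary)
```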

📋 Renaming Columns

df.rename(columns={'Name': 'Full Name', 'City': 'Location'}, inplace=True)
print(df)

Output:

   Full Name   Age  Location
0      Alice  25.0  New York
1        Bob   0.0     Paris
2    Charlie  35.0    London

📝 Explanation: rename() lets you change column names for clarity.
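
Without inplace=True, rename() leaves the original untouched and returns a new DataFrame. A minimal sketch of that variant:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'City': ['New York', 'Paris']})

# rename() returns a new DataFrame when inplace is not set
renamed = df.rename(columns={'Name': 'Full Name', 'City': 'Location'})

print(list(df.columns))       # original column names are unchanged
print(list(renamed.columns))  # new column names
```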

📎 Setting and Resetting the Index

df.set_index('Full Name', inplace=True)
print(df)

# Reset index
df.reset_index(inplace=True)
print(df)

Output:

            Age  Location
Full Name                
Alice      25.0  New York
Bob         0.0     Paris
Charlie    35.0    London

   Full Name   Age  Location
0      Alice  25.0  New York
1        Bob   0.0     Paris
2    Charlie  35.0    London

📝 Explanation: You can set a column as the index (row labels) and reset it back to default.
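
One reason to set an index is to look rows up by a meaningful label. A sketch of that, using the non-inplace form of set_index(); the names indexed and numbers_only are chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25.0, 0.0, 35.0],
    'Location': ['New York', 'Paris', 'London'],
})

indexed = df.set_index('Full Name')

# Rows can now be looked up by name instead of by number
print(indexed.loc['Charlie'])
print(indexed.loc['Alice', 'Location'])

# drop=True discards the index instead of turning it back into a column
numbers_only = indexed.reset_index(drop=True)
print(list(numbers_only.columns))
```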
