DEV Community

likhitha manikonda


Pandas DataFrame And Series

๐ŸผPandas is a powerful Python library used for data analysis and manipulation. It helps you work with structured data like tables (rows and columns), similar to Excel or SQL.

Installing Pandas

To use Pandas, you first need to install it.

pip install pandas

Then, import it in your Python code:

import pandas as pd

We use pd as a shortcut name for Pandas.
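
A quick way to confirm that the installation and import worked is to print the installed version. This is only a minimal sketch; the exact version string depends on your environment.

```python
import pandas as pd

# __version__ is a plain string; the value varies from one install to another
print(pd.__version__)
```

If this runs without an ImportError, Pandas is ready to use.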

If you use the Anaconda distribution or Miniconda, you can install Pandas with conda instead of pip.


๐Ÿ› ๏ธ Installing Pandas with Conda

Open your terminal or Anaconda Prompt and run:

conda install pandas

This command will:

  • Install the latest compatible version of Pandas.
  • Automatically handle dependencies (like NumPy).
  • Ensure compatibility with your current environment.

Optional: Create a New Environment (Recommended)

Creating a new environment helps avoid conflicts between packages.

conda create --name pandas_env python=3.10
conda activate pandas_env
conda install pandas

๐Ÿ“ Explanation:

  • conda create --name pandas_env python=3.10: Creates a new environment named pandas_env with Python 3.10.
  • conda activate pandas_env: Activates the new environment.
  • conda install pandas: Installs Pandas in that environment.

What is a DataFrame?

A DataFrame is like a table with rows and columns, similar to an Excel sheet or a SQL table. Each column can hold a different type of data (numbers, text, dates, etc.).

Creating a DataFrame from a Dictionary

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London']
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age      City
0    Alice   25  New York
1      Bob   30     Paris
2  Charlie   35    London

๐Ÿ“ Explanation:

  • We created a dictionary (data) with keys as column names.
  • Each key has a list of values.
  • pd.DataFrame(data) converts the dictionary into a table.
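
Once the table exists, it can help to check its size and column names. The sketch below rebuilds the same sample data and inspects it with shape, columns, and len(), all standard DataFrame features.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London']
}
df = pd.DataFrame(data)

print(df.shape)          # (rows, columns) -> (3, 3)
print(list(df.columns))  # ['Name', 'Age', 'City']
print(len(df))           # number of rows -> 3
```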

What is a Series?

A Series is like a single column of data. It's a one-dimensional array with labels (called the index).

Creating a Series from a List

import pandas as pd

data = [10, 20, 30, 40]
series = pd.Series(data)
print(series)

Output:

0    10
1    20
2    30
3    40
dtype: int64

๐Ÿ“ Explanation:

  • pd.Series(data) creates a Series from a list.
  • The numbers on the left (0, 1, 2, 3) are the default index.
  • You can customize the index too.

Creating a Series with Custom Index

data = [10, 20, 30, 40]
series = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(series)

Output:

a    10
b    20
c    30
d    40
dtype: int64

๐Ÿ“ Explanation:

  • We added custom labels (a, b, c, d) as index.
  • Now you can access values using labels like series['b'].
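
To make the difference between labels and positions concrete, here is a small sketch using the same Series. It compares label-based access (square brackets or .loc) with position-based access (.iloc).

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

print(series['b'])         # label-based access -> 20
print(series.loc['b'])     # same lookup, explicit about using labels -> 20
print(series.iloc[1])      # position-based access (second item) -> 20
print(series[['a', 'd']])  # a list of labels returns a smaller Series
```

Using .loc and .iloc explicitly avoids ambiguity when the labels themselves are integers.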

๐Ÿ” Accessing Data

โœ… Accessing Data in DataFrame

print(df['Name'])  # Access the 'Name' column
print(df.loc[1])   # Access row with index 1
print(df.iloc[2])  # Access row at position 2

Sample DataFrame input:

      Name  Age      City
0    Alice   25  New York
1      Bob   30     Paris
2  Charlie   35    London

Output:

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object

Name      Bob
Age        30
City    Paris
Name: 1, dtype: object

Name    Charlie
Age          35
City     London
Name: 2, dtype: object
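
Beyond whole columns and whole rows, you can also pick several columns or a single cell. The following sketch rebuilds the sample DataFrame and shows a few common patterns with loc (labels) and iloc (positions).

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London']
})

print(df[['Name', 'City']])  # a list of column names returns a smaller DataFrame
print(df.loc[1, 'City'])     # row label 1, column 'City' -> Paris
print(df.iloc[0, 1])         # first row, second column -> 25
print(df.loc[0:1, 'Name'])   # label slices include the end label: rows 0 and 1
```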

Modifying Data

Adding a New Column

df['Country'] = ['USA', 'France', 'UK']
print(df)

Output:

      Name  Age      City Country
0    Alice   25  New York     USA
1      Bob   30     Paris  France
2  Charlie   35    London      UK
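
A new column does not have to come from a hand-written list; it can also be computed from existing columns. In this sketch the column names 'Age Next Year' and 'Is Over 28' are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
    'Country': ['USA', 'France', 'UK']
})

# Arithmetic on a column applies to every row at once
df['Age Next Year'] = df['Age'] + 1

# A comparison produces a column of True/False values
df['Is Over 28'] = df['Age'] > 28

print(df)
```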

๐Ÿ“ Reading Data from CSV

You can load data from a CSV file:

# 'data.csv' is a placeholder file name; a separate variable keeps the sample df intact
csv_df = pd.read_csv('data.csv')
print(csv_df.head())  # Shows first 5 rows

๐Ÿ“ Explanation:

  • read_csv() loads data from a file.
  • head() shows the top rows.
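
If you do not have a CSV file at hand, you can try read_csv() on text held in memory. The sketch below uses io.StringIO as a stand-in for a file; the CSV contents are invented for illustration.

```python
import io
import pandas as pd

# Stand-in for a file on disk; the contents are made up for this example
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Paris
Charlie,35,London
"""

csv_df = pd.read_csv(io.StringIO(csv_text))
print(csv_df.head(2))  # pass a number to control how many rows are shown
print(csv_df.dtypes)   # Age is parsed as an integer column
```

The first line of the text is treated as the header row, which is the default behavior of read_csv().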

๐Ÿ“ Summary Statistics

print(df.describe())

๐Ÿ“ Explanation:

  • describe() gives you quick stats like mean, min, max, etc.
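
As a rough illustration, here is describe() applied to the sample data, alongside a few of the same statistics computed one at a time. Only the numeric Age column appears in the describe() result.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London']
})

stats = df.describe()
print(stats)             # one column (Age) with count, mean, std, min, quartiles, max

print(df['Age'].mean())  # 30.0
print(df['Age'].min())   # 25
print(df['Age'].max())   # 35
```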

๐Ÿ” Filtering Data

# Show rows where Age > 28
print(df[df['Age'] > 28])

Output:

      Name  Age    City Country
1      Bob   30   Paris  France
2  Charlie   35  London      UK

Conclusion

  • Series = One column of data.
  • DataFrame = Table with rows and columns.
  • Pandas makes data analysis easy and powerful.

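Data Types and Conversion

Understanding data types is crucial in Pandas, because a column's type determines which operations you can apply to it. The outputs in the remaining sections assume the original three-column DataFrame (Name, Age, City), without the Country column added earlier.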

print(df.dtypes)  # Check data types of each column

# Convert column to a different type
df['Age'] = df['Age'].astype(float)
print(df.dtypes)

Output:

Name    object
Age      int64
City    object
dtype: object

Name     object
Age     float64
City     object
dtype: object

๐Ÿ“ Explanation: astype() is used to change the data type of a column. This is useful when preparing data for analysis.


Handling Missing Data

Real-world data often has missing values.

import numpy as np

df.loc[1, 'Age'] = np.nan  # Add a missing value
print(df)

# Check for missing values
print(df.isnull())

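# Fill missing values (assign the result back; inplace=True on a selected
# column is deprecated in recent Pandas versions and may not update df)
df['Age'] = df['Age'].fillna(0)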
print(df)

Output:

      Name   Age      City
0    Alice  25.0  New York
1      Bob   NaN     Paris
2  Charlie  35.0    London

    Name    Age   City
0  False  False  False
1  False   True  False
2  False  False  False

      Name   Age      City
0    Alice  25.0  New York
1      Bob   0.0     Paris
2  Charlie  35.0    London

๐Ÿ“ Explanation: isnull() checks for missing values. fillna() replaces them with a default value (like 0).


Sorting Data

# Sort by Age
sorted_df = df.sort_values(by='Age')
print(sorted_df)

Output:

      Name   Age      City
1      Bob   0.0     Paris
0    Alice  25.0  New York
2  Charlie  35.0    London

๐Ÿ“ Explanation: sort_values() sorts the DataFrame by a column. You can also sort in descending order using ascending=False.


Filtering with Multiple Conditions

# Filter rows where Age > 20 and City is 'London'
filtered = df[(df['Age'] > 20) & (df['City'] == 'London')]
print(filtered)

Output:

      Name   Age    City
2  Charlie  35.0  London

๐Ÿ“ Explanation: Use & for AND and | for OR when combining conditions.


GroupBy and Aggregation

# Group by City and calculate average Age
grouped = df.groupby('City')['Age'].mean()
print(grouped)

Output:

City
London      35.0
New York    25.0
Paris        0.0
Name: Age, dtype: float64

๐Ÿ“ Explanation: groupby() groups data by a column, and you can apply functions like mean(), sum(), etc.


Renaming Columns

df.rename(columns={'Name': 'Full Name', 'City': 'Location'}, inplace=True)
print(df)

Output:

  Full Name   Age  Location
0     Alice  25.0  New York
1       Bob   0.0     Paris
2   Charlie  35.0    London

๐Ÿ“ Explanation: rename() lets you change column names for clarity.


Resetting and Setting Index

df.set_index('Full Name', inplace=True)
print(df)

# Reset index
df.reset_index(inplace=True)
print(df)

Output:

            Age  Location
Full Name
Alice      25.0  New York
Bob         0.0     Paris
Charlie    35.0    London

  Full Name   Age  Location
0     Alice  25.0  New York
1       Bob   0.0     Paris
2   Charlie  35.0    London

๐Ÿ“ Explanation: You can set a column as the index (row labels) and reset it back to default.

