🐼 Pandas is a powerful Python library used for data analysis and manipulation. It helps you work with structured data like tables (rows and columns), similar to Excel or SQL.
📦 Installing Pandas
To use Pandas, you first need to install it.
pip install pandas
Then, import it in your Python code:
import pandas as pd
We use pd as a shortcut name for Pandas.
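A quick way to confirm that Pandas is installed is to print its version. This is a minimal sketch; the version string you see depends on your environment.

```python
import pandas as pd

# Print the installed Pandas version (the exact value depends on your setup)
print(pd.__version__)
```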
🛠️ Installing Pandas with Conda
If you use the Anaconda distribution or Miniconda, you can install Pandas with conda instead. Open your terminal or Anaconda Prompt and run:
conda install pandas
This command will:
- Install the latest compatible version of Pandas.
- Automatically handle dependencies (like NumPy).
- Ensure compatibility with your current environment.
Optional: Create a New Environment (Recommended)
Creating a new environment helps avoid conflicts between packages.
conda create --name pandas_env python=3.10
conda activate pandas_env
conda install pandas
Explanation:
- conda create --name pandas_env python=3.10: Creates a new environment named pandas_env with Python 3.10.
- conda activate pandas_env: Activates the new environment.
- conda install pandas: Installs Pandas in that environment.
What is a DataFrame?
A DataFrame is a two-dimensional table with labeled rows and columns. Each column can hold a different type of data (numbers, text, dates, etc.), much like a spreadsheet or a SQL table.
Creating a DataFrame from a Dictionary
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Paris', 'London']
}
df = pd.DataFrame(data)
print(df)
Output:
Name Age City
0 Alice 25 New York
1 Bob 30 Paris
2 Charlie 35 London
Explanation:
- We created a dictionary (data) with keys as column names.
- Each key has a list of values.
- pd.DataFrame(data) converts the dictionary into a table.
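A DataFrame can also be built from a list of dictionaries, where each dictionary describes one row. This is a minimal sketch reusing the same example data; the variable name rows is just for illustration.

```python
import pandas as pd

# Each dictionary is one row; its keys become the column names
rows = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Paris'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'London'},
]
df = pd.DataFrame(rows)

print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # column names in order
```

The dictionary-of-lists form is convenient when your data is already organized by column; the list-of-dictionaries form fits data that arrives one record at a time.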
What is a Series?
A Series is like a single column of data: a one-dimensional array whose values carry labels, collectively called the index.
Creating a Series from a List
import pandas as pd
data = [10, 20, 30, 40]
series = pd.Series(data)
print(series)
Output:
0 10
1 20
2 30
3 40
dtype: int64
Explanation:
- pd.Series(data) creates a Series from a list.
- The numbers on the left (0, 1, 2, 3) are the default index.
- You can customize the index too.
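A Series can also be created from a dictionary, in which case the keys become the index. A single DataFrame column is itself a Series. A small sketch with made-up ages:

```python
import pandas as pd

# Dictionary keys become the index labels
ages = pd.Series({'Alice': 25, 'Bob': 30, 'Charlie': 35})
print(ages)
print(ages.index.tolist())   # ['Alice', 'Bob', 'Charlie']
print(ages.values)           # the underlying NumPy array of values

# Selecting one column of a DataFrame returns a Series
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
print(type(df['Age']))
```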
Creating a Series with Custom Index
data = [10, 20, 30, 40]
series = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(series)
Output:
a 10
b 20
c 30
d 40
dtype: int64
Explanation:
- We added custom labels (a, b, c, d) as the index.
- Now you can access values using labels, like series['b'].
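With a custom index you can read values either by label or by integer position. The sketch below contrasts the two: .loc is label-based and .iloc is position-based.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

print(series['b'])         # by label -> 20
print(series.loc['b'])     # explicit label-based access -> 20
print(series.iloc[1])      # by position (second element) -> 20
print(series[['a', 'c']])  # several labels at once, returns a smaller Series
```

Using .loc and .iloc explicitly avoids ambiguity when the index itself contains integers.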
Accessing Data
Accessing Data in a DataFrame
print(df['Name'])   # Access the 'Name' column
print(df.loc[1])    # Access the row with index label 1
print(df.iloc[2])   # Access the row at integer position 2
Sample DataFrame (df) used above:
Name Age City
0 Alice 25 New York
1 Bob 30 Paris
2 Charlie 35 London
Output:
0 Alice
1 Bob
2 Charlie
Name: Name, dtype: object
Name Bob
Age 30
City Paris
Name: 1, dtype: object
Name Charlie
Age 35
City London
Name: 2, dtype: object
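You can also select a single cell, several columns, or a range of rows. One detail worth knowing: slices with .loc include the end label, while slices with .iloc exclude the end position, as in plain Python. A short sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
})

print(df.loc[1, 'City'])    # single cell by row label and column name -> Paris
print(df[['Name', 'Age']])  # several columns (note the double brackets)
print(df.loc[0:1])          # rows with labels 0 through 1 (end included)
print(df.iloc[0:1])         # rows at positions 0 up to, but not including, 1
```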
Modifying Data
Adding a New Column
df['Country'] = ['USA', 'France', 'UK']
print(df)
Output:
Name Age City Country
0 Alice 25 New York USA
1 Bob 30 Paris France
2 Charlie 35 London UK
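New columns are often computed from existing ones, and columns you no longer need can be removed with drop(). A minimal sketch; the column name Age_next_year is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# Arithmetic on a column applies to every row at once
df['Age_next_year'] = df['Age'] + 1
print(df)

# drop() returns a new DataFrame without the listed columns
df = df.drop(columns=['Age_next_year'])
print(df.columns.tolist())
```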
Reading Data from CSV
You can load data from a CSV file:
df = pd.read_csv('data.csv')
print(df.head()) # Shows first 5 rows
Explanation:
- read_csv() loads data from a CSV file into a DataFrame.
- head() shows the top rows (5 by default).
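If you don't have a data.csv file handy, you can try read_csv() on an in-memory string, since it accepts file-like objects such as io.StringIO. The CSV content below is invented for the example.

```python
import io
import pandas as pd

csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Paris
Charlie,35,London
"""

# read_csv works with file paths and file-like objects alike
df = pd.read_csv(io.StringIO(csv_text))
print(df.head(2))   # first 2 rows instead of the default 5

# to_csv() with no path returns the CSV as a string; index=False skips row labels
print(df.to_csv(index=False))
```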
Summary Statistics
print(df.describe())
Explanation:
- describe() gives quick statistics for each numeric column: count, mean, standard deviation, min, quartiles (25%, 50%, 75%), and max.
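On data like the sample table, describe() summarizes only the numeric Age column by default. Because the result is itself a DataFrame, you can read individual statistics from it. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

stats = df.describe()   # numeric columns only, by default
print(stats)

# Look up single values by statistic name and column
print(stats.loc['mean', 'Age'])
print(stats.loc['max', 'Age'])
```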
Filtering Data
# Show rows where Age > 28
print(df[df['Age'] > 28])
Output:
Name Age City Country
1 Bob 30 Paris France
2 Charlie 35 London UK
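Two other common ways to filter rows are isin(), which matches against a list of values, and query(), which takes the condition as a string. A small sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
})

# Rows whose City is any of the listed values
europe = df[df['City'].isin(['Paris', 'London'])]
print(europe)

# The same Age > 28 filter, written as a query string
older = df.query('Age > 28')
print(older)
```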
Conclusion
- Series = One column of data.
- DataFrame = Table with rows and columns.
- Pandas makes data analysis easy and powerful.
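🧮 Data Types and Conversion
Every column has a data type (dtype), and it controls which operations make sense: numeric columns (int64, float64) support calculations such as mean(), while text is usually stored with the object dtype.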
print(df.dtypes) # Check data types of each column
# Convert column to a different type
df['Age'] = df['Age'].astype(float)
print(df.dtypes)
Output:
Name object
Age int64
City object
dtype: object
Name object
Age float64
City object
dtype: object
Explanation: astype() changes the data type of a column. This is useful when preparing data for analysis.
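astype() raises an error if any value cannot be converted. When a column contains bad entries, pd.to_numeric() with errors='coerce' turns them into NaN instead. The sample values are invented for illustration.

```python
import pandas as pd

raw = pd.Series(['25', '30', 'unknown'])

# raw.astype(float) would raise ValueError because of 'unknown';
# errors='coerce' replaces unparseable values with NaN instead
ages = pd.to_numeric(raw, errors='coerce')
print(ages)
print(ages.dtype)   # float64, since NaN is a float
```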
🧹 Handling Missing Data
Real-world data often has missing values.
import numpy as np
df.loc[1, 'Age'] = np.nan # Add a missing value
print(df)
# Check for missing values
print(df.isnull())
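# Fill missing values and assign the result back
# (fillna(..., inplace=True) on a single column may not update df in recent pandas versions)
df['Age'] = df['Age'].fillna(0)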
print(df)
Output:
Name Age City
0 Alice 25.0 New York
1 Bob NaN Paris
2 Charlie 35.0 London
Name Age City
0 False False False
1 False True False
2 False False False
Name Age City
0 Alice 25.0 New York
1 Bob 0.0 Paris
2 Charlie 35.0 London
Explanation: isnull() returns True wherever a value is missing. fillna() replaces missing values with a value you choose (here, 0).
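Filling with 0 is not always appropriate, since it can distort statistics such as the mean. Two common alternatives are dropping incomplete rows or filling with a column statistic. A sketch on a small made-up DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25.0, np.nan, 35.0],
})

# Count missing values per column
print(df.isnull().sum())

# Option 1: drop rows that contain any missing value
print(df.dropna())

# Option 2: fill with the column mean (NaN is skipped when computing it)
filled = df['Age'].fillna(df['Age'].mean())
print(filled)
```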
Sorting Data
# Sort by Age
sorted_df = df.sort_values(by='Age')
print(sorted_df)
Output:
Name Age City
1 Bob 0.0 Paris
0 Alice 25.0 New York
2 Charlie 35.0 London
Explanation: sort_values() sorts the DataFrame by a column. You can also sort in descending order using ascending=False.
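Descending order and sorting by several columns can be sketched as follows. The extra row for Dana is invented so that two people share a city.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 30, 35, 30],
    'City': ['New York', 'Paris', 'London', 'London'],
})

# Oldest first
oldest_first = df.sort_values(by='Age', ascending=False)
print(oldest_first)

# Sort by City, then by Age within each city
by_city = df.sort_values(by=['City', 'Age'])
print(by_city)
```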
Filtering with Multiple Conditions
# Filter rows where Age > 20 and City is 'London'
filtered = df[(df['Age'] > 20) & (df['City'] == 'London')]
print(filtered)
Output:
Name Age City
2 Charlie 35.0 London
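Explanation: Use & for AND and | for OR when combining conditions, and wrap each condition in parentheses, because & and | take precedence over comparison operators such as > and ==.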
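The same pattern works for OR with | and for negation with ~. A short sketch on the sample data; each condition still gets its own parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
})

# OR: older than 32, or living in Paris
either = df[(df['Age'] > 32) | (df['City'] == 'Paris')]
print(either)

# NOT: everyone except people in Paris
not_paris = df[~(df['City'] == 'Paris')]
print(not_paris)
```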
GroupBy and Aggregation
# Group by City and calculate average Age
grouped = df.groupby('City')['Age'].mean()
print(grouped)
Output:
City
London 35.0
New York 25.0
Paris 0.0
Name: Age, dtype: float64
Explanation: groupby() groups rows by the values in a column, and you can then apply functions such as mean() or sum() to each group.
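To compute several aggregates at once, you can pass a list of function names to agg(). The sample data below adds a second London row (invented) so the groups differ in size.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 30, 35, 45],
    'City': ['New York', 'Paris', 'London', 'London'],
})

# One row per city, one column per aggregate
summary = df.groupby('City')['Age'].agg(['mean', 'count', 'max'])
print(summary)
```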
Renaming Columns
df.rename(columns={'Name': 'Full Name', 'City': 'Location'}, inplace=True)
print(df)
Output:
Full Name Age Location
0 Alice 25.0 New York
1 Bob 0.0 Paris
2 Charlie 35.0 London
Explanation: rename() lets you change column names for clarity.
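rename() also works without inplace=True: it then returns a new DataFrame and leaves the original untouched, which lets you chain calls. Assigning a list to df.columns replaces every name at once. A minimal sketch with illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice'], 'City': ['New York']})

# Returns a renamed copy; df itself keeps its old column names
renamed = df.rename(columns={'Name': 'Full Name', 'City': 'Location'})
print(renamed.columns.tolist())   # ['Full Name', 'Location']
print(df.columns.tolist())        # ['Name', 'City']

# Replace all column names at once (the list length must match)
df.columns = ['name', 'city']
print(df.columns.tolist())
```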
Setting and Resetting the Index
df.set_index('Full Name', inplace=True)
print(df)
# Reset index
df.reset_index(inplace=True)
print(df)
Output:
Age Location
Full Name
Alice 25.0 New York
Bob 0.0 Paris
Charlie 35.0 London
Full Name Age Location
0 Alice 25.0 New York
1 Bob 0.0 Paris
2 Charlie 35.0 London
Explanation: You can set a column as the index (row labels) and reset it back to the default integer index.
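Once a column is the index, you can look up rows by its values with .loc. And when sorting or filtering leaves gaps or a shuffled order in the row labels, reset_index(drop=True) discards the old labels and renumbers from 0 instead of turning them back into a column. A sketch on data shaped like the example above:

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25.0, 0.0, 35.0],
    'Location': ['New York', 'Paris', 'London'],
})

# set_index without inplace returns a new DataFrame indexed by name
by_name = df.set_index('Full Name')
print(by_name.loc['Bob', 'Location'])   # Paris

# After sorting, each row keeps its original label
sorted_df = df.sort_values(by='Age')
print(sorted_df.index.tolist())         # [1, 0, 2]

# drop=True throws the old labels away and numbers rows from 0
print(sorted_df.reset_index(drop=True).index.tolist())   # [0, 1, 2]
```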