Python Pandas Crash Course: Analyze Data in 20 Minutes
Pandas is the most popular Python library for data analysis. Here's what you need to know.
Load and Inspect Data
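```python
import pandas as pd

# parse_dates converts the 'date' column to datetime64 while loading
df = pd.read_csv('sales.csv', parse_dates=['date'])

print(df.shape)       # (rows, columns)
df.info()             # column dtypes and non-null counts; prints directly and returns None
print(df.describe())  # summary statistics for numeric columns
print(df.head(5))     # first 5 rows
```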
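Because sales.csv itself isn't part of this post, here is a self-contained sketch that reads a small made-up CSV from a string instead. The columns and values are assumptions for illustration. It also shows `isna().sum()`, a quick way to count missing values per column.

```python
import io

import pandas as pd

# Hypothetical stand-in for sales.csv; column names and values are made up
CSV_TEXT = """order_id,date,product,revenue
1,2024-01-15,Widget,1200.0
2,2024-02-03,Gadget,
3,2024-11-20,Widget,450.0
"""

# parse_dates converts 'date' to datetime64 while reading
df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=['date'])

print(df.shape)         # (rows, columns)
print(df.dtypes)        # one dtype per column
print(df.isna().sum())  # missing values per column (the empty revenue cell is NaN)
```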
Filter and Select
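```python
# Filter rows with a boolean condition
high_sales = df[df['revenue'] > 1000]
q4 = df[df['date'].dt.quarter == 4]  # .dt works because 'date' was parsed as datetime

# Select columns
subset = df[['product', 'revenue', 'date']]

# Multiple conditions: combine with & (and) or | (or), each comparison in parentheses
result = df[(df['revenue'] > 500) & (df['status'] == 'completed')]
```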
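A minimal sketch of the same filters on a tiny hypothetical DataFrame, adding two alternatives pandas provides: `.loc[row_mask, columns]` filters rows and selects columns in one step, and `DataFrame.query` accepts the condition as a string. The data below is made up.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Widget', 'Gizmo'],
    'revenue': [1200, 300, 800, 650],
    'status': ['completed', 'completed', 'pending', 'completed'],
})

# Each comparison needs its own parentheses because & binds more
# tightly than > and ==; Python's plain `and` does not work on Series
mask = (df['revenue'] > 500) & (df['status'] == 'completed')

# .loc takes the row mask and a column list in a single step
result = df.loc[mask, ['product', 'revenue']]

# DataFrame.query expresses the same filter as a string expression
same = df.query("revenue > 500 and status == 'completed'")[['product', 'revenue']]

# isin keeps rows whose value is in a list of allowed values
widgets_or_gizmos = df[df['product'].isin(['Widget', 'Gizmo'])]
```

`query` can read more naturally for long conditions, while a boolean mask is easier to build up programmatically.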
Group and Aggregate
```python
# Monthly sales summary
monthly = df.groupby(df['date'].dt.to_period('M')).agg({
    'revenue': 'sum',
    'orders': 'count',   # number of non-null 'orders' values per month
    'profit': 'mean'
}).reset_index()

# Top 10 products by total revenue
top = (df.groupby('product')['revenue']
         .sum()
         .sort_values(ascending=False)
         .head(10))
```
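The dict passed to `.agg()` names each output column after its input column. Named aggregation (`output_name=(column, function)`) lets you pick clearer names and aggregate one column in several ways, and `Series.nlargest` is a shorter spelling of the sort-then-head pattern. A sketch on hypothetical data; the output column names are illustrative choices.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-05', '2024-01-20', '2024-02-11']),
    'product': ['Widget', 'Gadget', 'Widget'],
    'revenue': [100.0, 250.0, 175.0],
    'profit': [20.0, 60.0, 35.0],
})

month = df['date'].dt.to_period('M')  # keeps the name 'date'

# Named aggregation: output column = (input column, aggregation function)
monthly = (df.groupby(month)
             .agg(total_revenue=('revenue', 'sum'),
                  order_count=('revenue', 'count'),  # non-null values per group
                  avg_profit=('profit', 'mean'))
             .reset_index())

# Same result as .sort_values(ascending=False).head(n)
top = df.groupby('product')['revenue'].sum().nlargest(2)
```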
Clean Messy Data
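```python
# Fix data types first, so unparseable prices become NaN before filling
df['price'] = pd.to_numeric(df['price'], errors='coerce')
df['date'] = pd.to_datetime(df['date'])  # redundant if parse_dates was used on load

# Handle missing values: assign the result back, because fillna(inplace=True)
# on a column is chained assignment and stops updating df under Copy-on-Write
df['price'] = df['price'].fillna(df['price'].median())
df.dropna(subset=['customer_id'], inplace=True)

# Remove duplicates
df.drop_duplicates(subset=['order_id'], keep='first', inplace=True)

# Standardize strings
df['status'] = df['status'].str.lower().str.strip()
```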
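Order matters when cleaning: `pd.to_numeric(..., errors='coerce')` turns unparseable entries into NaN, so coercing first means the median is computed on real numbers and the NaNs created by coercion get filled too. The hypothetical `clean_sales` helper below applies the steps in that order to a copy of a small made-up DataFrame, leaving the input untouched.

```python
import pandas as pd


def clean_sales(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: runs the cleaning steps on a copy of raw."""
    df = raw.copy()
    # 1. Coerce to numbers; unparseable values such as 'n/a' become NaN
    df['price'] = pd.to_numeric(df['price'], errors='coerce')
    # 2. Fill every gap (original and coerced) with the median valid price
    df['price'] = df['price'].fillna(df['price'].median())
    # 3. Normalize status labels
    df['status'] = df['status'].str.strip().str.lower()
    # 4. Drop rows without a customer, then keep the first row per order_id
    df = df.dropna(subset=['customer_id'])
    return df.drop_duplicates(subset=['order_id'], keep='first')


# Hypothetical messy input
raw = pd.DataFrame({
    'order_id': [1, 2, 2, 3, 4],
    'customer_id': ['A', 'B', 'B', None, 'C'],
    'price': ['10.5', 'n/a', 'n/a', '7', None],
    'status': [' Completed', 'PENDING', 'PENDING', 'completed', 'Completed '],
})

cleaned = clean_sales(raw)
```

Here the valid prices are 10.5 and 7, so every missing or unparseable price becomes their median, 8.75.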
Export Results
```python
# To Excel with multiple sheets (requires the openpyxl package)
with pd.ExcelWriter('report.xlsx', engine='openpyxl') as writer:
    monthly.to_excel(writer, sheet_name='Monthly', index=False)
    top.to_excel(writer, sheet_name='Top Products')  # keep the index: it holds product names

# To CSV without the row index
df.to_csv('cleaned_data.csv', index=False)
```
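To see why `index=False` matters, this sketch writes a small made-up DataFrame to a temporary file twice and reads it back. With the default `index=True`, the row index returns as an extra `Unnamed: 0` column. The temporary-directory setup only keeps the example self-contained.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical cleaned data
df = pd.DataFrame({
    'order_id': [1, 2],
    'date': pd.to_datetime(['2024-01-15', '2024-02-03']),
    'revenue': [1200.0, 450.0],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'cleaned_data.csv'

    # Default index=True writes the row index as an unnamed first column
    df.to_csv(path)
    with_index = pd.read_csv(path)

    # index=False writes only the data columns; parse_dates restores datetimes
    df.to_csv(path, index=False)
    round_trip = pd.read_csv(path, parse_dates=['date'])
```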
Loading, filtering, grouping, cleaning, and exporting cover most routine tabular data work in pandas. These patterns are all you need to get started.