DEV Community

Brad

Python Pandas Crash Course: Analyze Data in 20 Minutes

Pandas is the most popular Python library for data analysis. Here's what you need to know.

Load and Inspect Data

import pandas as pd

df = pd.read_csv('sales.csv', parse_dates=['date'])
print(df.shape)        # (rows, cols)
df.info()              # Prints column types and null counts (returns None)
print(df.describe())   # Statistical summary
print(df.head(5))      # First 5 rows
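
To try the inspection calls without a `sales.csv` on disk, here is a minimal sketch that reads a small CSV from an in-memory string. The column names and values are made up for illustration and are not from any real dataset.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for sales.csv
csv_text = """date,product,revenue,orders,status
2024-01-15,Widget,1200.0,3,completed
2024-02-03,Gadget,450.0,1,pending
2024-11-20,Widget,980.0,2,completed
"""

# read_csv accepts any file-like object, not only a path
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])

print(df.shape)    # (rows, cols)
print(df.dtypes)   # 'date' is parsed as a datetime column
df.info()          # prints its report directly, so no print() is needed
```

Checking `df.dtypes` right after loading is a cheap way to confirm that `parse_dates` worked before using `.dt` accessors later on.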

Filter and Select

# Filter rows
high_sales = df[df['revenue'] > 1000]
q4 = df[df['date'].dt.quarter == 4]

# Select columns
subset = df[['product', 'revenue', 'date']]

# Multiple conditions
result = df[(df['revenue'] > 500) & (df['status'] == 'completed')]
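
A few other filtering idioms are worth knowing alongside the boolean masks above. This is a sketch on made-up data; `.loc`, `isin`, `between`, and `query` are standard pandas calls, but the columns and values here are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Widget', 'Doohickey'],
    'revenue': [1200.0, 450.0, 980.0, 700.0],
    'status': ['completed', 'pending', 'completed', 'completed'],
})

# Filter rows and select columns in one step with .loc
completed = df.loc[df['status'] == 'completed', ['product', 'revenue']]

# Match any value from a list
widgets_or_gadgets = df[df['product'].isin(['Widget', 'Gadget'])]

# Range check (both bounds inclusive by default)
mid_range = df[df['revenue'].between(500, 1000)]

# The multi-condition filter written as a query string
result = df.query("revenue > 500 and status == 'completed'")
```

When combining masks with `&` or `|`, each condition needs its own parentheses, because those operators bind more tightly than comparisons like `>` and `==`.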

Group and Aggregate

# Monthly sales summary
monthly = df.groupby(df['date'].dt.to_period('M')).agg({
    'revenue': 'sum',
    'orders': 'count',
    'profit': 'mean'
}).reset_index()

# Top 10 products
top = (df.groupby('product')['revenue']
    .sum()
    .sort_values(ascending=False)
    .head(10))
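
If you want descriptive output column names, named aggregation is one option. The sketch below uses made-up data, and the `order_id` column and the output names (`total_revenue`, `order_count`, `avg_profit`) are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-05', '2024-01-20', '2024-02-10']),
    'order_id': [101, 102, 103],
    'revenue': [100.0, 300.0, 50.0],
    'profit': [20.0, 60.0, 10.0],
})

monthly = (
    df.groupby(df['date'].dt.to_period('M'))
      .agg(
          total_revenue=('revenue', 'sum'),
          order_count=('order_id', 'count'),  # counts non-null order_id values
          avg_profit=('profit', 'mean'),
      )
      .reset_index()
)
print(monthly)
```

Note that `'count'` counts non-null values in the chosen column, not rows. If the column can contain missing values and you want the number of rows per group, `'size'` is the aggregation to use.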

Clean Messy Data

# Fix data types first, so the numeric operations below work
df['price'] = pd.to_numeric(df['price'], errors='coerce')
df['date'] = pd.to_datetime(df['date'])

# Handle missing values (including any NaN created by errors='coerce')
df['price'] = df['price'].fillna(df['price'].median())
df.dropna(subset=['customer_id'], inplace=True)

# Remove duplicates
df.drop_duplicates(subset=['order_id'], keep='first', inplace=True)

# Standardize strings
df['status'] = df['status'].str.lower().str.strip()
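
The order of cleaning steps matters: a median can only be computed once the column is numeric, and coercing bad values creates new gaps that still need filling. Here is a small sketch on made-up messy data showing one workable order. The data and the method-chain style are illustrative choices, not the only way to do it.

```python
import pandas as pd

# Hypothetical messy data for illustration
df = pd.DataFrame({
    'order_id': [1, 1, 2, 3],
    'customer_id': ['a', 'a', None, 'c'],
    'price': ['10', '10', 'n/a', '30'],
    'status': [' Completed', ' Completed', 'PENDING ', 'completed'],
})

# 1. Convert to numeric first; the unparseable 'n/a' becomes NaN
df['price'] = pd.to_numeric(df['price'], errors='coerce')

# 2. Fill the gaps with the median of the now-numeric column
df['price'] = df['price'].fillna(df['price'].median())

# 3. Drop rows without a customer, remove duplicate orders, tidy strings
df = (
    df.dropna(subset=['customer_id'])
      .drop_duplicates(subset=['order_id'], keep='first')
      .assign(status=lambda d: d['status'].str.lower().str.strip())
      .reset_index(drop=True)
)
print(df)
```

Assigning the result back (`df['price'] = ...`) instead of calling `fillna(..., inplace=True)` on a selected column avoids the chained-assignment pitfalls that newer pandas versions warn about.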

Export Results

# To Excel with multiple sheets (requires the openpyxl package)
with pd.ExcelWriter('report.xlsx', engine='openpyxl') as writer:
    monthly.to_excel(writer, sheet_name='Monthly', index=False)
    top.to_excel(writer, sheet_name='Top Products')

df.to_csv('cleaned_data.csv', index=False)
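
A quick round trip is a simple way to check that an export reads back as expected. This sketch writes to an in-memory buffer rather than a file so it leaves nothing on disk; the data is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical cleaned data for illustration
df = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-05', '2024-02-10']),
    'product': ['Widget', 'Gadget'],
    'revenue': [100.0, 50.0],
})

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False avoids an extra unnamed column
buffer.seek(0)                   # rewind before reading back

# CSV stores dates as plain text, so parse_dates is needed again on load
restored = pd.read_csv(buffer, parse_dates=['date'])
print(restored.dtypes)
```

CSV does not preserve column types, which is why the date column has to be parsed again when the file is loaded.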

These five patterns (load, filter, aggregate, clean, export) cover a large share of everyday data tasks, and they are enough to get started with Pandas.


Save hours of manual work! I built a complete Python Business Automation Toolkit with ready-to-use scripts for invoicing, reporting, data pipelines, and more.

Get the Python Business Automation Toolkit ($9)

Templates for invoices, email reports, file organization, database queries, and 20+ more automations.
