A focused 2-day roadmap to master Seaborn for data analytics using the tips dataset.
This guide includes setup, explanations, code examples, pitfalls, checkpoints, and a mini-project.
By the end, you’ll have job-ready visualization skills.
Source Code: Click here
Written By: Nivesh Bansal Linkedin GitHub Instagram
Seaborn Roadmap for Data Analytics (Using tips)
Goal: Master Seaborn’s essentials in 2 days (or one power-day).
This roadmap covers every necessary topic with explanations, code snippets, and practice tasks—focused purely on practical analytics.
Key Outcomes:
- Dataset: sns.load_dataset("tips")
- Focus: EDA • Storytelling • Clean visuals
- Result: Job-ready plotting skills
Tip: Don’t memorize. For each topic:
- Run the example
- Tweak 2–3 parameters
- Write one insight in plain English
Requirements:
- Python ≥ 3.9
- Libraries: pandas, numpy, matplotlib, seaborn
- IDE: Jupyter/Colab or any Python IDE
0) Quick Setup
pip install seaborn matplotlib pandas numpy
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
sns.set_theme(style="whitegrid", context="notebook")
tips = sns.load_dataset("tips")
tips.head()
Dataset Columns:
total_bill, tip, sex, smoker, day, time, size
📅 Day 1 — Foundations & Core EDA
Goal: Understand Seaborn’s API, explore distributions, compare categories, and scan pairwise relationships quickly.
1) Seaborn Basics: Figure-level vs Axes-level
- Axes-level → e.g., sns.scatterplot (draws on a Matplotlib Axes, returns the Axes)
- Figure-level → e.g., sns.catplot, sns.pairplot (manages its own figure and layout)
- Common params: data=, x=, y=, hue=, style=, size=
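The difference shows up most clearly in what each call returns. Here is a minimal sketch, assuming a small made-up frame (demo, with tips-like column names but invented values) so it runs without downloading anything:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in Jupyter/Colab
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in with tips-like column names (illustrative values only)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, 40),
    "time": np.tile(["Lunch", "Dinner"], 20),
})
demo["tip"] = 0.15 * demo["total_bill"] + rng.uniform(0, 2, 40)

# Axes-level: draws on the Axes you pass in and returns that same Axes
fig, ax = plt.subplots(figsize=(5, 4))
returned = sns.scatterplot(data=demo, x="total_bill", y="tip", ax=ax)
print(returned is ax)        # True

# Figure-level: creates its own figure and returns a grid object
g = sns.relplot(data=demo, x="total_bill", y="tip", col="time")
print(type(g).__name__)      # FacetGrid
print(g.axes.shape)          # (1, 2): one row, one column per time level
```

With the real dataset, replace demo with tips. The practical consequence: pass ax= to axes-level functions when you build your own subplot grid, and use the returned grid object (g) to adjust figure-level plots.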
2) Univariate Distributions
Use these to understand shape, center, spread, and outliers:
- histplot: histogram (+ KDE option)
- kdeplot: kernel density estimate
- ecdfplot: empirical CDF (great for medians & quantiles)
- countplot: frequency for categorical variables
# Histogram & KDE
sns.histplot(tips, x="total_bill", bins=20, kde=True)
plt.title("Distribution of Total Bill")
plt.show()
# ECDF
sns.ecdfplot(tips, x="tip")
plt.title("ECDF of Tip")
plt.show()
# Count
sns.countplot(data=tips, x="day")
plt.title("Count by Day")
plt.show()
When to use: sanity checks, skewness, choosing transforms, spotting outliers.
3) Categorical ↔ Numerical
Compare distributions across groups:
- boxplot: median, IQR, whiskers, outliers
- violinplot: full distribution via KDE
- boxenplot: for large samples
- stripplot/swarmplot: raw points
- barplot/pointplot: aggregated means/CI
# Box vs Violin
fig, ax = plt.subplots(1,2, figsize=(10,4))
sns.boxplot(data=tips, x="day", y="total_bill", ax=ax[0])
sns.violinplot(data=tips, x="day", y="tip", ax=ax[1])
ax[0].set_title("Total Bill by Day")
ax[1].set_title("Tip by Day")
plt.tight_layout(); plt.show()
# Strip plot
sns.stripplot(data=tips, x="smoker", y="tip", jitter=True)
plt.title("Raw Tips by Smoker")
plt.show()
# Mean with CI
sns.barplot(data=tips, x="sex", y="tip", estimator="mean", errorbar=("ci", 95))  # seaborn >= 0.12; older versions use ci=95
plt.title("Avg Tip by Sex")
plt.show()
Tip: Overlay a stripplot on a violinplot to show the distribution shape and the raw points together.
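One way to layer raw points over a violin is to draw both on the same Axes. The sketch below assumes a made-up frame with day and tip columns; violin_with_points is a hypothetical helper name:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in Jupyter/Colab
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

def violin_with_points(df, x, y, ax=None):
    """Overlay raw observations on a violin plot and return the Axes."""
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 4))
    order = sorted(df[x].unique())  # same category order for both layers
    # inner=None hides the mini box inside the violin so the points stay readable
    sns.violinplot(data=df, x=x, y=y, order=order, inner=None,
                   color="lightgray", ax=ax)
    sns.stripplot(data=df, x=x, y=y, order=order, color="black",
                  size=3, alpha=0.6, ax=ax)
    return ax

# Made-up stand-in with tips-like column names (illustrative values only)
rng = np.random.default_rng(3)
demo = pd.DataFrame({
    "day": np.tile(["Thur", "Fri", "Sat", "Sun"], 15),
    "tip": rng.gamma(shape=4.0, scale=0.75, size=60),
})

ax = violin_with_points(demo, "day", "tip")
ax.set_title("Tip by Day (violin + raw points)")
```

With the real data, call violin_with_points(tips, "day", "tip"). The helper sorts categories alphabetically; pass your own order if weekday order matters.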
4) Numeric ↔ Numeric Relationships
Start with scatterplots, optionally add regression.
# Basic scatter
sns.scatterplot(data=tips, x="total_bill", y="tip")
plt.title("Total Bill vs Tip")
plt.show()
# Add hue/style/size
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="sex", style="smoker", size="size")
plt.title("Bill vs Tip by Sex/Smoker/Size")
plt.show()
# Trend line
sns.regplot(data=tips, x="total_bill", y="tip", scatter_kws={"alpha":0.6})
plt.title("Trend: Tip ~ Total Bill")
plt.show()
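regplot draws the fitted line but does not report its coefficients. If you want the slope for a written insight, one option is a degree-1 least-squares fit with np.polyfit. This is a sketch on made-up data; linear_trend is a hypothetical helper:

```python
import numpy as np
import pandas as pd

def linear_trend(df: pd.DataFrame, x: str, y: str) -> tuple:
    """Least-squares slope and intercept of y on x (degree-1 polynomial fit)."""
    slope, intercept = np.polyfit(df[x].to_numpy(), df[y].to_numpy(), deg=1)
    return float(slope), float(intercept)

# Made-up data: tips built as roughly $1 + 10% of the bill, plus noise
rng = np.random.default_rng(2)
bills = rng.uniform(5, 50, 80)
demo = pd.DataFrame({
    "total_bill": bills,
    "tip": 1.0 + 0.10 * bills + rng.normal(0, 0.5, 80),
})

slope, intercept = linear_trend(demo, "total_bill", "tip")
print(f"Each extra $1 on the bill goes with about ${slope:.2f} more tip")
```

On the real data, linear_trend(tips, "total_bill", "tip") gives the numbers to quote next to the regplot.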
5) Fast Pairwise Scans
sns.pairplot(tips, hue="sex", diag_kind="hist")
plt.suptitle("Pairwise Relationships (tips)", y=1.02)
plt.show()
✅ Checkpoint (Day 1 done): You can read distributions, compare groups, and see pairwise trends. Write 3 insights from the dataset.
📅 Day 2 — Multivariate, Facets, Correlations & Pro Styling
Goal: Add faceting, correlations, palettes, and create presentation-ready visuals.
6) Faceting & Small Multiples
Split data into subplots by category.
# Facet by smoker
sns.catplot(data=tips, x="day", y="tip", hue="sex", col="smoker", kind="bar")
plt.suptitle("Tips by Day (faceted by Smoker)", y=1.02)
plt.show()
# Scatter with facets
sns.relplot(data=tips, x="total_bill", y="tip", hue="sex", col="time", kind="scatter")
plt.show()
Facets make comparisons obvious without clutter.
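Figure-level functions return a FacetGrid, which has its own methods for titles and axis labels. Here is a small sketch on made-up data (column names mirror tips, values are invented):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in Jupyter/Colab
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in with tips-like column names (illustrative values only)
rng = np.random.default_rng(4)
demo = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, 40),
    "smoker": np.tile(["No", "Yes"], 20),
})
demo["tip"] = 0.15 * demo["total_bill"] + rng.uniform(0, 2, 40)

g = sns.relplot(data=demo, x="total_bill", y="tip", col="smoker",
                col_order=["No", "Yes"], kind="scatter", height=3.5)
g.set_titles(col_template="Smoker: {col_name}")   # one title per facet
g.set_axis_labels("Total Bill ($)", "Tip ($)")    # shared axis labels
g.figure.suptitle("Tip vs Total Bill, faceted by smoker", y=1.03)

titles = [ax.get_title() for ax in g.axes.flat]
print(titles)   # ['Smoker: No', 'Smoker: Yes']
```

Setting titles through the grid keeps every panel consistent, which is harder to guarantee when you label subplots one at a time.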
7) Correlations & Heatmaps
corr = tips[["total_bill","tip","size"]].corr()
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", square=True)
plt.title("Correlation (tips)")
plt.show()
# Clustered heatmap (clustermap needs scipy installed)
sns.clustermap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.show()
Reading the values: close to ±1 → strong linear relationship; close to 0 → weak or no linear relationship (a nonlinear pattern can still be present).
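A correlation near 0 only rules out a straight-line relationship. A tiny constructed example (synthetic numbers, not from tips) makes the point:

```python
import numpy as np
import pandas as pd

x = pd.Series(np.linspace(-3, 3, 61))
line = 2 * x + 1     # perfectly linear in x
curve = x ** 2       # fully determined by x, but not linear

r_line = x.corr(line)    # Pearson correlation (pandas default)
r_curve = x.corr(curve)
print(f"linear: {r_line:.3f}, curved: {abs(r_curve):.3f}")
```

The curved series depends entirely on x, yet its Pearson correlation is essentially zero. That is why a scatterplot is worth checking alongside the heatmap.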
8) Time/Ordered Trends
avg = tips.groupby("size", as_index=False)["tip"].mean()
sns.lineplot(data=avg, x="size", y="tip")
plt.title("Average Tip by Party Size")
plt.show()
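A mean per group can mislead when some groups contain only a handful of rows. One hedged way to guard against that is to carry the group counts alongside the means. mean_with_counts is a hypothetical helper, and the frame below is made up:

```python
import pandas as pd

def mean_with_counts(df: pd.DataFrame, by: str, value: str) -> pd.DataFrame:
    """Mean of `value` per group, plus how many rows each mean is based on."""
    return (df.groupby(by, as_index=False)
              .agg(mean_value=(value, "mean"), n=(value, "count")))

# Made-up stand-in with tips-like column names (illustrative values only)
demo = pd.DataFrame({
    "size": [1, 2, 2, 2, 3, 3, 6],
    "tip":  [1.5, 2.0, 3.0, 2.5, 3.5, 4.5, 5.0],
})

summary = mean_with_counts(demo, "size", "tip")
print(summary)
```

On the real data, mean_with_counts(tips, "size", "tip") shows which party sizes have enough rows to trust. The result can go straight into sns.lineplot with x="size", y="mean_value".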
9) Styling, Palettes & Layout
sns.set_theme(style="whitegrid", context="talk", palette="deep")
ax = sns.scatterplot(data=tips, x="total_bill", y="tip", hue="sex")
ax.set_title("Tips vs Total Bill")
ax.set_xlabel("Total Bill ($)")
ax.set_ylabel("Tip ($)")
sns.despine()
plt.tight_layout(); plt.show()
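sns.set_theme changes the defaults for every later plot in the session. For a one-off figure, sns.axes_style and sns.plotting_context can be used as context managers instead. The sketch below uses made-up data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in Jupyter/Colab
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in with tips-like column names (illustrative values only)
rng = np.random.default_rng(5)
demo = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, 40),
    "sex": np.tile(["Female", "Male"], 20),
})
demo["tip"] = 0.15 * demo["total_bill"] + rng.uniform(0, 2, 40)

# Style and context apply only to figures created inside the with-block
with sns.axes_style("ticks"), sns.plotting_context("talk"):
    fig, ax = plt.subplots(figsize=(6, 4))
    sns.scatterplot(data=demo, x="total_bill", y="tip", hue="sex",
                    palette="colorblind", ax=ax)
    ax.set(title="Tips vs Total Bill",
           xlabel="Total Bill ($)", ylabel="Tip ($)")
    sns.despine(ax=ax)   # hide the top and right spines
    fig.tight_layout()
```

Plots created after the block fall back to whatever theme was active before it.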
10) Legends, Annotations & Saving
ax = sns.regplot(data=tips, x="total_bill", y="tip")
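ax.annotate("Higher tips with higher bills",
            xy=(40, 7), xytext=(25, 8.5),
            arrowprops=dict(arrowstyle="->", color="black"))  # black stays visible on a light background
# regplot adds no legend by default; remove one only if it exists
if ax.get_legend() is not None:
    ax.get_legend().remove()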
plt.tight_layout()
plt.savefig("tips_scatter.png", dpi=300, bbox_inches="tight", transparent=True)
plt.show()
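When a plot does have a legend (for example, a scatterplot with hue), sns.move_legend can reposition it without rebuilding it. This sketch uses made-up data and writes to a temporary folder, so the file name is only an example:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in Jupyter/Colab
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in with tips-like column names (illustrative values only)
rng = np.random.default_rng(6)
demo = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, 40),
    "sex": np.tile(["Female", "Male"], 20),
})
demo["tip"] = 0.15 * demo["total_bill"] + rng.uniform(0, 2, 40)

fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(data=demo, x="total_bill", y="tip", hue="sex", ax=ax)

# Place the legend outside the plotting area and give it a readable title
sns.move_legend(ax, "upper left", bbox_to_anchor=(1.02, 1),
                title="Sex", frameon=False)

# bbox_inches="tight" keeps the outside legend inside the saved image
out_path = os.path.join(tempfile.mkdtemp(), "tips_scatter_demo.png")
fig.savefig(out_path, dpi=300, bbox_inches="tight", transparent=True)
print(os.path.exists(out_path))   # True
```

In your own project, save to a path you choose instead of a temporary folder.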
11) Cheat-Sheet: Axes vs Figure Level
Axes-level: scatterplot, lineplot, histplot, kdeplot, boxplot, violinplot, heatmap, regplot…
Use when you manage subplots manually.
Figure-level: relplot, catplot, jointplot, pairplot, lmplot…
Use for quick grids/facets and auto layouts.
12) Common Pitfalls
- Overplotting → use alpha, hexbin, or kdeplot
- Don’t rely on defaults → always set titles/labels
- For groups → prefer violin/box + strip over bar means
- Keep consistent color semantics
✅ Checkpoint (Day 2 done): You can facet, compare multivariate trends, style for clarity, and export.
Mini-Project (Deliverable)
Question: What factors drive higher tips?
Steps:
- Univariate: distribution of total_bill, tip
- Groups: tip by day, sex, smoker, time
- Relationship: total_bill ↔ tip (add hue & regression)
- Correlation heatmap for numeric vars
- Facet by smoker/time
- Report: 5 insights + 2 charts for LinkedIn/portfolio
import numpy as np
tips = sns.load_dataset("tips").assign(tip_pct=lambda d: d["tip"] / d["total_bill"] * 100)
# 1) Distribution
sns.histplot(tips, x="tip_pct", bins=20, kde=True)
plt.title("Tip % Distribution"); plt.show()
# 2) Groups
sns.boxplot(data=tips, x="day", y="tip_pct", hue="smoker")
plt.title("Tip % by Day & Smoker"); plt.show()
# 3) Relationship with hue
sns.scatterplot(data=tips, x="total_bill", y="tip_pct", hue="time", style="sex")
plt.title("Tip % vs Total Bill by Time/Sex"); plt.show()
# 4) Correlation
num = tips[["total_bill","tip","size","tip_pct"]]
sns.heatmap(num.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation (with Tip %)"); plt.show()
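For the written insights, a small summary table per group is often easier to quote than a chart. This is a sketch with a hypothetical helper (tip_pct_summary) and a four-row made-up frame, so the arithmetic is easy to check by hand:

```python
import pandas as pd

def tip_pct_summary(df: pd.DataFrame, by: str) -> pd.DataFrame:
    """Median, mean and row count of tip % per group, highest median first."""
    out = df.assign(tip_pct=df["tip"] / df["total_bill"] * 100)
    return (out.groupby(by, observed=True)["tip_pct"]
               .agg(["median", "mean", "count"])
               .sort_values("median", ascending=False))

# Made-up rows: Lunch tips are 20% of the bill, Dinner tips are 10%
demo = pd.DataFrame({
    "total_bill": [10.0, 20.0, 40.0, 50.0],
    "tip":        [2.0, 4.0, 4.0, 5.0],
    "time":       ["Lunch", "Lunch", "Dinner", "Dinner"],
})

summary = tip_pct_summary(demo, "time")
print(summary.round(1))
```

On the real data, run tip_pct_summary(tips, "day") (or "smoker", "time", "sex") and turn the top and bottom rows into plain-English insights.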
Practice Checklist
- Plot hist+KDE for total_bill; describe skewness
- Compare tip across day using box+strip
- Scatter total_bill vs tip with hue=sex, style=smoker
- Create pairplot with hue=time
- Build correlation heatmap; write 2 interpretations
- Facet bar chart by smoker and time
- Export one figure at 300 DPI with transparent background
Quick Reference
Most-used APIs:
- scatterplot, lineplot, histplot, kdeplot, ecdfplot
- boxplot, violinplot, stripplot, barplot, pointplot
- pairplot, jointplot, relplot, catplot
- heatmap, clustermap, regplot
Styling:
- sns.set_theme(style, palette, context)
- sns.despine(), plt.tight_layout()
- Palettes: deep, muted, pastel, bright, dark, colorblind
Written for: Nivesh Bansal — Data Analytics Journey, Day 10.
You can copy any code block and practice directly.
Happy plotting!