A focused 2-day roadmap to master Seaborn for data analytics using the tips dataset.
This guide includes setup, explanations, code examples, pitfalls, checkpoints, and a mini-project.
By the end, you’ll have job-ready visualization skills.
Source Code: Click here
Written By: Nivesh Bansal Linkedin GitHub Instagram
Seaborn Roadmap for Data Analytics (Using tips)
Goal: Master Seaborn’s essentials in 2 days (or one power-day).
This roadmap covers every necessary topic with explanations, code snippets, and practice tasks—focused purely on practical analytics.
Key Outcomes:
- Dataset: sns.load_dataset("tips")
- Focus: EDA • Storytelling • Clean visuals
- Result: Job-ready plotting skills
Tip: Don’t memorize. For each topic:
- Run the example
- Tweak 2–3 parameters
- Write one insight in plain English
Requirements:
- Python ≥ 3.9
- Libraries: pandas, numpy, matplotlib, seaborn
- IDE: Jupyter/Colab or any Python IDE
0) Quick Setup
pip install seaborn matplotlib pandas numpy
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
sns.set_theme(style="whitegrid", context="notebook")
tips = sns.load_dataset("tips")
tips.head()
Dataset Columns: total_bill, tip, sex, smoker, day, time, size
📅 Day 1 — Foundations & Core EDA
Goal: Understand Seaborn’s API, explore distributions, compare categories, and scan pairwise relationships quickly.
1) Seaborn Basics: Figure-level vs Axes-level
- Axes-level → e.g., sns.scatterplot (draws on a Matplotlib Axes, returns the Axes)
- Figure-level → e.g., sns.catplot, sns.pairplot (manages its own figure/layout)
- Common params: data=, x=, y=, hue=, style=, size=
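To make the axes-level vs figure-level split concrete, here is a minimal sketch. It uses a tiny hand-made DataFrame (`toy`, a hypothetical stand-in that only borrows column names from tips) and the headless Agg backend so it runs without a display; both are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in rows; column names mirror the tips dataset
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.5, 3.0, 4.0, 6.5],
    "time": ["Lunch", "Lunch", "Dinner", "Dinner"],
})

# Axes-level: draws on the Axes you pass in and returns that same Axes
fig, ax = plt.subplots()
returned = sns.scatterplot(data=toy, x="total_bill", y="tip", ax=ax)

# Figure-level: builds its own figure and returns a FacetGrid
grid = sns.relplot(data=toy, x="total_bill", y="tip", col="time")

print(returned is ax)        # the function handed back the Axes it drew on
print(type(grid).__name__)   # the grid object that owns the new figure
print(grid.axes.shape)       # one row, one column per value of "time"
```

In practice this means axes-level functions accept `ax=` and slot into `plt.subplots` layouts, while figure-level functions take `col=`/`row=` and are customized through the returned grid.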
2) Univariate Distributions
Use these to understand shape, center, spread, and outliers:
- histplot: histogram (with an optional KDE overlay)
- kdeplot: kernel density estimate
- ecdfplot: empirical CDF (great for medians & quantiles)
- countplot: frequency for categorical variables
# Histogram & KDE
sns.histplot(tips, x="total_bill", bins=20, kde=True)
plt.title("Distribution of Total Bill")
plt.show()
# ECDF
sns.ecdfplot(tips, x="tip")
plt.title("ECDF of Tip")
plt.show()
# Count
sns.countplot(data=tips, x="day")
plt.title("Count by Day")
plt.show()
When to use: sanity checks, skewness, choosing transforms, spotting outliers.
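The skewness and outlier checks mentioned above can be backed with numbers as well as pictures. The following is a small sketch on made-up bill amounts (hypothetical values, not real tips rows); it uses the 1.5 × IQR fence, which is also the default whisker rule for box plots.

```python
import pandas as pd

# Hypothetical bill amounts for illustration only
bills = pd.Series([8.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 48.0], name="total_bill")

skew = bills.skew()                      # positive value = longer right tail
q1, q3 = bills.quantile([0.25, 0.75])    # first and third quartiles
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr             # points above this count as high outliers
outliers = bills[bills > upper_fence]

print(f"skew={skew:.2f}, upper fence={upper_fence:.2f}")
print(outliers.tolist())
```

A clearly positive skew is a hint that a log transform or a median-based summary may describe the data better than the mean.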
3) Categorical ↔ Numerical
Compare distributions across groups:
- boxplot: median, IQR, whiskers, outliers
- violinplot: full distribution via KDE
- boxenplot: letter-value plot, suited to large samples
- stripplot / swarmplot: raw points
- barplot / pointplot: aggregated means with confidence intervals
# Box vs Violin
fig, ax = plt.subplots(1,2, figsize=(10,4))
sns.boxplot(data=tips, x="day", y="total_bill", ax=ax[0])
sns.violinplot(data=tips, x="day", y="tip", ax=ax[1])
ax[0].set_title("Total Bill by Day")
ax[1].set_title("Tip by Day")
plt.tight_layout(); plt.show()
# Strip plot
sns.stripplot(data=tips, x="smoker", y="tip", jitter=True)
plt.title("Raw Tips by Smoker")
plt.show()
# Mean with 95% CI (seaborn >= 0.12 uses errorbar=; the older ci= argument is deprecated)
sns.barplot(data=tips, x="sex", y="tip", estimator="mean", errorbar=("ci", 95))
plt.title("Avg Tip by Sex")
plt.show()
Combine violinplot + stripplot to show the distribution and the raw data together.
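One possible way to layer the two plots is to draw both on the same Axes. This is a sketch on a hypothetical `toy` DataFrame (made-up values with tips-style column names); the gray fill, `inner=None`, and point size are styling assumptions, not requirements.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data for illustration
toy = pd.DataFrame({
    "day": ["Thur"] * 5 + ["Fri"] * 5,
    "tip": [1.0, 2.0, 2.5, 3.0, 5.0, 1.5, 2.0, 2.2, 3.5, 4.0],
})

fig, ax = plt.subplots(figsize=(5, 4))
# Violin first: inner=None hides the built-in mini box so the points stay readable
sns.violinplot(data=toy, x="day", y="tip", inner=None, color="lightgray", ax=ax)
# Strip on top: one dot per observation, jittered sideways to reduce overlap
sns.stripplot(data=toy, x="day", y="tip", color="black", size=4, jitter=True, ax=ax)
ax.set_title("Tip by Day: density plus raw points")
fig.tight_layout()
```

Drawing the violin before the strip keeps the dots on top; passing the same `ax` to both calls is what makes them share one plot.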
4) Numeric ↔ Numeric Relationships
Start with scatterplots, optionally add regression.
# Basic scatter
sns.scatterplot(data=tips, x="total_bill", y="tip")
plt.title("Total Bill vs Tip")
plt.show()
# Add hue/style/size
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="sex", style="smoker", size="size")
plt.title("Bill vs Tip by Sex/Smoker/Size")
plt.show()
# Trend line
sns.regplot(data=tips, x="total_bill", y="tip", scatter_kws={"alpha":0.6})
plt.title("Trend: Tip ~ Total Bill")
plt.show()
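regplot draws the trend line but returns only the Axes, so the slope is not available from the plot itself. If you want the numbers, one option is to fit the same kind of straight line with `np.polyfit`. The sketch below uses made-up values constructed to lie exactly on a line, so the recovered coefficients are easy to check.

```python
import numpy as np

# Hypothetical values built so that tip = 0.5 + 0.15 * bill exactly
bill = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
tip = np.array([2.0, 3.5, 5.0, 6.5, 8.0])

# Degree-1 least-squares fit: returns the slope first, then the intercept
slope, intercept = np.polyfit(bill, tip, deg=1)
r = np.corrcoef(bill, tip)[0, 1]  # Pearson correlation between the two arrays

print(f"tip ≈ {intercept:.2f} + {slope:.2f} * total_bill (r = {r:.2f})")
```

With real data the points will scatter around the line, and the slope reads as "extra tip per extra dollar of bill".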
5) Fast Pairwise Scans
sns.pairplot(tips, hue="sex", diag_kind="hist")
plt.suptitle("Pairwise Relationships (tips)", y=1.02)
plt.show()
✅ Checkpoint (Day 1 done): You can read distributions, compare groups, and see pairwise trends. Write 3 insights from the dataset.
📅 Day 2 — Multivariate, Facets, Correlations & Pro Styling
Goal: Add faceting, correlations, palettes, and create presentation-ready visuals.
6) Faceting & Small Multiples
Split data into subplots by category.
# Facet by smoker
sns.catplot(data=tips, x="day", y="tip", hue="sex", col="smoker", kind="bar")
plt.suptitle("Tips by Day (faceted by Smoker)", y=1.02)
plt.show()
# Scatter with facets
sns.relplot(data=tips, x="total_bill", y="tip", hue="sex", col="time", kind="scatter")
plt.show()
Facets make comparisons obvious without clutter.
7) Correlations & Heatmaps
corr = tips[["total_bill","tip","size"]].corr()
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", square=True)
plt.title("Correlation (tips)")
plt.show()
# Clustered heatmap
sns.clustermap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.show()
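Read values: close to ±1 → strong linear relation; close to 0 → weak or no linear relation (a nonlinear pattern can still be present, so check the scatterplot too).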
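A quick way to see why a correlation near 0 only rules out a linear relation: in the synthetic example below (made-up columns, not tips data), `squared` is fully determined by `x`, yet their Pearson correlation is zero because `x` is symmetric around zero.

```python
import numpy as np
import pandas as pd

x = np.arange(-5, 6)  # -5 .. 5, symmetric around zero
df = pd.DataFrame({
    "x": x,
    "linear": 2 * x + 1,   # perfect straight-line relation with x
    "squared": x ** 2,     # perfect but nonlinear relation with x
})

corr = df.corr()  # Pearson correlation by default
print(corr.round(2))
```

The heatmap for this frame would show 1.00 for `x` vs `linear` and 0.00 for `x` vs `squared`, even though both columns depend entirely on `x`.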
8) Time/Ordered Trends
avg = tips.groupby("size", as_index=False)["tip"].mean()
sns.lineplot(data=avg, x="size", y="tip")
plt.title("Average Tip by Party Size")
plt.show()
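Before reading much into a line of group means, it can help to check how many rows sit behind each point, since a mean over one or two parties is fragile. Here is a sketch with hypothetical values (the column names follow tips, the numbers do not):

```python
import pandas as pd

# Hypothetical stand-in rows for illustration
toy = pd.DataFrame({
    "size": [1, 2, 2, 2, 3, 3, 6],
    "tip": [1.0, 2.0, 3.0, 4.0, 3.0, 5.0, 9.0],
})

# Named aggregation: average tip and number of rows per party size
summary = toy.groupby("size")["tip"].agg(mean="mean", n="count").reset_index()
print(summary)
```

Groups with a very small `n` are worth flagging in the chart title or an annotation rather than presenting them as a firm trend.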
9) Styling, Palettes & Layout
sns.set_theme(style="whitegrid", context="talk", palette="deep")
ax = sns.scatterplot(data=tips, x="total_bill", y="tip", hue="sex")
ax.set_title("Tips vs Total Bill")
ax.set_xlabel("Total Bill ($)")
ax.set_ylabel("Tip ($)")
sns.despine()
plt.tight_layout(); plt.show()
10) Legends, Annotations & Saving
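ax = sns.regplot(data=tips, x="total_bill", y="tip")
# Arrow drawn in black so it stays visible on the white theme
ax.annotate("Higher tips with higher bills",
            xy=(40, 7), xytext=(25, 8.5),
            arrowprops=dict(arrowstyle="->", color="black"))
# regplot adds no legend by default, so remove one only if it exists
legend = ax.get_legend()
if legend is not None:
    legend.remove()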
plt.tight_layout()
plt.savefig("tips_scatter.png", dpi=300, bbox_inches="tight", transparent=True)
plt.show()
11) Cheat-Sheet: Axes vs Figure Level
Axes-level: scatterplot, lineplot, histplot, kdeplot, boxplot, violinplot, heatmap, regplot, …
Use when you manage subplots manually.
Figure-level: relplot, catplot, jointplot, pairplot, lmplot, …
Use for quick grids/facets and auto layouts.
12) Common Pitfalls
- Overplotting → use alpha, hex bins (sns.jointplot(kind="hex") or Matplotlib's plt.hexbin), or kdeplot
- Don’t rely on defaults → always set titles/labels
- For groups → prefer violin/box + strip over bar means
- Keep consistent color semantics
✅ Checkpoint (Day 2 done): You can facet, compare multivariate trends, style for clarity, and export.
Mini-Project (Deliverable)
Question: What factors drive higher tips?
Steps:
- Univariate: distribution of total_bill, tip
- Groups: tip by day, sex, smoker, time
- Relationship: total_bill ↔ tip (add hue & regression)
- Correlation heatmap for numeric vars
- Facet by smoker/time
- Report: 5 insights + 2 charts for LinkedIn/portfolio
import numpy as np
tips = sns.load_dataset("tips").assign(tip_pct=lambda d: d["tip"] / d["total_bill"] * 100)
# 1) Distribution
sns.histplot(tips, x="tip_pct", bins=20, kde=True)
plt.title("Tip % Distribution"); plt.show()
# 2) Groups
sns.boxplot(tips, x="day", y="tip_pct", hue="smoker")
plt.title("Tip % by Day & Smoker"); plt.show()
# 3) Relationship with hue
sns.scatterplot(tips, x="total_bill", y="tip_pct", hue="time", style="sex")
plt.title("Tip % vs Total Bill by Time/Sex"); plt.show()
# 4) Correlation
num = tips[["total_bill","tip","size","tip_pct"]]
sns.heatmap(num.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation (with Tip %)"); plt.show()
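For the written insights, a small summary table per group is a useful companion to the charts. The sketch below computes tip percentage by `time` on a hypothetical four-row frame (made-up numbers chosen so the result is easy to verify); the same `groupby` pattern should work unchanged on the real `tips` frame with its `tip_pct` column.

```python
import pandas as pd

# Hypothetical stand-in rows; tip_pct is derived the same way as in the mini-project
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 40.0, 50.0],
    "tip": [2.0, 3.0, 4.0, 5.0],
    "time": ["Lunch", "Lunch", "Dinner", "Dinner"],
}).assign(tip_pct=lambda d: d["tip"] / d["total_bill"] * 100)

# Mean, median, and row count of tip percentage for each meal time
by_time = toy.groupby("time")["tip_pct"].agg(["mean", "median", "count"])
print(by_time.round(1))
```

Each row of the table translates directly into a plain-English sentence for the report, with the count showing how much data supports it.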
Practice Checklist
- Plot hist+KDE for total_bill; describe skewness
- Compare tip across day using box+strip
- Scatter total_bill vs tip with hue=sex, style=smoker
- Create pairplot with hue=time
- Build correlation heatmap; write 2 interpretations
- Facet bar chart by smoker and time
- Export one figure at 300 DPI with transparent background
Quick Reference
Most-used APIs:
- scatterplot, lineplot, histplot, kdeplot, ecdfplot
- boxplot, violinplot, stripplot, barplot, pointplot
- pairplot, jointplot, relplot, catplot
- heatmap, clustermap, regplot
Styling:
- sns.set_theme(style, palette, context)
- sns.despine(), plt.tight_layout()
- Palettes: deep, muted, pastel, bright, dark, colorblind
Written for: Nivesh Bansal — Data Analytics Journey, Day 10.
You can copy any code block and practice directly.
Happy plotting!