Akhilesh

Posted on Apr 26

Your First Plot: Matplotlib Without the Pain

#ai #python #programming #productivity

Numbers in a table tell you something.

A chart tells you the same thing in a second.

You can stare at 1,000 rows of salary data and struggle to see the pattern. Or you can plot a histogram and instantly see that it is bimodal: two clusters, probably two different job levels mixed into one dataset. The pattern was always there. The chart made it obvious.

Matplotlib is the foundation of Python visualization. It is not the prettiest library. It is not the easiest. But everything else in the Python visualization ecosystem either uses it underneath or was built as an alternative to it. Learn Matplotlib and the others make sense.

The Mental Model First

Before any code, understand the structure.

A Figure is the whole window. The blank canvas. You can have multiple plots inside one figure.

An Axes is one individual plot. One set of x and y axes. When you have a 2x2 grid of charts, you have one Figure with four Axes objects.

Most beginner tutorials use plt.plot(), which creates a Figure and an Axes automatically and hides the details. That works for simple cases. When you need control, you work with Figure and Axes objects directly.

Both approaches matter. You will use both.
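To see that both approaches produce the same kinds of objects, here is a small sketch: let pyplot create the Figure and Axes implicitly, ask for them with plt.gcf() and plt.gca(), and compare with an explicit 2x2 grid. It assumes the non-interactive Agg backend so it runs without a display, and the tiny data is made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window needed
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure

plt.close("all")  # start from a clean slate

# The quick way: pyplot silently creates a Figure and one Axes for you
plt.plot([1, 2, 3], [4, 1, 5])
implicit_fig = plt.gcf()  # "get current figure"
implicit_ax = plt.gca()   # "get current axes"

# The explicit way: ask for a 2x2 grid and get the objects back directly
grid_fig, grid_axes = plt.subplots(2, 2)

print(isinstance(implicit_fig, Figure), isinstance(implicit_ax, Axes))
print(len(implicit_fig.axes), len(grid_fig.axes))  # one Axes vs four Axes
```

Either way you end up with a Figure holding one or more Axes; the explicit form just hands them to you.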

The Quick Way: plt

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales  = [42000, 38000, 51000, 47000, 55000, 62000]

plt.plot(months, sales, marker="o", color="steelblue", linewidth=2)
plt.title("Monthly Sales 2024")
plt.xlabel("Month")
plt.ylabel("Sales (₹)")
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("sales_line.png", dpi=150)
plt.show()
print("Line chart saved")

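plt.plot() draws the line. marker="o" adds a dot at each data point. plt.tight_layout() adjusts the spacing so titles and labels do not get cut off. plt.savefig() saves to disk. plt.show() displays the figure, inline in Jupyter or in a window on a desktop. On a headless machine using a non-interactive backend such as Agg, it displays nothing, so the saved file is your output.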

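Always call plt.savefig() before plt.show(). By the time show() returns, pyplot has usually closed the figure (you closed the window, or Jupyter finished displaying it), so a later plt.savefig() saves a new, blank figure.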
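You can reproduce the pitfall without opening a window. In this sketch plt.close() stands in for what happens to the figure after it has been shown, and the two file names are placeholders.

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so nothing pops up
import matplotlib.pyplot as plt

plt.close("all")  # start from a clean slate

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [42000, 38000, 51000, 47000, 55000, 62000]

plt.plot(months, sales, marker="o")
plt.savefig("before_close.png", dpi=150)  # saves the chart you just drew

plt.close()  # stand-in for the figure being closed after show()

# pyplot has no current figure now, so this creates a brand-new, empty one
plt.savefig("after_close.png", dpi=150)
blank_fig = plt.gcf()

print(os.path.exists("after_close.png"), len(blank_fig.axes))  # file exists, but no Axes in it
```

Both files get written, but after_close.png is an empty canvas.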

The Proper Way: Figure and Axes

fig, ax = plt.subplots(figsize=(10, 5))

ax.plot(months, sales, marker="o", color="steelblue", linewidth=2, label="Actual")
ax.axhline(y=np.mean(sales), color="red", linestyle="--", label=f"Average: {np.mean(sales):,.0f}")

ax.set_title("Monthly Sales 2024", fontsize=14, pad=15)
ax.set_xlabel("Month", fontsize=11)
ax.set_ylabel("Sales (₹)", fontsize=11)
ax.legend()
ax.grid(True, alpha=0.3)

fig.tight_layout()
fig.savefig("sales_proper.png", dpi=150)
plt.show()

fig, ax = plt.subplots() creates both objects explicitly. Methods on ax control the plot. Methods on fig control the whole canvas.

Use this form for everything beyond quick exploratory plots. It gives you complete control and works cleanly when you need multiple subplots.
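One habit that makes the Figure and Axes form pay off is writing plotting helpers that take an ax argument instead of calling plt directly. The helper name plot_monthly and the second data series below are hypothetical, made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np


def plot_monthly(ax, months, values, title):
    """Draw one monthly series plus a dashed line at its mean on the given Axes."""
    ax.plot(months, values, marker="o", linewidth=2, label="Actual")
    ax.axhline(y=np.mean(values), color="red", linestyle="--",
               label=f"Average: {np.mean(values):,.0f}")
    ax.set_title(title)
    ax.grid(True, alpha=0.3)
    ax.legend()
    return ax


months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales_2024 = [42000, 38000, 51000, 47000, 55000, 62000]
sales_other = [39000, 41000, 45000, 44000, 50000, 53000]  # hypothetical data

# The same helper fills two subplots; nothing depends on a "current" Axes
fig, (left, right) = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
plot_monthly(left, months, sales_2024, "Monthly Sales 2024")
plot_monthly(right, months, sales_other, "Monthly Sales (hypothetical)")
fig.tight_layout()
fig.savefig("sales_compare.png", dpi=150)
```

Because the helper only touches the Axes it receives, it works the same in a single plot, a 1x2 row, or a larger grid.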

Bar Charts

departments = ["Engineering", "Marketing", "Sales", "HR"]
headcount   = [45, 28, 33, 12]
avg_salary  = [72000, 65000, 58000, 48000]

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

bars = axes[0].bar(departments, headcount, color=["steelblue", "coral", "mediumseagreen", "orange"])
axes[0].set_title("Headcount by Department")
axes[0].set_ylabel("Number of Employees")

for bar, count in zip(bars, headcount):
    axes[0].text(
        bar.get_x() + bar.get_width() / 2,
        bar.get_height() + 0.5,
        str(count),
        ha="center", va="bottom", fontweight="bold"
    )

axes[1].barh(departments, avg_salary, color="steelblue", alpha=0.8)
axes[1].set_title("Average Salary by Department")
axes[1].set_xlabel("Salary (₹)")
axes[1].axvline(x=np.mean(avg_salary), color="red", linestyle="--", alpha=0.7)

fig.tight_layout()
fig.savefig("bar_charts.png", dpi=150)
plt.show()

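Two charts side by side. The left one shows vertical bars with value labels on top. The right one shows horizontal bars with a dashed reference line at np.mean(avg_salary), the plain mean of the four department averages. Because that mean ignores headcount, the line is not the average salary across all employees.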
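Two refinements, as a sketch. Matplotlib 3.4 and later provide ax.bar_label(), which replaces the manual text loop for value labels. And if you want the reference line to be the average across all employees (assuming these four departments are the whole company), weight each department's average by its headcount with np.average().

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

departments = ["Engineering", "Marketing", "Sales", "HR"]
headcount = [45, 28, 33, 12]
avg_salary = [72000, 65000, 58000, 48000]

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Value labels in one call instead of a loop over bars (Matplotlib >= 3.4)
bars = axes[0].bar(departments, headcount, color="steelblue")
labels = axes[0].bar_label(bars, padding=3, fontweight="bold")

# Unweighted: every department counts once, whatever its size
simple_mean = np.mean(avg_salary)
# Weighted by headcount: average salary across all 118 employees listed
weighted_mean = np.average(avg_salary, weights=headcount)

axes[1].barh(departments, avg_salary, color="steelblue", alpha=0.8)
axes[1].axvline(simple_mean, color="gray", linestyle=":",
                label=f"Mean of departments: {simple_mean:,.0f}")
axes[1].axvline(weighted_mean, color="red", linestyle="--",
                label=f"Headcount-weighted: {weighted_mean:,.0f}")
axes[1].legend()
fig.tight_layout()
fig.savefig("bar_charts_v2.png", dpi=150)

print(f"{simple_mean:,.0f} vs {weighted_mean:,.0f}")  # 60,750 vs 63,983
```

The gap comes from Engineering: it is both the best-paid and the largest department, so weighting by headcount pulls the average up.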

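axes[0] and axes[1] access each subplot independently. plt.subplots(1, 2) returns a 1-D array, so one index is enough. With more than one row and column, such as plt.subplots(2, 2), the array is 2-D and you index it as axes[row, col].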
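A sketch of indexing a 2x2 grid, plus axes.flat for applying one setting to every subplot. The sine, cosine, random numbers, and letter categories are placeholder data.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
print(axes.shape)  # (2, 2): rows first, then columns

x = np.linspace(0, 10, 50)
axes[0, 0].plot(x, np.sin(x))
axes[0, 0].set_title("Top left: axes[0, 0]")
axes[0, 1].plot(x, np.cos(x))
axes[0, 1].set_title("Top right: axes[0, 1]")
axes[1, 0].hist(np.random.default_rng(0).normal(size=200), bins=20)
axes[1, 0].set_title("Bottom left: axes[1, 0]")
axes[1, 1].barh(["a", "b", "c"], [3, 5, 2])
axes[1, 1].set_title("Bottom right: axes[1, 1]")

# axes.flat walks every subplot, handy for settings they all share
for ax in axes.flat:
    ax.grid(True, alpha=0.3)

fig.tight_layout()
fig.savefig("grid_demo.png", dpi=150)

row_fig, row_axes = plt.subplots(1, 2)
print(row_axes.shape)  # (2,): a single row comes back as a 1-D array
```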

Histograms: See Your Distribution

np.random.seed(42)
salaries = np.concatenate([
    np.random.normal(55000, 8000, 200),
    np.random.normal(95000, 10000, 100)
])

fig, ax = plt.subplots(figsize=(10, 5))

ax.hist(salaries, bins=40, color="steelblue", edgecolor="white", alpha=0.8)
ax.axvline(np.mean(salaries), color="red", linestyle="--", linewidth=2, label=f"Mean: {np.mean(salaries):,.0f}")
ax.axvline(np.median(salaries), color="orange", linestyle="--", linewidth=2, label=f"Median: {np.median(salaries):,.0f}")

ax.set_title("Salary Distribution", fontsize=14)
ax.set_xlabel("Salary (₹)")
ax.set_ylabel("Count")
ax.legend()
ax.grid(True, alpha=0.3, axis="y")

fig.tight_layout()
fig.savefig("histogram.png", dpi=150)
plt.show()

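Two peaks: a bimodal distribution. The mean sits well above the median because the clusters are unequal in size (200 employees around 55,000 and 100 around 95,000). The smaller high-salary group pulls the mean up, while the median stays among the more numerous lower salaries. A histogram made this visible in one second.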
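The mean and median gap can be checked numerically on the same seeded data. This sketch regenerates the salaries exactly as above, just naming the two clusters separately.

```python
import numpy as np

np.random.seed(42)
low = np.random.normal(55000, 8000, 200)    # larger, lower-paid cluster
high = np.random.normal(95000, 10000, 100)  # smaller, higher-paid cluster
salaries = np.concatenate([low, high])

mean = np.mean(salaries)
median = np.median(salaries)
print(f"Mean:   {mean:,.0f}")
print(f"Median: {median:,.0f}")

# With 300 values, exactly half lie below the median. The low cluster alone
# holds 200 values, so the median falls among the lower salaries.
print(np.sum(salaries < median))  # 150
```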

Always plot your data before modeling. Bimodal distributions, outliers, and skewed data all affect which algorithms work and how you need to prepare the data.
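Plotting before modeling also means plotting more than once, because the bin count changes what a histogram shows. A sketch drawing the same salaries at three settings; bins="auto" asks NumPy to choose the edges, and the counts 5 and 40 are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
salaries = np.concatenate([
    np.random.normal(55000, 8000, 200),
    np.random.normal(95000, 10000, 100)
])

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
results = {}
for ax, bins in zip(axes, [5, 40, "auto"]):
    # ax.hist returns (counts per bin, bin edges, drawn patches)
    counts, edges, _ = ax.hist(salaries, bins=bins, color="steelblue", edgecolor="white")
    ax.set_title(f"bins={bins!r} ({len(counts)} bars)")
    results[bins] = counts

fig.tight_layout()
fig.savefig("histogram_bins.png", dpi=150)
```

Coarse bins blur the shape into a few blocks; finer bins show the two peaks more clearly but get noisier. Looking at a couple of settings is cheap insurance.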

Scatter Plots: See Relationships

np.random.seed(42)
experience = np.random.uniform(0, 15, 100)
salary = 40000 + 3500 * experience + np.random.normal(0, 8000, 100)

fig, ax = plt.subplots(figsize=(9, 6))

scatter = ax.scatter(experience, salary, alpha=0.6, c=salary, cmap="viridis", s=60)
plt.colorbar(scatter, ax=ax, label="Salary (₹)")

m, b = np.polyfit(experience, salary, 1)
x_line = np.linspace(0, 15, 100)
ax.plot(x_line, m * x_line + b, color="red", linewidth=2, label=f"Trend: ₹{m:,.0f} per year")

ax.set_title("Experience vs Salary", fontsize=14)
ax.set_xlabel("Years of Experience")
ax.set_ylabel("Salary (₹)")
ax.legend()
ax.grid(True, alpha=0.3)

fig.tight_layout()
fig.savefig("scatter.png", dpi=150)
plt.show()

Points are color-coded by salary, and a fitted trend line shows the relationship. np.polyfit(x, y, 1) fits a degree-1 polynomial (a straight line) by least squares and returns the slope and intercept, so this line is a simple linear regression. Plotting it is a quick visual check before you build a fuller model.
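To confirm that the polyfit line really is least-squares regression, this sketch compares its slope with the textbook formula, covariance divided by variance, on the same seeded data. The r² line is an extra summary added here, not part of the original chart.

```python
import numpy as np

np.random.seed(42)
experience = np.random.uniform(0, 15, 100)
salary = 40000 + 3500 * experience + np.random.normal(0, 8000, 100)

m, b = np.polyfit(experience, salary, 1)    # least-squares slope and intercept
predicted = np.polyval([m, b], experience)  # evaluate the fitted line

# Closed-form least-squares slope: cov(x, y) / var(x), same ddof for both
slope_manual = np.cov(experience, salary)[0, 1] / np.var(experience, ddof=1)

r = np.corrcoef(experience, salary)[0, 1]   # Pearson correlation
print(f"slope ≈ {m:,.0f} per year (data generated with 3,500)")
print(f"intercept ≈ {b:,.0f} (data generated with 40,000)")
print(f"matches closed form: {np.isclose(m, slope_manual)}, r² = {r**2:.2f}")
```

The fitted slope lands near the 3,500 used to generate the data but not exactly on it, because of the random noise.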

Multiple Lines: Comparing Groups

quarters = ["Q1", "Q2", "Q3", "Q4"]
north  = [42000, 48000, 55000, 51000]
south  = [38000, 41000, 47000, 62000]
east   = [51000, 53000, 49000, 58000]

fig, ax = plt.subplots(figsize=(10, 6))

for region, values, color in zip(
    ["North", "South", "East"],
    [north, south, east],
    ["steelblue", "coral", "mediumseagreen"]
):
    ax.plot(quarters, values, marker="o", color=color, linewidth=2, label=region)
    ax.annotate(
        f"₹{values[-1]:,}",
        xy=(3, values[-1]),
        xytext=(3.05, values[-1]),
        fontsize=9, color=color, va="center"
    )

ax.set_title("Regional Sales by Quarter", fontsize=14)
ax.set_xlabel("Quarter")
ax.set_ylabel("Sales (₹)")
ax.legend()
ax.grid(True, alpha=0.3)

fig.tight_layout()
fig.savefig("multi_line.png", dpi=150)
plt.show()

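Three lines, one per region, each with its final-quarter value printed at its right end. Labeling line ends directly is often easier to read than a legend when there are only a few lines, because your eye does not have to travel back and forth. If you put the region name in that label too, you can drop ax.legend() entirely.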
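A sketch of the legend-free version: each line ends with its region name and final value in the line's own color, and the x-limits are widened to make room. The offsets 0.08 and 0.8 are arbitrary spacing choices.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

quarters = ["Q1", "Q2", "Q3", "Q4"]
regions = {
    "North": ([42000, 48000, 55000, 51000], "steelblue"),
    "South": ([38000, 41000, 47000, 62000], "coral"),
    "East": ([51000, 53000, 49000, 58000], "mediumseagreen"),
}

fig, ax = plt.subplots(figsize=(10, 6))
last = len(quarters) - 1  # categorical x positions are 0, 1, 2, 3

for region, (values, color) in regions.items():
    ax.plot(quarters, values, marker="o", color=color, linewidth=2)
    # Name and final value sit just right of the last point, in the line's color
    ax.text(last + 0.08, values[-1], f"{region} ₹{values[-1]:,}",
            color=color, fontsize=9, va="center")

ax.set_xlim(-0.2, last + 0.8)  # leave room on the right for the labels
ax.set_title("Regional Sales by Quarter", fontsize=14)
ax.set_xlabel("Quarter")
ax.set_ylabel("Sales (₹)")
ax.grid(True, alpha=0.3)
fig.tight_layout()
fig.savefig("multi_line_direct.png", dpi=150)
```

No ax.legend() call is needed; if two lines end at nearly the same value, you would have to nudge one label up or down by hand.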

Styling: Make It Look Professional

print(plt.style.available[:10])

Output (the exact list depends on your Matplotlib version):

['Solarize_Light2', '_classic_test_patch', '_mpl-gallery', '_mpl-gallery-polished', 'bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot']

Apply a style:

plt.style.use("ggplot")

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(months, sales, marker="o", linewidth=2)
ax.set_title("Sales with ggplot style")
fig.savefig("styled.png", dpi=150)
plt.show()

plt.style.use("default")

ggplot and fivethirtyeight are clean and professional. dark_background works well for presentations. Always reset to default after applying a style if you are working in a shared notebook.
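If you only want a style for one chart, plt.style.context() applies it inside a with block and restores the previous settings when the block ends, so there is nothing to reset. A sketch using the same sales data:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [42000, 38000, 51000, 47000, 55000, 62000]

before = plt.rcParams["axes.facecolor"]

with plt.style.context("ggplot"):
    inside = plt.rcParams["axes.facecolor"]  # ggplot's grey plot background
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.plot(months, sales, marker="o", linewidth=2)
    ax.set_title("Sales with ggplot style")
    fig.savefig("styled_context.png", dpi=150)

after = plt.rcParams["axes.facecolor"]
print(inside != before, after == before)  # style active inside, gone afterwards
```

Figures created inside the block keep the style they were drawn with; only new figures after the block go back to the earlier settings.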

The Cheat Sheet

# figure setup
fig, ax = plt.subplots(figsize=(width, height))
fig, axes = plt.subplots(rows, cols, figsize=(w, h))

# chart types
ax.plot(x, y)                        # line
ax.bar(categories, values)           # vertical bar
ax.barh(categories, values)          # horizontal bar
ax.scatter(x, y)                     # scatter
ax.hist(data, bins=30)               # histogram
ax.boxplot(data)                     # box plot
ax.pie(values, labels=labels)        # pie (use sparingly)

# labels and text
ax.set_title("title", fontsize=14)
ax.set_xlabel("x label")
ax.set_ylabel("y label")
ax.legend()
ax.text(x, y, "annotation")
ax.annotate("text", xy=(x, y))

# reference lines
ax.axhline(y=value)                  # horizontal line
ax.axvline(x=value)                  # vertical line

# styling
ax.grid(True, alpha=0.3)
ax.set_facecolor("#f8f8f8")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

# saving
fig.tight_layout()
fig.savefig("output.png", dpi=150, bbox_inches="tight")
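The box plot and the spine settings are the cheat-sheet entries the post does not demonstrate elsewhere. Here is a small runnable sketch combining them; the department salary samples are randomly generated, hypothetical data.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical salary samples for three departments
groups = {
    "Engineering": rng.normal(72000, 9000, 45),
    "Marketing": rng.normal(65000, 7000, 28),
    "HR": rng.normal(48000, 5000, 12),
}

fig, ax = plt.subplots(figsize=(8, 5))
parts = ax.boxplot(list(groups.values()))  # one box per group, at x = 1, 2, 3
ax.set_xticks(range(1, len(groups) + 1))
tick_texts = ax.set_xticklabels(list(groups))

ax.set_title("Salary by Department (hypothetical data)", fontsize=14)
ax.set_ylabel("Salary (₹)")
ax.grid(True, alpha=0.3, axis="y")
ax.set_facecolor("#f8f8f8")
ax.spines["top"].set_visible(False)    # remove the box frame's top edge
ax.spines["right"].set_visible(False)  # and its right edge

fig.tight_layout()
fig.savefig("boxplot.png", dpi=150, bbox_inches="tight")
```

Each box spans the middle 50% of a group, the line inside marks its median, and points beyond the whiskers are drawn individually as possible outliers.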

A Resource Worth Bookmarking

Nicolas P. Rougier wrote a free book, "Scientific Visualization: Python + Matplotlib", available on GitHub at github.com/rougier/scientific-visualization-book. Over 500 pages of Matplotlib techniques, from basic to publication-quality. Used by researchers and engineers worldwide. Everything comes with code. Best free Matplotlib reference that exists.

Also, the Matplotlib gallery at matplotlib.org/stable/gallery is the fastest way to find code for any chart type. Every example has downloadable code. When you know what you want to make but not how to make it, start there.

Try This

Create matplotlib_practice.py.

Use the Titanic dataset from previous posts.

Build a figure with four subplots in a 2x2 grid.

Top left: a bar chart showing survival count by passenger class. Label each bar with the count.

Top right: a histogram of passenger ages. Mark the mean and median with vertical lines. Different colors.

Bottom left: a scatter plot of fare vs age, colored by survival status. Survivors one color, non-survivors another.

Bottom right: a horizontal bar chart of average fare paid by embarkation port, sorted from highest to lowest. The Titanic data only has three ports (C, Q, and S), so you will get three bars.

Give every subplot a clear title, labeled axes, and a grid. Save the entire figure as one PNG called titanic_analysis.png at 150 dpi.

What's Next

Matplotlib gives you control. Seaborn gives you beauty. The next post is about Seaborn, which sits on top of Matplotlib and makes statistical plots of distributions, correlations, and group comparisons look excellent with far less code.

DEV Community