DEV Community

Apiumhub

Originally published at apiumhub.com

Getting Started with Matplotlib – Lesson 1

Introduction

Visualization is part of the analysis a data scientist performs to draw conclusions from a dataset. In today’s article we are going to go through the Matplotlib library, a third-party Python library for data visualization. It works well in combination with NumPy, SciPy, and Pandas.

Basic Plot, Function Visualization and Data Visualization

The ‘Wine Quality Dataset’, compiled by Cortez et al. in 2009 and available from the UCI Machine Learning Repository, is a well-known dataset that contains wine quality information. It includes data about the physicochemical properties of red and white wines and a quality score. Before we start, we take a look at the head of a small example of this dataset.
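
The examples later in the article use a pandas DataFrame named df_wine, although the code that creates it is not shown. The following is a minimal sketch of one way to build it, assuming the CSV files distributed by UCI, which use ';' as the separator. The file name in the comment and the three sample rows are illustrative assumptions, included only so the snippet runs without a download:

```python
import io

import pandas as pd

# Assumption: with the real file downloaded from UCI you would write something like
#   df_wine = pd.read_csv("winequality-red.csv", sep=";")
# The in-memory sample below only mimics that layout with three illustrative rows.
sample_csv = io.StringIO(
    "fixed acidity;volatile acidity;quality\n"
    "7.4;0.70;5\n"
    "7.8;0.88;5\n"
    "11.2;0.28;6\n"
)

df_wine = pd.read_csv(sample_csv, sep=";")

# head() returns the first rows (five by default) for a quick look at the columns
print(df_wine.head())
```

The real files contain more columns than this sample; only the column names used later in the article are reproduced here.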

Basic Plot

Matplotlib can represent data in almost any way you need. To understand how it works, we are going to start with the most basic instructions and increase the difficulty little by little.

The most direct way to check how data is distributed is to plot it, so we start by drawing a series of points. Both plt.plot and plt.scatter can be used for this.

List of points plot distribution

import matplotlib.pyplot as plt

# 'ro' draws red circles with no connecting line
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'ro')
# Axis limits are given as [xmin, xmax, ymin, ymax]
plt.axis([0, 6, 0, 21])

Representing a list of points using the plot function:

Fig 1. Plotting a list of points using plt.plot and plt.scatter. The difference between the two lies in the control over the color, shape and size of the points: plt.scatter gives you more control over each point's appearance.

import matplotlib.pyplot as plt

# scatter draws one marker per (x, y) pair
plt.scatter([1, 2, 3, 4], [1, 4, 9, 16])
plt.axis([0, 6, 0, 21])

Representing a list of points using the scatter function:

points = [[1,2,3,4], [1,4,9,16]]
plt.plot(points[0], points[1], 'g^')
plt.plot([x * 2 for x in points[0]], points[1], 'r--')
plt.plot([x * 2.3 for x in points[0]], points[1], 'bs')
plt.axis([0, 15, 0, 21])

Fig 2. Plot of three different lists of points. The format strings 'g^', 'r--' and 'bs' passed to plt.plot set the color and the marker or line style of each series.
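
As noted for Fig 1, plt.scatter allows the appearance of each point to be set individually. A small sketch of that idea, using the s (marker size) and c (color) parameters of plt.scatter; the sizes and colors chosen here are arbitrary values for illustration:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# One size (in points squared) and one color per point; these values are arbitrary
sizes = [20, 60, 120, 200]
colors = ['red', 'green', 'blue', 'orange']

# s and c accept one entry per point, so every marker can look different
points = plt.scatter(x, y, s=sizes, c=colors)
plt.axis([0, 6, 0, 21])
```

With plt.plot, by contrast, a single format string applies to the whole series. Call plt.show() afterwards if the figure is not displayed automatically.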

Function Visualization

Sometimes we want to plot a series of points that follow a given function. To illustrate this we are going to use the function sin(2πx). As you will see, we define the function ourselves first, so any function we write could be used; it does not have to be a predefined one.

Representing a function:

import matplotlib.pyplot as plt
import numpy as np

def sin(t):
    # sin(2*pi*t) completes one full period for every unit of t
    return np.sin(2*np.pi*t)

# Sample values from 0.0 up to (but not including) 5.0 in steps of 0.1
t1 = np.arange(0.0, 5.0, 0.1)

plt.scatter(t1, sin(t1))

Fig 3. Representation of a function with points and with a line, using the scatter and plot functions from the Matplotlib library

Now we will make the same representation but using a line that runs through all these points.

import matplotlib.pyplot as plt
import numpy as np

def sin(t):
    return np.sin(2*np.pi*t)

t1 = np.arange(0.0, 5.0, 0.1)

# 'b' draws a solid blue line through the sampled points
plt.plot(t1, sin(t1), 'b')
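
The two representations can also be drawn on the same axes, which makes it easy to see where the sampled points sit on the line. This is a sketch that reuses the sin function and t1 array from above; the labels, marker size and colors are arbitrary choices:

```python
import matplotlib.pyplot as plt
import numpy as np

def sin(t):
    return np.sin(2 * np.pi * t)

t1 = np.arange(0.0, 5.0, 0.1)

# Line first, then the sampled points drawn on top of it
plt.plot(t1, sin(t1), 'b', label='line (plt.plot)')
plt.scatter(t1, sin(t1), color='r', s=15, label='points (plt.scatter)')

# The legend uses the label given to each call
plt.legend()
```

Successive pyplot calls draw on the current axes, so no extra setup is needed to combine them.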

Data Visualization

We are going to start with some basic but very useful visualizations for a first look at our data. For this we use the Wine Quality dataset discussed above and learn how to draw a histogram of a column and a comparison between two columns.

Representation of a histogram of a column in our dataset:

df_wine['fixed acidity'].hist(legend=True)

Fig 4

Comparison of two columns of the dataset:

plt.figure(figsize=(7, 4))
# x values: fixed acidity, y values: quality, drawn as red circles
plt.plot(df_wine['fixed acidity'], df_wine['quality'], 'ro')
# Labels follow the order of the data passed to plt.plot
plt.xlabel('fixed acidity')
plt.ylabel('quality')

Fig 5

Bar chart of fixed acidity against quality:

plt.bar(df_wine['quality'], df_wine['fixed acidity'])
plt.xlabel('Quality')
plt.ylabel('Fixed Acidity')

Fig 6
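
Because plt.bar draws one bar per row, wines that share a quality score produce bars that overlap at the same x position. One possible alternative, offered as a sketch and not as the article's own method, is to aggregate the column first and plot one bar per score. The six rows below are made-up values standing in for df_wine:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for df_wine; the real dataset has many rows per quality score
df_wine = pd.DataFrame({
    'quality': [5, 5, 6, 6, 7, 7],
    'fixed acidity': [7.0, 8.0, 8.0, 10.0, 9.0, 10.0],
})

# Mean fixed acidity for each quality score (index = quality, values = mean)
mean_acidity = df_wine.groupby('quality')['fixed acidity'].mean()

# One bar per quality score
plt.bar(mean_acidity.index, mean_acidity.values)
plt.xlabel('Quality')
plt.ylabel('Mean fixed acidity')
```

Other aggregations, such as median() or max(), can be swapped in depending on the question being asked of the data.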

Now we are going to raise the difficulty a bit and look at what Matplotlib calls Figures.

Matplotlib graphs your data on Figures (i.e., windows, Jupyter widgets, etc.), each of which can contain one or more Axes (i.e., an area where points can be specified in terms of x-y coordinates, or theta-r in a polar plot, or x-y-z in a 3D plot, etc.).

The simplest way of creating a figure with an axes is using pyplot.subplots. We can then use Axes.plot to draw some data on the axes:
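
A minimal sketch of that pattern (the data values are arbitrary, chosen only for illustration):

```python
import matplotlib.pyplot as plt

# plt.subplots() with no arguments returns a Figure and a single Axes
fig, ax = plt.subplots()

# Axes.plot draws the data on that Axes
ax.plot([1, 2, 3, 4], [1, 4, 2, 3])
```

Working with the fig and ax objects directly, instead of the plt functions used so far, makes it explicit which Figure and which Axes each call affects.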

Fig 7

We are going to start by creating an empty figure and we are going to add a title to it.

Empty figure with Title ‘This is an empty figure’:

fig = plt.figure()
fig.suptitle('This is an empty figure', fontsize=14, fontweight='bold')
ax = fig.add_subplot(111)
plt.show()

As you can see, the argument 111 passed to fig.add_subplot encodes the subplot grid parameters as a single integer.

For example, “111” means “1×1 grid, first subplot” and “234” means “2×3 grid, 4th subplot”.

An alternative form of add_subplot(111) is add_subplot(1, 1, 1).
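
To make the numbering concrete, here is a short sketch that places three subplots on a 2×3 grid, using both the single-integer form and the three-argument form. The variable names and titles are only illustrative:

```python
import matplotlib.pyplot as plt

fig = plt.figure()

# 2x3 grid: positions are numbered row by row, starting at 1 in the top-left corner
ax_first = fig.add_subplot(231)      # row 1, column 1
ax_fourth = fig.add_subplot(234)     # row 2, column 1
ax_last = fig.add_subplot(2, 3, 6)   # row 2, column 3, three-argument form

# Title each subplot with the arguments that created it
ax_first.set_title('231')
ax_fourth.set_title('234')
ax_last.set_title('2, 3, 6')
```

Positions 2, 3 and 5 of the grid are simply left empty, since only the requested subplots are created.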

Next we will write the name of what each axis represents and add a small text box.

Plot a text inside a box:

fig = plt.figure()
fig.suptitle('This is an empty figure', fontsize=14, fontweight='bold')
ax = fig.add_subplot(111)

ax.set_xlabel('xlabel')
ax.set_ylabel('ylabel')

ax.text(0.3, 0.8, 'boxed italics text in data coords', style='italic',
        bbox={'facecolor':'red', 'alpha':0.5, 'pad':10})
plt.show()

Now we are going to try writing an annotation with an arrow.

Plot an annotation:

fig = plt.figure()
fig.suptitle('This is an empty figure', fontsize=14, fontweight='bold')
ax = fig.add_subplot(111)

ax.set_xlabel('xlabel')
ax.set_ylabel('ylabel')

ax.annotate('annotate', xy=(0.2, 0.1), xytext=(0.3, 0.4),
            arrowprops=dict(facecolor='black', shrink=0.05))
plt.show()
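
Annotations are most useful when they point at something in the data. As a sketch, the same annotate call can mark the maximum of the sin(2πt) curve used earlier; the label text and the xytext position are arbitrary choices:

```python
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 1.0, 0.01)
y = np.sin(2 * np.pi * t)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(t, y, 'b')

# Locate the highest sampled point of the curve
i_max = np.argmax(y)
peak = (t[i_max], y[i_max])

# xy is the point the arrow points at; xytext is where the label is written
note = ax.annotate('maximum', xy=peak, xytext=(0.5, 0.75),
                   arrowprops=dict(facecolor='black', shrink=0.05))
```

Both xy and xytext are given in data coordinates by default, so the arrow stays attached to the peak if the axis limits change.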

Finally, something we often need is to set the range of the axes for our plot. For this we call the axis method of the Axes and pass it the limits we want as [xmin, xmax, ymin, ymax].

Change axis ranges to x -> [0, 10] y -> [0, 10]:

fig = plt.figure()
fig.suptitle('This is an empty figure', fontsize=14, fontweight='bold')
ax = fig.add_subplot(111)

ax.set_xlabel('xlabel')
ax.set_ylabel('ylabel')

ax.axis([0, 10, 0, 10])
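
The same ranges can also be set one axis at a time with set_xlim and set_ylim, which is convenient when only one of them needs to change. A minimal sketch, where x_limits and y_limits are just illustrative variable names:

```python
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)

# Same effect as ax.axis([0, 10, 0, 10]), one axis at a time
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)

# The getters return the current limits as (low, high) pairs
x_limits = ax.get_xlim()
y_limits = ax.get_ylim()
print(x_limits, y_limits)
```

Reading the limits back with get_xlim and get_ylim is a quick way to check what range a plot is actually using.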

Training your abilities

If you want to take your Data Science skills further, we have created a course that you can download for free here.
