A picture is worth a thousand words 😀! You can derive valuable insights from data and also communicate these insights via data visualization.
We can clearly see trends and derive insights using Python’s Matplotlib library, which is the foundational library behind many visualization tools. Matplotlib’s architecture has 3 main layers. From the highest-level commands to the lowest, they are -
- matplotlib.pyplot module - the scripting layer which is often called procedural plotting and is used when you want to quickly create plots and get done with it. This layer is designed to work like a MATLAB script.
- matplotlib.artist module - the artist layer, which is often called object-oriented plotting. You can do a lot more customization here because you have much more control. Note that this layer still uses the pyplot module for a few functions - even in the object-oriented approach, pyplot is used to create the figure, which holds anything plotted.
- matplotlib.backend_bases module - the backend layer. Matplotlib can be used in many ways and can produce different output formats, e.g. it can be run from the Python shell with plotting windows popping up, or from Jupyter notebooks with plots drawn inline. The backend layer exists to support these use cases and outputs.
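A quick way to see the backend layer at work is to ask Matplotlib which backend it has picked for the current session. This is only a small sketch - the name printed depends on where you run it (a shell, a notebook, a server without a display), so no particular output is assumed here.

```python
import matplotlib

# Ask Matplotlib which backend it selected for this session.
# The value differs between environments (e.g. a GUI backend on a desktop,
# an inline backend in Jupyter, or a file-only backend on a server).
backend_name = matplotlib.get_backend()
print(backend_name)
```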
So, with what we have above, there are essentially 2 ways to create plots in Matplotlib -
- The procedural way - this is where we mostly do plt.xxx
- The object-oriented way - this is where we mostly do ax.xxx
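To make the difference concrete, here is a minimal sketch that draws the same line both ways. The x and y values are made up purely for illustration.

```python
import matplotlib.pyplot as plt

# Made-up values, only for illustration
x = [1, 2, 3]
y = [2, 4, 6]

# Procedural way: pyplot keeps track of the "current" figure and axes for us
plt.figure()
plt.plot(x, y)
plt.title("Procedural")

# Object-oriented way: we hold on to the Figure and Axes objects
# and call methods on them directly
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("Object-oriented")

plt.show()
```

Both figures look the same apart from the title. The difference is that in the second one we have an explicit ax object to customize.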
We will focus on the object-oriented way in this piece since it allows a lot more customization. Also, note that there are different ways to create an axes - an axes is contained in a figure, as mentioned above. These different ways of creating an axes produce the same result - I will highlight them. Now, let’s kick off some Matplotlib plotting by taking a look at 1.py below:
In 1.py above -
- Line 1 = pyplot is a module in Matplotlib which will help us in plotting. It is conventionally imported with the alias plt
- Line 3 = when plt.subplots() is called without any arguments, it creates 2 objects - a Figure object and an Axes object. The Figure object is like a container that holds the axes and a figure can contain multiple axes. The Axes object is where we plot our data to visualize it
- Line 4 = displays the plot - which is a figure with empty axes because no data has been added yet
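The steps above can be sketched as follows. This is a minimal illustration rather than a copy of 1.py, so the line numbers will not match the ones referenced in the text.

```python
import matplotlib.pyplot as plt

# With no arguments, plt.subplots() returns one Figure and one Axes
fig, ax = plt.subplots()

# The Figure is the container; it keeps a list of the axes it holds
print(fig.axes == [ax])

# Nothing has been plotted yet, so the axes is empty
print(ax.has_data())

plt.show()
```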
There are different ways to create an axes in Matplotlib - plt.subplot(), plt.subplots() and plt.axes() are all from the scripting layer, and they correspond to fig.add_subplot(), fig.subplots() and fig.add_axes() from the artist layer. Lines 6-44 show other ways of creating an axes which produce the same result.
- Line 16 - fig.add_subplot(1, 1, 1) means 1 row, 1 column, and the last argument gives the position of the subplot, which is the 1st subplot in this case - the last argument has to be at least 1 and less than or equal to the product of the 1st and 2nd arguments
- Line 28 - fig.subplots(1, 1) means 1 row and 1 column
- Lines 39 and 43 - the common argument [0.1, 0.1, 0.8, 0.8] is [left, bottom, width, height], given as fractions of the figure size. It places the axes 10% from the left of the figure and 10% from the bottom of the figure, with 80% of the figure’s width and 80% of its height.
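Here is a compact sketch of those alternatives side by side. It is my own arrangement, not the original file, and the variable names (ax1 to ax5, fig3 to fig5) are just placeholders.

```python
import matplotlib.pyplot as plt

# Scripting layer: each call works on the current figure
plt.figure()
ax1 = plt.subplot(1, 1, 1)             # 1 row, 1 column, 1st position

plt.figure()
ax2 = plt.axes([0.1, 0.1, 0.8, 0.8])   # [left, bottom, width, height]

# Artist layer: create an empty figure first, then add an axes to it
fig3 = plt.figure()
ax3 = fig3.add_subplot(1, 1, 1)        # 1 row, 1 column, 1st position

fig4 = plt.figure()
ax4 = fig4.subplots(1, 1)              # 1 row, 1 column

fig5 = plt.figure()
ax5 = fig5.add_axes([0.1, 0.1, 0.8, 0.8])

plt.show()
```

Each of the five figures ends up holding a single empty axes.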
Now let’s add some fictional data to our figure! See 2.py below -
In the line plot above, we can clearly see the temperature pattern: it increases from Jan to Jul, decreases from Aug to Nov, then starts increasing again. Imagine you have lots of data - would you rather go through the pain of reading off average temperatures from a table, or use the line plot which shows clearer trends in the data?
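A line plot like this takes only a few lines. The sketch below is not the original 2.py - the temperature values are placeholders I made up so that they follow the pattern described above (rising to Jul, falling to Nov, rising again).

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Placeholder values (assumed, not the article's data)
avg_temp = [24, 25, 26, 27, 28, 29, 30, 29, 28, 27, 26, 27]

fig, ax = plt.subplots()

# One call draws the line: months on the x-axis, temperatures on the y-axis
ax.plot(months, avg_temp)

plt.show()
```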
We can even add more fictional data like in 3.py below -
In the image above, we can clearly see -
- Abuja is warmer than Lagos for the first 8 months - perhaps you might prefer chilling in Lagos for the first 8 months of the year 😉?
- Abuja has a drop in temperature in September which is even lower than that of Lagos for the same month - seems you might want to travel back to Abuja this time perhaps to meet with family 😉?
- But hey, chill! It looks like for the rest of the year, Abuja’s temperature rises above that of Lagos - Ermmm! I think you might want to just chill in Lagos for a bit and monitor the trends for a while 😉?
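Adding a second city is just a second call to ax.plot() on the same axes. Again, this is a sketch with made-up numbers, chosen only so that they match the three observations above. The label arguments and the ax.legend() call are my own addition so we can tell the two lines apart.

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Placeholder values (assumed, not the article's data)
lagos_temp = [24, 25, 26, 27, 28, 29, 30, 29, 28, 27, 26, 27]
abuja_temp = [26, 27, 28, 29, 30, 31, 32, 31, 26, 29, 30, 31]

fig, ax = plt.subplots()

# Each call to plot() adds another line to the same axes
ax.plot(months, lagos_temp, label="Lagos")
ax.plot(months, abuja_temp, label="Abuja")

# The legend uses the label given to each line
ax.legend()

plt.show()
```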
The line plots we have seen so far show us the monthly trends in average temperature across different cities, but they do not yet communicate the data in a way that is easily understood. This is where we have to customize our plot in order to communicate the information more clearly. Let’s see 4.py below for some customizations -
In 4.py above -
- In lines 12-14, we added these arguments:
- marker which shows the actual data points
- markersize, markerfacecolor, markeredgewidth, markeredgecolor which customize the marker by increasing its size, adding colour to its fill and giving the marker outline a width and colour respectively
- linestyle, linewidth and color which give the line in the plot its style, width and color. linestyle can be shortened to ls while linewidth can be shortened to lw
- Line 16 - sets the label for the x-axis
- Line 17 - sets the label for the y-axis
- Line 18 - sets the title for the line plot and this provides context for our visualization
- Note that every customization is done before a call to plt.show() is made.
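Put together, the customizations listed above might look like the sketch below. It is not the original 4.py: the data, the colours, the label text and the °C unit are all assumptions on my part.

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Placeholder values (assumed, not the article's data)
avg_temp = [24, 25, 26, 27, 28, 29, 30, 29, 28, 27, 26, 27]

fig, ax = plt.subplots()

# plot() returns a list of lines; we unpack the single line it created
line, = ax.plot(
    months, avg_temp,
    marker="o",               # show the actual data points
    markersize=8,             # size of each marker
    markerfacecolor="white",  # fill colour of the marker
    markeredgewidth=2,        # width of the marker outline
    markeredgecolor="tab:red",
    linestyle="--",           # could also be written as ls="--"
    linewidth=2,              # could also be written as lw=2
    color="tab:red",
)

ax.set_xlabel("Month")
ax.set_ylabel("Average temperature (°C)")  # unit is an assumption
ax.set_title("Average monthly temperature")

# All customization happens before the plot is displayed
plt.show()
```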
To customize the line plot of Lagos and Abuja average monthly temperature, see 5.py below -
Sometimes, when we add more data to a plot, as in the customized line plot of Lagos and Abuja average monthly temperature values above, the plot gets so busy that it conceals patterns or trends in the data rather than conveying them. The solution to this is to use subplots. Subplots are several small plots which show the same kind of data under different conditions, e.g. temperature values for different cities. Let’s see 6.py below for an example -
In 6.py above -
- Line 16 - sharey=True ensures the subplots share the same range on the y-axis, based on the data from both datasets
- Line 22 - since the subplots are on top of each other, we can just add the x-axis label to the bottom plot
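A stacked pair of subplots with a shared y-axis can be sketched like this. It is not the original 6.py, so the line numbers differ, and the temperature values and label text are placeholders I made up.

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Placeholder values (assumed, not the article's data)
lagos_temp = [24, 25, 26, 27, 28, 29, 30, 29, 28, 27, 26, 27]
abuja_temp = [26, 27, 28, 29, 30, 31, 32, 31, 26, 29, 30, 31]

# 2 rows, 1 column: ax is a 1-dimensional array holding two axes.
# sharey=True gives both subplots the same y-axis range.
fig, ax = plt.subplots(2, 1, sharey=True)

ax[0].plot(months, lagos_temp, color="tab:blue")
ax[0].set_ylabel("Lagos")

ax[1].plot(months, abuja_temp, color="tab:red")
ax[1].set_ylabel("Abuja")

# The plots are stacked, so only the bottom one needs an x-axis label
ax[1].set_xlabel("Month")

plt.show()
```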
In the example given in 6.py above, we got a 1-dimensional array of axes objects since one of the dimensions is 1. When both dimensions are greater than 1, we get a 2-dimensional array, and we can access the axes objects in several ways, as we see in lines 29-69 -
- 1a, 1b and 1c show different ways we can access the axes object: via regular indexing we do in Python, via flattening the 2D array and via tuple unpacking respectively.
- The rest shows the different usage patterns when ax is an array of axes objects.
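The three access patterns can be sketched on a 2 x 2 grid as below. The variable names are placeholders of mine and the layout does not follow the original file.

```python
import matplotlib.pyplot as plt

# A 2 x 2 grid returns a 2-dimensional NumPy array of axes objects
fig, ax = plt.subplots(2, 2)

# 1a - regular indexing: row first, then column
top_left = ax[0, 0]
top_left.set_title("row 0, col 0")

# 1b - flattening the 2D array so we can loop with a single index
flat_axes = ax.flatten()
for i, single_ax in enumerate(flat_axes):
    single_ax.set_xlabel(f"subplot {i + 1}")

# 1c - tuple unpacking: name every axes as the grid is created
fig2, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2)
ax4.set_title("bottom right")

plt.show()
```

Flattening is handy when every subplot gets the same treatment, while unpacking reads better when each subplot shows something different.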
As seen above, you can create an axes in different ways. An axes added to the figure via fig.add_axes() is not a subplot, but a plain axes, which is an object of matplotlib.axes._axes.Axes. In older Matplotlib releases, an axes created the subplot way is a matplotlib.axes._subplots.AxesSubplot. This class derives from matplotlib.axes._axes.Axes, thus this subplot is an axes. Hence, every subplot is an Axes object but not every Axes object is an AxesSubplot object. (Newer Matplotlib releases merge the two classes, so a subplot may simply report its type as Axes.) An axes contains the x-axis and the y-axis. Be it singular or plural, it is still called axes.
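You can check this relationship yourself with isinstance(). In this small sketch the names free_axes and sub_axes are mine, and the class names printed on the last line depend on your Matplotlib version, so no particular output is assumed for them.

```python
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

# An axes placed by hand with [left, bottom, width, height]
fig1 = plt.figure()
free_axes = fig1.add_axes([0.1, 0.1, 0.8, 0.8])

# An axes created the subplot way
fig2, sub_axes = plt.subplots()

# Both are Axes objects, whichever way they were created
print(isinstance(free_axes, Axes), isinstance(sub_axes, Axes))

# The exact class names vary between Matplotlib versions
print(type(free_axes).__name__, type(sub_axes).__name__)
```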
Alrightee!!! Glad we are gradually demystifying the mystical Matplotlib library. I know you will eventually get the hang of it and begin your journey to becoming a Data Viz Wiz! Stay tuned to this series for my next article on Visualizing categorical and quantitative variables via Matplotlib! Have an amazing and fulfilled week ahead!