DEV Community


How I am learning machine learning - week 6: python and matplotlib (part two)

Gabriele Boccarusso · 5 min read

Last week we saw the fundamentals of the matplotlib package and what we can do with it; now we'll see how to use it effectively together with pandas and numpy.


Pandas dataframes

Pandas DataFrames are the most common data structure we'll use, and we can of course plot them with matplotlib. When we introduced pandas we used a sample about baseball players with just ten rows, because I had deleted all the others to simplify learning. Now we can use the complete dataset, which you can find here.

[Image: uploading data from a CSV file into a pandas DataFrame]

But we have a problem: the column names are badly formatted. We can fix this in a few steps: first, we look at the column names with the keys function, and then we rename them with the rename function:

[Image: renaming columns in a pandas DataFrame]
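Since both steps are shown only as screenshots, here is a minimal text sketch of them. It assumes a hypothetical CSV whose header has a stray space after each comma (an inline string read through io.StringIO stands in for the real file, and the rows are made up); with the real file you would pass its path to pd.read_csv instead. keys() returns the column Index, and rename() maps each old name to a clean one.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file: the space after each comma in
# the header is kept by pandas as part of the column name.
csv_text = (
    "Name, Height(inches), Weight(pounds), Age\n"
    "Player A,74,180,22.99\n"
    "Player B,74,215,34.69\n"
    "Player C,72,210,30.78\n"
)
bb_players = pd.read_csv(io.StringIO(csv_text))

# Step 1: inspect the column names (keys() returns the same Index as .columns)
print(list(bb_players.keys()))

# Step 2: rename the badly formatted columns; columns missing from the dict stay as they are
bb_players = bb_players.rename(columns={
    " Height(inches)": "Height (inches)",
    " Weight(pounds)": "Weight (lbs)",
    " Age": "Age",
})
print(list(bb_players.columns))
```

The new names match the ones used in the rest of this post, so the later examples can refer to 'Height (inches)' and 'Weight (lbs)' directly.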

Plotting a dataframe

We can quickly plot our DataFrame with its plot function. This is not the recommended way, but it works fine for quick plots:

[Image: plotting through a DataFrame with the matplotlib plot function]
This way we get a quick overview of our data. Another example is the hist function:

[Image: plotting a histogram from a pandas DataFrame with the dedicated matplotlib hist function]
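A minimal sketch of these two quick plots, assuming bb_players already has the renamed columns (here a tiny DataFrame with made-up values stands in for the real data). DataFrame.plot() draws one line per numeric column against the row index, and DataFrame.hist() draws one histogram subplot per numeric column.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for bb_players (made-up values, renamed columns)
bb_players = pd.DataFrame({
    "Age": [22.99, 34.69, 30.78, 35.43, 35.71, 29.39],
    "Height (inches)": [74, 74, 72, 72, 73, 69],
    "Weight (lbs)": [180, 215, 210, 210, 188, 176],
})

# Quick overview: one line per numeric column, row index on the x-axis
line_ax = bb_players.plot()

# Quick distributions: one histogram subplot per numeric column
hist_axes = bb_players.hist(bins=5, figsize=(10, 7))

# In a plain script (outside Jupyter), call plt.show() to open the figures
```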

The object-oriented method

Now we know how to quickly look at our data, but in most cases we'll need well-structured, customized plots, and for this the object-oriented (OO) method is more effective. Let's see an example:

[Image: creating a plot using the object-oriented method]
Now that we have our plot, let's take a moment to understand what's going on: we create a fig and an ax and give the figure a size, then we call the plot function on our DataFrame, and in the call we declare:

  • the type of plot that we want (scatter)
  • the x-axis (the age of every player)
  • the y-axis (the weight of our players)
  • the (c)olor of each point, based on a third value (the player's height in inches)
  • the ax we are plotting on. In this case there is only one, but if there were more we could have typed ax = ax[0, 0] or ax = axis_name.
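The call described by the list above can be sketched like this, again with a small made-up stand-in for bb_players. pandas takes x, y and c as column names; the last lines show how a grid of Axes is indexed when the figure has more than one, which is what ax = ax[0, 0] refers to.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for bb_players (made-up values)
bb_players = pd.DataFrame({
    "Age": [22.99, 34.69, 30.78, 35.43, 35.71, 29.39],
    "Height (inches)": [74, 74, 72, 72, 73, 69],
    "Weight (lbs)": [180, 215, 210, 210, 188, 176],
})

# One figure with a single Axes
fig, ax = plt.subplots(figsize=(10, 7))
bb_players.plot(kind="scatter",
                x="Age",              # x-axis: age of every player
                y="Weight (lbs)",     # y-axis: weight of every player
                c="Height (inches)",  # point color taken from a third column
                ax=ax)                # the Axes to draw on

# With a grid of Axes, pick the target one by its (row, column) position
fig2, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 7))
bb_players.plot(kind="scatter", x="Age", y="Weight (lbs)", ax=axes[0, 0])
```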

But we are still not fully using the OO method. To do so, we should use the subplots function and work only with the objects it returns:

import matplotlib.pyplot as plt
# bb_players is the DataFrame loaded and renamed above

figure, axis = plt.subplots(figsize=(10, 7))
# creating the plot, same as before
first_plot = axis.scatter(x = bb_players['Age'],
             y = bb_players['Weight (lbs)'],
             c = bb_players['Height (inches)'])

# adding information
axis.set(title="correlations between baseball players weight and age",
        xlabel="players age",
        ylabel="players weight")
# adding the legend: the colors encode the height, not the weight
axis.legend(*first_plot.legend_elements(), title = 'Height in inches');
# legend_elements() returns a (handles, labels) tuple and the "*"
# unpacks it into the two positional arguments of legend()

What we are doing here is simply creating a figure and customizing its plot.
The result will be:

[Image: the result of a plot made entirely with the object-oriented method]
The screenshot contains some typing errors; the most obvious is the legend referring to weight instead of height.
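To make the "*" in the legend line less magical: legend_elements() returns a tuple of two lists, the legend handles and their labels, and "*" unpacks that tuple into the two positional arguments of legend(). The sketch below (made-up data) writes the same call without the star, with a legend title that matches what the colors actually encode.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for bb_players (made-up values)
bb_players = pd.DataFrame({
    "Age": [22.99, 34.69, 30.78, 35.43, 35.71, 29.39],
    "Height (inches)": [74, 74, 72, 72, 73, 69],
    "Weight (lbs)": [180, 215, 210, 210, 188, 176],
})

figure, axis = plt.subplots(figsize=(10, 7))
first_plot = axis.scatter(x=bb_players["Age"],
                          y=bb_players["Weight (lbs)"],
                          c=bb_players["Height (inches)"])

# legend_elements() returns (handles, labels) for the color values
handles, labels = first_plot.legend_elements()

# Same as axis.legend(*first_plot.legend_elements(), title="Height in inches")
axis.legend(handles, labels, title="Height in inches")
```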

Now it should be clear how to make a figure with the OO method. The same approach works for a figure with more plots in it; an example of code may be:

import matplotlib.pyplot as plt
# bb_players is the DataFrame loaded and renamed above

# creating a figure with two subplots (2 rows, 1 column)
figure, (first_plot, second_plot) = plt.subplots(nrows = 2,
                                                 ncols = 1,
                                                 figsize = (12, 20))

scatter_plot = first_plot.scatter(x = bb_players['Age'],
                   y = bb_players['Weight (lbs)'],
                   c = bb_players['Height (inches)'])
first_plot.set(title="correlations between baseball players weight and age",
               xlabel="age of players",
               ylabel="weight of players")
first_plot.legend(*scatter_plot.legend_elements(), 
                  title = 'Height in inches');

# putting a line on the average data
first_plot.axhline(y = bb_players['Weight (lbs)'].mean(), 
                   color = 'r', 
                   linestyle = '-')

# second plot
second_plot.hist(x = bb_players['Age'], 
                 bins = 20)
second_plot.set(title = "Number of players of each age",
                xlabel="age of players",
                ylabel="number of players");

As you can see from the code, we set the number of rows and columns of the figure and the number of bins of the histogram. Bins are the intervals into which the range of the x-axis is divided: each bar counts how many values fall into its interval. A histogram with seven bins will look similar to:

[Image: example of a histogram with seven bins (source)]
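To see what bins means numerically, numpy.histogram (which matplotlib's hist uses internally) returns the count for each bin and the bin edges. With bins=7 the range from the smallest to the largest value is cut into 7 equal-width intervals, so there are 8 edges. The ages below are made up.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up player ages
ages = np.array([21, 22, 23, 25, 26, 28, 29, 30, 31, 33, 35, 36, 40])

# 7 bins -> 8 edges, evenly spaced from min(ages) to max(ages)
counts, edges = np.histogram(ages, bins=7)
print(edges)
print(counts)  # number of ages falling into each interval

# Axes.hist computes the same counts and draws one bar per bin
fig, ax = plt.subplots()
n, bins, patches = ax.hist(ages, bins=7)
```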
axhline is a very simple function, explained here: it draws a horizontal line across the whole plot at height y. In this case, we pass the mean of the weight column as y.
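A minimal sketch of the mean line on its own, using made-up weights: the column mean is computed with pandas and passed as y, so the horizontal line marks the average weight across the whole width of the Axes.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for bb_players (made-up values)
bb_players = pd.DataFrame({
    "Age": [22.99, 34.69, 30.78, 35.43, 35.71, 29.39],
    "Weight (lbs)": [180, 215, 210, 210, 188, 176],
})

fig, ax = plt.subplots()
ax.scatter(bb_players["Age"], bb_players["Weight (lbs)"])

# 1179 / 6 = 196.5 for these made-up values
mean_weight = bb_players["Weight (lbs)"].mean()
mean_line = ax.axhline(y=mean_weight, color="r", linestyle="-")
```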

Now that the code is clear, we can see the result:

[Image: making a figure with two plots using matplotlib]

Customizing and saving the plot

There are various ways to customize our plots in matplotlib: we can apply a general style and then modify its details if we want.
The basic command to list the available styles is:

plt.style.available

[Image: viewing all the available styles in matplotlib]

Using a matplotlib style can make our graph more meaningful. Here is the difference between the default style and one that you can choose:

[Image: matplotlib plot using the default style]

[Image: matplotlib plot styled with one of the available styles]
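A sketch of listing and applying a style, with made-up data. plt.style.context applies a style only to the plots created inside a with block, while plt.style.use would change it for everything that follows; "ggplot" is one of the styles shipped with matplotlib.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for bb_players (made-up values)
bb_players = pd.DataFrame({
    "Age": [22.99, 34.69, 30.78, 35.43, 35.71, 29.39],
    "Weight (lbs)": [180, 215, 210, 210, 188, 176],
})

# Names of all the available styles
print(plt.style.available)

# Apply a style only to the plots created inside this block
with plt.style.context("ggplot"):
    fig, ax = plt.subplots(figsize=(10, 7))
    ax.scatter(bb_players["Age"], bb_players["Weight (lbs)"])
    ax.set(title="Weight and age of baseball players",
           xlabel="players age", ylabel="players weight")

# plt.style.use("ggplot") would instead keep the style for the rest of the session
```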

But even with a slightly different look, the plot may still be confusing. Matplotlib offers, among other options, the cmap parameter (the colormap) to change how values are turned into colors:

[Image: using the cmap parameter in a scatter plot]

The full reference is available on the official documentation page.
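A minimal sketch of the cmap parameter on a scatter plot, with made-up data: the c values (height) are mapped to colors through the chosen colormap, and a colorbar shows which color corresponds to which height. "viridis" is just one of matplotlib's built-in colormaps.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for bb_players (made-up values)
bb_players = pd.DataFrame({
    "Age": [22.99, 34.69, 30.78, 35.43, 35.71, 29.39],
    "Height (inches)": [74, 74, 72, 72, 73, 69],
    "Weight (lbs)": [180, 215, 210, 210, 188, 176],
})

fig, ax = plt.subplots(figsize=(10, 7))
points = ax.scatter(x=bb_players["Age"],
                    y=bb_players["Weight (lbs)"],
                    c=bb_players["Height (inches)"],
                    cmap="viridis")  # colormap turning height values into colors

# Colorbar drawn in its own Axes next to the plot
fig.colorbar(points, ax=ax, label="Height in inches")
```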

The last step is to save the figure as an image with the savefig function that we saw in the last post, and everything is ready to show our data to others.
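A short reminder of how that looks for an OO figure, with made-up data: savefig is called on the Figure object and writes every subplot into one image. The file name is a hypothetical example; bbox_inches="tight" trims the extra white space around the plots.

```python
import os

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for bb_players (made-up values)
bb_players = pd.DataFrame({
    "Age": [22.99, 34.69, 30.78, 35.43, 35.71, 29.39],
    "Weight (lbs)": [180, 215, 210, 210, 188, 176],
})

figure, (first_plot, second_plot) = plt.subplots(nrows=2, ncols=1, figsize=(12, 20))
first_plot.scatter(bb_players["Age"], bb_players["Weight (lbs)"])
second_plot.hist(bb_players["Age"], bins=20)

# Write the whole figure (both subplots) to a PNG file
figure.savefig("baseball_players.png", dpi=100, bbox_inches="tight")
print(os.path.exists("baseball_players.png"))
```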

Final thoughts

Today we saw the last part of the matplotlib package. Now that we know pandas, numpy, and matplotlib well enough, we can see how scikit-learn works and begin working on some models.
If you have any doubt, feel free to leave a comment.
