Calculating weighted averages with numpy and Python!

#python #datascience #beginners #tutorial

Introduction

Working with data often means handling scenarios where not all data points are equally important

This is where the weighted average, a statistical tool that assigns importance to each value, helps us incorporate the context of a situation into our average calculations!

With Python's versatile ecosystem, we can leverage tools such as numpy to quickly and efficiently calculate the weighted average in our analyses and data projects

Prerequisites and installation
What is the weighted average?
Examining a simple example
Using np.average to calculate weighted average
Conclusion
Additional resources

Chris Greening - Software Developer

Hey! My name's Chris Greening and I'm a software developer from the New York metro area with a diverse range of engineering experience - beam me a message and let's build something great!

christophergreening.com

Prerequisites and installation

You'll need the following package installed to follow along with this blog post!

numpy

To install it, open your preferred terminal/console and run:

pip3 install numpy

What is the weighted average?

The weighted average is an extension of the ordinary arithmetic mean that takes the importance (or weight) of each data point into account: each value is multiplied by its weight, and the sum of those products is divided by the sum of the weights

In scenarios where all data points have the same importance, the weighted average simplifies to the standard arithmetic mean. However, when the significance of each data point varies, the weighted average becomes a vital tool
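
To make the definition concrete, here is a minimal pure-Python sketch of the calculation. The helper name weighted_average is hypothetical (it is not part of numpy) and the sample values are made up for illustration:

```python
def weighted_average(values, weights):
    """Return sum(value * weight) / sum(weight).

    Hypothetical helper for illustration only; not part of numpy.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


values = [10, 20, 30]  # made-up sample data

# Equal weights: the result matches the plain arithmetic mean
equal = weighted_average(values, [1, 1, 1])
plain_mean = sum(values) / len(values)

# Unequal weights: the last value counts three times as much as the others
skewed = weighted_average(values, [1, 1, 3])

print(equal, plain_mean, skewed)  # 20.0 20.0 24.0
```

With equal weights the result is identical to the plain mean, while giving the last value more weight pulls the average toward it.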

Examining a simple example

Let's consider an example where we are a data scientist employed by a university to calculate the average student grade across all classes in the school

To preserve the privacy of individual students, we are only provided data aggregated at the class level, so for each class we are given its:

average grade
number of students

Our initial instinct might be to just take the usual average across all classes, but what happens when comparing small classes to very large classes?

If a class has an average test score of 20/100 but only has 4 students, is it fair to give it the same importance as a class that has an average test score of 93/100 and 500 students? No!

If we did, the small class would be given an outsized level of importance: the test grades of just 4 students should not impact the overall mean as much as those of 500 students

So how do we incorporate the number of students into our university grade average?

With the weighted average!
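
As a rough sketch of how much this matters, the two classes described above can be compared by hand in plain Python. The variable names here are assumptions chosen for illustration:

```python
# The two classes from the example above (names are illustrative)
class_averages = [20, 93]
class_sizes = [4, 500]

# Plain mean: both classes count equally, regardless of size
unweighted = sum(class_averages) / len(class_averages)

# Weighted mean: each class counts in proportion to its number of students
weighted = sum(
    avg * size for avg, size in zip(class_averages, class_sizes)
) / sum(class_sizes)

print(f"unweighted: {unweighted:.2f}")  # unweighted: 56.50
print(f"weighted:   {weighted:.2f}")    # weighted:   92.42
```

The plain mean lands halfway between the two classes, while the weighted mean stays close to the score of the 500-student class.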

Using np.average to calculate weighted average

Continuing with the previous example, let's say these are the average grades and the respective number_of_students for each class:

grades = [20, 93, 56, 79, 100, 86]
number_of_students = [4, 500, 93, 274, 12, 30]

To get the weighted average across the entire university using numpy, all we have to do is pass the weights to np.average:

import numpy as np

# Weight each class average by its number of students
university_average = np.average(grades, weights=number_of_students)
print(university_average)

# Output: 84.57174151150055
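
One way to sanity-check this result is to repeat the calculation by hand and compare. The sketch below also passes returned=True, an optional np.average argument that additionally returns the sum of the weights. The names manual and total_students are assumptions for illustration:

```python
import numpy as np

grades = [20, 93, 56, 79, 100, 86]
number_of_students = [4, 500, 93, 274, 12, 30]

# Manual calculation: sum of (grade * class size) divided by total students
manual = sum(
    g * n for g, n in zip(grades, number_of_students)
) / sum(number_of_students)

# returned=True makes np.average also return the sum of the weights
university_average, total_students = np.average(
    grades, weights=number_of_students, returned=True
)

print(np.isclose(manual, university_average))  # True
print(total_students)  # 913.0, the total number of students
```

Both approaches agree, and the sum of the weights confirms the average covers all 913 students.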

Conclusion

And just like that, we're able to quickly incorporate the weighted average into our projects by leveraging np.average's weights argument

Thanks so much for reading and if you liked my content, be sure to check out some of my other work or connect with me on social media or my personal website 😄

Cheers!