Chris Greening

Calculating weighted averages with numpy and Python!

Introduction

Working with data often means handling scenarios where not all data points are equally important

This is where the weighted average, a statistical tool that assigns importance to each value, helps us incorporate the context of a situation into our average calculations!

import numpy as np

With Python's versatile ecosystem we're able to leverage tools such as numpy to quickly and efficiently calculate the weighted average in our analyses and data projects


Chris Greening - Software Developer

Hey! My name's Chris Greening and I'm a software developer from the New York metro area with a diverse range of engineering experience - beam me a message and let's build something great!

christophergreening.com

Prerequisites and installation

numpy is the only package you need to install to follow along with this blog post!

To install it open your preferred terminal/console and run:

pip3 install numpy
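A quick way to confirm the installation worked is to import numpy and print its version. This is just a minimal check, and the version you see depends on your environment:

```python
import numpy as np

# If this import succeeds, numpy is installed in the active environment
print(np.__version__)
```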

What is the weighted average?

The weighted average is an extension of a typical arithmetic mean that includes the importance (or weight) of each data point when calculating the average

In scenarios where all data points have the same importance, the weighted average simplifies to the standard arithmetic mean. However, when the significance of each data point varies, the weighted average becomes a vital tool
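As a rough sketch of the definition, the weighted average is the sum of each value multiplied by its weight, divided by the sum of the weights. The values and weights below are hypothetical and chosen only for illustration:

```python
import numpy as np

# Hypothetical values and weights, not taken from any real dataset
values = np.array([70.0, 80.0, 90.0])
weights = np.array([1.0, 2.0, 3.0])

# Weighted average: sum of (weight * value) divided by the sum of the weights
weighted_average = np.sum(weights * values) / np.sum(weights)

# With equal weights the same formula reduces to the arithmetic mean
equal_weights = np.ones_like(values)
equal_weighted_average = np.sum(equal_weights * values) / np.sum(equal_weights)

print(weighted_average)
print(equal_weighted_average, np.mean(values))
```

The first result is pulled toward 90 because that value carries the largest weight, while the equal-weight result matches np.mean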

Examining a simple example

Let's consider an example where we are a data scientist employed by a university to calculate the average student grade across all classes in the school

To preserve the privacy of individual students we are only provided data aggregated at the class level, so for each class we are given its:

  • average grade
  • number of students

Our initial instinct might be to just take the usual average across all classes, but what happens when comparing small classes to very large classes?

If one class has an average test score of 20/100 but only 4 students, is it fair to give it the same weight as a class with an average test score of 93/100 and 500 students? No!

If we did that, the small class would be given an outsized level of importance: the test grades of just 4 students should not impact the overall mean as much as those of 500 students

So how do we incorporate the number of students into our university grade average?

With the weighted average!
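To make the difference concrete, here is a small sketch using only the two classes mentioned above (an average of 20 with 4 students and an average of 93 with 500 students). The variable names are just for illustration:

```python
# The two classes from the example: (average grade, number of students)
small_class_average, small_class_size = 20, 4
large_class_average, large_class_size = 93, 500

# Plain average of the two class averages: both classes count equally
naive_mean = (small_class_average + large_class_average) / 2

# Weighted average: each class counts in proportion to its number of students
weighted_mean = (
    small_class_average * small_class_size
    + large_class_average * large_class_size
) / (small_class_size + large_class_size)

print(naive_mean)               # 56.5
print(round(weighted_mean, 2))  # 92.42
```

The plain average lands halfway between the two classes, while the weighted average stays close to 93 because almost all of the students are in the large class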

Using np.average to calculate weighted average

Continuing with the previous example, let's say these are the average grades per class and their respective number_of_students:

grades = [20, 93, 56, 79, 100, 86]
number_of_students = [4, 500, 93, 274, 12, 30]

To get the weighted average across the entire university using numpy, all we have to do is pass number_of_students to np.average through its weights argument:

import numpy as np
university_average = np.average(grades, weights=number_of_students)
print(university_average)

# Output: 84.57174151150055
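As a sanity check, the same number can be reproduced by hand with the weighted-sum formula and compared against the plain unweighted mean. This is only a verification sketch, not a replacement for np.average:

```python
import numpy as np

grades = [20, 93, 56, 79, 100, 86]
number_of_students = [4, 500, 93, 274, 12, 30]

# Sum of (grade * students) divided by the total number of students
manual_average = np.dot(grades, number_of_students) / np.sum(number_of_students)

# Plain mean of the class averages, ignoring class sizes
unweighted_average = np.mean(grades)

# The manual calculation agrees with np.average
print(np.isclose(manual_average, np.average(grades, weights=number_of_students)))  # True
print(round(unweighted_average, 2))  # 72.33
```

The unweighted mean is noticeably lower because it lets the 4-student class with an average of 20 count as much as the 500-student class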

Conclusion

And just like that we're able to quickly incorporate the weighted average into our projects by leveraging np.average's weights argument

Thanks so much for reading and if you liked my content, be sure to check out some of my other work or connect with me on social media or my personal website 😄


Cheers!

