DEV Community

Janet Wafula


The Core Crew of Statistics: Meet Mean, Median, and Mode

When you first step into the world of data science, whether you’re analyzing the English Premier League table, studying Love Island voting patterns, or cleaning messy Excel sheets, you’ll come across one of the most fundamental ideas in statistics: measures of central tendency.

These measures, the mean, median, and mode, are like the compass of data analysis. They help us understand where the "center" of the data lies, summarizing a dataset in a single number. Simple as they seem, they matter a great deal when it comes to building models, making predictions, and communicating findings.

What Are the Measures of Central Tendency?

Mean (Average)

Add all the values and divide by the count.
Useful for: datasets without extreme outliers.
Example: If the ages of 5 islanders are 21, 23, 24, 25, and 70, the mean is 163 / 5 = 32.6, pulled well above the typical age by that 70-year-old bombshell.
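To make the arithmetic concrete, here is a minimal sketch using Python's built-in statistics module. The variable names are my own, and the second calculation (dropping the oldest islander) is only there to show how much one value moves the mean.

```python
from statistics import mean

# Ages of the five islanders from the example above
ages = [21, 23, 24, 25, 70]

# Mean = sum of values / number of values = 163 / 5
mean_age = mean(ages)

# Same group without the 70-year-old, to show the outlier's pull
mean_without_outlier = mean(ages[:-1])

print(mean_age)              # 32.6
print(mean_without_outlier)  # 23.25
```

One value out of five shifts the mean by more than nine years, which is why the mean works best when there are no extreme outliers.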

Median (Middle value)

Sort the values and pick the middle one (or the average of the two middle values if there is an even number of items).
Useful when: your data has outliers or is skewed.
Example: In the same age data, the median is 24, which better reflects the typical age of the islanders.
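A short sketch of both cases, odd and even counts, again with the standard statistics module. The four-item list is an illustrative variation on the example data, not something from a real dataset.

```python
from statistics import median

# Odd number of items: the middle value after sorting
ages = [21, 23, 24, 25, 70]
median_age = median(ages)

# Even number of items: the average of the two middle values (23 and 24)
even_ages = [21, 23, 24, 25]
median_even = median(even_ages)

print(median_age)   # 24
print(median_even)  # 23.5
```

statistics.median sorts the data internally, so the input does not need to be ordered first.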

Mode (Most frequent value)

The number that occurs most often.
Useful when: you're working with categorical data or measuring popularity.
Example: In Love Island votes, if most people vote for “Andreina”, then she’s the mode.
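The mode also works on non-numeric data, which a quick sketch can show. The vote lists below are made up for illustration, and "Islander B" and "Islander C" are placeholder names rather than real contestants.

```python
from collections import Counter
from statistics import mode, multimode

# Hypothetical votes; only "Andreina" comes from the example above
votes = ["Andreina", "Islander B", "Andreina", "Islander C", "Andreina", "Islander B"]

most_voted = mode(votes)                 # most frequent value
top_count = Counter(votes).most_common(1)  # value together with its count

# A dataset can have more than one mode; multimode returns all of them
tied_votes = ["Islander B", "Islander C", "Islander B", "Islander C"]
tied_modes = multimode(tied_votes)

print(most_voted)  # Andreina
print(top_count)   # [('Andreina', 3)]
print(tied_modes)  # ['Islander B', 'Islander C']
```

When ties are possible, multimode is the safer choice, since mode returns only the first of the tied values it encounters.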

Why They Matter in Data Science

As someone interested in working with raw data, cleaning it, and drawing insight, you’ll realize these three measures are often your first checkpoint. Here’s why:

1. They summarize big datasets quickly.
Whether it’s health data or financial transactions, instead of going through every row, a quick average or median gives you a starting point.

2. They help detect skewness or outliers.
If the mean is much higher than the median, you likely have a few large outliers. That’s a sign to investigate further before jumping to conclusions.
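One rough way to turn that check into code is to compare the gap between the mean and the median against a threshold. The helper name mean_median_gap and the 10% threshold are assumptions chosen for illustration, not a standard rule.

```python
from statistics import mean, median

def mean_median_gap(values):
    """Return (mean - median) / median as a rough skew signal.

    Assumes the median is non-zero; a positive result means the
    mean sits above the median.
    """
    m = mean(values)
    med = median(values)
    return (m - med) / med

ages = [21, 23, 24, 25, 70]
gap = mean_median_gap(ages)  # (32.6 - 24) / 24, roughly 0.36

# Hypothetical threshold: flag the data if the mean exceeds the median by over 10%
THRESHOLD = 0.10
needs_review = gap > THRESHOLD

print(round(gap, 2))  # 0.36
print(needs_review)   # True
```

A flag like this only says the data deserves a closer look, for example with a histogram or box plot; it does not identify which values are the outliers.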

3. They guide model building.
When preparing data for machine learning models, measures of central tendency often play a role in feature scaling, imputing missing values (for example, with fillna() in pandas), and choosing thresholds.
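As an illustration of the imputation idea, here is a plain-Python sketch that fills missing values with the median of the observed ones. The function name impute_with_median and the use of None to mark missing entries are assumptions for this example; in pandas, the usual equivalent is to pass a column's median to fillna().

```python
from statistics import median

def impute_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill_value = median(observed)
    return [fill_value if v is None else v for v in values]

# Hypothetical age column with two missing entries
ages = [21, None, 24, 25, 70, None]
filled = impute_with_median(ages)

print(filled)  # [21, 24.5, 24, 25, 70, 24.5]
```

The median is used here rather than the mean because the 70 would otherwise drag the fill value upward, which ties back to the point about outliers above.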

4. They make data storytelling easier.
Whether you're presenting to a team or writing a LinkedIn post about your latest Power BI dashboard, saying "the average customer age is 32" communicates far more than showing a list of 500 numbers.

Just like every story has a main character, every dataset has a center, a point that helps you make sense of everything else. Mean, median, and mode may seem like basic stats, but in data science they are your first clues, your sanity check, and sometimes even your problem solvers.
