DEV Community

ram vnet

Statistical Analysis in Data Science: Measures of Central Tendency

Measures of Central Tendency — Deep & Clear Explanation

Measures of central tendency are statistical tools used to describe the center or typical value of a dataset. They help us understand where most values lie and summarize large data into a single representative number.

The three main measures are:

Mean (Average)

Median

Mode


1️⃣ Mean (Arithmetic Mean)
🔹 Definition

The mean is the sum of all values divided by the total number of observations.

🔹 Formula
$$\text{Mean} = \frac{\sum x}{n}$$

Where:
∑x = sum of all values

n = number of observations

🔹 Example

Marks: 10, 20, 30, 40, 50

$$\text{Mean} = \frac{10 + 20 + 30 + 40 + 50}{5} = \frac{150}{5} = 30$$

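The same calculation can be sketched in Python. This is a minimal illustration, not a required approach: the variable names (`marks`, `manual_mean`, `library_mean`) are made up for this example, and `statistics.mean` from the standard library is shown only as a cross-check.

```python
from statistics import mean

# Marks from the example above
marks = [10, 20, 30, 40, 50]

# Manual calculation: sum of all values divided by the number of observations
manual_mean = sum(marks) / len(marks)

# Cross-check with the standard library
library_mean = mean(marks)

print(manual_mean)
print(library_mean)
```

Both values equal 30, matching the hand calculation.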
🔹 When to Use Mean

Data is numerical

Data is symmetrical

No extreme values (outliers)

🔹 Advantages

✔ Uses all observations
✔ Useful for further mathematical analysis
✔ Easy to understand

🔹 Limitations

❌ Highly affected by outliers

Example with outlier:
Income: 10k, 12k, 15k, 18k, 1,00,000k (that is, 100,000k)
→ Mean = 20,011k, far above four of the five incomes, so it no longer describes a typical value
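To see how strongly one extreme value pulls the mean, here is a small sketch using the income figures above (in thousands). The names `incomes_k`, `mean_income` and `median_income` are illustrative only, and the median is included purely for comparison.

```python
from statistics import mean, median

# Incomes in thousands; the last value is the outlier from the example
incomes_k = [10, 12, 15, 18, 100_000]

mean_income = mean(incomes_k)      # pulled far upward by the single outlier
median_income = median(incomes_k)  # middle value, unaffected by the outlier's size

print(mean_income)    # 20011
print(median_income)  # 15
```

The mean lands at 20,011k even though four of the five people earn 18k or less, while the middle value stays at 15k.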

2️⃣ Median
🔹 Definition

The median is the middle value when data is arranged in ascending or descending order.

🔹 Steps to Find Median

Arrange data in order

If odd number of values → middle value

If even number → average of two middle values

🔹 Example (Odd count)

Data: 5, 10, 15, 20, 25
Median = 15

🔹 Example (Even count)

Data: 10, 20, 30, 40
$$\text{Median} = \frac{20 + 30}{2} = 25$$
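The steps for finding the median can be sketched as a short function. `find_median` is a hypothetical helper written for this post; in practice `statistics.median` from the standard library does the same job.

```python
def find_median(values):
    """Return the median by sorting and picking the middle value(s)."""
    if not values:
        raise ValueError("cannot take the median of empty data")

    ordered = sorted(values)  # Step 1: arrange data in order
    n = len(ordered)
    mid = n // 2

    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]

    # Even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = find_median([5, 10, 15, 20, 25])
even_result = find_median([10, 20, 30, 40])

print(odd_result)   # 15
print(even_result)  # 25.0
```

Sorting first matters: passing the values in a different order, such as `[25, 5, 15, 10, 20]`, still gives 15.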

🔹 When to Use Median

Data contains outliers

Data is skewed

Income, salary, property prices

🔹 Advantages

✔ Not affected by extreme values
✔ Represents real-world data better in skewed cases

🔹 Limitations

❌ Depends only on the middle value(s), so it ignores the magnitude of the other observations
❌ Not suitable for advanced mathematical calculations

3️⃣ Mode
🔹 Definition

The mode is the value that occurs most frequently in the dataset.

🔹 Example

Data: 2, 4, 4, 6, 8
Mode = 4

🔹 Types of Mode

Unimodal – one mode

Bimodal – two modes

Multimodal – more than two modes

No mode – all values occur once
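The types above can be told apart by counting how many values share the highest frequency. The sketch below uses `statistics.multimode` (available in Python 3.8 and later); `describe_modes` is a hypothetical helper, and the "no mode" rule follows the definition given here (every value occurs exactly once).

```python
from statistics import multimode


def describe_modes(values):
    """Classify a dataset as unimodal, bimodal, multimodal, or having no mode."""
    modes = multimode(values)  # all values tied for the highest frequency

    # If every value appears exactly once, there is no mode
    if len(values) > 1 and len(modes) == len(values):
        return "no mode", []
    if len(modes) == 1:
        return "unimodal", modes
    if len(modes) == 2:
        return "bimodal", modes
    return "multimodal", modes


print(describe_modes([2, 4, 4, 6, 8]))           # ('unimodal', [4])
print(describe_modes([1, 1, 2, 2, 3]))           # ('bimodal', [1, 2])
print(describe_modes([1, 1, 2, 2, 3, 3, 4]))     # ('multimodal', [1, 2, 3])
print(describe_modes([1, 2, 3]))                 # ('no mode', [])
```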

🔹 When to Use Mode

Categorical data

Identifying most popular item
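For categorical data, finding the mode is just a frequency count. Here is a minimal sketch with `collections.Counter`; the list `sizes_sold` is invented sample data, not taken from any real shop.

```python
from collections import Counter

# Hypothetical categorical data: T-shirt sizes sold in one day
sizes_sold = ["M", "L", "M", "S", "M", "L", "XL"]

counts = Counter(sizes_sold)

# most_common(1) returns a list with the single most frequent (item, count) pair
most_popular_size, frequency = counts.most_common(1)[0]

print(most_popular_size, frequency)  # M 3
```

A mean or median would be meaningless for labels like "S" and "M", which is why the mode is the measure to reach for here.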

🔹 Advantages

✔ Works with non-numeric data
✔ Easy to identify

🔹 Limitations

❌ May not represent entire dataset
❌ Sometimes no clear mode

📌 Comparison Table
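| Measure | Uses All Data | Affected by Outliers | Best For |
|---------|---------------|----------------------|----------|
| Mean | ✅ Yes | ❌ Yes | Symmetrical data |
| Median | ❌ No | ✅ No | Skewed data |
| Mode | ❌ No | ✅ No | Categorical data |
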
📉 Relationship Between Mean, Median & Mode
1️⃣ Symmetrical Distribution

Mean = Median = Mode (when the distribution also has a single peak)

2️⃣ Positively Skewed (Right Skewed)

Mean > Median > Mode (the typical ordering: the long right tail pulls the mean up the most)

3️⃣ Negatively Skewed (Left Skewed)

Mean < Median < Mode (the typical ordering: the long left tail pulls the mean down the most)
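These orderings can be checked on small datasets. The two lists below are made-up examples chosen to have a long tail on one side; real data will not always follow the ordering exactly, so treat this as an illustration rather than a rule.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: most values are small, a few are large
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 12]

# Hypothetical left-skewed data: the mirror image of the list above
left_skewed = [2, 5, 9, 10, 11, 11, 12, 12, 12, 13]

right_mean, right_median, right_mode = (
    mean(right_skewed), median(right_skewed), mode(right_skewed)
)
left_mean, left_median, left_mode = (
    mean(left_skewed), median(left_skewed), mode(left_skewed)
)

print(right_mean, right_median, right_mode)  # 4.3 3.0 2
print(left_mean, left_median, left_mode)     # 9.7 11.0 12
```

In the first list the mean sits above the median, which sits above the mode; in the second list the order is reversed.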

🎯 Real-World Examples
| Scenario | Best Measure | Reason |
|----------|--------------|--------|
| Student marks | Mean | No extreme values |
| Salaries | Median | High income outliers |
| Shoe size in shop | Mode | Most demanded size |
| Customer rating | Median/Mode | Skewed ratings |

🔍 Importance in Data Science & Business

In data science and analytics, these measures help with:

Understanding data distribution

Feature analysis

Data pre-processing

Business decisions

Dashboard insights (Tableau / Power BI)
