Sylvester Promise

Day 18 of improving my Data Science skills

[Figure: a boxplot that explains how to determine the IQR]

If AI is only as smart as the data we feed it, what happens when the data is lying to us?
Today, using the IQR method, I learned how to find and handle outliers, those sneaky values that can wreck averages and fool models.

Why the IQR method?
The interquartile range (IQR) covers the middle 50% of your data, from Q1 to Q3. Because it ignores the tails, it is robust to extreme values (unlike the mean and standard deviation), which makes it great for spotting values that sit far outside the common range.
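
To make the robustness point concrete, here is a minimal sketch using only Python's standard library. The two lists are made-up numbers (not from any real dataset), and `iqr` is a small helper I am assuming for illustration. The only difference between the lists is one extreme value.

```python
import statistics

def iqr(values):
    # quantiles(n=4) returns the three cut points: Q1, the median, and Q3
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Made-up samples: "dirty" swaps the last value for an extreme one
clean = [10, 12, 13, 14, 15, 16, 18]
dirty = [10, 12, 13, 14, 15, 16, 180]

for name, data in [("clean", clean), ("dirty", dirty)]:
    print(
        name,
        "mean:", round(statistics.mean(data), 2),
        "std:", round(statistics.stdev(data), 2),
        "IQR:", iqr(data),
    )
```

In this sample the mean and standard deviation jump when the extreme value appears, while the IQR stays the same because Q1 and Q3 never touch the tail.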

How do we use this method to find outliers?

Step 1: compute Q1 & Q3
Q1 = 25th percentile (lower quartile)
Q3 = 75th percentile (upper quartile)
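
Step 1 can be sketched like this, assuming a small made-up list of values and the standard library's `statistics.quantiles`. Different tools interpolate percentiles slightly differently, so treat the exact numbers as one convention rather than the only answer. In pandas or NumPy you would normally reach for `Series.quantile` or `numpy.percentile` instead.

```python
import statistics

# Hypothetical sample values, e.g. transaction amounts
values = [12, 15, 14, 10, 13, 16, 18, 95]

# method="inclusive" interpolates linearly between the sorted data points,
# treating the smallest value as the 0th percentile and the largest as the 100th
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

print("Q1:", q1)  # 25th percentile
print("Q3:", q3)  # 75th percentile
```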

Step 2: compute the IQR
IQR = Q3 - Q1

Step 3: determine outlier limits
Lower limit = Q1 - 1.5 * IQR
Upper limit = Q3 + 1.5 * IQR
Any value < lower limit or > upper limit is commonly considered an outlier.
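
The IQR and the two limits can be put together in a few lines. This is a sketch under the same assumptions as before: made-up values, and a hypothetical helper named `iqr_limits` where the multiplier `k` defaults to the conventional 1.5.

```python
import statistics

def iqr_limits(values, k=1.5):
    """Return (lower, upper) outlier limits using the k * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical sample values
values = [12, 15, 14, 10, 13, 16, 18, 95]

lower, upper = iqr_limits(values)

# Anything strictly outside the limits is a candidate outlier
outliers = [v for v in values if v < lower or v > upper]

print("limits:", lower, upper)
print("outliers:", outliers)
```

Keeping `k` as a parameter makes it easy to try a stricter or looser cutoff later without rewriting the logic.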

Step 4: label or extract outliers
Mark rows with values outside the limits, inspect them, then choose an action (keep, remove, or investigate).

Step 5: decide what to do
If outliers are data errors, fix or remove them.
If outliers are real rare events, consider keeping them and either using robust models or transforming the values.
Always document your decision; it matters for reproducibility.
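
Labeling and handling could look roughly like the sketch below. The records, the `amount` field, and the `iqr_limits` helper are all hypothetical. Capping at the limits and a log transform are just two common examples of "transforming" that I am assuming here, not the only options.

```python
import math
import statistics

def iqr_limits(values, k=1.5):
    """Return (lower, upper) outlier limits using the k * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical records; "amount" is the column being checked
amounts = [12, 15, 14, 10, 13, 16, 18, 95]
rows = [{"id": i, "amount": a} for i, a in enumerate(amounts, start=1)]
lower, upper = iqr_limits(amounts)

# Label first, so flagged rows can be inspected before anything is changed
for row in rows:
    row["is_outlier"] = not (lower <= row["amount"] <= upper)
flagged_ids = [row["id"] for row in rows if row["is_outlier"]]

# Option A: remove, for values confirmed to be data errors
removed = [row for row in rows if not row["is_outlier"]]

# Option B: keep every row but cap values at the limits (an assumed choice)
capped = [min(max(a, lower), upper) for a in amounts]

# Option C: keep every row and compress the scale with a log transform
logged = [math.log1p(a) for a in amounts]

print("flagged ids:", flagged_ids)
print("rows kept after removal:", len(removed))
```

Flagging instead of deleting right away keeps the original data intact, which also makes it easier to write down which option you picked and why.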

I am loving this data science learning journey each and every day😊

Thank you all for remaining my accountability partners, consciously, subconsciously and unconsciously🫣

When I get my first Data Science job in a finance company, I go share the money with una😁❤️

-SP🤍
