
Nitin Kendre


Outliers

My simple definition of an outlier is:

An outlier is a value in a dataset that is extremely high or extremely low compared with the other values in the dataset.

[Figure: Outliers]

In the graph above, we can see that values below 50 and values greater than 150 are outliers.

Finding Outliers from a Dataset

An outlier satisfies either one of the conditions below.

1. outlier < Q1 - 1.5*(IQR)
2. outlier > Q3 + 1.5*(IQR)

where

  1. IQR = Interquartile range
  2. Q1 = Lower Quartile
  3. Q2 = Median or 2nd Quartile
  4. Q3 = Upper Quartile

From the above, the 1st rule means the data point lies more than 1.5 times the IQR below the lower quartile, and the 2nd rule means the data point lies more than 1.5 times the IQR above the upper quartile.
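
The two rules can be written as a small check. This is only a sketch: the function names `iqr_fences` and `is_outlier`, the parameter `k`, and the sample quartiles 10 and 20 are assumptions made for illustration and do not come from the graph or the datasets in this article.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cut-off values from the two rules."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3, k=1.5):
    """True if the value lies strictly outside the cut-offs."""
    lower, upper = iqr_fences(q1, q3, k)
    return value < lower or value > upper


# Illustrative quartiles only (hypothetical, not from the article's data)
print(iqr_fences(10, 20))      # (-5.0, 35.0)
print(is_outlier(40, 10, 20))  # True
print(is_outlier(35, 10, 20))  # False, 35 is on the cut-off, not beyond it
```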

To find outliers, we first have to calculate Q1, Q2, Q3 and the IQR.

Finding the upper quartile, median, lower quartile and interquartile range in an odd dataset (a dataset with an odd number of values):

Suppose we have -

3,5,1,4,2,6,7

The first step is to sort the given data in ascending order.

1,2,3,4,5,6,7

Here the lowest value is 1, i.e. MIN, and the highest value is 7, i.e. MAX.
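
The sorting step, together with MIN and MAX, could look like this in Python. The variable names are my own choice for illustration.

```python
data = [3, 5, 1, 4, 2, 6, 7]

# sorted() returns a new list and leaves the original untouched
sorted_data = sorted(data)

# After sorting, the first value is MIN and the last value is MAX
minimum = sorted_data[0]
maximum = sorted_data[-1]

print(sorted_data)       # [1, 2, 3, 4, 5, 6, 7]
print(minimum, maximum)  # 1 7
```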

Calculating Q2 in an odd dataset:

Q2 is the median, or 2nd quartile. In this step we will calculate it.

Our data contains an odd number of values, i.e. 7.
So, when we divide it into two equal parts, there will be one middle value, i.e. 4.

(1,2,3),4,(5,6,7)

So here 4 is the median or Q2 value.

To verify it, or as an alternate way to calculate it, we can use the formula -

index of median = (total_no_of_values+1)/2

Here, (7+1)/2 = 4, which means the median is the number at place 4 in the sorted dataset or array. The value at place 4 is 4.

So, Q2 = 4.
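
As a rough sketch, the place formula translates to Python as follows. The formula counts places from 1 while Python lists count from 0, so the code subtracts 1 before indexing. The variable names are assumptions for illustration.

```python
sorted_data = [1, 2, 3, 4, 5, 6, 7]
n = len(sorted_data)

# 1-based place of the median for an odd number of values
median_place = (n + 1) // 2

# Python indexes from 0, so place 4 is index 3
q2 = sorted_data[median_place - 1]

print(median_place, q2)  # 4 4
```

In this dataset the place and the value are both 4 only because the data happens to be 1 to 7. In general the formula gives the place, and the value has to be read from the sorted data.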

Calculating Q1 in an odd dataset:

Initial Dataset :

(1,2,3),4,(5,6,7)

So, to calculate the lower quartile or Q1, we have to take the first half of the data (the values before the median). That is -

1,2,3

Here also we have to pick the middle value, i.e. 2.

The formula to calculate the place of Q1 is -

Q1_place = (total_number_of_values_in_first_half+1)/2
Q1_place = (3+1)/2
Q1_place = 2

That means Q1 is at place 2 in the first half.
So, Q1 = 2.
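
The Q1 steps above can be sketched in Python like this. It assumes, as in this example, that the first half itself has an odd number of values, so there is a single middle value. The variable names are illustrative.

```python
sorted_data = [1, 2, 3, 4, 5, 6, 7]
n = len(sorted_data)

# For an odd count the median sits at index n // 2 and is left out of the half
lower_half = sorted_data[: n // 2]

# 1-based place of the middle value inside the first half
q1_place = (len(lower_half) + 1) // 2
q1 = lower_half[q1_place - 1]

print(lower_half, q1_place, q1)  # [1, 2, 3] 2 2
```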

Calculating Q3 in an odd dataset:

It is similar to calculating Q1, but instead of the first half we have to take the second half.

(1,2,3),4,(5,6,7)

5,6,7

So, the middle value is Q3 i.e. 6

Formula:

Q3_place = (total_no_of_values_in_last_half+1)/2
Q3_place = (3+1)/2
Q3_place = 2

That means Q3 is at the 2nd place in the second half.
So, Q3 = 6.
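
A matching sketch for Q3, again assuming the second half has an odd number of values. The slice starts one position after the median so that the median itself is not included. The variable names are illustrative.

```python
sorted_data = [1, 2, 3, 4, 5, 6, 7]
n = len(sorted_data)

# Skip the median at index n // 2 and take everything after it
upper_half = sorted_data[n // 2 + 1 :]

# 1-based place of the middle value inside the second half
q3_place = (len(upper_half) + 1) // 2
q3 = upper_half[q3_place - 1]

print(upper_half, q3_place, q3)  # [5, 6, 7] 2 6
```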

Calculating IQR in an odd dataset:

The formula for calculating the IQR is -

IQR = Q3 - Q1

To find the IQR of the data above -

IQR = 6-2
IQR = 4

Finding an outlier in an odd dataset:

The given data is -

1,2,3,4,5,6,7

We have calculated -

MIN = 1
Q1 = 2
MED = 4
Q3 = 6
MAX = 7
IQR = 4

Now we can check whether there are any outliers in the data -
For a data point to be an outlier, it must satisfy either one of the rules below.

outlier < Q1 - 1.5*(IQR)

OR

outlier > Q3 + 1.5*(IQR)

So, to find an outlier, we have to calculate the lower and upper limits from these rules.

outlier < Q1-1.5*(IQR)
outlier < 2-1.5*(4)
outlier < 2-6.0
outlier < -4

There are no minimum value outliers, because there is no value in the dataset less than -4.

Next -

outlier > Q3 + 1.5*(IQR)
outlier > 6 + 1.5*4
outlier > 6 + 6
outlier > 12

And there are no maximum value outliers in the data, because there is no value greater than 12.
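
The whole check for the odd dataset can be put together in a few lines. This is a sketch that reuses the quartiles worked out by hand above. The names `lower_bound` and `upper_bound` are my own labels for the two limits.

```python
sorted_data = [1, 2, 3, 4, 5, 6, 7]
q1, q3 = 2, 6  # quartiles calculated by hand above

iqr = q3 - q1                 # 4
lower_bound = q1 - 1.5 * iqr  # 2 - 6.0
upper_bound = q3 + 1.5 * iqr  # 6 + 6.0

# Keep only the values that break one of the two rules
outliers = [x for x in sorted_data if x < lower_bound or x > upper_bound]

print(lower_bound, upper_bound)  # -4.0 12.0
print(outliers)                  # []
```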

Finding the upper quartile, median, lower quartile and IQR in an even dataset (a dataset with an even number of values):

The process of finding the quartiles is a bit different from the odd dataset, but the rules for outliers stay the same.

Calculating Q2 in an even dataset:

Suppose we have -

4,8,12,16,20,24,28,32

The given data is already sorted.
So, to find the median or Q2, we have to take the average of the middle two numbers. Like -

(4,8,12),16,20,(24,28,32)

So, here we have to take the average of 16 and 20, and that will be our median or Q2.

Q2 = (16+20)/2
Q2 = 36/2
Q2 = 18
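
For an even number of values, the median step could be sketched as below. The names `left_middle` and `right_middle` are assumptions for illustration.

```python
sorted_data = [4, 8, 12, 16, 20, 24, 28, 32]
n = len(sorted_data)

# With an even count there is no single middle value,
# so take the two values on either side of the midpoint
left_middle = sorted_data[n // 2 - 1]  # 16
right_middle = sorted_data[n // 2]     # 20

q2 = (left_middle + right_middle) / 2
print(q2)  # 18.0
```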

Calculating Q1 in an even dataset:

To calculate Q1, we have to cut the given dataset in half -

4,8,12,16 | 20,24,28,32

Here, to find Q1, we have to take the average of the middle 2 numbers of the first half -

4,(8,12),16

That is, average of 8 and 12

Q1 = (8+12)/2
Q1 = 20/2
Q1 = 10
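
A small sketch of the Q1 step for the even dataset. It assumes, as here, that the first half also has an even number of values, so two middle values are averaged. The variable names are illustrative.

```python
sorted_data = [4, 8, 12, 16, 20, 24, 28, 32]

# Cut the data in half; with an even count no value is left out
half = len(sorted_data) // 2
lower_half = sorted_data[:half]

# Average the two middle values of the first half
m = len(lower_half)
q1 = (lower_half[m // 2 - 1] + lower_half[m // 2]) / 2

print(lower_half, q1)  # [4, 8, 12, 16] 10.0
```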

Calculating Q3 in an even dataset:

Showing the given data in two halves -

4,8,12,16 | 20,24,28,32

To calculate Q3, we have to take the average of the middle two numbers of the last half. Like -

20,(24,28),32

That is, average of 24 and 28

Q3 = (24+28)/2
Q3 = 52/2
Q3 = 26
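
Since Q3 here is simply the median of the last half, one possible shortcut is `median` from Python's standard `statistics` module, which averages the two middle values when the count is even. Using it is my suggestion, not something the article itself relies on.

```python
from statistics import median

sorted_data = [4, 8, 12, 16, 20, 24, 28, 32]

# The last half starts at the midpoint for an even count
upper_half = sorted_data[len(sorted_data) // 2 :]

# median() averages 24 and 28 because the half has an even count
q3 = median(upper_half)

print(upper_half, q3)  # [20, 24, 28, 32] 26.0
```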

Calculating IQR in an even dataset:

Calculating the IQR is the same as for the odd dataset. That is -

IQR = Q3 - Q1
IQR = 26 - 10
IQR = 16

Finding an outlier in an even dataset:

Now we have calculated the required terms -

MIN = 4
Q1 = 10
MED = 18
Q3 = 26
MAX = 32
IQR = 16

Rules for outliers -

outlier < Q1 - 1.5*(IQR)

OR

outlier > Q3 + 1.5*(IQR)

Finding minimum value outlier -

outlier < Q1 - 1.5*(IQR)
outlier < 10 - 1.5*(16)
outlier < 10 - 24
outlier < -14

So, there is no minimum value outlier, because there is no value less than -14.

Finding Maximum value outlier -

outlier > Q3 + 1.5*(IQR)
outlier > 26 + 1.5*(16)
outlier > 26 + 24
outlier > 50

Here also there is no outlier, because there is no value greater than 50.
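
Both cases can be combined into one helper. The sketch below follows the halves method used in this article (for an odd count the median is left out of both halves). The function names `quartiles` and `find_outliers` are hypothetical, and the last dataset, where 100 replaces 32, is made up only to show a value being flagged.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the halves method from this article."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    # Odd count: skip the median itself. Even count: split cleanly.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)


def find_outliers(values, k=1.5):
    """Return the values that break either of the two outlier rules."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]


print(quartiles([3, 5, 1, 4, 2, 6, 7]))           # (2, 4, 6)
print(quartiles([4, 8, 12, 16, 20, 24, 28, 32]))  # (10.0, 18.0, 26.0)
print(find_outliers([4, 8, 12, 16, 20, 24, 28, 32]))   # []
print(find_outliers([4, 8, 12, 16, 20, 24, 28, 100]))  # [100] (made-up data)
```

Library functions for quartiles can use different interpolation conventions, so they may return slightly different quartiles than the halves method shown here.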

Conclusion:

In this article we learned how to calculate quartiles and the interquartile range, and how to use them to find outliers.

Thank You!
