My simple definition of an outlier:
An outlier is a value in a dataset that is extremely high or extremely low compared with the other values in the dataset.
In the graph above, values below 50 and values greater than 150 are outliers.
Finding Outliers from a Dataset
A data point is an outlier if it satisfies either of the conditions below.
1. outlier < Q1 - 1.5*(IQR)
2. outlier > Q3 + 1.5*(IQR)
where
- IQR = Interquartile range
- Q1 = Lower Quartile
- Q2 = Median or 2nd Quartile
- Q3 = Upper Quartile
In other words, the 1st rule flags a data point that lies more than 1.5 times the IQR below the lower quartile, and the 2nd rule flags a data point that lies more than 1.5 times the IQR above the upper quartile.
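The two rules can be sketched as a small helper. This is a minimal illustration, not a library API: the names `outlier_fences` and `is_outlier` and the parameter `k` are made up for this example, and Q1 and Q3 are assumed to be already known.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) bounds of the 1.5*IQR rule.

    q1 and q3 are assumed to be precomputed quartiles;
    k is the multiplier (1.5 in the rules above).
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3):
    """True if value is below the lower bound or above the upper bound."""
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper
```

A value that is exactly equal to one of the bounds is not flagged, because both rules use strict inequalities.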
To find the outliers, we first have to calculate Q1, Q2, Q3 and the IQR.
Finding the upper, median and lower quartiles and the interquartile range in an odd dataset (a dataset with an odd number of values):
Suppose we have -
3,5,1,4,2,6,7
The first step is to sort the given data in ascending order.
1,2,3,4,5,6,7
Here the lowest value is 1 (MIN) and the highest value is 7 (MAX).
Calculating Q2 in an odd dataset:
Q2 is the median, or 2nd quartile. In this step we will calculate it.
Our data contains an odd number of values, i.e. 7.
So, we divide it into two equal parts, which leaves one middle value, i.e. 4.
(1,2,3),4,(5,6,7)
So here 4 is the median or Q2 value.
To verify it, or as an alternate way to calculate it:
index of median = (total_no_of_values+1)/2
Here, (7+1)/2 = 4
which means the median is the number at place 4 in the sorted dataset.
So, Q2 = 4.
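As a rough sketch, the same median lookup in Python could look like this. The variable names are only illustrative, and note that Python lists are 0-based while the "place" in the formula is 1-based.

```python
data = [3, 5, 1, 4, 2, 6, 7]

# Step 1: sort the data in ascending order.
values = sorted(data)

# Step 2: for an odd number of values, the median sits at
# place (n + 1) / 2, counting from 1.
n = len(values)
place = (n + 1) // 2

# Step 3: convert the 1-based place to a 0-based list index.
q2 = values[place - 1]

print(place, q2)
```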
Calculating Q1 in an odd dataset:
Initial Dataset :
(1,2,3),4,(5,6,7)
So, to calculate the lower quartile or Q1, we take the first half of the data, i.e. the values below the median. That is -
1,2,3
Here, too, we pick the middle value, i.e. 2.
The formula to calculate the place of Q1 is -
Q1_place = (total_number_of_values_in_first_half+1)/2
Q1_place = (3+1)/2
Q1_place = 2
That means Q1 is at place 2 in the first half.
So, Q1 = 2.
Calculating Q3 in an odd dataset:
It is similar to calculating Q1, but instead of the first half we take the second half.
(1,2,3),4,(5,6,7)
5,6,7
So, the middle value is Q3 i.e. 6
Formula:
Q3_place = (total_no_of_values_in_last_half+1)/2
Q3_place = (3+1)/2
Q3_place = 2
That means Q3 is at the 2nd place in the second half.
So, Q3 = 6.
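The Q1 and Q3 steps for an odd dataset can be sketched as follows. The helper `middle_value` is a hypothetical name for this example, and the sketch assumes each half also has an odd number of values, as it does here.

```python
def middle_value(sorted_values):
    """Return the middle element of a sorted list with an odd length."""
    place = (len(sorted_values) + 1) // 2  # 1-based place
    return sorted_values[place - 1]


values = sorted([3, 5, 1, 4, 2, 6, 7])

# With an odd number of values, the median itself is left out of both halves.
mid = len(values) // 2
first_half = values[:mid]       # values below the median
second_half = values[mid + 1:]  # values above the median

q1 = middle_value(first_half)
q3 = middle_value(second_half)

print(first_half, second_half, q1, q3)
```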
Calculating IQR in an odd dataset:
The formula for calculating the IQR is -
IQR = Q3 - Q1
To find the IQR of the data above -
IQR = 6-2
IQR = 4
Finding an outlier in an odd dataset:
The given data is -
1,2,3,4,5,6,7
We have calculated -
MIN = 1
Q1 = 2
MED = 4
Q3 = 6
MAX = 7
IQR = 4
Now we can check whether the data contains any outliers -
To be an outlier, a data point must satisfy either of the rules below.
outlier < Q1 - 1.5*(IQR)
OR
outlier > Q3 + 1.5*(IQR)
So, to find outliers we have to calculate the lower and upper bounds.
outlier < Q1-1.5*(IQR)
outlier < 2-1.5*(4)
outlier < 2-6.0
outlier < -4
There are no minimum value outliers, because no value in the dataset is less than -4.
Next -
outlier > Q3 + 1.5*(IQR)
outlier > 6 + 1.5*4
outlier > 6 + 6
outlier > 12
There are no maximum value outliers either, because no value in the dataset is greater than 12.
Finding the upper, median and lower quartiles and the IQR in an even dataset (a dataset with an even number of values):
The process of finding the quartiles is a bit different from the odd dataset; the outlier rules stay the same.
Calculating Q2 in an even dataset:
Suppose we have -
4,8,12,16,20,24,28,32
The given data is already sorted.
So, to find the median or Q2 we have to take the average of the middle two numbers. Like -
(4,8,12),16,20,(24,28,32)
So, here we take the average of 16 and 20, and that will be our median or Q2.
Q2 = (16+20)/2
Q2 = 36/2
Q2 = 18
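A minimal sketch of the even-dataset median in Python, with illustrative variable names. It assumes the list is already sorted, as it is here.

```python
values = [4, 8, 12, 16, 20, 24, 28, 32]

n = len(values)

# For an even number of values there are two middle elements.
# With 0-based indexing they sit at n/2 - 1 and n/2.
upper_mid = n // 2
lower_mid = upper_mid - 1

# The median is the average of the two middle values.
q2 = (values[lower_mid] + values[upper_mid]) / 2

print(values[lower_mid], values[upper_mid], q2)
```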
Calculating Q1 in an even dataset:
To calculate Q1, we cut the given dataset in half -
4,8,12,16 | 20,24,28,32
To find Q1, we take the average of the middle two numbers of the first half -
4,(8,12),16
That is, average of 8 and 12
Q1 = (8+12)/2
Q1 = 20/2
Q1 = 10
Calculating Q3 in an even dataset:
Showing the given data in two halves -
4,8,12,16 | 20,24,28,32
To calculate Q3, we take the average of the middle two numbers of the second half. Like -
20,(24,28),32
That is, average of 24 and 28
Q3 = (24+28)/2
Q3 = 52/2
Q3 = 26
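The halving steps for Q1 and Q3 can be sketched with `statistics.median` from the Python standard library, which averages the two middle values when a list has an even length. The variable names are only illustrative.

```python
from statistics import median

values = [4, 8, 12, 16, 20, 24, 28, 32]

# With an even number of values, the data splits cleanly into two halves.
half = len(values) // 2
first_half = values[:half]
second_half = values[half:]

# Each half also has an even length here, so median() averages
# the two middle numbers of that half.
q1 = median(first_half)
q3 = median(second_half)

print(first_half, second_half, q1, q3)
```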
Calculating IQR in an even dataset:
Calculating the IQR is the same as for the odd dataset. That is -
IQR = Q3 - Q1
IQR = 26 - 10
IQR = 16
Finding an outlier in an even dataset:
Now we have calculated the required terms -
MIN = 4
Q1 = 10
MED = 18
Q3 = 26
MAX = 32
IQR = 16
Rules for outliers -
outlier < Q1 - 1.5*(IQR)
OR
outlier > Q3 + 1.5*(IQR)
Finding minimum value outlier -
outlier < Q1 - 1.5*(IQR)
outlier < 10 - 1.5*(16)
outlier < 10 - 24
outlier < -14
So, there is no minimum value outlier, because there is no value less than -14.
Finding Maximum value outlier -
outlier > Q3 + 1.5*(IQR)
outlier > 26 + 1.5*(16)
outlier > 26 + 24
outlier > 50
Here also there is no outlier, because there is no value greater than 50.
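Putting the steps together, here is one possible sketch that handles both odd and even datasets with the halving method used in this article. The function names `quartiles` and `find_outliers` are made up for this example, and the last dataset is a hypothetical one added only to show a value being flagged.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the halving method from this article.

    Assumes data has at least two values.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    first_half = values[:half]
    # Odd n: skip the median itself. Even n: split cleanly in two.
    second_half = values[half + 1:] if n % 2 else values[half:]
    return median(first_half), median(values), median(second_half)


def find_outliers(data, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in data if v < lower or v > upper]


print(find_outliers([3, 5, 1, 4, 2, 6, 7]))             # odd dataset
print(find_outliers([4, 8, 12, 16, 20, 24, 28, 32]))    # even dataset
print(find_outliers([1, 2, 3, 4, 5, 6, 40]))            # hypothetical data
```

Library functions for quartiles may use a different convention (for example, interpolating between values), so their Q1 and Q3 can differ slightly from the halving method shown here.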
Conclusion:
In this article we learned how to calculate the quartiles and the interquartile range, and how to use them to find outliers.