DEV Community

es404020


Identifying Outliers in a Data Set

What Are Outliers?

An outlier is an extremely high or extremely low value in a data set. Under the common 1.5 × IQR rule, a value is flagged as an outlier if it is greater than Q3 + 1.5(IQR) or lower than Q1 - 1.5(IQR).

IQR = Q3 - Q1

Note:

  • IQR is the interquartile range

  • Q1 is the first quartile (the 25th percentile)

  • Q3 is the third quartile (the 75th percentile)
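The fence rule above can be wrapped in a small function. The following is a minimal sketch that assumes the quartiles come from NumPy's default `np.percentile` (linear interpolation). The function name `iqr_outliers`, its `k` multiplier parameter, and the `sample` values are hypothetical and chosen only for illustration.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # Quartiles via np.percentile's default linear-interpolation method
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower = q1 - k * iqr  # lower fence
    upper = q3 + k * iqr  # upper fence
    return [v for v in values if v < lower or v > upper]

# Hypothetical, made-up values with one obvious extreme point
sample = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
print(iqr_outliers(sample))  # [102]
```

In this sample Q1 = 12 and Q3 = 13.75, so the fences are 9.375 and 16.375, and only 102 falls outside them.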

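```python
import numpy as np

# The "median of each half" method needs the data in sorted order
data = sorted([32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96, 97,
               101, 105, 112, 116])

half = len(data) // 2        # 20 values, so 10 in each half
Q1 = np.median(data[:half])  # median of the lower half
Q3 = np.median(data[half:])  # median of the upper half

IQR = Q3 - Q1
print(IQR)  # 34.0
```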
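There is more than one way to compute quartiles, and the choice can change the IQR. The sketch below compares the median-of-halves approach above with NumPy's default `np.percentile` on the same 20 values. The variable names are illustrative.

```python
import numpy as np

data = [32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96, 97,
        101, 105, 112, 116]  # already sorted

# Method 1: median of the lower and upper halves (n is even here)
half = len(data) // 2
q1_halves = np.median(data[:half])  # 62.5
q3_halves = np.median(data[half:])  # 96.5

# Method 2: NumPy's default linear-interpolation percentiles
q1_interp, q3_interp = np.percentile(data, [25, 75])  # 65.75, 96.25

print(q3_halves - q1_halves)  # 34.0
print(q3_interp - q1_interp)  # 30.5
```

Neither result is wrong, because the two methods use different quartile definitions. For this data, both sets of fences (11.5 to 147.5, and 20.0 to 142.0) contain every value, so neither method flags an outlier.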

Another example computes the IQR of one column of a pandas DataFrame with `np.percentile`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
                   'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
                   'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
                   'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]})

# 75th and 25th percentiles of the 'points' column
q75, q25 = np.percentile(df['points'], [75, 25])
iqr = q75 - q25

print(iqr)  # 5.75
```
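The same rule can be applied to every column of the DataFrame at once. This minimal sketch assumes pandas' `DataFrame.quantile` with its default linear interpolation, which matches the `np.percentile` default. The names `q1`, `q3`, and `is_outlier` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
                   'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
                   'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
                   'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]})

# Per-column quartiles (a Series indexed by column name)
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1

# Boolean mask: True where a value falls outside its own column's fences
is_outlier = (df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)

print(iqr['points'])                # 5.75
print(is_outlier.sum())             # outlier count per column
print(df[is_outlier.any(axis=1)])   # rows with at least one outlier
```

With this small data set, every value lies inside its column's fences. As a result, `is_outlier.sum()` reports 0 for each column and the final filter returns an empty DataFrame.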
