es404020

Identifying Outliers in a data set

What are Outliers?

An outlier is an extremely high or extremely low value in our data. A value can be identified as an outlier if it is greater than Q3 + 1.5(IQR) or lower than Q1 - 1.5(IQR).

IQR = Q3 - Q1

Note:

  • IQR means Interquartile Range

  • Q1 means first quartile

  • Q3 means third quartile
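The rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not a definitive implementation: the helper name `iqr_outliers`, its parameter `k`, and the `sample` values are all made up for this example, and the quartiles come from `np.percentile` with its default linear interpolation.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule.

    Hypothetical helper for illustration; not part of NumPy.
    """
    arr = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(arr, [25, 75])  # first and third quartiles
    iqr = q3 - q1
    lower = q1 - k * iqr  # Q1 - 1.5(IQR) when k = 1.5
    upper = q3 + k * iqr  # Q3 + 1.5(IQR) when k = 1.5
    outliers = arr[(arr < lower) | (arr > upper)]
    return lower, upper, outliers

# Made-up sample with one obviously extreme value.
sample = [10, 12, 12, 13, 14, 15, 16, 100]
lower, upper, outliers = iqr_outliers(sample)
print(lower, upper, outliers)
```

With this sample, only the value 100 falls outside the two limits.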

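```python
import numpy as np

# The list must already be sorted for the half-split below to give the quartiles.
data = [32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96, 97,
        101, 105, 112, 116]

# Q1 is the median of the lower half, Q3 is the median of the upper half.
Q1 = np.median(data[:10])
Q3 = np.median(data[10:])

IQR = Q3 - Q1

print(IQR)  # 34.0
```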
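Splitting the sorted list into halves is only one convention for computing quartiles. As a hedged comparison, the sketch below runs the same data through `np.percentile`, which interpolates linearly by default, and shows that the two conventions can give different IQR values. The variable names (`half`, `iqr_split`, `iqr_interp`) are chosen for this example only.

```python
import numpy as np

data = [32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96, 97,
        101, 105, 112, 116]

# Convention 1: median of each half of the sorted data.
half = len(data) // 2
iqr_split = np.median(data[half:]) - np.median(data[:half])

# Convention 2: np.percentile with its default linear interpolation.
q1, q3 = np.percentile(data, [25, 75])
iqr_interp = q3 - q1

print(iqr_split)   # 34.0
print(iqr_interp)  # 30.5
```

Neither result is wrong. They follow different quartile definitions, so it helps to state which one is in use when reporting outlier limits.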

Another example

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
                   'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
                   'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
                   'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]})

# 75th and 25th percentiles of the 'points' column
q75, q25 = np.percentile(df['points'], [75, 25])
iqr = q75 - q25

print(iqr)  # 5.75
```
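To go from the IQR to actual outlier detection, the same limits can be applied to every column of the DataFrame at once. The following is a sketch under the assumption that pandas' default linear-interpolation quantiles are acceptable. The names `lower`, `upper`, and `outlier_mask` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
                   'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
                   'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
                   'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]})

# Per-column quartiles (each result is a Series indexed by column name).
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1

# Outlier limits: Q1 - 1.5(IQR) and Q3 + 1.5(IQR), one pair per column.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# True where a cell lies outside its column's limits.
outlier_mask = (df < lower) | (df > upper)

# Number of flagged values in each column.
print(outlier_mask.sum())
```

For this small sample, no value in any column lies outside its limits, so every count is zero. Rows containing at least one flagged value could be selected with `df[outlier_mask.any(axis=1)]`.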