What Are Outliers?
Outliers are data points that differ significantly from the rest of the dataset.
- Global outlier: a point that falls outside the normal range of the whole dataset
- Local outlier: a point that lies within the normal range of the dataset but differs markedly from its neighbours
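Many machine learning algorithms are sensitive to outliers, which can skew fitted parameters and degrade predictions, so detecting them is an important preprocessing step. Outlier detection is also useful in its own right, for example in fraud detection.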
Why Local Outlier Factor (LOF)?
LOF is a density-based, unsupervised approach:
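- It identifies outliers relative to their local neighbourhood by comparing each point's local density with that of its nearest neighbours
- A LOF score close to 1 means the point is about as dense as its neighbours; a score substantially greater than 1 suggests an outlier
- It is robust for clusters with varying densities, though the nearest-neighbour search can become costly on large datasets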
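To make the idea of a local outlier concrete, here is a minimal sketch on synthetic data. The two clusters, the injected point at (0.8, 0.8), and the random seed are all assumptions chosen for illustration; they are not taken from the article's dataset. The injected point sits well inside the overall range of the data, yet it is far from the tight cluster relative to that cluster's density, so LOF should give it a high score.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)

# Synthetic data (an assumption for illustration, not the article's dataset):
# a tight cluster, a loose cluster, and one point just outside the tight cluster.
dense = rng.normal(loc=0.0, scale=0.1, size=(50, 2))
sparse = rng.normal(loc=5.0, scale=1.0, size=(50, 2))
local_outlier = np.array([[0.8, 0.8]])
X = np.vstack([dense, sparse, local_outlier])

lof = LocalOutlierFactor(n_neighbors=20)  # contamination defaults to "auto"
labels = lof.fit_predict(X)               # 1 = inlier, -1 = outlier

# scikit-learn stores the negated LOF, so flip the sign to get the usual score.
lof_scores = -lof.negative_outlier_factor_

print("Label of the injected point:", labels[-1])
print("LOF score of the injected point:", round(float(lof_scores[-1]), 2))
print("Median LOF score in the tight cluster:", round(float(np.median(lof_scores[:50])), 2))
```

Points inside the tight cluster should score close to 1, while the injected point should score far higher, even though a simple range check on either coordinate would not catch it.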
Implementation in Python
Import Libraries
from sklearn.neighbors import LocalOutlierFactor
import pandas as pd
import matplotlib.pyplot as plt
Load Dataset
data = pd.read_csv("fraud_lof_example.csv")
Define LOF Model
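lof = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
# fit_predict returns 1 for inliers and -1 for outliers (assumes all columns are numeric)
yhat = lof.fit_predict(data)
# Negated LOF scores, available only after fitting: more negative means more anomalous
scores = lof.negative_outlier_factor_
clean_data = data[yhat != -1]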
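Before dropping rows, it can help to look at the scores themselves. The sketch below is one possible way to do that, not the only one: it attaches a positive LOF score and an outlier flag to each row and sorts by score. The DataFrame is a hypothetical stand-in for the CSV, filled with random values in the two columns used in the plot, and the names `scored`, `lof_score`, and `is_outlier` are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Hypothetical stand-in for the CSV: random values in the two plotted columns.
data = pd.DataFrame({
    "Amount": rng.normal(100, 20, size=200),
    "Distance_to_Home_Town": rng.normal(10, 3, size=200),
})

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
yhat = lof.fit_predict(data)

# Attach the results to a copy so the original frame stays untouched.
scored = data.copy()
scored["lof_score"] = -lof.negative_outlier_factor_  # positive LOF, higher = more anomalous
scored["is_outlier"] = yhat == -1

# The most anomalous rows come first.
print(scored.sort_values("lof_score", ascending=False).head())
```

Because `contamination=0.1` sets the cut-off at the 10th percentile of the negated scores, roughly 10% of rows are flagged regardless of how anomalous they really are, so it is worth checking whether the flagged scores are clearly above 1.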
Visualise Outliers
outliers = data[yhat == -1]
plt.figure(figsize=(10,6))
plt.scatter(clean_data['Amount'], clean_data['Distance_to_Home_Town'], color='blue', label='Normal')
plt.scatter(outliers['Amount'], outliers['Distance_to_Home_Town'], color='red', label='Outliers')
plt.xlabel('Amount')
plt.ylabel('Distance to Home Town')
plt.title('Local Outlier Visualization')
plt.legend()
plt.show()
Conclusion
This was an introduction to local outlier detection and Local Outlier Factor. LOF is a simple yet effective method to detect and remove local outliers.
If you’d like to try LOF hands-on, I’ve put together a small practical bundle with:
- A Jupyter Notebook (complete LOF implementation)
- A Cheat Sheet (parameters and score interpretation)
- A Sample Dataset (ready to use)
You can find it here:
Anomaly Detection Starter Kit
With the notebook, cheat sheet, and sample dataset, you can start cleaning your data for ML immediately.
Want more practical ML tutorials? Subscribe to my newsletter:
DataScience&AILearningJourney’s
For a deeper, more theory-focused explanation, you can check out the full version on Medium:
Medium Article