DEV Community

Steffi

How to Find and Remove Anomalies with Local Outlier Factor (LOF)

What Are Outliers?

Outliers are data points that differ significantly from the rest of the dataset.

  • Global outlier: a point that falls outside the normal range of the dataset as a whole
  • Local outlier: a point that lies within the normal range of the dataset but differs markedly from its neighbours
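
To make the distinction concrete, here is a minimal sketch on made-up one-dimensional data. The values, the IQR fences used for the "normal range", and the 10x nearest-neighbour threshold are all assumptions chosen for illustration; the local check is a crude stand-in, not LOF itself.

```python
import statistics

# Hypothetical data: two tight clusters, one point between them, one far away.
values = [1.0, 1.1, 1.2, 1.3, 1.4, 9.0, 9.1, 9.2, 9.3, 9.4, 5.0, 30.0]

# Global check: Tukey's IQR fences describe the "normal range" of the whole dataset.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
global_outliers = [v for v in values if v < low or v > high]

# Local check (illustrative heuristic): distance from each point to its nearest neighbour.
def nearest_gap(i):
    return min(abs(values[i] - values[j]) for j in range(len(values)) if j != i)

gaps = [nearest_gap(i) for i in range(len(values))]
typical_gap = statistics.median(gaps)

# A point is "isolated" if its nearest neighbour is much farther away than is typical.
isolated = [v for v, g in zip(values, gaps) if g > 10 * typical_gap]
local_outliers = [v for v in isolated if v not in global_outliers]

print("Global outliers:", global_outliers)
print("Local outliers:", local_outliers)
```

The point at 5.0 sits comfortably inside the overall range, so a range-based check passes it, yet it has no close neighbours. That is the kind of point LOF is designed to catch.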

Many machine learning algorithms are sensitive to outliers, so detecting them is an important step in many applications, such as fraud detection.

Why Local Outlier Factor (LOF)?

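LOF is a density-based, unsupervised approach that:

  • identifies outliers relative to their local neighbourhood
  • assigns each point a score: a score around 1 means the point is about as dense as its neighbours, while a score well above 1 indicates an outlier
  • copes well with clusters of varying densities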
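
For intuition about where the score comes from, the sketch below computes LOF from scratch in plain Python. It is a simplified illustration, not scikit-learn's implementation: it keeps exactly k neighbours per point, ignores distance ties, and assumes there are no duplicate points. The function name lof_scores and the toy grid are made up for this example.

```python
import math

def lof_scores(points, k):
    """Return one LOF score per point (simplified: exactly k neighbours, no duplicates)."""
    n = len(points)
    neighbours, k_dist = [], []
    for i in range(n):
        # Rank all other points by distance to point i and keep the k closest.
        ranked = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        neighbours.append([j for _, j in ranked])
        k_dist.append(ranked[-1][0])  # distance to the k-th nearest neighbour

    def reach_dist(i, j):
        # Reachability distance of i from j: never smaller than j's k-distance.
        return max(k_dist[j], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, j) for j in neighbours[i]) for i in range(n)]

    # LOF: average ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]

# Toy data: a 3x3 grid of evenly spaced points plus one isolated point.
points = [(x, y) for x in range(3) for y in range(3)] + [(6, 6)]
scores = lof_scores(points, k=3)

for point, score in zip(points, scores):
    print(point, round(score, 2))
```

On this toy data the isolated point scores far above 1, while the grid points stay close to 1. Some grid points land slightly above 1 without being anomalous, which is why a strict "greater than 1" cutoff is too aggressive in practice.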

Implementation in Python

Import Libraries

from sklearn.neighbors import LocalOutlierFactor
import pandas as pd
import matplotlib.pyplot as plt

Load Dataset

data = pd.read_csv("fraud_lof_example.csv")

Define LOF Model

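lof = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
# Fit the model and label each row: -1 = outlier, 1 = inlier (expects numeric columns)
yhat = lof.fit_predict(data)
# Negated LOF scores: the more negative, the more anomalous
scores = lof.negative_outlier_factor_
clean_data = data[yhat != -1]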
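
The sketch below shows how the labels and the scores relate. It runs on synthetic data, since the contents of fraud_lof_example.csv are not shown here: the two columns, the three injected anomalies, and contamination=0.05 are assumptions made for illustration.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in for the dataset (all values are made up).
rng = np.random.default_rng(0)
X = rng.normal(loc=[50.0, 10.0], scale=[5.0, 2.0], size=(200, 2))
X[:3] = [[50.0, 60.0], [120.0, 10.0], [5.0, 45.0]]  # three injected anomalies

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)
yhat = lof.fit_predict(X)                   # -1 = outlier, 1 = inlier
lof_score = -lof.negative_outlier_factor_   # flip the sign: larger = more anomalous

# With a numeric contamination value, the cutoff is stored in offset_:
# a row is flagged when its negative outlier factor falls below it.
flagged = np.where(yhat == -1)[0]
top = np.argsort(lof_score)[::-1][:5]       # indices of the five highest scores

print("Cutoff on the LOF score:", round(-lof.offset_, 2))
print("Flagged rows:", flagged)
print("Top-scoring rows:", top)
```

Note that contamination, not a fixed score of 1, decides how many rows get the -1 label: the cutoff is chosen so that roughly that share of the data is flagged. Sorting by lof_score is a handy way to review the most suspicious rows before dropping anything.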

Visualise Outliers

outliers = data[yhat == -1]

plt.figure(figsize=(10,6))
plt.scatter(clean_data['Amount'], clean_data['Distance_to_Home_Town'], color='blue', label='Normal')
plt.scatter(outliers['Amount'], outliers['Distance_to_Home_Town'], color='red', label='Outliers')
plt.xlabel('Amount')
plt.ylabel('Distance to Home Town')
plt.title('Local Outlier Visualization')
plt.legend()
plt.show()

Conclusion

This was an introduction to local outlier detection and Local Outlier Factor. LOF is a simple yet effective method to detect and remove local outliers.

If you’d like to try LOF hands-on, I’ve put together a small practical bundle with:

  • A Jupyter Notebook (complete LOF implementation)
  • A Cheat Sheet (parameters and score interpretation)
  • A Sample Dataset (ready to use)

You can find it here:
Anomaly Detection Starter Kit

With the notebook, cheat sheet, and sample dataset, you can start cleaning your data for ML immediately.

Want more practical ML tutorials? Subscribe to my newsletter:
DataScience&AILearningJourney’s

For a deeper, more theory-focused explanation, you can check out the full version on Medium:
Medium Article
