Devanshu Biswas

DBSCAN From Scratch: Clustering by Density (with Noise)

K-Means makes you pick the number of clusters and favors round, compact blobs. DBSCAN does neither: it grows clusters from dense regions, discovers the count on its own, and flags outliers as noise. Here it is, clustering two moons that K-Means cannot separate cleanly.

🌌 Try it (drag eps + minPts): https://dev48v.infy.uk/ml/day16-dbscan.html

Two knobs, no K

  • eps — the neighborhood radius.
  • minPts — how many neighbors (within eps) a point needs to count as "dense."

That's it. You never say how many clusters there are.
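To make the two knobs concrete, here is a minimal Python sketch of the neighborhood lookup they control. The function name `region_query`, the sample points, and the choice to count a point as its own neighbor are assumptions for illustration, not taken from the linked page; the brute-force scan is the simplest option, not the fastest.

```python
import math

def region_query(points, i, eps):
    """Return the indices of every point within eps of points[i].

    Assumption: the point itself is included in its own neighborhood.
    """
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

# Hypothetical toy data: three points close together and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
eps, min_pts = 0.2, 3

for i in range(len(points)):
    neighbors = region_query(points, i, eps)
    is_dense = len(neighbors) >= min_pts
    print(i, neighbors, "dense" if is_dense else "not dense")
```

Raising eps or lowering minPts makes more points count as dense; the cluster count is never an input.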

Three kinds of points

  • Core — has ≥ minPts neighbors within eps (sits in a dense region).
  • Border — within eps of a core point, but not dense itself.
  • Noise — neither. The loners. DBSCAN labels them as outliers instead of forcing them into a cluster.
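The three definitions above can be sketched as a small classifier. This is an illustrative Python sketch, not the code from the linked page: the name `classify`, the sample points, and the convention that a point counts as its own neighbor are all assumptions.

```python
import math

def classify(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise'."""
    n = len(points)
    # Brute-force neighborhoods; each point is assumed to neighbor itself.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbors]

    labels = []
    for i in range(n):
        if core[i]:
            labels.append("core")
        elif any(core[j] for j in neighbors[i]):
            labels.append("border")   # near a core point, but not dense itself
        else:
            labels.append("noise")    # no core point within eps
    return labels

# Hypothetical toy data: a tight square, one point on its edge, one loner.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.25, 0.1), (3.0, 3.0)]
labels = classify(points, eps=0.2, min_pts=4)
print(labels)
```

With these values the four corners of the square are core, the point at (0.25, 0.1) is a border point because it reaches only two of the corners, and the point at (3.0, 3.0) is noise.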

How clusters form

Pick an unvisited point. If it's core, start a cluster and grow it outward through density-connected core points (a BFS), sweeping in their borders. If it isn't core, mark it as noise for now; a cluster that grows later may still claim it as a border point. Repeat until every point is visited. Because clusters chain through density, DBSCAN finds arbitrary shapes (moons, rings, blobs), and the count emerges from the data.
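The expansion loop described above might look like the following in Python. Treat it as a sketch under stated assumptions rather than the implementation on the linked page: the names `dbscan` and `NOISE`, the label encoding (-1 for noise, 0, 1, ... for clusters), and the sample data are illustrative, and the O(n²) brute-force neighbor search is chosen for clarity.

```python
import math
from collections import deque

NOISE = -1  # assumed label for outliers

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def region_query(i):
        # Assumption: a point counts as its own neighbor.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # provisional: may become a border point
            continue
        labels[i] = cluster            # i is core, so it seeds a new cluster
        queue = deque(neighbors)
        while queue:                   # BFS over density-connected points
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster    # former noise becomes a border point
                continue
            if labels[j] is not None:
                continue               # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # only core points keep expanding
        cluster += 1
    return labels

# Hypothetical toy data: two short chains of points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0),
          (5.0, 5.0), (5.1, 5.0), (5.2, 5.0),
          (10.0, 0.0)]
labels = dbscan(points, eps=0.15, min_pts=3)
print(labels)
```

In this run the first point of each chain is initially marked as noise, then pulled into the cluster as a border point once its core neighbor is expanded. The isolated point stays noise, and the two clusters are found without ever passing a cluster count.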

When to use it

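Great for spatial/geo data and anomaly detection (noise = anomalies). Weak when clusters have very different densities, because a single eps cannot suit both the tight and the sparse ones, and in high dimensions, where distances between points tend to become too similar for one radius to tell dense regions from sparse ones.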

🔨 Built from scratch (region query → core/border/noise → BFS expand) on the page: https://dev48v.infy.uk/ml/day16-dbscan.html

Part of MachineLearningFromZero. 🌐 https://dev48v.infy.uk
