DBSCAN From Scratch: Clustering by Density (with Noise)

#machinelearning #ai #beginners #datascience

K-Means makes you pick the number of clusters up front and favors round, similarly sized blobs. DBSCAN needs neither assumption: it grows clusters from dense regions, discovers the count on its own, and flags outliers as noise. Here it is clustering two moons, a shape K-Means cannot separate cleanly.

🌌 Try it (drag eps + minPts): https://dev48v.infy.uk/ml/day16-dbscan.html

Two knobs, no K

eps — the neighborhood radius.
minPts — how many neighbors (within eps) a point needs to count as "dense."

That's it. You never say how many clusters there are.
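
To make the two knobs concrete, here is a minimal sketch of the neighborhood lookup in plain Python. The name `region_query` and the sample points are illustrative, not from any library. It assumes Euclidean distance and counts the point itself as one of its own neighbors (scikit-learn's `min_samples` follows the same convention, but other implementations may not).

```python
import math

def region_query(points, i, eps):
    """Return the indices of all points within eps of points[i].

    Assumption: the point itself is included in its own neighborhood.
    """
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

# Three points huddled together and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]

neighbors = region_query(points, 0, eps=0.5)
print(neighbors)  # [0, 1, 2]

# With minPts = 3, point 0 counts as "dense".
is_dense = len(neighbors) >= 3
```

This brute-force scan is O(n) per query, which is fine for a demo; larger datasets usually call for a spatial index such as a k-d tree.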

Three kinds of points

Core — has ≥ minPts neighbors within eps (sits in a dense region).
Border — within eps of a core point, but not dense itself.
Noise — neither. The loners. DBSCAN labels them as outliers instead of forcing them into a cluster.
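
The three definitions above can be sketched as a small classifier. This is an illustration under stated assumptions rather than a reference implementation: `classify_points` is a hypothetical helper, distance is Euclidean, and a point counts as its own neighbor.

```python
import math

def classify_points(points, eps, min_pts):
    """Label every point as 'core', 'border', or 'noise'."""
    n = len(points)
    # Brute-force eps-neighborhoods (each point includes itself).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]

    kinds = []
    for i in range(n):
        if is_core[i]:
            kinds.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            kinds.append("border")  # not dense, but within eps of a core point
        else:
            kinds.append("noise")
    return kinds

# A tight square of four points, one point hanging off its edge, one loner.
points = [(0.0, 0.0), (0.3, 0.0), (0.0, 0.3), (0.3, 0.3), (0.7, 0.3), (5.0, 5.0)]
print(classify_points(points, eps=0.45, min_pts=4))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```

The fifth point has only two neighbors (itself and one corner of the square), so it is not dense, but that corner is a core point, which makes it a border point.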

How clusters form

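Pick an unvisited point. If it isn't core, mark it as noise for now (a later cluster may still claim it as a border point). If it is core, start a new cluster and grow it outward with a BFS: every core point reached pulls in its own eps-neighbors, while border points join the cluster but don't expand it further. Repeat until every point has been visited. Because clusters chain through density, DBSCAN finds arbitrary shapes (moons, rings, blobs), and the count emerges from the data.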
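
Putting the pieces together, the growth loop might look like the sketch below. It is a minimal from-scratch version, not the code behind the demo: the function name, the `-1` noise label, the brute-force neighbor search, and the toy data are all assumptions made for illustration.

```python
import math
from collections import deque

NOISE = -1
UNVISITED = None

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, 2, ... for clusters, -1 for noise."""
    n = len(points)
    labels = [UNVISITED] * n

    def region_query(i):
        # Brute-force eps-neighborhood; the point counts itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # provisional: a cluster may claim it later
            continue

        # i is a core point: start a new cluster and grow it with a BFS.
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # previously noise, now a border point
                continue
            if labels[j] is not UNVISITED:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # only core points keep expanding
    return labels

# Two parallel rows of points plus one far-away outlier.
row_a = [(x * 0.1, 0.0) for x in range(10)]
row_b = [(x * 0.1, 3.0) for x in range(10)]
labels = dbscan(row_a + row_b + [(10.0, 10.0)], eps=0.15, min_pts=3)
print(labels)
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1]
```

Note what happens to the first point in each row: it has only two neighbors, so it starts out labeled as noise, and then the core point next to it pulls it into the cluster as a border point. The outlier never gets claimed and keeps its `-1` label. Nothing in the call says "two clusters"; that number falls out of eps and minPts.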