Kelvin
Unsupervised Machine Learning: K-Means & Hierarchical Clustering

Unsupervised machine learning is a branch of machine learning where models are trained on data without labelled outcomes. Unlike supervised learning, where the goal is to predict a known target, unsupervised learning focuses on discovering hidden patterns, structures, or relationships within the data.

Common tasks in unsupervised learning include:

  • Clustering (grouping similar data points)
  • Dimensionality reduction

Clustering is the process of grouping data points such that points within the same cluster are similar and points in different clusters are dissimilar.

Similarity is usually measured using distance metrics like:

  • Euclidean distance (most common)
  • Manhattan distance
  • Cosine similarity
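The three measures can be sketched in a few lines of NumPy. The two example points below are arbitrary, chosen only to make the arithmetic easy to check:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

# Euclidean: straight-line distance between the two points
euclidean = np.sqrt(np.sum((a - b) ** 2))

# Manhattan: sum of absolute coordinate differences
manhattan = np.sum(np.abs(a - b))

# Cosine similarity: angle between the vectors (1 = same direction)
cosine_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(euclidean)   # 5.0
print(manhattan)   # 7.0
```

Note that cosine similarity measures orientation, not magnitude, which is why it behaves differently from the two distance metrics on scaled data.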

K-Means Clustering.

K-Means is a partition-based clustering algorithm that divides data into K distinct clusters, where K is predefined. The goal is to minimize the within-cluster variance, also called inertia.

How K-Means Works

  1. Choose K (number of clusters) - Example: K = 3
  2. Initialize centroids randomly - These are K points representing cluster centers.
  3. Assign data points to nearest centroid - Each point is assigned to the cluster with the closest centroid (using distance, usually Euclidean).
  4. Update centroids - Compute the new centroid as the mean of all points in that cluster. Repeat steps 3 and 4 until the centroids stop changing or the maximum number of iterations is reached.
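The four steps above can be sketched in plain NumPy. This is a minimal illustration on synthetic random data, not a production implementation:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))   # synthetic 2-D data for illustration

K = 3                           # Step 1: choose K

# Step 2: initialize centroids by picking K random data points
centroids = X[rng.choice(len(X), size=K, replace=False)]

for _ in range(100):            # cap on iterations
    # Step 3: assign each point to its nearest centroid (Euclidean)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)

    # Step 4: recompute each centroid as the mean of its assigned points
    # (keep the old centroid if a cluster happens to become empty)
    new_centroids = np.array([
        X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
        for k in range(K)
    ])

    if np.allclose(new_centroids, centroids):  # centroids stopped changing
        break
    centroids = new_centroids
```

In practice you would use `sklearn.cluster.KMeans`, which adds smarter initialization (k-means++) and multiple restarts on top of this same loop.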

K-Means Workflow.

Scaling data
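K-Means is distance-based, so features on larger scales dominate the clustering unless we standardize first. A sketch of this step with scikit-learn's `StandardScaler`; the dataset and feature names here are synthetic stand-ins, since the post's own data isn't shown:

```python
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the post's dataset
X, _ = make_blobs(n_samples=200, centers=4, n_features=3, random_state=42)
df = pd.DataFrame(X, columns=["feat_1", "feat_2", "feat_3"])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(df)   # each column now has mean ~0, std ~1
```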

Finding the best K.

Plotting the elbow curve. This helps identify the best K - where the curve starts to plateau.
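A sketch of the elbow method: fit K-Means for a range of K values and plot the inertia for each. The data is a synthetic stand-in for the post's dataset:

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=200, centers=4, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

inertias = []
ks = range(1, 11)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias.append(km.inertia_)   # within-cluster sum of squares

plt.plot(list(ks), inertias, marker="o")
plt.xlabel("K (number of clusters)")
plt.ylabel("Inertia")
plt.title("Elbow curve")
plt.savefig("elbow.png")
```

Inertia always decreases as K grows, so we look for the point where the drop flattens out rather than the minimum.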

Elbow curve. (4 is our best k)

Training the model
Fit the model on 4 clusters, then create a new column named 'Clusters' holding each row's cluster label.
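A sketch of this training step with scikit-learn; the dataset, feature names, and seed are stand-ins for the post's unshown code:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=200, centers=4, n_features=3, random_state=42)
df = pd.DataFrame(X, columns=["feat_1", "feat_2", "feat_3"])
X_scaled = StandardScaler().fit_transform(df)

# Fit K-Means with the K found by the elbow curve
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
df["Clusters"] = kmeans.fit_predict(X_scaled)   # one label (0-3) per row
```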

Profiling clusters.
This step is all about understanding what each cluster actually represents after you’ve created them with K-Means.
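A common way to profile clusters is to group by the cluster label and look at the average feature values and cluster sizes. A sketch on the same stand-in data:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=200, centers=4, n_features=3, random_state=42)
df = pd.DataFrame(X, columns=["feat_1", "feat_2", "feat_3"])
df["Clusters"] = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(
    StandardScaler().fit_transform(df)
)

# Mean of each feature per cluster: a compact "profile" of each group
profile = df.groupby("Clusters").mean()
sizes = df["Clusters"].value_counts()   # how many points per cluster
print(profile)
```

Comparing the per-cluster means against the overall means is what lets you attach a human-readable description to each cluster.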

Visualization of the clusters and the centroids.
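A sketch of the plot, using 2-D synthetic data so the clusters and centroids can be drawn directly:

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=200, centers=4, random_state=42)   # 2-D for plotting
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X_scaled)

# Points colored by cluster label, centroids marked with red X's
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=kmeans.labels_, cmap="viridis", s=20)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c="red", marker="X", s=200, label="Centroids")
plt.legend()
plt.savefig("kmeans_clusters.png")
```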


Advantages of K-Means

  • Simple and fast
  • Works well on large datasets
  • Easy to interpret

Limitations of K-Means

  1. Must specify K in advance
  2. Sensitive to initial centroid placement and to outliers
  3. Assumes clusters are spherical and equally sized

Hierarchical Clustering.

Hierarchical clustering builds a tree-like structure of clusters, called a dendrogram. Unlike K-Means, it does not require specifying the number of clusters upfront.

There are two types:

  • Agglomerative (bottom-up) – most common
  • Divisive (top-down)

Agglomerative clustering steps

  1. Start with all points separate: Treat each data point as its own cluster like A, B, C, ... Initially, you have n clusters for n data points.
  2. Compute pairwise distances: Calculate the distance between every pair of clusters. Common choices include Euclidean, Manhattan or Cosine distance. Store these values in a distance matrix. To know more about them refer to: Measures of Distance
  3. Merge the nearest clusters: Identify the two clusters that are closest based on the chosen linkage method such as single, complete, average or Ward linkage. Combine them into a single new cluster.
  4. Update distances: Recalculate the distances between the newly formed cluster and all remaining clusters. Use the same linkage rule to ensure consistency.
  5. Repeat the process: Continue merging clusters and updating distances iteratively. Stop when you reach a predefined number of clusters (k) or a distance threshold.
  6. Visualize the results: Create a dendrogram to visualize how clusters merged at each step. Choose a suitable cut on the dendrogram to obtain the final cluster groups.
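The merge process above can be watched directly in SciPy: each row of the linkage matrix records one merge, i.e. the two clusters joined, the distance at which they joined, and the new cluster's size. A tiny sketch with five made-up 2-D points:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Five 2-D points, labelled 0..4, chosen for illustration
points = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]])

# Agglomerative merging with Ward linkage: n points produce n-1 merges
Z = linkage(points, method="ward")
print(Z)   # columns: cluster i, cluster j, merge distance, new cluster size
```

Reading the rows top to bottom replays steps 3-5: the closest pair merges first, and merge distances grow as the hierarchy builds up.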

Linkage methods
How we measure the distance between clusters.

  1. Single Linkage: Minimum distance between points
  2. Complete Linkage: Maximum distance
  3. Average Linkage: Average distance
  4. Ward’s Method: Minimizes variance (most common)

Dendrogram
A dendrogram is a tree diagram that shows:

  • How clusters are merged
  • At what distance they are merged

You can “cut” the dendrogram at a certain height to decide the number of clusters.

Hierarchical model workflow.
Picking a small dataset for easier readability.

Linkage (Ward) to minimize variance.

Plotting the dendrogram.
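A sketch of this part of the workflow: compute Ward linkage on a small scaled dataset, then draw the dendrogram. The data is a small synthetic stand-in, since the post's own dataset isn't shown:

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Small dataset for easier readability of the dendrogram
X, _ = make_blobs(n_samples=30, centers=4, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

Z = linkage(X_scaled, method="ward")   # Ward linkage minimizes variance

dendrogram(Z)
plt.xlabel("Sample index")
plt.ylabel("Merge distance")
plt.title("Dendrogram (Ward linkage)")
plt.savefig("dendrogram.png")
```

The largest vertical gaps in the dendrogram suggest where to cut; cutting below a big gap yields well-separated clusters.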

Fitting the model.
Fit the model on 4 clusters, then create a new column named 'hc-cluster' holding each row's cluster label.
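A sketch of the fitting step with scikit-learn's `AgglomerativeClustering`; the dataset and column names are stand-ins for the post's unshown code:

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=30, centers=4, n_features=2, random_state=42)
df = pd.DataFrame(X, columns=["feat_1", "feat_2"])
X_scaled = StandardScaler().fit_transform(df)

# Bottom-up merging with Ward linkage, stopped at 4 clusters
hc = AgglomerativeClustering(n_clusters=4, linkage="ward")
df["hc-cluster"] = hc.fit_predict(X_scaled)
```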

Profiling.
This step helps understand what each cluster represents.
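As with K-Means, profiling here means grouping by the cluster label and inspecting per-cluster averages. A sketch on the same stand-in data:

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=30, centers=4, n_features=2, random_state=42)
df = pd.DataFrame(X, columns=["feat_1", "feat_2"])
df["hc-cluster"] = AgglomerativeClustering(n_clusters=4, linkage="ward").fit_predict(
    StandardScaler().fit_transform(df)
)

# Average feature values per hierarchical cluster
profile = df.groupby("hc-cluster").mean()
print(profile)
```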

Visualization.
A side-by-side comparison of the two models: K-Means and hierarchical clustering.
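A sketch of the side-by-side plot, fitting both models on the same synthetic 2-D data:

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=200, centers=4, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

km_labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X_scaled)
hc_labels = AgglomerativeClustering(n_clusters=4, linkage="ward").fit_predict(X_scaled)

# Same data, colored by each model's cluster assignments
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(X_scaled[:, 0], X_scaled[:, 1], c=km_labels, cmap="viridis", s=15)
axes[0].set_title("K-Means")
axes[1].scatter(X_scaled[:, 0], X_scaled[:, 1], c=hc_labels, cmap="viridis", s=15)
axes[1].set_title("Hierarchical (Ward)")
fig.savefig("comparison.png")
```

On well-separated data the two models usually agree closely (cluster numbering aside); differences show up when clusters are elongated, nested, or unevenly sized.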

Advantages of Hierarchical Clustering

  • No need to predefine number of clusters
  • Produces interpretable tree structure
  • Works well for small datasets

Limitations of Hierarchical Clustering

  • Computationally expensive (slow for large datasets)
  • Once clusters are merged, they cannot be undone
  • Sensitive to noise and outliers
