Over the last few articles, we explored popular classification and regression algorithms, which fall under supervised learning. In this article, we shift gears and dive into a different and equally important paradigm in machine learning: unsupervised learning.
Unsupervised learning focuses on discovering hidden structures in data without labeled outcomes. Among these methods, clustering is foundational and widely used across industries—from customer segmentation and recommender systems to anomaly detection and exploratory data analysis (EDA).
In this guide, we take a practical and modern look at hierarchical clustering in R. While the core ideas remain timeless, we incorporate current best practices, updated R packages, and industry-relevant use cases, ensuring the approach aligns with how clustering is applied today.
Table of Contents
What Is Clustering Analysis?
Why Clustering Matters in Modern Data Science
Introduction to Hierarchical Clustering
Understanding Dendrograms
Agglomerative vs. Divisive Clustering
Linkage Methods and When to Use Them
Implementing Hierarchical Clustering in R
Data Preparation
Distance Measures
Core R Functions and Modern Packages
Visualizing Hierarchical Clusters in 3D
Complete R Code Example
Summary and Industry Takeaways
- What Is Clustering Analysis?
Clustering analysis is the process of grouping data points such that:
Observations within the same cluster are highly similar to each other
Observations in different clusters are dissimilar
The definition of “similarity” depends entirely on the problem you’re solving and the distance or similarity metric you choose.
For example:
Grouping news articles into topics (sports, business, entertainment)
Segmenting customers based on purchasing behavior
Organizing search results by semantic similarity
The guiding principle is simple:
Maximize similarity within clusters and minimize similarity between clusters.
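To make this concrete, here is a quick sketch in base R: two observations described by numeric features can be compared with dist(), where a smaller distance means greater similarity. The customer values below are made up purely for illustration.
customers <- rbind(c(25, 120), c(40, 300))      # two hypothetical customers: (age, monthly spend)
dist(customers, method = "euclidean")           # smaller value = more similar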
- Why Clustering Matters in Modern Data Science
Today, clustering is central to many real-world applications, including:
Customer segmentation in marketing and growth analytics
User behavior analysis in SaaS and mobile apps
Fraud and anomaly detection in finance and cybersecurity
Biological data analysis, such as gene expression and protein similarity
AI-driven personalization and recommendation engines
With the rise of high-dimensional data, explainable AI, and exploratory analytics, hierarchical clustering has regained popularity because it provides structure, interpretability, and flexibility—not just flat cluster assignments.
- Introduction to Hierarchical Clustering
Hierarchical clustering is an alternative to partition-based algorithms such as k-means and, unlike k-means, it does not require specifying the number of clusters in advance.
Instead, it builds a hierarchy of clusters that can be visualized as a tree structure, allowing analysts to explore data groupings at multiple levels of granularity.
Key characteristics:
Produces a nested hierarchy of clusters
Uses a distance or dissimilarity measure
Results are visualized using a dendrogram
Hierarchical clustering is particularly valuable in exploratory data analysis (EDA), where the goal is understanding structure rather than prediction.
- Understanding Dendrograms
A dendrogram is a tree-like diagram that shows:
How clusters are merged or split
The order of these operations
The distance at which clusters join
By cutting the dendrogram at different heights, you can obtain different numbers of clusters—making hierarchical clustering extremely flexible and interpretable.
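In R, cutting the tree is handled by cutree(), either by the desired number of clusters (k) or by a cut height (h). A minimal sketch using the iris data that we cluster in detail later in this guide:
hc <- hclust(dist(scale(iris[, 1:4])))   # example tree (details covered later)
clusters_k <- cutree(hc, k = 3)          # request exactly 3 clusters
clusters_h <- cutree(hc, h = 5)          # or cut the dendrogram at a specific height
table(clusters_k)                        # resulting cluster sizes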
- Agglomerative vs. Divisive Clustering
Hierarchical clustering methods fall into two main categories:
Agglomerative Clustering (Bottom-Up)
Starts with each observation as its own cluster
Iteratively merges the closest clusters
Continues until all points belong to a single cluster
This is the most commonly used approach and is well-supported in R.
Divisive Clustering (Top-Down)
Starts with all observations in one cluster
Recursively splits clusters into smaller groups
Less commonly used due to higher computational cost
In practice, agglomerative clustering is the industry standard.
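Both flavors are available in the cluster package that ships with R: agnes() for agglomerative and diana() for divisive clustering. A brief sketch on the scaled iris features:
library(cluster)
x <- scale(iris[, 1:4])
ag <- agnes(x, method = "ward")   # agglomerative (bottom-up)
dv <- diana(x)                    # divisive (top-down)
ag$ac                             # agglomerative coefficient (closer to 1 = stronger structure)
dv$dc                             # divisive coefficient
plot(ag, which.plots = 2)         # dendrogram of the agglomerative fit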
- Linkage Methods and When to Use Them
A linkage method defines how the distance between two clusters is calculated.
Common linkage strategies include:
Single linkage: Minimum distance between points (can create long, chain-like clusters)
Complete linkage: Maximum distance between points (produces compact clusters)
Average linkage: Mean distance between all point pairs
Centroid linkage: Distance between cluster centroids
Ward’s method: Minimizes within-cluster variance (very popular in practice)
Industry tip (2025): Ward’s method combined with Euclidean distance is often the best starting point for numerical data.
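Linkage is selected through the method argument of hclust(). The sketch below compares three methods side by side on the scaled iris features, so you can see how chaining (single) differs from the more compact clusters produced by complete linkage and Ward's method:
d <- dist(scale(iris[, 1:4]))
par(mfrow = c(1, 3))
plot(hclust(d, method = "single"),   main = "Single linkage",   labels = FALSE)
plot(hclust(d, method = "complete"), main = "Complete linkage", labels = FALSE)
plot(hclust(d, method = "ward.D2"),  main = "Ward (ward.D2)",   labels = FALSE)
par(mfrow = c(1, 1))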
- Implementing Hierarchical Clustering in R
Data Preparation
Before clustering, ensure:
Rows represent observations
Columns represent features
Missing values are handled
Features are standardized
We’ll use the built-in iris dataset.
df <- iris
df <- na.omit(df)        # drop rows with missing values (iris has none, but this is good practice)
df <- scale(df[, 1:4])   # standardize the four numeric features; drop the Species column
Distance Matrix
d <- dist(df, method = "euclidean")   # pairwise Euclidean distances between observations
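Euclidean distance is a sensible default for standardized numeric features, but dist() supports other metrics, and daisy() from the cluster package computes Gower distance for mixed numeric and categorical data. A short sketch:
d_manhattan <- dist(df, method = "manhattan")   # less sensitive to outliers
library(cluster)
d_gower <- daisy(iris, metric = "gower")        # handles mixed column types (e.g., Species)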
Hierarchical Clustering with hclust
hc <- hclust(d, method = "ward.D2")   # Ward's criterion; ward.D2 expects Euclidean distances
plot(hc, main = "Hierarchical Clustering Dendrogram")
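To turn the dendrogram into concrete cluster labels, cut the tree with cutree(). Because iris happens to come with known species, a cross-tabulation is a handy sanity check (something you would not have in a true unsupervised setting):
groups <- cutree(hc, k = 3)    # assign each observation to one of 3 clusters
table(groups, iris$Species)    # compare clusters with the known species labels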
Modern Visualization (Recommended)
In current R workflows, packages like factoextra and dendextend are widely used.
library(factoextra)
fviz_dend(hc, k = 3, rect = TRUE)
These tools improve interpretability and presentation quality, especially for reports and dashboards.
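dendextend offers similar control for base-graphics dendrograms, for example coloring branches by cluster. A short sketch:
library(dendextend)
dend <- as.dendrogram(hc)
dend <- color_branches(dend, k = 3)   # color branches according to a 3-cluster cut
plot(dend, main = "Dendrogram with Colored Branches")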
- Visualizing Hierarchical Clusters in 3D
To build intuition, we can visualize clustering using three dimensions.
A1 <- c(2, 3, 5, 7, 8, 10, 20, 21, 23)   # values form two visibly separated groups
A2 <- A1                                 # reuse the vector as a second dimension
A3 <- A1                                 # and a third, so the points lie along a diagonal
library(scatterplot3d)
scatterplot3d(A1, A2, A3, angle = 25, type = "h")   # 3D scatter plot with drop lines
demo <- hclust(dist(cbind(A1, A2, A3)))             # hierarchical clustering of the 9 points
plot(demo)                                          # dendrogram of the toy example
Even in higher dimensions, hierarchical clustering follows the same logic—3D visualization simply helps build intuition.
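To connect the 3D view with the clustering result, the points can be colored by their cluster assignment. A small sketch reusing the demo object above; the two colors come from cutting the tree into two groups:
groups3d <- cutree(demo, k = 2)   # the toy data splits into two obvious groups
scatterplot3d(A1, A2, A3, color = groups3d, pch = 19,
              angle = 25, type = "h", main = "Toy data colored by cluster")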
- Complete R Code Example
# Data preparation
df <- iris
df <- na.omit(df)
df <- scale(df[, 1:4])

# Distance matrix
d <- dist(df, method = "euclidean")

# Hierarchical clustering
hc <- hclust(d, method = "ward.D2")
plot(hc)

# Enhanced visualization
library(factoextra)
fviz_dend(hc, k = 3, rect = TRUE)
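As a final check, silhouette widths give a quick read on how well separated the chosen clusters are. A sketch building on the objects above; silhouette() comes from the cluster package and fviz_silhouette() from factoextra:
library(cluster)
groups <- cutree(hc, k = 3)
sil <- silhouette(groups, d)   # silhouette width for every observation
summary(sil)
fviz_silhouette(sil)           # average silhouette width per cluster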
- Summary and Industry Takeaways
Hierarchical clustering remains a cornerstone of cluster analysis, especially in exploratory and explainable analytics.
Key takeaways:
No need to predefine the number of clusters
Dendrograms provide rich interpretability
Ward’s method is a strong default choice
Modern R packages enhance visualization and usability
In today’s data-driven environments—where understanding structure often matters as much as prediction—hierarchical clustering offers clarity, flexibility, and insight that flat clustering methods cannot.
As data complexity grows, hierarchical approaches continue to play a critical role in AI, data science, and advanced analytics workflows.
Our mission is "to enable businesses to unlock value in data." We pursue that mission through many activities, and helping you solve tough problems is just one of them. For over 20 years, we have partnered with more than 100 clients, from Fortune 500 companies to mid-sized firms, to solve complex data analytics challenges. Our services include Tableau consulting and consultancy, turning raw data into strategic insight.