Over the last few articles, we explored popular classification and regression algorithms, which fall under supervised learning. In this article, we shift gears and dive into a different and equally important paradigm in machine learning: unsupervised learning.
Unsupervised learning focuses on discovering hidden structures in data without labeled outcomes. Among these methods, clustering is foundational and widely used across industries—from customer segmentation and recommender systems to anomaly detection and exploratory data analysis (EDA).
In this guide, we take a practical and modern look at hierarchical clustering in R. While the core ideas remain timeless, we incorporate current best practices, updated R packages, and industry-relevant use cases, ensuring the approach aligns with how clustering is applied today.
Table of Contents
What Is Clustering Analysis?
Why Clustering Matters in Modern Data Science
Introduction to Hierarchical Clustering
Understanding Dendrograms
Agglomerative vs. Divisive Clustering
Linkage Methods and When to Use Them
Implementing Hierarchical Clustering in R
Data Preparation
Distance Measures
Core R Functions and Modern Packages
Visualizing Hierarchical Clusters in 3D
Complete R Code Example
Summary and Industry Takeaways
- What Is Clustering Analysis?
Clustering analysis is the process of grouping data points such that:
Observations within the same cluster are highly similar to each other
Observations in different clusters are dissimilar
The definition of “similarity” depends entirely on the problem you’re solving and the distance or similarity metric you choose.
For example:
Grouping news articles into topics (sports, business, entertainment)
Segmenting customers based on purchasing behavior
Organizing search results by semantic similarity
The guiding principle is simple:
Maximize similarity within clusters and minimize similarity between clusters.
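To make this concrete, here is a quick sketch in base R: two observations described by numeric features can be compared with dist(), where a smaller distance means greater similarity. The customer values below are made up purely for illustration.
customers <- rbind(c(25, 120), c(40, 300))      # two hypothetical customers: (age, monthly spend)
dist(customers, method = "euclidean")           # smaller value = more similar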
- Why Clustering Matters in Modern Data Science
Today, clustering is central to many real-world applications, including:
Customer segmentation in marketing and growth analytics
User behavior analysis in SaaS and mobile apps
Fraud and anomaly detection in finance and cybersecurity
Biological data analysis, such as gene expression and protein similarity
AI-driven personalization and recommendation engines
With the rise of high-dimensional data, explainable AI, and exploratory analytics, hierarchical clustering has regained popularity because it provides structure, interpretability, and flexibility—not just flat cluster assignments.
- Introduction to Hierarchical Clustering
Hierarchical clustering is an alternative to partition-based algorithms such as k-means and, unlike k-means, it does not require specifying the number of clusters in advance.
Instead, it builds a hierarchy of clusters that can be visualized as a tree structure, allowing analysts to explore data groupings at multiple levels of granularity.
Key characteristics:
Produces a nested hierarchy of clusters
Uses a distance or dissimilarity measure
Results are visualized using a dendrogram
Hierarchical clustering is particularly valuable in exploratory data analysis (EDA), where the goal is understanding structure rather than prediction.
- Understanding Dendrograms
A dendrogram is a tree-like diagram that shows:
How clusters are merged or split
The order of these operations
The distance at which clusters join
By cutting the dendrogram at different heights, you can obtain different numbers of clusters—making hierarchical clustering extremely flexible and interpretable.
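In R, cutting the tree is handled by cutree(), either by the desired number of clusters (k) or by a cut height (h). A minimal sketch using the iris data that we cluster in detail later in this guide:
hc <- hclust(dist(scale(iris[, 1:4])))   # example tree (details covered later)
clusters_k <- cutree(hc, k = 3)          # request exactly 3 clusters
clusters_h <- cutree(hc, h = 5)          # or cut the dendrogram at a specific height
table(clusters_k)                        # resulting cluster sizes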
- Agglomerative vs. Divisive Clustering
Hierarchical clustering methods fall into two main categories:
Agglomerative Clustering (Bottom-Up)
Starts with each observation as its own cluster
Iteratively merges the closest clusters
Continues until all points belong to a single cluster
This is the most commonly used approach and is well-supported in R.
Divisive Clustering (Top-Down)
Starts with all observations in one cluster
Recursively splits clusters into smaller groups
Less commonly used due to higher computational cost
In practice, agglomerative clustering is the industry standard.
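Both flavors are available in the cluster package that ships with R: agnes() for agglomerative and diana() for divisive clustering. A brief sketch on the scaled iris features:
library(cluster)
x <- scale(iris[, 1:4])
ag <- agnes(x, method = "ward")   # agglomerative (bottom-up)
dv <- diana(x)                    # divisive (top-down)
ag$ac                             # agglomerative coefficient (closer to 1 = stronger structure)
dv$dc                             # divisive coefficient
plot(ag, which.plots = 2)         # dendrogram of the agglomerative fit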
- Linkage Methods and When to Use Them
A linkage method defines how the distance between two clusters is calculated.
Common linkage strategies include:
Single linkage: Minimum distance between points (can create long, chain-like clusters)
Complete linkage: Maximum distance between points (produces compact clusters)
Average linkage: Mean distance between all point pairs
Centroid linkage: Distance between cluster centroids
Ward’s method: Minimizes within-cluster variance (very popular in practice)
Industry tip (2025): Ward’s method combined with Euclidean distance is often the best starting point for numerical data.
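Linkage is selected through the method argument of hclust(). The sketch below compares three methods side by side on the scaled iris features, so you can see how chaining (single) differs from the more compact clusters produced by complete linkage and Ward's method:
d <- dist(scale(iris[, 1:4]))
par(mfrow = c(1, 3))
plot(hclust(d, method = "single"),   main = "Single linkage",   labels = FALSE)
plot(hclust(d, method = "complete"), main = "Complete linkage", labels = FALSE)
plot(hclust(d, method = "ward.D2"),  main = "Ward (ward.D2)",   labels = FALSE)
par(mfrow = c(1, 1))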
- Implementing Hierarchical Clustering in R
Data Preparation
Before clustering, ensure:
Rows represent observations
Columns represent features
Missing values are handled
Features are standardized
We’ll use the built-in iris dataset.
df <- iris
df <- na.omit(df)        # drop rows with missing values (iris has none, but this is good practice)
df <- scale(df[, 1:4])   # standardize the four numeric features; drop the Species column
Distance Matrix
d <- dist(df, method = "euclidean")   # pairwise Euclidean distances between observations
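Euclidean distance is a sensible default for standardized numeric features, but dist() supports other metrics, and daisy() from the cluster package computes Gower distance for mixed numeric and categorical data. A short sketch:
d_manhattan <- dist(df, method = "manhattan")   # less sensitive to outliers
library(cluster)
d_gower <- daisy(iris, metric = "gower")        # handles mixed column types (e.g., Species)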
Hierarchical Clustering with hclust
hc <- hclust(d, method = "ward.D2")   # Ward's criterion; ward.D2 expects Euclidean distances
plot(hc, main = "Hierarchical Clustering Dendrogram")
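To turn the dendrogram into concrete cluster labels, cut the tree with cutree(). Because iris happens to come with known species, a cross-tabulation is a handy sanity check (something you would not have in a true unsupervised setting):
groups <- cutree(hc, k = 3)    # assign each observation to one of 3 clusters
table(groups, iris$Species)    # compare clusters with the known species labels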
Modern Visualization (Recommended)
In current R workflows, packages like factoextra and dendextend are widely used.
library(factoextra)
fviz_dend(hc, k = 3, rect = TRUE)
These tools improve interpretability and presentation quality, especially for reports and dashboards.
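dendextend offers similar control for base-graphics dendrograms, for example coloring branches by cluster. A short sketch:
library(dendextend)
dend <- as.dendrogram(hc)
dend <- color_branches(dend, k = 3)   # color branches according to a 3-cluster cut
plot(dend, main = "Dendrogram with Colored Branches")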
- Visualizing Hierarchical Clusters in 3D
To build intuition, we can visualize clustering using three dimensions.
A1 <- c(2, 3, 5, 7, 8, 10, 20, 21, 23)   # values form two visibly separated groups
A2 <- A1                                 # reuse the vector as a second dimension
A3 <- A1                                 # and a third, so the points lie along a diagonal
library(scatterplot3d)
scatterplot3d(A1, A2, A3, angle = 25, type = "h")   # 3D scatter plot with drop lines
demo <- hclust(dist(cbind(A1, A2, A3)))             # hierarchical clustering of the 9 points
plot(demo)                                          # dendrogram of the toy example
Even in higher dimensions, hierarchical clustering follows the same logic—3D visualization simply helps build intuition.
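To connect the 3D view with the clustering result, the points can be colored by their cluster assignment. A small sketch reusing the demo object above; the two colors come from cutting the tree into two groups:
groups3d <- cutree(demo, k = 2)   # the toy data splits into two obvious groups
scatterplot3d(A1, A2, A3, color = groups3d, pch = 19,
              angle = 25, type = "h", main = "Toy data colored by cluster")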
- Complete R Code Example
# Data preparation
df <- iris
df <- na.omit(df)
df <- scale(df[, 1:4])

# Distance matrix
d <- dist(df, method = "euclidean")

# Hierarchical clustering
hc <- hclust(d, method = "ward.D2")
plot(hc)

# Enhanced visualization
library(factoextra)
fviz_dend(hc, k = 3, rect = TRUE)
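As a final check, silhouette widths give a quick read on how well separated the chosen clusters are. A sketch building on the objects above; silhouette() comes from the cluster package and fviz_silhouette() from factoextra:
library(cluster)
groups <- cutree(hc, k = 3)
sil <- silhouette(groups, d)   # silhouette width for every observation
summary(sil)
fviz_silhouette(sil)           # average silhouette width per cluster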
- Summary and Industry Takeaways
Hierarchical clustering remains a cornerstone of cluster analysis, especially in exploratory and explainable analytics.
Key takeaways:
No need to predefine the number of clusters
Dendrograms provide rich interpretability
Ward’s method is a strong default choice
Modern R packages enhance visualization and usability
In today’s data-driven environments—where understanding structure often matters as much as prediction—hierarchical clustering offers clarity, flexibility, and insight that flat clustering methods cannot.
As data complexity grows, hierarchical approaches continue to play a critical role in AI, data science, and advanced analytics workflows.
Our mission is "to enable businesses to unlock value in data." We pursue that mission through many activities, and helping you solve tough problems is just one of them. For over 20 years, we have partnered with more than 100 clients, from Fortune 500 companies to mid-sized firms, to solve complex data analytics challenges. Our services include Tableau consulting and consultancy, turning raw data into strategic insight.