Hierarchical clustering remains one of the most widely used unsupervised learning techniques in analytics, machine learning, and applied data science. Despite the rise of large-scale and deep-learning–based clustering approaches, hierarchical methods continue to be preferred for interpretability, explainability, and exploratory data analysis, especially in business analytics, social sciences, bioinformatics, and market segmentation.
This updated guide revisits hierarchical clustering using modern R workflows and industry best practices, while preserving the original intent: building a strong conceptual foundation and implementing clustering step by step in R.
What Is Hierarchical Clustering?
Clustering is a technique used to group similar observations into clusters while keeping dissimilar observations separate. Hierarchical clustering differs from other clustering approaches (such as k-means) because it builds a tree-based structure (hierarchy) rather than forcing the data into a fixed number of clusters upfront.
A simple analogy is a library system:
The library contains sections
Sections contain shelves
Shelves contain books
Books are grouped by subject
This naturally forms a hierarchy, which is exactly how hierarchical clustering organizes data.
Hierarchical clustering produces a dendrogram, a tree-like diagram that visually represents how clusters are merged or split at different levels of similarity.
Types of Hierarchical Clustering
Hierarchical clustering can be performed in two fundamental ways:
1. Divisive Clustering (Top-Down)
In the divisive approach, all observations start in a single cluster. The algorithm then repeatedly splits clusters into smaller ones until each observation forms its own cluster.
This method is commonly known as DIANA (Divisive Analysis).
Key characteristics:
Good at identifying large, high-level clusters
Computationally expensive
Less commonly used in practice
2. Agglomerative Clustering (Bottom-Up)
The agglomerative approach is the most widely used hierarchical method in real-world analytics. It begins with each observation as its own cluster and then iteratively merges the most similar clusters.
This method is also known as:
HAC (Hierarchical Agglomerative Clustering)
AGNES (Agglomerative Nesting)
Why it dominates industry usage:
More intuitive
Efficient for medium-sized datasets
Works well with visual diagnostics (dendrograms)
In practice:
Divisive methods are useful for high-level segmentation, while agglomerative methods excel at discovering fine-grained patterns.
For the rest of this article, we focus on Agglomerative Hierarchical Clustering, which accounts for the majority of production and research use cases.
The Agglomerative Clustering Algorithm
The classical hierarchical clustering procedure, formalized by Johnson, follows these steps:
1. Assign each observation to its own cluster.
2. Compute a distance (or similarity) matrix between all clusters.
3. Merge the two closest clusters.
4. Recompute distances between the new cluster and the existing clusters.
5. Repeat steps 3 and 4 until all observations form a single cluster.
This process results in a nested hierarchy, which can later be cut at any level to obtain a desired number of clusters.
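To make these steps concrete, here is a minimal sketch on a tiny, invented toy matrix. Base R's hclust() (introduced in detail below) records every merge it performs in its merge and height components, so you can trace the hierarchy step by step.
toy <- matrix(c(1, 1,
                1.2, 1.1,
                5, 5,
                5.1, 4.9,
                9, 9), ncol = 2, byrow = TRUE)
rownames(toy) <- paste0("obs", 1:5)

toy_hc <- hclust(dist(toy), method = "complete")

# Each row of $merge is one merge step: negative numbers are single
# observations, positive numbers refer to clusters formed in earlier steps.
toy_hc$merge
toy_hc$height   # distance at which each merge occurred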
Measuring Distance Between Clusters (Linkage Methods)
The effectiveness of hierarchical clustering depends heavily on how distances between clusters are defined. The most commonly used linkage methods are:
Single Linkage
Distance = shortest distance between any two points in different clusters
Tends to create long, chain-like clusters
Sensitive to noise and outliers
Complete Linkage
Distance = longest distance between any two points in different clusters
Produces compact, well-separated clusters
Outliers can delay merging
Average Linkage
Distance = average distance between all point pairs across clusters
Balanced approach, commonly used in exploratory analysis
Ward’s Method (Industry Favorite)
Minimizes within-cluster variance
Merges the pair of clusters that yields the smallest increase in total within-cluster sum of squares
Widely used in:
Customer segmentation
Behavioral analytics
Social science research
Current best practice:
Ward’s method is often the default choice for numeric data when interpretability and cluster compactness matter.
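As a quick, self-contained illustration (the built-in USArrests data is used purely as a stand-in for any scaled numeric matrix), each linkage is selected through the method argument of hclust(); note that base R spells Ward's method "ward.D2" when clustering a Euclidean distance matrix.
d_demo <- dist(scale(USArrests))   # stand-in for any scaled numeric data

link_single   <- hclust(d_demo, method = "single")    # nearest-neighbour linkage
link_complete <- hclust(d_demo, method = "complete")  # farthest-neighbour linkage
link_average  <- hclust(d_demo, method = "average")   # mean pairwise distance (UPGMA)
link_ward     <- hclust(d_demo, method = "ward.D2")   # Ward's minimum-variance criterion

plot(link_ward, cex = 0.5, main = "Ward's method on USArrests")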
Preparing Data for Hierarchical Clustering
Before clustering, data preparation is critical:
Rows must represent observations
Columns must represent variables
Handle missing values (remove or impute)
Scale numeric variables to ensure comparability
We’ll use the Freedman dataset from the car package, which records population, population density, percent nonwhite, and crime rate for U.S. metropolitan areas.
data <- car::Freedman   # crowding and crime data for U.S. metropolitan areas
data <- na.omit(data)   # drop observations with missing values
data <- scale(data)     # standardize each variable (mean 0, sd 1)
Scaling ensures that no variable dominates the clustering process due to unit differences—a standard requirement in modern analytics pipelines.
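As an optional sanity check, you can confirm that scaling behaved as expected: every column of the scaled matrix should now have a mean of roughly 0 and a standard deviation of 1.
round(colMeans(data), 3)       # approximately 0 for every variable
round(apply(data, 2, sd), 3)   # approximately 1 for every variable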
Implementing Hierarchical Clustering in R
R provides robust, well-maintained tools for hierarchical clustering:
hclust() from the stats package
agnes() and diana() from the cluster package
Agglomerative Clustering with hclust
d <- dist(data, method = "euclidean")   # pairwise Euclidean distance matrix
hc <- hclust(d, method = "complete")    # complete-linkage agglomerative clustering
plot(hc, cex = 0.6, hang = -1)          # dendrogram with labels aligned at the baseline
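If you already have a candidate number of clusters in mind, you can overlay it on the dendrogram with base R's rect.hclust(); the choice of k = 4 below is purely illustrative.
plot(hc, cex = 0.6, hang = -1)
rect.hclust(hc, k = 4, border = 2:5)   # draw a box around each of the 4 clusters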
Agglomerative Clustering with agnes
The agnes() function provides an agglomerative coefficient, which quantifies clustering strength (values closer to 1 indicate stronger structure).
library(cluster)   # provides agnes() and diana()
hc_agnes <- agnes(data, method = "complete")
hc_agnes$ac   # agglomerative coefficient
Comparing Linkage Methods
A modern workflow involves evaluating multiple linkage strategies before choosing one.
# Note: agnes() spells Ward's method "ward" (base hclust() uses "ward.D2")
methods <- c("average", "single", "complete", "ward")
ac <- sapply(methods, function(m) agnes(data, method = m)$ac)
ac   # higher values indicate stronger clustering structure
In most real-world datasets, Ward’s method typically yields the strongest clustering structure.
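A natural follow-up, sketched here under the assumption that Ward scores highest on your data, is to refit with the winning linkage and inspect its dendrogram.
hc_ward <- agnes(data, method = "ward")
pltree(hc_ward, cex = 0.6, hang = -1, main = "AGNES dendrogram (Ward linkage)")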
Divisive Clustering with diana
Although less common, divisive clustering can still be valuable for high-level exploration.
hc_div <- diana(data)                  # divisive analysis (DIANA)
hc_div$dc                              # divisive coefficient
pltree(hc_div, cex = 0.6, hang = -1)   # dendrogram of the divisive hierarchy
Assigning Cluster Labels
Once the dendrogram is built, clusters can be extracted using cutree().
clusters <- cutree(as.hclust(hc_div), k = 5)   # convert to an hclust tree, then cut into 5 groups
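A couple of quick base R summaries help check whether the grouping is plausible; this optional sketch works on the scaled matrix created earlier.
table(clusters)   # cluster sizes

# Mean (scaled) profile of each cluster
aggregate(as.data.frame(data), by = list(cluster = clusters), FUN = mean)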
For visualization, the factoextra package offers modern plotting utilities:
library(factoextra)
fviz_cluster(list(data = data, cluster = clusters))
Advanced Dendrogram Manipulation
The dendextend package enables advanced dendrogram customization and comparison.
Comparing Clustering Methods with a Tanglegram
library(dendextend)
# Convert the agnes output to an hclust tree, then to a dendrogram
hc_single <- as.dendrogram(as.hclust(agnes(data, method = "single")))
hc_complete <- as.dendrogram(as.hclust(agnes(data, method = "complete")))
tanglegram(hc_single, hc_complete)
Tanglegrams are particularly useful for method comparison, model validation, and research reporting.
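dendextend can also quantify how well the two trees align: entanglement() returns a value between 0 (perfect alignment) and 1 (full entanglement), so lower is better. A minimal sketch using the dendrograms created above:
entanglement(hc_single, hc_complete)   # lower values indicate better alignment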
Final Thoughts
Hierarchical clustering remains a cornerstone of exploratory data analysis in 2026. While modern datasets are growing larger and more complex, hierarchical methods continue to deliver unmatched interpretability and flexibility.
In this article, we:
Explored divisive and agglomerative clustering
Compared linkage methods with practical metrics
Implemented clustering using modern R workflows
Visualized and interpreted dendrograms
Assigned and validated cluster labels
While we assumed the number of clusters (k) was known, real-world projects often require experimentation and domain expertise. Use business context, validation metrics, and visualization together—no single heuristic works best for all datasets.
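One concrete option, sketched here with factoextra's hcut() wrapper, is to compare candidate values of k using standard heuristics such as average silhouette width and the within-cluster sum of squares ("elbow") plot.
library(factoextra)

fviz_nbclust(data, FUNcluster = hcut, method = "silhouette")  # average silhouette width per k
fviz_nbclust(data, FUNcluster = hcut, method = "wss")         # "elbow" plot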
Hierarchical clustering is not just a technique; it’s a thinking framework for understanding structure in data.
Our mission is to enable businesses to unlock value in data. Helping you solve tough problems is just one of the many ways we pursue it. For over 20 years, we have partnered with more than 100 clients, from Fortune 500 companies to mid-sized firms, to solve complex data analytics challenges. Our services include Snowflake consulting and Power BI implementation, turning raw data into strategic insight.