Abhijeet Pratap Singh

Posted on Jul 2

K-Means Clustering (Unsupervised Learning)

#algorithms #beginners #datascience #machinelearning

1. The Problem It Solves

In many real-world problems, we don't have labeled data.

We may have thousands of customers, products, or transactions, but no information about which ones belong together.

For example:

Which customers behave similarly?
Which products attract similar buyers?
Which users are likely to become power users?
Which stores have similar purchasing patterns?

K-Means Clustering solves this problem by automatically grouping similar data points together based on their characteristics.

Unlike supervised learning, there are no labels telling the algorithm what the correct answer is.

Its job is simply to discover hidden patterns within the data.

2. Core Intuition

Imagine dropping hundreds of marbles onto a large table.

Your task is to separate them into three piles.

Nobody tells you which marble belongs where.

You begin by placing three random markers on the table.

These markers are called Centroids.

Now every marble walks to the nearest marker.

Once every marble has chosen a marker, you move each marker to the exact center of its group.

Because the markers moved, some marbles are now closer to a different marker.

They switch groups.

Again, the markers move to the center.

This process repeats until the markers stop moving.

Eventually, each marker sits at the center of a natural group of marbles.

That's exactly how K-Means works.

3. How the Algorithm Works

K-Means follows a simple iterative process.

Step 1 — Choose K

The first thing you must decide is the number of clusters.

This value is called K.

For example:

K = 2
K = 5
K = 10

Unlike supervised learning, the algorithm cannot determine this automatically.

You choose it before training begins.

Step 2 — Initialize Centroids

The algorithm randomly places K centroids inside the feature space.

These are simply starting points.

Modern implementations usually use K-Means++, which chooses better initial locations to improve convergence.

Step 3 — Assign Points to the Nearest Centroid

Every data point calculates its distance to every centroid.

The point joins whichever centroid is closest.

Mathematically, the assignment is based on minimizing Euclidean distance.

Where:

xᵢ = data point
μⱼ = centroid

Each point belongs to exactly one cluster.

Step 4 — Update the Centroids

Once all points have been assigned, each centroid moves to the average position of the points inside its cluster.

Where:

Sⱼ = all points assigned to cluster j

The centroid is literally the mathematical center of that cluster.

Step 5 — Repeat

The algorithm repeats two operations:

Assign points
Move centroids

until the centroids stop moving significantly.

At that point, the clusters are considered stable.

4. What Is K-Means Optimizing?

K-Means tries to make every cluster as compact as possible.

It minimizes the Within-Cluster Sum of Squares (WCSS), also called Inertia.

Lower WCSS means:

Points are closer to their centroid.
Clusters are tighter.
Similar observations stay together.

The algorithm stops when further improvements become very small.

5. Choosing the Right Value of K

One of the biggest challenges with K-Means is selecting the number of clusters.

The algorithm doesn't know how many groups actually exist.

Two common methods are used.

Elbow Method

Train the model using different values of K.

Plot WCSS against K.

Look for the point where improvement starts slowing down.

That bend is called the Elbow.

Silhouette Score

Measures how well-separated the clusters are.

Close to 1 → Excellent clusters
Around 0 → Overlapping clusters
Below 0 → Poor clustering

6. When Should You Use K-Means?

K-Means works well when:

Data contains natural groups.
Features are numerical.
Clusters are roughly spherical.
Cluster sizes are similar.
The dataset is relatively clean.

Typical applications include:

Customer segmentation
Product recommendation
User behavior analysis
Market segmentation
Image compression
Document clustering
Sales territory planning

7. Advantages

K-Means is popular because it is:

Simple to understand.
Fast on large datasets.
Easy to implement.
Highly scalable.
Computationally efficient.
Works well for many business segmentation problems.

8. When It Starts Breaking Down

K-Means also has important limitations.

Sensitive to Feature Scaling

Distance calculations are everything.

If one feature has much larger values than another, it dominates the clustering.

Example:

Salary → 100,000
Age → 25

Without scaling, salary completely overwhelms age.

This is why StandardScaler is almost always used before K-Means.

Poor for Non-Spherical Clusters

K-Means assumes clusters are roughly circular.

It struggles with:

Crescent shapes
Spiral data
Nested clusters
Elongated groups

Algorithms like DBSCAN or Hierarchical Clustering handle these cases much better.

Sensitive to Outliers

A single extreme point can pull the centroid far away from the true center.

This affects the entire cluster.

Requires K in Advance

You must decide the number of clusters before training.

Choosing the wrong value often produces poor segmentation.

Local Optimum

Because centroids start randomly, different runs can produce different results.

This is why modern implementations use K-Means++ and multiple random initializations (n_init).

9. Python Implementation

import numpy as np
import pandas as pd

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Generate sample CRM data
np.random.seed(42)

data = np.vstack([
    np.random.normal(
        loc=[5, 150],
        scale=[1, 20],
        size=(50, 2)
    ),
    np.random.normal(
        loc=[50, 2500],
        scale=[8, 300],
        size=(50, 2)
    )
])

df = pd.DataFrame(
    data,
    columns=[
        "Seat_Count",
        "Monthly_Billing"
    ]
)

# Scale features
scaler = StandardScaler()

X_scaled = scaler.fit_transform(df)

# Train K-Means
model = KMeans(
    n_clusters=2,
    init="k-means++",
    random_state=42,
    n_init=10
)

df["Cluster"] = model.fit_predict(X_scaled)

print(df.head())

print("\nCluster Centers\n")
print(model.cluster_centers_)

10. How to Evaluate the Model

Since K-Means has no labels, traditional accuracy metrics cannot be used.

Instead, we use clustering metrics.

Inertia (WCSS)

Measures how compact each cluster is.

Lower values indicate tighter clusters.

Silhouette Score

Measures both cohesion and separation.

Values close to 1 indicate well-separated clusters.

Davies-Bouldin Index

Measures cluster similarity.

Lower values indicate better clustering.

Calinski-Harabasz Index

Measures the ratio of separation between clusters to compactness within clusters.

Higher values generally indicate better-defined clusters.

11. Real-World Engineering Notes

Here are a few things you'll notice in production:

Always scale numerical features before using K-Means.
Remove obvious outliers whenever possible.
Try several values of K instead of assuming one is correct.
Use K-Means as an exploratory tool rather than expecting perfect segmentation.
K-Means is extremely fast, making it a great first clustering algorithm for large datasets.
For irregular cluster shapes, consider DBSCAN or Hierarchical Clustering instead.

12. Key Takeaways

K-Means is an unsupervised learning algorithm used for clustering.
It groups similar observations based on Euclidean distance.
The algorithm repeatedly assigns points to the nearest centroid and updates centroid locations until convergence.
Its objective is to minimize the Within-Cluster Sum of Squares (WCSS).
Feature scaling is essential because distance calculations drive the algorithm.
Choosing the correct number of clusters is one of the most important parts of using K-Means effectively.
It performs best on compact, well-separated, spherical clusters and is widely used for customer segmentation and exploratory data analysis.

DEV Community

K-Means Clustering (Unsupervised Learning)

1. The Problem It Solves

2. Core Intuition

3. How the Algorithm Works

Step 1 — Choose K

Step 2 — Initialize Centroids

Step 3 — Assign Points to the Nearest Centroid

Step 4 — Update the Centroids

Step 5 — Repeat

4. What Is K-Means Optimizing?

5. Choosing the Right Value of K

Elbow Method

Silhouette Score

6. When Should You Use K-Means?

7. Advantages

8. When It Starts Breaking Down

Sensitive to Feature Scaling

Poor for Non-Spherical Clusters

Sensitive to Outliers

Requires K in Advance

Local Optimum

9. Python Implementation

10. How to Evaluate the Model

Inertia (WCSS)

Silhouette Score

Davies-Bouldin Index

Calinski-Harabasz Index

11. Real-World Engineering Notes

12. Key Takeaways

Top comments (0)