es404020


K-means: Selecting the number of clusters

What are the criteria for setting the number of clusters? Is it just chosen on vibes, or is there a standard approach to selecting it?

These are very valid questions for a machine learning expert. In this article we will look at the best way to pick the number of clusters.

The most widely accepted method for picking the number of clusters is the ELBOW METHOD.

Remember that clustering is about

  1. Minimizing the distance between points in a cluster

  2. Maximizing the distance between clusters

For K-means these two happen at the same time. The distance between points in a cluster is measured using the within-cluster sum of squares, or WCSS.
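Written out, WCSS adds up the squared distance from every point to the center of the cluster it belongs to. The notation here is an assumption of this sketch, not something taken from the original post: $k$ is the number of clusters, $C_j$ is the set of points in cluster $j$, and $\mu_j$ is the mean (centroid) of those points.

```latex
\mathrm{WCSS}(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

A small WCSS means the points sit close to their own cluster centers, which is the first of the two goals listed above.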

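WCSS is a measure developed within the ANOVA framework. Minimizing WCSS on its own does not give the perfect clustering solution, because WCSS keeps shrinking as clusters are added and reaches zero when every point is its own cluster. Instead, we plot WCSS against the number of clusters and look for the elbow: the point after which adding another cluster lowers WCSS only slightly. The elbow of the graph shows the best number of clusters to use.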
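As a rough illustration of the elbow idea, here is a minimal pure-Python sketch that runs K-means for several values of k on a toy dataset and records the WCSS each time. Everything in it is an assumption made for the example: the helper names (`sq_dist`, `farthest_first_centers`, `kmeans_wcss`), the toy points, and the deterministic farthest-first initialization, which is swapped in here so the result is reproducible. In practice a library would normally do this; scikit-learn, for instance, exposes the same quantity as the `inertia_` attribute of a fitted `KMeans` model.

```python
# Illustrative sketch only: helper names and toy data are hypothetical.

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_centers(points, k):
    """Deterministic initialization (an assumption of this sketch):
    start from the first point, then keep adding the point that is
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(farthest)
    return centers


def kmeans_wcss(points, k, max_iter=100):
    """Run Lloyd's algorithm for k clusters and return the final WCSS."""
    centers = farthest_first_centers(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # WCSS: squared distance from every point to its nearest center.
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Toy data: three well-separated groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]

wcss_by_k = {k: kmeans_wcss(points, k) for k in range(1, 7)}
for k, wcss in wcss_by_k.items():
    print(f"k={k}: WCSS={wcss:.2f}")
```

On this toy data the WCSS falls steeply up to k = 3 and changes very little after that, so the elbow sits at three clusters, matching the three groups the points were built from. Plotting `wcss_by_k` as a line chart gives the usual elbow graph.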

The screenshots below show the best solution for this:

Image description

Image description
