DEV Community

ALI MANSOOR

Implementing K-Means Clustering from scratch in Python

Disclaimer! I am a student learning data science and machine learning. What I write here might contain mistakes. Please point them out in the comments or reach out to me directly on my LinkedIn account.

What is K-Means Clustering?

k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. - Wikipedia

If, like me, you did not understand this Wikipedia definition, let me explain it in simpler terms.
In k-means clustering, we divide n observations into k groups (clusters) so that observations that are similar to each other end up in the same group. "Similar" here means close together: each observation joins the cluster whose center (centroid) is nearest to it.
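
For readers who prefer a formula, the standard way to state this goal is as minimizing the within-cluster sum of squared distances. The symbols below are my notation, not something from the quote above: S_1, ..., S_k are the clusters and \mu_i is the mean (centroid) of the observations in S_i.

```latex
\underset{S_1,\dots,S_k}{\arg\min} \; \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2
```

Each observation x contributes its squared distance to its own cluster's centroid, so a good clustering is one where points sit close to their centroids.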

[Figure: K-means convergence. Image credit: Wikipedia]

Steps for K-Means Clustering

  1. Decide the value of k, which is the number of groups to divide your observations into.
  2. Select k random points from your observations as the initial centroids C, one for each cluster.
  3. Calculate the distance from each observation X to every centroid C, for example the Euclidean distance ||X - C||.
  4. Put each observation in the cluster whose centroid is closest to it.
  5. Calculate a new centroid for each cluster by taking the average of all observations in that cluster.
  6. Repeat steps 3-5 until the centroids stop changing.
  7. You have successfully organized n observations into k clusters.
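
The steps above can be sketched in plain Python as follows. This is a minimal illustration and not the code from my repository: the function name `kmeans`, its parameters, the `max_iter` cap and the choice to leave an empty cluster's centroid where it was are all assumptions made for this example. It uses only the standard library (`math.dist` needs Python 3.8 or newer).

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups.

    Returns (centroids, labels), where labels[i] is the cluster index
    assigned to points[i]. Illustrative sketch only.
    """
    rng = random.Random(seed)
    # Step 2: pick k observations at random as the starting centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):  # assumed safety cap on the number of passes
        # Steps 3-4: assign each point to the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 5: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Step 6: stop once no centroid has moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, assignment = kmeans(data, k=2)
    print(centers)
    print(assignment)
```

With the six points in the usage example, the three points near the origin should end up in one cluster and the three points near (10, 10) in the other. Which cluster gets index 0 depends on the random starting centroids.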

I have also written Python code from scratch that implements k-means clustering. It currently works for 2 to 4 clusters because it only has a limited list of plot colors; given n colors, it can work for n clusters. Be aware that it sometimes goes into an infinite loop.
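
Both limitations can probably be worked around. The sketch below shows one possible approach and is not what the repository currently does. I have not pinned down the exact cause of the infinite loop; a common one is testing floating-point centroids for exact equality, so the hypothetical helper `has_converged` compares them with a small tolerance, and a hypothetical `MAX_ITER` cap guarantees the loop ends. The hypothetical helper `distinct_colors` builds as many colors as there are clusters using the standard `colorsys` module.

```python
import colorsys
import math

MAX_ITER = 100  # assumed hard cap so the main loop always terminates


def has_converged(old_centroids, new_centroids, tol=1e-6):
    """Return True when every centroid moved by at most `tol`."""
    return all(
        math.dist(old, new) <= tol
        for old, new in zip(old_centroids, new_centroids)
    )


def distinct_colors(n):
    """Return n RGB tuples (components in 0..1) with evenly spaced hues."""
    return [colorsys.hsv_to_rgb(i / n, 0.8, 0.9) for i in range(n)]


if __name__ == "__main__":
    # A tiny move counts as "stopped"; a large one does not.
    print(has_converged([(0.0, 0.0)], [(0.0, 1e-9)]))
    print(has_converged([(0.0, 0.0)], [(1.0, 0.0)]))
    # One color per cluster, for any number of clusters.
    print(distinct_colors(6))
```

In the main loop, the idea would be to iterate with `for _ in range(MAX_ITER)` and break as soon as `has_converged` returns True. The RGB tuples use the 0 to 1 range, which is the format plotting libraries such as Matplotlib accept for colors.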

Github: https://github.com/TheAli711/datascience/tree/main/k-means-clustering

See you guys in the next article :)
