When we have unlabeled data, where we don't know in advance what groups exist, we may want to divide it into clusters. For example, clusters could describe how many likes a patootie gets on a post based on her body weight: 90–120 lbs = 80–100 likes, 121–150 lbs = 60–79 likes, and 151+ lbs = 0–59 likes (#bodypositivity). At the beginning, all we have is each person's body weight and number of likes, and the k-means algorithm tries to find natural breakpoints in that data. First, we choose how many clusters we want (the "k" in k-means), and the algorithm randomly picks k “centers” (think of them as starting points for groups). Then it checks each data point (each person's body weight and likes) and assigns it to the closest center. After every point has been assigned, it moves each center to the “average position” of the points in its group. This assign-and-move process repeats until the clusters settle into place, meaning no point switches to a different center anymore.
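The assign-and-move loop can be sketched in plain Python. This is a minimal sketch of the standard k-means procedure (often called Lloyd's algorithm), assuming each data point is a (weight, likes) tuple; the names `kmeans` and `sq_dist`, the `max_iters` and `seed` parameters, and the sample data are hypothetical, made up for illustration. Because weight and likes are on different scales, a real analysis would usually standardize both features first; the sketch skips that step.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two (weight, likes) points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, max_iters=100, seed=0):
    """Group 2D points into k clusters; returns (centers, labels).

    Hypothetical helper for illustration. Features are used as-is
    (no scaling), which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    # Randomly pick k of the data points as the starting centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assign every point to the index of its closest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # If no point changed groups, the clusters have settled.
        if new_labels == labels:
            break
        labels = new_labels
        # Move each center to the average position of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centers, labels


# Hypothetical (weight in lbs, likes) pairs, invented for this example.
data = [(95, 98), (110, 85), (118, 90), (125, 75), (140, 62),
        (148, 70), (160, 40), (175, 20), (190, 10)]
centers, labels = kmeans(data, k=3)
print(centers)  # final center of each cluster
print(labels)   # cluster index assigned to each point
```

Starting from randomly chosen data points means different seeds can settle on different clusterings, which is why practical implementations often run k-means several times and keep the tightest result.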
In the end, instead of us having to decide exactly where the cutoffs should be (like 90–120 lbs = 80–100 likes), the algorithm discovers those groupings from the patterns in the data itself; the only thing we tell it is how many clusters to look for.
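To see where the "discovered" cutoffs come from, consider clustering on body weight alone. In one dimension, every value goes to whichever center is nearest, so the boundary between two neighboring centers is exactly their midpoint. The helper `breakpoints` and the center values below are hypothetical, chosen only to illustrate that relationship.

```python
def breakpoints(centers):
    """Return the cutoffs between neighboring 1D cluster centers.

    Hypothetical helper: with nearest-center assignment in one dimension,
    the boundary between two adjacent centers is their midpoint.
    """
    ordered = sorted(centers)
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]


# Hypothetical body-weight centers (lbs) that a 1D k-means run might settle on.
print(breakpoints([135.0, 105.0, 170.0]))  # [120.0, 152.5]
```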
For more clarity, imagine you’re at a party where no one knows each other. At first, people are scattered randomly across the room. Then someone says, “Okay, let’s form groups based on who we naturally vibe with.” Everyone looks around, picks a spot, and gathers with the people they feel closest to. After a few minutes, some people switch groups because they realize they actually fit better with another circle. This shuffling continues until the groups feel stable, and no one wants to move anymore.
That’s basically how k-means works:
• The “people” are your data points.
• The “circles” are the clusters.
• The moving around is the algorithm adjusting until the groups make sense.