DEV Community

Cover image for From Pythagorean Theorem to K-Means: How Grade School Math Powers Machine Learning
Adolph Odhiambo
Adolph Odhiambo

Posted on

3 1

From Pythagorean Theorem to K-Means: How Grade School Math Powers Machine Learning

A Fourth-Grade Discovery That Shaped My Career

Back in 2010, I was a curious fourth-grader staring at the colorful posters on my math classroom walls. Among them was one that etched itself into my memory: the Pythagorean Theorem. Memorizing the iconic triples like (3-4-5) and (5-12-13) felt like solving magical puzzles. I didn’t know it then, but that simple formula a2+b2=c2a^2 + b^2 = c^2 would one day power some of the most critical algorithms I use in my career.

Pythagorean

Fast forward 15 years, and I’m a data scientist clustering customers based on their transaction patterns and account growth. My go-to algorithm? K-Means clustering, a machine learning technique that owes its elegance and efficiency to none other than the Pythagorean theorem.

The Pythagorean Theorem’s Hidden Superpower

The Pythagorean theorem states:

c=a2+b2 c = \sqrt{a^2 + b^2}


where ( c ) is the hypotenuse of a right triangle, and ( a ), ( b ) are the other two sides.

But here’s the twist: this formula is the secret sauce behind measuring distances in machine learning.

By simply reinterpreting the sides of the triangle, we can measure distances in higher-dimensional spaces a technique that underpins many algorithms.

The Secret Superpower: Euclidean Distance

Let’s start with the Euclidean distance—the straight-line distance between two points. Imagine two points on a 2D plane, (x1,y1)(x_1, y_1) and (x2,y2)(x_2, y_2) . The distance between them is:

d=(x2x1)2+(y2y1)2 d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}

This formula is essentially the Pythagorean theorem in disguise! Instead of triangle sides, the differences in xx - and yy -coordinates form the “legs,” while the hypotenuse becomes the distance between points.Instead of triangle sides, we’re measuring the “straight-line” distance between points.

Why Distance Matters in Machine Learning ?

In machine learning, distance = similarity. The closer two data points are in a feature space, the more alike they are.

For example, consider two customers:

  • Customer A: 25 years old, earning $50K. Represented as:
a=[2550] \vec{\mathbf{a}} = \begin{bmatrix} 25 & 50 \end{bmatrix}
  • Customer B: 40 years old, earning $80K. Represented as:
b=[4080] \vec{\mathbf{b}} = \begin{bmatrix} 40 & 80 \end{bmatrix}

To measure their similarity, calculate the Euclidean distance:

d=(4025)2+(8050)2=225+900=33.54 d = \sqrt{(40 - 25)^2 + (80 - 50)^2} = \sqrt{225 + 900} = 33.54

The smaller the distance, the more similar the customers. This simple concept becomes the backbone of clustering algorithms like K-Means.

Scaling to Higher Dimensions (and Real-World Problems)

What if we add more features, like number of purchases or average transaction amount? The Euclidean distance formula adapts effortlessly:

d=(x2x1)2+(y2y1)2+(z2z1)2+ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2 + \dots}

Even in 100-dimensional space, the principle remains the same simplified as:

d=i=1n(xi2xi1)2 d = \sqrt{\sum_{i=1}^n (x_{i2} - x_{i1})^2}

K-Means Clustering: Geometry in Action

KMEANS

K-Means is one of the most popular clustering algorithms in machine learning. Here’s how it works:

  1. Initialization: Start by guessing initial cluster centers (centroids).
  2. Assignment: Assign each data point to the nearest centroid, using Euclidean distance.
  3. Update: Recalculate the centroids as the average of all points assigned to them.
  4. Repeat: Continue until the centroids stabilize.

Euclidean distance is the heart of this process, ensuring that clusters group together similar points.

Full Circle: A Fourth-Grade Formula in Action

In my customer segmentation project, every customer’s transaction history became a vector in multi-dimensional space. By calculating Euclidean distances, I grouped customers with similar behavior patterns into clusters. This allowed my team to design targeted marketing strategies and predict account growth effectively.

Looking back, it’s incredible to see how a formula I first encountered in elementary school has grown with me, becoming a tool I use every day.

Final Thoughts

Math isn’t just a subject , it’s a lens to understand the world. The Pythagorean theorem, once a tool to solve triangles, now powers machine learning models that drive real-world decisions. Whether it’s triangles on a chalkboard or billion-dollar ML models, the fundamentals remain timeless. Next time you see a right triangle, remember: you’re staring at the foundation of modern AI.

Billboard image

Monitoring as code

With Checkly, you can use Playwright tests and Javascript to monitor end-to-end scenarios in your NextJS, Astro, Remix, or other application.

Get started now!

Top comments (0)

A Workflow Copilot. Tailored to You.

Pieces.app image

Our desktop app, with its intelligent copilot, streamlines coding by generating snippets, extracting code from screenshots, and accelerating problem-solving.

Read the docs

👋 Kindness is contagious

Discover a treasure trove of wisdom within this insightful piece, highly respected in the nurturing DEV Community enviroment. Developers, whether novice or expert, are encouraged to participate and add to our shared knowledge basin.

A simple "thank you" can illuminate someone's day. Express your appreciation in the comments section!

On DEV, sharing ideas smoothens our journey and strengthens our community ties. Learn something useful? Offering a quick thanks to the author is deeply appreciated.

Okay