DEV Community

sunweiran
sunweiran

Posted on

3

Unsupervised clustering for faster pairwise distance computation

Recently at work, I was given a list of locations(around 70k+) with their lat and long coordinates. My task was to find the nearest point of all the points in the list.

My predecessor who developed the first version used the simplest and most straightforward way for this computation -- for each point, compute Euclidean distance with all other points. This computation takes about 30 minutes to finish as it involves 70k * 70k times float point computation. The results are stored into a huge nested list, which takes a lot of RAM as well.

However, this computation efficiency is not acceptable in both production and testing environment. The following algorithm was created for enhancing computation efficiency:

  • Apply K-Means with k=100 cluster centres on this dataset
  • For each point, instead of computing pairwise distance with all other points, only compute pairwise distance with points in the same cluster and the nearest point in the cluster will be the nearest point in the dataset.

For each cluster, computation is done ~ 700 * 700 times. Therefore in total, 70k * 700 times computation is done. The computation is 100 times faster.

In fact, K-means clustering is very effective and efficient as compare to pairwise distance computation. After this algorithm implemented, it took less than two minutes.

First time using unsupervised learning in non machine learning related domain. Excited and want to record this down. However in production this methodology is not utilised as the technical lead decides to reduce the number of candidate locations and only recompute when there are new location points added into the list.

Either solve the problem, or reduce the problem. Lessons for the day

Heroku

Simplify your DevOps and maximize your time.

Since 2007, Heroku has been the go-to platform for developers as it monitors uptime, performance, and infrastructure concerns, allowing you to focus on writing code.

Learn More

Top comments (0)

Billboard image

Create up to 10 Postgres Databases on Neon's free plan.

If you're starting a new project, Neon has got your databases covered. No credit cards. No trials. No getting in your way.

Try Neon for Free →