Basics of Minkowski Distance


Classification problem involving Minkowski distance at $p=2$ in the Iris dataset

Proximity-based models use distance metrics to quantify how far apart two points are, and therefore how similar or dissimilar they are.

One of the best-known applications of distance metrics is the travelling salesman problem, where the nearest neighbour heuristic, a greedy approximation algorithm, is commonly used. Aside from combinatorial optimization, distance metrics are also widely used in classification, clustering, and special relativity problems.
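The nearest neighbour heuristic can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance (Minkowski with $p=2$) between cities; the function name `nearest_neighbour_tour` and the city coordinates are hypothetical. The result is a visiting order, not a guaranteed optimal tour.

```python
import math

def nearest_neighbour_tour(cities, start=0):
    """Greedy nearest-neighbour heuristic for the travelling salesman problem.

    Starting from `start`, repeatedly move to the closest unvisited city
    (Euclidean distance, i.e. Minkowski with p=2) until every city is visited.
    Returns the visiting order as a list of city indices.
    """
    unvisited = set(range(len(cities)))
    unvisited.remove(start)
    tour = [start]
    current = start
    while unvisited:
        # Choose the unvisited city closest to the current city.
        nearest = min(unvisited, key=lambda j: math.dist(cities[current], cities[j]))
        tour.append(nearest)
        unvisited.remove(nearest)
        current = nearest
    return tour

# Hypothetical city coordinates, for illustration only.
cities = [(0, 0), (5, 0), (1, 1), (6, 1)]
print(nearest_neighbour_tour(cities))  # [0, 2, 1, 3]
```

Because each step only looks at the closest remaining city, the heuristic is fast but can produce tours noticeably longer than the optimum.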

In this article, we will tackle one of the basic distance metrics: the Minkowski distance, named after the German mathematician Hermann Minkowski.

By definition

Minkowski distance is a distance metric, or a measure of dissimilarity, between two points in a normed vector space. It uses a parameter $p$, which represents the order of the norm.

Let’s say we have two points $X$ and $Y$ in an $n$-dimensional space, denoted as:

X = (x_1,x_2,...,x_n),

Y = (y_1,y_2,...,y_n)

Along each coordinate $i$, the two points differ by the absolute value $|x_i - y_i|$. Raising each of these differences to the power $p$ and summing over all $n$ coordinates gives:

\displaystyle\sum_{i=1}^n {|x_i - y_i|}^p

Taking the $p$-th root of this sum, that is, raising it to the power $\frac{1}{p}$ (where $p$ is again the order of the norm), completes the Minkowski distance:

D(X,Y) = \bigg(\displaystyle\sum_{i=1}^n {|x_i - y_i|}^p\bigg)^{\frac{1}{p}}
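The formula translates almost directly into code. The following is a minimal sketch in plain Python; the function name `minkowski_distance` is hypothetical, and rejecting $p < 1$ is a design choice made here because smaller values do not give a true metric.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two equal-length sequences.

    Implements D(X, Y) = (sum_i |x_i - y_i|^p)^(1/p) for p >= 1.
    """
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    if p < 1:
        raise ValueError("p must be >= 1 for a valid metric")
    # Sum of |x_i - y_i|^p over all coordinates.
    total = sum(abs(xi - yi) ** p for xi, yi in zip(x, y))
    # p-th root of the sum.
    return total ** (1 / p)

X = (0, 0)
Y = (3, 4)
print(minkowski_distance(X, Y, 1))  # 7.0
print(minkowski_distance(X, Y, 2))  # 5.0
```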

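Minkowski distance can also be written equivalently as:

D(X,Y) = \sqrt[p]{|x_{1}-y_{1}|^p + |x_{2}-y_{2}|^p + ... + |x_{n}-y_{n}|^p}

or, for two data objects $i$ and $j$ with attribute values $x_{i1}, ..., x_{in}$ and $x_{j1}, ..., x_{jn}$, as:

D(i,j) = \sqrt[p]{|x_{i1}-x_{j1}|^p + |x_{i2}-x_{j2}|^p + ... + |x_{in}-x_{jn}|^p}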
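The $D(i,j)$ form is convenient when a data set is stored as rows of objects and columns of attributes: computing it for every pair of rows gives a distance matrix. This is a small sketch under that assumption; `distance_matrix` and the three-object data set are hypothetical.

```python
def minkowski_distance(a, b, p):
    # (sum_k |a_k - b_k|^p)^(1/p) over the attributes k of objects a and b
    return sum(abs(ak - bk) ** p for ak, bk in zip(a, b)) ** (1 / p)

def distance_matrix(data, p):
    """Return the n x n matrix whose (i, j) entry is D(i, j)."""
    n = len(data)
    return [[minkowski_distance(data[i], data[j], p) for j in range(n)]
            for i in range(n)]

# Hypothetical data set: three objects, each with two attributes.
data = [(1, 2), (3, 5), (2, 0)]
for row in distance_matrix(data, p=2):
    print([round(d, 3) for d in row])
```

The matrix is symmetric with zeros on the diagonal, since $D(i,j) = D(j,i)$ and $D(i,i) = 0$.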

Orders of the norm

The order of the norm (the parameter $p$) determines how the coordinate differences are combined, and therefore which familiar distance the Minkowski formula reduces to.

The case where $p=1$ is equivalent to the Manhattan distance, named after the rectilinear street layout of Manhattan: it measures the distance between two points in a city where we can only travel along orthogonal city blocks.

Manhattan distance can be calculated using:

{ D(X,Y) = \bigg(\displaystyle\sum_{i=1}^n {|x_i - y_i|}\bigg) } ,

D(i,j) = {|x_{i1}-x_{j1}| + |x_{i2}-x_{j2}| + ... + |x_{in}-x_{jn}|}
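With $p=1$ the root and the power both disappear, leaving a plain sum of absolute differences. A minimal sketch, with a hypothetical function name and made-up grid coordinates:

```python
def manhattan_distance(x, y):
    """Manhattan (p = 1) distance: sum of absolute coordinate differences."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

# Two hypothetical grid locations: 3 blocks east and 4 blocks north apart.
print(manhattan_distance((1, 1), (4, 5)))  # 7
```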

The case where $p=2$ is equivalent to the Euclidean distance (or the Pythagorean distance, since it can be calculated from the Cartesian coordinates of the points using the Pythagorean theorem).

Euclidean distance can be calculated using:

{ D(X,Y) = \bigg(\displaystyle\sum_{i=1}^n {|x_i - y_i|}^2\bigg)^{\frac{1}{2}} } ,

D(i,j) = \sqrt{|x_{i1}-x_{j1}|^2 + |x_{i2}-x_{j2}|^2 + ... + |x_{in}-x_{jn}|^2}
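For $p=2$, squaring already makes each difference non-negative, so the absolute value can be dropped. The sketch below uses a hypothetical helper and compares it with Python's standard `math.dist` (available since Python 3.8), which computes the same Euclidean distance.

```python
import math

def euclidean_distance(x, y):
    """Euclidean (p = 2) distance: square root of the sum of squared differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

a, b = (1, 1), (4, 5)
print(euclidean_distance(a, b))  # 5.0
# The standard library computes the same quantity.
print(math.dist(a, b))           # 5.0
```

The same two points are 7 apart under the Manhattan distance but only 5 apart in a straight line.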

Therefore, Minkowski distance can also be considered as a generalization of both the Euclidean distance and the Manhattan distance.

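The parameter $p$ is not limited to $1$ and $2$: any $p \ge 1$ gives a valid metric, and in the limit $p \to \infty$ the Minkowski distance becomes the Chebyshev distance, $\max_i |x_i - y_i|$. For $0 < p < 1$, the formula violates the triangle inequality and is therefore no longer a metric, an advanced topic that is out of the scope of this article.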
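Both claims can be checked numerically. This sketch (hypothetical helper names, made-up points) evaluates the formula for increasing $p$, where the value shrinks toward the Chebyshev distance, and then shows a triangle-inequality violation at $p = 0.5$.

```python
def minkowski_value(x, y, p):
    # (sum_i |x_i - y_i|^p)^(1/p); for p < 1 this is NOT a metric
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / p)

def chebyshev_distance(x, y):
    """Limit of the Minkowski distance as p -> infinity."""
    return max(abs(xi - yi) for xi, yi in zip(x, y))

X, Y = (0, 0), (3, 4)
for p in (1, 2, 4, 10, 50):
    print(p, round(minkowski_value(X, Y, p), 4))   # decreases toward 4
print("Chebyshev:", chebyshev_distance(X, Y))

# Triangle inequality at p = 0.5 with A=(0,0), B=(1,0), C=(1,1):
A, B, C = (0, 0), (1, 0), (1, 1)
direct = minkowski_value(A, C, 0.5)                                # (1 + 1)^2 = 4.0
detour = minkowski_value(A, B, 0.5) + minkowski_value(B, C, 0.5)   # 1 + 1 = 2.0
print(direct > detour)  # True: the "direct" route is longer than the detour
```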

Applications of Minkowski distance

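Minkowski distance is widely used in machine learning problems such as classification, clustering, and feature extraction. (Minkowski space in special relativity is also named after Hermann Minkowski, but it uses a different, indefinite notion of distance: the spacetime interval.)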
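A typical classification use is the k-nearest-neighbours classifier. The sketch below is written from scratch so it has no dependencies; the function names and the two-class training set are hypothetical and are not Iris data. In scikit-learn, `KNeighborsClassifier(metric="minkowski", p=2)` configures the same distance (those are also its defaults).

```python
from collections import Counter

def minkowski_distance(x, y, p=2):
    # (sum_i |x_i - y_i|^p)^(1/p); p=2 is the Euclidean distance
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / p)

def knn_predict(train_X, train_y, query, k=3, p=2):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training indices by their Minkowski distance to the query.
    order = sorted(range(len(train_X)),
                   key=lambda i: minkowski_distance(train_X[i], query, p))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-feature training set with two classes.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))  # A
print(knn_predict(train_X, train_y, (4.9, 5.0)))  # B
```

Changing `p` changes which training points count as "nearest", so it can change the predicted class near decision boundaries.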

One example is chess programming. The Minkowski distance at $p=1$ can be used in the static evaluation of the late endgame, where, for instance, races of the two kings to certain squares are often an issue, or in the so-called Mop-up evaluation, which considers the Manhattan distance between the winning and the losing king.^[1]
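Computing the Manhattan distance between two kings only needs each square's file and rank. This sketch assumes a 0..63 square numbering with a1 = 0 and h8 = 63 (a common convention, but an assumption here); the helper names are hypothetical, and how the distance is weighted inside an evaluation function is engine-specific and not shown.

```python
def square_to_coords(square):
    """Map a square index 0..63 (a1 = 0, b1 = 1, ..., h8 = 63) to (file, rank)."""
    return square % 8, square // 8

def manhattan_square_distance(sq1, sq2):
    """Manhattan (p = 1) distance between two board squares."""
    f1, r1 = square_to_coords(sq1)
    f2, r2 = square_to_coords(sq2)
    return abs(f1 - f2) + abs(r1 - r2)

# Hypothetical position: winning king on e4, losing king on h8.
E4 = 4 + 3 * 8   # file e = 4, fourth rank = 3 -> index 28
H8 = 7 + 7 * 8   # index 63
print(manhattan_square_distance(E4, H8))  # 7
```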

With $p=2$, the Minkowski distance is the length of the straight line segment between two points. It is widely used in geocoding and reverse geocoding applications; for example, the distance between a geocoded point and the true address location is computed to evaluate the positional accuracy of a geocoding procedure.^[2]
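A positional-accuracy check of this kind can be sketched as the Euclidean distance between each geocoded point and its true location, averaged over a sample. This assumes both points are in the same planar, projected coordinate system measured in metres (raw latitude/longitude would need a geodesic distance instead); the function name and coordinates are hypothetical.

```python
import math

def positional_error(geocoded, true_location):
    """Euclidean (p = 2) distance between a geocoded point and the true location.

    Assumes both points share a planar, projected coordinate system (metres),
    not raw latitude/longitude.
    """
    return math.dist(geocoded, true_location)

# Hypothetical (x, y) pairs in metres: (geocoded point, true address location).
pairs = [((100.0, 200.0), (103.0, 204.0)),
         ((50.0, 50.0), (50.0, 56.0))]
errors = [positional_error(g, t) for g, t in pairs]
print(errors)                      # [5.0, 6.0]
print(sum(errors) / len(errors))   # 5.5
```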