DEV Community

Levenshtein Distance

Fabrizio Bagalà on May 26, 2023

🔄 Updates June 01, 2023 - Added an implementation with modifiable operation costs. Thanks to @cicirello for the suggestion. The Levenshtein dist...

Read full post

Vincent A. Cicirello • May 26 '23

There's a version of the algorithm that allows you to specify the costs of the different edit operations. Very similar to original. You can find the details in:

R. A. Wagner and M. J. Fischer, "The string-to-string correction problem," Journal of the ACM, vol. 21, no. 1, pp. 168–173, January 1974.

Fabrizio Bagalà • May 26 '23

Thank you for informing me. I was not familiar with this version. I will take a look and possibly implement it in the article.

Vincent A. Cicirello • May 27 '23

You're welcome

Fabrizio Bagalà • May 30 '23

@cicirello if I understand the article correctly, the algorithm is the Damerau-Levenshtein algorithm. It handles asymmetric costs and character transpositions. Am I right?

Vincent A. Cicirello • May 30 '23

No, Wagner and Fischer's algorithm doesn't handle transpositions. It is basically like Levenshtein, but instead of the costs of insertions, deletions, and substitutions each being 1, the algorithm has parameters for the costs of each of those operations. You can specify any non-negative cost for deletions, any non-negative cost for insertions, and any non-negative cost for substitutions.

The algorithm is almost identical to the one your post is about. Basically, where you have the + 1 in the statement for the deletion cost, you'd instead add whatever the cost of a deletion operation. And then in the statement for the insertion case, you'd add the cost of an insertion instead of adding 1. And in the conditional statement for the case of a substitution, that 1 becomes the cost of a substitution. The rest of algorithm remains the same as what you have.

You can then just pass 1 for all 3 costs if you want the Levenshtein case. Or you can pass different costs for the operations if in your application you want to treat the operations differently.

Vincent A. Cicirello • May 30 '23 • Edited

@fabriziobagala after my previous reply, it dawned on me that it might be useful to share my Java implementation of Wagner and Fischer's edit distance algorithm. I have a couple different classes that implements it in a Java library:

This one has methods for computing the edit distance between arrays of various types, between Strings, between lists, etc, and uses ints for the three cost values: github.com/cicirello/JavaPermutati...
This one is just like the above, but uses doubles for the three cost parameters: github.com/cicirello/JavaPermutati...
This last one is for the special case of computing the distance between permutations. The algorithm is the same really. It just predates the other implementations in this library, from when the library was entirely focused on permutations: github.com/cicirello/JavaPermutati...

Fabrizio Bagalà • May 30 '23

Based on what you told me and the code you sent me, I thought of writing this version you suggested in the following way:

public int LevenshteinDistance(string source, string target, int insertionCost, int deletionCost, int substitutionCost)
{
    if (insertionCost < 0)
    {
        throw new ArgumentException("Value cannot be negative.", nameof(insertionCost));
    }

    if (deletionCost < 0)
    {
        throw new ArgumentException("Value cannot be negative.", nameof(deletionCost));
    }

    if (substitutionCost < 0)
    {
        throw new ArgumentException("Value cannot be negative.", nameof(substitutionCost));
    }

    var m = source.Length;
    var n = target.Length;

    var distance = new int[m + 1, n + 1];

    for (var i = 0; i <= m; i++)
    {
        distance[i, 0] = i * deletionCost;
    }

    for (var j = 0; j <= n; j++)
    {
        distance[0, j] = j * insertionCost;
    }

    for (var i = 1; i <= m; i++)
    {
        for (var j = 1; j <= n; j++)
        {
            var cost = source[i - 1] == target[j - 1] ? 0 : substitutionCost;

            var deletion = distance[i - 1, j] + deletionCost;
            var insertion = distance[i, j - 1] + insertionCost;
            var substitution = distance[i - 1, j - 1] + cost;

            distance[i, j] = Math.Min(Math.Min(deletion, insertion), substitution);
        }
    }

    return distance[m, n];
}

What do you think of this implementation?

Vincent A. Cicirello • May 30 '23

I think that looks correct.

Fabrizio Bagalà • May 31 '23

Great! Then I will include this version in the article. Thank you

Anthony Fung • May 26 '23

Thanks for sharing.

I once used this algorithm (from a NuGet package) to do some fuzzy matching. I was trying to find the most likely thing that a user was typing in from a finite set of options - in case there were any typos.

Great to have a better insight into how it actually works.

Fabrizio Bagalà • May 26 '23

It is an algorithm that I particularly like. It can be used in different areas and it is worth knowing how it works 😉