đ Updates
June 01, 2023 - Added an implementation with modifiable operation costs. Thanks to @cicirello for the suggestion.
The Levenshtein dist...
For further actions, you may consider blocking this person and/or reporting abuse
There's a version of the algorithm that allows you to specify the costs of the different edit operations. Very similar to original. You can find the details in:
Thank you for informing me. I was not familiar with this version. I will take a look and possibly implement it in the article.
You're welcome
@cicirello if I understand the article correctly, the algorithm is the Damerau-Levenshtein algorithm. It handles asymmetric costs and character transpositions. Am I right?
No, Wagner and Fischer's algorithm doesn't handle transpositions. It is basically like Levenshtein, but instead of the costs of insertions, deletions, and substitutions each being 1, the algorithm has parameters for the costs of each of those operations. You can specify any non-negative cost for deletions, any non-negative cost for insertions, and any non-negative cost for substitutions.
The algorithm is almost identical to the one your post is about. Basically, where you have the
+ 1
in the statement for the deletion cost, you'd instead add whatever the cost of a deletion operation. And then in the statement for the insertion case, you'd add the cost of an insertion instead of adding 1. And in the conditional statement for the case of a substitution, that 1 becomes the cost of a substitution. The rest of algorithm remains the same as what you have.You can then just pass 1 for all 3 costs if you want the Levenshtein case. Or you can pass different costs for the operations if in your application you want to treat the operations differently.
@fabriziobagala after my previous reply, it dawned on me that it might be useful to share my Java implementation of Wagner and Fischer's edit distance algorithm. I have a couple different classes that implements it in a Java library:
Based on what you told me and the code you sent me, I thought of writing this version you suggested in the following way:
What do you think of this implementation?
I think that looks correct.
Great! Then I will include this version in the article. Thank you
Thanks for sharing.
I once used this algorithm (from a NuGet package) to do some fuzzy matching. I was trying to find the most likely thing that a user was typing in from a finite set of options - in case there were any typos.
Great to have a better insight into how it actually works.
It is an algorithm that I particularly like. It can be used in different areas and it is worth knowing how it works đ