Implement a similarity method that compares two strings and returns a number representing the percent similarity between the two strings.
The method must be able to calculate the minimum number of operations you must do to change 1 string into another. In other words the the Levenshtein distance is the model.
def similarity(string1, string2)
# ...
return similarity_percentage
end
Top comments (4)
This is interesting. I tried for a little while to come up with my own solution, but, unfortunately, when I was researching the problem, I came across an algorithm for Levenshtein Distance. I couldn't think of any solution that I liked better than the one that I found.
So I thought I'd share it in case anybody else is interested. Here's a link to the original write-up I found.
Essentially, from what I can tell, you create a 2-dimensional matrix. As you build the matrix, moving from top-left to bottom-right, you're weighing the cumulative cost of:
The bottom right corner of the matrix ends up holding the minimum number of changes you need to make to make the two strings match.
If anybody can explain the reasoning behind it to me better, I would love to hear about it!
Rust solution for similarity of strings. It uses iterative approach with dynamic programming fashion. It creates a matrix of values and the bottom last cell has the number of changes needed to make the 2 strings similar.
I used the explanation from Wikipedia here.
The dynamic part work by first starting with prefix of both strings (Single characters at first) and them working it's way onwards to the complete string. It checks the similarity and computer the distance between the sub-strings, by checking if insertion, deletion, or substitution help to make the sub-strings similar. The last computer value will be the value of the differences of both the strings.
If the name of the algo was not given in the challenge, I would have gone with LCS (due to familiarity with the algo) and it would be good enough, as it works on the same principle but does not take substitutions into account.
I loved working on this, and it introduced me to a new algo; but please add some examples (test cases). A little more explanation using the examples would help a lot. The question just feels like try and implement this algo.
Some test cases I used:
Are the strings of the same length? Or how do you define similarity? Are we supposed to come up with our own model?
Thanks Oskar for suggestion. I just updated the description of the challenge.