This is interesting. I tried for a little while to come up with my own solution, but, unfortunately, when I was researching the problem, I came across an algorithm for Levenshtein Distance. I couldn't think of any solution that I liked better than the one that I found.

"""Finds the percent similarity between two strings."""importnumpyasnpdefsimilarity(first,second):"""Returns the percent similarity between two strings.
Calculated as the # of non-changed characters divided by
the total characters in the smallest input (assuming all
added characters are changes that need made).
"""changes=levenshtein_distance(first,second)min_total_chars=min(len(first),len(second))return(min_total_chars-changes)/min_total_charsdeflevenshtein_distance(first,second):"""Calculates the Levenshtein Distance
(https://en.wikipedia.org/wiki/Levenshtein_distance) between two strings.
Algorithm from Michael Gilleland (https://goo.gl/aUYbqW).
"""first_len=len(first)second_len=len(second)iffirst_len==0orsecond_len==0:raiseValueError("Inputs must not have length 0")matrix=np.zeros((first_len+1,second_len+1),dtype=np.int)matrix[:,0]=range(first_len+1)matrix[0,:]=range(second_len+1)fori,first_charinenumerate(first,start=1):forj,second_charinenumerate(second,start=1):iffirst_char==second_char:cost=0else:cost=1min_cost=min(matrix[i-1,j]+1,matrix[i,j-1]+1,matrix[i-1,j-1]+cost)matrix[i,j]=min_costreturnmatrix[first_len,second_len]

Essentially, from what I can tell, you create a 2-dimensional matrix. As you build the matrix, moving from top-left to bottom-right, you're weighing the cumulative cost of:

Deleting a character in one of the words.

Inserting a new character in one of the words.

Changing (or leaving alone) the existing character.

The bottom right corner of the matrix ends up holding the minimum number of changes you need to make to make the two strings match.

If anybody can explain the reasoning behind it to me better, I would love to hear about it!

## re: Write a script to identify similarity from two string. VIEW POST

VIEW FULL DISCUSSIONThis is interesting. I tried for a little while to come up with my own solution, but, unfortunately, when I was researching the problem, I came across an algorithm for Levenshtein Distance. I couldn't think of any solution that I liked better than the one that I found.

So I thought I'd share it in case anybody else is interested. Here's a link to the original write-up I found.

Essentially, from what I can tell, you create a 2-dimensional matrix. As you build the matrix, moving from top-left to bottom-right, you're weighing the cumulative cost of:

The bottom right corner of the matrix ends up holding the minimum number of changes you need to make to make the two strings match.

If anybody can explain the reasoning behind it to me better, I would love to hear about it!