DEV Community

loading...

PHP Damarau levenstein on Longest Match SubString

Teddy Zugana
vi veri veniversum vivus vici Noob Teams
・1 min read
function getLongestMatchingSubstring($str1, $str2)
{
    $len_1 = strlen($str1);
    $longest = '';
    for($i = 0; $i < $len_1; $i++){
        for($j = $len_1 - $i; $j > 0; $j--){
            $sub = substr($str1, $i, $j);
            if (strpos($str2, $sub) !== false && strlen($sub) > strlen($longest)){
                $longest = $sub;
                break;
            }
        }
    }
    return $longest;
}
Enter fullscreen mode Exit fullscreen mode

Discussion (2)

Collapse
moay profile image
moay • Edited

Great solution to the problem. I think there might be several improvements though.

  • First find which string is longer and start with the shorter strings length.
  • Dependending on the string length, this solution could take very very long. On the plus side you can stop on the first match, but especially when assuming that the longest common substring will be a combination of several words or a short sentence, you'll be comparing stuff for a long time on longer texts. This could be massively sped up by choosing a different approach on this. For example
    • Start with half the length of the shorter string. If there is a match raise the length by 50%. If there is not, divide the length by 2. This way you'll approach the string length way quicker, especially on long strings.
    • Start at 1. Search for the first letter appearing in both. When found, raise length. Search for first pair of letters appearing in both. When found one, raise length. And so on. This will also be much quicker on long strings.

Love the challenge, I'll create an alternative and benchmark this later...

Collapse
moay profile image
moay • Edited

So, I played around a bit, sadly didn't have time for deeper look and a lot of tries. Here is my quick solution to this:

public function getLongestMatchingSubstring(string $string1, string $string2): ?string
{
    $shorterString = strlen($string1) > strlen($string2) ? $string1 : $string2;
    $longerString = strlen($string1) > strlen($string2) ? $string2 : $string1;

    $shorterStringLength = strlen($shorterString);
    $longestMatch = null;
    $matchingLength = 1;
    $index = 0;

    do {
        $possibleMatch = substr($shorterString, $index, $matchingLength);
        if (strpos($longerString, $possibleMatch) !== false) {
            $longestMatch = $possibleMatch;
            $matchingLength++;
        } else {
            $index++;
        }
    } while ($shorterStringLength - $index - $matchingLength >= 0);

    return $longestMatch;
}
Enter fullscreen mode Exit fullscreen mode

This seems to outperform the posted solution in most cases (by a huge factor up to 500 or more). For the extreme examples of having one very long string and one very short, both solutions seem to be pretty close, depending on the string order. While the order can easily be fixed, this still is way quicker in most "normal" circumstances.

Let me know if you find even better solutions. :-)