loading...

Find the longest subsequence of two strings

annlin profile image ann lin Updated on ・4 min read

Today coding question is,

find the length of the longest subsequence of two strings.
Given "abc" and "aec", the longest subsequence string is "ac".

Quick FYI,

if you are confused about substring and subsequence.

Given two strings: "iamnice", "iamtired"
e.g. of a longest substring: "iam"
e.g. of a longest subsequence: "iamie"

In this question, we only care about subsequence.

So naively I can,

list down (space complexity is kaboom over here) all the possible combinations of both strings (kaboom time complexity), then search for the longest common string (O(mn) where m and n are the lengths of the strings). Altogether, extremely terrible time and space complexity.

def findCombinationsOfString(string, list, combi):
    if len(string) == 0:
        return

    newCombi = combi + string[0]

    # don't add common strings
    if newCombi not in list:
        list.append(newCombi)

    findCombinationsOfString(string[1:], list, newCombi)

    for i in range(1, len(string)):
        findCombinationsOfString(string[i:], list, combi)

    return list


def nativelyFindLongestSubsequenceOfStrings(str1, str2):
    lengthStr1, lengthStr2 = len(str1), len(str2)

    if lengthStr1 == 0 or lengthStr2 == 0:
        return 0

    if str1 == str2:
        return lengthStr1

    list1 = findCombinationsOfString(str1, [], "")
    # ['a', 'ab', 'abc', 'ac', 'b', 'bc', 'c']

    list1 = sorted(list1, key=len, reverse=True)
    # ['abc', 'ab', 'ac', 'bc', 'a', 'b', 'c']

    list2 = findCombinationsOfString(str2, [], "")
    # ['a', 'ae', 'aec', 'ac', 'e', 'ec', 'c']

    list2 = sorted(list2, key=len, reverse=True)
    # ['aec', 'ae', 'ac', 'ec', 'a', 'e', 'c']

    for i in range(len(list1)):
        for j in range(len(list2)):
            if list1[i] == list2[j]:
                return len(list1[i])

    return 0


print nativelyFindLongestSubsequenceOfStrings("", "")

The better way is to,

put two strings into a 2D array. (I tried to draw my explanation, refer to narrative explanation in the next section)

my amazing drawing

More explanation in words.

We put two strings into a 2D array. Iterate through the 2D array to find all common characters. If the characters from both strings are the same, increment 1 (an indicator of 1 character in the subsequence).

To get the longest subsequence, we need to consider all possible combinations of common characters that come before. So we increment 1 + get whatever (possible combinations) value that comes before.

Soon we realise there are 3 possible scenarios of the possible combination values; it could be from the top left box, the top box, and the left box. Since we are finding the longest subsequence, we get the maximum value of the 3 possible values.

If the current index shows that both characters from the two strings are the same, we increment 1 + max(top left value, top value, left value).

Until it reaches the last index at bottom right. It should contain the longest subsequence of two strings.

Tada.

Code,

of course in Python.
Time complexity: O(mg), just kidding. It's O(mn), where m and n are the lengths of the strings
Space complexity: O(mn), the 2D array takes up space.

def findLongestSubsequenceString(str1, str2):
    lengthStr1, lengthStr2 = len(str1), len(str2)

    # if one of the strings is empty, return length of non empty string
    # we can easily use a variable for magic number 0. e.g. EMPTY_STRING = 0
    if lengthStr1 == 0 or lengthStr2 == 0:
        return 0

    # do you think it's a good idea to catch specific case like this?
    # if the possibility of both strings being same is high, this is a legit case I think.
    if str1 == str2:
        return lengthStr1

    # create 2D array with length of str1 and str2
    # I think this syntax is cool AF.
    memo = [[0 for i in range(lengthStr2)] for i in range(lengthStr1)]

    for i in range(lengthStr1):
        for j in range(lengthStr2):

            topLeft = top = left = 0
            # need to check if the indices are within range of 2D array,
            # else the code will throw you into the sea (I meant throw out of range error)
            if i > 0 and j > 0:
                topLeft = memo[i - 1][j - 1]

            if i > 0:
                top = memo[i - 1][j]

            if j > 0:
                left = memo[i][j - 1]

            # get the maximum value of possible subsequence combinations
            memo[i][j] = max(topLeft, top, left)

            # increment 1 if current characters are the same
            # we can do it up there but I think it looks clearer if we separate out this logic
            if str1[i] == str2[j]:
                memo[i][j] = memo[i][j] + 1

    return memo[lengthStr1 - 1][lengthStr2 - 1]


print findLongestSubsequenceString('abc', 'bca')
# 2

# 2D Array looks likes this:
# [1, 0, 0]
# [0, 1, 1]
# [0, 1, 2]

What are we missing out,

this piece of (shit) code only caters for unique strings. What if there are duplicated characters (opps). I just started playing with dynamic programming so I need to study more on it. So far this is the best solution from my side. We could also do it recursively. I have not thought of that approach yet. (Next part?)

Comment if I miss out any edge cases. :<

I did a quick peek (googled for solution :> ),

there is a better solution using suffix thingy but I believe my brain has enough of this question. We will come back to this suffix thingy soon.

Ok,

bye.

p/s: are we able to color code the code?

Posted on Sep 6 '18 by:

annlin profile

ann lin

@annlin

I am a spider, I build web.

Discussion

markdown guide
 

p/s: are we able to color code the code?

You can add the language directly after the first three back-ticks at the beginning of your code-block. Like so:

code block with language example

 

ohhh, thanks peter! colourrssss

 

Your algorithm is not correct in some cases. Just launch it for 'aa' and 'a' strings ;)