DEV Community

Cover image for Solution: Short Encoding of Words (ver. 1)
seanpgallivan
seanpgallivan

Posted on • Edited on

Solution: Short Encoding of Words (ver. 1)

This is part of a series of Leetcode solution explanations (index). If you liked this solution or found it useful, please like this post and/or upvote my solution post on Leetcode's forums.


Note: This is my first version of a solution for this problem. Due to the constraints listed for this problem, this version is the more performant solution, but the nature of this problem really calls for a trie solution, so I've included a breakdown of the trie approach here, as well.


Leetcode Problem #820 (Medium): Short Encoding of Words


Description:


(Jump to: Solution Idea || Code: JavaScript | Python | Java | C++)

A valid encoding of an array of words is any reference string s and array of indices indices such that:

  • words.length == indices.length
  • The reference string s ends with the '#' character.
  • For each index indices[i], the substring of s starting from indices[i] and up to (but not including) the next '#' character is equal to words[i].

Given an array of words, return the length of the shortest reference string s possible of any valid encoding of words.


Examples:

Example 1:
Input: words = ["time", "me", "bell"]
Output: 10
Explanation: A valid encoding would be s = "time#bell#" and indices = [0, 2, 5].
words[0] = "time", the substring of s starting from indices[0] = 0 to the next '#' is underlined in "time#bell#"
words[1] = "me", the substring of s starting from indices[1] = 2 to the next '#' is underlined in "time#bell#"
words[2] = "bell", the substring of s starting from indices[2] = 5 to the next '#' is underlined in "time#bell#"
Example 2:
Input: words = ["t"]
Output: 2
Explanation: A valid encoding would be s = "t#" and indices = [0].

Constraints:

  • 1 <= words.length <= 2000
  • 1 <= words[i].length <= 7
  • words[i] consists of only lowercase letters.

Idea:


(Jump to: Problem Description || Code: JavaScript | Python | Java | C++)

So a simple encoding of the input would be to add the '#' marker to the end of each word and then join them in a string. Per the instructions, this encoding can be made shorter if you can combine two or more words into one encoded word. In order to do this, the smaller word would have to be not just a substring of the larger word, but the rightmost substring, or its suffix.

A naive solution here would be to compare each word to each other word and examine if the larger word has the smaller word as its suffix, but with a range of up to 2000 words, that would mean almost 4 million potential combinations.

But if we're asked to check for matching suffixes, we might also be thinking of a trie solution. This seems like a good time for a trie, but tries tend to have a lot of processing and memory overhead involved, and in this instance there's an easier method.

Going back to our naive method, what if, instead of comparing each word to the up to 2000 other words, we instead just identified which possible words could share a suffix with the current word and check for them? Since each word is at most 7 characters long, that means only up to 6 checks per word, rather than 2000.

In order to make this work more efficiently, of course, we'll have to first make a map of the words in W so that we don't have to iterate through it repeatedly. In this case, we don't need to store a value, so we can use a set as our wordmap.

Then for each word, we should check for each of its suffixes and remove any matches from our set as unnecessary if found.


Implementation:

For Javascript and Python, it's faster/easier to just join() the remaining words before counting the length, while for Java and C++ it's faster to just iterate through the set directly.


Javascript Code:


(Jump to: Problem Description || Solution Idea)

var minimumLengthEncoding = function(W) {
    let set = new Set(W)
    for (let word of W)
        if (set.has(word))
            for (let i = 1; i < word.length; i++) 
                set.delete(word.slice(i))
    return Array.from(set).join().length + 1
};
Enter fullscreen mode Exit fullscreen mode

Python Code:


(Jump to: Problem Description || Solution Idea)

class Solution:
    def minimumLengthEncoding(self, W: List[str]) -> int:
        wset = set(W)
        for word in W:
            if word in wset:
                for i in range(1,len(word)):
                    wset.discard(word[i:])
        return len("#".join(list(wset))) + 1
Enter fullscreen mode Exit fullscreen mode

Java Code:


(Jump to: Problem Description || Solution Idea)

class Solution {
    public int minimumLengthEncoding(String[] W) {
        Set<String> set = new HashSet<>(Arrays.asList(W));
        for (String word : W)
            if (set.contains(word))
                for (int i = 1; i < word.length(); i++) 
                    set.remove(word.substring(i));
        int ans = set.size();
        for (String word : set) ans += word.length();
        return ans;
    }
}
Enter fullscreen mode Exit fullscreen mode

C++ Code:


(Jump to: Problem Description || Solution Idea)

class Solution {
public:
    int minimumLengthEncoding(vector<string>& W) {
        unordered_set<string> wset(W.begin(), W.end());
        for (string &word : W)
            if (wset.find(word) != wset.end())
                for (int i = 1; i < word.length(); i++) 
                    wset.erase(word.substr(i));
        int ans = wset.size();
        for (string word : wset) ans += word.size();
        return ans;
    }
};
Enter fullscreen mode Exit fullscreen mode

Top comments (0)