Leetcode 1461. Check If a String Contains All Binary Codes of Size K [Multiple Approaches]

#algorithms #programming #cpp #tutorial

This is a very interesting question that can check your thinking ability and knowledge about bitwise operators. I have mentioned 2 different approaches and 3 code samples. You must solve this.

Difficulty: Medium
Jump To:

Problem Statement
Explanation
Solution
- Approach 1 (Using Set/Bitset)
- Approach 2 (Using Rolling Hash)
Implementation

Problem Statement

Question: https://leetcode.com/problems/check-if-a-string-contains-all-binary-codes-of-size-k/

Given a binary string s and an integer k.

Return True if every binary code of length k is a substring of s. Otherwise, return False.

Example 1:

Input: s = "00110110", k = 2
Output: true
Explanation: The binary codes of length 2 are "00", "01", "10" and "11". They can be all found as substrings at indicies 0, 1, 3 and 2 respectively.

Example 2:

Input: s = "00110", k = 2
Output: true

Example 3:

Input: s = "0110", k = 1
Output: true
Explanation: The binary codes of length 1 are "0" and "1", it is clear that both exist as a substring.

Example 4:

Input: s = "0110", k = 2
Output: false
Explanation: The binary code "00" is of length 2 and doesn't exist in the array.

Example 5:

Input: s = "0000000001011100", k = 4
Output: false

Constraints:

1 <= s.length <= 5 * 10^5
s consists of 0's and 1's only.
1 <= k <= 20

Explanation

The problem is simple, we just need to check if every binary code of length k is a substring of s. For example, if k=2 then binary code representation of length 2 are 00, 01, 10, and 11. So the string s must include all these values like the string 00110 has i.e. 00 110, 0 01 10, 00 11 0, and 001 10 . So if the string has every binary code of length k as a substring in s then return true else false.

Solution

The naive approach can be to first find all the combinations of the binary string of length k and then check if each of these combinations is a substring to the string. But this is a very heavy approach as time complexity can be O(N*2^k). So let's discuss the more efficient approaches as below:

Approach 1:

We can say that it's the reverse approach, instead of checking for each combination being a substring of s or not, we'll count all the distinct substrings of length k in s is equal to 2^k or not. Because for any binary string of length k we can get maximum 2^k combinations as we saw in the above example that for k=2 we have 2^2 = 4 combinations.

To count distinct substrings inside s of length k we will be using a set or unordered_set, as we know that a set can contain only distinct values so we'll add each substring of length k in this set. In the end, if the total number of items in this set is equal to 2^k then we return true else false.

class Solution {
public:
    bool hasAllCodes(string s, int k) {
        unordered_set<string> substr_set;
        int len = s.size();
        int no_of_comb = (1<<k);
        for(int i = 0; i+k <= len && substr_set.size() != no_of_comb; i++){
            substr_set.insert(s.substr(i,k));
        }
        return substr_set.size() == no_of_comb;
    }
};

The above solution can be improved as every binary string represents a number as well so instead of inserting a string in the set we'll be inserting its decimal value in the set. so less space will be used.

class Solution {
public:
    bool hasAllCodes(string s, int k) {
        unordered_set<int> substr_set;
        int len = s.size();
        int no_of_comb = (1 << k);
        for(int i = 0; i+k <= len && substr_set.size() != no_of_comb; i++){
            substr_set.insert(stoi(s.substr(i,k),nullptr,2));
        }
        return substr_set.size() == no_of_comb;
    }
};

The above solution can be further improved using bitset instead of a set, as every binary string represents a number let's say n so instead of inserting in the set we'll be setting nth bit in the bitset and at the end, we'll be counting set bits in the bitset and check if it is equal to 2^k then return true else false.

class Solution {
public:
    bool hasAllCodes(string s, int k) {
        bitset<(1<<20)> bset;
        int len = s.size();
        for(int i = 0; i+k <= len; i++){
            bset[stoi(s.substr(i,k),nullptr,2)] = 1;
        }
        return bset.count() == (1<<k);
    }
};

Time Complexity: O(N*k), where N is the size of the string and k is the length of binary code.
Space Complexity: O(N*k)

Approach 2 (Using Rolling Hash)

We don't even need to calculate the full value let's say hash for each of the substring of length k using stoi() as we were doing so in the last approach. We can just calculate hash for the first substring of length k and thereafter, we can apply rolling hash technique.

This approach is nothing but removing the last character of the previous substring and adding the next character from s and calculating the hash from this instead of recalculating the whole hash.

Let's understand this with the below example:

hash = hash << 1       :  one shift to left
hash &= allOnes        :  removes kth bit, ensures our hash always less than 2^k
hash |= s[i] - '0'     :  calculates hash with new character after removing first character of previous substring

Eg. Suppose string is 110101 and k = 3. We start with hash = 0. Here size = 2^k = 8 and allOnes = 7 (111).
1. hash = 000 << 1 = 000  ||  hash &= (111) = 000(0)  ||  hash |= 1  = 001         👉 hash = 001 (1)
2. hash = 001 << 1 = 010  ||  hash &= (111) = 010(2)  ||  hash |= 1  = 011         👉 hash = 011 (3)
3. hash = 011 << 1 = 110  ||  hash &= (111) = 110(6)  ||  hash |= 0  = 110         👉 hash = 110 (6)
4. hash = 110 << 1 = 1100 ||  hash &= (111) = 100(4)  ||  hash |= 1  = 101         👉 hash = 101 (5)
            ^exceeds (2^k)   ^2nd step removes kth bit^
5. hash = 101 << 1 = 1010 ||  hash &= (111) = 010(2)  ||  hash |= 0  = 010         👉 hash = 010 (2)
6. hash = 010 << 1 = 100  ||  hash &= (111) = 100(4)  ||  hash |= 1  = 101         👉 hash = 101 (5)

We can observe that using rolling hash, after first k-1 steps, our hash value is always equal 
to the decimal equivalent of substring of length k. This can be used as for indexing the set which
will store the distinct substrings of length k found. If number of distinct substrings is equal to size,
then the answer will be true.

I picked this great explanation from this Leetcode discuss post given by Abhishek.

So as per above approach the code is below:

class Solution {
public:
    bool hasAllCodes(string s, int k) {
        int n=s.size(), 
        size=1<<k, // This results in 2^k
        count=0, // count to store distinct substrings found
        hash=0, // rolling hash variable
        allOnes= ~size; // this results in a 0 followed by k 1's

        bitset<(1<<20)> bset; // Bitset to store found substr

        for(int i=0; i<n; i++){
            // Rolling hash calculation
            hash = ((hash << 1) & allOnes) | (s[i] - '0');

            // Skip first k-1 hash and check if this string is not found yet
            if(i >= k - 1 && !bset[hash]){
                // Set bit to mark substr found
                // Increase count
                bset[hash] = true, count++;
                if(count == size) // If all substr are found
                    break;
            }
        }
        return count == size;
    }
};