DEV Community

Minh Le
Minh Le

Posted on

424. Longest Repeating Character Replacement

function characterReplacement(s: string, k: number): number {
    let lo = k, hi = s.length + 1;

    while(lo < hi - 1) {
        const mid = Math.floor((lo + hi)/2);
        if(isValidLength(mid)) {
            lo = mid;
        }
        else {
            hi = mid;
        }
    }

    return lo;

    function isValidLength(l:number):boolean {
        const hashTB:Record<string, number> = {};

        let start = 0, maxFreq = 0;

        for(let end = 0; end < s.length; end++) {
            if(!hashTB[s[end]]) hashTB[s[end]] = 1;
            else hashTB[s[end]]++;


            if(end - start + 1 > l) {
                hashTB[s[start]]--;
                start++;
            }

            if(maxFreq < hashTB[s[end]]) {
                    maxFreq = hashTB[s[end]];
            }

            if(l - maxFreq <= k) return true;

        }

        return false;
    }
};
Enter fullscreen mode Exit fullscreen mode

My solution might not be the fastest, most efficient, or simplest, but I want to use this opportunity to delve deeper into the related algorithms.

The problem requires us to find the length of the longest substring that contains the same letter, which can be achieved by changing any character of the string to any other uppercase English letter no more than k times.

The brute force approach would start with a substring of length 1 and check if it is valid. If it is, the process continues by checking substrings of lengths 2, 3, and so on, until finding the first substring that is not valid. Let's say this substring is of length n. Thus, the longest valid substring is one of length n-1, and we would return n-1.

However, this method is time-consuming. Is there a faster way? Notice that we are consecutively checking substrings of lengths 1, 2, 3, 4, 5, and so on, meaning that the lengths of the substrings we are checking are in sorted and ascending order. With any searching task in a sorted sequence like this, we can always employ binary search to speed up the process.

How can we apply binary search in this context? First, we need to identify the characteristics that make a substring valid. Essentially, the problem requires us to find a substring where the difference between the frequency of the most frequently occurring character and the length of the substring is less than or equal to k. This difference represents the number of characters that are not the most frequent character. If this number is less than or equal to k, we can change all of them to the most frequent character, resulting in a substring that contains only the same character (the most frequent character).

Let's say the length of the substring is l and the frequency of the most frequent character is maxFreq. A substring is considered valid if l - maxFreq <= k. If a substring of length l is valid, is a substring of length l-1 also valid? Of course, it is. This is because (l - 1) - maxFreq < l - maxFreq <= k. In other words, if a substring of length l is valid, all substrings shorter than that are also valid.
Conversely, if a substring of length l is invalid, what can we say about substrings of length l+1? An invalid substring of length l means that l - maxFreq > k. But since l + 1 > l, it follows that l + 1 - maxFreq > l - maxFreq > k. Therefore, if a substring of length l is invalid, all substrings longer than l are also invalid.

With this information in hand, how can we apply binary search? Binary search operates by setting low (lo) and high (hi) pointers at both ends of the search space. A search space is a set in which any element could potentially be the one we are looking for. We then compare the element in the middle with the target element to determine if the target is in the first half or the second half of the search space. Based on this comparison, we adjust the lo and hi pointers to shrink the search space. The process continues until there is only one element left, which is the element we are looking for.

Let's return to our problem and define the search space with two boundaries using pointers: lo and hi. We know that a substring of length k is always valid because we can change all its characters (up to k times) to the same character. Since a substring of length k is valid, all shorter substrings are also valid, but they are not the longest valid substrings. Therefore, we start the search space from k, setting lo to k as the starting point.

Additionally, we cannot rule out the possibility that the original string is the longest valid substring. Therefore, we set hi to the length of the string plus 1 (s.length + 1) to mark the end of the search space.

In summary, lo points to the length of a longest valid substring known so far, and hi points to one unit higher than the search space. We call these two facts invariant because we need to maintain them after each round.

To shrink the search space, we follow these steps:

  1. Find the midpoint.

  2. If we can find a valid substring whose length equals the value at the midpoint, it means all substrings that are shorter than that are also valid but can’t be the longest substring. The longest one that we know is the one whose length equals to the value at the midpoint. To shrink the search space while keeping the invariant, we move lo to mid. Now, lo still points to the length of the longest valid substring known so far (previously pointed by mid), and hi remains in the same position.

  3. If we cannot find a valid substring whose length equals the value at the midpoint, it means all longer substrings are also invalid. Thus, the length of the longest substring should be less than the midpoint value. We shrink the search space to the range from lo to mid - 1 by moving hi to mid. Now, the hi pointer is one unit higher than the search space, and lo remains in the same position, maintaining the invariant.

  4. We continue steps 2 and 3 until only one element is left in the search space. Typically, in binary search, when the algorithm stops, lo and hi can point to either the same element or two adjacent elements, with lo pointing to the lower one and hi pointing to the higher one. In our case:

  • lo always points to the length of a known valid substring, while hi always points to the length of a known invalid substring. Therefore, they cannot point to the same element.

  • Any invalid substring has a length greater than any valid substring. For example, if there is a value l such that it is the length of an invalid substring while l+1 is the length of a valid substring, it leads to a contradiction because a shorter string, including l, should also be valid. Therefore, hi always points to an invalid substring and lo always points to a valid substring, so hi is always greater than lo.

  • The only possibility left is that lo and hi are adjacent, with lo pointing to the lower value and hi pointing to the higher value. The invariant states that lo points to the length of the longest valid substring known so far while hi points to one unit higher than the search space. Thus, there is only one value in the search space, pointed to by lo, that can be the length of the longest substring. It is the one that we are looking for.

Top comments (0)