function characterReplacement(s: string, k: number): number {
let lo = k, hi = s.length + 1;
while(lo < hi - 1) {
const mid = Math.floor((lo + hi)/2);
if(isValidLength(mid)) {
lo = mid;
}
else {
hi = mid;
}
}
return lo;
function isValidLength(l:number):boolean {
const hashTB:Record<string, number> = {};
let start = 0, maxFreq = 0;
for(let end = 0; end < s.length; end++) {
if(!hashTB[s[end]]) hashTB[s[end]] = 1;
else hashTB[s[end]]++;
if(end - start + 1 > l) {
hashTB[s[start]]--;
start++;
}
if(maxFreq < hashTB[s[end]]) {
maxFreq = hashTB[s[end]];
}
if(l - maxFreq <= k) return true;
}
return false;
}
};
My solution might not be the fastest, most efficient, or simplest, but I want to use this opportunity to delve deeper into the related algorithms.
The problem requires us to find the length of the longest substring that contains the same letter, which can be achieved by changing any character of the string to any other uppercase English letter no more than k times.
The brute force approach would start with a substring of length 1 and check if it is valid. If it is, the process continues by checking substrings of lengths 2, 3, and so on, until finding the first substring that is not valid. Let's say this substring is of length n. Thus, the longest valid substring is one of length n-1, and we would return n-1.
However, this method is time-consuming. Is there a faster way? Notice that we are consecutively checking substrings of lengths 1, 2, 3, 4, 5, and so on, meaning that the lengths of the substrings we are checking are in sorted and ascending order. With any searching task in a sorted sequence like this, we can always employ binary search to speed up the process.
How can we apply binary search in this context? First, we need to identify the characteristics that make a substring valid. Essentially, the problem requires us to find a substring where the difference between the frequency of the most frequently occurring character and the length of the substring is less than or equal to k. This difference represents the number of characters that are not the most frequent character. If this number is less than or equal to k, we can change all of them to the most frequent character, resulting in a substring that contains only the same character (the most frequent character).
Let's say the length of the substring is l and the frequency of the most frequent character is maxFreq. A substring is considered valid if l - maxFreq <= k. If a substring of length l is valid, is a substring of length l-1 also valid? Of course, it is. This is because (l - 1) - maxFreq < l - maxFreq <= k. In other words, if a substring of length l is valid, all substrings shorter than that are also valid.
Conversely, if a substring of length l is invalid, what can we say about substrings of length l+1? An invalid substring of length l means that l - maxFreq > k. But since l + 1 > l, it follows that l + 1 - maxFreq > l - maxFreq > k. Therefore, if a substring of length l is invalid, all substrings longer than l are also invalid.
With this information in hand, how can we apply binary search? Binary search operates by setting low (lo) and high (hi) pointers at both ends of the search space. A search space is a set in which any element could potentially be the one we are looking for. We then compare the element in the middle with the target element to determine if the target is in the first half or the second half of the search space. Based on this comparison, we adjust the lo and hi pointers to shrink the search space. The process continues until there is only one element left, which is the element we are looking for.
Let's return to our problem and define the search space with two boundaries using pointers: lo and hi. We know that a substring of length k is always valid because we can change all its characters (up to k times) to the same character. Since a substring of length k is valid, all shorter substrings are also valid, but they are not the longest valid substrings. Therefore, we start the search space from k, setting lo to k as the starting point.
Additionally, we cannot rule out the possibility that the original string is the longest valid substring. Therefore, we set hi to the length of the string plus 1 (s.length + 1) to mark the end of the search space.
In summary, lo points to the length of a longest valid substring known so far, and hi points to one unit higher than the search space. We call these two facts invariant because we need to maintain them after each round.
To shrink the search space, we follow these steps:
Find the midpoint.
If we can find a valid substring whose length equals the value at the midpoint, it means all substrings that are shorter than that are also valid but can’t be the longest substring. The longest one that we know is the one whose length equals to the value at the midpoint. To shrink the search space while keeping the invariant, we move
lotomid. Now,lostill points to the length of the longest valid substring known so far (previously pointed bymid), andhiremains in the same position.If we cannot find a valid substring whose length equals the value at the midpoint, it means all longer substrings are also invalid. Thus, the length of the longest substring should be less than the midpoint value. We shrink the search space to the range from
lotomid - 1by movinghitomid. Now, thehipointer is one unit higher than the search space, andloremains in the same position, maintaining the invariant.We continue steps 2 and 3 until only one element is left in the search space. Typically, in binary search, when the algorithm stops,
loandhican point to either the same element or two adjacent elements, withlopointing to the lower one andhipointing to the higher one. In our case:
loalways points to the length of a known valid substring, whilehialways points to the length of a known invalid substring. Therefore, they cannot point to the same element.Any invalid substring has a length greater than any valid substring. For example, if there is a value
lsuch that it is the length of an invalid substring whilel+1is the length of a valid substring, it leads to a contradiction because a shorter string, includingl, should also be valid. Therefore,hialways points to an invalid substring andloalways points to a valid substring, sohiis always greater thanlo.The only possibility left is that
loandhiare adjacent, withlopointing to the lower value andhipointing to the higher value. The invariant states thatlopoints to the length of the longest valid substring known so far whilehipoints to one unit higher than the search space. Thus, there is only one value in the search space, pointed to bylo, that can be the length of the longest substring. It is the one that we are looking for.
Top comments (0)