Discussion on: The algorithm behind Ctrl + F.

Replies for: Boyer-Moore and KMP are both O(m+n) in the worst case. Please fix the typo (or check your references).

Akhil • Jul 7 '20 • Edited

Worst case is still O(mn).
Read this: cs.cornell.edu/courses/cs312/2002s...

Yves Lucet • Jul 7 '20

Agreed, but your article shows O(mn).

Akhil • Jul 7 '20

This algorithm works well if the alphabet is reasonably big, but not too big. If the last character usually fails to match, the current shift s is increased by m each time around the loop. The total number of character comparisons is typically about n/m, which compares well with the roughly n comparisons that would be performed in the naive algorithm for similar problems. In fact, the longer the pattern string, the faster the search! However, the worst-case run time is still O(nm).

The algorithm as presented doesn't work very well if the alphabet is small, because even if the strings are randomly generated, the last occurrence of any given character is near the end of the string. This can be improved by using another heuristic for increasing the shift. Consider this example:

T = ...LIVID_MEMOIRS...
P = EDITED_MEMOIRS
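
To make the n/m versus O(nm) point concrete, here is a minimal Python sketch of a search that uses only the last-occurrence (bad-character) shift described above. It is an illustration under my own assumptions, not the code from the linked notes: the names `bad_character_search` and `last` are hypothetical, and the second shift heuristic is deliberately left out, which is why the worst case stays quadratic.

```python
def bad_character_search(text, pattern):
    """Return (match positions, number of character comparisons).

    Sketch only: uses just the last-occurrence shift, with no second
    heuristic, so the worst case is O(n*m).
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0

    # Index of the last occurrence of each character in the pattern.
    last = {ch: i for i, ch in enumerate(pattern)}

    matches, comparisons = [], 0
    s = 0  # current shift of the pattern against the text
    while s <= n - m:
        j = m - 1
        # Compare right to left, starting with the last pattern character.
        while j >= 0:
            comparisons += 1
            if pattern[j] != text[s + j]:
                break
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1
        else:
            # Align the mismatched text character with its last occurrence
            # in the pattern, always moving forward by at least one.
            s += max(1, j - last.get(text[s + j], -1))
    return matches, comparisons


# Good case: the last character never matches, so each step shifts by m.
print(bad_character_search("a" * 20, "bbbb"))   # ([], 5), about n/m

# Bad case: every alignment is scanned in full before the mismatch.
print(bad_character_search("a" * 20, "baaa"))   # ([], 68), about n*m
```

With n = 20 and m = 4, the first call needs 5 comparisons and the second needs 68, which is the gap the quoted paragraph is describing.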

maowtm • Jul 9 '20

KMP is definitely O(m+n) even in worst case, because after the table construction (O(m)) it's just a linear scan on the string (O(n)).
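
As a rough sketch of those two phases in Python (the function names `build_failure_table` and `kmp_search` are my own, not taken from the article):

```python
def build_failure_table(pattern):
    """O(m): fail[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """O(n) scan: the text index i only ever moves forward."""
    if not pattern:
        return []
    fail = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # fall back in the pattern, not in the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return matches


print(kmp_search("abracadabra", "abra"))  # [0, 7]
```

The inner `while` loops look like they could make this quadratic, but `k` grows by at most one per character and each fallback strictly decreases it, so the total number of fallbacks is bounded by the number of increments. That is what keeps the whole thing at O(m + n).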

Akhil • Jul 9 '20

Thanks for sharing! Updated!