DEV Community

loading...

The algorithm behind Ctrl + F.

Akhil on July 05, 2020

Ctrl + F on chrome opens up a search box that is used to find text on a web page, pdf, etc. It's one of the fastest ones I have seen and decided to...
pic
Editor guide
Collapse
johnphamous profile image
John Pham • Edited

Great write up on the different string matching algorithms!

The algorithm which we will implement might be similar to the one used in Chrome, but since its Google we're talking about, they might've made optimizations

The source code for the find in page is actually open source! You can see how it's implemented here: source.chromium.org/chromium/chrom...

Actual implementation: source.chromium.org/chromium/chrom...

Looks like they swap in different algorithms based on the context.

Collapse
akhilpokle profile image
Akhil Author

OMG ! thanks for sharing :). I read on StackOverflow that somewhere around 2008-2010 when chrome was picking of, they shared how they've implemented a version of Boyer Moore's pattern matching algorithm. But I couldn't find it on youtube.

Collapse
brugui7 profile image
Alejandro Brugarolas

Amazing post!!!!

I'll left here an implementation of BM and KMP I did in C for an university task just in case someone want to dig deeper on this.

github.com/Brugui7/Algorithmics/tr...

Collapse
akhilpokle profile image
Akhil Author

That's awesome! Thanks for reading :)

Collapse
jmau111 profile image
Julien Maury

very nice read, thanks

Collapse
akhilpokle profile image
Akhil Author

Thanks for reading :)

Collapse
ben profile image
Ben Halpern

Very cool!

Collapse
akhilpokle profile image
Akhil Author

OMG !! Thanks for reading 🙏🙏🙏

Collapse
ben profile image
Collapse
nyalla profile image
Naveen Yalla

Good read

Collapse
akhilpokle profile image
Akhil Author

Thanks for reading :)

Collapse
urielbitton profile image
Uriel Bitton

very interesting!

Collapse
akhilpokle profile image
Akhil Author

Thanks for reading :)

Collapse
kshcode profile image
SeongHoon Kim • Edited

That code does not work for a string which is consisting of only the same character. (e.g. pattern = 'TT')

so, after replacing 'skip = Math.max(1,j-map[string[i+j]]);' with 'skip = Math.max(1,j-map[string[i+j].charCodeAt(0)]);', working correctly.

Collapse
akhilpokle profile image
Akhil Author

Thanks for pointing out :) Code updated!

Collapse
kshcode profile image
SeongHoon Kim

well, the combined code is still the same as before.

Collapse
ekdnam profile image
Aditya Mandke

Hey Akhil! Amazing article! I had always wondered how is chrome able to find matching strings so fast.
One question, is this type of pattern matching slower or faster than using Regular Expressions (say in python)?

Collapse
samwightt profile image
Sam Wight

It depends on the complexity of the regex you're searching with. Regex engines usually build some sort of state machine from your regex, sort of like a compiler. Then they use that to check against the string. A state-machine based implementation will just be slower because of all of the extra variable changes it's having to do, but with simple enough regexes the compiled state machine might be fairly similar to this.

Regexes also aren't guaranteed to be much faster than a hand-written implementation for things like finding phone numbers, etc. They're pretty darn fast, don't get me wrong, but there might possibly be optimizations that could be made for specific use cases. Regex is just a convenient abstraction. Much like Javascript or Python, it's "just fast enough" for the vast majority of use cases, but for some like these, you need a better implementation for it to be performant.

Collapse
akhilpokle profile image
Akhil Author

I am not sure about its speed in python, maybe StackOverflow might help with that but overall regex are faster for dynamic situations like finding all phones numbers and algorithms might be faster for finding a particular phone number in a record of million phone numbers.

Collapse
zanehannanau profile image
ZaneHannanAU

Regexes are generally functions, with arbitrary rules as to their composition. For example, "all 10 or 11 digit phone numbers starting with +91" might be expressed in regex as "(?:\+91|\(\+91\))[ -]?\d{3}[ -]?\d{3}[ -]?\d{3}", but a compiled function might use many other tricks to find its way through the document, be it a trie (moderately memory intensive) or whatever.

Regexes are just a simplified means of expressing functions, with their own grammatical structure.

Simple byte lookups (especially in an ASCII or ASCII compatible document) are dozens if not hundreds or thousands of times faster than composition of a function like that.

Collapse
akhilpokle profile image
Akhil Author

Yea, it depends on various factors like how many time regex is being executed etc.

Eg : If your string is 'QABC' and pattern is 'ABC' then the naive algorithm will perform better.

I read somewhere about the progress being made in fast string matching with regex using pattern matching algorithms with them.

Thread Thread
zanehannanau profile image
ZaneHannanAU

That first one is true in js, but in most languages it's false.

Using regex to find string matches is still quite slow, but does work fairly well. In a compiled language, like rust, c, or go, it will be quite consistent, and have a constant time (unless gc interrupts it).

The short of it is: avoid regexes where possible. There are many premade solutions available.

Collapse
ylucet profile image
Yves Lucet

Boyer-Moore and KMP are both O(m+n) in the worst case. Please fix the typo (or check your references).

Collapse
akhilpokle profile image
Collapse
maowtm profile image
maowtm

KMP is definitely O(m+n) even in worst case, because after the table construction (O(m)) it's just a linear scan on the string (O(n)).

Thread Thread
akhilpokle profile image
Akhil Author

Thanks for sharing! Updated!

Collapse
ylucet profile image
Yves Lucet

Agreed, but your article shows O(mn).

Thread Thread
akhilpokle profile image
Akhil Author

This algorithm works well if the alphabet is reasonably big, but not too big. If the last character usually fails to match, the current shift s is increased by m each time around the loop. The total number of character comparisons is typically about n/m, which compares well with the roughly n comparisons that would be performed in the naive algorithm for similar problems. In fact, the longer the pattern string, the faster the search! However, the worst-case run time is still O(nm). The algorithm as presented doesn't work very well if the alphabet is small, because even if the strings are randomly generated, the last occurrence of any given character is near the end of the string. This can be improved by using another heuristic for increasing the shift. Consider this example:

T = ...LIVID_MEMOIRS...
P = EDITED_MEMOIRS

Collapse
aleksandrhovhannisyan profile image
Aleksandr Hovhannisyan

Step 2 seems flawed. What if a character occurs more than once in the pattern? Then you'll only store the last index.

Collapse
akhilpokle profile image
Akhil Author

Yep, that's the beauty of the algorithm since we will store the last index, the "skip" will be large. Since the skip is large if the characters from the end do not match, we will skip even more characters bringing down the overall number of comparisons.

Collapse
dulemart profile image
Dule Martins

At first, while reading I was confused about the time it takes to match each pattern.

Nice read, let me try implementing it.

Collapse
akhilpokle profile image
Akhil Author

Thanks for reading :)

Collapse
arcticspacefox profile image
ArcticSpaceFox

Nice profile picture of gilfoyle 😂 cool post very interesting for a newbe

Collapse
akhilpokle profile image
Akhil Author

Thanks for reading :)

Collapse
willem640 profile image
willem640

The article I didn't know I needed. Awesome!

Collapse
akhilpokle profile image
Akhil Author

Thanks for reading :)

Collapse
hem profile image
Hem

This is interesting !

Collapse
akhilpokle profile image
Akhil Author

Thanks for reading :)

Collapse
dpkahuja profile image
Deepak Ahuja 👨‍💻

Amazing post! Thanks

Collapse
akhilpokle profile image
Akhil Author

Thanks for reading :)

Collapse
abcdan profile image
Daniël

Really interesting!

Collapse
ashwinsharmap profile image
Ashwin Sharma P

Good one and very informative.

Collapse
akhilpokle profile image
Akhil Author

Thanks for reading :)

Collapse
fiqrisr profile image
Fiqri Syah Redha

This is interesting. Thanks for sharing.

Collapse
akhilpokle profile image
Akhil Author

Thanks alot for reading :)

Collapse
codewithsaviour profile image
Collapse
akhilpokle profile image
Akhil Author

Thanks for reading :)