DEV Community

loading...

Discussion on: The algorithm behind Ctrl + F.

Collapse
ekdnam profile image
Aditya Mandke

Hey Akhil! Amazing article! I had always wondered how is chrome able to find matching strings so fast.
One question, is this type of pattern matching slower or faster than using Regular Expressions (say in python)?

Collapse
zanehannanau profile image
ZaneHannanAU

Regexes are generally functions, with arbitrary rules as to their composition. For example, "all 10 or 11 digit phone numbers starting with +91" might be expressed in regex as "(?:\+91|\(\+91\))[ -]?\d{3}[ -]?\d{3}[ -]?\d{3}", but a compiled function might use many other tricks to find its way through the document, be it a trie (moderately memory intensive) or whatever.

Regexes are just a simplified means of expressing functions, with their own grammatical structure.

Simple byte lookups (especially in an ASCII or ASCII compatible document) are dozens if not hundreds or thousands of times faster than composition of a function like that.

Collapse
akhilpokle profile image
Akhil Author

Yea, it depends on various factors like how many time regex is being executed etc.

Eg : If your string is 'QABC' and pattern is 'ABC' then the naive algorithm will perform better.

I read somewhere about the progress being made in fast string matching with regex using pattern matching algorithms with them.

Thread Thread
zanehannanau profile image
ZaneHannanAU

That first one is true in js, but in most languages it's false.

Using regex to find string matches is still quite slow, but does work fairly well. In a compiled language, like rust, c, or go, it will be quite consistent, and have a constant time (unless gc interrupts it).

The short of it is: avoid regexes where possible. There are many premade solutions available.

Collapse
akhilpokle profile image
Akhil Author

I am not sure about its speed in python, maybe StackOverflow might help with that but overall regex are faster for dynamic situations like finding all phones numbers and algorithms might be faster for finding a particular phone number in a record of million phone numbers.

Collapse
samwightt profile image
Sam Wight

It depends on the complexity of the regex you're searching with. Regex engines usually build some sort of state machine from your regex, sort of like a compiler. Then they use that to check against the string. A state-machine based implementation will just be slower because of all of the extra variable changes it's having to do, but with simple enough regexes the compiled state machine might be fairly similar to this.

Regexes also aren't guaranteed to be much faster than a hand-written implementation for things like finding phone numbers, etc. They're pretty darn fast, don't get me wrong, but there might possibly be optimizations that could be made for specific use cases. Regex is just a convenient abstraction. Much like Javascript or Python, it's "just fast enough" for the vast majority of use cases, but for some like these, you need a better implementation for it to be performant.