DEV Community

loading...

Discussion on: The algorithm behind Ctrl + F.

Collapse
zanehannanau profile image
ZaneHannanAU

Regexes are generally functions, with arbitrary rules as to their composition. For example, "all 10 or 11 digit phone numbers starting with +91" might be expressed in regex as "(?:\+91|\(\+91\))[ -]?\d{3}[ -]?\d{3}[ -]?\d{3}", but a compiled function might use many other tricks to find its way through the document, be it a trie (moderately memory intensive) or whatever.

Regexes are just a simplified means of expressing functions, with their own grammatical structure.

Simple byte lookups (especially in an ASCII or ASCII compatible document) are dozens if not hundreds or thousands of times faster than composition of a function like that.

Collapse
akhilpokle profile image
Akhil Author

Yea, it depends on various factors like how many time regex is being executed etc.

Eg : If your string is 'QABC' and pattern is 'ABC' then the naive algorithm will perform better.

I read somewhere about the progress being made in fast string matching with regex using pattern matching algorithms with them.

Thread Thread
zanehannanau profile image
ZaneHannanAU

That first one is true in js, but in most languages it's false.

Using regex to find string matches is still quite slow, but does work fairly well. In a compiled language, like rust, c, or go, it will be quite consistent, and have a constant time (unless gc interrupts it).

The short of it is: avoid regexes where possible. There are many premade solutions available.