What you are doing does not even seem like a job for a Regex.
Regex is a pattern matching engine (like globbing).
A good example of what to use regexes for is filtering config files down to their non-comment lines (/^\s*(#|$)/ matches the comment and blank lines to drop), as this requires an actual pattern.
Another good thing about regexes is their grouping.
There's a pretty good reason why, at work, I always use grep -rF foobar /etc (fixed-string search) over grep -re foobar /etc (regex search) when looking for something.
Instead of
It's not that regex is slow either; it's just that you pay a considerable overhead for a pattern engine whose features you never actually use.
What you probably want is lazily splitting the input by whitespace and using a replacement-map (as seen here).
There are a few important things to keep in mind:
First, if you search and replace in a large body of text, don't load it all into memory.
You will end up replacing variable-length text, so you either need some sort of Rope, or you run into performance problems due to lots of copies and/or reallocations.
The best approach is to read the file lazily and write the result somewhere else (preferably read from stdin and write to stdout; that way the program is very easy to use).
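A minimal sketch of that streaming approach, using only the Rust standard library. The function name replace_words and the two map entries are made up for illustration, and a plain HashMap stands in for whatever replacement map you end up using. Only one line is held in memory at a time, and the original whitespace is passed through untouched.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead, BufWriter, Write};

/// Streams `input` to `output` line by line, replacing every
/// whitespace-delimited word that has an entry in `map`.
fn replace_words<R: BufRead, W: Write>(
    mut input: R,
    mut output: W,
    map: &HashMap<&str, &str>,
) -> io::Result<()> {
    let mut line = String::new();
    // read_line keeps the trailing newline, so line endings pass through unchanged.
    while input.read_line(&mut line)? != 0 {
        // Each piece is a (possibly empty) word followed by at most one whitespace char.
        for piece in line.split_inclusive(char::is_whitespace) {
            let word = piece.trim_end();
            let gap = &piece[word.len()..];
            // Write the replacement if there is one, otherwise the word itself.
            output.write_all(map.get(word).copied().unwrap_or(word).as_bytes())?;
            output.write_all(gap.as_bytes())?;
        }
        line.clear();
    }
    output.flush()
}

fn main() -> io::Result<()> {
    // Hypothetical replacement entries, for illustration only.
    let map = HashMap::from([("colour", "color"), ("grey", "gray")]);
    let stdin = io::stdin();
    let stdout = io::stdout();
    replace_words(stdin.lock(), BufWriter::new(stdout.lock()), &map)
}
```

Because it reads stdin and writes stdout, it can sit in a shell pipeline like any other filter. Note that a word only matches if it is delimited by whitespace on both sides, so "grey," with a trailing comma is left alone.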
Second, you might want to take a look at PHF (perfect hash function) tables.
Rust has a great library for that, which generates the PHF table at compile time. At runtime, a lookup only hashes the key with a very fast and efficient algorithm and does a single comparison to get the desired value out of the map.
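To make the "one hash, one comparison" idea concrete, here is a toy sketch in plain Rust. It is not how the library mentioned above works: it builds the table at runtime by brute-forcing a seed, the hash is a seeded FNV-1a chosen only for brevity, and the names PerfectMap, build and get are made up for this example. It also assumes the keys are unique and the set is small.

```rust
/// FNV-1a mixed with a seed; a toy stand-in, not what any PHF library uses.
fn hash(key: &str, seed: u64) -> u64 {
    let mut h = 0xcbf29ce484222325u64 ^ seed;
    for b in key.bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// A table in which every key owns its slot, so lookups never probe.
struct PerfectMap {
    seed: u64,
    slots: Vec<Option<(&'static str, &'static str)>>,
}

impl PerfectMap {
    /// Brute-force search for a seed that places all keys in distinct slots.
    /// Assumes the keys are unique; only practical for small sets.
    fn build(entries: &[(&'static str, &'static str)]) -> PerfectMap {
        let size = entries.len().max(1) * 2;
        'seeds: for seed in 0u64.. {
            let mut slots = vec![None; size];
            for &(key, value) in entries {
                let i = (hash(key, seed) % size as u64) as usize;
                if slots[i].is_some() {
                    continue 'seeds; // collision, try the next seed
                }
                slots[i] = Some((key, value));
            }
            return PerfectMap { seed, slots };
        }
        unreachable!("the seed range is effectively unbounded")
    }

    /// One hash, one slot access, one key comparison.
    fn get(&self, key: &str) -> Option<&'static str> {
        let i = (hash(key, self.seed) % self.slots.len() as u64) as usize;
        match self.slots[i] {
            Some((k, v)) if k == key => Some(v),
            _ => None,
        }
    }
}

fn main() {
    // Hypothetical replacement entries, for illustration only.
    let map = PerfectMap::build(&[("colour", "color"), ("grey", "gray"), ("centre", "center")]);
    println!("{:?}", map.get("grey")); // prints Some("gray")
    println!("{:?}", map.get("green")); // prints None
}
```

The key comparison in get is what rejects words that are not in the map: an unknown word still hashes to some slot, but the key stored there (if any) will not match.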
Update
I did implement this using the above methods (in Rust though).
The PHF raises the compile time, currently to ~12 seconds for ~50k replacement entries, which is quite okay, I guess.
Runtime scales linearly with the wordlist-length:
This is doing single threaded search&replace in linear time (~1 second per ⅓ million @ 2.60GHz). Those numbers already include all the I/O to be done for the replacement.