DEV Community

Discussion on: Regex was taking 5 days to run. So I built a tool that did it in 15 minutes.

Paddy3118

Hmm. I applaud you for creating a useful library, but about having a solution that takes 5 days: if you had a multi-core machine, you could have run N instances of your regex program, each on 1/N of your inputs, to get the run time down to, say, a day?
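A rough sketch of what that split could look like, using only the standard library. The replacement patterns and the helper names (`replace_all`, `split_into`, `parallel_replace`) are made up for illustration; the original post does not show its regexes or its data layout.

```python
import re
from concurrent.futures import ProcessPoolExecutor

# Hypothetical replacement table, for illustration only.
REPLACEMENTS = {r"\bjava script\b": "javascript", r"\bpy\b": "python"}
COMPILED = [(re.compile(p), r) for p, r in REPLACEMENTS.items()]


def replace_all(doc):
    """Apply every regex replacement, in order, to one document."""
    for pattern, repl in COMPILED:
        doc = pattern.sub(repl, doc)
    return doc


def process_chunk(docs):
    """Work done by one process: handle its share of the documents."""
    return [replace_all(d) for d in docs]


def split_into(items, n):
    """Split items into n contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_replace(docs, workers=4):
    """Run process_chunk on each chunk in its own process, keeping input order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_chunk, split_into(docs, workers)))
    return [doc for chunk in results for doc in chunk]
```

You would call `parallel_replace(docs, workers=N)` from under an `if __name__ == "__main__":` guard. At best this divides the wall-clock time by the number of cores; the total work stays the same.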

Given hundreds of replacements, I would at least have estimated the run time of a version that looks up each word in a dictionary of replacements.
(Simplistically):

out = ' '.join([lookup.get(word, word) for word in text.strip().split()])

But the community gains a new tool! I applaud you sir :-)

Vikash Singh • Edited

@paddy3118 FlashText is designed to deal with multi-term keywords, like 'java script' getting replaced with 'javascript'. There is also the problem of extracting 'java' from 'I like java.' (notice the full stop at the end). There are multiple other problems with assuming that this problem is as simple as you assumed it to be.
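To make those two cases concrete, here is the whitespace-split lookup from the comment above run on them. The `lookup` table and the `naive_replace` wrapper are made up for illustration.

```python
# Hypothetical replacement table, for illustration only.
lookup = {"java script": "javascript", "java": "Java"}


def naive_replace(text, lookup):
    """Whitespace-split dictionary lookup, as in the one-liner above."""
    return " ".join([lookup.get(word, word) for word in text.strip().split()])


print(naive_replace("I like java", lookup))         # I like Java
print(naive_replace("I like java.", lookup))        # I like java.
print(naive_replace("I like java script", lookup))  # I like Java script
```

The second call misses because the token is 'java.' and not 'java'. The third can never match 'java script' because splitting on whitespace breaks the keyword into two tokens.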

PS: You assuming that I wouldn't have tried your suggestion is fine, but you assuming that everyone who clapped are not smart enough to figure your suggestion by themselves is not.

Paddy3118

"but you assuming that everyone who clapped are not smart enough to figure your suggestion by themselves is not"

I'm not sure what you are accusing me of there?

As for what makes a word, and multi-word lookups: solving those issues could be thought of as a task you would have had to do for your new library anyway, but the new library then goes on to use a trie, whereas dicts are a built-in datatype.
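For what it's worth, a trie can itself be built from plain nested dicts. The sketch below is a minimal illustration of the idea, not FlashText's actual implementation; the `END` sentinel and the function names are assumptions, and matching is case-sensitive and keeps the longest keyword that ends on a word boundary.

```python
END = "_end_"  # sentinel key marking a complete keyword (hypothetical name)


def build_trie(replacements):
    """Build a character-level trie out of nested dicts."""
    root = {}
    for keyword, clean in replacements.items():
        node = root
        for ch in keyword:
            node = node.setdefault(ch, {})
        node[END] = clean
    return root


def is_boundary(text, i):
    """True if index i is outside the text or is not a word character."""
    return i < 0 or i >= len(text) or not (text[i].isalnum() or text[i] == "_")


def replace_keywords(text, trie):
    """Scan the text once, replacing the longest keyword found at each word start."""
    out, i, n = [], 0, len(text)
    while i < n:
        match_end, match_clean = None, None
        if is_boundary(text, i - 1):  # only start a match at a word start
            node, j = trie, i
            while j < n and text[j] in node:
                node = node[text[j]]
                j += 1
                # Remember the longest keyword that ends on a word boundary.
                if END in node and is_boundary(text, j):
                    match_end, match_clean = j, node[END]
        if match_end is not None:
            out.append(match_clean)
            i = match_end
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


trie = build_trie({"java script": "javascript", "java": "Java"})
print(replace_keywords("I like java.", trie))            # I like Java.
print(replace_keywords("I like java script now", trie))  # I like javascript now
```

The scan walks the text once regardless of how many keywords are loaded, which is the property that matters once there are hundreds or thousands of replacements.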

Vikash Singh

I am sorry, I can't help you. Let's move on in life and do better things :)

All the best :)