Hmm. I applaud you for creating a useful library, but about having a solution that takes 5 days to run: if you had a multi-core machine, you might have run N instances of your regex program, each on 1/N of your inputs, to get it down to, say, a day?
Given hundreds of replacements, I would at least have estimated the run time of a version that looks up each word in a dictionary of replacements.
(Simplistically):
out = ' '.join([lookup.get(word, word) for word in text.strip().split()])
But the community gains a new tool! I applaud you sir :-)
@paddy3118
FlashText is designed to deal with multi-term keywords, like 'java script' getting replaced with 'javascript'. There is also the problem of extracting java from 'I like java.' (notice the full stop at the end). There are multiple other problems with assuming that this problem is as simple as you assumed it to be.
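A quick sketch of those two cases, using a made-up `lookup` table (the names and sample sentences here are only for illustration, not taken from any real dataset):

```python
# Hypothetical replacement table, for illustration only.
lookup = {"java script": "javascript", "java": "Java"}

def naive_replace(text, lookup):
    # Per-word dictionary lookup on whitespace-separated tokens,
    # as in the one-liner suggested above.
    return " ".join(lookup.get(word, word) for word in text.strip().split())

# The two-word key 'java script' is never seen as a single token,
# so only the single word 'java' gets replaced.
multi_word = naive_replace("I like java script", lookup)

# The token here is 'java.' (with the full stop), which is not a key.
trailing_dot = naive_replace("I like java.", lookup)

print(multi_word)
print(trailing_dot)
```

Neither sentence comes out the way the replacement table intends, which is the gap between splitting on whitespace and matching keywords on word boundaries.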
PS: You assuming that I wouldn't have tried your suggestion is fine, but you assuming that everyone who clapped is not smart enough to figure out your suggestion by themselves is not.
> but you assuming that everyone who clapped is not smart enough to figure out your suggestion by themselves is not
Not sure what you are accusing me of there?
On the comment about what makes a word, and multi-word lookups: solving those issues could be thought of as a task you would have to do for your new library anyway, but the new library then goes on to use a trie, whereas dicts are a built-in datatype.
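For what it's worth, a word-level trie can itself be built out of nested dicts. The following is only a rough sketch of that idea, not FlashText's actual implementation; `build_trie`, `replace_keywords`, the `_END` sentinel, and the sample table are all hypothetical names chosen for illustration, and tokenization is assumed to be a simple `\w+` / `\W+` split:

```python
import re

_END = object()  # sentinel key marking "a phrase ends here"; never equals a token

def build_trie(replacements):
    # Each phrase becomes a path of words through nested dicts.
    trie = {}
    for phrase, target in replacements.items():
        node = trie
        for word in phrase.split():
            node = node.setdefault(word, {})
        node[_END] = target
    return trie

def replace_keywords(text, trie):
    # Split into runs of word characters and runs of everything else,
    # keeping the separators so the output preserves punctuation.
    tokens = re.findall(r"\w+|\W+", text)
    out, i = [], 0
    while i < len(tokens):
        node, j = trie, i
        best_end, best_value = None, None
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if _END in node:
                # Remember the longest phrase matched so far.
                best_end, best_value = j, node[_END]
            # Step over whitespace between the words of a phrase.
            if j < len(tokens) and tokens[j].isspace():
                j += 1
        if best_end is None:
            out.append(tokens[i])
            i += 1
        else:
            out.append(best_value)
            i = best_end
    return "".join(out)

trie = build_trie({"java script": "javascript", "java": "Java"})
result = replace_keywords("I like java script and java.", trie)
print(result)
```

The text is scanned once and the longest matching phrase wins at each position, so the work per token does not grow with the number of replacement terms the way a loop over hundreds of separate regexes does.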
I am sorry, I can't help you. Let's move on in life and do better things :)
All the best :)