DEV Community

Discussion on: Regex was taking 5 days to run. So I built a tool that did it in 15 minutes.

xowap
Rémy 🤖

Suppose that your input is one of

  • My name is remy
  • My name is RÉMY
  • My name is Rémy

Then your output would be

My name is Rémy

It's like when you say you want to replace different instances of JavaScript: if you want JavaScript formatted the same way every time, you can use this technique to achieve that.
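To make the idea concrete, here is a minimal sketch of accent- and case-insensitive replacement that leaves the rest of the sentence untouched. It deliberately uses only the standard library instead of FlashText, so `fold` and `canonicalize` are hypothetical helper names, not part of any library, and it only handles single-word keywords.

```python
import re
import unicodedata


def fold(text):
    """Hypothetical helper: strip accents and lowercase, e.g. 'RÉMY' -> 'remy'."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def canonicalize(text, canonical_forms):
    """Rewrite every accent/case variant of a known word to its canonical form."""
    lookup = {fold(form): form for form in canonical_forms}
    # NFC keeps accented letters as single code points so \w+ sees whole words.
    text = unicodedata.normalize("NFC", text)

    def swap(match):
        word = match.group(0)
        return lookup.get(fold(word), word)  # unknown words pass through untouched

    return re.sub(r"\w+", swap, text)


for sentence in ("My name is remy", "My name is RÉMY", "My name is Rémy"):
    print(canonicalize(sentence, ["Rémy"]))
```

All three calls print `My name is Rémy`, with the capital "M" of "My" preserved because only the matched word is rewritten.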

vi3k6i5
Vikash Singh

That can already be done, right?

print(kp.replace_keywords(normalize('My name is remy')))
print(kp.replace_keywords(normalize('My name is RÉMY')))
print(kp.replace_keywords(normalize('My name is Rémy')))

output:
my name is Rémy
my name is Rémy
my name is Rémy

xowap
Rémy 🤖

Yup, but then you get "my name is Rémy" instead of "My name is Rémy".

Also, it would let you process the string without holding several copies of it in memory (and thus possibly work on a stream). If you're dealing with big texts, that might be interesting as well.

I don't have a direct application right now, but from the things I usually do, I'm guessing it would make sense.
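As a rough illustration of the streaming point, the sketch below folds each word only at lookup time and yields output line by line, so no normalized copy of the whole text is ever built. It is a standard-library stand-in, not FlashText: `fold` and `replace_stream` are hypothetical names, the lookup table is assumed to be keyed by already-folded words, and the input is assumed to use precomposed (NFC) characters.

```python
import io
import re
import unicodedata


def fold(text):
    """Hypothetical helper: strip accents and lowercase, e.g. 'RÉMY' -> 'remy'."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def replace_stream(lines, lookup):
    """Lazily yield each line with known words swapped for their canonical form.

    `lines` can be any iterable of strings, such as an open file, so only one
    line is held in memory at a time.
    """
    for line in lines:
        yield re.sub(
            r"\w+",
            lambda m: lookup.get(fold(m.group(0)), m.group(0)),
            line,
        )


# io.StringIO stands in for a large file or network stream.
source = io.StringIO("My name is remy\nMy name is RÉMY\n")
result = list(replace_stream(source, {"remy": "Rémy"}))
```

In real use you would write each yielded line straight to an output file rather than collecting them in a list.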

vi3k6i5
Vikash Singh

OK Remy. By the way, if we change the normalize method so that it no longer lowercases the text, your requirement is met.

from unidecode import unidecode

def normalize(c):
    # Strip accents only; keep the original casing.
    return unidecode(c)

output:
My name is Rémy
My name is Rémy
My name is Rémy

Also, whether normalize is called from within FlashText or outside it, the memory use and computation are the same.

Still, I will keep looking for a possible use case for your suggestion. Thanks for bringing it up :) :)