
Do we need Regex?

First Kanisorn Sutham ・1 min read

My question is: "Do we need regex?"

Some people use regex and say:

"It's useful! Short and easy to understand. A single pattern shows what you want to get."

But some people don't use regex, because they say:

"It's over-engineering. Sometimes you just need to manipulate a string, and simple methods like .split and .substring are enough. The logic is also easier to read and understand."


I slightly agree with the second opinion, because developers already know the basic string methods and basic loops/conditions.

So what do you all think about it, people?

please share 🙏🏻

Discussion

 

IMHO, regular expressions are a very powerful tool that should be used only when needed.
Even though they have built-in support in many programming languages, regex is a language all by itself, and you need to be fluent in that language before you can use it all over the place.

Short, well-documented regular expressions that save you multiple lines of code are great, and I do use them from time to time.

The most recent example was when I had to extract some data from a string provided by a 3rd party in the form of "id=value id=value" - the entire regular expression was short and sweet: "\s17=([+-]?\d+)\s"
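
A minimal Python sketch of how that pattern could be applied. The sample input and the function name `extract_field_17` are assumptions made up for illustration; only the "id=value id=value" shape and the pattern itself come from the comment above.

```python
import re

# Same pattern as quoted above: whitespace, the literal id 17, an optionally
# signed integer captured in group 1, then whitespace.
FIELD_17 = re.compile(r"\s17=([+-]?\d+)\s")


def extract_field_17(text):
    """Return the integer value of field 17, or None if the pattern does not match."""
    match = FIELD_17.search(text)
    return int(match.group(1)) if match else None


# Hypothetical input in the "id=value id=value" form.
print(extract_field_17("8=3 17=-250 35=12 "))

# The pattern requires whitespace on both sides, so a pair at the very
# start of the string is not matched.
print(extract_field_17("17=5 8=3"))
```

Note that the surrounding `\s` is what keeps an id such as 117 from matching, at the cost of missing a pair that sits at the very start or end of the string.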

On the other hand, long regular expressions are hard to read and maintain, and I prefer to write more lines of code rather than rely on a regular expression like that.
An example of this would have to be the regular expression recommended by Microsoft to validate an email address:

 @"^(?("")("".+?(?<!\\)""@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))" +
 @"(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-0-9a-z]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$"

This is something you'll never see in my code. Ever.

I would much rather suffer the few false positives the MailAddress constructor has than have to read that regular expression.

 

I feel like I'm in the Matrix movie when I start writing regex :D I find it complicated and hard to remember, even though I have tried to study it more than 3 times.

But let's be fair:

  • Regex is very powerful and fast at matching complex string patterns
  • Regex can substitute multiple lines of code with a simple formula
  • Regex is supported by nearly all programming languages
  • Regex is stable

but,

  • Regex is complex
  • Regex is tricky: you may read some formulas as doing something different from what they actually do.
 
 

Oh, I love a good regex fight! :-)

The great thing about regexes is that they are a powerful DSL which exists inside many command-line tools (e.g., grep, sed, etc.) across multiple platforms and is also available in most programming languages.

The terrible thing about regex is that it can be a particularly terse DSL with no single reference implementation, making readability and portability difficult.

I've become more comfortable with using regex both from the command line and inside languages over the years I've been a developer, but I have certainly shot myself in the foot many times trying to debug my own expressions. I'm much happier to use regex when scripting on the command line because I'm far less comfortable writing shell scripts to avoid using regex.

Ultimately, regex is a very powerful tool and each person should come to their own level of comfort with when and how it should be used. When working within a team, this should be a conversation/agreement at team level.

 

Thanks for sharing! I agree with you about using regex on the command line too. Another thing is the implementation.

Sometimes I use the same regex pattern in JavaScript and in Java or C#, but the result isn't the same. :*( I guess this is the terrible thing you mentioned:

The terrible thing about regex is that it can be a particularly terse DSL with no single reference implementation,

 

I think regexes are great.

Mostly, I think they're great for use outside code, with tools like vim, sed and grep.

In code, I think they can be great, but only when used in the right way. For example, if you want to know whether a string starts with something, and your language provides a startswith() function, you should use that.
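
As a rough illustration in Python, where `str.startswith` is the built-in in question; the two helper names below are made up for this sketch:

```python
import re


def starts_with_builtin(text, prefix):
    # The built-in treats the prefix as literal text.
    return text.startswith(prefix)


def starts_with_regex(text, prefix):
    # The regex version only behaves the same if the prefix is escaped,
    # otherwise characters such as "." are treated as metacharacters.
    return re.match(re.escape(prefix), text) is not None


print(starts_with_builtin("v1.2.0", "v1."))
print(starts_with_regex("v1.2.0", "v1."))

# Without escaping, "." matches any character, so this unintended input
# also matches.
print(re.match("v1.", "v1x2") is not None)
```

The built-in is shorter and has no escaping pitfall, which is the point made above.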

If you want anything more complex, split the regex over multiple lines and comment it. This makes it easier to read, maintain and version-control.
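
One way to do this in Python is the `re.VERBOSE` flag, which ignores whitespace inside the pattern and allows `#` comments. The date format and the `parse_date` helper are assumptions chosen only to keep the sketch small:

```python
import re

# re.VERBOSE lets the pattern span several lines with a comment per part.
DATE_PATTERN = re.compile(
    r"""
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
    """,
    re.VERBOSE,
)


def parse_date(text):
    """Return (year, month, day) as integers, or None if the text does not match."""
    match = DATE_PATTERN.fullmatch(text)
    if match is None:
        return None
    return int(match["year"]), int(match["month"]), int(match["day"])


print(parse_date("2019-08-23"))
print(parse_date("2019-8-23"))
```

Because each part sits on its own line, a change to one part shows up as a one-line diff in version control.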

 

If you're dealing with .split(), you're dealing with regular expressions. That isn't an escape.

Yes, regular expressions can get big and complex. In my language of choice, you can add comments within your regular expressions. On one hand, that's a strong testament to their power and usefulness, but on the other hand, it's a heavy comment on their unreadability.

So, developers should learn to write regular expressions. They should learn to write readable and comprehensible regular expressions. And they should know alternatives that might serve the problem and the next developer (which might be Future Them) better.

 

Regex is useful beyond the code you ship, even if you don't want to use it in your implementation.

Regular expressions are really powerful tools for improving your productivity in tasks such as refactoring, conditional searching, and replacing.

My quick suggestion for implementation: it depends on how complicated the value you want to replace or match is. For instance, say you want to take some parts of a string and capture them into variables:

validation status is 12044,  and value is \"Lazy fox jumping\"
                     ^^^^^^ want this     ^^^^^^^^^^^^^^^^^^ and want this

With traditional looping, you need extra logic to extract the data from the input, and it is harder to maintain. With regular expressions, it is easier using capturing groups, like:

.*([0-9]{5}).*"([a-zA-Z\s].*)"$

But at the end of the day, it is up to the developers who maintain the project to decide this!
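
To make the comparison concrete, here is a small Python sketch that extracts both values from the sample line, once with the capturing-group pattern above and once with plain string methods. The function names are hypothetical, and the manual version assumes the code is the last word before the first comma, which the sample satisfies but real input may not.

```python
import re

# The sample line from above, with the quotes unescaped.
LINE = 'validation status is 12044,  and value is "Lazy fox jumping"'

# The capturing-group pattern quoted above.
PATTERN = re.compile(r'.*([0-9]{5}).*"([a-zA-Z\s].*)"$')


def extract_with_regex(text):
    match = PATTERN.match(text)
    return (match.group(1), match.group(2)) if match else None


def extract_manually(text):
    # Assumption: the code is the last word before the first comma.
    before_comma, separator, after_comma = text.partition(",")
    if not separator:
        return None
    code = before_comma.rsplit(" ", 1)[-1]
    first_quote = after_comma.find('"')
    last_quote = after_comma.rfind('"')
    if not code.isdigit() or first_quote == -1 or last_quote <= first_quote:
        return None
    return code, after_comma[first_quote + 1:last_quote]


print(extract_with_regex(LINE))
print(extract_manually(LINE))
```

Both versions carry hidden assumptions about the input. For example, because the leading `.*` is greedy, the regex would capture only the last five digits of a longer number, so either approach is worth covering with tests.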