DEV Community

What are your code smells and/or best practices with regular expressions?

Ben Halpern ・1 min read

Discussion (24)

Paweł Ruciński

Lately I was trying to get information about FTP paths for a couple of directories. I implemented the matching of the needed directories using regular expressions, but this is not a common solution.

Although I am a big fan of regex, I mainly use it to find things in large documents. Mostly I try to use a web tool like regex101 (I am not its author, but I find it very useful).

Osaetin Daniel

Thumbs up for Regex 101. I don't think I can write regular expressions without it... Lol

Pawel Salawa

For the same purpose there is debuggex.com/, which I find friendlier and more readable.

Osaetin Daniel

Wow! debuggex seems to be better. Visualization is really important (Not just for Regex).

Thanks Pawel

Allan N Jeremy

Thanks for the link. Definitely a life saver

Allan N Jeremy

'Inappropriate intimacy' for code smells ~ I usually end up with functions/methods (that should belong in different classes) in the same class. This in turn leads to areas that shouldn't interact being tightly coupled. I often end up refactoring.

Also, I forgot the name of the code smell, but ~ having code for a feature that you think will be there in the future but isn't there yet. It makes the code hard to debug after a long time away from it. I bet most of you guys have this too?

As for regex, regex makes me cry. Still practicing. Curious as to what resources you guys used to learn/reference regex. I use RegExr for referencing and testing

Edit: Seen @Pawel's link to Regex101 ~ definitely going to be fiddling around with that more.

jorin

Regular expressions can be really quick solutions to complex parsing problems.
But at a certain level of complexity or performance requirements a custom solution (usually some sort of state machine) is better suited.

If you see a regular expression as a good fit, please always leave a detailed comment above it.

In case someone is trying to figure out if a RegEx is the right fit for their HTML parsing problem here is my favorite post on that topic.

Mike Simons

Use named capture groups.
It makes it so much easier to correlate results with the matching pattern.
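
As a minimal sketch of what this looks like in Python's `re` module: the log line format and the group names below are made up for illustration, not taken from anyone's real code.

```python
import re

# Hypothetical log format: "<date> <LEVEL> <message>".
# Each group is named, so results are read by name instead of by position.
LOG_LINE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) (?P<message>.*)"
)


def parse_log_line(line):
    """Return the named fields as a dict, or None if the line does not match."""
    match = LOG_LINE.fullmatch(line)
    return match.groupdict() if match else None


print(parse_log_line("2017-10-05 ERROR disk full"))
```

Reading `match.group("level")` or the `groupdict()` result stays correct even if someone later adds another group in front of it, which is not true of `match.group(2)`.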

Chris Raser

I have no idea how I didn't know about those before. You have changed my life.

John Van Wagenen

This. Amen. I was about to say something similar.

Quick story: when I started with my current employer, one of the first things I did with my mentor was strip out over 900 regular expressions and replace them with an HTML parser. One lesson that hit home during that time is that regular expressions are for "regular" languages (meaning they have some predictable pattern to them). In our case, HTML is not a regular language, and regular expressions were a poor choice that caused many, many bugs. Things got much better when we replaced them with an HTML parser.
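
To illustrate the general idea (not the parser or language we actually used), here is a small sketch with Python's standard `html.parser` module. The `LinkCollector` class and `extract_links` function are hypothetical names for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, however the tags are nested."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names and strips the quotes,
        # so attribute order and quoting style need no special handling here.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return every link target found in the given HTML text, in order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

A regex for the same job has to anticipate attribute order, quoting style, and nesting by hand, which is where many of those bugs tend to come from.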

Chris Raser

I usually add a comment with two key things: a sample of the input it's supposed to parse, and a list of what's getting pulled out as a capture group.

But even then, I use them with reluctance, and only with rigorous unit tests to make sure they do exactly what I want.
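
Roughly what that looks like, as a sketch in Python. The FTP URL format, the pattern, and the function names are invented for the example; the point is the comment above the pattern and the test next to it.

```python
import re

# Sample input:  ftp://files.example.com/pub/reports/2017
# Capture groups:
#   1 - host, e.g. "files.example.com"
#   2 - directory path after the host, e.g. "/pub/reports/2017" (optional)
FTP_URL = re.compile(r"ftp://([^/\s]+)(/\S*)?")


def split_ftp_url(url):
    """Return (host, path) for a matching URL, or None if it does not match."""
    match = FTP_URL.fullmatch(url)
    if match is None:
        return None
    return match.group(1), match.group(2) or "/"


def test_split_ftp_url():
    # The sample from the comment, plus the edge cases the pattern must handle.
    assert split_ftp_url("ftp://files.example.com/pub/reports/2017") == (
        "files.example.com",
        "/pub/reports/2017",
    )
    assert split_ftp_url("ftp://files.example.com") == ("files.example.com", "/")
    assert split_ftp_url("http://files.example.com/pub") is None


test_split_ftp_url()
```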

Alex Lohr

When I finally really understood regular expressions, I overused them for everything for a short time. My colleagues were joking that my code was becoming unreadable for being riddled with RegExp and that I was writing more "strange Emojis" than JavaScript.

My current best practice is: if the comment needed to explain the RegExp to my colleagues is longer than the RegExp itself, it's better to split it into multiple parts or rewrite the code as an understandable parser. Conversely, the code smell is any RegExp that requires a comment longer than itself to be understood; in 99% of all cases it is better rewritten without (overly long) regular expressions.

The other (obvious) code smell is the attempt to use regular expressions to solve irregular problems (like HTML).

Ben Sinclair

Use them all the time outside of "programming", like in shell commands or quick one-off hacks. Only use them in code when they're run against predictable input.
That's it. Just validate that they match something before assuming that they're right, and if your predictable input fails to match, log the context.
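
A minimal sketch of that check in Python, assuming a made-up "key=value" input format; `SETTING` and `parse_setting` are hypothetical names.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Hypothetical predictable input: a "key=value" line with a numeric value.
SETTING = re.compile(r"(\w+)=(\d+)")


def parse_setting(line):
    """Return (key, value) for a matching line, or None after logging context."""
    match = SETTING.fullmatch(line.strip())
    if match is None:
        # Never assume the match succeeded: log the pattern and the offending
        # input so the failure can be diagnosed later.
        logger.warning("line did not match %r: %r", SETTING.pattern, line)
        return None
    return match.group(1), int(match.group(2))
```

Calling `match.group(1)` without the `None` check would raise `AttributeError` on the first unexpected line, with no hint about what the input was.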

Sivakumar Kailasam

Others have already covered most of the code smells I've dealt with. Whenever we have to use a regex, we make sure we use libraries from verbalexpressions.github.io/. It's an agreeable balance between readable code and using regexes when necessary.

Gert Sønderby

Definitely multiple simple expressions over one big one. The benefits of simplicity far outstrip any performance that might be gained, and greatly improve correctness.

Ryan Latta

Yes!

I personally see regular expressions no better than a magic string. They stand out in code like a sore spot in your mouth. The more that regex tries to do the more that sore hurts and paradoxically becomes something you keep on agitating.

My approach with regexes, if I truly decide they are needed, focuses on creating progressive filters. I'll use regexes that progressively home in on exactly what I am looking for.
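
Something like the following sketch, in Python. The log lines, both patterns, and the function name are assumptions made up for the example.

```python
import re

# Each step is a small pattern that is easy to read on its own.
ERROR_LINE = re.compile(r"\bERROR\b")
USER_FIELD = re.compile(r"user=(\w+)")


def users_with_errors(lines):
    """Narrow the input step by step instead of using one large pattern."""
    users = []
    for line in lines:
        # Step 1: keep only the lines that mention an error.
        if not ERROR_LINE.search(line):
            continue
        # Step 2: from those lines only, pull out the user field if present.
        match = USER_FIELD.search(line)
        if match:
            users.append(match.group(1))
    return users
```

Each filter can be tested and changed on its own, which is much harder once both concerns are fused into a single expression.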

I don't use regexes for validation. Creating a correct regex that validates anything is way harder than it seems, and the result is confusing and unmaintainable.

Gert Sønderby

I saw a regex for validating an SMTP address once, and it was a glorious, ridiculous piece of work. Something like 600 or 700 characters worth. Pretty sure that represents an anti-pattern! :-P

I mainly use regexes for searching/replacing, or for testing for presence of specific fragments. A common one for me is checking to see if a link URL is an absolute address, a mail address or a relative path, for example. Simple, and reasonable.
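
A rough sketch of that kind of check, in Python. The patterns and the `classify_link` name are illustrative assumptions. In particular, treating protocol-relative links (`//host/...`) as absolute and recognising mail links only by a `mailto:` prefix are choices made for this example.

```python
import re

# A scheme followed by "//", or a bare "//" (protocol-relative link).
ABSOLUTE_URL = re.compile(r"(?:[a-z][a-z0-9+.-]*:)?//", re.IGNORECASE)
MAIL_LINK = re.compile(r"mailto:", re.IGNORECASE)


def classify_link(href):
    """Classify a link target as "mail", "absolute" or "relative"."""
    # re.match anchors at the start of the string, so only prefixes are tested.
    if MAIL_LINK.match(href):
        return "mail"
    if ABSOLUTE_URL.match(href):
        return "absolute"
    return "relative"


print(classify_link("https://example.com/docs"))  # absolute
print(classify_link("mailto:me@example.com"))     # mail
print(classify_link("../img/logo.png"))           # relative
```

Both patterns only test for the presence of a short prefix; neither tries to validate the whole address.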

Allan N Jeremy

YAGNI, haha, perfect. Read your post, loved it. Definitely agree with all the points on there

jorin

Haha sorry didn't click the link and didn't expect the golden hammer to be the same HTML issue. Definitely a classic :D