What are your code smells and/or best practices with regular expressions?


Lately I was trying to get information about FTP paths for a couple of directories. I implemented catching the needed directories using regular expressions, but this is not a common solution.

Although I am a big fan of regex, I mainly use it to catch something in large documents. Mostly I try to use a web tool like regex101 (I am not its author, but I find it very useful).

Thumbs up for Regex 101. I don't think I can write regular expressions without it... Lol

For the same purpose there is debuggex.com/, which I find more friendly and readable.

Wow! debuggex seems to be better. Visualization is really important (Not just for Regex).

Thanks Pawel

Thanks for the link. Definitely a life saver

Regular expressions can be really quick solutions to complex parsing problems.
But at a certain level of complexity or performance requirements a custom solution (usually some sort of state machine) is better suited.

If you see a regular expression as a good fit, please always leave a detailed comment above it.

In case someone is trying to figure out whether a regex is the right fit for their HTML parsing problem, here is my favorite post on that topic.

Lol, that's the same one I linked to. Quite a classic.

I agree, leaving a comment or some form of documentation is a great idea.

Haha sorry didn't click the link and didn't expect the golden hammer to be the same HTML issue. Definitely a classic :D

Use named capture groups.
It makes it so much easier to correlate results with the matching pattern.
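For anyone who hasn't seen them, here is a minimal sketch in Python, where the syntax is `(?P<name>...)`. The log line format and the names `LOG_LINE` and `parse_log_line` are made up for illustration.

```python
import re

# Hypothetical log line format, chosen only for illustration:
#   "2018-04-30 ERROR disk full"
LOG_LINE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) (?P<message>.*)"
)

def parse_log_line(line):
    """Return a dict keyed by group name, or None if the line does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    return m.groupdict()
```

Reading `result["level"]` later is a lot easier to follow than `m.group(2)`, and it keeps working if someone adds another group in front.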

I have no idea how I didn't know about those before. You have changed my life.

I usually add a comment with two key things: a sample of the input it's supposed to parse, and a list of what's getting pulled out as a capture group.

But even then, I use them with reluctance, and only with rigorous unit tests to make sure they do exactly what I want.
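A rough sketch of what such a comment could look like, in Python. The FTP layout, `RELEASE_DIR`, and `parse_release_dir` are assumptions invented for the example, not anything from a real project.

```python
import re

# Sample input:  "ftp://example.com/pub/releases/1.2.3/"
# Capture groups:
#   1. host     -> "example.com"
#   2. version  -> "1.2.3"
RELEASE_DIR = re.compile(r"^ftp://([^/]+)/pub/releases/(\d+\.\d+\.\d+)/$")

def parse_release_dir(url):
    """Return (host, version) for a matching URL, or None otherwise."""
    m = RELEASE_DIR.match(url)
    return m.groups() if m else None
```

The sample input in the comment doubles as the first unit test case, and a non-matching URL makes a natural second one.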

Unless I need the performance, I'd rather do without regex... especially if it gets too complicated and someone else will have to maintain my code.

My favorite "smells" are validating email addresses and using it as a golden hammer.

(fyi, for those who haven't heard the term "golden hammer" before...)

This. Amen. I was about to say something similar.

Quick story: when I started with my current employer, one of the first things I did with my mentor was strip out over 900 regular expressions and replace them with an HTML parser. One lesson that was driven home during that time is that regular expressions are for "regular" languages (meaning they have some predictable pattern to them). In our case, HTML is not a regular language, and regular expressions were a poor choice that caused many, many bugs. It got much better when we replaced them with an HTML parser.

Anything with more than 3 capture groups is a smell. It means I either need a real parser or need to break up the input into smaller pieces.

'Inappropriate intimacy' for code smells ~ usually end up with functions/methods (that should belong in different classes) in the same class. This in turn leads to areas that shouldn't interact being tightly coupled. Often end up refactoring

Also, forgot the name of the code smell but ~ having code for a feature that you think will be there in future but isn't there yet. Makes it hard to debug after a long time away from the code. I bet most of you guys have this too?

As for regex, regex makes me cry. Still practicing. Curious as to what resources you guys used to learn/reference regex. I use RegExr for referencing and testing.

Edit: Seen @Pawel's link to Regex101 ~ definitely going to be fiddling around with that more.

About the code smell, one of the terms is YAGNI (you ain't gonna need it). Another one is future-proofing. I've fallen into that trap too... even wrote about it on here just the other day.

That Regex101 site is the one I like to use. It's great that it lists out step-by-step exactly what your regex statement is doing and why.

YAGNI, haha, perfect. Read your post, loved it. Definitely agree with all the points on there

When I finally really understood regular expressions, I overused them for everything for a short time. My colleagues joked that my code was becoming unreadable from being riddled with RegExp and that I was writing more "strange emojis" than JavaScript.

My current best practice is: if the comment that is necessary to explain the RegExp to my colleagues is longer than the RegExp itself, it's better to split it into multiple parts or rewrite the code into an understandable parser. Conversely, the code smell is this: any RegExp that needs a comment longer than itself to be understood is, in 99% of cases, better rewritten without (overly long) regular expressions.

The other (obvious) code smell is the attempt to use regular expressions to solve irregular problems (like HTML).

Use them all the time outside of "programming", like in shell commands or quick one-off hacks. Only use them in code when they're run against predictable input.
That's it. Just validate that they match something before assuming that they're right, and if your predictable input fails to match, log the context.
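Something like this, as a minimal Python sketch. The version-string format and the names `VERSION` and `parse_version` are hypothetical.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Hypothetical "predictable input": version tags such as "v1.12".
VERSION = re.compile(r"^v(\d+)\.(\d+)$")

def parse_version(text):
    """Return (major, minor), or None after logging the input that failed."""
    m = VERSION.match(text)
    if m is None:
        # Log both the pattern and the offending input so the failure
        # can be diagnosed later without reproducing it.
        logger.warning("pattern %r did not match input %r", VERSION.pattern, text)
        return None
    return int(m.group(1)), int(m.group(2))
```

The point is only that the code never touches `m.group(...)` before checking that there was a match at all.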

Definitely multiple simple expressions over one big one. The benefits of simplicity far outstrip any performance that might be gained, and greatly improve correctness.

Yes!

I personally see regular expressions no better than a magic string. They stand out in code like a sore spot in your mouth. The more that regex tries to do the more that sore hurts and paradoxically becomes something you keep on agitating.

My approach with regexes, if I truly decide they are needed, focuses on creating progressive filters. I'll use regexes that progressively home in on exactly what I am looking for.
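One way to picture progressive filtering, sketched in Python under made-up assumptions (the log format and the names `ERROR_LINE`, `DURATION`, and `error_durations` are invented for the example): a first simple regex narrows the input, and a second simple regex extracts from what is left.

```python
import re

# Two small patterns instead of one large one.
ERROR_LINE = re.compile(r"\bERROR\b")
DURATION = re.compile(r"took (\d+)ms")

def error_durations(lines):
    """Return the durations (in ms) reported on ERROR lines."""
    # Step 1: keep only the lines flagged as errors.
    errors = [line for line in lines if ERROR_LINE.search(line)]
    # Step 2: pull the duration out of the surviving lines, if present.
    durations = []
    for line in errors:
        m = DURATION.search(line)
        if m:
            durations.append(int(m.group(1)))
    return durations
```

Each pattern stays short enough to read at a glance, and each step can be tested separately.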

I don't use regexes for validation. Creating a correct regex that validates anything is way harder than it seems, and the result is confusing and unmaintainable.

I saw a regex for validating an SMTP address once, and it was a glorious, ridiculous piece of work. Something like 600 or 700 characters worth. Pretty sure that represents an anti-pattern! :-P

I mainly use regexes for searching/replacing, or for testing for presence of specific fragments. A common one for me is checking to see if a link URL is an absolute address, a mail address or a relative path, for example. Simple, and reasonable.
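A rough Python sketch of that kind of check. The names and category labels are hypothetical, and it assumes "absolute" means "starts with a scheme followed by `://`", so protocol-relative links like `//cdn.example.com/x.js` fall through to "relative" here.

```python
import re

# A scheme such as "http", "https" or "ftp", followed by "://".
ABSOLUTE = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)
MAILTO = re.compile(r"^mailto:", re.IGNORECASE)

def classify_link(href):
    """Classify a link target as "mail", "absolute" or "relative"."""
    if MAILTO.match(href):
        return "mail"
    if ABSOLUTE.match(href):
        return "absolute"
    return "relative"
```

Neither pattern tries to validate the URL; each one only tests for the presence of a specific prefix.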

Others have expressed most of the code smells I've dealt with. Whenever we have to use a regex, we make sure we use libraries from verbalexpressions.github.io/. It's an agreeable balance between readable code & using regexes when necessary.
