Ben Halpern

Posted on Jan 4, 2022

How do you feel about regex?

#regex #discuss

Regex seems to have a broad array of love and hate. How do you feel about it? Do you seek to use or avoid it as a problem solver, and how much do you understand it?

Top comments (65)

Andrew Bone • Jan 4 '22

I like it but I tend to tell people to avoid it if possible just because it makes code hard to read.

If people on the team do use regex I ask them to include a link to regexper.com/ in a comment.

It generates a flow chart like this one.

// https://regexper.com/#%2F%5E%5Cw%2B%28%5B-%2B.'%5D%5Cw%2B%29*%40%5Cw%2B%28%5B-.%5D%5Cw%2B%29*%5C.%5Cw%2B%28%5B-.%5D%5Cw%2B%29*%24%2F
const emailReg = new RegExp(/^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/);
console.log(emailReg.test('user@place.com'))
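A minimal sketch of how such a link could be generated rather than pasted by hand. The helper name `regexperLink` is hypothetical, and the URL shape (`#` followed by the URL-encoded `/pattern/flags` text) is only inferred from the link in the comment above, not taken from any documented regexper API.

```javascript
// Hypothetical helper: builds a regexper.com link for a RegExp.
// Assumption: the fragment is the URL-encoded "/pattern/flags" text,
// as it appears in the example link above.
function regexperLink(regex) {
  const encoded = encodeURIComponent(String(regex))
    // encodeURIComponent leaves parentheses as-is; the example link encodes them.
    .replace(/\(/g, "%28")
    .replace(/\)/g, "%29");
  return `https://regexper.com/#${encoded}`;
}

const emailPattern = /^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/;
console.log(regexperLink(emailPattern));
```

Generating the link from the pattern itself means the comment cannot drift out of sync when the regex changes.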

Calin Baenen • Jan 4 '22

Oooh. I like this tool you've introduced to everyone.
Personally, if a regex is confusing me, I'll just pop over to regex101 for a quick test. It's also a great tool.

lepinekong • Jan 9 '22 • Edited

I prefer regexr.com because regex101 can't be embedded in the FigJam documents (figma.com) I use to keep code notes, which support code blocks.

Other tools I find pretty but buggy for complex regex are:

regexper.com/
jex.im/regulex
extendsclass.com/regex-tester.html

Pritesh Usadadiya • Jan 4 '22

I didn't know about this one. Definitely useful. Thanks for sharing.

Jesus Guerrero • Jan 6 '22

Wow, thanks for sharing that tip.

Andy Piper • Jan 5 '22

Now, this is really cool and useful - thank you!!

Alejandro Villamarin • Jan 9 '22

Didn't know that service, seems awesome. Also worth a try: cucumber.io/docs/cucumber/cucumber..., which makes regular expressions more human-friendly and readable.

lepinekong • Jan 9 '22 • Edited

I think I have tried them all. That one is nice, but it doesn't work for very complex regular expressions (such as some with lookaround expressions), whereas regexr.com handles them.

Drew Clements • Jan 4 '22 • Edited

I don't write regex. I look up what I need to when I need it and I never think about it again. 🤣

I've found that the specificity of what I need when I do need it can vary so vastly, and that the library for it is so large, that the best use of my time is figuring out what I need at that specific moment and moving on.

It's not needed often enough to warrant a full deep dive into it, for me at least.

Matti Bar-Zeev • Jan 4 '22

You write a regex and don't write a comment next to it describing what it does, you and I gonna have a lil' chat ;)

Waylon Walker • Jan 4 '22

I recently had a case where the person I was reviewing added regex, and they did comment what it did well, and honestly the comment looked exactly correct. They were missing tests, and it wasn't until they ran tests that they realized that what they had done was not what they described in the comment.
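A sketch of what such tests might look like, using the email pattern posted earlier in this thread. The case table and the `runCases` helper are hypothetical; the point is only that a table of inputs and expected results exposes behaviour a comment can gloss over (for instance, this pattern rejects non-ASCII letters because `\w` only covers `[A-Za-z0-9_]`).

```javascript
const emailPattern = /^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/;

// Each entry pairs an input with the result we expect from the pattern.
const cases = [
  ["user@place.com", true],
  ["first.last@place.co.uk", true],
  ["o'brien@place.com", true],
  ["user@place", false],           // no dot after the domain name
  ["user..name@place.com", false], // consecutive separators
  [".user@place.com", false],      // leading separator
  ["user@place.com ", false],      // trailing whitespace
  ["üser@place.com", false],       // \w does not match non-ASCII letters
];

// Returns the cases whose actual result differs from the expected one.
function runCases(pattern, table) {
  return table
    .map(([input, expected]) => ({ input, expected, actual: pattern.test(input) }))
    .filter(({ expected, actual }) => expected !== actual);
}

const failures = runCases(emailPattern, cases);
console.log(failures.length === 0 ? "all cases pass" : failures);
```

The helper assumes a pattern without the `g` or `y` flag; with those flags `test` keeps state in `lastIndex` between calls.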

Matti Bar-Zeev • Jan 4 '22

Hey, tests go without saying, right?
;)

zakwillis • Jan 7 '22

As long as it can be tested, proven, and injectable (i.e. not hard-coded), it should be fine? :)

Waylon Walker • Jan 4 '22

Works great in my terminal and editor. I can search for something till it looks right, then replace it with what I want.

In production code it's so hard to see all the edge cases. As a Data Engineer, performance is about last on my list of metrics, as most of what I do runs on a schedule, not when someone clicks something, so an extra 5s on a 10-minute run is no big deal. That said, every time I see a regex in a code review I ask whether we can do this without a regex, even with a huge performance hit, or else write tests for every possible edge case we can brainstorm.

Bill Raymond • Jan 4 '22

I am what you might call an occasional developer. As an author, I created an app that helps me convert my work to ePub format.

One of the requirements with ePub is that you output all your files to xHTML. However, my word processing software (Microsoft Word) outputs some very haphazardly developed HTML that is not xHTML compliant.

I found a whole bunch of libraries that allow me to manipulate the output to xHTML, but in fact they did not do many of the things required to pass basic specifications. Specifically, in xHTML, tags must be lowercase. All of the libraries I worked with at the time made broad assumptions about the HTML structure and worse, did not do the basic conversion from uppercase tags to lowercase tags.

After spending way too much time dealing with this, I paid a developer to figure out the problem. One day later, he delivered three lines of regex code that handled the uppercase to lowercase issue and two other problems I was dealing with. After nearly a month of work trying to understand how all these libraries work, I suddenly had working code. My books are thousands of pages long and broken up into dozens of files. The regex code worked great every time.
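For illustration only, here is a minimal sketch of how the uppercase-to-lowercase tag fix could be done with a single regex replace. This is not the developer's actual code, which is not shown in the comment; the function name `lowercaseTagNames` and the sample markup are hypothetical.

```javascript
// Lowercases element names in opening and closing tags.
// Assumption: any "<" followed by a letter (or "</" plus a letter) starts a tag,
// which holds for well-formed markup where "<" in text is escaped as "&lt;".
function lowercaseTagNames(html) {
  return html.replace(
    /<(\/?)([A-Za-z][A-Za-z0-9]*)/g,
    (match, slash, name) => `<${slash}${name.toLowerCase()}`
  );
}

console.log(lowercaseTagNames('<P CLASS="Intro">Hello <B>World</B></P>'));
```

The sketch leaves attribute names untouched (`CLASS` stays uppercase here), so a full conversion would need more than this one replace.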

Having not known about regex until that moment, I went about using it everywhere. I used it when my XML output was not correct. I used it to fix the file output where special characters should be renamed with escape codes and much more.

I am willing to bet that while my code technically works, all that regex is probably a bad idea. Also, regex is not easy to read, so you really have to document it well.

Overall, I really do not like the structure because of it being hard to read, but will say that without it, I probably would have just tossed my app into the trash bin had I spent two more weeks on something as simple as fixing uppercase and lowercase letters along with a few edge case issues.

The drive to complete my app drove me to use regex in areas I probably should not have used it. For example, I used regex to modify some XML files and, since I did not really know XML that well, I simply gave up learning it. Instead, I used the XML output from a library and then modified the file with regex. Really, I should have sat down and learned XML a little more.

Regex feels like one of those things we all refer to as monolith applications. It seems like you can do anything and everything with it, but the complexities in how you write proper regex and then create test cases all feels very convoluted. At the same time, there is something very tantalizing about 1-3 lines of regex code that would otherwise require customizing libraries, creating a custom API, or doing something else to solve some basic problems.

I am curious, do we know if there are alternatives that are easier to understand and use?

Ben Sinclair • Jan 5 '22

If there's a problem where it makes sense, I love it.
Regular expressions can be written in a clear and easy-to-read way, across multiple lines and with comments. They can condense a lot of logic into something easy to parse by humans, even though their reputation says otherwise. Trying to replicate what they do with a bunch of separate if contains(..) and startsWith(..) and not contains(..) methods is a hack, imo.
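JavaScript has no free-spacing mode for regex literals, so one way to get the multi-line, commented style described here is to assemble the pattern from commented pieces. A rough sketch, using a hypothetical `YYYY-MM-DD` date pattern as the example:

```javascript
// Each piece sits on its own line with a comment; String.raw keeps
// backslashes as written, so "\d" does not need double escaping.
const datePattern = new RegExp(
  [
    "^",
    String.raw`(\d{4})`,               // year: four digits
    "-",
    String.raw`(0[1-9]|1[0-2])`,       // month: 01 to 12
    "-",
    String.raw`(0[1-9]|[12]\d|3[01])`, // day: 01 to 31 (month length is not checked)
    "$",
  ].join("")
);

console.log(datePattern.test("2022-01-04")); // true
console.log(datePattern.test("2022-13-04")); // false
```

The resulting `RegExp` behaves the same as the one-line literal would; only the source layout changes.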

Using them for anything where a simple single, named method would suffice is a bad idea.

Jonathan • Jan 5 '22 • Edited

How do you feel about it?

Good. It's a useful tool.

Do you seek to use or avoid it as a problem solver

I treat it similarly to SQL or CSS – use it where appropriate, but try to keep it behind an abstraction if possible. E.g. I would wrap a phone number RegEx in a function such as isValidPhoneNumber.

If it's simple to solve the problem without a RegEx then I'll solve it without the RegEx. But sometimes a RegEx is simpler, e.g. the above example.
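A minimal sketch of that kind of wrapper. The accepted format is purely an assumption for the example (ten digits, an optional `+1` prefix, and optional space, dot, or hyphen separators); a real application would pick the formats it actually needs.

```javascript
// Assumed format: optional "+1" prefix, then 3-3-4 digits,
// with an optional space, dot, or hyphen between groups.
const PHONE_PATTERN = /^(?:\+1[ .-]?)?\d{3}[ .-]?\d{3}[ .-]?\d{4}$/;

// Callers only see the named function; the regex stays an implementation detail.
function isValidPhoneNumber(value) {
  return typeof value === "string" && PHONE_PATTERN.test(value.trim());
}

console.log(isValidPhoneNumber("555-123-4567"));    // true
console.log(isValidPhoneNumber("+1 555 123 4567")); // true
console.log(isValidPhoneNumber("555-1234"));        // false
```

Because callers depend only on `isValidPhoneNumber`, the pattern can later be replaced, extended, or swapped for a library without touching them.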

how much do you understand it

RegEx language generally – I know the basic concepts well enough, but I always check reference materials and/or use a RegEx tool when implementing one.

Specific RegExs – I almost never re-use a RegEx without first taking it apart and making sure I understand what's going on (same with any code snippet really).

bob.ts • Jan 4 '22

I love working with Regular Expressions.

I use them frequently in VS Code and have a few articles out here on implementing them for Search-and-Replace.

I had a project where I had to replace an AS400 Custom Script Search Language with a JavaScript version. I quickly learned how slow they are when running hundreds of them per line. It also prompted me to create a new tool for documenting Regular Expressions (github.com/bob-fornal/reggie-docs).

zakwillis • Jan 7 '22

I think they are great. They don't work in all situations.

ultrapico.com/expresso.htm is for .NET and is a great tool for creating regular expressions, though it doesn't support all elements of regex. I'm not a super genius on them, but they are an essential element of development for me when handling information.

Here is an example of configuration from a translation file in my application, an awesome DevOps deployment application I am hoping to market in future.

The configuration below is to either find a value or a regex expression in an input file (typically itself an application configuration file) to clean a development configuration file. Sounds complex, but a great way of translating source configuration into something generic for different environmental deployment.

Regexes provide more control, but aren't always the best approach.

inforhino.co.uk/beta/automation-an...

{"IsRegex": false, "FindValue" : "C:\\\\InfoRhino\\\\cms\\\\IRWebsite\\\\IRWebsiteCMS\\\\wwwroot\\\\ClientConfiguration\\\\TConfigType.json", "ReplacementValue" : "{TargetServerRoot}\\\\ClientConfiguration\\\\TConfigType.json"},
{"IsRegex": false, "FindValue" : "C:\\\\InfoRhino\\\\cms\\\\Adverts", "ReplacementValue" : "{TargetServerRoot}\\\\Store\\\\Adverts"},
{"IsRegex": false, "FindValue" : "C:\\\\InfoRhino\\\\cms\\\\IRWebsite\\\\IRWebsiteCMS\\\\wwwroot", "ReplacementValue" : "{TargetServerRoot}\\\\wwwroot"},
{"IsRegex": false, "FindValue" : "content\\\\Cards", "ReplacementValue" : "Cards"},
{"IsRegex": true, "FindValue" : "(?is-nx:(?<=\"CardDomainHeaderCaption\"\\:(\\s{0,1})\")([a-z0-9\\s\\,\\.]+))", "ReplacementValue" : "{CardDomainHeaderCaption}"},
{"IsRegex": true, "FindValue" : "(?is-nx:(?<=\"CardDescription\"\\:(\\s{0,1})\")([a-z0-9\\s\\,\\.]+))", "ReplacementValue" : "{CardDescription}"},
{"IsRegex": true, "FindValue" : "(?is-nx:(?<=\"CompanyName\"\\:(\\s{0,1})\")([a-z0-9\\s\\,\\.]+))", "ReplacementValue" : "{CompanyName}"}
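To make the idea concrete, here is a rough JavaScript sketch of how rules in this shape might be applied to a piece of text. It is not the application's actual code (which targets .NET). The .NET inline options `(?is-nx:...)` are not valid JavaScript syntax, so the regex rule is rewritten in plain form and the flags are passed separately; `applyRules`, the `CardPath` key, and the sample input are all hypothetical.

```javascript
// Hypothetical rules mirroring the configuration above.
const rules = [
  // Literal rule: the raw text to find is content\\Cards (two backslashes).
  { IsRegex: false, FindValue: String.raw`content\\Cards`, ReplacementValue: "Cards" },
  // Regex rule: simplified JavaScript form of the CompanyName lookbehind pattern.
  {
    IsRegex: true,
    FindValue: String.raw`(?<="CompanyName":\s?")[a-z0-9\s,.]+`,
    ReplacementValue: "{CompanyName}",
  },
];

// Applies every rule in order and returns the transformed text.
function applyRules(text, ruleList) {
  return ruleList.reduce((current, rule) => {
    if (rule.IsRegex) {
      // "g" replaces all matches; "i" and "s" stand in for the inline "is" options.
      const pattern = new RegExp(rule.FindValue, "gis");
      // A replacer function avoids special handling of "$" in the replacement.
      return current.replace(pattern, () => rule.ReplacementValue);
    }
    // Literal rule: split/join replaces every occurrence with no regex escaping.
    return current.split(rule.FindValue).join(rule.ReplacementValue);
  }, text);
}

const input = String.raw`{"CompanyName": "Example Co.", "CardPath": "content\\Cards"}`;
console.log(applyRules(input, rules));
```

Keeping literal and regex rules in one list, distinguished by `IsRegex`, lets the simple replacements avoid regex escaping entirely.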

PNS11 • Jan 5 '22

For being terse it's surprisingly hard to read compared to other forms of computer programming line noise, like old Perl that runs on oak barrels and mules or everyday J.

Requiring visualisation tools to be somewhat interpretable outside the trivial case means one can't just skim it and be fairly confident about what it does, unlike the surrounding application code (hopefully).

If I can afford the performance hit or the development time, I'll probably avoid regex; usually it's possible to implement a parser that's easier to understand at a quick glance.

So I mostly use them in CLI settings, like patterns for ripgrep or sed.
