Boost your Productiveness with RegEx (a little)

Jan Wedel ・4 min read

I love RegEx. I use it every day, and I will show you how to use it to get smaller and larger tasks done easily.

But...

Don’t use it in production

Ok, first things first: Be very careful using RegEx for anything in production code if you're not absolutely certain it's actually necessary.

This is an example of what could happen. In 95% of cases, it is much safer and easier to understand if you use simple loops to go over the data, with something like String.contains() or String.split(delimiter) to search and break up strings in a simple and readable way.
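As a rough illustration of that advice, here is a minimal Python sketch that filters and splits log lines without any RegEx. The log lines are made up for this example, and Python's `in` operator and `str.split` stand in for the String.contains() and String.split(delimiter) calls mentioned above.

```python
# Hypothetical sample data, made up for illustration.
log_lines = [
    "2020-01-01 12:00:01 INFO service started",
    "2020-01-01 12:00:05 ERROR disk full",
    "2020-01-01 12:00:09 INFO retrying",
]

# Substring check instead of a pattern: keep only the lines that mention ERROR.
error_lines = [line for line in log_lines if "ERROR" in line]

# Split on a fixed delimiter instead of using capturing groups.
# maxsplit=3 keeps the free-text message together in the last field.
date, clock, level, message = error_lines[0].split(" ", 3)
print(level, message)
```

Each step reads as what it does, which is the readability argument made above.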

[EDIT] To be very clear: I mean what I said above. Don’t use anything I show you here in production. I personally only use that on log files, test data and manual data creation.

Tools

I don't actually use any special tool. Every reasonably sophisticated text editor or IDE supports RegEx in search and replace. I do most of this work in Sublime Text, sometimes in IntelliJ.

Useful RegEx

This is how I most often use RegEx in my day-to-day life.

Replace start or end of line

Suppose you have the following text:

Flour
Eggs
Milk
Salt
Maple sirup

And you want to make a bulleted list. You could obviously enter a * in front of every line manually. But, you can use RegEx, of course.

Search: ^
Replace by: * (an asterisk followed by a space)

This will result in:

* Flour
* Eggs
* Milk
* Salt
* Maple sirup

The ^ is a special character that matches the beginning of a line. Replacing this with one or more characters will prefix each line.
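Outside an editor, the same idea can be tried in a few lines of Python. This is only a sketch: Python's `re` module needs the `re.MULTILINE` flag so that ^ matches at the start of every line, while editors usually behave that way by default.

```python
import re

shopping_list = "Flour\nEggs\nMilk\nSalt\nMaple sirup"

# re.MULTILINE makes ^ match at the start of every line,
# not only at the start of the whole text.
bulleted = re.sub(r"^", "* ", shopping_list, flags=re.MULTILINE)
print(bulleted)
```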

The same goes for end of a line. Let's say you need to add a comma at the end of each line.

"Foo"
"Bar"
"Baz"

Search: $
Replace by: ,

Result:

"Foo",
"Bar",
"Baz",

The last comma might be unnecessary and thus must be removed manually. There is a more sophisticated search to fix this but most of the time it's not worth the effort. It's always good to let RegEx do the heavy lifting and fix the resulting 2% manually.
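For reference, here is a small Python sketch of both variants: the plain $ replacement, and one possible "more sophisticated" pattern that skips the last line. The lookahead `(?!\Z)` is my own choice for illustration, not necessarily the search the text has in mind, and it assumes the input has no trailing newline.

```python
import re

# Assumption: the text does not end with a newline.
lines = '"Foo"\n"Bar"\n"Baz"'

# With re.MULTILINE, $ matches at the end of every line.
with_commas = re.sub(r"$", ",", lines, flags=re.MULTILINE)

# Variant: (?!\Z) rejects the match at the very end of the text,
# so the last line does not get a comma.
all_but_last = re.sub(r"$(?!\Z)", ",", lines, flags=re.MULTILINE)

print(with_commas)
print(all_but_last)
```

The second pattern is already harder to read than the first, which supports the point that removing one comma by hand is often the better trade.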

Swapping Columns

Assume we have the following data:

"foo":8,
"bar":42,
"baz":13,

Search: "(\w+)":(\d+),
Replace by: "$2":"$1",

Result:

"8":"foo",
"42":"bar",
"13":"baz",

What's happening here? We are using groups. A group is delimited by parentheses, so a pattern looks like (group1)(group2)(group3). The cool thing about groups is that you can reuse what they matched in the replacement. In Sublime, $n refers to group n, with the index starting at 1. Notice that we did not include the , and " inside the groups. Inside the groups, I am using \d, which matches a single digit, and \w, which matches a word character such as a-z, A-Z, 0-9 and _, but not -, for example. The + matches one or more characters of that kind.
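The same swap can be sketched in Python to see the groups at work. Note that this is a different RegEx flavor: Python's `re.sub` writes group references in the replacement as \1 and \2 rather than $1 and $2.

```python
import re

data = '"foo":8,\n"bar":42,\n"baz":13,'

# Group 1 captures the word, group 2 captures the digits.
# The replacement emits them in reverse order.
swapped = re.sub(r'"(\w+)":(\d+),', r'"\2":"\1",', data)
print(swapped)
```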

Convert CSV to JSON

Let's assume we have the following CSV:

1,35,"Bob"
2,42,"Eric"
3,27,"Jimi"

Search: (\d+),(\d+),"(\w+)"
Replace by: {"id":$1,"age":$2,"name":"$3"},

Result:

{"id":1,"age":35,"name":"Bob"},
{"id":2,"age":42,"name":"Eric"},
{"id":3,"age":27,"name":"Jimi"},

Again, we're using groups and digit or word matchers.

The transformed result could easily be turned into valid JSON by adding a wrapper object and arrays as well as removing the last comma. But the heavy lifting is done by RegEx.
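To check that the result really becomes valid JSON after that cleanup, here is a minimal Python sketch. It wraps the lines in a plain array only, which is a simplifying assumption, and uses Python's \1 style group references instead of $1.

```python
import json
import re

csv_text = '1,35,"Bob"\n2,42,"Eric"\n3,27,"Jimi"'

# Same search and replace as above, in Python's replacement syntax.
objects = re.sub(
    r'(\d+),(\d+),"(\w+)"',
    r'{"id":\1,"age":\2,"name":"\3"},',
    csv_text,
)

# The manual cleanup step: drop the last comma and wrap everything in an array.
json_text = "[" + objects.rstrip(",") + "]"

# json.loads raises an error if the text is not valid JSON.
people = json.loads(json_text)
print(people[0]["name"])
```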

Create Test Data

Sometimes I need test data, a lot of it.

What I usually do is create a sequence of numbers using...Excel. Yep, Excel. Excel is pretty smart when it comes to sequences. E.g. you can enter something like:

#
10
20

Then select both and drag the bottom right corner to fill the cells below. Excel is able to determine that the next number is 30. Based on that, copy the rows into Sublime:

10
20
30
40

Then I apply the same strategy as before:

Search: (\d+)
Replace by: {"id":$1,"username":"user$1"},

Result:

{"id":10,"username":"user10"},
{"id":20,"username":"user20"},
{"id":30,"username":"user30"},
{"id":40,"username":"user40"},
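If no spreadsheet is at hand, the whole workflow can also be sketched in Python. Here `range` stands in for the Excel fill-down step, which is my substitution, and the same group is referenced twice in the replacement.

```python
import re

# Stand-in for the Excel sequence: 10, 20, 30, 40, one number per line.
ids = "\n".join(str(n) for n in range(10, 50, 10))

# \1 is used twice: once for the id and once inside the username.
users = re.sub(r"(\d+)", r'{"id":\1,"username":"user\1"},', ids)
print(users)
```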

Learning

RegEx101

There is RegEx101, where you can test whether a RegEx matches. Modern editors like Sublime and IntelliJ will dynamically highlight matches in your current window. However, this page is also great for finding errors and for learning what actually matches and why, using hover and the explanation section.

RegEx Golf

Then, you can use RegEx Golf as a fun way to learn RegEx.

And of course, here on dev.to

Summary

As you can see, there are plenty of use cases where RegEx helps with small and larger tasks that would take hours to do manually, especially with large data sets.

Posted by Jan Wedel (@stealthmusic)

Senior Software Developer + Group Lead + Father + Musician + Loves Technology

Discussion


Great article, good topic. If you’re an expert, there’s no reason not to add regexes to your bag of tricks. The key is to understand not only what happens logically, but also the runtime consequences. For example, take PCRE2, a ubiquitously available library. In this flavor of extended regex, you can use possessive matching (e.g., \d++). Used right, along with other constructs, you can judiciously avoid backtracking by the regex state machine and make your regexes fast and lean. I would advise not to be afraid of them, but like swords, to respect them and understand how to work with them. So it is often with powerful things. :)

 

Thanks 🙏
Since I would not consider myself an expert, I would not do it :)
I would still vote against it if there is a more readable alternative. Strive for readability/maintainability and only optimize for speed if it’s necessary.

 

Of course! Makes sense.

 

As an alternative to RegEx Golf, I've found Regex Crossword to be pretty fun!

 
 

Thanks, I will have a look!
Looks like you’ve started a markdown link but missed the url... ;)

 
 

I really have to learn more regex. I use the online tools to figure out what I need, but I really need to learn more about it so it's more ingrained, especially for search and replace in editors.
Thanks for the article.

 

I love regular expressions! I was able to circumvent using two whole different APIs by employing some very clever regex string manipulation in one of my projects. The speed improvement is unparalleled.

 

It depends on the circumstances and requirements but I’d still reply with:
dev.to/stealthmusic/comment/cnm2

 

I'd say that regex should be used if they can make a significant difference and you're aware of the scope of the problem being solved by it. That's where Cloudflare went wrong, I'd say. I use it for url formatting so even if it goes wrong, all I get is a 404 hopefully :P

 

Thanks for your advice. I absolutely share your views, so I have to ask if you actually read the first section about „not to use it in production“? ;)
I even use the results of such an operation only for testing purposes.

 

BTW, I just added a disclaimer, just in case what I wrote here could be misunderstood. I don’t mean something like „using regex is dangerous but I will show you how to do it right“. That’s absolutely not what I intended.

 

I read this, and fatefully was given the task of taking two Excel columns of 6,000 zip codes and turning them into arrays. Made incredibly quick work of that, so thanks!

 

Haha, I’m glad my article could help! 😊

 

Thanks @stealthmusic. It was very helpful. I tried it, and it is amazing!

 

Glad it helps. There are certainly more things to explore and learn. :)

 

Nice article. Another alternative, an online visual regex tester: extendsclass.com/regex-tester.html