
VerbalExpressions - RegularExpression made easy

Felix (@bachnxhedspi) ・ 1 min read

When you start learning a new programming language, you probably follow the usual steps: variables, assignment, strings, operators... One major topic to focus on is string operations, for example getting the first name from a full name, or finding and censoring all mobile numbers in a message.

These operations keep repeating a few common procedures. One of them is finding a substring and then performing some operation on it. Early in your learning path, you may have written something like this:

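#include <string.h>

int checkMatchStubPattern(const char *string) {
    size_t length = strlen(string); // compute the length once, not on every iteration
    for (size_t i = 0; i < length; i++) {
        // logic for checking the string pattern goes here
    }
    return 0; // placeholder: return the result of the pattern check
}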

This is not wrong, but it is time-consuming: you have to rewrite the checking logic for every case. More code means more bugs and, of course, harder maintenance. Luckily, regular expressions (Regex) come to the rescue for these kinds of problems: finding, input validation... As a sign of how useful Regex is, every programming language supports Regex for string operations.

A regular expression is a sequence of characters that define a search pattern. This pattern is then used by string searching algorithms for "find" or "find and replace" operations on strings, or for input validation. (Source: Wikipedia)
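To make the three uses in that definition concrete, here is a small JavaScript sketch applied to the tasks from the introduction. The function names are hypothetical and both formats are assumptions for illustration: a first name is taken to be the first whitespace-separated word, and a mobile number is assumed to be exactly 10 digits (real formats vary by country).

```javascript
// Hypothetical helpers illustrating find, find-and-replace and validation.
// The name and phone-number formats are assumptions for this sketch only.

// Find: take the first whitespace-separated word as the first name.
function firstName(fullName) {
  const match = fullName.trim().match(/^\S+/);
  return match ? match[0] : "";
}

// Find and replace: mask every run of exactly 10 digits bounded by
// non-word characters, assuming that is what a mobile number looks like.
function censorMobileNumbers(message) {
  return message.replace(/\b\d{10}\b/g, "*".repeat(10));
}

// Input validation: the whole input must be exactly 10 digits.
function isMobileNumber(input) {
  return /^\d{10}$/.test(input);
}

console.log(firstName("John Smith"));
console.log(censorMobileNumbers("Call me at 0912345678 or 0987654321."));
console.log(isMobileNumber("0912345678"), isMobileNumber("09123"));
```

Each helper is a one-liner; the same hand-written loop from the C snippet would need separate logic for every one of these cases.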

Regex is an efficient tool for this problem, but it comes at a price: it is really hard to read and understand (though not hard to learn). First, try to read the example below.

[Image: a regular expression that checks whether a string is a valid URL]

Because of its complicated syntax, Regex is very hard to read and understand. Furthermore, you probably don't work with Regex very often, so the ROI (return on investment) of mastering it is low; most of the common patterns you need can be found on the internet (passwords, URLs, IP addresses, ...). Are you willing to spend weeks learning something you only use 4 or 5 times a year, or would you rather skim a few sites and get a result in about 5 minutes? That way of thinking leads developers to google a Regex and modify it to fit their needs. Sometimes this cycle of searching and modifying costs anywhere from a few hours to a whole day.

In every angel a demon hides...

Regex solves the string operation problem, but what about the problem with Regex itself? Fortunately, it can be solved with VerbalExpressions. Take a look at this example:

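// Assumes the JSVerbalExpressions package from npm: npm install verbal-expressions
const VerEx = require('verbal-expressions');

const tester = VerEx()
    .startOfLine()
    .then('http')
    .maybe('s')
    .then('://')
    .anythingBut(' ')
    .endOfLine();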

I hope this example did not frustrate you. The VerbalExpression above is defined by the following rules:

  • The URL must start with either "http" or "https".
  • The URL must then have "://".
  • The URL can have anything following "://", as long as it is not a space.

The Regex generated from the code above is: /^(?:http)(?:s)?(?:\:\/\/)(?:[^ ]*)$/. It looks a bit different, but it is functionally the same. You can find implementations of VerbalExpressions in several languages at github.com/VerbalExpressions.
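If adding a dependency is not an option, the generated pattern can also be used on its own. This sketch simply pastes the regex quoted above into plain JavaScript, with no VerbalExpressions import, and tests a few made-up sample strings against it:

```javascript
// The pattern quoted in the post as VerEx's output for the URL rules,
// used directly so no library needs to be installed.
const urlPattern = /^(?:http)(?:s)?(?:\:\/\/)(?:[^ ]*)$/;

const samples = [
  "https://dev.to",           // allowed scheme, no spaces
  "http://example.com/a?b=c", // allowed scheme, no spaces
  "ftp://example.com",        // wrong scheme
  "http://has space.com",     // contains a space
];

for (const s of samples) {
  console.log(`${s} -> ${urlPattern.test(s)}`);
}
```

Because the last part is [^ ]*, a bare "http://" also passes: these rules check the overall shape of the string, not whether the rest is a well-formed host or path.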

VerbalExpressions solves the biggest problem of Regex: it produces readable, easy-to-understand regular expressions. In my opinion, the transition from Regex to VerbalExpressions is great as the movement from SQL to ORM.

Still, VerbalExpressions has some drawbacks. You need to add a new library to your project, which is sometimes quite painful (e.g. your client or manager doesn't think it's necessary). In that case, you can use VerbalRegex: write the VerbalExpressions code there and it will generate the Regex for you.

[Image: the VerbalRegex online tool for generating Regex]

Try the tool at verbalregex.com.

Conclusion

VerbalExpressions is not a replacement for Regex but an easy way to write readable Regex. It can ease the pain of Regex and actually make writing expressions fun again. Keep in mind, though, that plain Regex still seems to be the best choice in some complicated cases.

Discussion

Brett Stevenson (@tterb)

That seems like a pretty interesting tool!

In the past, when I've run into issues with my regex I usually seek out a regex tester like regex101, but it looks like this might be a more helpful alternative.

JSn1nj4 👨‍💻 (@jsn1nj4)

I would probably still use regex101 at least occasionally. Regex is something that's interesting for me to learn whenever I have the opportunity.

Michael Kohl (@citizen428)

Nice post! Though every time I read something like this, I'm a bit surprised that apparently many people find regular expressions so hard. Maybe it's because Perl was one of my earlier programming experiences, but I find them compact and easy to read in most cases. Of course you can go overboard, but that's true of pretty much any feature in a programming language. And if your regexes really get too complicated, breaking them down with freespacing mode can help a lot.
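JavaScript, the language of the VerbalExpressions example in the post, has traditionally had no Perl-style free-spacing flag. One common workaround, sketched here as an illustration rather than anything the commenter proposed, is to build the pattern from commented string pieces. The pieces encode the same URL rules as the post; the variable names are made up.

```javascript
// Each piece is a plain string, so any backslash would need doubling;
// this particular pattern happens to need none.
const urlParts = [
  "^",      // start of the string
  "https?", // "http", optionally followed by "s"
  "://",    // the literal separator
  "[^ ]*",  // anything except a space, possibly empty
  "$",      // end of the string
];

const commentedUrlPattern = new RegExp(urlParts.join(""));

console.log(commentedUrlPattern.test("https://dev.to"));
```

The comments live next to each piece of the pattern, which gives some of the readability of VerbalExpressions without adding a dependency.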

Felix (@bachnxhedspi), author

I tried to learn it, but I still have to look at the documentation after a few months ;(

Thank you, free-spacing mode makes it more readable. I will try it next time.

simonhaisz

First of all, when I saw the title I was hoping for a reference to bobince's epic html regex parse answer - was not disappointed :)

But more importantly, I think this is a great solution for 90% of regex use cases. In my experience, most of the time you have simple patterns like yours, where you don't need the full power of regex but there isn't a better alternative in the language. So you end up with PRs where a single-character typo results in an error that can easily end up in prod.

tux0r

every programming language supports Regex for string operations.

COBOL disagrees.

Nando (@fc250152)

... is COBOL a true programming language? disclaimer: I've used it for more than 40 years, but I think that it's the worst language in the computer world!

tux0r

Try APL before you talk badly about COBOL! ;-)

Ben Sinclair (@moopet)

I don't hate the idea of these, but I do find them awkward.

When I read anythingBut(' ') I wonder, does it mean [^ ] or [^ ]+ or [^ ]*? How do I tell it that I mean any whitespace, like \s instead of a space? How is it clear that maybe(' ') and maybe(' ') are different[1] and mean spaces and tabs respectively?

Back references? Discarded groups? All that shenanigans? I think I'd spend longer looking up how to do something with this sort of wrapper than I would just using regex in the first place. You can split regex over lines and add comments for them, so there shouldn't be any ambiguity. You can test them just like you test anything else.

To me, verbal expressions seem like a cut-down wrapper rather than an abstraction, and I'm not sure how they would help in anything but the simplest cases.


[1] Pretend there's a tab in that; I can't put one in markdown :)
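On the first question, the generated regex quoted in the post suggests that, at least in the JavaScript port, anythingBut(' ') compiles to (?:[^ ]*), i.e. zero or more non-space characters; other ports may differ. The sketch below uses only plain regexes, no VerbalExpressions calls, to show what that implies: an empty string is accepted, and a tab is not treated as a space, whereas \S rejects every kind of whitespace.

```javascript
// What the post's generated regex uses for anythingBut(' ').
const zeroOrMore = /^(?:[^ ]*)$/;
// The "+" reading: at least one non-space character.
const oneOrMore = /^[^ ]+$/;
// Excludes all whitespace (spaces, tabs, newlines), not just " ".
const noWhitespace = /^\S*$/;

console.log(zeroOrMore.test(""), oneOrMore.test(""));            // prints: true false
console.log(zeroOrMore.test("a\tb"), noWhitespace.test("a\tb")); // prints: true false
```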

Malgosia (@malgosiastp)

I like this tool.

But I think I would rather use it as a helper for learning regex, not as a replacement, especially at the beginning, when you can write an expression with VerbalExpressions and then see the resulting regex :)

Chris Shepherd (@lobsterpants66)

Personally I have always found regex to be a write-only language and this is unlikely to change as I only use them once in a blue moon.

So this is a great post, as I had not come across verbal expressions before. They look really useful and are another tool I can use to make my code easier to understand. Must have a play next time I reach for a regex.

benccwai

Thanks for posting a great tool, it brings so much convenience to my life. But why do you say: "the transition from Regex to VerbalExpressions is great as the movement from SQL to ORM."?

I think regex can be applied to SQL as well.

Felix (@bachnxhedspi), author

Using an ORM instead of SQL is easier; it's like writing VerbalExpressions instead of Regex.

I was only talking about the convenience :)

Rob Hoelz (@hoelzro)

This is really neat! One place I would really like to see this is with regular expression libraries that don't support extended mode like Perl's regular expressions do - in particular, I'd like to see a Vimscript implementation. One, because of that lack of extended mode support, and two, because Vim's pattern language is pretty different from POSIX/Perl's regular expression language. I dream of the day where I can rewrite this:

  syn match perlVarPlain "\%([@$]\|\$#\)\$*\%(\I\i*\)\=\%(\%(::\|'\)\I\i*\)*\%(::\|\i\@<=\)" nextgroup=perlVarMember,perlVarSimpleMember,perlPostDeref

...into something immediately understandable.

Jeremy Schuurmans (@jeremy)

Really like this post. I’m a student and I’ve talked to quite a few others who work through some Regexp exercises and then hope they won’t have to use it ever again. I could see this as a nice tool for getting people comfortable with the logic behind regular expressions.

Hritik Vijay (@hritik14)

While this seems a very nice readable library, I doubt the usability. It requires the user to know beforehand what functions she can use (maybe(), anythingBut() ...) If the user already knows what she can do, it's not too hard to do the same in regex.

Felix (@bachnxhedspi), author

You're right, but it only takes a few minutes to read the API, and in my case that is easier than reading the Regex documentation.

One more advantage of VerbalExpressions is that it makes the code more readable.

mt3o

If you don't know how to use regular expressions, then just don't use them. Ask someone to help you. Libs like this one help you understand how regexps work, but for production use, use "regular" regular expressions. They perform better (real, effective code is faster than code created with vreg), and more people know regexps than this library. If you have to, perhaps extract the regexps to a separate file and document them properly. You can use multiline regexps and add comments, you know? And load them from a separate file, with all the support your IDE can provide. Check the free-spacing mode Michael Kohl mentioned.

Nick Karnik (@theoutlander)

I'm very comfortable with Regex and love it, but I still think this lib is brilliant! What prompted you to start this project? When was it first released?

I have a project on my list to work on where I give it a set of strings and it generates the regular expression automatically.

I think a cool feature for this library would be to be able to reuse a set of existing expressions that others write.

Good work!

Felix (@bachnxhedspi), author

Oh, I am not behind the project; you can find the details of VerbalExpressions at github.com/VerbalExpressions.

I only developed the online tool verbalregex.com, which wraps JSVerbalExpressions so you can easily write and test VerbalExpressions.

Anyway, can you share some ideas about the algorithm for generating a regex from a set of strings? It sounds like an application of machine learning.

Periklis Gkolias (@perigk)

Very nice tool, thanks for sharing

Ben Halpern (@ben)

Great post!

Felix (@bachnxhedspi), author

Thank you :)
Hope it can solve your problems sometimes.

André (@muehan)

thanks for sharing!! this will help me one day.
If you want to learn real Regex with fun, try this:
regexcrossword.com/

Felix (@bachnxhedspi), author

Thank you, what an interesting game :)

Vincent Dedo (@vdedodev)

I've used verbal expressions a bit before and didn't find it to be that useful, it doesn't make regex that much easier to read and it's just another dependency on your project.

Marcelo Brito (@marcelobritowd)

Awesome post man, I'm excited to use that in my personal projects. In the office, I'll try to use verbalregex.com for now.

Felix (@bachnxhedspi), author

Yeah, I think you can use the native VerbalExpressions library in your personal projects. Try out this link:
verbalexpressions.github.io/

Slavius (@slavius)

Anyone who tells me regular expressions are easy has probably never done anything serious with them, like matching an e-mail address fully according to its RFC. An example:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Try to guess what this simple Regex does:

<div>(?:[^<]++|<(?!div>)(?!\/div>))*+<\/div>

Hint: It matches full div tag content that has no other div inside it.

You can read about it here.
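JavaScript does not support the possessive quantifiers (++ and *+) used in the pattern above, so it cannot be pasted into JavaScript as-is. A rough equivalent, offered as a sketch rather than a drop-in replacement, drops them and matches one character per iteration so that no + is nested inside the *. Since each position can only be consumed by one of the two alternatives, giving characters back never helps, so it should match the same strings.

```javascript
// No possessive quantifiers in JavaScript: match one character per step.
// [^<] takes any non-"<" character; "<" is allowed only if it does not
// start "<div>" or "</div>", so nested divs stop the match.
const innermostDiv = /<div>(?:[^<]|<(?!div>)(?!\/div>))*<\/div>/;

const html = "<div>a<div>b</div></div>";
console.log(html.match(innermostDiv)[0]); // prints: <div>b</div>
```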

Mike Schinkel (@mikeschinkel)

It appears to be missing at least .whitespace(), and the online tool would be much better with auto-complete.

BTW, for anyone who struggles to learn common-case RegEx: the way I learned it, after thinking I would never be able to, was to use an IDE or editor that supports RegEx search and then force myself to always search by RegEx when applicable.

After doing that, one day I realized that I could write working RegEx on the first try, much like how you notice you no longer have a headache.