DEV Community

🚀 Human-Regex: Write Readable Regular Expressions Like English

Ridwan Ajibola on February 03, 2025

Created by Ridwan Ajibola Sick of trying to understand those confusing regex patterns? Let's change that. // Before: Cryptic regex f...
 
code42cate profile image
Jonas Scholz •

I never really understood what's so hard about regex. I think if you just learn the grammar once, you don't really forget it, and it's perfectly understandable. Nice idea anyway :)

rajibola profile image
Ridwan Ajibola •

Thanks Jonas

pengeszikra profile image
Peter Vivo •

Good work, but I'm missing a capitalLetter() function: as it stands, the password example can be passed without any capital letter.

If you try to build a harder regexp example, you are sure to find a few more missing functions.

Check this code: github.com/Pengeszikra/flogon-gala...
It is the markdown parser part of this game: dev.to/pengeszikra/javascript-grea... Try to recreate those regexps with your module and you will find what is missing.

rajibola profile image
Ridwan Ajibola •

Good! I will make sure to include the missing methods in the new release.

Thanks

manuchehr profile image
Manuchehr •

Skill issues to be honest

rajibola profile image
Ridwan Ajibola •

Thanks

rajibola profile image
Ridwan Ajibola •

I’d appreciate it if you could star the repo.

msamgan profile image
Mohammed Samgan Khan •

this is cool man, like really cool...

rajibola profile image
Ridwan Ajibola •

Thanks a lot @msamgan! Your contributions are really appreciated. If you find this project useful, it would mean a lot if you could star the repo, it helps others discover it too!

rajibola profile image
Ridwan Ajibola •

Thanks a lot @msamgan, I’d appreciate it if you could star the repo.

best_codes profile image
Best Codes •

I starred the repo!
Very cool idea. How does it do as far as performance compared to a plain Regex?

rajibola profile image
Ridwan Ajibola • • Edited

Thanks a lot @best_codes

rajibola profile image
Ridwan Ajibola • • Edited

Benchmark results seem inconsistent, so I'll use performance.now() for more accurate testing.

Suite1: Human Regex x 0.15 ops/sec ±0.97% (3874 runs sampled)
Suite1: Native RegExp x 0.07 ops/sec ±0.47% (3596 runs sampled)
Suite1 Fastest is Human Regex
Suite2: Native RegExp x 0.05 ops/sec ±0.31% (3346 runs sampled)
Suite2: Human Regex x 0.04 ops/sec ±0.24% (3187 runs sampled)
Suite2 Fastest is Native RegExp

best_codes profile image
Best Codes •

In my tests, Human Regex is several times slower, but the performance difference is negligible except at large scale or when parsing regexes very frequently.
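A minimal sketch of the kind of performance.now() timing discussed here. It does not use the Human Regex API: `buildEmailRegex` is a hypothetical stand-in that assembles and compiles a pattern the way a fluent wrapper might, and the email pattern, the `timeIt` helper, and the iteration count are all assumptions chosen for illustration. The timings depend on the machine and runtime, so none are shown.

```javascript
// Hypothetical stand-in for a fluent builder: it joins pattern fragments and
// compiles them, which is the extra work a wrapper adds over a regex literal.
function buildEmailRegex() {
  const parts = ["^", "[^\\s@]+", "@", "[^\\s@]+", "\\.", "[a-z]{2,}", "$"];
  return new RegExp(parts.join(""), "i");
}

// The same pattern written as a native literal (compiled once).
const literal = /^[^\s@]+@[^\s@]+\.[a-z]{2,}$/i;

// Run `fn` `iterations` times and return the elapsed time in milliseconds.
function timeIt(fn, iterations) {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  return performance.now() - start;
}

const input = "user@example.com";
const N = 100000;

// Case 1: rebuild the regex on every call (the expensive usage pattern).
const rebuiltMs = timeIt(() => buildEmailRegex().test(input), N);

// Case 2: build once, reuse the compiled RegExp.
const cachedBuilt = buildEmailRegex();
const cachedMs = timeIt(() => cachedBuilt.test(input), N);

// Case 3: native literal.
const literalMs = timeIt(() => literal.test(input), N);

console.log({ rebuiltMs, cachedMs, literalMs });
```

If the builder runs once and the resulting RegExp is reused, matching goes through the same native engine as the literal, so any remaining overhead would come from construction rather than from matching.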

Email regex benchmark

samuelmunoz profile image
Samuel Munoz •

Ported this to C# (It's .NET 8+ exclusive for now). Check it out:

Repo: github.com/SamuelMunoz/human-regex...
Nuget Package: nuget.org/packages/HumanRegexBuilder/

rajibola profile image
Ridwan Ajibola • • Edited

Good job! I’ve starred the repo

nabous profile image
Mohamed Nabous • • Edited

Finally, I won't get the excuse from my peers that regex is too hard!!

Great work!!!

rajibola profile image
Ridwan Ajibola •

I’d really appreciate it if you could star the repo—it would be a huge help!

Thanks!

himanshu_code profile image
Himanshu Sorathiya •

Man, this is just amazing, gonna start using this, big applause

rajibola profile image
Ridwan Ajibola •

Really appreciate it. I’d love it if you could star the repo; it helps a lot!

bbkr profile image
Paweł bbkr Pabian • • Edited

I see a bug / inconsistency:

If .digit() maps to \d, then it corresponds to the Decimal_Number Unicode property. For example (I'm not familiar with JS, so I'll use Raku):

$ raku -e 'say "1๖" ~~ /\d+/;'   # DIGIT ONE and THAI DIGIT SIX codepoints
「1๖」

$ raku -e 'say "1๖" ~~ /<:Decimal_Number>+/;' # same character class
「1๖」

$ raku -e 'say "1๖" ~~ /<:digit>+/;' # also the same
「1๖」


Then .letter() should consistently map to the Letter Unicode property, not a-z. For example:

$ raku -e 'say "aت" ~~ /<:Letter>+/';    #  LATIN SMALL LETTER A and ARABIC LETTER TEH
「aت」

You should not make implicit ASCII / non-ASCII assumptions, where one method works differently than the other sibling.
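For comparison, here is a small JavaScript sketch of the same two inputs. One caveat on the Raku examples: in JavaScript, \d matches only ASCII 0-9, with or without the u flag, so the Unicode-aware counterparts have to be written explicitly as \p{Nd} and \p{L}. Whether Human Regex's digit() and letter() emit the ASCII or the Unicode form is not confirmed here; the sketch only shows what each native pattern matches.

```javascript
// Same inputs as the Raku examples, written with escapes for clarity.
const mixedDigits = "1\u0E56";  // DIGIT ONE + THAI DIGIT SIX
const mixedLetters = "a\u062A"; // LATIN SMALL LETTER A + ARABIC LETTER TEH

// \d is ASCII-only in JavaScript, even with the u flag.
const asciiDigits = mixedDigits.match(/\d+/u)[0];

// \p{Nd} (Decimal_Number) requires the u flag and covers digits in any script.
const unicodeDigits = mixedDigits.match(/\p{Nd}+/u)[0];

// An a-z style class stops at the first non-ASCII letter.
const asciiLetters = mixedLetters.match(/[a-zA-Z]+/)[0];

// \p{L} (Letter) matches letters in any script.
const unicodeLetters = mixedLetters.match(/\p{L}+/u)[0];

console.log(asciiDigits.length, unicodeDigits.length);   // 1 2
console.log(asciiLetters.length, unicodeLetters.length); // 1 2
```

So in JavaScript an ASCII-only digit() and an ASCII-only letter() would at least be consistent with each other; the inconsistency described above would appear only if one sibling were Unicode-aware and the other were not.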

Another bug you have is anchoring:

$ perl -E 'say "match" if "a\n" =~ /a$/' # oops!
match

Token $ means end of logical string. What you are probably looking for is \z:

$ perl -E 'say "match" if "a" =~ /a\z/'
match

$ perl -E 'say "match" if "a\n" =~ /a\z/'
$  # no match, most likely expected result
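JavaScript anchors behave differently from the Perl examples above, so here is a short sketch for comparison. Without the m flag, JavaScript's $ matches only at the very end of the input and does not tolerate a trailing newline; with the m flag it also matches before a line terminator. JavaScript has no \z, so the negative lookahead below is one common substitute, shown as an illustration rather than as what the library emits.

```javascript
// Without the m flag, $ means absolute end of input in JavaScript.
const noFlag = /a$/.test("a\n");

// With the m flag, $ also matches just before a line terminator.
const multiline = /a$/m.test("a\n");

// JavaScript has no \z; (?![\s\S]) asserts that no character follows,
// so it means absolute end of input regardless of the m flag.
const strictEndWithNewline = /a(?![\s\S])/m.test("a\n");
const strictEndExact = /a(?![\s\S])/m.test("a");

console.log(noFlag, multiline, strictEndWithNewline, strictEndExact);
// false true false true
```

The trailing-newline surprise therefore depends on whether the m flag is set when the pattern is compiled.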

I don't want to discourage you, but I really dislike those "regex to human" modules. They make code extremely error-prone because, as I have just shown, you don't see explicitly what you are matching. Things get worse when you are working on a multi-language stack and want to exchange your PCRE regexps with someone using another language. Basically, all the "Why This Matters" points are just the opposite: new developers will not understand regexes any better, there will be more archaeology because you will need to decipher an additional layer of abstraction, and collaboration will be more difficult.

My advice would be to stick directly (or at least closely) to Unicode properties. Drop the ambiguous letter() method and add Uppercase_Letter(), mapping directly to the Lu property. Then build modifiers on top of that, like Uppercase_Letter('ascii') or Uppercase_Letter('script'=>'Latin'). Otherwise this will be a false friend: a module that is supposed to make your life easier but introduces weird errors and security risks because it hides too many assumptions under the hood.
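A rough JavaScript sketch of what such a property-based method with modifiers could look like. `uppercaseLetter` and its `ascii` and `script` options are hypothetical names invented for this illustration and are not part of Human Regex; the sketch relies only on the native \p{Lu} and \p{Script=...} property escapes, which require the u flag.

```javascript
// Hypothetical helper (not a real library API): returns a pattern source for
// one uppercase letter, optionally narrowed to ASCII or to a Unicode script.
function uppercaseLetter(options = {}) {
  if (options.ascii) return "[A-Z]";
  if (options.script) {
    // Guard against injecting arbitrary text into the pattern.
    if (!/^[A-Za-z_]+$/.test(options.script)) {
      throw new Error("invalid script name: " + options.script);
    }
    // The lookahead narrows \p{Lu} to the requested script.
    return `(?=\\p{Script=${options.script}})\\p{Lu}`;
  }
  return "\\p{Lu}"; // Lu = Uppercase_Letter, any script
}

const anyUpper = new RegExp(uppercaseLetter(), "u");
const asciiUpper = new RegExp(uppercaseLetter({ ascii: true }), "u");
const latinUpper = new RegExp(uppercaseLetter({ script: "Latin" }), "u");

console.log(anyUpper.test("\u0416"));   // CYRILLIC CAPITAL LETTER ZHE: true
console.log(asciiUpper.test("\u00C9")); // LATIN CAPITAL LETTER E WITH ACUTE: false
console.log(latinUpper.test("\u00C9")); // true
console.log(latinUpper.test("\u0416")); // false
```

With this shape the default is the full Unicode property, and every narrowing is spelled out at the call site instead of being hidden in the method name.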

rajibola profile image
Ridwan Ajibola •

This is explicitly for JavaScript, not for other languages.

bbkr profile image
Paweł bbkr Pabian • • Edited

Sure. I pointed out universal issues. Imagine a developer joining some project that uses this module. If he/she already has regular expression experience, this interface will be confusing, because your assumption of what "letter" or "endAnchor" are is completely different from what those things mean in terms of Unicode properties and the PCRE standard.

Same goes for "tld". Your module does not match TLDs. It only matches what you consider to be a TLD: exactly 3 items out of 1589 currently known TLDs, so right out of the box it has a 99.81% failure rate.
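The failure-rate figure follows directly from the two counts quoted in this comment; a quick check, using those counts as given:

```javascript
// Counts as quoted in the comment above.
const supportedTlds = 3;
const knownTlds = 1589;

// Share of known TLDs that a hard-coded list of 3 would miss.
const missRate = (1 - supportedTlds / knownTlds) * 100;

console.log(missRate.toFixed(2) + "%"); // 99.81%
```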

I'm not trying to be mean, I'm just saying that pseudo-standards or partially implemented specs are universally bad and sooner or later backfire in every project.

adesina_abdulraheem_f0ded profile image
Adesina Abdulraheem •

Wow! This is a great discovery and an insight into the developer world!

rajibola profile image
Ridwan Ajibola •

Thanks

rajibola profile image
Ridwan Ajibola •

I’d appreciate it if you could star the repository

hafiz_abdullahi_421eb0176 profile image
hafiz abdullahi •

Great work, sir.

rajibola profile image
Ridwan Ajibola •

Thanks

fstrube profile image
Franklin Strube •

This is great! I love the way you can chain operators. It's so fluent.

rajibola profile image
Ridwan Ajibola •

Thanks @fstrube! I appreciate the feedback.

mkvillalobos profile image
Manrike Villalobos Báez •

Good work!! Thanks!!

rajibola profile image
Ridwan Ajibola •

Thanks! I appreciate the feedback.

textbrew profile image
TextBrew •

Like this project!

rajibola profile image
Ridwan Ajibola •

Thanks @textbrew

casperrubaekm profile image
Casper Rubæk •

I think it is a great innovation - I hope it will be ported to C#/.net as well :)

rajibola profile image
Ridwan Ajibola •

Thanks @casperrubaekm! I’m glad you liked it.
