bob.ts


Regular Expressions And Template Literals

Setup

Somewhere along the line, I heard a comment about template literals being a great tool for making regular expressions a bit easier to read. I started this article with the idea that I wanted to see if that was true and come up with some examples of this type of use.

Given the glimmer of an idea, I started a new project. This is an exercise ... plain and simple. This pattern "could" be used in a production environment, but I am in no way recommending that.

There are probably some vetted tools out there that can do this for the front-end. Please list some of these in the comments, if you know of them; if only for the sake of my readers.

Previous Work With Regular Expressions

Having worked on a project for a client where I had to recreate a script parser and engine for a 30-year-old, mainframe-driven client language, I had a lot of respect for Regular Expressions. I learned a lot (translate that into ... a lot of poor code was written and refactored). After two major refactors, I had a working set of code ... and HUNDREDS of Regular Expressions to make things work.

I used every trick I knew to make the Parser Regular Expression Service more readable. I abstracted and combined all sorts of interesting patterns, knowing that someday this code would be managed by someone else.

Having struggled with this, using Template Literals this way sounded very efficient and clean. Certainly, something that deserved some research.

What I Want To Do ...

First, I found a regular expression; something like this. I want to take this ...

Matches text avoiding additional spaces

// ^[\s]*(.*?)[\s]*$

And, generate it from something more legible, like this ...

const code0001 = `
  /* Matches text avoiding additional spaces
  */
  ^       // Beginning of line
  [\\s]*  // Zero or more whitespace
  (.*?)   // Any characters, zero to unlimited,
          //   lazy (as few times as possible, expanding as needed)
  [\\s]*  // Zero or more whitespace
  $       // End of line
`;

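NOTE here that the \s still needs to be escaped as \\s. A template literal is still a string, so its backslash escapes are processed before the RegExp constructor ever sees the pattern ... seems odd, but there it is.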
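One way around the doubling, shown here only as a sketch and not as the approach used in the rest of this article, is the built-in `String.raw` tag, which leaves backslashes exactly as typed. The variable names below are made up for illustration.

```javascript
// In an ordinary template literal, backslashes must be doubled.
const escapedPattern = `^[\\s]*(.*?)[\\s]*$`;

// String.raw keeps backslashes exactly as typed, so no doubling is needed.
const rawPattern = String.raw`^[\s]*(.*?)[\s]*$`;

console.log(escapedPattern === rawPattern); // true
console.log(new RegExp(rawPattern).test('  padded text  ')); // true
```

Backticks and `${` inside the pattern still need care with `String.raw`, so it reduces the escaping rather than removing it entirely.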

Beginning

First, I needed to get rid of comments ...

// Borrowed function (stripComments uses the regex from
// ... https://stackoverflow.com/a/47312708)
function stripComments(stringLiteral) {
  return stringLiteral
    // '$1' restores the character the regex captures just before "//"
    .replace(/\/\*[\s\S]*?\*\/|([^:]|^)\/\/.*$/gm, '$1');
}
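One subtlety is worth a sketch: the `([^:]|^)` group consumes the character just before `//` (it is there so that `://` in a URL is not mistaken for a comment). If the replacement is an empty string, that character disappears along with the comment; replacing with `'$1'` puts it back. The function name and sample pattern below are hypothetical, for illustration only.

```javascript
const commentRegex = /\/\*[\s\S]*?\*\/|([^:]|^)\/\/.*$/gm;

// Hypothetical variant: '$1' restores the character captured before "//".
function stripCommentsKeepingPrefix(stringLiteral) {
  return stringLiteral.replace(commentRegex, '$1');
}

const sample = `
  ^abc// comment placed directly after a pattern character
  $   // ordinary trailing comment
`;

// Empty replacement: the "c" before "//" is removed with the comment.
const lossy = sample.replace(commentRegex, '').replace(/\s+/g, '');
// '$1' replacement: the "c" survives.
const kept = stripCommentsKeepingPrefix(sample).replace(/\s+/g, '');

console.log(lossy); // ^ab$
console.log(kept);  // ^abc$
```

When every comment is preceded by whitespace, as in the examples in this article, both replacements produce the same final pattern.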

The function above takes the template string and essentially reduces it to ...

"

  ^    
  [\s]*
  (.*?)
  [\s]*
  $    
"

Basically, I now need to get rid of line breaks and spaces (yes, I know there can be a space in a regex pattern, but I'm choosing to ignore that for simplicity's sake in this exercise). To remove the unneeded characters ...

// Starting Demo Code Here
function createRegex(stringLiteral) {
  return stripComments(stringLiteral)
    // \s already covers \r and \n, so one class removes all whitespace
    .replace(/\s+/g, '');
}

Which then gives me the ability to do this ...

const code0001regex = new RegExp(createRegex(code0001));

//           ORIGINAL FROM ABOVE: /^[\s]*(.*?)[\s]*$/
// GENERATED code0001regex value: /^[\s]*(.*?)[\s]*$/

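As a quick check that the generated expression behaves as expected, the sketch below repeats the two helpers in compact form (with `'$1'` and `/\s+/g` as the replacements) so it runs on its own, then tries the result on a padded string. The sample inputs and the `twoWords` pattern are assumptions added for illustration: because every whitespace character is stripped, a literal space has to be written as an escape such as `\x20`.

```javascript
// Compact copies of the helpers above, so this snippet runs on its own.
function stripComments(stringLiteral) {
  return stringLiteral
    .replace(/\/\*[\s\S]*?\*\/|([^:]|^)\/\/.*$/gm, '$1');
}

function createRegex(stringLiteral) {
  return stripComments(stringLiteral).replace(/\s+/g, '');
}

const trimPattern = `
  ^       // Beginning of line
  [\\s]*  // Zero or more whitespace
  (.*?)   // Any characters, lazy
  [\\s]*  // Zero or more whitespace
  $       // End of line
`;

const generated = new RegExp(createRegex(trimPattern));
console.log(generated.source);                     // ^[\s]*(.*?)[\s]*$
console.log(generated.exec('  hello world  ')[1]); // hello world

// A literal space must be escaped, because plain spaces are stripped.
const twoWords = new RegExp(createRegex(`
  ^\\w+   // First word
  \\x20   // Literal space, written as a hex escape
  \\w+$   // Second word
`));
console.log(twoWords.test('hello world')); // true
```

Flags are not part of the template; they can be passed as the second argument to `new RegExp(...)` as usual.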

Let's Take A Look ...

The code0001 I defined above has been reworked for legibility (it is now much easier to home in on what this regex pattern is going to do) ...

// /^[\s]*(.*?)[\s]*$/
const code0001 = `
  ^       // Beginning of line
  [\\s]*  // Zero or more whitespace

  (.*?)   // Any characters, zero to unlimited,
          //  lazy (as few times as possible, expanding as needed)

  [\\s]*  // Zero or more whitespace
  $       // End of line
`;

code0002
Matches any valid HTML tag and the corresponding closing tag ... here, I've tried to show a bit more advanced indenting (both in the code and in the supporting comments).

// <([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)
const code0002 = `
  <               // Literal
  ([a-z]+)        // Group: First Tag (one or more)
  (               // Group
    [^<]+           // Match (one or more) NOT <
  )*              // Group-END: Zero or more times
  (?:             // Group-NON-CAPTURE
    >               // Literal
    (.*)<\\/\\1>    // Up to and including SLASH and First Tag group above
    |\\s+\\/>       // OR spaces and close tag
  )               // Group-END
`;

code0003
Matches any valid hex color inside text.

// \B#(?:[a-fA-F0-9]{6}|[a-fA-F0-9]{3})\b
const code0003 = `
  \\B#              // Non-word boundary, Literal #
  (?:               // Group-NON-CAPTURE
    [a-fA-F0-9]{6}    // 1st alternative
    |[a-fA-F0-9]{3}   // 2nd alternative
  )                 // Group-END
  \\b               // Word boundary
`;

code0004
Matches any valid email inside text.

// \b[\w.!#$%&'*+\/=?^`{|}~-]+@[\w-]+(?:\.[\w-]+)*\b
const code0004 = `
  \\b                           // Word boundary
  [\\w.!#$%&'*+\\/=?^\`{|}~-]+  // Character in this list (and word), one to unlimited
  @                             // Literal
  [\\w-]+                       // One to unlimited word and character "-"
  (?:                           // Group-NON-CAPTURE
    \\.[\\w-]+                    // Literal ".", one to unlimited word and character "-"
  )*                            // Group-END (zero or more)
  \\b                           // Word boundary
`;

code0005
Strong password: Minimum length of 6, at least one uppercase letter, at least one lowercase letter, at least one number, at least one special character.

// (?=^.{6,}$)((?=.*\w)(?=.*[A-Z])(?=.*[a-z])
// ... (?=.*[0-9])(?=.*[|!"$%&\/\(\)\?\^\'\\\+\-\*]))^.*
const code0005 = `
  (?=           // Group-POSITIVE-LOOKAHEAD
    ^             // BOL
    .{6,}         // Six or more characters (except line terminators)
    $             // EOL
  )             // Group-POSITIVE-LOOKAHEAD-END
  (             // Group
    (?=.*\\w)     // Group-POSITIVE-LOOKAHEAD
                  // Any Characters, zero to unlimited
                  // Any word character

    (?=.*[A-Z])   // Group-POSITIVE-LOOKAHEAD
                  // Any Characters, zero to unlimited
                  // Any Character (A-Z)

    (?=.*[a-z])   // Group-POSITIVE-LOOKAHEAD
                  // Any Characters, zero to unlimited
                  // Any Character (a-z)

    (?=.*[0-9])   // Group-POSITIVE-LOOKAHEAD
                  // Any Characters, zero to unlimited
                  // Any Character (0-9)

    (?=.*[|!"$%&\\/\\(\\)\\?\\^\\'\\\\\\+\\-\\*])
                  // Group-POSITIVE-LOOKAHEAD
                  // Any Characters, zero to unlimited
                  // Any Character in the list
  )             // Group-END
  ^             // BOL
  .*            // Match Any Characters, zero to unlimited
`;

code0006
SSN — Social Security Number (simple)

// ^((?<area>[\d]{3})[-][\d]{2}[-][\d]{4})$
const code0006 = `
  ^                   // BOL
  (                   // Group
    (?<area>            // Group-NAMED area
      [\\d]{3}            // 3-Digits
    )                   // Group-NAMED-END
    [-]                 // Literal, Dash
    [\\d]{2}            // 2-Digits
    [-]                 // Literal, Dash
    [\\d]{4}            // 4-Digits
  )                   // Group-END
  $                   // EOL
`;
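The named group is what makes this pattern convenient to consume: a successful match exposes it as `groups.area`. A minimal sketch, using the hand-written form of the expression and a made-up sample value:

```javascript
// Hand-written equivalent of code0006.
const ssnRegex = /^((?<area>[\d]{3})[-][\d]{2}[-][\d]{4})$/;

// "123-45-6789" is a made-up sample value, not a real SSN.
const match = ssnRegex.exec('123-45-6789');

console.log(match.groups.area);          // 123
console.log(ssnRegex.test('123456789')); // false (dashes are required)
```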

Conclusions

This whole article is a different take on generating Regular Expressions using JavaScript's template literals. This was an experiment. A successful one, I believe.

This exercise also points out that writing tests against the regex can become much easier as the pattern becomes more understandable.
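As a sketch of what such tests could look like, here is a small table-driven check. The patterns are the hand-written forms of code0003 and code0006; the sample inputs are assumptions chosen for illustration.

```javascript
// Hand-written forms of two patterns from this article.
const hexColor = /\B#(?:[a-fA-F0-9]{6}|[a-fA-F0-9]{3})\b/;
const ssn = /^((?<area>[\d]{3})[-][\d]{2}[-][\d]{4})$/;

// Each case: [regex, input, expected result of regex.test(input)].
const cases = [
  [hexColor, 'color: #ff00aa;', true],
  [hexColor, 'color: #fff;', true],
  [hexColor, 'color: #ggg;', false],
  [ssn, '123-45-6789', true],
  [ssn, '123-456-789', false],
];

// Collect every case whose actual result differs from the expectation.
const failures = cases.filter(
  ([regex, input, expected]) => regex.test(input) !== expected
);

console.log(failures.length === 0 ? 'all cases pass' : failures);
```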

The regex generated here is much easier to read and reason about, which was the goal. This is a pattern I could get behind if there was a need for a number of regex templates within a project.

Discussion (6)

Rémy 🤖

That's an interesting take, although I wonder how you could make comments more constructive. Right now you're simply describing things out loud, maybe there is a smarter story to tell?

Also I would be interested to get your opinion on this other take that I'm currently working on. Could also lead to something going in your direction.

Xowap / nsre

Non-String Regular Expressions


Regular expressions are used to match strings of characters, however the concept can be applied to anything else. This engine allows you to match any list of any type of objects using the same kind of constructs that regular expressions allow.

The algorithm is (far away but) based on this article by Russ Cox, i.e. it uses the Thompson NFA algorithm (because it's apparently more efficient but mostly because it's the first explanation of a RE engine that I understood).

However the package doesn't support (yet?) the regular expression syntax that everybody is used to (because it allows to do different things).

Note — The current implementation is a pile of crap because I have no idea what I'm doing

Installation

pip install nsre

Then from your project you can

from nsre import *

Concept demo

By example, suppose that you have a list of dictionaries with a…

bob.ts Author

Thanks for the comments.

I'll take a second look at my comments in the code. This was simply an experiment within JavaScript ... I'm not sure I'd be much help with your project. In fact, I'm pretty sure your project is out of my league.

Rémy 🤖

I think you are undervaluing what you did here; regex documentation and comprehension are a major issue in everyday developer life. That's a great insight you had and I think you can push this much further :)

srobfr

Nice trick.
PHP has a dedicated modifier for this kind of regex usage. See php.net/manual/en/reference.pcre.p...

Just a remark though: trying to parse HTML tags using regex is a bad idea (mandatory reading: stackoverflow.com/questions/173234...).

bob.ts Author

I know the idea is bad; as I said, this is a code example (simply research) and I needed something to work with.

Thanks for the comments!

Chad Windham

WOW this was really cool! Thanks so much for sharing, I'd never heard/thought of using template literals for regex but getting to see it laid out like that makes a lot of sense!