
Master Regular Expression Through Real-World Examples

Andrew (@oahehc) ・4 min read
Regular expressions are a useful skill whether you're a front-end, back-end, native app, or data engineer. But you might have had the same experience as me: even after I had learned all the syntax, the sophisticated examples on StackOverflow were still hard to read, and it wasn't clear why each piece was written the way it was. In this article, I will walk through three examples in the hope of building a better understanding of regular expressions.

Agenda

  • Syntax for Module Import
  • HTML tag
  • Password

Before we start, if you're not familiar with the basic syntax of regular expressions, here is a great article you can read beforehand.

Syntax for Module Import

The first example is a regex for converting between CommonJS (require) and ES6 module (import) syntax. This pattern matches a require statement so it can be rewritten as an import.

/(?:const|var)\s(.+)\s=\srequire\((.+)\)(;?)/g
  • (?:const|var) : matches const or var. We don't need the keyword in the result, so ?: at the start makes the group non-capturing; it is not numbered and does not appear in the match result.
  • \s : a single whitespace character.
  • (.+) : the first capturing group ($1), the variable name the module is assigned to.
  • \s=\srequire : whitespace, equals sign, whitespace, and the require keyword.
  • \( : opening parenthesis (escaped with a backslash).
  • (.+) : the second capturing group ($2), the module name, including its quotes.
  • \) : closing parenthesis (escaped with a backslash).
  • (;?) : the third capturing group ($3), the semicolon. It is optional, so we add a question mark after it.

Example: https://regexr.com/52aqv
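To see the groups in action, here is a minimal sketch that feeds the pattern to String.prototype.replace. The input lines and the variable names (requirePattern, cjsSource, esmSource) are made up for this illustration.

```javascript
// Pattern from the article: $1 is the variable name, $2 the quoted module
// name, and $3 the optional semicolon.
const requirePattern = /(?:const|var)\s(.+)\s=\srequire\((.+)\)(;?)/g;

// Hypothetical input, used only for this illustration.
const cjsSource = [
  "const fs = require('fs');",
  "var path = require('path')",
].join('\n');

// $2 already contains the quotes around the module name, so it is reused as-is.
const esmSource = cjsSource.replace(requirePattern, 'import $1 from $2$3');

console.log(esmSource);
// import fs from 'fs';
// import path from 'path'
```

Because . does not match a newline, each (.+) stays on its own line, which is why the two statements are converted independently.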


If you understand all the syntax above, it should be easy to write the reverse pattern, which matches an ESM import so it can be converted to CJS (this version only handles module names in single quotes).

/import\s(.+)\sfrom\s'(.+)'(;?)/g
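The reverse direction can be sketched the same way. The sample input and the names importPattern, esmInput, and cjsOutput are assumptions for this example only.

```javascript
// Pattern from the article: $1 is the imported name, $2 the module name
// WITHOUT quotes (the quotes sit outside the group), $3 the optional semicolon.
const importPattern = /import\s(.+)\sfrom\s'(.+)'(;?)/g;

// Hypothetical input, used only for this illustration.
const esmInput = [
  "import React from 'react';",
  "import path from 'path'",
].join('\n');

// The quotes are not part of $2 here, so the replacement adds them back.
const cjsOutput = esmInput.replace(importPattern, "const $1 = require('$2')$3");

console.log(cjsOutput);
// const React = require('react');
// const path = require('path')
```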

In case you didn't know, VS Code's find and replace supports regex, so patterns like these can be used to rewrite code directly in the editor.


HTML tag

Handling HTML tags through regex is a common use case. The example below selects every HTML tag and captures its tag name.

/<\/?(?<tag>\w+)(?:\s|\n|.)*?>/g
  • <\/? : an HTML tag starts with <, optionally followed by / (for a closing tag).
  • (?<tag>\w+) : the first word after the angle bracket is the tag name. ?<name> gives the group a name. Named capture groups are not supported in IE, and Firefox only supports them from version 78.
  • (?:\s|\n|.)*? : matches whitespace (\s), a newline (\n), or any other character of the attributes (.). Because . does not match a newline, \s and \n cover tags that span several lines. The trailing *? makes the repetition lazy, so it stops at the first > and does not match across multiple tags.
  • > : the angle bracket that closes the HTML tag.

Example: https://regexr.com/52asf
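As a quick check of the pattern, the sketch below strips every tag from a string and then renames one tag based on its captured name. The pattern is written with a non-capturing group, (?:...), which matches the same text as the pattern above. The sample markup and the names tagPattern, sampleHtml, textOnly, and renamed are assumptions for this example.

```javascript
const tagPattern = /<\/?(?<tag>\w+)(?:\s|\n|.)*?>/g;

// Hypothetical markup, used only for this illustration.
const sampleHtml = '<p class="intro">Hello <b>world</b></p>';

// Remove every opening and closing tag.
const textOnly = sampleHtml.replace(tagPattern, '');
console.log(textOnly); // Hello world

// Rename <b> to <strong>; the second callback argument is the first
// capturing group, which is the tag name.
const renamed = sampleHtml.replace(tagPattern, (match, tag) =>
  tag === 'b' ? match.replace('b', 'strong') : match
);
console.log(renamed); // <p class="intro">Hello <strong>world</strong></p>
```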


The first group ($1) is the tag name, so besides removing or replacing all the HTML tags, we can also handle each tag differently based on its name.

// html is a string that contains the markup to scan
const results = html.matchAll(/<\/?(?<tag>\w+)(?:\s|\n|.)*?>/g);

// loop over all the match results
Array.from(results).forEach((res) => {
  const { tag } = res.groups;

  // handle the match based on the tag name
  // ...
});

matchAll is not supported in IE or in Safari before version 13; you can check here to find the alternative.
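One common alternative when matchAll is unavailable is a loop over RegExp.prototype.exec. The sketch below is one way to do it, not necessarily the approach the linked page describes. It uses an unnamed group for the tag name, since engines without matchAll often lack named groups too. The markup and the names htmlTagPattern, markup, and tagCounts are assumptions for this example.

```javascript
// Same idea as the pattern above, with an unnamed group for the tag name.
const htmlTagPattern = /<\/?(\w+)(?:\s|\n|.)*?>/g;

// Hypothetical markup; the second <li> tag spans two lines.
const markup = '<ul>\n  <li>one</li>\n  <li\n    class="last">two</li>\n</ul>';

const tagCounts = {};
let found;

// On a global regex, exec() resumes from lastIndex, so each call returns
// the next match and finally null when there are no more.
while ((found = htmlTagPattern.exec(markup)) !== null) {
  const name = found[1]; // first capturing group: the tag name
  tagCounts[name] = (tagCounts[name] || 0) + 1;
}

console.log(tagCounts); // { ul: 2, li: 4 }
```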

Password

Password validation is another good use case for regex. Suppose we want to validate a password against the criteria below.

  • at least one digit
  • at least one lowercase letter
  • at least one uppercase letter
  • at least one special character
  • length between 8 and 20 characters
  • must not include unsupported characters
  • must not include whitespace

Then we can use this regex to do the trick.

/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[\W|_])(?!.*[\\|\/|\s]).{8,20}$/
  • ^ : beginning
  • (?=.*\d) : (?=) is a positive lookahead. It works like a pipeline: if the target string meets the condition, validation passes on to the next part (to the right). .*\d means the string must contain a digit.
  • (?=.*[a-z]) : must contain a lowercase letter.
  • (?=.*[A-Z]) : must contain an uppercase letter.
  • (?=.*[\W|_]) : must contain a special character, that is, a non-word character (\W) or an underscore. Inside a character class, | is a literal pipe rather than alternation; it is redundant here because \W already covers it.
  • (?!.*[\\|\/|\s]) : (?!) is a negative lookahead. It is similar to a positive lookahead but only passes when the target string does NOT match the condition. Here we list the characters that are not allowed in the password: backslash (\\), forward slash (\/), and whitespace (\s). Because | is literal inside a character class, the pipe character is rejected as well.
  • .{8,20} : Lookaheads only assert a condition and do not consume any characters, so none of the rules above actually match text. If the string passes all of them, this is the part that does the real match: any characters, with a total length between 8 and 20.
  • $ : ending
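To make the "lookaheads don't consume anything" point concrete, the sketch below first shows that a lone lookahead produces an empty match, and then splits the pattern into one regex per rule so we can see which rules a password fails. Splitting the rules this way is an illustration, not part of the original pattern; the names digitLookahead, passwordRules, and failedRules are hypothetical.

```javascript
// A lookahead on its own succeeds without consuming any characters.
const digitLookahead = /^(?=.*\d)/;
const lookaheadResult = digitLookahead.exec('abc123');
console.log(lookaheadResult[0] === '', lookaheadResult.index); // true 0

// Each rule of the password pattern as a separate regex (hypothetical split).
const passwordRules = {
  digit: /^(?=.*\d)/,
  lowercase: /^(?=.*[a-z])/,
  uppercase: /^(?=.*[A-Z])/,
  special: /^(?=.*[\W|_])/,
  allowedCharsOnly: /^(?!.*[\\|\/|\s])/,
  length: /^.{8,20}$/,
};

// Returns the names of the rules that the password does not satisfy.
function failedRules(password) {
  return Object.keys(passwordRules).filter(
    (name) => !passwordRules[name].test(password)
  );
}

console.log(failedRules('password'));  // [ 'digit', 'uppercase', 'special' ]
console.log(failedRules('Passw0rd!')); // []
```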

Example: https://regexr.com/52b15
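A small sketch that runs the full pattern against a few sample passwords. The sample values and the names passwordPattern and passwordSamples are made up for this example.

```javascript
const passwordPattern =
  /^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[\W|_])(?!.*[\\|\/|\s]).{8,20}$/;

// Hypothetical samples mapped to the expected validation result.
const passwordSamples = {
  'Passw0rd!': true,   // meets every rule
  'passw0rd!': false,  // no uppercase letter
  'Password!': false,  // no digit
  'Passw0rdA': false,  // no special character
  'Pw0!': false,       // shorter than 8 characters
  'Pass w0rd!': false, // contains whitespace
  'Pass/w0rd!': false, // contains a forward slash
  'Pass|w0rd!': false, // contains a pipe, which is literal inside [...]
};

for (const [password, expected] of Object.entries(passwordSamples)) {
  const actual = passwordPattern.test(password);
  console.log(JSON.stringify(password), actual, actual === expected ? 'ok' : 'MISMATCH');
}
```

The pattern has no g flag, so test() is stateless and can be called repeatedly without resetting lastIndex.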


Conclusion

That's all. Thanks for reading. I hope this article helps you become more familiar with regular expressions.

If you have good examples of your own, or suggestions about the examples in this article, please feel free to leave a comment.
