DEV Community

Alex Hebert


RegEx: Decoding the Symbols

My first encounter with regular expressions was in a solution for a coding problem on Codewars, specifically Count the smiley faces!
The problem asks you to count the number of smiley faces in a given array. The catch is that each smiley can have one of two kinds of eyes (: or ;), may or may not have a nose represented by one of two symbols (- or ~), and has one of two symbols for the smile, D or ). In total, there are 12 different combinations that make a valid smiley. I, being the young and naive programmer I was, hard-coded all 12 of those faces and checked every element against that list. And it worked! But the top solution included some strange syntax that I had never seen before:

function countSmileys(arr) {
  return arr.filter(x => /^[:;][-~]?[)D]$/.test(x)).length;
}

What in the world is /^[:;][-~]?[)D]$/? How does this jumble of characters do anything to filter the array? After looking at it for a little while, I noticed some things that made sense. All the symbols that can make up a face are there, grouped together by the facial feature they represent. But as for how all the other symbols played into filtering out the non-faces, I didn't have a clue.

After this brief run-in, I saw them a few more times in other solutions, but I still couldn't understand them. I did figure out that these were called regular expressions, or regex. Researching did little to help me read them, since a lot of the symbols and explanations went right over my head. But I did get a better understanding of what regular expressions could do from MDN: "Regular expressions are patterns used to match character combinations in strings." So that's what the solution for Count the Smiley Faces was doing: it was checking each string against a pattern that describes a valid smiley.

Now I understood that regex is a tool I can use to find patterns in strings, and that JavaScript lets me do various things with them, like testing whether the pattern exists in a string or getting an array of every time that pattern appears in a string. I still didn't know how to write regular expressions, but I could identify when I might want to use them. It took a little while, but that time came in the form of rewriting JSON.parse. I figured parsing a JSON string would be a great use for regular expressions, since I could look for patterns like open and close brackets for arrays or curly braces for objects.

While I was unsuccessful in parsing JSON with regular expressions (nested arrays and objects with multiple entries put the nail in the coffin there), I did finally learn how to write regular expressions, and I wrote these three monsters to parse out object keys: /(?<!(: ?)|["\w]|(: ?{))(".*?")(?=:)(?!,|})/g, /(?<= |:)(\[.*?])|(?<=: ?)([^\s]*?}*)(?=,|( *}))/g, and /({.*?})|(\[.*?]+)|((?<= ?).*?(?=,| ))|((?<=, ?)[^\s]*)/g. I also learned that regular expressions are intimidating at a glance but are actually not that hard to read and understand.

Basic Syntax

Let’s take a look at some of the basic syntax for regular expressions.

Creating a Regular Expression

In JavaScript, regular expressions are actually objects, a complex data type like arrays, and there are two ways to create a new one. The most common way I've seen is the literal syntax: wrap your pattern in forward slashes and add any optional flags at the end, like so: /pattern/flags. You can also use the new keyword with the RegExp constructor, like so: new RegExp("pattern", "flags"). Either way, you can save them in variables and use regular expression methods like .test().
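Here is a minimal sketch that builds the same pattern both ways and confirms the two forms behave alike. The variable names are made up for illustration. One detail is worth knowing: the constructor takes a string, so any backslash in the pattern has to be doubled.

```javascript
// The same pattern ("cat") and flag ("i"), created both ways.
const literal = /cat/i;
const constructed = new RegExp("cat", "i");

// Both are RegExp objects with the same methods, such as .test().
console.log(literal instanceof RegExp, constructed instanceof RegExp); // true true
console.log(literal.test("Concatenate"), constructed.test("Concatenate")); // true true
console.log(literal.source === constructed.source); // true
console.log(literal.flags === constructed.flags);   // true

// The constructor takes a string, so a backslash must be written as "\\".
const digits = new RegExp("\\d+");
console.log(digits.source);         // \d+
console.log(digits.test("abc123")); // true
```

The literal form is usually easier to read. The constructor is handy when the pattern is assembled from strings at runtime.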

Composing the Pattern

Writing a pattern involves a combination of simple characters and special characters.

Simple Characters

Simple characters are the literal characters you see in the string. For example, /a/ will find the first lowercase a in a string, whereas /cat/ will find the first time the letters c-a-t appear together in the string.

NOTE: it's important to keep in mind that the regular expression above is not looking for the word “cat” explicitly, just the letters c-a-t. So a word like “catastrophe” or “educate” will pass a test with that regular expression.
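A quick way to convince yourself of the note above is to run /cat/ against a few strings. This is a small sketch. The test strings are arbitrary examples, and .exec() is used only to show where the first match of /a/ starts.

```javascript
const catPattern = /cat/;

// /cat/ looks for the letters c-a-t in a row, not for the word "cat".
console.log(catPattern.test("cat"));         // true
console.log(catPattern.test("catastrophe")); // true
console.log(catPattern.test("educate"));     // true
console.log(catPattern.test("Cat"));         // false: matching is case sensitive by default
console.log(catPattern.test("c-a-t"));       // false: the letters must be adjacent

// .exec() reports the index where the first match starts.
console.log(/a/.exec("banana").index);       // 1
```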

Special Characters

Regular expressions also use many special characters to represent multiple characters, group characters together and modify the pattern in some way.

Character Classes

Regular expressions have many character classes that can represent a group of characters with a single symbol. Each of the backslash classes has a negated form written with the capital letter (\W, \D, \S), which matches any character the lowercase version does not. The character classes are:

  • \w => word character, includes uppercase and lowercase letters (a-z, A-Z), the digits 0-9, and the underscore (_)
  • \d => digits, includes numbers 0-9
  • \s => white space character, includes space, tab, and any other Unicode white space character (like \n)
  • . => any single character except line terminators such as \n (it has no capitalized negated form)
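To see what each class picks up, here is a sketch against one made-up string. It uses the g flag (covered under Flags below) so that .match() returns every match rather than only the first.

```javascript
// A made-up sample mixing letters, digits, an underscore, a space, a tab, and "!".
const sample = "Room 42_b\tready!";

console.log(sample.match(/\w/g).join("")); // Room42_bready: letters, digits, and "_"
console.log(sample.match(/\d/g).join("")); // 42
console.log(sample.match(/\s/g).length);   // 2: the space and the tab
console.log(sample.match(/\W/g).length);   // 3: the space, the tab, and "!" (negated \w)

// "." matches any single character except a line terminator such as "\n".
console.log(/a.c/.test("abc"));  // true
console.log(/a.c/.test("a c"));  // true
console.log(/a.c/.test("a\nc")); // false
```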

Custom character classes can be made with square brackets. A custom character class matches exactly one of the characters it contains. For example, [a4t] will accept an “a”, “4”, or “t” at that point in the pattern. Looking back at the regex in the solution for Count the Smiley Faces, /^[:;][-~]?[)D]$/, we can find three custom character classes, [:;], [-~], and [)D], which represent the eyes, nose, and mouth of the smiley faces. But if you remember from the prompt, the nose may or may not be included. That is where quantifiers come into play.
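The sketch below tests the [a4t] example and then pulls the three classes out of the smiley regex as separate patterns. The variable names (eyes, nose, mouth) are just labels chosen for illustration.

```javascript
// [a4t] matches one character that is "a", "4", or "t".
const a4t = /[a4t]/;
console.log(a4t.test("4"));   // true
console.log(a4t.test("b"));   // false
console.log(a4t.test("bat")); // true: "a" (or "t") somewhere in the string is enough

// The three classes from the smiley regex, one per facial feature.
const eyes = /[:;]/;
const nose = /[-~]/;  // a "-" placed first in the brackets is a literal hyphen
const mouth = /[)D]/; // ")" needs no escaping inside brackets
console.log(eyes.test(";"), nose.test("~"), mouth.test("D")); // true true true
console.log(mouth.test("(")); // false
```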

Quantifiers

Quantifiers are added after a character, character class, or group to set how many times the regex will look for it in the string to match the pattern. Like character classes, you can write your own custom quantifiers, but there are shorthands for some of the most common ones.

  • * => 0 or more times
  • + => 1 or more times
  • ? => 0 or 1 times

Custom quantifiers use curly braces and take three different forms.

  • {n} => exactly n times
  • {n,m} => n to m times
  • {n,} => n or more times.
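Here is a sketch exercising each quantifier from the lists above. Every pattern is wrapped in ^ and $ (explained under Anchors below) so that the whole string must fit the quantifier. Without them, a match anywhere inside the string would count.

```javascript
console.log(/^ha*$/.test("h"));          // true:  * allows zero "a"s
console.log(/^ha+$/.test("h"));          // false: + needs at least one "a"
console.log(/^ha+$/.test("haaa"));       // true
console.log(/^colou?r$/.test("color"));  // true:  ? makes the "u" optional
console.log(/^colou?r$/.test("colour")); // true
console.log(/^\d{3}$/.test("123"));      // true:  exactly three digits
console.log(/^\d{3}$/.test("1234"));     // false
console.log(/^\d{2,4}$/.test("1234"));   // true:  two to four digits
console.log(/^\d{2,}$/.test("1"));       // false: at least two digits
```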

Back to the solution for Count the Smiley Faces: /^[:;][-~]?[)D]$/ includes a ? after the character class for the nose symbols, which says a valid face has either no nose or exactly one nose.
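To check the "12 combinations" claim from the intro against the regex, this sketch generates every eye, nose, and mouth combination (with "" standing for "no nose") and tests each one. It also tries two faces that the ? quantifier or the nose class should reject.

```javascript
const smiley = /^[:;][-~]?[)D]$/;

// 2 eyes x 3 nose options (none, "-", "~") x 2 mouths = 12 faces.
const faces = [];
for (const eye of [":", ";"]) {
  for (const nose of ["", "-", "~"]) {
    for (const mouth of [")", "D"]) {
      faces.push(eye + nose + mouth);
    }
  }
}

console.log(faces.length);                        // 12
console.log(faces.every((f) => smiley.test(f)));  // true
console.log(smiley.test(":--)")); // false: ? allows at most one nose
console.log(smiley.test(":*)"));  // false: "*" is not in the nose class
```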

Anchors

Anchors perform a little differently from other parts of the pattern because they match a position within the string rather than a character. Anchors can be used to ensure the pattern occurs at a certain location in the string.

  • ^ => beginning of the string
  • $ => end of the string
  • \b => word boundary, the position between a word character and a non-word character, or between a word character and the beginning or end of the string.
  • \B => opposite of \b
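As a small sketch of the four anchors, these lines reuse the “cat” example from the NOTE earlier. They show that \b is one way to look for “cat” as a whole word instead of just the letters c-a-t.

```javascript
console.log(/^cat/.test("catalog"));        // true:  starts with "cat"
console.log(/^cat/.test("bobcat"));         // false
console.log(/cat$/.test("bobcat"));         // true:  ends with "cat"
console.log(/\bcat\b/.test("the cat sat")); // true:  "cat" as a whole word
console.log(/\bcat\b/.test("educate"));     // false: no word boundary around "cat"
console.log(/\Bcat\B/.test("educate"));     // true:  "cat" sits inside a word
```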

The solution we have been breaking down (/^[:;][-~]?[)D]$/) includes both the ^ and $ anchors to ensure nothing comes before or after the smiley face in the string.
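To see what the anchors contribute, this sketch counts faces with and without ^ and $ using the same filter-and-count approach as countSmileys. The helper countWith and the input array are hypothetical, made up for this comparison.

```javascript
const anchored = /^[:;][-~]?[)D]$/;
const unanchored = /[:;][-~]?[)D]/; // same pattern with ^ and $ removed

// Hypothetical helper: countSmileys, but with the regex passed in.
const countWith = (re, arr) => arr.filter((x) => re.test(x)).length;

const inputs = [":)", ";~D", "hi :)", ":)x", ":("];
console.log(countWith(anchored, inputs));   // 2: only ":)" and ";~D"
console.log(countWith(unanchored, inputs)); // 4: "hi :)" and ":)x" merely contain a face
```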

Capture Groups

Regex allows you to group part of the pattern together so that a quantifier applies to the whole group. Simply placing parentheses around part of the pattern groups it together, and by default the group also captures the text it matched. For example, if you wanted to extract my first name, plus my last name if it's there, you could write a regex like this: /Alex( Hebert)?/. In this example, “ Hebert” is in a capturing group, so the ? quantifier applies to that whole group.
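The sketch below runs /Alex( Hebert)?/ through .exec(), which returns the whole match at index 0 and each capture group after it. It then compares a grouped quantifier with an ungrouped one. The sample strings and the variable name namePattern are invented for the example.

```javascript
const namePattern = /Alex( Hebert)?/;

const full = namePattern.exec("Written by Alex Hebert");
console.log(full[0]); // Alex Hebert: the whole match
console.log(full[1]); // " Hebert": the text captured by the group

const firstOnly = namePattern.exec("Hi, Alex!");
console.log(firstOnly[0]); // Alex
console.log(firstOnly[1]); // undefined: the optional group did not match

// Grouping changes what a quantifier repeats
// (without the parentheses, /Alex Hebert?/ would make only the "t" optional).
console.log(/^(ha)+$/.test("hahaha")); // true:  + repeats the group "ha"
console.log(/^ha+$/.test("hahaha"));   // false: + repeats only the "a"
```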

Changing Settings with Flags

The other part of a regular expression is its flags. Flags are added to the end of a regular expression and can change how certain special characters work, how the entire pattern is read, or where in the string the pattern is searched for. Flags are powerful but ultimately optional, and many regular expressions use none at all. Some of the most common flags are:

  • g => global flag, searches the string for all matches - rather than just the first.
  • i => case insensitive - all letters are matched without regard to case
  • s => “dotall” mode, the “.” character includes all characters including “\n”

There are other flags, like y, u, and m, but I find that they are less common (and more complicated) than the g, i, and s flags.
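Here is a sketch of the g, i, and s flags on one made-up string. JSON.stringify is used only to print the arrays in a compact form. The last two lines show a real gotcha to be aware of: a regex with the g flag keeps a lastIndex between calls, so reusing it with .test() can give different answers for the same input.

```javascript
const text = "Cat cat\nCAT";

console.log(JSON.stringify(text.match(/cat/g)));  // ["cat"]: every match, still case sensitive
console.log(JSON.stringify(text.match(/cat/gi))); // ["Cat","cat","CAT"]
console.log(/cat.CAT/.test(text));  // false: "." does not match "\n" by default
console.log(/cat.CAT/s.test(text)); // true:  with s, "." matches "\n" too

// Caution: with g, .test() resumes from lastIndex on the next call.
const globalCat = /cat/g;
console.log(globalCat.test("cat")); // true  (lastIndex is now 3)
console.log(globalCat.test("cat")); // false (search starts at index 3, fails, lastIndex resets to 0)
```

This is one reason the countSmileys solution uses a pattern without the g flag: .test() only needs to know whether a match exists.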

Further Readings

I used a number of different resources to learn regular expressions, and there is a lot that I didn't cover here that can really power up your regex, like lookarounds and backreferences to capture groups. I recommend taking a look at the following resources to really dive into the world of regex.

  • RegExr.com => Lets you visualize and break down your regular expression against a sample string.
  • MDN => JavaScript official docs on Regular Expressions.
  • RegExp.info => A great resource to learn everything about regex across languages.
