Originally published at: Codegram's blog
Welcome back to Understanding Regular Expressions once and for all [PART 2]. The first post covered some basic regular expressions, like literal characters, the caret (^) and the dollar sign ($). If you missed it, check it out here: Understanding Regular Expressions once and for all [PART 1]. If you are ready, let's grab our food bowl and start with some delicious regular expressions.
Well, firstly I think we'll need a new example. And as I want to stay in the food section, I think a whole recipe might do the trick. Remember, you can copy and paste it to your editor of choice and follow along.
Scrambled Tofu (15 minutes)

Easy to prepare and super delicious. A vegan scrambled eggs alternative. Give it a try!

WHAT YOU NEED:
2 tbsp oil
1 small red onion (chopped)
2 garlic cloves (minced)
400 g silken tofu
1 1/2 tbsp tamari/any other soy sauce
1 tbsp nutritional yeast
1/4 tsp curcuma
salt

OPTIONAL:
tomatoes
mushrooms
spinach

INSTRUCTIONS:
1. Heat a (non-stick) pan.
2. Add oil and onion and stir until onions turn translucent.
3. Add garlic and stir another minute.
4. Add silken tofu - you don't need to worry about the liquid.
5. Add tamari, nutritional yeast, curcuma and salt.
6. Stir until everything is combined.
7. Cook with medium heat until some liquid vaporized.
8. Serve with bread and vegan butter.

Write a comment on www.write_your_opinion.com
So, by now we should have a fair understanding of literal characters inside regular expressions. That means all of you should know that /oil/ matches every spot in our recipe where the characters o, i and l appear in exactly that order. But sometimes we don't want to be that explicit; maybe we are looking for numbers, or words. Guess what, there are special regular expressions for that. Let's start with numbers. As we've learned before, regular expressions check each character on its own. 400, for example, is a number, but it consists of three digits: 4, 0 and 0. That's why the first pattern we are looking at today isn't named after numbers but after digits: \d, for digit.
I mentioned the backslash in the first part already, but as a reminder: if the regular expression looked like this: /d/, we would match all d's in our recipe. But we are looking for digits, so we need the backslash to signal that we are going to use a kind of predefined pattern that matches all digits.
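If you want to see the difference outside an editor, here is a minimal sketch in Python, assuming its built-in re module. The slashes in /d/ and /\d/ are only delimiters in the notation used in this post, so they are dropped in the Python strings. The variable names are made up for illustration.

```python
import re

line = "1 small red onion (chopped)"

# Without the backslash, the pattern is just the literal letter d.
literal_d = re.findall(r"d", line)

# With the backslash, \d stands for any single digit.
digits = re.findall(r"\d", line)

# A number is matched digit by digit, not as a whole.
digits_in_400 = re.findall(r"\d", "400 g silken tofu")

print(literal_d)      # ['d', 'd']
print(digits)         # ['1']
print(digits_in_400)  # ['4', '0', '0']
```

Note that 400 comes back as three separate matches, one per digit.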
A bit less strict is the word character, which looks like this: \w. Using this pattern, we match all alphanumeric characters: all letters, no matter if capital or lowercase, and all digits. Underscores (_) get matched as well, for example the ones in the URL in our recipe. But no / or spaces. Actually, spaces are a common thing to look out for. They are neither digits nor word characters, and although you could write the pattern to match spaces like this: / /, it's not very easy to read. Luckily, there's a special character for it, the whitespace character: \s. It also matches tabs and line breaks, not only plain spaces. Not too difficult, right? For digits we use \d, for word characters \w and for spaces \s.
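Here is a small sketch of the word and whitespace characters in Python, again assuming the built-in re module and made-up variable names. One thing to keep in mind: in Python, \w also covers non-English letters by default, which makes no difference for our plain recipe text.

```python
import re

url = "www.write_your_opinion.com"

# \w matches one letter, digit or underscore at a time.
word_chars = re.findall(r"\w", url)

# Whatever \w skips in the URL: only the two dots.
skipped = [ch for ch in url if not re.fullmatch(r"\w", ch)]

ingredient = "1 tbsp nutritional yeast"

# \s and a literal space find the same three spaces here.
spaces = re.findall(r"\s", ingredient)
literal_spaces = re.findall(r" ", ingredient)

print("".join(word_chars))  # wwwwrite_your_opinioncom
print(skipped)              # ['.', '.']
print(len(spaces))          # 3
```

The underscores survive in the joined result, while the dots of the URL are left out.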
And just when it starts making sense, I'll create a bit of chaos. But don't worry, not for long.
Sometimes, we just want to go nuts and match everything. Literally everything. We are hungry. For that, we need a very, very important special character: the dot (.). Yep, it's a dot that gives us the power to match the whole alphabet soup, and even the spaces in between. (In most regex flavors, the one thing it skips by default is the line break.) Crazy, isn't it? But what's the chaos I mentioned? Well, there's no backslash. So if you want to literally match a dot, and only a dot, you have to escape it with a backslash. So while . matches any character, \. matches dots only.
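To make the two dots less confusing, here is a minimal sketch in Python (built-in re module, illustrative variable names) that runs both patterns against the first instruction of the recipe.

```python
import re

line = "1. Heat a (non-stick) pan."

# The bare dot matches every single character of the line, spaces included.
any_char = re.findall(r".", line)

# The escaped dot matches literal dots only.
dots = re.findall(r"\.", line)

# By default, Python's dot stops at line breaks: the "\n" is not matched.
across_lines = re.findall(r".", "a\nb")

print(len(any_char) == len(line))  # True
print(dots)                        # ['.', '.']
print(across_lines)                # ['a', 'b']
```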
I know, it was a lot to take in. So, let's review all the things we've learned. If you are in the unfortunate situation of not being able to open your editor, have a look at this short video where I apply the patterns we just learned. At first, we catch all digits. After that, all word characters and all spaces, and at the end, we use the fantastic but sometimes scary . character to match everything:
As you have seen, when we use \d to match all digits, we literally match all digits. Same with \w: we match all word characters. While that is logical and correct, it might also be a bit confusing, as it looks like the \w expression matches whole words. That's not the case: it matches each character on its own. One way to make this easier to see is to use the hat/caret character: ^. In combination with our digit expression, /^\d/, we get the following result:
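If you'd rather see it in code, here is a small sketch in Python, assuming the built-in re module and a shortened copy of the recipe. One assumption to flag: Python only treats ^ as "start of every line" when the re.MULTILINE flag is set, while an editor usually works line by line anyway.

```python
import re

# A shortened copy of the ingredient list, one item per line.
recipe = "\n".join([
    "WHAT YOU NEED:",
    "2 tbsp oil",
    "400 g silken tofu",
    "1 1/2 tbsp tamari/any other soy sauce",
    "salt",
])

# Every digit, wherever it appears.
all_digits = re.findall(r"\d", recipe)

# Only a digit that is the very first character of a line.
leading_digits = re.findall(r"^\d", recipe, flags=re.MULTILINE)

print(all_digits)      # ['2', '4', '0', '0', '1', '1', '2']
print(leading_digits)  # ['2', '4', '1']
```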
Have a look at the number 400. We only match the 4, as that's the digit at the beginning of the line and exactly what we are looking for: a single digit at the start of a line. You can try the other new expressions in combination with the hat as well: /^\s/ and even /^./. See, our expressions start to look cryptic, but you are able to read them! That's amazing! We can do more. Guess what the following regular expression might match before you enter it into your editor:
Remember, the \b character was the one for the word boundary, and we covered it in the first part.
And, did you figure out the correct answer?