Intro
I was recently doing a code challenge for a job interview that required me to strip out all nonalphabetic characters. "Ah! I should use Regular Expressions for this!" I thought in triumph, impressed that I even knew what regular expressions were. That fleeting moment of glory faded once I decided to brush up on regular expressions and landed on the encouragingly-named Regular Expressions Cheatsheet. I had no idea how to use it!
If you, kind reader, are shaking your head in pity, this article is not for you. Go away. For the rest of us, here is a Cheatsheet for the Regular Expressions Cheatsheet, Part 1: Anchors. If people like this, I'll follow it up with editions for the rest of the categories.
"Anchors Edition"? Huh?
Ok, so the cheat sheet has eleven categories. I could barely get through the first one, which is Anchors, so I'm restricting this blog post to Anchors. The sad thing is that I could only figure out the first five Anchors of the total eight that are listed. Maybe some kind reader will illuminate me on how those other three bastards work, since my googling didn't get me there.
What are "Anchors", anyways?
Unlike other regular expression tokens, Anchors don't match actual characters. Anchors match a position before, after, or between characters. You'll see what I mean once you see an example.
To demonstrate the following regular expressions, I'm going to use the match() method, which retrieves the result of matching a string against a regular expression.
Anatomy of a regular expression
- Forward slashes go on either end, like so: /something/
- Add g for "global" at the end to find every instance, like so: /something/g
- Add m for "multi line" to match the beginning/end of each line, not just the beginning/end of the whole string, like /something/m or /something/gm
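To make the flags a bit more concrete, here's a minimal sketch of how g changes what match() gives back. The sample text and variable names are made up for illustration.

```javascript
// Hypothetical sample text, chosen only to show the effect of the g flag.
const text = "something old, something new";

// Without g, match() returns only the first match, plus details such as its index.
const first = text.match(/something/);
console.log(first[0], first.index); // something 0

// With g, match() returns every match as a plain array of strings.
const all = text.match(/something/g);
console.log(all); // [ 'something', 'something' ]

// Flags can be combined, and the regex reports which ones it carries.
const combined = /something/gm.flags;
console.log(combined); // gm
```

This is also why some of the outputs further down are a plain list and others include index and input: it depends on whether g is present.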
Anchors
^
Start of string, or start of line in multi-line pattern
- ^ is used in /^The/ to find the "The" at the start of the following: The lion roared - Example on regex101.com
- Example in Javascript:
let sentence = "The lion roared";
let regex = /^The/g;
let found = sentence.match(regex);
console.log(found) // [ 'The' ]
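The "start of line in multi-line pattern" part is easier to see with a string that has more than one line. Here's a small sketch; the two-line string is my own invention for the demo.

```javascript
// Hypothetical two-line string; \n is the line break.
const lines = "The lion roared\nThe mouse squeaked";

// Without m, ^ only matches at the very start of the string.
const startOfString = lines.match(/^The/g);
console.log(startOfString); // [ 'The' ]

// With m, ^ also matches right after each line break.
const startOfEachLine = lines.match(/^The/gm);
console.log(startOfEachLine); // [ 'The', 'The' ]
```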
\A
Start of string
- \A is used in /\A/ to find where the string starts (where the pipe is): |The lion roared - Example on regex101.com
- Example in Javascript:
// This doesn't work in Javascript :(
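As far as I can tell, the closest thing in Javascript is ^ with the m flag left off, since then it only matches at the very start of the string. A rough sketch, with a made-up two-line string:

```javascript
// Hypothetical two-line string for the demo.
const twoLines = "The lion roared\nThe mouse squeaked";

// Without the u flag, Javascript reads \A as a plain capital "A",
// and this string has none, so nothing is found.
const literalA = twoLines.match(/\A/);
console.log(literalA); // null

// ^ without the m flag matches only at the start of the whole string,
// so the "The" on the second line is ignored.
const startOnly = twoLines.match(/^The/g);
console.log(startOnly); // [ 'The' ]
```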
$
End of string, or end of line in multi-line pattern
- $ is used in /$/ to find the end of a string (where the pipe is): The lion roared| - Example on regex101.com
- Example in Javascript:
let sentence = "The lion roared";
let regex = /$/;
let found = sentence.match(regex);
console.log(found);
// [ '', index: 15, input: 'The lion roared', groups: undefined ]
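Matching an empty position isn't very exciting on its own, so here's a sketch that pairs $ with \w+ (one or more word characters) to grab the last word, with and without the m flag. The two-line string is just an assumption for the demo.

```javascript
// Hypothetical two-line string; \n is the line break.
const lines = "The lion roared\nThe mouse squeaked";

// Without m, $ only matches at the very end of the string.
const lastWordOfString = lines.match(/\w+$/g);
console.log(lastWordOfString); // [ 'squeaked' ]

// With m, $ also matches right before each line break.
const lastWordOfEachLine = lines.match(/\w+$/gm);
console.log(lastWordOfEachLine); // [ 'roared', 'squeaked' ]
```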
\Z
End of string
- \Z is used in /\Z/ to find where the string ends (where the pipe is): The lion roared| - Example on regex101.com
- Example in Javascript:
// This doesn't work in Javascript :(
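A rough stand-in, as far as I can tell, is $ with the m flag left off, which only matches at the very end of the string. Some regex flavors treat a trailing line break specially with \Z, and this sketch doesn't try to copy that.

```javascript
const sentence = "The lion roared";

// Without the u flag, Javascript reads \Z as a plain capital "Z",
// and this sentence has none, so nothing is found.
const literalZ = sentence.match(/\Z/);
console.log(literalZ); // null

// $ without the m flag matches only at the end of the whole string.
const end = sentence.match(/$/);
console.log(end.index); // 15
```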
\b
Word boundary
- \b is used in /\b/g to find the spots where a word character meets a non-word character (like a space) or the edge of the string, like where the pipes are: |The| |lion| |roared| - Example on regex101.com
- Example in Javascript:
let sentence = "The lion roared";
let regex = /\b/g;
let found = sentence.match(regex);
console.log(found); // [ '', '', '', '', '', '' ]
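A list of empty strings doesn't tell you much, so here's a sketch that shows where those matches sit, plus two things \b is handy for. It uses matchAll() and split(), which are standard string methods; the "lioness" sentence is made up for contrast.

```javascript
const sentence = "The lion roared";

// matchAll() exposes the index of each zero-length match.
const boundaries = [...sentence.matchAll(/\b/g)].map((m) => m.index);
console.log(boundaries); // [ 0, 3, 4, 8, 9, 15 ]

// \b on both sides finds "lion" only as a whole word.
const wholeWord = /\blion\b/.test(sentence);
const partOfWord = /\blion\b/.test("The lioness roared");
console.log(wholeWord, partOfWord); // true false

// Splitting at every boundary separates the words from the spaces.
const pieces = sentence.split(/\b/);
console.log(pieces); // [ 'The', ' ', 'lion', ' ', 'roared' ]
```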
\B
Not word boundary
- \B is used in /\B/g to find the areas where \b does not match: T|h|e l|i|o|n r|o|a|r|e|d - Example on regex101.com
- Example in Javascript:
let sentence = "The lion roared";
let regex = /\B/g;
let found = sentence.match(regex);
console.log(found); // [ '', '', '', '', '', '', '', '', '', '' ]
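Same idea for \B: a sketch that prints the positions of those ten empty matches, plus one possible use, which is finding letters only when they sit inside a word. The standalone "ion" string is made up for contrast.

```javascript
const sentence = "The lion roared";

// Index of every position that is NOT a word boundary.
const insideWords = [...sentence.matchAll(/\B/g)].map((m) => m.index);
console.log(insideWords); // [ 1, 2, 5, 6, 7, 10, 11, 12, 13, 14 ]

// \Bion matches "ion" only when it does not start a word.
const insideLion = sentence.match(/\Bion/g);
console.log(insideLion); // [ 'ion' ]

// A standalone "ion" starts at a word boundary, so there is no match.
const standalone = "ion".match(/\Bion/g);
console.log(standalone); // null
```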
Dunce Corner
\<
Start of word
\>
End of word
These are a mystery to me. I posted about this on Stack Overflow and all I got was (1) a -1 vote and (2) a comment linking to yet another Regular Expression Cheatsheet (where \< and \> are not shown). Super helpful...
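For what it's worth, \< and \> seem to belong to tools like GNU grep and Vim rather than Javascript. If the labels "start of word" and "end of word" mean what they say, the sketch below is one way to get the same positions in Javascript. Both patterns are my own guess at an equivalent, not something taken from the cheat sheet.

```javascript
const sentence = "The lion roared";

// Assumption: "start of word" = a word boundary followed by a word character.
const wordStarts = [...sentence.matchAll(/\b(?=\w)/g)].map((m) => m.index);
console.log(wordStarts); // [ 0, 4, 9 ]

// Assumption: "end of word" = a word boundary preceded by a word character.
const wordEnds = [...sentence.matchAll(/\b(?<=\w)/g)].map((m) => m.index);
console.log(wordEnds); // [ 3, 8, 15 ]
```

Put together, those are the same six positions that \b finds on its own, just sorted into starts and ends.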
Surprisingly, this has been my most popular post, so I have revamped it to be more helpful and explanatory and I'm gonna continue with a series! Next up is Character Classes!
Top comments (4)
Word boundary is any character that is not contained in the characters A-Za-z and _. It is the same as writing ![A-Za-z_] as part of the query.
Thanks, @inhuofficial! Can you show me an example of \B Not word boundary using the same format with the sentence "The lion roared"?
Yes, so in "The Lion Roared" the matches would be where the pipes are: T|h|e L|i|o|n R|o|a|r|e|d
and for the original \b the matches would be: |The| |Lion| |Roared|
There is a difference in how this matches that I didn't explain very well: it is between two characters (so at a boundary) that it matches.
So what it gives you is the positions of where the boundaries are (so instead of returning positions and then the letters matched / groups, you will only ever get back the positions, I think; it's a long time since I used it).
I don't think I have ever found a use for \B, but \b can be really useful for splitting strings into words.
See: regex101.com/r/yVG7Gb/1 for \b and regex101.com/r/vvGd1d/1 for \B
You should play with regex101.com if trying to learn regex.
It is much easier to have an explanation of what you have entered and add words and phrases to match against in real time.
It also really helps you design complex regexes with match groups etc when you start getting more familiar with them.