tl;dr
At some level, most files and programs you interact with are strings. RegEx helps you find very specific string patterns. With a few simple regex building blocks, you can make remarkably complex search queries. Read on for a deeper explanation and a beginner’s guide on how to get started with regex.
So, you’re learning to be a hacker, huh? I bet you’ve run into some variant of this problem, then: go find something specific in a large set of data. I’m being deliberately vague here. Information, and the ways of searching it, come in many formats. By now, you’ve probably got a handle on hashes, arrays, strings, and the like. And you’re getting comfortable with a few nifty enumerators (What’s your favorite? I like Ruby’s .select method. More on that later.).
But you’re here because someone has asked you to find a wacky, difficult-to-grasp data element and your normal tools aren’t cutting it. You Googled the problem. Some (generous, saintly) random on StackOverflow solved a similar problem for someone else a few years ago. They said something about ‘regex’, and then shared a line of code with an alarming number of /’s and \’s. And you might be thinking, okay, WTF is this? RegEx doesn’t make a great first impression, but trust me, you want to sit down on the couch next to RegEx at your friend’s party. It has .~Special Powers~.
RegEx is short for Regular Expression. A Regular Expression is a special string used for pattern-matching and searching. One of the best features of regex is its portability: it has a syntax you learn once, but use in many contexts. Most languages have an enumerator that helps you search for a pattern (like Ruby’s .find), but what makes regex really powerful is the level of control it gives you over that search pattern.
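To make that "level of control" a bit more concrete, here's a minimal sketch. The file names are made up for illustration. It compares a plain substring check against a regex inside the same enumerators (.select and .find):

```ruby
# Hypothetical file names, invented for this illustration.
files = ["song_1.mp3", "song_22.mp3", "song_x.mp3", "notes.txt"]

# A plain substring check: anything containing "song_" passes.
loose = files.select { |name| name.include?("song_") }

# A regex narrows the pattern: "song_", then one or more digits, then ".mp3".
numbered = files.select { |name| name.match?(/song_\d+\.mp3/) }

# .find stops at the first element that matches.
first_numbered = files.find { |name| name.match?(/song_\d+\.mp3/) }

p loose           # ["song_1.mp3", "song_22.mp3", "song_x.mp3"]
p numbered        # ["song_1.mp3", "song_22.mp3"]
p first_numbered  # "song_1.mp3"
```

The enumerator does the looping; the regex decides what counts as a match.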
A regular expression is composed of characters. You know, like ASCII characters. Specifically, a regex is composed of two types of characters: literal, and meta. Literal characters are ‘simple’ characters: letters, numbers, and most punctuation. Metacharacters, like { } [ ] ( ) ^ $ . | * + ? \, are ‘complex’ characters. I’m oversimplifying the point, but essentially metacharacters are the characters that have special meaning in the context of regex. Metacharacters convey richer information than literal characters.
Remember how I said regex was powerful? With just literal characters and metacharacters, you can search for one specific string, or for any string that fits a more general pattern.
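Here's a small sketch of the literal-versus-meta distinction in Ruby. The sample strings are made up; the point is that a bare dot is a metacharacter, while an escaped dot is a literal:

```ruby
literal_hit   = "cat".match?(/cat/)   # literal c, a, t match themselves
wildcard_hit  = "cot".match?(/c.t/)   # . stands in for any one character
wildcard_miss = "ct".match?(/c.t/)    # . still needs a character to match
escaped_hit   = "a.b".match?(/a\.b/)  # \. is a literal dot
escaped_miss  = "axb".match?(/a\.b/)  # ...so "x" no longer qualifies

p [literal_hit, wildcard_hit, wildcard_miss, escaped_hit, escaped_miss]
# prints [true, true, false, true, false]
```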
Let’s look at a simple example. We’ve got a filename, and we’re trying to parse information from that filename. The filename is Talking Heads - Speaking in Tongues - This Must Be the Place (Naïve Melody).mp3
How can we grab the Artist, Album, Song Name, and Filetype? With Ruby, maybe we could try to .split around the “-” characters, but that won’t help us out with the Filetype, where we want to split based on a “.”. Take a look at this regexp cookbook and see if any of these ingredients might help us:
. is a wildcard. It represents *any* single character (except a newline, by default).
\ lets us search for literal metacharacters, such as .[{(\|$^
\d is for *any* digit 0–9
\s is for *any* whitespace character
\w is for *any* "word" character (a letter, a digit, or an underscore)
[ ] lets us match specific characters in a given slot
{ } lets us say how many times the preceding character repeats
* represents 0 or more of the preceding character
+ represents 1 or more of the preceding character
? makes the preceding character optional
( | ) lets us match one of several alternatives
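If you'd like to see these ingredients in action, here's a rough sketch you can paste into irb. The sample strings are invented for illustration, and each line exercises one or two items from the cookbook:

```ruby
digits     = "a1b22c333".scan(/\d/)      # \d: one digit at a time
digit_runs = "a1b22c333".scan(/\d+/)     # +: one or more digits in a row
pairs      = "a1b22c333".scan(/\d{2}/)   # {2}: exactly two digits in a row
words      = "hi there!".scan(/\w+/)     # \w: runs of word characters
pieces     = "ab  cd".split(/\s+/)       # \s: split on runs of whitespace
grays      = "gray grey groy".scan(/gr[ae]y/)   # [ae]: an "a" or an "e" in that slot
zero_plus  = "g go goo".scan(/go*/)      # *: "g" followed by zero or more "o"
optional   = "color colour".scan(/colou?r/)     # ?: the "u" is optional
either     = ["a.mp3", "b.wav", "c.txt"].select { |f| f.match?(/\.(mp3|wav)/) }  # ( | ): either extension

p digits      # ["1", "2", "2", "3", "3", "3"]
p digit_runs  # ["1", "22", "333"]
p pairs       # ["22", "33"]
p words       # ["hi", "there"]
p pieces      # ["ab", "cd"]
p grays       # ["gray", "grey"]
p zero_plus   # ["g", "go", "goo"]
p optional    # ["color", "colour"]
p either      # ["a.mp3", "b.wav"]
```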
What looks helpful? It looks like we’re dealing with hyphens, whitespace, AND .’s. Let’s chain a few of these together into one regex.
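Let’s say string = "Talking Heads - Speaking in Tongues - This Must Be the Place (Naïve Melody).mp3"
Then string.split(/\s-\s|\./)
returns:
["Talking Heads", "Speaking in Tongues", "This Must Be the Place (Naïve Melody)", "mp3"]
Here \s-\s matches a hyphen with a whitespace character on each side, and \. matches a literal dot. The | between them means "split on either one".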
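Here's a runnable sketch of that split, with each piece assigned to its own variable. The variable names are my own. It uses \s (whitespace) on either side of the hyphen. For contrast, it also shows what happens with \w, which matches a word character and so swallows the last letter before each hyphen:

```ruby
filename = "Talking Heads - Speaking in Tongues - This Must Be the Place (Naïve Melody).mp3"

# Split on " - " (whitespace, hyphen, whitespace) or on a literal dot.
artist, album, song, filetype = filename.split(/\s-\s|\./)

puts artist    # Talking Heads
puts album     # Speaking in Tongues
puts song      # This Must Be the Place (Naïve Melody)
puts filetype  # mp3

# For contrast: \w is a word character, so the separator now includes
# the final letter of each name, and that letter is lost in the split.
clipped = filename.split(/\w - |\./)
p clipped.first(2)  # ["Talking Head", "Speaking in Tongue"]
```

One caveat: this only works because the artist, album, and song names contain no " - " or "." of their own. A messier filename would need a stricter pattern.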
The easiest way to learn is to do. Use your preferred interactive coding environment to experiment with regex. Or, see the Further Reading section below for other resources for learning this powerful tool.
Further Reading:
RegExOne — Interactive exercises that teach the fundamentals of regex. Highly recommended if you want to go from zero to hero.
RegEx Wiki — history and reasoning behind regex.
Your preferred language’s documentation on regex! (Ruby, JS)