I sort of black out when I get hit with a regex string. This seems suboptimal. This is me learning about regex. The Rubular tool is indispensable: https://rubular.com/
Regular Expressions are used to match patterns in text. Why might we want to do this?
Data Validation - is a string a valid email address or phone-number?
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
(Well, actually, email validation should be done through a confirmation link with a token.)
Web scraping
Any text parsing needs like syntax highlighting, data wrangling, or search
Constructing Regular Expressions
They can be made several ways. With literals:
/pattern/
or %r{pattern}
They can also be constructed using Regexp.new with a string, and Regexp.union with a list of strings or an array of strings.
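As a quick sketch of the construction options above (the "cat" and "dog" patterns and the variable names are made up for illustration), all of these build ordinary Regexp objects:

```ruby
# Several ways to build a pattern. The words "cat" and "dog" are placeholders.
literal   = /cat/                        # slash-delimited literal
percent_r = %r{cat}                      # %r literal, handy when the pattern contains slashes
from_new  = Regexp.new("cat")            # built from a string at runtime
either    = Regexp.union("cat", "dog")   # matches any one of the given alternatives
from_list = Regexp.union(%w[cat dog])    # an array of strings works too

puts literal == from_new   # the literal and the Regexp.new version are equal
puts either.source         # the combined pattern joins the alternatives with |
```

Regexp.union also escapes any meta-characters in the strings it is given, so it is a reasonable choice when the alternatives come from user input.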
As for the contents of the pattern, the following table is an excellent starting place. I couldn't make it cleaner, so here it is in its entirety.
Using Regular Expressions
=~ is Ruby's basic pattern-matching operator.
It returns the index of the first match in the string, or nil if no match is present.
#match will return a MatchData object
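Here is a small sketch of both on a made-up string, to show the difference between getting an index back and getting a MatchData object back:

```ruby
text = "hello world"

index = text =~ /o/    # index of the first "o"
miss  = text =~ /z/    # no "z" anywhere, so this is nil

m = text.match(/w(or)ld/)   # MatchData, or nil when nothing matches
whole  = m[0]               # the full matched text
group  = m[1]               # the first parenthesized group
before = m.pre_match        # everything before the match

puts index          # 4
puts miss.inspect   # nil
puts whole          # world
puts group          # or
```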
In the String class, there are several methods that make use of regular expressions
#gsub can be used to substitute a string for all occurrences of the given regular expression.
#sub substitutes the first occurrence of the supplied regex
#partition splits the string into three: the part before the match, the match, and the part after the match
#scan produces an array of the matches or passes them to a block
#split splits a string on the given pattern
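To see these side by side, here is a minimal sketch that runs each method against the same made-up sentence:

```ruby
sentence = "the cat sat on the mat"

all_swapped   = sentence.gsub(/at/, "og")   # every "at" replaced
first_swapped = sentence.sub(/at/, "og")    # only the first "at" replaced
parts         = sentence.partition(/sat/)   # [before, match, after]
found         = sentence.scan(/the/)        # every match, as an array
words         = sentence.split(/ /)         # pieces between the matches

puts all_swapped     # the cog sog on the mog
puts first_swapped   # the cog sat on the mat
p parts              # ["the cat ", "sat", " on the mat"]
p found              # ["the", "the"]
p words              # ["the", "cat", "sat", "on", "the", "mat"]
```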
What can we put in the pattern?
Characters!
Individual characters represent themselves literally
/abc/
will match 'a' followed by 'b' followed by 'c', exactly, but not something like "abdc"
/[abc]/
will match a or b or c
Here we match one of [aeiou] followed by a single t
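The original example did not survive, so this is a stand-in sketch with made-up words, showing a vowel character class followed by a literal t:

```ruby
words = %w[cat dot tree but sky]

# Keep only the words that contain a vowel immediately followed by "t".
vowel_t_words = words.grep(/[aeiou]t/)

# scan returns the matched pieces themselves.
pieces = "a bit of butter".scan(/[aeiou]t/)

p vowel_t_words   # ["cat", "dot", "but"]
p pieces          # ["it", "ut"]
```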
Like all meta-characters, if we want to use the brackets in our literal pattern, we have to escape them with backslashes: \[ and \].
Here we match the footnote markers from Wikipedia by using the special character \d, which matches any digit.
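A rough sketch of the footnote idea, using an invented snippet of Wikipedia-style text and assuming single-digit markers (quantifiers, which would handle markers like [12], come later):

```ruby
# Invented sample text with single-digit footnote markers.
wiki = "Ruby is a programming language.[1] It is dynamically typed.[2]"

# \[ and \] are literal brackets; \d is any one digit.
markers = wiki.scan(/\[\d\]/)
cleaned = wiki.gsub(/\[\d\]/, "")

p markers      # ["[1]", "[2]"]
puts cleaned   # Ruby is a programming language. It is dynamically typed.
```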
In addition to literal characters, there are many special characters we can use to build our patterns.
/\w/ - A word character ([a-zA-Z0-9_])
/\W/ - A non-word character ([^a-zA-Z0-9_])
/\d/ - A digit character ([0-9])
/\D/ - A non-digit character ([^0-9])
/\h/ - A hexdigit character ([0-9a-fA-F])
/\H/ - A non-hexdigit character ([^0-9a-fA-F])
/\s/ - A whitespace character: /[ \t\r\n\f\v]/
/\S/ - A non-whitespace character: /[^ \t\r\n\f\v]/
A nice convention we see here is that by upcasing the letter, we get the complementary set of characters.
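A small sketch of a few of these shorthand classes against a made-up string. Each lowercase class and its uppercase twin split the string's characters between them:

```ruby
sample = "id_7 ok!"

word_chars     = sample.scan(/\w/)   # letters, digits, underscore
non_word_chars = sample.scan(/\W/)   # everything else
digits         = sample.scan(/\d/)
hex_digits     = sample.scan(/\h/)   # "d" counts as a hex digit, "i" does not
spaces         = sample.scan(/\s/)

p word_chars       # ["i", "d", "_", "7", "o", "k"]
p non_word_chars   # [" ", "!"]
p digits           # ["7"]
p hex_digits       # ["d", "7"]
p spaces           # [" "]
```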
We can also specify the number of times that a character occurs using quantifiers.
* - Zero or more times
+ - One or more times
? - Zero or one times (optional)
{n} - Exactly n times
{n,} - n or more times
{,m} - m or fewer times
{n,m} - At least n and at most m times
Here we find all double vowels.
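The original snippet is missing, so here is a sketch with invented strings. It assumes "double vowels" means any two vowels in a row, which is what [aeiou]{2} matches; a few other quantifiers are shown for comparison:

```ruby
# Any two consecutive vowels (assumed meaning of "double vowels").
double_vowels = "the moon looks good tonight, see".scan(/[aeiou]{2}/)

optional_u  = "color colour".scan(/colou?r/)   # ? makes the "u" optional
zero_or_more = "lol loool ll".scan(/lo*l/)     # * allows no "o" at all
one_or_more  = "lol loool ll".scan(/lo+l/)     # + needs at least one "o"
two_or_three = "1 22 333".scan(/\d{2,3}/)      # runs of two or three digits

p double_vowels   # ["oo", "oo", "oo", "ee"]
p optional_u      # ["color", "colour"]
p zero_or_more    # ["lol", "loool", "ll"]
p one_or_more     # ["lol", "loool"]
p two_or_three    # ["22", "333"]
```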
Parentheses can be used to create capture groups, which allow us to access parts of the match later in the pattern, and also after we match, by using the match variables $1, $2, $3...
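A minimal sketch of both uses, with an invented date string: the match variables after a successful match, and a backreference (\1) that reuses the first group inside the pattern itself.

```ruby
date = "2024-03-15"
year = month = day = nil

# Three capture groups; $1, $2, $3 are set after a successful match.
if date =~ /(\d{4})-(\d{2})-(\d{2})/
  year, month, day = $1, $2, $3
end

# \1 means "whatever group 1 just matched", so this finds a doubled character.
doubled_index  = "hello" =~ /(.)\1/
doubled_letter = $1

puts year             # 2024
puts month            # 03
puts day              # 15
puts doubled_index    # 2
puts doubled_letter   # l
```

The match variables are overwritten by the next match, so copy them into ordinary variables if they are needed later.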
There is a lot to parentheses in regex. I'll leave it here for now.
References
https://en.wikipedia.org/wiki/Regular_expression