Trying regular expressions in Ruby

michelada.io ・5 min read

Do you find regular expressions difficult to understand? I often do and, for that reason, I decided to read and reread about the topic. I wrote down some notes that I think will be enough to work with regular expressions when I need them, and I would like to share them with you:
We can create a Ruby regular expression with // or %r{}, which are its literal constructors.

Both forms produce instances of the Regexp class.
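As a quick check of the two literal forms, here is a minimal sketch (the variable names are only for illustration):

```ruby
# Two literal ways to build the same pattern.
slash_form   = /ruby/
percent_form = %r{ruby}

puts slash_form.class            # Regexp
puts percent_form.class          # Regexp
puts slash_form == percent_form  # true: same source and same options
```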

Example:
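A possible illustration, assuming the goal is simply to know whether a pattern appears in a string: `Regexp#match?` answers with true or false. The sample sentences are made up.

```ruby
pattern = /are/

found     = pattern.match?("Hello, how are you?")
not_found = pattern.match?("Hello, world")

puts found      # true
puts not_found  # false
```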

The String class also has the match? method:

Example:
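A minimal sketch of `String#match?`, where the string is the receiver and the pattern is the argument (the sample sentence is an assumption):

```ruby
greeting = "Hello, how are you?"

has_are  = greeting.match?(/are/)
has_were = greeting.match?(/were/)

puts has_are   # true
puts has_were  # false
```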

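With the =~ operator, the index where the match starts is returned instead of true or false. For example, the number 11 is returned when the expression matches the word "are" and the position of its first character is 11. It can be proved: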
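One way to reproduce such a result, assuming a sample sentence in which "are" starts at index 11:

```ruby
greeting = "Hello, how are you?"

index = /are/ =~ greeting   # index of the first matched character
puts index                  # 11
puts greeting[index]        # a
puts greeting[index, 3]     # are
```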

The String class also has the =~ method:

Example:
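A short sketch with the string on the left side of the operator (the sentence is made up). When nothing matches, nil is returned:

```ruby
greeting = "Hello, how are you?"

hit  = greeting =~ /are/    # the receiver is the string this time
miss = greeting =~ /were/

puts hit   # 11
p miss     # nil
```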

To match a special character literally in a regex, it needs to be escaped. For example:

It is also possible to get a match if the dot is not escaped, but it may not be the match we are looking for. Let's prove it:
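A minimal sketch comparing the escaped and unescaped dot, using a made-up string that contains both "105" and "1.5":

```ruby
text = "Build 105 ships version 1.5"

escaped   = text.match(/1\.5/)   # the dot must be a literal dot
unescaped = text.match(/1.5/)    # the dot stands for any character

puts escaped[0]     # 1.5
puts unescaped[0]   # 105
```

The unescaped pattern still matches, but it stops at "105" because that is the first place where "1", any character, and "5" appear in a row.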

To escape a character, place a backslash \ just before it, as was done with the dot . in the previous example.

The dot . used in a regex matches any single character in a string, except a newline \n

If we inspect what the expression matched, we get:
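A small sketch of what such an inspection could look like, assuming the pattern .ated mentioned in the text and a made-up sentence:

```ruby
m    = /.ated/.match("The build was created yesterday")
bare = /.ated/.match("ated")

puts m[0]   # eated
p bare      # nil: nothing precedes "ated"
```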

This expression requires that some character exists just before the pattern ated
The dot . doesn't match a newline \n
Example:
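To illustrate the newline rule, here is a sketch with two made-up strings that differ only in the character placed before "ated":

```ruby
after_space   = /.ated/.match(" ated")
after_newline = /.ated/.match("\nated")

p after_space[0]   # " ated"
p after_newline    # nil: the dot refuses the newline
```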

Inside the brackets, many characters can be listed, and any one of them can be matched.
Example: test a string to know whether it has a vowel:
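A minimal sketch of that vowel test (lowercase vowels only, which is an assumption):

```ruby
vowel = /[aeiou]/

with_vowel    = vowel.match?("ruby")    # "u" is a vowel
without_vowel = vowel.match?("rhythm")  # no a, e, i, o or u

puts with_vowel     # true
puts without_vowel  # false
```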

And more characters can be added to the expression just after the brackets:
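As an illustration, assuming a character class that accepts both a lowercase and an uppercase r, followed by the literal characters uby:

```ruby
pattern = /[rR]uby/

lower = pattern.match("I like ruby")
upper = pattern.match("I like Ruby")
other = pattern.match("I like Duby")

puts lower[0]  # ruby
puts upper[0]  # Ruby
p other        # nil
```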

In this example, the character r inside the brackets, plus the characters uby outside them, match the word ruby

Ranges can be created inside the brackets. Example:
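A short sketch with three common ranges applied to a made-up string:

```ruby
text = "Ruby 3"

lower = text.match(/[a-z]/)[0]  # first lowercase letter
upper = text.match(/[A-Z]/)[0]  # first uppercase letter
digit = text.match(/[0-9]/)[0]  # first digit

puts lower  # u
puts upper  # R
puts digit  # 3
```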

All of these abbreviations have a negative version that matches the opposite of the positive version
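Assuming the abbreviations meant here are \d, \w and \s, their negative versions are \D, \W and \S. A sketch that takes the first match of each on a made-up string:

```ruby
text = "Ruby 3!"

digit     = text.match(/\d/)[0]  # a digit
non_digit = text.match(/\D/)[0]  # anything but a digit
word      = text.match(/\w/)[0]  # a letter, digit or underscore
non_word  = text.match(/\W/)[0]  # anything else
space     = text.match(/\s/)[0]  # a whitespace character
non_space = text.match(/\S/)[0]  # anything but whitespace

p [digit, non_digit, word, non_word, space, non_space]
# ["3", "R", "R", " ", " ", "R"]
```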

For example, let's capture the strings "Lenin Godinez" and "40" from the following string:
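A minimal sketch of such a capture. The sample string and the \s that joins the two groups are assumptions; the groups themselves are the ones named in the text.

```ruby
line    = "Lenin Godinez 40"   # assumed sample string
pattern = /([A-Za-z]+\s[A-Za-z]+)\s(\d+)/

m = pattern.match(line)
puts m.class  # MatchData
puts m[1]     # Lenin Godinez
puts m[2]     # 40
```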

With the pattern ([A-Za-z]+\s[A-Za-z]+) the words "Lenin Godinez" are captured and, with the pattern (\d+), the number "40" is captured.
All the captures are automatically assigned to global variables. In the previous example, the two captures were stored in the global variables $1 and $2, and we can test it using puts:
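A sketch of reading $1 and $2 right after a successful match (the sample string is an assumption). The values are copied into ordinary variables because a later match would overwrite them:

```ruby
/([A-Za-z]+\s[A-Za-z]+)\s(\d+)/.match("Lenin Godinez 40")

name = $1   # first capture group
age  = $2   # second capture group

puts name  # Lenin Godinez
puts age   # 40
```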

The captures can be accessed the same way we get an element from an array, by passing an index:

If zero is passed as the index to the m variable, the complete match is returned:
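A short sketch of index access on a MatchData object, assuming m holds the result of matching a made-up string:

```ruby
m = /([A-Za-z]+\s[A-Za-z]+)\s(\d+)/.match("Lenin Godinez 40")

puts m[1]  # Lenin Godinez     (first capture)
puts m[2]  # 40                (second capture)
puts m[0]  # Lenin Godinez 40  (the complete match)
```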

A useful method of the MatchData object is captures, which returns an array with the captures:

Example:
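A minimal sketch of `MatchData#captures` on an assumed sample string:

```ruby
m = /([A-Za-z]+\s[A-Za-z]+)\s(\d+)/.match("Lenin Godinez 40")

groups = m.captures
p groups  # ["Lenin Godinez", "40"]
```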

The captures can be accessed in the same way we get a value from a hash: by passing a key, where the key is the name of the capture.
Example:
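A sketch of named captures, assuming the group names name and age and a made-up string. The key can be a symbol or a string:

```ruby
pattern = /(?<name>[A-Za-z]+\s[A-Za-z]+)\s(?<age>\d+)/
m = pattern.match("Lenin Godinez 40")

puts m[:name]   # Lenin Godinez
puts m["age"]   # 40
```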

There is also a useful method to get the named captures: named_captures, which returns a hash with the captures:

Example:
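A minimal sketch of `MatchData#named_captures` (group names and string are assumptions). By default the keys of the returned hash are strings:

```ruby
pattern = /(?<name>[A-Za-z]+\s[A-Za-z]+)\s(?<age>\d+)/
m = pattern.match("Lenin Godinez 40")

named = m.named_captures
puts named["name"]  # Lenin Godinez
puts named["age"]   # 40
```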

If we remove the modifier ? from the pattern and the letter s from "Mrs." in the string, we can see that the test fails, because it is looking for the pattern Mrs

Now, if we restore the modifier ? in the pattern, this time we will get a match with "Mr." because the character s in the pattern is optional
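The behaviour described above can be sketched like this. The exact patterns and the surname are assumptions:

```ruby
optional_s = /Mrs?\./   # the "s" may appear zero or one time
required_s = /Mrs\./    # the "s" is mandatory

with_s    = optional_s.match("Mrs. Smith")
without_s = optional_s.match("Mr. Smith")
failed    = required_s.match("Mr. Smith")

puts with_s[0]     # Mrs.
puts without_s[0]  # Mr.
p failed           # nil
```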

Example:

If the modifier is removed from this expression, only the first digit of 31 is matched
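Assuming the modifier in question is + (one or more repetitions) applied to \d, a sketch of both cases on a made-up string:

```ruby
text = "Day: 31"

with_plus    = text.match(/\d+/)[0]  # one or more digits
without_plus = text.match(/\d/)[0]   # exactly one digit

puts with_plus     # 31
puts without_plus  # 3
```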

Example:

If we modify the string to be only "Name: Lenin", the expression still matches, since any characters after the word "Lenin" are optional thanks to the modifier *
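A sketch of the * modifier (zero or more repetitions). The pattern below and the longer string are assumptions chosen to fit the description:

```ruby
pattern = /Name: Lenin.*/

long_form  = pattern.match("Name: Lenin Godinez")
short_form = pattern.match("Name: Lenin")

puts long_form[0]   # Name: Lenin Godinez
puts short_form[0]  # Name: Lenin
```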

Example:
Suppose I want to match a phone number in the format 111-111-1111
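One possible pattern for that format uses an exact repetition count, {n}, on each group of digits:

```ruby
phone_format = /\d{3}-\d{3}-\d{4}/

valid   = phone_format.match?("111-111-1111")
invalid = phone_format.match?("111-11-1111")   # middle group is too short

puts valid    # true
puts invalid  # false
```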

Also, we can indicate a minimum and a maximum number of repetitions with a range such as {1,n}. The first number indicates the minimum and the second number indicates the maximum number of repetitions.
Example:
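A minimal sketch, assuming the range {1,4} applied to \d and a made-up string of eight digits:

```ruby
matched = "12345678".match(/\d{1,4}/)[0]  # greedy: takes as many as allowed

puts matched  # 1234
```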

In this example, four digits are matched, because we asked for at least one digit and at most four.
If the string to be evaluated has fewer characters than the maximum indicated in the pattern, then all of them are matched

If the minimum amount is not reached, then the expression returns nil
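Both situations can be sketched with the same assumed pattern, \d{1,4}:

```ruby
shorter = "123".match(/\d{1,4}/)   # fewer digits than the maximum
none    = "abc".match(/\d{1,4}/)   # minimum of one digit not reached

puts shorter[0]  # 123
p none           # nil
```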

If the second number in the range is omitted, only the minimum is enforced and the maximum is left open:

Here too, if the minimum amount is not reached, the expression returns nil
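A sketch of an open-ended range, assuming {2,} applied to \d:

```ruby
open_ended = "12345678".match(/\d{2,}/)  # two or more digits, no upper limit
too_few    = "1 2 3".match(/\d{2,}/)     # never two digits in a row

puts open_ended[0]  # 12345678
p too_few           # nil
```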

Example:

The match can also be found at the beginning of a new line

If we look for the comment in the middle of a line, we won't get a match and nil will be returned
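Assuming the anchor discussed here is ^ (start of a line) and the comment is a Ruby-style # comment, the three cases could be sketched as:

```ruby
comment = /^#.*/

at_start    = comment.match("# setup")
on_new_line = comment.match("x = 1\n# setup")  # comment starts the second line
mid_line    = comment.match("x = 1 # setup")   # comment is not at a line start

puts at_start[0]     # prints the comment text
puts on_new_line[0]  # prints the comment text
p mid_line           # nil
```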

Example:
Suppose we want to match the dot that appears at the end of the line in the following string:
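A sketch assuming the anchor is $ (end of a line) and a made-up sentence that also contains an earlier dot:

```ruby
sentence = "Mr. Smith reads Ruby books."

index = sentence =~ /\.$/   # only a dot at the end of the line qualifies

puts index                  # 26
puts sentence.length - 1    # 26, the position of the last character
```

The dot after "Mr" is skipped because it is not followed by the end of the line.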

If we try to match the word "currently" that starts on a new line, it is not matched because this anchor does not work for that purpose.

If we try to match the end of a line, it will return nil

If the same string is evaluated with the anchor \z, then we won't get a match, due to the newline at the end of the string
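Assuming the anchors compared in this part are \A (start of the string), \Z (end of the string, ignoring one trailing newline) and \z (strict end of the string), their differences can be sketched with a made-up two-line string:

```ruby
text = "I am learning Ruby\ncurrently\n"

start_hit  = /\AI am/.match(text)        # the string really starts with "I am"
start_miss = /\Acurrently/.match(text)   # "currently" only starts a line
line_end   = /Ruby\Z/.match(text)        # "Ruby" ends a line, not the string
loose_end  = /currently\Z/.match(text)   # \Z tolerates the final newline
strict_end = /currently\z/.match(text)   # \z does not

puts start_hit[0]  # I am
p start_miss       # nil
p line_end         # nil
puts loose_end[0]  # currently
p strict_end       # nil
```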

Example:
There is a list of numbers and I want to match the number that ends with a dot, but I don't want to include the dot in the result of the match. The lookahead assertion needed is (?=\.), with the dot escaped inside it:
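A minimal sketch of that lookahead on a made-up list of numbers:

```ruby
numbers = "123 456. 789"

m = numbers.match(/\d+(?=\.)/)   # digits that are followed by a dot

puts m[0]  # 456
```

The dot is checked but not consumed, so it stays out of the result.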

On the other hand, there is a negative version of this lookahead assertion

Example:
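A sketch of the negative lookahead (?!\.) on an assumed list of numbers:

```ruby
numbers = "123 456. 789"

m = numbers.match(/\d+(?!\.)/)   # digits that are NOT followed by a dot

puts m[0]  # 123
```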

The result is "123" because it is the first match the pattern found for the condition "a series of digits without a dot placed at the end of it"

Example:
I want to match the word "regex" only if the word "ruby" is placed just before it in the following string:
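A sketch of a lookbehind assertion, (?<=...), on two made-up strings. Placing a space after "ruby" inside the assertion is an assumption about how the words are separated:

```ruby
pattern = /(?<=ruby )regex/   # "regex" only when "ruby " comes right before it

preceded     = pattern.match("ruby regex")
not_preceded = pattern.match("python regex")

puts preceded[0]  # regex
p not_preceded    # nil
```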

And its negative version is:

In this example, nil is returned because the word "regex" is preceded by the word "ruby", so the match is not successful. To get a match, the string needs to be modified.
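The negative lookbehind, (?<!...), can be sketched the same way. The strings are assumptions:

```ruby
pattern = /(?<!ruby )regex/   # "regex" only when "ruby " does NOT come before it

blocked = pattern.match("ruby regex")
allowed = pattern.match("perl regex")   # the modified string

p blocked        # nil
puts allowed[0]  # regex
```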

Example:

Example:

This method escapes the characters of a string that have a special meaning in a regex.
Example:
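Assuming the method meant here is `Regexp.escape`, a sketch of escaping a made-up string and then using the result as a pattern:

```ruby
escaped = Regexp.escape("1.5*2")
puts escaped  # 1\.5\*2

literal = Regexp.new(escaped)
exact   = literal.match?("1.5*2")   # the dot and the star are now literal
similar = literal.match?("105*2")

puts exact    # true
puts similar  # false
```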

The scan method evaluates a string from left to right and returns an array with all the matches found
Example:
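A minimal sketch of `String#scan` on a made-up sentence:

```ruby
amounts = "10 apples, 20 pears, 30 kiwis".scan(/\d+/)

p amounts  # ["10", "20", "30"]
```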

It is possible to make captures with the scan method: when the pattern contains groups, an array of arrays is returned with the captures of each match
Example:
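A sketch of `scan` with two capture groups. The name:age format of the string is an assumption:

```ruby
pairs = "Lenin:40 Ana:35".scan(/([A-Za-z]+):(\d+)/)

p pairs  # [["Lenin", "40"], ["Ana", "35"]]
```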

If a regex is passed as the argument to the grep method, each element of the array is tested against the pattern, and all the elements that match are returned in a new array
Example:

Only 2 elements from the array matched the expression, and it can be proved as follows:
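Assuming the method is `grep` called on an array of made-up words, the filtering and one way to check it element by element could look like this:

```ruby
words = %w[ruby rails python rust]

matches = words.grep(/^ru/)                      # keep elements that match
checks  = words.map { |word| word.match?(/^ru/) } # test each element one by one

p matches  # ["ruby", "rust"]
p checks   # [true, false, false, true]
```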

And that is it! All you need to do is practice and practice with them to get more familiar. Feel free to reference this blog post if you have doubts.
