Sundeep

Originally published at learnbyexample.github.io

Ruby Regexp Part 9 - Modifiers

Modifiers

Just as options change the default behavior of commands run from a terminal, modifiers change aspects of regexp behavior. They can be applied to an entire regexp or to a particular portion of it, and the two forms can be combined. This chapter also explains the cryptic output of Regexp.union when one of the arguments is a regexp. In regular expression parlance, modifiers are also known as flags.

Modifiers already seen are discussed again in this chapter for the sake of completeness. You'll also see how to combine multiple modifiers.

i modifier

First up, the i modifier, which ignores case while matching letters.

>> 'Cat' =~ /cat/
=> nil
>> 'Cat' =~ /cat/i
=> 0

>> 'Cat scat CATER cAts'.scan(/cat/i)
=> ["Cat", "cat", "CAT", "cAt"]

# same as: /[a-zA-Z]+/
# can also use: /[A-Z]+/i
>> 'Sample123string42with777numbers'.scan(/[a-z]+/i)
=> ["Sample", "string", "with", "numbers"]
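
When the pattern is only known at runtime, the same flag can be passed as a constant to Regexp.new. The sketch below assumes a search term held in a string; the names WORD and PAT_I are hypothetical and used only for illustration.

```ruby
# WORD stands in for a search term that arrives as a string at runtime.
WORD = 'cat'

# Regexp::IGNORECASE is the constant form of the i modifier.
PAT_I = Regexp.new(WORD, Regexp::IGNORECASE)

p PAT_I                                # /cat/i
p PAT_I.casefold?                      # true, since the i modifier is set
p 'Cat scat CATER cAts'.scan(PAT_I)    # ["Cat", "cat", "CAT", "cAt"]
```

The result compares equal to the literal /cat/i, so either form can be used interchangeably.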

m modifier

Use the m modifier to allow the . metacharacter to match the newline character as well.

# by default, the . metacharacter doesn't match newline
>> "Hi there\nHave a Nice Day".sub(/the.*ice/, 'X')
=> "Hi there\nHave a Nice Day"

# m modifier will allow newline character to be matched as well
>> "Hi there\nHave a Nice Day".sub(/the.*ice/m, 'X')
=> "Hi X Day"

# multiple modifiers can be specified next to each other
>> "Hi there\nHave a Nice Day".sub(/the.*day/im, 'Bye')
=> "Hi Bye"
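
To see the effect on the dot in isolation, here is a small sketch that extracts text instead of substituting it. The constant names TEXT and PAT_M are assumptions made for this illustration.

```ruby
TEXT = "Hi there\nHave a Nice Day"

# Without m, the greedy .* stops at the newline.
p TEXT[/Hi.*/]            # "Hi there"

# With m, the dot also consumes the newline and everything after it.
p TEXT[/Hi.*/m]           # "Hi there\nHave a Nice Day"

# Regexp::MULTILINE is the constant form of the m modifier.
PAT_M = Regexp.new('the.*ice', Regexp::MULTILINE)
p TEXT.sub(PAT_M, 'X')    # "Hi X Day"
```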

o modifier

The o modifier causes #{} interpolations inside a regexp definition to be performed only once, even if the regexp is inside a loop. As an alternative, you could assign the regexp definition to a variable and use that within the loop, without needing the o modifier.

>> words = %w[car bike bus auto train plane]

# as 'o' modifier is used, expression inside #{} will be evaluated only once
# and not calculated again and again every iteration
>> n = 2
?> for w in words
?>     puts w if w.match?(/\A\w{#{2**n}}\z/o)
>> end
bike
auto

# here, expression result is not constant, so don't use 'o' modifier
# with 'o' modifier, there'll be no match because #{n} will be '1' always
>> n = 1
?> for w in words
?>     puts w if w.match?(/\A\w{#{n}}\z/)
?>     n += 1
>> end
bus
auto
train
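
The variable-based alternative to the o modifier mentioned earlier can be sketched as follows. The helper name words_of_length is hypothetical; the point is only that the interpolation happens once, when the regexp object is created, and not on every iteration.

```ruby
# Hypothetical helper: builds the pattern once per call,
# then reuses the same regexp object for every word.
def words_of_length(words, len)
  pat = /\A\w{#{len}}\z/    # interpolated a single time, here
  words.select { |w| w.match?(pat) }
end

words = %w[car bike bus auto train plane]
n = 2
p words_of_length(words, 2**n)    # ["bike", "auto"]
p words_of_length(words, 3)       # ["car", "bus"]
```

Unlike the o modifier, this version picks up a new value on each call, so it stays correct if the length changes between calls.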

x modifier

The x modifier, like named capture groups, is another provision for adding clarity to regexp definitions. It lets you use literal whitespace for alignment and add comments after the # character, so that a complex regexp can be broken into multiple commented lines.

# same as: pat = /\A((?:[^,]+,){3})([^,]+)/
>> pat = /\A(                 # group-1, captures first 3 columns
              (?:[^,]+,){3}   # non-capturing group to get the 3 columns
            )
            ([^,]+)           # group-2, captures 4th column
         /x

>> '1,2,3,4,5,6,7'.sub(pat, '\1(\2)')
=> "1,2,3,(4),5,6,7"

As whitespace and # characters get special meaning when using the x modifier, they have to be escaped or represented by backslash escape sequences to match them literally. See ruby-doc: Free-Spacing Mode and Comments for more details.

>> 'cat and dog'.match?(/t a/x)
=> false
>> 'cat and dog'.match?(/t\ a/x)
=> true
>> 'cat and dog'.match?(/t\x20a/x)
=> true

>> 'foo a#b 123'[/a#b/x]
=> "a"
>> 'foo a#b 123'[/a\#b/x]
=> "a#b"
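
Since the x modifier and named capture groups serve the same goal, they combine naturally. Here is a minimal sketch, assuming a simple year-month-day string; the DATE_PAT name and the date format are chosen only for illustration.

```ruby
# Whitespace and comments are ignored under x; the hyphens are literal.
DATE_PAT = /
  \A
  (?<year>\d{4})  -    # four-digit year, then a literal hyphen
  (?<month>\d{2}) -    # two-digit month, then a literal hyphen
  (?<day>\d{2})        # two-digit day
  \z
/x

md = DATE_PAT.match('2024-03-15')
p md[:year]                        # "2024"
p md[:month]                       # "03"
p md[:day]                         # "15"
p DATE_PAT.match?('2024-3-15')     # false, the month needs two digits
```

Avoid the / character and the #{ sequence inside such comments, as Ruby still treats them as the end of the literal and the start of an interpolation respectively.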

Inline comments

Comments can also be added using the (?#comment) grouping, independent of the x modifier.

>> pat = /\A((?:[^,]+,){3})(?#3-cols)([^,]+)(?#4th-col)/

>> '1,2,3,4,5,6,7'.sub(pat, '\1(\2)')
=> "1,2,3,(4),5,6,7"

Inline modifiers

To apply modifiers to specific portions of a regexp, specify them inside a special grouping syntax. These override any modifiers applied to the entire regexp definition. The syntax variations are:

  • (?modifiers:pat) will apply modifiers only for this regexp portion
  • (?-modifiers:pat) will negate modifiers only for this regexp portion
  • (?modifiers-modifiers:pat) will apply and negate particular modifiers only for this regexp portion
  • (?modifiers) when :pat is not used within the grouping, modifiers (including negation) will be applied from this point onwards

This way, modifiers can be applied precisely where they are needed. As the examples below show, these groupings do not act as capture groups.

# case-insensitive only for 'cat' portion
>> 'Cat scatter CATER cAts'.scan(/(?i:cat)[a-z]*\b/)
=> ["Cat", "catter", "cAts"]
# same thing by overriding overall modifier
>> 'Cat scatter CATER cAts'.scan(/cat(?-i)[a-z]*\b/i)
=> ["Cat", "catter", "cAts"]

# case-sensitive only for 'Cat'
>> 'Cat SCatTeR CATER cAts'.scan(/(?-i:Cat)[a-z]*\b/i)
=> ["Cat", "CatTeR"]
# same thing without overall modifier
>> 'Cat SCatTeR CATER cAts'.scan(/Cat(?i)[a-z]*\b/)
=> ["Cat", "CatTeR"]
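
The same grouping syntax works for the other modifiers too. The following sketch applies m to one portion only, checks that the grouping does not capture, and uses the mixed (?modifiers-modifiers:pat) form. The sample strings and the LINES constant are assumptions made for this illustration.

```ruby
LINES = "Hi there\nHave a Nice Day"

# m applies only inside the group: the first .* may cross the newline,
# while the trailing .* stops at the end of its line.
p LINES[/the(?m:.*)Nice.*/]               # "there\nHave a Nice Day"

# Inline modifier groups are non-capturing.
p 'Cat'.match(/(?i:cat)/).captures        # []

# Mixed form: turn i on and m off for one portion of an /m regexp.
p("CAT\ndog" =~ /(?i-m:cat.)dog/m)        # nil, the . can't match the newline here
p("CAT\ndog" =~ /(?i:cat.)dog/m)          # 0, m is still in effect for this .
```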

So, now you should be able to decode the output of Regexp.union when one of the arguments is a regexp.

>> Regexp.union(/^cat/i, '123')
=> /(?i-mx:^cat)|123/

>> Regexp.union(/cat/, 'a^b', /the.*ice/im)
=> /(?-mix:cat)|a\^b|(?mi-x:the.*ice)/
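
Regexp.union wraps each regexp argument this way so that its modifiers stay confined to its own alternative. Here is a short sketch of that behavior, with UNION_PAT as a hypothetical name and a made-up sample string:

```ruby
UNION_PAT = Regexp.union(/^cat/i, '123')

p UNION_PAT.source                     # "(?i-mx:^cat)|123"

# i applies only to the first alternative, and ^ still anchors it,
# so the trailing lowercase 'cat' is not matched.
p 'CATER 123 cat'.scan(UNION_PAT)      # ["CAT", "123"]

# Regexp#to_s produces the same inline-modifier form for a single regexp.
p(/the.*ice/im.to_s)                   # "(?mi-x:the.*ice)"
```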

Exercises

For practice problems, visit the Exercises.md file from this book's repository on GitHub.
