A regex, or regular expression, is a text pattern. We use regex to search, update, or manage text, and we can combine it with the grep, sed, or awk commands.
Basic usage:
grep -E REGEX_PATTERN FILE.txt
Next, we'll look at how to create a regex pattern. Note that regex is case sensitive. I will use grep for the simple examples; it prints every line that contains a match for the pattern.
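To see the case sensitivity in action, here is a minimal sketch. The sample lines are made up for illustration, and -i is grep's standard option for ignoring case.

```shell
# Hypothetical sample input: three lines, one with a capital L.
sample=$(printf 'Linux\nlinux\nunix\n')

# Case-sensitive (the default): only the all-lowercase line matches.
case_sensitive=$(printf '%s\n' "$sample" | grep -E 'linux')

# -i ignores case, so 'Linux' matches as well.
case_insensitive=$(printf '%s\n' "$sample" | grep -Ei 'linux')

printf 'case sensitive:\n%s\n' "$case_sensitive"
printf 'case insensitive:\n%s\n' "$case_insensitive"
```

If the search should ignore case everywhere, -i is usually simpler than writing both cases into the pattern.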
1. Basic matching
- . matches any single character
# matches every line that has at least one character
grep -E '.' FILE.txt
# matches 'linux', 'lanux', etc.
grep -E 'l.nux' FILE.txt
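- \d is for a digit, 0 - 9. grep -E has no \d; it needs -P (Perl-compatible regex, GNU grep) or the class [0-9]
# matches lines containing a digit
grep -P '\d' FILE.txt
- \D is for a non-digit. It also needs -P, or the class [^0-9]
# matches lines containing a non-digit character
grep -P '\D' FILE.txt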
- \w is for a word character (letter, digit, and _)
# matches lines containing a word character
grep -E '\w' FILE.txt
- ' ' (a literal space) is for space
# matches lines containing a space
grep -E ' ' FILE.txt
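- \t is for tab. grep -E has no \t escape, so use -P (GNU grep)
# matches lines containing a tab
grep -P '\t' FILE.txt
- \r is for carriage return. It also needs -P
# matches lines containing a carriage return
grep -P '\r' FILE.txt
- \n is for new line. grep reads its input line by line, so a pattern normally never sees the newline itself; with GNU grep, -z reads the whole input as one record
# matches input that contains a new line
grep -Pz '\n' FILE.txt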
- \s is for whitespace
# matches lines containing whitespace. Whitespace includes space, tab, return, and new line
grep -E '\s' FILE.txt
- \S is for non-whitespace
# matches lines containing a non-whitespace character
grep -E '\S' FILE.txt
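Shorthand classes such as \d come from Perl-style regex and are not part of the POSIX extended syntax that grep -E implements. A portable sketch, assuming made-up sample lines, uses the POSIX bracket classes [[:digit:]] and [[:space:]] instead:

```shell
# Hypothetical sample input.
sample=$(printf 'abc\nroom 42\n7\n')

# Lines that contain at least one digit.
digit_lines=$(printf '%s\n' "$sample" | grep -E '[[:digit:]]')

# -c counts matching lines: only 'room 42' contains whitespace.
space_count=$(printf '%s\n' "$sample" | grep -cE '[[:space:]]')

# The dot stands for exactly one character, so 'lnux' does not match.
dot_lines=$(printf 'linux\nlanux\nlnux\n' | grep -E 'l.nux')

printf '%s\n' "$digit_lines" "$space_count" "$dot_lines"
```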
2. Classes
[ ] matches any one character inside the square brackets.
- [linux] matches any one of the characters inside the square brackets
# matches lines containing l, i, n, u, or x
grep -E '[linux]' FILE.txt
# matches linux and xinux
grep -E '[lx]inux' FILE.txt
- [^linux] matches any character except the ones inside the square brackets
# matches lines containing a character other than l, i, n, u, and x
grep -E '[^linux]' FILE.txt
- [a-z] matches any character in the given range
# matches lines containing a lowercase letter
grep -E '[a-z]' FILE.txt
# matches lines containing an uppercase letter
grep -E '[A-Z]' FILE.txt
# matches lines containing a digit
grep -E '[0-9]' FILE.txt
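The class rules above can be tried on a small sample. This is a sketch with made-up words; note that a negated class still has to match one character, so it selects lines that contain at least one character outside the set.

```shell
# Hypothetical sample input.
words=$(printf 'linux\nxinux\nminux\n2024\n')

# Either l or x, followed by 'inux'.
lx=$(printf '%s\n' "$words" | grep -E '[lx]inux')

# Lines with at least one character that is not a digit.
has_non_digit=$(printf '%s\n' "$words" | grep -E '[^0-9]')

# Lines with at least one digit.
has_digit=$(printf '%s\n' "$words" | grep -E '[0-9]')

printf '%s\n' "$lx" "$has_non_digit" "$has_digit"
```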
3. Boundaries
- \b is for a word boundary
- \B is for a non-boundary
- ^ is for the beginning of the line
# matches lines that start with l
grep -E '^l' FILE.txt
- $ is for the end of the line
# matches lines that end with x
grep -E 'x$' FILE.txt
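A short sketch of the anchors and the word boundary, using made-up lines. It assumes GNU grep, where \b is available as an extension to grep -E; the end-of-line anchor is written after the character, as in x$.

```shell
# Hypothetical sample input.
text=$(printf 'linux\nlinuxmint\nmy linux box\n')

# ^ anchors the match to the start of the line.
starts_with_l=$(printf '%s\n' "$text" | grep -E '^l')

# $ anchors the match to the end of the line.
ends_with_x=$(printf '%s\n' "$text" | grep -E 'x$')

# \b (GNU extension) requires 'linux' to stand as a whole word.
whole_word=$(printf '%s\n' "$text" | grep -E '\blinux\b')

printf '%s\n' "$starts_with_l" "$ends_with_x" "$whole_word"
```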
4. Disjunction
- | is for or
# matches linux or unix
grep -E 'linux|unix' FILE.txt
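Because grep matches anywhere in the line, an alternation also selects longer words that merely contain one of the alternatives. A sketch with made-up words, where parentheses group the alternatives so that ^ and $ apply to both:

```shell
# Hypothetical sample input.
words=$(printf 'linux\nunix\nminix\nunixlike\n')

# Matches any line that contains 'linux' or 'unix'.
either=$(printf '%s\n' "$words" | grep -E 'linux|unix')

# Grouping plus anchors: the whole line must be 'linux' or 'unix'.
exact=$(printf '%s\n' "$words" | grep -E '^(linux|unix)$')

printf '%s\n' "$either" "$exact"
```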
5. Quantifier
- * is for zero or more repetitions
# matches lnux, linux, liinux, etc.
grep -E 'li*nux' FILE.txt
- + is for one or more repetitions
# matches linux, liinux, etc., but does not match lnux
grep -E 'li+nux' FILE.txt
- ? is for zero or one instance
# matches lnux and linux
grep -E 'li?nux' FILE.txt
- {n} is for exactly n instances
# matches linuxlinuxlinux
grep -E '(linux){3}' FILE.txt
# matches linuxxx
grep -E 'linux{3}' FILE.txt
- {n,} is for at least n instances
# matches linuxxx with 3 or more x
grep -E 'linux{3,}' FILE.txt
- {m,n} is for between m and n instances
# matches linuxx with 2 to 4 x
grep -E 'linux{2,4}' FILE.txt
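The quantifiers can be compared side by side on a small sample. This is a sketch with made-up words; it also shows that an unanchored {3} still matches a line with more repetitions, because the pattern only needs to match part of the line.

```shell
# Hypothetical sample input.
words=$(printf 'lnux\nlinux\nliinux\n')

star=$(printf '%s\n' "$words" | grep -E 'li*nux')      # zero or more i
plus=$(printf '%s\n' "$words" | grep -E 'li+nux')      # one or more i
optional=$(printf '%s\n' "$words" | grep -E 'li?nux')  # zero or one i

xs=$(printf 'linuxx\nlinuxxx\nlinuxxxxx\n')

# Unanchored: 'linuxxxxx' contains 'linuxxx', so it matches too.
three_x=$(printf '%s\n' "$xs" | grep -E 'linux{3}')

# Anchored: the whole line must be 'linu' plus exactly three x.
exactly_three_x=$(printf '%s\n' "$xs" | grep -E '^linux{3}$')

printf '%s\n' "$star" "$plus" "$optional" "$three_x" "$exactly_three_x"
```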
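By default, quantifiers are greedy: they match as much text as possible. Example for the word stackoverflow (-o prints only the matched part):
- greedy
# matches stackoverflo, up to the last o
echo stackoverflow | grep -oE 's.*o'
- lazy (grep -E has no lazy quantifier; *? needs -P, GNU grep)
# matches stacko, up to the first o
echo stackoverflow | grep -oP 's.*?o'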
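POSIX extended regex, which grep -E uses, has no lazy quantifier. A portable way to stop at the first o is a negated class, which is a swapped-in technique rather than a true lazy match. The sketch below assumes a grep that supports -o (GNU and BSD grep both do), which prints only the matched part of the line.

```shell
# Greedy: .* runs as far as it can, so the match ends at the last 'o'.
greedy=$(echo stackoverflow | grep -oE 's.*o')

# Negated class: [^o]* cannot cross an 'o', so the match ends at the first one.
first_o=$(echo stackoverflow | grep -oE 's[^o]*o')

printf 'greedy:  %s\nfirst o: %s\n' "$greedy" "$first_o"
```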
6. Special characters
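- { } [ ] ( ) ^ $ . | * + ? \ must be escaped with \ to match them literally. Inside square brackets, grep -E already treats most of them, including \, as literal characters, so no backslash is needed there; a - inside the brackets is literal when it comes first or last
# matches a period
grep -E '\.' FILE.txt
# matches a backslash
grep -E '\\' FILE.txt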
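The difference between an unescaped and an escaped period is easy to check. In grep -E, a backslash outside square brackets escapes a special character, and a period inside brackets is already literal. A sketch with made-up file names:

```shell
# Hypothetical sample input: one real file name, one look-alike.
names=$(printf 'FILE.txt\nFILEatxt\n')

# Unescaped dot matches any character, so both lines match.
unescaped_count=$(printf '%s\n' "$names" | grep -cE 'FILE.txt')

# Escaped dot matches only a literal period.
escaped=$(printf '%s\n' "$names" | grep -E 'FILE\.txt')

# Inside brackets the dot is literal without a backslash.
bracketed=$(printf '%s\n' "$names" | grep -E 'FILE[.]txt')

printf '%s\n' "$unescaped_count" "$escaped" "$bracketed"
```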
7. Given two criteria
We can combine two or more patterns by piping one grep into another.
# matches lines that begin with a digit and contain the word linux
grep -E '^[0-9]' FILE.txt | grep -E 'linux'
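Here is a minimal sketch of combining two criteria, with made-up lines. The second grep only sees the lines the first one let through; for this particular pair of criteria a single pattern gives the same result.

```shell
# Hypothetical sample input.
lines=$(printf '1 linux\n2 unix\nlinux 3\n')

# Two filters in a pipeline: starts with a digit, then contains 'linux'.
piped=$(printf '%s\n' "$lines" | grep -E '^[0-9]' | grep -E 'linux')

# The same criteria written as one pattern.
single=$(printf '%s\n' "$lines" | grep -E '^[0-9].*linux')

printf '%s\n' "$piped" "$single"
```

The pipeline form is easier to extend when the criteria can appear in any order on the line.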