
linux : regex

Zaki Arrozi Arsyad ・3 min read

Regex, or regular expression, is a text pattern. We use regex to search, update, or manage text, and we can combine it with commands such as grep, sed, or awk.
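As a rough illustration of that last point, the sketch below feeds the same pattern to all three commands. The sample text and the variable names are made up for this example, and `sed -E` is assumed to be available (it is in GNU and BSD sed).

```shell
# Illustrative input line (not from any real file)
line='linux and unix'

# grep -o prints only the part of the line that matched
with_grep=$(echo "$line" | grep -oE 'l.nux')

# sed replaces the first match on the line
with_sed=$(echo "$line" | sed -E 's/l.nux/LINUX/')

# awk prints the first field of every line that matches
with_awk=$(echo "$line" | awk '/l.nux/ { print $1 }')

echo "$with_grep"
echo "$with_sed"
echo "$with_awk"
```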

Basic usage :

grep -E REGEX_PATTERN FILE.txt

We'll talk about how to create a regex pattern. Note that regex is case sensitive. I will use grep for simple examples; it displays every line that contains a match for the pattern.
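To make the case sensitivity concrete, here is a small sketch that builds a throwaway file and searches it. The file contents and variable names are invented for the example; `-i` and `-c` are standard grep options for ignoring case and counting matching lines.

```shell
# Create a temporary sample file (contents are illustrative)
sample=$(mktemp)
printf '%s\n' 'linux is fun' 'Linux is also fun' 'unix came first' > "$sample"

# Case sensitive by default: only the lowercase line is printed
lower_only=$(grep -E 'linux' "$sample")
echo "$lower_only"

# -i ignores case, -c counts the matching lines instead of printing them
both_count=$(grep -ciE 'linux' "$sample")
echo "$both_count"

rm -f "$sample"
```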


1. Basic matching

  • . matches any single character
# matches every line that contains at least one character
grep -E '.' FILE.txt

# matches 'linux', 'lanux', and so on
grep -E 'l.nux' FILE.txt
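  • \d is for a digit, 0 - 9. This is Perl syntax that grep -E does not understand, so use -P (GNU grep) or the class [0-9]
# matches lines that contain a digit
grep -P '\d' FILE.txt
  • \D is for a non-digit (also Perl syntax; [^0-9] is the grep -E equivalent)
# matches lines that contain a non-digit
grep -P '\D' FILE.txt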
  • \w is for a word character (letter, digit, or _)
# matches lines that contain a word character
grep -E '\w' FILE.txt
  • ' ' (a literal space) is for space
# matches lines that contain a space
grep -E ' ' FILE.txt
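  • \t is for tab. grep -E does not translate \t into a tab, so use -P (GNU grep)
# matches lines that contain a tab
grep -P '\t' FILE.txt
  • \r is for carriage return (also needs -P)
# matches lines that contain a carriage return
grep -P '\r' FILE.txt
  • \n is for new line. grep reads its input line by line, so the pattern never sees the newline that ends a line
# in ordinary line-by-line mode this is not expected to match anything
grep -P '\n' FILE.txt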
  • \s is for whitespace
# matches lines that contain whitespace. Whitespace includes space, tab, return, and new line
grep -E '\s' FILE.txt
  • \S is for non-whitespace
# matches lines that contain a non-whitespace character
grep -E '\S' FILE.txt
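The shorthands above differ between grep builds, so a portable sketch may help. The example below uses the POSIX class `[[:digit:]]` in place of `\d`, plus `-o` to print only the matched text. The sample text and variable names are assumptions made for the illustration.

```shell
# Illustrative input text
text='linux 2.6 was released in 2003'

# -o prints each match on its own line; paste joins them with spaces
digits=$(printf '%s\n' "$text" | grep -oE '[[:digit:]]+' | paste -sd ' ' -)
echo "$digits"

# The dot stands for exactly one character, so 'lnux' is not counted
dot_count=$(printf '%s\n' 'lanux' 'linux' 'lnux' | grep -cE 'l.nux')
echo "$dot_count"
```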

2. Classes

[ ] -> matches any one of the characters inside the square brackets.

  • [linux] matches any one of the characters l, i, n, u, x
# matches lines that contain l, i, n, u, or x
grep -E '[linux]' FILE.txt

# matches linux and xinux
grep -E '[lx]inux' FILE.txt
  • [^linux] matches any character except those inside the square brackets
# matches lines that contain a character other than l, i, n, u, and x
grep -E '[^linux]' FILE.txt
  • [a-z] matches any character in the given range
# matches lines that contain a lowercase letter
grep -E '[a-z]' FILE.txt

# matches lines that contain an uppercase letter
grep -E '[A-Z]' FILE.txt

# matches lines that contain a digit
grep -E '[0-9]' FILE.txt
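The classes above can be tried on a handful of words, as in the sketch below. The word list and variable names are invented for the example. `LC_ALL=C` is set as a precaution, on the assumption that ranges such as `[a-z]` may behave differently under other locales.

```shell
# Illustrative word list, one word per line
words=$(printf '%s\n' 'linux' 'xinux' 'minux' 'LINUX' '12345')

# [lx]inux: first letter must be l or x
lx=$(printf '%s\n' "$words" | LC_ALL=C grep -E '[lx]inux' | paste -sd ' ' -)

# [^a-z]: lines with at least one character that is not a lowercase letter
non_lower=$(printf '%s\n' "$words" | LC_ALL=C grep -E '[^a-z]' | paste -sd ' ' -)

# [0-9]: count the lines that contain a digit
digit_lines=$(printf '%s\n' "$words" | LC_ALL=C grep -cE '[0-9]')

echo "$lx"
echo "$non_lower"
echo "$digit_lines"
```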

3. Boundaries

  • \b is for a word boundary (the edge between a word character and a non-word character)

  • \B is for a non-boundary

  • ^ is for the beginning of the line

# matches lines that start with l
grep -E '^l' FILE.txt
  • $ is for the end of the line
# matches lines that end with x
grep -E 'x$' FILE.txt
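Since the boundary bullets have no example of their own, here is a minimal sketch that compares `\b` with the line anchors. It assumes GNU grep, where `\b` is available with -E; the input lines and variable names are illustrative.

```shell
# Illustrative input lines
lines=$(printf '%s\n' 'linux' 'gnu linux' 'linuxes' 'unix')

# \blinux\b: "linux" as a whole word, so 'linuxes' is excluded
whole_word=$(printf '%s\n' "$lines" | grep -cE '\blinux\b')

# ^linux$: the entire line must be exactly "linux"
exact_line=$(printf '%s\n' "$lines" | grep -cE '^linux$')

# x$: the line must end with x
ends_with_x=$(printf '%s\n' "$lines" | grep -cE 'x$')

echo "$whole_word $exact_line $ends_with_x"
```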

4. Disjunction

  • | is for or
# matches lines that contain linux or unix
grep -E 'linux|unix' FILE.txt
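One detail worth seeing in practice is how | interacts with anchors: wrapping the alternatives in parentheses makes ^ and $ apply to both of them. The sketch below uses an invented list of names to show the difference.

```shell
# Illustrative input lines
os=$(printf '%s\n' 'linux' 'unix' 'minix' 'gnu/linux')

# Either word anywhere in the line
either=$(printf '%s\n' "$os" | grep -cE 'linux|unix')

# Grouped and anchored: the whole line must be one of the two words
grouped=$(printf '%s\n' "$os" | grep -cE '^(linux|unix)$')

echo "$either $grouped"
```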

5. Quantifier

  • * is for zero or more repetitions
# matches lnux, linux, liinux, etc.
grep -E 'li*nux' FILE.txt
  • + is for one or more repetitions
# matches linux, liinux, etc., but does not match lnux
grep -E 'li+nux' FILE.txt
  • ? is for zero or one instance
# matches lnux and linux
grep -E 'li?nux' FILE.txt
  • {n} is for exactly n instances
# matches linuxlinuxlinux
grep -E '(linux){3}' FILE.txt

# matches linuxxx
grep -E 'linux{3}' FILE.txt
  • {n,} is for at least n instances
# matches linu followed by 3 or more x
grep -E 'linux{3,}' FILE.txt
  • {m,n} is for between m and n instances
# matches linu followed by 2 to 4 x
grep -E 'linux{2,4}' FILE.txt
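The quantifiers above can be compared side by side on one small list. This is only a sketch: the candidate words and variable names are made up, and each pattern is anchored with ^ and $ so that the whole line has to fit.

```shell
# Illustrative candidates, one per line
candidates=$(printf '%s\n' 'lnux' 'linux' 'liinux' 'linuxx' 'linuxxx')

# i*: zero or more i
star=$(printf '%s\n' "$candidates" | grep -cE '^li*nux$')

# i+: one or more i
plus=$(printf '%s\n' "$candidates" | grep -cE '^li+nux$')

# i?: zero or one i
optional=$(printf '%s\n' "$candidates" | grep -cE '^li?nux$')

# x{2,3}: the quantifier applies only to the x right before it
range=$(printf '%s\n' "$candidates" | grep -E '^linux{2,3}$' | paste -sd ' ' -)

echo "$star $plus $optional"
echo "$range"
```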

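By default, quantifiers are greedy: they match as much text as possible. A lazy quantifier (written with a trailing ?) matches as little as possible; it is Perl syntax, so it needs grep -P. Example for the word stackoverflow, with -o to print only the matched part:

  • greedy
# prints stackoverflo
echo stackoverflow | grep -oE 's.*o'
  • lazy
# prints stacko
echo stackoverflow | grep -oP 's.*?o'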
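Where Perl-style lazy quantifiers are not available, a negated class is one common way to get the shortest match with plain grep -E. The sketch below is a substitute technique, not the lazy quantifier itself; the variable names are illustrative.

```shell
# Greedy: .* runs to the last o in the word
greedy=$(echo stackoverflow | grep -oE 's.*o')

# Shortest match without lazy syntax: [^o]* cannot run past the first o
shortest=$(echo stackoverflow | grep -oE 's[^o]*o')

echo "$greedy"
echo "$shortest"
```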

6. Special characters

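  • { } [ ] ( ) ^ $ . | * + ? \ are special characters; to match one of them literally, escape it with \. Inside square brackets most of them are already literal in grep, and a backslash there is matched as a backslash
# matches lines that contain a period
grep -E '\.' FILE.txt

# matches lines that contain a backslash
grep -E '\\' FILE.txt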
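A short sketch of why escaping matters: an unescaped dot still matches any character, so it can accept more lines than intended. The sample names below are invented for the illustration.

```shell
# Illustrative input lines; the last one contains a literal backslash
names=$(printf '%s\n' 'FILE.txt' 'FILEStxt' 'a\b')

# Unescaped dot matches any character, so both FILE lines count
unescaped=$(printf '%s\n' "$names" | grep -cE 'FILE.txt')

# Escaped dot matches only a real period
escaped=$(printf '%s\n' "$names" | grep -cE 'FILE\.txt')

# Two backslashes in the pattern match one literal backslash
backslash=$(printf '%s\n' "$names" | grep -E '\\')

echo "$unescaped $escaped"
printf '%s\n' "$backslash"
```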

7. Given two criteria

We can combine two or more patterns by piping one grep into another.

# matches lines that begin with a digit and contain the word linux
grep -E '^[0-9]' FILE.txt | grep -E 'linux'
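For this particular pair of criteria, the pipeline can also be folded into a single pattern. The sketch below compares the two forms on invented input; the single-pattern version relies on the digit coming before the word, so it is not a general replacement for chaining.

```shell
# Illustrative input lines
notes=$(printf '%s\n' '1 linux' 'linux 2' '3 unix' '0 linux kernel')

# Two greps chained: starts with a digit, then contains linux
piped=$(printf '%s\n' "$notes" | grep -E '^[0-9]' | grep -E 'linux' | paste -sd ',' -)

# One pattern: a leading digit, anything, then linux
single=$(printf '%s\n' "$notes" | grep -E '^[0-9].*linux' | paste -sd ',' -)

echo "$piped"
echo "$single"
```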
