Zaki Arrozi Arsyad

Posted on May 27, 2020 • Edited on Jun 21, 2020

linux : regex

#linux #regex #devops #cli

Regex, or regular expression, is a text pattern. We use regex to search, update, or manage text, and we can combine it with the grep, sed, or awk commands.

Basic usage :

grep -E REGEX_PATTERN FILE.txt

Let's look at how to create a regex pattern. One thing to note: regex is case sensitive. I will use grep for simple examples. It prints every line that contains a match for the pattern.
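To try the commands in this post, a small test file helps. The sketch below is only an illustration: the file contents and the variable names (sample_file, sensitive, insensitive) are made up and are not part of the original examples. It also uses the -i option, which makes grep ignore case.

```shell
# Hypothetical sample file, created in a temporary location.
sample_file=$(mktemp)
printf '%s\n' 'Linux kernel' 'linux distro' 'macOS' > "$sample_file"

# Case sensitive (the default): only the lowercase line matches.
sensitive=$(grep -E 'linux' "$sample_file")
printf '%s\n' "$sensitive"

# -i ignores case, so 'Linux kernel' matches as well.
insensitive=$(grep -iE 'linux' "$sample_file")
printf '%s\n' "$insensitive"

rm -f "$sample_file"
```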

1. Basic matching

. matches any single character

# matched with any line that has at least one character
grep -E '.' FILE.txt

# matched with 'linux' or 'lanux' or etc
grep -E 'l.nux' FILE.txt
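A quick way to see what . accepts is to feed grep a few words directly. This is a minimal sketch; the word list and variable names are assumptions chosen for the demo.

```shell
# Hypothetical sample words, one per line.
sample=$(printf '%s\n' linux lanux lnux 'l nux')

# '.' needs exactly one character between 'l' and 'nux'.
# A space counts as a character, but 'lnux' has nothing in that position.
dot_matches=$(printf '%s\n' "$sample" | grep -E 'l.nux')
printf '%s\n' "$dot_matches"
```

Here lnux is the only word that is not printed.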

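\d is for a digit in 0 - 9. grep -E does not support \d, so use grep -P (Perl-compatible regex in GNU grep) or write [0-9]

# matched with lines that contain a digit
grep -P '\d' FILE.txt

\D is for a non-digit

# matched with lines that contain a non-digit
grep -P '\D' FILE.txt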
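The \d and \D shorthands come from Perl-style regex and depend on the grep mode in use. A portable alternative, sketched here with made-up sample lines, is the POSIX class [[:digit:]] and its negation.

```shell
# Hypothetical sample lines.
lines=$(printf '%s\n' 'abc' 'abc123' '42')

# Lines that contain at least one digit.
digit_lines=$(printf '%s\n' "$lines" | grep -E '[[:digit:]]')
printf '%s\n' "$digit_lines"

# Lines that contain at least one character that is not a digit.
nondigit_lines=$(printf '%s\n' "$lines" | grep -E '[^[:digit:]]')
printf '%s\n' "$nondigit_lines"
```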

\w is for word ( letter, digit, and _ )

# matched with all word
grep -E '\w' FILE.txt

A literal space character is for space

# matched with space
grep -E ' ' FILE.txt

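\t is for tab. grep -E does not understand \t, so use grep -P (Perl-compatible regex in GNU grep)

# matched with tab
grep -P '\t' FILE.txt

\r is for return (carriage return)

# matched with return
grep -P '\r' FILE.txt

\n is for new line. grep checks one line at a time, without its trailing new line, so \n needs -z (GNU grep) to read the whole file as one record

# matched with new line
grep -Pz '\n' FILE.txt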

\s is for whitespace

# matched with whitespace. Include space, tab, return, and new line
grep -E '\s' FILE.txt

\S is for non whitespace

# matched with non whitespace
grep -E '\S' FILE.txt
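Escapes such as \t are not understood by every grep mode. One workaround, sketched below under the assumption of a POSIX shell, is to let printf produce the real tab character and pass it to grep inside the pattern. The sample lines and variable names are invented for the demo.

```shell
# A real tab character, produced by printf.
tab=$(printf '\t')

# Hypothetical sample: one line with a space, one with a tab, one with neither.
lines=$(printf 'one two\nthree\tfour\nfive\n')

# Literal space.
space_lines=$(printf '%s\n' "$lines" | grep -E ' ')

# Literal tab, inserted into the pattern by the shell.
tab_lines=$(printf '%s\n' "$lines" | grep -E "$tab")

# Any whitespace, using the POSIX class; -c counts the matching lines.
ws_count=$(printf '%s\n' "$lines" | grep -c -E '[[:space:]]')

printf '%s\n' "$space_lines" "$tab_lines" "$ws_count"
```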

2. Classes

[ ] matches any one of the characters inside the square bracket.

[linux] matches a single l, i, n, u, or x. It does not match the word linux as a whole

# matched with any characters inside the square bracket
grep -E '[linux]' FILE.txt

# matched with linux and xinux
grep -E '[lx]inux' FILE.txt

[^linux] matched any characters except characters inside the square bracket

# matched with any characters except l, i, n, u, and x
grep -E '[^linux]' FILE.txt

[a-z] matched anything in the range of characters

# matched with all lowercase
grep -E '[a-z]' FILE.txt

# matched with all uppercase
grep -E '[A-Z]' FILE.txt

# matched with all digit
grep -E '[0-9]' FILE.txt
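The classes above can be tried on a short word list. This is a sketch with assumed sample words. LC_ALL=C is set for the range examples because in some locales a range like [A-Z] may also cover lowercase letters.

```shell
# Hypothetical sample words.
words=$(printf '%s\n' linux xinux minux LINUX linux2)

# 'l' or 'x' followed by 'inux' ('minux' and 'LINUX' do not match).
lx_matches=$(printf '%s\n' "$words" | grep -E '[lx]inux')

# Lines with at least one uppercase letter.
upper_matches=$(printf '%s\n' "$words" | LC_ALL=C grep -E '[A-Z]')

# Lines with at least one digit.
digit_matches=$(printf '%s\n' "$words" | grep -E '[0-9]')

# Lines with at least one character that is NOT a lowercase letter.
not_lower=$(printf '%s\n' "$words" | LC_ALL=C grep -E '[^a-z]')

printf '%s\n' "$lx_matches" "$upper_matches" "$digit_matches" "$not_lower"
```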

3. Boundaries

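\b is for a word boundary (the edge between a word character and a non-word character)
\B is for a non-boundary (a position that is not a word boundary)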
^ is for the beginning of the line

# matched with anything started with l
grep -E '^l' FILE.txt

$ is for the end of the line

# matched with anything ending with x
grep -E 'x$' FILE.txt
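The anchors and the word boundary can be compared side by side. The sketch below uses invented sample lines, and it assumes a grep that accepts \b in -E mode (GNU grep does, as an extension).

```shell
# Hypothetical sample lines.
lines=$(printf '%s\n' 'linux is fun' 'i use linux' 'unix' 'linuxes')

# ^ anchors the match to the beginning of the line.
starts_with_l=$(printf '%s\n' "$lines" | grep -E '^l')

# $ anchors the match to the end of the line.
ends_with_x=$(printf '%s\n' "$lines" | grep -E 'x$')

# \b on both sides matches 'linux' only as a whole word, so 'linuxes' is skipped.
whole_word=$(printf '%s\n' "$lines" | grep -E '\blinux\b')

printf '%s\n' "$starts_with_l" "$ends_with_x" "$whole_word"
```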

4. Disjunction

| is for or

# matched with linux and unix
grep -E 'linux|unix' FILE.txt
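When | is combined with anchors, parentheses decide what the anchors apply to. A small sketch with made-up sample lines:

```shell
# Hypothetical sample lines.
lines=$(printf '%s\n' 'linux' 'unix' 'windows' 'gnu/linux')

# Either word anywhere in the line.
either=$(printf '%s\n' "$lines" | grep -E 'linux|unix')

# Grouping makes ^ and $ apply to both alternatives,
# so the line must be exactly 'linux' or exactly 'unix'.
whole_line=$(printf '%s\n' "$lines" | grep -E '^(linux|unix)$')

printf '%s\n' "$either" "$whole_line"
```

Without the parentheses, '^linux|unix$' would mean "starts with linux" or "ends with unix".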

5. Quantifier

* is for zero or more repetition

# matched with lnux, linux, liinux, etc
grep -E 'li*nux' FILE.txt

+ is for one or more repetition

# matched with linux, liinux, etc, but doesn't match lnux
grep -E 'li+nux' FILE.txt

? is for zero or one instances

# matched with lnux and linux
grep -E 'li?nux' FILE.txt

{n} is for exactly n instances

# matched with linuxlinuxlinux
grep -E '(linux){3}' FILE.txt

# matched with linuxxx
grep -E 'linux{3}' FILE.txt

{n,} is for at least n instances

# matched with linuxxx with 3 or more x
grep -E 'linux{3,}' FILE.txt

{m,n} is for between m and n instances

# matched with linuxx with 2 to 4 x
grep -E 'linux{2,4}' FILE.txt
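The quantifiers are easier to compare when every pattern runs against the same word list. In this sketch the words and variable names are assumptions, and ^ and $ are added so that each pattern has to cover the whole line.

```shell
# Hypothetical sample words with zero to three 'i' characters.
words=$(printf '%s\n' lnux linux liinux liiinux)

star=$(printf '%s\n' "$words" | grep -E '^li*nux$')          # zero or more i
plus=$(printf '%s\n' "$words" | grep -E '^li+nux$')          # one or more i
optional=$(printf '%s\n' "$words" | grep -E '^li?nux$')      # zero or one i
exact=$(printf '%s\n' "$words" | grep -E '^li{2}nux$')       # exactly two i
at_least=$(printf '%s\n' "$words" | grep -E '^li{2,}nux$')   # two or more i
between=$(printf '%s\n' "$words" | grep -E '^li{1,2}nux$')   # one or two i

printf '%s\n' "$exact"
```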

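By default, quantifiers are greedy: they match as much text as possible. A lazy quantifier (add ? after it) matches as little as possible. grep -E has no lazy quantifiers, so the lazy example uses grep -P. Example for the word stackoverflow, with -o to print only the matched part

greedy

# matched with stackoverflo
echo stackoverflow | grep -oE 's.*o'

lazy

# matched with stacko
echo stackoverflow | grep -oP 's.*?o'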
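Lazy quantifiers belong to Perl-style regex, which not every grep supports. A common workaround, sketched here, is a negated class: [^o]* cannot run past the first o, so the match stops as early as a lazy one would. The -o option (available in GNU and BSD grep) prints only the matched part. The variable names are made up.

```shell
# Greedy: .* runs to the last 'o' in the word.
greedy=$(printf '%s\n' stackoverflow | grep -oE 's.*o')
printf '%s\n' "$greedy"

# Shortest match without lazy quantifiers: stop at the first 'o'.
shortest=$(printf '%s\n' stackoverflow | grep -oE 's[^o]*o')
printf '%s\n' "$shortest"
```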

6. Special characters

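{ } [ ] ( ) ^ $ . | * + ? \ have a special meaning and must be escaped with \ to match them literally. Inside the square bracket most of them are already literal, and grep -E treats \ there as a plain backslash, so [.] matches a period while [\.] matches a period or a backslash. A - inside the square bracket forms a range, so put it first or last to match it literally

# matched with period
grep -E '\.' FILE.txt

# matched with backslash
grep -E '\\' FILE.txt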
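The effect of escaping shows up best next to the unescaped pattern. This is a sketch with invented sample lines. It also uses grep -F, which treats the whole pattern as a fixed string, so nothing needs escaping.

```shell
# Hypothetical sample lines (printf %s keeps the backslash as-is).
lines=$(printf '%s\n' 'FILE.txt' 'FILEatxt' 'a+b' 'aab' 'C:\temp')

# Unescaped '.' matches any character: both FILE lines are counted.
dot_any_count=$(printf '%s\n' "$lines" | grep -c -E 'FILE.txt')

# Escaped '.' matches only a real period.
dot_literal=$(printf '%s\n' "$lines" | grep -E 'FILE\.txt')

# Unescaped '+' is a quantifier: one or more 'a' followed by 'b'.
plus_quantifier=$(printf '%s\n' "$lines" | grep -E 'a+b')

# Escaped '+' is a literal plus sign.
plus_literal=$(printf '%s\n' "$lines" | grep -E 'a\+b')

# A literal backslash is written as '\\'.
backslash=$(printf '%s\n' "$lines" | grep -E '\\')

# -F switches regex off completely.
fixed=$(printf '%s\n' "$lines" | grep -F 'a+b')

printf '%s\n' "$dot_literal" "$plus_literal" "$backslash"
```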

7. Given two criteria

We can combine 2 or more patterns by piping the output of one grep into another

# matched with text beginning with a digit and have linux word
grep -E '^[0-9]' FILE.txt | grep -E 'linux'
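When the order of the two criteria is known, a single pattern can do the same job as the pipe. The sketch below compares both approaches on made-up sample lines. It only works here because the digit comes before the word linux.

```shell
# Hypothetical sample lines.
lines=$(printf '%s\n' '1 linux' '2 windows' 'linux 3' '0 gnu linux')

# Two greps: the second one filters the output of the first.
piped=$(printf '%s\n' "$lines" | grep -E '^[0-9]' | grep -E 'linux')

# One grep: a digit at the start, then anything, then 'linux'.
single=$(printf '%s\n' "$lines" | grep -E '^[0-9].*linux')

printf '%s\n' "$piped"
```

The pipe is the more flexible form, because it does not care where in the line each pattern matches.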
