DEV Community

Cover image for Yet Another PostgreSQL-REGEX Article ๐Ÿคบ
Kaustubh Joshi
Kaustubh Joshi

Posted on

Yet Another PostgreSQL-REGEX Article ๐Ÿคบ

๐Ÿ˜„ Introduction

World famous Oxford English Dictionary defines the word Regular as,

Characterized by evenness, order, or harmony in physical form, structure, or organization; arranged in or constituting a constant or definite pattern.

But when it comes to 'Regular' Expressions, they are the most uneven, disordered, and dissonance structures in coding I have ever seen.
But again, When it comes to data cleaning, extracting data from a huge string or working on simple text parsing, nothing else can compete with the power of Regular Expressions.


๐Ÿ› Relevant(?) XKCD

Image description

PostgreSQL regular expressions are a combination of LIKE notations and POSIX regular expression notations.
This post will provide you all the information you need to understand these notations a better and a simpler way.


๐Ÿ’ปMetacharacters and Operators

Match Operators:

  • ~ - String matches regular expression ```python

Example:

select 'thomas' ~ 't.*ma';
Enter fullscreen mode Exit fullscreen mode

Output:

-- true


- **~*** - String matches regular expression, case-insensitively
```python


#Example:
    select 'thomas' ~* 'T.*ma';
#Output:
#-- true


Enter fullscreen mode Exit fullscreen mode
  • !~ - String does not match regular expression ```python

Example:

select 'thomas' !~ 't.*max';
Enter fullscreen mode Exit fullscreen mode

Output:

-- true


- **!~*** - String does not match regular expression, case-insensitively
```python


#Example:
    select 'thomas' !~* 'T.*ma';
#Output:
#-- false


Enter fullscreen mode Exit fullscreen mode

POSIX Regular Expressions:

  • | - Represents alternation, either of two alternatives provided in regex. ```python

Example:

select city from taxdata where state ~ 'CA|KS';
Enter fullscreen mode Exit fullscreen mode

Output:

-- List of cities which has either CA or KS as state


- **^** - Matches begining of the line.
```python


#Example:
    select city from taxdata where city ~ '^B';
#Output:
#-- List of cities which name starts with 'B'


Enter fullscreen mode Exit fullscreen mode
  • $ - Matches end of the line. ```python

Example:

select city from taxdata where city ~ 'S$';
Enter fullscreen mode Exit fullscreen mode

Output:

-- List of cities which name ends with S


- **. (dot)** - Matches any charcter.
```python


#Example:
    select city from taxdata where city ~ '.NN';
#Output:
#-- List of cities which has character 'NN' anywhere in the name


Enter fullscreen mode Exit fullscreen mode
  • * - Denotes repetition of the character zero or more times. ```python

Example:

select city from taxdata where city ~ '.*N';
Enter fullscreen mode Exit fullscreen mode

Output:

-- List of cities which has zero or more repetition of character 'N'


>  **\*?** - Denotes repetition of the character **zero or more times** in **non-greedy way.**

- **\+** - Denotes repetition of the character **one or more times**.
```python


#Example:
    select city from taxdata where city ~ '.+N';
#Output:
#-- List of cities which has one or more repetition of character 'N'


Enter fullscreen mode Exit fullscreen mode

+? - Denotes repetition of the character one or more times in non-greedy way.

  • ? - Denotes repetition of the character zero or one time. ```python

Example:

select city from taxdata where city ~ '.?N';
Enter fullscreen mode Exit fullscreen mode

Output:

-- List of cities which has zero or one repetition of character 'N'


---

**_<u>Working with Item Set:</u>_**

- **{m}** -  Denotes repetition of the previous item exactly m times.
```python


#Example:
    select city from taxdata where city ~ 'N{2}';
#Output:
#-- List of cities which has 2 repetition of character 'N'


Enter fullscreen mode Exit fullscreen mode
  • {m,} - Denotes repetition of the previous item m or more times. ```python

Example:

select city from taxdata where city ~ 'N{2,}';
Enter fullscreen mode Exit fullscreen mode

Output:

-- List of cities which has 2 or more repetition of character 'N'


- **{m,n}** - Denotes repetition of the previous item at least m and not more than n times.
```python


#Example:
    select city from taxdata where city ~ 'N{2,5}';
#Output:
#-- List of cities which has 2 to 5 repetition of character 'N'


Enter fullscreen mode Exit fullscreen mode
  • () - Parentheses () can be used to group items into a single logical item. ```python

Example:

select distinct city from taxdata where city similar to '%(Long|New)%';
Enter fullscreen mode Exit fullscreen mode

Output:

-- List of cities which has 'Long' or 'New' in there name


- **(** - Indicates string extraction is to start
- **)** - Indicates string extraction is to end
```python


#Example:
    select substring(email from '.+@(.*)$') from taxdata;
#Output:
#-- Returns domain name from email


Enter fullscreen mode Exit fullscreen mode
  • [...] - Can be used to match characters in listed set ```python

Example:

select substring(email from '[0-9]') from taxdata;
Enter fullscreen mode Exit fullscreen mode

Output:

-- Returns emails which has any one of 0-9 digit


- **regexp_matches(input, regex)** - Returns a text array of matching substring(s) within the first match of a POSIX regular expression pattern to a string
```python


#Example:
    SELECT regexp_match('foobarbequebaz', '(bar)(beque)');
#Output:
#-- Returns output {bar,beque}


Enter fullscreen mode Exit fullscreen mode

๐Ÿค” Greedy and Lazy

  • greedy - Greedy quantifiers first tries to repeat the token as many times as possible, and gradually gives up matches as the engine backtracks to find an overall match.
    tl;dr; Keep searching until condition is not satisfied.

  • lazy - Lazy quantifier first repeats the token as few times as required, and gradually expands the match as the engine backtracks through the regex to find an overall match.
    tl;dr; Stop searching once condition is satisfied.


๐Ÿ‘โ€๐Ÿ—จ Conclusion

This article should provide you with a birds-eye-view of different regular expressions and notations you can use in your Postgres queries.
I hope you've found this post helpful and do not hesitate to revisit whenever you are stuck finding the correct way to write your regex.๐Ÿ˜‰

NOTE: I would really encourage you to practice these operators on your own dataset or any real life examples. Remember practice is the key to become the hero who saves the day with REGEX.

HAPPY CODING!!!โค๏ธ

Image description


๐Ÿ“š References

Top comments (0)