Need to extract structured data from messy text without leaving SQL? PostgreSQL’s REGEXP_MATCH
returns the first match as a text[], mapping capturing groups to array slots.
This is the function you reach for when LIKE
is too weak and you want patterns you can version, test, and reuse.
Below: minimal syntax, copy-paste queries, and the trade-offs to watch when moving regex logic into the database.
Minimal Syntax & Behavior
regexp_match(input_string, posix_pattern [, flags ])
- Flags include
'i'
(case-insensitive), others per docs. - Returns
NULL
(no match) or atext[]
. Use[1]
to unwrap the whole match.
Copy-Paste Queries
Phone (single match):
SELECT regexp_match('Office: 212-555-0199 ext.7', '[0-9]{3}[-.●]?[0-9]{3}[-.●]?[0-9]{4}')[1] AS phone;
-- '212-555-0199'
Email:
SELECT regexp_match('email: ops@company.dev', '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}')[1] AS email;
-- 'ops@company.dev'
URL (case-insensitive):
SELECT regexp_match('Docs → HTTPS://dev.to', 'http[s]?:\\/\\/[A-Za-z0-9.-]+\\.[A-Za-z]{2,}', 'i')[1] AS url;
-- 'HTTPS://dev.to'
All matches?
SELECT regexp_matches('Call 987-654-3210 or 555-872-2310',
'[0-9]{3}[-.●]?[0-9]{3}[-.●]?[0-9]{4}', 'g') AS phones;
-- returns a set of text[] rows
Gotchas & Performance Notes
-
'g'
is not supported byREGEXP_MATCH
; switch toregexp_matches
. - Prefer
regexp_substr
(PG 15+) when you just need the matched string. - Benchmark heavy patterns; regex scans don’t typically use B-Tree indexes.
- Normalize common patterns into views or functions for reuse.
FAQ
What’s the difference vs REGEXP_LIKE
?
REGEXP_LIKE
is boolean; it doesn’t extract. REGEXP_MATCH
returns the matched substring(s) as an array.
Can I get named groups?
PostgreSQL POSIX regex doesn’t expose named captures in REGEXP_MATCH
. Use capture order, or switch to regexp_replace
with backrefs for reshaping.
Is this portable to other DBs?
Not 1:1. Behavior/flags differ across engines; verify before porting.
When should I avoid regex?
When a simple operator or parsed datatype works better (e.g., inet
, uuid
, JSON operators). Regex is flexible but can be slower and harder to index.
Conclusion
For developers, REGEXP_MATCH
is a lightweight alternative to app-side parsing, moving regex work closer to the data. It simplifies ETL, validation, and text-heavy queries, while REGEXP_MATCHES
and REGEXP_SUBSTR
cover multi-match and direct string extraction. With thoughtful use, it reduces boilerplate and improves query pipelines. Full guide here: REGEXP_MATCH SQL Function: The Complete PostgreSQL Guide.
Top comments (0)