DEV Community

MEROLINE LIZLENT


Mastering Regular Expressions in Go

Go ships with a built-in regexp package that implements regular expressions using RE2 syntax and semantics. That single, self-contained design choice is what makes Go regexps safe, predictable, and production-ready.
Matching is guaranteed to run in time linear in the input length (O(n)), so there is no catastrophic backtracking and no ReDoS vulnerability, even with untrusted user input. The trade-off is that backtracking-only features such as backreferences (\1) and lookarounds are not supported.

1. Compile vs MustCompile
Before matching, a pattern must be compiled into a *regexp.Regexp. Two functions handle this:

// Compile → returns an error. Use for dynamic / user-supplied patterns.
re, err := regexp.Compile(`\d{3}-\d{4}`)
if err != nil {
    log.Fatalf("invalid pattern: %v", err)
}
fmt.Println(re.MatchString("call 555-1234")) // true

// MustCompile → panics on an invalid pattern. Use for known-good package-level vars.
var phoneRe = regexp.MustCompile(`\d{3}-\d{4}`)

// Invalid pattern example
_, badErr := regexp.Compile(`[invalid`)
fmt.Println(badErr)
// error parsing regexp: missing closing ]: `[invalid`

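For patterns that come from users, you usually want to return the error to the caller rather than exit with log.Fatal. A minimal sketch, assuming hypothetical helpers `compileUserFilter` and `literalFilter`; the second one uses `regexp.QuoteMeta`, which escapes every metacharacter when the user's text should be matched literally:

```go
package main

import (
	"fmt"
	"regexp"
)

// compileUserFilter (hypothetical) compiles a pattern that arrived at runtime
// and returns a wrapped error instead of exiting the program.
func compileUserFilter(pattern string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("bad filter %q: %w", pattern, err)
	}
	return re, nil
}

// literalFilter (hypothetical) matches the user's text verbatim by
// escaping all metacharacters with QuoteMeta before compiling.
func literalFilter(text string) (*regexp.Regexp, error) {
	return compileUserFilter(regexp.QuoteMeta(text))
}

func main() {
	if _, err := compileUserFilter(`[invalid`); err != nil {
		fmt.Println(err)
	}

	lit, err := literalFilter("1+1=2")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(lit.String())             // 1\+1=2
	fmt.Println(lit.MatchString("1+1=2")) // true
	fmt.Println(lit.MatchString("11=2"))  // false
}
```

Without QuoteMeta, the pattern 1+1=2 would also match "11=2", because + is read as "one or more".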

2. Basic Matching
Three levels of match output — boolean, first match, all matches:

var digitRe = regexp.MustCompile(`\d+`)

text := "Order #4821 placed on 2024-06-15, total: $99.99"

fmt.Println(digitRe.MatchString(text))      // true
fmt.Println(digitRe.FindString(text))       // "4821"
fmt.Println(digitRe.FindAllString(text, -1)) // [4821 2024 06 15 99 99]
fmt.Println(digitRe.FindAllString(text, 2))  // [4821 2024]  (first 2)
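One subtlety with FindString: it returns an empty string both when nothing matches and when the pattern legitimately matches the empty string. The package docs suggest FindStringIndex (or FindStringSubmatch) to tell the two cases apart; a small sketch:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	digitsRe   = regexp.MustCompile(`\d+`) // needs at least one digit
	optionalRe = regexp.MustCompile(`\d*`) // can match the empty string
)

func main() {
	s := "abc"

	// Both print "", for different reasons.
	fmt.Printf("%q\n", digitsRe.FindString(s))   // no match at all
	fmt.Printf("%q\n", optionalRe.FindString(s)) // empty match at offset 0

	// The index form distinguishes them: nil means "no match".
	fmt.Println(digitsRe.FindStringIndex(s))   // []
	fmt.Println(optionalRe.FindStringIndex(s)) // [0 0]
}
```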

3. Positional (Index) Matching
Index methods return byte offsets instead of strings — useful when you need to reconstruct or replace around the match.

loc := digitRe.FindStringIndex(text)
// loc = [7, 11]  →  text[7:11] = "4821"

allLocs := digitRe.FindAllStringIndex(text, -1)
// [[7 11] [22 26] [27 29] [30 32] [42 44] [45 47]]

4. Capture Groups (Numbered)
Use FindStringSubmatch to get the full match plus the contents of every capture group. Index 0 is always the full match.
var ipRe = regexp.MustCompile(`(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})`)

m := ipRe.FindStringSubmatch("Server: 192.168.1.42")
// m[0] = "192.168.1.42"   (full match)
// m[1] = "192"             (group 1)
// m[2] = "168"             (group 2)
// m[3] = "1"               (group 3)
// m[4] = "42"              (group 4)

// FindAllStringSubmatch → every match, each with its own groups
allMatches := ipRe.FindAllStringSubmatch("from 10.0.0.1 to 10.0.0.254", -1)
// [[10.0.0.1 10 0 0 1] [10.0.0.254 10 0 0 254]]
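The pattern above checks only the shape of an address: \d{1,3} happily accepts 999. A sketch of a hypothetical `parseIPv4` helper that anchors the pattern and range-checks each captured group with strconv:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Anchored so the whole input must be an address, not merely contain one.
var ipv4Re = regexp.MustCompile(`^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$`)

// parseIPv4 (hypothetical): the regex checks the shape, strconv the range.
func parseIPv4(s string) ([4]int, error) {
	var octets [4]int
	m := ipv4Re.FindStringSubmatch(s)
	if m == nil {
		return octets, fmt.Errorf("%q: not in dotted-quad form", s)
	}
	for i := range octets {
		n, err := strconv.Atoi(m[i+1]) // m[0] is the full match
		if err != nil || n > 255 {
			return octets, fmt.Errorf("%q: octet %q out of range", s, m[i+1])
		}
		octets[i] = n
	}
	return octets, nil
}

func main() {
	fmt.Println(parseIPv4("192.168.1.42")) // [192 168 1 42] <nil>

	_, err := parseIPv4("999.1.1.1")
	fmt.Println(err) // "999.1.1.1": octet "999" out of range
}
```

For production code, net/netip.ParseAddr is the more robust choice; the regex version is here to show capture groups at work.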

5. Named Capture Groups
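Syntax: (?P<name>re). Named groups make code far more readable, and name-based lookups keep working when groups are added or reordered during refactoring. Go 1.22 and later also accept the shorter (?<name>re) form.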

var dateRe = regexp.MustCompile(
    `(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)

func ParseDate(s string) map[string]string {
    match := dateRe.FindStringSubmatch(s)
    if match == nil {
        return nil
    }

    result := make(map[string]string)
    for i, name := range dateRe.SubexpNames() {
        if name != "" {
            result[name] = match[i]
        }
    }
    return result
}

// fmt.Println(ParseDate("2024-11-28")) → map[day:28 month:11 year:2024]
// (fmt prints map keys in sorted order)

// Direct index lookup by name
yearIdx := dateRe.SubexpIndex("year") // 1
d := dateRe.FindStringSubmatch("2024-11-28")
fmt.Println(d[yearIdx]) // 2024
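Names also work inside replacement templates via ${name}. A sketch, assuming a hypothetical `toDayFirst` helper that rewrites ISO dates as DD/MM/YYYY:

```go
package main

import (
	"fmt"
	"regexp"
)

var isoDateRe = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)

// toDayFirst (hypothetical) rewrites every ISO date in s as DD/MM/YYYY.
// ${name} refers to a named group, so the template does not depend on
// the groups' numeric positions.
func toDayFirst(s string) string {
	return isoDateRe.ReplaceAllString(s, "${day}/${month}/${year}")
}

func main() {
	fmt.Println(toDayFirst("Due 2024-11-28, paid 2024-12-03"))
	// Due 28/11/2024, paid 03/12/2024
}
```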

6. Replace Methods

// 1. ReplaceAllString: the replacement is a template; $1 or ${name} expands to a group
wsRe := regexp.MustCompile(`\s+`)
wsRe.ReplaceAllString("foo   bar  baz", " ")  // "foo bar baz"

swapRe := regexp.MustCompile(`(\w+)=(\w+)`)
swapRe.ReplaceAllString("a=1 b=2", "$2=$1")  // "1=a 2=b"

// 2. ReplaceAllStringFunc — dynamic via a function
digitRe.ReplaceAllStringFunc("item1 qty5", func(s string) string {
    n, _ := strconv.Atoi(s)
    return strconv.Itoa(n * 2)
})  // "item2 qty10"

// 3. ReplaceAllLiteralString — $ signs are NOT interpreted
litRe  := regexp.MustCompile(`foo`)
litRe.ReplaceAllLiteralString("foobar", "$1")  // "$1bar"
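A common trap with $N templates: the name after $ is taken to be as long as possible, so $1x means ${1x}, not ${1}x, and an unknown group expands to nothing. A sketch with a hypothetical key=value input:

```go
package main

import (
	"fmt"
	"regexp"
)

var kvRe = regexp.MustCompile(`(\w+)=(\w+)`)

func main() {
	// "$1_name" is read as ${1_name}, a group that does not exist,
	// so it expands to the empty string.
	fmt.Println(kvRe.ReplaceAllString("user=bob", "$1_name:$2")) // :bob

	// Braces end the group reference explicitly.
	fmt.Println(kvRe.ReplaceAllString("user=bob", "${1}_name:$2")) // user_name:bob
}
```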

7. Split

sepRe := regexp.MustCompile(`[,;\s]+`)

sepRe.Split("one,two; three  four", -1)
// ["one" "two" "three" "four"]

sepRe.Split("a,b,c,d", 3)  // ["a" "b" "c,d"]  (n=3 → at most 3 pieces)
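Split keeps the empty strings produced by leading or trailing separators, and n == 0 returns nil. A sketch, with a hypothetical `splitFields` helper that drops the empty fields:

```go
package main

import (
	"fmt"
	"regexp"
)

var sepRe = regexp.MustCompile(`[,;\s]+`)

// splitFields (hypothetical) splits like sepRe.Split(s, -1) but discards
// the empty strings caused by separators at either end of s.
func splitFields(s string) []string {
	var out []string
	for _, f := range sepRe.Split(s, -1) {
		if f != "" {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	fmt.Printf("%q\n", sepRe.Split(",a;b ", -1)) // ["" "a" "b" ""]
	fmt.Printf("%q\n", splitFields(",a;b "))     // ["a" "b"]
	fmt.Println(sepRe.Split("a,b", 0) == nil)    // true
}
```

If every separator is a single character, strings.FieldsFunc is a regex-free alternative.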

8. Longest match

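re.Longest() switches a compiled Regexp from the default leftmost-first (Perl-like) semantics to leftmost-longest (POSIX) semantics for all later searches. It modifies the Regexp in place, so call it before the Regexp is shared across goroutines.

lazyRe := regexp.MustCompile(`a+?`)
lazyRe.FindString("aaa")  // "a"    (leftmost-first: the lazy +? stops as early as possible)
lazyRe.Longest()
lazyRe.FindString("aaa")  // "aaa"  (leftmost-longest: the longest match wins)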

9. LiteralPrefix
LiteralPrefix() returns the literal string that every match must begin with, plus a bool that reports whether that literal is the entire pattern. The engine uses this prefix to skip quickly over positions that cannot match.

urlRe := regexp.MustCompile(`https://`)
prefix, complete := urlRe.LiteralPrefix()
// prefix = "https://"   complete = true

partialRe := regexp.MustCompile(`https?://`)
p2, c2 := partialRe.LiteralPrefix()
// p2 = "http"   c2 = false  (the ? ends the literal)

// Anchors are not literal text: with a leading ^, as in `^https://`,
// the reported prefix can be empty.


10. Real-World: Email pattern checker

package main

import (
    "fmt"
    "regexp"
)

func main() {
    email := "test@example.com"

    // Deliberately simple pattern: it catches obvious typos but is not a full RFC 5322 validator
    pattern := `^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$`

    re := regexp.MustCompile(pattern)

    if re.MatchString(email) {
        fmt.Println("Valid email")
    } else {
        fmt.Println("Invalid email")
    }
}
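In a longer-lived program you would usually compile the pattern once at package level and wrap it in a function. A sketch with a hypothetical `isValidEmail` helper, reusing the same pattern and a few sample inputs:

```go
package main

import (
	"fmt"
	"regexp"
)

// Compiled once at package level rather than on every call.
// Same deliberately simple pattern as above; not a full RFC 5322 check.
var emailRe = regexp.MustCompile(`^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$`)

// isValidEmail is a hypothetical wrapper around emailRe.
func isValidEmail(s string) bool {
	return emailRe.MatchString(s)
}

func main() {
	for _, s := range []string{
		"test@example.com",
		"user.name+tag@sub.example.org",
		"no-at-sign.example.com",
		"a@b.c", // top-level domain shorter than 2 letters
	} {
		fmt.Printf("%-30s %v\n", s, isValidEmail(s))
	}
}
```

If you need to accept the full address syntax, including display names, net/mail.ParseAddress is the standard-library option.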

When Should You Use RegExp?
Use it when:

Validating input (email, phone, etc.)
Parsing logs
Extracting structured data
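For the log-parsing and data-extraction cases, named groups do most of the work. A sketch, assuming a hypothetical log format of `<timestamp> <LEVEL> [<component>] <message>`; the helper `parseLogLine` is also hypothetical, so adapt the pattern to your real format:

```go
package main

import (
	"fmt"
	"regexp"
)

// logRe assumes a hypothetical format:
//   <timestamp> <LEVEL> [<component>] <message>
var logRe = regexp.MustCompile(`^(?P<ts>\S+) (?P<level>[A-Z]+) \[(?P<component>[^\]]+)\] (?P<msg>.*)$`)

// parseLogLine returns the named fields of one line, or ok == false if
// the line does not follow the assumed format.
func parseLogLine(line string) (fields map[string]string, ok bool) {
	m := logRe.FindStringSubmatch(line)
	if m == nil {
		return nil, false
	}
	fields = make(map[string]string)
	for i, name := range logRe.SubexpNames() {
		if name != "" {
			fields[name] = m[i]
		}
	}
	return fields, true
}

func main() {
	f, ok := parseLogLine("2024-06-15T10:32:01Z ERROR [auth] invalid token for user=alice")
	fmt.Println(ok, f["level"], f["component"], f["msg"])
	// true ERROR auth invalid token for user=alice
}
```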

Further Reading
Official docs: https://pkg.go.dev/regexp
RE2 syntax ref: https://pkg.go.dev/regexp/syntax
Run all examples: go run regexp_complete.go
