Tony Metzidis

Streaming regex scanner — regexpscanner

Go's regexp package falls short for stream processing: nearly all of its methods require a string or []byte. The regexpscanner module makes it easy to extract tokens that match a regular expression pattern from a stream.

https://pkg.go.dev/github.com/tonymet/regexpscanner

Install Module

go get github.com/tonymet/regexpscanner@latest

Example Usage

Use ProcessTokens when a simple callback-based stream tokenizer is needed.
ProcessTokens calls handler(string) for each matching token from the Scanner.

package main

import (
    "fmt"
    "regexp"
    "strings"

    rs "github.com/tonymet/regexpscanner"
)

func main() {
    rs.ProcessTokens(
        strings.NewReader("<html><body><p>Welcome to My Website</p></body></html>"),
        regexp.MustCompile(`</?[a-z]+>`),
        func(text string) {
            fmt.Println(text)
        })
}

Output

<html>
<body>
<p>
</p>
</body>
</html>

Give it a try, and see the Go Module Page for more examples.