DEV Community

Run a regex on each line of a file (Go)

chris ・1 min read

Want to find a word in a file? Want to match every line against a specific regex pattern? Easy peasy lemon squeezy.

    package main

    import (
        "bufio"
        "fmt"
        "log"
        "os"
        "regexp"
    )

    func main() {
        file, err := os.Open("file.txt")
        if err != nil {
            log.Fatal(err)
        }
        defer file.Close()

        scanner := bufio.NewScanner(file)
        r, err := regexp.Compile("treasure") // a literal word here, but any regex pattern works

        if err != nil {
            log.Fatal(err)
        }

        for scanner.Scan() {
            if r.MatchString(scanner.Text()) {
                fmt.Println(scanner.Text())
            }
        }

        if err := scanner.Err(); err != nil {
            log.Fatal(err)
        }
    }

Discussion (4)

Kevin Coleman

The problems I see with this are:

  1. bufio.NewScanner can't handle long lines by default, so you need to use bufio.NewReader.
  2. This is slow. Is there a faster way?
chris Author

Hi! Thanks for your comment.

This snippet is part of a bigger program that uses queues to buffer the data. Indeed, the I/O between the file and the program isn't the fastest, but it did the job for what I needed at the time.

Maybe you could take this loop:

    for scanner.Scan() {
        if r.MatchString(scanner.Text()) {
            fmt.Println(scanner.Text())
        }
    }

and send each line to its own goroutine with go func(). fmt.Println is slow; in my code the loop body does something else, so you don't really need the print there. If you want to output the results, I'd collect them and print them after the loop.

Kevin Coleman

I ended up speeding this up by reading the entire file at once instead of line by line. I think the issue was my program was constantly going back and forth with the disk, instead of streaming all of the information it needs at once.

Fortunately for me, the files I’m scanning are small, so it was ok to keep them in RAM for my processing.
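
The comment does not show the code, so the following is only a guess at what "reading the entire file at once" could look like: os.ReadFile (Go 1.16 or later) loads the file into memory, and a multi-line regex pulls out every line that contains the pattern. The helper name matchingLines and the sample file are assumptions for illustration.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
	"regexp"
)

// matchingLines (hypothetical helper) returns every full line of data
// that contains a match for pattern.
func matchingLines(data []byte, pattern string) ([]string, error) {
	// (?m) lets ^ and $ match at line boundaries; .* never crosses a newline.
	re, err := regexp.Compile("(?m)^.*(?:" + pattern + ").*$")
	if err != nil {
		return nil, err
	}
	var lines []string
	for _, m := range re.FindAll(data, -1) {
		lines = append(lines, string(m))
	}
	return lines, nil
}

func main() {
	// Write a small sample file so the sketch runs on its own.
	path := filepath.Join(os.TempDir(), "regex-sample.txt")
	sample := []byte("buried treasure\nnothing here\ntreasure map\n")
	if err := os.WriteFile(path, sample, 0o644); err != nil {
		log.Fatal(err)
	}
	defer os.Remove(path)

	// One call reads the whole file into memory.
	data, err := os.ReadFile(path)
	if err != nil {
		log.Fatal(err)
	}

	lines, err := matchingLines(data, "treasure")
	if err != nil {
		log.Fatal(err)
	}
	for _, line := range lines {
		fmt.Println(line)
	}
}
```

This trades memory for simplicity: the whole file sits in RAM, which is fine for small files like the ones described above but not for very large ones.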

chris Author

Indeed, you want to minimize the I/O between your program and the file, since those round trips are costly in time. Good that you found a solution! :)