DEV Community

Uthman Oladele

Rebuilding Grep in Go: What I Learned About Unix Text Processing

I use grep every day but had no idea how it works. So I built a basic version in Go.

Not to replace grep. Just to stop being the guy who pipes to grep without understanding what's actually happening.

What I Built

A stripped-down grep that does pattern matching with these flags:

  • -i - Case-insensitive
  • -n - Line numbers
  • -c - Count matches
  • -v - Invert (show non-matches)
  • -r - Recursive search
  • -l - Just list filenames

Plus binary file detection and color output. That's it. No context lines, no fancy regex modes, no performance optimizations.

The Interesting Parts

Binary Files Don't Print Garbage

When grep finds a match in a binary file (executable, image, whatever), it prints a one-line "Binary file matches" notice instead of filling your terminal with nonsense.

How does it know?

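import (
    "bytes"
    "io"
    "os"
)

// IsBinary reports whether filePath looks binary: it reads up to the
// first 1KB and checks for a NUL byte.
func IsBinary(filePath string) (bool, error) {
    f, err := os.Open(filePath)
    if err != nil {
        return false, err
    }
    defer f.Close()

    buffer := make([]byte, 1024)
    n, err := f.Read(buffer)
    // io.EOF on the first read just means the file is empty.
    if err != nil && err != io.EOF {
        return false, err
    }

    // A single NUL byte anywhere in the sample marks the file as binary.
    if bytes.IndexByte(buffer[:n], 0) != -1 {
        return true, nil
    }

    return false, nil
}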

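Read the first 1KB. If it contains a null byte (\0), treat the file as binary. Plain ASCII or UTF-8 text essentially never contains null bytes; UTF-16 text is the main exception, and this check will call it binary.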

Simple check. Works.

Recursive Search Without Exploding

The -r flag searches directories. This means reading entries, checking if they're files or directories, and recursing when needed.

The trick: Don't stop on errors. One locked file shouldn't kill your entire search.

// info is assumed to come from os.Stat(fileName) earlier in grepFile.
if info.IsDir() {
    if !opts.Recursive {
        return 0, fmt.Errorf("%s is a directory", fileName)
    }

    entries, err := os.ReadDir(fileName)
    if err != nil {
        return 0, fmt.Errorf("error reading directory %s: %w", fileName, err)
    }

    total := 0
    for _, entry := range entries {
        path := filepath.Join(fileName, entry.Name())
        subCount, err := grepFile(pattern, path, opts)
        if err != nil {
            // Report the problem and keep going with the remaining entries.
            fmt.Fprintf(os.Stderr, "warning: %v\n", err)
            continue
        }
        total += subCount
    }
    return total, nil
}

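Log it, move on. Real grep does this too: it prints the error to stderr, keeps searching, and only signals the failure at the end through its exit status (2 instead of the usual 0 or 1). So should you.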

Flags Interact in Weird Ways

Case-insensitive search: Modify the pattern before compiling the regex.

if opts.CaseInsensitive {
    pattern = "(?i)" + pattern
}
re, err := regexp.Compile(pattern)
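
A quick way to convince yourself the prefix works is a tiny runnable check. compilePattern is a hypothetical helper wrapping the two lines above. Because (?i) sits at the very start, it applies to the whole expression, so every alternative in something like error|warn becomes case-insensitive without extra grouping.

```go
package main

import (
	"fmt"
	"regexp"
)

// compilePattern mirrors the -i handling: prepend the (?i) flag so the
// regexp matches case-insensitively, then compile once.
func compilePattern(pattern string, ignoreCase bool) (*regexp.Regexp, error) {
	if ignoreCase {
		pattern = "(?i)" + pattern
	}
	return regexp.Compile(pattern)
}

func main() {
	line := "WARNING: disk full"
	for _, ignoreCase := range []bool{false, true} {
		re, err := compilePattern("warning", ignoreCase)
		if err != nil {
			fmt.Println("bad pattern:", err)
			return
		}
		fmt.Printf("-i=%v matches %q: %v\n", ignoreCase, line, re.MatchString(line))
	}
}
```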

List files only: Stop after the first match.

if matched {
    count++
    if opts.ListFilesOnly {
        fmt.Println(fileName) // grep -l prints just the file name, no colon
        return count, nil     // one match is enough; skip the rest of the file
    }
}

Invert match: Flip the boolean.

matched := re.MatchString(line)
if opts.Invert {
    matched = !matched
}

Getting these combinations right took a few tries. -r -l should recurse and list filenames. -v -i should invert case-insensitive matches. Test all the combinations.
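
One way to pin the combinations down is to pull the per-line decision into a pure function and run it against a small table of flag sets, the way a Go table-driven test would. The options struct and matchLines function below are hypothetical stand-ins for the project's own types, and they only cover -i and -v.

```go
package main

import (
	"fmt"
	"regexp"
)

// options is a hypothetical subset of the grep flags.
type options struct {
	CaseInsensitive bool // -i
	Invert          bool // -v
}

// matchLines returns the lines grep would select for pattern under opts.
func matchLines(pattern string, lines []string, opts options) ([]string, error) {
	if opts.CaseInsensitive {
		pattern = "(?i)" + pattern
	}
	re, err := regexp.Compile(pattern) // compiled once, reused for every line
	if err != nil {
		return nil, err
	}
	var out []string
	for _, line := range lines {
		matched := re.MatchString(line)
		if opts.Invert {
			matched = !matched
		}
		if matched {
			out = append(out, line)
		}
	}
	return out, nil
}

func main() {
	lines := []string{"Error: disk full", "ok", "error: timeout"}
	cases := []struct {
		name string
		opts options
		want int
	}{
		{"plain", options{}, 1},
		{"-i", options{CaseInsensitive: true}, 2},
		{"-v", options{Invert: true}, 2},
		{"-v -i", options{CaseInsensitive: true, Invert: true}, 1},
	}
	for _, tc := range cases {
		got, err := matchLines("error", lines, tc.opts)
		if err != nil {
			fmt.Println(tc.name, "error:", err)
			continue
		}
		status := "ok"
		if len(got) != tc.want {
			status = "FAIL"
		}
		fmt.Printf("%-6s got %d want %d %s\n", tc.name, len(got), tc.want, status)
	}
}
```

Moving the table into a _test.go file with one t.Run per case is the natural next step.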

Color Output

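With --color, GNU grep highlights matches in red. To do this:

  1. Find all regex matches in the line
  2. Replace each match with a colored version
  3. Print the result

// color appears to be github.com/fatih/color (assumed from the API).
matchColor := color.New(color.FgRed).SprintFunc()

// SprintFunc returns a variadic func(...interface{}) string, so wrap it
// to fit the func(string) string that ReplaceAllStringFunc expects.
coloredLine := re.ReplaceAllStringFunc(line, func(m string) string {
    return matchColor(m)
})
fmt.Println(coloredLine)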

ReplaceAllStringFunc walks through every non-overlapping match and swaps in whatever your function returns. Easy.

What I Actually Learned

Null bytes are the binary file signal. One check, problem solved.

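Compile regex once, use it everywhere. Compiling per line is stupid slow: regexp.Compile has to parse the pattern and build a matcher every time. Compile once at the start and reuse the *regexp.Regexp (it's safe to share across goroutines, too).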

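Scanner vs ReadFile depends on the file type. Text files get bufio.Scanner for line-by-line reading, which is what line numbers and per-line output need. Binary files get os.ReadFile so the regex can check the whole content at once, since all you report is whether it matches. Different tools for different jobs.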

Recursive operations need error tolerance. One bad file can't crash everything. Log it, continue.

Manual flag parsing sucks. Next time I'm using a library. Checking string prefixes gets old fast.
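
For what it's worth, Go's standard library already ships one: the flag package. A minimal sketch wiring up the same six flags follows; the options field names and parseArgs are hypothetical. Two caveats: flag doesn't understand bundled short flags, so -in is rejected as an unknown flag and users must write -i -n, and it stops parsing at the first non-flag argument, so flags have to come before the pattern.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the parsed grep flags (field names are illustrative).
type options struct {
	CaseInsensitive, LineNumbers, Count, Invert, Recursive, ListFilesOnly bool
}

// parseArgs parses args (without the program name) into options, the pattern,
// and the file list. It uses its own FlagSet so tests can call it directly.
func parseArgs(args []string) (options, string, []string, error) {
	var opts options
	fs := flag.NewFlagSet("grep", flag.ContinueOnError)
	fs.BoolVar(&opts.CaseInsensitive, "i", false, "case-insensitive match")
	fs.BoolVar(&opts.LineNumbers, "n", false, "prefix output with line numbers")
	fs.BoolVar(&opts.Count, "c", false, "print only a count of matching lines")
	fs.BoolVar(&opts.Invert, "v", false, "select non-matching lines")
	fs.BoolVar(&opts.Recursive, "r", false, "search directories recursively")
	fs.BoolVar(&opts.ListFilesOnly, "l", false, "print only names of matching files")
	if err := fs.Parse(args); err != nil {
		return opts, "", nil, err
	}
	rest := fs.Args() // everything after the flags
	if len(rest) < 1 {
		return opts, "", nil, fmt.Errorf("usage: grep [flags] pattern [file...]")
	}
	return opts, rest[0], rest[1:], nil
}

func main() {
	opts, pattern, files, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("opts=%+v pattern=%q files=%v\n", opts, pattern, files)
}
```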

What's Missing

Real grep has a lot more:

  • Context lines (-A, -B, -C)
  • Extended regex (-E)
  • Fixed string search (-F)
  • Parallel search
  • Memory-mapped files for huge files
  • Proper CLI argument parsing

Mine doesn't. It does basic pattern matching. That's the point: understand the core, skip the extras.

Try It

git clone https://github.com/codetesla51/go-coreutils.git
cd go-coreutils/grep
go build -o grep grep.go
./grep "pattern" file.txt

Basic usage:

./grep "error" logs.txt
./grep -i -n "warning" logs.txt
./grep -r "TODO" ./src
./grep -c "func" main.go

Why This Matters

Grep is everywhere. Understanding how it works changes how you use it. You stop thinking "magic search command" and start thinking "regex matcher with file handling."

Plus, now when someone asks "how does grep detect binary files?" you actually know.


Source: github.com/codetesla51/go-coreutils

More at devuthman.vercel.app
