DEV Community

Bala Paranj
Bala Paranj

Posted on

Designing Errors Out of Your Go CLI

Most Go CLIs have too many error checks. Not because error handling is wrong — because the errors themselves are wrong.

I read the Power of Go Tools by John Arundel. Based on his book, I audited a 50,000-line Go CLI for functions that return errors unnecessarily. The result: 10 functions refactored, 50+ if err != nil checks eliminated, and the remaining error checks now mean something.

This is John Ousterhout's philosophy from A Philosophy of Software Design: don't handle errors better — design them out of existence.

The Four Patterns

I searched for four specific anti-patterns:

  1. Idempotent No-ops — returning an error for a state that already matches the desired outcome
  2. Empty-Set Errors — returning an error when a slice is empty instead of treating it as "no work to do"
  3. Recoverable Internals — returning an error when the function could make a sensible default decision
  4. Crash-Only Candidates — returning an error from initialization code where failure is truly fatal

Here's what I found and how I fixed each one.

Pattern 1: Idempotent No-ops

The temp file cleanup

// BEFORE: returns error if temp file is already gone
func IsDirectoryWritable(dir string) error {
    f, err := os.CreateTemp(dir, ".stave-write-check-*")
    if err != nil {
        return err
    }
    path := f.Name()
    _ = f.Close()
    return os.Remove(path) // ← why does this error matter?
}
Enter fullscreen mode Exit fullscreen mode

The function's job is to check if a directory is writable. If os.Remove fails because the temp file is already gone, the check already passed. The post-condition is met.

// AFTER: cleanup errors are irrelevant
func IsDirectoryWritable(dir string) error {
    f, err := os.CreateTemp(dir, ".stave-write-check-*")
    if err != nil {
        return err
    }
    path := f.Name()
    _ = f.Close()
    _ = os.Remove(path)
    return nil
}
Enter fullscreen mode Exit fullscreen mode

Rule: If the post-condition is satisfied, the function succeeds. Period.

Pattern 2: Empty-Set Safety

The empty registry

// BEFORE: empty registry is treated as an error
func NewIndex(data []byte) (*Index, error) {
    var idx registryIndex
    if err := yaml.Unmarshal(data, &idx); err != nil {
        return nil, err
    }
    if len(idx.Packs) == 0 {
        return nil, ErrEmptyRegistry // ← is zero packs really an error?
    }
    // ...
}
Enter fullscreen mode Exit fullscreen mode

An empty registry means "no packs enabled." That's a valid state — the caller already checks len(packs) before iterating. The error just forces every caller to handle a case that isn't exceptional.

// AFTER: empty registry is a valid zero-work state
func NewIndex(data []byte) (*Index, error) {
    var idx registryIndex
    if err := yaml.Unmarshal(data, &idx); err != nil {
        return nil, err
    }
    // Zero packs is valid — the registry simply has nothing enabled.
    r := &Index{
        version: strings.TrimSpace(idx.Version),
        // ...
    }
    // ...
}
Enter fullscreen mode Exit fullscreen mode

Rule: If an empty input means "nothing to do," the function should succeed with a zero-value result, not fail.

The empty predicate

// BEFORE: empty expression is an error
if expr == "" {
    return CompiledPredicate{}, fmt.Errorf("predicate produced empty expression")
}

// AFTER: empty predicate = always-pass (no rules = compliant)
if expr == "" {
    expr = "true"
}
Enter fullscreen mode Exit fullscreen mode

An empty predicate in a security tool means "no rules defined for this control." That's not an error — it's compliance by default.

Pattern 3: Recoverable Internals

The nil dependency

// BEFORE: nil loader is a fatal error
func ListSnapshotFilesFlat(ctx context.Context, dir string, opts ScannerOptions) ([]SnapshotFile, error) {
    if opts.MetadataLoader == nil {
        return nil, errMetadataLoaderRequired
    }
    // ...
}
Enter fullscreen mode Exit fullscreen mode

The MetadataLoader extracts timestamps from snapshot files. If it's nil, there's a perfectly good fallback: use the file's modification time.

// AFTER: nil loader falls back to filesystem metadata
if opts.MetadataLoader == nil {
    opts.MetadataLoader = func(filePath, _ string) (time.Time, error) {
        fi, err := os.Stat(filePath)
        if err != nil {
            return time.Time{}, err
        }
        return fi.ModTime(), nil
    }
}
Enter fullscreen mode Exit fullscreen mode

Rule: If a sensible default exists, use it instead of forcing the caller to provide a value they probably don't care about.

The zero time

// BEFORE: zero time is rejected
func NewActiveWindow(openedAt time.Time) (ExposureWindow, error) {
    if openedAt.IsZero() {
        return ExposureWindow{}, fmt.Errorf("opened_at is required")
    }
    return ExposureWindow{openedAt: openedAt, active: true}, nil
}
Enter fullscreen mode Exit fullscreen mode

A zero time means "not specified." The window should still be created — the caller can check w.OpenedAt().IsZero() if they care. Most don't.

// AFTER: zero time is accepted — callers check if they care
func NewActiveWindow(openedAt time.Time) ExposureWindow {
    return ExposureWindow{openedAt: openedAt, active: true}
}
Enter fullscreen mode Exit fullscreen mode

This one removed error checks from 30+ call sites. Most were already using tl, _ := NewActiveWindow(...) — the Go equivalent of saying "I know this error doesn't matter."

Pattern 4: Crash-Only Candidates

The programming error

// BEFORE: returns error that no caller can recover from
func NewExposureLifecycle(a Asset) (*ExposureLifecycle, error) {
    if a.ID.IsEmpty() {
        return nil, fmt.Errorf("asset ID must not be empty")
    }
    return &ExposureLifecycle{ID: a.ID, asset: a}, nil
}
Enter fullscreen mode Exit fullscreen mode

An empty asset ID is a programming error at the call site — the caller constructed the asset. If the ID is empty, there's a bug upstream. No caller can meaningfully recover from this. They all either t.Fatal(err) in tests or propagate it up the stack until someone logs it and exits.

// AFTER: panic on contract violation
func NewExposureLifecycle(a Asset) *ExposureLifecycle {
    if a.ID.IsEmpty() {
        panic("contract violated: NewExposureLifecycle requires non-empty asset ID")
    }
    return &ExposureLifecycle{ID: a.ID, asset: a}
}
Enter fullscreen mode Exit fullscreen mode

This matches the checkContracts() pattern already used in the same file for internal state validation. It eliminated error-check boilerplate from 40+ call sites.

Rule: If the error means "you have a bug," use panic. If the error means "the user gave bad input," return an error. Don't conflate programming errors with operational errors.

The Impact

Before:

10 functions returning errors
50+ if err != nil checks across the codebase
Many callers using tl, _ := ... (ignoring the error anyway)
Enter fullscreen mode Exit fullscreen mode

After:

10 functions with clear contracts
50+ fewer error checks
Every remaining if err != nil represents a truly exceptional event
Enter fullscreen mode Exit fullscreen mode

How to Find These in Your CLI

Search for these patterns:

# Functions where callers ignore the error
grep -rn ', _ :=' --include='*.go' | grep -v vendor/ | sort | uniq -c | sort -rn

# Empty-set errors
grep -rn 'len(.*) == 0' --include='*.go' | grep 'Errorf\|errors.New'

# Existing panic patterns (your team already uses crash-only where appropriate)
grep -rn 'panic(' --include='*.go' | grep -v vendor/ | grep -v _test.go
Enter fullscreen mode Exit fullscreen mode

The first command is the most revealing. If you see 40 call sites ignoring an error, the error shouldn't exist.

When NOT to Do This

  • User input boundaries: Always validate and return errors for user-supplied data
  • Network/IO operations: These fail for real external reasons
  • Cross-package contracts: If the caller is in a different package, errors are the right contract
  • Data integrity: Schema validation, hash mismatches, malformed input — these are real errors

The goal isn't to eliminate error handling. It's to make every if err != nil mean something. When a developer sees an error check, they should know it represents a truly exceptional event — not a no-op, not an empty set, not a sensible default, and not a programming bug.


These refactorings were applied to Stave, an offline configuration safety evaluator.

Top comments (0)