Most Go CLIs have too many error checks. Not because error handling is wrong — because the errors themselves are wrong.
I read the Power of Go Tools by John Arundel. Based on his book, I audited a 50,000-line Go CLI for functions that return errors unnecessarily. The result: 10 functions refactored, 50+ if err != nil checks eliminated, and the remaining error checks now mean something.
This is John Ousterhout's philosophy from A Philosophy of Software Design: don't handle errors better — design them out of existence.
The Four Patterns
I searched for four specific anti-patterns:
- Idempotent No-ops — returning an error for a state that already matches the desired outcome
- Empty-Set Errors — returning an error when a slice is empty instead of treating it as "no work to do"
- Recoverable Internals — returning an error when the function could make a sensible default decision
- Crash-Only Candidates — returning an error from initialization code where failure is truly fatal
Here's what I found and how I fixed each one.
Pattern 1: Idempotent No-ops
The temp file cleanup
// BEFORE: returns error if temp file is already gone
func IsDirectoryWritable(dir string) error {
f, err := os.CreateTemp(dir, ".stave-write-check-*")
if err != nil {
return err
}
path := f.Name()
_ = f.Close()
return os.Remove(path) // ← why does this error matter?
}
The function's job is to check if a directory is writable. If os.Remove fails because the temp file is already gone, the check already passed. The post-condition is met.
// AFTER: cleanup errors are irrelevant
func IsDirectoryWritable(dir string) error {
f, err := os.CreateTemp(dir, ".stave-write-check-*")
if err != nil {
return err
}
path := f.Name()
_ = f.Close()
_ = os.Remove(path)
return nil
}
Rule: If the post-condition is satisfied, the function succeeds. Period.
Pattern 2: Empty-Set Safety
The empty registry
// BEFORE: empty registry is treated as an error
func NewIndex(data []byte) (*Index, error) {
var idx registryIndex
if err := yaml.Unmarshal(data, &idx); err != nil {
return nil, err
}
if len(idx.Packs) == 0 {
return nil, ErrEmptyRegistry // ← is zero packs really an error?
}
// ...
}
An empty registry means "no packs enabled." That's a valid state — the caller already checks len(packs) before iterating. The error just forces every caller to handle a case that isn't exceptional.
// AFTER: empty registry is a valid zero-work state
func NewIndex(data []byte) (*Index, error) {
var idx registryIndex
if err := yaml.Unmarshal(data, &idx); err != nil {
return nil, err
}
// Zero packs is valid — the registry simply has nothing enabled.
r := &Index{
version: strings.TrimSpace(idx.Version),
// ...
}
// ...
}
Rule: If an empty input means "nothing to do," the function should succeed with a zero-value result, not fail.
The empty predicate
// BEFORE: empty expression is an error
if expr == "" {
return CompiledPredicate{}, fmt.Errorf("predicate produced empty expression")
}
// AFTER: empty predicate = always-pass (no rules = compliant)
if expr == "" {
expr = "true"
}
An empty predicate in a security tool means "no rules defined for this control." That's not an error — it's compliance by default.
Pattern 3: Recoverable Internals
The nil dependency
// BEFORE: nil loader is a fatal error
func ListSnapshotFilesFlat(ctx context.Context, dir string, opts ScannerOptions) ([]SnapshotFile, error) {
if opts.MetadataLoader == nil {
return nil, errMetadataLoaderRequired
}
// ...
}
The MetadataLoader extracts timestamps from snapshot files. If it's nil, there's a perfectly good fallback: use the file's modification time.
// AFTER: nil loader falls back to filesystem metadata
if opts.MetadataLoader == nil {
opts.MetadataLoader = func(filePath, _ string) (time.Time, error) {
fi, err := os.Stat(filePath)
if err != nil {
return time.Time{}, err
}
return fi.ModTime(), nil
}
}
Rule: If a sensible default exists, use it instead of forcing the caller to provide a value they probably don't care about.
The zero time
// BEFORE: zero time is rejected
func NewActiveWindow(openedAt time.Time) (ExposureWindow, error) {
if openedAt.IsZero() {
return ExposureWindow{}, fmt.Errorf("opened_at is required")
}
return ExposureWindow{openedAt: openedAt, active: true}, nil
}
A zero time means "not specified." The window should still be created — the caller can check w.OpenedAt().IsZero() if they care. Most don't.
// AFTER: zero time is accepted — callers check if they care
func NewActiveWindow(openedAt time.Time) ExposureWindow {
return ExposureWindow{openedAt: openedAt, active: true}
}
This one removed error checks from 30+ call sites. Most were already using tl, _ := NewActiveWindow(...) — the Go equivalent of saying "I know this error doesn't matter."
Pattern 4: Crash-Only Candidates
The programming error
// BEFORE: returns error that no caller can recover from
func NewExposureLifecycle(a Asset) (*ExposureLifecycle, error) {
if a.ID.IsEmpty() {
return nil, fmt.Errorf("asset ID must not be empty")
}
return &ExposureLifecycle{ID: a.ID, asset: a}, nil
}
An empty asset ID is a programming error at the call site — the caller constructed the asset. If the ID is empty, there's a bug upstream. No caller can meaningfully recover from this. They all either t.Fatal(err) in tests or propagate it up the stack until someone logs it and exits.
// AFTER: panic on contract violation
func NewExposureLifecycle(a Asset) *ExposureLifecycle {
if a.ID.IsEmpty() {
panic("contract violated: NewExposureLifecycle requires non-empty asset ID")
}
return &ExposureLifecycle{ID: a.ID, asset: a}
}
This matches the checkContracts() pattern already used in the same file for internal state validation. It eliminated error-check boilerplate from 40+ call sites.
Rule: If the error means "you have a bug," use panic. If the error means "the user gave bad input," return an error. Don't conflate programming errors with operational errors.
The Impact
Before:
10 functions returning errors
50+ if err != nil checks across the codebase
Many callers using tl, _ := ... (ignoring the error anyway)
After:
10 functions with clear contracts
50+ fewer error checks
Every remaining if err != nil represents a truly exceptional event
How to Find These in Your CLI
Search for these patterns:
# Functions where callers ignore the error
grep -rn ', _ :=' --include='*.go' | grep -v vendor/ | sort | uniq -c | sort -rn
# Empty-set errors
grep -rn 'len(.*) == 0' --include='*.go' | grep 'Errorf\|errors.New'
# Existing panic patterns (your team already uses crash-only where appropriate)
grep -rn 'panic(' --include='*.go' | grep -v vendor/ | grep -v _test.go
The first command is the most revealing. If you see 40 call sites ignoring an error, the error shouldn't exist.
When NOT to Do This
- User input boundaries: Always validate and return errors for user-supplied data
- Network/IO operations: These fail for real external reasons
- Cross-package contracts: If the caller is in a different package, errors are the right contract
- Data integrity: Schema validation, hash mismatches, malformed input — these are real errors
The goal isn't to eliminate error handling. It's to make every if err != nil mean something. When a developer sees an error check, they should know it represents a truly exceptional event — not a no-op, not an empty set, not a sensible default, and not a programming bug.
These refactorings were applied to Stave, an offline configuration safety evaluator.
Top comments (0)