DEV Community

Bala Paranj

Error Handling in a Go CLI That Security Researchers Trust

How sentinel hierarchies, exit code taxonomy, actionable hints, structured error info, and 'design errors out of existence' transformed a Go CLI's error UX from 'something went wrong' to 'here's what failed, why, and what to do next.'

A security researcher runs your CLI. It fails. The output says:

Error: evaluation failed

They have no idea what failed, why, or what to do next. Was it a bad input file? A missing directory? An internal bug? The exit code is 1 — which in most tools means "something went wrong."

Compare that with:

[INVALID_INPUT] Input validation failed

  --controls path "controls/" does not exist: verify the path or create the directory

  Hint:
    stave validate --controls ./controls --observations ./observations

Exit code 2. The researcher knows it's their input (not a bug), knows which flag is wrong, and has a command to run next.

This article shows how to build that second experience in Go — with concrete before/after code for every pattern.

1. Sentinel Error Hierarchies

The Problem

You have an integrity verification system that can fail in three ways: a file hash doesn't match, an unexpected file appears, or a required file is missing. With a single error:

var ErrIntegrityViolation = errors.New("integrity violation")

func Verify(manifest Manifest, actual *InputHashes) error {
    for name, expected := range manifest.Files {
        hash, ok := actual.Files[name] // renamed: "actual" shadowed the parameter
        if !ok {
            return fmt.Errorf("%w: %s", ErrIntegrityViolation, name)
        }
        if hash != expected {
            return fmt.Errorf("%w: %s", ErrIntegrityViolation, name)
        }
    }
    return nil
}

The caller can check errors.Is(err, ErrIntegrityViolation) but can't distinguish "hash mismatch" from "missing file." The error message is a string — parsing it is fragile.

The Fix: Nested Sentinels

var (
    ErrIntegrityViolation = errors.New("integrity violation")
    ErrHashMismatch       = fmt.Errorf("%w: file hash mismatch", ErrIntegrityViolation)
    ErrMissingFile        = fmt.Errorf("%w: required file missing", ErrIntegrityViolation)
    ErrUntrustedFile      = fmt.Errorf("%w: untrusted file found", ErrIntegrityViolation)
)

func Verify(manifest Manifest, actual *InputHashes) error {
    for name, expected := range manifest.Files {
        hash, ok := actual.Files[name]
        if !ok {
            return fmt.Errorf("%w: %s", ErrMissingFile, name)
        }
        if hash != expected {
            return fmt.Errorf("%w: %s (expected %s, got %s)", ErrHashMismatch, name, expected, hash)
        }
    }
    for name := range actual.Files {
        if _, ok := manifest.Files[name]; !ok {
            return fmt.Errorf("%w: %s", ErrUntrustedFile, name)
        }
    }
    return nil
}

Now callers can match at two levels:

// Broad: "was there any integrity problem?"
if errors.Is(err, ErrIntegrityViolation) { ... }

// Specific: "was it a hash mismatch specifically?"
if errors.Is(err, ErrHashMismatch) { ... }

The hierarchy works because ErrHashMismatch wraps ErrIntegrityViolation via %w, and errors.Is walks the whole chain: an error built from ErrHashMismatch matches both the specific sentinel and its parent.

2. Exit Code Taxonomy

The Problem

Most CLIs exit with 0 (success) or 1 (failure). A CI pipeline can't distinguish "the tool crashed" from "the tool found violations" from "the user passed a bad flag."

The Fix: Semantic Exit Codes

const (
    ExitSuccess     = 0   // No issues
    ExitSecurity    = 1   // Security-audit gating failure
    ExitInputError  = 2   // Invalid input, flags, or schema validation
    ExitViolations  = 3   // Evaluation completed — findings detected
    ExitInternal    = 4   // Unexpected internal error (bug)
    ExitInterrupted = 130 // Interrupted by SIGINT (Ctrl+C)
)

The key design decision: exit code 3 means the tool succeeded. It ran correctly, evaluated the controls, and found violations. That's not an error — it's a policy signal. Exit code 4 is the bug indicator.

Map domain errors to exit codes:

func ExitCode(err error) int {
    if err == nil {
        return ExitSuccess
    }

    switch {
    case errors.Is(err, ErrViolationsFound):
        return ExitViolations
    case errors.Is(err, ErrSecurityAuditFindings):
        return ExitSecurity
    case errors.Is(err, ErrValidationFailed):
        return ExitInputError
    case errors.Is(err, ErrInterrupted):
        return ExitInterrupted
    }

    // UserError = the user did something wrong (bad flag, missing file)
    var uErr *UserError
    if errors.As(err, &uErr) {
        return ExitInputError
    }

    // Unknown errors are internal failures, not user errors.
    return ExitInternal
}

The last line is critical. The original code defaulted unknown errors to ExitInputError (2). That blamed the user for internal bugs. Now unknown errors map to ExitInternal (4) — the tool admits the fault is its own.

CI pipelines can now branch meaningfully:

stave apply --controls controls --observations observations
case $? in
    0)   echo "Clean" ;;
    1)   echo "Security audit gate failed: review the report" ; exit 1 ;;
    2)   echo "Bad input — fix the command" ; exit 1 ;;
    3)   echo "Violations found — fix the infrastructure" ; exit 1 ;;
    4)   echo "Bug in stave — report it" ; exit 1 ;;
    130) echo "Cancelled" ; exit 0 ;;
esac

3. Actionable Error Hints

The Problem

Error: no snapshots found in observations/

The user knows what's wrong but not what to do. They have to read the docs, find the right command, figure out the correct flags.

The Fix: Sentinel Hints with Remediation Commands

Define hint sentinels that identify which remediation applies (the guidance itself lives in a registry):

var ErrHintNoSnapshots = errors.New("hints: no snapshots")
var ErrHintControlsNotAccessible = errors.New("hints: controls not accessible")

Register remediation guidance for each:

type RemediationHint struct {
    Reason      string
    NextCommand string
}

var hintRegistry = []struct {
    sentinel error
    hint     RemediationHint
}{
    {
        sentinel: ErrHintNoSnapshots,
        hint: RemediationHint{
            Reason:      "No observation snapshots found for evaluation.",
            NextCommand: "stave validate --controls ./controls --observations ./observations",
        },
    },
    {
        sentinel: ErrHintControlsNotAccessible,
        hint: RemediationHint{
            Reason:      "Controls directory not found or not readable.",
            NextCommand: "stave init",
        },
    },
}

Attach hints to errors at the point of failure:

func WithHint(err error, hint error) error {
    if err == nil || hint == nil {
        return err
    }
    // Don't double-wrap the same hint
    if errors.Is(err, hint) {
        return err
    }
    return &hintedError{hint: hint, err: err}
}

// Usage at the failure site:
snapshots, err := loader.Load(ctx, dir)
if err != nil {
    return ui.WithHint(
        fmt.Errorf("load observations from %s: %w", dir, err),
        ui.ErrHintNoSnapshots,
    )
}

The error renderer searches for hints and displays them:

func SuggestForError(err error) RemediationHint {
    for _, entry := range hintRegistry {
        if errors.Is(err, entry.sentinel) {
            return entry.hint
        }
    }
    return RemediationHint{}
}

Output:

Error: load observations from observations/: no JSON files found

  Hint:
    stave validate --controls ./controls --observations ./observations

The hint is attached to the error chain, not to the error message. Different renderers (text, JSON) can extract and format it differently. The error chain preserves the original error for programmatic matching while carrying the human-readable guidance.

4. Structured ErrorInfo

The Problem

Text errors are for humans. JSON errors are for machines. Most CLIs do one or the other. A security tool needs both — a human reads the terminal output, a CI system parses the JSON.

The Fix: ErrorInfo with Builder Pattern

type ErrorCode string

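const (
    CodeIOError               ErrorCode = "IO_ERROR"
    CodeParseError            ErrorCode = "PARSE_ERROR"
    CodeSchemaError           ErrorCode = "SCHEMA_ERROR"
    CodeInvalidInput          ErrorCode = "INVALID_INPUT"
    CodeViolationsFound       ErrorCode = "VIOLATIONS_FOUND"
    // Referenced by the templates further down; these two string values are assumed.
    CodeSecurityAuditFindings ErrorCode = "SECURITY_AUDIT_FINDINGS"
    CodeInternalError         ErrorCode = "INTERNAL_ERROR"
)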

type ErrorInfo struct {
    Code     ErrorCode         `json:"code"`
    Title    string            `json:"title,omitempty"`
    Message  string            `json:"message"`
    Action   string            `json:"action,omitempty"`
    URL      string            `json:"url,omitempty"`
    Evidence map[string]string `json:"evidence,omitempty"`
}

func (e *ErrorInfo) Error() string {
    if e.Title != "" {
        return fmt.Sprintf("[%s] %s: %s", e.Code, e.Title, e.Message)
    }
    return fmt.Sprintf("[%s] %s", e.Code, e.Message)
}

Construct with fluent builders:

func NewErrorInfo(code ErrorCode, message string) *ErrorInfo {
    return &ErrorInfo{Code: code, Message: message}
}

func (e *ErrorInfo) WithTitle(t string) *ErrorInfo {
    if e != nil { e.Title = t }
    return e
}

func (e *ErrorInfo) WithAction(a string) *ErrorInfo {
    if e != nil { e.Action = a }
    return e
}

func (e *ErrorInfo) WithURL(u string) *ErrorInfo {
    if e != nil { e.URL = u }
    return e
}

Map errors to ErrorInfo through data-driven templates keyed by exit code:

type errorTemplate struct {
    Code   ErrorCode
    Title  string
    Action string
}

var sentinelTemplates = map[int]errorTemplate{
    ExitSecurity: {
        Code:   CodeSecurityAuditFindings,
        Title:  "Security audit gate failed",
        Action: "Review the security-audit report and remediate findings at or above --fail-on.",
    },
    ExitViolations: {
        Code:   CodeViolationsFound,
        Title:  "Violations detected",
        Action: "Review findings and run `stave diagnose` for root-cause guidance.",
    },
    ExitInputError: {
        Code:   CodeInvalidInput,
        Title:  "Input validation failed",
        Action: "Run `stave validate` with the same inputs to get actionable fix hints.",
    },
}

func errorInfoFromError(err error, message string) *ErrorInfo {
    if tmpl, ok := sentinelTemplates[ExitCode(err)]; ok {
        return NewErrorInfo(tmpl.Code, message).
            WithTitle(tmpl.Title).
            WithAction(tmpl.Action)
    }
    return NewErrorInfo(CodeInternalError, message)
}

Adding a new error category to the renderer requires one map entry instead of another switch case.
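As a rough illustration of how one ErrorInfo value can feed both audiences, the sketch below prints the text form and the JSON form of the same error. The renderJSON helper is hypothetical, and the struct is reduced to four fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type ErrorCode string

const CodeInvalidInput ErrorCode = "INVALID_INPUT"

// ErrorInfo here is a reduced copy of the struct above.
type ErrorInfo struct {
	Code    ErrorCode `json:"code"`
	Title   string    `json:"title,omitempty"`
	Message string    `json:"message"`
	Action  string    `json:"action,omitempty"`
}

func (e *ErrorInfo) Error() string {
	if e.Title != "" {
		return fmt.Sprintf("[%s] %s: %s", e.Code, e.Title, e.Message)
	}
	return fmt.Sprintf("[%s] %s", e.Code, e.Message)
}

// renderJSON is a hypothetical stand-in for the tool's JSON renderer.
func renderJSON(e *ErrorInfo) (string, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	info := &ErrorInfo{
		Code:    CodeInvalidInput,
		Title:   "Input validation failed",
		Message: "controls path does not exist",
	}

	// Human-facing form.
	fmt.Println(info.Error())

	// Machine-facing form; empty optional fields are omitted.
	out, err := renderJSON(info)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The omitempty tags keep the JSON stable for consumers: a field such as action appears only when the error actually carries one.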

5. Design Errors Out of Existence

The Problem

A function returns (ExposureWindow, error), where the error is returned only for a zero-value time. Every caller must handle an error that adds no information:

func NewActiveWindow(t time.Time) (ExposureWindow, error) {
    if t.IsZero() {
        return ExposureWindow{}, errors.New("time must not be zero")
    }
    return ExposureWindow{openedAt: t, active: true}, nil
}

// 30+ call sites all do this:
w, err := NewActiveWindow(ts)
if err != nil {
    return err
}

The error doesn't help the user (zero time is a programming error, not a runtime condition). It just adds boilerplate.

The Fix: Accept the Input, Return a Sensible Default

func NewActiveWindow(t time.Time) ExposureWindow {
    return ExposureWindow{openedAt: t, active: true}
}

// 30+ call sites simplified:
w := NewActiveWindow(ts)

If t is zero, the window has a zero openedAt — which is detectable by callers who care, but doesn't force every caller to handle an error for a condition that shouldn't happen.

This is John Ousterhout's principle: define errors out of existence. If a function can accept all inputs and produce a meaningful result, it shouldn't return an error. Reserve errors for conditions that genuinely require caller intervention — missing files, network failures, validation rejections.

Where to apply this:

  • Constructors where zero values are safe defaults
  • Parsers where empty input means "use default"
  • Lookup functions where "not found" is a valid result (return (T, bool) not (T, error))

Where NOT to apply this:

  • I/O operations (file reads, network calls)
  • Validation of user input (control IDs, durations)
  • Security boundaries (integrity checks, permission validation)
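The lookup case from the first list can be sketched as follows. Asset and its property names are invented for illustration; the point is the (T, bool) return shape.

```go
package main

import "fmt"

// Asset and its properties are made up for this example.
type Asset struct {
	props map[string]string
}

// Property reports absence with a bool instead of an error:
// a missing optional property is a normal outcome, not a failure.
func (a Asset) Property(key string) (string, bool) {
	v, ok := a.props[key]
	return v, ok
}

func main() {
	asset := Asset{props: map[string]string{"encryption": "aes256"}}

	if v, ok := asset.Property("encryption"); ok {
		fmt.Println("encryption:", v)
	}
	if _, ok := asset.Property("owner"); !ok {
		fmt.Println("owner: not set")
	}
}
```

Callers use the familiar comma-ok idiom, and there is no error value to wrap, log, or accidentally swallow.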

6. Silent Error Swallowing — Making It Visible

The Problem

h, err := hasher.HashFile(lockPath)
if err != nil {
    return nil  // Error silently swallowed
}

Is this a bug or intentional? Reading the code, you can't tell. The nilerr linter flags it as suspicious — "error is not nil but function returns nil."

The Fix: Annotate Intentional Swallows

h, err := hasher.HashFile(lockPath)
if err != nil {
    return nil //nolint:nilerr // lock file is optional — absence is not an error
}

The annotation does three things:

  1. Tells the linter this is intentional (suppresses the warning)
  2. Tells the next developer why it's intentional
  3. Makes accidental swallows visible — they won't have the annotation and the linter will catch them

We found 9 intentional swallows across the codebase. Each was annotated with its specific reason: "no snapshots is not an error for optional asset properties," "TTY detection failure means non-interactive," "unparseable policy treated as empty."

7. Error Wrapping That Preserves the Chain

The Problem

// BEFORE: direct type assertion breaks wrapped errors
if ee, ok := err.(*exitError); ok {
    return ee.code
}

If err is wrapped (e.g., fmt.Errorf("evaluate: %w", exitErr)), the type assertion fails. The exit code falls through to the default — the wrong exit code for the wrong reason.

The Fix: errors.As Traverses the Chain

// AFTER: errors.As unwraps to find the target type
var ee *exitError
if errors.As(err, &ee) {
    return ee.code
}

The same principle applies to %v vs %w in fmt.Errorf:

// BEFORE: %v discards the inner error's chain
return fmt.Errorf("%w: signature verification failed: %v", ErrIntegrityViolation, err)

// AFTER: %w preserves both error chains (Go 1.20+ multi-wrapping)
return fmt.Errorf("%w: signature verification failed: %w", ErrIntegrityViolation, err)

With %v, callers can't errors.Is(err, innerSentinel). With %w, the full chain is preserved — both ErrIntegrityViolation and the inner crypto error are matchable.

The Error Architecture

All these patterns compose into a layered system:

User types a command
  → CLI parses flags (UserError if invalid → exit 2)
  → App layer loads files (I/O error with hint → exit 2)
  → Core evaluates controls (sentinel findings → exit 3)
  → Output rendered (ErrorInfo for text/JSON)
  → Exit code mapped from error chain

Each layer adds context without losing the original error:

// Core: produces the sentinel
return ErrViolationsFound

// App: wraps with context
return fmt.Errorf("evaluate controls: %w", err)

// CLI: attaches hint
return ui.WithHint(err, ui.ErrHintDiagnose)

// Executor: maps to ErrorInfo + exit code
info := errorInfoFromError(err, err.Error())
ui.WriteErrorText(stderr, info)
os.Exit(ui.ExitCode(err))

The error travels through 4 layers. At each layer, errors.Is still matches the original sentinel. The hint is extractable. The exit code is deterministic. The ErrorInfo is renderable as text or JSON.

That's the difference between "Error: evaluation failed" and a system that tells the researcher what happened, why, and what to do next.


These error handling patterns were implemented across Stave, a 50,000-line Go security CLI. The sentinel hierarchy, exit code taxonomy, and hint system handle 7 error categories with 5 distinct exit codes and actionable remediation for every failure mode.
