Bala Paranj

Posted on Apr 21

Design by Contract in Go: Panics, Preconditions, and checkContracts()

#go #programming #architecture #security

How panic for programming errors, error for user input, CONTRACT comments, checkContracts() invariant methods, and sorted-slice preconditions create a defense layer that catches bugs before they corrupt security verdicts.

Your function receives a time value. It computes a duration. The duration is negative. The comparison says "exposure is less than the threshold." The control passes. The bucket is public. The security verdict is wrong.

This happened because the caller passed now as a time before the exposure window started — a programming error, not a user input error. The function should have refused to compute with an impossible input.

Design by Contract gives you three tools to prevent this:

Preconditions: what must be true before a function runs
Postconditions: what must be true after a function returns
Invariants: what must always be true about an object's state

Go doesn't have language-level contracts. But you can enforce them with panics, errors, and methods — each for a different kind of violation.

The Decision: Panic vs Error vs Ignore

Before writing any contract check, answer one question: who caused the violation?

Violator	Response	Go Mechanism
The programmer (impossible state, broken invariant)	Crash immediately — this is a bug	`panic("contract violated: ...")`
The user (bad input, missing file, invalid config)	Return an error — this is expected	`return fmt.Errorf(...)`
Neither (optional field, empty collection, zero value)	Accept and handle gracefully	Default value or early return

This distinction is the foundation. Mixing them up creates either a tool that crashes on bad user input (terrible UX) or a tool that silently continues with corrupted state.

1. Constructor Preconditions — Panic for Programming Errors

The Problem

An ExposureLifecycle tracks how long a cloud resource has been unsafe. It requires a non-empty asset ID — without one, the lifecycle can't be correlated to anything. An empty ID is not "missing input" — it's a programming error in the code that constructs the lifecycle.

The Contract

func NewExposureLifecycle(a Asset) *ExposureLifecycle {
    if a.ID.IsEmpty() {
        panic("contract violated: NewExposureLifecycle requires non-empty asset ID")
    }
    return &ExposureLifecycle{ID: a.ID, asset: a}
}

Why panic, not error: The caller is internal code (the evaluation engine), not the user. If the engine constructs a lifecycle from an asset with an empty ID, that's a bug: the engine should have validated the asset before reaching this point. Returning an error would force every caller to handle a condition that should never happen.

Why not silently default: Returning a lifecycle with an empty ID would propagate the bug further — findings would be generated with no asset correlation, reports would have empty IDs, and the researcher would see "violations for asset (empty)" with no way to trace back to the source.

The panic message format: "contract violated: <what was expected>". This format is grep-able. When every contract panic starts with "contract violated:", one search finds them all.
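A fixed prefix also makes the contract easy to pin down in a test. The following is a minimal sketch, not the project's actual test code: AssetID, Asset, and ExposureLifecycle are simplified stand-ins for types this article doesn't show in full, and panicMessage is a hypothetical helper.

```go
package main

import "fmt"

// AssetID, Asset, and ExposureLifecycle are simplified stand-ins for this sketch.
type AssetID string

func (id AssetID) IsEmpty() bool { return id == "" }

type Asset struct{ ID AssetID }

type ExposureLifecycle struct {
	ID    AssetID
	asset Asset
}

func NewExposureLifecycle(a Asset) *ExposureLifecycle {
	if a.ID.IsEmpty() {
		panic("contract violated: NewExposureLifecycle requires non-empty asset ID")
	}
	return &ExposureLifecycle{ID: a.ID, asset: a}
}

// panicMessage (hypothetical helper) runs f and returns the recovered panic
// value formatted as a string, or "" if f returned normally.
func panicMessage(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return ""
}

func main() {
	bad := panicMessage(func() { NewExposureLifecycle(Asset{}) })
	good := panicMessage(func() { NewExposureLifecycle(Asset{ID: "bucket-1"}) })
	fmt.Printf("empty ID: %q\n", bad)
	fmt.Printf("valid ID: %q\n", good)
}
```

In a real test file the same helper would sit behind a testing.T assertion. The point is only that the panic is part of the function's documented behavior and can be tested like any other.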

2. Invariant Checking — checkContracts() After Mutation

The Problem

ObservationStats tracks observation timestamps incrementally. After each RecordObservation() call, three invariants must hold:

observationCount >= 0
firstSeenAt <= lastSeenAt (when count > 0)
coverageSpan == lastSeenAt - firstSeenAt

If any of these break, every duration calculation that uses the stats will produce wrong results. The security engine will compute wrong exposure durations, wrong SLA comparisons, wrong verdicts.

The Contract

// CONTRACT: coverageSpan is always derived from (lastSeenAt - firstSeenAt).
// CONTRACT: out-of-order timestamps are ignored.
type ObservationStats struct {
    firstSeenAt      time.Time
    lastSeenAt       time.Time
    maxGap           time.Duration
    coverageSpan     time.Duration
    observationCount int
}

func (s *ObservationStats) RecordObservation(t time.Time) error {
    if t.IsZero() {
        return ErrZeroTimestamp
    }

    s.observationCount++

    if s.observationCount == 1 {
        s.firstSeenAt = t
        s.lastSeenAt = t
        s.checkContracts()
        return nil
    }

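    // Out-of-order timestamps are ignored (CONTRACT comment above).
    // Comparing against lastSeenAt also rejects timestamps that fall between
    // firstSeenAt and lastSeenAt, which would otherwise move lastSeenAt backwards.
    if t.Before(s.lastSeenAt) {
        s.checkContracts()
        return nil
    }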

    gap := t.Sub(s.lastSeenAt)
    if gap > s.maxGap {
        s.maxGap = gap
    }
    s.lastSeenAt = t
    s.coverageSpan = s.lastSeenAt.Sub(s.firstSeenAt)

    s.checkContracts()
    return nil
}

// checkContracts panics on invariant violations that indicate a programming error.
func (s *ObservationStats) checkContracts() {
    if s.observationCount < 0 {
        panic("contract violated: ObservationStats.observationCount must be >= 0")
    }
    if s.observationCount > 0 && s.firstSeenAt.After(s.lastSeenAt) {
        panic("contract violated: ObservationStats.firstSeenAt must be <= lastSeenAt when count > 0")
    }
    if s.coverageSpan != s.lastSeenAt.Sub(s.firstSeenAt) {
        panic("contract violated: ObservationStats.coverageSpan must equal lastSeenAt - firstSeenAt")
    }
}

Why Check After Every Mutation

checkContracts() is called at the end of RecordObservation() — after every state change. This catches bugs at the point of corruption, not hours later when a downstream function produces a wrong result.

Without the check, a bug that sets firstSeenAt after lastSeenAt would silently produce a negative coverageSpan. The coverage validator would see "coverage span is -24 hours, which is less than the required 168 hours" and mark the evaluation as inconclusive. The researcher would see "insufficient observation coverage" and think they need more data, when actually the engine has a bug.

With the check, the panic fires immediately: "contract violated: ObservationStats.firstSeenAt must be <= lastSeenAt when count > 0". The developer sees the exact invariant that broke and can trace it to the mutation that caused it.
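To make the difference concrete, here is a small sketch of the same mutation with and without the check. The stats type is a trimmed-down stand-in for ObservationStats, and buggyRewind is a hypothetical defect invented for illustration; neither is taken from the real codebase.

```go
package main

import (
	"fmt"
	"time"
)

// stats is a trimmed-down stand-in for ObservationStats.
type stats struct {
	firstSeenAt      time.Time
	lastSeenAt       time.Time
	coverageSpan     time.Duration
	observationCount int
}

// checkContracts verifies the three invariants listed for ObservationStats.
func (s *stats) checkContracts() {
	if s.observationCount < 0 {
		panic("contract violated: stats.observationCount must be >= 0")
	}
	if s.observationCount > 0 && s.firstSeenAt.After(s.lastSeenAt) {
		panic("contract violated: stats.firstSeenAt must be <= lastSeenAt when count > 0")
	}
	if s.coverageSpan != s.lastSeenAt.Sub(s.firstSeenAt) {
		panic("contract violated: stats.coverageSpan must equal lastSeenAt - firstSeenAt")
	}
}

// buggyRewind simulates a hypothetical defect: it moves lastSeenAt to t
// without checking that t is not earlier than firstSeenAt.
func (s *stats) buggyRewind(t time.Time, check bool) {
	s.lastSeenAt = t
	s.coverageSpan = s.lastSeenAt.Sub(s.firstSeenAt)
	if check {
		s.checkContracts()
	}
}

// demo applies the buggy mutation and reports the resulting span plus the
// panic message, if any. When the check panics, span stays at its zero value.
func demo(check bool) (span time.Duration, msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	start := time.Date(2024, 1, 2, 0, 0, 0, 0, time.UTC)
	s := &stats{firstSeenAt: start, lastSeenAt: start, observationCount: 1}
	s.buggyRewind(start.Add(-24*time.Hour), check)
	return s.coverageSpan, ""
}

func main() {
	span, _ := demo(false)
	fmt.Println("without check, coverageSpan =", span)
	_, msg := demo(true)
	fmt.Println("with check:", msg)
}
```

Without the check, the negative span flows on to whoever reads it next. With the check, the mutation that caused it is the top frame of the stack trace.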

3. Precondition Errors — For User-Facing Boundaries

The Problem

The Assessor needs a Clock to compute deterministic timestamps. If no clock is provided, the assessor can't function. But this isn't a programming error in all cases — it could be a configuration mistake by the user (forgetting --now in a test script).

The Contract

func (a *Assessor) Assess(inventory []asset.Snapshot, opts ...AssessmentOptions) (evaluation.ComplianceReport, error) {
    if a.Clock == nil {
        return evaluation.ComplianceReport{}, errors.New("precondition failed: Assessor requires a Clock")
    }
    // ...
}

Why error, not panic: The Assessor is constructed by the app layer, which wires dependencies from user configuration. A missing clock could be a wiring bug or a missing --now flag. Returning an error gives the caller a chance to show a helpful message: "the --now flag is required for deterministic evaluation."

Why "precondition failed" prefix: Same grep-ability as "contract violated". Different prefix signals different severity — "contract violated" means "fix the code," "precondition failed" means "fix the configuration."
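One way the caller could turn that error into a helpful message is to match on a sentinel value. This is a sketch under assumptions: ErrMissingClock, the Clock interface, fixedClock, and userMessage are hypothetical names, and the Assess signature is simplified. The code above creates its error inline with errors.New.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrMissingClock is a hypothetical sentinel for this sketch.
var ErrMissingClock = errors.New("precondition failed: Assessor requires a Clock")

// Clock is an assumed interface; the article does not show its definition.
type Clock interface{ Now() time.Time }

type fixedClock struct{ t time.Time }

func (c fixedClock) Now() time.Time { return c.t }

type Assessor struct{ Clock Clock }

// Assess is reduced to the precondition plus a placeholder result.
func (a *Assessor) Assess() (string, error) {
	if a.Clock == nil {
		return "", ErrMissingClock
	}
	return "evaluated at " + a.Clock.Now().Format(time.RFC3339), nil
}

// userMessage translates known precondition failures into configuration advice.
func userMessage(err error) string {
	if errors.Is(err, ErrMissingClock) {
		return "the --now flag is required for deterministic evaluation"
	}
	return err.Error()
}

func main() {
	_, err := (&Assessor{}).Assess()
	fmt.Println(userMessage(err))

	a := &Assessor{Clock: fixedClock{time.Date(2024, 1, 2, 0, 0, 0, 0, time.UTC)}}
	out, _ := a.Assess()
	fmt.Println(out)
}
```

The domain error keeps its grep-able prefix, and only the outermost layer knows that the fix is a command-line flag.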

The Lifecycle Precondition

func (l *ExposureLifecycle) ExposureDuration(now time.Time) (time.Duration, error) {
    if l.activeWindow == nil || !l.activeWindow.IsActive() {
        return 0, nil  // No active exposure — zero duration is correct
    }
    if now.Before(l.activeWindow.OpenedAt()) {
        return 0, fmt.Errorf("exposure duration: 'now' (%s) must not be before window start (%s)",
            now.Format(time.RFC3339), l.activeWindow.OpenedAt().Format(time.RFC3339))
    }
    return now.Sub(l.activeWindow.OpenedAt()), nil
}

This is the function that would produce a negative duration if now is before the window start. Instead of silently returning a negative number (which would pass threshold comparisons), it returns an error with both timestamps — the caller can diagnose why now is wrong.
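The failure mode is easy to reproduce in isolation. In the sketch below, uncheckedDuration, checkedDuration, and underLimit are hypothetical free functions that mirror the logic above, and the 24-hour limit is an arbitrary value chosen for the example.

```go
package main

import (
	"fmt"
	"time"
)

// uncheckedDuration has no precondition: it subtracts whatever it is given.
func uncheckedDuration(openedAt, now time.Time) time.Duration {
	return now.Sub(openedAt)
}

// checkedDuration mirrors the precondition in ExposureDuration.
func checkedDuration(openedAt, now time.Time) (time.Duration, error) {
	if now.Before(openedAt) {
		return 0, fmt.Errorf("exposure duration: 'now' (%s) must not be before window start (%s)",
			now.Format(time.RFC3339), openedAt.Format(time.RFC3339))
	}
	return now.Sub(openedAt), nil
}

// underLimit is a hypothetical threshold comparison: true means the control passes.
func underLimit(d, limit time.Duration) bool { return d < limit }

// demo uses a 'now' that is 48 hours before the window opened.
func demo() (uncheckedPasses bool, err error) {
	openedAt := time.Date(2024, 3, 10, 0, 0, 0, 0, time.UTC)
	now := openedAt.Add(-48 * time.Hour)
	limit := 24 * time.Hour

	uncheckedPasses = underLimit(uncheckedDuration(openedAt, now), limit)
	_, err = checkedDuration(openedAt, now)
	return uncheckedPasses, err
}

func main() {
	passes, err := demo()
	fmt.Println("unchecked comparison passes:", passes)
	fmt.Println("checked version:", err)
}
```

A duration of minus 48 hours is less than 24 hours, so the unchecked comparison reports a pass. The checked version never reaches the comparison.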

4. CONTRACT Comments — The Human Contract

The Pattern

// CONTRACT: only resolved windows are archived.
func (h *ExposureHistory) Record(w ExposureWindow) {
    if w.IsActive() {
        return  // Silently ignore active windows — they're not archivable
    }
    // ... archive the resolved window
}

// CONTRACT: Property paths are dot-separated breadcrumbs (e.g., "properties.cpu.cores").
func diffProperties(before, after map[string]any, path string) []PropertyChange {
    // ...
}

// Items MUST be sorted by CapturedAt ascending (oldest first). The function
// exits early once it encounters an item newer than the cutoff.
func PlanPrune(items []Candidate, criteria Criteria) []Candidate {
    // ...
}

// CONTRACT: vs. MUST: Both state obligations that the compiler can't enforce. // CONTRACT: documents structural invariants (data shape, ownership rules). MUST documents caller obligations (sort order, non-nil values).

These comments don't enforce anything at runtime. They're contracts between the author and future maintainers. When a function says // Items MUST be sorted by CapturedAt ascending, the caller is responsible for sorting. If they don't, the function produces wrong results — silently.

When to Upgrade a Comment to a Check

If the contract violation:

Corrupts security verdicts → runtime check (error or panic)
Produces wrong but non-dangerous output → comment only
Is caught by the type system → neither (the compiler enforces it)

The PlanPrune function has a sort-order precondition but no runtime check. Checking the sort order would cost O(n) per call. The comment documents the contract, and because PlanPrune is called from only one place, that call site is audited manually.

The ExposureLifecycle constructor has a non-empty-ID precondition with a runtime panic. An empty ID corrupts findings — it's worth the check.
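If a sort-order comment ever needs to be upgraded to a check, the standard library covers most of it. This is a sketch of what that could look like, not the project's code: Candidate is a stand-in with only the relevant field, planPruneChecked is a hypothetical variant of PlanPrune, and a plain cutoff time stands in for Criteria. It needs Go 1.21 or later for the slices package.

```go
package main

import (
	"fmt"
	"slices"
	"time"
)

// Candidate is a stand-in carrying only the field the precondition mentions.
type Candidate struct {
	Name       string
	CapturedAt time.Time
}

// requireSortedByCapturedAt panics if items are not in ascending CapturedAt order.
// The scan is O(n), which is the cost the comment-only version avoids.
func requireSortedByCapturedAt(items []Candidate) {
	sorted := slices.IsSortedFunc(items, func(a, b Candidate) int {
		return a.CapturedAt.Compare(b.CapturedAt)
	})
	if !sorted {
		panic("contract violated: items must be sorted by CapturedAt ascending")
	}
}

// planPruneChecked is a hypothetical variant of PlanPrune that enforces the
// sort-order precondition before relying on it for the early exit.
func planPruneChecked(items []Candidate, cutoff time.Time) []Candidate {
	requireSortedByCapturedAt(items)
	var prune []Candidate
	for _, it := range items {
		if it.CapturedAt.After(cutoff) {
			break // sorted input: every later item is newer too
		}
		prune = append(prune, it)
	}
	return prune
}

// demo prunes a sorted slice, then reverses it and reports whether the
// precondition check fires.
func demo() (pruned int, reversedPanics bool) {
	day := func(d int) time.Time { return time.Date(2024, 1, d, 0, 0, 0, 0, time.UTC) }
	items := []Candidate{{"a", day(1)}, {"b", day(2)}, {"c", day(5)}}
	pruned = len(planPruneChecked(items, day(3)))

	defer func() {
		if recover() != nil {
			reversedPanics = true
		}
	}()
	slices.Reverse(items)
	planPruneChecked(items, day(3))
	return pruned, false
}

func main() {
	pruned, reversedPanics := demo()
	fmt.Println("pruned from sorted input:", pruned)
	fmt.Println("reversed input panics:", reversedPanics)
}
```

The check uses a panic because the caller is internal code. Whether the extra pass is worth it depends on how many call sites exist and what a wrong prune plan would cost.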

5. Postcondition Guarantees — Deterministic Output

The Problem

The evaluation engine processes controls × assets. Go does not specify the iteration order of maps, and the runtime deliberately varies it. If findings are appended in iteration order, two runs with identical inputs can produce different output: different finding order, different JSON, different hashes.

For a security tool, non-deterministic output destroys trust. "I ran the same command twice and got different results" is the end of credibility.

The Contract

func (s *assessmentSession) compileReport() evaluation.ComplianceReport {
    // POSTCONDITION: findings are sorted deterministically
    evaluation.SortFindings(s.collector.findings)

    // POSTCONDITION: exempted assets are sorted by ID
    slices.SortFunc(s.collector.exemptedAssets, func(a, b asset.ExemptedAsset) int {
        return cmp.Compare(a.ID, b.ID)
    })

    // POSTCONDITION: checks are sorted by control+asset
    slices.SortFunc(s.collector.checks, func(a, b evaluation.ResourceCheck) int {
        if c := cmp.Compare(a.ControlID, b.ControlID); c != 0 {
            return c
        }
        return cmp.Compare(a.AssetID, b.AssetID)
    })

    // ... assemble report from sorted data
}

func SortFindings(fs []Finding) {
    slices.SortFunc(fs, func(a, b Finding) int {
        return cmp.Or(
            cmp.Compare(a.ControlID, b.ControlID),
            cmp.Compare(a.AssetID, b.AssetID),
            cmp.Compare(a.Evidence.TemporalRisk, b.Evidence.TemporalRisk),
        )
    })
}

Three postcondition sorts enforce deterministic output. The compileReport function is the only place that assembles the final report — putting the sorts here guarantees that every report, regardless of how it was built, has deterministic ordering.
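A postcondition like this can be exercised by feeding the sort the same findings in different arrival orders and comparing the results. The sketch below assumes a reduced Finding with only two sort keys, and shuffledThenSorted is a hypothetical helper. cmp.Or requires Go 1.22 or later.

```go
package main

import (
	"cmp"
	"fmt"
	"math/rand"
	"slices"
)

// Finding is a stand-in with just two of the sort keys.
type Finding struct {
	ControlID string
	AssetID   string
}

func SortFindings(fs []Finding) {
	slices.SortFunc(fs, func(a, b Finding) int {
		return cmp.Or(
			cmp.Compare(a.ControlID, b.ControlID),
			cmp.Compare(a.AssetID, b.AssetID),
		)
	})
}

// shuffledThenSorted copies fs, shuffles the copy with the given seed to
// imitate an arbitrary arrival order, then applies the postcondition sort.
func shuffledThenSorted(fs []Finding, seed int64) []Finding {
	out := slices.Clone(fs)
	r := rand.New(rand.NewSource(seed))
	r.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	SortFindings(out)
	return out
}

func main() {
	fs := []Finding{{"CTL-2", "bucket-b"}, {"CTL-1", "bucket-b"}, {"CTL-1", "bucket-a"}}
	a := shuffledThenSorted(fs, 1)
	b := shuffledThenSorted(fs, 2)
	fmt.Println("same order after sorting:", slices.Equal(a, b))
	fmt.Println(a)
}
```

The guarantee holds as long as the sort keys distinguish every pair of findings. slices.SortFunc is not a stable sort, so two findings that compare equal on all keys could still swap places between runs.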

The Iteration Guard

The engine also sorts asset IDs before iterating:

func (s *assessmentSession) applyControl(ctl *policy.ControlDefinition, lifecycles map[asset.ID]*asset.ExposureLifecycle) {
    // PRECONDITION for deterministic evaluation order
    assetIDs := make([]asset.ID, 0, len(lifecycles))
    for id := range lifecycles {
        assetIDs = append(assetIDs, id)
    }
    slices.Sort(assetIDs)

    for _, id := range assetIDs {
        // Evaluate in deterministic order
    }
}

This is a precondition on the loop (sorted iteration) that guarantees a postcondition on the output (deterministic finding order). Without the sort, the same map with the same keys can yield a different iteration order on every run, and even between two loops in the same run.

6. Structural Invariant Panics — Catching Impossible States

The Pattern

func (s DriftSummary) matchesChangeCount(changeCount int) bool {
    total := s.provisioned + s.decommissioned + s.reconfigured
    if s.total != total {
        panic("structural contract violation: summary total mismatch")
    }
    return s.total == changeCount
}

DriftSummary has four counters: provisioned, decommissioned, reconfigured, and total. The invariant is that total == provisioned + decommissioned + reconfigured. If this is ever false, the counters are corrupted — some mutation incremented one counter but not another.

Why panic here: This is an "impossible state." The counters are private fields mutated only by Record(). If the invariant breaks, Record() has a bug. No amount of error handling downstream can fix a corrupted counter. The panic points directly at the corruption.
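Record() itself isn't shown in this article, so the following is only a guess at its shape: a sketch in which ChangeKind and its constants are hypothetical names. It shows how a single mutation point keeps the four counters in step, and what the structural check does when they drift.

```go
package main

import "fmt"

// ChangeKind and its constants are hypothetical names for this sketch.
type ChangeKind int

const (
	Provisioned ChangeKind = iota
	Decommissioned
	Reconfigured
)

type DriftSummary struct {
	provisioned, decommissioned, reconfigured, total int
}

// Record is the only place the counters change, so the per-kind counter and
// the total are always incremented together.
func (s *DriftSummary) Record(k ChangeKind) {
	switch k {
	case Provisioned:
		s.provisioned++
	case Decommissioned:
		s.decommissioned++
	case Reconfigured:
		s.reconfigured++
	default:
		panic(fmt.Sprintf("contract violated: unknown ChangeKind %d", k))
	}
	s.total++
}

func (s DriftSummary) matchesChangeCount(changeCount int) bool {
	if s.total != s.provisioned+s.decommissioned+s.reconfigured {
		panic("structural contract violation: summary total mismatch")
	}
	return s.total == changeCount
}

func main() {
	var s DriftSummary
	s.Record(Provisioned)
	s.Record(Reconfigured)
	s.Record(Reconfigured)
	fmt.Println("matches 3 changes:", s.matchesChangeCount(3))

	// Simulate the corruption the check is meant to catch.
	s.total++
	defer func() { fmt.Println("recovered:", recover()) }()
	s.matchesChangeCount(4)
}
```

The final call would return true if the check were missing, because the corrupted total happens to equal the argument. The panic fires before that comparison is reached.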

The Contract Hierarchy

Level	Mechanism	When to Use	Example
Type system	Compiler enforces it	Always, when possible	`kernel.ControlID` instead of `string`
Constructor panic	`panic("contract violated: ...")`	Impossible states from programming errors	Empty asset ID in lifecycle constructor
Invariant method	`checkContracts()` after mutation	State corruption that affects downstream logic	`firstSeenAt > lastSeenAt` in stats
Precondition error	`return fmt.Errorf(...)`	Missing configuration or invalid input	Assessor without a Clock
Postcondition sort	`slices.SortFunc(...)` before return	Non-deterministic output	Findings sorted before report assembly
CONTRACT comment	`// CONTRACT: ...`	Caller obligations not worth runtime checking	Sort-order precondition on PlanPrune

The hierarchy goes from strongest (type system — impossible to violate at runtime) to weakest (comment — relies on code review). Use the strongest mechanism that's practical for each contract.

When NOT to Use Contracts

// DON'T panic on user input
func ParseDuration(s string) (time.Duration, error) {
    // This is user input, not a programming error
    if s == "" {
        return 0, fmt.Errorf("empty duration")  // Error, not panic
    }
    return time.ParseDuration(s)  // Malformed input also comes back as an error
}

// DON'T check obvious things
func (s Status) String() string {
    // Status is a small int enum set from the package's own constants.
    // A default case in the switch covers anything else.
    // No contract check needed
}

// DON'T check in hot loops
for _, finding := range findings {
    // Don't checkContracts() inside a loop that runs 1000 times
    // Check once after the loop if needed
}

Contracts have a cost — runtime panics add conditional branches, postcondition sorts add O(n log n) cost, and comment contracts add maintenance burden. Use them at trust boundaries (constructors, mutation methods, output assembly) and skip them inside trusted computation where the type system already provides guarantees.

These design-by-contract patterns are used in Stave, a Go CLI for offline security evaluation. The checkContracts() pattern catches state corruption at the point of mutation. The panic-vs-error distinction ensures programming errors crash immediately while user errors produce actionable messages.
