Bala Paranj

5 Fat Structs We Split — And the Go Patterns That Replaced Them

A 12-field god object, a Summary mixing counts with metadata, 5 types.go junk drawers, a 507-line command handler, and duplicated fields across sibling structs — how splitting and embedding produced focused, maintainable types.

A struct with 12 fields is doing too many things. You cannot change one concern without risking another.

Our security engine had a Runner with 12 fields. Three of them (StaveVersion, InputHashes, identitiesByTime) were per-evaluation state on a struct that was supposed to be reusable configuration. When two evaluations ran sequentially, the second evaluation's identitiesByTime was polluted by the first. The struct was too fat to reason about.
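The failure mode can be reproduced in a few lines. This is a simplified sketch, not the real Runner — the struct and field names are stand-ins — but it shows how per-run state on a long-lived struct carries data from one run into the next:

```go
package main

import "fmt"

// runner is a simplified stand-in for the real Runner: a long-lived
// struct that also carries per-evaluation mutable state.
type runner struct {
	seen map[string]bool // per-run state on a reusable struct
}

// evaluate counts identities not seen before — but "before" silently
// includes every previous evaluation, because r.seen is never reset.
func (r *runner) evaluate(ids []string) int {
	if r.seen == nil {
		r.seen = map[string]bool{}
	}
	fresh := 0
	for _, id := range ids {
		if !r.seen[id] {
			r.seen[id] = true
			fresh++
		}
	}
	return fresh
}

func main() {
	r := &runner{}
	fmt.Println(r.evaluate([]string{"a", "b"})) // 2
	fmt.Println(r.evaluate([]string{"a", "c"})) // 1 — "a" leaked in from run 1
}
```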

Here are five fat structs we split, with the exact before/after code and the principle behind each decomposition.

1. The God Object — Runner with 12 Fields

Before: Configuration and session state in one struct

type Runner struct {
    // Configuration (long-lived)
    Controls          []policy.ControlDefinition
    Hasher            ports.Digester
    CELEvaluator      policy.PredicateEval
    MaxUnsafeDuration time.Duration
    Exceptions        *policy.ExceptionConfig
    Exemptions        asset.ExemptionMatcher
    Clock             func() time.Time
    ContinuityLimit   int

    // Session state (per-evaluation, mutable)
    StaveVersion     string                              // only used in buildResult
    InputHashes      *evaluation.InputHashes              // only used in buildResult
    identitiesByTime map[time.Time][]asset.CloudIdentity  // mutable per-evaluation
}

12 fields. Three categories mixed together: infrastructure services (Clock, Hasher), governance config (Controls, Exceptions), and per-run state (StaveVersion, identitiesByTime). The identitiesByTime field was mutable — set during Evaluate() and read during buildResult(). If the Runner was reused, the identity map from run 1 leaked into run 2.

After: Three focused types

// Assessor: stateless configuration, reusable across evaluations
type Assessor struct {
    Logger          *slog.Logger
    Clock           ports.Clock
    Hasher          ports.Digester
    PredicateEval   policy.PredicateEval
    Confidence      evaluation.ConfidenceCalculator
    Tracer          ports.Tracer
    Controls        []policy.ControlDefinition
    Exemptions      *policy.ExemptionConfig
    Exceptions      *policy.ExceptionConfig
    SLAThreshold    time.Duration
    ContinuityLimit time.Duration
}

// AssessmentOptions: per-run parameters, passed to Assess()
type AssessmentOptions struct {
    StaveVersion string
    InputHashes  *evaluation.InputHashes
}

// assessmentSession: per-run state, garbage collected after Assess() returns
type assessmentSession struct {
    assessor   *Assessor
    inventory  []asset.Snapshot
    auditTime  time.Time
    collector  *AssessmentCollector
    idIndex    IdentityIndex
    opts       AssessmentOptions
    activeSpan ports.AssessmentSpan
}

The split is: separation by lifetime. Configuration lives for the process. Options live for one call. Session state lives for one evaluation and is discarded.

  • Assessor is created once, configured once, used for every evaluation.
  • AssessmentOptions is passed per-call as a variadic parameter.
  • assessmentSession is created inside Assess(), used during evaluation, and garbage collected when Assess() returns.

The mutable state is gone: identitiesByTime became IdentityIndex — a value type constructed fresh per evaluation inside the session. No leaking between runs. No mutation of the Assessor.
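A minimal sketch of the resulting call shape — field types and method bodies are simplified assumptions, not the project's actual code — showing the session built fresh inside each call and the options passed variadically:

```go
package main

import "fmt"

// Assessor: long-lived configuration, safe to reuse across evaluations.
type Assessor struct {
	ContinuityLimit int
}

// AssessmentOptions: per-run parameters, passed variadically.
type AssessmentOptions struct {
	StaveVersion string
}

// assessmentSession: per-run state, discarded when Assess returns.
type assessmentSession struct {
	assessor *Assessor
	opts     AssessmentOptions
	idIndex  map[string]int // built fresh every evaluation — nothing leaks
}

// Assess constructs a new session per call; the Assessor is never mutated.
func (a *Assessor) Assess(inventory []string, opts ...AssessmentOptions) int {
	s := &assessmentSession{assessor: a, idIndex: map[string]int{}}
	if len(opts) > 0 {
		s.opts = opts[0]
	}
	for _, id := range inventory {
		s.idIndex[id]++
	}
	return len(s.idIndex) // session is garbage collected after return
}

func main() {
	a := &Assessor{ContinuityLimit: 3}
	fmt.Println(a.Assess([]string{"a", "b", "a"})) // 2
	fmt.Println(a.Assess([]string{"c"}))           // 1 — fresh session, no carry-over
}
```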

2. The Fat Summary — Three Concerns in One Struct

Before: Counts, gating, and metadata mixed

type Summary struct {
    Total             int               // counting
    Pass              int               // counting
    Warn              int               // counting
    Fail              int               // counting
    BySeverity        map[Severity]int  // counting
    FailOn            Severity          // gating logic
    Gated             bool              // gating logic
    GatedFindingCount int               // gating logic
    VulnSourceUsed    string            // metadata
    EvidenceFreshness string            // metadata
}

10 fields, three responsibilities. The RecomputeSummary method needed to reset counts (Total, Pass, Warn, Fail) while preserving metadata (VulnSourceUsed, EvidenceFreshness). With all fields in one struct, every recompute had to carefully skip the metadata fields — miss one and the vulnerability source disappears from the report.
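The failure mode, reduced to a two-field sketch (field set trimmed; not the project's actual reset code): a wholesale zero-value reset is the tempting one-liner, and it wipes metadata along with the counts.

```go
package main

import "fmt"

// Summary flattened as in the original: counts and metadata side by side.
type Summary struct {
	Total          int
	VulnSourceUsed string
}

// recomputeNaive resets with a wholesale zero value — the easy-looking
// reset that silently erases metadata along with the counts.
func recomputeNaive(s *Summary, findings int) {
	*s = Summary{} // wipes VulnSourceUsed too
	s.Total = findings
}

func main() {
	s := Summary{VulnSourceUsed: "nvd"}
	recomputeNaive(&s, 3)
	fmt.Printf("%q\n", s.VulnSourceUsed) // "" — the vuln source is gone from the report
}
```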

After: Three cohesive sub-structs

type Summary struct {
    Counts   ResultCounts
    Gating   GatingInfo
    Metadata AuditMeta
}

type ResultCounts struct {
    Total      int                     `json:"total"`
    Pass       int                     `json:"pass"`
    Warn       int                     `json:"warn"`
    Fail       int                     `json:"fail"`
    BySeverity map[policy.Severity]int `json:"by_severity"`
}

type GatingInfo struct {
    FailOn            policy.Severity `json:"fail_on"`
    GatedFindingCount int             `json:"gated_finding_count"`
    Gated             bool            `json:"gated"`
}

type AuditMeta struct {
    VulnSourceUsed    string `json:"vuln_source_used,omitempty"`
    EvidenceFreshness string `json:"evidence_freshness,omitempty"`
}

Now RecomputeSummary resets counts by constructing a new ResultCounts — metadata and gating info are untouched because they're separate fields:

func (r *Report) RecomputeSummary() {
    r.Summary.Counts = ResultCounts{}  // Reset counts only
    for _, f := range r.Findings {
        r.Summary.Counts.Total++
        // ... tally pass/warn/fail
    }
    // r.Summary.Metadata is preserved — different struct
}

The split is: separation by mutation frequency. Counts change on every recompute. Gating info changes when the threshold changes. Metadata is set once at report creation and never changes.

JSON contract preserved: The DTO layer maps from nested structs to the same flat JSON shape. External consumers see no change.

3. The types.go Junk Drawer — 5 Files Holding 125 Types

Before: One file per package holding all types

internal/core/setup/types.go         — 40 types (doctor, config, status, generate, init, env, alias, context)
internal/app/securityaudit/evidence/types.go — 28 types (enums, params, snapshot, bundle, providers, deps)
internal/adapters/output/dto/types.go        — 23 types (finding DTO, result DTO, remediation DTO)
internal/core/reporting/types.go             — 22 types (baseline, cidiff, report, diagnose, enforce, docs, prompt)
internal/core/usecase/types.go               — 12 types (gate, fix, trace, apply, verify, fixloop)

125 types across 5 files. Each file was a "junk drawer" — every new type in the package went into types.go because there was no better home. Finding a specific type required scrolling through 200+ lines of unrelated definitions.

The real cost: git blame showed the types.go files had the highest churn rate in the codebase. Every new feature touched them, and developers adding unrelated types kept modifying the same file — a steady source of merge conflicts.

After: One file per concern

internal/core/setup/
├── doctor_types.go     — 5 types (DoctorCheck, DoctorContext, ...)
├── config_types.go     — 6 types (ConfigSetting, ConfigValue, ...)
├── status_types.go     — 8 types (ProjectState, SessionInfo, ...)
├── generate_types.go   — 4 types (GenerateRequest, TemplateType, ...)
├── init_types.go       — 7 types (InitRequest, ScaffoldResult, ...)
├── env_types.go        — 4 types (EnvVar, EnvListResponse, ...)
├── alias_types.go      — 3 types (Alias, AliasListResponse, ...)
└── context_types.go    — 3 types (ContextInfo, ContextListResponse, ...)

The 40-type types.go became 8 focused files. Each file contains types for one command or one domain concept. Finding DoctorCheck means opening doctor_types.go — not scrolling through 200 lines.

The split is: separation by domain concept, not by Go construct. A file should hold types that change together. Doctor types change when doctor logic changes. Config types change when config logic changes. They never change at the same time — different files, different churn.

4. The 507-Line Command Handler — Logic in the CLI Layer

Before: Business logic in cmd/

// cmd/diagnose/prompt.go — 507 lines
func runPromptFromFinding(cmd *cobra.Command, ...) error {
    // Load controls (30 lines)
    // Load snapshots (20 lines)
    // Match findings to assets (40 lines)
    // Build prompt data with evidence summaries (80 lines)
    // Format guidance sections (60 lines)
    // Render the prompt template (30 lines)
    // Write output to stdout or clipboard (40 lines)
    // ... 200+ more lines of interleaved concerns
}

507 lines in a Cobra command handler. The function loaded data, computed evidence summaries, built prompt context, rendered templates, and handled clipboard output — all in one file. Testing any piece required constructing a *cobra.Command with flags.

After: Split by architectural layer

// cmd/diagnose/prompt.go — 274 lines (CLI wiring only)
func runPromptFromFinding(cmd *cobra.Command, ...) error {
    // Extract CLI flags
    // Build app-layer request
    // Call app.RunPrompt(request)
    // Render output
}

// internal/app/diagnose/prompt/runner.go — 196 lines (business logic)
type Runner struct { ... }

func (r *Runner) Run(ctx context.Context, req Request) (Output, error) {
    // Load controls and snapshots
    // Match findings to assets
    // Build prompt data
    // Render template
    return Output{Rendered: rendered, FindingIDs: ids, AssetID: assetID}, nil
}

The business logic moved to internal/app/diagnose/prompt/ — testable without Cobra, importable from other commands, and following the hexagonal architecture (app layer depends on ports, not on CLI).

The split is: separation by architectural layer. CLI code handles flags, stdin/stdout, and exit codes. App code handles domain logic. Adapters handle I/O. Each layer is testable independently.
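The payoff shows up in tests. With the logic behind a plain Request/Output pair — the names below are illustrative stand-ins, not the real signatures — a test exercises the business logic with no *cobra.Command at all:

```go
package main

import (
	"errors"
	"fmt"
)

// Request and Output are illustrative stand-ins for the app-layer types.
type Request struct{ FindingID string }
type Output struct{ Rendered string }

type Runner struct{}

// Run holds the business logic; it knows nothing about flags or stdout.
func (r *Runner) Run(req Request) (Output, error) {
	if req.FindingID == "" {
		return Output{}, errors.New("finding id required")
	}
	return Output{Rendered: "prompt for " + req.FindingID}, nil
}

func main() {
	// Exercised directly — no Cobra, no flag parsing, no exit codes.
	out, err := (&Runner{}).Run(Request{FindingID: "F-101"})
	fmt.Println(out.Rendered, err)
}
```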

5. Duplicated Fields — Embedding to Eliminate Drift

Before: Same 4 fields in 3 sibling structs

type CleanupPlanOutput struct {
    Dir       string
    Files     []SnapshotFile
    TotalSize int64
    DryRun    bool
}

type CleanupRunOutput struct {
    Dir       string
    Files     []SnapshotFile
    TotalSize int64
    Archived  bool
}

type CleanupSummaryOutput struct {
    Dir       string
    Files     []SnapshotFile
    TotalSize int64
    Errors    []error
}

Dir, Files, TotalSize are repeated in all three structs. When we added FileCount int to CleanupPlanOutput, we forgot to add it to CleanupRunOutput. The plan showed 47 files, the run showed nothing — drift between siblings.

After: Embedded core struct

type CleanupOutputCore struct {
    Dir       string
    Files     []SnapshotFile
    TotalSize int64
}

type CleanupPlanOutput struct {
    CleanupOutputCore
    DryRun bool
}

type CleanupRunOutput struct {
    CleanupOutputCore
    Archived bool
}

type CleanupSummaryOutput struct {
    CleanupOutputCore
    Errors []error
}

Adding FileCount to CleanupOutputCore propagates to all three structs automatically. No drift possible.

The same pattern was applied to AccessFlags:

// Shared access analysis flags
type AccessFlags struct {
    AllowsPublicRead  bool
    AllowsPublicList  bool
    AllowsPublicWrite bool
}

// Policy analysis embeds the shared flags
type PolicyAnalysis struct {
    AccessFlags
    HasDenyAll bool
    Statements []StatementAnalysis
}

// ACL analysis embeds the same flags
type ACLAnalysis struct {
    AccessFlags
    Grants []GrantAnalysis
}

The rule for embedding: extract the common fields into a named struct when three or more siblings share the same field set. Two siblings might be coincidence. Three is a pattern.

The Decision Grid

Signal → Pattern (Example):

  • Fields with different lifetimes → split into config + session + options (Runner → Assessor + AssessmentOptions + assessmentSession)
  • Fields with different mutation frequencies → split into sub-structs (Summary → ResultCounts + GatingInfo + AuditMeta)
  • A file with 20+ types from different domains → split into focused files (types.go → doctor_types.go + config_types.go + ...)
  • A 200+ line function mixing CLI and business logic → split by architectural layer (cmd/prompt.go → cmd/ + internal/app/)
  • The same fields in 3+ sibling structs → embed a shared core struct (CleanupOutputCore embedded in Plan/Run/Summary)

The common thread: a struct should have one reason to change. If you can describe two independent reasons why the struct might be modified (e.g., "when we change the counting logic" AND "when we add a new metadata field"), it's two structs pretending to be one.


These 5 struct decompositions were applied across Stave, a 50,000-line Go CLI for offline security evaluation. The Runner god object split eliminated a mutable-state leak between evaluations. The types.go junk drawer split reduced the highest-churn files to zero merge conflicts. The Summary decomposition made RecomputeSummary a one-line reset.
