Bala Paranj

Posted on May 6

Fuzz Testing a Go Security CLI: 8 Targets that Found What Unit Tests Missed

#go #testing #security #programming

How Go's built-in fuzzer — feeding millions of random inputs into parsers, compilers, validators, and evaluators — catches crashes, infinite loops, and logic gaps that hand-written tests never cover.

Unit tests check what you already know can go wrong. Fuzz tests find what you have not thought of.

You write a ParseDuration("7d") test. It passes. You test "168h". Passes. You test "". Returns an error. You test "-1h". Returns an error. Ship it.

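Then a user passes "9999999999999999999h". Your parser reads the number as a float64, multiplies it by the nanoseconds in an hour, and converts the product to time.Duration. The product is far outside the int64 range, and Go leaves the result of that conversion implementation-dependent: it does not panic, and on amd64 it typically comes out as a large negative duration. The evaluation engine then computes that a resource has been unsafe for negative time, which is less than any threshold, so it passes. A security control that should fire doesn't.

A fuzz test can find it, provided the target asserts on the result and not only on panics. The fuzzer generates random strings, including absurdly long numbers, unicode, null bytes, and strings that no one would think to type. If any of them causes a panic, a hang, or a failed assertion, the fuzzer reports the exact input that broke your code.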

How Go Fuzzing Works

Go 1.18+ has fuzzing built in. No external tools. No frameworks. Just func FuzzXxx(f *testing.F):

func FuzzParseDuration(f *testing.F) {
    // Seed corpus: known inputs to start from
    f.Add("7d")
    f.Add("168h")
    f.Add("")
    f.Add("-1h")

    // Fuzz function: called millions of times with mutations
    f.Fuzz(func(t *testing.T, input string) {
        _, _ = ParseDuration(input)  // Must not panic
    })
}

The fuzzer starts with your seeds, then mutates them — bit flips, insertions, deletions, concatenations, boundary values. Each iteration runs in microseconds. In 30 seconds, it tests hundreds of thousands of inputs.

Run it:

go test -fuzz=FuzzParseDuration -fuzztime=30s ./internal/core/kernel/

If it finds a crashing input, it saves it to testdata/fuzz/FuzzParseDuration/ and the test fails reproducibly from then on.

The 8 Fuzz Targets

Here's what we fuzz in a security CLI and why each target matters.

1. Control ID Parser — The Identity Boundary

Control IDs are typed domain identifiers (CTL.S3.PUBLIC.001). They're parsed from YAML configs, CLI arguments, and JSON artifacts. If the parser panics on malformed input, every code path that loads user-authored controls crashes.

func FuzzNewControlID(f *testing.F) {
    seeds := []string{
        "",
        "CTL.S3.PUBLIC.001",
        "CTL.S3.PUBLIC.LIST.001",
        "INV.S3.PUBLIC.001",
        "FOO",
        "...",
        "CTL.",
        "CTL.S3.PUBLIC",
        string(make([]byte, 1024)),
    }
    for _, s := range seeds {
        f.Add(s)
    }

    f.Fuzz(func(t *testing.T, input string) {
        id, err := NewControlID(input)
        if err == nil {
            // Valid ID must survive JSON round-trip.
            data, err := json.Marshal(id)
            if err != nil {
                t.Fatalf("Marshal valid ControlID %q: %v", id, err)
            }
            var rt ControlID
            if err := json.Unmarshal(data, &rt); err != nil {
                t.Fatalf("Unmarshal valid ControlID %q: %v", string(data), err)
            }
            if rt != id {
                t.Fatalf("round-trip mismatch: got %q, want %q", rt, id)
            }
        }
    })
}

What makes this fuzz test special: it doesn't just check that the parser doesn't panic. It verifies a property: if the parser accepts an input as valid, the resulting ID must survive a JSON round-trip unchanged. This catches subtle encoding bugs, such as a control ID containing characters that JSON escapes differently, or an accepted ID with invalid UTF-8 that encoding/json replaces with U+FFFD during marshaling.

Seeds include edge cases: empty string, valid IDs with different segment counts, wrong prefix (INV), partial IDs (CTL.), and a 1 KB string of zero bytes (tests handling of long, non-printable input).

2. Duration Parser — The CLI Flag Boundary

The --max-unsafe flag accepts durations like 7d, 168h, 1.5d, 1d12h. Users type these. If the parser crashes on "999999999999d", the CLI exits with a stack trace instead of an error message.

func FuzzParseDuration(f *testing.F) {
    seeds := []string{
        "", "0", "1h", "7d", "168h",
        "1.5d", "1d12h", "24h",
        "-1h", "0d", "999d",
        "abc", "d", "h", "1x",
        "9999999999999999999h",
        string(make([]byte, 1024)),
    }
    for _, s := range seeds {
        f.Add(s)
    }

    f.Fuzz(func(t *testing.T, input string) {
        _, _ = ParseDuration(input)
    })
}

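The overflow seed: "9999999999999999999h" is specifically chosen because its 19-digit number is larger than math.MaxInt64 (9223372036854775807). If the parser does hours, _ := strconv.ParseFloat(s, 64) followed by time.Duration(hours * float64(time.Hour)), the float64 product is out of range for int64, so the conversion to time.Duration returns an implementation-dependent value without panicking. The fuzzer would generate similar inputs naturally, but seeding it accelerates discovery. Because the failure is silent, a no-panic target only exercises this path; catching the bad value takes an assertion on the returned duration.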

3. Byte Size Parser — The Config Boundary

The max_input_file_size config field accepts values like 256MB, 1GB, 512mb. A panic here means the tool can't start.

func FuzzParseByteSize(f *testing.F) {
    seeds := []string{
        "", "0", "1", "-1",
        "256MB", "1GB", "512mb", "1024",
        "100KB", "10TB", "999PB",
        "abc", "MB", "1.5GB",
        "9999999999999999999GB",
        string(make([]byte, 1024)),
    }
    for _, s := range seeds {
        f.Add(s)
    }

    f.Fuzz(func(t *testing.T, input string) {
        _, _ = ParseByteSize(input)
    })
}

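Same pattern, different boundary. The function parses a number and multiplies by a suffix (KB = 1024, MB = 1048576, GB = 1073741824). Any of those multiplications can overflow. Integer overflow in Go wraps silently instead of panicking, so the fuzzer reaches wrapping inputs quickly, but this target reports them only if the parser or the fuzz body checks for them.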
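One way to keep the multiplication from wrapping is to compare the parsed number against math.MaxInt64 divided by the multiplier before multiplying. The sketch below assumes a hypothetical parseByteSize that accepts whole numbers only; the real ParseByteSize may accept other forms such as "1.5GB", and the unit table here is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
	"strings"
)

var errOverflow = errors.New("byte size overflows int64")

// units is an assumed suffix table using binary multiples.
var units = []struct {
	suffix string
	mult   int64
}{
	{"KB", 1 << 10}, {"MB", 1 << 20}, {"GB", 1 << 30},
	{"TB", 1 << 40}, {"PB", 1 << 50},
}

// parseByteSize is a hypothetical, integer-only stand-in for ParseByteSize.
func parseByteSize(s string) (int64, error) {
	upper := strings.ToUpper(strings.TrimSpace(s))
	mult := int64(1)
	for _, u := range units {
		if strings.HasSuffix(upper, u.suffix) {
			upper = strings.TrimSuffix(upper, u.suffix)
			mult = u.mult
			break
		}
	}
	n, err := strconv.ParseInt(upper, 10, 64)
	if err != nil {
		return 0, err
	}
	if n < 0 {
		return 0, errors.New("negative byte size")
	}
	// Checked multiply: n*mult fits only if n <= MaxInt64/mult.
	if n > math.MaxInt64/mult {
		return 0, errOverflow
	}
	return n * mult, nil
}

func main() {
	for _, in := range []string{"256MB", "512mb", "999PB", "10000000000GB"} {
		n, err := parseByteSize(in)
		fmt.Printf("%q -> %d, err=%v\n", in, n, err)
	}
}
```

With a guard like this in place, a fuzz body can additionally assert that every accepted input returns a non-negative size.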

4. CEL Predicate Compiler — The Expression Engine

Security controls define unsafe conditions as predicates:

unsafe_predicate:
  all:
    - field: properties.public
      op: eq
      value: true

These are compiled to CEL (Common Expression Language) expressions. A malformed predicate — empty field, unknown operator, deeply nested structure — could crash the compiler.

func FuzzCompile(f *testing.F) {
    f.Add("public", "eq", "true")
    f.Add("", "", "")
    f.Add("properties.encryption.algorithm", "ne", "aws:kms")
    f.Add("a.b.c.d.e", "contains", "x")
    f.Add("field", "missing", "true")
    f.Add("field", "INVALID_OP", "value")
    f.Add(string(make([]byte, 256)), "eq", string(make([]byte, 256)))

    compiler, err := NewCompiler()
    if err != nil {
        f.Fatalf("create compiler: %v", err)
    }

    f.Fuzz(func(t *testing.T, field, op, value string) {
        pred := policy.UnsafePredicate{
            All: []policy.PredicateRule{
                {
                    Field: predicate.NewFieldPath(field),
                    Op:    predicate.Operator(op),
                    Value: policy.NewOperand(value),
                },
            },
        }
        _, _ = compiler.Compile(pred)
    })
}

Multi-parameter fuzzing: This fuzz function takes three string parameters. The fuzzer mutates all three independently — generating combinations like field="" + op="eq" + value="true" that might expose edge cases in how the compiler handles missing fields.

Compiler reuse: The compiler is created once outside the fuzz function. This is intentional — the compiler caches compiled programs. Fuzzing with a shared compiler tests the cache behavior under random input sequences.

5. S3 Policy Evaluator — The Security Logic

This is the highest-value fuzz target. The AWS S3 policy evaluator parses bucket policy JSON and determines if the policy grants public access. A crash here means the security tool itself is vulnerable to a malformed policy document.

func FuzzEvaluate(f *testing.F) {
    seeds := []string{
        ``,
        `{`,
        `{}`,
        `{"Version":"2012-10-17","Statement":[]}`,
        `{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":"*","Action":"s3:GetObject","Resource":"arn:aws:s3:::example/*"}]}`,
        `{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Principal":"*","Action":"s3:*","Resource":"*"}]}`,
        `{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"AWS":"arn:aws:iam::*:root"},"Action":"s3:*","Resource":"*"}]}`,
    }
    for _, s := range seeds {
        f.Add(s)
    }

    eval := NewEvaluator(nil, s3resolver.NewResolver())

    f.Fuzz(func(_ *testing.T, input string) {
        doc, _ := Parse(input)
        eval.Evaluate(doc)
    })
}

Why this matters most: The evaluator processes untrusted data from AWS APIs. A bucket policy is a JSON document that any AWS account holder can write. If an attacker crafts a policy document that crashes the security evaluator, they've effectively disabled the security tool for their account. The fuzzer ensures the evaluator handles arbitrary JSON without panicking.

Seeds cover the spectrum: empty, truncated ({), empty object, valid empty policy, public read grant, deny-all, and a cross-account principal. The fuzzer mutates these into millions of variations.

6. Observation Loader — The Data Ingestion Boundary

Observation snapshots are JSON files produced by extractors. A malformed snapshot shouldn't crash the loader — it should produce a clear error.

func FuzzLoadSnapshotFromReader(f *testing.F) {
    seeds := []string{
        ``,
        `{}`,
        `[]`,
        `{`,
        `{"schema_version":"obs.v0.1","captured_at":"2026-01-01T00:00:00Z","assets":[{"id":"r-1","type":"storage_bucket","vendor":"aws","properties":{}}]}`,
        `{"schema_version":"obs.v0.1","captured_at":"2026-01-01T00:00:00Z","assets":[]}`,
        `{"schema_version":"obs.v0.1"}`,
    }
    for _, s := range seeds {
        f.Add(s)
    }

    loader := NewObservationLoader()

    f.Fuzz(func(_ *testing.T, input string) {
        loader.LoadSnapshotFromReader(context.Background(), strings.NewReader(input), "fuzz")
    })
}

The loader reuse pattern: Like the CEL compiler, the loader is created once. This tests that the loader handles sequential calls with arbitrary inputs without accumulating state that causes failures on subsequent calls.

7 & 8. Schema Validators — The Contract Enforcement Layer

Two fuzz targets for the JSON Schema validator: one for observation JSON, one for control YAML.

func FuzzValidateObservationJSON(f *testing.F) {
    seeds := []string{
        ``,
        `{}`,
        `[]`,
        `{`,
        `{"schema_version":"obs.v0.1","captured_at":"2026-01-01T00:00:00Z","assets":[]}`,
    }
    for _, s := range seeds {
        f.Add([]byte(s))
    }

    v := New()

    f.Fuzz(func(_ *testing.T, data []byte) {
        v.ValidateObservationJSON(data)
    })
}

func FuzzValidateControlYAML(f *testing.F) {
    seeds := []string{
        ``,
        `{}`,
        `dsl_version: ctrl.v1
id: CTL.TEST.001
description: test
severity: high
unsafe_predicate:
  all:
    - field: public
      op: eq
      value: true`,
        `not valid yaml: [`,
    }
    for _, s := range seeds {
        f.Add([]byte(s))
    }

    v := New()

    f.Fuzz(func(_ *testing.T, data []byte) {
        v.ValidateControlYAML(data)
    })
}

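Byte slice inputs: These targets take []byte instead of string, matching the validators' signatures. The fuzzer generates inputs with null bytes, invalid UTF-8, and binary data. Go's fuzzer does the same for string arguments, since a Go string can hold arbitrary bytes, so the choice follows the API under test and not a difference in coverage. A YAML parser that assumes valid UTF-8 might panic on \xff\xfe sequences.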
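If a downstream parser is known to assume well-formed text, one defensive option is a cheap check before parsing. The sketch below is a hypothetical precheck function, not part of the validator described here. It rejects invalid UTF-8 and embedded NUL bytes with ordinary errors, so those inputs never reach the parser.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"unicode/utf8"
)

var (
	errNotUTF8 = errors.New("input is not valid UTF-8")
	errNULByte = errors.New("input contains a NUL byte")
)

// precheck is a hypothetical guard run before handing data to a parser.
func precheck(data []byte) error {
	if !utf8.Valid(data) {
		return errNotUTF8
	}
	// NUL is valid UTF-8, so it needs its own check.
	if bytes.IndexByte(data, 0) >= 0 {
		return errNULByte
	}
	return nil
}

func main() {
	inputs := [][]byte{
		[]byte("id: CTL.TEST.001"),
		[]byte("\xff\xfe"),
		[]byte("id: a\x00b"),
	}
	for _, in := range inputs {
		fmt.Printf("%q -> %v\n", in, precheck(in))
	}
}
```

A fuzz body can then assert that the validator returns an error, and never panics, for every input that fails the precheck.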

Running It: The Makefile Target

fuzz: sync-schemas sync-controls
    $(GOTEST) -fuzz=Fuzz -fuzztime=30s ./internal/core/s3/policy/
    $(GOTEST) -fuzz=Fuzz -fuzztime=30s ./internal/adapters/observations/
    $(GOTEST) -fuzz=Fuzz -fuzztime=30s ./internal/contracts/validator/
    $(GOTEST) -fuzz=Fuzz -fuzztime=30s ./internal/core/predicate/
    $(GOTEST) -fuzz=FuzzNewControlID -fuzztime=30s ./internal/core/kernel/
    $(GOTEST) -fuzz=FuzzParseByteSize -fuzztime=30s ./internal/core/kernel/
    $(GOTEST) -fuzz=FuzzParseDuration -fuzztime=30s ./internal/core/kernel/
    $(GOTEST) -fuzz=Fuzz -fuzztime=30s ./internal/cel/

make fuzz runs all 8 targets for 30 seconds each, approximately 4 minutes total, testing millions of inputs. The targets run sequentially because Go's fuzzer already uses all available CPU cores for a single target.

Why 30 seconds per target: Diminishing returns. The fuzzer finds most crashes in the first few seconds (simple inputs). The remaining time explores deeper mutations. 30 seconds is enough for CI; overnight runs with -fuzztime=1h explore more thoroughly.

Why each kernel target has its own -fuzz pattern: The kernel package has three fuzz functions, and go test requires the -fuzz pattern to match exactly one fuzz test in the package. A pattern such as -fuzz=Fuzz that matches several makes go test refuse to fuzz ("will not fuzz, -fuzz matches more than one fuzz test"), and adding -run does not narrow what -fuzz matches. Each target therefore needs its own invocation with a pattern that matches only that target. The same applies to any other package that holds more than one fuzz function.

The Seed Corpus Design

Every fuzz target includes carefully chosen seeds. The seeds aren't random — they represent the boundary conditions that the fuzzer uses as starting points for mutation:

Seed Type	Purpose	Example
Empty input	Tests nil/empty handling	`""`, `{}`, `[]`
Valid input	Establishes the baseline shape for mutations	`"CTL.S3.PUBLIC.001"`
Truncated valid	Tests partial parse handling	`"CTL."`, `{`, `{"schema_version":"obs.v0.1"}`
Wrong type/prefix	Tests rejection paths	`"INV.S3.PUBLIC.001"`, `"abc"`
Overflow values	Tests numeric boundary handling	`"9999999999999999999h"`
Large input	Tests buffer handling	`string(make([]byte, 1024))`
Negative values	Tests sign handling	`"-1h"`, `"-1"`
Edge-case format	Tests parser corner cases	`"1.5d"`, `"1d12h"`, `"0d"`

The fuzzer mutates these seeds by:

Flipping random bits
Inserting/deleting bytes at random positions
Concatenating seeds together
Replacing substrings with values from other seeds
Trying boundary values (0, -1, MAX_INT, etc.)

What Fuzz Tests Verify

Each fuzz target verifies one or more properties:

Property 1: "Must not panic"

The minimum bar. Every parser, compiler, and evaluator must handle arbitrary input without crashing:

f.Fuzz(func(t *testing.T, input string) {
    _, _ = ParseDuration(input)  // Return value ignored — we only care about panics
})

Property 2: "Round-trip consistency"

If the parser accepts an input, serializing and deserializing the result must produce the same value:

f.Fuzz(func(t *testing.T, input string) {
    id, err := NewControlID(input)
    if err == nil {
        data, _ := json.Marshal(id)
        var rt ControlID
        json.Unmarshal(data, &rt)
        if rt != id {
            t.Fatalf("round-trip mismatch")
        }
    }
})

Property 3: "Statelessness"

Calling the function with random inputs in sequence must not corrupt internal state. Tested implicitly by reusing the compiler or loader object across fuzz iterations.
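The implicit check can be made explicit with a differential assertion: for every input, the long-lived instance must return the same result as a freshly constructed one. The sketch below uses a hypothetical segmentCounter type with a result cache as a stand-in for a stateful compiler or loader; none of these names come from Stave.

```go
package main

import (
	"fmt"
	"strings"
)

// segmentCounter is a hypothetical component that keeps state between
// calls, here a cache of previously computed results.
type segmentCounter struct {
	cache map[string]int
}

func newSegmentCounter() *segmentCounter {
	return &segmentCounter{cache: make(map[string]int)}
}

// Count returns the number of dot-separated segments in s.
func (c *segmentCounter) Count(s string) int {
	if n, ok := c.cache[s]; ok {
		return n
	}
	n := strings.Count(s, ".") + 1
	c.cache[s] = n
	return n
}

// checkStateless compares the shared instance against a fresh one.
// Any disagreement means earlier inputs changed later results.
func checkStateless(shared *segmentCounter, input string) error {
	want := newSegmentCounter().Count(input)
	if got := shared.Count(input); got != want {
		return fmt.Errorf("input %q: shared returned %d, fresh returned %d", input, got, want)
	}
	return nil
}

func main() {
	shared := newSegmentCounter()
	for _, in := range []string{"CTL.S3.PUBLIC.001", "", "...", "CTL.S3.PUBLIC.001"} {
		if err := checkStateless(shared, in); err != nil {
			fmt.Println("FAIL:", err)
			return
		}
	}
	fmt.Println("shared and fresh instances agree")
}
```

In a fuzz function, shared would be created once outside f.Fuzz and checkStateless called inside it. Go runs fuzz workers as separate processes, so each worker holds its own shared instance.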

Where to Fuzz in Your Codebase

Fuzz every function that:

Criterion	Why
Accepts user input (CLI flags, config values)	Users type unexpected things
Parses external data (JSON, YAML, policy docs)	External data is untrusted by definition
Performs string-to-type conversion	Edge cases in number parsing, encoding, escaping
Implements a compiler or evaluator	Combinatorial input space too large for hand-written tests
Validates against a schema	The validator itself must handle malformed documents

Don't fuzz:

Pure computation on validated types (the input is already trusted)
Functions with no error paths (nothing can go wrong)
Functions that only call already-fuzzed lower-level functions

The goal is to fuzz at trust boundaries — the seams where untrusted data enters the system. Everything downstream of a trust boundary operates on validated types and doesn't need fuzzing.

These 8 fuzz targets protect the input boundaries of Stave, an offline configuration safety evaluator. make fuzz runs all targets in 4 minutes, testing millions of inputs against parsers, compilers, validators, and evaluators.
