Sub-tests Done Right: t.Run, t.Parallel, and the Cleanup-Order Trap

#go #testing #concurrency #backend

Book: The Complete Guide to Go Programming
Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

A backend team I talked to last quarter had a flaky integration
suite that passed locally and failed once a week in CI. The
failure was always the same: a Postgres test row that should not
exist still existed when a later test queried for it. They added
sleeps. They added retries. They added a "skip flaky" tag. The
test stayed flaky for four months.

The bug was four lines of test setup. A parent test opened a
shared transaction, deferred the rollback, then spawned three
sub-tests with t.Parallel(). The rollback fired before the
parallel children were done with the transaction. Race
detector said nothing. go test -v said the parent passed. The
data said otherwise.

t.Run, t.Parallel, and t.Cleanup are three of the most-used APIs
in Go test files. They also interact in ways the docs explain in
one sentence each, scattered across three sections.

Rule 1: t.Cleanup runs LIFO, not FIFO

t.Cleanup(fn) registers fn to run after the test and all of its
sub-tests finish. Multiple calls register multiple cleanups.
They run in reverse order of registration. Last registered,
first executed. Same shape as defer.

func TestCleanupOrder(t *testing.T) {
    t.Cleanup(func() { fmt.Println("1: registered first") })
    t.Cleanup(func() { fmt.Println("2: registered second") })
    t.Cleanup(func() { fmt.Println("3: registered third") })
}
// Output:
// 3: registered third
// 2: registered second
// 1: registered first

This matters when one resource depends on another. Open the DB
connection first, then start a transaction on it. Register the
connection-close cleanup first, then the transaction-rollback
cleanup. LIFO means the transaction rolls back before the
connection closes, which is the order you need.

func setupDB(t *testing.T) *sql.Tx {
    t.Helper()
    db, err := sql.Open("postgres", testDSN)
    if err != nil { t.Fatal(err) }
    t.Cleanup(func() { _ = db.Close() })  // registered 1st

    tx, err := db.BeginTx(context.Background(), nil)
    if err != nil { t.Fatal(err) }
    t.Cleanup(func() { _ = tx.Rollback() })  // registered 2nd

    return tx
}
// At test end: Rollback runs (LIFO), then Close. Correct order.

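Get the order wrong and you close the DB out from under the
in-flight rollback. Whether that rollback then fails depends on
the driver and on database/sql's pool handling. Because the cleanup
discards the error with _ =, nothing reports it either way. The
test still passes, even if the data state is wrong.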

Rule 2: t.Parallel defers parent cleanup until ALL parallel children finish

This is the rule the team I mentioned tripped on. From the
testing package docs, on T.Cleanup:

Cleanup registers a function to be called when the test (or
subtest) and all its subtests complete. Cleanup functions will
be called in last added, first called order.

And from the package overview: "A parent test will only
complete once all of its subtests complete."

Read that twice. The parent's Cleanup waits for parallel
children. That sounds safe. The trap is more subtle: the parent
test's body does not wait for parallel children. Only the
cleanup does. So this code is wrong:

func TestSharedTx_BAD(t *testing.T) {
    tx := beginTx(t)             // shared transaction
    defer tx.Rollback()          // <-- BAD: runs when body returns

    t.Run("insert_a", func(t *testing.T) {
        t.Parallel()
        insertA(t, tx)
    })
    t.Run("insert_b", func(t *testing.T) {
        t.Parallel()
        insertB(t, tx)
    })
    // Body returns here. defer fires. tx rolls back.
    // Parallel children are STILL RUNNING against a dead tx.
}

The parent body returns as soon as the last t.Run call returns.
A t.Run whose child calls t.Parallel() returns the moment that
call happens. The child then pauses until the parent's function
has returned. So the deferred tx.Rollback() fires first. Only
afterwards do the children resume and try to INSERT into a
rolled-back transaction. With database/sql those calls fail with
sql.ErrTxDone. Whether you see that depends on whether insertA
and insertB check the error.

The fix is t.Cleanup, not defer:

func TestSharedTx_GOOD(t *testing.T) {
    tx := beginTx(t)
    t.Cleanup(func() { _ = tx.Rollback() })  // waits for children

    t.Run("insert_a", func(t *testing.T) {
        t.Parallel()
        insertA(t, tx)
    })
    t.Run("insert_b", func(t *testing.T) {
        t.Parallel()
        insertB(t, tx)
    })
}

t.Cleanup is parallel-aware. defer is not. If parallel children
use a shared resource, tear it down with t.Cleanup in the parent,
not with a defer in the parent body.

There's a second variant of this bug. Sub-test setup that runs
after the child calls t.Parallel() races with the setup of its
parallel siblings:

t.Run("a", func(t *testing.T) {
    t.Parallel()
    row := freshRow(t)   // runs concurrently with sibling setups
    insertA(t, row)
})

If freshRow mutates a shared fixture, you have a data race
between the children. The race detector can report it, but only
on runs that use -race and actually execute the conflicting
accesses. Many CI pipelines skip -race for slow integration
suites. Move shared-fixture setup above t.Parallel(), where the
children run one at a time, or out of the children entirely.

Rule 3: shared parent setup needs a synchronization point

You can structure a parallel-children test so the parent does
expensive setup once, then all children read from it. The shape
is well-known but easy to get wrong:

func TestUserAPI(t *testing.T) {
    // Parent setup. Runs synchronously before any t.Run.
    db := setupDB(t)               // registers Cleanup for db
    seed := seedTestData(t, db)    // registers Cleanup for rows

    cases := []struct {
        name string
        userID int
        want string
    }{
        {"existing_user", seed.AliceID, "alice"},
        {"missing_user", 99999, ""},
        {"deleted_user", seed.DeletedID, ""},
    }
    for _, tc := range cases {
        t.Run(tc.name, func(t *testing.T) {
            t.Parallel()
            got := lookup(t, db, tc.userID)
            if got != tc.want {
                t.Errorf("got %q, want %q", got, tc.want)
            }
        })
    }
}

The parent sets up the DB and seed, registers cleanup, then
spawns parallel children that read from the shared db. The
parent body returns after the loop. Cleanup runs only after all
parallel children finish (Rule 2), and LIFO means seed-cleanup
runs before db-close (Rule 1).

Two failure modes here. First, if setupDB returns before the DB
is actually ready (a docker-compose Postgres still booting), the
children all hit a half-up DB. Always block on a real readiness
check inside the parent. Second, if any child writes to the
shared DB, the children race on shared state and the results
depend on scheduling. Either don't parallelize writers, or give
each child its own scope (a fresh schema, a fresh transaction
with tx := db.BeginTx(...) registered per-child, a fresh
container).

The table-test loop-capture trap (and what Go 1.22 fixed)

Through Go 1.21, this code was a classic interview question:

// Pre-Go 1.22: BUG
for _, tc := range cases {
    t.Run(tc.name, func(t *testing.T) {
        t.Parallel()
        check(t, tc.input)  // tc captured by reference
    })
}

The loop variable tc was a single variable reused across
iterations, and each closure captured that variable, not its
current value. Parallel children only resume after the parent's
function returns, so by then the loop was done and tc held the
last case. Every parallel child tested the last case. The fix was
the unfortunate tc := tc shadow line:

// Pre-Go 1.22: FIX
for _, tc := range cases {
    tc := tc  // shadow with a per-iteration variable
    t.Run(tc.name, func(t *testing.T) {
        t.Parallel()
        check(t, tc.input)
    })
}

Go 1.22 changed loop variable scoping so that every iteration
gets its own tc. The shadow line is no longer needed in modules
that declare go 1.22 or later in go.mod.

// Go 1.22+: fine without the shadow line
for _, tc := range cases {
    t.Run(tc.name, func(t *testing.T) {
        t.Parallel()
        check(t, tc.input)
    })
}

Two operational notes. One: the change is gated on the go
directive in go.mod, not the toolchain you build with. A
module pinned to go 1.21 still has the old behavior even when
compiled with Go 1.22+. Two: linters written before Go 1.22,
such as older releases of paralleltest, may still flag the
missing shadow because they don't read the go directive. Bump
the directive first; upgrading or reconfiguring the linter is
the easy follow-up.

Cheat sheet you can paste into a code review comment

t.Cleanup runs LIFO. Register in dependency order: outer resource first, inner resource second.
For anything involving t.Parallel, use t.Cleanup instead of defer. defer fires when the body returns; t.Cleanup waits for parallel children.
Shared parent state for parallel children must be ready before t.Run is called. Block on real readiness inside the parent.
Parallel writers on a shared resource are almost always wrong. Give each child its own scope.
On go 1.22+, the loop-capture shadow is unnecessary. On earlier versions, it is mandatory.
Run the suite at least once with go test -race -count=10 -shuffle=on. Most cleanup-order bugs surface within a few iterations.

If this was useful

The Go testing package rewards reading end-to-end. The same shape
shows up in httptest fixtures, t.TempDir, and the newer
testing/synctest (experimental in Go 1.24 behind
GOEXPERIMENT=synctest, stable since Go 1.25). All three lean on
the same parent/child cleanup model.

I wrote two books on Go that go into this kind of detail without
the magazine-length intros: The Complete Guide to Go Programming
covers the language and standard library, and Hexagonal
Architecture in Go shows how to lay out a service so the test
seams land where you actually want them. Together they're
Thinking in Go, a 2-book series.