shayan holakouee
The Go Memory Model, Why Your Concurrent Code Might Be Lying to You

You write two goroutines. One sets a variable, the other reads it. You run it a thousand times and it works fine. Then it breaks in production, on a different machine, under load. You stare at the code and nothing looks wrong.

It is not fine. You just got hit by the memory model.

The Problem Is Not Timing, It Is Visibility

Most developers think about concurrency bugs in terms of two operations colliding at the same moment. That framing is useful but it misses something deeper. The real question is not just when something happens. It is whether one goroutine can see what another goroutine did at all.

CPUs and compilers do not execute your code top to bottom. They reorder instructions, cache values in registers, and delay writes to main memory for performance reasons. On a single thread this is invisible because everything is consistent from your perspective. Across multiple goroutines, that consistency breaks down.

Go does not promise that goroutine A will see the writes made by goroutine B unless you establish a specific relationship between them. That relationship has a formal name: happens-before.

What Happens-Before Actually Means

Happens-before is a partial ordering of events in a concurrent program. If event A happens-before event B, then B is guaranteed to observe everything A did. If there is no happens-before relationship between A and B, the memory model makes no guarantees at all. B might see A's writes, or it might not. Both outcomes are valid from the spec's perspective.

This is the part that surprises people. The following code has a data race:

package main

import (
    "fmt"
    "runtime"
)

var ready bool
var data int

func main() {
    go func() {
        data = 42
        ready = true
    }()

    for !ready {
        runtime.Gosched()
    }

    fmt.Println(data)
}

Intuitively it looks safe. You wait until ready is true, then you read data. But there is no happens-before between the goroutine's writes and the main goroutine's reads. The compiler is allowed to reorder data = 42 and ready = true. The CPU is allowed to flush them to memory in any order. You might read ready == true and still see data == 0.

The Go race detector will catch this. Your eyes will not.

What Actually Establishes Happens-Before in Go

The Go memory model (updated formally in 2022) defines a specific set of synchronization operations that create happens-before edges. These are the ones you will use constantly.

Channel sends and receives

A send on a channel happens-before the corresponding receive from that channel completes. This is the most idiomatic way to synchronize in Go:

package main

import "fmt"

func main() {
    var data int
    ch := make(chan struct{})

    go func() {
        data = 42
        ch <- struct{}{} // send happens-before the receive completes
    }()

    <-ch
    fmt.Println(data) // guaranteed to see 42
}

For buffered channels, the rule is slightly different: the *k*th receive from a channel with capacity *C* happens-before the *k+C*th send completes. This is what makes buffered channels work as semaphores.

sync.Mutex

For a given sync.Mutex or sync.RWMutex variable, the *n*th call to Unlock happens-before the *n+1*th call to Lock returns. In plain terms: whatever you did inside a locked section is visible to whoever acquires the lock next.

package main

import (
    "fmt"
    "sync"
)

func main() {
    var mu sync.Mutex
    var data int

    go func() {
        mu.Lock()
        data = 42
        mu.Unlock() // this unlock happens-before the next Lock returns
    }()

    mu.Lock()
    fmt.Println(data) // no data race, though it prints 0 or 42 depending on who locks first
    mu.Unlock()
}

sync.Once

The completion of the first call to f() inside once.Do(f) happens-before any other call to once.Do returns. This is why sync.Once is the standard pattern for safe lazy initialization. The guarantee is built into the type.

sync/atomic

The atomic package provides sequentially consistent operations. If you use atomic.StoreInt64 and atomic.LoadInt64, the store happens-before the load if the load observes the stored value. This makes atomics the right tool for simple flags and counters, but not for protecting compound state changes.

Goroutine creation

The go statement that starts a goroutine happens-before the goroutine itself begins executing. This means any writes done before the go statement are visible inside the new goroutine. However, the goroutine's completion does not happen-before anything in the parent unless you explicitly synchronize.

data := 42
go func() {
    fmt.Println(data) // safe, goroutine creation establishes happens-before
}()

The sync.WaitGroup Trap

sync.WaitGroup itself is sound, but people frequently use it in ways that create subtle races. The rule is: wg.Done() happens-before wg.Wait() returns. That guarantee covers the goroutine's work up to the Done call. It does not cover anything that happens after.

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    results := make([]int, 5)

    for i := 0; i < 5; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            results[i] = i * 2 // writing to separate indices is safe
        }(i)
    }

    wg.Wait()
    fmt.Println(results) // safe to read here
}

This is fine because each goroutine writes to a distinct index and Wait returns only after all Done calls. But if you wrote to the same index from multiple goroutines, or read results inside another goroutine without additional synchronization, you would have a race.

When Atomic Is Not Enough

A common mistake is reaching for sync/atomic to protect state that involves more than one variable. Atomics give you per-operation guarantees. They do not give you transactional guarantees across multiple variables.

package main

import "sync/atomic"

// dangerous: each operation is atomic, but the pair (count, sum) is not
var count int64
var sum int64

func main() {
    value := int64(10) // illustrative

    go func() {
        atomic.AddInt64(&count, 1)
        atomic.AddInt64(&sum, value)
    }()

    go func() {
        c := atomic.LoadInt64(&count)
        s := atomic.LoadInt64(&sum)
        if c > 0 {
            _ = s / c // s and c might not correspond to the same state
        }
    }()
}

Even though each individual load and store is atomic, there is no guarantee that c and s were read from a consistent snapshot. Between the two loads, another goroutine could have updated sum but not yet count, or vice versa. For this kind of compound state, use a mutex.

The Practical Mental Model

Here is a simple way to think about it when reviewing concurrent code. Ask two questions for every shared variable.

First: is there a write to this variable that could happen concurrently with a read or another write? If yes, you have a potential race. Then ask: is there a synchronization operation between the write and the read that creates a happens-before edge? If the answer is no, the code is wrong regardless of how many times it has worked in testing.

The race detector catches most of these at runtime, but only if the racy code path is actually exercised during the test run. The memory model is the tool for reasoning about whether a path is safe in the first place.
