DEV Community

James Lee
James Lee

Posted on

Go Garbage Collection: Tri-Color Mark & Sweep, Write Barriers & STW Optimization

Garbage collection is one of Go's most misunderstood subsystems. It's often blamed for latency spikes, but Go's GC has evolved dramatically — from a full stop-the-world pause in Go 1.3 to a highly concurrent, low-latency collector today. Let's trace that evolution and understand exactly how it works.


1. GC Algorithm Landscape

Before diving into Go's implementation, here's a quick map of common GC strategies:

Algorithm How it works Pros Cons
Reference Counting Each object tracks how many references point to it; free when count = 0 Immediate reclaim, no STW Cannot handle cyclic references
Mark & Sweep Trace from roots; mark reachable objects; sweep unmarked Handles cycles Requires STW during mark phase
Generational Split objects by lifetime (young/old gen); collect young gen frequently High throughput Complex implementation

Go uses a concurrent tri-color mark & sweep — solving the STW problem of classic mark-sweep while avoiding the complexity of generational collection.


2. Classic Mark & Sweep (Go ≤ 1.3)

The original Go GC was a straightforward two-phase algorithm:

Phase 1: MARK
─────────────────────────────────────────
STW begins — all goroutines paused  🛑
Traverse from root objects
Mark all reachable objects
STW ends  ✅

Phase 2: SWEEP
─────────────────────────────────────────
Reclaim all unmarked (unreachable) objects
Enter fullscreen mode Exit fullscreen mode

Example:

Heap objects: 1, 2, 3, 4, 5, 6, 7
Root references: → 1 → 2 → 4

After mark:   [1✓] [2✓] [3 ] [4✓] [5 ] [6 ] [7 ]
After sweep:  [1✓] [2✓]  ✗   [4✓]  ✗    ✗    ✗
                               freed: 3, 5, 6, 7
Enter fullscreen mode Exit fullscreen mode

The problem: The entire program freezes during the mark phase. For a large heap, this pause could be hundreds of milliseconds — unacceptable for web services.


3. Tri-Color Concurrent Mark & Sweep (Go ≥ 1.5)

Go 1.5 introduced the tri-color marking algorithm, which allows the GC to run concurrently with the application. "Tri-color" is an abstraction — objects don't literally have colors, but they exist in one of three states:

Color Meaning
White Not yet visited — candidate for collection
🔘 Grey Discovered, queued for processing — children not yet scanned
Black Fully scanned — all children visited, will NOT be collected

The Algorithm

Step 1: STW (very brief) — enable write barrier, snapshot roots
─────────────────────────────────────────────────────────────────
All objects start WHITE.
Root objects → turn GREY (added to mark queue).
STW ends. GC and application run concurrently.

Step 2: Process grey objects
─────────────────────────────────────────────────────────────────
For each GREY object:
    → turn all its WHITE children GREY
    → turn itself BLACK

Step 3: Repeat until no grey objects remain
─────────────────────────────────────────────────────────────────
All reachable objects are now BLACK.
All unreachable objects remain WHITE.

Step 4: STW (very brief) — disable write barrier, finalize
─────────────────────────────────────────────────────────────────

Step 5: Sweep — reclaim all WHITE objects
─────────────────────────────────────────────────────────────────

Step 6: Reset — turn all BLACK objects back to WHITE for next cycle
Enter fullscreen mode Exit fullscreen mode

Visual Walkthrough

Initial state: All objects white. Reference graph:

root → A → B
       A → C
       A ↔ D   (mutual reference)
root → F
E          (unreachable)
G → H      (unreachable)
Enter fullscreen mode Exit fullscreen mode
All white:
[ A ][ B ][ C ][ D ][ E ][ F ][ G ][ H ]
Enter fullscreen mode Exit fullscreen mode

Step 1 — Root scan: A and F turn grey

[🔘A][ B ][ C ][ D ][ E ][🔘F][ G ][ H ]
Enter fullscreen mode Exit fullscreen mode

Step 2 — Process grey A: children B, C, D turn grey; A turns black. F has no children: turns black.

[⬛A][🔘B][🔘C][🔘D][ E ][⬛F][ G ][ H ]
Enter fullscreen mode Exit fullscreen mode

Step 3 — Process grey B, C, D: no white children; all turn black.

[⬛A][⬛B][⬛C][⬛D][ E ][⬛F][ G ][ H ]
Enter fullscreen mode Exit fullscreen mode

Step 4 — No grey objects remain. White objects = garbage.

Unreachable (white): E, G, H  → collected ♻️
Reachable  (black): A, B, C, D, F  → retained ✅
Enter fullscreen mode Exit fullscreen mode

Step 5 — Reset: all black → white for next GC cycle.


4. The STW Problem & Write Barriers

Why STW is still needed (briefly)

Even with concurrent marking, there's a race condition: while the GC is marking, the application can:

  • Create new objects (new white objects that should survive)
  • Modify pointers (a black object gains a reference to a white object — which would be incorrectly collected)

This is the tri-color invariant violation: a black object must never directly reference a white object without the GC knowing.

Write Barrier: The Solution

A write barrier is a small hook injected by the compiler around every pointer write. When enabled (during GC's mark phase), it intercepts pointer assignments and ensures the invariant is maintained.

Without write barrier:
─────────────────────────────────────────
Black object A gets a new pointer to White object X
GC never sees X → X gets collected → dangling pointer  💥

With write barrier:
─────────────────────────────────────────
Black object A gets a new pointer to White object X
Write barrier fires → X is immediately greyed
GC will scan X → X survives  ✅
Enter fullscreen mode Exit fullscreen mode

Write barrier lifecycle:

GC start (STW) → write barrier ON
    ↓
Concurrent mark phase (app + GC run together)
All pointer writes are intercepted
    ↓
Mark complete (STW) → write barrier OFF
    ↓
Sweep phase (concurrent, no barrier needed)
Enter fullscreen mode Exit fullscreen mode

New allocations during GC are immediately marked black — they are not subject to the write barrier, and will not be collected in the current cycle.


5. Mutator Assist: Keeping Allocation in Check

If the application allocates memory faster than the GC can collect it, the heap will grow unboundedly. Go's solution: Mutator Assist.

Goroutine wants to allocate memory during GC
    ↓
Runtime checks: is GC falling behind?
    ├── NO  → allocate normally
    └── YES → goroutine must do GC work proportional
              to the amount it wants to allocate
              (mark some objects before allocating)
Enter fullscreen mode Exit fullscreen mode

This creates back-pressure: heavy allocators are slowed down, giving the GC time to catch up. It's a self-regulating mechanism that prevents heap explosion without a hard pause.


6. GC Trigger Conditions

Go's GC can be triggered three ways:

Trigger Condition Notes
Heap growth current heap ≥ last_GC_heap × (1 + GOGC/100) Default GOGC=100 → GC when heap doubles
Time-based No GC for > 2 minutes Ensures GC runs even in low-allocation programs
Manual runtime.GC() called explicitly For benchmarks and profiling only

GOGC Tuning

GOGC=100   # default: GC when heap doubles (balanced)
GOGC=200   # GC less frequently → higher throughput, larger heap
GOGC=50    # GC more frequently → lower latency, smaller heap
GOGC=off   # disable GC entirely (dangerous — only for short-lived tools)
Enter fullscreen mode Exit fullscreen mode

Formula:

GC threshold = heap_size_after_last_GC × (1 + GOGC/100)

Example (GOGC=100):
Last GC left 100MB live → next GC triggers at 200MB
Enter fullscreen mode Exit fullscreen mode

7. Go GC Evolution Timeline

Go 1.3  ── Full STW mark + sweep
            Pause: hundreds of ms for large heaps

Go 1.5  ── Tri-color concurrent mark & sweep
            STW reduced to ~10ms

Go 1.6  ── Better write barrier, STW ~5ms

Go 1.8  ── Hybrid write barrier (Dijkstra + Yuasa)
            STW reduced to ~100µs

Go 1.14 ── Async preemption → goroutines can be
            stopped anywhere for STW, not just at
            safe points → shorter, more predictable pauses

Go 1.18 ── Soft memory limit (GOMEMLIMIT)
            More predictable GC under memory pressure
Enter fullscreen mode Exit fullscreen mode

8. Summary

┌──────────────────────────────────────────────────────────────┐
│              Go GC: The Full Picture                         │
│                                                              │
│  Trigger: heap doubles / 2min timeout / runtime.GC()        │
│      ↓                                                       │
│  STW #1 (µs): enable write barrier, grey root objects       │
│      ↓                                                       │
│  Concurrent mark: GC + app run in parallel                  │
│  Write barrier protects tri-color invariant                 │
│  Mutator Assist throttles fast allocators                   │
│      ↓                                                       │
│  STW #2 (µs): disable write barrier, finalize mark          │
│      ↓                                                       │
│  Concurrent sweep: reclaim white objects                    │
│      ↓                                                       │
│  Reset: all black → white for next cycle                    │
└──────────────────────────────────────────────────────────────┘
Enter fullscreen mode Exit fullscreen mode
Component Role
Tri-color marking Concurrent reachability tracing without full STW
Write barrier Maintains tri-color invariant during concurrent mark
Mutator Assist Back-pressure on fast allocators to prevent heap explosion
GOGC Controls heap growth rate before triggering GC
STW Two brief pauses (µs-level) to enable/disable write barrier

Next: Java vs Go GC: Generational vs Tri-Color, G1 vs Concurrent Mark & Sweep (Bonus)


Follow the series for more deep dives into Go's runtime internals.

Top comments (0)