As backend engineers, we often treat "concurrency" as a black box. We type go func() or spawn() and expect magic. But understanding how the runtime schedules these tasks is what separates a Senior Engineer from an Architect.
This article dives into Go's GMP Scheduler, explains the "secret agent" sysmon thread, and contrasts this model with Java’s Shared Memory headaches and Erlang’s Gold Standard isolation.
Go’s GMP Architecture: The "Muxing" Strategy
In traditional threading (like pre-Loom Java), 1 thread = 1 OS thread. This is heavy (roughly 1MB of stack per thread), so you cannot spawn 100,000 of them without exhausting memory.
Go solves this with the GMP Model, which multiplexes millions of Goroutines (user-space threads) onto a small number of Kernel Threads.
The Components
- G (Goroutine): This is your code. It is lightweight (starts at 2KB stack). It contains the instruction pointer (PC) and stack. It wants to run.
- M (Machine): This is the OS Thread. It is expensive. The OS kernel manages this. It is the actual "worker" that executes CPU instructions.
- P (Processor): This is a purely logical resource (a "token"). It represents the context required to run Go code (local run queue, memory cache).
  - The Rule: An M must hold a P to execute a G.
  - P = Logical Cores: By default, the number of Ps is set to GOMAXPROCS (usually your CPU core count). This caps parallelism (simultaneous execution) while allowing practically unbounded concurrency (managing overlapping tasks).
The Lifecycle: When is a G vs. M Created?
- When is a G created? Whenever you call go func(). It is created in User Space inside the Go runtime. It is cheap (~2KB) and goes into the Local Run Queue of the current P.
- When is an M created? The runtime tries to keep the M count low, but it spawns a new M (OS thread) when:
  - A Goroutine makes a Blocking System Call (like CGO or complex file I/O) that cannot be handled asynchronously by the netpoller.
  - The current M gets "stuck" inside the OS kernel.
  - The runtime sees Ps with runnable Gs but no idle M to serve them.
  Creating an M is expensive (~1-2MB plus kernel bookkeeping).
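You can watch the M count with the runtime's built-in `threadcreate` profile. The sketch below is an assumption-laden demo, not a benchmark: whether extra Ms actually appear depends on your OS, disk speed, and Go version, so no particular output is promised.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
)

// ThreadCount reports how many OS threads (Ms) the runtime has created so far.
func ThreadCount() int {
	return pprof.Lookup("threadcreate").Count()
}

func main() {
	before := ThreadCount()

	// Blocking system calls (file I/O here) can pin an M inside the kernel,
	// so the runtime may spawn extra Ms to keep the Ps busy.
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			f, err := os.CreateTemp("", "m-demo-*")
			if err != nil {
				return
			}
			defer os.Remove(f.Name())
			defer f.Close()
			_, _ = f.Write(make([]byte, 1<<20)) // a 1MB write: a real syscall
		}()
	}
	wg.Wait()

	fmt.Printf("Ms before: %d, after: %d (GOMAXPROCS=%d)\n",
		before, ThreadCount(), runtime.GOMAXPROCS(0))
}
```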
The Watcher: sysmon and the SIGURG Signal
This is the most misunderstood part of Go. How does the scheduler stop a Goroutine that has been running for too long (e.g., a for {} loop)?
Enter sysmon (System Monitor).
What is sysmon?
It is a special runtime thread that breaks the GMP rules:
- It runs without a P (no Processor token needed).
- It runs on a dedicated M.
- It wakes up periodically (20µs – 10ms).
The Mechanism: Asynchronous Preemption via SIGURG
Since Go 1.14, the runtime uses OS signals to implement asynchronous preemption, forcing long-running Goroutines to yield so that scheduling stays fair.
- The Trigger: sysmon scans all Ps. It sees that Goroutine A has been running on Processor 1 for more than 10ms.
- The Signal: sysmon sends a SIGURG (urgent signal) to the thread (M) running that Goroutine.
- The Interruption: The OS interrupts the M. Go's signal handler injects a call to asyncPreempt into the Goroutine's stack.
- The Yield: The Goroutine pauses, is moved to the Global Run Queue, and the P picks a new G to run.

Why SIGURG?
- Out-of-Band: It is designed for "urgent socket data," which modern apps rarely use, so it doesn't conflict with user signals.
- Non-Destructive: Unlike SIGINT (Ctrl+C), it doesn't kill the process.
- Libc Safe: It doesn't interfere with C libraries mixed into Go (CGO).
Java: The Failure of "Shared Memory"
The Model: "Communicate by Sharing Memory." All threads share the Same Heap. To pass data, they modify the same object.
The Failure Mode
Look at the code below:

```java
// Java: explicit locking (the bottleneck)
class Counter {
    private int count = 0;

    // "synchronized" makes contending threads block on the monitor;
    // under contention that means OS-level parking (a context switch)
    public synchronized void increment() {
        count++;
    }
}
```
- Race Conditions: If you forget synchronized, two threads write at once, and data is corrupted.
- Performance: Contended locks require OS intervention to park and wake threads. This is slow (thousands of cycles).
- Deadlocks: Thread A holds Lock 1 waiting for Lock 2. Thread B holds Lock 2 waiting for Lock 1. The app freezes.
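For contrast, here is one way the same counter can be written in Go's CSP style: a single goroutine owns the state, and callers communicate over a channel instead of locking shared memory. This is an illustrative sketch (the `Counter`/`ops` design is this article's invention, not a standard library pattern), and in practice a plain `sync/atomic` counter would be faster.

```go
package main

import "fmt"

// Counter owns its state inside one goroutine; callers send operations
// over a channel ("share memory by communicating").
type Counter struct {
	ops chan func(*int)
}

func NewCounter() *Counter {
	c := &Counter{ops: make(chan func(*int))}
	go func() {
		count := 0
		for op := range c.ops {
			op(&count) // only this goroutine ever touches count: no races
		}
	}()
	return c
}

func (c *Counter) Increment() {
	c.ops <- func(n *int) { *n++ }
}

func (c *Counter) Value() int {
	result := make(chan int)
	c.ops <- func(n *int) { result <- *n }
	return <-result
}

func main() {
	c := NewCounter()
	done := make(chan struct{})
	for i := 0; i < 100; i++ {
		go func() { c.Increment(); done <- struct{}{} }()
	}
	for i := 0; i < 100; i++ {
		<-done
	}
	fmt.Println(c.Value()) // 100: no locks, no lost updates
}
```

Forgetting `synchronized` has no equivalent here: there is no shared field to write to unsafely.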
Erlang: The Gold Standard (Per-Process Heaps)
I have worked with Erlang for over 3 years, and I am convinced it is one of the best languages for concurrency out of the box. It does suffer where there is heavy number crunching and tight loops. Many library functions are written in C and exposed to Erlang through NIFs, but with a caveat: unsafe NIFs can block schedulers and break Erlang's isolation guarantees.
The Model: "Share Memory by Communicating." Every process has its Own Private Heap.
Why Erlang is "Better" (A Bank Example)
In Go, stacks are isolated, but heap pressure and GC are global, so runaway allocations can still impact tail latency. In Erlang, each process's heap is fully isolated.
Look at the code snippet below.

```erlang
-module(bank_server).
-behaviour(gen_server).
%% ... exports ...

%% 1. The safe bank process
init([]) -> {ok, 100}. %% Balance is $100

%% 2. The dangerous crash process
trigger_crash() ->
    spawn(fun() ->
        %% A. This allocates ~1GB on a PRIVATE heap
        _CrashList = lists:seq(1, 100000000),
        %% B. Crashes immediately (badarith)
        1 / 0
    end).
```
The Sequence:
- Allocation: The spawned process allocates ~1GB. In Java or Go, this would land on the shared heap and drive up garbage collection (GC) pressure for everyone, potentially triggering "Stop-The-World" pauses.
- The Crash: The process dies (a badarith from the divide by zero).
- The Cleanup: The Erlang VM simply frees that private heap in one step.
  - Zero GC Cost: No need to scan memory.
  - Zero Impact: The bank_server (holding the $100) continues running with microsecond latency. It didn't even feel the crash.
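Go has no equivalent for free: an unrecovered panic in any goroutine kills the entire process. The sketch below hand-rolls a tiny piece of what Erlang's supervision gives you automatically; `Supervise` is a hypothetical helper invented for this article.

```go
package main

import "fmt"

// Supervise runs f in its own goroutine and converts a panic into an error,
// a hand-rolled sliver of the isolation Erlang processes get for free.
func Supervise(f func()) error {
	done := make(chan error, 1)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- fmt.Errorf("worker crashed: %v", r)
				return
			}
			done <- nil
		}()
		f()
	}()
	return <-done
}

func main() {
	err := Supervise(func() {
		var xs []int
		_ = xs[5] // out of range: panics at runtime
	})
	fmt.Println(err)
	fmt.Println("still serving requests") // the "bank" survived the crash
}
```

Note what this does NOT buy you: the crashed goroutine's allocations stay on the shared heap until the global GC reclaims them, unlike Erlang's instant per-process heap teardown.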
Final takeaway:
- Java's shared-memory model places a heavier correctness burden on engineers, making large-scale concurrency harder to reason about.
- Erlang is the reliability king because Private Heaps prevent "noisy neighbors" from killing the system.
- Go is the pragmatic middle ground: It uses Shared Heaps for raw speed (no copying data) but uses CSP (Channels) to avoid the complexity of locks.