Threads in Languages like C++ and Java
In these languages, threads are the unit of concurrency, and they are expensive: they burn CPU time on context switching and reserve a relatively large amount of memory at creation. A single thread stack takes roughly 1 MB, so if you were to spawn 100,000 threads, you would need about 100 GB of RAM, which is not economically feasible for most software projects. To keep many threads running concurrently, the OS scheduler uses timeslicing, giving each thread a fair share of CPU cycles. But to do this, the CPU has to perform context switches, and each switch is costly: the current thread's state is saved into its TCB (Thread Control Block), the next thread's TCB is loaded, and the switch destroys cache locality, causing frequent L1/L2 cache misses.
As a result, when you have thousands of threads, your CPU spends more time switching context than actually executing code.
How do goroutines optimize this?
Goroutines are "lightweight threads" managed entirely in User Space by the Go Runtime, rather than the OS Kernel.
The first massive optimization is memory. While a standard OS thread reserves a fixed 1 MB stack, a Goroutine initializes with a stack of just 2 KB.
- The Math: 2 KB is roughly 0.2% of 1 MB.
- The Impact: Instead of capping out at thousands of threads, you can easily spawn millions of Goroutines on a standard laptop without running out of RAM (see the sketch below).
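To get a feel for this, here is a minimal sketch (the count of 1,000,000 is just an illustration; actual memory use varies by machine and Go version) that spawns a million goroutines and waits for them all:

package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	const n = 1_000_000 // try doing this with OS threads

	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			// Each goroutine starts with only ~2 KB of stack,
			// so a million of them is still a modest amount of RAM.
			defer wg.Done()
		}()
	}

	// A rough indication of how many are alive at once; many will
	// already have finished by the time we look.
	fmt.Println("goroutines right now:", runtime.NumGoroutine())
	wg.Wait()
	fmt.Println("spawned and finished", n, "goroutines")
}

On a typical laptop this finishes in a few seconds; the same experiment with OS threads would exhaust memory (or hit OS thread limits) long before reaching a million.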
The "Infinite" Stack
Unlike OS threads, which typically have a fixed stack size (e.g., 1 MB) determined at creation, Goroutines are dynamic. They start at 2 KB and grow automatically as needed.
If a Goroutine runs out of space, the Go runtime allocates a larger segment of memory (usually double) and moves the stack there.
- OS Thread Limit: Fixed (~1-8 MB). Hitting this causes a crash.
- Goroutine Limit: Dynamic (up to 1 GB on 64-bit systems).
This means for all practical purposes, Goroutine recursion depth is limited only by available memory, while OS threads are limited by their initial reservation.
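A small sketch makes the difference concrete (the recursion depth and the per-frame padding here are arbitrary illustrative numbers): each call adds a frame of a bit over a hundred bytes, so hundreds of thousands of frames add up to tens of megabytes of stack, far beyond a fixed 1 MB thread stack but well within a goroutine's 1 GB ceiling.

package main

import "fmt"

// depth recurses n times; the local buffer pads each frame so the stack
// has to grow (and be copied) several times on the way down.
func depth(n int) int {
	var pad [128]byte
	pad[0] = byte(n) // keep pad alive so the frame really occupies space
	if n == 0 {
		return int(pad[0])
	}
	return depth(n-1) + 1
}

func main() {
	// Would overflow a fixed thread stack; on a goroutine it just works.
	fmt.Println(depth(500_000))
}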
Faster Context Switches
Just like OS threads, Goroutines need to save their state when paused so they can resume later.
However, while an OS thread switch requires saving all CPU registers (including heavy floating-point registers) and trapping into Kernel Mode, a Goroutine switch is much cheaper.
- OS Thread Switch: ~1-2 microseconds. Saves huge state (AVX/SSE registers) to the TCB.
- Goroutine Switch: ~200 nanoseconds (~10x faster). Saves only 3 registers (PC, SP, DX) to a simple Go struct called g.
Because this happens entirely in User Space, the CPU stays hot, caches stay valid, and the overhead is negligible.
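A rough way to feel this cost (only a sketch; the numbers depend heavily on your machine, and a channel hand-off measures scheduler switches plus channel overhead rather than a bare context switch) is to bounce a value between two goroutines over unbuffered channels and time the round trips:

package main

import (
	"fmt"
	"time"
)

func main() {
	const rounds = 1_000_000
	ping := make(chan int) // unbuffered: every send blocks until the other side receives
	pong := make(chan int)

	// Echo goroutine: receive on ping, reply on pong.
	go func() {
		for v := range ping {
			pong <- v
		}
	}()

	start := time.Now()
	for i := 0; i < rounds; i++ {
		ping <- i // parks this goroutine, wakes the other
		<-pong    // and back again: two goroutine switches per round trip
	}
	elapsed := time.Since(start)
	close(ping)

	fmt.Printf("%d round trips in %v (~%v per switch)\n",
		rounds, elapsed, elapsed/(2*rounds))
}

Expect something in the hundreds of nanoseconds per switch, roughly the ballpark quoted above, and all of it in user space.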
So how does a goroutine grow its stack dynamically?
To achieve this, the Go compiler adds a stack check to the function prologue.
During compilation, it inserts a few assembly instructions at the very start of almost every function.
- The Check: These instructions compare the current Stack Pointer (SP) against a limit called the Stack Guard.
- The Trigger: If there isn't enough space for the function to run, it triggers a runtime function called runtime.morestack.
- The Growth: The runtime allocates a new, larger stack segment (usually 2x the size).
- The Copy & Fix: It copies the user's data to the new stack. Crucially, it also adjusts all pointers to ensure they point to the new addresses.
Once this "surgery" is complete, the function resumes execution on the new, spacious stack.
package main

import "fmt"

func main() {
	fmt.Println("Hello Ayush")
}
Above is a minimal Go program.
Now, when we run the command
go build -gcflags -S main.go
You will see a listing like this for main.main:
main.main STEXT size=83 args=0x0 locals=0x40 funcid=0x0 align=0x0
0x0000 00000 (/Users/ayushanand/concurrency/main.go:7) TEXT main.main(SB), ABIInternal, $64-0
0x0000 00000 (/Users/ayushanand/concurrency/main.go:7) CMPQ SP, 16(R14)
0x0004 00004 (/Users/ayushanand/concurrency/main.go:7) PCDATA $0, $-2
0x0004 00004 (/Users/ayushanand/concurrency/main.go:7) JLS 76
0x0006 00006 (/Users/ayushanand/concurrency/main.go:7) PCDATA $0, $-1
0x0006 00006 (/Users/ayushanand/concurrency/main.go:7) PUSHQ BP
0x0007 00007 (/Users/ayushanand/concurrency/main.go:7) MOVQ SP, BP
0x000a 00010 (/Users/ayushanand/concurrency/main.go:7) SUBQ $56, SP
0x000e 00014 (/Users/ayushanand/concurrency/main.go:7) FUNCDATA $0, gclocals·g5+hNtRBP6YXNjfog7aZjQ==(SB)
0x000e 00014 (/Users/ayushanand/concurrency/main.go:7) FUNCDATA $1, gclocals·EVwPOTmEGNnKe4zqm0ZbFQ==(SB)
0x000e 00014 (/Users/ayushanand/concurrency/main.go:7) FUNCDATA $2, main.main.stkobj(SB)
0x000e 00014 (/Users/ayushanand/concurrency/main.go:8) LEAQ type:string(SB), DX
0x0015 00021 (/Users/ayushanand/concurrency/main.go:8) MOVQ DX, main..autotmp_8+40(SP)
0x001a 00026 (/Users/ayushanand/concurrency/main.go:8) LEAQ main..stmp_0(SB), DX
0x0021 00033 (/Users/ayushanand/concurrency/main.go:8) MOVQ DX, main..autotmp_8+48(SP)
0x0026 00038 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) MOVQ os.Stdout(SB), BX
0x002d 00045 (<unknown line number>) NOP
0x002d 00045 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) LEAQ go:itab.*os.File,io.Writer(SB), AX
0x0034 00052 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) LEAQ main..autotmp_8+40(SP), CX
0x0039 00057 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) MOVL $1, DI
0x003e 00062 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) MOVQ DI, SI
0x0041 00065 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) PCDATA $1, $0
0x0041 00065 (/usr/local/Cellar/go/1.25.4/libexec/src/fmt/print.go:314) CALL fmt.Fprintln(SB)
0x0046 00070 (/Users/ayushanand/concurrency/main.go:9) ADDQ $56, SP
0x004a 00074 (/Users/ayushanand/concurrency/main.go:9) POPQ BP
0x004b 00075 (/Users/ayushanand/concurrency/main.go:9) RET
0x004c 00076 (/Users/ayushanand/concurrency/main.go:9) NOP
0x004c 00076 (/Users/ayushanand/concurrency/main.go:7) PCDATA $1, $-1
0x004c 00076 (/Users/ayushanand/concurrency/main.go:7) PCDATA $0, $-2
0x004c 00076 (/Users/ayushanand/concurrency/main.go:7) CALL runtime.morestack_noctxt(SB)
0x0051 00081 (/Users/ayushanand/concurrency/main.go:7) PCDATA $0, $-1
0x0051 00081 (/Users/ayushanand/concurrency/main.go:7) JMP
The very first instruction, CMPQ SP, 16(R14), is exactly that prologue check: R14 holds a pointer to the current goroutine's g struct, and 16(R14) is its stackguard0 field. If the stack pointer is at or below the guard, JLS 76 jumps to the end of the function, where CALL runtime.morestack_noctxt(SB) grows the stack and restarts the function on the new one.
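You can also watch the copy-and-fix step happen. The sketch below (illustrative, not runtime code) prints the address of a local variable, forces the stack to grow by recursing with a roughly 1 KB frame, and prints the address again; the two values differ because the entire stack, including main's frame, was copied to a larger segment. The builtin println is used rather than fmt.Println so that taking &x does not make x escape to the heap.

package main

var sink byte

// grow recurses with a ~1 KB frame so a handful of calls is enough to
// outgrow the initial 2 KB stack and trigger runtime.morestack.
//
//go:noinline
func grow(n int) {
	var buf [1024]byte
	buf[0] = byte(n)
	if n > 0 {
		grow(n - 1)
	}
	sink = buf[0] // keep buf from being optimized away
}

func main() {
	var x int
	println("&x before growth:", &x)
	grow(64) // each growth copies the whole stack to a new location
	println("&x after growth: ", &x)
}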
Ending Notes:
Goroutines aren't just "threads but smaller." They are a fundamental rethink of how we manage concurrency. By moving the stack management from the OS Kernel to the Go Runtime, we gain:
- Massive Scalability: From 100k limit to millions.
- Dynamic Memory: Pay for what you use (2KB), not what you might use (1MB).
- Low Latency: Context switches that are 10x faster.
Next time you type go func(), remember: there is a tiny 2KB stack and a smart compiler working in the background to make it "infinite."


