How to Implement Concurrency in Go 1.24 with Goroutines and Channels for 100k+ Concurrent Users
Go’s built-in concurrency primitives, goroutines and channels, have long been the gold standard for building high-throughput, concurrent systems. With the runtime performance improvements that shipped in Go 1.24, scaling to 100k+ concurrent users is more practical than ever. This guide walks through practical implementation steps, best practices, and pitfalls to avoid when building high-concurrency Go applications.
Prerequisites
- Go 1.24 or later installed locally
- Basic understanding of Go syntax and functions
- Familiarity with HTTP server basics (we’ll use net/http for examples)
Understanding Goroutines and Channels
Goroutines are lightweight, user-space threads managed by the Go runtime. Each starts with a stack of roughly 2KB that the runtime grows and shrinks on demand (the maximum stack size can be capped via runtime/debug.SetMaxStack), making it feasible to spawn 100k+ goroutines without exhausting system resources. Channels are typed conduits for goroutines to communicate and synchronize, following the Go proverb: "Do not communicate by sharing memory; instead, share memory by communicating." The sketch below makes the cheapness of goroutines concrete.
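A minimal sketch (my addition, not from the original article) that spawns 100,000 goroutines and waits for all of them; on typical hardware it completes in a fraction of a second:

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 100_000; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done() // Each goroutine does nothing but signal completion
        }()
    }
    wg.Wait()
    fmt.Println("spawned and joined 100,000 goroutines")
}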
Goroutine Basics
Spawning a goroutine is as simple as prefixing a function call with go:
package main

import (
    "fmt"
    "time"
)

func printGreeting(name string) {
    fmt.Printf("Hello, %s!\n", name)
}

func main() {
    go printGreeting("World") // Runs concurrently with main
    // Sleep briefly so the goroutine can run before main exits
    time.Sleep(100 * time.Millisecond)
}
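The sleep is a quick hack for demos: if main returns first, the program exits and the goroutine never runs. The robust way to wait (my addition, not in the original snippet) is sync.WaitGroup:

package main

import (
    "fmt"
    "sync"
)

func printGreeting(wg *sync.WaitGroup, name string) {
    defer wg.Done() // Mark this goroutine finished, even on panic
    fmt.Printf("Hello, %s!\n", name)
}

func main() {
    var wg sync.WaitGroup
    wg.Add(1)
    go printGreeting(&wg, "World")
    wg.Wait() // Block until every Add has a matching Done
}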
For 100k+ concurrent users, you’ll spawn one or more goroutines per incoming request, but unregulated goroutine creation can lead to resource exhaustion. We’ll cover throttling later.
Channel Basics
Channels allow goroutines to pass data. Unbuffered channels block until both sender and receiver are ready; buffered channels hold a fixed number of values before blocking.
package main

import "fmt"

func main() {
    // Unbuffered channel: a send blocks until a receiver is ready
    ch := make(chan int)
    go func() { ch <- 42 }() // Send from another goroutine
    val := <-ch // Receive; blocks until the value arrives
    fmt.Println(val)
    // Buffered channel (capacity 10): sends don't block while there is space
    bufferedCh := make(chan string, 10)
    bufferedCh <- "first"
    fmt.Println(<-bufferedCh)
}
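Later examples also use select, which waits on several channel operations at once and proceeds with whichever is ready first. A minimal sketch pairing a receive with a timeout:

package main

import (
    "fmt"
    "time"
)

func main() {
    ch := make(chan string)
    go func() {
        time.Sleep(5 * time.Millisecond)
        ch <- "done"
    }()
    select { // Whichever case is ready first wins
    case msg := <-ch:
        fmt.Println(msg)
    case <-time.After(50 * time.Millisecond):
        fmt.Println("timed out")
    }
}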
Go 1.24 Optimizations for High Concurrency
Go 1.24 ships several runtime performance improvements relevant to high-concurrency workloads:
- Lower runtime overhead: the Go 1.24 release notes report a 2-3% average reduction in CPU overhead across a suite of representative benchmarks, driven by a new map implementation based on Swiss Tables, more efficient memory allocation of small objects, and a new runtime-internal mutex implementation.
- Less GC pressure: cheaper small-object allocation pays off when 100k+ in-flight requests each create short-lived values such as channels, buffers, and request-scoped structs.
None of this changes how you write concurrent code; the patterns below work on any recent Go version, just a little faster on 1.24. To quantify channel overhead on your own hardware, see the benchmark sketch that follows.
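A minimal micro-benchmark sketch (my addition, using only the standard testing package) that measures buffered-channel send/receive cost; save it as chan_bench_test.go and run go test -bench=. -benchmem:

package main

import "testing"

// One send plus one receive on a small buffered channel per iteration.
func BenchmarkBufferedChannel(b *testing.B) {
    ch := make(chan int, 16)
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        ch <- i
        <-ch
    }
}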
Building a 100k+ Concurrent User HTTP Server
We’ll build a sample HTTP server that handles high concurrency using goroutines for request handling and channels for rate limiting and result aggregation. First, a naive implementation (then we’ll optimize it):
Naive High-Concurrency Server
package main

import (
    "fmt"
    "log"
    "net/http"
    "time"
)

func handler(w http.ResponseWriter, r *http.Request) {
    // Simulate 10ms of I/O work (e.g., database query, API call)
    time.Sleep(10 * time.Millisecond)
    fmt.Fprintf(w, "Request processed: %s\n", r.URL.Path)
}

func main() {
    http.HandleFunc("/", handler)
    fmt.Println("Server starting on :8080")
    log.Fatal(http.ListenAndServe(":8080", nil))
}
Notice that this already uses goroutines: the net/http server spawns a goroutine per incoming connection by default, so concurrency is built in. For 100k+ concurrent users, this naive approach works until you hit resource limits: too many open file descriptors, unbounded concurrent I/O against downstream systems, or GC pressure from excessive allocations. One mitigation the snippet above omits is explicit server timeouts, sketched next.
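A minimal sketch (my addition; the timeout values are starting points to tune, not recommendations): an http.Server with explicit timeouts, which bound how long slow or stalled clients can pin a connection and its goroutine:

package main

import (
    "fmt"
    "log"
    "net/http"
    "time"
)

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, "Request processed: %s\n", r.URL.Path)
    })
    srv := &http.Server{
        Addr:         ":8080",
        ReadTimeout:  5 * time.Second,  // Max time to read the full request
        WriteTimeout: 10 * time.Second, // Max time to write the response
        IdleTimeout:  60 * time.Second, // Max keep-alive idle time
    }
    log.Fatal(srv.ListenAndServe())
}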
Optimization 1: Rate Limiting with Buffered Channels
Unregulated concurrency leads to resource exhaustion. Use a buffered channel as a semaphore to bound how many requests are processed at once:
package main

import (
    "fmt"
    "log"
    "net/http"
    "time"
)

// Semaphore channel bounding concurrent request processing to 1000
var sem = make(chan struct{}, 1000)

func rateLimitedHandler(w http.ResponseWriter, r *http.Request) {
    sem <- struct{}{}        // Acquire a slot (blocks while 1000 are active)
    defer func() { <-sem }() // Release the slot, even if the handler panics

    // Simulate I/O work
    time.Sleep(10 * time.Millisecond)
    fmt.Fprintf(w, "Request processed: %s\n", r.URL.Path)
}

func main() {
    http.HandleFunc("/", rateLimitedHandler)
    fmt.Println("Server starting on :8080")
    log.Fatal(http.ListenAndServe(":8080", nil))
}
Adjust the semaphore capacity to your CPU and memory budget: 1000 in-flight requests is a reasonable starting point for most 4-8 core machines. Note the arithmetic that connects this to 100k+ users: each request holds a slot for only ~10ms, so 1000 slots sustain roughly 1000 × (1s / 10ms) = 100,000 requests per second. Serving 100k concurrent users does not require 100k simultaneously executing handlers.
Optimization 2: Using Worker Pools for CPU-Heavy Work
If your request handler does CPU-heavy work (not just I/O), letting every request compute at once oversubscribes your cores and inflates latency. Use a worker pool pattern with channels to bound CPU parallelism and reuse a fixed set of goroutines:
package main

import (
    "fmt"
    "log"
    "net/http"
    "time"
)

type job struct {
    w    http.ResponseWriter
    r    *http.Request
    done chan struct{} // Closed by the worker when the response is written
}

const numWorkers = 500 // Tune based on CPU cores

var jobCh = make(chan job, 10000) // Buffered job queue

// Worker goroutines process jobs from the queue
func worker() {
    for j := range jobCh {
        // Simulate CPU-heavy work (e.g., JSON serialization, computation)
        time.Sleep(5 * time.Millisecond)
        fmt.Fprintf(j.w, "Request processed: %s\n", j.r.URL.Path)
        close(j.done) // Signal the waiting handler
    }
}

func workerPoolHandler(w http.ResponseWriter, r *http.Request) {
    done := make(chan struct{})
    jobCh <- done2job(w, r, done)
    // Block until a worker finishes: the ResponseWriter is only valid
    // until this handler returns, so we must not return early.
    <-done
}

func done2job(w http.ResponseWriter, r *http.Request, done chan struct{}) job {
    return job{w: w, r: r, done: done}
}

func main() {
    // Start the worker goroutines
    for i := 0; i < numWorkers; i++ {
        go worker()
    }
    http.HandleFunc("/", workerPoolHandler)
    fmt.Println("Server starting on :8080")
    log.Fatal(http.ListenAndServe(":8080", nil))
}
Worker pools reduce goroutine churn and bound CPU parallelism, making them a good fit for CPU-bound and mixed workloads at 100k+ concurrent users. One way to size the pool is sketched below.
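The constant 500 above is a placeholder. A common alternative (an assumption to tune under load, not a rule from the article) is to derive the pool size from the machine's CPU count:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    // Pure CPU-bound work: one worker per core avoids oversubscription.
    // Mixed I/O + CPU: a small multiplier keeps cores busy while some
    // workers wait on I/O. The multiplier of 4 is a guess to tune.
    numWorkers := runtime.NumCPU() * 4
    fmt.Println("starting", numWorkers, "workers")
}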
Optimization 3: Minimize Channel and Goroutine Leaks
Leaked goroutines (goroutines that never exit, often because they block forever on a channel nobody closes) accumulate under high concurrency until memory is exhausted. Follow these rules:
- Always use context.Context to cancel long-running goroutines when requests are cancelled (e.g., the client disconnects).
- Close channels only from the sender side, and only when no more values will be sent (see the sketch after the next example).
- Use defer to release resources (like semaphore slots) even if the handler panics.
Example with context cancellation:
func contextAwareHandler(w http.ResponseWriter, r *http.Request) {
    ctx := r.Context() // Cancelled when the client disconnects

    // Acquire the semaphore, but give up if the client is already gone;
    // a bare `sem <- struct{}{}` would block even for cancelled requests
    select {
    case sem <- struct{}{}:
    case <-ctx.Done():
        return
    }
    defer func() { <-sem }()

    select {
    case <-time.After(10 * time.Millisecond): // Simulate I/O
        fmt.Fprintf(w, "Request processed: %s\n", r.URL.Path)
    case <-ctx.Done(): // Client disconnected mid-request
        fmt.Println("Request cancelled by client")
    }
}
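The sender-side close rule deserves a concrete sketch (my addition). Workers range over a channel and exit cleanly once the sender closes it and the buffer drains:

package main

import (
    "fmt"
    "sync"
)

func main() {
    jobs := make(chan int, 8)
    var wg sync.WaitGroup
    for w := 0; w < 3; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := range jobs { // Loop ends when jobs is closed and drained
                fmt.Println("processed job", j)
            }
        }()
    }
    for i := 0; i < 5; i++ {
        jobs <- i
    }
    close(jobs) // Only the sender closes: no more values will be sent
    wg.Wait()
}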
Testing for 100k+ Concurrent Users
Use tools like wrk or hey to load test your server. For example, to drive heavy load with wrk (opening anywhere near 100k connections from one machine usually requires raising file-descriptor limits, e.g. via ulimit -n, on both client and server):
wrk -t12 -c100000 -d30s http://localhost:8080
Monitor metrics like goroutine count (runtime.NumGoroutine()), GC pause times (via runtime.ReadMemStats), and request latency to tune your configuration. A minimal in-process monitoring sketch follows.
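This sketch (my addition; the PauseNs indexing follows the runtime.MemStats documentation) logs the goroutine count and the most recent GC pause once per second while a load test runs:

package main

import (
    "log"
    "runtime"
    "time"
)

// monitor logs concurrency stats once per second; start it with `go monitor()`
// alongside your server.
func monitor() {
    var m runtime.MemStats
    for range time.Tick(1 * time.Second) {
        runtime.ReadMemStats(&m)
        log.Printf("goroutines=%d last_gc_pause=%v",
            runtime.NumGoroutine(),
            time.Duration(m.PauseNs[(m.NumGC+255)%256]))
    }
}

func main() {
    go monitor()
    select {} // Stand-in for the server's ListenAndServe call
}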
Best Practices for 100k+ Concurrent Users
- Prefer buffered channels for high-throughput workloads to reduce blocking.
- Avoid sharing variables between goroutines without synchronization (use channels instead).
- Leave GOMAXPROCS at its default (the number of available CPUs, which has been the default since Go 1.5) unless profiling shows your workload benefits from changing it.
- Use the runtime/metrics package (available since Go 1.16) to collect detailed concurrency metrics in production; a minimal sketch follows this list.
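A minimal sketch of reading one metric with runtime/metrics; the name /sched/goroutines:goroutines comes from the package's documented metric set:

package main

import (
    "fmt"
    "runtime/metrics"
)

func main() {
    sample := []metrics.Sample{{Name: "/sched/goroutines:goroutines"}}
    metrics.Read(sample)
    fmt.Println("live goroutines:", sample[0].Value.Uint64())
}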
Conclusion
Goroutines and channels make it straightforward to build Go applications that handle 100k+ concurrent users, and Go 1.24's runtime improvements shave overhead off the hot paths. By combining rate limiting, worker pools, and leak prevention patterns, you can build reliable, high-throughput systems that scale with your user base. Start with the naive implementation, load test iteratively, and tune based on your specific workload.