Problems this pattern can solve:
- If your service makes 1,000 queries per second to a database that can only handle 100, a worker pool caps the number of concurrent queries and protects the database from being overwhelmed.
- In the event of a sudden traffic spike, spawning go func() for each request could create hundreds of thousands of goroutines and consume all your memory. The pool limits concurrency.
- If you have a pool of socket connections or file descriptors, a worker pool ensures you don't exceed the OS limit.
The Essence:
We create a fixed number of goroutines (workers) that are started in advance and wait for tasks. The main goroutine (the dispatcher) puts tasks into a channel (the task queue). The workers concurrently take these tasks from the channel and execute them. They can send the results back through another channel.
The Idea: Limiting the number of concurrently executed operations and reusing goroutines.
Difference between Worker Pool and other approaches:
go func() (Spawning raw goroutines):
- Cons compared to Worker Pool: Uncontrolled growth in the number of goroutines can lead to resource exhaustion (memory, file descriptors) and a panic. There is no control over concurrency.
- Pros compared to Worker Pool: Simpler to write, lower startup latency (no need to wait for a free worker).
- When to use it: For handling signals, very lightweight tasks, or when the load is guaranteed to be low.
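For contrast, here is what the raw go func() approach looks like. A minimal sketch (the function name runAll is illustrative): one goroutine per task, simple and low-latency, but the goroutine count grows with the input and nothing bounds concurrency.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runAll spawns one goroutine per task: simple to write, but the
// number of goroutines grows with n, with no upper bound on concurrency.
func runAll(n int) int64 {
	var done int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&done, 1) // the "task" itself
		}()
	}
	wg.Wait()
	return done
}

func main() {
	fmt.Println(runAll(1000), "tasks done")
}
```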
Pipeline:
- How it's different: This is about sequential processing. Data flows through a chain of stages, where each stage is executed by its own goroutine (or pool), connected by channels.
- Example:
  stage1 (generate) -> stage2 (multiply) -> stage3 (save)
- Cons compared to Worker Pool: More difficult to cancel and handle errors; the throughput is limited by the slowest stage.
- Pros compared to Worker Pool: Ideal for tasks that can be broken down into distinct, independent processing steps.
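The three-stage example above can be sketched as follows. This is a minimal version: the generate and multiply stage names come from the example, while the "save" stage is simulated by printing.

```go
package main

import "fmt"

// generate emits the numbers 1..n on a channel (stage 1).
func generate(n int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for i := 1; i <= n; i++ {
			out <- i
		}
	}()
	return out
}

// multiply doubles each value it receives (stage 2).
func multiply(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for v := range in {
			out <- v * 2
		}
	}()
	return out
}

func main() {
	// Stage 3 (save) is simulated by printing the values.
	for v := range multiply(generate(3)) {
		fmt.Println(v)
	}
}
```

Each stage runs in its own goroutine; closing a stage's output channel propagates completion down the chain.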
Semaphore:
- How it's different: A semaphore is a synchronization primitive used to limit access to a resource. You still spawn a goroutine for each task, but before starting the "heavy" part, each goroutine acquires a slot from the semaphore.
- How this relates: A Worker Pool is often implemented using a semaphore, but a semaphore is a lower-level tool.
- If the task is heavy (e.g., an HTTP request, a complex calculation, disk I/O) and there aren't millions of them, use a Semaphore: the overhead of creating a goroutine is negligible compared to the task's execution time (e.g., parallel scraping of 50 websites).
- If the tasks are very light (e.g., parsing a string, a simple transformation) and the stream is effectively infinite, use a Worker Pool: otherwise the scheduler and GC will be burdened by the constant creation and destruction of millions of short-lived goroutines (e.g., real-time log processing, handling events from Kafka).
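A semaphore in Go is commonly built from a buffered channel. A minimal sketch (the function withSemaphore and its concurrency-tracking logic are illustrative, added here only to demonstrate that the limit holds):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// withSemaphore runs n tasks, each in its own goroutine, but a buffered
// channel caps how many execute at the same time. It returns the peak
// observed concurrency, which never exceeds limit.
func withSemaphore(n, limit int) int64 {
	sem := make(chan struct{}, limit)
	var cur, peak int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot (blocks if all are taken)
			defer func() { <-sem }() // release the slot when done
			c := atomic.AddInt64(&cur, 1)
			for { // record the highest concurrency seen
				p := atomic.LoadInt64(&peak)
				if c <= p || atomic.CompareAndSwapInt64(&peak, p, c) {
					break
				}
			}
			atomic.AddInt64(&cur, -1)
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak concurrency:", withSemaphore(100, 3))
}
```

Unlike a worker pool, every task still gets its own goroutine; only the "heavy" section between acquire and release is throttled.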
MapReduce:
- How it's different: A higher-level pattern for distributed computations. It involves a "Map" phase (parallelization) and a "Reduce" phase (aggregation). A Worker Pool is often used as an implementation for the "Map" phase.
- Cons compared to Worker Pool: Overkill for simple concurrent processing.
Example:
```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func worker(id int, wg *sync.WaitGroup, jobs <-chan int) {
	defer wg.Done()
	defer func() {
		if r := recover(); r != nil {
			fmt.Printf("Worker %d: panic: %v\n", id, r)
		}
	}()
	for job := range jobs {
		fmt.Printf("Worker %d started the task %d\n", id, job)
		time.Sleep(time.Second) // Simulating a task
		fmt.Printf("Worker %d completed the task %d\n", id, job)
	}
}

func main() {
	const numJobs = 10
	const numWorkers = 3

	jobs := make(chan int, numJobs)
	var wg sync.WaitGroup

	for w := 1; w <= numWorkers; w++ {
		wg.Add(1)
		go worker(w, &wg, jobs)
	}

	for j := 1; j <= numJobs; j++ {
		jobs <- j
	}
	close(jobs)

	wg.Wait()
	fmt.Println("All tasks completed")
}
```
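The essence section mentions that workers can send results back through a second channel. A minimal sketch of that variant (the function name sumOfSquares and the squaring "work" are illustrative, not part of the original example):

```go
package main

import (
	"fmt"
	"sync"
)

// worker squares each job and sends the result back on a second channel.
func worker(wg *sync.WaitGroup, jobs <-chan int, results chan<- int) {
	defer wg.Done()
	for job := range jobs {
		results <- job * job
	}
}

// sumOfSquares distributes the jobs 1..numJobs across numWorkers
// workers and aggregates their results.
func sumOfSquares(numJobs, numWorkers int) int {
	jobs := make(chan int, numJobs)
	results := make(chan int, numJobs)

	var wg sync.WaitGroup
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go worker(&wg, jobs, results)
	}

	for j := 1; j <= numJobs; j++ {
		jobs <- j
	}
	close(jobs)

	wg.Wait()      // all workers finished, so no more sends to results
	close(results) // only now is it safe to close the results channel

	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println("sum of squares:", sumOfSquares(5, 3)) // 1+4+9+16+25 = 55
}
```

Note the ordering: the results channel is closed only after wg.Wait() confirms every worker has exited, so no worker can send on a closed channel.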
