Concurrency won’t solve your CPU-bound problems, but it can help with your IO problems.
Many people feel they can speed up their programs by adding some concurrency. This is only partly true: concurrency can also make your code slower than just running it serially.
What is concurrency to begin with? Concurrency is dealing with multiple things at a time. It does not run two tasks simultaneously (that is parallelism); it runs tasks out of order so that they appear to run at the same time.
Concurrency is mostly achieved using system-level threads that are spread across the available processors.
Concurrency with system-level threads carries a cost for switching context. Some languages try to avoid it (Node.js), some reduce it by using user-level threads (Go, Elixir), and some just accept it (Java*).
The cost of context switching is a fair price to pay for IO tasks, but not for CPU-intensive tasks.
Meet our example functions
We are going to separate our example functions into two categories:
- CPU functions (serial & concurrent)
- IO functions (serial & concurrent)
The CPU example functions call a deliberately inefficient Fibonacci function to simulate a CPU-intensive task. One function, cpuExampleSequential, runs the Fibonacci function in a loop serially, while cpuExampleConcurrent runs each call in its own goroutine.
The IO example functions use a struct that implements io.Reader. The reader pretends to make a long blocking call by sleeping for some time. One function calls the reader serially, while the other calls it concurrently.
Below is the implementation of the example functions.
```go
package main

import (
	"io"
	"sync"
	"time"
)

// fibo is a deliberately inefficient recursive Fibonacci,
// used here to simulate a CPU-intensive task.
func fibo(n int) int {
	if n < 2 {
		return 1
	}
	return fibo(n-2) + fibo(n-1)
}

// cpuExampleSequential runs the CPU-bound task serially.
func cpuExampleSequential() {
	for i := 0; i < 10; i++ {
		fibo(40)
	}
}

// cpuExampleConcurrent runs the same task in ten goroutines.
func cpuExampleConcurrent() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			fibo(40)
		}()
	}
	wg.Wait()
}

// sleepReader pretends to make a long blocking call by
// sleeping before returning its data.
type sleepReader struct {
	delay time.Duration
}

func (sr sleepReader) Read(b []byte) (int, error) {
	time.Sleep(sr.delay)
	n := copy(b, "hello world")
	return n, nil
}

func read(r io.Reader) {
	b := make([]byte, 10)
	r.Read(b)
}

// ioExampleSequential waits for each read to finish before starting the next.
func ioExampleSequential() {
	for i := 0; i < 10; i++ {
		read(sleepReader{time.Second * 3})
	}
}

// ioExampleConcurrent starts all ten reads at once.
func ioExampleConcurrent() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			read(sleepReader{time.Second * 3})
		}()
	}
	wg.Wait()
}
```
Now for some benchmark test functions
```go
package main

import (
	"runtime"
	"testing"
)

// Limit the benchmarks to a single processor, so we measure
// concurrency rather than parallelism.
var _ = runtime.GOMAXPROCS(1)

func Benchmark_cpuExampleSequential(b *testing.B) {
	for n := 0; n < b.N; n++ {
		cpuExampleSequential()
	}
}

func Benchmark_cpuExampleConcurrent(b *testing.B) {
	for n := 0; n < b.N; n++ {
		cpuExampleConcurrent()
	}
}

func Benchmark_ioExampleSequential(b *testing.B) {
	for n := 0; n < b.N; n++ {
		ioExampleSequential()
	}
}

func Benchmark_ioExampleConcurrent(b *testing.B) {
	for n := 0; n < b.N; n++ {
		ioExampleConcurrent()
	}
}
```
The results from the benchmark
The results show the following:
- There is little difference between the serial and concurrent execution of the CPU examples; the concurrent benchmark actually takes slightly longer.
- There is a 10x difference between the serial and concurrent execution of the IO examples: the concurrent call completes in approximately 3s, while the serial call takes 30s.
The Reason
Let’s start with the CPU examples. The reason is fairly simple: the only processor we have spends most of its time doing work. There is no blocking operation that could run in a background thread, so running the task concurrently won’t improve its performance; it can only slow it down through the cost of context switching. In Go’s case there is little or no OS-level context switching (though there is a tiny cost for switching goroutines). The processor is already busy, and we just gave it more bookkeeping to do.
The concurrent IO call is 10x faster because the read function releases the processor whenever it blocks on the reader, giving other goroutines time to run. The serial version, by contrast, spends most of its time waiting for one read to complete before starting the next. Since most of the work in the IO examples is blocking, the CPU is free to perform other tasks.
Conclusion
From the examples above we can say that concurrency does not improve the performance of CPU-intensive tasks, but it does improve the performance of IO-intensive tasks.
One way of speeding up a CPU-intensive task is to make use of multiple processors, i.e. to run the task in parallel.
Note: the algorithm must be written to support parallel execution. There are cases where adding more processors slows a system down, for example when every processor has to wait on a lock before it can perform its task.
There are still other ways of improving CPU-heavy tasks. Some are:
- Batching calls that perform similar tasks
- Caching the responses of expensive tasks
- Improving the algorithm
More on improving CPU performance here. For more on the cost of context switching, check here; see here for deeper knowledge.
I think this is the End✌🏾. Thanks for reading.