Ghanithan Subramani
How to squeeze every drop of CPU resource using Concurrent parallel programming in GO

Aren't concurrency and parallelism different? Can they coexist? First, let's understand what concurrency and parallelism are.

Concurrency is a way of handling multiple tasks at the same time. For example, you have two tasks at hand which need to be completed with the same priority. One task is to update the user manual of your application and another is to build the application for different platform and OS combinations, say windows/386, windows/amd64, linux/386, etc. Both need to be done with the same priority, but you are the only resource available (a single OS thread available for processing). You would start building the application for each OS/platform combination, and while the build is running, you would be idle. You would use that idle time to update the user manual. Both tasks are being handled at the same time. This is called concurrency.

Parallelism is a way of doing multiple tasks at the same time. Let us assume the same example where you need to update the user manual and build the application at the same time. But this time around, you have a colleague to support you with your tasks (2 OS threads are available for processing). You would assign the task of building the application for different OS/platform combinations to your colleague, and you would take up the task of updating the user manual. So, both tasks are being run in parallel with separate resources dedicated to them.

In the above example of parallelism, your colleague who is running the application builds might be idle while a build is running, and you might also feel exhausted updating the user manual at times. In such a scenario, the two of you might explore the possibility of sharing your work with each other and complete both tasks much earlier. Both tasks are being executed at the same time and each of them is being handled by both people. This is called concurrent parallelism.

Concurrency vs Parallelism vs Concurrent Parallelism

Concurrency in GO

GO offers concurrency out of the box. Just prefixing a function call with the keyword go makes that call a goroutine, a lightweight thread managed by the GO runtime, which is executed concurrently with the main goroutine.

go writeUserManual() // this would create a separate goroutine which runs concurrently

But now the goroutines need to communicate with each other to get some work done. The traditional multithreading approach is to have shared memory between the threads for communication, with each thread taking a lock on the memory before accessing it. This is done to avoid race conditions. But GO takes a different approach called CSP (Communicating Sequential Processes). Goroutines do not communicate by sharing memory; instead, they share memory by communicating. Goroutines share values with each other using channels. A receiver goroutine waits (is blocked) until the sender goroutine writes a value to the channel, thereby bringing synchronisation between the goroutines.

package main

import (
    "fmt"
)

func main() {

    c := make(chan int) // create a channel for the goroutine to send value to the main thread

    go getNum(1, 100, c) // start a goroutine which runs in parallel to the main thread

    fmt.Print("\n")
    for m := range c { // Loop and wait for the value to be received on the channel.
        // Loop breaks when the channel is closed by the sender
        fmt.Print(m, ",")
    }
    fmt.Print("\n\n")

}

// function to generate numbers in sequence from init to end and push them into a channel
func getNum(init, end int, c chan int) {

    for i := init; i <= end; i++ {
        c <- i // write value on the channel
    }
    close(c) // close the channel once the task is completed

}



Output Generated for the above code:

The above code establishes how channels are used to communicate between goroutines. Until c <- i writes a value to the channel, for m := range c blocks the main goroutine from being executed. Thus, synchronisation is established between the two goroutines. But we are not able to see how concurrency works from the above example, even though the getNum goroutine is executed concurrently alongside the main goroutine. To visibly observe concurrency, we shall add another goroutine into the mix to print Hello World and introduce a time delay so that we can observe the concurrent behaviour.

In the below code, the printHello goroutine is spun out and is executed concurrently alongside the number generation and printing goroutines. We shall also limit the number of OS threads used by the GO runtime to 1 so that we can observe the concurrent behaviour more visibly.

package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {

    runtime.GOMAXPROCS(1) // instructing the go runtime to use only one OS thread to observe the concurrency

    c := make(chan int) // create a channel for the goroutine to send value to the main thread

    go getNum(1, 100, c) // start a goroutine which runs concurrently with the main goroutine

    go printHello(50) // print hello 50 times concurrently

    fmt.Print("\n")
    for m := range c { // Loop and wait for the value to be received on the channel.
        // Loop breaks when the channel is closed by the sender
        fmt.Print(m, ",")
    }

    fmt.Print("\n\n")

}

// function to generate numbers in sequence from init to end and push them into a channel
func getNum(init, end int, c chan int) {

    for i := init; i <= end; i++ {
        c <- i                                          // write value on the channel
        time.Sleep(time.Duration(2) * time.Microsecond) // introduce a time delay to observe concurrency
    }
    close(c) // close the channel once the task is completed

}

// function to print Hello World repeatedly for num times
func printHello(num int) {
    for i := 0; i < num; i++ {
        fmt.Print("Hello World,")
        time.Sleep(time.Duration(10) * time.Microsecond) // introduce a time delay to observe concurrency
    }
}



Output for the above code:

The above code doesn't print the 50 'Hello World' messages as instructed, but it prints the number sequence correctly. This is because the main goroutine is made to wait by the for m := range c loop until the channel is closed. Once the channel is closed by close(c) in the goroutine getNum(init, end int, c chan int), the loop in the main goroutine breaks, main returns, and the printHello goroutine is terminated prematurely. We have used an unbuffered channel, which has no capacity. Unbuffered channels work in such a way that a send only proceeds when a receiver goroutine is waiting on the channel to receive the value, thus establishing synchronous behaviour. With buffered channels, even when no receiver is waiting for a value on the channel, the sender can write values onto it. The sender is blocked only when the buffer is full, and it is released once a receiver reads a value from the channel.
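
To make the difference concrete, here is a minimal sketch (not part of the benchmarks below) showing that sends on a buffered channel only block once the buffer is full:

package main

import "fmt"

func main() {

    c := make(chan int, 2) // buffered channel with capacity 2

    c <- 1 // does not block: the buffer has room
    c <- 2 // does not block: the buffer is now full
    // a third send, c <- 3, would block here until a receiver reads a value

    fmt.Println(<-c) // prints 1
    fmt.Println(<-c) // prints 2

}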

For an in-depth understanding of concurrency in GO, please visit the link. And for more concepts about concurrency and parallelism, visit the link.

But wait, don't they talk about only concurrency in their slides? Before GO 1.5, the default number of OS threads used to run goroutines (GOMAXPROCS) was 1, since context switching between threads was expensive for the GO scheduler at that time. But from GO 1.5 onwards, the GO runtime uses all available CPU cores by default, since the scheduler was optimised and goroutines are scheduled in such a way that context switching is minimal. Enabling multiple processors doesn't mean parallelism is achieved by default. The program needs to be written in such a way that the goroutines run in parallel on all processors. This is explained well in the GO FAQ:

The Go language provides concurrency primitives, such as goroutines and channels, but concurrency only enables parallelism when the underlying problem is intrinsically parallel. Problems that are intrinsically sequential cannot be sped up by adding more CPUs, while those that can be broken into pieces that can execute in parallel can be sped up, sometimes dramatically.
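
As a quick check on your own machine, a small sketch that prints the number of logical CPUs and the GOMAXPROCS value currently in use (calling runtime.GOMAXPROCS with a value less than 1 only queries the setting without changing it):

package main

import (
    "fmt"
    "runtime"
)

func main() {

    fmt.Println("Logical CPUs :", runtime.NumCPU())

    // passing 0 does not change the setting, it only returns the current value
    fmt.Println("GOMAXPROCS   :", runtime.GOMAXPROCS(0))

}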

Concurrent Parallelism in GO

To understand concurrent parallelism in GO, we shall take a problem which can be broken into pieces and executed in parallel.

We shall take up a task to generate 1 million crypto random numbers and print them on screen. The solution to the problem statement can be designed to run in parallel.

At first, we shall program a sequential solution to set the base line to compare the performance improvement when introducing concurrent parallelism.

We are using an Intel i5 processor with 6 threads, running on Windows 10.

Below is the sequential solution for the problem statement. We use a loop that iterates 1 million times, call the crypto random function to generate a 128-bit random number, and then print it on the screen. The GO runtime actually has all 6 CPU threads available, but since we have written a sequential program, the work runs on only 1 CPU thread and the other threads are used by the runtime and the garbage collector. It took 600.509 seconds to generate 1 million random numbers, using only ~20% of the CPU.

Sequential program:

package main

import (
    "crypto/rand"
    "fmt"
)

// Function to generate 'count' number of random numbers and print them on screen
func randomFeed(count int) {
    for i := 0; i < count; i++ {
        random, _ := rand.Prime(rand.Reader, 128) // use the crypto random number generator to create random numbers

        fmt.Println("rn -> ", random) // print the random number generated on the screen
    }

}

func main() {
    randomFeed(1000000) // call the function to generate 1 million random numbers
}

Enter fullscreen mode Exit fullscreen mode

Concurrent Parallel program:

Now, we shall make the solution run concurrently and in parallel on all 6 CPU threads. The simplest solution would be to call the randomFeed() function as 3 separate goroutines, with the 1 million split into 3 parts, so that they run in parallel on all CPU threads. But this is not how solutions are designed for real-world problems. There would be atomic processes communicating with each other via channels. We tried multiple solutions and benchmarked their performance.

The first solution was a fan-out method: we created a single random number generation goroutine feeding data into 2 printing goroutines over an unbuffered channel. This solution took 502.695 seconds to generate 1 million random numbers, average CPU utilisation was close to 40%, and it ran on all 6 CPU threads. There was not much increase in performance because only 1 goroutine was assigned to the CPU-intensive task of generating random numbers. A minimal sketch of this fan-out arrangement is given below (the function names and counts are illustrative, not the exact benchmark code):
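
package main

import (
    "crypto/rand"
    "fmt"
    "math/big"
    "sync"
)

// generate 'count' random numbers and write them onto the shared channel
func generate(count int, c chan *big.Int) {
    for i := 0; i < count; i++ {
        random, _ := rand.Prime(rand.Reader, 128)
        c <- random // write value on the channel
    }
    close(c) // close the channel once all numbers are sent
}

// print every value received until the channel is closed
func printRandom(c chan *big.Int, wg *sync.WaitGroup) {
    defer wg.Done()
    for m := range c {
        fmt.Println("rn -> ", m)
    }
}

func main() {

    c := make(chan *big.Int) // one unbuffered channel shared by both printers

    var wg sync.WaitGroup
    wg.Add(2)
    go printRandom(c, &wg) // fan out: two printer goroutines read
    go printRandom(c, &wg) // from the same channel

    generate(1000000, c) // a single generator feeds both printers
    wg.Wait()            // wait until both printers have drained the channel

}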

The second solution was a fan-in method: we created 3 goroutines to generate random numbers and one goroutine to print the numbers, using a single buffered channel of capacity 3. This solution took 181 seconds to generate 1 million random numbers, average CPU utilisation was close to 94%, and it ran on all 6 CPU threads. This was a significant increase in performance compared with the baseline we set with the sequential program. A minimal sketch of this fan-in arrangement (again illustrative, with the 1 million split three ways as in the final solution):
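
package main

import (
    "crypto/rand"
    "fmt"
    "math/big"
    "sync"
)

// generate 'count' random numbers and write them onto the shared channel
func generate(count int, c chan *big.Int, wg *sync.WaitGroup) {
    defer wg.Done()
    for i := 0; i < count; i++ {
        random, _ := rand.Prime(rand.Reader, 128)
        c <- random // write value on the channel
    }
}

func main() {

    c := make(chan *big.Int, 3) // single buffered channel of capacity 3

    var wg sync.WaitGroup
    wg.Add(3)
    go generate(333333, c, &wg) // fan in: three generator goroutines
    go generate(333333, c, &wg) // write onto the same channel
    go generate(333334, c, &wg)

    go func() {
        wg.Wait() // close the channel only after all generators finish
        close(c)
    }()

    for m := range c { // a single printer goroutine drains the channel
        fmt.Println("rn -> ", m)
    }

}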

On the philosophy that we should never stop trying to improve, we kept optimising the solution. The final solution we arrived at uses completely atomic processes with dedicated channels for each set of processes. We created 3 goroutines to generate random numbers and 3 goroutines to print them, then assigned 1 printing goroutine exclusively to each generator goroutine using a dedicated buffered channel of capacity 3. This solution took 172.78 seconds to generate the 1 million random numbers, average CPU utilisation was close to 97%, and it ran on all 6 CPU threads. This is almost 3.5 times faster than the baseline sequential program.

The final solution is provided below.

package main

import (
    "crypto/rand"
    "fmt"
    "math/big"
    "sync"
)

// Function to generate 'count' random numbers
func randomFeed(c chan *big.Int, count *int, wg *sync.WaitGroup) {
    for ; *count > 0; (*count)-- {

        random, _ := rand.Prime(rand.Reader, 128) // using crypto random number function
        c <- random                               // write the random number onto the channel

    }
    wg.Done() // once the loop is completed, inform the wait group that the task is completed
}

// function to print the number written on to the channel
func printRandom(c chan *big.Int, str string, wg *sync.WaitGroup) {
    count := 0

    for m := range c { // wait until the data is available in the channel and then execute the for loop
        fmt.Println(str, " -> ", m)
        count++
    }

    fmt.Println(str, " -> count : ", count)

    wg.Done() // once the process is completed, inform the wait group that the task is completed

}

func concur() {

    // create 3 buffered channels
    c1 := make(chan *big.Int, 3)
    c2 := make(chan *big.Int, 3)
    c3 := make(chan *big.Int, 3)

    // create 3 counters splitting 1 million into 3 parts
    count1 := 333333
    count2 := 333333
    count3 := 333334

    var wg sync.WaitGroup
    var wg1 sync.WaitGroup

    wg1.Add(1)
    go printRandom(c1, "rn1", &wg1) // spin up a printer goroutine with a dedicated channel to read from and add it to the wait group
    wg1.Add(1)
    go printRandom(c2, "rn2", &wg1) // spin up a printer goroutine with a dedicated channel to read from and add it to the wait group
    wg1.Add(1)
    go printRandom(c3, "rn3", &wg1) // spin up a printer goroutine with a dedicated channel to read from and add it to the wait group

    wg.Add(1)
    go randomFeed(c1, &count1, &wg) // spin up a generator goroutine with a dedicated channel to write to and add it to the wait group
    wg.Add(1)
    go randomFeed(c2, &count2, &wg) // spin up a generator goroutine with a dedicated channel to write to and add it to the wait group
    wg.Add(1)
    go randomFeed(c3, &count3, &wg) // spin up a generator goroutine with a dedicated channel to write to and add it to the wait group
    wg.Wait() // wait until all the generator goroutines are completed

    //close all the channels
    close(c1)
    close(c2)
    close(c3)
    wg1.Wait() // wait until the values in the channels are printed by the 3 printer goroutines
    fmt.Println("All go routines finished executing")

}

func main() {

    concur() 

/* we wrote the complete logic in a function to use the
 inbuilt testing module which will show the execution 
 time of the program */

}


Conclusion

In the above discussion, we showed how a problem with a typical sequential solution can be split up and made to run concurrently and in parallel on all CPU cores, thus exploiting compute resources that would otherwise be left idle. In many instances, compute resources are underutilised and yet we still pay for them. If solutions were architected with concurrent parallelism in mind, we would not pay for more than what we use, and we would also save a precious resource: time. This doesn't just apply to GO; this philosophy should be applied in all languages.
