Furkan Aksoy

Go - Race Condition: Detection and Prevention

A race condition occurs when two or more things (processes, threads, goroutines, etc.) access the same memory location at the same time, and at least one of the accesses is a write.

Let’s analyze a quick and simple example:

package main

import (
    "fmt"
    "sync"
)

const (
    stepCount    = 100000
    routineCount = 2
)

var counter int64

func main() {
    var wg sync.WaitGroup

    for i := 0; i < routineCount; i++ {
        wg.Add(1)
        go incr(&wg)
    }

    wg.Wait() // wait until all goroutines executed
    fmt.Printf("Step Count: %d\nLastValue: %d\nExpected: %d\n", stepCount, counter, stepCount*routineCount)
}

func incr(wg *sync.WaitGroup) {
    for i := 0; i < stepCount; i++ {
        counter++
    }

    wg.Done()
}

The incr function is responsible for incrementing counter stepCount times. In this example we have 2 goroutines, and each one increments the counter 100000 times, so we expect the counter variable to end up at 200000 (routineCount x stepCount).

Let’s look at the output of the code 🙃

Step Count: 100000
LastValue : 192801
Expected  : 200000

Ooops, we got 192801 instead of 200000 😮 . But why? What’s wrong?

Critical Section

Before going over the problem, we should know what a critical section is. A critical section is nothing but a segment of code that must not be executed by multiple processes at the same time. Only one process/goroutine may execute inside the critical section; the others have to wait their turn. Otherwise, the result will look like the one above.

Analyze The Problem

Now that we know what a critical section is, let’s go over the problem. We have 2 identical goroutines, each increasing the counter value by applying these steps:

1. Read value of the counter
2. Add 1 to counter
3. Store increased value in counter

The critical section consists of these three steps. The goroutines must not execute them at the same time.

Let’s imagine the scenario:

Routine 1: Read value of the counter (counter = 12)
Routine 2: Read value of the counter (counter = 12)
Routine 1: Add 1 to counter (12 + 1 = 13)
Routine 1: Store increased value in counter (counter = 13)
Routine 2: Add 1 to counter (12 + 1 = 13)
Routine 2: Store increased value in counter (counter = 13)

Oops, the two routines executed some of the steps at the same time. Although each routine increments the counter by 1, which means we expect the counter to grow by 2, it grows by only 1. That’s why we got the unexpected value in the output. Routine 2 should have waited for Routine 1’s critical section to finish. (Race Condition 🙋)

How to Detect?

If your code is written in Go, you’re lucky. Go has a built-in race detector tool (you don’t have to install anything explicitly), which is implemented in C/C++ on top of the ThreadSanitizer runtime library. The tool watches for unsynchronized accesses to shared variables and prints a warning if it detects a data race. Be careful when using this tool and do not run it in production: it can consume ten times the CPU and memory.

Because of its design, the race detector can detect race conditions only when they are actually triggered by running code, which means it’s important to run race-enabled binaries under realistic workloads. However, race-enabled binaries can use ten times the CPU and memory, so it is impractical to enable the race detector all the time. One way out of this dilemma is to run some tests with the race detector enabled. Load tests and integration tests are good candidates, since they tend to exercise concurrent parts of the code. Another approach using production workloads is to deploy a single race-enabled instance within a pool of running servers.

How to Use Race-Detector Tool?

No need to install anything. It’s fully integrated with the Go toolchain. Just add the -race flag when compiling or running your application.

$ go test -race mypkg    // test the package
$ go run -race mysrc.go  // compile and run the program
$ go build -race mycmd   // build the command
$ go install -race mypkg // install the package

Let’s run it for our racy code.

$ go run -race main.go

==================
WARNING: DATA RACE
Read at 0x000001279320 by goroutine 8:
  main.incr()
      main.go:29 +0x47

Previous write at 0x000001279320 by goroutine 7:
  main.incr()
      main.go:29 +0x64

Goroutine 8 (running) created at:
  main.main()
      main.go:20 +0xc4

Goroutine 7 (running) created at:
  main.main()
      main.go:20 +0xc4
==================
Step Count: 100000
LastValue : 192801
Expected : 200000

Found 1 data race(s)
exit status 66

The result shows unsynchronized accesses to the counter variable from different goroutines. We’ll go over the solutions in the next section.

Other ways to prevent and detect race conditions include:

  • thorough code reviews
  • designing and modeling applications so they share as little state as possible
  • increasing team know-how about these situations
  • unit tests for concurrent code (see the sketch below)
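
As for the last point, a plain unit test that drives the racy code from several goroutines is usually enough for go test -race to flag the problem. Below is a minimal sketch, assuming the first (unsynchronized) version above lives in package main; the file name (e.g. main_test.go) and the test name are just illustrative.

package main

import (
    "sync"
    "testing"
)

// TestIncrConcurrent runs the racy incr from several goroutines so that
// `go test -race` can observe the unsynchronized access to counter.
func TestIncrConcurrent(t *testing.T) {
    counter = 0 // reset the shared counter used by incr

    var wg sync.WaitGroup
    for i := 0; i < routineCount; i++ {
        wg.Add(1)
        go incr(&wg)
    }
    wg.Wait()

    if want := int64(stepCount * routineCount); counter != want {
        t.Errorf("counter = %d, want %d", counter, want)
    }
}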

How to Handle?

Until now, we have understood the problem and detected the bug. Let’s fix it!

Using Mutex

A mutex (mutual exclusion) is a lock/unlock mechanism for critical sections. While it is locked, the critical section is reserved for one goroutine; the others have to wait until it is unlocked. In our code, we should lock around the code that increments the counter, so other goroutines cannot increase it while one goroutine is already working on it.

package main

import (
    "fmt"
    "sync"
)

const (
    stepCount    = 100000
    routineCount = 2
)

var counter int64

func main() {
    var wg sync.WaitGroup
    var mx sync.Mutex // initialize mutex

    for i := 0; i < routineCount; i++ {
        wg.Add(1)
        go incr(&wg, &mx) // pass mutex to each routine
    }

    wg.Wait() // wait until all goroutines executed
    fmt.Printf("Step Count: %d\nLastValue: %d\nExpected: %d\n", stepCount, counter, stepCount*routineCount)
}

func incr(wg *sync.WaitGroup, mx *sync.Mutex) {
    for i := 0; i < stepCount; i++ {
        mx.Lock() // lock critical section for this routine
        counter++  // critical section
        mx.Unlock() // unlock critical section then other routines can use it.
    }

    wg.Done()
}
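
A follow-up design note, sketched here and not part of the original example: a common Go idiom is to keep the mutex next to the data it protects inside a small struct, and to defer the unlock so every return path releases the lock. The safeCounter type below is a hypothetical name.

package main

import (
    "fmt"
    "sync"
)

// safeCounter bundles the lock with the data it guards, so callers
// cannot forget which mutex protects which field.
type safeCounter struct {
    mu sync.Mutex
    n  int64
}

func (c *safeCounter) Inc() {
    c.mu.Lock()
    defer c.mu.Unlock() // released on every return path
    c.n++
}

func (c *safeCounter) Value() int64 {
    c.mu.Lock()
    defer c.mu.Unlock()
    return c.n
}

func main() {
    const stepCount, routineCount = 100000, 2

    var c safeCounter
    var wg sync.WaitGroup
    for i := 0; i < routineCount; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < stepCount; j++ {
                c.Inc()
            }
        }()
    }

    wg.Wait()
    fmt.Printf("LastValue: %d\nExpected: %d\n", c.Value(), stepCount*routineCount)
}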

Using Channels

According to the Go documentation:

Channels are the pipes that connect concurrent goroutines. You can send values into channels from one goroutine and receive those values into another goroutine.

It’s essentially a simple pipe. In this scenario we can use a buffered channel with capacity 1 to synchronize our goroutines. That means the channel holds only one value at a time and does not accept a new one until the current one has been received.

Long story short, we pass the channel to the spawned goroutines, and each goroutine sends a value into the channel to block the others. When it’s done, the goroutine drains the channel again to let another goroutine in. It works like the lock/unlock mechanism provided by a mutex.

package main

import (
    "fmt"
    "sync"
)

const (
    stepCount    = 100000
    routineCount = 2
)

var counter int64

func main() {
    var wg sync.WaitGroup
    ch := make(chan struct{}, 1) // define buffered channel 

    for i := 0; i < routineCount; i++ {
        wg.Add(1)
        go incr(&wg, ch)
    }

    wg.Wait() // wait until all goroutines executed
    fmt.Printf("Step Count: %d\nLastValue: %d\nExpected: %d\n", stepCount, counter, stepCount*routineCount)
}

func incr(wg *sync.WaitGroup, ch chan struct{}) {
    ch <- struct{}{} // send empty struct into channel to block other routines.
    for i := 0; i < stepCount; i++ {
        counter++
    }
    <- ch // clear out the channel

    wg.Done()
}
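
The buffered channel above is used purely as a lock. As an alternative sketch that is not from the original article, channels can also carry the data itself: a single owner goroutine is the only one that ever touches counter, and the other goroutines just send increments to it. This is the "share memory by communicating" style.

package main

import (
    "fmt"
    "sync"
)

const (
    stepCount    = 100000
    routineCount = 2
)

func main() {
    incs := make(chan int64) // increments flow through this channel
    done := make(chan int64) // the final counter value comes back here

    // Single owner goroutine: only this goroutine ever touches counter.
    go func() {
        var counter int64
        for v := range incs {
            counter += v
        }
        done <- counter
    }()

    var wg sync.WaitGroup
    for i := 0; i < routineCount; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < stepCount; j++ {
                incs <- 1 // send the increment instead of mutating shared state
            }
        }()
    }

    wg.Wait()
    close(incs) // no more increments; the owner goroutine reports the total
    fmt.Printf("LastValue: %d\nExpected: %d\n", <-done, stepCount*routineCount)
}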

Using Atomic Package

Atomic functions do not need any lock; they are implemented at the hardware level. If performance is really important to you, the sync/atomic package can be used to build lock-free code. But you or your team should know how atomic functions work under the hood. For example, atomic variables should only be manipulated through atomic functions; don’t read or write them like ordinary variables.

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

const (
    stepCount    = 100000
    routineCount = 2
)

var counter int64

func main() {
    var wg sync.WaitGroup

    for i := 0; i < routineCount; i++ {
        wg.Add(1)
        go incr(&wg)
    }

    wg.Wait() // wait until all goroutines executed
    fmt.Printf("Step Count: %d\nLastValue: %d\nExpected: %d\n", stepCount, atomic.LoadInt64(&counter), stepCount*routineCount) // read the counter with an atomic load, as advised above
}

func incr(wg *sync.WaitGroup) {
    for i := 0; i < stepCount; i++ {
        atomic.AddInt64(&counter, 1) // use atomic function to increase counter
    }

    wg.Done()
}
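
A side note not covered in the original article: if you are on Go 1.19 or newer, the typed atomic.Int64 makes the "only touch it through atomic functions" rule harder to break, because the underlying value can only be reached through its methods. A minimal sketch:

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

const (
    stepCount    = 100000
    routineCount = 2
)

// atomic.Int64 (Go 1.19+) keeps the value unexported, so every access has to go
// through Add/Load/Store and cannot accidentally fall back to a plain read or write.
var counter atomic.Int64

func main() {
    var wg sync.WaitGroup

    for i := 0; i < routineCount; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < stepCount; j++ {
                counter.Add(1)
            }
        }()
    }

    wg.Wait()
    fmt.Printf("LastValue: %d\nExpected: %d\n", counter.Load(), stepCount*routineCount)
}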
