DEV Community

Tim Downey

Posted on • Updated on • Originally published at downey.io

Emulating an OpenMP Parallel For-Loop in Go

Cover photo by Denys Nevozhai on Unsplash

In several of the labs for Georgia Tech's Intro to High Performance Computing course we used OpenMP and C to implement some parallel algorithms like the Helman and Jájá list ranking algorithm. Although C can be a lot of fun, I've been trying to get some more practice with Go lately so this got me wondering: "Is there an equivalent to OpenMP in Go?"

Although I did not find an exact drop-in replacement for OpenMP, posts on Stackoverflow lead me to realize I could accomplish similar goals using Goroutines. Specifically, I was interested in creating a simple parallel for-loop to try and reimplement some of the labs in Go.

Parallel For-Loop Example in C and Go

Below is a simple parallel for-loop in C using OpenMP's omp parallel for directive. It is performing a portion of the Helman-JáJá list-ranking algoritm. The key thing to note here is that each element at `i` in the `rank`/`list` arrays in the examples below is being operated on independently. This allows each loop iteration to be safely executed in parallel.

``````#pragma omp parallel for
for (int i = 0; i < n; i++) {
long currIdx = next[i];

if (currIdx != NIL) {
}
}
``````

Now let's see what this might look like in Go. Ignore the fact that the data structures in play might be a bit different -- the intent behind the algorithm is the same.

``````var wg sync.WaitGroup
for i := 0; i < n; i++ {
go func(i int) {
defer wg.Done()
}(i)
}
wg.Wait()
``````

Here we're doing the following:

1. Create a WaitGroup (`wg`) and telling it to expect to wait for `n` Goroutines to finish
2. Spawn a Goroutine for each iteration of the loop we want to parallelize
3. Tell each Goroutine to let the WaitGroup know it's done once it's performed the prefix add operation
4. Tell the WaitGroup to block until it has heard back from the `n` Goroutines

A Note on Goroutine Scheduling

You may be wondering why it's okay to create a potentially huge number (`n`) of Goroutines in the code above. The short answer is, Goroutines are multiplexed across the hardware threads and scheduled independently from the OS by Go's scheduler. This blog post does a pretty good job of explaining how Go can handle millions of Goroutines. This is a bit different from how OpenMP handles scheduling by divvying chunks of work among threads. At the end of the day, however, they're both means of distributing a large amount of work across a smaller, fixed number of available processors.

If you really want to go deep, this talk from GopherCon 2018 by Kavya Joshi gives a great view into what is going on under the hood of the Goroutine scheduler.

Summary

Well, there you have it. It's a bit more wordy than the OpenMP version, but I thought it was really cool that you could accomplish almost the same behavior using just a few of Go's standard libraries. That said, the OpenMP paradigms are very C-like and I imagine are not idiomatic Go. 🙃

If any experience Go folks are reading this, please share how you'd write the parallel for loop (or accomplish the same goals in another way) in the comments below!