Abhijit Hota

Posted on Jul 2, 2022 • Originally published at abhijithota.me on Jul 2, 2022

Nerd-sniping myself over Go range expressions

#go #rtfm #benchmarking #internals

Originally published on abhijithota.me

The background

I stumbled across a situation while writing a Golang project where I needed to loop over a slice returned by a function.
The function being strings.Split. No big deal. I wrote the following:

for _, id := range strings.Split(path, "/") {
    // Something()
}

There's another way of writing the same thing:

splitPath := strings.Split(path, "/")
for _, id := range splitPath {
    // Something()
}

To the novice-eyed, like me, it would seem the latter was an optimized version as the function is only called once before the loop and hence more performant.

I did not care about the negligible performance gain I would get as the path variable passed to the split function would be limited in length by application constraints.

But what if it wasn't?

tldr; It doesn't matter how you write it. They are both the same.
From the Golang spec,

The range expression x is evaluated once before beginning the loop, with one exception: if at most one iteration variable is present and len(x) is constant, the range expression is not evaluated.

This seems obvious once you know about it because it's such a low-hanging fruit for the compiler. It's not even compiler optimization. It's compiler duty.

From here, I'm going to write about how I nerd-sniped myself into finding the answer to this and naturally, spent around 4 hours benchmarking, trying to figure out assembly, checking the compiler optimization wiki, some trivial experimentation that confirmed my theory and one line in the spec that made it all come together.

Before we proceed into this, have some context about the original scenario. The function should split a separated string called path with / as delimiter and extract all integers into an array like this:

func ExtractInts(path string) []int {
    pathIds := []int{}
    for _, v := range strings.Split(path, "/") {
        if v != "" {
            val, _ := strconv.Atoi(v)
            pathIds = append(pathIds, val)
        }
    }
    return pathIds
}

Benchmarking the two functions

I put together a quick small project to bench the thing consisting of main.go and main_test.go:

`main.go`

package main

import (
    "strconv"
    "strings"
)

func WithOptim(path string) int {
    pathIds := []int{}
    pathIdsStr := strings.Split(path, "/")
    for _, v := range pathIdsStr {
        if v != "" {
            val, _ := strconv.Atoi(v)
            pathIds = append(pathIds, val)
        }
    }
    return len(pathIds)
}

func WithoutOptim(path string) int {
    pathIds := []int{}
    for _, v := range strings.Split(path, "/") {
        if v != "" {
            val, _ := strconv.Atoi(v)
            pathIds = append(pathIds, val)
        }
    }
    return len(pathIds)
}

`main_test.go`

package main

import (
    "strings"
    "testing"
)

var path = strings.Repeat("734/956/831/811/", 100_000)

var resultWithOptim int

func BenchmarkWithOptim(b *testing.B) {
    var r int
    for n := 0; n < b.N; n++ {
        r = WithOptim(path)
    }
    resultWithOptim = r
}

var resultWithoutOptim int

func BenchmarkWithoutOptim(b *testing.B) {
    var r int
    for n := 0; n < b.N; n++ {
        r = WithoutOptim(path)
    }
    resultWithoutOptim = r
}

The variables r, resultWithOptim and resultWithoutOptim are needed to escape compiler optimizations and get reliable benchmarks.

Check out this section from the awesome benchmarking article by Dave Cheney.

Results

I ran the benchmark a couple of times.¹

In most cases, it was neck-and-neck:

BenchmarkWithoutOptim-8     63   18067614 ns/op
BenchmarkWithOptim-8        64   17833483 ns/op
3.282s

BenchmarkWithoutOptim-8     64   17879142 ns/op
BenchmarkWithOptim-8        66   17943536 ns/op
3.287s

BenchmarkWithoutOptim-8     66   17915000 ns/op
BenchmarkWithOptim-8        66   17567501 ns/op
3.292s

BenchmarkWithoutOptim-8     56   18344452 ns/op
BenchmarkWithOptim-8        57   17649186 ns/op
2.090s

In some cases the one with optimizations performed better:

BenchmarkWithoutOptim-8     64   18446009 ns/op
BenchmarkWithOptim-8        73   18054308 ns/op
3.548s

BenchmarkWithoutOptim-8     63   18466071 ns/op
BenchmarkWithOptim-8        70   18600002 ns/op
3.422s

BenchmarkWithOptim-8        64   18917323 ns/op
BenchmarkWithoutOptim-8     58   18092279 ns/op
3.215s

But in some cases the one without any optimizations performed faster!

BenchmarkWithOptim-8        55   18848131 ns/op
BenchmarkWithoutOptim-8     68   19239865 ns/op
2.395s

BenchmarkWithOptim-8        57   18323290 ns/op
BenchmarkWithoutOptim-8     63   17598465 ns/op
2.206s

Digging into the why

The first resource I came across was the Compiler And Runtime Optimizations wiki which stated nothing about such behaviour.

I also tried converting the code into Assembly using Compiler Explorer but couldn't understand any of it.

Was the compiler calling the function only once? Are the 2 ways of writing same? I had no way of knowing. Then I thought of something so trivial I felt dumb.

The trivial experiment

Consider the following code:

func Nums() []int {
    print("Nums called\n")
    return []int{1, 2, 3, 4, 5}
}

func Loop() {
    for _, v := range Nums() {
        print(v)
    }
}

func main() {
    Loop()
}

And surely enough, Nums called is only printed once to STDOUT.
This felt so obvious once I realised it.

Eureka + Facepalm moment

Googling "go range internals" gave me this article by Robbie V which sent me to the Go language specification where I found this:

The range expression x is evaluated once before beginning the loop, with one exception: if at most one iteration variable is present and len(x) is constant, the range expression is not evaluated.

Conclusion

There's nothing much to conclude but there's a lesson to learn. In Robbie V's article, the Step 1 was to RTFM.Unlike that, I dove right into benchmarking which was like comparing two sum functions which returned a + b vs. b + a.

I discovered a lot but this shouldn't have taken so much of my time. It's a reminder on how we can take some slightly interesting things, like the one I talked about here, for granted and never realise the machinery of it.

As to which function is better to write, that would be an opinion.

Other languages

I tried the same thing in a few other languages I've coded in because I never realised this:

JavaScript

function nums() {
    console.debug('Nums called');
    return [1, 2, 3, 4, 5];
}

function loop() {
    for (const v of nums()) {
        console.debug(v);
    }
}

loop();

Python

def nums():
    print("Nums called")
    return [1,2,3,4,5]

def loop():
    for l in nums():
        print(l)

loop()

Using go test -bench=. -run=xxx ↩

Specs:
goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz

Order of function execution was changed a few times.

DEV Community