Posted onJul 2, 2022 • Originally published atabhijithota.me onJul 2, 2022

Nerd-sniping myself over Go range expressions

#go #rtfm #benchmarking #internals

Originally published onabhijithota.me

The background

I stumbled across a situation while writing a Golang project where I needed to loop over a slice returned by a function.
The function beingstrings.Split. No big deal. I wrote the following:

for_,id:=rangestrings.Split(path,"/"){// Something()}

There's another way of writing the same thing:

splitPath:=strings.Split(path,"/")for_,id:=rangesplitPath{// Something()}

To the novice-eyed, like me, it would seem the latter was anoptimized version as the function is only called once before the loop and hence moreperformant.

I did not care about the negligible performance gain I would get as thepath variable passed to the split function would be limited in length by application constraints.

But what if it wasn't?

tldr; It doesn't matter how you write it. They are both the same.
From theGolang spec,
The range expression x is evaluated once before beginning the loop, with one exception: if at most one iteration variable is present and len(x) is constant, the range expression is not evaluated.
This seems obvious once you know about it because it's such a low-hanging fruit for the compiler. It's not even compiler optimization. It's compiler duty.

From here, I'm going to write about how Inerd-sniped myself into finding the answer to this and naturally, spent around 4 hours benchmarking, trying to figure out assembly, checking the compiler optimization wiki, some trivial experimentation that confirmed my theory and one line in the spec that made it all come together.

Before we proceed into this, have some context about the original scenario. The function should split a separated string calledpath with/ as delimiter and extract all integers into an array like this:

funcExtractInts(pathstring)[]int{pathIds:=[]int{}for_,v:=rangestrings.Split(path,"/"){ifv!=""{val,_:=strconv.Atoi(v)pathIds=append(pathIds,val)}}returnpathIds}

Benchmarking the two functions

I put together a quick small project to bench the thing consisting ofmain.go andmain_test.go:

`main.go`

packagemainimport("strconv""strings")funcWithOptim(pathstring)int{pathIds:=[]int{}pathIdsStr:=strings.Split(path,"/")for_,v:=rangepathIdsStr{ifv!=""{val,_:=strconv.Atoi(v)pathIds=append(pathIds,val)}}returnlen(pathIds)}funcWithoutOptim(pathstring)int{pathIds:=[]int{}for_,v:=rangestrings.Split(path,"/"){ifv!=""{val,_:=strconv.Atoi(v)pathIds=append(pathIds,val)}}returnlen(pathIds)}

`main_test.go`

packagemainimport("strings""testing")varpath=strings.Repeat("734/956/831/811/",100_000)varresultWithOptimintfuncBenchmarkWithOptim(b*testing.B){varrintforn:=0;n<b.N;n++{r=WithOptim(path)}resultWithOptim=r}varresultWithoutOptimintfuncBenchmarkWithoutOptim(b*testing.B){varrintforn:=0;n<b.N;n++{r=WithoutOptim(path)}resultWithoutOptim=r}

The variablesr,resultWithOptim andresultWithoutOptim are needed to escape compiler optimizations and get reliable benchmarks.
Check outthis section from theawesome benchmarking article by Dave Cheney.

Results

I ran the benchmark a couple of times.¹

In most cases, it was neck-and-neck:

BenchmarkWithoutOptim-8     63   18067614 ns/opBenchmarkWithOptim-8        64   17833483 ns/op3.282sBenchmarkWithoutOptim-8     64   17879142 ns/opBenchmarkWithOptim-8        66   17943536 ns/op3.287sBenchmarkWithoutOptim-8     66   17915000 ns/opBenchmarkWithOptim-8        66   17567501 ns/op3.292sBenchmarkWithoutOptim-8     56   18344452 ns/opBenchmarkWithOptim-8        57   17649186 ns/op2.090s

In some cases the one withoptimizations performed better:

BenchmarkWithoutOptim-8     64   18446009 ns/opBenchmarkWithOptim-8        73   18054308 ns/op3.548sBenchmarkWithoutOptim-8     63   18466071 ns/opBenchmarkWithOptim-8        70   18600002 ns/op3.422sBenchmarkWithOptim-8        64   18917323 ns/opBenchmarkWithoutOptim-8     58   18092279 ns/op3.215s

But in some cases the one without anyoptimizations performed faster!

BenchmarkWithOptim-8        55   18848131 ns/opBenchmarkWithoutOptim-8     68   19239865 ns/op2.395sBenchmarkWithOptim-8        57   18323290 ns/opBenchmarkWithoutOptim-8     63   17598465 ns/op2.206s

Digging into the why

The first resource I came across was theCompiler And Runtime Optimizations wiki which stated nothing about such behaviour.

I also tried converting the code into Assembly usingCompiler Explorer but couldn't understand any of it.

Was the compiler calling the function only once? Are the 2 ways of writing same? I had no way of knowing. Then I thought of something so trivial I felt dumb.

The trivial experiment

Consider the following code:

funcNums()[]int{print("Nums called\n")return[]int{1,2,3,4,5}}funcLoop(){for_,v:=rangeNums(){print(v)}}funcmain(){Loop()}

And surely enough,Nums called is only printed once toSTDOUT.
This felt so obvious once I realised it.

Eureka + Facepalm moment

Googling"go range internals" gave methis article by Robbie V which sent me to theGo language specification where I found this:

The range expression x is evaluated once before beginning the loop, with one exception: if at most one iteration variable is present and len(x) is constant, the range expression is not evaluated.

Conclusion

There's nothing much to conclude but there's a lesson to learn. In Robbie V's article, the Step 1 was toRTFM.Unlike that, I dove right into benchmarking which was like comparing two sum functions which returneda + b vs.b + a.

I discovered a lot but this shouldn't have taken so much of my time. It's a reminder on how we can take some slightly interesting things, like the one I talked about here, for granted and never realise the machinery of it.

As to which function is better to write, that would be an opinion.

Other languages

I tried the same thing in a few other languages I've coded in because I never realised this:

JavaScript

functionnums(){console.debug('Nums called');return[1,2,3,4,5];}functionloop(){for(constvofnums()){console.debug(v);}}loop();

Python

defnums():print("Nums called")return[1,2,3,4,5]defloop():forlinnums():print(l)loop()

Usinggo test -bench=. -run=xxx ↩
Specs:
goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
Order of function execution was changed a few times.