<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Assel Meher</title>
    <description>The latest articles on DEV Community by Assel Meher (@segflow).</description>
    <link>https://dev.to/segflow</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F321280%2F4ddc88ca-dd65-4a01-a8ec-7f9ebe6eb687.png</url>
      <title>DEV Community: Assel Meher</title>
      <link>https://dev.to/segflow</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/segflow"/>
    <language>en</language>
    <item>
      <title>My journey optimizing the Go Compiler</title>
      <dc:creator>Assel Meher</dc:creator>
      <pubDate>Tue, 28 Apr 2020 17:01:58 +0000</pubDate>
      <link>https://dev.to/segflow/my-journey-optimizing-the-go-compiler-46jc</link>
      <guid>https://dev.to/segflow/my-journey-optimizing-the-go-compiler-46jc</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Original post: &lt;a href="https://segflow.github.io/post/go-compiler-optimization/"&gt;https://segflow.github.io/post/go-compiler-optimization/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At &lt;a href="https://edge.network/"&gt;EDGE&lt;/a&gt; we write a lot of Go, and we love it for various reasons, one of them being speed. One day I got into a situation where I need to assign an &lt;code&gt;int&lt;/code&gt; to a variable based on another string value. &lt;/p&gt;

&lt;p&gt;Sounds easy right? well yes, but this particular use case awakened the beast in me and made me think what's the &lt;strong&gt;best&lt;/strong&gt; way to do it. &lt;/p&gt;

&lt;p&gt;The journey finished by me contributing to the language compiler and make &lt;code&gt;map&lt;/code&gt; lookups faster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Situation
&lt;/h2&gt;

&lt;p&gt;Our binaries can be found in 3 flavors, &lt;code&gt;amd64&lt;/code&gt;, &lt;code&gt;arm64&lt;/code&gt;, and &lt;code&gt;arm&lt;/code&gt;. Sometimes a running binary needs to know what is its architecture, for example when pulling images/other binaries if the current running binary is an &lt;code&gt;amd64&lt;/code&gt; binary then we should use the &lt;code&gt;amd64&lt;/code&gt; repository or registry.&lt;/p&gt;

&lt;p&gt;In Go that's easy. The &lt;a href="https://golang.org/pkg/runtime/#pkg-constants"&gt;constant&lt;/a&gt; &lt;code&gt;runtime.GOARCH&lt;/code&gt; gives us the running program's architecture.&lt;/p&gt;

&lt;p&gt;In one case, I needed to assign an &lt;code&gt;int&lt;/code&gt; to a variable based on the value of &lt;code&gt;runtime.GOARCH&lt;/code&gt;. And the code below does exactly that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;archIndex&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;
&lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GOARCH&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s"&gt;"amd64"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; 
        &lt;span class="n"&gt;archIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s"&gt;"arm64"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; 
        &lt;span class="n"&gt;archIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s"&gt;"arm"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; 
        &lt;span class="n"&gt;archIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;But I didn't want it to be that way because the day we support another architecture I need to add another &lt;code&gt;case&lt;/code&gt; clause and that didn't feel right to me. &lt;/p&gt;

&lt;p&gt;It's a simple value mapping and I though using a &lt;code&gt;map&lt;/code&gt; followed by a lookup would be better. Below was the &lt;code&gt;map&lt;/code&gt; based solution:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;archIndex&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"amd64"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"arm"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"arm64"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}[&lt;/span&gt;&lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GOARCH&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Problem
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;map&lt;/code&gt; based solution felt more readable and maintainable to me but I was curious which solution was faster? &lt;/p&gt;

&lt;p&gt;&lt;em&gt;The code is not in a hot path, and micro-optimizing it is not needed. But still wanted to know what's faster.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To satisfy my curiosity, I benchmarked both approaches.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;goos: darwin
goarch: amd64
BenchmarkMapImpl-4     19503195       58.0 ns/op
BenchmarkSwitchImpl-4  1000000000     0.648 ns/op
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Turns out the &lt;code&gt;map&lt;/code&gt; based solution is 96 times slower than the &lt;code&gt;switch&lt;/code&gt; based one. To understand why it's the case I start analyzing the generated code for both approaches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compiler generated code
&lt;/h2&gt;

&lt;p&gt;Like any other language compiler, to generate the final output the Go compiler will pass through various phases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scanning&lt;/strong&gt;: Scans the source code and split it into &lt;strong&gt;tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parsing&lt;/strong&gt;: Parses those &lt;strong&gt;tokens&lt;/strong&gt; and build the Abstract Syntax Tree (AST). Also checks that the code is a valid Go code (type checking etc..)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code generating&lt;/strong&gt;: convert the &lt;strong&gt;AST&lt;/strong&gt; to a lower-level representation of the program, specifically into a Static Single Assignment (SSA) form&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At the end of the &lt;strong&gt;parsing&lt;/strong&gt; phase, we are certain the program is valid Go code. The interesting phase for our case is the last one. &lt;/p&gt;

&lt;p&gt;The code generation phase takes the &lt;strong&gt;AST&lt;/strong&gt;, applies some optimization to the &lt;strong&gt;AST&lt;/strong&gt; itself by re-writing it, and then convert it into an &lt;strong&gt;SSA&lt;/strong&gt; form. After the initial version of the &lt;strong&gt;SSA&lt;/strong&gt; has been generated, several optimization passes will be applied like "dead code elimination", "constant propagation" and "bound check elimination"&lt;/p&gt;

&lt;p&gt;We can see the work of each optimizer and the final &lt;strong&gt;SSA&lt;/strong&gt; for our function by running this command&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GOSSAFUNC=switchImplementation go tool compile benchmark_test.go
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The command generates a html file &lt;a href="https://segflow.github.io/code/go-compiler-optimization/switch-ssa.html"&gt;ssa.html&lt;/a&gt; showing the generated &lt;strong&gt;SSA&lt;/strong&gt; for the function &lt;code&gt;switchImplementation&lt;/code&gt;. &lt;/p&gt;

&lt;h3&gt;
  
  
  switch based implementation
&lt;/h3&gt;

&lt;p&gt;The final SSA form for our &lt;code&gt;switchImplementation&lt;/code&gt; function looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;00000 (8) TEXT "".switchImplementation(SB), ABIInternal
00001 (8) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
00002 (8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
00003 (8) FUNCDATA $2, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
00004 (+11) PCDATA $0, $0
00005 (+11) PCDATA $1, $0

00006 (+11) MOVQ $0, "".~r0(SP)

00007 (11) RET
00008 (?) END
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The first block is the function epilogue where mainly a stack frame needs to be allocated. The second one is the body, and the final block is the function prologue where the functions need to return to its caller. &lt;/p&gt;

&lt;p&gt;The function body in our case is a simple move instruction which moves 0 to the ~&lt;code&gt;r0&lt;/code&gt; registry. So the function is only returning 0 immediately there is nothing else. To confirm this I generated the SSA for the following function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func return0() int {
    return 0
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;And the final generated code is exactly the same as you can see it &lt;a href="https://segflow.github.io/code/go-compiler-optimization/return0-ssa.html"&gt;here&lt;/a&gt;. And that's why it's so fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  map based implementation
&lt;/h3&gt;

&lt;p&gt;As for the SSA form of the &lt;code&gt;mapImplementation&lt;/code&gt; function, it's longer, I annotated it so it's easier to understand what's happening.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;00000 (31) TEXT "".mapImplementation(SB), ABIInternal
00001 (31) FUNCDATA $0, gclocals·7d2d5fca80364273fb07d5820a76fef4(SB)
00002 (31) FUNCDATA $1, gclocals·b9237f7ca55cc8bf6e05646631ad00ce(SB)
00003 (31) FUNCDATA $2, gclocals·a5ed3e65458aadaa1d48863859d2a323(SB)
00004 (31) FUNCDATA $3, "".mapImplementation.stkobj(SB)
00005 (+32) PCDATA $0, $0
00006 (+32) PCDATA $1, $1
00007 (+32) XORPS X0, X0
00008 (32) MOVUPS X0, ""..autotmp_2-256(SP)
00009 (32) MOVUPS X0, ""..autotmp_2-240(SP)
00010 (32) MOVUPS X0, ""..autotmp_2-224(SP)
00011 (32) PCDATA $0, $1
00012 (32) PCDATA $1, $2
00013 (32) LEAQ ""..autotmp_3-208(SP), DI
00014 (32) PCDATA $0, $0
00015 (32) LEAQ -48(DI), DI
00016 (32) DUFFZERO $239
00017 (32) PCDATA $0, $2
00018 (32) PCDATA $1, $1
00019 (32) LEAQ ""..autotmp_3-208(SP), AX
00020 (32) PCDATA $0, $0
00021 (32) MOVQ AX, ""..autotmp_2-240(SP)
00022 (32) CALL runtime.fastrand(SB)
00023 (32) MOVL (SP), AX
00024 (32) MOVL AX, ""..autotmp_2-244(SP)
00025 (33) PCDATA $0, $2
00026 (+33) LEAQ type.map[string]int(SB), AX
00027 (33) PCDATA $0, $0
00028 (33) MOVQ AX, (SP)
00029 (33) PCDATA $0, $3
00030 (33) LEAQ ""..autotmp_2-256(SP), CX
00031 (33) PCDATA $0, $0
00032 (33) MOVQ CX, 8(SP)
00033 (33) PCDATA $0, $4
00034 (33) LEAQ go.string."amd64"(SB), DX
00035 (33) PCDATA $0, $0
00036 (33) MOVQ DX, 16(SP)
00037 (33) MOVQ $5, 24(SP)
00038 (+33) CALL runtime.mapassign_faststr(SB)    // assign "amd64" key
00039 (33) PCDATA $0, $2
00040 (33) MOVQ 32(SP), AX
00041 (33) PCDATA $0, $0
00042 (33) MOVQ $0, (AX)                          // assign "0" value
00043 (34) PCDATA $0, $2
00044 (+34) LEAQ type.map[string]int(SB), AX
00045 (34) PCDATA $0, $0
00046 (34) MOVQ AX, (SP)
00047 (34) PCDATA $0, $3
00048 (34) LEAQ ""..autotmp_2-256(SP), CX
00049 (34) PCDATA $0, $0
00050 (34) MOVQ CX, 8(SP)
00051 (34) PCDATA $0, $4
00052 (34) LEAQ go.string."arm"(SB), DX
00053 (34) PCDATA $0, $0
00054 (34) MOVQ DX, 16(SP)
00055 (34) MOVQ $3, 24(SP)
00056 (+34) CALL runtime.mapassign_faststr(SB)    // assign "arm" key
00057 (34) PCDATA $0, $2
00058 (34) MOVQ 32(SP), AX
00059 (34) PCDATA $0, $0
00060 (34) MOVQ $1, (AX)                          // assign "1" value
00061 (35) PCDATA $0, $2
00062 (+35) LEAQ type.map[string]int(SB), AX
00063 (35) PCDATA $0, $0
00064 (35) MOVQ AX, (SP)
00065 (35) PCDATA $0, $3
00066 (35) LEAQ ""..autotmp_2-256(SP), CX
00067 (35) PCDATA $0, $0
00068 (35) MOVQ CX, 8(SP)
00069 (35) PCDATA $0, $4
00070 (35) LEAQ go.string."arm64"(SB), DX
00071 (35) PCDATA $0, $0
00072 (35) MOVQ DX, 16(SP)
00073 (35) MOVQ $5, 24(SP)
00074 (+35) CALL runtime.mapassign_faststr(SB)    // assign "arm64" key
00075 (35) PCDATA $0, $2
00076 (35) MOVQ 32(SP), AX
00077 (35) PCDATA $0, $0
00078 (35) MOVQ $2, (AX)                          // assign "2" value
00079 (36) PCDATA $0, $2
00080 (+36) LEAQ type.map[string]int(SB), AX
00081 (36) PCDATA $0, $0
00082 (36) MOVQ AX, (SP)
00083 (36) PCDATA $0, $2
00084 (36) PCDATA $1, $0
00085 (36) LEAQ ""..autotmp_2-256(SP), AX
00086 (36) PCDATA $0, $0
00087 (36) MOVQ AX, 8(SP)
00088 (36) PCDATA $0, $2
00089 (36) LEAQ go.string."amd64"(SB), AX
00090 (36) PCDATA $0, $0
00091 (36) MOVQ AX, 16(SP)
00092 (36) MOVQ $5, 24(SP)
00093 (+36) CALL runtime.mapaccess1_faststr(SB)  // perform the map lookup
00094 (36) PCDATA $0, $2
00095 (36) MOVQ 32(SP), AX
00096 (36) PCDATA $0, $0
00097 (36) MOVQ (AX), AX
00098 (+32) MOVQ AX, "".~r0(SP)
00099 (+36) RET
00100 (?) END
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The reason behind this is the fact that the generated code is building the map which requires allocating it, assign the different values, and then doing a lookup. &lt;/p&gt;

&lt;h2&gt;
  
  
  Constant folding
&lt;/h2&gt;

&lt;p&gt;The reason why the switch implementation is similar to a &lt;code&gt;return 0&lt;/code&gt; is something called &lt;code&gt;constant folding&lt;/code&gt;. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Constant folding is the process of recognizing and evaluating constant expressions at compile time rather than computing them at runtime - Wikipedia&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We know that &lt;code&gt;runtime.GOARCH&lt;/code&gt; is a constant, so not only its value cannot change but also it's known at compile time. The compiler can use this two properties to evaluate constant expression at compile time instead of doing that when running, in our case the compiler knew which of the &lt;code&gt;case&lt;/code&gt; clauses is true so it deleted the conditional structure and replaced it with a naked &lt;code&gt;return 0&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This was not the case on the &lt;code&gt;map&lt;/code&gt; based implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implement the optimization
&lt;/h2&gt;

&lt;p&gt;Our map lookup looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"amd64"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"arm"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"arm64"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}[&lt;/span&gt;&lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GOARCH&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This is represented in the AST using an &lt;code&gt;INDEXMAP&lt;/code&gt; node. The &lt;code&gt;INDEXMAP&lt;/code&gt; has two childs &lt;code&gt;left&lt;/code&gt; and &lt;code&gt;right&lt;/code&gt; (remember it's a tree).&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;left&lt;/code&gt; child is the map we will lookup from, and the &lt;code&gt;right&lt;/code&gt; child is the key we are looking for. Both childs are also nodes, for example the &lt;code&gt;right&lt;/code&gt; node can be a &lt;code&gt;FUNCCALL&lt;/code&gt; node for a lookup like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"amd64"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"arm"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"arm64"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}[&lt;/span&gt;&lt;span class="n"&gt;aRandomFunc&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;At compile time, we can check if both &lt;code&gt;right&lt;/code&gt; and &lt;code&gt;left&lt;/code&gt; nodes are constant, if they are, we see if what are we looking for (the key), is defined in the constant map, and if it's the case we replace the &lt;code&gt;INDEXMAP&lt;/code&gt; node in the AST by the value of that key. This will replace all lookups on maps where the map is an &lt;code&gt;OMAPLIT&lt;/code&gt; and the key is a constant with a constant if possible.&lt;/p&gt;

&lt;p&gt;This optimization is applied directly to the AST and not the SSA form. This type of AST optimization is implemented inside the &lt;code&gt;walk&lt;/code&gt; function. &lt;/p&gt;

&lt;p&gt;The PR with this optimization can be seen here: &lt;a href="https://go-review.googlesource.com/c/go/+/208323"&gt;https://go-review.googlesource.com/c/go/+/208323&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The new generated SSA with that optimization can be found &lt;a href="https://segflow.github.io/code/go-compiler-optimization/optimized-map-ssa.html"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now if we benchmark both implementations using the Go compiler from that branch we see that both are similar. They are both similar to our &lt;code&gt;return 0&lt;/code&gt; function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;BenchmarkSwitchImpl-4           1000000000               0.599 ns/op           0 B/op          0 allocs/op
BenchmarkMapImpl-4              1000000000               0.612 ns/op           0 B/op          0 allocs/op
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The PR is not merged yet, hopefully soon, it got added to Go 1.15 milestone which should be released in a month.&lt;/p&gt;

&lt;p&gt;Huge thanks to everyone in the #compilers channel in &lt;a href="https://invite.slack.golangbridge.org/"&gt;Gophers&lt;/a&gt; slack&lt;/p&gt;

</description>
      <category>go</category>
      <category>compiler</category>
      <category>optimization</category>
      <category>ast</category>
    </item>
  </channel>
</rss>
