<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Taras Tsugrii</title>
    <description>The latest articles on DEV Community by Taras Tsugrii (@ttsugrii).</description>
    <link>https://dev.to/ttsugrii</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F620327%2F44957fc7-130a-4ca7-8454-a837a735f20d.jpg</url>
      <title>DEV Community: Taras Tsugrii</title>
      <link>https://dev.to/ttsugrii</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ttsugrii"/>
    <language>en</language>
    <item>
      <title>EIP-55: Mixed-case checksum address encoding.</title>
      <dc:creator>Taras Tsugrii</dc:creator>
      <pubDate>Sat, 24 Apr 2021 21:32:52 +0000</pubDate>
      <link>https://dev.to/ttsugrii/eip-55-mixed-case-checksum-address-encoding-3ckb</link>
      <guid>https://dev.to/ttsugrii/eip-55-mixed-case-checksum-address-encoding-3ckb</guid>
      <description>&lt;p&gt;Mistakes can get fairly expensive when it comes to dealing with money. Because of this, Bitcoin implemented error detection using checksums to catch attempts to use invalid addresses. Ethereum decided to take a more compositional approach and delegate this functionality to higher-level systems, but while those are being created and adopted, many users are losing their money. Implementing a checksum is fairly easy, but the tricky part is doing it in a backwards-compatible way. Take a minute and try to come up with a way to add checksums to Ethereum addresses without changing them. Hint:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Ethereum Addresses are based on the Hexadecimal format (also base16 or hex). [...]. Ethereum addresses are not case sensitive and can be used as lowercase or uppercase.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Since addresses are case insensitive, what if we use casing to embed checksum information? That's exactly what Vitalik &lt;a href="https://github.com/ethereum/EIPs/blob/master/EIPS/eip-55.md"&gt;proposed in EIP-55&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;checksum_encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="c1"&gt;# Takes a 20-byte binary address as input
&lt;/span&gt;    &lt;span class="n"&gt;hex_addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;checksummed_buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;

    &lt;span class="c1"&gt;# Treat the hex address as ascii/utf-8 for keccak256 hashing
&lt;/span&gt;    &lt;span class="n"&gt;hashed_address&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;eth_utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keccak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;hex_addr&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nb"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Iterate over each character in the hex address
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;nibble_index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;character&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hex_addr&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;character&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="s"&gt;"0123456789"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# We can't upper-case the decimal digits
&lt;/span&gt;            &lt;span class="n"&gt;checksummed_buffer&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;character&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;character&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="s"&gt;"abcdef"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Check if the corresponding hex digit (nibble) in the hash is 8 or higher
&lt;/span&gt;            &lt;span class="n"&gt;hashed_address_nibble&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hashed_address&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;nibble_index&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;hashed_address_nibble&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;checksummed_buffer&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;character&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;checksummed_buffer&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;character&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="n"&gt;eth_utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ValidationError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"Unrecognized hex character &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;character&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; at position &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;nibble_index&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"0x"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;checksummed_buffer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Basically, we compute a checksum of the lowercase hex address using &lt;code&gt;keccak256&lt;/code&gt; and use its hex representation as a mask: a letter of the address is upper-cased only when the hex digit of the checksum at the same position is 8 or higher:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;...&lt;/span&gt;
            &lt;span class="c1"&gt;# Check if the corresponding hex digit (nibble) in the hash is 8 or higher
&lt;/span&gt;            &lt;span class="n"&gt;hashed_address_nibble&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hashed_address&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;nibble_index&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;hashed_address_nibble&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;checksummed_buffer&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;character&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;checksummed_buffer&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;character&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
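
&lt;p&gt;The check a newer client performs can be sketched with the same masking rule. This is a minimal illustration, not code from EIP-55: &lt;code&gt;apply_case_mask&lt;/code&gt; and &lt;code&gt;passes_checksum&lt;/code&gt; are hypothetical names, and the hash is passed in as &lt;code&gt;hash_fn&lt;/code&gt;, which is assumed to map the lowercase hex address to the hex digest of its keccak256 hash (for example via &lt;code&gt;eth_utils.keccak&lt;/code&gt; as above), so the sketch itself needs no third-party imports.&lt;/p&gt;

```python
def apply_case_mask(hex_addr, hash_hex):
    """Upper-case each letter whose matching hash nibble is 8 or higher."""
    return "".join(
        ch.upper() if ch in "abcdef" and int(nibble, 16) >= 8 else ch
        for ch, nibble in zip(hex_addr, hash_hex)
    )


def passes_checksum(address, hash_fn):
    """Return True if the casing of `address` matches the mask derived from its hash.

    `hash_fn` is an assumption of this sketch: it takes the lowercase hex
    string (without "0x") and returns the hex digest of its keccak256 hash.
    """
    body = address[2:] if address.startswith("0x") else address
    lower = body.lower()
    # Re-encode the lowercase form and compare it with what the user typed.
    return body == apply_case_mask(lower, hash_fn(lower))
```

&lt;p&gt;This sketch is strict: it also rejects all-lowercase input unless the mask happens to leave every letter lowercase. Whether to accept such un-checksummed addresses is a policy choice left out here.&lt;/p&gt;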



&lt;p&gt;Older clients ignore the case and don't perform the check, hence the backwards compatibility, while newer clients perform the check and detect errors.&lt;br&gt;
It's fascinating that such a simple, cheap and backwards-compatible technique is this effective:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;On average there will be 15 check bits per address, and the net probability that a randomly generated address if mistyped will accidentally pass a check is 0.0247%. This is a ~50x improvement over ICAP, but not as good as a 4-byte check code.&lt;/p&gt;
&lt;/blockquote&gt;
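
&lt;p&gt;One way to arrive at the quoted figures, as a back-of-the-envelope sketch: assume every hex character of an address is uniformly random, so it is a letter with probability 6/16, and assume the hash of a mistyped address is unrelated to the casing that was typed, so each letter has the right case with probability 1/2. The function names below are hypothetical, and this is my reading of the numbers rather than a derivation taken from the EIP.&lt;/p&gt;

```python
HEX_CHARS = 40        # a 20-byte address is 40 hex characters
LETTER_PROB = 6 / 16  # a-f out of 0-9a-f, assuming uniformly random characters


def expected_check_bits(n_chars=HEX_CHARS):
    """Each letter carries one bit of checksum (its case); digits carry none."""
    return n_chars * LETTER_PROB


def false_accept_probability(n_chars=HEX_CHARS):
    """Chance that a mistyped address passes the case check by accident."""
    # A digit always "matches"; a letter matches a random hash bit half the time.
    per_char = (1 - LETTER_PROB) + LETTER_PROB / 2  # 13/16
    return per_char ** n_chars


print(expected_check_bits())                # 15.0
print(f"{false_accept_probability():.4%}")  # 0.0247%
```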

&lt;p&gt;This underscores the value of a specification, since it defines behavior, and is a great example of a clever way to embed additional information.&lt;/p&gt;

</description>
      <category>ethereum</category>
      <category>blockchain</category>
      <category>checksum</category>
      <category>errorcorrection</category>
    </item>
    <item>
      <title>Trading memory for fewer allocations.</title>
      <dc:creator>Taras Tsugrii</dc:creator>
      <pubDate>Sat, 24 Apr 2021 21:30:10 +0000</pubDate>
      <link>https://dev.to/ttsugrii/trading-memory-for-fewer-allocations-25lm</link>
      <guid>https://dev.to/ttsugrii/trading-memory-for-fewer-allocations-25lm</guid>
      <description>&lt;p&gt;Even though memory allocations are not always easy to spot, they are fairly expensive due to overhead and garbage collector load. In a seemingly innocent &lt;a href="https://golang.design/gossa?id=643aadbf-d10d-4d74-8cf6-e33ec1e73ae1"&gt;function&lt;/a&gt; &lt;code&gt;print_int&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;print_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;the compiler reports (&lt;code&gt;-gcflags="-m"&lt;/code&gt;)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x escapes to heap
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;so the &lt;code&gt;runtime.convT64&lt;/code&gt; function is used to convert &lt;code&gt;x&lt;/code&gt; into a pointer&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;00007 (6) CALL runtime.convT64(SB)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;which is implemented in &lt;a href="https://github.com/golang/go/blob/e8700f1ce6f4103207f470cce443f04377baa600/src/runtime/iface.go#L360-L368"&gt;runtime/iface.go&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;convT64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="n"&gt;unsafe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Pointer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;staticuint64s&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;unsafe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Pointer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;staticuint64s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mallocgc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uint64Type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The most interesting bit is the &lt;code&gt;x = unsafe.Pointer(&amp;amp;staticuint64s[val])&lt;/code&gt; part, which returns a pointer into a &lt;a href="https://github.com/golang/go/blob/e8700f1ce6f4103207f470cce443f04377baa600/src/runtime/iface.go#L521-L555"&gt;preallocated&lt;/a&gt; &lt;code&gt;staticuint64s&lt;/code&gt; array holding the integers from &lt;code&gt;0&lt;/code&gt; to &lt;code&gt;255&lt;/code&gt;, so no allocation is needed for values in that range:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// staticuint64s is used to avoid allocating in convTx for small integer values.&lt;/span&gt;
&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;staticuint64s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="m"&gt;0x00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x02&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x03&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x04&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x06&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x07&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="m"&gt;0x08&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x09&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x0a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x0b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x0c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x0d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x0e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x0f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
    &lt;span class="m"&gt;0xf8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0xf9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0xfa&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0xfb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0xfc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0xfd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0xfe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0xff&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's a fairly cheap way to trade a little memory for fewer allocations and is used in many other managed languages like Java. To make it even more useful, the runtime also reuses the same cache for the &lt;code&gt;convT16&lt;/code&gt; and &lt;code&gt;convT32&lt;/code&gt; functions.&lt;/p&gt;

&lt;p&gt;It's such a useful technique that it's also used for dynamic caches, also known as object pools. The extra flexibility of not being limited to a fairly small range of values comes at the cost of synchronization, though, so it's important to measure this overhead when evaluating object pools.&lt;/p&gt;

&lt;p&gt;In summary, consider using static preallocated caches to reduce allocation count and improve performance.&lt;/p&gt;
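
&lt;p&gt;The same idea can be sketched outside the Go runtime. The Python snippet below is only an illustration of the technique, not a port of &lt;code&gt;convT64&lt;/code&gt;: &lt;code&gt;Box&lt;/code&gt;, &lt;code&gt;STATIC_BOXES&lt;/code&gt; and &lt;code&gt;to_box&lt;/code&gt; are hypothetical names, and &lt;code&gt;Box&lt;/code&gt; simply stands in for a heap-allocated wrapper around an integer.&lt;/p&gt;

```python
class Box:
    """Hypothetical heap-allocated wrapper standing in for a boxed integer."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


# Allocate the boxes for 0..255 once, up front (mirrors the 256-entry table).
STATIC_BOXES = tuple(Box(v) for v in range(256))


def to_box(value):
    """Return a shared preallocated box for small values, a fresh one otherwise."""
    if value in range(len(STATIC_BOXES)):
        return STATIC_BOXES[value]  # no new allocation
    return Box(value)               # falls back to allocating
```

&lt;p&gt;Sharing is only safe here because callers are expected to treat the boxes as read-only, and no synchronization is needed because the table never changes after it is built.&lt;/p&gt;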

</description>
      <category>performance</category>
      <category>memory</category>
      <category>go</category>
      <category>allocations</category>
    </item>
    <item>
      <title>Don't leave easy performance wins on the table.</title>
      <dc:creator>Taras Tsugrii</dc:creator>
      <pubDate>Sat, 24 Apr 2021 21:26:53 +0000</pubDate>
      <link>https://dev.to/ttsugrii/don-t-leave-easy-performance-wins-on-the-table-2og</link>
      <guid>https://dev.to/ttsugrii/don-t-leave-easy-performance-wins-on-the-table-2og</guid>
      <description>&lt;p&gt;One of the most exciting features of Java 16 is the Vector API (JEP 338), which makes it possible to take advantage of the available SIMD instructions and, by doing so, significantly improve performance.&lt;/p&gt;

&lt;p&gt;When reading an example from the JEP &lt;a href="https://openjdk.java.net/jeps/338?source=:em:nw:mt::::RC_WWMK200429P00043:NSL400139911"&gt;documentation&lt;/a&gt;, I was somewhat shocked to see that a simple scalar computation&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;scalarComputation&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;])&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.0f&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;has to be rewritten as the barely readable&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;VectorSpecies&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Float&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;SPECIES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FloatVector&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;SPECIES_PREFERRED&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;vectorComputation&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;VectorSpecies&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Float&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;species&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;upperBound&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;species&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;loopBound&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;upperBound&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;species&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;//FloatVector va, vb, vc;&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;va&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FloatVector&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromArray&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;species&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;vb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FloatVector&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromArray&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;species&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;vc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;va&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;mul&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;va&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;
                    &lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;mul&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vb&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;
                    &lt;span class="n"&gt;neg&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="n"&gt;vc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;intoArray&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;])&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.0f&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;vectorComputation&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;SPECIES&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;to get the desired vectorized assembly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  0.43%  / │  0x0000000113d43890: vmovdqu 0x10(%r8,%rbx,4),%ymm0
  7.38%  │ │  0x0000000113d43897: vmovdqu 0x10(%r10,%rbx,4),%ymm1
  8.70%  │ │  0x0000000113d4389e: vmulps %ymm0,%ymm0,%ymm0
  5.60%  │ │  0x0000000113d438a2: vmulps %ymm1,%ymm1,%ymm1
 13.16%  │ │  0x0000000113d438a6: vaddps %ymm0,%ymm1,%ymm0
 21.86%  │ │  0x0000000113d438aa: vxorps -0x7ad76b2(%rip),%ymm0,%ymm0
  7.66%  │ │  0x0000000113d438b2: vmovdqu %ymm0,0x10(%r9,%rbx,4)
 26.20%  │ │  0x0000000113d438b9: add    $0x8,%ebx
  6.44%  │ │  0x0000000113d438bc: cmp    %r11d,%ebx
         \ │  0x0000000113d438bf: jl     0x0000000113d43890
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
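
&lt;p&gt;The index arithmetic behind &lt;code&gt;vectorComputation&lt;/code&gt; is easier to see in isolation: &lt;code&gt;species.loopBound(a.length)&lt;/code&gt; rounds the length down to a multiple of the lane count, the first loop walks that prefix one vector at a time, and the second loop handles the scalar tail. A minimal Python sketch of just that split, with hypothetical names and assuming 8 float lanes (as the &lt;code&gt;add $0x8,%ebx&lt;/code&gt; step in the listing above suggests):&lt;/p&gt;

```python
def loop_bound(length, lanes):
    """Largest multiple of `lanes` that does not exceed `length`."""
    return length - length % lanes


def split_work(length, lanes=8):
    """Return the (start, end) ranges handled per vector step and the tail indices."""
    bound = loop_bound(length, lanes)
    vector_chunks = [(i, i + lanes) for i in range(0, bound, lanes)]
    scalar_tail = list(range(bound, length))
    return vector_chunks, scalar_tail


chunks, tail = split_work(19)
print(chunks)  # [(0, 8), (8, 16)]
print(tail)    # [16, 17, 18]
```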



&lt;p&gt;"Phew, great that I don't use Java", thought I and went on to see what would Go do in such case. To my big disappointment, Go does not seem to support SIMD intrinsics and generates &lt;a href="https://golang.design/gossa?id=8a28c38c-73b9-4cfc-b785-d9b7213ecd6b"&gt;non-vectorized assembly&lt;/a&gt; :(&lt;/p&gt;

&lt;p&gt;Convinced that Clang would not disappoint me, I checked the assembly in Compiler Explorer at the highest optimization level and noticed that, even though it performs a lot of useful optimizations, including loop unrolling, it's still &lt;a href="https://godbolt.org/z/vWMMzsaWP"&gt;using&lt;/a&gt; only 128-bit XMM registers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
.LBB0_6: # =&amp;gt;This Inner Loop Header: Depth=1
movups xmm1, xmmword ptr [rsi + 4*rax]
movups xmm2, xmmword ptr [rsi + 4*rax + 16]
mulps xmm1, xmm1
mulps xmm2, xmm2
movups xmm3, xmmword ptr [rdx + 4*rax]
movups xmm4, xmmword ptr [rdx + 4*rax + 16]
mulps xmm3, xmm3
addps xmm3, xmm1
mulps xmm4, xmm4
addps xmm4, xmm2
xorps xmm3, xmm0
xorps xmm4, xmm0
movups xmmword ptr [rcx + 4*rax], xmm3
movups xmmword ptr [rcx + 4*rax + 16], xmm4
add rax, 8
cmp rdi, rax
jne .LBB0_6
cmp rdi, r8
je .LBB0_13
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;but easily &lt;a href="https://godbolt.org/z/Kf7jd1nTr"&gt;switches to 512-bit ZMM registers&lt;/a&gt; when AVX-512 Foundation support is requested via the &lt;code&gt;-mavx512f&lt;/code&gt; flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
.LBB0_8: # =&amp;gt;This Inner Loop Header: Depth=1
vmovups zmm1, zmmword ptr [rsi + 4*rdi]
vmovups zmm2, zmmword ptr [rsi + 4*rdi + 64]
vmovups zmm3, zmmword ptr [rsi + 4*rdi + 128]
vmovups zmm4, zmmword ptr [rsi + 4*rdi + 192]
vmulps zmm1, zmm1, zmm1
vmulps zmm2, zmm2, zmm2
vmulps zmm3, zmm3, zmm3
vmulps zmm4, zmm4, zmm4
vmovups zmm5, zmmword ptr [rdx + 4*rdi]
vmovups zmm6, zmmword ptr [rdx + 4*rdi + 64]
vmovups zmm7, zmmword ptr [rdx + 4*rdi + 128]
vmovups zmm8, zmmword ptr [rdx + 4*rdi + 192]
vmulps zmm5, zmm5, zmm5
vaddps zmm1, zmm1, zmm5
vmulps zmm5, zmm6, zmm6
vaddps zmm2, zmm2, zmm5
vmulps zmm5, zmm7, zmm7
vaddps zmm3, zmm3, zmm5
vmulps zmm5, zmm8, zmm8
vaddps zmm4, zmm4, zmm5
vpxord zmm1, zmm1, zmm0
vpxord zmm2, zmm2, zmm0
vpxord zmm3, zmm3, zmm0
vpxord zmm4, zmm4, zmm0
vmovdqu64 zmmword ptr [rcx + 4*rdi], zmm1
vmovdqu64 zmmword ptr [rcx + 4*rdi + 64], zmm2
vmovdqu64 zmmword ptr [rcx + 4*rdi + 128], zmm3
vmovdqu64 zmmword ptr [rcx + 4*rdi + 192], zmm4
add rdi, 64
cmp rax, rdi
jne .LBB0_8
cmp rax, r8
je .LBB0_19
test r8b, 56
je .LBB0_14
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



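&lt;p&gt;AVX-512 first shipped in Intel's Xeon Phi (Knights Landing) processors in 2016 and reached mainstream Xeon servers with Skylake-SP in 2017; Haswell, which shipped in 2013, introduced AVX2. So in 2021 there is a good chance that your Intel servers have it, but it is worth checking.&lt;/p&gt;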
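
&lt;p&gt;Rather than guessing, you can ask the machine. Below is a minimal sketch in Go, assuming an x86 Linux host where the kernel lists CPU features on the &lt;code&gt;flags&lt;/code&gt; lines of &lt;code&gt;/proc/cpuinfo&lt;/code&gt;. The helper name &lt;code&gt;hasCPUFlag&lt;/code&gt; is made up for this illustration and is not part of any library.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hasCPUFlag (hypothetical helper) reports whether a /proc/cpuinfo-style
// dump lists the given feature, e.g. "avx512f", on any "flags" line.
func hasCPUFlag(cpuinfo, flag string) bool {
	for _, line := range strings.Split(cpuinfo, "\n") {
		key, value, found := strings.Cut(line, ":")
		if !found || strings.TrimSpace(key) != "flags" {
			continue // not a "flags : ..." line
		}
		for _, f := range strings.Fields(value) {
			if f == flag {
				return true
			}
		}
	}
	return false
}

func main() {
	// Assumption: x86 Linux. Other platforms expose CPU features differently.
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Println("cannot read /proc/cpuinfo:", err)
		return
	}
	fmt.Println("avx512f supported:", hasCPUFlag(string(data), "avx512f"))
}
```

&lt;p&gt;If the flag is missing, a binary built with &lt;code&gt;-mavx512f&lt;/code&gt; is likely to crash with an illegal instruction on that host, so the check is worth doing before rolling such a build out.&lt;/p&gt;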

&lt;p&gt;Moral of the story?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Don't leave performance on the table - know your hardware and how to take full advantage of it.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>performance</category>
      <category>java</category>
      <category>vectorization</category>
      <category>simd</category>
    </item>
    <item>
      <title>Trust performance advice... but verify.</title>
      <dc:creator>Taras Tsugrii</dc:creator>
      <pubDate>Sat, 24 Apr 2021 21:24:14 +0000</pubDate>
      <link>https://dev.to/ttsugrii/trust-performance-advice-but-verify-3dj8</link>
      <guid>https://dev.to/ttsugrii/trust-performance-advice-but-verify-3dj8</guid>
      <description>&lt;p&gt;Just like most things, performance advice also has an expiration date and it's important to keep this in mind when applying suggestions from blog posts or StackOverflow. That's why my favorite command line flag when building Go apps is &lt;code&gt;--gcflags="-m=2"&lt;/code&gt;. It enables printing optimizer decisions, making it possible to answer numerous interesting questions affecting performance, including whether function is inlined:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cannot inline main: function too complex: cost 523 exceeds budget 80
...
inlining call to largeReturner func() Large { return Large literal }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or whether a variable escapes to the heap:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;main.go:10:2: i escapes to heap:
main.go:10:2:  flow: ~r0 = &amp;amp;i:
main.go:10:2:    from &amp;amp;i (address-of) at main.go:11:9
main.go:10:2:    from return &amp;amp;i (return) at main.go:11:2
main.go:10:2: moved to heap: i
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
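
&lt;p&gt;To try this yourself, here is a minimal sketch of a program to build with the flag shown earlier. The function names &lt;code&gt;newCounter&lt;/code&gt; and &lt;code&gt;sumLocal&lt;/code&gt; are hypothetical, and the exact wording and positions of the diagnostics depend on your Go version, so treat it as a starting point rather than a reproduction of the output above.&lt;/p&gt;

```go
package main

import "fmt"

// newCounter (hypothetical) returns a pointer that outlives the call,
// so on its own the pointed-to int cannot live in newCounter's frame.
func newCounter() *int {
	i := new(int)
	*i = 42
	return i
}

// sumLocal (hypothetical) only uses buf inside the function,
// so buf is a candidate for staying on the stack.
func sumLocal() int {
	var buf [4]int
	for idx := range buf {
		buf[idx] = idx
	}
	total := 0
	for _, v := range buf {
		total += v
	}
	return total
}

func main() {
	// Passing values to fmt.Println converts them to interfaces,
	// which is another place where the compiler reports escapes.
	fmt.Println(*newCounter(), sumLocal())
}
```

&lt;p&gt;Both helpers are small enough that the compiler may inline them into &lt;code&gt;main&lt;/code&gt;, in which case the escape report for the call site can differ from the report for the function itself. That is a good reason to read the output instead of assuming.&lt;/p&gt;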



&lt;p&gt;&lt;strong&gt;Bonus:&lt;/strong&gt; do you think &lt;code&gt;l&lt;/code&gt; will escape to the heap in the following code?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="m"&gt;24&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Large&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="m"&gt;25&lt;/span&gt;        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;
&lt;span class="m"&gt;26&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="m"&gt;27&lt;/span&gt;
&lt;span class="m"&gt;28&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="m"&gt;29&lt;/span&gt;        &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;Large&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="m"&gt;30&lt;/span&gt;        &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you guessed yes, you are right :)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;main.go:30:13: l escapes to heap:
main.go:30:13:  flow: ~arg0 = &amp;amp;{storage for l}:
main.go:30:13:    from l (spill) at main.go:30:13
main.go:30:13:    from ~arg0 = &amp;lt;N&amp;gt; (assign-pair) at main.go:30:13
main.go:30:13:  flow: {storage for []interface {} literal} = ~arg0:
main.go:30:13:    from []interface {} literal (slice-literal-element) at main.go:30:13
main.go:30:13:  flow: fmt.a = &amp;amp;{storage for []interface {} literal}:
main.go:30:13:    from []interface {} literal (spill) at ./main.go:30:13
main.go:30:13:    from fmt.a = []interface {} literal (assign) at main.go:30:13
main.go:30:13:  flow: {heap} = *fmt.a:
main.go:30:13:    from fmt.Fprintln(io.Writer(os.Stdout), fmt.a...) (call parameter) at main.go:30:13
main.go:30:13: l escapes to heap
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why does it matter? Even if the additional memory indirection is not a problem, heap allocations create extra work for the garbage collector: more frequent collection cycles, the CPU time they consume, and the dreaded stop-the-world pauses that come with them.&lt;/p&gt;
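
&lt;p&gt;The compiler's report can also be cross-checked at run time by counting allocations. The sketch below uses &lt;code&gt;testing.AllocsPerRun&lt;/code&gt; from the standard library and reuses the &lt;code&gt;Large&lt;/code&gt; type from above. The names &lt;code&gt;boxed&lt;/code&gt;, &lt;code&gt;local&lt;/code&gt; and &lt;code&gt;sink&lt;/code&gt; are made up for this illustration.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"testing"
)

type Large struct {
	data [1024]byte
}

// sink is a package-level variable; storing into it forces the value to escape.
var sink interface{}

// boxed converts a Large to an interface and stores it globally,
// so the value has to be copied to the heap.
func boxed() {
	l := Large{}
	l.data[0] = 1
	sink = l
}

// local keeps its Large entirely inside the function.
func local() byte {
	l := Large{}
	l.data[0] = 1
	return l.data[0]
}

func main() {
	// AllocsPerRun reports the average number of heap allocations per call.
	fmt.Println("boxed:", testing.AllocsPerRun(100, boxed))
	fmt.Println("local:", testing.AllocsPerRun(100, func() { _ = local() }))
}
```

&lt;p&gt;With the standard Go toolchain I would expect a non-zero count for &lt;code&gt;boxed&lt;/code&gt; and zero for &lt;code&gt;local&lt;/code&gt;, but the point is to measure rather than to trust that expectation.&lt;/p&gt;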

&lt;p&gt;Moral of the story? Know your tools and use them to verify all performance assumptions.&lt;/p&gt;

</description>
      <category>performance</category>
      <category>go</category>
      <category>compiler</category>
      <category>optimization</category>
    </item>
  </channel>
</rss>
