<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Beta Ziliani</title>
    <description>The latest articles on DEV Community by Beta Ziliani (@betaziliani).</description>
    <link>https://dev.to/betaziliani</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F817736%2Fca3281c9-dd29-4b4c-9164-8853de44c8eb.jpeg</url>
      <title>DEV Community: Beta Ziliani</title>
      <link>https://dev.to/betaziliani</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/betaziliani"/>
    <language>en</language>
    <item>
      <title>Yes, Ruby is fast, but…</title>
      <dc:creator>Beta Ziliani</dc:creator>
      <pubDate>Thu, 09 May 2024 13:53:31 +0000</pubDate>
      <link>https://dev.to/betaziliani/yes-ruby-is-fast-but-1l49</link>
      <guid>https://dev.to/betaziliani/yes-ruby-is-fast-but-1l49</guid>
      <description>&lt;p&gt;John Hawthorn wrote &lt;a href="https://www.johnhawthorn.com/2024/ruby-might-be-faster-than-you-think/" rel="noopener noreferrer"&gt;a nice post&lt;/a&gt; discussing a recent tool to &lt;a href="https://github.com/wouterken/crystalruby" rel="noopener noreferrer"&gt;incorporate Crystal into your Ruby app&lt;/a&gt;. While JH makes an important point, the post overlooks certain aspects that are worth considering. I'll discuss Crystal's &lt;em&gt;real&lt;/em&gt; performance and benefits, highlighting why such Ruby/Crystal integration is an indispensable tool to have on the bench.&lt;/p&gt;

&lt;p&gt;This is also a structured presentation of some comments made on &lt;a href="https://news.ycombinator.com/item?id=40152029" rel="noopener noreferrer"&gt;the Hacker News post&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  tl;dr
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;JH makes the case that Ruby has a just-in-time compiler, and that optimizing the Ruby version of the code yields a great performance improvement.&lt;/li&gt;
&lt;li&gt;Crystal code doesn't need wrestling to be optimal.&lt;/li&gt;
&lt;li&gt;The comparison is performed within Ruby, that is, incorporating the cost of calling Crystal within Ruby.&lt;/li&gt;
&lt;li&gt;Pure Crystal shows something radically different!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Yes, Ruby is fast
&lt;/h2&gt;

&lt;p&gt;The first point I want to make is that JH is right: we need to be fair to Ruby's JIT compiler (&lt;code&gt;--yjit&lt;/code&gt;), and only consider benchmarks that include it. And, indeed, with it, Ruby gets very nice performance.&lt;/p&gt;

&lt;p&gt;And let me be blunt here: &lt;strong&gt;I love Ruby!&lt;/strong&gt; Ruby is one of my top 5 languages of choice. A great community, including many big companies, agrees, so I expect Ruby's performance will only increase with time as more improvements get incorporated into it.&lt;/p&gt;

&lt;p&gt;🔴 &lt;strong&gt;First point:&lt;/strong&gt; Ruby's YJIT is fast!&lt;/p&gt;

&lt;h2&gt;
  
  
  The real performance of JITs and Crystal
&lt;/h2&gt;

&lt;p&gt;Let's compare the execution of Ruby's YJIT, Python PyPy (another JIT compiler), and pure Crystal (that is, without the integration).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ruby:&lt;/strong&gt; On my computer, the numbers for Ruby's YJIT are on par with those in the post. Each line corresponds to one of the optimizations proposed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ruby &lt;span class="nt"&gt;--yjit&lt;/span&gt; fib.rb
       user     system      total        real
   3.464166   0.022979   3.487145 &lt;span class="o"&gt;(&lt;/span&gt;  3.491493&lt;span class="o"&gt;)&lt;/span&gt;
   1.705869   0.002169   1.708038 &lt;span class="o"&gt;(&lt;/span&gt;  1.710117&lt;span class="o"&gt;)&lt;/span&gt;
   0.187083   0.000318   0.187401 &lt;span class="o"&gt;(&lt;/span&gt;  0.187578&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
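
&lt;p&gt;For reference, the last of those timings is for a simple while loop. A minimal sketch of what such a benchmark might look like in Ruby follows; it is not the exact code from the post or the gist, and the names &lt;code&gt;RUNS&lt;/code&gt; and &lt;code&gt;sum&lt;/code&gt; are my own placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;require "benchmark"

# Assumption: a million runs, matching the "million runs" mentioned in this post.
RUNS = 1_000_000

# Iterative Fibonacci with a plain while loop.
# Assumes n is a non-negative Integer.
def fib(n)
  a = 0
  b = 1
  i = 0
  while i != n
    a, b = b, a + b
    i += 1
  end
  a
end

# Accumulate the results so the work cannot be considered dead code.
sum = 0
Benchmark.bm do |x|
  x.report { RUNS.times { sum += fib(45) } }
end
puts sum
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run it with &lt;code&gt;ruby --yjit&lt;/code&gt; to include the JIT, as in the timings above.&lt;/p&gt;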



&lt;p&gt;&lt;strong&gt;Python:&lt;/strong&gt; My Python-fu is limited, so I only ported the last problem (a simple while loop) and ran it with &lt;a href="https://pypy.org" rel="noopener noreferrer"&gt;PyPy&lt;/a&gt;. It takes a bit less time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;  pypy fib.py
0.12447810173
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Crystal:&lt;/strong&gt; When we compile the code with &lt;code&gt;--release&lt;/code&gt;, the numbers are negligible! Not only that, I've added some extra code to make sure the optimizations weren't throwing away important code. So not only do I calculate the Fibonacci number of 45 (using a UInt128, to stretch this even further), but I also print the sum of the million runs!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; crystal build &lt;span class="nt"&gt;--release&lt;/span&gt; fib.cr&lt;span class="p"&gt;;&lt;/span&gt; ./fib
        user     system      total        real
  1134903170000000
  0.000002   0.000004   0.000006 &lt;span class="o"&gt;(&lt;/span&gt;  0.000004&lt;span class="o"&gt;)&lt;/span&gt;
  1134903170000000
  0.000001   0.000002   0.000003 &lt;span class="o"&gt;(&lt;/span&gt;  0.000003&lt;span class="o"&gt;)&lt;/span&gt;
  1134903170000000
  0.000002   0.000002   0.000004 &lt;span class="o"&gt;(&lt;/span&gt;  0.000003&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;⚫ &lt;strong&gt;Second point:&lt;/strong&gt; Pure Crystal is &lt;em&gt;really, really&lt;/em&gt; fast in this benchmark!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Reference:&lt;/em&gt; The code I'm using for the benchmarks is listed in this &lt;a href="https://gist.github.com/beta-ziliani/0f815f205e7f5789fbb35653ec17d1a7" rel="noopener noreferrer"&gt;gist&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; As mentioned, the Crystal version uses a primitive number type (&lt;code&gt;UInt128&lt;/code&gt;). That explains a lot of the performance difference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Crystal compilation optimizes your code
&lt;/h2&gt;

&lt;p&gt;In the timings of the Crystal programs, the first one takes a couple more microseconds. However, if we swap the order in which the examples are run, the output is identical: the first one, whichever that is, takes a few microseconds more.&lt;/p&gt;

&lt;p&gt;In conclusion, none of the proposed changes to the Ruby version of the code makes a dent in the Crystal version. This is not entirely Crystal's doing: it uses the &lt;a href="https://llvm.org" rel="noopener noreferrer"&gt;LLVM&lt;/a&gt; backend, which generates very optimized binaries.&lt;/p&gt;

&lt;p&gt;Quite frankly, I'm puzzled as to why Ruby's YJIT doesn't optimize this as well. Perhaps it will get there with time (I tested Ruby 3.3.1).&lt;/p&gt;

&lt;p&gt;⚫ &lt;strong&gt;Third point:&lt;/strong&gt; Crystal code is fast, even without tweaks&lt;/p&gt;

&lt;h2&gt;
  
  
  Maybe it's the plumbing that's slow?
&lt;/h2&gt;

&lt;p&gt;Doesn't seem so. But to understand why, we need to discuss an important point: by default, the integration compiles the Crystal code without the &lt;code&gt;--release&lt;/code&gt; flag. This makes sense: during development, you don't want the compilation to take a lot of time. Compiling in release mode makes efficient binaries, but at the cost of significantly increasing the compilation time.&lt;/p&gt;

&lt;p&gt;When I tested the Prime Counting from the README file of the &lt;a href="https://github.com/wouterken/crystalruby" rel="noopener noreferrer"&gt;crystalruby page&lt;/a&gt;, using release mode, the time it takes to run the Crystal code is the same as the one from pure Crystal. For that, one needs to add the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="no"&gt;CrystalRuby&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debug&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kp"&gt;false&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So perhaps the timings from the Fibonacci example would look the same as with pure Crystal. I say &lt;em&gt;perhaps&lt;/em&gt; because I stumbled across an &lt;a href="https://github.com/wouterken/crystalruby/issues/11" rel="noopener noreferrer"&gt;issue&lt;/a&gt; that made the integration unusable on that particular example.&lt;/p&gt;

&lt;p&gt;🔴⚫ &lt;strong&gt;Fourth point:&lt;/strong&gt; The integration doesn't produce efficient Crystal code by default.&lt;/p&gt;

&lt;h2&gt;
  
  
  Crystal/Ruby integration revisited
&lt;/h2&gt;

&lt;p&gt;Crystal and Ruby are two wonderful languages, each with their pros and cons. Crystal's performance and low memory footprint are hardly contested, and can be studied further in &lt;a href="https://programming-language-benchmarks.vercel.app/" rel="noopener noreferrer"&gt;the benchmarks of language and compilers&lt;/a&gt; (but be critical about benchmarks!).&lt;/p&gt;

&lt;p&gt;Performance is not the only advantage of Crystal: its typechecker is another benefit that teams might want to use for safety-critical parts of an application. Or maybe there is an interesting shard to call from a gem… Whatever the reason, integrating Crystal code into Ruby is a very appealing tool to have in the dev toolbox.&lt;/p&gt;

&lt;p&gt;It is common to call C functions from Ruby or Crystal. It's interesting to know that there are alternatives to bridge these two languages that share the same goal of writing beautiful programs, using a similar syntax. The mentioned &lt;a href="https://github.com/wouterken/crystalruby" rel="noopener noreferrer"&gt;crystalruby&lt;/a&gt; gem allows interfacing Ruby programs with Crystal, and the shard &lt;a href="https://github.com/Anyolite/anyolite" rel="noopener noreferrer"&gt;anyolite&lt;/a&gt; allows calling Ruby programs from Crystal.&lt;/p&gt;

&lt;p&gt;🔴⚫ &lt;strong&gt;Fifth point:&lt;/strong&gt; Ruby + Crystal FTW! ❤️&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EDIT:&lt;/strong&gt; I was asked a very good question twice: how do we know LLVM isn't optimizing it &lt;em&gt;so much&lt;/em&gt; that it just replaces the call to the Fibonacci function with the result? After all, the argument is fixed, so it can calculate the result at compile time and just place it there.&lt;/p&gt;

&lt;p&gt;I missed this point in the post, although I originally thought about it. At the time of writing, I tried adding &lt;code&gt;45 + rand(1)&lt;/code&gt; as the argument. This ensures the argument is not a literal number. It certainly impacts the overall performance, and it now takes 1ms. Still very good, because it also counts the calls to &lt;code&gt;rand&lt;/code&gt;! This is why I didn't see a problem and forgot to add this to the article.&lt;/p&gt;

&lt;p&gt;However, on further inspection of the LLVM-generated code, I found more! It optimizes the code nevertheless! It produces a sum of 1134903170 (result of &lt;code&gt;fib(45)&lt;/code&gt;) with the million calls to &lt;code&gt;rand(1)&lt;/code&gt;! I was totally mind-blown by this. In any case, point to LLVM, and to Crystal for using it!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EDIT 2:&lt;/strong&gt; GitHub user &lt;code&gt;@petr-fischer&lt;/code&gt; suggested taking the argument from the command line, in order to force LLVM to not optimize that much. With that change, the times change significantly; in particular, we can see a difference from the second to the third version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;        user     system      total        real
    0.034982   0.000266   0.035248 &lt;span class="o"&gt;(&lt;/span&gt;  0.035400&lt;span class="o"&gt;)&lt;/span&gt;
    0.034268   0.000134   0.034402 &lt;span class="o"&gt;(&lt;/span&gt;  0.034522&lt;span class="o"&gt;)&lt;/span&gt;
    0.023234   0.000140   0.023374 &lt;span class="o"&gt;(&lt;/span&gt;  0.023607&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
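
&lt;p&gt;To illustrate the shape of that change, here is a rough Ruby sketch that reads the argument at run time instead of hard-coding it. It is not the Crystal code behind the timings above, and since Ruby does not go through LLVM it does not demonstrate the constant folding itself. The helper &lt;code&gt;parse_args&lt;/code&gt; and its defaults (45 and a million runs) are assumptions of mine:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;require "benchmark"

# Iterative Fibonacci; assumes n is a non-negative Integer.
def fib(n)
  a = 0
  b = 1
  i = 0
  while i != n
    a, b = b, a + b
    i += 1
  end
  a
end

# Hypothetical helper: reads n and the number of runs from an argument list.
# The defaults are assumptions, used only when no arguments are given.
def parse_args(argv)
  n = Integer(argv.fetch(0, "45"))
  runs = Integer(argv.fetch(1, "1000000"))
  [n, runs]
end

if __FILE__ == $PROGRAM_NAME
  # n is only known at run time, so it cannot be treated as a literal.
  n, runs = parse_args(ARGV)
  sum = 0
  elapsed = Benchmark.realtime { runs.times { sum += fib(n) } }
  puts "sum=#{sum} elapsed=#{elapsed.round(6)}s"
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;It could be invoked as &lt;code&gt;ruby --yjit fib_arg.rb 45&lt;/code&gt;, where &lt;code&gt;fib_arg.rb&lt;/code&gt; is a made-up file name.&lt;/p&gt;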



&lt;p&gt;I don't think the takeaways are any different: we're still talking of a significant reduction w.r.t. the Ruby or Python versions. And as mentioned already, let me stress that a big part of this is using a primitive type (check &lt;a href="https://crystal-lang.org/2016/07/15/fibonacci-benchmark/" rel="noopener noreferrer"&gt;this post by Ary&lt;/a&gt; that George Dietrich recommended in the &lt;a href="https://forum.crystal-lang.org/t/the-crystalruby-gem-through-the-crystal-lens/6832/1" rel="noopener noreferrer"&gt;forum&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ohtj11a948ylyepiptq.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ohtj11a948ylyepiptq.jpeg" alt="A generated image of a red polyhedron with some black tints" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ruby</category>
      <category>crystal</category>
      <category>performance</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
