<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lex Plt</title>
    <description>The latest articles on DEV Community by Lex Plt (@lexplt).</description>
    <link>https://dev.to/lexplt</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F40778%2Fee207d76-6950-4d5e-a11f-9b8f01bfd8db.png</url>
      <title>DEV Community: Lex Plt</title>
      <link>https://dev.to/lexplt</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lexplt"/>
    <language>en</language>
    <item>
      <title>How does quoting work in ArkScript macros?</title>
      <dc:creator>Lex Plt</dc:creator>
      <pubDate>Sun, 18 May 2025 10:00:00 +0000</pubDate>
      <link>https://dev.to/lexplt/how-does-quoting-works-in-arkscript-macros-m6p</link>
      <guid>https://dev.to/lexplt/how-does-quoting-works-in-arkscript-macros-m6p</guid>
      <description>&lt;p&gt;The other night, I was talking about meta programming with other developers, and at one point someone asked how macros could be used to do meta programming. They were probably thinking about &lt;em&gt;C-style macros&lt;/em&gt;, which are powerful but are just text-processing tools.&lt;/p&gt;

&lt;p&gt;Using macros you could have the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="nf"&gt;Compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// doing something&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and transform it into:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="nf"&gt;Compute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="n"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;// doing something&lt;/span&gt;
  &lt;span class="n"&gt;_cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SetValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;h2&gt;
  
  
  💡 Macros
&lt;/h2&gt;

&lt;p&gt;Macros are a tool to manipulate code with code. They let you write generic code and have the compiler do the heavy lifting and monomorphize it.&lt;/p&gt;
&lt;h2&gt;
  
  
  📝 Note
&lt;/h2&gt;

&lt;p&gt;What's &lt;strong&gt;meta programming&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Metaprogramming" rel="noopener noreferrer"&gt;According to Wikipedia&lt;/a&gt;, it's a computer programming technique in which computer programs have the ability to treat other programs as their data. It means that a program can be designed to read, generate, analyse, or transform other programs, and even modify itself, while running. In some cases, this allows programmers to minimize the number of lines of code to express a solution, in turn reducing development time.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then someone asked how we could implement the following Rust code in ArkScript (which does not compile, by the way, because Rust macros are hygienic):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;macro_rules!&lt;/span&gt; &lt;span class="n"&gt;using_a&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$e:expr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="nv"&gt;$e&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;four&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;using_a!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I quickly threw this together, and sure enough, ArkScript let it pass as its macros aren't hygienic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight common_lisp"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$&lt;/span&gt; &lt;span class="nv"&gt;using_a&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;{&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;a&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nv"&gt;e&lt;/span&gt; &lt;span class="nv"&gt;}&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;four&lt;/span&gt; &lt;span class="nv"&gt;{&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;using_a&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;/&lt;/span&gt; &lt;span class="nv"&gt;a&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="nv"&gt;}&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="nv"&gt;four&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="mf"&gt;4.2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Homoiconicity
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;A concept that gets talked about quite frequently in the context of macros.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It is the ability to manipulate code in the language, using code from the same language. Code is data, and data is code. Lisp is the most common homoiconic language, but we can also cite all its dialects (Clojure, Scheme, Racket...), as well as Rebol, and you guessed it, ArkScript.&lt;/p&gt;

&lt;p&gt;Using parentheses to represent S-expressions in Lisp-inspired languages helps, as you can represent your AST in code, and the AST is also data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quoting?
&lt;/h2&gt;

&lt;p&gt;In ArkScript macros, there is no variable capture; you play directly with the AST:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight common_lisp"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$&lt;/span&gt; &lt;span class="nv"&gt;foo&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;@&lt;/span&gt; &lt;span class="nv"&gt;val&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;foo&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nv"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;becomes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As long as you are inside a macro, the AST node isn't evaluated unless it is involved in an expression that can be evaluated at compile time (e.g. &lt;code&gt;(+ 1 arg)&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Alas, this comes at a cost: expressions can be evaluated at compile time when you didn't mean them to be. If the example above were passed something like &lt;code&gt;(+ 1 2)&lt;/code&gt;, it would print 2! To prevent this behavior, I thought I just needed a way to stop macro evaluation, so I added &lt;code&gt;$as-is&lt;/code&gt; to paste nodes in the AST as-is, stopping further macro evaluation on them.&lt;/p&gt;
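
&lt;p&gt;To make that surprise concrete, here is a toy model in Python (not ArkScript's implementation). It assumes an AST node can be pictured as a plain list, and the hypothetical &lt;code&gt;expand_foo&lt;/code&gt; helper mimics what the &lt;code&gt;foo&lt;/code&gt; macro does: it indexes the node it receives instead of running it.&lt;/p&gt;

```python
# Toy model (an assumption for illustration): an AST node is a Python list,
# so (list 1 5) is ["list", 1, 5] and (+ 1 2) is ["+", 1, 2].


def expand_foo(val):
    """Mimic ($ foo (val) (print (@ val 2))): index the node, never run it."""
    return ["print", val[2]]


print(expand_foo(["list", 1, 5]))  # ['print', 5]
print(expand_foo(["+", 1, 2]))     # ['print', 2]
```

&lt;p&gt;In this model, &lt;code&gt;(+ 1 2)&lt;/code&gt; is just a three-element node, so taking element 2 yields &lt;code&gt;2&lt;/code&gt; rather than the sum.&lt;/p&gt;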

&lt;p&gt;This is particularly useful in the testing framework, which relies on many macros. We can write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight common_lisp"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;test:suite&lt;/span&gt; &lt;span class="nv"&gt;name&lt;/span&gt; &lt;span class="nv"&gt;{&lt;/span&gt;
  &lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;test:expect&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;some_computation&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="nv"&gt;}&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works because the &lt;code&gt;test:xxx&lt;/code&gt; macros use &lt;code&gt;($as-is arg)&lt;/code&gt; to escape each argument:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight common_lisp"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$&lt;/span&gt; &lt;span class="nv"&gt;test:expect&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;_cond&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="nv"&gt;_desc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;{&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;!=&lt;/span&gt; &lt;span class="nv"&gt;true&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$as-is&lt;/span&gt; &lt;span class="nv"&gt;_cond&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;testing:_report_error&lt;/span&gt; &lt;span class="nv"&gt;true&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$as-is&lt;/span&gt; &lt;span class="nv"&gt;_cond&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="s"&gt;"true"&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$repr&lt;/span&gt; &lt;span class="nv"&gt;_cond&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;_desc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;testing:_report_success&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="nv"&gt;}&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TL;DR: I reinvented a (less powerful) quote.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally from &lt;a href="https://lexp.lt/posts/quoting_in_arkscript/" rel="noopener noreferrer"&gt;lexp.lt&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>arkscript</category>
      <category>programming</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Instruction source location tracking in ArkScript</title>
      <dc:creator>Lex Plt</dc:creator>
      <pubDate>Sat, 17 May 2025 10:00:00 +0000</pubDate>
      <link>https://dev.to/lexplt/instruction-source-location-tracking-in-arkscript-a89</link>
      <guid>https://dev.to/lexplt/instruction-source-location-tracking-in-arkscript-a89</guid>
      <description>&lt;p&gt;Good error reporting is crucial in programming languages. Doing it at compile time was easy in ArkScript, as we have all the context we need at hand. But since we compile code down to bytecode, which is then run by the virtual machine, we lose a lot of context at runtime: the source files, their paths, their contents are all gone! Our runtime errors could only show the VM's internal state. This article is about how that changed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multiple solutions
&lt;/h2&gt;

&lt;p&gt;I went to the drawing board, and three solutions presented themselves to me:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;create a source location table in the bytecode, mapping an instruction to a file and line;&lt;/li&gt;
&lt;li&gt;emit special instructions that would be skipped by the VM and used only when the VM crashes, backtracking to find the nearest &lt;code&gt;SOURCE&lt;/code&gt; instruction;&lt;/li&gt;
&lt;li&gt;extend the size of instructions to 8 bytes and use the 4 new bytes to track the source file (e.g. 2 bytes for an identifier) and line (2 bytes seemed enough, as that covers around 64k lines per file).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The second one was off the table pretty quickly, because I had a hunch it would hurt performance too much for my liking. It would also disrupt the IR optimizer, and I would have had to detect more instruction patterns, as &lt;code&gt;SOURCE&lt;/code&gt; instructions could be interleaved with optimizable instructions.&lt;/p&gt;

&lt;p&gt;The third solution felt like a lot of work for a small gain, as the extra data would be used only when handling errors. It would also double the size of the bytecode files and constrain the future evolution of the VM, as I wouldn't be able to use those additional 4 bytes for anything else.&lt;/p&gt;

&lt;blockquote&gt;
&lt;h2&gt;
  
  
  📝 Note
&lt;/h2&gt;

&lt;p&gt;As &lt;a href="https://old.reddit.com/r/ProgrammingLanguages/comments/1kcef2l/comment/mq2ibt5/" rel="noopener noreferrer"&gt;Robert Nystrom noted&lt;/a&gt; on Reddit, making the bytecode larger would cause the VM to have more cache misses, making performance worse.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As you might have guessed, I went with the first solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;The source location data is added to each AST node by the parser, and the last compiler pass that can access the AST is the AST lowerer, whose job is to generate the IR, hence it felt logical to add two fields, &lt;code&gt;source_file&lt;/code&gt; and &lt;code&gt;source_line&lt;/code&gt; to the &lt;code&gt;IR::Entity&lt;/code&gt; structure.&lt;/p&gt;

&lt;p&gt;Bonus point: using this source tracking solution, there are (&lt;em&gt;nearly&lt;/em&gt;) 0 modifications to the IR optimizer! &lt;em&gt;Nearly 0&lt;/em&gt;, because I had to add the source location to each optimized IR entity from the compacted IR entities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which instructions should we track?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;All of them!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You might ask yourself, "But wouldn't this make the generated bytecode twice as big, as solution 3 would?", and you're partly right. To make this right, we have to introduce de-duplication!&lt;/p&gt;

&lt;p&gt;The proposed solution was to track every instruction source location, but many instructions would point to the same file and line, as a single statement like &lt;code&gt;(if (&amp;lt; value 5) (print "you can pass!") (go-to-jail))&lt;/code&gt; would involve around 10-12 instructions.&lt;/p&gt;

&lt;p&gt;If we keep track of the source location of the first instruction in a series of instructions from the same file and on the same line, that's more than enough!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;optional&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;InstLocation&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;nullopt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;inst&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hasValidSourceLocation&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// we are guaranteed to have a value since we listed all&lt;/span&gt;
    &lt;span class="c1"&gt;// existing filenames in IRCompiler::process before,&lt;/span&gt;
    &lt;span class="c1"&gt;// thus we do not have to check if std::ranges::find&lt;/span&gt;
    &lt;span class="c1"&gt;// returned a valid iterator.&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;file_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;m_filenames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;begin&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ranges&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m_filenames&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;())));&lt;/span&gt;

    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;optional&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;internal&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;InstLoc&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;nullopt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;locations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;empty&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
      &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;locations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;back&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

     &lt;span class="c1"&gt;// skip redundant instruction location&lt;/span&gt;
     &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
           &lt;span class="n"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;has_value&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
           &lt;span class="n"&gt;prev&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;filename_id&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;file_id&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
           &lt;span class="n"&gt;prev&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sourceLine&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
           &lt;span class="n"&gt;prev&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;page_pointer&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
       &lt;span class="n"&gt;locations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;push_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
         &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_pointer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
           &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inst_pointer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filename_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;file_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sourceLine&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;IR&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Kind&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Label&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
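
&lt;p&gt;The de-duplication idea can also be sketched in a few lines of Python. This is a simplified model for a single page, not the actual compiler code: the &lt;code&gt;deduplicate&lt;/code&gt; function, the &lt;code&gt;InstLoc&lt;/code&gt; tuple and the sample instruction stream are hypothetical names and data chosen for illustration.&lt;/p&gt;

```python
from typing import NamedTuple


class InstLoc(NamedTuple):
    inst_pointer: int  # index of the first instruction of the run
    filename_id: int   # index into the filenames table
    line: int          # source line shared by the whole run


def deduplicate(per_instruction):
    """Keep one entry per run of consecutive instructions sharing (file, line)."""
    locations = []
    for ip, (filename_id, line) in enumerate(per_instruction):
        prev = locations[-1] if locations else None
        if prev is not None and (prev.filename_id, prev.line) == (filename_id, line):
            continue  # same file and line as the previous entry: redundant
        locations.append(InstLoc(ip, filename_id, line))
    return locations


# Hypothetical stream: 11 instructions coming from line 7, then 2 from line 8.
stream = [(0, 7)] * 11 + [(0, 8)] * 2
print(len(stream), "instructions,", len(deduplicate(stream)), "table entries")
```

&lt;p&gt;With this sample stream, thirteen instructions collapse to two table entries, one per source line.&lt;/p&gt;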



&lt;h3&gt;
  
  
  Exploiting the new source location for our errors
&lt;/h3&gt;

&lt;p&gt;We now need to track filenames and (page, instruction, filename id, line) tuples so that we have source locations for our errors. Those are split into two data tables in the bytecode.&lt;/p&gt;

&lt;p&gt;Having those tables, given a page pointer and instruction pointer, we can retrieve the nearest source location information and the associated file by its id (index in the filenames table).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;optional&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;InstLoc&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;VM&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;findSourceLocation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;optional&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;InstLoc&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;nullopt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;m_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;m_inst_locations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_pointer&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// select the best match: we want to find the location&lt;/span&gt;
    &lt;span class="c1"&gt;// that's nearest our instruction pointer, but not equal&lt;/span&gt;
    &lt;span class="c1"&gt;// to it as the IP will always be pointing to the next&lt;/span&gt;
    &lt;span class="c1"&gt;// instruction, not yet executed. Thus, the erroneous&lt;/span&gt;
    &lt;span class="c1"&gt;// instruction is the previous one.&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_pointer&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
        &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inst_pointer&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// early exit because we won't find anything better, as&lt;/span&gt;
    &lt;span class="c1"&gt;// inst locations are ordered by ascending (pp, ip)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_pointer&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_pointer&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
          &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inst_pointer&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
      &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Results!
&lt;/h2&gt;

&lt;p&gt;Given the following erroneous code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight common_lisp"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;foo&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;fun&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;a&lt;/span&gt; &lt;span class="nv"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;+&lt;/span&gt; &lt;span class="nv"&gt;a&lt;/span&gt; &lt;span class="nv"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;foo&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We expect an arity error upon calling &lt;code&gt;foo&lt;/code&gt;, as we passed too many arguments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ArityError: When calling `(foo)', received 3 arguments, but expected 2: `(foo a b)'

In file a.ark
    1 | (let foo (fun (a b) (+ a b)))
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
    2 |
    3 | (foo 1 2 3)

[   2] In function `foo' (a.ark:1)
[   1] In global scope (a.ark:3)

Current scope variables values:
foo = Function@1
At IP: 0, PP: 1, SP: 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In terms of bytecode, it generated the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Symbols table (length: 3)
0) foo
1) a
2) b

Constants table (length: 4)
0) (PageAddr) 1
1) (Number) 1
2) (Number) 2
3) (Number) 3

Instruction locations table (length: 3)
 PP, IP
  0,  0 -&amp;gt; a.ark:0
  0,  4 -&amp;gt; a.ark:2
  1,  4 -&amp;gt; a.ark:0

Code segment 0 (length: 24)
   0 39 00 00 00 LOAD_CONST_STORE
   1 38 00 20 01 LOAD_CONST_LOAD_CONST
   2 03 00 00 03 LOAD_CONST 3 (Number)
   3 02 00 00 00 LOAD_SYMBOL_BY_INDEX (0)
   4 0b 00 00 03 CALL (3)
   5 0a 00 00 00 HALT

Code segment 1 (length: 28)
   0 05 00 00 01 STORE a
   1 05 00 00 02 STORE b
   2 02 00 00 01 LOAD_SYMBOL_BY_INDEX (1)
   3 02 00 00 00 LOAD_SYMBOL_BY_INDEX (0)
   4 20 00 00 00 ADD
   5 09 00 00 00 RET
   6 0a 00 00 00 HALT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, the instruction locations table is quite small thanks to the de-duplication, and we have all the information we need to report errors correctly!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally from &lt;a href="https://lexp.lt/posts/inst_source_tracking_in_arkscript/" rel="noopener noreferrer"&gt;lexp.lt&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>arkscript</category>
      <category>pldev</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Optimizing scopes data in ArkScript VM</title>
      <dc:creator>Lex Plt</dc:creator>
      <pubDate>Fri, 16 May 2025 10:00:00 +0000</pubDate>
      <link>https://dev.to/lexplt/optimizing-scopes-data-in-arkscript-vm-2b2n</link>
      <guid>https://dev.to/lexplt/optimizing-scopes-data-in-arkscript-vm-2b2n</guid>
      <description>&lt;p&gt;If you don't know me yet, I have been working on &lt;a href="https://arkscript-lang.dev" rel="noopener noreferrer"&gt;ArkScript&lt;/a&gt; for nearly 6 years now. ArkScript is a scripting language written in modern C++, running on a custom virtual machine (like Python or Lua), with the goal of having a syntax that is easy to learn and use, a C++ interface to embed it in programs, and decent performance (without trying to be as fast as Lua though, Mike Pall is a genius and did outstanding work on LuaJIT).&lt;/p&gt;

&lt;h2&gt;
  
  
  Is my language fast?
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;A silly question that you shouldn't care about, unless the perceived slowness of the language is becoming a problem.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Recently, I added more benchmarks to the language, and I was quite astonished to see that it was &lt;em&gt;1.5 times slower&lt;/em&gt; than Python on such a simple benchmark:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight common_lisp"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;mut&lt;/span&gt; &lt;span class="nv"&gt;collection&lt;/span&gt; &lt;span class="nv"&gt;[]&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;mut&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt; &lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;{&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;append!&lt;/span&gt; &lt;span class="nv"&gt;collection&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;+&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="nv"&gt;}&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;mut&lt;/span&gt; &lt;span class="nv"&gt;sum&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt; &lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;{&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;sum&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;+&lt;/span&gt; &lt;span class="nv"&gt;sum&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;@&lt;/span&gt; &lt;span class="nv"&gt;collection&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;+&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="nv"&gt;}&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is also &lt;em&gt;17 times slower&lt;/em&gt; than Python on the binary tree benchmark. I expected it to be a bit slower, but not by that much! Digging into profiler traces, it appears that we lose about &lt;em&gt;35% of execution time looking for variables&lt;/em&gt;. This is what inspired me to write this article: how locals are stored in the virtual machine, and what kind of optimizations were applied.&lt;/p&gt;

&lt;blockquote&gt;
&lt;h2&gt;
  
  
  📝 Note
&lt;/h2&gt;

&lt;p&gt;You can see the benchmark results on this page: &lt;a href="https://arkscript-lang.dev/benchmarks.html" rel="noopener noreferrer"&gt;arkscript-lang.dev/benchmarks.html&lt;/a&gt;. They are generated from &lt;a href="https://github.com/ArkScript-lang/benchmarks" rel="noopener noreferrer"&gt;github.com/ArkScript-lang/benchmarks&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Some definitions
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;[A &lt;strong&gt;stack-based VM&lt;/strong&gt;, such as ArkScript], is a processor in which the primary interaction is moving short-lived temporary values to and from a push-down stack.&lt;br&gt;
&lt;a href="https://en.m.wikipedia.org/wiki/Stack_machine" rel="noopener noreferrer"&gt;Wikipedia — Stack machine&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;ArkScript does not use any kind of registers; all computations are done using a stack, so &lt;code&gt;(1 + 2) * 3&lt;/code&gt; is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PUSH 2
PUSH 1
ADD  // pop 1, 2, push (1+2)
PUSH 3
MUL  // pop the addition result, 3, push (3*3)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
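
&lt;p&gt;To make the listing above concrete, here is a minimal sketch of a stack machine evaluating it. The &lt;code&gt;Op&lt;/code&gt;, &lt;code&gt;Instruction&lt;/code&gt; and &lt;code&gt;run&lt;/code&gt; names are made up for this illustration and are not ArkScript's actual instruction set or VM loop.&lt;/p&gt;

<![CDATA[
```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical instruction set, only for illustration.
enum class Op : std::uint8_t { Push, Add, Mul };

struct Instruction
{
  Op op;
  double operand;  // only meaningful for Push
};

// Run a program on a value stack and return the value left on top.
double run(const std::vector<Instruction>& program)
{
  std::vector<double> stack;

  // Pop the top of the stack, refusing to read from an empty stack.
  auto pop = [&stack]() -> double {
    if (stack.empty())
      throw std::runtime_error("stack underflow");
    const double top = stack.back();
    stack.pop_back();
    return top;
  };

  for (const Instruction& inst : program)
  {
    switch (inst.op)
    {
      case Op::Push:
        stack.push_back(inst.operand);
        break;
      case Op::Add:
      {
        const double a = pop();
        const double b = pop();
        stack.push_back(a + b);
        break;
      }
      case Op::Mul:
      {
        const double a = pop();
        const double b = pop();
        stack.push_back(a * b);
        break;
      }
    }
  }
  return pop();
}
```
]]>

&lt;p&gt;Feeding it the five instructions of the listing (push 2, push 1, add, push 3, multiply) leaves a single value on the stack, the result of &lt;code&gt;(1 + 2) * 3&lt;/code&gt;.&lt;/p&gt;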



&lt;p&gt;Variables are stored in a &lt;strong&gt;scope&lt;/strong&gt;, and a stack of &lt;strong&gt;scopes&lt;/strong&gt; defines the environment at one point in the program execution. I call the current &lt;strong&gt;scope&lt;/strong&gt; (the last one on the stack of scopes) &lt;strong&gt;locals&lt;/strong&gt;: it holds all the variables defined in the current scope.&lt;/p&gt;
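
&lt;p&gt;As a rough mental model, and not the VM's actual code, a stack of scopes with a lookup going from the innermost scope outwards could look like the sketch below. &lt;code&gt;Environment&lt;/code&gt;, &lt;code&gt;find&lt;/code&gt; and the string-based &lt;code&gt;Value&lt;/code&gt; are assumptions made for the example.&lt;/p&gt;

<![CDATA[
```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the VM types, for illustration only.
using Value = std::string;
using Scope = std::vector<std::pair<std::uint16_t, Value>>;

struct Environment
{
  std::vector<Scope> scopes;  // the last element plays the role of "locals"

  Scope& locals() { return scopes.back(); }

  // Search from the innermost scope outwards; nullptr if the id is unbound.
  Value* find(std::uint16_t id)
  {
    for (auto scope = scopes.rbegin(); scope != scopes.rend(); ++scope)
      for (auto& entry : *scope)
        if (entry.first == id)
          return &entry.second;
    return nullptr;
  }
};
```
]]>

&lt;p&gt;With this layout, a variable defined in the current scope is found first, and a miss walks through every enclosing scope, which is where the time spent "looking for variables" can add up.&lt;/p&gt;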

&lt;h2&gt;
  
  
  Evolution between versions
&lt;/h2&gt;

&lt;p&gt;For the following sections, I inspected the code of each version to see how locals and scopes were handled. Some versions do not change that much and instead have optimizations elsewhere, which I didn't bother checking/measuring since it isn't the main focus of the article.&lt;/p&gt;

&lt;p&gt;Benchmarks are run on the Ackermann-Péter function using &lt;a href="https://github.com/google/benchmark" rel="noopener noreferrer"&gt;google/benchmark&lt;/a&gt;, because it's a recursive function but not a &lt;a href="https://en.wikipedia.org/wiki/Primitive_recursive_function" rel="noopener noreferrer"&gt;primitive recursive function&lt;/a&gt;, meaning compilers can't easily optimize it. It grows quickly, and creates and destroys a lot of scopes, which is perfect for our use case.&lt;/p&gt;

&lt;p&gt;They are run on an M1 MacBook Pro with 10 cores and 32GB of RAM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To retrace my steps, I created one git &lt;code&gt;worktree&lt;/code&gt; per tag on the project (and that's when I saw that the first tag on the project was 3.0.1 instead of 0.0.1):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;git worktree list
~/ArkScript/Ark        d1be6b9f &lt;span class="o"&gt;[&lt;/span&gt;dev]
~/ArkScript/ark-v301   b43738be &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v3010  4d8067ce &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v3011  a0627382 &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v3012  9452ccee &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v3013  6e7c5b7b &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v3014  75ca4090 &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v3015  0744f9a0 &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v302   788e9d5e &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v303   1191515e &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v304   2173b4f5 &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v305   7ae2bf51 &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v306   ce876013 &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v307   386e289e &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v308   4f99008c &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v309   250e27cb &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v310   c2dcd843 &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v311   d301511a &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v312   a9e7ef97 &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v313   bf151b77 &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v320   0f16875d &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v330   c6c59c4c &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v340   425f7baf &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v350   2efb4ed8 &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v4002  f5b247c3 &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v4003  019d36bd &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v4004  07e569b6 &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
~/ArkScript/ark-v4005  6a4c6449 &lt;span class="o"&gt;(&lt;/span&gt;detached HEAD&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Instantiating a bunch of stacks and big empty vectors
&lt;/h3&gt;

&lt;p&gt;From version &lt;a href="https://github.com/ArkScript-lang/Ark/tree/v3.0.1" rel="noopener noreferrer"&gt;3.0.1&lt;/a&gt; to &lt;a href="https://github.com/ArkScript-lang/Ark/tree/v3.0.12" rel="noopener noreferrer"&gt;3.0.12&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The very first version of the VM uses a &lt;code&gt;Frame&lt;/code&gt; object, instantiated for each scope. I think I stole this idea from Java, though my implementation is subpar, and I probably didn't understand everything at the time. Each &lt;code&gt;Frame&lt;/code&gt; instantiated a new stack for itself, implemented as a &lt;code&gt;std::vector&lt;/code&gt;, which means that pushing to it could make it grow and copy all of its elements, which is highly inefficient.&lt;/p&gt;

&lt;p&gt;Locals were stored as a &lt;code&gt;std::shared_ptr&amp;lt;std::vector&amp;lt;Value&amp;gt;&amp;gt;&lt;/code&gt; (&lt;code&gt;Scope_t&lt;/code&gt;). You read that right, no id. You accessed a specific variable using &lt;code&gt;locals[variable_id]&lt;/code&gt;. The shared pointer is there because locals could be added by closures, which have to retain their environment, which can be mutated. The stack of scopes was materialized as &lt;code&gt;std::vector&amp;lt;Scope_t&amp;gt;&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Frame&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="nl"&gt;public:&lt;/span&gt;
  &lt;span class="n"&gt;Frame&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="n"&gt;Frame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;Frame&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;Frame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;caller_addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;caller_page_addr&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;m_i&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m_stack&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;m_i&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;m_stack&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;m_i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;m_i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;stackSize&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;m_i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;callerAddr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;m_addr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;callerPageAddr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;m_page_addr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// related to scope deletion&lt;/span&gt;

  &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;incScopeCountToDelete&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;m_scope_to_delete&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;resetScopeCountToDelete&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;m_scope_to_delete&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="n"&gt;scopeCountToDelete&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;m_scope_to_delete&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;private&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;m_addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m_page_addr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;m_stack&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;int8_t&lt;/span&gt; &lt;span class="n"&gt;m_i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="n"&gt;m_scope_to_delete&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
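
&lt;p&gt;The id-indexed locals described above can be sketched as follows. This is a simplified illustration: the &lt;code&gt;makeScope&lt;/code&gt; and &lt;code&gt;get&lt;/code&gt; helpers are hypothetical, &lt;code&gt;Value&lt;/code&gt; is reduced to an &lt;code&gt;int&lt;/code&gt;, and sizing each scope to the number of known symbols is an assumption based on the "big empty vectors" of the section title.&lt;/p&gt;

<![CDATA[
```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for the VM's Value; 0 means "unset" in this sketch.
using Value = int;
using Scope_t = std::shared_ptr<std::vector<Value>>;

// Assumption: every new scope reserves one slot per symbol known to the
// program, even if the function only defines a couple of them.
Scope_t makeScope(std::size_t symbol_count)
{
  return std::make_shared<std::vector<Value>>(symbol_count, Value{});
}

// Lookup is a single indexing operation: the id is the position.
Value& get(const Scope_t& locals, std::uint16_t variable_id)
{
  return (*locals)[variable_id];
}
```
]]>

&lt;p&gt;Lookup is as cheap as it gets, but every scope creation pays for allocating and initializing a vector that stays mostly empty. Copying the &lt;code&gt;Scope_t&lt;/code&gt; only copies the shared pointer, so a closure holding a copy sees later mutations of the same environment.&lt;/p&gt;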



&lt;p&gt;Results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Benchmark                         Time             CPU   Iterations
ackermann                       197 ms          197 ms            4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Locals as a list of pairs of id and value
&lt;/h3&gt;

&lt;p&gt;From version &lt;a href="https://github.com/ArkScript-lang/Ark/tree/v3.0.13" rel="noopener noreferrer"&gt;3.0.13&lt;/a&gt; to &lt;a href="https://github.com/ArkScript-lang/Ark/tree/v3.0.15" rel="noopener noreferrer"&gt;3.0.15&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The next iteration of local management introduced the first version of the &lt;code&gt;Scope&lt;/code&gt;, which we're still using today! Scopes are still instantiated inside shared pointers, but they are way smaller: we no longer index a vector by variable id, but iterate through the pairs of &lt;code&gt;id -&amp;gt; value&lt;/code&gt; instead.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Scope&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="nl"&gt;public:&lt;/span&gt;
  &lt;span class="n"&gt;Scope&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;noexcept&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="n"&gt;push_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;noexcept&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="n"&gt;push_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;noexcept&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;noexcept&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;noexcept&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;idFromValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;noexcept&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;noexcept&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;friend&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Ark&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;VM&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nl"&gt;private:&lt;/span&gt;
  &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;pair&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;m_data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This version tried to be smart: upon adding a new value, we insert it at its sorted position, so that the lookup would be faster. In retrospect (and after having looked at benchmarks), I now know it was a bad idea (which is why I'm not doing this anymore!):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="cp"&gt;#define push_pair(id, val) \
  m_data.emplace_back(std::pair&amp;lt;uint16_t, Value&amp;gt;(id, val))
#define insert_pair(place, id, val) \
  m_data.insert(place, std::pair&amp;lt;uint16_t, Value&amp;gt;(id, val))
&lt;/span&gt;
&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="n"&gt;Scope&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;push_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;noexcept&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="cp"&gt;#ifdef ARK_SCOPE_DICHOTOMY
&lt;/span&gt;  &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="n"&gt;push_pair&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
      &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;push_pair&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
      &lt;span class="k"&gt;else&lt;/span&gt;
        &lt;span class="n"&gt;insert_pair&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;begin&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
      &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nl"&gt;default:&lt;/span&gt;
      &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;lower_bound&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;m_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;begin&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;m_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;lhs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;lhs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="n"&gt;insert_pair&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
      &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="cp"&gt;#else
&lt;/span&gt;  &lt;span class="n"&gt;push_pair&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="cp"&gt;#endif
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inserting while trying to keep all elements ordered is slower than simply adding each element to the end of the scope:&lt;/p&gt;

&lt;p&gt;Results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Benchmark                         Time             CPU   Iterations
ackermann_dichotomy             294 ms          294 ms            2
ackermann_push_back             192 ms          192 ms            4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Single stack
&lt;/h3&gt;

&lt;p&gt;From version &lt;a href="https://github.com/ArkScript-lang/Ark/tree/v3.1.0" rel="noopener noreferrer"&gt;3.1.0&lt;/a&gt; to &lt;a href="https://github.com/ArkScript-lang/Ark/tree/v4.0.0-10" rel="noopener noreferrer"&gt;4.0.0-10&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In v3.1.0, the &lt;code&gt;Scope&lt;/code&gt; dropped the sorted insert and now pushes everything to the end of its &lt;code&gt;std::vector&amp;lt;std::pair&amp;lt;id, Value&amp;gt;&amp;gt;&lt;/code&gt;. Also, the horrendous &lt;code&gt;std::vector&amp;lt;Frame&amp;gt;&lt;/code&gt; was replaced by a &lt;code&gt;std::unique_ptr&amp;lt;std::array&amp;lt;Value, 8192&amp;gt;&amp;gt;&lt;/code&gt;, which yielded a much-needed performance improvement. Instead of having a separate data structure to save the caller page pointer and instruction pointer, we now push those on the stack too. Each call therefore takes at least two slots, which caps the recursion depth at 4096 for non-primary recursive functions that can't be optimized to loops.&lt;/p&gt;

&lt;p&gt;Results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Benchmark                         Time             CPU   Iterations
ackermann                       146 ms          146 ms            5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
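&lt;p&gt;To make the 4096 figure concrete, here is a minimal sketch of a single-stack call and return. It is not the ArkScript VM: &lt;code&gt;MiniVM&lt;/code&gt;, &lt;code&gt;call&lt;/code&gt;, &lt;code&gt;ret&lt;/code&gt; and &lt;code&gt;maxDepth&lt;/code&gt; are hypothetical names, plain integers stand in for &lt;code&gt;Value&lt;/code&gt;, and it assumes a call pushes only the two saved pointers.&lt;/p&gt;

```cpp
// Hypothetical, simplified model of a single-stack VM: plain integers stand in
// for the article's Value type, and only the two saved pointers are pushed.
constexpr unsigned long StackSize = 8192;

struct MiniVM {
  unsigned long stack[StackSize] = {};
  unsigned long sp = 0;  // index of the first free slot
  unsigned long ip = 0;  // instruction pointer
  unsigned long pp = 0;  // page pointer

  // Save the caller's position on the value stack, then jump to the callee.
  // Returns false when the stack cannot hold two more slots.
  bool call(unsigned long target_page) {
    if (sp + 2 > StackSize)
      return false;
    stack[sp++] = pp;
    stack[sp++] = ip;
    pp = target_page;
    ip = 0;
    return true;
  }

  // Restore the caller's position from the two slots pushed by call().
  void ret() {
    ip = stack[--sp];
    pp = stack[--sp];
  }
};

// Counts how many nested calls fit before the stack is full.
unsigned long maxDepth() {
  MiniVM vm;
  unsigned long depth = 0;
  while (vm.call(1))
    ++depth;
  return depth;
}
```

&lt;p&gt;Under these assumptions, &lt;code&gt;maxDepth()&lt;/code&gt; stops after 8192 / 2 = 4096 nested calls. A real call also needs room for arguments and temporaries, so the practical limit would be lower.&lt;/p&gt;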



&lt;p&gt;Then, in v3.1.3 the &lt;code&gt;ExecutionContext&lt;/code&gt; appeared: it's a struct with everything the VM needs to run code (instruction, page and stack pointers, stack, &lt;code&gt;std::vector&amp;lt;Scope&amp;gt;&lt;/code&gt; for locals...). It was added to pave the way for parallelism in the language; it neither degraded nor improved performance.&lt;/p&gt;

&lt;p&gt;In later versions, more AST and new IR optimizations were implemented, which helped divide the run time of our benchmark by ~2.4:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Benchmark                         Time             CPU   Iterations
ackermann                      60.4 ms         60.3 ms           50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, we still spend around 35% of our time in &lt;code&gt;findNearestVariable&lt;/code&gt; (which calls &lt;code&gt;Scope::operator[]&lt;/code&gt; on every scope it visits, in the &lt;code&gt;if&lt;/code&gt; condition):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;VM&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;findNearestVariable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;internal&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ExecutionContext&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;noexcept&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;locals&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rbegin&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;it_end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;locals&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rend&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
       &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;it_end&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="nb"&gt;nullptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;nullptr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even with a basic bloom-filter-like check on the range of ids, searching for a value takes a lot of time, and data locality isn't that good:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;Scope&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;maybeHas&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;noexcept&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;m_min_id&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;m_max_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;Scope&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;id_to_look_for&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;noexcept&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;maybeHas&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id_to_look_for&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;nullptr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;m_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;id_to_look_for&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;nullptr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
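&lt;p&gt;The min/max check above can only rule an id out, it can never confirm it. The following sketch illustrates this with a hypothetical &lt;code&gt;IdRange&lt;/code&gt; type that stores ids only (no values) in a small fixed array; none of these names come from ArkScript.&lt;/p&gt;

```cpp
// Hypothetical, simplified scope: ids only, no values, no bounds checking.
struct IdRange {
  unsigned short min_id = 65535;  // largest 16-bit id, so any id lowers it
  unsigned short max_id = 0;
  unsigned short ids[8] = {};
  unsigned long size = 0;

  // Record an id and widen the [min_id, max_id] range if needed.
  void add(unsigned short id) {
    if (min_id > id)
      min_id = id;
    if (id > max_id)
      max_id = id;
    ids[size++] = id;
  }

  // Cheap pre-check: false means "definitely absent", true means "maybe".
  bool maybeHas(unsigned short id) const {
    if (min_id > id)
      return false;
    return max_id >= id;
  }

  // Exact check: a linear scan, only attempted when maybeHas() says "maybe".
  bool has(unsigned short id) const {
    if (!maybeHas(id))
      return false;
    for (unsigned long i = 0; i != size; ++i)
      if (ids[i] == id)
        return true;
    return false;
  }
};

// A scope holding ids 3 and 40: id 17 passes the range check but is absent.
IdRange sparseScope() {
  IdRange scope;
  scope.add(3);
  scope.add(40);
  return scope;
}
```

&lt;p&gt;With ids 3 and 40 in the scope, &lt;code&gt;maybeHas(17)&lt;/code&gt; is true even though 17 was never added, so the linear scan still runs. The wider the spread of ids in a scope, the less the range check filters.&lt;/p&gt;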



&lt;h2&gt;
  
  
  How can we do better?
&lt;/h2&gt;

&lt;p&gt;In his book &lt;a href="https://craftinginterpreters.com/" rel="noopener noreferrer"&gt;Crafting Interpreters&lt;/a&gt;, Robert Nystrom shows that you can use the stack for storing local variables in the &lt;a href="https://craftinginterpreters.com/local-variables.html" rel="noopener noreferrer"&gt;Local Variables&lt;/a&gt; chapter. This would mean using a &lt;em&gt;single contiguous data structure&lt;/em&gt; for all of our needs, which would actually be pretty neat, and most certainly yield impressive performance improvements!&lt;/p&gt;

&lt;p&gt;Alas, due to how closures work in ArkScript, this isn't doable. Storing variables on the stack means the compiler has to know where we put each variable, and since no type checking is done at compile time, we can't easily know when we're compiling a closure call!&lt;/p&gt;

&lt;blockquote&gt;
&lt;h2&gt;
  
  
  📝 Note
&lt;/h2&gt;

&lt;p&gt;Actually, we can infer that from &lt;code&gt;a.b&lt;/code&gt; or &lt;code&gt;a.b.c&lt;/code&gt; because the dot notation is reserved for closures and field access, but this only covers 90% of use cases: by reading &lt;code&gt;(foo)&lt;/code&gt; we don't know if we're calling a C++ function, a user function or a user closure.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But this gave me an idea: what if we could use a single contiguous data structure for our variables? And make small adaptations to handle closure scopes, that need to be kept alive while a closure is being called?&lt;/p&gt;

&lt;p&gt;Well, I did that. And it worked.&lt;/p&gt;

&lt;h3&gt;
  
  
  Contiguous storage for our locals
&lt;/h3&gt;

&lt;p&gt;Currently, our locals are stored in a &lt;code&gt;std::vector&amp;lt;std::shared_ptr&amp;lt;std::vector&amp;lt;std::pair&amp;lt;id, value&amp;gt;&amp;gt;&amp;gt;&amp;gt;&lt;/code&gt;, basically a list of pointers to lists of pairs. This can't be good: our piles of &lt;code&gt;id -&amp;gt; value&lt;/code&gt; are scattered all over RAM! However, all we need for storing locals is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a vector or array to store our pairs &lt;code&gt;id -&amp;gt; value&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;a length, to know how many elements are in our scope&lt;/li&gt;
&lt;li&gt;maybe some min and max id to implement a basic bloom filter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So we could make &lt;em&gt;views&lt;/em&gt; over a single &lt;code&gt;std::array&amp;lt;std::pair&amp;lt;id, value&amp;gt;, N&amp;gt;&lt;/code&gt;, each knowing where it starts and how many elements it holds.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;Scope&lt;/code&gt; implementation didn't even have to change that much:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;ScopeView&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ScopeView&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;pair_t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ScopeStackSize&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;*&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;noexcept&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;m_storage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;m_start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;m_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;m_min_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;numeric_limits&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;::&lt;/span&gt;&lt;span class="n"&gt;max&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
  &lt;span class="n"&gt;m_max_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="n"&gt;ScopeView&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;push_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;noexcept&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;m_min_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;m_min_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;m_max_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;m_max_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;m_storage&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="n"&gt;m_start&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;m_size&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;make_pair&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;m_size&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;ScopeView&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;maybeHas&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;noexcept&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;m_min_id&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;m_max_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;ScopeView&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;id_to_look_for&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;noexcept&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;maybeHas&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id_to_look_for&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;nullptr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m_start&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;m_start&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;m_size&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;m_storage&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;id_to_look_for&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;nullptr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Creating a new scope is still quite easy: we just need to pass a pointer to our array, and the first free position:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;locals&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emplace_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="c1"&gt;// our array&amp;lt;pair&amp;lt;id, value&amp;gt;&amp;gt;&lt;/span&gt;
  &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scopes_storage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;// the end of the scope (size + start) = first free slot&lt;/span&gt;
  &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;locals&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;back&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;storageEnd&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
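&lt;p&gt;Here is a minimal, self-contained sketch of how views chain over one backing array. It is a simplification, not the actual &lt;code&gt;ScopeView&lt;/code&gt;: &lt;code&gt;Pair&lt;/code&gt;, &lt;code&gt;View&lt;/code&gt;, &lt;code&gt;childOf&lt;/code&gt; and &lt;code&gt;demoCalleeStart&lt;/code&gt; are hypothetical names, values are plain &lt;code&gt;int&lt;/code&gt;s, and the min/max id filter is left out.&lt;/p&gt;

```cpp
// Hypothetical, simplified stand-ins: int values instead of the article's
// Value, and a small fixed-size backing array.
struct Pair {
  unsigned short id;
  int value;
};

constexpr unsigned long StorageSize = 16;

struct View {
  Pair* storage;        // shared backing array, not owned by the view
  unsigned long start;  // index of the first slot of this scope
  unsigned long size;   // number of variables stored in this scope

  // First free slot after this scope: where the next scope would start.
  unsigned long storageEnd() const { return start + size; }

  void push_back(unsigned short id, int value) {
    storage[start + size] = Pair{id, value};
    ++size;
  }

  // Linear search inside this view only; nullptr when the id is absent.
  Pair* find(unsigned short id) const {
    for (unsigned long i = start; i != start + size; ++i)
      if (storage[i].id == id)
        return storage + i;
    return nullptr;
  }
};

// A new scope is an empty view starting at the parent's first free slot.
View childOf(View parent) {
  return View{parent.storage, parent.storageEnd(), 0};
}

// Usage: a caller scope with two variables, then a callee scope on top of it.
// Returns the index at which the callee's scope starts.
unsigned long demoCalleeStart() {
  Pair storage[StorageSize] = {};
  View caller{storage, 0, 0};
  caller.push_back(0, 10);
  caller.push_back(1, 20);
  View callee = childOf(caller);
  callee.push_back(2, 30);
  return callee.start;
}
```

&lt;p&gt;In this sketch, leaving a scope costs nothing: the view is dropped and its slots are overwritten by the next scope that starts at the same index. No memory is reserved or freed along the way.&lt;/p&gt;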



&lt;p&gt;As for closures, I had to create a dedicated &lt;code&gt;ClosureScope&lt;/code&gt;, which is basically the old &lt;code&gt;Scope&lt;/code&gt;, since the closure needs to have ownership of a scope, and now we use views. Merging a &lt;code&gt;ClosureScope&lt;/code&gt; into a &lt;code&gt;ScopeView&lt;/code&gt; is still doable (so that we can create a scope of references to the closure's fields):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="n"&gt;ClosureScope&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;mergeRefInto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ScopeView&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;m_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;valueType&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;ValueType&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Reference&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;push_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;
      &lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;push_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All of this is very fancy, and seems to work on paper, right? But some of you may have noticed a flaw in this design: what if we need to add values to a scope that &lt;em&gt;isn't&lt;/em&gt; the last and active one? It would break! Luckily, this can't happen as only one scope can be active at a time, and variables are &lt;strong&gt;always&lt;/strong&gt; added to the current active scope, which is the last one on our pile of scopes.&lt;/p&gt;
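&lt;p&gt;To see what would go wrong if that rule were broken, here is a small sketch using hypothetical, simplified &lt;code&gt;Pair&lt;/code&gt; and &lt;code&gt;View&lt;/code&gt; types (plain &lt;code&gt;int&lt;/code&gt; values, no id filter). It pushes into an outer scope while an inner one already sits right after it in the shared array.&lt;/p&gt;

```cpp
// Hypothetical, simplified stand-ins for the article's pair and view types.
struct Pair {
  unsigned short id;
  int value;
};

struct View {
  Pair* storage;        // shared backing array, not owned by the view
  unsigned long start;  // index of the first slot of this scope
  unsigned long size;   // number of variables stored in this scope

  void push_back(unsigned short id, int value) {
    storage[start + size] = Pair{id, value};
    ++size;
  }
};

// Returns the id found in the inner scope's first slot after a variable was
// (wrongly) added to the outer scope while the inner one was active.
unsigned short clobberedId() {
  Pair storage[8] = {};

  View outer{storage, 0, 0};
  outer.push_back(0, 10);  // written to slot 0

  View inner{storage, outer.start + outer.size, 0};
  inner.push_back(1, 20);  // written to slot 1

  outer.push_back(2, 30);  // also slot 1: overwrites the inner variable

  return storage[inner.start].id;
}
```

&lt;p&gt;Here the inner scope's first slot ends up holding id 2 instead of id 1: the variable it stored is gone. This is why the design relies on variables only ever being added to the last scope.&lt;/p&gt;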

&lt;h3&gt;
  
  
  Gotta go fast!
&lt;/h3&gt;

&lt;p&gt;By using contiguous storage, avoiding useless copies of the &lt;code&gt;id -&amp;gt; value&lt;/code&gt; pairs when the storage vector grew, and avoiding reserving and freeing memory every time a function is called or returns, we have achieved a 21% performance improvement on our benchmark (and even up to 76% improvement on the binary tree benchmark!):&lt;/p&gt;

&lt;p&gt;Results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Benchmark                         Time             CPU   Iterations
ackermann_begining              197 ms          197 ms            4
ackermann_dichotomy             294 ms          294 ms            2
ackermann_push_back             192 ms          192 ms            4
ackermann_single_stack         60.4 ms         60.3 ms           50
ackermann_contiguous           47.8 ms         47.7 ms           50

binary_trees_single_stack    3965.9 ms      3962.32 ms            1
binary_trees_contiguous         941 ms          939 ms            1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Originally from &lt;a href="https://lexp.lt/posts/optimizing_scopes_data_in_arkscript_vm/" rel="noopener noreferrer"&gt;lexp.lt&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>arkscript</category>
      <category>pldev</category>
      <category>cpp</category>
      <category>performance</category>
    </item>
    <item>
      <title>Publishing ZIP artifacts with SBT</title>
      <dc:creator>Lex Plt</dc:creator>
      <pubDate>Thu, 15 May 2025 10:00:00 +0000</pubDate>
      <link>https://dev.to/lexplt/publishing-zip-artifacts-with-sbt-5edd</link>
      <guid>https://dev.to/lexplt/publishing-zip-artifacts-with-sbt-5edd</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/lexplt/generating-swaggers-at-compile-time-1m4c"&gt;In my previous article about Scala&lt;/a&gt;, I briefly mentioned we were publishing Swaggers inside JARs, that we could unzip to retrieve the Swagger definition and then run our code generation. It works, but using ZIPs would feel better and less confusing for users.&lt;/p&gt;

&lt;p&gt;I decided to nerd-snipe myself and see how I could publish a ZIP using SBT, and then use said ZIP as a dependency for &lt;a href="https://github.com/SuperFola/sbt-swaggerinator" rel="noopener noreferrer"&gt;sbt-swaggerinator&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  sbt-native-packager to the rescue
&lt;/h2&gt;

&lt;p&gt;I quickly noticed that sbt-native-packager could &lt;a href="https://www.scala-sbt.org/sbt-native-packager/formats/universal.html#build" rel="noopener noreferrer"&gt;output ZIPs&lt;/a&gt;, which was very helpful as we are already using it for bundling JARs and creating Docker images.&lt;/p&gt;

&lt;p&gt;With this basic SBT configuration, you can publish a ZIP:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;&lt;span class="k"&gt;lazy&lt;/span&gt; &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;publishZipSettings&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Seq&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;publishTo&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="s"&gt;"https://my-maven-repository"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;publishConfiguration&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nv"&gt;publishConfiguration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;value&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;withOverwrite&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
  &lt;span class="nc"&gt;Universal&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;mappings&lt;/span&gt; &lt;span class="o"&gt;++=&lt;/span&gt; &lt;span class="nf"&gt;contentOf&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;sourceDirectory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;value&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="s"&gt;"main"&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="s"&gt;"resources"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;publish&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;publish&lt;/span&gt; &lt;span class="nf"&gt;dependsOn&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Universal&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;packageBin&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="py"&gt;value&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;publishLocal&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;publishLocal&lt;/span&gt; &lt;span class="nf"&gt;dependsOn&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Universal&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;packageBin&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="py"&gt;value&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="nf"&gt;makeDeploymentSettings&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Universal&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Universal&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;packageBin&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"zip"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;lazy&lt;/span&gt; &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;`my-api-swagger`&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;file&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"modules/my-api-swagger"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;enablePlugins&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;UniversalPlugin&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;enablePlugins&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;UniversalDeployPlugin&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;settings&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;publishZipSettings&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Line 4 is quite important; at first I wrote &lt;code&gt;Universal / mappings += (Compile / packageBin).value -&amp;gt; "swagger.yaml"&lt;/code&gt;, as per the docs, not really knowing what I was doing. It seemed to work, but it was actually &lt;strong&gt;putting the JAR in the ZIP as a file named swagger.yaml&lt;/strong&gt;, which isn't ideal if you ask me. Since we generate the Swagger files under the &lt;code&gt;resources/&lt;/code&gt; folder, using &lt;code&gt;contentOf&lt;/code&gt; is good enough.&lt;/li&gt;
&lt;li&gt;Lines 5 and 6 aren't strictly necessary, just useful to be able to generate the ZIP file in a single command: &lt;code&gt;sbt "my-api-swagger / publish"&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Publishing works as expected, but we have problems when trying to retrieve the ZIP from the Maven repository: SBT is looking for a POM file which isn't generated! Great, we can push artifacts but not retrieve them.&lt;/li&gt;
&lt;li&gt;Apparently we need to generate POM files, which we do not have.&lt;/li&gt;
&lt;li&gt;The module is published with a &lt;code&gt;_2.13&lt;/code&gt; suffix at the end, even though it doesn't depend on any particular Scala version.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Using SBT artifacts
&lt;/h2&gt;

&lt;p&gt;We can tell SBT that we are &lt;a href="https://www.scala-sbt.org/1.x/docs/Artifacts.html" rel="noopener noreferrer"&gt;publishing an artifact&lt;/a&gt;, which might help us. At first, I thought we had to publish a module, e.g. &lt;code&gt;my-api&lt;/code&gt;, with an attached artifact named &lt;code&gt;my-api-swagger&lt;/code&gt;; not really what I was looking for, but let's try this anyway.&lt;/p&gt;

&lt;p&gt;It turns out that creating a project with a single artifact correctly configured was a thing: an artifact is just what SBT is publishing (by default: the binary JAR, docs etc). And we can disable JAR publishing, as well as docs, since we don't need them!&lt;/p&gt;

&lt;p&gt;New settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt; lazy val publishZipSettings = Seq(
   publishTo := "https://my-maven-repository",
   publishConfiguration := publishConfiguration.value.withOverwrite(true),
   Universal / mappings ++= contentOf(sourceDirectory.value / "main" / "resources"),
&lt;span class="gi"&gt;+  Compile / packageBin / publishArtifact := false,
+  Compile / packageDoc / publishArtifact := false,
&lt;/span&gt;   publish := (publish dependsOn (Universal / packageBin)).value,
   publishLocal := (publishLocal dependsOn (Universal / packageBin)).value,
&lt;span class="gi"&gt;+  crossPaths := false,
&lt;/span&gt; ) ++ makeDeploymentSettings(Universal, Universal / packageBin, "zip")
&lt;span class="err"&gt;
&lt;/span&gt; lazy val `my-api-swagger` = (project in file("modules/my-api-swagger"))
   .enablePlugins(UniversalPlugin)
   .enablePlugins(UniversalDeployPlugin)
   .settings(publishZipSettings)
&lt;span class="gi"&gt;+  .settings(addArtifact(Artifact("my-api-swagger", "zip", "zip"), Universal / packageBin))
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With those additions, we can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;avoid publishing a &lt;code&gt;jars/&lt;/code&gt; folder (line 5);&lt;/li&gt;
&lt;li&gt;avoid publishing a &lt;code&gt;docs/&lt;/code&gt; folder (line 6);&lt;/li&gt;
&lt;li&gt;avoid the &lt;code&gt;_2.13&lt;/code&gt; suffix (line 9);&lt;/li&gt;
&lt;li&gt;generate POM and ivy.xml files for SBT;&lt;/li&gt;
&lt;li&gt;keep publishing a ZIP under the &lt;code&gt;zips/&lt;/code&gt; folder (line 16).&lt;/li&gt;
&lt;/ul&gt;
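
&lt;p&gt;On the consuming side, pulling that ZIP back as a dependency might look like the sketch below. The organization &lt;code&gt;com.example&lt;/code&gt; and the version &lt;code&gt;1.0.0&lt;/code&gt; are placeholders, not values from our build; &lt;code&gt;artifacts(...)&lt;/code&gt; asks sbt for the &lt;code&gt;zip&lt;/code&gt; artifact explicitly instead of the default JAR. Treat it as an untested sketch, since the exact behaviour depends on the repository layout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;// build.sbt of a consumer project (hypothetical coordinates)
libraryDependencies += ("com.example" % "my-api-swagger" % "1.0.0")
  .artifacts(Artifact("my-api-swagger", "zip", "zip"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;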

&lt;p&gt;It's still a bit verbose, and I would have liked to be able to add the &lt;code&gt;addArtifact(...)&lt;/code&gt; setting inside the &lt;code&gt;publishZipSettings&lt;/code&gt;, deducing the artifact name from the module; alas, we need to rely on a setting (&lt;code&gt;projectID.value.name&lt;/code&gt;), and you can only use &lt;code&gt;.value&lt;/code&gt; on settings inside other settings or macros. That leaves room for improvement, but for now, it's good enough!&lt;/p&gt;
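
&lt;p&gt;One avenue that might close that gap: &lt;code&gt;addArtifact&lt;/code&gt; also has an overload accepting &lt;code&gt;Def.Initialize&lt;/code&gt; values, so the artifact could be built inside &lt;code&gt;Def.setting(...)&lt;/code&gt;, where &lt;code&gt;.value&lt;/code&gt; is allowed. A minimal sketch, assuming the artifact should be named after the project's &lt;code&gt;name&lt;/code&gt; setting; the helper name &lt;code&gt;zipArtifactSettings&lt;/code&gt; is made up, and I haven't run this against the build above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;// Hypothetical helper: derives the artifact name from the project's `name` setting
lazy val zipArtifactSettings: Seq[Def.Setting[_]] =
  addArtifact(
    Def.setting(Artifact(name.value, "zip", "zip")),
    Universal / packageBin
  ).settings

// Possible usage: .settings(publishZipSettings, zipArtifactSettings)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;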

&lt;p&gt;&lt;em&gt;Originally from &lt;a href="https://lexp.lt/posts/publish_zip_artifact_sbt/" rel="noopener noreferrer"&gt;lexp.lt&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>scala</category>
      <category>openapi</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Generating swaggers at compile time</title>
      <dc:creator>Lex Plt</dc:creator>
      <pubDate>Wed, 14 May 2025 15:48:30 +0000</pubDate>
      <link>https://dev.to/lexplt/generating-swaggers-at-compile-time-1m4c</link>
      <guid>https://dev.to/lexplt/generating-swaggers-at-compile-time-1m4c</guid>
      <description>&lt;p&gt;At work, we've been generating code from swaggers and publishing said generated code. Alas, this requires us to remember to generate the swagger(s), as well as fixing versions of libraries (which means you need to upgrade versions in the server, publish updated generated code, then update your client), see this example from &lt;a href="https://github.com/eikek/sbt-openapi-schema" rel="noopener noreferrer"&gt;eikek/sbt-openapi-schema&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;com.github.eikek.sbt.openapi._&lt;/span&gt;

&lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;CirceVersion&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.14.1"&lt;/span&gt;
&lt;span class="n"&gt;libraryDependencies&lt;/span&gt; &lt;span class="o"&gt;++=&lt;/span&gt; &lt;span class="nc"&gt;Seq&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
  &lt;span class="s"&gt;"io.circe"&lt;/span&gt; &lt;span class="o"&gt;%%&lt;/span&gt; &lt;span class="s"&gt;"circe-generic"&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nc"&gt;CirceVersion&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
  &lt;span class="s"&gt;"io.circe"&lt;/span&gt; &lt;span class="o"&gt;%%&lt;/span&gt; &lt;span class="s"&gt;"circe-parser"&lt;/span&gt;  &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nc"&gt;CirceVersion&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;openapiSpec&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Compile&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;resourceDirectory&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="py"&gt;value&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="s"&gt;"test.yml"&lt;/span&gt;
&lt;span class="n"&gt;openapiTargetLanguage&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nv"&gt;Language&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;Scala&lt;/span&gt;
&lt;span class="nc"&gt;Compile&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;openapiScalaConfig&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nc"&gt;ScalaConfig&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;withJson&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;ScalaJson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;circeSemiauto&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;addMapping&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;CustomMapping&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;forType&lt;/span&gt;&lt;span class="o"&gt;({&lt;/span&gt; &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="nc"&gt;TypeDef&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"LocalDateTime"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="k"&gt;_&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt;
      &lt;span class="nc"&gt;TypeDef&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Timestamp"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Imports&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"com.mypackage.Timestamp"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;}))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;addMapping&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;CustomMapping&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;forName&lt;/span&gt;&lt;span class="o"&gt;({&lt;/span&gt; &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"Dto"&lt;/span&gt; &lt;span class="o"&gt;}))&lt;/span&gt;

&lt;span class="nf"&gt;enablePlugins&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OpenApiSchema&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;When defining endpoints, it's not uncommon to couple the endpoint and its implementation with a &lt;code&gt;.serverLogic&lt;/code&gt;, meaning creating our endpoints relied on the implementation being available (need to instantiate the API, repositories to connect to databases, http clients...).&lt;/p&gt;

&lt;p&gt;This made it harder than necessary to write our swaggers to disk:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you needed to compile the app,&lt;/li&gt;
&lt;li&gt;run all the services (databases, kafka...),&lt;/li&gt;
&lt;li&gt;run the app,&lt;/li&gt;
&lt;li&gt;go to &lt;a href="http://localhost:8080/swagger/" rel="noopener noreferrer"&gt;http://localhost:8080/swagger/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only then would the swagger be generated and written to disk. Skip those steps and the generated code is no longer up to date, which is a problem. It was a long and inconvenient procedure, and we only went through it once in a while.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple solution
&lt;/h2&gt;

&lt;p&gt;We can generate swaggers using libraries like &lt;a href="https://github.com/softwaremill/tapir" rel="noopener noreferrer"&gt;tapir&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;&lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;swagger&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAPIDocsInterpreter&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;toOpenAPI&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tapirEndpoints&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"my app"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"v0.0.1"&lt;/span&gt;
  &lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;openapi&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"3.0.3"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;toYaml&lt;/span&gt;

&lt;span class="nf"&gt;writeSwaggerToFile&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;swagger&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pathToSwaggerFile&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we would just have to decouple our &lt;code&gt;Endpoint&lt;/code&gt;s and their implementation (which made them &lt;code&gt;ServerEndpoint&lt;/code&gt;s by the way, but you only need &lt;code&gt;Endpoint&lt;/code&gt; to generate the OpenAPI spec).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;&lt;span class="c1"&gt;// MyApiEndpoints.scala&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Endpoints&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;booksListingEndpoint&lt;/span&gt;&lt;span class="k"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;PublicEndpoint&lt;/span&gt;&lt;span class="o"&gt;[(&lt;/span&gt;&lt;span class="kt"&gt;BooksQuery&lt;/span&gt;, &lt;span class="kt"&gt;Limit&lt;/span&gt;, &lt;span class="kt"&gt;AuthToken&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;, &lt;span class="kt"&gt;String&lt;/span&gt;, &lt;span class="kt"&gt;List&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;Book&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;, &lt;span class="kt"&gt;Any&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; 
    &lt;span class="n"&gt;endpoint&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;get&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;in&lt;/span&gt;&lt;span class="o"&gt;((&lt;/span&gt;&lt;span class="s"&gt;"books"&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;String&lt;/span&gt;&lt;span class="o"&gt;](&lt;/span&gt;&lt;span class="s"&gt;"genre"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;Int&lt;/span&gt;&lt;span class="o"&gt;](&lt;/span&gt;&lt;span class="s"&gt;"year"&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="py"&gt;mapTo&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;BooksQuery&lt;/span&gt;&lt;span class="o"&gt;])&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;in&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;Limit&lt;/span&gt;&lt;span class="o"&gt;](&lt;/span&gt;&lt;span class="s"&gt;"limit"&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="py"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Maximum number of books to retrieve"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;in&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;AuthToken&lt;/span&gt;&lt;span class="o"&gt;](&lt;/span&gt;&lt;span class="s"&gt;"X-Auth-Token"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;errorOut&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stringBody&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;out&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jsonBody&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;List&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;Book&lt;/span&gt;&lt;span class="o"&gt;]])&lt;/span&gt;

  &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;plainEndpoints&lt;/span&gt;&lt;span class="k"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;List&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;AnyEndpoint&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;booksListingEndpoint&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;&lt;span class="c1"&gt;// MyApiRoutes.scala&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Routes&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="k"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;MyApiService&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="k"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Endpoints&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;booksListing&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt;
    &lt;span class="n"&gt;booksListingEndpoint&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;serverLogic&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="nf"&gt;case&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;booksQuery&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;authToken&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="nv"&gt;api&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;listBooks&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;authToken&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;booksQuery&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;endpoints&lt;/span&gt;&lt;span class="k"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;List&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;ServerEndpoint&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;Fs2Streams&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;IO&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;, &lt;span class="kt"&gt;IO&lt;/span&gt;&lt;span class="o"&gt;]]&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;booksListing&lt;/span&gt;
  &lt;span class="o"&gt;)&lt;/span&gt;
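  &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;routes&lt;/span&gt;&lt;span class="k"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;HttpRoutes&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;IO&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Http4sServerInterpreter&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;IO&lt;/span&gt;&lt;span class="o"&gt;]()&lt;/span&gt;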
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;toRoutes&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;endpoints&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then find a way to compile and run code calling &lt;code&gt;OpenAPIDocsInterpreter&lt;/code&gt; with the &lt;code&gt;new Endpoints().plainEndpoints&lt;/code&gt;. Easy, right?&lt;/p&gt;

&lt;h2&gt;
  
  
  Let's over engineer it
&lt;/h2&gt;

&lt;p&gt;Splitting endpoints and server endpoints implementation is the right and easy thing to do, but we can go further. What if we could leverage our build system to generate a swagger for us each time we compile?&lt;/p&gt;

&lt;h3&gt;
  
  
  Making an sbt plugin
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.scala-sbt.org/" rel="noopener noreferrer"&gt;sbt&lt;/a&gt; is the &lt;em&gt;de facto&lt;/em&gt; build tool in Scala, hence I made a plugin to do the generating for us. A plugin can define a task, which you can then call inside the sbt shell as a command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;sbt.Keys.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;sbt.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;

&lt;span class="k"&gt;object&lt;/span&gt; &lt;span class="nc"&gt;SpecGen&lt;/span&gt; &lt;span class="k"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;AutoPlugin&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;object&lt;/span&gt; &lt;span class="nc"&gt;autoImport&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;Spec&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"spec"&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="py"&gt;extend&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Runtime&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;specGenMain&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="n"&gt;settingKey&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;String&lt;/span&gt;&lt;span class="o"&gt;](&lt;/span&gt;&lt;span class="s"&gt;"Main class (fully qualified name) to run"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;specGenArgs&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="n"&gt;settingKey&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;Seq&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;String&lt;/span&gt;&lt;span class="o"&gt;]](&lt;/span&gt;&lt;span class="s"&gt;"Arguments to pass to runner"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;specGenMake&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="n"&gt;taskKey&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;Unit&lt;/span&gt;&lt;span class="o"&gt;](&lt;/span&gt;&lt;span class="s"&gt;"run code/resource generation from config"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;autoImport.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;

  &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;projectSettings&lt;/span&gt;&lt;span class="k"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;Seq&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;Def.Setting&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;?&lt;/span&gt;&lt;span class="o"&gt;]]&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;inConfig&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Spec&lt;/span&gt;&lt;span class="o"&gt;)(&lt;/span&gt;&lt;span class="nv"&gt;Defaults&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;configSettings&lt;/span&gt; &lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="nc"&gt;Seq&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;specGenMain&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;user defined&amp;gt;"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;specGenArgs&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nv"&gt;Seq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;empty&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;specGenMake&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;logger&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;streams&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;value&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;log&lt;/span&gt;
      &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;classPath&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;Attributed&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;data&lt;/span&gt;&lt;span class="o"&gt;((&lt;/span&gt;&lt;span class="nc"&gt;Spec&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;fullClasspath&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="py"&gt;value&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Spec&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;runner&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="py"&gt;value&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;run&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;specGenMain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;value&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;classPath&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;specGenArgs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;value&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="py"&gt;get&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;:+&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ivyConfigurations&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nf"&gt;overrideConfigs&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Spec&lt;/span&gt;&lt;span class="o"&gt;)(&lt;/span&gt;&lt;span class="nv"&gt;ivyConfigurations&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;value&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;h2&gt;
  
  
  💡 Tip
&lt;/h2&gt;

&lt;p&gt;This small implementation is the bare minimum to register a class, instantiate it, and run it via sbt.&lt;/p&gt;

&lt;p&gt;To make our task run as part of &lt;code&gt;compile&lt;/code&gt;, we would have to make it return a &lt;code&gt;Seq[File]&lt;/code&gt; (the files it generated) and add a resource generator: &lt;code&gt;Compile / resourceGenerators += (Spec / specGenMake).taskValue&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This simple plugin can be enabled on projects as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;&lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;`my-api`&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;file&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"application/my-api"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;settings&lt;/span&gt;&lt;span class="o"&gt;(...)&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;enablePlugins&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SpecGen&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;settings&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;Spec&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;specGenMain&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="s"&gt;"org.myapi.GenerateSwagger"&lt;/span&gt;
  &lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// we need a project to be able to `publish` the swagger.&lt;/span&gt;
&lt;span class="c1"&gt;// GenerateSwagger would write to &amp;lt;module&amp;gt;/src/main/resources/swagger.yaml&lt;/span&gt;
&lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;`my-api-swagger`&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;file&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"modules/my-api-swagger"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And we just have to write a &lt;code&gt;GenerateSwagger&lt;/code&gt; class that instantiates &lt;code&gt;Endpoints&lt;/code&gt; and calls &lt;code&gt;OpenAPIDocsInterpreter&lt;/code&gt; on them, to write the resulting YAML to a file!&lt;/p&gt;
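
&lt;p&gt;A minimal sketch of what such an entry point could look like, reusing the &lt;code&gt;Endpoints&lt;/code&gt; class from earlier. Several things here are assumptions: the output path convention (first argument, with a made-up default), the import paths (they vary between tapir versions), and &lt;code&gt;Endpoints&lt;/code&gt; living in the same package. It is an &lt;code&gt;object&lt;/code&gt; with a &lt;code&gt;main&lt;/code&gt; method so that the sbt runner can start it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;package org.myapi

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Import paths assumed for a recent tapir release; older versions differ
import sttp.apispec.openapi.circe.yaml._
import sttp.tapir.docs.openapi.OpenAPIDocsInterpreter

object GenerateSwagger {
  def main(args: Array[String]): Unit = {
    // Hypothetical convention: the first argument is the output path
    val target = Paths.get(
      args.headOption.getOrElse("modules/my-api-swagger/src/main/resources/swagger.yaml")
    )

    // Only the plain endpoints are needed, no service or database is instantiated
    val yaml = OpenAPIDocsInterpreter()
      .toOpenAPI(new Endpoints().plainEndpoints, "my app", "v0.0.1")
      .toYaml

    // Create the parent folder if needed, then write the spec
    Option(target.getParent).foreach(dir =&amp;gt; Files.createDirectories(dir))
    Files.write(target, yaml.getBytes(StandardCharsets.UTF_8))
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With something like this in place, &lt;code&gt;Spec / specGenArgs&lt;/code&gt; could carry the output path, keeping the generator free of hard-coded locations.&lt;/p&gt;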

&lt;blockquote&gt;
&lt;h2&gt;
  
  
  📝 Note
&lt;/h2&gt;

&lt;p&gt;For now, this small implementation requires the user to run &lt;code&gt;Spec / specGenMake;compile&lt;/code&gt; to have the swagger generated and the code compiled, until I revisit it. For our needs it isn't a huge deal, as we generate and publish swaggers inside our CI/CD environment.&lt;/p&gt;
&lt;/blockquote&gt;
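
&lt;p&gt;In the meantime, a command alias can hide the two-step invocation. &lt;code&gt;addCommandAlias&lt;/code&gt; is part of sbt; the alias name &lt;code&gt;specCompile&lt;/code&gt; is arbitrary, and the project prefix assumes the &lt;code&gt;my-api&lt;/code&gt; module defined above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;// build.sbt: `sbt specCompile` generates the swagger, then compiles
addCommandAlias("specCompile", "; my-api / Spec / specGenMake ; compile")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;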

&lt;h3&gt;
  
  
  Generating Scala code from a published swagger
&lt;/h3&gt;

&lt;p&gt;How would you generate code from a swagger inside a jar downloaded by sbt?&lt;/p&gt;

&lt;p&gt;Using &lt;a href="https://github.com/SuperFola/sbt-swaggerinator" rel="noopener noreferrer"&gt;sbt-swaggerinator&lt;/a&gt;, the swagger downloader and unpacker, it's quite easy! Add your swagger module as a dependency and let the plugin do the job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;&lt;span class="c1"&gt;// build.sbt&lt;/span&gt;
&lt;span class="k"&gt;object&lt;/span&gt; &lt;span class="nc"&gt;Dependencies&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;lazy&lt;/span&gt; &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;swagger&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"com.example"&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="s"&gt;"my-api-swagger_2.13"&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="s"&gt;"1.0.0"&lt;/span&gt;
  &lt;span class="k"&gt;lazy&lt;/span&gt; &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;circe&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Seq&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"io.circe"&lt;/span&gt; &lt;span class="o"&gt;%%&lt;/span&gt; &lt;span class="s"&gt;"circe-core"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"io.circe"&lt;/span&gt; &lt;span class="o"&gt;%%&lt;/span&gt; &lt;span class="s"&gt;"circe-generic"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"io.circe"&lt;/span&gt; &lt;span class="o"&gt;%%&lt;/span&gt; &lt;span class="s"&gt;"circe-generic-extras"&lt;/span&gt;
  &lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="py"&gt;map&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;_&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="s"&gt;"0.14.1"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;lazy&lt;/span&gt; &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;`my-api-generated`&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;file&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"modules/my-api-generated"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;enablePlugins&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OpenApiSchema&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;enablePlugins&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SwaggerinatorSbt&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;settings&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;swaggerinatorDependency&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nv"&gt;Dependencies&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;swagger&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;settings&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;swaggerinatorPackage&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nc"&gt;Pkg&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"com.example.my-api.generated"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;settings&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;libraryDependencies&lt;/span&gt; &lt;span class="o"&gt;++=&lt;/span&gt; &lt;span class="nv"&gt;Dependencies&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;circe&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;lazy&lt;/span&gt; &lt;span class="k"&gt;val&lt;/span&gt; &lt;span class="nv"&gt;infrastructure&lt;/span&gt; &lt;span class="k"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;file&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"modules/infrastructure"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
  &lt;span class="c1"&gt;// ...&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="py"&gt;dependsOn&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;`my-api-generated`&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="s"&gt;"compile-&amp;gt;compile"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Passing the dependency to swaggerinator will let it access the swagger inside it, by unzipping the jar (after all, it's just a ZIP file with another extension). Then the plugin generates code from the swagger, by calling the &lt;code&gt;OpenApiSchema&lt;/code&gt; plugin for us.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally from &lt;a href="https://lexp.lt/posts/generating_swaggers_at_compile_time/" rel="noopener noreferrer"&gt;lexp.lt&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>scala</category>
      <category>openapi</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Implementing an Intermediate Representation for ArkScript</title>
      <dc:creator>Lex Plt</dc:creator>
      <pubDate>Mon, 14 Oct 2024 07:37:33 +0000</pubDate>
      <link>https://dev.to/lexplt/implementing-an-intermediate-representation-for-arkscript-1824</link>
      <guid>https://dev.to/lexplt/implementing-an-intermediate-representation-for-arkscript-1824</guid>
      <description>&lt;p&gt;ArkScript is a scripting language, running on a VM. To accomplish this, we had (as of September 2024) a compiler generating bytecode for the virtual machine, receiving an AST from the parser (and a few other passes like macro evaluation, name and scope resolution...).&lt;/p&gt;

&lt;h2&gt;
  
  
  Exploring new optimizations
&lt;/h2&gt;

&lt;p&gt;The only things we could optimize were the virtual machine, the memory layout of our values, and a few small things directly in the compiler, like &lt;a href="https://dev.to/lexplt/understanding-tail-call-optimization-3562"&gt;tail call optimization&lt;/a&gt;. Having implemented &lt;a href="https://dev.to/lexplt/implementing-computed-gotos-in-c-193p"&gt;computed gotos&lt;/a&gt; a few weeks ago, I think I've hit the limit in terms of feasible optimization for this VM.&lt;/p&gt;

&lt;p&gt;For a while, a friend tried to push me toward making an &lt;em&gt;intermediate representation&lt;/em&gt; for ArkScript. I shrugged it off, saying it was too much work, not really knowing what I would get into and having bigger fish to fry.&lt;/p&gt;

&lt;h3&gt;
  
  
  Biting the bullet
&lt;/h3&gt;

&lt;p&gt;At the end of September, I stumbled upon &lt;a href="https://peoplemaking.games/@eniko/113211730549249447" rel="noopener noreferrer"&gt;a post by Eniko on Mastodon&lt;/a&gt; (if you don't follow her already, what are you waiting for?), and the idea of making an IR came back into my mind... and I just started thinking about it, but seriously this time.&lt;/p&gt;

&lt;p&gt;What were my goals?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Optimizing the bytecode produced by the compiler, so that we could remove useless or redundant instructions;&lt;/li&gt;
&lt;li&gt;Replacing a series of instructions with a single, more specific one that does multiple things at once.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: operating directly on bytecode is hard, because we need either&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;to replace instructions with &lt;code&gt;NOP&lt;/code&gt; (which would still be decoded and run by the VM, only to do virtually nothing), or&lt;/li&gt;
&lt;li&gt;to recompute every single jump address after merging or removing instructions (this problem appears with both relative and absolute jumps).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Designing the IR
&lt;/h2&gt;

&lt;p&gt;The IR would have to solve this problem, otherwise it would be useless for its single job: helping to produce better bytecode. Let's see what we can work with:&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;compiler&lt;/em&gt;'s job is to flatten the AST (&lt;em&gt;Abstract Syntax Tree&lt;/em&gt;, our parsed code represented as a tree that we visit recursively) into a list of instructions. To simplify the calling convention, each function is compiled in a dedicated region that I call a &lt;strong&gt;page&lt;/strong&gt;. Inside a &lt;strong&gt;page&lt;/strong&gt;, each jump is relative to the first instruction, and the bytecode is essentially a &lt;code&gt;uint8_t[][]&lt;/code&gt;, that we can access using a &lt;code&gt;page pointer&lt;/code&gt; and an &lt;code&gt;instruction pointer&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  First draft
&lt;/h3&gt;

&lt;p&gt;My first idea was to output a tree of IR instructions instead of a list of instructions. That would still be flatter than the AST, and on paper, it would solve the jump problem as we have jumps only for loops and conditions.&lt;/p&gt;

&lt;p&gt;If we use a Lisp-like (ArkScript-like?) syntax, it could look like this&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(store a 0)   # a = 0
(setval a 5)  # a = 5
(load_symbol a)
(load_const 0)
(gt)          # (&amp;gt; a 0)
(if
    then...
    else...)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, I didn't really like this idea: it felt like reinventing the wheel with yet another tree, since handling a sequence of conditions such as &lt;code&gt;(if cond (if cond2 ...))&lt;/code&gt; would still require a non-flat structure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Second try: getting rid of jumps altogether
&lt;/h3&gt;

&lt;p&gt;This time, we will use a structure very similar, if not identical, to the bytecode, and get rid of all the &lt;code&gt;JUMP&lt;/code&gt; instructions. Taking inspiration from assembly, they will be replaced by &lt;strong&gt;labels&lt;/strong&gt; and &lt;strong&gt;gotos&lt;/strong&gt; in our IR! This way we can add and remove as many instructions as we want: as long as we leave the &lt;strong&gt;labels&lt;/strong&gt; in place, we can still compute their addresses later and compile our &lt;strong&gt;gotos&lt;/strong&gt; to absolute jumps without any issues.&lt;/p&gt;

&lt;p&gt;This is also the easiest solution, as it does not require me to entirely rewrite the compiler: I just have to add a wrapper for the &lt;code&gt;Instruction&lt;/code&gt;, that gets compiled to bytecode.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing the IR
&lt;/h2&gt;

&lt;p&gt;The wrapper is small, and was easy to implement, needing only an additional &lt;code&gt;Kind&lt;/code&gt; to differentiate final instructions (&lt;code&gt;Opcode&lt;/code&gt; and &lt;code&gt;Opcode2Args&lt;/code&gt;) and entities that require processing to be compiled to final instructions (&lt;code&gt;Goto&lt;/code&gt;, &lt;code&gt;GotoIfTrue&lt;/code&gt;, &lt;code&gt;GotoIfFalse&lt;/code&gt; all require the attached label to be computed first). The &lt;code&gt;Label&lt;/code&gt; is only there to get an address in the bytecode, and won't produce an instruction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Kind&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Goto&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;GotoIfTrue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;GotoIfFalse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Opcode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Opcode2Args&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;label_t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Entity&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="nl"&gt;public:&lt;/span&gt;
    &lt;span class="k"&gt;explicit&lt;/span&gt; &lt;span class="n"&gt;Entity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Kind&lt;/span&gt; &lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;explicit&lt;/span&gt; &lt;span class="n"&gt;Entity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Instruction&lt;/span&gt; &lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;arg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;Entity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Instruction&lt;/span&gt; &lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;primary_arg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;secondary_arg&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// tools to build IR entities easily&lt;/span&gt;
    &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;Entity&lt;/span&gt; &lt;span class="n"&gt;Label&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;Entity&lt;/span&gt; &lt;span class="n"&gt;Goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;Entity&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;Entity&lt;/span&gt; &lt;span class="n"&gt;GotoIf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;Entity&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;cond&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;nodiscard&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="n"&gt;Word&lt;/span&gt; &lt;span class="n"&gt;bytecode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="c1"&gt;// getters&lt;/span&gt;
    &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;nodiscard&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="n"&gt;label_t&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;m_label&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;nodiscard&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="n"&gt;Kind&lt;/span&gt; &lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;m_kind&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;nodiscard&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="n"&gt;Instruction&lt;/span&gt; &lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;m_inst&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;nodiscard&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;primaryArg&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;m_primary_arg&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;nodiscard&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;secondaryArg&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;m_secondary_arg&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;private&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;label_t&lt;/span&gt; &lt;span class="n"&gt;LabelCounter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;Kind&lt;/span&gt; &lt;span class="n"&gt;m_kind&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;label_t&lt;/span&gt; &lt;span class="n"&gt;m_label&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="n"&gt;Instruction&lt;/span&gt; &lt;span class="n"&gt;m_inst&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;NOP&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;m_primary_arg&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;m_secondary_arg&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;Block&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Entity&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Instructions in ArkScript
&lt;/h3&gt;

&lt;p&gt;It is also worth describing how instructions are represented in ArkScript. An instruction is encoded on four bytes: &lt;code&gt;iiiiiiii pppppppp aaaaaaaa aaaaaaaa&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;i&lt;/code&gt; represents the 8 instruction bits, giving us 256 possible instructions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;p&lt;/code&gt; is for padding, ignored in instructions with a single immediate argument&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;a&lt;/code&gt; represents the bits of the immediate argument, a total of 16 (0 -&amp;gt; 65'535)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Super Instructions can require up to two arguments, which are encoded using the padding: &lt;code&gt;iiiiiiii ssssssss ssssaaaa aaaaaaaa&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;we still have our instruction on the same byte,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;s&lt;/code&gt; represents the bits of the secondary argument, a total of 12 (0 -&amp;gt; 4095)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;a&lt;/code&gt; represents the bits of the primary argument, a total of 12 (0 -&amp;gt; 4095)&lt;/li&gt;
&lt;/ul&gt;
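
&lt;p&gt;To make the two layouts concrete, here is a minimal sketch of the decoding side, i.e. how four bytes could be read back into arguments. The names &lt;code&gt;Decoded&lt;/code&gt;, &lt;code&gt;decodeSingle&lt;/code&gt; and &lt;code&gt;decodeSuper&lt;/code&gt; are hypothetical and only there for illustration; this is not how the ArkScript VM is written. The sketch uses plain arithmetic (multiply, divide, remainder), which is equivalent to the usual shifts and masks.&lt;/p&gt;

```cpp
// Sketch only: Decoded, decodeSingle and decodeSuper are hypothetical names.
// x * 256 is a left shift by 8 bits, x * 16 a left shift by 4 bits,
// x / 16 a right shift by 4 bits, and x % 16 keeps the low 4 bits.

struct Decoded
{
    unsigned opcode;
    unsigned primary;    // the only argument of a normal instruction
    unsigned secondary;  // only meaningful for two-argument instructions
};

// Layout: iiiiiiii pppppppp aaaaaaaa aaaaaaaa
// The second byte is padding and is ignored.
constexpr Decoded decodeSingle(unsigned b0, unsigned /* padding */, unsigned b2, unsigned b3)
{
    return Decoded { b0, b2 * 256 + b3, 0u };
}

// Layout: iiiiiiii ssssssss ssssaaaa aaaaaaaa
// The third byte is shared: its high half ends the secondary argument,
// its low half starts the primary argument.
constexpr Decoded decodeSuper(unsigned b0, unsigned b1, unsigned b2, unsigned b3)
{
    return Decoded { b0, (b2 % 16) * 256 + b3, b1 * 16 + b2 / 16 };
}

// Compile-time checks on hand-computed values.
static_assert(decodeSingle(1, 0, 0x12, 0x34).primary == 0x1234, "16-bit argument");
static_assert(decodeSuper(1, 0xAB, 0xC1, 0x23).primary == 0x123, "12-bit primary");
static_assert(decodeSuper(1, 0xAB, 0xC1, 0x23).secondary == 0xABC, "12-bit secondary");
```

&lt;p&gt;The shared third byte is the reason both arguments top out at 4095 in the two-argument form.&lt;/p&gt;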

&lt;h3&gt;
  
  
  Compiling an IR entity to bytecode
&lt;/h3&gt;

&lt;p&gt;Since some instructions can take two arguments, the instruction-to-bytecode helper had to be updated. Implementation for reference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;Word&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="n"&gt;opcode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="n"&gt;byte_1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="n"&gt;byte_2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="n"&gt;byte_3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;explicit&lt;/span&gt; &lt;span class="n"&gt;Word&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;arg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;opcode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="c1"&gt;// byte_1 = 0, this is our padding here&lt;/span&gt;
        &lt;span class="n"&gt;byte_2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
        &lt;span class="n"&gt;byte_3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0xff&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="n"&gt;Word&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;primary_arg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;secondary_arg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;opcode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;byte_1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;secondary_arg&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0xff0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;byte_2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;secondary_arg&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0x00f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;primary_arg&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0xf00&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;byte_3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;primary_arg&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0x0ff&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The rest is pretty straightforward: instead of having the &lt;strong&gt;Compiler&lt;/strong&gt; output bytecode directly, it now outputs IR entities, and a new &lt;strong&gt;IRCompiler&lt;/strong&gt; has been introduced. All it has to do is map an IR entity to its instruction, and turn it into a &lt;code&gt;Word&lt;/code&gt; so that it can be written to disk.&lt;/p&gt;

&lt;p&gt;An important step is to compute the label addresses so that we can generate correct &lt;code&gt;JUMP&lt;/code&gt;, &lt;code&gt;POP_JUMP_IF_TRUE&lt;/code&gt; and &lt;code&gt;POP_JUMP_IF_FALSE&lt;/code&gt; instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;unordered_map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IR&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;label_t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;label_to_position&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;inst&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;IR&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Kind&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;label_to_position&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="c1"&gt;// the label isn't an instruction,&lt;/span&gt;
            &lt;span class="c1"&gt;// do not update `pos` here&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="nl"&gt;default:&lt;/span&gt;
            &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
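
&lt;p&gt;Once the positions are known, each goto can be rewritten into a jump to the recorded address. Below is a minimal sketch of both passes together, under loud assumptions: &lt;code&gt;Entity&lt;/code&gt;, &lt;code&gt;Resolved&lt;/code&gt; and &lt;code&gt;resolve&lt;/code&gt; are simplified, hypothetical stand-ins rather than the real ArkScript types, conditional gotos are left out, and plain arrays replace the containers so the example has no library dependency.&lt;/p&gt;

```cpp
// Hypothetical, simplified stand-ins for the IR entities of the article.
enum class Kind
{
    Label,
    Goto,
    Opcode
};

struct Entity
{
    Kind kind;
    unsigned value;  // label id for Label and Goto, argument for Opcode
};

struct Resolved
{
    bool is_jump;
    unsigned arg;  // jump address for a former Goto, else the original argument
};

// Two passes over one page. The caller provides `positions` with one slot
// per label id, and `out` with one slot per non-label entity.
// Returns the number of instructions written to `out`.
unsigned resolve(const Entity* page, unsigned count, unsigned* positions, Resolved* out)
{
    // Pass 1: record where each label points. A label emits nothing,
    // so it does not advance the position.
    unsigned pos = 0;
    for (unsigned i = 0; i != count; ++i)
    {
        if (page[i].kind == Kind::Label)
            positions[page[i].value] = pos;
        else
            ++pos;
    }

    // Pass 2: drop the labels and turn every goto into a jump
    // to the position recorded for its label.
    unsigned written = 0;
    for (unsigned i = 0; i != count; ++i)
    {
        const Entity e = page[i];
        if (e.kind == Kind::Label)
            continue;
        if (e.kind == Kind::Goto)
            out[written] = { true, positions[e.value] };
        else
            out[written] = { false, e.value };
        ++written;
    }
    return written;
}
```

&lt;p&gt;Doing the label pass first is what allows a goto to target a label that appears later in the page.&lt;/p&gt;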



&lt;h3&gt;
  
  
  Detecting sequences of entities that can be combined
&lt;/h3&gt;

&lt;p&gt;This one was way easier than I thought: all we have to do is iterate over the given IR blocks and match the current entity and the next one against a known pattern. The only thing to be careful about is skipping over the instructions we managed to fuse, so that we do not push leftover instructions into our optimized IR.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="n"&gt;IROptimizer&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IR&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;symbols&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ValTableElem&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;m_symbols&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;symbols&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;m_values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;m_ir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emplace_back&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="n"&gt;IR&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Block&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;current_block&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m_ir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;back&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// alias to ease the writing of the rules below:&lt;/span&gt;
            &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;Instruction&lt;/span&gt; &lt;span class="n"&gt;first&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;arg_1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;primaryArg&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

            &lt;span class="c1"&gt;// if we have at least two instructions left,&lt;/span&gt;
            &lt;span class="c1"&gt;// we can try to match for super instructions patterns&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;Instruction&lt;/span&gt; &lt;span class="n"&gt;second&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
                &lt;span class="c1"&gt;// only the `primaryArg` is needed as we will check&lt;/span&gt;
                &lt;span class="c1"&gt;// for normal instructions below, which have&lt;/span&gt;
                &lt;span class="c1"&gt;// a single argument&lt;/span&gt;
                &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;arg_2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;primaryArg&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

                &lt;span class="c1"&gt;// LOAD_CONST x&lt;/span&gt;
                &lt;span class="c1"&gt;// LOAD_CONST y&lt;/span&gt;
                &lt;span class="c1"&gt;// ---&amp;gt; LOAD_CONST_LOAD_CONST x y&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;LOAD_CONST&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;second&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;LOAD_CONST&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="n"&gt;current_block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emplace_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;LOAD_CONST_LOAD_CONST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg_1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg_2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                    &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="c1"&gt;// ...&lt;/span&gt;
                &lt;span class="k"&gt;else&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="c1"&gt;// otherwise we should not forget to add the&lt;/span&gt;
                    &lt;span class="c1"&gt;// other instructions to the output IR, we don't&lt;/span&gt;
                    &lt;span class="c1"&gt;// want just the optimized IR!&lt;/span&gt;
                    &lt;span class="n"&gt;current_block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emplace_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
                    &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;current_block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emplace_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
                &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Performance gain!
&lt;/h2&gt;

&lt;p&gt;Who would have thought that avoiding a series of &lt;code&gt;LOAD_CONST&lt;/code&gt;, &lt;code&gt;STORE&lt;/code&gt; and using a single &lt;code&gt;LOAD_CONST_STORE&lt;/code&gt; instruction would be so beneficial? We are avoiding a push -&amp;gt; pop and immediately putting a value from our constants table inside a variable.&lt;/p&gt;

&lt;p&gt;Applying the same pattern to increments (&lt;code&gt;LOAD_SYMBOL a&lt;/code&gt;, &lt;code&gt;LOAD_CONST 1&lt;/code&gt;, &lt;code&gt;ADD&lt;/code&gt; becomes &lt;code&gt;INCREMENT a&lt;/code&gt;), decrements, and storing the head or tail of a list in a variable helps tremendously, according to the benchmarks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                          |           | 5-c7f632ff   | 6-28999c0f
--------------------------+-----------+--------------+--------------------
 quicksort                | real_time | 0.168728ms   | -0.014 (-8.2280%)
                          | cpu_time  | 0.168515ms   | -0.014 (-8.2479%)
 ackermann/iterations:50  | real_time | 68.31ms      | -7.278 (-10.6545%)
                          | cpu_time  | 68.2342ms    | -7.259 (-10.6384%)
 fibonacci/iterations:100 | real_time | 6.62604ms    | -0.160 (-2.4159%)
                          | cpu_time  | 6.61819ms    | -0.161 (-2.4266%)
 man_or_boy               | real_time | 0.0169651ms  | -0.001 (-7.6834%)
                          | cpu_time  | 0.0160228ms  | -0.000 (-2.3673%)
 builtins                 | real_time | 0.622685ms   | -0.037 (-5.8815%)
                          | cpu_time  | 0.621938ms   | -0.037 (-5.9495%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first column, &lt;code&gt;5-c7f632ff&lt;/code&gt;, is our reference benchmark (based on ArkScript commit &lt;code&gt;684ea758&lt;/code&gt;), and the second one, &lt;code&gt;6-28999c0f&lt;/code&gt;, is the result of implementing our IR and IR optimizer. Quite the improvement: we gained roughly 2% to 10% on every single benchmark!&lt;/p&gt;
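
&lt;p&gt;To make the increment rule concrete, here is a minimal sketch of such a fusion pass, written in Python for brevity rather than in the C++ of the optimizer shown earlier. The tuple-based IR, the &lt;code&gt;fuse_increments&lt;/code&gt; name and the choice of storing the constant's value directly in &lt;code&gt;LOAD_CONST&lt;/code&gt; (instead of an index into the constants table) are assumptions made for illustration, not ArkScript's actual implementation:&lt;/p&gt;

```python
# Hypothetical IR: each instruction is a (name, argument) tuple.
# Assumption: LOAD_CONST carries the constant's value, not a table index.
def fuse_increments(block):
    """Replace LOAD_SYMBOL a / LOAD_CONST 1 / ADD with INCREMENT a."""
    out = []
    i = 0
    while i != len(block):
        # look at up to three instructions starting at position i
        window = block[i:i + 3]
        names = [name for name, _ in window]
        if names == ["LOAD_SYMBOL", "LOAD_CONST", "ADD"] and window[1][1] == 1:
            # the three instructions collapse into a single one
            out.append(("INCREMENT", window[0][1]))
            i += 3
        else:
            # no match: copy the instruction to the output unchanged
            out.append(block[i])
            i += 1
    return out


program = [
    ("LOAD_SYMBOL", "a"),
    ("LOAD_CONST", 1),
    ("ADD", None),
    ("STORE", "b"),
]
optimized = fuse_increments(program)
```

&lt;p&gt;With this input, &lt;code&gt;optimized&lt;/code&gt; holds only &lt;code&gt;INCREMENT a&lt;/code&gt; followed by &lt;code&gt;STORE b&lt;/code&gt;. As in the C++ pass, instructions that match no rule are copied as-is.&lt;/p&gt;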

&lt;p&gt;&lt;em&gt;Originally from &lt;a href="https://lexp.lt/posts/implementing_an_intermediate_representation/" rel="noopener noreferrer"&gt;lexp.lt&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>optimization</category>
      <category>pldev</category>
      <category>cpp</category>
      <category>arkscript</category>
    </item>
    <item>
      <title>Implementing computed gotos in C++</title>
      <dc:creator>Lex Plt</dc:creator>
      <pubDate>Wed, 25 Sep 2024 11:19:49 +0000</pubDate>
      <link>https://dev.to/lexplt/implementing-computed-gotos-in-c-193p</link>
      <guid>https://dev.to/lexplt/implementing-computed-gotos-in-c-193p</guid>
      <description>&lt;p&gt;A common idiom in virtual machines or state machines is to read data from a list, execute some code depending on the value we read, advance in the list, rinse and repeat. That could be written as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;bytecode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;readBytecode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;bytecode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="n"&gt;inst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bytecode&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;DO_THIS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;// ...&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;DO_THAT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;// ...&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="c1"&gt;// ...&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While this works, there are other ways to do the same thing with better performance. The current approach isn't very branch-predictor friendly because we read data from a vector, select the code to execute, break, and repeat: every instruction goes through the same dispatch point, so the branch predictor cannot learn which instructions commonly follow each other.&lt;/p&gt;

&lt;h2&gt;
  
  
  Definition
&lt;/h2&gt;

&lt;p&gt;A way to solve that is to use &lt;em&gt;computed gotos&lt;/em&gt;. We read an instruction, load the next one and jump to its code. No more instruction selection, just a single stream of &lt;em&gt;code to run&lt;/em&gt; -&amp;gt; &lt;em&gt;jump&lt;/em&gt; -&amp;gt; &lt;em&gt;repeat&lt;/em&gt;, which pleases the branch predictor. It can now learn that instruction Y often follows instruction X and preload its code (even though it can still mispredict, which degrades performance).&lt;/p&gt;

&lt;h2&gt;
  
  
  Modifying our code for computed gotos
&lt;/h2&gt;

&lt;p&gt;We'll use the code shown previously as a basis.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using gotos
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;bytecode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;readBytecode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="n"&gt;inst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bytecode&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;dispatch_op:&lt;/span&gt;
    &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;DO_THIS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;// ...&lt;/span&gt;
            &lt;span class="n"&gt;inst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bytecode&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
            &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;goto&lt;/span&gt; &lt;span class="n"&gt;dispatch_op&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;DO_THAT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;// ...&lt;/span&gt;
            &lt;span class="n"&gt;inst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bytecode&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
            &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;goto&lt;/span&gt; &lt;span class="n"&gt;dispatch_op&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="c1"&gt;// ...&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we replaced our &lt;code&gt;while&lt;/code&gt; loop with a &lt;code&gt;switch&lt;/code&gt;, a label and a &lt;code&gt;goto&lt;/code&gt;, nearly achieving the same thing as before. &lt;em&gt;Nearly&lt;/em&gt; because we are now loading the next instruction before our &lt;code&gt;goto&lt;/code&gt;, and we don't have any &lt;code&gt;if (pos &amp;lt; bytecode.size())&lt;/code&gt; anymore.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;[!NOTE]&lt;br&gt;
While we could add an &lt;code&gt;if (end of bytecode)&lt;/code&gt; condition before our goto, an easier solution would be to add a special &lt;code&gt;STOP_INTERPRETER&lt;/code&gt; instruction, implemented like this:&lt;/p&gt;


&lt;pre class="highlight plaintext"&gt;&lt;code&gt;case STOP_INTERPRETER:
    break;  // or another goto label_end;
            // with label_end after the switch
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;

&lt;p&gt;This code isn't any faster or slower than the previous implementation. It is just another way to write a loop (though not a recommended one, due to the presence of &lt;code&gt;goto&lt;/code&gt;s), but it will help us with the next transformations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Making our code slightly better with macros
&lt;/h3&gt;

&lt;p&gt;Now, we have to add instruction fetching into every case. We could make this easier using macros:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="cp"&gt;#define FETCH_INSTRUCTION()    \
    do {                       \
        inst = bytecode[pos];  \
        pos++;                 \
    } while (false)
#define DISPATCH_GOTO() goto dispatch_op
#define DISPATCH()        \
    FETCH_INSTRUCTION();  \
    DISPATCH_GOTO()
&lt;/span&gt;
&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;bytecode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;readBytecode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="n"&gt;inst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bytecode&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;dispatch_op:&lt;/span&gt;
    &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;DO_THIS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;// ...&lt;/span&gt;
            &lt;span class="n"&gt;DISPATCH&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;DO_THAT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;// ...&lt;/span&gt;
            &lt;span class="n"&gt;DISPATCH&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;STOP_INTERPRETER&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;goto&lt;/span&gt; &lt;span class="n"&gt;label_end&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="c1"&gt;// ...&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;label_end&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;// we can't have a label at the end of a block in C++98-20,&lt;/span&gt;
    &lt;span class="c1"&gt;// this only works in C++23 and onward&lt;/span&gt;
    &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Computed gotos
&lt;/h3&gt;

&lt;p&gt;With a GCC extension (also supported by Clang), we can take the address of a label with &lt;code&gt;&amp;amp;&amp;amp;label&lt;/code&gt; and store it in an array. Using this, we can &lt;code&gt;goto *array[index];&lt;/code&gt; and jump to a given label. What if we put one label per instruction now?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="cp"&gt;#define FETCH_INSTRUCTION()    \
    do {                       \
        inst = bytecode[pos];  \
        pos++;                 \
    } while (false)
&lt;/span&gt;
&lt;span class="cp"&gt;#define DISPATCH_GOTO() goto *opcodes[inst]
#define TARGET(op) TARGET_##op:
&lt;/span&gt;
&lt;span class="cp"&gt;#define DISPATCH()        \
    FETCH_INSTRUCTION();  \
    DISPATCH_GOTO()
&lt;/span&gt;
&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;bytecode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;readBytecode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="n"&gt;inst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bytecode&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// one label per instruction, in the same order as the opcode values&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt; &lt;span class="n"&gt;opcodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;TARGET_DO_THIS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;TARGET_DO_THAT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;TARGET_STOP_INTERPRETER&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="c1"&gt;// jump to the code of the first instruction&lt;/span&gt;
    &lt;span class="n"&gt;DISPATCH_GOTO&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;TARGET&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DO_THIS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// ...&lt;/span&gt;
            &lt;span class="n"&gt;DISPATCH&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;TARGET&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DO_THAT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// ...&lt;/span&gt;
            &lt;span class="n"&gt;DISPATCH&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;TARGET&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;STOP_INTERPRETER&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;goto&lt;/span&gt; &lt;span class="n"&gt;label_end&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="c1"&gt;// ...&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;label_end&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Everything together
&lt;/h3&gt;

&lt;p&gt;With some conditions and more macros, we could have a dual implementation, generating a &lt;code&gt;switch&lt;/code&gt; or a computed gotos table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="cp"&gt;#define FETCH_INSTRUCTION()    \
    do {                       \
        inst = bytecode[pos];  \
        pos++;                 \
    } while (false)
&lt;/span&gt;
&lt;span class="cp"&gt;#if USE_COMPUTED_GOTO
#  define DISPATCH_GOTO() goto *opcodes[inst]
#  define TARGET(op) TARGET_##op:
#else
#  define DISPATCH_GOTO() goto dispatch_op
#  define TARGET(op) case op:
#endif
&lt;/span&gt;
&lt;span class="cp"&gt;#define DISPATCH()        \
    FETCH_INSTRUCTION();  \
    DISPATCH_GOTO()
&lt;/span&gt;
&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;bytecode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;readBytecode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="n"&gt;inst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bytecode&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="cp"&gt;#if !USE_COMPUTED_GOTO
&lt;/span&gt;    &lt;span class="nl"&gt;dispatch_op:&lt;/span&gt;
    &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inst&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="cp"&gt;#else
&lt;/span&gt;    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt; &lt;span class="n"&gt;opcodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;TARGET_DO_THIS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;TARGET_DO_THAT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;TARGET_STOP_INTERPRETER&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="c1"&gt;// jump to the code of the first instruction&lt;/span&gt;
    &lt;span class="n"&gt;DISPATCH_GOTO&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="cp"&gt;#endif
&lt;/span&gt;    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;TARGET&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DO_THIS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// ...&lt;/span&gt;
            &lt;span class="n"&gt;DISPATCH&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;TARGET&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DO_THAT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// ...&lt;/span&gt;
            &lt;span class="n"&gt;DISPATCH&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;TARGET&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;STOP_INTERPRETER&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;goto&lt;/span&gt; &lt;span class="n"&gt;label_end&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="c1"&gt;// ...&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;label_end&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;I've implemented this in &lt;a href="https://arkscript-lang.dev" rel="noopener noreferrer"&gt;ArkScript&lt;/a&gt;, a small scripting language I've been working on for a few years now, and this has yielded a performance improvement of roughly 2% to 8%, depending on the benchmark:&lt;/p&gt;

&lt;p&gt;Machine (M1 MBP):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run on (8 X 24 MHz CPU s)&lt;/li&gt;
&lt;li&gt;CPU Caches:

&lt;ul&gt;
&lt;li&gt;L1 Data 64 KiB&lt;/li&gt;
&lt;li&gt;L1 Instruction 128 KiB&lt;/li&gt;
&lt;li&gt;L2 Unified 4096 KiB (x8)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Load Average: 3.62, 2.42, 2.46
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
quicksort                             0.223 ms        0.222 ms         3125
ackermann/iterations:50                97.0 ms         96.9 ms           50
fibonacci/iterations:100               9.23 ms         9.22 ms          100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Load Average: 2.87, 2.73, 3.07
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
quicksort                             0.218 ms        0.218 ms         3231
ackermann/iterations:50                88.9 ms         88.9 ms           50
fibonacci/iterations:100               8.58 ms         8.57 ms          100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
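
&lt;p&gt;The relative gain per benchmark can be worked out from the two tables above. A minimal sketch in Python, using the &lt;code&gt;Time&lt;/code&gt; column values copied from those tables (the dictionary and function names are only illustrative):&lt;/p&gt;

```python
# Wall-clock times in milliseconds, copied from the "Before" and "After" tables.
before = {"quicksort": 0.223, "ackermann": 97.0, "fibonacci": 9.23}
after = {"quicksort": 0.218, "ackermann": 88.9, "fibonacci": 8.58}


def improvement_percent(old, new):
    """Share of the original run time that was saved, in percent."""
    return (old - new) / old * 100.0


gains = {
    name: round(improvement_percent(before[name], after[name]), 1)
    for name in before
}

for name, gain in gains.items():
    print(f"{name}: {gain}% less time")
```

&lt;p&gt;With these numbers, the gains range from about 2% (quicksort) to a bit more than 8% (ackermann).&lt;/p&gt;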



&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.codingconfessions.com/p/cpython-vm-internals" rel="noopener noreferrer"&gt;The Design &amp;amp; Implementation of the CPython Virtual Machine&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Originally from &lt;a href="https://lexp.lt/posts/computed_gotos/" rel="noopener noreferrer"&gt;lexp.lt&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>optimization</category>
    </item>
    <item>
      <title>Adding short-circuiting in a bytecode interpreter</title>
      <dc:creator>Lex Plt</dc:creator>
      <pubDate>Mon, 16 Sep 2024 13:44:51 +0000</pubDate>
      <link>https://dev.to/lexplt/adding-short-circuiting-in-a-bytecode-interpreter-37lj</link>
      <guid>https://dev.to/lexplt/adding-short-circuiting-in-a-bytecode-interpreter-37lj</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/lexplt/function-calls-in-bytecode-interpreters-2l90"&gt;In the previous article&lt;/a&gt;, we saw how to compile functions and handle their scopes. We also saw how to optimize a certain kind of function call, the tail call, in &lt;a href="https://dev.to/lexplt/understanding-tail-call-optimization-3562"&gt;Understanding tail call optimization&lt;/a&gt;. Now we are going to see yet another optimization, this time on conditions and expression evaluation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Basic way to deal with conditionals
&lt;/h2&gt;

&lt;p&gt;Following the previous article(s), we could implement &lt;code&gt;and&lt;/code&gt; / &lt;code&gt;or&lt;/code&gt; as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# a and b
LOAD a
LOAD b
AND
# foo() and bar()
LOAD foo
CALL 0
LOAD bar
CALL 0
AND
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, each argument has to be evaluated before calling the operator, which is not always what we want! In mainstream languages like C, C++, Java, Python... the expression &lt;code&gt;a and b&lt;/code&gt; would evaluate &lt;code&gt;b&lt;/code&gt; only if &lt;code&gt;a&lt;/code&gt; is &lt;strong&gt;true&lt;/strong&gt;. If &lt;code&gt;a&lt;/code&gt; is &lt;strong&gt;false&lt;/strong&gt;, there is no need to evaluate the rest of the expression: it will be &lt;strong&gt;false&lt;/strong&gt;. The same goes for &lt;code&gt;or&lt;/code&gt;: if &lt;code&gt;a&lt;/code&gt; is &lt;strong&gt;true&lt;/strong&gt;, there is no need to evaluate the rest of the expression: it will be &lt;strong&gt;true&lt;/strong&gt;.&lt;/p&gt;
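
&lt;p&gt;The difference becomes observable as soon as the operands have side effects. Below is a small Python sketch; &lt;code&gt;foo&lt;/code&gt;, &lt;code&gt;bar&lt;/code&gt; and the &lt;code&gt;calls&lt;/code&gt; list are hypothetical helpers used only to record which operand was evaluated:&lt;/p&gt;

```python
calls = []


def foo():
    calls.append("foo")
    return False


def bar():
    calls.append("bar")
    return True


# Eager evaluation, as in the bytecode above: both operands are computed
# before the AND operator gets to run.
eager_args = [foo(), bar()]
eager_result = all(eager_args)
eager_calls = list(calls)

calls.clear()

# Short-circuit evaluation: Python's `and` skips bar() because foo() is falsy.
lazy_result = foo() and bar()
lazy_calls = list(calls)
```

&lt;p&gt;Both versions produce a false result, but the eager one calls &lt;code&gt;foo&lt;/code&gt; and &lt;code&gt;bar&lt;/code&gt;, while the short-circuiting one only calls &lt;code&gt;foo&lt;/code&gt;.&lt;/p&gt;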

&lt;h2&gt;
  
  
  Understanding how short-circuiting works
&lt;/h2&gt;

&lt;p&gt;What we want is to evaluate the arguments of an expression one at a time, and stop immediately if we get a &lt;strong&gt;false&lt;/strong&gt; in an &lt;code&gt;and&lt;/code&gt; expression, or a &lt;strong&gt;true&lt;/strong&gt; in an &lt;code&gt;or&lt;/code&gt; expression. In Python you could write it like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# vvv
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# else: ...
&lt;/span&gt;
&lt;span class="c1"&gt;# -----------------
&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# vvv
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;# &amp;lt;-+--- this is exactly the same code,
&lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;                   &lt;span class="c1"&gt;#   |    actually not duplicated once
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;               &lt;span class="c1"&gt;#   |    compiled.
&lt;/span&gt;        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# &amp;lt;-/
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;(and) We check if &lt;code&gt;a&lt;/code&gt; is &lt;strong&gt;true&lt;/strong&gt;, if not do nothing. Otherwise, evaluate &lt;code&gt;b&lt;/code&gt; and act accordingly.&lt;/li&gt;
&lt;li&gt;(or) We check if &lt;code&gt;a&lt;/code&gt; is &lt;strong&gt;true&lt;/strong&gt; and execute our code. Otherwise evaluate &lt;code&gt;b&lt;/code&gt; and if &lt;strong&gt;true&lt;/strong&gt; execute our code.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Implementation in bytecode
&lt;/h2&gt;

&lt;p&gt;Let's see how &lt;code&gt;and&lt;/code&gt; is implemented in &lt;a href="https://arkscript-lang.dev" rel="noopener noreferrer"&gt;ArkScript&lt;/a&gt; (as of 16/09/2024):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LOAD_SYMBOL a
DUP
POP_JUMP_IF_FALSE (after)  ---`
POP                           |
LOAD_SYMBOL b                 |
                  &amp;lt;-----------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;We load our &lt;code&gt;a&lt;/code&gt; variable as before&lt;/li&gt;
&lt;li&gt;We duplicate it for later use&lt;/li&gt;
&lt;li&gt;We pop the duplicate: if it is &lt;strong&gt;false&lt;/strong&gt;, we jump after the expression, on the next instruction (which could be a store or another compare); thanks to the duplicate, we still have the &lt;strong&gt;false&lt;/strong&gt; value on the stack at the end&lt;/li&gt;
&lt;li&gt;If it is &lt;strong&gt;true&lt;/strong&gt;, we pop the original value as we don't need it anymore, and just load &lt;code&gt;b&lt;/code&gt;. As in &lt;code&gt;3.&lt;/code&gt;, we continue with the next instruction with no special treatment, now having our boolean loaded to compare it, store it...&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;And that's it, we don't have to have a specific implementation to handle &lt;code&gt;if a and b&lt;/code&gt; and &lt;code&gt;val = a and b&lt;/code&gt; separately, as we jump on the next instruction, which might be a &lt;code&gt;POP_JUMP_IF_FALSE&lt;/code&gt; in the case of a condition (to avoid executing the code of the condition), or a &lt;code&gt;STORE&lt;/code&gt; in case of a variable assignment.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;[!NOTE]&lt;br&gt;
Implementing &lt;code&gt;or&lt;/code&gt; would be practically identical, apart from the &lt;code&gt;POP_JUMP_IF_FALSE&lt;/code&gt; instruction that would be a &lt;code&gt;POP_JUMP_IF_TRUE&lt;/code&gt;!&lt;/p&gt;
&lt;/blockquote&gt;
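
&lt;p&gt;To make the instruction sequence above more concrete, here is a minimal sketch of a toy stack VM in Python. Only the instruction names come from the listing above: the &lt;code&gt;run&lt;/code&gt; and &lt;code&gt;compile_short_circuit&lt;/code&gt; functions are hypothetical, and variables are modelled as callables purely so that we can observe which ones get evaluated. It is not how the ArkScript VM is written.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy stack VM, only meant to illustrate the instruction sequence above.
# Instruction names follow the article, everything else is hypothetical.
def run(program, variables):
    stack = []
    ip = 0  # instruction pointer
    while ip != len(program):
        op, arg = program[ip]
        ip += 1
        if op == "LOAD_SYMBOL":
            # variables are callables so that we can track their evaluation
            stack.append(variables[arg]())
        elif op == "DUP":
            stack.append(stack[-1])
        elif op == "POP":
            stack.pop()
        elif op == "POP_JUMP_IF_FALSE":
            if not stack.pop():
                ip = arg
        elif op == "POP_JUMP_IF_TRUE":
            if stack.pop():
                ip = arg
    return stack[-1]


def compile_short_circuit(jump_op, left, right):
    # the jump target (5) is the index just after the last instruction
    return [
        ("LOAD_SYMBOL", left),
        ("DUP", None),
        (jump_op, 5),
        ("POP", None),
        ("LOAD_SYMBOL", right),
    ]


evaluated = []


def tracked(name, value):
    def load():
        evaluated.append(name)
        return value
    return load


variables = {"a": tracked("a", False), "b": tracked("b", True)}
and_program = compile_short_circuit("POP_JUMP_IF_FALSE", "a", "b")
print(run(and_program, variables), evaluated)  # False ['a']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With &lt;code&gt;a&lt;/code&gt; being &lt;strong&gt;false&lt;/strong&gt;, &lt;code&gt;b&lt;/code&gt; is never loaded. Passing &lt;code&gt;"POP_JUMP_IF_TRUE"&lt;/code&gt; to the same hypothetical compile function gives the &lt;code&gt;or&lt;/code&gt; variant.&lt;/p&gt;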

&lt;p&gt;&lt;em&gt;Originally from &lt;a href="https://lexp.lt/posts/shortcircuiting_in_bytecode_interpreter/" rel="noopener noreferrer"&gt;lexp.lt&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>bytecode</category>
      <category>vm</category>
      <category>interpreter</category>
      <category>programming</category>
    </item>
    <item>
      <title>Comparing Python and ArkScript asynchronous models</title>
      <dc:creator>Lex Plt</dc:creator>
      <pubDate>Tue, 03 Sep 2024 21:02:29 +0000</pubDate>
      <link>https://dev.to/lexplt/comparing-python-and-arkscript-asynchronous-models-3l60</link>
      <guid>https://dev.to/lexplt/comparing-python-and-arkscript-asynchronous-models-3l60</guid>
      <description>&lt;p&gt;Python has received a lot of attention lately. The 3.13 release, planned for October this year, will begin the huge work of &lt;a href="https://peps.python.org/pep-0703/" rel="noopener noreferrer"&gt;removing the GIL&lt;/a&gt;. A &lt;a href="https://www.python.org/downloads/release/python-3130rc1/" rel="noopener noreferrer"&gt;prerelease&lt;/a&gt; is already out for the curious users who want to try a (nearly) GIL-less Python.&lt;/p&gt;

&lt;p&gt;All this hype made me dig into my own language, &lt;a href="https://arkscript-lang.dev" rel="noopener noreferrer"&gt;ArkScript&lt;/a&gt;, which also had a Global VM Lock in the past (added in version 3.0.12, in 2020, removed in 3.1.3 in 2022), to compare the two and to push myself to dig deeper into the how and why of the Python GIL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Definitions
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;To get started, let's define what a GIL (&lt;em&gt;Global interpreter lock&lt;/em&gt;) is:&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;A global interpreter lock (GIL) is a mechanism used in computer-language interpreters to synchronize the execution of threads so that only one native thread (per process) can execute basic operations (such as memory allocation and reference counting) at a time.&lt;br&gt;&lt;br&gt;
&lt;a href="https://en.m.wikipedia.org/wiki/Global_interpreter_lock" rel="noopener noreferrer"&gt;Wikipedia — Global interpreter lock&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ol start="2"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Concurrency&lt;/strong&gt; is when two or more tasks can start, run and complete in overlapping time periods, but that doesn't mean they will both be running simultaneously.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Parallelism&lt;/strong&gt; is when tasks literally run at the same time, e.g. on a multicore processor.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For an in-depth explanation, check &lt;a href="https://stackoverflow.com/a/24684037" rel="noopener noreferrer"&gt;this Stack Overflow answer&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Python's GIL
&lt;/h2&gt;

&lt;p&gt;The GIL can &lt;em&gt;increase the speed of single-threaded programs&lt;/em&gt; because you don't have to acquire and release locks on all data structures: the entire interpreter is locked so you are safe by default.&lt;/p&gt;

&lt;p&gt;However, since there is one GIL per interpreter, that limits parallelism: you need to spawn a whole new interpreter in a separate process (using the &lt;code&gt;multiprocessing&lt;/code&gt; module instead of &lt;code&gt;threading&lt;/code&gt;) to use more than one core! This has a greater cost than just spawning a new thread because you now have to worry about inter-process communication, which adds a non-negligible overhead (see &lt;a href="https://geekpython.in/gil-become-optional-in-python" rel="noopener noreferrer"&gt;GeekPython — GIL Become Optional in Python 3.13&lt;/a&gt; for benchmarks).&lt;/p&gt;

&lt;h3&gt;
  
  
  How does it affect Python's async?
&lt;/h3&gt;

&lt;p&gt;In the case of Python, it boils down to the main implementation, CPython, not having thread-safe memory management. Without the GIL, the following scenario would generate a race condition:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;create a shared variable &lt;code&gt;count = 5&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;thread 1: &lt;code&gt;count *= 2&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;thread 2: &lt;code&gt;count += 1&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If &lt;strong&gt;thread 1&lt;/strong&gt; runs first, &lt;code&gt;count&lt;/code&gt; will be 11 (&lt;code&gt;count * 2&lt;/code&gt; = 10, then &lt;code&gt;count + 1&lt;/code&gt; = 11).&lt;br&gt;&lt;br&gt;
If &lt;strong&gt;thread 2&lt;/strong&gt; runs first, &lt;code&gt;count&lt;/code&gt; will be 12 (&lt;code&gt;count + 1&lt;/code&gt; = 6, then &lt;code&gt;count * 2&lt;/code&gt; = 12).&lt;br&gt;&lt;br&gt;
The order of execution matters, but even worse can happen: if both threads read &lt;code&gt;count&lt;/code&gt; at the same time, one will erase the result of the other, and &lt;code&gt;count&lt;/code&gt; will be either 10 or 6!&lt;/p&gt;
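
&lt;p&gt;The four possible outcomes can be replayed deterministically. The sketch below does not use real threads: it assumes each update is split into a "read" step and a "write" step, and the hypothetical &lt;code&gt;simulate&lt;/code&gt; function simply replays a given schedule of those steps.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def simulate(schedule):
    """Replay a list of (thread, action) steps on a shared count of 5."""
    count = 5
    seen = {}  # value each thread read before computing its update
    for thread, action in schedule:
        if action == "read":
            seen[thread] = count
        elif thread == "t1":
            count = seen["t1"] * 2  # thread 1: count *= 2
        else:
            count = seen["t2"] + 1  # thread 2: count += 1
    return count


t1_first = [("t1", "read"), ("t1", "write"), ("t2", "read"), ("t2", "write")]
t2_first = [("t2", "read"), ("t2", "write"), ("t1", "read"), ("t1", "write")]
# both threads read 5 before either of them writes: one update is lost
t2_writes_last = [("t1", "read"), ("t2", "read"), ("t1", "write"), ("t2", "write")]
t1_writes_last = [("t1", "read"), ("t2", "read"), ("t2", "write"), ("t1", "write")]

print(simulate(t1_first))        # 11
print(simulate(t2_first))        # 12
print(simulate(t2_writes_last))  # 6
print(simulate(t1_writes_last))  # 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;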

&lt;p&gt;Overall, having a GIL makes the (CPython) implementation easier and faster in general cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;faster in the single-threaded case (no need to acquire/release a lock for every operation)&lt;/li&gt;
&lt;li&gt;faster in the multi-threaded case for IO-bound programs (because the GIL is released while waiting on IO)&lt;/li&gt;
&lt;li&gt;faster in the multi-threaded case for CPU-bound programs that do their compute-intensive work in C (because the GIL is released before calling the C code)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also makes wrapping C libraries easier, because you're guaranteed thread-safety thanks to the GIL.&lt;/p&gt;

&lt;p&gt;The downside is that your code is &lt;strong&gt;asynchronous&lt;/strong&gt; as in &lt;strong&gt;concurrent&lt;/strong&gt;, but &lt;strong&gt;not parallel&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;[!NOTE]&lt;br&gt;
&lt;strong&gt;Python 3.13 makes the GIL optional!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://peps.python.org/pep-0703/" rel="noopener noreferrer"&gt;PEP 703&lt;/a&gt; added a build configuration option, &lt;code&gt;--disable-gil&lt;/code&gt;, so that a Python 3.13+ built with it can run without the GIL and let multithreaded programs use several cores.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  Python async/await model
&lt;/h3&gt;

&lt;p&gt;In Python, functions have to &lt;a href="https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/" rel="noopener noreferrer"&gt;take a color&lt;/a&gt;: they are either "normal" or "async". What does this mean in practice?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call_me&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;     &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;call_me&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt; 
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;a_bar&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;     &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt; 
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;     &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt; 
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a_bar&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;coroutine&lt;/span&gt; &lt;span class="nb"&gt;object&lt;/span&gt; &lt;span class="n"&gt;a_bar&lt;/span&gt; &lt;span class="n"&gt;at&lt;/span&gt; &lt;span class="mh"&gt;0x10491f480&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;stdin&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;RuntimeWarning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;coroutine&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a_bar&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="n"&gt;was&lt;/span&gt; &lt;span class="n"&gt;never&lt;/span&gt; &lt;span class="n"&gt;awaited&lt;/span&gt;
&lt;span class="nb"&gt;RuntimeWarning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Enable&lt;/span&gt; &lt;span class="n"&gt;tracemalloc&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;get&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="nb"&gt;object&lt;/span&gt; &lt;span class="n"&gt;allocation&lt;/span&gt; &lt;span class="n"&gt;traceback&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="mi"&gt;6&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because calling an asynchronous function does not return a value immediately, but rather a coroutine object that still has to be awaited, we can't use async functions everywhere as callbacks, unless the function we are calling is designed to take &lt;code&gt;async&lt;/code&gt; callbacks.&lt;/p&gt;

&lt;p&gt;We get a hierarchy of functions, because "normal" functions need to be made &lt;code&gt;async&lt;/code&gt; to use the &lt;code&gt;await&lt;/code&gt; keyword, needed to call asynchronous functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;         can call
normal -----------&amp;gt; normal

         can call
async -+-----------&amp;gt; normal
       |
       .-----------&amp;gt; async                    

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apart from trusting the caller, the call itself gives no way to know if a callback is async or not (unless you inspect the callback or what it returned first, for example with the &lt;code&gt;inspect&lt;/code&gt; module, but that's ugly).&lt;/p&gt;
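
&lt;p&gt;For what it is worth, the standard library's &lt;code&gt;inspect&lt;/code&gt; module can tell what a call returned. Here is a minimal sketch, assuming a hypothetical &lt;code&gt;call_any&lt;/code&gt; helper that is only ever used from synchronous code (&lt;code&gt;asyncio.run&lt;/code&gt; cannot be called from inside a running event loop):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio
import inspect


def call_any(callback):
    # Hypothetical helper: call the callback, and if we got a coroutine
    # object back, run it to completion before returning its result.
    result = callback()
    if inspect.iscoroutine(result):
        return asyncio.run(result)
    return result


async def a_bar():
    return 5


def bar():
    return 6


print(call_any(a_bar))  # 5
print(call_any(bar))    # 6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Every function accepting callbacks would need this kind of boilerplate, which is exactly the cost of having colored functions.&lt;/p&gt;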

&lt;h2&gt;
  
  
  ArkScript parallelism
&lt;/h2&gt;

&lt;p&gt;In the beginning, ArkScript was using a Global VM Lock (akin to Python's GIL), because the &lt;code&gt;http.arkm&lt;/code&gt; module (used to create HTTP servers) was multithreaded and it caused problems with ArkScript's VM by altering its state through modifying variables and calling functions on multiple threads.&lt;/p&gt;

&lt;p&gt;Then in 2021, I started working on a new model to handle the VM state so that we could parallelize it easily, and wrote &lt;a href="https://lexp.lt/posts/parallelizing_a_bytecode_interpreter/" rel="noopener noreferrer"&gt;an article about it&lt;/a&gt;. It was later &lt;a href="https://github.com/ArkScript-lang/Ark/commit/743c2c94de0bcd8299cbcf41b4a57d825e742745" rel="noopener noreferrer"&gt;implemented&lt;/a&gt; by the end of 2021, and the Global VM Lock was removed.&lt;/p&gt;

&lt;h3&gt;
  
  
  ArkScript async/await
&lt;/h3&gt;

&lt;p&gt;ArkScript does not assign a color to &lt;code&gt;async&lt;/code&gt; functions, because they do not exist in the language: you either have a function or a closure, and both can call each other without any additional syntax (a closure is &lt;a href="https://wiki.c2.com/?ClosuresAndObjectsAreEquivalent=" rel="noopener noreferrer"&gt;a poor man's object&lt;/a&gt;, in this language: a function holding a mutable state).&lt;/p&gt;

&lt;p&gt;Any function can be made &lt;code&gt;async&lt;/code&gt; at the &lt;em&gt;call site&lt;/em&gt; (instead of declaration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight common_lisp"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;foo&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;fun&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;a&lt;/span&gt; &lt;span class="nv"&gt;b&lt;/span&gt; &lt;span class="nv"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;+&lt;/span&gt; &lt;span class="nv"&gt;a&lt;/span&gt; &lt;span class="nv"&gt;b&lt;/span&gt; &lt;span class="nv"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;foo&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;

&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;future&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;async&lt;/span&gt; &lt;span class="nv"&gt;foo&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="nv"&gt;future&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="nv"&gt;UserType&amp;lt;0,&lt;/span&gt; &lt;span class="nv"&gt;0x0x7f0e84d85dd0&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;await&lt;/span&gt; &lt;span class="nv"&gt;future&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;await&lt;/span&gt; &lt;span class="nv"&gt;future&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using the &lt;code&gt;async&lt;/code&gt; builtin, we are spawning a &lt;code&gt;std::future&lt;/code&gt; under the hood (leveraging &lt;a href="https://en.cppreference.com/w/cpp/thread/async" rel="noopener noreferrer"&gt;std::async&lt;/a&gt; and threads) to run our function given a set of arguments. Then we can call &lt;code&gt;await&lt;/code&gt; (another builtin) and get a result whenever we want, which will block the current VM thread until the function returns.&lt;br&gt;&lt;br&gt;
Thus, it is possible to &lt;code&gt;await&lt;/code&gt; from any function, and from any thread.&lt;/p&gt;
&lt;h3&gt;
  
  
  The specificities
&lt;/h3&gt;

&lt;p&gt;All of this is possible because we have a single VM that operates on a state contained inside an &lt;code&gt;Ark::internal::ExecutionContext&lt;/code&gt;, which is tied to a single thread. The VM is shared between the threads, not the contexts!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        .---&amp;gt; thread 0, context 0
        |            ^
VM &amp;lt;----+       can't interact
        |            v
        .---&amp;gt; thread 1, context 1              

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When creating a &lt;em&gt;future&lt;/em&gt; by using &lt;code&gt;async&lt;/code&gt;, we are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;copying all the arguments to the new context,&lt;/li&gt;
&lt;li&gt;creating a brand new stack and scopes,&lt;/li&gt;
&lt;li&gt;finally creating a separate thread.&lt;/li&gt;
&lt;/ol&gt;
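
&lt;p&gt;In Python terms, the three steps above could be mimicked roughly as follows. This is only an analogy under stated assumptions: &lt;code&gt;ThreadFuture&lt;/code&gt;, &lt;code&gt;async_call&lt;/code&gt; and &lt;code&gt;wait&lt;/code&gt; are hypothetical names (&lt;code&gt;await&lt;/code&gt; is a reserved keyword in Python), and &lt;code&gt;copy.deepcopy&lt;/code&gt; stands in for the copy of the arguments to the new context. It is not how the ArkScript VM is implemented.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import copy
from threading import Thread


class ThreadFuture:
    """Hypothetical future: one thread per call, result fetched with wait()."""

    def __init__(self, function, args):
        self._result = None
        # 2. the call stack of the new thread plays the role of the
        #    brand new stack and scopes
        # 3. finally, a separate thread is created
        self._thread = Thread(target=self._run, args=(function, args))
        self._thread.start()

    def _run(self, function, args):
        self._result = function(*args)

    def wait(self):
        # blocks the calling thread until the function has returned
        self._thread.join()
        return self._result


def async_call(function, *args):
    # 1. copy all the arguments, so that the new thread cannot touch
    #    the data of the caller
    return ThreadFuture(function, copy.deepcopy(args))


def append_and_sum(values):
    values.append(4)
    return sum(values)


data = [1, 2, 3]
future = async_call(append_and_sum, data)
print(future.wait())  # 10
print(data)           # [1, 2, 3]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The list of the caller is left untouched, because the thread only ever saw its own copy.&lt;/p&gt;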

&lt;p&gt;This forbids any sort of synchronization between threads since ArkScript does not expose references or any kind of lock that could be shared (this was done for simplicity reasons, as the language aims to be somewhat minimalist but still usable).&lt;/p&gt;

&lt;p&gt;However, this approach isn't better (nor worse) than Python's: we create a new thread per call, which is a bit costly, and the number of threads a CPU can run at once is limited. Luckily I don't see that as a problem to tackle, as one should never create hundreds or thousands of threads simultaneously, nor call hundreds or thousands of async Python functions simultaneously: both would result in a huge slowdown of your program.&lt;br&gt;&lt;br&gt;
In the first case, this would slow down your process (or even your computer) as the OS juggles to give time to every thread; in the second case, it is Python's scheduler that would have to juggle between all of your coroutines.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;[!NOTE]&lt;br&gt;
Out of the box, ArkScript does not provide mechanisms for thread synchronization. However, when we pass a &lt;code&gt;UserType&lt;/code&gt; (which is a wrapper on top of &lt;em&gt;type-erased&lt;/em&gt; C++ objects) to a function, the underlying object isn't copied.&lt;br&gt;&lt;br&gt;
With some careful coding, one could create a lock using the &lt;code&gt;UserType&lt;/code&gt; construct, that would allow synchronization between threads.&lt;/p&gt;


&lt;pre class="highlight common_lisp"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;lock&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;module:createLock&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;foo&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;fun&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;lock&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;{&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;lock&lt;/span&gt; &lt;span class="nv"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;str:format&lt;/span&gt; &lt;span class="s"&gt;"hello {}"&lt;/span&gt; &lt;span class="nv"&gt;i&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;lock&lt;/span&gt; &lt;span class="nv"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;}&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;async&lt;/span&gt; &lt;span class="nv"&gt;foo&lt;/span&gt; &lt;span class="nv"&gt;lock&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;async&lt;/span&gt; &lt;span class="nv"&gt;foo&lt;/span&gt; &lt;span class="nv"&gt;lock&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;ArkScript and Python use two very different kinds of &lt;code&gt;async&lt;/code&gt; / &lt;code&gt;await&lt;/code&gt;: the former requires the use of &lt;code&gt;async&lt;/code&gt; at the call site and spawns a new thread with its own context, while the latter requires the programmer to mark functions as &lt;code&gt;async&lt;/code&gt; to be able to use &lt;code&gt;await&lt;/code&gt;, and those &lt;code&gt;async&lt;/code&gt; functions are coroutines, running in the same thread as the interpreter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://softwareengineering.stackexchange.com/questions/186889/why-was-python-written-with-the-gil" rel="noopener noreferrer"&gt;Stack Exchange — Why was Python written with the GIL?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://wiki.python.org/moin/GlobalInterpreterLock" rel="noopener noreferrer"&gt;Python Wiki — GlobalInterpreterLock&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/" rel="noopener noreferrer"&gt;stuffwithstuff - What color is your function?&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Originally from &lt;a href="https://lexp.lt/posts/python_and_arkscript_async/" rel="noopener noreferrer"&gt;lexp.lt&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>arkscript</category>
    </item>
    <item>
      <title>Finding bugs by fuzzing your code</title>
      <dc:creator>Lex Plt</dc:creator>
      <pubDate>Sun, 06 Nov 2022 12:12:59 +0000</pubDate>
      <link>https://dev.to/lexplt/finding-bugs-by-fuzzing-your-code-1b0e</link>
      <guid>https://dev.to/lexplt/finding-bugs-by-fuzzing-your-code-1b0e</guid>
      <description>&lt;p&gt;If you have ever worked on a large-scale project, you know that finding and tracking bugs can be very tedious and lengthy.&lt;/p&gt;

&lt;p&gt;Did you know it could be automated? There are multiple ways to achieve this, starting with unit &amp;amp; integration tests run regularly to detect regressions, end-to-end tests to ensure a functionality is behaving as intended, and much more.&lt;/p&gt;

&lt;p&gt;However, writing those tests is also a lengthy process, and you can still miss some hard-to-find bugs. We are going to focus on fuzzing, an automated crash detection process.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is fuzzing?
&lt;/h2&gt;

&lt;p&gt;According to Wikipedia, fuzzing &lt;em&gt;is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program&lt;/em&gt; (&lt;a href="https://en.wikipedia.org/wiki/Fuzzing" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Fuzzing&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Given a set of inputs your program can work on, a fuzzer will generate as much diverging data as possible and feed it to your program, recording each crash.&lt;/p&gt;
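
&lt;p&gt;To give an idea of the loop involved, here is a deliberately naive sketch in Python. Everything in it is an assumption made for illustration: &lt;code&gt;parse_pair&lt;/code&gt; is a hypothetical target with a planted bug, &lt;code&gt;mutate&lt;/code&gt; only changes one random character, and any uncaught exception counts as a crash. Real fuzzers use coverage feedback to pick and mutate inputs far more cleverly.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import random


def parse_pair(text):
    # Hypothetical target: it wrongly assumes the input is always made of
    # exactly two integers separated by a comma.
    left, right = text.split(",")
    return int(left) + int(right)


def mutate(rng, seed):
    # replace, insert or delete one random character of a valid input
    chars = list(seed)
    choice = rng.randrange(3)
    position = rng.randrange(len(chars) + 1)
    if choice == 0 and chars:
        chars[min(position, len(chars) - 1)] = chr(rng.randrange(32, 127))
    elif choice == 1:
        chars.insert(position, chr(rng.randrange(32, 127)))
    elif chars:
        del chars[min(position, len(chars) - 1)]
    return "".join(chars)


def fuzz(target, corpus, runs, seed=0):
    rng = random.Random(seed)  # fixed seed to get reproducible runs
    crashes = []
    for _ in range(runs):
        candidate = mutate(rng, rng.choice(corpus))
        try:
            target(candidate)
        except Exception as error:
            # record the crashing input, along with the kind of error
            crashes.append((candidate, type(error).__name__))
    return crashes


crashes = fuzz(parse_pair, ["1,2", "10,20"], runs=200)
print(len(crashes), "crashing inputs found")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;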

&lt;h2&gt;
  
  
  Fuzzing in practice
&lt;/h2&gt;

&lt;p&gt;If you don't know me yet, I am the developer of &lt;a href="https://arkscript-lang.dev" rel="noopener noreferrer"&gt;ArkScript&lt;/a&gt;, an easy to embed scripting language, and I have worked on dozens of new features this past year. However, this can introduce (and has introduced) bugs, sometimes quite tricky to find.&lt;/p&gt;

&lt;p&gt;Fuzzing comes to the rescue here! I didn't want (nor had time) to write thousands of tests by hand, so I just wrote the basic tests, checking that every good input leads to the expected output. What was missing was the "bad input leads to bad output" kind of tests.&lt;/p&gt;

&lt;h3&gt;
  
  
  Introducing AFL++
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/AFLplusplus/AFLplusplus" rel="noopener noreferrer"&gt;AFL++&lt;/a&gt; is a superior fork of AFL (American Fuzzy Lop), a fuzzer originally developed by Google.&lt;/p&gt;

&lt;p&gt;Here is how it operates:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2FAFLplusplus%2FAFLplusplus%2Fdev%2Fdocs%2Fresources%2F0_fuzzing_process_overview.drawio.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2FAFLplusplus%2FAFLplusplus%2Fdev%2Fdocs%2Fresources%2F0_fuzzing_process_overview.drawio.svg" alt="How AFL++ operates" width="1052" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is very easy to use: you have to recompile your project using afl-cc and/or afl-c++, give it an input corpus, and let it work for you until you are satisfied.&lt;/p&gt;
&lt;h3&gt;
  
  
  Generating an input corpus
&lt;/h3&gt;

&lt;p&gt;Since I wanted to fuzz a programming language, my input corpus was easy to put together: code samples, parts of the standard library, some test files.&lt;/p&gt;

&lt;p&gt;The process is as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;given the corpus, we want to generate a unique corpus, removing the inputs that do not produce a new path/coverage in the target &lt;/li&gt;
&lt;li&gt;minimizing the corpus: the shorter the input files that still traverse the same path within the target, the better the fuzzing will be.
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# step 1)&lt;/span&gt;
afl-cmin &lt;span class="nt"&gt;-i&lt;/span&gt; fuzzing/corpus &lt;span class="nt"&gt;-o&lt;/span&gt; fuzzing/unique &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;buildFolder&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/arkscript @@ &lt;span class="nt"&gt;-L&lt;/span&gt; ./lib

&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; fuzzing/input
&lt;span class="nb"&gt;cd &lt;/span&gt;fuzzing/unique

&lt;span class="c"&gt;# step 2)&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;afl-tmin &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="s2"&gt;"../input/&lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; ../../&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;buildFolder&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/arkscript @@ &lt;span class="nt"&gt;-L&lt;/span&gt; ../../lib
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You may have noticed the &lt;code&gt;-- ${buildFolder}/arkscript @@ -L ./lib&lt;/code&gt; bit: this is the command to run the inputs against, with &lt;code&gt;@@&lt;/code&gt; being a placeholder that AFL++ replaces with the filename of the input it generated.&lt;/p&gt;

&lt;p&gt;Then we can run the fuzzer as follows, and get crashes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;afl-fuzz &lt;span class="nt"&gt;-i&lt;/span&gt; fuzzing/input &lt;span class="nt"&gt;-o&lt;/span&gt; fuzzing/output &lt;span class="nt"&gt;-s&lt;/span&gt; 0 &lt;span class="nt"&gt;-m&lt;/span&gt; 64 &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;buildFolder&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/arkscript @@ &lt;span class="nt"&gt;-L&lt;/span&gt; ./lib
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-s 0&lt;/code&gt; sets the RNG seed to 0, which makes RNG-based crashes easier to reproduce; every crash will be stored under &lt;code&gt;fuzzing/output/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;-m 64&lt;/code&gt; sets a hard limit of 64MB of RAM for the target so that memory leaks can be more easily detected, and it prevents potential out-of-memory problems on your computer. I tried to run with no limit, and sure enough my system crashed after 15 minutes (out of memory). I then played around with the &lt;code&gt;-m&lt;/code&gt; value: 8MB is not enough, as it prevents my target from starting.&lt;/p&gt;

&lt;p&gt;And here we are, the fuzzer is running and finding bugs for us.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1dp1selqy4bruv46yqxd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1dp1selqy4bruv46yqxd.png" alt="AFL++ running" width="564" height="420"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Analysing crashes
&lt;/h3&gt;

&lt;p&gt;Now comes the hard part, reducing the input to find the smallest input sample which still generates the bug. Oftentimes, this has to be done by hand, and it is a tedious process, but finding those buggy inputs by hand would have taken much more time, so it's still a win!&lt;/p&gt;

&lt;p&gt;AFL++ has tools to minimize the crashes, helping you to find the bugs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;afl-tmin &lt;span class="nt"&gt;-i&lt;/span&gt; fuzzing/output/main/crashes/id... &lt;span class="nt"&gt;-o&lt;/span&gt; fuzzing/minimized_result &lt;span class="nt"&gt;--&lt;/span&gt; ./build/arkscript @@ &lt;span class="nt"&gt;-L&lt;/span&gt; ./lib
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you have the smallest possible input for a given crash, and you have fixed the bug, it is a good idea to keep that input somewhere so you can run the next version(s) of your program on it and check that the bug is still fixed. This has personally helped me start a collection of bad inputs, to check in my tests that they are correctly handled.&lt;/p&gt;
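
&lt;p&gt;As a rough sketch of such a check, the loop below runs the interpreter on every saved input and reports the ones that still kill it with a signal. The &lt;code&gt;check_regressions&lt;/code&gt; helper and the &lt;code&gt;tests/fuzzing_regressions&lt;/code&gt; folder are hypothetical names, and it assumes that a crash shows up as an exit status above 128, which is how shells report a process killed by a signal:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# hypothetical helper: check_regressions FOLDER BINARY LIB
check_regressions() {
  dir="$1"
  bin="$2"
  lib="$3"
  failures=0
  for input in "$dir"/*; do
    # skip anything that is not a regular file (also covers an empty folder)
    [ -f "$input" ] || continue
    status=0
    "$bin" "$input" -L "$lib" || status=$?
    # assumption: a status above 128 means the process died from a signal
    if [ "$status" -gt 128 ]; then
      echo "still crashing: $input (status $status)"
      failures=$((failures + 1))
    fi
  done
  # succeed only when no saved input crashes the binary anymore
  [ "$failures" -eq 0 ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;It could then be called as &lt;code&gt;check_regressions tests/fuzzing_regressions ./build/arkscript ./lib&lt;/code&gt; from a test script. Inputs that make the program exit with a regular error code are not counted, since reporting an error on bad code is the expected behaviour.&lt;/p&gt;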




&lt;p&gt;Some things to note about fuzzing:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;a lot of crashes can be very similar: when the AFL++ fuzzer finds a bug, it will reuse the crashing input and derive new inputs from it to find others&lt;/li&gt;
&lt;li&gt;because of 1), you might want to run multiple fuzzers at the same time: they will find more bugs, and AFL++ was designed to work that way (one main instance and multiple secondary ones)&lt;/li&gt;
&lt;li&gt;you don't have to limit the memory allocated to each fuzzer, but if you don't you might exhaust all your RAM&lt;/li&gt;
&lt;li&gt;a fuzzer can run for a very long time and not find anything, that doesn't mean your program is bug free!&lt;/li&gt;
&lt;/ol&gt;
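
&lt;p&gt;Point 2 could look like the following sketch, with each command started in its own terminal. &lt;code&gt;-M&lt;/code&gt; starts the main instance and &lt;code&gt;-S&lt;/code&gt; a secondary one. The instance names are only labels (&lt;code&gt;main&lt;/code&gt; matches the &lt;code&gt;fuzzing/output/main/crashes&lt;/code&gt; path used earlier, &lt;code&gt;variant1&lt;/code&gt; and &lt;code&gt;variant2&lt;/code&gt; are made up), and all the instances have to share the same &lt;code&gt;-o&lt;/code&gt; folder so that they can synchronize their findings:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# terminal 1: the main instance
afl-fuzz -i fuzzing/input -o fuzzing/output -M main -m 64 -- ${buildFolder}/arkscript @@ -L ./lib

# terminals 2 and 3: secondary instances writing to the same output folder
afl-fuzz -i fuzzing/input -o fuzzing/output -S variant1 -m 64 -- ${buildFolder}/arkscript @@ -L ./lib
afl-fuzz -i fuzzing/input -o fuzzing/output -S variant2 -m 64 -- ${buildFolder}/arkscript @@ -L ./lib
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each instance then stores its crashes in its own subfolder, for example &lt;code&gt;fuzzing/output/variant1/crashes&lt;/code&gt;.&lt;/p&gt;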

&lt;p&gt;Because fuzzing can require a lot of time and resources, you might want to run those tests once in a while, for example for every new release instead of for every commit or test added.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally from &lt;a href="https://lexp.lt/posts/finding_bugs_with_fuzzing/" rel="noopener noreferrer"&gt;lexp.lt&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>fuzzing</category>
      <category>testing</category>
      <category>cpp</category>
    </item>
    <item>
      <title>Making your project available through Homebrew</title>
      <dc:creator>Lex Plt</dc:creator>
      <pubDate>Wed, 07 Sep 2022 14:57:19 +0000</pubDate>
      <link>https://dev.to/lexplt/making-your-project-available-through-homebrew-1ll5</link>
      <guid>https://dev.to/lexplt/making-your-project-available-through-homebrew-1ll5</guid>
      <description>&lt;p&gt;As software developers, making our projects easily available to anyone is a goal, but it can be hard to achieve. Using package managers like &lt;code&gt;apt&lt;/code&gt;, &lt;code&gt;pacman&lt;/code&gt; or &lt;code&gt;brew&lt;/code&gt; has become an industry standard (compared to wget + compile it yourself + install it), but publishing a project through them can be quite tedious.&lt;/p&gt;

&lt;p&gt;In this article, we will go through the basics of creating a tap for Homebrew (available on both Linux and macOS).&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating a tap
&lt;/h2&gt;

&lt;p&gt;A tap is an external source of formulae (installation scripts) for Homebrew. To use one, you have to add it with &lt;code&gt;brew tap user/repo&lt;/code&gt;. Creating a tap is much easier than submitting a new formula to homebrew/core and waiting for it to be approved (or rejected).&lt;/p&gt;

&lt;p&gt;Creating a new tap is as easy as&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# creates a folder under /opt/homebrew/Library/Taps/xxx/xxx&lt;/span&gt;
brew tap-new arkscript-lang/homebrew-arkscript

&lt;span class="c"&gt;# generates a formula in your newly created tap&lt;/span&gt;
brew create &lt;span class="nt"&gt;--cmake&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="s1"&gt;'https://github.com/ArkScript-lang/Ark.git'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--HEAD&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--set-name&lt;/span&gt; &lt;span class="s1"&gt;'arkscript@3.3.0'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--set-version&lt;/span&gt; &lt;span class="s1"&gt;'3.3.0'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--tap&lt;/span&gt; arkscript-lang/homebrew-arkscript
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I specified the type of build needed with &lt;code&gt;--cmake&lt;/code&gt; (other templates are available for crystal, go, meson, python, node, ruby, perl and rust) and the URL of my git repository (&lt;code&gt;--HEAD&lt;/code&gt; is here to tell brew that the URL is a repo, not a file). With &lt;code&gt;--set-name&lt;/code&gt; I gave the formula's name, and then its version with &lt;code&gt;--set-version&lt;/code&gt;. Finally, with &lt;code&gt;--tap&lt;/code&gt; I gave it a repository (the user and repo name will have to match on GitHub/GitLab/etc).&lt;/p&gt;

&lt;h2&gt;
  
  
  Editing your formula
&lt;/h2&gt;

&lt;p&gt;Once this last command has run, brew opens the generated formula in your selected editor (for me it's vim) so that you can edit it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Documentation: https://docs.brew.sh/Formula-Cookbook&lt;/span&gt;
&lt;span class="c1"&gt;#                https://rubydoc.brew.sh/Formula&lt;/span&gt;
&lt;span class="c1"&gt;# PLEASE REMOVE ALL GENERATED COMMENTS BEFORE SUBMITTING YOUR PULL REQUEST!&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ArkscriptAT330&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;Formula&lt;/span&gt;
  &lt;span class="n"&gt;desc&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
  &lt;span class="n"&gt;homepage&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
  &lt;span class="n"&gt;license&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
  &lt;span class="n"&gt;head&lt;/span&gt; &lt;span class="s2"&gt;"https://github.com/ArkScript-lang/Ark.git"&lt;/span&gt;

  &lt;span class="n"&gt;depends_on&lt;/span&gt; &lt;span class="s2"&gt;"cmake"&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="ss"&gt;:build&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;install&lt;/span&gt;
    &lt;span class="c1"&gt;# ENV.deparallelize  # if your formula fails when building in parallel&lt;/span&gt;
    &lt;span class="nb"&gt;system&lt;/span&gt; &lt;span class="s2"&gt;"cmake"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"-S"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"-B"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"build"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;std_cmake_args&lt;/span&gt;
    &lt;span class="nb"&gt;system&lt;/span&gt; &lt;span class="s2"&gt;"cmake"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"--build"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"build"&lt;/span&gt;
    &lt;span class="nb"&gt;system&lt;/span&gt; &lt;span class="s2"&gt;"cmake"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"--install"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"build"&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="c1"&gt;# `test do` will create, run in and delete a temporary directory.&lt;/span&gt;
    &lt;span class="c1"&gt;#&lt;/span&gt;
    &lt;span class="c1"&gt;# This test will fail and we won't accept that! For Homebrew/homebrew-core&lt;/span&gt;
    &lt;span class="c1"&gt;# this will need to be a test that verifies the functionality of the&lt;/span&gt;
    &lt;span class="c1"&gt;# software. Run the test with `brew test arkscript@3.3.0`. Options passed&lt;/span&gt;
    &lt;span class="c1"&gt;# to `brew install` such as `--HEAD` also need to be provided to `brew test`.&lt;/span&gt;
    &lt;span class="c1"&gt;#&lt;/span&gt;
    &lt;span class="c1"&gt;# The installed folder is not in the path, so use the entire path to any&lt;/span&gt;
    &lt;span class="c1"&gt;# executables being tested: `system "#{bin}/program", "do", "something"`.&lt;/span&gt;
    &lt;span class="nb"&gt;system&lt;/span&gt; &lt;span class="s2"&gt;"false"&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that because we used &lt;code&gt;--HEAD&lt;/code&gt;, this formula will only work with &lt;code&gt;brew install --HEAD&lt;/code&gt;. To remedy this, we will add a &lt;code&gt;url&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ArkscriptAT330&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;Formula&lt;/span&gt;
  &lt;span class="n"&gt;desc&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
  &lt;span class="n"&gt;homepage&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
  &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="s2"&gt;"https://github.com/ArkScript-lang/Ark.git"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;tag: &lt;/span&gt;&lt;span class="s2"&gt;"v3.3.0"&lt;/span&gt;
  &lt;span class="n"&gt;license&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
  &lt;span class="n"&gt;head&lt;/span&gt; &lt;span class="s2"&gt;"https://github.com/ArkScript-lang/Ark.git"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;url&lt;/code&gt; has to go right before the &lt;code&gt;license&lt;/code&gt; field, whose value has to follow the SPDX license naming convention (&lt;a href="https://spdx.org/licenses/" rel="noopener noreferrer"&gt;https://spdx.org/licenses/&lt;/a&gt;), e.g. &lt;code&gt;MPL-2.0&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Custom steps
&lt;/h2&gt;

&lt;p&gt;Once every field has been filled, and default comments have been removed, we can play a little more with the formula's steps.&lt;/p&gt;

&lt;p&gt;For testing, I added a &lt;code&gt;post_install&lt;/code&gt; step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ArkscriptAT330&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;Formula&lt;/span&gt;
  &lt;span class="c1"&gt;# ...&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;post_install&lt;/span&gt;
    &lt;span class="n"&gt;ohai&lt;/span&gt; &lt;span class="s2"&gt;"ℹ️  Add ARKSCRIPT_PATH=&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;lib&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/Ark/ to your bashrc/zshrc"&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="c1"&gt;# ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This uses &lt;code&gt;ohai&lt;/code&gt; (which prints a message) to tell the user to add an environment variable to their shell configuration file, so that the project can work. There is &lt;code&gt;odie&lt;/code&gt; as well to display an error message and &lt;code&gt;opoo&lt;/code&gt; for warnings. You can find more steps and fields to customize your formula here: &lt;a href="https://rubydoc.brew.sh/Formula.html" rel="noopener noreferrer"&gt;https://rubydoc.brew.sh/Formula.html&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Checking for errors
&lt;/h2&gt;

&lt;p&gt;Now that you have written a formula, let's test it with&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew audit &lt;span class="nt"&gt;--new&lt;/span&gt; arkscript@3.3.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If it returns nothing, then you are good to go and you can publish your tap.&lt;/p&gt;

&lt;p&gt;Before publishing, you should also build your formula from source to check for misconfiguration and errors, using&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--build-from-source&lt;/span&gt; &amp;lt;user&amp;gt;/&amp;lt;repo&amp;gt;/&amp;lt;formula&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you need to run the test again, uninstall the formula first with &lt;code&gt;brew remove &amp;lt;formula&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Publishing a tap
&lt;/h2&gt;

&lt;p&gt;First, you will have to create a repository on GitHub/GitLab if that wasn't already done, with the following name: &lt;code&gt;homebrew-&amp;lt;tap name&amp;gt;&lt;/code&gt;. The prefix is mandatory.&lt;/p&gt;

&lt;p&gt;Then, you can add a remote to your tap with&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git remote add origin git@github.com:&amp;lt;user&amp;gt;/&amp;lt;repo&amp;gt;.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Commit your work, and you can now push to your repository and voilà!&lt;/p&gt;
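
&lt;p&gt;Put together, and reusing the tap created earlier, this could look like the sketch below. It assumes that the generated formula lives in the &lt;code&gt;Formula&lt;/code&gt; folder of the tap and that the tap's default branch is named &lt;code&gt;main&lt;/code&gt;; the commit message is only an example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# go to the local folder of the tap created with brew tap-new
cd "$(brew --repository arkscript-lang/homebrew-arkscript)"

# assumption: the formula was generated inside the Formula folder
git add Formula
git commit -m "Add the arkscript@3.3.0 formula"

# assumption: the default branch of the tap is named main
git push -u origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;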

&lt;h2&gt;
  
  
  Using your tap
&lt;/h2&gt;

&lt;p&gt;Using &lt;code&gt;brew tap &amp;lt;user&amp;gt;/&amp;lt;repo&amp;gt;&lt;/code&gt;, you will add your tap to brew's list of taps. Then your formulae will be available either as &lt;code&gt;brew install &amp;lt;formula&amp;gt;&lt;/code&gt;, or as &lt;code&gt;brew install &amp;lt;user&amp;gt;/&amp;lt;repo&amp;gt;/&amp;lt;formula&amp;gt;&lt;/code&gt; if the name is already taken in homebrew core.&lt;/p&gt;
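
&lt;p&gt;For the formula written in this article, and assuming the tap was pushed to GitHub as &lt;code&gt;arkscript-lang/homebrew-arkscript&lt;/code&gt;, that would give something like the following (the &lt;code&gt;homebrew-&lt;/code&gt; prefix can be left out when naming a tap):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# register the tap, brew looks for the arkscript-lang/homebrew-arkscript repository
brew tap arkscript-lang/arkscript

# short form
brew install arkscript@3.3.0

# fully qualified form, useful if the name clashes with another formula
brew install arkscript-lang/arkscript/arkscript@3.3.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;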

&lt;h2&gt;
  
  
  Going further
&lt;/h2&gt;

&lt;p&gt;Now that you have published your formula in a tap, and made your project easily available to anyone, you might want to go further and check this complete guide: &lt;a href="https://docs.brew.sh/Formula-Cookbook" rel="noopener noreferrer"&gt;https://docs.brew.sh/Formula-Cookbook&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally from &lt;a href="https://lexp.lt/posts/making_your_project_available_through_homebrew/" rel="noopener noreferrer"&gt;lexp.lt&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>brew</category>
      <category>tooling</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Understanding tail-call optimization</title>
      <dc:creator>Lex Plt</dc:creator>
      <pubDate>Sun, 20 Feb 2022 17:58:41 +0000</pubDate>
      <link>https://dev.to/lexplt/understanding-tail-call-optimization-3562</link>
      <guid>https://dev.to/lexplt/understanding-tail-call-optimization-3562</guid>
      <description>&lt;p&gt;Lately, I've been working on optimizations for my language, &lt;a href="https://arkscript-lang.dev" rel="noopener noreferrer"&gt;ArkScript&lt;/a&gt;, and finally took some time to add tail-call optimization to my compiler.&lt;/p&gt;

&lt;p&gt;In this article, I'll explain what &lt;em&gt;tail-call optimization&lt;/em&gt; is, why it seems easy to implement, and how to do it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Definition
&lt;/h2&gt;

&lt;p&gt;Tail-call optimization is a method to optimize some recursive functions so that, instead of creating a new stack frame for every call, we reuse a single one. This leads to less memory being used and better performance, as we don't have to create and destroy a bunch of stack frames.&lt;/p&gt;

&lt;p&gt;The functions we can optimize with this method are the tail recursive ones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let factorial = (n, acc) {
    if (n &amp;lt;= 1)
        return acc
    else
        return factorial(n - 1, n * acc)
}

print(factorial(10, 1))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is called a tail recursive function because the last operation of the function is a call &lt;em&gt;to itself&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Note that our &lt;code&gt;factorial&lt;/code&gt; function takes an additional parameter, an accumulator, because the following implementation wouldn't be tail recursive:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let factorial = (n) {
    if (n &amp;lt;= 1)
        return 1
    else
        return n * factorial(n - 1)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;as the last operation of the function isn't a &lt;em&gt;call to itself&lt;/em&gt;, but the multiplication between &lt;code&gt;n&lt;/code&gt; and &lt;code&gt;factorial(n - 1)&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why can we optimize tail recursive functions?
&lt;/h2&gt;

&lt;p&gt;The last thing these functions do is call themselves, so all previous state can be discarded: it is either passed as arguments to the function, or not used at all. We therefore don't have to keep a state somewhere in case the function returns and needs something from it (which &lt;strong&gt;is needed&lt;/strong&gt; in the version returning &lt;code&gt;n * factorial(n - 1)&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;As a result, we can rewrite the tail recursive functions with loops:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let factorial = (n, acc) {
    while (true) {
        if (n &amp;lt;= 1)
            return acc

        // we replace the recursive call with this
        acc = n * acc
        n = n - 1
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How to implement this optimization?
&lt;/h2&gt;

&lt;p&gt;When we look at the above snippet, it seems that all we need to do is replace the last call of the function with assignments of the new values to its arguments, and add a jump to the beginning of the function. In fact, this is right. The hard part is identifying a tail call.&lt;/p&gt;

&lt;p&gt;Given a recursive compiler on an abstract syntax tree, you'll need:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;to keep track of the name of the current variable being compiled (here we need to know the variable name is &lt;code&gt;factorial&lt;/code&gt; when we are compiling its body)&lt;/li&gt;
&lt;li&gt;to know if the current node will be returned or not (if it's the value given to the &lt;code&gt;return&lt;/code&gt; keyword)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first part is quite easy: just pass another argument to &lt;code&gt;compile(node)&lt;/code&gt; when compiling a variable definition. The second one is equally easy: add two arguments to your &lt;code&gt;handleFunctionCall(node)&lt;/code&gt; to tell it the name of the current variable (if any) and whether the node is going to be returned or not.&lt;/p&gt;

&lt;p&gt;Then the implementation is as simple as this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="n"&gt;Compiler&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;handleFunctionCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;Node&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;var_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;is_returned&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// checking if the function we are calling has the same&lt;/span&gt;
    &lt;span class="c1"&gt;// name as the current function being compiled&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;constList&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;var_name&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;is_returned&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// push the arguments in reverse order&lt;/span&gt;
        &lt;span class="c1"&gt;// because our calling convention is arg0, arg1...&lt;/span&gt;
        &lt;span class="c1"&gt;// but our stack will handle them in a LIFO fashion&lt;/span&gt;
        &lt;span class="c1"&gt;// index 0 holds the function name, so the arguments start at index 1&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;constList&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;constList&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="c1"&gt;// jump to the top of the function&lt;/span&gt;
        &lt;span class="n"&gt;bytecode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;push_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Instruction&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;JUMP&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;bytecode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pushNumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;_u16&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// normal function call&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The diff between the old bytecode and the new should be as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="p"&gt;POP, STORE in n
POP, STORE in acc
LOAD n
LOAD_CONST 0  # 1
LE
POP_JUMP_IF_TRUE &amp;amp;if
&lt;/span&gt;# else:
&lt;span class="gd"&gt;- LOAD n               # |- computing acc * n
- LOAD acc             # |
- MUL                  # |-------------------
&lt;/span&gt;&lt;span class="gi"&gt;+ LOAD n
+ LOAD_CONST 0  # 1
+ SUB
&lt;/span&gt;&lt;span class="gd"&gt;- LOAD n               #   |- computing n - 1
- LOAD_CONST 0  # 1    #   |
- SUB                  #   |-----------------
&lt;/span&gt;&lt;span class="gi"&gt;+ LOAD n
+ LOAD acc
+ MUL
&lt;/span&gt;&lt;span class="gd"&gt;- LOAD factorial
- CALL 2
&lt;/span&gt;&lt;span class="gi"&gt;+ JUMP 0             # jump to the top of the function
&lt;/span&gt;&lt;span class="p"&gt;JUMP &amp;amp;end
&lt;/span&gt;# if:
&lt;span class="p"&gt;LOAD acc
&lt;/span&gt;# end:
&lt;span class="p"&gt;RET
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(blocks were named &lt;code&gt;&amp;amp;block&lt;/code&gt; in the example, but those labels aren't in the final bytecode, it's just to clarify)&lt;/p&gt;

&lt;p&gt;If your language doesn't use a &lt;code&gt;return&lt;/code&gt; keyword, you might run into some trouble (so did I) when trying to identify the returning node of your functions.&lt;/p&gt;

&lt;p&gt;What I did was the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add a flag to all the &lt;code&gt;compile_*&lt;/code&gt; methods I have to know if the current node is terminal or not&lt;/li&gt;
&lt;li&gt;the only method setting this to true is my &lt;code&gt;compile_function&lt;/code&gt;, in charge of compiling the body of a function&lt;/li&gt;
&lt;li&gt;then I alter this flag based on two things:

&lt;ul&gt;
&lt;li&gt;if we are compiling a block (with multiple subnodes), then all the nodes except the last one are &lt;em&gt;non&lt;/em&gt; terminal; for the last one, the flag is passed as is&lt;/li&gt;
&lt;li&gt;if we are compiling a condition (&lt;code&gt;if&lt;/code&gt; block), then the flag is passed as is to both branches, &lt;code&gt;then&lt;/code&gt; and &lt;code&gt;else&lt;/code&gt;, since both can be terminal depending on the condition&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Finally, instead of checking for &lt;code&gt;is_returned&lt;/code&gt;, I checked for &lt;code&gt;is_terminal&lt;/code&gt;. You can find the complete implementation &lt;a href="https://github.com/ArkScript-lang/Ark/commit/c085dd5609de2fe5db7c6e0c888eb05a9637a0be" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally from &lt;a href="https://lexp.lt/posts/understanding_tail_call_optimization/" rel="noopener noreferrer"&gt;lexp.lt&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>vm</category>
    </item>
  </channel>
</rss>
