<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yota Toyama</title>
    <description>The latest articles on DEV Community by Yota Toyama (@raviqqe).</description>
    <link>https://dev.to/raviqqe</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F42893%2Ff298fc53-909d-40e4-a83b-e221a07504d2.png</url>
      <title>DEV Community: Yota Toyama</title>
      <link>https://dev.to/raviqqe</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/raviqqe"/>
    <language>en</language>
    <item>
      <title>Announcing the Pen programming language v0.4</title>
      <dc:creator>Yota Toyama</dc:creator>
      <pubDate>Sun, 03 Jul 2022 18:50:06 +0000</pubDate>
      <link>https://dev.to/raviqqe/pen-hdo</link>
      <guid>https://dev.to/raviqqe/pen-hdo</guid>
      <description>&lt;p&gt;&lt;a href="https://pen-lang.org"&gt;The Pen programming language&lt;/a&gt; is a new parallel, concurrent, statically typed, functional programming language. I'm excited to announce &lt;a href="https://github.com/pen-lang/pen/releases/tag/v0.4.0"&gt;its v0.4 release&lt;/a&gt; here!&lt;/p&gt;

&lt;p&gt;I've been working on this programming language project for almost a year. Recently, we've released its new version with new syntax constructs, standard packages, &lt;a href="https://www.rust-lang.org/"&gt;Rust&lt;/a&gt; FFI (Foreign Function Interface) and complementary tools like formatter and documentation generator.&lt;/p&gt;

&lt;p&gt;In this post, I would like to introduce &lt;a href="https://pen-lang.org"&gt;the Pen programming language&lt;/a&gt;, and describe its current status and new features included in the latest release.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;

&lt;p&gt;To try out he Pen, you can use &lt;a href="https://brew.sh"&gt;Homebrew&lt;/a&gt; to install it on Linux, macOS, and &lt;a href="https://docs.microsoft.com/en-us/windows/wsl/install"&gt;WSL&lt;/a&gt; on Windows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;pen-lang/pen/pen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more information on how to write and build programs in Pen, see &lt;a href="https://pen-lang.org/introduction/getting-started.html"&gt;Getting started&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The Pen programming language is a general-purpose, strongly and statically typed, functional programming language with automatic precise memory management based on ownerships. And it's also made for &lt;em&gt;scalable&lt;/em&gt; software development. It aims to make development of large-scale applications easy and scalable by focusing on software &lt;strong&gt;maintainability&lt;/strong&gt; and &lt;strong&gt;portability&lt;/strong&gt;. Programs written in Pen should be simple, testable, and flexible against changes in the real world.&lt;/p&gt;

&lt;p&gt;The language that influenced Pen the most is &lt;a href="https://go.dev/"&gt;Go&lt;/a&gt;. You can also call Pen a functional descendant of Go with focus on application programming. Pen shares the same goal of Go like simplicity, reliability, efficiency, etc. Though, Pen explicitly excludes system programming from its domain to pursue further simplicity, portability, and developer productivity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current status
&lt;/h2&gt;

&lt;p&gt;Currently, the language is at the stage of syntax finalization for v1. The last missing piece of the syntax constructs is &lt;a href="https://github.com/pen-lang/pen/discussions/1083"&gt;generic built-in functions&lt;/a&gt; similar to Go's. They are expected to be called directly and behave similarly to built-in operators rather than normal functions. By introducing them, Pen can implement more built-in operations without increasing the language's complexity due to addition of syntax constructs. Also, that makes Pen more flexible as a language against requirement changes in the future.&lt;/p&gt;

&lt;h2&gt;
  
  
  Changes in v0.4
&lt;/h2&gt;

&lt;p&gt;Since a v0.4 release of the language is dependent on and had been blocked by a release of &lt;a href="https://releases.llvm.org/14.0.0/docs/ReleaseNotes.html"&gt;LLVM 14&lt;/a&gt;, it includes many new features in the language and complementary tools. Here are major ones of them.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Perceus reference counting GC
&lt;/h3&gt;

&lt;p&gt;One of the biggest changes in a compiler is implementation of &lt;a href="https://www.microsoft.com/en-us/research/publication/perceus-garbage-free-reference-counting-with-reuse/"&gt;the Perceus reference counting GC&lt;/a&gt; (Garbage Collection.) It is one of the state-of-the-art GC algorithms for functional programming languages. Compared with non-reference-counting GCs which are more popular in other functional programming languages, such as OCaml and Haskell, its implementation is relatively simple even though it performs comparably to them as described in the paper. Also, adoption of such a reference counting GC makes programs written in Pen more portable even into WASM without any modification. This is because other GC methods commonly require full view of memory to track which memory locations are still reachable. Checking the entire memory including stacks at runtime requires target-specific logic or is impossible on some targets like WASM. I described more details on implementation and performance benchmarks of the Perceus reference counting algorithm in &lt;a href="https://dev.to/raviqqe/implementing-the-perceus-reference-counting-gc-5662"&gt;another post&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rust FFI
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://crates.io/crates/pen-ffi"&gt;A &lt;code&gt;pen-ffi&lt;/code&gt; crate&lt;/a&gt; is a Rust FFI library for Pen. Thanks to both languages' semantics based on ownerships, they can interoperate with each other very easily. The new standard packages included in this release like &lt;code&gt;Http&lt;/code&gt; and &lt;code&gt;Sql&lt;/code&gt; packages are actually simple wrappers of third-party crates in Rust. Using the Rust FFI library, you can also write your own packages in Pen while utilizing existing resources written in Rust and benefiting from its growing ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;go&lt;/code&gt; expression
&lt;/h3&gt;

&lt;p&gt;It's still experimental but we've introduced the first piece of parallel computation primitives, &lt;code&gt;go&lt;/code&gt; expression. By using the &lt;code&gt;go&lt;/code&gt; expression which originates from Go, you can delegate heavy computation, slow I/O, or any other computation you want to perform concurrently to other execution contexts like threads.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;future = go \() number {
  # Some heavy computation...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Formatter
&lt;/h3&gt;

&lt;p&gt;Pen is now equipped with its official formatter command, &lt;code&gt;pen format&lt;/code&gt;. This command formats each source file or all in a package at once. It can also receive source from standard input and emit formatted one to standard output so that it integrates with editors and IDEs easily.&lt;/p&gt;

&lt;p&gt;Similarly to Go's &lt;code&gt;gofmt&lt;/code&gt; and differently from some other formatters like &lt;code&gt;rustfmt&lt;/code&gt; in Rust, the source formatter of Pen does not define any canonical form of source codes given the same sequences of tokens. But it rather tries to align indents, and add and remove spaces and newlines given hints extracted from the original source codes so that developers can still control their source codes to make them look readable in their own contexts. For more information, see &lt;code&gt;pen format --help&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Documentation generator
&lt;/h3&gt;

&lt;p&gt;The new release also includes a new &lt;code&gt;pen document&lt;/code&gt; command that generates documentation files of packages in Markdown. To generate the documentation, you can simply run &lt;code&gt;pen document&lt;/code&gt; with some options, such as a package name and URL, in a package directory. For more information, see &lt;code&gt;pen document --help&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;Http&lt;/code&gt; and &lt;code&gt;Sql&lt;/code&gt; standard packages
&lt;/h3&gt;

&lt;p&gt;In addition to new functions and types in the &lt;code&gt;Core&lt;/code&gt; and &lt;code&gt;Os&lt;/code&gt; standard packages, we've added &lt;code&gt;Http&lt;/code&gt; and &lt;code&gt;Sql&lt;/code&gt; standard packages. As their names suggest, the &lt;code&gt;Http&lt;/code&gt; package provides HTTP server and client logic, and the &lt;code&gt;Sql&lt;/code&gt; package provides &lt;code&gt;Sql&lt;/code&gt; client logic.&lt;/p&gt;

&lt;p&gt;Because those packages depend on the third-party crates (&lt;a href="https://github.com/hyperium/hyper"&gt;&lt;code&gt;hyper&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://github.com/launchbadge/sqlx"&gt;&lt;code&gt;sqlx&lt;/code&gt;&lt;/a&gt; respectively) in Rust, they are currently planned &lt;strong&gt;not&lt;/strong&gt; to be included the default installation bundle of the language. But they are likely to be separated into different Git repositories.&lt;/p&gt;

&lt;h3&gt;
  
  
  LLVM upgrade to 14
&lt;/h3&gt;

&lt;p&gt;As I mentioned first, LLVM was upgraded to 14 in the new release finally, which fixed many bugs including the ones related to &lt;a href="https://github.com/raviqqe/llvm-tail-call-opt-bug"&gt;tail call optimization&lt;/a&gt;. Examples of fixed bugs are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Certain operations on the built-in map type led to segmentation faults.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pen test&lt;/code&gt; command failed on macOS when running multiple tests.&lt;/li&gt;
&lt;li&gt;macOS with M1 chips could not run binaries compiled by the compiler.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, most programs written in Pen should work also on macOS with M1 chips as well as on x86-64 chips. There is a plan to support macOS on M1 chips as the first tier platform although we can't guarantee that Pen works properly there due to limitation of our CI infrastructure currently.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next?
&lt;/h2&gt;

&lt;p&gt;There are quite a few features planned for the next version of Pen including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/pen-lang/pen/discussions/1083"&gt;Generic built-in functions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/pen-lang/pen/issues/444"&gt;Proper implementation of the C calling convention for FFI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;More functionalities in standard packages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The C calling convention is already implemented. However, the current naive implementation is inefficient and sometimes requires heap allocations. By implementing it properly, Pen can call C/Rust functions without unnecessary overheads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We've hit the great v0.4 milestone of Pen! It contains new syntax, standard packages, and other complementary functionalities. Those features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The state-of-the-art reference counting GC&lt;/li&gt;
&lt;li&gt;Parallel computation primitive (&lt;code&gt;go&lt;/code&gt; expression)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Http&lt;/code&gt; and &lt;code&gt;Sql&lt;/code&gt; standard packages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thank you for reading this article! And if you are interested in &lt;a href="https://pen-lang.org"&gt;Pen&lt;/a&gt;, please &lt;a href="https://pen-lang.org/introduction/install.html"&gt;install&lt;/a&gt;, try it out, and give some feedback!&lt;/p&gt;

</description>
      <category>functional</category>
      <category>programming</category>
      <category>language</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Implementing the Perceus reference counting GC</title>
      <dc:creator>Yota Toyama</dc:creator>
      <pubDate>Fri, 24 Jun 2022 15:12:25 +0000</pubDate>
      <link>https://dev.to/raviqqe/implementing-the-perceus-reference-counting-gc-5662</link>
      <guid>https://dev.to/raviqqe/implementing-the-perceus-reference-counting-gc-5662</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Reference counting (RC) has rather been a minor party to the other garbage collection (GC) algorithms in functional programming in the last decades as, for example, &lt;a href="https://ocaml.org/"&gt;OCaml&lt;/a&gt; and &lt;a href="https://www.haskell.org/"&gt;Haskell&lt;/a&gt; use non-RC GC. However, several recent papers, such as &lt;a href="https://arxiv.org/abs/1908.05647"&gt;Counting Immutable Beans: Reference Counting Optimized for Purely Functional Programming&lt;/a&gt; and &lt;a href="https://www.microsoft.com/en-us/research/publication/perceus-garbage-free-reference-counting-with-reuse/"&gt;Perceus: Garbage Free Reference Counting with Reuse&lt;/a&gt;, showed the efficiency of highly optimized RC GC in functional languages with sacrifice or restriction of some language features like circular references. The latter paper introduced an efficient RC GC algorithm called Perceus which is basically &lt;em&gt;all-in-one&lt;/em&gt; RC.&lt;/p&gt;

&lt;p&gt;In this post, I describe my experience and some caveats about implementing and gaining benefits from the Perceus RC. I've been developing &lt;a href="https://github.com/pen-lang/pen"&gt;a programming language called Pen&lt;/a&gt; and implemented a large part of the Perceus RC there. I hope this post helps someone who is implementing the algorithm or even deciding if it's worth implementing it in their own languages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview of Perceus
&lt;/h2&gt;

&lt;p&gt;The Perceus reference counting algorithm is a thread-safe ownership-based reference counting algorithm with several optimizations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Heap reuse on data structure construction and deconstruction (pattern matching)&lt;/li&gt;
&lt;li&gt;Heap reuse specialization (in-place updates of data structures)&lt;/li&gt;
&lt;li&gt;Non-atomic operations or atomic operations with relaxed memory ordering for heap blocks not shared by multiple threads&lt;/li&gt;
&lt;li&gt;Borrow inference to reduce reference count operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By implementing all of those optimizations in &lt;a href="https://github.com/koka-lang/koka"&gt;the Koka programming language&lt;/a&gt;, they achieved GC overhead much less and execution time faster than the other languages including OCaml, Haskell, and even C++ in several algorithms and data structures that frequently keep common sub-structures of them, such as red-black trees. For more information, see &lt;a href="https://www.microsoft.com/en-us/research/publication/perceus-garbage-free-reference-counting-with-reuse/"&gt;the latest version of the paper&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing the algorithm
&lt;/h2&gt;

&lt;p&gt;What I've implemented so far in Pen are two core functionalities of the Perceus algorithm:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In-place updates of records on heap

&lt;ul&gt;
&lt;li&gt;This corresponds to heap reuse specialization described above.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Relaxed atomic operations on references not shared by multiple threads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Due to some differences in language features between Koka and Pen, I needed to make some modifications to the algorithm. First, Pen doesn't need any complex algorithm for in-place record updates with heap reuse specialization because it has &lt;a href="https://pen-lang.org/references/language/types.html#records"&gt;syntax for record updates&lt;/a&gt;, and its lowered directly into its mid-level intermediate representation (MIR) where the RC algorithm is applied.&lt;/p&gt;

&lt;p&gt;Secondly, although I've also implemented generic reuse of heap blocks that matches their frees and allocations in functions initially, I've reverted it for now since I realized that it won't improve performance much in Pen because of the lack of pattern matching syntax with deconstruction and another optimization of small record unboxing planned to be implemented later. In addition, the implementation doesn't include borrow inference yet as it had the least contribution to the performance reported in a previous paper.&lt;/p&gt;

&lt;p&gt;The main part of the algorithm is implemented in the source files of a compiler itself and an FFI library in Rust listed below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/pen-lang/pen/blob/d44df6d9cdcbe97fcdd5ac14c4de30f4897664ff/lib/mir-fmm/src/reference_count/pointer.rs"&gt;https://github.com/pen-lang/pen/blob/d44df6d9cdcbe97fcdd5ac14c4de30f4897664ff/lib/mir-fmm/src/reference_count/pointer.rs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/pen-lang/pen/blob/d44df6d9cdcbe97fcdd5ac14c4de30f4897664ff/lib/ffi/src/arc/arc_block.rs"&gt;https://github.com/pen-lang/pen/blob/d44df6d9cdcbe97fcdd5ac14c4de30f4897664ff/lib/ffi/src/arc/arc_block.rs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Counting back synchronized references to 0
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;In this section, I use the term "synchronized" to mean "marked as shared by multiple threads." In Koka and Lean 4, they use the term "shared" to mean the same thing but I rephrased it to reduce confusion.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the Perceus reference counting GC, memory blocks have mainly two &lt;em&gt;un-synchronized&lt;/em&gt; and &lt;em&gt;synchronized&lt;/em&gt; states represented by positive and negative counts respectively. Heap blocks are &lt;em&gt;synchronized&lt;/em&gt; before they get shared with other threads and are never reverted back to &lt;em&gt;un-synchronized&lt;/em&gt; states once they get synchronized. But you may wonder if this is necessary or not. If we have a memory block with a reference count of 1, that also means it's not shared with any other threads anymore. So isn't it possible to use a common count value of 0 to represent unique references and reduce the overhead of some atomic operations potentially?&lt;/p&gt;

&lt;p&gt;The answer is no because in that case, we need to synchronize memory operations on those references &lt;em&gt;un-synchronized&lt;/em&gt; back with drops of those references by the other threads with release memory ordering. For example, let's consider a situation where a thread shares a reference with the other thread:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Thread A shares a reference with thread B.&lt;/li&gt;
&lt;li&gt;Some computation goes on...&lt;/li&gt;
&lt;li&gt;Thread B drops the reference.&lt;/li&gt;
&lt;li&gt;Thread A drops the reference &lt;strong&gt;and&lt;/strong&gt; frees its inner memory block.

&lt;ul&gt;
&lt;li&gt;Or, thread A reuses the memory block for heap reuse optimization mentioned in the earlier section.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So if references can be &lt;em&gt;un-synchronized&lt;/em&gt; back, we always need to use atomic operations with acquire memory ordering at the point (4) above to make all side effects performed by thread B at the point (3) visible for thread A. Otherwise, thread A might free or rewrite memory locations thread B is trying to read! So as a result, we are rather increasing the overhead of atomic operations for references never &lt;em&gt;synchronized&lt;/em&gt; before.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benefitting from the algorithm
&lt;/h2&gt;

&lt;p&gt;In general, to get the most out of heap reuse in the Perceus algorithm, we need to write codes so that data structures filled with old data are updated with a small portion of new data. Pen's compiler previously had a performance bug where a relatively old data structure was merged into a new one of the same type. As a result, the code to merge two pieces of data was taking almost double in time although the logic was semantically correct.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recursive data types
&lt;/h3&gt;

&lt;p&gt;When your language has record types and syntax for record field access, things might be a little complex. Let's think about the following pseudo code where we want to update a recursive data structure of type &lt;code&gt;A&lt;/code&gt; in place (The code is written in &lt;a href="https://elm-lang.org/"&gt;Elm&lt;/a&gt; but assume that we implemented it with Perceus.):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elm"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="kt"&gt;A&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;Maybe&lt;/span&gt; &lt;span class="kt"&gt;A&lt;/span&gt;
  &lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;Int&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;A&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;A&lt;/span&gt;

&lt;span class="c1"&gt;-- From here, assume that we are in a function scope rather than a module scope.&lt;/span&gt;
&lt;span class="n"&gt;foo&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;A&lt;/span&gt;
&lt;span class="n"&gt;foo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;Nothing&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;bar&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;A&lt;/span&gt;
&lt;span class="n"&gt;bar&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="c1"&gt;-- Here, we want to reuse a memory block of `foo`...&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;foo&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;foo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt;
      &lt;span class="kt"&gt;Nothing&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;Nothing&lt;/span&gt;
      &lt;span class="c1"&gt;-- There are two references to `x` on evaluation of `f x` here!&lt;/span&gt;
      &lt;span class="kt"&gt;Just&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At the line of &lt;code&gt;Just x -&amp;gt; f x&lt;/code&gt;, the program applies a function &lt;code&gt;f&lt;/code&gt; to a field value &lt;code&gt;x&lt;/code&gt; which originates from &lt;code&gt;foo&lt;/code&gt;. However, at this point of the function application, we are still keeping the record value &lt;code&gt;foo&lt;/code&gt; itself and the value of &lt;code&gt;x&lt;/code&gt; has two references! Therefore, heap reuse specialization (i.e. in-place record update) cannot be applied there. In order to update the value of &lt;code&gt;x&lt;/code&gt; in place instead, we need to rather deconstruct &lt;code&gt;foo&lt;/code&gt; into its inner values first as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elm"&gt;&lt;code&gt;&lt;span class="n"&gt;bar&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;foo&lt;/span&gt;
  &lt;span class="k"&gt;in&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
      &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt;
        &lt;span class="kt"&gt;Nothing&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;Nothing&lt;/span&gt;
        &lt;span class="kt"&gt;Just&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
    &lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that, even if languages do not support record deconstruction, for self-recursive types, dropping fields containing the types themselves is possible in most cases in practice because otherwise such types' values cannot exist at runtime unless they are dynamically generated in functions or thunks in those fields.&lt;/p&gt;

&lt;p&gt;When I look at &lt;a href="https://koka-lang.github.io/koka/doc/book.html#sec-copying"&gt;the Koka's documentation&lt;/a&gt;, it seems to support record types but I couldn't find out how it handles this specific case yet. It's also an option to expose the compiler's details and allow annotations to enforce in-place updates for end-users while it might not be the best option in a long term.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmarking
&lt;/h2&gt;

&lt;p&gt;Here, I'm excited to show some benchmark results and their improvements. Details of their configurations are in a section later. Note that Pen still lacks some basic optimizations to reduce heap allocations (e.g. lambda lifting, unboxing small values on heap.) So the eventual performance improvements by Perceus would be lower than those results.&lt;/p&gt;

&lt;p&gt;Since I've never implemented the other GC methods like mark-and-sweep for Pen before, this is not a comparison of RC GC vs non-RC GC but rather a proof of how performant traditional thread-safe RC can be adopting Perceus on functional programming languages.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Atomic RC (seconds)&lt;/th&gt;
&lt;th&gt;Perceus RC (seconds)&lt;/th&gt;
&lt;th&gt;Improvement (times)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Conway's game of life&lt;/td&gt;
&lt;td&gt;2.136&lt;/td&gt;
&lt;td&gt;1.142&lt;/td&gt;
&lt;td&gt;1.87&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hash map insertion&lt;/td&gt;
&lt;td&gt;0.909&lt;/td&gt;
&lt;td&gt;0.255&lt;/td&gt;
&lt;td&gt;3.57&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hash map update&lt;/td&gt;
&lt;td&gt;1.935&lt;/td&gt;
&lt;td&gt;0.449&lt;/td&gt;
&lt;td&gt;4.31&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Configuration
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Conway's game of life
&lt;/h4&gt;

&lt;p&gt;The implementation uses lists to represent a field and lives so that it causes many allocations and deallocations of memory blocks on heap.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Field size: 20 x 40&lt;/li&gt;
&lt;li&gt;Iterations: 100&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/pen-lang/pen/tree/319d1b881dbd9b19407c1f9eed7a163253eca83b/examples/life-game"&gt;Implementation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Hash map insertion/update
&lt;/h4&gt;

&lt;p&gt;The map update benchmark includes the time taken by insertion for map initialization as well.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A number of entries: 100,000&lt;/li&gt;
&lt;li&gt;Key type: 64-bit floating point number&lt;/li&gt;
&lt;li&gt;Data structure: &lt;a href="https://en.wikipedia.org/wiki/Hash_array_mapped_trie"&gt;Hash-Array Mapped Trie (HAMT)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Implementation

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/pen-lang/pen/tree/319d1b881dbd9b19407c1f9eed7a163253eca83b/benchmark/number-set"&gt;Insertion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/pen-lang/pen/tree/319d1b881dbd9b19407c1f9eed7a163253eca83b/benchmark/number-set-update"&gt;Update&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In my experience so far, implementing the Perceus algorithm appears to be fairly straightforward compared with the other non-RC GC algorithms while there are a few stumbling blocks especially if you are not familiar with low-level concurrency and atomic instructions.&lt;/p&gt;

&lt;p&gt;I'm pretty happy having the algorithm implemented in my language and seeing it performing well despite its simple implementation. The Perceus RC can be a game changer in functional programming as it outperforms traditional GC on several common patterns in functional programming. However, it's definitely not for everyone and most likely affects language design.&lt;/p&gt;

&lt;p&gt;Finally, thanks for reading! I would appreciate your feedback on this post and &lt;a href="https://github.com/pen-lang/pen"&gt;the Pen programming language&lt;/a&gt;. The language's new release has been blocked by &lt;a href="https://github.com/Homebrew/homebrew-core/pull/97618"&gt;LLVM 14 adoption in Homebrew&lt;/a&gt; but the ticket had some progress in the last few weeks. So I can probably release v0.4 of it soon...&lt;/p&gt;

&lt;p&gt;Also, special thanks to the developers of &lt;a href="https://github.com/koka-lang/koka"&gt;Koka&lt;/a&gt; for answering my questions on the algorithm! That was really helpful!&lt;/p&gt;

&lt;h2&gt;
  
  
  Appendix
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Raw benchmark results
&lt;/h3&gt;

&lt;p&gt;Environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;uname&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt;
Linux xenon 5.13.0-1033-gcp &lt;span class="c"&gt;#40~20.04.1-Ubuntu SMP Tue Jun 14 00:44:12 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux&lt;/span&gt;

&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; lscpu
Architecture:            x86_64
  CPU op-mode&lt;span class="o"&gt;(&lt;/span&gt;s&lt;span class="o"&gt;)&lt;/span&gt;:        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
CPU&lt;span class="o"&gt;(&lt;/span&gt;s&lt;span class="o"&gt;)&lt;/span&gt;:                  4
  On-line CPU&lt;span class="o"&gt;(&lt;/span&gt;s&lt;span class="o"&gt;)&lt;/span&gt; list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel&lt;span class="o"&gt;(&lt;/span&gt;R&lt;span class="o"&gt;)&lt;/span&gt; Xeon&lt;span class="o"&gt;(&lt;/span&gt;R&lt;span class="o"&gt;)&lt;/span&gt; CPU @ 2.20GHz
Virtualization features:
  Hypervisor vendor:     KVM
Caches &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;sum &lt;/span&gt;of all&lt;span class="o"&gt;)&lt;/span&gt;:
  L1d:                   64 KiB &lt;span class="o"&gt;(&lt;/span&gt;2 instances&lt;span class="o"&gt;)&lt;/span&gt;
  L1i:                   64 KiB &lt;span class="o"&gt;(&lt;/span&gt;2 instances&lt;span class="o"&gt;)&lt;/span&gt;
  L2:                    512 KiB &lt;span class="o"&gt;(&lt;/span&gt;2 instances&lt;span class="o"&gt;)&lt;/span&gt;
  L3:                    55 MiB &lt;span class="o"&gt;(&lt;/span&gt;1 instance&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; lsmem
Memory block size:       128M
Total online memory:       4G
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Conway's game of life
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; hyperfine &lt;span class="nt"&gt;-w&lt;/span&gt; 3 ./life-old ./life-new
Benchmark 1: ./life-old
  Time &lt;span class="o"&gt;(&lt;/span&gt;mean ± σ&lt;span class="o"&gt;)&lt;/span&gt;:      2.136 s ±  0.033 s    &lt;span class="o"&gt;[&lt;/span&gt;User: 1.699 s, System: 0.392 s]
  Range &lt;span class="o"&gt;(&lt;/span&gt;min … max&lt;span class="o"&gt;)&lt;/span&gt;:    2.097 s …  2.206 s    10 runs

Benchmark 2: ./life-new
  Time &lt;span class="o"&gt;(&lt;/span&gt;mean ± σ&lt;span class="o"&gt;)&lt;/span&gt;:      1.142 s ±  0.035 s    &lt;span class="o"&gt;[&lt;/span&gt;User: 1.087 s, System: 0.009 s]
  Range &lt;span class="o"&gt;(&lt;/span&gt;min … max&lt;span class="o"&gt;)&lt;/span&gt;:    1.102 s …  1.203 s    10 runs

Summary
  &lt;span class="s1"&gt;'./life-new'&lt;/span&gt; ran
    1.87 ± 0.06 &lt;span class="nb"&gt;times &lt;/span&gt;faster than &lt;span class="s1"&gt;'./life-old'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Hash map insertion
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; hyperfine &lt;span class="nt"&gt;-w&lt;/span&gt; 3 ./insert-old ./insert-new
Benchmark 1: ./insert-old
  Time &lt;span class="o"&gt;(&lt;/span&gt;mean ± σ&lt;span class="o"&gt;)&lt;/span&gt;:     909.0 ms ±  14.2 ms    &lt;span class="o"&gt;[&lt;/span&gt;User: 605.7 ms, System: 250.6 ms]
  Range &lt;span class="o"&gt;(&lt;/span&gt;min … max&lt;span class="o"&gt;)&lt;/span&gt;:   890.7 ms … 932.8 ms    10 runs

Benchmark 2: ./insert-new
  Time &lt;span class="o"&gt;(&lt;/span&gt;mean ± σ&lt;span class="o"&gt;)&lt;/span&gt;:     254.8 ms ±   5.1 ms    &lt;span class="o"&gt;[&lt;/span&gt;User: 189.1 ms, System: 15.0 ms]
  Range &lt;span class="o"&gt;(&lt;/span&gt;min … max&lt;span class="o"&gt;)&lt;/span&gt;:   247.0 ms … 263.4 ms    12 runs

Summary
  &lt;span class="s1"&gt;'./insert-new'&lt;/span&gt; ran
    3.57 ± 0.09 &lt;span class="nb"&gt;times &lt;/span&gt;faster than &lt;span class="s1"&gt;'./insert-old'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Hash map update
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; hyperfine &lt;span class="nt"&gt;-w&lt;/span&gt; 3 ./update-old ./update-new
Benchmark 1: ./update-old
  Time &lt;span class="o"&gt;(&lt;/span&gt;mean ± σ&lt;span class="o"&gt;)&lt;/span&gt;:      1.935 s ±  0.032 s    &lt;span class="o"&gt;[&lt;/span&gt;User: 1.405 s, System: 0.476 s]
  Range &lt;span class="o"&gt;(&lt;/span&gt;min … max&lt;span class="o"&gt;)&lt;/span&gt;:    1.879 s …  1.976 s    10 runs

Benchmark 2: ./update-new
  Time &lt;span class="o"&gt;(&lt;/span&gt;mean ± σ&lt;span class="o"&gt;)&lt;/span&gt;:     448.6 ms ±  14.4 ms    &lt;span class="o"&gt;[&lt;/span&gt;User: 381.5 ms, System: 15.1 ms]
  Range &lt;span class="o"&gt;(&lt;/span&gt;min … max&lt;span class="o"&gt;)&lt;/span&gt;:   430.9 ms … 484.3 ms    10 runs

Summary
  &lt;span class="s1"&gt;'./update-new'&lt;/span&gt; ran
    4.31 ± 0.16 &lt;span class="nb"&gt;times &lt;/span&gt;faster than &lt;span class="s1"&gt;'./update-old'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>functional</category>
      <category>koka</category>
      <category>lean</category>
    </item>
  </channel>
</rss>
