<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: kojix2</title>
    <description>The latest articles on DEV Community by kojix2 (@kojix2).</description>
    <link>https://dev.to/kojix2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F129786%2F5f8821af-b2f8-4de9-8024-3a7be3c4cd16.png</url>
      <title>DEV Community: kojix2</title>
      <link>https://dev.to/kojix2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kojix2"/>
    <language>en</language>
    <item>
      <title>I Think Ruby Isn’t Dynamic Enough…</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Mon, 04 May 2026 14:58:40 +0000</pubDate>
      <link>https://dev.to/kojix2/i-think-ruby-isnt-dynamic-enough-1ga7</link>
      <guid>https://dev.to/kojix2/i-think-ruby-isnt-dynamic-enough-1ga7</guid>
      <description>&lt;p&gt;This article is a subjective personal essay, not a rigorous technical argument.&lt;/p&gt;

&lt;p&gt;For the past few years, I have become something of a Crystal believer. Crystal is a language that has achieved high performance by accepting strong constraints. Ruby, by contrast, is a language whose strength lies in dynamic qualities that Crystal cannot have. Looking at recent movements in Ruby from the perspective of a Crystal believer, I sometimes find myself thinking: “That is an area Crystal people have been digging into for years, and Ruby’s real strengths are not really there, are they…?” I have not been able to share this feeling with many people, which has been frustrating.&lt;/p&gt;

&lt;p&gt;I myself only really understand Ruby and Crystal, so I have not had much confidence in what I have been thinking, and have spent my time in a somewhat vague state. But if I do not write down these feelings, I will no longer be able to refer back to them, so I decided to summon some courage and write this personal essay.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ruby Is Not Object-Oriented Enough Either
&lt;/h2&gt;

&lt;p&gt;The incompleteness I feel in Ruby is that many operations lack reversibility. You can define a variable, but it is difficult to cleanly delete it. There is &lt;code&gt;include&lt;/code&gt; for modules, but there is no &lt;code&gt;de-include&lt;/code&gt;. Mechanisms such as &lt;code&gt;remove_method&lt;/code&gt;, &lt;code&gt;remove_const&lt;/code&gt;, &lt;code&gt;undef_method&lt;/code&gt;, &lt;code&gt;UnboundMethod&lt;/code&gt;, and &lt;code&gt;define_method&lt;/code&gt; do exist, but there does not seem to be a consistent reversible model for taking methods or behavior out of one structure and safely transplanting them into another object structure.&lt;/p&gt;

&lt;p&gt;Ruby is considered a dynamic language, and it permits all kinds of changes at runtime. But that freedom seems to work strongly in the direction of “adding things later.” The freedom to remove what has been added, to decompose structures and reassemble them into another form, or to undo such changes, does not seem to have been systematized very much.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Perhaps Ruby does not have enough of the qualities of a dynamic language.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ideally, I think it would be interesting if there were a Ruby implementation that, like machine learning, could be given input data and expected output, and then explore at the meta-level, at runtime, how to optimize its object structure. As a foundation for that, I imagine it would need mechanisms that allow it to observe, transform, and reconstruct its own objects, although I do not know whether such a thing is truly possible.&lt;/p&gt;

&lt;p&gt;Even if something like that were realized, in practice it might end up being separated into two stages: “generation of object structures through learning or compilation,” and then “execution.” I feel that would be rather boring.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Hope to See from Ruby
&lt;/h2&gt;

&lt;p&gt;I am deeply absorbed in Crystal, and have drifted a little away from Ruby. There are several people like that. Seeing this, it would not be strange if some people thought Ruby should also become capable of doing more Crystal-like things.&lt;/p&gt;

&lt;p&gt;But what is actually needed is the opposite. Crystal has structural constraints that it simply cannot escape. Crystal is a language that achieves speed and low memory usage by placing strong constraints on Ruby. Since I am a Crystal believer, I think that if you want to make a language do Crystal-like things, Crystal is better at that. There is nothing interesting about Ruby trying to do the same thing. I want to see what only Ruby can do.&lt;/p&gt;

&lt;p&gt;Ruby is, compared with Crystal, a language used in industry, so I think there are ways in which it cannot move freely. A language that can freely transform the structure of objects at runtime would be dangerous and would probably not be welcomed by industry. Still, isn’t it strange that, among mainstream languages, Ruby is treated almost as if it were the most dynamic language? I cannot shake the feeling that there remains a vast frontier in the world of languages even more dynamic than Ruby.&lt;/p&gt;

&lt;p&gt;I hope that someday I will see an attempt to expand the very world of programming itself into an even more dynamic realm.&lt;/p&gt;




&lt;p&gt;This post was machine-translated from Japanese into English using ChatGPT.&lt;/p&gt;

</description>
      <category>ruby</category>
      <category>crystal</category>
    </item>
    <item>
      <title>Porting Libraries to Crystal with AI</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Sat, 02 May 2026 06:40:43 +0000</pubDate>
      <link>https://dev.to/kojix2/porting-libraries-to-crystal-with-ai-1kl</link>
      <guid>https://dev.to/kojix2/porting-libraries-to-crystal-with-ai-1kl</guid>
      <description>&lt;p&gt;This post was originally written in Japanese and translated into English by the author using ChatGPT. The original post is available &lt;a href="https://qiita.com/kojix2/items/5680eb80af52a299763c" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;AI coding tools have become much better, and I now write code by hand far less often than before. I use GitHub Copilot in VSCode through the OSS benefit program, and recently I have also been trying Claude Code and Codex.&lt;/p&gt;

&lt;h2&gt;
  
  
  Porting libraries to the Crystal language
&lt;/h2&gt;

&lt;p&gt;Compared with other programming languages, Crystal does not have a very large library ecosystem.&lt;/p&gt;

&lt;p&gt;However, many open source libraries have licenses that allow porting, and AI can provide a lot of help with porting work. Because of this, when a library is missing, it is becoming realistic to consider porting as an option.&lt;/p&gt;

&lt;p&gt;In this article, I want to put into words and record what I actually do when porting libraries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing a reference library
&lt;/h2&gt;

&lt;p&gt;The first step is to choose the library to use as a reference.&lt;/p&gt;

&lt;p&gt;In Crystal, compared with C or Go, a higher level of abstraction is often expected. On the other hand, Crystal cannot use metaprogramming as freely as Ruby or Python.&lt;/p&gt;

&lt;p&gt;In that sense, Crystal is a language somewhere in the middle, and it is not always possible to choose a single reference library. Sometimes it is necessary to look at multiple libraries and think about what would be best. In my case, I often look at active Rust and Go projects, while using Ruby and Python APIs as references. When the reference is written in Rust, binding to it may sometimes be better than porting it.&lt;/p&gt;

&lt;p&gt;Deciding between porting and binding is the first step. If I decide that porting is the right approach, I move on. There may also be cases where porting and bindings are mixed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Checking the license
&lt;/h2&gt;

&lt;p&gt;The most important point is the license. I check whether the original library is under a license that allows porting, such as MIT, BSD, Apache-2.0, or another license whose conditions I can comply with. I try to make the new library inherit the license terms of the original.&lt;/p&gt;

&lt;p&gt;I also clearly state where the original project came from. In my case, I add the reference repository as a whole using &lt;code&gt;git submodule&lt;/code&gt;. This also pins the version of the code that Coding Agents refer to, which helps avoid unnecessary misunderstandings and trouble.&lt;/p&gt;

&lt;p&gt;When creating a final "tool", it may be reasonable to have multiple reference sources. But when creating a "library" by porting or binding, I think it is easier to maintain the project later if the main reference source is kept to one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making an overall plan with the web version of ChatGPT
&lt;/h2&gt;

&lt;p&gt;First, I upload an archive of the source library, either a tarball or a zip file, to ChatGPT and ask for an overall policy for porting the whole library to Crystal. I upload an archive simply because there is a limit on the number of files that can be uploaded.&lt;/p&gt;

&lt;p&gt;Why do I start with the web version of ChatGPT instead of a Coding Agent?&lt;/p&gt;

&lt;p&gt;The reason is based on experience: I tend to get better results when I first discuss the whole thing with the web version. This is only a guess, but I think there are probably two reasons.&lt;/p&gt;

&lt;p&gt;The first is the efficient use of internet search. Compared with local Coding Agents, the web version of ChatGPT is better at search. It can look through websites, technical blogs, and GitHub, and explore the policy from a wider point of view. When something is vague or unknown, it can search on its own and refine the plan.&lt;/p&gt;

&lt;p&gt;The second reason is probably that the cloud environment has more efficient access to documentation. Compared with searching a codebase locally while spending tokens, searching code in the cloud seems to work better, at least in my experience.&lt;/p&gt;

&lt;p&gt;Once the overall policy is produced, I ask additional questions about unclear points or my own requirements. At this stage, it may be useful to limit the amount of information by saying something like, "Please answer in three lines or less," so that ChatGPT's responses do not become too long or go off on a tangent.&lt;/p&gt;

&lt;p&gt;When building a Crystal library as a binding to a static language such as C, C++, or Rust, it is important to discuss the boundary of the binding, the build system, and the infrastructure, and to make sure the understanding is aligned.&lt;/p&gt;

&lt;p&gt;The desirable Crystal API design often differs from the API of the original library, which may be written in a language such as Rust or Go. In such cases, I may also upload Ruby libraries as reference material and ask what kind of API would be desirable. However, Crystal is not Ruby, so simply making the API the same as Ruby is not necessarily the right answer. If I really want to make something good, I have to check it myself.&lt;/p&gt;

&lt;p&gt;Once the architecture and design are agreed on, I ask ChatGPT to write it down as a self-contained and executable plan, usually in a file such as &lt;code&gt;PLAN.md&lt;/code&gt;. I often have &lt;code&gt;PLAN.md&lt;/code&gt; written in Japanese so that I can read it easily.&lt;/p&gt;

&lt;p&gt;However, &lt;code&gt;PLAN.md&lt;/code&gt; may contain rough notes, wrong assumptions, or unorganized working context. For that reason, I usually do not publish this file as-is. Instead, I try to preserve important design decisions and caveats in a form that can be read later, such as in the README, issues, or commit messages.&lt;/p&gt;

&lt;p&gt;That said, if the tool is not just for myself and I want many people to use it, I may ask for &lt;code&gt;PLAN.md&lt;/code&gt; to be written in English from the beginning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Doing the porting work locally
&lt;/h2&gt;

&lt;p&gt;Because I want to make use of Copilot's free quota, I use VSCode locally. Recently, however, CLI agents have also become more capable, and it is becoming possible to use editors of one's choice, such as Zed.&lt;/p&gt;

&lt;p&gt;First, I initialize the project repository.&lt;/p&gt;

&lt;p&gt;I decide the Crystal project name, create a skeleton with &lt;code&gt;crystal init lib piyo&lt;/code&gt;, add the reference repository as a submodule, and place &lt;code&gt;PLAN.md&lt;/code&gt; in the repository.&lt;/p&gt;

&lt;p&gt;I still do not know whether &lt;code&gt;PLAN.md&lt;/code&gt; should be made more like a TODO list.&lt;/p&gt;

&lt;p&gt;I do not want to lose the work, so I create a new private repository on GitHub and prepare to push the project there. After that, I ask the Coding Agent to proceed with the project while looking at &lt;code&gt;PLAN.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;During the work cycle, I periodically run &lt;code&gt;crystal tool format&lt;/code&gt; for formatting and &lt;code&gt;crystal spec&lt;/code&gt; for tests.&lt;/p&gt;

&lt;p&gt;In my case, I run the linter, &lt;code&gt;ameba --no-color&lt;/code&gt;, myself later, then pass the result to the Agent and ask it to fix the issues in a batch.&lt;/p&gt;

&lt;p&gt;I also periodically ask it to review the plan, update it, or delete parts that have been completed.&lt;/p&gt;

&lt;p&gt;Coding agents make rapid progress at first, but at some point they often start taking very small steps, and the work stops moving forward.&lt;/p&gt;

&lt;p&gt;In such cases, I stop the agent for a while and ask questions such as: "What is the current problem?", "Are there places that should be refactored before continuing?", "Is there anything unrealistic in the plan itself?", or "Is there anything you want me to do, such as setting up the environment?"&lt;/p&gt;

&lt;p&gt;In some cases, I make a tarball of the current state, upload it to ChatGPT again, ask it to evaluate the whole situation from a broader perspective, and have it create &lt;code&gt;PLAN2.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;However, this kind of workflow is only possible because I am working alone and will eventually publish everything on GitHub. I do not think this workflow is feasible when multiple people are coding for work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Writing tests with fixtures
&lt;/h2&gt;

&lt;p&gt;Some reference libraries provide fixtures for tests. When the original repository is added as a submodule, those fixtures can also be referenced, so they can be used in the Crystal-side tests as well.&lt;/p&gt;

&lt;p&gt;However, my honest impression is that aiming for parity is not easy, because of differences caused by floating point errors, random numbers, race conditions, and so on.&lt;/p&gt;

&lt;p&gt;To absorb differences in random numbers, one possible method is to prepare a small piece of code in the original language that generates random numbers, save those generated numbers as a kind of fixture in JSON or another format, and use those values in Crystal tests. This method does not always work, but there are cases where it does.&lt;/p&gt;
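
&lt;p&gt;As a minimal sketch of this idea, the spec below reads recorded random numbers from JSON and compares a result with a tolerance. The names &lt;code&gt;FIXTURE_JSON&lt;/code&gt; and &lt;code&gt;scale_samples&lt;/code&gt; and the values themselves are hypothetical; in a real project the JSON would be a file written by a small script in the original language.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;require "json"
require "spec"

# Hypothetical fixture: values that a small script in the original
# language would have generated and saved as JSON.
FIXTURE_JSON = %([0.417022, 0.720324, 0.000114])

# Hypothetical stand-in for the ported code: it consumes the recorded
# values instead of calling its own random number generator.
def scale_samples(samples : Array(Float64), factor : Float64) : Array(Float64)
  samples.map { |value| value * factor }
end

describe "scale_samples" do
  it "matches values computed from the recorded random numbers" do
    samples = Array(Float64).from_json(FIXTURE_JSON)
    result = scale_samples(samples, 2.0)
    result.size.should eq(3)
    # Compare with a tolerance to absorb floating point differences.
    result[0].should be_close(0.834044, 1e-9)
  end
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Using &lt;code&gt;be_close&lt;/code&gt; instead of exact equality is a deliberate choice here, since exact parity across languages is often unrealistic for floating point results.&lt;/p&gt;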

&lt;h2&gt;
  
  
  Adding GitHub Actions
&lt;/h2&gt;

&lt;p&gt;Once the project has taken some shape, I add several GitHub Actions workflows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;docs.yml&lt;/code&gt; for generating documentation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;build.yml&lt;/code&gt; for building and releasing CLI tools&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;test.yml&lt;/code&gt; for tests&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dependabot.yml&lt;/code&gt;, or Renovate, for updating GitHub Actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I end up adding these almost every time. In my case, I have a toy project called &lt;a href="https://github.com/kojix2/lolcat.cr" rel="noopener noreferrer"&gt;lolcat.cr&lt;/a&gt;, and I often copy workflows from it and then modify them.&lt;/p&gt;

&lt;p&gt;After adding them, I adjust things until the tests and builds pass, and then add badges to the README.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keeping README.md simple
&lt;/h2&gt;

&lt;p&gt;This depends on the purpose. In my case, most of the AI-ported libraries I create are for my own use. Compared with the time when I wrote libraries by hand, my feeling has changed a little: I do not really think that I want the library to become widely used. Maybe I instinctively want to avoid losing time to maintenance or spending more money on tokens.&lt;/p&gt;

&lt;p&gt;If README.md is too decorated, it starts to look like it was made by AI. A project with a gorgeous README but no maintenance feels, to me, like the ruins of a theme park with no afterglow. It does not leave a good impression.&lt;/p&gt;

&lt;p&gt;So I usually ask for README.md to be written in a "simple, plain, and purely practical" style.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deciding the granularity of commits
&lt;/h2&gt;

&lt;p&gt;As a general principle, in order to run a trustworthy project, it is desirable to avoid &lt;code&gt;force push&lt;/code&gt; and leave a transparent commit log. This is especially true when developing together in an open source community. Through pull requests and reviews, you can interact with people around the world and come to know what kind of people they are. I think this is one of the pleasures of participating in an open source community.&lt;/p&gt;

&lt;p&gt;However, best practices for Git workflows in solo personal development that depends heavily on AI have not yet been established. It is necessary to record why a certain piece of code was included, but I think it is better to lean on Git than to attach a separate memory system only for AI.&lt;/p&gt;

&lt;p&gt;That said, Git is chronological. When reordering commits, even if the final state of the code is the same, &lt;code&gt;force push&lt;/code&gt; becomes necessary. In the AI era, I feel that we may need a version control method based on semantics, one that can rebase without depending so strongly on chronological order.&lt;/p&gt;

&lt;p&gt;For now, personally, I ask AI to write commit messages, and then I manually commit them myself.&lt;/p&gt;

&lt;p&gt;I think this works as a minimal check to confirm that the human goal and the AI's work target are aligned. But there is also criticism that this is laundering or hiding AI-written code as if it had been written by a human, and I do not think it can be called a best practice.&lt;/p&gt;

&lt;p&gt;Embarrassingly, I also use &lt;code&gt;force push&lt;/code&gt; a lot.&lt;/p&gt;

&lt;p&gt;Especially in the early stage of a private repository where I am progressing through a plan, I repeatedly use &lt;code&gt;--amend&lt;/code&gt; and &lt;code&gt;force push&lt;/code&gt;, effectively using Git as a kind of "overwrite save" without leaving much history.&lt;/p&gt;

&lt;p&gt;Of course, this is mainly about private repositories before publication. The same thing should not be done on a shared public branch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;What I have written here reflects the situation as of April 2026.&lt;/p&gt;

&lt;p&gt;I hope that when I reread this later, I will be able to think, "So that was what things were like at that time."&lt;/p&gt;

&lt;p&gt;Coding Agents have made it possible to ask for explanations of algorithms and ideas that were previously difficult to understand, at any desired level of detail. They have also made it easy to quickly implement and try out ideas that come to mind. This really is revolutionary.&lt;/p&gt;

&lt;p&gt;At the same time, I sometimes get too absorbed in AI coding, work for too many hours, and feel that it may harm my health. I think I need to be a little careful about that. Health comes first.&lt;/p&gt;

&lt;p&gt;At the beginning, I wrote that this article was written by hand, not by AI.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note added for the English version: This refers to the original Japanese version. This English translation was made with ChatGPT.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I wrote that on purpose because these days I often generate text with AI too, and I wanted to deliberately do something different this time.&lt;/p&gt;

&lt;p&gt;That is all for this article.&lt;/p&gt;

</description>
      <category>crystal</category>
    </item>
    <item>
      <title>Why Is Crystal Compilation So Slow?</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Mon, 08 Dec 2025 13:30:24 +0000</pubDate>
      <link>https://dev.to/kojix2/why-is-crystal-compilation-so-slow-29n0</link>
      <guid>https://dev.to/kojix2/why-is-crystal-compilation-so-slow-29n0</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The Crystal programming language is notorious for its slow compilation times.&lt;/p&gt;

&lt;p&gt;But have you ever wondered where Crystal actually spends most of its compilation time?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9sxp3hrdnh2o9yq4ez9l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9sxp3hrdnh2o9yq4ez9l.png" alt="overview" width="800" height="145"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure: Crystal uses LLVM as its backend&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Crystal Compilation Pipeline
&lt;/h2&gt;

&lt;p&gt;The Crystal compiler's compilation process consists of the following stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;new_program&lt;/strong&gt; - Creating the program object&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;parse&lt;/strong&gt; - Lexical analysis and parsing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;semantic&lt;/strong&gt; - Semantic analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;codegen&lt;/strong&gt; - Generating object files
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="nn"&gt;Crystal&lt;/span&gt;
  &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Compiler&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Source&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="no"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Source&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;output_filename&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Result&lt;/span&gt;
      &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;unless&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_a?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="c1"&gt;# 1 new_program&lt;/span&gt;
      &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_program&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

      &lt;span class="c1"&gt;# 2 parse&lt;/span&gt;
      &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parse&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;

      &lt;span class="c1"&gt;# 3 semantic&lt;/span&gt;
      &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;semantic&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;cleanup: &lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;no_cleanup?&lt;/span&gt;

      &lt;span class="c1"&gt;# 4 codegen&lt;/span&gt;
      &lt;span class="n"&gt;units&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;codegen&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_filename&lt;/span&gt; &lt;span class="k"&gt;unless&lt;/span&gt; &lt;span class="vi"&gt;@no_codegen&lt;/span&gt;

      &lt;span class="c1"&gt;# 5 cleanup&lt;/span&gt;
      &lt;span class="c1"&gt;# ... omission ...&lt;/span&gt;
      &lt;span class="no"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After this, linking is performed by the standard linker.&lt;/p&gt;

&lt;h3&gt;
  
  
  Command-Line Options for Compilation Statistics
&lt;/h3&gt;

&lt;p&gt;Crystal provides a command-line option that displays compilation time statistics:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crystal build &lt;span class="nt"&gt;-s&lt;/span&gt; hoge.cr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, this option doesn't show the execution time of native LLVM functions, so it was not sufficient for this article's investigation.&lt;/p&gt;

&lt;p&gt;To get to the heart of the matter, I used &lt;a href="https://gist.github.com/kojix2/bf758a30ded3ea9aff9d3151df6a59c1" rel="noopener noreferrer"&gt;print debugging&lt;/a&gt; to measure the compilation time.&lt;/p&gt;
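
&lt;p&gt;The linked gist contains the actual measurement code. As a rough illustration only, timing by print debugging can look like the sketch below. The helper name &lt;code&gt;timed&lt;/code&gt; and the busy loop are hypothetical and are not taken from the compiler or the gist.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Hypothetical helper: run a block, print how long it took to STDERR,
# and return the elapsed time.
def timed(label : String) : Time::Span
  elapsed = Time.measure { yield }
  STDERR.puts "#{label}: #{elapsed.total_seconds} seconds"
  elapsed
end

# Stand-in workload; in the compiler this would wrap a stage or a call
# such as the LLVM functions discussed below.
timed("busy loop") do
  sum = 0_i64
  1_000_000.times { |i| sum += i }
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;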

&lt;h3&gt;
  
  
  Native LLVM Functions Called During Codegen
&lt;/h3&gt;

&lt;p&gt;During the codegen stage, the following native LLVM functions are called:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;LibLLVM.run_passes&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;Applies optimization passes to LLVM IR&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;
&lt;code&gt;LibLLVM.target_machine_emit_to_file&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;Generates object files&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;I measured the execution time of these functions using &lt;a href="https://gist.github.com/kojix2/bf758a30ded3ea9aff9d3151df6a59c1" rel="noopener noreferrer"&gt;print debugging&lt;/a&gt; as well.&lt;/p&gt;

&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;p&gt;Here are the results from compiling the Crystal compiler itself:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Time (seconds)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;new_program&lt;/td&gt;
&lt;td&gt;0.000388207&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;parse&lt;/td&gt;
&lt;td&gt;0.000065000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;semantic&lt;/td&gt;
&lt;td&gt;12.552620028&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;codegen&lt;/td&gt;
&lt;td&gt;355.245409133&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- LibLLVM.run_passes&lt;/td&gt;
&lt;td&gt;252.340241198&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- LibLLVM.target_machine_emit_to_file&lt;/td&gt;
&lt;td&gt;93.280652845&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cleanup&lt;/td&gt;
&lt;td&gt;0.000013180&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;total&lt;/td&gt;
&lt;td&gt;367.798495548&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Let me visualize this with a bar chart:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffb6g781iellwm49rnit0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffb6g781iellwm49rnit0.png" alt="Compilation time breakdown" width="800" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;NOTE: This graph is from the original article and may differ slightly from the latest compiler.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Were the results what you expected?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lexical analysis and parsing take virtually no time!&lt;/li&gt;
&lt;li&gt;Semantic analysis (including type inference) also takes relatively little time!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In fact, about 97% of the total time (355.2 of 367.8 seconds) is spent in codegen, specifically in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;LibLLVM.run_passes&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;LibLLVM.target_machine_emit_to_file&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are external LLVM function calls that happen outside of Crystal's control!&lt;/p&gt;

&lt;p&gt;In this case, where the Crystal compiler itself was built with &lt;code&gt;--release&lt;/code&gt;, the majority of compilation time was spent on LLVM optimization and code generation.&lt;/p&gt;

&lt;p&gt;This might be a somewhat surprising result, don't you think?&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Speed Up the Crystal Compiler
&lt;/h3&gt;

&lt;p&gt;The parts of the Crystal compiler implemented in Crystal—namely lexical analysis, parsing, and semantic analysis—are already sufficiently fast. This means that to achieve further speedups, we would need hardcore approaches such as:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Introducing parallelization even in release builds&lt;/li&gt;
&lt;li&gt;Optimizing LLVM itself (specifically for Crystal)&lt;/li&gt;
&lt;li&gt;Improving Crystal to generate LLVM IR that's easier for LLVM to process&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;However, since these approaches aren't very practical for everyday use, let me introduce a more accessible method:&lt;/p&gt;

&lt;h3&gt;
  
  
  Use &lt;code&gt;-O3&lt;/code&gt; Instead of &lt;code&gt;--release&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;In the Crystal compiler, specifying &lt;code&gt;--release&lt;/code&gt; is equivalent to specifying both &lt;code&gt;-O3&lt;/code&gt; and &lt;code&gt;--single-module&lt;/code&gt;. If you're willing to sacrifice some optimization, you can specify only the &lt;code&gt;-O3&lt;/code&gt; option, which enables parallelization and can speed up compilation in many cases.&lt;/p&gt;

&lt;p&gt;From here on, the discussion becomes somewhat speculative.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Crystal Doesn't Have Incremental Compilation or Shared Library Support
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Crystal's &lt;code&gt;--release&lt;/code&gt; Mode Includes &lt;code&gt;--single-module&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Crystal struggles with splitting code into separate compilation units and reusing the results. In particular, &lt;code&gt;--release&lt;/code&gt; builds enable &lt;code&gt;--single-module&lt;/code&gt;, which compiles everything into one massive LLVM module for optimization.&lt;/p&gt;

&lt;p&gt;For comparison, Rust performs separate compilation for each crate even with &lt;code&gt;--release&lt;/code&gt;. In Rust, you need to explicitly use &lt;code&gt;-C lto=fat&lt;/code&gt; to get behavior similar to Crystal's, where the entire LLVM IR is optimized together.&lt;/p&gt;

&lt;h3&gt;
  
  
  Crystal's Weak Caching Mechanism
&lt;/h3&gt;

&lt;p&gt;Crystal does have a mechanism that caches LLVM bitcode files (.bc) and object files on a per-type basis during normal builds, and can reuse object files only when the bitcode is completely unchanged.&lt;/p&gt;

&lt;p&gt;This allows the compiler to skip the expensive object file generation step in some cases.&lt;/p&gt;

&lt;p&gt;However, even in such cases, lexical analysis, parsing, and semantic analysis cannot be skipped. The comparison only happens after generating .bc files. And as we'll discuss later, cases where the bitcode is completely unchanged are actually quite rare.&lt;/p&gt;

&lt;h3&gt;
  
  
  Crystal Is a Statically-Typed Language Where the Caller Determines Types
&lt;/h3&gt;

&lt;p&gt;Why can't Crystal split packages into multiple LLVM IR modules, precompile them, and reuse the results?&lt;/p&gt;

&lt;p&gt;The main reason is that Crystal has strong type inference and union types, and the concrete types of methods change depending on the calling context.&lt;/p&gt;

&lt;p&gt;Crystal is an unusual statically-typed language where &lt;strong&gt;the caller determines the types&lt;/strong&gt;, enabling duck typing. However, the trade-off is that type signatures need to be inferred with every compilation.&lt;/p&gt;
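
&lt;p&gt;As a rough illustration of "the caller determines the types," here is a minimal sketch. The method name &lt;code&gt;double&lt;/code&gt; is hypothetical and chosen only for this example. Because the method has no type annotations, the compiler has to instantiate it separately for each argument type that actually appears at a call site.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Hypothetical example: no type annotations on the argument or return value.
# The compiler instantiates `double` once per argument type used by callers.
def double(x)
  x + x
end

p double(21)   # instantiated for Int32, prints 42
p double(1.5)  # instantiated for Float64, prints 3.0
p double("ab") # instantiated for String, prints "abab"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Since the set of instantiations depends on every call site in the program, a library cannot easily be compiled ahead of time without knowing its callers.&lt;/p&gt;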

&lt;h3&gt;
  
  
  Type IDs Change with Each Compilation
&lt;/h3&gt;

&lt;p&gt;The Crystal compiler assigns a number to every class to resolve types. With each compilation, every type that appears gets assigned a "number." Let's say class A gets assigned the number "10" in one compilation. If you make a small change to the code and recompile, "10" might be assigned to a different class. Linking object files created this way causes type inconsistencies and fails, because conditional branches based on types won't work correctly.&lt;/p&gt;
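
&lt;p&gt;You can observe these numbers with &lt;code&gt;crystal_type_id&lt;/code&gt;. The following is only a sketch: the classes &lt;code&gt;A&lt;/code&gt; and &lt;code&gt;B&lt;/code&gt; are hypothetical, and the printed values are deliberately not shown, because they are an internal detail that may differ between programs and between compilations.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Hypothetical classes, used only to print their compiler-assigned type IDs.
class A; end
class B; end

# The concrete numbers are an implementation detail. Adding or removing
# types elsewhere in the program may change them on the next compilation.
puts "A: #{A.new.crystal_type_id}"
puts "B: #{B.new.crystal_type_id}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;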

&lt;p&gt;Additionally, when loading multiple Crystal shared libraries simultaneously, there's the problem of runtime functions being multiply defined.&lt;/p&gt;

&lt;p&gt;This makes it difficult for Crystal to split code into parts, precompile them, and reuse them later.&lt;/p&gt;

&lt;p&gt;But is this an inherent characteristic of the Crystal language? Let's consider this from a more social context.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Crystal Language Community and Resource Constraints
&lt;/h2&gt;

&lt;p&gt;Crystal is known as a language with Ruby-like concise syntax that delivers excellent performance.&lt;/p&gt;

&lt;p&gt;However, the Crystal development team has limited resources. While there is a dedicated team at Manas.Tech and community contributors worldwide, the resources are still limited compared to large corporations.&lt;/p&gt;

&lt;p&gt;For instance, imagine if Apple were developing Crystal.&lt;/p&gt;

&lt;p&gt;Apple engineers might make changes to clang/LLVM itself to significantly improve compilation speed.&lt;/p&gt;

&lt;p&gt;Or, like Swift, they might define a proper ABI and create an intermediate language or binary format well-suited to Crystal. Similar to how Swift has SIL (Swift Intermediate Language) as an intermediate representation before converting to LLVM IR, Crystal could have its own optimized intermediate language. This would enable comparing modules at that stage, resolving types, and generating object files from there. (Though I'm not entirely sure if this is possible within the LLVM framework.)&lt;/p&gt;

&lt;p&gt;However, the Crystal compiler we have isn't like that. It generates monolithic, massive LLVM IR and delegates all optimization to LLVM. For package management, downloading source code directly from GitHub is the mainstream approach.&lt;/p&gt;

&lt;p&gt;There still seems to be room for improvement.&lt;/p&gt;

&lt;p&gt;The characteristic of slow compilation but fast execution is not purely a linguistic characteristic of Crystal, but also stems from the resource constraints of the Crystal development team. In other words, if significant resources were invested in development in the future, these issues could potentially be improved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Designing an ABI specification or intermediate language for Crystal is extremely difficult. However, if someone achieves this, it could become Crystal 2.0 or Crystal 3.0.&lt;/p&gt;

&lt;p&gt;Even without going that far, finding ways to split the generated LLVM IR into multiple modules, or mangling function names and global variables, would represent significant progress.&lt;/p&gt;

&lt;p&gt;Crystal doesn't have as vibrant a library ecosystem as some other languages. While the reasons aren't entirely clear, as we improve the environment for code reuse, techniques for improving compilation speed may also develop.&lt;/p&gt;

&lt;p&gt;That's all for this article. Thank you for reading to the end!&lt;/p&gt;




&lt;p&gt;This article was originally written in 2024 and revised in December 2025. It was translated from Japanese to English using Claude Sonnet.&lt;/p&gt;

</description>
      <category>crystal</category>
    </item>
    <item>
      <title>A Practical Guide to Parallel Programming in Crystal (2025)</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Fri, 21 Nov 2025 07:24:48 +0000</pubDate>
      <link>https://dev.to/kojix2/a-practical-guide-to-parallel-programming-in-crystal-2025-1lbg</link>
      <guid>https://dev.to/kojix2/a-practical-guide-to-parallel-programming-in-crystal-2025-1lbg</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article is based on content created by kojix2 (a human) alternately calling DeepWiki and ChatGPT, but kojix2 (a human) has reviewed, edited, and proofread the entire text. The article was translated from Japanese to English using Claude. If you find any mistakes, &lt;strong&gt;please comment&lt;/strong&gt;. Thank you.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Crystal's parallel processing is based on a hybrid model that primarily uses &lt;strong&gt;Fiber (cooperative and lightweight)&lt;/strong&gt; and utilizes &lt;strong&gt;Thread (OS threads)&lt;/strong&gt; when necessary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ExecutionContext&lt;/strong&gt;, which has been rapidly developed since around 2024-2025, provides a new abstraction layer for safely spreading Fibers across multiple threads.&lt;/p&gt;

&lt;p&gt;This article organizes the latest parallel execution model in Crystal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building with Parallel Execution Enabled
&lt;/h2&gt;

&lt;p&gt;As of November 19, 2025, you need to use the following two flags:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;-Dpreview_mt&lt;/code&gt;: Enables parallel execution of Fibers&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-Dexecution_context&lt;/code&gt;: Enables the use of ExecutionContext
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crystal build &lt;span class="nt"&gt;-Dpreview_mt&lt;/span&gt; &lt;span class="nt"&gt;-Dexecution_context&lt;/span&gt; program.cr 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although Crystal's parallel execution is still a preview feature, more than six years have passed since it was first released, and it works without issues in many cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview of Crystal's Concurrency and Parallelism
&lt;/h2&gt;

&lt;p&gt;Crystal has five major execution models:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Execution Unit&lt;/th&gt;
&lt;th&gt;Characteristics&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fiber (default)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fiber (lightweight thread)&lt;/td&gt;
&lt;td&gt;Cooperative, automatic switching on I/O, lightweight&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ExecutionContext::Concurrent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fiber group&lt;/td&gt;
&lt;td&gt;Sequential execution on 1 thread (concurrent)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ExecutionContext::Parallel&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fiber group&lt;/td&gt;
&lt;td&gt;Execution on multiple threads (parallel)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ExecutionContext::Isolated&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 Fiber + 1 dedicated thread&lt;/td&gt;
&lt;td&gt;For GUI loops and blocking FFI calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Thread&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OS thread&lt;/td&gt;
&lt;td&gt;For handling low-level operations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The standard design is as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;Fiber&lt;/strong&gt; as the basis&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;ExecutionContext&lt;/strong&gt; only where parallelism is needed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Cooperative Scheduling of Fiber and I/O
&lt;/h2&gt;

&lt;p&gt;Fiber is a cooperative execution model that has existed for a while. By default (when parallel execution is disabled), switching occurs only at the following points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;I/O&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sleep&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Channel&lt;/code&gt; &lt;code&gt;receive/send&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Fiber.yield&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At each of these points, &lt;code&gt;Fiber.suspend&lt;/code&gt; is called and the current Fiber is suspended.&lt;/p&gt;
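
&lt;p&gt;The switch points listed above can be sketched as follows. This is a minimal example that assumes a normal build without the parallel execution flags. The &lt;code&gt;done&lt;/code&gt; channel is a helper introduced only for this illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Helper channel (introduced for this sketch) used to wait for both fibers.
done = Channel(Nil).new

2.times do |id|
  spawn do
    3.times do |step|
      puts "fiber #{id}: step #{step}"
      Fiber.yield # explicit switch point: gives other fibers a chance to run
    end
    done.send(nil) # Channel send is also a switch point
  end
end

# Receiving blocks the main fiber until a worker sends, so it switches too.
2.times { done.receive }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Without the &lt;code&gt;Fiber.yield&lt;/code&gt; call, each fiber would run its whole loop before the other one gets a turn, because nothing in the loop body would hand control back to the scheduler.&lt;/p&gt;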

&lt;p&gt;The basic approach in Crystal is to put I/O-bound processing on Fibers.&lt;/p&gt;

&lt;p&gt;Each Fiber has its own stack memory. The stack has a virtual size of 8MiB, but it's only reserved, and actual memory usage starts from 4KiB.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is a "Stack" in Crystal?
&lt;/h3&gt;

&lt;p&gt;When reading Crystal documentation, you'll encounter the word "stack." Note that this differs from the general meaning of "stack": here it refers to a "memory region that behaves like a stack," which is actually memory allocated from the OS heap.&lt;/p&gt;

&lt;p&gt;What is placed on the stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Value types: &lt;code&gt;Struct&lt;/code&gt;, &lt;code&gt;Tuple&lt;/code&gt;, &lt;code&gt;StaticArray&lt;/code&gt;, etc.&lt;/li&gt;
&lt;li&gt;Primitive types: &lt;code&gt;Int32&lt;/code&gt;, &lt;code&gt;Float64&lt;/code&gt;, &lt;code&gt;Bool&lt;/code&gt;, &lt;code&gt;Char&lt;/code&gt;, etc.&lt;/li&gt;
&lt;li&gt;Pointers to reference types: &lt;code&gt;Array&lt;/code&gt;, &lt;code&gt;Hash&lt;/code&gt;, etc. (The reference type objects themselves are placed on the heap, but the pointers to them are placed on the stack)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Values placed on the stack are not directly targeted by GC, but they are scanned during GC execution to prevent heap objects referenced by stack variables from being mistakenly collected.&lt;/p&gt;

&lt;p&gt;As described later, the key point is that when captured by closures like &lt;code&gt;spawn do end&lt;/code&gt;, the above value types are exceptionally placed on the heap and become accessible from other threads.&lt;/p&gt;
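
&lt;p&gt;A minimal sketch of this capture behavior follows. The variable names are hypothetical. The local variable &lt;code&gt;counter&lt;/code&gt; is an &lt;code&gt;Int32&lt;/code&gt;, which would normally live on the stack, but because the &lt;code&gt;spawn&lt;/code&gt; block captures it, the main Fiber and the spawned Fiber both see the same variable.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;counter = 0                # value type, but captured by the block below
done = Channel(Nil).new    # helper channel used to wait for the fiber

spawn do
  counter += 1             # modifies the captured variable, not a copy
  done.send(nil)
end

done.receive               # wait until the spawned fiber has finished
p counter                  # prints 1: the change is visible in the main fiber
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;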

&lt;h2&gt;
  
  
  Background Knowledge: Thread / Scheduler / Fiber
&lt;/h2&gt;

&lt;p&gt;In Crystal, each thread has its own &lt;code&gt;Crystal::Scheduler&lt;/code&gt; that manages the fibers to be executed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Main Thread Creation and Initialization
&lt;/h3&gt;

&lt;p&gt;The main thread is automatically created by the OS when the program starts. Subsequently, when &lt;code&gt;Thread.current&lt;/code&gt; is called, a Thread object for the main thread is created. The stack address of the main thread is obtained with the &lt;code&gt;stack_address&lt;/code&gt; method. This is the actual thread stack allocated by the OS when the process starts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Main Fiber Creation
&lt;/h3&gt;

&lt;p&gt;When the &lt;code&gt;Thread&lt;/code&gt; object is initialized, the main Fiber is created simultaneously. The main Fiber uses a special constructor &lt;code&gt;Fiber.new(stack : Void*, thread)&lt;/code&gt; to utilize the OS thread stack. Unlike normal Fibers, &lt;code&gt;makecontext&lt;/code&gt; is not called, and it uses the already running context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lazy Initialization of Scheduler
&lt;/h3&gt;

&lt;p&gt;The main thread's scheduler is initialized when &lt;code&gt;Thread#scheduler&lt;/code&gt; is called. The scheduler has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;@event_loop&lt;/code&gt;: Platform-specific event loop&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@stack_pool&lt;/code&gt;: Fiber stack reuse pool&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@runnable&lt;/code&gt;: Queue of runnable fibers&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@main&lt;/code&gt;: Thread's main fiber&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Default Thread Configuration
&lt;/h3&gt;

&lt;p&gt;Without using &lt;code&gt;ExecutionContext&lt;/code&gt; and &lt;code&gt;preview_mt&lt;/code&gt;, only the main thread exists. The main thread has its own &lt;code&gt;Crystal::Scheduler&lt;/code&gt; instance, which manages all fibers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stack Allocation for New Fibers
&lt;/h3&gt;

&lt;p&gt;When a new Fiber is created, stack memory is obtained from Fiber::StackPool. When a Fiber terminates, its stack is returned to the pool through StackPool.release for reuse by the next Fiber. Stack allocation reserves 8MiB of virtual address space. Only the bottom page of the stack (4KiB) is committed to physical memory. When the stack grows and reaches a guard page, that page's guard status is removed and a new guard page is committed. This continues until reserved pages run out.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parallel Execution with ExecutionContext
&lt;/h2&gt;

&lt;p&gt;ExecutionContext is a "virtual thread group" that executes Fibers together.&lt;/p&gt;

&lt;h3&gt;
  
  
  ExecutionContext::Concurrent
&lt;/h3&gt;

&lt;p&gt;This is the same concurrent execution as traditional Fibers. It's safe and easy to handle.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Fiber&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ExecutionContext&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Concurrent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"workers"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Only one Fiber executes at a time&lt;/strong&gt; within the context&lt;/li&gt;
&lt;li&gt;Therefore, access contention on shared variables doesn't occur (using Mutex/Atomic is nevertheless recommended as a safety measure)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Suitable when parallelization is unnecessary but you want to use Fibers.&lt;/p&gt;

&lt;h3&gt;
  
  
  ExecutionContext::Parallel
&lt;/h3&gt;

&lt;p&gt;Parallel execution on multiple threads.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Fiber&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ExecutionContext&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Parallel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"workers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Changing parallel size during execution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Each thread runs its own scheduler

&lt;ul&gt;
&lt;li&gt;The scheduler is an instance of the &lt;code&gt;Fiber::ExecutionContext::Parallel::Scheduler&lt;/code&gt; class, responsible for executing individual Fibers. It has a local queue and manages runnable Fibers. It searches for and executes Fibers in the main loop (run_loop).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Fibers within the context are moved to and executed on arbitrary threads

&lt;ul&gt;
&lt;li&gt;When a Fiber moves between threads, only the execution context (registers and stack pointer) actually moves. The Fiber's stack memory (heap from the OS perspective) does not move. This memory region is fixed during the Fiber's lifetime. When a Fiber resumes on a new thread, the saved stack pointer is loaded and points to the original stack memory region.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Due to parallelism, &lt;code&gt;Atomic&lt;/code&gt; / &lt;code&gt;Mutex&lt;/code&gt; is mandatory for shared mutable state.

&lt;ul&gt;
&lt;li&gt;Local variables and instance variables (pointers) captured from the closure that spawns the Fiber are placed in a closure data structure allocated on the heap, and that pointer moves with the Fiber. This means that value type local variables (like StaticArray) that would normally be allocated on the stack are exceptionally allocated on the heap.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Parallel is the central feature of Crystal's goal of "safe and fast parallel execution."&lt;/p&gt;

&lt;h3&gt;
  
  
  ExecutionContext::Isolated
&lt;/h3&gt;

&lt;p&gt;1 Fiber = 1 dedicated thread&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;gui&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Fiber&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ExecutionContext&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Isolated&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"GUI"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="no"&gt;Gtk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="n"&gt;gui&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;A single Fiber monopolizes an OS thread&lt;/li&gt;
&lt;li&gt;Safe to use blocking I/O (e.g., GUI event loops, blocking FFI calls)&lt;/li&gt;
&lt;li&gt;Cannot add additional spawns within the context (they are forced to go to the default context)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Suitable for main loops of GUI applications and for FFI calls into C functions that block on I/O.&lt;/p&gt;

&lt;h3&gt;
  
  
  Default Fiber Without Using ExecutionContext
&lt;/h3&gt;

&lt;p&gt;When ExecutionContext is not specified, Fibers execute in the default ExecutionContext (&lt;code&gt;Fiber::ExecutionContext.default&lt;/code&gt;). The default ExecutionContext is Parallel, but since the initial parallelism is set to 1, it behaves the same as Concurrent.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="no"&gt;Fiber&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ExecutionContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt; &lt;span class="c1"&gt;# =&amp;gt; 1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Basic Patterns of Channel and WaitGroup
&lt;/h2&gt;

&lt;p&gt;Crystal's parallel processing is based on a &lt;strong&gt;Channel + WaitGroup&lt;/strong&gt; pattern similar to Go.&lt;/p&gt;

&lt;h3&gt;
  
  
  Producer-Consumer (Parallel)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;consumers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Fiber&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ExecutionContext&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Parallel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"consumers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;channel&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Channel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;wg&lt;/span&gt;         &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;WaitGroup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Atomic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;times&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;consumers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;spawn&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;receive?&lt;/span&gt;
      &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;ensure&lt;/span&gt;
    &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;done&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;times&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;
&lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;

&lt;span class="nb"&gt;p&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 523776&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Communication via Channel&lt;/li&gt;
&lt;li&gt;Synchronization via WaitGroup&lt;/li&gt;
&lt;li&gt;Safe updates of shared state via Atomic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the basic form of parallel execution in Crystal.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;32 consumer Fibers running in parallel receive the 1024 integers (0-1023) from the channel and atomically add them up, producing the sum 523776.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Protection of Shared Variables in Concurrent
&lt;/h3&gt;

&lt;p&gt;Concurrent is serial execution so contention doesn't occur, but Crystal officially states that using &lt;code&gt;Atomic&lt;/code&gt; / &lt;code&gt;Mutex&lt;/code&gt; is preferable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Atomic / Mutex / SpinLock
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Atomic
&lt;/h3&gt;

&lt;p&gt;A variable that can safely read and write values even when accessed simultaneously from multiple threads, a basic synchronization primitive for preventing race conditions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Directly mapped to LLVM atomic instructions&lt;/li&gt;
&lt;li&gt;compare_and_set, add, sub, get, set&lt;/li&gt;
&lt;li&gt;Same memory orders as C/C++: Acquire / Release / Relaxed, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Types that cannot be used with Atomic include value types such as structures (&lt;code&gt;Struct&lt;/code&gt;) and &lt;code&gt;StaticArray&lt;/code&gt;.&lt;/p&gt;
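
&lt;p&gt;The operations listed above can be tried in a small single-threaded sketch. The variable names are hypothetical. Note that &lt;code&gt;add&lt;/code&gt; returns the previous value, and &lt;code&gt;compare_and_set&lt;/code&gt; returns a tuple of the previous value and a success flag.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;counter = Atomic(Int32).new(0)

old = counter.add(5)   # atomically adds 5 and returns the previous value
p old                  # prints 0
p counter.get          # prints 5

# Replace the value with 10 only if it is currently 5.
previous, ok = counter.compare_and_set(5, 10)
p previous             # prints 5
p ok                   # prints true
p counter.get          # prints 10

# The expected value no longer matches, so nothing is written.
_, ok = counter.compare_and_set(5, 99)
p ok                   # prints false
p counter.get          # prints 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;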

&lt;h3&gt;
  
  
  Mutex
&lt;/h3&gt;

&lt;p&gt;A lock that protects code regions (critical sections) that must not be executed simultaneously by multiple Fibers, controlling so that only one Fiber can execute at a time.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fiber-safe&lt;/li&gt;
&lt;li&gt;Three modes: Checked / Reentrant / Unchecked&lt;/li&gt;
&lt;li&gt;Re-entry prohibited by default (safe)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;mutex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;  
&lt;span class="n"&gt;shared_array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;  

&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;times&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;  
  &lt;span class="n"&gt;spawn&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;  
    &lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;synchronize&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;  
      &lt;span class="c1"&gt;# Only one Fiber executes at a time within this block  &lt;/span&gt;
      &lt;span class="n"&gt;shared_array&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;  
      &lt;span class="nb"&gt;sleep&lt;/span&gt; &lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seconds&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;  
  &lt;span class="k"&gt;end&lt;/span&gt;  
&lt;span class="k"&gt;end&lt;/span&gt;  

&lt;span class="nb"&gt;sleep&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;second&lt;/span&gt;  
&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;shared_array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example of manually locking/unlocking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;mutex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;  
&lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;  

&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;times&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;  
  &lt;span class="n"&gt;spawn&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;  
    &lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lock&lt;/span&gt;  
    &lt;span class="k"&gt;begin&lt;/span&gt;  
      &lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;  
      &lt;span class="nb"&gt;sleep&lt;/span&gt; &lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seconds&lt;/span&gt;  
    &lt;span class="k"&gt;ensure&lt;/span&gt;  
      &lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unlock&lt;/span&gt;  &lt;span class="c1"&gt;# Always unlock  &lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;  
  &lt;span class="k"&gt;end&lt;/span&gt;  
&lt;span class="k"&gt;end&lt;/span&gt;  

&lt;span class="nb"&gt;sleep&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;second&lt;/span&gt;  
&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
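
&lt;p&gt;For the protection modes mentioned above, here is a minimal sketch of the reentrant mode. The variable names are hypothetical. With the default (checked) mode, the nested &lt;code&gt;synchronize&lt;/code&gt; would not be allowed, but a mutex created with &lt;code&gt;:reentrant&lt;/code&gt; lets the same Fiber lock it again.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;mutex = Mutex.new(:reentrant) # same fiber may acquire the lock repeatedly
depth = 0

mutex.synchronize do
  depth += 1
  mutex.synchronize do        # nested lock by the same fiber is permitted
    depth += 1
  end
end

p depth                       # prints 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Reentrant locking is convenient when a locked method calls another locked method on the same object, but the default mode is usually preferable because it catches accidental recursive locking.&lt;/p&gt;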



&lt;h3&gt;
  
  
  SpinLock
&lt;/h3&gt;

&lt;p&gt;A lightweight lock specialized for very short-term locks. It continues to use CPU while waiting (spinning), so it's unsuitable for long-term locks.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For very short critical sections&lt;/li&gt;
&lt;li&gt;Only effective with preview_mt / win32&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;SpinLock is used in implementations such as &lt;code&gt;Crystal::Scheduler&lt;/code&gt;, &lt;code&gt;Crystal::ThreadLocalValue&lt;/code&gt;, &lt;code&gt;Crystal::Once&lt;/code&gt;, &lt;code&gt;Mutex&lt;/code&gt;, &lt;code&gt;WaitGroup&lt;/code&gt;, &lt;code&gt;EventLoop::Polling&lt;/code&gt;, and &lt;code&gt;Fiber::StackPool&lt;/code&gt;. There are almost no scenarios where users would directly use SpinLock in code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Areas to Be Careful About in the Standard Library
&lt;/h2&gt;

&lt;p&gt;The following are areas in the Crystal standard library that may not guarantee complete thread safety and require caution.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Qualifies as a Shared Variable Subject to Contention?
&lt;/h3&gt;

&lt;p&gt;While we've used the term "shared variable," Crystal doesn't have user-accessible global variables, so the most typical shared variable is a class variable.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Class variables: Always shared variables (determined by variable type)&lt;/li&gt;
&lt;li&gt;Instance variables and local variables: Determined by &lt;strong&gt;whether they are referenced from multiple Fibers or threads when spawned&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If captured by spawn, local variables can also become shared variables.&lt;/p&gt;

&lt;h3&gt;
  
  
  ENV
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The safety of Unix's getenv/setenv/unsetenv is environment-dependent&lt;/li&gt;
&lt;li&gt;Parallel modification is not recommended&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also discussed in the Crystal Forum:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forum.crystal-lang.org/t/eliminate-environment-modifications/8533/29" rel="noopener noreferrer"&gt;https://forum.crystal-lang.org/t/eliminate-environment-modifications/8533/29&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Class Variables
&lt;/h3&gt;

&lt;p&gt;In Crystal, you can use the &lt;code&gt;@[ThreadLocal]&lt;/code&gt; annotation to make class variables thread-local.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Foo&lt;/span&gt;
  &lt;span class="nd"&gt;@[ThreadLocal]&lt;/span&gt;
  &lt;span class="vc"&gt;@@var&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;123&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;var&lt;/span&gt;
    &lt;span class="vc"&gt;@@var&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, each thread has an independent copy of &lt;code&gt;@@var&lt;/code&gt;, so changing the value in one thread doesn't affect other threads.&lt;/p&gt;

&lt;p&gt;Class variables without &lt;code&gt;@[ThreadLocal]&lt;/code&gt; are shared. In this case, you need to use &lt;code&gt;Atomic&lt;/code&gt; / &lt;code&gt;Mutex&lt;/code&gt; for parallel updates.&lt;/p&gt;
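
&lt;p&gt;A minimal sketch of the shared case, using &lt;code&gt;Atomic&lt;/code&gt; for a counter (the &lt;code&gt;Stats&lt;/code&gt; class and its method names are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;class Stats
  # Shared by every thread, so updates go through Atomic.
  @@hits = Atomic(Int32).new(0)

  def self.hit : Nil
    @@hits.add(1)
  end

  def self.hits : Int32
    @@hits.get
  end
end

done = Channel(Nil).new

4.times do
  spawn do
    1000.times { Stats.hit }
    done.send(nil)
  end
end

4.times { done.receive }
puts Stats.hits # =&amp;gt; 4000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For state that cannot be expressed as a single atomic value, a &lt;code&gt;Mutex&lt;/code&gt; around the update is the more general choice.&lt;/p&gt;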

&lt;h3&gt;
  
  
  IO (File, Socket, STDOUT/ERR)
&lt;/h3&gt;

&lt;p&gt;Safety may not be guaranteed when simultaneously operating on the same IO from multiple threads.&lt;/p&gt;
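
&lt;p&gt;If several fibers have to write to the same IO, one cautious option is to serialize the writes yourself. The following is only a sketch of that idea, not a statement about what the standard library guarantees:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;lock = Mutex.new
done = Channel(Nil).new

3.times do |i|
  spawn do
    # Only one fiber at a time writes to the shared IO.
    lock.synchronize { STDOUT.puts "message from fiber #{i}" }
    done.send(nil)
  end
end

3.times { done.receive }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;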

&lt;h3&gt;
  
  
  Logger
&lt;/h3&gt;

&lt;p&gt;Logger also uses IO internally. Writing to the same Logger from multiple threads may not be safe.&lt;/p&gt;

&lt;h3&gt;
  
  
  Report Any Issues You Find
&lt;/h3&gt;

&lt;p&gt;Crystal has far fewer users than languages like Python and Java, so user reports are especially valuable. Please help improve the language and its libraries by actively reporting bugs on the Crystal Forum and in GitHub issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cases Where Thread Should Be Used
&lt;/h2&gt;

&lt;p&gt;Thread directly represents the OS's native thread. It can be used when low-level control is needed.&lt;/p&gt;

&lt;p&gt;There are almost no cases where you should use Thread directly without using ExecutionContext.&lt;br&gt;
It may be an option in cases such as:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You want to parallelize compute-intensive tasks&lt;/li&gt;
&lt;li&gt;An FFI call blocks and cannot suspend the Fiber (however, if the FFI function is CPU-intensive, blocking is arguably the desired behavior)&lt;/li&gt;
&lt;li&gt;A C library requires thread-local initialization&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Using Thread::Channel enables safe communication between threads.&lt;/p&gt;

&lt;h2&gt;
  
  
  FFI (C Library Calls) and Parallel Execution
&lt;/h2&gt;

&lt;p&gt;Since C libraries are not necessarily thread-safe, following patterns like these is considered safe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrap with &lt;code&gt;Mutex&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Isolate in &lt;code&gt;ExecutionContext::Isolated&lt;/code&gt; context&lt;/li&gt;
&lt;li&gt;Dedicated Thread + Thread::Channel&lt;/li&gt;
&lt;li&gt;Use ThreadLocal state&lt;/li&gt;
&lt;/ul&gt;
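
&lt;p&gt;The first pattern, wrapping calls with &lt;code&gt;Mutex&lt;/code&gt;, can be sketched as follows. The example binds libc's &lt;code&gt;rand&lt;/code&gt; and &lt;code&gt;srand&lt;/code&gt;, which POSIX does not require to be thread-safe. The names &lt;code&gt;LibCRand&lt;/code&gt; and &lt;code&gt;SafeRand&lt;/code&gt; are hypothetical and exist only for this illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Hypothetical binding name; rand and srand themselves come from libc.
lib LibCRand
  fun srand(seed : LibC::UInt)
  fun rand : LibC::Int
end

# Hypothetical wrapper module that owns the lock.
module SafeRand
  @@lock = Mutex.new

  # Every call into the C function goes through the same lock.
  def self.next_int : Int32
    @@lock.synchronize { LibCRand.rand }
  end
end

LibCRand.srand(42_u32)
puts SafeRand.next_int
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The lock only helps if every caller goes through the wrapper, so it is worth keeping the raw binding out of application code.&lt;/p&gt;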

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Crystal's parallel execution is currently in the midst of major evolution. In addition to &lt;code&gt;Fiber&lt;/code&gt;, which has been used for concurrent execution in I/O-bound processing, &lt;code&gt;ExecutionContext::Parallel&lt;/code&gt; now enables full-fledged parallel processing. Using &lt;code&gt;Atomic&lt;/code&gt; / &lt;code&gt;Mutex&lt;/code&gt; / &lt;code&gt;Channel&lt;/code&gt; / &lt;code&gt;WaitGroup&lt;/code&gt;, you can build safe parallel processing similar to Go. &lt;code&gt;ExecutionContext::Isolated&lt;/code&gt; is effective for GUI / FFI. &lt;code&gt;Thread&lt;/code&gt; can be used in special cases where OS threads need to be handled directly. Note that thread safety in parts of the standard library is still ambiguous.&lt;/p&gt;

&lt;h3&gt;
  
  
  Practical Guidelines for Parallel Execution in Crystal
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Leave I/O to &lt;code&gt;Fiber&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;No special action needed as Crystal's I/O model is tightly integrated with &lt;code&gt;Fiber&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Use Parallel or Thread for CPU-bound tasks

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ExecutionContext::Parallel&lt;/code&gt; is the first choice.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Protect shared state with &lt;code&gt;Atomic&lt;/code&gt; or &lt;code&gt;Mutex&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;Treat gray zones like &lt;code&gt;ENV&lt;/code&gt; and &lt;code&gt;Logger&lt;/code&gt; conservatively&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Test explicitly using &lt;code&gt;-Dpreview_mt&lt;/code&gt; and &lt;code&gt;-Dexecution_context&lt;/code&gt;
&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;This concludes the article. Thank you for reading to the end.&lt;/p&gt;

</description>
      <category>crystal</category>
    </item>
    <item>
      <title>Notes on Building CLI and GUI tools with Crystal</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Wed, 15 Oct 2025 03:25:14 +0000</pubDate>
      <link>https://dev.to/kojix2/notes-on-building-cli-and-gui-tools-with-crystal-4pcd</link>
      <guid>https://dev.to/kojix2/notes-on-building-cli-and-gui-tools-with-crystal-4pcd</guid>
      <description>&lt;p&gt;This post is just me writing down some vague thoughts that are floating around in my head right now.&lt;/p&gt;

&lt;p&gt;Sorry if you came here expecting a well-structured tutorial — but you know, if I try to organize everything perfectly, I’ll never publish anything.&lt;/p&gt;




&lt;p&gt;Crystal originated from the Ruby community, so there are many people who want to build web applications with it.&lt;/p&gt;

&lt;p&gt;However, the Crystal programming language itself can be described as “a statically compiled language with a Ruby-like syntax and a garbage collector, somewhat like C with GC and type inference.”&lt;/p&gt;

&lt;p&gt;It’s not necessarily optimized for web applications.&lt;/p&gt;

&lt;p&gt;Personally, I wanted to use Crystal for command-line tools and GUI apps.&lt;/p&gt;

&lt;p&gt;For some reason, though, there don’t seem to be many people building CLI tools in Crystal.&lt;/p&gt;

&lt;p&gt;The ecosystem for building and distributing binaries wasn’t very well developed for a long time.&lt;/p&gt;

&lt;p&gt;That used to be a real pain, but after &lt;a href="https://dev.to/kojix2/12-things-i-learned-writing-cli-tools-in-crystal-12if"&gt;gradually solving those issues&lt;/a&gt;, I think we’re now at the point where most CLI tools I want can be built and distributed in Crystal without much trouble.&lt;/p&gt;

&lt;p&gt;On the GUI side, the situation is similar — there aren’t many libraries available.&lt;/p&gt;

&lt;p&gt;But this isn’t unique to Crystal. GUI programming, in general, depends heavily on opaque, platform-specific APIs, which don’t always play nicely with open-source development.&lt;/p&gt;

&lt;p&gt;Still, I decided to work on it. I created &lt;a href="https://github.com/libui-ng/libui-ng" rel="noopener noreferrer"&gt;libui-ng&lt;/a&gt; bindings for both &lt;a href="https://github.com/kojix2/LibUI" rel="noopener noreferrer"&gt;Ruby&lt;/a&gt; and &lt;a href="https://github.com/kojix2/uing" rel="noopener noreferrer"&gt;Crystal&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As it turned out, &lt;a href="https://dev.to/kojix2/libui-and-garbage-collection-challenges-in-creating-ruby-and-crystal-bindings-9m6"&gt;libui-ng doesn’t work very well with garbage-collected languages&lt;/a&gt;, but I managed to make it usable anyway.&lt;/p&gt;

&lt;p&gt;Then I got curious about &lt;a href="https://tauri.app/" rel="noopener noreferrer"&gt;Tauri&lt;/a&gt; and &lt;a href="https://www.electronjs.org" rel="noopener noreferrer"&gt;Electron&lt;/a&gt; — the now-famous WebView-based app frameworks.&lt;/p&gt;

&lt;p&gt;Personally, I can barely read JavaScript, so I had no real interest in those at first, but their popularity made me curious.&lt;/p&gt;

&lt;p&gt;Crystal also has a &lt;a href="https://github.com/naqvis/webview" rel="noopener noreferrer"&gt;WebView binding&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And as I mentioned earlier, web app development in the Crystal ecosystem is quite active.&lt;/p&gt;

&lt;p&gt;So I decided to give it a try.&lt;/p&gt;

&lt;p&gt;I learned that “WebView” isn’t a single library — each OS (Windows, Linux, macOS) provides its own.&lt;/p&gt;

&lt;p&gt;Projects like &lt;a href="https://github.com/webview/webview" rel="noopener noreferrer"&gt;webview/webview&lt;/a&gt; and Tauri’s &lt;a href="https://github.com/tauri-apps/wry" rel="noopener noreferrer"&gt;wry&lt;/a&gt; act as unifying layers over these platform-specific APIs.&lt;/p&gt;

&lt;p&gt;Tauri itself uses WebView under the hood while also providing a framework to handle security and integration with Rust backends.&lt;/p&gt;

&lt;p&gt;Maybe it’s possible to use TypeScript and other frontend tools with Crystal too, but personally, I prefer the more old-fashioned approach — something like &lt;a href="https://github.com/kemalcr/kemal" rel="noopener noreferrer"&gt;Kemal&lt;/a&gt; + &lt;a href="https://crystal-lang.org/api/ECR.html" rel="noopener noreferrer"&gt;ECR&lt;/a&gt;, the “classic amateur” way.&lt;/p&gt;

&lt;p&gt;When I actually started building an app with Crystal + WebView, I discovered a few things.&lt;/p&gt;

&lt;p&gt;First, you need to pay attention to event loops and thread management.&lt;/p&gt;

&lt;p&gt;The WebView itself runs in a separate process, and at the same time you need to run a Kemal server.&lt;/p&gt;

&lt;p&gt;That means you often have to make it multithreaded and carefully manage your execution contexts or Fibers — otherwise, things simply won’t run correctly.&lt;/p&gt;

&lt;p&gt;Then there’s the build, linking, and packaging pain.&lt;/p&gt;

&lt;p&gt;I sent &lt;a href="https://github.com/naqvis/webview/pulls?q=+author%3Akojix2" rel="noopener noreferrer"&gt;a few pull requests&lt;/a&gt; to the Crystal WebView project, which helped a bit, but building on MinGW is still rough.&lt;/p&gt;

&lt;p&gt;MSVC technically works, but it’s just too tedious to deal with, so I decided to stay away from it.&lt;/p&gt;

&lt;p&gt;Bundling shared libraries is also tricky.&lt;/p&gt;

&lt;p&gt;I’d prefer to lean toward static linking whenever possible, but depending on licensing and security update concerns, it’s sometimes better to link against system or bundled shared libraries.&lt;/p&gt;

&lt;p&gt;Even if you get the build and linking sorted out, packaging is still painful — creating &lt;a href="https://en.wikipedia.org/wiki/Application_package" rel="noopener noreferrer"&gt;application packages&lt;/a&gt;, &lt;a href="https://en.wikipedia.org/wiki/Apple_Disk_Image" rel="noopener noreferrer"&gt;Apple disk images (DMG)&lt;/a&gt;, or Windows installers with &lt;a href="https://jrsoftware.org/isinfo.php" rel="noopener noreferrer"&gt;Inno Setup&lt;/a&gt;, or even &lt;code&gt;.deb&lt;/code&gt; packages.&lt;/p&gt;

&lt;p&gt;I discovered tools like &lt;a href="https://github.com/jordansissel/fpm" rel="noopener noreferrer"&gt;fpm&lt;/a&gt;, which are really useful, but in the end, I still end up asking AI to help me write custom GitHub Actions YAML and shell scripts.&lt;/p&gt;

&lt;p&gt;And then, once you finally have a working binary, Windows or macOS antivirus software will happily flag it as suspicious.&lt;/p&gt;

&lt;p&gt;Maybe for people doing this professionally, all this doesn’t sound like a big deal, but as someone doing it for fun, it’s a lot of work.&lt;br&gt;
Even so, after all the pain, I’ve started to feel like — maybe, just maybe — this setup is actually pretty cool.&lt;/p&gt;




&lt;p&gt;This post was translated from the original Japanese version using ChatGPT.&lt;br&gt;&lt;br&gt;
You can read the original post &lt;a href="https://qiita.com/kojix2/items/e9bb62e9ff9f966b36a4" rel="noopener noreferrer"&gt;here&lt;/a&gt; [JA]&lt;/p&gt;

</description>
      <category>crystal</category>
      <category>webview</category>
    </item>
    <item>
      <title>libui and Garbage Collection - Challenges in Creating Ruby and Crystal Bindings</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Fri, 26 Sep 2025 02:15:46 +0000</pubDate>
      <link>https://dev.to/kojix2/libui-and-garbage-collection-challenges-in-creating-ruby-and-crystal-bindings-9m6</link>
      <guid>https://dev.to/kojix2/libui-and-garbage-collection-challenges-in-creating-ruby-and-crystal-bindings-9m6</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;libui is a GUI library that supports the three major operating systems: Windows, macOS, and Linux (currently, the successor project is libui-ng). Internally, it contains three different libraries that call native APIs, unified under a single &lt;code&gt;ui.h&lt;/code&gt; header file to provide similar UI functionality across all operating systems. It can also be easily used from other languages through FFI (Foreign Function Interface). While development has slowed somewhat recently, there are few similar libraries available, and libui continues to maintain its unique value.&lt;/p&gt;

&lt;h2&gt;
  
  
  libui Bindings
&lt;/h2&gt;

&lt;p&gt;I have been creating &lt;a href="https://github.com/kojix2/libui" rel="noopener noreferrer"&gt;Ruby bindings&lt;/a&gt; and &lt;a href="https://github.com/kojix2/uing" rel="noopener noreferrer"&gt;Crystal bindings&lt;/a&gt; for libui. Through this process, I have come to realize how difficult it is to combine libui with garbage collection.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem of Disappearing Controls and Callback Functions
&lt;/h2&gt;

&lt;p&gt;Creating Ruby or Crystal bindings for libui is not particularly difficult. The work of checking function signatures and writing matching low-level bindings can be done mechanically.&lt;/p&gt;

&lt;p&gt;However, when you call these low-level APIs to create simple applications, the following problems occur with a certain probability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Controls disappear and memory access violations occur&lt;/li&gt;
&lt;li&gt;Callbacks disappear and memory access violations occur&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both Ruby and Crystal are languages that use garbage collection (GC), so memory that is determined to be unused gets reclaimed. As a result, pointers and callback functions that should be used in the future by the GUI main loop are mistakenly freed by the GC.&lt;/p&gt;

&lt;p&gt;In GC languages, the timing of memory deallocation is controlled indirectly through references.&lt;/p&gt;

&lt;p&gt;In the Ruby binding, callback functions are unconditionally stored in a dedicated array. This is effectively a memory leak (old callbacks remain in the array even after new ones are registered), but since a GUI application usually registers only a finite number of callbacks, it is not a problem in practice.&lt;/p&gt;

&lt;p&gt;The Crystal binding uses a more elaborate approach. Each callback function is tied to the instance of its related control. For example, a callback function that fires when a button is pressed is owned by that button. Additionally, the nesting of the controls themselves is reproduced as an ownership tree. For example, a Window contains a Box, and the Box holds a Label and a Button.&lt;/p&gt;

&lt;p&gt;By using this ownership tree, we can significantly reduce the problem of incorrect collection by the GC.&lt;/p&gt;
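
&lt;p&gt;The idea of a control owning its callback can be sketched without libui at all. In the snippet below, &lt;code&gt;FakeButton&lt;/code&gt; and &lt;code&gt;simulate_click&lt;/code&gt; are hypothetical stand-ins: a real binding would pass the boxed pointer to the C library as user data, and the C side would hand it back to a trampoline function. Only &lt;code&gt;Box.box&lt;/code&gt; and &lt;code&gt;Box(T).unbox&lt;/code&gt; are actual standard library calls.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Hypothetical stand-in for a wrapped control.
class FakeButton
  # Storing the boxed closure in an instance variable keeps it reachable
  # for the GC for as long as the button object itself is alive.
  @on_clicked_box : Pointer(Void)?

  def on_clicked(&amp;amp;block : -&amp;gt; Nil) : Nil
    @on_clicked_box = Box.box(block)
  end

  # Simulates what a C trampoline would do with the user-data pointer.
  def simulate_click : Nil
    if boxed = @on_clicked_box
      Box(Proc(Nil)).unbox(boxed).call
    end
  end
end

button = FakeButton.new
button.on_clicked { puts "clicked" }
button.simulate_click # prints "clicked"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;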

&lt;p&gt;By the way, why does Crystal's GC collect pointers even though controls may be referenced later in the main loop? I don't have a clear understanding of this point, but it's possible that memory tracking becomes difficult when closures are boxed.&lt;/p&gt;

&lt;h2&gt;
  
  
  libui's Memory Management Rules
&lt;/h2&gt;

&lt;p&gt;libui is a C library designed for users to manage memory themselves. However, in practice, it introduces a mechanism where "when a parent control is freed, the memory of child controls is also freed." The controls that can be parent controls are Window, Box, Grid, Group, Tab, and Form.&lt;/p&gt;

&lt;p&gt;When you &lt;code&gt;destroy&lt;/code&gt; these, child controls are freed first, then the parent itself is freed. Therefore, in actual operation, you often free child controls collectively by destroying the Window.&lt;/p&gt;

&lt;p&gt;The problem is that on the Crystal side, we cannot detect such deallocation within native libraries. NULL checks might help us guess immediately after memory deallocation (libui sets pointers to NULL before deallocation), but this is unreliable.&lt;/p&gt;

&lt;p&gt;Window deallocation can happen automatically. When the [x] button in the Window's title bar is clicked, a callback function is triggered by &lt;code&gt;uiWindowOnClosing&lt;/code&gt;, and if the return value is true, the Window's destroy is automatically triggered.&lt;/p&gt;

&lt;p&gt;In contrast, &lt;code&gt;uiOnShouldQuit&lt;/code&gt; triggered from the Quit option in the menu bar represents application termination, so it does not automatically trigger destroy for the window. The user must destroy the Window themselves and call uiQuit.&lt;/p&gt;

&lt;h2&gt;
  
  
  libui's Memory Leak Detection Mechanism
&lt;/h2&gt;

&lt;p&gt;libui has a built-in mechanism for detecting memory leaks. This is a very useful feature, but it often doesn't work well with GC languages. This is because in GC, the timing of memory deallocation is indefinite, and we cannot guarantee that all memory has been freed at the time of checking. Therefore, implementations that hook into GC's &lt;code&gt;finalize&lt;/code&gt; to perform deallocation should be avoided.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table Deallocation Procedure
&lt;/h2&gt;

&lt;p&gt;Table is based on Model-View architecture, with TableModel and Table separated. A TableModel can only be freed after all Tables using that model have been destroyed. Therefore, the deallocation procedure is as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Remove the Table from its parent control&lt;/li&gt;
&lt;li&gt;Explicitly destroy the Table&lt;/li&gt;
&lt;li&gt;Finally destroy the TableModel&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Area Deallocation Procedure
&lt;/h2&gt;

&lt;p&gt;Unlike Table, Area can be handled by simply destroying the control.&lt;/p&gt;

&lt;h2&gt;
  
  
  MultilineEntry Deallocation Procedure
&lt;/h2&gt;

&lt;p&gt;While detailed investigation of the cause is still in progress, on macOS there appear to be cases where problems occur unless you remove it from the parent control and destroy it individually, similar to Table.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;When using libui (libui-ng), there are many important considerations regarding memory management, especially deallocation.&lt;/p&gt;

&lt;p&gt;In languages that use garbage collection like Crystal and Ruby, you normally don't need to worry about memory. Even with C language bindings, manual memory management often becomes unnecessary by using deallocation callback functions like &lt;code&gt;finalize&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;However, I learned that with libraries like GUI libraries that have interactive operations where timing and synchronization are important, there are cases where you cannot rely too much on GC and must manually free memory at appropriate times.&lt;/p&gt;

&lt;p&gt;In such cases, Ruby and Crystal often provide APIs that use blocks based on RAII (Resource Acquisition Is Initialization) concepts. This can handle more than half of the cases.&lt;/p&gt;
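
&lt;p&gt;A block-based API of this kind might look like the following sketch. &lt;code&gt;NativeHandle&lt;/code&gt; is a hypothetical wrapper, not part of any binding mentioned here; a real one would free native memory in &lt;code&gt;destroy&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Hypothetical wrapper around a native resource.
class NativeHandle
  getter? destroyed = false

  # Yields the handle and always destroys it when the block ends, even on error.
  def self.open(&amp;amp;)
    handle = new
    begin
      yield handle
    ensure
      handle.destroy
    end
  end

  def destroy : Nil
    @destroyed = true
  end
end

NativeHandle.open do |handle|
  puts handle.destroyed? # =&amp;gt; false
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The release happens at a known point (the end of the block) instead of whenever the GC decides to run.&lt;/p&gt;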

&lt;p&gt;There seem to be cases that are difficult to handle with this alone, but I am still learning and experimenting through trial and error.&lt;/p&gt;




&lt;p&gt;Thank you for reading. This article was translated from Japanese to English by Claude Sonnet 4.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://qiita.com/kojix2/items/de37dfa5f00926499c37" rel="noopener noreferrer"&gt;libui と ガベージコレクション - Ruby と Crystal のバインディングを作って感じた難しさ&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>libui</category>
      <category>crystal</category>
      <category>ruby</category>
    </item>
    <item>
      <title>12 Things I Learned Writing CLI Tools in Crystal</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Mon, 22 Sep 2025 05:45:49 +0000</pubDate>
      <link>https://dev.to/kojix2/12-things-i-learned-writing-cli-tools-in-crystal-12if</link>
      <guid>https://dev.to/kojix2/12-things-i-learned-writing-cli-tools-in-crystal-12if</guid>
      <description>&lt;p&gt;I love the Crystal programming language. For the past two or three years, I have been building command-line tools with it. During this time, I often compared it with Ruby, and I encountered many differences, discoveries, and obstacles. In this article, I will share them.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Similarity to Ruby
&lt;/h2&gt;

&lt;p&gt;Crystal looks very similar to Ruby. Many common Ruby idioms also work in Crystal. Crystal is statically typed, but most of the time you do not need to write types explicitly. Type inference will do the work for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Use DeepWiki
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://deepwiki.com/crystal-lang/crystal" rel="noopener noreferrer"&gt;DeepWiki&lt;/a&gt; is very useful for learning Crystal. For a niche language, it is one of the best resources. You can even ask questions in your native language.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Arrays and Hashes cannot mix types
&lt;/h2&gt;

&lt;p&gt;In Crystal, you cannot freely mix different types in an &lt;code&gt;Array&lt;/code&gt; or &lt;code&gt;Hash&lt;/code&gt;. Ruby allows this, but Crystal does not. You can use union types, but usually it is better to avoid them. Instead, consider one of these options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Make a class or &lt;a href="https://crystal-lang.org/reference/syntax_and_semantics/structs.html" rel="noopener noreferrer"&gt;struct&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use a &lt;a href="https://crystal-lang.org/reference/syntax_and_semantics/structs.html#records" rel="noopener noreferrer"&gt;record&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use a &lt;a href="https://crystal-lang.org/reference/syntax_and_semantics/literals/tuple.html" rel="noopener noreferrer"&gt;Tuple&lt;/a&gt; for temporary data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At first this may feel inconvenient, but you get used to it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Array(Int32 | String | Symbol) - not recommended&lt;/span&gt;
&lt;span class="n"&gt;arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"two"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:three&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# OK: Tuple for fixed positions&lt;/span&gt;
&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"two"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:three&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# OK: record for structured data&lt;/span&gt;
&lt;span class="kp"&gt;record&lt;/span&gt; &lt;span class="no"&gt;Item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;name&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Symbol&lt;/span&gt;
&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="no"&gt;Item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"apple"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="ss"&gt;:fruit&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="no"&gt;Item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"orange"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:fruit&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. No &lt;code&gt;eval&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Crystal does not have &lt;code&gt;eval&lt;/code&gt;. This is one big difference from Ruby.&lt;br&gt;
If you really need dynamic evaluation, you should use Ruby. Another choice is to embed mruby or use a library like &lt;a href="https://github.com/Anyolite/anyolite" rel="noopener noreferrer"&gt;Anyolite&lt;/a&gt;. Crystal itself has an interpreter, but it is not practical and is slower than Ruby or mruby.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Ruby&lt;/span&gt;
&lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"1 + 2"&lt;/span&gt;
&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="nb"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# =&amp;gt; 3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Crystal has no eval&lt;/span&gt;
&lt;span class="c1"&gt;# You must design differently&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
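
&lt;p&gt;One way to design differently, assuming the set of operations is known in advance, is a small dispatch table of procs. The names &lt;code&gt;OPS&lt;/code&gt; and &lt;code&gt;calc&lt;/code&gt; are hypothetical, and the sketch only handles input of the form "number operator number".&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Map each supported operator to a proc instead of evaluating source code.
OPS = {
  "+" =&amp;gt; -&amp;gt;(a : Int32, b : Int32) { a + b },
  "-" =&amp;gt; -&amp;gt;(a : Int32, b : Int32) { a - b },
}

def calc(expr : String) : Int32
  left, op, right = expr.split
  OPS[op].call(left.to_i, right.to_i)
end

puts calc("1 + 2") # =&amp;gt; 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;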



&lt;h2&gt;
  
  
  5. Method overloading
&lt;/h2&gt;

&lt;p&gt;In Ruby, it is common to branch on the argument type inside one method.&lt;br&gt;
In Crystal, it is more natural to use method overloading. This makes the code clearer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;
  &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;
  &lt;span class="n"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;# =&amp;gt; 144&lt;/span&gt;
&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"12"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# =&amp;gt; 144&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  6. Return types should be consistent
&lt;/h2&gt;

&lt;p&gt;In Ruby, a method can return values of different types. In Crystal, if the return type is not clear, you will run into trouble. If you want to return different types, you should split the method. You can use a union type, but it is not recommended.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="c1"&gt;# not recommended&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;maybe_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flag&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Bool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="no"&gt;String&lt;/span&gt;
  &lt;span class="n"&gt;flag&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"forty-two"&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;value_int&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;
  &lt;span class="mi"&gt;42&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;value_str&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;String&lt;/span&gt;
  &lt;span class="s2"&gt;"forty-two"&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  7. Handling Nil
&lt;/h2&gt;

&lt;p&gt;Pay attention to whether a variable can be &lt;code&gt;Nil&lt;/code&gt;.&lt;br&gt;
If it can, you need to handle it with &lt;code&gt;not_nil!&lt;/code&gt;, &lt;code&gt;if val = maybe_val&lt;/code&gt;, or &lt;code&gt;try&lt;/code&gt; (Crystal's counterpart to Ruby's safe navigation operator).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="nb"&gt;name&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kp"&gt;nil&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;name&lt;/span&gt;
  &lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upcase&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;
  &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="s2"&gt;"name is nil"&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
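
&lt;p&gt;When a fallback value is enough, &lt;code&gt;Object#try&lt;/code&gt; gives a shorter form that plays the role of safe navigation. The &lt;code&gt;shout&lt;/code&gt; method below is a hypothetical example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;def shout(name : String?) : String
  # try runs the block only when name is not nil; otherwise it returns nil.
  name.try(&amp;amp;.upcase) || "ANONYMOUS"
end

puts shout("crystal") # =&amp;gt; CRYSTAL
puts shout(nil)       # =&amp;gt; ANONYMOUS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;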



&lt;h2&gt;
  
  
  8. Garbage collection
&lt;/h2&gt;

&lt;p&gt;Crystal uses LLVM and relies on an external GC (&lt;code&gt;libgc&lt;/code&gt;).&lt;br&gt;
Performance is often close to Rust or Nim, but memory profiling and tuning can be difficult. Also, the timing of GC is not predictable, so Crystal may not be suitable for real-time systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Asynchronous I/O
&lt;/h2&gt;

&lt;p&gt;Asynchronous I/O is available by default. Some developers feel it is easier to use than in Rust. &lt;/p&gt;
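
&lt;p&gt;As a minimal sketch of what this looks like in practice, the example below runs three fibers concurrently and collects their results through a channel. Here &lt;code&gt;sleep&lt;/code&gt; stands in for real I/O such as a network request; the order in which the messages arrive is not guaranteed.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;channel = Channel(String).new

3.times do |i|
  spawn do
    # Stand-in for I/O: the fiber yields to the scheduler while waiting.
    sleep 10.milliseconds
    channel.send "task #{i} done"
  end
end

# receive suspends the main fiber until a message is available.
3.times { puts channel.receive }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;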

&lt;h2&gt;
  
  
  10. Linking when distributing
&lt;/h2&gt;

&lt;p&gt;Crystal programs are usually linked with &lt;code&gt;libgc&lt;/code&gt; and other libraries such as &lt;code&gt;libpcre2&lt;/code&gt;. Be careful when distributing binaries.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linux: You can build statically linked binaries with GitHub Actions + Docker + musl&lt;/li&gt;
&lt;li&gt;macOS: You can prepare a Homebrew Tap, or build portable binaries with static linking for &lt;code&gt;libgc&lt;/code&gt;, &lt;code&gt;libpcre2&lt;/code&gt;, and others&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;See also: the GitHub Actions &lt;a href="https://github.com/kojix2/lolcat.cr/blob/main/.github/workflows/build.yml" rel="noopener noreferrer"&gt;workflow&lt;/a&gt; in &lt;a href="https://github.com/kojix2/lolcat.cr" rel="noopener noreferrer"&gt;lolcat.cr&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  11. Windows support
&lt;/h2&gt;

&lt;p&gt;Crystal now works on &lt;a href="https://crystal-lang.org/install/on_windows/" rel="noopener noreferrer"&gt;Windows (MSVC / MinGW64)&lt;/a&gt; more stably than before. Parallel execution also works. However, solving C library dependencies can still be painful. If you are not familiar with Windows, you may need to ask AI for help.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;del&gt;12. Limitations of OptionParser&lt;/del&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;del&gt;The standard &lt;code&gt;OptionParser&lt;/code&gt; does not support combined short options.&lt;br&gt;
So &lt;code&gt;ls -l -h&lt;/code&gt; works, but &lt;code&gt;ls -lh&lt;/code&gt; does not.&lt;br&gt;
I plan to create a pull request to fix this in the future.&lt;/del&gt;&lt;/p&gt;

&lt;p&gt;Update: This has already been resolved as of February 2026. Starting from version 1.20, short option bundling will be enabled!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/crystal-lang/crystal/pull/16563" rel="noopener noreferrer"&gt;https://github.com/crystal-lang/crystal/pull/16563&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Writing command-line tools in Crystal is sometimes painful. But at the same time, you learn a lot. I believe the “best days” of the Crystal language are not in the past or present, but in the future.&lt;/p&gt;




&lt;p&gt;This post was originally based on my reply to a thread on Reddit, then expanded into a Japanese article on Qiita, and now translated into English with the help of ChatGPT.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.reddit.com/r/crystal_programming/comments/1nhwzy3/comment/nehjvnu/?utm_source=share&amp;amp;utm_medium=web3x&amp;amp;utm_name=web3xcss&amp;amp;utm_term=1&amp;amp;utm_content=share_button" rel="noopener noreferrer"&gt;Considering rewriting my CLI tool from Ruby to Crystal - what should I watch out for?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://qiita.com/kojix2/items/c305d46aafd2b51a153c" rel="noopener noreferrer"&gt;Crystalでコマンドラインツールを作って気づいた12のこと&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>crystal</category>
    </item>
    <item>
      <title>Embedding the Crystal Compiler in Your Program</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Sat, 09 Aug 2025 09:46:26 +0000</pubDate>
      <link>https://dev.to/kojix2/embedding-the-crystal-compiler-in-your-program-2ief</link>
      <guid>https://dev.to/kojix2/embedding-the-crystal-compiler-in-your-program-2ief</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://forum.crystal-lang.org/t/how-can-i-use-crystal-compiler-as-a-library/6162/8" rel="noopener noreferrer"&gt;The Crystal compiler can be used as a library.&lt;/a&gt;&lt;br&gt;
This document explains how to set it up and use it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Creating the Project
&lt;/h2&gt;

&lt;p&gt;First, create a new Crystal project.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crystal init app duck_egg
&lt;span class="nb"&gt;cd &lt;/span&gt;duck_egg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Editing &lt;code&gt;shard.yml&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Edit the &lt;code&gt;shard.yml&lt;/code&gt; file as follows.&lt;br&gt;
In this example, we add &lt;code&gt;markd&lt;/code&gt; and &lt;code&gt;reply&lt;/code&gt; to the &lt;code&gt;dependencies&lt;/code&gt; section.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;duck_egg&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.1.0&lt;/span&gt;

&lt;span class="na"&gt;targets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;🥚&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;src/duck_egg.cr&lt;/span&gt;

&lt;span class="na"&gt;dependencies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;markd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;github&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;icyleaf/markd&lt;/span&gt;
  &lt;span class="na"&gt;reply&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;github&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;I3oris/reply&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Creating &lt;code&gt;duck_egg.cr&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Create &lt;code&gt;src/duck_egg.cr&lt;/code&gt; and add the following code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"compiler/requires"&lt;/span&gt;

&lt;span class="no"&gt;BIRDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🐔"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"cluck!"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🐓"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"cock-a-doodle-doo"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🦃"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"gobble"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🦆"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"quack"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🦉"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"hoot"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🦜"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"squawk"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🕊"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"coo"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🦢"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"honk"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🦩"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"brrrrt"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🐧"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"honk honk"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🦤"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"boop"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🦕"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"Bwooooon!!"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🦖"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"Raaaaawr!!"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;bird&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sound&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;BIRDS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sample&lt;/span&gt;

&lt;span class="n"&gt;compiler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Crystal&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Compiler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;
&lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Crystal&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Compiler&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bird&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sx"&gt;%Q(puts "&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;bird&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sx"&gt;  &amp;lt; &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;sound&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sx"&gt;")&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;compiler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bird&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this program, the Crystal compiler is embedded in the target 🥚.&lt;br&gt;
When 🥚 is executed, a random bird is selected.&lt;br&gt;
The embedded compiler generates a binary that displays the bird and its sound.&lt;/p&gt;
&lt;h2&gt;
  
  
  Building and Running
&lt;/h2&gt;

&lt;p&gt;First, build the program.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;shards build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, check the &lt;code&gt;CRYSTAL_PATH&lt;/code&gt; environment variable to find the location of the Crystal standard library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crystal &lt;span class="nb"&gt;env&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Crystal compiler requires the standard library even for very simple code such as &lt;code&gt;puts 0&lt;/code&gt;.&lt;br&gt;
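Therefore, &lt;code&gt;CRYSTAL_PATH&lt;/code&gt; must be set to include the path to the standard library. The path below is an example; use the value that &lt;code&gt;crystal env&lt;/code&gt; reports on your system.&lt;br&gt;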
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CRYSTAL_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;lib:/usr/local/bin/../share/crystal/src
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the program:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bin/🥚
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🦖
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the generated binary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./🦖
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🦖  &amp;lt; Raaaaawr!!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;By using the Crystal compiler as a library, you can generate and compile code dynamically. This technique can be applied in many interesting ways.&lt;/p&gt;

</description>
      <category>crystal</category>
    </item>
    <item>
      <title>Easily Visualize Debian Package Dependencies with debtree</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Fri, 08 Aug 2025 05:38:33 +0000</pubDate>
      <link>https://dev.to/kojix2/easily-visualize-debian-package-dependencies-with-debtree-2g1n</link>
      <guid>https://dev.to/kojix2/easily-visualize-debian-package-dependencies-with-debtree-2g1n</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Sometimes you might want a quick and easy way to visualize and understand which packages a given package depends on. With the &lt;code&gt;debtree&lt;/code&gt; package and &lt;code&gt;graphviz&lt;/code&gt;, you can do this in just a few steps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;p&gt;Install both &lt;code&gt;debtree&lt;/code&gt; and &lt;code&gt;graphviz&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;debtree graphviz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Visualizing dependencies
&lt;/h2&gt;

&lt;p&gt;First, check that the package you want to visualize is installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dpkg &lt;span class="nt"&gt;-l&lt;/span&gt; | &lt;span class="nb"&gt;grep &lt;/span&gt;ufw &lt;span class="c"&gt;# Check if it exists&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can easily visualize the packages it depends on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;debtree ufw | dot &lt;span class="nt"&gt;-T&lt;/span&gt; png &lt;span class="nt"&gt;-o&lt;/span&gt; ufw_deps.png
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9o8nokj5vfvfbz44wht.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9o8nokj5vfvfbz44wht.png" alt="ufw_deps.png" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, I specified &lt;code&gt;-T png&lt;/code&gt; to output a PNG image for embedding in Qiita, but you can choose from many other formats like &lt;code&gt;svg&lt;/code&gt;.&lt;br&gt;
If you have a desktop environment, you can also visualize it instantly using &lt;code&gt;x11&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;debtree ufw | dot &lt;span class="nt"&gt;-T&lt;/span&gt; x11
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F84azcv4pnbbg5e95vj63.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F84azcv4pnbbg5e95vj63.png" alt="ufw_x11.png" width="670" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualizing reverse dependencies
&lt;/h2&gt;

&lt;p&gt;To visualize reverse dependencies, you can use the &lt;code&gt;-R&lt;/code&gt; / &lt;code&gt;--show-rdeps&lt;/code&gt; option.&lt;/p&gt;

&lt;p&gt;However, using &lt;code&gt;-R&lt;/code&gt; alone will also display many packages that are not actually installed on your system.&lt;br&gt;
For a cleaner view, add the &lt;code&gt;-I&lt;/code&gt; / &lt;code&gt;--show-installed&lt;/code&gt; option to limit the output to installed packages only:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;debtree &lt;span class="nt"&gt;-R&lt;/span&gt; &lt;span class="nt"&gt;-I&lt;/span&gt; iptables | dot &lt;span class="nt"&gt;-T&lt;/span&gt; x11
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From this, you can see that &lt;code&gt;docker-ce&lt;/code&gt; and &lt;code&gt;ubuntu-standard&lt;/code&gt; depend on &lt;code&gt;iptables&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqoqxc0n8ztgu1az4du4w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqoqxc0n8ztgu1az4du4w.png" alt="iptables_reve_deps" width="800" height="539"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s it for today! &lt;/p&gt;

</description>
      <category>debian</category>
      <category>ubuntu</category>
      <category>debtree</category>
      <category>graphviz</category>
    </item>
    <item>
      <title>Writing SIMD in Crystal with Inline Assembly</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Thu, 07 Aug 2025 01:28:30 +0000</pubDate>
      <link>https://dev.to/kojix2/writing-simd-in-crystal-with-inline-assembly-1lkp</link>
      <guid>https://dev.to/kojix2/writing-simd-in-crystal-with-inline-assembly-1lkp</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In this article, we explore how to write SIMD instructions (SSE for x86_64 and NEON for AArch64) using inline assembly in the Crystal programming language.&lt;br&gt;
Crystal uses LLVM as its backend, but &lt;a href="https://github.com/crystal-lang/crystal/issues/3057" rel="noopener noreferrer"&gt;it doesn’t yet fully optimize with SIMD&lt;/a&gt;.&lt;br&gt;
This is not a performance tuning guide, but rather a fun exploration into low-level programming with Crystal.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;code&gt;asm&lt;/code&gt; Syntax
&lt;/h2&gt;

&lt;p&gt;Crystal provides the &lt;code&gt;asm&lt;/code&gt; keyword for writing inline assembly. The syntax is based on LLVM's integrated assembler.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"template"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;clobbers&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each section:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;template&lt;/code&gt;: LLVM-style assembly code&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;outputs&lt;/code&gt;: Output operands&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;inputs&lt;/code&gt;: Input operands&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;clobbers&lt;/code&gt;: Registers that will be modified&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;flags&lt;/code&gt;: Optional (e.g., &lt;code&gt;"volatile"&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;For a detailed explanation, see the &lt;a href="https://crystal-lang.org/reference/latest/syntax_and_semantics/asm.html" rel="noopener noreferrer"&gt;official docs&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
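
&lt;p&gt;Before moving on to SIMD, here is a minimal sketch of the syntax, adapted from the example in the official reference. It assumes an x86 or x86_64 target. &lt;code&gt;$$&lt;/code&gt; escapes a literal &lt;code&gt;$&lt;/code&gt; for the immediate value, and &lt;code&gt;$0&lt;/code&gt; refers to the first operand.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# x86 / x86_64 only
dst = 0

# "=r" marks dst as an output operand held in a general-purpose register.
asm("mov $$1234, $0" : "=r"(dst))

puts dst
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;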

&lt;h2&gt;
  
  
  Types of SIMD Instructions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SSE / AVX&lt;/strong&gt; for Intel and AMD CPUs (x86_64)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NEON&lt;/strong&gt; for ARM CPUs (like Apple Silicon)&lt;/li&gt;
&lt;/ul&gt;
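
&lt;p&gt;Because the two instruction sets are tied to the CPU architecture, code that should build on both needs to pick one at compile time. One way to do that, shown here as a sketch rather than as part of the original examples, is the &lt;code&gt;flag?&lt;/code&gt; macro method:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Only the branch matching the compilation target is compiled.
{% if flag?(:x86_64) %}
  puts "x86_64: use the SSE version"
{% elsif flag?(:aarch64) %}
  puts "AArch64: use the NEON version"
{% else %}
  puts "No SIMD example for this architecture"
{% end %}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In a real program, each branch would hold the architecture-specific &lt;code&gt;asm&lt;/code&gt; block instead of a &lt;code&gt;puts&lt;/code&gt;.&lt;/p&gt;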

&lt;h2&gt;
  
  
  Types of Registers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Registers Used in x86_64
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;General-purpose: &lt;code&gt;rax&lt;/code&gt;, &lt;code&gt;rbx&lt;/code&gt;, &lt;code&gt;rcx&lt;/code&gt;, &lt;code&gt;rdx&lt;/code&gt;, &lt;code&gt;rsi&lt;/code&gt;, &lt;code&gt;rdi&lt;/code&gt;, &lt;code&gt;rsp&lt;/code&gt;, &lt;code&gt;rbp&lt;/code&gt;, &lt;code&gt;r8–r15&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;SIMD:&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Width&lt;/th&gt;
&lt;th&gt;Instruction Set&lt;/th&gt;
&lt;th&gt;Usage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;xmm0–xmm15&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;128-bit&lt;/td&gt;
&lt;td&gt;SSE&lt;/td&gt;
&lt;td&gt;Floats, ints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ymm0–ymm15&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;256-bit&lt;/td&gt;
&lt;td&gt;AVX&lt;/td&gt;
&lt;td&gt;Wider SIMD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;zmm0–zmm31&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;512-bit&lt;/td&gt;
&lt;td&gt;AVX-512&lt;/td&gt;
&lt;td&gt;Used in newer CPUs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Registers Used in AArch64 (NEON)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Vector registers: &lt;code&gt;v0&lt;/code&gt;–&lt;code&gt;v31&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;v0.4s&lt;/code&gt; = 4 × 32-bit floats&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;v1.8h&lt;/code&gt; = 8 × 16-bit lanes (half-precision floats or 16-bit integers)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Examples of Register Specification
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;SSE: &lt;code&gt;xmm0&lt;/code&gt;, &lt;code&gt;xmm1&lt;/code&gt;, etc.&lt;/li&gt;
&lt;li&gt;NEON: &lt;code&gt;v0.4s&lt;/code&gt;, &lt;code&gt;v1.8h&lt;/code&gt;, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLVM assigns SSE registers automatically&lt;/li&gt;
&lt;li&gt;NEON requires explicit register naming in inline assembly&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To follow along:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Emit LLVM IR:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  crystal build &lt;span class="nt"&gt;--emit&lt;/span&gt; llvm-ir foo.cr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Emit assembly:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  crystal build &lt;span class="nt"&gt;--emit&lt;/span&gt; asm foo.cr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Benchmarking tool: &lt;a href="https://github.com/sharkdp/hyperfine" rel="noopener noreferrer"&gt;&lt;code&gt;hyperfine&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use of &lt;code&gt;uninitialized&lt;/code&gt; and &lt;code&gt;to_unsafe&lt;/code&gt; for low-level memory access&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Basic Vector Operations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Vector Addition
&lt;/h3&gt;

&lt;h4&gt;
  
  
  SSE (x86_64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;1.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;5.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;6.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;7.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;8.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_vector_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;a_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;b_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"movups ($1), %xmm0      // load vector a into xmm0
     movups ($2), %xmm1      // load vector b into xmm1
     addps %xmm1, %xmm0      // perform parallel addition of four 32-bit floats
     movups %xmm0, ($0)      // store result to memory"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"xmm0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"xmm1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Vector addition: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_vector_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  NEON (AArch64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;1.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;5.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;6.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;7.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;8.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_vector_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;a_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;b_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"ld1 {v0.4s}, [$1]        // load vector a
     ld1 {v1.4s}, [$2]        // load vector b
     fadd v2.4s, v0.4s, v1.4s // add each element
     st1 {v2.4s}, [$0]        // store the result"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"v0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Vector addition: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_vector_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Vector Multiplication
&lt;/h3&gt;

&lt;h4&gt;
  
  
  SSE (x86_64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;1.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;5.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;6.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;7.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;8.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_vector_multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;a_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;b_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"movups ($1), %xmm0      // load vector a into xmm0
     movups ($2), %xmm1      // load vector b into xmm1
     mulps %xmm1, %xmm0      // perform parallel multiplication of four 32-bit floats
     movups %xmm0, ($0)      // store result to memory"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"xmm0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"xmm1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Vector multiplication: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_vector_multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  NEON (AArch64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;1.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;5.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;6.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;7.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;8.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_vector_multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;a_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;b_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"ld1 {v0.4s}, [$1]        // load vector a
     ld1 {v1.4s}, [$2]        // load vector b
     fmul v2.4s, v0.4s, v1.4s // multiply each element
     st1 {v2.4s}, [$0]        // store the result"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"v0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Vector multiplication: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_vector_multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Aggregation Operations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Vector Sum
&lt;/h3&gt;

&lt;h4&gt;
  
  
  SSE (x86_64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;1.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_vector_sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Float32&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;Float32&lt;/span&gt;
  &lt;span class="n"&gt;vec_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pointerof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"movups ($1), %xmm0      // load vector into xmm0
     haddps %xmm0, %xmm0     // horizontal add (haddps requires SSE3): [a+b, c+d, a+b, c+d]
     haddps %xmm0, %xmm0     // horizontal add again: [a+b+c+d, *, *, *]
     movss %xmm0, ($0)       // store the first element of result"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"xmm0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Vector sum: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_vector_sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
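
&lt;p&gt;The two &lt;code&gt;haddps&lt;/code&gt; steps above can be traced lane by lane in plain Crystal, without inline assembly. This is only an illustrative sketch: &lt;code&gt;hadd_lanes&lt;/code&gt; is a hypothetical helper that models the instruction for the case where both operands are the same register, and it is not part of the original code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Hypothetical helper (an assumption for illustration): models one
# `haddps x, x` step, where source and destination are the same register.
def hadd_lanes(v : StaticArray(Float32, 4)) : StaticArray(Float32, 4)
  lo = v[0] + v[1] # sum of the lower pair
  hi = v[2] + v[3] # sum of the upper pair
  StaticArray[lo, hi, lo, hi]
end

v = StaticArray[1.0_f32, 2.0_f32, 3.0_f32, 4.0_f32]

step1 = hadd_lanes(v)     # lanes: a+b, c+d, a+b, c+d
step2 = hadd_lanes(step1) # every lane now holds a+b+c+d

puts "After first step:  #{step1}"
puts "After second step: #{step2}"
puts "Scalar sum: #{step2[0]}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the sketch contains no assembly, it should run on any architecture, so it may serve as a reference when checking the SIMD versions.&lt;/p&gt;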



&lt;h4&gt;
  
  
  NEON (AArch64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;1.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_vector_sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Float32&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;Float32&lt;/span&gt;
  &lt;span class="n"&gt;vec_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pointerof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"ld1 {v0.4s}, [$1]         // load vector
     faddp v1.4s, v0.4s, v0.4s // pairwise add: [a+b, c+d, a+b, c+d]
     faddp v2.2s, v1.2s, v1.2s // pairwise add again: [a+b+c+d, *]
     str s2, [$0]              // store the final sum"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"v0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Vector sum: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_vector_sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Finding Maximum Value
&lt;/h3&gt;

&lt;h4&gt;
  
  
  SSE (x86_64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;1.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_vector_max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Float32&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;Float32&lt;/span&gt;
  &lt;span class="n"&gt;vec_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pointerof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"movups ($1), %xmm0          // load vector into xmm0
     movaps %xmm0, %xmm1         // copy xmm0 to xmm1
     shufps $$0x4E, %xmm1, %xmm1 // swap upper and lower halves: [c, d, a, b]
     maxps %xmm1, %xmm0          // lanes 0 and 1 now hold max(a, c) and max(b, d)
     movaps %xmm0, %xmm1         // copy result to xmm1
     shufps $$0x01, %xmm1, %xmm1 // move lane 1 into lane 0
     maxps %xmm1, %xmm0          // lane 0 now holds the overall max
     movss %xmm0, ($0)           // store the result"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"xmm0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"xmm1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Vector max: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_vector_max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
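
&lt;p&gt;The shuffle immediates above are easier to follow when the reduction is traced in plain Crystal. The following is a sketch under stated assumptions, not the original implementation: &lt;code&gt;shuffle_lanes&lt;/code&gt; and &lt;code&gt;max_lanes&lt;/code&gt; are hypothetical helpers that model &lt;code&gt;shufps&lt;/code&gt; (with both operands the same register) and &lt;code&gt;maxps&lt;/code&gt;, and NaN handling is not modelled.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Hypothetical helper: models `shufps imm, x, x`. Each 2-bit field of
# the immediate selects the source lane for one result lane.
def shuffle_lanes(v : StaticArray(Float32, 4), imm : Int32) : StaticArray(Float32, 4)
  StaticArray[v[imm % 4], v[(imm // 4) % 4], v[(imm // 16) % 4], v[(imm // 64) % 4]]
end

# Hypothetical helper: lane-wise maximum, standing in for `maxps`.
def max_lanes(x : StaticArray(Float32, 4), y : StaticArray(Float32, 4)) : StaticArray(Float32, 4)
  StaticArray[Math.max(x[0], y[0]), Math.max(x[1], y[1]), Math.max(x[2], y[2]), Math.max(x[3], y[3])]
end

v = StaticArray[1.0_f32, 4.0_f32, 3.0_f32, 2.0_f32]

swapped = shuffle_lanes(v, 0x4E) # lanes: c, d, a, b
m1 = max_lanes(v, swapped)       # lanes 0 and 1: max(a, c), max(b, d)
moved = shuffle_lanes(m1, 0x01)  # lane 1 of m1 moved into lane 0
m2 = max_lanes(m1, moved)        # lane 0 holds the overall maximum

puts "Vector max: #{m2[0]}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Only lane 0 of the final value is meaningful, which is why the assembly version stores just that lane with &lt;code&gt;movss&lt;/code&gt;.&lt;/p&gt;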



&lt;h4&gt;
  
  
  NEON (AArch64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;1.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_vector_max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Float32&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;Float32&lt;/span&gt;
  &lt;span class="n"&gt;vec_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pointerof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"ld1 {v0.4s}, [$1]         // load vector
     fmaxp v1.4s, v0.4s, v0.4s // pairwise max: [max(a, b), max(c, d), ...]
     fmaxp v2.2s, v1.2s, v1.2s // final pairwise max
     str s2, [$0]              // store result"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"v0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Vector max: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_vector_max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Integer Operations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Integer Addition
&lt;/h3&gt;

&lt;h4&gt;
  
  
  SSE (x86_64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;int_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;int_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_int_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;a_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;b_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"movdqu ($1), %xmm0      // load integer vector a into xmm0
     movdqu ($2), %xmm1      // load integer vector b into xmm1
     paddd %xmm1, %xmm0      // perform parallel addition of four 32-bit integers
     movdqu %xmm0, ($0)      // store result to memory"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"xmm0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"xmm1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Integer addition: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_int_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;int_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;int_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  NEON (AArch64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;int_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;int_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_int_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;a_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;b_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"ld1 {v0.4s}, [$1]        // load integer vector a
     ld1 {v1.4s}, [$2]        // load integer vector b
     add v2.4s, v0.4s, v1.4s  // perform element-wise addition
     st1 {v2.4s}, [$0]        // store result to memory"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"v0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Integer addition: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_int_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;int_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;int_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Saturated Addition
&lt;/h3&gt;

&lt;h4&gt;
  
  
  SSE (x86_64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;sat_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;29_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;31_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;sat_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mi"&gt;500_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;600_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;700_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;800_i16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_saturated_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;a_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;b_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"movdqu ($1), %xmm0      // load 8 × 16-bit integers into xmm0
     movdqu ($2), %xmm1      // load 8 × 16-bit integers into xmm1
     paddsw %xmm1, %xmm0     // perform saturated addition
     movdqu %xmm0, ($0)      // store result to memory"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"xmm0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"xmm1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Saturated addition: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_saturated_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sat_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sat_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  NEON (AArch64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;sat_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;29_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;31_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;sat_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mi"&gt;500_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;600_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;700_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;800_i16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_saturated_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;a_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;b_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"ld1 {v0.8h}, [$1]          // load 8 × 16-bit integers from a into v0
     ld1 {v1.8h}, [$2]          // load 8 × 16-bit integers from b into v1
     sqadd v2.8h, v0.8h, v1.8h  // perform saturated addition
     st1 {v2.8h}, [$0]          // store result to memory"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"v0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Saturated addition: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_saturated_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sat_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sat_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
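
&lt;p&gt;As a sanity check for either version, a scalar reference can be handy. The sketch below is not from the original examples: &lt;code&gt;scalar_saturated_add&lt;/code&gt; is a hypothetical helper name, and it assumes that widening to &lt;code&gt;Int32&lt;/code&gt; and clamping to the &lt;code&gt;Int16&lt;/code&gt; range matches what &lt;code&gt;paddsw&lt;/code&gt; and &lt;code&gt;sqadd&lt;/code&gt; do per lane.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;sat_a = StaticArray[29_000_i16, 30_000_i16, 31_000_i16, 32_000_i16,
  32_000_i16, 32_000_i16, 32_000_i16, 32_000_i16]
sat_b = StaticArray[1_000_i16, 1_000_i16, 1_000_i16, 1_000_i16,
  500_i16, 600_i16, 700_i16, 800_i16]

# Hypothetical scalar reference: add each lane in Int32 so the sum cannot
# wrap, then clamp it back into the Int16 range.
def scalar_saturated_add(a : StaticArray(Int16, 8), b : StaticArray(Int16, 8)) : StaticArray(Int16, 8)
  StaticArray(Int16, 8).new do |i|
    sum = a[i].to_i32 + b[i].to_i32
    sum.clamp(Int16::MIN.to_i32, Int16::MAX.to_i32).to_i16
  end
end

puts "Scalar reference: #{scalar_saturated_add(sat_a, sat_b)}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the inputs above, lanes 3 and 7 (33,000 and 32,800 before clamping) should come out as 32,767, while the other lanes are plain sums.&lt;/p&gt;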



&lt;h2&gt;
  
  
  Examining LLVM IR and Assembly
&lt;/h2&gt;

&lt;p&gt;To inspect LLVM IR output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crystal build your_file.cr &lt;span class="nt"&gt;--emit&lt;/span&gt; llvm-ir &lt;span class="nt"&gt;--no-debug&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To inspect raw assembly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crystal build your_file.cr &lt;span class="nt"&gt;--emit&lt;/span&gt; asm &lt;span class="nt"&gt;--no-debug&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



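&lt;p&gt;You’ll see that your inline &lt;code&gt;asm&lt;/code&gt; blocks are preserved as-is, even with optimizations (&lt;code&gt;-O3&lt;/code&gt;): in the LLVM IR they appear as a &lt;code&gt;call void asm sideeffect&lt;/code&gt;, and in the assembly they sit between the &lt;code&gt;InlineAsm Start&lt;/code&gt; and &lt;code&gt;InlineAsm End&lt;/code&gt; markers.&lt;br&gt;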
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight llvm"&gt;&lt;code&gt;&lt;span class="nl"&gt;__crystal_once.exit.i.i:&lt;/span&gt;                          &lt;span class="c1"&gt;; preds = %else.i.i.i, %.noexc98&lt;/span&gt;
  &lt;span class="k"&gt;call&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="vg"&gt;@llvm.lifetime.start.p0&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;i64&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;nonnull&lt;/span&gt; &lt;span class="nv"&gt;%path.i.i.i.i.i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;call&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="vg"&gt;@llvm.lifetime.start.p0&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;i64&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;nonnull&lt;/span&gt; &lt;span class="nv"&gt;%obj1.i.i.i.i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;call&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="vg"&gt;@llvm.lifetime.start.p0&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;i64&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;nonnull&lt;/span&gt; &lt;span class="nv"&gt;%b2.i.i.i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;store&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="m"&gt;4&lt;/span&gt; &lt;span class="p"&gt;x&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="m"&gt;1.000000e+00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="m"&gt;2.000000e+00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="m"&gt;3.000000e+00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="m"&gt;4.000000e+00&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="nv"&gt;%obj1.i.i.i.i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;align&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;
  &lt;span class="k"&gt;store&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="m"&gt;4&lt;/span&gt; &lt;span class="p"&gt;x&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="m"&gt;5.000000e+00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="m"&gt;6.000000e+00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="m"&gt;7.000000e+00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="m"&gt;8.000000e+00&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="nv"&gt;%b2.i.i.i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;align&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;
  &lt;span class="k"&gt;call&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="k"&gt;asm&lt;/span&gt; &lt;span class="k"&gt;sideeffect&lt;/span&gt; &lt;span class="s"&gt;"ld1 {v0.4s}, [$1] \0Ald1 {v1.4s}, [$2] \0Afadd v2.4s, v0.4s, v1.4s \0Ast1 {v2.4s}, [$0]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"r,r,r,~{v0},~{v1},~{v2},~{memory}"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;nonnull&lt;/span&gt; &lt;span class="nv"&gt;%path.i.i.i.i.i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;nonnull&lt;/span&gt; &lt;span class="nv"&gt;%obj1.i.i.i.i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;nonnull&lt;/span&gt; &lt;span class="nv"&gt;%b2.i.i.i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="vg"&gt;#30&lt;/span&gt;
  &lt;span class="nv"&gt;%314&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;load&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="m"&gt;4&lt;/span&gt; &lt;span class="p"&gt;x&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="nv"&gt;%path.i.i.i.i.i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;align&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;
  &lt;span class="k"&gt;call&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="vg"&gt;@llvm.lifetime.end.p0&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;i64&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;nonnull&lt;/span&gt; &lt;span class="nv"&gt;%path.i.i.i.i.i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;call&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="vg"&gt;@llvm.lifetime.end.p0&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;i64&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;nonnull&lt;/span&gt; &lt;span class="nv"&gt;%obj1.i.i.i.i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;call&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="vg"&gt;@llvm.lifetime.end.p0&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;i64&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;nonnull&lt;/span&gt; &lt;span class="nv"&gt;%b2.i.i.i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nv"&gt;%315&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;invoke&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="vg"&gt;@GC_malloc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;i64&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="kt"&gt;label&lt;/span&gt; &lt;span class="nv"&gt;%.noexc100&lt;/span&gt; &lt;span class="k"&gt;unwind&lt;/span&gt; &lt;span class="kt"&gt;label&lt;/span&gt; &lt;span class="nv"&gt;%rescue2.loopexit.split-lp.loopexit.split-lp.loopexit.split-lp&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Lloh2300:
        ldr     q1, [x9, lCPI312_43@PAGEOFF]
        add     x8, sp, #164
        add     x9, sp, #128
        str     q0, [sp, #128]
        stur    q1, [x29, #-128]
        ; InlineAsm Start
        ld1.4s  { v0 }, [x9]
        ld1.4s  { v1 }, [x10]
        fadd.4s v2, v0, v1
        st1.4s  { v2 }, [x8]
        ; InlineAsm End
        ldr     q0, [x25]
        str     q0, [sp, #16]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Miscellaneous
&lt;/h2&gt;

&lt;p&gt;When SIMD is combined with multi-threaded parallelism, memory bandwidth rather than arithmetic throughput can become the bottleneck.&lt;br&gt;
Crystal currently runs single-threaded by default and true parallelism is still in progress, so this limit may become relevant in the future.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We’ve explored how to write SIMD operations in Crystal using inline &lt;code&gt;asm&lt;/code&gt;, and examined how those instructions are lowered into LLVM IR and eventually into assembly.&lt;/p&gt;

&lt;p&gt;This was a deep dive into low-level Crystal.&lt;/p&gt;




&lt;h2&gt;
  
  
  Appendix: SIMD Instruction Reference
&lt;/h2&gt;

&lt;h3&gt;
  
  
  SSE (x86_64)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instruction&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;movups&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Load/store 4 × Float32 (unaligned)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;movaps&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Load/store 4 × Float32 (aligned)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;movdqu&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Load/store 4 × Int32 or 8 × Int16 (unaligned)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;movss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Load/store scalar Float32 (lowest lane)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;addps&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Add 4 × Float32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mulps&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Multiply 4 × Float32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;paddd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Add 4 × Int32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;paddsw&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Saturated add 8 × Int16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;haddps&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Horizontal add of Float32 pairs (requires SSE3)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;maxps&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Element-wise max (Float32)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;shufps&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Shuffle Float32 lanes (for reduction)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  NEON (AArch64)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instruction&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ld1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Load vector (e.g. &lt;code&gt;v0.4s&lt;/code&gt;, &lt;code&gt;v0.8h&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;st1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Store vector&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;add&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Add 4 × Int32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sqadd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Saturated add 8 × Int16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fadd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Add 4 × Float32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fmul&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Multiply 4 × Float32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;faddp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pairwise add (Float32 reduction)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fmaxp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pairwise max (Float32 reduction)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;faddv&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Add across all lanes (an SVE instruction; base NEON uses two &lt;code&gt;faddp&lt;/code&gt; steps instead)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fmaxv&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Max across all lanes in one instruction (alternative to two &lt;code&gt;fmaxp&lt;/code&gt; steps)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Notes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;SSE's &lt;code&gt;movaps&lt;/code&gt; and &lt;code&gt;movdqa&lt;/code&gt; require 16-byte alignment.&lt;/li&gt;
&lt;li&gt;NEON's &lt;code&gt;faddp&lt;/code&gt; and &lt;code&gt;fmaxp&lt;/code&gt; reduce four lanes in two steps: 4 → 2 → 1.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;shufps&lt;/code&gt; is used with masks like &lt;code&gt;0x4E&lt;/code&gt;, &lt;code&gt;0x01&lt;/code&gt; for reordering lanes during reduction.&lt;/li&gt;
&lt;li&gt;Saturated arithmetic (&lt;code&gt;paddsw&lt;/code&gt;, &lt;code&gt;sqadd&lt;/code&gt;) clamps each result to the range of the element type instead of wrapping on overflow (for Int16, -32768 to 32767).&lt;/li&gt;
&lt;/ul&gt;
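
&lt;p&gt;The &lt;code&gt;shufps&lt;/code&gt; masks are easier to read once decoded. Here is a minimal sketch, assuming the standard immediate encoding (two bits per destination lane, lowest lane first); &lt;code&gt;shufps_lanes&lt;/code&gt; is a hypothetical helper, not part of Crystal or of the code in this article.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Decode a shufps immediate into the source lane index selected for each
# destination lane (lane 0 first). Hypothetical helper for illustration.
def shufps_lanes(mask : Int32) : Array(Int32)
  (0..3).map { |i| (mask &gt;&gt; (2 * i)) &amp; 0b11 }
end

p shufps_lanes(0x4E) # =&gt; [2, 3, 0, 1]
p shufps_lanes(0x01) # =&gt; [1, 0, 0, 0]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Destination lanes 0 and 1 are taken from the first operand and lanes 2 and 3 from the second, so when both operands are the same register, &lt;code&gt;0x4E&lt;/code&gt; swaps the upper and lower halves and &lt;code&gt;0x01&lt;/code&gt; moves lane 1 down into lane 0.&lt;/p&gt;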

&lt;p&gt;Thanks for reading — and happy crystaling! 💎&lt;/p&gt;

</description>
      <category>crystal</category>
      <category>assembly</category>
    </item>
    <item>
      <title>Building Portable Crystal Binaries on macOS with GitHub Actions</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Mon, 21 Jul 2025 02:34:19 +0000</pubDate>
      <link>https://dev.to/kojix2/how-to-distribute-a-statically-linked-crystal-binary-on-macos-with-github-actions-1gc6</link>
      <guid>https://dev.to/kojix2/how-to-distribute-a-statically-linked-crystal-binary-on-macos-with-github-actions-1gc6</guid>
      <description>&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;If you’ve ever tried to share a &lt;a href="https://github.com/crystal-lang/crystal" rel="noopener noreferrer"&gt;Crystal&lt;/a&gt; tool you built, you may have noticed that distributing it on macOS isn’t as straightforward as on Linux. On Linux, you can just use the &lt;a href="https://crystal-lang.org/reference/1.17/guides/static_linking.html#musl-libc" rel="noopener noreferrer"&gt;official Docker image with musl&lt;/a&gt; to build fully static binaries.&lt;/p&gt;

&lt;p&gt;But macOS is different. Its design &lt;a href="https://crystal-lang.org/reference/1.17/guides/static_linking.html#macos" rel="noopener noreferrer"&gt;doesn’t allow fully static linking&lt;/a&gt;, so—just like with Rust or Go—you end up with binaries that must dynamically link to system libraries. These are what we call &lt;em&gt;portable binaries&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;By default, Crystal binaries on macOS depend on Homebrew libraries like &lt;code&gt;libgc&lt;/code&gt;, &lt;code&gt;libevent&lt;/code&gt;, and &lt;code&gt;libpcre&lt;/code&gt;. That’s not really portable. In this post, I’ll show you how to avoid those dependencies and build more portable binaries for macOS using GitHub Actions.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Crystal Resolves Libraries
&lt;/h2&gt;

&lt;p&gt;Crystal looks for libraries in this order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;CRYSTAL_LIBRARY_PATH&lt;/code&gt; environment variable&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ldflags&lt;/code&gt; from the &lt;code&gt;@[Link]&lt;/code&gt; annotation&lt;/li&gt;
&lt;li&gt;pkg-config
&lt;ul&gt;
&lt;li&gt;Tries the specified &lt;code&gt;pkg_config&lt;/code&gt; name&lt;/li&gt;
&lt;li&gt;Falls back to the library name&lt;/li&gt;
&lt;li&gt;Only if both fail does it use a plain &lt;code&gt;-l&lt;/code&gt; flag&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s the catch: even if you pass static libraries via &lt;code&gt;--link-flags&lt;/code&gt;, pkg-config runs first. If it succeeds, it usually picks the shared libraries and ignores the static ones you passed.&lt;/p&gt;
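
&lt;p&gt;To make the lookup order concrete, here is a minimal binding sketch. &lt;code&gt;LibMathExample&lt;/code&gt; is a hypothetical name used only for illustration, and the system math library stands in for a real dependency; the comments restate the order described above rather than documenting compiler internals.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Hypothetical binding, for illustration only.
# The annotation names the library "m" and gives no pkg_config name, so
# pkg-config is asked for "m"; if that lookup fails, the linker gets -lm.
# Other forms of the annotation:
#   @[Link("name", pkg_config: "other-name")]  # explicit pkg-config name
#   @[Link(ldflags: "...")]                    # raw linker flags
@[Link("m")]
lib LibMathExample
  fun cos(x : Float64) : Float64
end

puts LibMathExample.cos(0.0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For a Homebrew library such as &lt;code&gt;libgc&lt;/code&gt;, the pkg-config step is the one that tends to succeed and pick the shared library, which is what the workarounds below are meant to avoid.&lt;/p&gt;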

&lt;h2&gt;
  
  
  The Workarounds
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Method 1: Use Symlinks
&lt;/h3&gt;

&lt;p&gt;One way around pkg-config is to symlink the static libraries and link them directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;brew install libgc pcre2&lt;/span&gt;
&lt;span class="s"&gt;ln -s $(brew ls libgc | grep libgc.a) .&lt;/span&gt;
&lt;span class="s"&gt;ln -s $(brew ls pcre2 | grep libpcre2-8.a) .&lt;/span&gt;
&lt;span class="s"&gt;shards build --link-flags="-L $(pwd) $(pwd)/libgc.a $(pwd)/libpcre2-8.a" --release&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Method 2: Disable PKG_CONFIG_PATH
&lt;/h3&gt;

&lt;p&gt;Another trick is to simply disable pkg-config so it can’t interfere:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;brew install libgc pcre2&lt;/span&gt;
&lt;span class="s"&gt;unset PKG_CONFIG_PATH&lt;/span&gt;
&lt;span class="s"&gt;shards build --link-flags="$(brew ls libgc | grep libgc.a) $(brew ls pcre2 | grep libpcre2-8.a)" --release&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Combining both methods is the most reliable approach, especially for libraries like &lt;code&gt;libcrypto&lt;/code&gt; and &lt;code&gt;libssl&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things to Keep in Mind
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;macos-latest&lt;/code&gt; runner gives you an Apple Silicon (Arm) binary&lt;/li&gt;
&lt;li&gt;For Intel builds, use the &lt;code&gt;macos-13&lt;/code&gt; runner&lt;/li&gt;
&lt;li&gt;On some systems, macOS security may require users to &lt;a href="https://support.apple.com/102445" rel="noopener noreferrer"&gt;manually approve your binary&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Alternative: Homebrew Tap
&lt;/h2&gt;

&lt;p&gt;If you want the easiest experience for users, publishing a Homebrew tap is the way to go. That way, they can build your tool from source and let Homebrew handle dependencies.&lt;/p&gt;

&lt;p&gt;Still, prebuilt binaries are handy. With the approaches above, you can distribute Crystal binaries on macOS much like you would with Rust.&lt;/p&gt;




&lt;p&gt;That’s it for today. How about sharing the Crystal tool you built over the weekend?&lt;/p&gt;

</description>
      <category>crystal</category>
    </item>
    <item>
      <title>Writing Inline Assembly in the Crystal Programming Language</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Fri, 20 Jun 2025 04:17:04 +0000</pubDate>
      <link>https://dev.to/kojix2/writing-inline-assembly-in-the-crystal-programming-language-d9a</link>
      <guid>https://dev.to/kojix2/writing-inline-assembly-in-the-crystal-programming-language-d9a</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When you want to make your code run significantly faster, or just want to explore how computers work at a lower level, you might find yourself curious about writing instructions directly for the CPU. In Crystal, you can do this using &lt;strong&gt;inline assembly&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Crystal is a programming language built on top of the LLVM compiler infrastructure. Thanks to this, it can access many of LLVM's powerful features. For low-level programming, Crystal provides both the &lt;code&gt;Intrinsics&lt;/code&gt; functions and the &lt;code&gt;asm&lt;/code&gt; syntax.&lt;/p&gt;

&lt;h2&gt;
  
  
  The &lt;code&gt;asm&lt;/code&gt; Syntax
&lt;/h2&gt;

&lt;p&gt;Crystal supports writing inline assembly using the &lt;code&gt;asm&lt;/code&gt; keyword.&lt;/p&gt;

&lt;p&gt;You can find the &lt;a href="https://crystal-lang.org/reference/1.16/syntax_and_semantics/asm.html" rel="noopener noreferrer"&gt;official documentation here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The basic syntax is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;asm("template" : outputs : inputs : clobbers : flags)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;template&lt;/code&gt; — Assembly code using LLVM’s integrated assembler syntax&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;outputs&lt;/code&gt; — Output operands&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;inputs&lt;/code&gt; — Input operands&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;clobbers&lt;/code&gt; — Registers that may be modified&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;flags&lt;/code&gt; — Optional flags (e.g., &lt;code&gt;"intel"&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This colon-separated syntax is quite unusual in Crystal and comes from GCC's inline assembly syntax.&lt;/p&gt;

&lt;p&gt;Let’s look at some examples.&lt;/p&gt;

&lt;h3&gt;
  
  
  NOP Instruction
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"nop"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Setting a Value Using an Output Operand
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;dst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;

&lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"mov $$10, $0"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"=r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;dst&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



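&lt;p&gt;Note that &lt;code&gt;$$&lt;/code&gt; is an escape for a literal &lt;code&gt;$&lt;/code&gt;, so &lt;code&gt;$$10&lt;/code&gt; becomes the AT&amp;amp;T-style immediate &lt;code&gt;$10&lt;/code&gt;, while &lt;code&gt;$0&lt;/code&gt; is a placeholder for the first operand (here, the output).&lt;/p&gt;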

&lt;p&gt;Using &lt;code&gt;uninitialized Int32&lt;/code&gt; is optional; initializing with &lt;code&gt;dst = 0&lt;/code&gt; works as well.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using an Input Operand
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="n"&gt;dst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"mov $1, $0"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"=r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;dst&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using Multiple Input Operands
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;

&lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"add $2, $0"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"=r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"0"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 30&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using Multiple Output Operands
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;dst1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;
&lt;span class="n"&gt;dst2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;

&lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"
  mov $$10, $0
  mov $$20, $1"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"=r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"=r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

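&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;dst1&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 10&lt;/span&gt;
&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;dst2&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 20&lt;/span&gt;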
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using Intel Syntax
&lt;/h3&gt;

&lt;p&gt;You can also use Intel-style syntax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;dst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;

&lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"mov dword ptr [$0], 10"&lt;/span&gt; &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pointerof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"intel"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;dst&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
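
&lt;p&gt;The templates above appear to target x86 (the mnemonics and the AT&amp;amp;T-style &lt;code&gt;$$10&lt;/code&gt; immediate), so they are unlikely to assemble on other architectures such as AArch64. One way to keep such code buildable everywhere is a compile-time flag check. The following is a minimal sketch, assuming x86-64 for the assembly path; &lt;code&gt;load_ten&lt;/code&gt; is a hypothetical helper name, and the fallback branch simply returns the same constant:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Hypothetical helper (not from the original article):
# uses inline assembly on x86-64 and plain Crystal elsewhere.
def load_ten : Int32
  {% if flag?(:x86_64) %}
    dst = uninitialized Int32
    asm("mov $$10, $0" : "=r"(dst))
    dst
  {% else %}
    10 # portable fallback for non-x86-64 targets
  {% end %}
end

puts load_ten  # =&amp;gt; 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the check runs at compile time, the &lt;code&gt;asm&lt;/code&gt; expression is never seen by the compiler on targets where it would not assemble.&lt;/p&gt;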



&lt;h2&gt;
  
  
  Intrinsics
&lt;/h2&gt;

&lt;p&gt;For relatively simple operations, LLVM provides &lt;strong&gt;intrinsics&lt;/strong&gt;. These functions are highly optimized, platform-independent, and often compatible with Crystal’s interpreter. However, for most basic operations, Crystal's standard library already provides efficient implementations, so using intrinsics does not always yield performance benefits.&lt;/p&gt;

&lt;p&gt;Available intrinsics are defined in the &lt;a href="https://crystal-lang.org/api/Intrinsics.html" rel="noopener noreferrer"&gt;&lt;code&gt;Intrinsics&lt;/code&gt;&lt;/a&gt; module.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Intrinsic Functions
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;memcpy&lt;/code&gt; — Copy memory
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;UInt8&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_u8&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;dest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;UInt8&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0_u8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;memcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;is_volatile: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Copied: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;code&gt;memmove&lt;/code&gt; — Move memory with overlap support
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;UInt8&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_u8&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;memmove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;is_volatile: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Moved: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;code&gt;memset&lt;/code&gt; — Initialize memory
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;UInt8&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0_u8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;memset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0xFF&lt;/span&gt;&lt;span class="n"&gt;_u8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;is_volatile: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Set: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
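
&lt;p&gt;For everyday code, the three memory intrinsics above have bounds-checked counterparts on &lt;code&gt;Slice&lt;/code&gt;. The sketch below is an illustration rather than part of the original examples; it assumes the same 10-byte buffers and uses &lt;code&gt;Slice#copy_to&lt;/code&gt;, &lt;code&gt;Slice#move_to&lt;/code&gt;, and &lt;code&gt;Slice#fill&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;src = Slice(UInt8).new(10) { |i| i.to_u8 }
dest = Slice(UInt8).new(10, 0_u8)

# memcpy counterpart: copy all of src into dest (no overlap)
src.copy_to(dest)
puts dest.to_a  # =&amp;gt; [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# memmove counterpart: move 5 bytes from offset 0 to offset 3 (regions overlap)
src[0, 5].move_to(src[3, 5])
puts src.to_a  # =&amp;gt; [0, 1, 2, 0, 1, 2, 3, 4, 8, 9]

# memset counterpart: set every byte to 0xFF
dest.fill(0xFF_u8)
puts dest.to_a  # =&amp;gt; [255, 255, 255, 255, 255, 255, 255, 255, 255, 255]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Unlike the raw intrinsics, these methods raise if the target slice is too small, which is usually the safer default.&lt;/p&gt;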



&lt;h4&gt;
  
  
  &lt;code&gt;debugtrap&lt;/code&gt; — Trigger debugger trap
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debugtrap&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;code&gt;pause&lt;/code&gt; — CPU pause (works on x86/x64 and AArch64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pause&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is often used internally in Crystal’s &lt;code&gt;Mutex&lt;/code&gt; or &lt;code&gt;SpinLock&lt;/code&gt; implementations.&lt;/p&gt;
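
&lt;p&gt;To illustrate the idea, here is a minimal spin-wait sketch. It is not taken from Crystal’s own &lt;code&gt;Mutex&lt;/code&gt; or &lt;code&gt;SpinLock&lt;/code&gt; source; the spin count of 100 is an arbitrary assumption, and &lt;code&gt;Fiber.yield&lt;/code&gt; is added so that the example also finishes when all fibers share one thread:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;flag = Atomic(Int32).new(0)

# Another fiber flips the flag at some later point
spawn { flag.set(1) }

until flag.get == 1
  # Hint to the CPU that we are busy-waiting (arbitrary spin count)
  100.times { Intrinsics.pause }
  # Give other fibers a chance to run
  Fiber.yield
end

puts "flag observed: #{flag.get}"  # =&amp;gt; flag observed: 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;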

&lt;h4&gt;
  
  
  &lt;code&gt;read_cycle_counter&lt;/code&gt; — Read the CPU cycle counter
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;cycles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_cycle_counter&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Cycles: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;cycles&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To observe it in action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="kp"&gt;loop&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;cycles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_cycle_counter&lt;/span&gt;
  &lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Cycles: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;cycles&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="nb"&gt;sleep&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;second&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
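
&lt;p&gt;A typical use is to measure a short piece of code by reading the counter before and after it. The sketch below assumes the counter is readable on your platform; what one tick means (and whether the value is useful at all) depends on the CPU and operating system, so treat the result as a rough relative number:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;start = Intrinsics.read_cycle_counter

# Some work to measure: sum the integers 0...1_000_000
sum = 0_u64
1_000_000.times { |i| sum &amp;amp;+= i.to_u64 }

# Wrapping subtraction avoids an OverflowError if the counter wraps around
elapsed = Intrinsics.read_cycle_counter &amp;amp;- start

puts "sum = #{sum}, elapsed ticks = #{elapsed}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;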



&lt;h3&gt;
  
  
  Bit Manipulation Intrinsics
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Bit Reversal
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;bitreverse8&lt;/code&gt;, &lt;code&gt;bitreverse16&lt;/code&gt;, &lt;code&gt;bitreverse32&lt;/code&gt;, &lt;code&gt;bitreverse64&lt;/code&gt;, &lt;code&gt;bitreverse128&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mb"&gt;0b01101001&lt;/span&gt;&lt;span class="n"&gt;_u8&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bitreverse8&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Reversed: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_s&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 10010110&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Byte Swap
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;bswap16&lt;/code&gt;, &lt;code&gt;bswap32&lt;/code&gt;, &lt;code&gt;bswap64&lt;/code&gt;, &lt;code&gt;bswap128&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mh"&gt;0x12345678&lt;/span&gt;&lt;span class="n"&gt;_u32&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bswap32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Swapped: 0x&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_s&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 0x78563412&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Population Count
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;popcount8&lt;/code&gt;, &lt;code&gt;popcount16&lt;/code&gt;, &lt;code&gt;popcount32&lt;/code&gt;, &lt;code&gt;popcount64&lt;/code&gt;, &lt;code&gt;popcount128&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mb"&gt;0b11010110&lt;/span&gt;&lt;span class="n"&gt;_i32&lt;/span&gt;
&lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;popcount32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Bit count: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Count Leading Zeros
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;countleading8&lt;/code&gt;, &lt;code&gt;countleading16&lt;/code&gt;, &lt;code&gt;countleading32&lt;/code&gt;, &lt;code&gt;countleading64&lt;/code&gt;, &lt;code&gt;countleading128&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mb"&gt;0b00001111&lt;/span&gt;&lt;span class="n"&gt;_i32&lt;/span&gt;
&lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;countleading32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Leading zeros: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 28 (the value is 32 bits wide)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Count Trailing Zeros
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;counttrailing8&lt;/code&gt;, &lt;code&gt;counttrailing16&lt;/code&gt;, &lt;code&gt;counttrailing32&lt;/code&gt;, &lt;code&gt;counttrailing64&lt;/code&gt;, &lt;code&gt;counttrailing128&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mb"&gt;0b11110000&lt;/span&gt;&lt;span class="n"&gt;_i32&lt;/span&gt;
&lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;counttrailing32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Trailing zeros: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 4&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
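
&lt;p&gt;As noted earlier, the standard library already wraps these operations. The following sketch shows what I believe are the corresponding &lt;code&gt;Int&lt;/code&gt; methods; it is an illustration added for comparison, and the input values are chosen only for the example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;value = 0b00001111_i32

puts value.popcount              # =&amp;gt; 4
puts value.leading_zeros_count   # =&amp;gt; 28
puts value.trailing_zeros_count  # =&amp;gt; 0

# Byte swap and bit reversal are also available as methods
puts 0x12345678_u32.byte_swap.to_s(16)  # =&amp;gt; 78563412
puts 0b01101001_u8.bit_reverse.to_s(2)  # =&amp;gt; 10010110
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;These methods pick the right width from the receiver’s type, so there is no need to choose between the &lt;code&gt;8&lt;/code&gt;, &lt;code&gt;16&lt;/code&gt;, &lt;code&gt;32&lt;/code&gt;, &lt;code&gt;64&lt;/code&gt;, and &lt;code&gt;128&lt;/code&gt; variants by hand.&lt;/p&gt;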



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Crystal documentation is still scarce in many languages, but &lt;a href="https://deepwiki.com/crystal-lang/crystal" rel="noopener noreferrer"&gt;DeepWiki&lt;/a&gt; answers most questions well, and I highly recommend it. This article is based on what I learned from DeepWiki, and I tested all of the code examples.&lt;/p&gt;

&lt;p&gt;That’s all for now — happy hacking with Crystal!&lt;/p&gt;




&lt;p&gt;This post was translated from Japanese to English by ChatGPT. &lt;br&gt;
Click &lt;a href="https://qiita.com/kojix2/items/08c1a6a9d32f15f5a921" rel="noopener noreferrer"&gt;here&lt;/a&gt; to see the original post.&lt;/p&gt;

</description>
      <category>crystal</category>
      <category>assembly</category>
      <category>llvm</category>
    </item>
  </channel>
</rss>
