<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: kojix2</title>
    <description>The latest articles on DEV Community by kojix2 (@kojix2).</description>
    <link>https://dev.to/kojix2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F129786%2F5f8821af-b2f8-4de9-8024-3a7be3c4cd16.png</url>
      <title>DEV Community: kojix2</title>
      <link>https://dev.to/kojix2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kojix2"/>
    <language>en</language>
    <item>
      <title>Why Is Crystal Compilation So Slow?</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Mon, 08 Dec 2025 13:30:24 +0000</pubDate>
      <link>https://dev.to/kojix2/why-is-crystal-compilation-so-slow-29n0</link>
      <guid>https://dev.to/kojix2/why-is-crystal-compilation-so-slow-29n0</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The Crystal programming language is notorious for its slow compilation times.&lt;/p&gt;

&lt;p&gt;But have you ever wondered where Crystal actually spends most of its compilation time?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9sxp3hrdnh2o9yq4ez9l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9sxp3hrdnh2o9yq4ez9l.png" alt="overview" width="800" height="145"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure: Crystal uses LLVM as its backend&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Crystal Compilation Pipeline
&lt;/h2&gt;

&lt;p&gt;The Crystal compiler's compilation process consists of the following stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;new_program&lt;/strong&gt; - Creating the program object&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;parse&lt;/strong&gt; - Lexical analysis and parsing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;semantic&lt;/strong&gt; - Semantic analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;codegen&lt;/strong&gt; - Generating object files
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="nn"&gt;Crystal&lt;/span&gt;
  &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Compiler&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Source&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="no"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Source&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;output_filename&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Result&lt;/span&gt;
      &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;unless&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_a?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="c1"&gt;# 1 new_program&lt;/span&gt;
      &lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_program&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

      &lt;span class="c1"&gt;# 2 parse&lt;/span&gt;
      &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parse&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;

      &lt;span class="c1"&gt;# 3 semantic&lt;/span&gt;
      &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;semantic&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;cleanup: &lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;no_cleanup?&lt;/span&gt;

      &lt;span class="c1"&gt;# 4 codegen&lt;/span&gt;
      &lt;span class="n"&gt;units&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;codegen&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_filename&lt;/span&gt; &lt;span class="k"&gt;unless&lt;/span&gt; &lt;span class="vi"&gt;@no_codegen&lt;/span&gt;

      &lt;span class="c1"&gt;# 5 cleanup&lt;/span&gt;
      &lt;span class="c1"&gt;# ... omission ...&lt;/span&gt;
      &lt;span class="no"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt; &lt;span class="n"&gt;program&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After this, linking is performed by the standard linker.&lt;/p&gt;

&lt;h3&gt;
  
  
  Command-Line Options for Compilation Statistics
&lt;/h3&gt;

&lt;p&gt;Crystal provides a command-line option that displays compilation time statistics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crystal build &lt;span class="nt"&gt;-s&lt;/span&gt; hoge.cr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, this option doesn't break out the time spent inside native LLVM functions, so it wasn't detailed enough for this article's investigation.&lt;/p&gt;

&lt;p&gt;To get to the heart of the matter, I used &lt;a href="https://gist.github.com/kojix2/bf758a30ded3ea9aff9d3151df6a59c1" rel="noopener noreferrer"&gt;print debugging&lt;/a&gt; to measure the compilation time.&lt;/p&gt;
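
&lt;p&gt;As a rough illustration of this kind of print-based timing (a minimal sketch, not the code from the linked gist), a block can be wrapped in &lt;code&gt;Time.measure&lt;/code&gt; and the elapsed time printed to standard error. The &lt;code&gt;fake_stage&lt;/code&gt; method below is a hypothetical stand-in for a compiler stage and does not exist in the compiler.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Hypothetical stand-in for a compiler stage (not taken from the compiler).
def fake_stage : Int64
  sum = 0_i64
  1_000_000.times { |i| sum += i }
  sum
end

result = 0_i64

# Time.measure runs the block and returns the elapsed time as a Time::Span.
elapsed = Time.measure { result = fake_stage }

# Print to STDERR so the timing does not mix with regular program output.
STDERR.puts "fake_stage: #{elapsed.total_seconds} s (result=#{result})"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;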

&lt;h3&gt;
  
  
  Native LLVM Functions Called During Codegen
&lt;/h3&gt;

&lt;p&gt;During the codegen stage, the following native LLVM functions are called:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;LibLLVM.run_passes&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;Applies optimization passes to LLVM IR&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;
&lt;code&gt;LibLLVM.target_machine_emit_to_file&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;Generates object files&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;I measured the execution time of these functions using &lt;a href="https://gist.github.com/kojix2/bf758a30ded3ea9aff9d3151df6a59c1" rel="noopener noreferrer"&gt;print debugging&lt;/a&gt; as well.&lt;/p&gt;

&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;p&gt;Here are the results from compiling the Crystal compiler itself with &lt;code&gt;--release&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Time (seconds)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;new_program&lt;/td&gt;
&lt;td&gt;0.000388207&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;parse&lt;/td&gt;
&lt;td&gt;0.000065000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;semantic&lt;/td&gt;
&lt;td&gt;12.552620028&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;codegen&lt;/td&gt;
&lt;td&gt;355.245409133&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- LibLLVM.run_passes&lt;/td&gt;
&lt;td&gt;252.340241198&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- LibLLVM.target_machine_emit_to_file&lt;/td&gt;
&lt;td&gt;93.280652845&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cleanup&lt;/td&gt;
&lt;td&gt;0.000013180&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;total&lt;/td&gt;
&lt;td&gt;367.798495548&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Let me visualize this with a bar chart:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffb6g781iellwm49rnit0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffb6g781iellwm49rnit0.png" alt="Compilation time breakdown" width="800" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;NOTE: This graph is from the original article and may differ slightly from the latest compiler.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Were the results what you expected?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lexical analysis and parsing take virtually no time!&lt;/li&gt;
&lt;li&gt;Semantic analysis (including type inference) also takes relatively little time!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In fact, the vast majority of the compilation time is spent in codegen, specifically in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;LibLLVM.run_passes&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;LibLLVM.target_machine_emit_to_file&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are external LLVM function calls that happen outside of Crystal's control!&lt;/p&gt;

&lt;p&gt;In this case, building the Crystal compiler itself with &lt;code&gt;--release&lt;/code&gt;, the majority of compilation time was spent on LLVM optimization and code generation.&lt;/p&gt;

&lt;p&gt;This might be a somewhat surprising result, don't you think?&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Speed Up the Crystal Compiler
&lt;/h3&gt;

&lt;p&gt;The parts of the Crystal compiler implemented in Crystal (lexical analysis, parsing, and semantic analysis) are already sufficiently fast. This means that further speedups would require more drastic approaches such as:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Introducing parallelization even in release builds&lt;/li&gt;
&lt;li&gt;Optimizing LLVM itself (specifically for Crystal)&lt;/li&gt;
&lt;li&gt;Improving Crystal to generate LLVM IR that's easier for LLVM to process&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;However, since these approaches aren't very practical for everyday use, let me introduce a more accessible method:&lt;/p&gt;

&lt;h3&gt;
  
  
  Use &lt;code&gt;-O3&lt;/code&gt; Instead of &lt;code&gt;--release&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;In the Crystal compiler, specifying &lt;code&gt;--release&lt;/code&gt; is equivalent to specifying both &lt;code&gt;-O3&lt;/code&gt; and &lt;code&gt;--single-module&lt;/code&gt;. If you're willing to sacrifice some optimization, you can specify only &lt;code&gt;-O3&lt;/code&gt;: without &lt;code&gt;--single-module&lt;/code&gt;, code generation is split across multiple modules that can be processed in parallel, which speeds up compilation in many cases.&lt;/p&gt;

&lt;p&gt;From here on, the discussion is somewhat speculative.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Crystal Doesn't Have Incremental Compilation or Shared Library Support
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Crystal's &lt;code&gt;--release&lt;/code&gt; Mode Includes &lt;code&gt;--single-module&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Crystal struggles with splitting code into separate compilation units and reusing the results. In particular, &lt;code&gt;--release&lt;/code&gt; builds enable &lt;code&gt;--single-module&lt;/code&gt;, which compiles everything into one massive LLVM module for optimization.&lt;/p&gt;

&lt;p&gt;For comparison, Rust performs separate compilation for each crate even with &lt;code&gt;--release&lt;/code&gt;. In Rust, you need to explicitly use &lt;code&gt;-C lto=fat&lt;/code&gt; to get behavior similar to Crystal's, where the entire LLVM IR is optimized together.&lt;/p&gt;

&lt;h3&gt;
  
  
  Crystal's Weak Caching Mechanism
&lt;/h3&gt;

&lt;p&gt;Crystal does have a mechanism that caches LLVM bitcode files (.bc) and object files on a per-type basis during normal builds, and can reuse object files only when the bitcode is completely unchanged.&lt;/p&gt;

&lt;p&gt;This allows the compiler to skip the expensive object file generation step in some cases.&lt;/p&gt;

&lt;p&gt;However, even in such cases, lexical analysis, parsing, and semantic analysis cannot be skipped. The comparison only happens after generating .bc files. And as we'll discuss later, cases where the bitcode is completely unchanged are actually quite rare.&lt;/p&gt;

&lt;h3&gt;
  
  
  Crystal Is a Statically-Typed Language Where the Caller Determines Types
&lt;/h3&gt;

&lt;p&gt;Why can't Crystal split packages into multiple LLVM IR modules, precompile them, and reuse the results?&lt;/p&gt;

&lt;p&gt;The main reason is that Crystal has strong type inference and union types, and the concrete types of methods change depending on the calling context.&lt;/p&gt;

&lt;p&gt;Crystal is an unusual statically-typed language where &lt;strong&gt;the caller determines the types&lt;/strong&gt;, enabling duck typing. However, the trade-off is that type signatures need to be inferred with every compilation.&lt;/p&gt;
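
&lt;p&gt;A small sketch of what "the caller determines the types" means in practice. The &lt;code&gt;double&lt;/code&gt; method is a hypothetical example written for illustration: it has no type annotations, so the compiler has to instantiate it separately for each argument type it finds at a call site.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# No type annotations: the argument type is decided by each caller.
def double(x)
  x + x
end

puts double(21)   # instantiated for Int32, prints 42
puts double(1.5)  # instantiated for Float64, prints 3.0
puts double("ab") # instantiated for String, prints abab
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the set of instantiations depends on every call site in the whole program, a library containing &lt;code&gt;double&lt;/code&gt; cannot be compiled once, ahead of time, for all future callers.&lt;/p&gt;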

&lt;h3&gt;
  
  
  Type IDs Change with Each Compilation
&lt;/h3&gt;

&lt;p&gt;The Crystal compiler assigns a number to every class to resolve types. With each compilation, every type that appears gets assigned a "number." Let's say class A gets assigned the number "10" in one compilation. If you make a small change to the code and recompile, "10" might be assigned to a different class. Linking object files created this way causes type inconsistencies and fails, because conditional branches based on types won't work correctly.&lt;/p&gt;

&lt;p&gt;Additionally, when loading multiple Crystal shared libraries simultaneously, there's the problem of runtime functions being multiply defined.&lt;/p&gt;

&lt;p&gt;This makes it difficult for Crystal to split code into parts, precompile them, and reuse them later.&lt;/p&gt;

&lt;p&gt;But is this an inherent characteristic of the Crystal language? Let's consider this from a more social context.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Crystal Language Community and Resource Constraints
&lt;/h2&gt;

&lt;p&gt;Crystal is known as a language with Ruby-like concise syntax that delivers excellent performance.&lt;/p&gt;

&lt;p&gt;However, the Crystal development team has limited resources. While there is a dedicated team at Manas.Tech and community contributors worldwide, the resources are still limited compared to large corporations.&lt;/p&gt;

&lt;p&gt;For instance, imagine if Apple were developing Crystal.&lt;/p&gt;

&lt;p&gt;Apple engineers might make changes to clang/LLVM itself to significantly improve compilation speed.&lt;/p&gt;

&lt;p&gt;Or, like Swift, they might define a proper ABI and create an intermediate language or binary format well-suited to Crystal. Similar to how Swift has SIL (Swift Intermediate Language) as an intermediate representation before converting to LLVM IR, Crystal could have its own optimized intermediate language. This would enable comparing modules at that stage, resolving types, and generating object files from there. (Though I'm not entirely sure if this is possible within the LLVM framework.)&lt;/p&gt;

&lt;p&gt;However, the Crystal compiler we have isn't like that. It generates monolithic, massive LLVM IR and delegates all optimization to LLVM. For package management, downloading source code directly from GitHub is the mainstream approach.&lt;/p&gt;

&lt;p&gt;There still seems to be room for improvement.&lt;/p&gt;

&lt;p&gt;Slow compilation paired with fast execution is not purely a property of the Crystal language; it also stems from the resource constraints of the Crystal development team. In other words, if significant resources were invested in development in the future, these issues could potentially be improved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Designing an ABI specification or intermediate language for Crystal is extremely difficult. However, if someone achieves this, it could become Crystal 2.0 or Crystal 3.0.&lt;/p&gt;

&lt;p&gt;Even without going that far, finding ways to split the generated LLVM IR into multiple modules, or mangling function names and global variables, would represent significant progress.&lt;/p&gt;

&lt;p&gt;Crystal doesn't have as vibrant a library ecosystem as some other languages. While the reasons aren't entirely clear, as we improve the environment for code reuse, techniques for improving compilation speed may also develop.&lt;/p&gt;

&lt;p&gt;That's all for this article. Thank you for reading to the end!&lt;/p&gt;




&lt;p&gt;This article was originally written in 2024 and revised in December 2025. It was translated from Japanese to English using Claude Sonnet.&lt;/p&gt;

</description>
      <category>crystal</category>
    </item>
    <item>
      <title>A Practical Guide to Parallel Programming in Crystal (2025)</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Fri, 21 Nov 2025 07:24:48 +0000</pubDate>
      <link>https://dev.to/kojix2/a-practical-guide-to-parallel-programming-in-crystal-2025-1lbg</link>
      <guid>https://dev.to/kojix2/a-practical-guide-to-parallel-programming-in-crystal-2025-1lbg</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article is based on content created by kojix2 (a human) alternately calling DeepWiki and ChatGPT, but kojix2 (a human) has reviewed, edited, and proofread the entire text. The article was translated from Japanese to English using Claude. If you find any mistakes, &lt;strong&gt;please comment&lt;/strong&gt;. Thank you.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Crystal's parallel processing is based on a hybrid model that primarily uses &lt;strong&gt;Fiber (cooperative and lightweight)&lt;/strong&gt; and utilizes &lt;strong&gt;Thread (OS threads)&lt;/strong&gt; when necessary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ExecutionContext&lt;/strong&gt;, which has been rapidly developed since around 2024-2025, provides a new abstraction layer for safely spreading Fibers across multiple threads.&lt;/p&gt;

&lt;p&gt;This article summarizes Crystal's current parallel execution model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building with Parallel Execution Enabled
&lt;/h2&gt;

&lt;p&gt;As of November 19, 2025, you need to use the following two flags:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;-Dpreview_mt&lt;/code&gt;: Enables parallel execution of Fibers&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-Dexecution_context&lt;/code&gt;: Enables the use of ExecutionContext
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crystal build &lt;span class="nt"&gt;-Dpreview_mt&lt;/span&gt; &lt;span class="nt"&gt;-Dexecution_context&lt;/span&gt; program.cr 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although Crystal's parallel execution is still in preview, it has been available for over 6 years and works without issues in many cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview of Crystal's Concurrency and Parallelism
&lt;/h2&gt;

&lt;p&gt;Crystal has five major execution models:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Execution Unit&lt;/th&gt;
&lt;th&gt;Characteristics&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fiber (default)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fiber (lightweight thread)&lt;/td&gt;
&lt;td&gt;Cooperative, automatic switching on I/O, lightweight&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ExecutionContext::Concurrent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fiber group&lt;/td&gt;
&lt;td&gt;Sequential execution on 1 thread (concurrent)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ExecutionContext::Parallel&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fiber group&lt;/td&gt;
&lt;td&gt;Execution on multiple threads (parallel)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ExecutionContext::Isolated&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 Fiber + 1 dedicated thread&lt;/td&gt;
&lt;td&gt;For GUI loops and blocking FFI calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Thread&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OS thread&lt;/td&gt;
&lt;td&gt;For handling low-level operations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The standard design is as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;Fiber&lt;/strong&gt; as the basis&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;ExecutionContext&lt;/strong&gt; only where parallelism is needed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Cooperative Scheduling of Fiber and I/O
&lt;/h2&gt;

&lt;p&gt;Fiber is a cooperative execution model that has existed in Crystal for a while. By default (when parallel execution is disabled), a switch occurs only when one of the following is triggered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;I/O&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sleep&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Channel&lt;/code&gt; &lt;code&gt;receive/send&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Fiber.yield&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(In each case, &lt;code&gt;Fiber.suspend&lt;/code&gt; is called and the current Fiber is suspended.)&lt;/p&gt;
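
&lt;p&gt;A minimal sketch of these switch points, assuming a default build without parallel execution. The labels in the strings are made up for illustration. Each &lt;code&gt;Fiber.yield&lt;/code&gt; hands control to the other Fiber, so the two are expected to take turns rather than run at the same time.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;spawn do
  puts "worker: step 1"
  Fiber.yield # explicit switch point: give control back to the scheduler
  puts "worker: step 2"
end

puts "main: spawned worker"
Fiber.yield # the main Fiber pauses here so the worker can start
puts "main: resumed"
Fiber.yield # pause again so the worker can finish before the program exits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Without the calls to &lt;code&gt;Fiber.yield&lt;/code&gt; in the main Fiber, the program could reach its end before the worker ever gets a chance to run.&lt;/p&gt;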

&lt;p&gt;The basic approach in Crystal is to put I/O-bound processing on Fibers.&lt;/p&gt;

&lt;p&gt;Each Fiber has its own stack memory. The stack has a virtual size of 8MiB, but it's only reserved, and actual memory usage starts from 4KiB.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is a "Stack" in Crystal?
&lt;/h3&gt;

&lt;p&gt;When reading Crystal documentation, you'll encounter the word "stack." Note that this differs from the general meaning of "stack" - it refers to a "memory region that behaves like a stack," which is actually memory allocated from the OS heap.&lt;/p&gt;

&lt;p&gt;What is placed on the stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Value types: &lt;code&gt;Struct&lt;/code&gt;, &lt;code&gt;Tuple&lt;/code&gt;, &lt;code&gt;StaticArray&lt;/code&gt;, etc.&lt;/li&gt;
&lt;li&gt;Primitive types: &lt;code&gt;Int32&lt;/code&gt;, &lt;code&gt;Float64&lt;/code&gt;, &lt;code&gt;Bool&lt;/code&gt;, &lt;code&gt;Char&lt;/code&gt;, etc.&lt;/li&gt;
&lt;li&gt;Pointers to reference types: &lt;code&gt;Array&lt;/code&gt;, &lt;code&gt;Hash&lt;/code&gt;, etc. (The reference type objects themselves are placed on the heap, but the pointers to them are placed on the stack)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Values placed on the stack are not directly targeted by GC, but they are scanned during GC execution to prevent heap objects referenced by stack variables from being mistakenly collected.&lt;/p&gt;

&lt;p&gt;As described later, the key point is that when captured by closures like &lt;code&gt;spawn do end&lt;/code&gt;, the above value types are exceptionally placed on the heap and become accessible from other threads.&lt;/p&gt;
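
&lt;p&gt;The closure case can be sketched as follows (an illustrative example, not taken from the Crystal documentation). The variable &lt;code&gt;counter&lt;/code&gt; is an &lt;code&gt;Int32&lt;/code&gt;, a value type, but because the block passed to &lt;code&gt;spawn&lt;/code&gt; captures it, both the main Fiber and the spawned Fiber see the same storage.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;counter = 0              # Int32 is a value type
done = Channel(Nil).new

spawn do
  counter += 1           # captured by the block, so it lives in closure data
  done.send(nil)         # signal completion to the main Fiber
end

done.receive             # wait until the spawned Fiber has run
puts counter             # prints 1: the update made in the Fiber is visible here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;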

&lt;h2&gt;
  
  
  Background Knowledge: Thread / Scheduler / Fiber
&lt;/h2&gt;

&lt;p&gt;In Crystal, each thread has its own &lt;code&gt;Crystal::Scheduler&lt;/code&gt; that manages the fibers to be executed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Main Thread Creation and Initialization
&lt;/h3&gt;

&lt;p&gt;The main thread is automatically created by the OS when the program starts. Subsequently, when &lt;code&gt;Thread.current&lt;/code&gt; is called, a Thread object for the main thread is created. The stack address of the main thread is obtained with the &lt;code&gt;stack_address&lt;/code&gt; method. This is the actual thread stack allocated by the OS when the process starts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Main Fiber Creation
&lt;/h3&gt;

&lt;p&gt;When the &lt;code&gt;Thread&lt;/code&gt; object is initialized, the main Fiber is created simultaneously. The main Fiber uses a special constructor &lt;code&gt;Fiber.new(stack : Void*, thread)&lt;/code&gt; to utilize the OS thread stack. Unlike normal Fibers, &lt;code&gt;makecontext&lt;/code&gt; is not called, and it uses the already running context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lazy Initialization of Scheduler
&lt;/h3&gt;

&lt;p&gt;The main thread's scheduler is initialized when &lt;code&gt;Thread#scheduler&lt;/code&gt; is called. The scheduler has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;@event_loop&lt;/code&gt;: Platform-specific event loop&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@stack_pool&lt;/code&gt;: Fiber stack reuse pool&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@runnable&lt;/code&gt;: Queue of runnable fibers&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@main&lt;/code&gt;: Thread's main fiber&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Default Thread Configuration
&lt;/h3&gt;

&lt;p&gt;Without using &lt;code&gt;ExecutionContext&lt;/code&gt; and &lt;code&gt;preview_mt&lt;/code&gt;, only the main thread exists. The main thread has its own &lt;code&gt;Crystal::Scheduler&lt;/code&gt; instance, which manages all fibers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stack Allocation for New Fibers
&lt;/h3&gt;

&lt;p&gt;When a new Fiber is created, stack memory is obtained from Fiber::StackPool. When a Fiber terminates, its stack is returned to the pool through StackPool.release for reuse by the next Fiber. Stack allocation reserves 8MiB of virtual address space. Only the bottom page of the stack (4KiB) is committed to physical memory. When the stack grows and reaches a guard page, that page's guard status is removed and a new guard page is committed. This continues until reserved pages run out.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parallel Execution with ExecutionContext
&lt;/h2&gt;

&lt;p&gt;ExecutionContext is a "virtual thread group" that executes Fibers together.&lt;/p&gt;

&lt;h3&gt;
  
  
  ExecutionContext::Concurrent
&lt;/h3&gt;

&lt;p&gt;This is the same concurrent execution as traditional Fibers. It's safe and easy to handle.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Fiber&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ExecutionContext&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Concurrent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"workers"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Only one Fiber executes at a time&lt;/strong&gt; within the context&lt;/li&gt;
&lt;li&gt;Therefore, access contention on shared variables doesn't occur (using Mutex/Atomic anyway is still the recommended, safer habit)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Suitable when parallelization is unnecessary but you want to use Fibers.&lt;/p&gt;

&lt;h3&gt;
  
  
  ExecutionContext::Parallel
&lt;/h3&gt;

&lt;p&gt;Parallel execution on multiple threads.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Fiber&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ExecutionContext&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Parallel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"workers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Changing parallel size during execution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Each thread runs its own scheduler

&lt;ul&gt;
&lt;li&gt;The scheduler is an instance of the &lt;code&gt;Fiber::ExecutionContext::Parallel::Scheduler&lt;/code&gt; class, responsible for executing individual Fibers. It has a local queue and manages runnable Fibers. It searches for and executes Fibers in the main loop (run_loop).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Fibers within the context are moved to and executed on arbitrary threads

&lt;ul&gt;
&lt;li&gt;When a Fiber moves between threads, only the execution context (registers and stack pointer) actually moves. The Fiber's stack memory (heap from the OS perspective) does not move. This memory region is fixed during the Fiber's lifetime. When a Fiber resumes on a new thread, the saved stack pointer is loaded and points to the original stack memory region.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Due to parallelism, &lt;code&gt;Atomic&lt;/code&gt; / &lt;code&gt;Mutex&lt;/code&gt; is mandatory for shared mutable state.

&lt;ul&gt;
&lt;li&gt;Local variables and instance variables (pointers) captured from the closure that spawns the Fiber are placed in a closure data structure allocated on the heap, and that pointer moves with the Fiber. This means that value type local variables (like StaticArray) that would normally be allocated on the stack are exceptionally allocated on the heap.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
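
&lt;p&gt;The points above can be sketched as a shared counter updated from many Fibers. This is a minimal sketch under a few assumptions: the program is built with the &lt;code&gt;-Dpreview_mt&lt;/code&gt; and &lt;code&gt;-Dexecution_context&lt;/code&gt; flags described earlier, and the context size of 4 and the count of 100 Fibers are arbitrary values chosen for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;require "wait_group"

# Assumed build command:
#   crystal build -Dpreview_mt -Dexecution_context example.cr
ctx = Fiber::ExecutionContext::Parallel.new("workers", 4)

counter = Atomic(Int32).new(0) # shared mutable state, captured by the closures below
wg = WaitGroup.new(100)        # one slot per Fiber we are about to spawn

100.times do
  ctx.spawn do
    counter.add(1) # atomic increment; a plain Int32 with += could lose updates
    wg.done        # mark this Fiber as finished
  end
end

wg.wait          # block until all 100 Fibers have called done
puts counter.get # expected to print 100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Replacing the &lt;code&gt;Atomic&lt;/code&gt; with a plain integer would compile as well, but the final value would then depend on how the threads happen to interleave.&lt;/p&gt;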

&lt;p&gt;Parallel is the central feature behind Crystal's goal of "safe and fast parallel execution."&lt;/p&gt;

&lt;h3&gt;
  
  
  ExecutionContext::Isolated
&lt;/h3&gt;

&lt;p&gt;1 Fiber = 1 dedicated thread&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;gui&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Fiber&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ExecutionContext&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Isolated&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"GUI"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="no"&gt;Gtk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="n"&gt;gui&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;A single Fiber monopolizes an OS thread&lt;/li&gt;
&lt;li&gt;Safe to use blocking I/O (e.g., GUI event loops, blocking FFI calls)&lt;/li&gt;
&lt;li&gt;Additional Fibers cannot be spawned within the context (they are sent to the default context instead)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Suitable for the main loops of GUI applications and for FFI calls into C functions that block on I/O.&lt;/p&gt;

&lt;h3&gt;
  
  
  Default Fiber Without Using ExecutionContext
&lt;/h3&gt;

&lt;p&gt;When ExecutionContext is not specified, Fibers execute in the default ExecutionContext (&lt;code&gt;Fiber::ExecutionContext.default&lt;/code&gt;). The default ExecutionContext is Parallel, but since the initial parallelism is set to 1, it behaves the same as Concurrent.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="no"&gt;Fiber&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ExecutionContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt; &lt;span class="c1"&gt;# =&amp;gt; 1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Basic Patterns of Channel and WaitGroup
&lt;/h2&gt;

&lt;p&gt;Crystal's parallel processing is based on a &lt;strong&gt;Channel + WaitGroup&lt;/strong&gt; pattern similar to Go.&lt;/p&gt;

&lt;h3&gt;
  
  
  Producer-Consumer (Parallel)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;consumers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Fiber&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ExecutionContext&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Parallel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"consumers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;channel&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Channel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;wg&lt;/span&gt;         &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;WaitGroup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Atomic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;times&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;consumers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;spawn&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;receive?&lt;/span&gt;
      &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;ensure&lt;/span&gt;
    &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;done&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;times&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;
&lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;

&lt;span class="nb"&gt;p&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 523776&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Communication via Channel&lt;/li&gt;
&lt;li&gt;Synchronization via WaitGroup&lt;/li&gt;
&lt;li&gt;Safe updates of shared state via Atomic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the basic form of parallel execution in Crystal.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;32 consumer Fibers executing in parallel atomically add the 1024 integer values (0 through 1023) received from the channel and calculate their sum (523776).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Protection of Shared Variables in Concurrent
&lt;/h3&gt;

&lt;p&gt;Concurrent runs Fibers serially, so contention doesn't occur, but Crystal officially states that using &lt;code&gt;Atomic&lt;/code&gt; / &lt;code&gt;Mutex&lt;/code&gt; is still preferable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Atomic / Mutex / SpinLock
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Atomic
&lt;/h3&gt;

&lt;p&gt;A variable whose value can be read and written safely even when accessed simultaneously from multiple threads. It is a basic synchronization primitive for preventing race conditions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Directly mapped to LLVM atomic instructions&lt;/li&gt;
&lt;li&gt;compare_and_set, add, sub, get, set&lt;/li&gt;
&lt;li&gt;Same memory orders as C/C++: Acquire / Release / Relaxed, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Types that cannot be used with Atomic include value types such as structures (&lt;code&gt;Struct&lt;/code&gt;) and &lt;code&gt;StaticArray&lt;/code&gt;.&lt;/p&gt;
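
&lt;p&gt;As a minimal sketch of the operations listed above (the variable name &lt;code&gt;counter&lt;/code&gt; is only illustrative), &lt;code&gt;add&lt;/code&gt; returns the previous value, and &lt;code&gt;compare_and_set&lt;/code&gt; returns a tuple of the value it observed and a success flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;counter = Atomic(Int32).new(0)

counter.add(1)                # returns the old value (0); counter is now 1
counter.get                   # =&amp;gt; 1

# Succeeds only if the current value equals the expected one
counter.compare_and_set(1, 5) # =&amp;gt; {1, true}
counter.compare_and_set(1, 7) # =&amp;gt; {5, false}, the value stays 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;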

&lt;h3&gt;
  
  
  Mutex
&lt;/h3&gt;

&lt;p&gt;A lock that protects code regions (critical sections) that must not be executed simultaneously by multiple Fibers, controlling so that only one Fiber can execute at a time.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fiber-safe&lt;/li&gt;
&lt;li&gt;Three modes: Checked / Reentrant / Unchecked&lt;/li&gt;
&lt;li&gt;Re-entry prohibited by default (safe)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;mutex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;  
&lt;span class="n"&gt;shared_array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;  

&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;times&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;  
  &lt;span class="n"&gt;spawn&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;  
    &lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;synchronize&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;  
      &lt;span class="c1"&gt;# Only one Fiber executes at a time within this block  &lt;/span&gt;
      &lt;span class="n"&gt;shared_array&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;  
      &lt;span class="nb"&gt;sleep&lt;/span&gt; &lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seconds&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;  
  &lt;span class="k"&gt;end&lt;/span&gt;  
&lt;span class="k"&gt;end&lt;/span&gt;  

&lt;span class="nb"&gt;sleep&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;second&lt;/span&gt;  
&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;shared_array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example of manually locking/unlocking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;mutex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;  
&lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;  

&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;times&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;  
  &lt;span class="n"&gt;spawn&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;  
    &lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lock&lt;/span&gt;  
    &lt;span class="k"&gt;begin&lt;/span&gt;  
      &lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;  
      &lt;span class="nb"&gt;sleep&lt;/span&gt; &lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seconds&lt;/span&gt;  
    &lt;span class="k"&gt;ensure&lt;/span&gt;  
      &lt;span class="n"&gt;mutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unlock&lt;/span&gt;  &lt;span class="c1"&gt;# Always unlock  &lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;  
  &lt;span class="k"&gt;end&lt;/span&gt;  
&lt;span class="k"&gt;end&lt;/span&gt;  

&lt;span class="nb"&gt;sleep&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;second&lt;/span&gt;  
&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  SpinLock
&lt;/h3&gt;

&lt;p&gt;A lightweight lock specialized for very short-term locks. It continues to use CPU while waiting (spinning), so it's unsuitable for long-term locks.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For very short critical sections&lt;/li&gt;
&lt;li&gt;Only effective with preview_mt / win32&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;SpinLock is used in implementations such as &lt;code&gt;Crystal::Scheduler&lt;/code&gt;, &lt;code&gt;Crystal::ThreadLocalValue&lt;/code&gt;, &lt;code&gt;Crystal::Once&lt;/code&gt;, &lt;code&gt;Mutex&lt;/code&gt;, &lt;code&gt;WaitGroup&lt;/code&gt;, &lt;code&gt;EventLoop::Polling&lt;/code&gt;, and &lt;code&gt;Fiber::StackPool&lt;/code&gt;. There are almost no scenarios where users would directly use SpinLock in code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Areas to Be Careful About in the Standard Library
&lt;/h2&gt;

&lt;p&gt;The following are areas in the Crystal standard library that may not guarantee complete thread safety and require caution.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Qualifies as a Shared Variable Subject to Contention?
&lt;/h3&gt;

&lt;p&gt;While we've used the term "shared variable," Crystal doesn't have user-accessible global variables, so the most typical shared variable is a class variable.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Class variables: Always shared variables (determined by variable type)&lt;/li&gt;
&lt;li&gt;Instance variables and local variables: Determined by &lt;strong&gt;whether they are referenced from multiple Fibers or threads when spawned&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If captured by spawn, local variables can also become shared variables.&lt;/p&gt;
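
&lt;p&gt;A small sketch of this situation, with illustrative names: the local variable &lt;code&gt;counter&lt;/code&gt; is captured by every spawned block, so all Fibers update the same variable. The update is deliberately left unprotected here; it is only safe because the default context runs the Fibers one at a time, and in a Parallel context it would presumably need &lt;code&gt;Atomic&lt;/code&gt; or &lt;code&gt;Mutex&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;counter = 0                # a plain local variable
done = Channel(Nil).new

3.times do
  spawn do
    counter += 1           # the block captures `counter`, so every Fiber shares it
    done.send(nil)
  end
end

3.times { done.receive }   # wait until all three Fibers have finished
puts counter               # 3 under the default single-threaded context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;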

&lt;h3&gt;
  
  
  ENV
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The safety of Unix's getenv/setenv/unsetenv is environment-dependent&lt;/li&gt;
&lt;li&gt;Parallel modification is not recommended&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also discussed in the Crystal Forum:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://forum.crystal-lang.org/t/eliminate-environment-modifications/8533/29" rel="noopener noreferrer"&gt;https://forum.crystal-lang.org/t/eliminate-environment-modifications/8533/29&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Class Variables
&lt;/h3&gt;

&lt;p&gt;In Crystal, you can use the &lt;code&gt;@[ThreadLocal]&lt;/code&gt; annotation to make class variables thread-local.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Foo&lt;/span&gt;
  &lt;span class="nd"&gt;@[ThreadLocal]&lt;/span&gt;
  &lt;span class="vc"&gt;@@var&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;123&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;var&lt;/span&gt;
    &lt;span class="vc"&gt;@@var&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, each thread has an independent copy of &lt;code&gt;@@var&lt;/code&gt;, so changing the value in one thread doesn't affect other threads.&lt;/p&gt;

&lt;p&gt;Class variables without &lt;code&gt;@[ThreadLocal]&lt;/code&gt; are shared. In this case, you need to use &lt;code&gt;Atomic&lt;/code&gt; / &lt;code&gt;Mutex&lt;/code&gt; for parallel updates.&lt;/p&gt;
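
&lt;p&gt;One way to do this is to keep a &lt;code&gt;Mutex&lt;/code&gt; next to the shared class variable and route every access through it. The sketch below is only an illustration; the class name &lt;code&gt;Registry&lt;/code&gt; and its methods are hypothetical.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;class Registry
  @@lock = Mutex.new
  @@names = [] of String   # shared by all Fibers and threads

  # Writes go through the Mutex
  def self.add(name : String)
    @@lock.synchronize { @@names &amp;lt;&amp;lt; name }
  end

  # Reads take the same lock so they never see a half-finished update
  def self.size
    @@lock.synchronize { @@names.size }
  end
end

Registry.add("alice")
Registry.add("bob")
puts Registry.size # =&amp;gt; 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;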

&lt;h3&gt;
  
  
  IO (File, Socket, STDOUT/ERR)
&lt;/h3&gt;

&lt;p&gt;Safety may not be guaranteed when simultaneously operating on the same IO from multiple threads.&lt;/p&gt;

&lt;h3&gt;
  
  
  Logger
&lt;/h3&gt;

&lt;p&gt;Logger also uses IO internally. Writing to the same Logger from multiple threads may not be safe.&lt;/p&gt;

&lt;h3&gt;
  
  
  Report Any Issues You Find
&lt;/h3&gt;

&lt;p&gt;Crystal has far fewer users than languages like Python and Java, so user reports are especially valuable. You can help improve the language and its libraries by actively reporting bugs on the Crystal Forum and in GitHub issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cases Where Thread Should Be Used
&lt;/h2&gt;

&lt;p&gt;Thread directly represents the OS's native thread. It can be used when low-level control is needed.&lt;/p&gt;

&lt;p&gt;There are almost no cases where you should use Thread directly without using ExecutionContext.&lt;br&gt;
It may be an option in cases such as:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Want to parallelize compute-intensive tasks&lt;/li&gt;
&lt;li&gt;FFI is blocking and cannot suspend the Fiber (however, if the FFI function is CPU-intensive, blocking is considered desirable behavior)&lt;/li&gt;
&lt;li&gt;C library requires thread-local initialization&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Using Thread::Channel enables safe communication between threads.&lt;/p&gt;

&lt;h2&gt;
  
  
  FFI (C Library Calls) and Parallel Execution
&lt;/h2&gt;

&lt;p&gt;Since C libraries are not necessarily thread-safe, following patterns like these is considered safe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrap with &lt;code&gt;Mutex&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Isolate in &lt;code&gt;ExecutionContext::Isolated&lt;/code&gt; context&lt;/li&gt;
&lt;li&gt;Dedicated Thread + Thread::Channel&lt;/li&gt;
&lt;li&gt;Use ThreadLocal state&lt;/li&gt;
&lt;/ul&gt;
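
&lt;p&gt;The first pattern, wrapping calls with a &lt;code&gt;Mutex&lt;/code&gt;, might look like the sketch below. &lt;code&gt;LibDemo&lt;/code&gt; and &lt;code&gt;SafeDemo&lt;/code&gt; are hypothetical names, and &lt;code&gt;strlen&lt;/code&gt; is only a stand-in: it is thread-safe by itself, so imagine a C function with hidden global state in its place.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Stand-in binding; a real use case would bind a non-thread-safe C function
lib LibDemo
  fun strlen(str : UInt8*) : LibC::SizeT
end

module SafeDemo
  @@lock = Mutex.new

  # Every call into the C library goes through the same Mutex,
  # so only one Fiber is inside the C function at a time
  def self.length(text : String)
    @@lock.synchronize { LibDemo.strlen(text) }
  end
end

puts SafeDemo.length("crystal") # =&amp;gt; 7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;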

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Crystal's parallel execution is currently in the midst of a major evolution. In addition to &lt;code&gt;Fiber&lt;/code&gt;, which has been used for concurrent execution in I/O-bound processing, &lt;code&gt;ExecutionContext::Parallel&lt;/code&gt; now enables full-fledged parallel processing. Using &lt;code&gt;Atomic&lt;/code&gt; / &lt;code&gt;Mutex&lt;/code&gt; / &lt;code&gt;Channel&lt;/code&gt; / &lt;code&gt;WaitGroup&lt;/code&gt;, you can build safe parallel processing similar to Go. &lt;code&gt;ExecutionContext::Isolated&lt;/code&gt; is effective for GUI / FFI. &lt;code&gt;Thread&lt;/code&gt; can be used in special cases where OS threads need to be handled directly. Note that thread safety is not clearly guaranteed in some parts of the standard library.&lt;/p&gt;

&lt;h3&gt;
  
  
  Practical Guidelines for Parallel Execution in Crystal
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Leave I/O to &lt;code&gt;Fiber&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;No special action needed as Crystal's I/O model is tightly integrated with &lt;code&gt;Fiber&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Use Parallel or Thread for CPU-bound tasks

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ExecutionContext::Parallel&lt;/code&gt; is the first choice.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Protect shared state with &lt;code&gt;Atomic&lt;/code&gt; or &lt;code&gt;Mutex&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;Treat gray zones like &lt;code&gt;ENV&lt;/code&gt; and &lt;code&gt;Logger&lt;/code&gt; conservatively&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Test explicitly using &lt;code&gt;-Dpreview_mt&lt;/code&gt; and &lt;code&gt;-Dexecution_context&lt;/code&gt;
&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;This concludes the article. Thank you for reading to the end.&lt;/p&gt;

</description>
      <category>crystal</category>
    </item>
    <item>
      <title>Notes on Building CLI and GUI tools with Crystal</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Wed, 15 Oct 2025 03:25:14 +0000</pubDate>
      <link>https://dev.to/kojix2/notes-on-building-cli-and-gui-tools-with-crystal-4pcd</link>
      <guid>https://dev.to/kojix2/notes-on-building-cli-and-gui-tools-with-crystal-4pcd</guid>
      <description>&lt;p&gt;This post is just me writing down some vague thoughts that are floating around in my head right now.&lt;/p&gt;

&lt;p&gt;Sorry if you came here expecting a well-structured tutorial — but you know, if I try to organize everything perfectly, I’ll never publish anything.&lt;/p&gt;




&lt;p&gt;Crystal originated from the Ruby community, so there are many people who want to build web applications with it.&lt;/p&gt;

&lt;p&gt;However, the Crystal programming language itself can be described as “a statically compiled language with a Ruby-like syntax and a garbage collector, somewhat like C with GC and type inference.”&lt;/p&gt;

&lt;p&gt;It’s not necessarily optimized for web applications.&lt;/p&gt;

&lt;p&gt;Personally, I wanted to use Crystal for command-line tools and GUI apps.&lt;/p&gt;

&lt;p&gt;For some reason, though, there don’t seem to be many people building CLI tools in Crystal.&lt;/p&gt;

&lt;p&gt;The ecosystem for building and distributing binaries wasn’t very well developed for a long time.&lt;/p&gt;

&lt;p&gt;That used to be a real pain, but after &lt;a href="https://dev.to/kojix2/12-things-i-learned-writing-cli-tools-in-crystal-12if"&gt;gradually solving those issues&lt;/a&gt;, I think we’re now at the point where most CLI tools I want can be built and distributed in Crystal without much trouble.&lt;/p&gt;

&lt;p&gt;On the GUI side, the situation is similar — there aren’t many libraries available.&lt;/p&gt;

&lt;p&gt;But this isn’t unique to Crystal. GUI programming, in general, depends heavily on opaque, platform-specific APIs, which don’t always play nicely with open-source development.&lt;/p&gt;

&lt;p&gt;Still, I decided to work on it. I created &lt;a href="https://github.com/libui-ng/libui-ng" rel="noopener noreferrer"&gt;libui-ng&lt;/a&gt; bindings for both &lt;a href="https://github.com/kojix2/LibUI" rel="noopener noreferrer"&gt;Ruby&lt;/a&gt; and &lt;a href="https://github.com/kojix2/uing" rel="noopener noreferrer"&gt;Crystal&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As it turned out, &lt;a href="https://dev.to/kojix2/libui-and-garbage-collection-challenges-in-creating-ruby-and-crystal-bindings-9m6"&gt;libui-ng doesn’t work very well with garbage-collected languages&lt;/a&gt;, but I managed to make it usable anyway.&lt;/p&gt;

&lt;p&gt;Then I got curious about &lt;a href="https://tauri.app/" rel="noopener noreferrer"&gt;Tauri&lt;/a&gt; and &lt;a href="https://www.electronjs.org" rel="noopener noreferrer"&gt;Electron&lt;/a&gt; — the now-famous WebView-based app frameworks.&lt;/p&gt;

&lt;p&gt;Personally, I can barely read JavaScript, so I had no real interest in those at first, but their popularity made me curious.&lt;/p&gt;

&lt;p&gt;Crystal also has a &lt;a href="https://github.com/naqvis/webview" rel="noopener noreferrer"&gt;WebView binding&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And as I mentioned earlier, web app development in the Crystal ecosystem is quite active.&lt;/p&gt;

&lt;p&gt;So I decided to give it a try.&lt;/p&gt;

&lt;p&gt;I learned that “WebView” isn’t a single library — each OS (Windows, Linux, macOS) provides its own.&lt;/p&gt;

&lt;p&gt;Projects like &lt;a href="https://github.com/webview/webview" rel="noopener noreferrer"&gt;webview/webview&lt;/a&gt; and Tauri’s &lt;a href="https://github.com/tauri-apps/wry" rel="noopener noreferrer"&gt;wry&lt;/a&gt; act as unifying layers over these platform-specific APIs.&lt;/p&gt;

&lt;p&gt;Tauri itself uses WebView under the hood while also providing a framework to handle security and integration with Rust backends.&lt;/p&gt;

&lt;p&gt;Maybe it’s possible to use TypeScript and other frontend tools with Crystal too, but personally, I prefer the more old-fashioned approach — something like &lt;a href="https://github.com/kemalcr/kemal" rel="noopener noreferrer"&gt;Kemal&lt;/a&gt; + &lt;a href="https://crystal-lang.org/api/ECR.html" rel="noopener noreferrer"&gt;ECR&lt;/a&gt;, the “classic amateur” way.&lt;/p&gt;

&lt;p&gt;When I actually started building an app with Crystal + WebView, I discovered a few things.&lt;/p&gt;

&lt;p&gt;First, you need to pay attention to event loops and thread management.&lt;/p&gt;

&lt;p&gt;The WebView itself runs in a separate process, and at the same time you need to run a Kemal server.&lt;/p&gt;

&lt;p&gt;That means you often have to make it multithreaded and carefully manage your execution contexts or Fibers — otherwise, things simply won’t run correctly.&lt;/p&gt;

&lt;p&gt;Then there’s the build, linking, and packaging pain.&lt;/p&gt;

&lt;p&gt;I sent &lt;a href="https://github.com/naqvis/webview/pulls?q=+author%3Akojix2" rel="noopener noreferrer"&gt;a few pull requests&lt;/a&gt; to the Crystal WebView project, which helped a bit, but building on MinGW is still rough.&lt;/p&gt;

&lt;p&gt;MSVC technically works, but it’s just too tedious to deal with, so I decided to stay away from it.&lt;/p&gt;

&lt;p&gt;Bundling shared libraries is also tricky.&lt;/p&gt;

&lt;p&gt;I’d prefer to lean toward static linking whenever possible, but depending on licensing and security update concerns, it’s sometimes better to link against system or bundled shared libraries.&lt;/p&gt;

&lt;p&gt;Even if you get the build and linking sorted out, packaging is still painful — creating &lt;a href="https://en.wikipedia.org/wiki/Application_package" rel="noopener noreferrer"&gt;application packages&lt;/a&gt;, &lt;a href="https://en.wikipedia.org/wiki/Apple_Disk_Image" rel="noopener noreferrer"&gt;Apple disk images (DMG)&lt;/a&gt;, or Windows installers with &lt;a href="https://jrsoftware.org/isinfo.php" rel="noopener noreferrer"&gt;Inno Setup&lt;/a&gt;, or even &lt;code&gt;.deb&lt;/code&gt; packages.&lt;/p&gt;

&lt;p&gt;I discovered tools like &lt;a href="https://github.com/jordansissel/fpm" rel="noopener noreferrer"&gt;fpm&lt;/a&gt;, which are really useful, but in the end, I still end up asking AI to help me write custom GitHub Actions YAML and shell scripts.&lt;/p&gt;

&lt;p&gt;And then, once you finally have a working binary, Windows or macOS antivirus software will happily flag it as suspicious.&lt;/p&gt;

&lt;p&gt;Maybe for people doing this professionally, all this doesn’t sound like a big deal, but as someone doing it for fun, it’s a lot of work.&lt;br&gt;
Even so, after all the pain, I’ve started to feel like — maybe, just maybe — this setup is actually pretty cool.&lt;/p&gt;




&lt;p&gt;This post was translated from the original Japanese version using ChatGPT.&lt;br&gt;&lt;br&gt;
You can read the original post &lt;a href="https://qiita.com/kojix2/items/e9bb62e9ff9f966b36a4" rel="noopener noreferrer"&gt;here&lt;/a&gt; [JA]&lt;/p&gt;

</description>
      <category>crystal</category>
      <category>webview</category>
    </item>
    <item>
      <title>libui and Garbage Collection - Challenges in Creating Ruby and Crystal Bindings</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Fri, 26 Sep 2025 02:15:46 +0000</pubDate>
      <link>https://dev.to/kojix2/libui-and-garbage-collection-challenges-in-creating-ruby-and-crystal-bindings-9m6</link>
      <guid>https://dev.to/kojix2/libui-and-garbage-collection-challenges-in-creating-ruby-and-crystal-bindings-9m6</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;libui is a GUI library that supports the three major operating systems: Windows, macOS, and Linux (currently, the successor project is libui-ng). Internally, it contains three different libraries that call native APIs, unified under a single &lt;code&gt;ui.h&lt;/code&gt; header file to provide similar UI functionality across all operating systems. It can also be easily used from other languages through FFI (Foreign Function Interface). While development has slowed somewhat recently, there are few similar libraries available, and libui continues to maintain its unique value.&lt;/p&gt;

&lt;h2&gt;
  
  
  libui Bindings
&lt;/h2&gt;

&lt;p&gt;I have been creating &lt;a href="https://github.com/kojix2/libui" rel="noopener noreferrer"&gt;Ruby bindings&lt;/a&gt; and &lt;a href="https://github.com/kojix2/uing" rel="noopener noreferrer"&gt;Crystal bindings&lt;/a&gt; for libui. Through this process, I have come to realize how difficult it is to combine libui with garbage collection.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem of Disappearing Controls and Callback Functions
&lt;/h2&gt;

&lt;p&gt;Creating Ruby or Crystal bindings for libui is not particularly difficult. The work of checking function signatures and writing matching low-level bindings can be done mechanically.&lt;/p&gt;

&lt;p&gt;However, when you call these low-level APIs to create simple applications, the following problems occur with a certain probability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Controls disappear and memory access violations occur&lt;/li&gt;
&lt;li&gt;Callbacks disappear and memory access violations occur&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both Ruby and Crystal are languages that use garbage collection (GC), so memory that is determined to be unused gets reclaimed. As a result, pointers and callback functions that should be used in the future by the GUI main loop are mistakenly freed by the GC.&lt;/p&gt;

&lt;p&gt;In GC languages, the timing of memory deallocation is controlled indirectly through references.&lt;/p&gt;

&lt;p&gt;In Ruby, callback functions are unconditionally stored in a dedicated array. This effectively creates a memory leak (old callbacks remain in the array even after new ones are added), but since callback functions are usually finite in number in GUI applications, this is not a practical problem.&lt;/p&gt;

&lt;p&gt;Crystal uses a more complex management approach. Each callback function is tied to the instance of its related control. For example, a callback function that fires when a button is pressed is owned by that button. Additionally, the nested relationships of controls themselves are reproduced as an ownership tree. For example, a Window contains a Box, and the Box holds a Label and Button.&lt;/p&gt;

&lt;p&gt;By using this ownership tree, we can significantly reduce the problem of incorrect collection by the GC.&lt;/p&gt;

&lt;p&gt;By the way, why does Crystal's GC collect pointers even though controls may be referenced later in the main loop? I don't have a clear understanding of this point, but it's possible that memory tracking becomes difficult when closures are boxed.&lt;/p&gt;

&lt;h2&gt;
  
  
  libui's Memory Management Rules
&lt;/h2&gt;

&lt;p&gt;libui is a C library designed for users to manage memory themselves. However, in practice, it introduces a mechanism where "when a parent control is freed, the memory of child controls is also freed." The controls that can be parent controls are Window, Box, Grid, Group, Tab, and Form.&lt;/p&gt;

&lt;p&gt;When you &lt;code&gt;destroy&lt;/code&gt; these, child controls are freed first, then the parent itself is freed. Therefore, in actual operation, you often free child controls collectively by destroying the Window.&lt;/p&gt;

&lt;p&gt;The problem is that on the Crystal side, we cannot detect such deallocation within native libraries. NULL checks might help us guess immediately after memory deallocation (libui sets pointers to NULL before deallocation), but this is unreliable.&lt;/p&gt;

&lt;p&gt;Window deallocation can happen automatically. When the [x] button in the Window's title bar is clicked, a callback function is triggered by &lt;code&gt;uiWindowOnClosing&lt;/code&gt;, and if the return value is true, the Window's destroy is automatically triggered.&lt;/p&gt;

&lt;p&gt;In contrast, &lt;code&gt;uiOnShouldQuit&lt;/code&gt; triggered from the Quit option in the menu bar represents application termination, so it does not automatically trigger destroy for the window. The user must destroy the Window themselves and call uiQuit.&lt;/p&gt;

&lt;h2&gt;
  
  
  libui's Memory Leak Detection Mechanism
&lt;/h2&gt;

&lt;p&gt;libui has a built-in mechanism for detecting memory leaks. This is a very useful feature, but it often doesn't work well with GC languages. This is because in GC, the timing of memory deallocation is indefinite, and we cannot guarantee that all memory has been freed at the time of checking. Therefore, implementations that hook into GC's &lt;code&gt;finalize&lt;/code&gt; to perform deallocation should be avoided.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table Deallocation Procedure
&lt;/h2&gt;

&lt;p&gt;Table is based on Model-View architecture, with TableModel and Table separated. A TableModel can only be freed after all Tables using that model have been destroyed. Therefore, the deallocation procedure is as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Remove the Table from its parent control&lt;/li&gt;
&lt;li&gt;Explicitly destroy the Table&lt;/li&gt;
&lt;li&gt;Finally destroy the TableModel&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Area Deallocation Procedure
&lt;/h2&gt;

&lt;p&gt;Unlike Table, Area can be handled by simply destroying the control.&lt;/p&gt;

&lt;h2&gt;
  
  
  MultilineEntry Deallocation Procedure
&lt;/h2&gt;

&lt;p&gt;The cause is still under investigation, but on macOS there appear to be cases where problems occur unless you remove the MultilineEntry from its parent control and destroy it individually, as with Table.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;When using libui (libui-ng), there are many important considerations regarding memory management, especially deallocation.&lt;/p&gt;

&lt;p&gt;In languages that use garbage collection like Crystal and Ruby, you normally don't need to worry about memory. Even with C language bindings, manual memory management often becomes unnecessary by using deallocation callback functions like &lt;code&gt;finalize&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;However, I learned that with libraries like GUI libraries that have interactive operations where timing and synchronization are important, there are cases where you cannot rely too much on GC and must manually free memory at appropriate times.&lt;/p&gt;

&lt;p&gt;In such cases, Ruby and Crystal often provide APIs that use blocks based on RAII (Resource Acquisition Is Initialization) concepts. This can handle more than half of the cases.&lt;/p&gt;
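
&lt;p&gt;A minimal sketch of such a block-based API in Crystal follows. The &lt;code&gt;Handle&lt;/code&gt; class is hypothetical and does not come from the libui bindings; in a real binding, &lt;code&gt;close&lt;/code&gt; would call the native destroy function instead of setting a flag.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;class Handle
  getter? closed = false

  # Yields a fresh handle and releases it when the block exits,
  # even if the block raises
  def self.open(&amp;amp;)
    handle = new
    begin
      yield handle
    ensure
      handle.close
    end
  end

  def close
    @closed = true
  end
end

Handle.open do |handle|
  puts handle.closed? # =&amp;gt; false (still open inside the block)
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the release happens at a known point (the end of the block) rather than whenever the GC runs, this style fits libraries where deallocation order matters.&lt;/p&gt;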

&lt;p&gt;There seem to be cases that are difficult to handle with this alone, but I am still learning and experimenting through trial and error.&lt;/p&gt;




&lt;p&gt;Thank you for reading. This article was translated from Japanese to English by Claude Sonnet 4.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://qiita.com/kojix2/items/de37dfa5f00926499c37" rel="noopener noreferrer"&gt;libui と ガベージコレクション - Ruby と Crystal のバインディングを作って感じた難しさ&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>libui</category>
      <category>crystal</category>
      <category>ruby</category>
    </item>
    <item>
      <title>12 Things I Learned Writing CLI Tools in Crystal</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Mon, 22 Sep 2025 05:45:49 +0000</pubDate>
      <link>https://dev.to/kojix2/12-things-i-learned-writing-cli-tools-in-crystal-12if</link>
      <guid>https://dev.to/kojix2/12-things-i-learned-writing-cli-tools-in-crystal-12if</guid>
      <description>&lt;p&gt;I love the Crystal programming language. For the past two or three years, I have been building command-line tools with it. During this time, I often compared it with Ruby, and I encountered many differences, discoveries, and obstacles. In this article, I will share them.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Similarity to Ruby
&lt;/h2&gt;

&lt;p&gt;Crystal looks very similar to Ruby. Many common Ruby idioms also work in Crystal. Crystal is statically typed, but most of the time you do not need to write types explicitly. Type inference will do the work for you.&lt;/p&gt;
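
&lt;p&gt;A small sketch of what this looks like in practice. The variable names are only for illustration; note that no type annotation is written anywhere.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Reads like Ruby, but every type is inferred at compile time.
names = ["crystal", "ruby"]
lengths = names.map(&amp;amp;.size)

puts typeof(names)   # the compiler inferred Array(String)
puts typeof(lengths) # and Array(Int32)
puts lengths.sum
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;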

&lt;h2&gt;
  
  
  2. Use DeepWiki
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://deepwiki.com/crystal-lang/crystal" rel="noopener noreferrer"&gt;DeepWiki&lt;/a&gt; is very useful for learning Crystal. For a niche language, it is one of the best resources. You can even ask questions in your native language.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Arrays and Hashes cannot mix types
&lt;/h2&gt;

&lt;p&gt;In Crystal, you cannot freely mix different types in an &lt;code&gt;Array&lt;/code&gt; or &lt;code&gt;Hash&lt;/code&gt;. Ruby allows this, but Crystal does not. You can use union types, but usually it is better to avoid them. Instead, consider one of these options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Make a class or &lt;a href="https://crystal-lang.org/reference/syntax_and_semantics/structs.html" rel="noopener noreferrer"&gt;struct&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use a &lt;a href="https://crystal-lang.org/reference/syntax_and_semantics/structs.html#records" rel="noopener noreferrer"&gt;record&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use a &lt;a href="https://crystal-lang.org/reference/syntax_and_semantics/literals/tuple.html" rel="noopener noreferrer"&gt;Tuple&lt;/a&gt; for temporary data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At first this may feel inconvenient, but you get used to it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Array(Int32 | String | Symbol) - not recommended&lt;/span&gt;
&lt;span class="n"&gt;arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"two"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:three&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# OK: Tuple for fixed positions&lt;/span&gt;
&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"two"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:three&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# OK: record for structured data&lt;/span&gt;
&lt;span class="kp"&gt;record&lt;/span&gt; &lt;span class="no"&gt;Item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;name&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Symbol&lt;/span&gt;
&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="no"&gt;Item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"apple"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="ss"&gt;:fruit&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="no"&gt;Item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"orange"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:fruit&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. No &lt;code&gt;eval&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Crystal does not have &lt;code&gt;eval&lt;/code&gt;. This is one big difference from Ruby.&lt;br&gt;
If you really need dynamic evaluation, you should use Ruby. Another choice is to embed mruby or use a library like &lt;a href="https://github.com/Anyolite/anyolite" rel="noopener noreferrer"&gt;Anyolite&lt;/a&gt;. Crystal itself has an interpreter, but it is not practical for this purpose and is slower than Ruby or mruby.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Ruby&lt;/span&gt;
&lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"1 + 2"&lt;/span&gt;
&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="nb"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# =&amp;gt; 3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Crystal has no eval&lt;/span&gt;
&lt;span class="c1"&gt;# You must design differently&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Method overloading
&lt;/h2&gt;

&lt;p&gt;In Ruby, it is common to branch on the argument type inside one method.&lt;br&gt;
In Crystal, it is more natural to use method overloading. This makes the code clearer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;
  &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;
  &lt;span class="n"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;# =&amp;gt; 144&lt;/span&gt;
&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;square&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"12"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# =&amp;gt; 144&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  6. Return types should be consistent
&lt;/h2&gt;

&lt;p&gt;In Ruby, a method can return values of different types. In Crystal, if the return type is not clear, you will run into trouble. If you want to return different types, you should split the method. You can use a union type, but it is not recommended.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="c1"&gt;# not recommended&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;maybe_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flag&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Bool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="no"&gt;String&lt;/span&gt;
  &lt;span class="n"&gt;flag&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"forty-two"&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;value_int&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;
  &lt;span class="mi"&gt;42&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;value_str&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;String&lt;/span&gt;
  &lt;span class="s2"&gt;"forty-two"&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  7. Handling Nil
&lt;/h2&gt;

&lt;p&gt;Pay attention to whether a variable can be &lt;code&gt;Nil&lt;/code&gt;.&lt;br&gt;
If it can, you need to handle it with &lt;code&gt;not_nil!&lt;/code&gt;, &lt;code&gt;if val = maybe_val&lt;/code&gt;, or &lt;code&gt;try&lt;/code&gt;, which plays the role of Ruby's safe navigation operator.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="nb"&gt;name&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kp"&gt;nil&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;name&lt;/span&gt;
  &lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upcase&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;
  &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="s2"&gt;"name is nil"&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
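
&lt;p&gt;For comparison, here is a minimal sketch of the &lt;code&gt;try&lt;/code&gt; style. The &lt;code&gt;shout&lt;/code&gt; method is a made-up example; &lt;code&gt;try&lt;/code&gt; runs the block only when the receiver is not &lt;code&gt;nil&lt;/code&gt;, and &lt;code&gt;||&lt;/code&gt; supplies a fallback.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Hypothetical helper: upcase the name if there is one, otherwise use a default.
def shout(name : String?) : String
  name.try(&amp;amp;.upcase) || "(no name)"
end

puts shout("crystal")
puts shout(nil)

# not_nil! is the blunt alternative: it raises at runtime if the value is nil,
# so it is best kept for cases where nil would be a bug anyway.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;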



&lt;h2&gt;
  
  
  8. Garbage collection
&lt;/h2&gt;

&lt;p&gt;Crystal uses LLVM and relies on an external GC, the Boehm collector (&lt;code&gt;libgc&lt;/code&gt;).&lt;br&gt;
Performance is often close to Rust or Nim, but memory profiling and tuning can be difficult. Also, the timing of garbage collection is not predictable, so Crystal may not be suitable for real-time systems.&lt;/p&gt;
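
&lt;p&gt;The standard library does expose some basic numbers through the &lt;code&gt;GC&lt;/code&gt; module. The sketch below is only a starting point, not a profiling method, and the values it prints depend on the platform and the run.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;stats = GC.stats
puts "heap size:  #{stats.heap_size} bytes"
puts "free bytes: #{stats.free_bytes} bytes"

# Allocate some short-lived strings, force a collection, and look again.
100_000.times { "x" * 100 }
GC.collect
puts "after GC:   #{GC.stats.heap_size} bytes"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;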

&lt;h2&gt;
  
  
  9. Asynchronous I/O
&lt;/h2&gt;

&lt;p&gt;Asynchronous I/O is available by default. Some developers feel it is easier to use than in Rust. &lt;/p&gt;
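
&lt;p&gt;A minimal sketch of the model, using only fibers and a channel from the standard library. Here &lt;code&gt;sleep&lt;/code&gt; stands in for a blocking I/O call; the task names are made up.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;channel = Channel(String).new

3.times do |i|
  spawn do
    # While this fiber waits, the scheduler runs the other fibers.
    sleep 10.milliseconds
    channel.send "task #{i} done"
  end
end

# Receiving blocks only the current fiber until a result arrives.
results = Array.new(3) { channel.receive }
puts results.sort.join(", ")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;There is no &lt;code&gt;async&lt;/code&gt; / &lt;code&gt;await&lt;/code&gt; annotation anywhere: code that waits on I/O is written in a plain blocking style.&lt;/p&gt;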

&lt;h2&gt;
  
  
  10. Linking when distributing
&lt;/h2&gt;

&lt;p&gt;Crystal programs are usually linked with &lt;code&gt;libgc&lt;/code&gt; and other libraries such as &lt;code&gt;libpcre2&lt;/code&gt;. Be careful when distributing binaries.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linux: You can build statically linked binaries with GitHub Actions + Docker + musl&lt;/li&gt;
&lt;li&gt;macOS: You can prepare a Homebrew Tap, or build portable binaries with static linking for &lt;code&gt;libgc&lt;/code&gt;, &lt;code&gt;libpcre2&lt;/code&gt;, and others&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;See also: the GitHub Actions &lt;a href="https://github.com/kojix2/lolcat.cr/blob/main/.github/workflows/build.yml" rel="noopener noreferrer"&gt;workflow&lt;/a&gt; in &lt;a href="https://github.com/kojix2/lolcat.cr" rel="noopener noreferrer"&gt;lolcat.cr&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  11. Windows support
&lt;/h2&gt;

&lt;p&gt;Crystal now works on &lt;a href="https://crystal-lang.org/install/on_windows/" rel="noopener noreferrer"&gt;Windows (MSVC / MinGW64)&lt;/a&gt; more stably than before. Parallel execution also works. However, solving C library dependencies can still be painful. If you are not familiar with Windows, you may need to ask AI for help.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;del&gt;12. Limitations of OptionParser&lt;/del&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;del&gt;The standard &lt;code&gt;OptionParser&lt;/code&gt; does not support combined short options.&lt;br&gt;
So &lt;code&gt;ls -l -h&lt;/code&gt; works, but &lt;code&gt;ls -lh&lt;/code&gt; does not.&lt;br&gt;
I plan to create a pull request to fix this in the future.&lt;/del&gt;&lt;/p&gt;

&lt;p&gt;Update: This has already been resolved as of February 2026. Starting from version 1.20, short option bundling will be enabled!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/crystal-lang/crystal/pull/16563" rel="noopener noreferrer"&gt;https://github.com/crystal-lang/crystal/pull/16563&lt;/a&gt;&lt;/p&gt;
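
&lt;p&gt;For reference, here is a minimal &lt;code&gt;OptionParser&lt;/code&gt; sketch. The flags and descriptions are invented for illustration, and the arguments are passed as separate short options, which is the form that worked before the change as well.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;require "option_parser"

long = false
human = false

# An explicit argument list is used here instead of ARGV so the result is fixed.
OptionParser.parse(["-l", "-h"]) do |parser|
  parser.on("-l", "Use a long listing format") { long = true }
  parser.on("-h", "Print human-readable sizes") { human = true }
end

puts "long=#{long} human=#{human}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;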

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Writing command-line tools in Crystal is sometimes painful. But at the same time, you learn a lot. I believe the “best days” of the Crystal language are not in the past or present, but in the future.&lt;/p&gt;




&lt;p&gt;This post was originally based on my reply to a thread on Reddit, then expanded into a Japanese article on Qiita, and now translated into English with the help of ChatGPT.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.reddit.com/r/crystal_programming/comments/1nhwzy3/comment/nehjvnu/?utm_source=share&amp;amp;utm_medium=web3x&amp;amp;utm_name=web3xcss&amp;amp;utm_term=1&amp;amp;utm_content=share_button" rel="noopener noreferrer"&gt;Considering rewriting my CLI tool from Ruby to Crystal - what should I watch out for?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://qiita.com/kojix2/items/c305d46aafd2b51a153c" rel="noopener noreferrer"&gt;Crystalでコマンドラインツールを作って気づいた12のこと&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>crystal</category>
    </item>
    <item>
      <title>Embedding the Crystal Compiler in Your Program</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Sat, 09 Aug 2025 09:46:26 +0000</pubDate>
      <link>https://dev.to/kojix2/embedding-the-crystal-compiler-in-your-program-2ief</link>
      <guid>https://dev.to/kojix2/embedding-the-crystal-compiler-in-your-program-2ief</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://forum.crystal-lang.org/t/how-can-i-use-crystal-compiler-as-a-library/6162/8" rel="noopener noreferrer"&gt;The Crystal compiler can be used as a library.&lt;/a&gt;&lt;br&gt;
This document explains how to set it up and use it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Creating the Project
&lt;/h2&gt;

&lt;p&gt;First, create a new Crystal project.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crystal init app duck_egg
&lt;span class="nb"&gt;cd &lt;/span&gt;duck_egg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Editing &lt;code&gt;shard.yml&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Edit the &lt;code&gt;shard.yml&lt;/code&gt; file as follows.&lt;br&gt;
In this example, we add &lt;code&gt;markd&lt;/code&gt; and &lt;code&gt;reply&lt;/code&gt; to the &lt;code&gt;dependencies&lt;/code&gt; section, because the compiler sources depend on both shards.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;duck_egg&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.1.0&lt;/span&gt;

&lt;span class="na"&gt;targets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;🥚&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;src/duck_egg.cr&lt;/span&gt;

&lt;span class="na"&gt;dependencies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;markd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;github&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;icyleaf/markd&lt;/span&gt;
  &lt;span class="na"&gt;reply&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;github&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;I3oris/reply&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Creating &lt;code&gt;duck_egg.cr&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Create &lt;code&gt;src/duck_egg.cr&lt;/code&gt; and add the following code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"compiler/requires"&lt;/span&gt;

&lt;span class="no"&gt;BIRDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🐔"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"cluck!"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🐓"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"cock-a-doodle-doo"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🦃"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"gobble"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🦆"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"quack"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🦉"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"hoot"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🦜"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"squawk"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🕊"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"coo"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🦢"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"honk"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🦩"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"brrrrt"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🐧"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"honk honk"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🦤"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"boop"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🦕"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"Bwooooon!!"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"🦖"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"Raaaaawr!!"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;bird&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sound&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;BIRDS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sample&lt;/span&gt;

&lt;span class="n"&gt;compiler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Crystal&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Compiler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;
&lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Crystal&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Compiler&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bird&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sx"&gt;%Q(puts "&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;bird&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sx"&gt;  &amp;lt; &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;sound&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sx"&gt;")&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;compiler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bird&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this program, the Crystal compiler is embedded in the target 🥚.&lt;br&gt;
When 🥚 is executed, a random bird is selected.&lt;br&gt;
The embedded compiler generates a binary that displays the bird and its sound.&lt;/p&gt;
&lt;h2&gt;
  
  
  Building and Running
&lt;/h2&gt;

&lt;p&gt;First, build the program.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;shards build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, check the &lt;code&gt;CRYSTAL_PATH&lt;/code&gt; environment variable to find the location of the Crystal standard library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crystal &lt;span class="nb"&gt;env&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Crystal compiler requires the standard library even for very simple code such as &lt;code&gt;puts 0&lt;/code&gt;.&lt;br&gt;
Therefore, &lt;code&gt;CRYSTAL_PATH&lt;/code&gt; must be set to include the path to the standard library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CRYSTAL_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;lib:/usr/local/bin/../share/crystal/src
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the program:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bin/🥚
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🦖
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the generated binary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./🦖
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🦖  &amp;lt; Raaaaawr!!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;By using the Crystal compiler as a library, you can generate and compile code dynamically. This technique can be applied in many interesting ways.&lt;/p&gt;

</description>
      <category>crystal</category>
    </item>
    <item>
      <title>Easily Visualize Debian Package Dependencies with debtree</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Fri, 08 Aug 2025 05:38:33 +0000</pubDate>
      <link>https://dev.to/kojix2/easily-visualize-debian-package-dependencies-with-debtree-2g1n</link>
      <guid>https://dev.to/kojix2/easily-visualize-debian-package-dependencies-with-debtree-2g1n</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Sometimes you might want a quick and easy way to visualize and understand which packages a given package depends on. With the &lt;code&gt;debtree&lt;/code&gt; package and &lt;code&gt;graphviz&lt;/code&gt;, you can do this in just a few steps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;p&gt;Install both &lt;code&gt;debtree&lt;/code&gt; and &lt;code&gt;graphviz&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;debtree graphviz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Visualizing dependencies
&lt;/h2&gt;

&lt;p&gt;First, confirm that the package you want to visualize is installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dpkg &lt;span class="nt"&gt;-l&lt;/span&gt; | &lt;span class="nb"&gt;grep &lt;/span&gt;ufw &lt;span class="c"&gt;# Check if it exists&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can easily visualize the packages it depends on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;debtree ufw | dot &lt;span class="nt"&gt;-T&lt;/span&gt; png &lt;span class="nt"&gt;-o&lt;/span&gt; ufw_deps.png
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9o8nokj5vfvfbz44wht.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9o8nokj5vfvfbz44wht.png" alt="ufw_deps.png" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, I specified &lt;code&gt;-T png&lt;/code&gt; to output a PNG image for embedding in Qiita, but you can choose from many other formats like &lt;code&gt;svg&lt;/code&gt;.&lt;br&gt;
If you have a desktop environment, you can also visualize it instantly using &lt;code&gt;x11&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;debtree ufw | dot &lt;span class="nt"&gt;-T&lt;/span&gt; x11
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F84azcv4pnbbg5e95vj63.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F84azcv4pnbbg5e95vj63.png" alt="ufw_x11.png" width="670" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualizing reverse dependencies
&lt;/h2&gt;

&lt;p&gt;To visualize reverse dependencies, you can use the &lt;code&gt;-R&lt;/code&gt; / &lt;code&gt;--show-rdeps&lt;/code&gt; option.&lt;/p&gt;

&lt;p&gt;However, using &lt;code&gt;-R&lt;/code&gt; alone will also display many packages that are not actually installed on your system.&lt;br&gt;
For a cleaner view, add the &lt;code&gt;-I&lt;/code&gt; / &lt;code&gt;--show-installed&lt;/code&gt; option to limit the output to installed packages only:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;debtree &lt;span class="nt"&gt;-R&lt;/span&gt; &lt;span class="nt"&gt;-I&lt;/span&gt; iptables | dot &lt;span class="nt"&gt;-T&lt;/span&gt; x11
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From this, you can see that &lt;code&gt;docker-ce&lt;/code&gt; and &lt;code&gt;ubuntu-standard&lt;/code&gt; depend on &lt;code&gt;iptables&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqoqxc0n8ztgu1az4du4w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqoqxc0n8ztgu1az4du4w.png" alt="iptables_reve_deps" width="800" height="539"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s it for today! &lt;/p&gt;

</description>
      <category>debian</category>
      <category>ubuntu</category>
      <category>debtree</category>
      <category>graphviz</category>
    </item>
    <item>
      <title>Writing SIMD in Crystal with Inline Assembly</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Thu, 07 Aug 2025 01:28:30 +0000</pubDate>
      <link>https://dev.to/kojix2/writing-simd-in-crystal-with-inline-assembly-1lkp</link>
      <guid>https://dev.to/kojix2/writing-simd-in-crystal-with-inline-assembly-1lkp</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In this article, we explore how to write SIMD instructions (SSE for x86_64 and NEON for AArch64) using inline assembly in the Crystal programming language.&lt;br&gt;
Crystal uses LLVM as its backend, but &lt;a href="https://github.com/crystal-lang/crystal/issues/3057" rel="noopener noreferrer"&gt;it doesn’t yet fully optimize with SIMD&lt;/a&gt;.&lt;br&gt;
This is not a performance tuning guide, but rather a fun exploration into low-level programming with Crystal.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;code&gt;asm&lt;/code&gt; Syntax
&lt;/h2&gt;

&lt;p&gt;Crystal provides the &lt;code&gt;asm&lt;/code&gt; keyword for writing inline assembly. The syntax is based on LLVM's integrated assembler.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"template"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;clobbers&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each section:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;template&lt;/code&gt;: LLVM-style assembly code&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;outputs&lt;/code&gt;: Output operands&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;inputs&lt;/code&gt;: Input operands&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;clobbers&lt;/code&gt;: Registers that the assembly modifies, plus &lt;code&gt;"memory"&lt;/code&gt; when it writes to memory&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;flags&lt;/code&gt;: Optional (e.g., &lt;code&gt;"volatile"&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;For a detailed explanation, see the &lt;a href="https://crystal-lang.org/reference/latest/syntax_and_semantics/asm.html" rel="noopener noreferrer"&gt;official docs&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
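
&lt;p&gt;As a minimal sketch of how the sections fit together, the snippet below writes an immediate value into a Crystal variable. It assumes an x86_64 target and is modeled on the example in the official docs; the variable name &lt;code&gt;dst&lt;/code&gt; is only illustrative. &lt;code&gt;"=r"(dst)&lt;/code&gt; is an output operand placed in a general-purpose register, &lt;code&gt;$0&lt;/code&gt; refers to that operand, and &lt;code&gt;$$&lt;/code&gt; escapes a literal &lt;code&gt;$&lt;/code&gt; for the immediate.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;dst = 0

# x86_64 only: move the immediate 1234 into the register chosen for dst
asm("mov $$1234, $0" : "=r"(dst))

puts dst
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Only the template and one output are used here; the remaining sections can be left out when they are empty.&lt;/p&gt;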

&lt;h2&gt;
  
  
  Types of SIMD Instructions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SSE / AVX&lt;/strong&gt; for Intel and AMD CPUs (x86_64)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NEON&lt;/strong&gt; for ARM CPUs (like Apple Silicon)&lt;/li&gt;
&lt;/ul&gt;
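
&lt;p&gt;Because each example in this article comes in an SSE and a NEON variant, one way to keep both in a single file is to branch on the target at compile time with &lt;code&gt;flag?&lt;/code&gt;. The following is a minimal sketch; the constant name &lt;code&gt;SIMD_BACKEND&lt;/code&gt; is hypothetical and not part of the original code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Pick a label for the instruction set at compile time.
# SIMD_BACKEND is a hypothetical name used only for this sketch.
{% if flag?(:x86_64) %}
  SIMD_BACKEND = "SSE"
{% elsif flag?(:aarch64) %}
  SIMD_BACKEND = "NEON"
{% else %}
  SIMD_BACKEND = "scalar"
{% end %}

puts SIMD_BACKEND
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The same &lt;code&gt;{% if %}&lt;/code&gt; branches could wrap the architecture-specific method definitions instead of a constant.&lt;/p&gt;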

&lt;h2&gt;
  
  
  Types of Registers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Registers Used in x86_64
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;General-purpose: &lt;code&gt;rax&lt;/code&gt;, &lt;code&gt;rbx&lt;/code&gt;, &lt;code&gt;rcx&lt;/code&gt;, &lt;code&gt;rdx&lt;/code&gt;, &lt;code&gt;rsi&lt;/code&gt;, &lt;code&gt;rdi&lt;/code&gt;, &lt;code&gt;rsp&lt;/code&gt;, &lt;code&gt;rbp&lt;/code&gt;, &lt;code&gt;r8–r15&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;SIMD:&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Width&lt;/th&gt;
&lt;th&gt;Instruction Set&lt;/th&gt;
&lt;th&gt;Usage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;xmm0–xmm15&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;128-bit&lt;/td&gt;
&lt;td&gt;SSE&lt;/td&gt;
&lt;td&gt;Floats, ints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ymm0–ymm15&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;256-bit&lt;/td&gt;
&lt;td&gt;AVX&lt;/td&gt;
&lt;td&gt;Wider SIMD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;zmm0–zmm31&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;512-bit&lt;/td&gt;
&lt;td&gt;AVX-512&lt;/td&gt;
&lt;td&gt;Only on CPUs with AVX-512 support&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Registers Used in AArch64 (NEON)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Vector registers: &lt;code&gt;v0&lt;/code&gt;–&lt;code&gt;v31&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;v0.4s&lt;/code&gt; = 4 × 32-bit lanes (e.g., single-precision floats)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;v1.8h&lt;/code&gt; = 8 × 16-bit lanes (e.g., half-precision floats)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Examples of Register Specification
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;SSE: &lt;code&gt;xmm0&lt;/code&gt;, &lt;code&gt;xmm1&lt;/code&gt;, etc.&lt;/li&gt;
&lt;li&gt;NEON: &lt;code&gt;v0.4s&lt;/code&gt;, &lt;code&gt;v1.8h&lt;/code&gt;, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note:&lt;/p&gt;

&lt;ul&gt;
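&lt;li&gt;In the examples below, the SIMD registers (&lt;code&gt;xmm0&lt;/code&gt;, &lt;code&gt;v0&lt;/code&gt;, etc.) are named explicitly in the template and listed as clobbers&lt;/li&gt;
&lt;li&gt;Only the pointer operands (&lt;code&gt;$0&lt;/code&gt;, &lt;code&gt;$1&lt;/code&gt;, &lt;code&gt;$2&lt;/code&gt;) are assigned to general-purpose registers by LLVM, through the &lt;code&gt;"r"&lt;/code&gt; constraint&lt;/li&gt;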
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To follow along:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Emit LLVM IR:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  crystal build &lt;span class="nt"&gt;--emit&lt;/span&gt; llvm-ir foo.cr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Emit assembly:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  crystal build &lt;span class="nt"&gt;--emit&lt;/span&gt; asm foo.cr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Benchmarking tool: &lt;a href="https://github.com/sharkdp/hyperfine" rel="noopener noreferrer"&gt;&lt;code&gt;hyperfine&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use of &lt;code&gt;uninitialized&lt;/code&gt; and &lt;code&gt;to_unsafe&lt;/code&gt; for low-level memory access&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
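
&lt;p&gt;For readers new to &lt;code&gt;uninitialized&lt;/code&gt; and &lt;code&gt;to_unsafe&lt;/code&gt;, here is a minimal sketch in plain Crystal with no assembly involved. The variable names &lt;code&gt;buf&lt;/code&gt; and &lt;code&gt;ptr&lt;/code&gt; are illustrative. It reserves a four-element buffer, obtains a raw pointer to it, and fills it through that pointer, which is the same access pattern the assembly examples rely on.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Reserve stack space for four floats without zeroing it first
buf = uninitialized StaticArray(Float32, 4)

# to_unsafe returns a Pointer(Float32) to the first element
ptr = buf.to_unsafe

# Fill every element before reading: uninitialized memory holds garbage
4.times { |i| ptr[i] = (i + 1).to_f32 }

puts buf[0]
puts buf[3]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;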




&lt;h2&gt;
  
  
  Basic Vector Operations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Vector Addition
&lt;/h3&gt;

&lt;h4&gt;
  
  
  SSE (x86_64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;1.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;5.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;6.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;7.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;8.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_vector_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;a_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;b_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"movups ($1), %xmm0      // load vector a into xmm0
     movups ($2), %xmm1      // load vector b into xmm1
     addps %xmm1, %xmm0      // perform parallel addition of four 32-bit floats
     movups %xmm0, ($0)      // store result to memory"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"xmm0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"xmm1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Vector addition: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_vector_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  NEON (AArch64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;1.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;5.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;6.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;7.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;8.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_vector_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;a_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;b_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"ld1 {v0.4s}, [$1]        // load vector a
     ld1 {v1.4s}, [$2]        // load vector b
     fadd v2.4s, v0.4s, v1.4s // add each element
     st1 {v2.4s}, [$0]        // store the result"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"v0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Vector addition: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_vector_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Vector Multiplication
&lt;/h3&gt;

&lt;h4&gt;
  
  
  SSE (x86_64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;1.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;5.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;6.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;7.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;8.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_vector_multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;a_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;b_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"movups ($1), %xmm0      // load vector a into xmm0
     movups ($2), %xmm1      // load vector b into xmm1
     mulps %xmm1, %xmm0      // perform parallel multiplication of four 32-bit floats
     movups %xmm0, ($0)      // store result to memory"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"xmm0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"xmm1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Vector multiplication: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_vector_multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  NEON (AArch64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;1.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;5.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;6.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;7.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;8.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_vector_multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;a_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;b_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"ld1 {v0.4s}, [$1]        // load vector a
     ld1 {v1.4s}, [$2]        // load vector b
     fmul v2.4s, v0.4s, v1.4s // multiply each element
     st1 {v2.4s}, [$0]        // store the result"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"v0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Vector multiplication: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_vector_multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
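
&lt;p&gt;When experimenting with the inline assembly above, it can help to compare the results against a plain Crystal version. The following is a minimal sketch; the helper names &lt;code&gt;scalar_add&lt;/code&gt; and &lt;code&gt;scalar_multiply&lt;/code&gt; are hypothetical and not part of the original code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Element-wise addition without SIMD, for checking simd_vector_add
def scalar_add(a : StaticArray(Float32, 4), b : StaticArray(Float32, 4)) : StaticArray(Float32, 4)
  StaticArray(Float32, 4).new { |i| a[i] + b[i] }
end

# Element-wise multiplication without SIMD, for checking simd_vector_multiply
def scalar_multiply(a : StaticArray(Float32, 4), b : StaticArray(Float32, 4)) : StaticArray(Float32, 4)
  StaticArray(Float32, 4).new { |i| a[i] * b[i] }
end

a = StaticArray[1.0_f32, 2.0_f32, 3.0_f32, 4.0_f32]
b = StaticArray[5.0_f32, 6.0_f32, 7.0_f32, 8.0_f32]

puts scalar_add(a, b)
puts scalar_multiply(a, b)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these inputs the element-wise sums are 6, 8, 10, 12 and the products are 5, 12, 21, 32, which the SIMD versions should match.&lt;/p&gt;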



&lt;h2&gt;
  
  
  Aggregation Operations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Vector Sum
&lt;/h3&gt;

&lt;h4&gt;
  
  
  SSE (x86_64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;1.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_vector_sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Float32&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;Float32&lt;/span&gt;
  &lt;span class="n"&gt;vec_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pointerof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"movups ($1), %xmm0      // load vector into xmm0
     haddps %xmm0, %xmm0     // horizontal add: [a+b, c+d, a+b, c+d]
     haddps %xmm0, %xmm0     // horizontal add again: [a+b+c+d, *, *, *]
     movss %xmm0, ($0)       // store the first element of result"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"xmm0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Vector sum: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_vector_sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  NEON (AArch64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;1.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_vector_sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Float32&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;Float32&lt;/span&gt;
  &lt;span class="n"&gt;vec_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pointerof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"ld1 {v0.4s}, [$1]         // load vector
     faddp v1.4s, v0.4s, v0.4s // pairwise add: [a+b, c+d, a+b, c+d]
     faddp v2.2s, v1.2s, v1.2s // pairwise add again: [a+b+c+d, *]
     str s2, [$0]              // store the final sum"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"v0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Vector sum: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_vector_sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
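
&lt;p&gt;The two-step reduction used by both &lt;code&gt;haddps&lt;/code&gt; and &lt;code&gt;faddp&lt;/code&gt; can be traced lane by lane in plain Crystal. This is only a sketch of the data flow, not of the instructions themselves; the names &lt;code&gt;v&lt;/code&gt;, &lt;code&gt;step1&lt;/code&gt;, and &lt;code&gt;step2&lt;/code&gt; are illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;v = StaticArray[1.0_f32, 2.0_f32, 3.0_f32, 4.0_f32]

# First pairwise add of the vector with itself: [a+b, c+d, a+b, c+d]
step1 = StaticArray[v[0] + v[1], v[2] + v[3], v[0] + v[1], v[2] + v[3]]

# Second pairwise add: lane 0 becomes (a+b) + (c+d)
step2 = step1[0] + step1[1]

puts step1
puts step2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For the input above, the first step holds 3 and 7 in its first two lanes and the final sum is 10.&lt;/p&gt;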



&lt;h3&gt;
  
  
  Finding Maximum Value
&lt;/h3&gt;

&lt;h4&gt;
  
  
  SSE (x86_64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;1.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_vector_max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Float32&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;Float32&lt;/span&gt;
  &lt;span class="n"&gt;vec_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pointerof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"movups ($1), %xmm0          // load vector into xmm0
     movaps %xmm0, %xmm1         // copy xmm0 to xmm1
     shufps $$0x4E, %xmm1, %xmm1 // swap upper and lower halves: [c, d, a, b]
     maxps %xmm1, %xmm0          // lane-wise max: [max(a, c), max(b, d), ...]
     movaps %xmm0, %xmm1         // copy partial result to xmm1
     shufps $$0x01, %xmm1, %xmm1 // move lane 1 into lane 0
     maxps %xmm1, %xmm0          // lane 0 now holds the max of all four
     movss %xmm0, ($0)           // store the result"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"xmm0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"xmm1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Vector max: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_vector_max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  NEON (AArch64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;1.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.0_f32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0_f32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_vector_max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Float32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;Float32&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;Float32&lt;/span&gt;
  &lt;span class="n"&gt;vec_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pointerof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"ld1 {v0.4s}, [$1]         // load vector
     fmaxp v1.4s, v0.4s, v0.4s // pairwise max: [max(a, b), max(c, d), ...]
     fmaxp v2.2s, v1.2s, v1.2s // final pairwise max
     str s2, [$0]              // store result"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"v0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Vector max: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_vector_max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
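
&lt;p&gt;As a cross-check, the same two-step reduction can be sketched in plain Crystal. &lt;code&gt;scalar_vector_max&lt;/code&gt; is a hypothetical helper that is not part of the examples above; it assumes the same four-element &lt;code&gt;Float32&lt;/code&gt; input and should return the same value as either SIMD version.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;a = StaticArray[1.0_f32, 2.0_f32, 3.0_f32, 4.0_f32]

# Hypothetical scalar reference; not part of the original SIMD examples.
def scalar_vector_max(vec : StaticArray(Float32, 4)) : Float32
  # Step 1: reduce the four values to two partial maxima.
  left = Math.max(vec[0], vec[1])
  right = Math.max(vec[2], vec[3])
  # Step 2: reduce the two partial maxima to a single value.
  Math.max(left, right)
end

puts "Scalar max: #{scalar_vector_max(a)}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;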



&lt;h2&gt;
  
  
  Integer Operations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Integer Addition
&lt;/h3&gt;

&lt;h4&gt;
  
  
  SSE (x86_64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;int_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;int_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_int_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;a_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;b_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"movdqu ($1), %xmm0      // load integer vector a into xmm0
     movdqu ($2), %xmm1      // load integer vector b into xmm1
     paddd %xmm1, %xmm0      // perform parallel addition of four 32-bit integers
     movdqu %xmm0, ($0)      // store result to memory"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"xmm0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"xmm1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Integer addition: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_int_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;int_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;int_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  NEON (AArch64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;int_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;int_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_int_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;a_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;b_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"ld1 {v0.4s}, [$1]        // load integer vector a
     ld1 {v1.4s}, [$2]        // load integer vector b
     add v2.4s, v0.4s, v1.4s  // perform element-wise addition
     st1 {v2.4s}, [$0]        // store result to memory"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"v0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Integer addition: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_int_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;int_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;int_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Saturated Addition
&lt;/h3&gt;

&lt;h4&gt;
  
  
  SSE (x86_64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;sat_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;29_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;31_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;sat_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mi"&gt;500_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;600_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;700_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;800_i16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_saturated_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;a_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;b_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"movdqu ($1), %xmm0      // load 8 × 16-bit integers into xmm0
     movdqu ($2), %xmm1      // load 8 × 16-bit integers into xmm1
     paddsw %xmm1, %xmm0     // perform saturated addition
     movdqu %xmm0, ($0)      // store result to memory"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"xmm0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"xmm1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Saturated addition: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_saturated_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sat_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sat_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  NEON (AArch64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;sat_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;29_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;31_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32_000_i16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;sat_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1_000_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mi"&gt;500_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;600_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;700_i16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;800_i16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simd_saturated_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;StaticArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Int16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;a_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;b_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;
  &lt;span class="n"&gt;result_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;

  &lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"ld1 {v0.8h}, [$1]          // load 8 × 16-bit integers from a into v0
     ld1 {v1.8h}, [$2]          // load 8 × 16-bit integers from b into v1
     sqadd v2.8h, v0.8h, v1.8h  // perform saturated addition
     st1 {v2.8h}, [$0]          // store result to memory"&lt;/span&gt;
          &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a_ptr&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b_ptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"v0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"memory"&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"volatile"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;result&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Saturated addition: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;simd_saturated_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sat_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sat_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
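
&lt;p&gt;To make "saturated" concrete, here is a minimal scalar sketch of what &lt;code&gt;paddsw&lt;/code&gt; and &lt;code&gt;sqadd&lt;/code&gt; compute per lane, next to a wrapping variant for contrast. Both helper names are hypothetical and do not come from the examples above; the inputs are the same &lt;code&gt;sat_a&lt;/code&gt; and &lt;code&gt;sat_b&lt;/code&gt; arrays.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;sat_a = StaticArray[29_000_i16, 30_000_i16, 31_000_i16, 32_000_i16,
  32_000_i16, 32_000_i16, 32_000_i16, 32_000_i16]
sat_b = StaticArray[1_000_i16, 1_000_i16, 1_000_i16, 1_000_i16,
  500_i16, 600_i16, 700_i16, 800_i16]

# Hypothetical scalar reference; not part of the original SIMD examples.
def scalar_saturated_add(a : StaticArray(Int16, 8), b : StaticArray(Int16, 8)) : StaticArray(Int16, 8)
  StaticArray(Int16, 8).new do |i|
    # Widen to Int32 so the intermediate sum cannot overflow.
    sum = a[i].to_i32 + b[i].to_i32
    # Clamp to the Int16 range instead of wrapping around.
    sum.clamp(Int16::MIN.to_i32, Int16::MAX.to_i32).to_i16
  end
end

# For contrast: the same sums truncated to 16 bits (wrapping).
def scalar_wrapping_add(a : StaticArray(Int16, 8), b : StaticArray(Int16, 8)) : StaticArray(Int16, 8)
  StaticArray(Int16, 8).new { |i| (a[i].to_i32 + b[i].to_i32).to_i16! }
end

puts "Saturated: #{scalar_saturated_add(sat_a, sat_b)}"
puts "Wrapping:  #{scalar_wrapping_add(sat_a, sat_b)}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these inputs, the fourth and eighth sums (33000 and 32800) exceed &lt;code&gt;Int16::MAX&lt;/code&gt; (32767). The saturated version pins both to 32767, while the wrapping version yields -32536 and -32736.&lt;br&gt;
&lt;/p&gt;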



&lt;h2&gt;
  
  
  Examining LLVM IR and Assembly
&lt;/h2&gt;

&lt;p&gt;To inspect LLVM IR output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crystal build your_file.cr &lt;span class="nt"&gt;--emit&lt;/span&gt; llvm-ir &lt;span class="nt"&gt;--no-debug&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To inspect raw assembly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crystal build your_file.cr &lt;span class="nt"&gt;--emit&lt;/span&gt; asm &lt;span class="nt"&gt;--no-debug&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



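&lt;p&gt;Your inline &lt;code&gt;asm&lt;/code&gt; blocks are preserved as-is, even when you build with optimizations (&lt;code&gt;-O3&lt;/code&gt;). In the LLVM IR, each block shows up as a &lt;code&gt;call void asm sideeffect&lt;/code&gt; instruction, as in this excerpt from an AArch64 build of a four-lane &lt;code&gt;fadd&lt;/code&gt; example:&lt;br&gt;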
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight llvm"&gt;&lt;code&gt;&lt;span class="nl"&gt;__crystal_once.exit.i.i:&lt;/span&gt;                          &lt;span class="c1"&gt;; preds = %else.i.i.i, %.noexc98&lt;/span&gt;
  &lt;span class="k"&gt;call&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="vg"&gt;@llvm.lifetime.start.p0&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;i64&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;nonnull&lt;/span&gt; &lt;span class="nv"&gt;%path.i.i.i.i.i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;call&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="vg"&gt;@llvm.lifetime.start.p0&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;i64&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;nonnull&lt;/span&gt; &lt;span class="nv"&gt;%obj1.i.i.i.i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;call&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="vg"&gt;@llvm.lifetime.start.p0&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;i64&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;nonnull&lt;/span&gt; &lt;span class="nv"&gt;%b2.i.i.i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;store&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="m"&gt;4&lt;/span&gt; &lt;span class="p"&gt;x&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="m"&gt;1.000000e+00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="m"&gt;2.000000e+00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="m"&gt;3.000000e+00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="m"&gt;4.000000e+00&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="nv"&gt;%obj1.i.i.i.i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;align&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;
  &lt;span class="k"&gt;store&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="m"&gt;4&lt;/span&gt; &lt;span class="p"&gt;x&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="m"&gt;5.000000e+00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="m"&gt;6.000000e+00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="m"&gt;7.000000e+00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="m"&gt;8.000000e+00&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="nv"&gt;%b2.i.i.i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;align&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;
  &lt;span class="k"&gt;call&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="k"&gt;asm&lt;/span&gt; &lt;span class="k"&gt;sideeffect&lt;/span&gt; &lt;span class="s"&gt;"ld1 {v0.4s}, [$1] \0Ald1 {v1.4s}, [$2] \0Afadd v2.4s, v0.4s, v1.4s \0Ast1 {v2.4s}, [$0]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"r,r,r,~{v0},~{v1},~{v2},~{memory}"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;nonnull&lt;/span&gt; &lt;span class="nv"&gt;%path.i.i.i.i.i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;nonnull&lt;/span&gt; &lt;span class="nv"&gt;%obj1.i.i.i.i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;nonnull&lt;/span&gt; &lt;span class="nv"&gt;%b2.i.i.i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="vg"&gt;#30&lt;/span&gt;
  &lt;span class="nv"&gt;%314&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;load&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="m"&gt;4&lt;/span&gt; &lt;span class="p"&gt;x&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="nv"&gt;%path.i.i.i.i.i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;align&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;
  &lt;span class="k"&gt;call&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="vg"&gt;@llvm.lifetime.end.p0&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;i64&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;nonnull&lt;/span&gt; &lt;span class="nv"&gt;%path.i.i.i.i.i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;call&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="vg"&gt;@llvm.lifetime.end.p0&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;i64&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;nonnull&lt;/span&gt; &lt;span class="nv"&gt;%obj1.i.i.i.i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;call&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="vg"&gt;@llvm.lifetime.end.p0&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;i64&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="k"&gt;nonnull&lt;/span&gt; &lt;span class="nv"&gt;%b2.i.i.i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nv"&gt;%315&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;invoke&lt;/span&gt; &lt;span class="kt"&gt;ptr&lt;/span&gt; &lt;span class="vg"&gt;@GC_malloc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;i64&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="kt"&gt;label&lt;/span&gt; &lt;span class="nv"&gt;%.noexc100&lt;/span&gt; &lt;span class="k"&gt;unwind&lt;/span&gt; &lt;span class="kt"&gt;label&lt;/span&gt; &lt;span class="nv"&gt;%rescue2.loopexit.split-lp.loopexit.split-lp.loopexit.split-lp&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Lloh2300:
        ldr     q1, [x9, lCPI312_43@PAGEOFF]
        add     x8, sp, #164
        add     x9, sp, #128
        str     q0, [sp, #128]
        stur    q1, [x29, #-128]
        ; InlineAsm Start
        ld1.4s  { v0 }, [x9]
        ld1.4s  { v1 }, [x10]
        fadd.4s v2, v0, v1
        st1.4s  { v2 }, [x8]
        ; InlineAsm End
        ldr     q0, [x25]
        str     q0, [sp, #16]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Miscellaneous
&lt;/h2&gt;

&lt;p&gt;When using SIMD with parallelism, memory bandwidth can become the bottleneck.&lt;br&gt;
Although Crystal currently runs single-threaded by default, true parallelism is in progress, and memory limitations may become relevant in the future.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We’ve explored how to write SIMD operations in Crystal using inline &lt;code&gt;asm&lt;/code&gt;, and examined how those instructions are lowered into LLVM IR and eventually into assembly.&lt;/p&gt;

&lt;p&gt;This was a deep dive into low-level Crystal.&lt;/p&gt;




&lt;h2&gt;
  
  
  Appendix: SIMD Instruction Reference
&lt;/h2&gt;

&lt;h3&gt;
  
  
  SSE (x86_64)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instruction&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;movups&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Load/store 4 × Float32 (unaligned)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;movaps&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Load/store 4 × Float32 (aligned)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;movdqu&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Load/store 4 × Int32 or 8 × Int16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;movss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Load/store scalar Float32 (lowest lane)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;addps&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Add 4 × Float32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mulps&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Multiply 4 × Float32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;paddd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Add 4 × Int32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;paddsw&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Saturated add 8 × Int16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;haddps&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Horizontal add of Float32 pairs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;maxps&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Element-wise max (Float32)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;shufps&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Shuffle Float32 lanes (for reduction)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  NEON (AArch64)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instruction&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ld1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Load vector (e.g. &lt;code&gt;v0.4s&lt;/code&gt;, &lt;code&gt;v0.8h&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;st1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Store vector&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;add&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Add 4 × Int32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sqadd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Saturated add 8 × Int16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fadd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Add 4 × Float32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fmul&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Multiply 4 × Float32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;faddp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pairwise add (Float32 reduction)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fmaxp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pairwise max (Float32 reduction)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;faddv&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Vector-wide add (optional)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fmaxv&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Vector-wide max (optional)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Notes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;SSE's &lt;code&gt;movaps&lt;/code&gt; and &lt;code&gt;movdqa&lt;/code&gt; require 16-byte alignment.&lt;/li&gt;
&lt;li&gt;NEON's &lt;code&gt;faddp&lt;/code&gt; and &lt;code&gt;fmaxp&lt;/code&gt; reduce four lanes to one in two pairwise steps: 4 → 2 → 1.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;shufps&lt;/code&gt; is used with masks like &lt;code&gt;0x4E&lt;/code&gt;, &lt;code&gt;0x01&lt;/code&gt; for reordering lanes during reduction.&lt;/li&gt;
&lt;li&gt;Saturated arithmetic (&lt;code&gt;paddsw&lt;/code&gt;, &lt;code&gt;sqadd&lt;/code&gt;) clamps values on overflow.&lt;/li&gt;
&lt;/ul&gt;
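
&lt;p&gt;As a rough scalar model of two of the notes above (pairwise reduction and saturation), here is a minimal sketch in plain Crystal. It uses no SIMD at all; the &lt;code&gt;saturating_add&lt;/code&gt; helper is a hypothetical name introduced only for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Pairwise reduction, modelled on scalars: 4 lanes, then 2, then 1.
lanes = StaticArray[1.0_f32, 2.0_f32, 3.0_f32, 4.0_f32]
pairs = {lanes[0] + lanes[1], lanes[2] + lanes[3]} # 4 lanes become 2
total = pairs[0] + pairs[1]                        # 2 lanes become 1
puts total # =&amp;gt; 10.0

# Saturated Int16 addition: widen, add, clamp to the Int16 range, narrow.
# (Hypothetical helper; it mimics what paddsw / sqadd do per lane.)
def saturating_add(a : Int16, b : Int16) : Int16
  (a.to_i32 + b.to_i32).clamp(Int16::MIN.to_i32, Int16::MAX.to_i32).to_i16
end

puts saturating_add(30_000_i16, 10_000_i16)   # =&amp;gt; 32767
puts saturating_add(-30_000_i16, -10_000_i16) # =&amp;gt; -32768
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;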

&lt;p&gt;Thanks for reading — and happy crystaling! 💎&lt;/p&gt;

</description>
      <category>crystal</category>
      <category>assembly</category>
    </item>
    <item>
      <title>Building Portable Crystal Binaries on macOS with GitHub Actions</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Mon, 21 Jul 2025 02:34:19 +0000</pubDate>
      <link>https://dev.to/kojix2/how-to-distribute-a-statically-linked-crystal-binary-on-macos-with-github-actions-1gc6</link>
      <guid>https://dev.to/kojix2/how-to-distribute-a-statically-linked-crystal-binary-on-macos-with-github-actions-1gc6</guid>
      <description>&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;If you’ve ever tried to share a &lt;a href="https://github.com/crystal-lang/crystal" rel="noopener noreferrer"&gt;Crystal&lt;/a&gt; tool you built, you may have noticed that distributing it on macOS isn’t as straightforward as on Linux. On Linux, you can just use the &lt;a href="https://crystal-lang.org/reference/1.17/guides/static_linking.html#musl-libc" rel="noopener noreferrer"&gt;official Docker image with musl&lt;/a&gt; to build fully static binaries.&lt;/p&gt;

&lt;p&gt;But macOS is different. Its design &lt;a href="https://crystal-lang.org/reference/1.17/guides/static_linking.html#macos" rel="noopener noreferrer"&gt;doesn’t allow fully static linking&lt;/a&gt;, so—just like with Rust or Go—you end up with binaries that must dynamically link to system libraries. These are what we call &lt;em&gt;portable binaries&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;By default, Crystal binaries on macOS depend on Homebrew libraries like &lt;code&gt;libgc&lt;/code&gt;, &lt;code&gt;libevent&lt;/code&gt;, and &lt;code&gt;libpcre&lt;/code&gt;. That’s not really portable. In this post, I’ll show you how to avoid those dependencies and build more portable binaries for macOS using GitHub Actions.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Crystal Resolves Libraries
&lt;/h2&gt;

&lt;p&gt;Crystal looks for libraries in this order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;CRYSTAL_LIBRARY_PATH&lt;/code&gt; environment variable&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ldflags&lt;/code&gt; from the &lt;code&gt;@[Link]&lt;/code&gt; annotation&lt;/li&gt;
&lt;li&gt;pkg-config
&lt;ul&gt;
&lt;li&gt;Tries the specified &lt;code&gt;pkg_config&lt;/code&gt; name&lt;/li&gt;
&lt;li&gt;Falls back to the library name&lt;/li&gt;
&lt;li&gt;Only if both fail does it use a plain &lt;code&gt;-l&lt;/code&gt; flag&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s the catch: even if you pass static libraries via &lt;code&gt;--link-flags&lt;/code&gt;, pkg-config runs first. If it succeeds, it usually chooses shared libraries—and ignores the static ones you gave.&lt;/p&gt;
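
&lt;p&gt;To make the lookup order concrete, here is a minimal sketch of a binding that names both a library and a &lt;code&gt;pkg_config&lt;/code&gt; entry. The &lt;code&gt;LibMathDemo&lt;/code&gt; name is hypothetical, and the example assumes there is no &lt;code&gt;m.pc&lt;/code&gt; file on the system, so the pkg-config lookup should fail and the compiler should fall back to a plain &lt;code&gt;-lm&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# Hypothetical binding, for illustration only.
# "m" is the library name; pkg_config: is the name tried with pkg-config first.
@[Link("m", pkg_config: "m")]
lib LibMathDemo
  fun cos(x : Float64) : Float64
end

puts LibMathDemo.cos(0.0) # =&amp;gt; 1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For a library that does ship a &lt;code&gt;.pc&lt;/code&gt; file, the same annotation would pick up whatever flags pkg-config reports, which is where the shared-library preference comes from.&lt;/p&gt;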

&lt;h2&gt;
  
  
  The Workarounds
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Method 1: Use Symlinks
&lt;/h3&gt;

&lt;p&gt;One way around pkg-config is to symlink the static libraries and link them directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;brew install libgc pcre2&lt;/span&gt;
&lt;span class="s"&gt;ln -s $(brew ls libgc | grep libgc.a) .&lt;/span&gt;
&lt;span class="s"&gt;ln -s $(brew ls pcre2 | grep libpcre2-8.a) .&lt;/span&gt;
&lt;span class="s"&gt;shards build --link-flags="-L $(pwd) $(pwd)/libgc.a $(pwd)/libpcre2-8.a" --release&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Method 2: Disable PKG_CONFIG_PATH
&lt;/h3&gt;

&lt;p&gt;Another trick is to simply disable pkg-config so it can’t interfere:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;brew install libgc pcre2&lt;/span&gt;
&lt;span class="s"&gt;unset PKG_CONFIG_PATH&lt;/span&gt;
&lt;span class="s"&gt;shards build --link-flags="$(brew ls libgc | grep libgc.a) $(brew ls pcre2 | grep libpcre2-8.a)" --release&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Combining both methods is the most reliable approach, especially for libraries like &lt;code&gt;libcrypto&lt;/code&gt; and &lt;code&gt;libssl&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things to Keep in Mind
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;macos-latest&lt;/code&gt; runner gives you an Apple Silicon (Arm) binary&lt;/li&gt;
&lt;li&gt;For Intel builds, use the &lt;code&gt;macos-13&lt;/code&gt; runner&lt;/li&gt;
&lt;li&gt;On some systems, macOS security may require users to &lt;a href="https://support.apple.com/102445" rel="noopener noreferrer"&gt;manually approve your binary&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Alternative: Homebrew Tap
&lt;/h2&gt;

&lt;p&gt;If you want the easiest experience for users, publishing a Homebrew tap is the way to go. That way, they can build your tool from source and let Homebrew handle dependencies.&lt;/p&gt;

&lt;p&gt;Still, prebuilt binaries are handy. With the approaches above, you can distribute Crystal binaries on macOS much like you would with Rust.&lt;/p&gt;




&lt;p&gt;That’s it for today. How about sharing the Crystal tool you built over the weekend?&lt;/p&gt;

</description>
      <category>crystal</category>
    </item>
    <item>
      <title>Writing Inline Assembly in the Crystal Programming Language</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Fri, 20 Jun 2025 04:17:04 +0000</pubDate>
      <link>https://dev.to/kojix2/writing-inline-assembly-in-the-crystal-programming-language-d9a</link>
      <guid>https://dev.to/kojix2/writing-inline-assembly-in-the-crystal-programming-language-d9a</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When you want to make your code run significantly faster, or just want to explore how computers work at a lower level, you might find yourself curious about writing instructions directly for the CPU. In Crystal, you can do this using &lt;strong&gt;inline assembly&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Crystal is a programming language built on top of the LLVM compiler infrastructure. Thanks to this, it can access many of LLVM's powerful features. For low-level programming, Crystal provides both &lt;code&gt;Intrinsics&lt;/code&gt; functions and the &lt;code&gt;asm&lt;/code&gt; syntax.&lt;/p&gt;

&lt;h2&gt;
  
  
  The &lt;code&gt;asm&lt;/code&gt; Syntax
&lt;/h2&gt;

&lt;p&gt;Crystal supports writing inline assembly using the &lt;code&gt;asm&lt;/code&gt; keyword.&lt;/p&gt;

&lt;p&gt;You can find the &lt;a href="https://crystal-lang.org/reference/1.16/syntax_and_semantics/asm.html" rel="noopener noreferrer"&gt;official documentation here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The basic syntax is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;asm("template" : outputs : inputs : clobbers : flags)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;template&lt;/code&gt; — Assembly code using LLVM’s integrated assembler syntax&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;outputs&lt;/code&gt; — Output operands&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;inputs&lt;/code&gt; — Input operands&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;clobbers&lt;/code&gt; — Registers that may be modified&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;flags&lt;/code&gt; — Optional flags (e.g., &lt;code&gt;"intel"&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This colon-separated syntax is quite unusual in Crystal and comes from GCC's inline assembly syntax.&lt;/p&gt;

&lt;p&gt;Let’s look at some examples.&lt;/p&gt;

&lt;h3&gt;
  
  
  NOP Instruction
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"nop"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Setting a Value Using an Output Operand
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;dst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;

&lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"mov $$10, $0"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"=r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;dst&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



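&lt;p&gt;Note that &lt;code&gt;$$&lt;/code&gt; escapes a literal &lt;code&gt;$&lt;/code&gt;, so &lt;code&gt;$$10&lt;/code&gt; reaches the assembler as the AT&amp;amp;T-style immediate &lt;code&gt;$10&lt;/code&gt;, while &lt;code&gt;$0&lt;/code&gt; is a placeholder for the first operand (here, the output).&lt;/p&gt;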

&lt;p&gt;Using &lt;code&gt;uninitialized Int32&lt;/code&gt; is optional; initializing with &lt;code&gt;dst = 0&lt;/code&gt; works as well.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using an Input Operand
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="n"&gt;dst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"mov $1, $0"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"=r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;dst&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using Multiple Input Operands
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;

&lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"add $2, $0"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"=r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"0"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 30&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using Multiple Output Operands
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;dst1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;
&lt;span class="n"&gt;dst2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;

&lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"
  mov $$10, $0
  mov $$20, $1"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"=r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"=r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;dst1&lt;/span&gt;
&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;dst2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using Intel Syntax
&lt;/h3&gt;

&lt;p&gt;You can also use Intel-style syntax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;dst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uninitialized&lt;/span&gt; &lt;span class="no"&gt;Int32&lt;/span&gt;

&lt;span class="n"&gt;asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"mov dword ptr [$0], 10"&lt;/span&gt; &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pointerof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;::&lt;/span&gt; &lt;span class="s2"&gt;"intel"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;dst&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
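
&lt;p&gt;The examples above use x86 mnemonics, so they only assemble on x86_64. One way to keep such code portable is to guard it with compile-time flags. The following is a minimal sketch, not a definitive pattern: the AArch64 branch is an assumption added here for illustration, and the final branch is a plain Crystal fallback.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;dst = 0

{% if flag?(:x86_64) %}
  # AT&amp;amp;T syntax: source first, destination second.
  asm("mov $$10, $0" : "=r"(dst))
{% elsif flag?(:aarch64) %}
  # AArch64 syntax (assumed): destination first, then the immediate.
  asm("mov $0, #10" : "=r"(dst))
{% else %}
  # Unknown target: no inline assembly, just assign the value.
  dst = 10
{% end %}

puts dst # =&amp;gt; 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;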



&lt;h2&gt;
  
  
  Intrinsics
&lt;/h2&gt;

&lt;p&gt;For relatively simple operations, LLVM provides &lt;strong&gt;intrinsics&lt;/strong&gt;. These functions are highly optimized, platform-independent, and often compatible with Crystal’s interpreter. However, for most basic operations, Crystal's standard library already provides efficient implementations, so using intrinsics does not always yield performance benefits.&lt;/p&gt;

&lt;p&gt;Available intrinsics are defined in the &lt;a href="https://crystal-lang.org/api/Intrinsics.html" rel="noopener noreferrer"&gt;&lt;code&gt;Intrinsics&lt;/code&gt;&lt;/a&gt; module.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Intrinsic Functions
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;memcpy&lt;/code&gt; — Copy memory
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;UInt8&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_u8&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;dest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;UInt8&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0_u8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;memcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;is_volatile: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Copied: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;code&gt;memmove&lt;/code&gt; — Move memory with overlap support
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;UInt8&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_u8&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;memmove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_unsafe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;is_volatile: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Moved: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;code&gt;memset&lt;/code&gt; — Initialize memory
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;UInt8&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0_u8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;memset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mh"&gt;0xFF&lt;/span&gt;&lt;span class="n"&gt;_u8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;is_volatile: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Set: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
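
&lt;p&gt;For comparison, the same three operations can be written with the standard library's &lt;code&gt;Slice&lt;/code&gt; methods, which is usually what you want outside of low-level experiments. This is a minimal sketch that mirrors the sizes and offsets used in the intrinsic examples above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;# memcpy equivalent: copy all of src into dest.
src = Slice(UInt8).new(10) { |i| i.to_u8 }
dest = Slice(UInt8).new(10, 0_u8)
dest.copy_from(src)
p dest.to_a # =&amp;gt; [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# memmove equivalent: overlapping move of 5 bytes from offset 0 to offset 3.
buffer = Slice(UInt8).new(10) { |i| i.to_u8 }
buffer[3, 5].move_from(buffer[0, 5])
p buffer.to_a # =&amp;gt; [0, 1, 2, 0, 1, 2, 3, 4, 8, 9]

# memset equivalent: fill every byte with 0xFF.
buffer.fill(0xFF_u8)
p buffer.to_a # =&amp;gt; [255, 255, 255, 255, 255, 255, 255, 255, 255, 255]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;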



&lt;h4&gt;
  
  
  &lt;code&gt;debugtrap&lt;/code&gt; — Trigger debugger trap
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debugtrap&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;code&gt;pause&lt;/code&gt; — CPU pause (works on x86/x64 and AArch64)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pause&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is often used internally in Crystal’s &lt;code&gt;Mutex&lt;/code&gt; or &lt;code&gt;SpinLock&lt;/code&gt; implementations.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;read_cycle_counter&lt;/code&gt; — Read the CPU cycle counter
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;cycles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_cycle_counter&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Cycles: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;cycles&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To observe it in action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="kp"&gt;loop&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;cycles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_cycle_counter&lt;/span&gt;
  &lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Cycles: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;cycles&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="nb"&gt;sleep&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;second&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Bit Manipulation Intrinsics
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Bit Reversal
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;bitreverse8&lt;/code&gt;, &lt;code&gt;bitreverse16&lt;/code&gt;, &lt;code&gt;bitreverse32&lt;/code&gt;, &lt;code&gt;bitreverse64&lt;/code&gt;, &lt;code&gt;bitreverse128&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mb"&gt;0b1101001&lt;/span&gt;&lt;span class="n"&gt;_u8&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bitreverse8&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Reversed: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_s&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 10010110&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Byte Swap
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;bswap16&lt;/code&gt;, &lt;code&gt;bswap32&lt;/code&gt;, &lt;code&gt;bswap64&lt;/code&gt;, &lt;code&gt;bswap128&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mh"&gt;0x12345678&lt;/span&gt;&lt;span class="n"&gt;_u32&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bswap32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Swapped: 0x&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_s&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 0x78563412&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Population Count
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;popcount8&lt;/code&gt;, &lt;code&gt;popcount16&lt;/code&gt;, &lt;code&gt;popcount32&lt;/code&gt;, &lt;code&gt;popcount64&lt;/code&gt;, &lt;code&gt;popcount128&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mb"&gt;0b11010110&lt;/span&gt;&lt;span class="n"&gt;_i32&lt;/span&gt;
&lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;popcount32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Bit count: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Count Leading Zeros
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;countleading8&lt;/code&gt;, &lt;code&gt;countleading16&lt;/code&gt;, &lt;code&gt;countleading32&lt;/code&gt;, &lt;code&gt;countleading64&lt;/code&gt;, &lt;code&gt;countleading128&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mb"&gt;0b00001111&lt;/span&gt;&lt;span class="n"&gt;_i32&lt;/span&gt;
&lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;countleading32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Leading zeros: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 28 (the value is a 32-bit integer)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Count Trailing Zeros
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;counttrailing8&lt;/code&gt;, &lt;code&gt;counttrailing16&lt;/code&gt;, &lt;code&gt;counttrailing32&lt;/code&gt;, &lt;code&gt;counttrailing64&lt;/code&gt;, &lt;code&gt;counttrailing128&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mb"&gt;0b11110000&lt;/span&gt;&lt;span class="n"&gt;_i32&lt;/span&gt;
&lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Intrinsics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;counttrailing32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Trailing zeros: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# =&amp;gt; 4&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Crystal still lacks extensive documentation in many languages, but &lt;a href="https://deepwiki.com/crystal-lang/crystal" rel="noopener noreferrer"&gt;DeepWiki&lt;/a&gt; is a reliable source for answers to most questions. This article is based on what I’ve learned from DeepWiki, and all code examples have been tested to ensure they work correctly. I highly recommend it.&lt;/p&gt;

&lt;p&gt;That’s all for now — happy hacking with Crystal!&lt;/p&gt;




&lt;p&gt;This post was translated from Japanese to English by ChatGPT. &lt;br&gt;
Click &lt;a href="https://qiita.com/kojix2/items/08c1a6a9d32f15f5a921" rel="noopener noreferrer"&gt;here&lt;/a&gt; to see the original post.&lt;/p&gt;

</description>
      <category>crystal</category>
      <category>assembly</category>
      <category>llvm</category>
    </item>
    <item>
      <title>Building My First Web App with the Help of AI</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Sun, 23 Mar 2025 10:25:56 +0000</pubDate>
      <link>https://dev.to/kojix2/building-my-first-web-app-with-the-help-of-ai-52m0</link>
      <guid>https://dev.to/kojix2/building-my-first-web-app-with-the-help-of-ai-52m0</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Since the beginning of 2025, AI agents have made dramatic progress. We're now living in a time where you can simply share an idea with an AI, and it will generate an application for you.&lt;/p&gt;

&lt;p&gt;Although I’ve been casually programming for both work and fun, I had never created a full-fledged web application before. Web apps seemed a bit intimidating to me because of all the technical hurdles involved—authentication, databases, deployment—especially compared to command-line tools.&lt;/p&gt;

&lt;p&gt;But thanks to the power of AI agents, I was able to build my first web app, and I’d like to share my experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Web App: tokei-api
&lt;/h2&gt;

&lt;p&gt;🌐 &lt;a href="https://tokei.kojix2.net/" rel="noopener noreferrer"&gt;https://tokei.kojix2.net/&lt;/a&gt;&lt;br&gt;&lt;br&gt;
📦 GitHub: &lt;a href="https://github.com/kojix2/tokei-api" rel="noopener noreferrer"&gt;https://github.com/kojix2/tokei-api&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvuvxnnbc3bhwofavpp1l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvuvxnnbc3bhwofavpp1l.png" alt="image.png" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It also generates badges you can embed directly into your GitHub README.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why I Wanted GitHub Code Line Stats
&lt;/h2&gt;

&lt;p&gt;Whenever I find an interesting project on GitHub, I usually clone the repository and run &lt;a href="https://github.com/XAMPPRocky/tokei" rel="noopener noreferrer"&gt;&lt;code&gt;tokei&lt;/code&gt;&lt;/a&gt; to check how many lines of code it has. It gives me a rough idea of the project's scale and complexity. (Subjective, of course!)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;0–1000 lines: likely a simple utility with a single function&lt;/li&gt;
&lt;li&gt;2000–3000 lines: medium-sized tool with multiple features&lt;/li&gt;
&lt;li&gt;5000+ lines: a well-structured project&lt;/li&gt;
&lt;li&gt;10,000+ lines: large-scale library or application&lt;/li&gt;
&lt;li&gt;20,000+ lines: likely built by a team for production use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course, business apps can easily go into the hundreds of thousands of lines, but most public GitHub repositories tend to be libraries or tools, not enterprise software.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;tokei&lt;/code&gt; is easy to use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/user_name/repo_name
&lt;span class="nb"&gt;cd &lt;/span&gt;repo_name
tokei
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sample output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Crystal                15         1319          966          165          188
 CSS                     1          211          165           11           35
 ...
 Total                  23         2006         1339          334          333
===============================================================================
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also add the &lt;code&gt;-f&lt;/code&gt; option to see line counts per file.&lt;/p&gt;

&lt;p&gt;However, while browsing GitHub, switching to the terminal just to run &lt;code&gt;tokei&lt;/code&gt; can be a bit of a hassle.&lt;/p&gt;

&lt;h2&gt;
  
  
  Concept and Development of tokei-api
&lt;/h2&gt;

&lt;p&gt;So, I decided to build a web app that runs the &lt;code&gt;tokei&lt;/code&gt; command on public repositories—no login required, no risk of exposing sensitive data.&lt;/p&gt;

&lt;p&gt;I also added the ability to auto-generate badge code for embedding in README files.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment on Koyeb
&lt;/h2&gt;

&lt;p&gt;🛠️ &lt;a href="https://www.koyeb.com/" rel="noopener noreferrer"&gt;https://www.koyeb.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I chose &lt;strong&gt;Koyeb&lt;/strong&gt; to host the app. It has a generous free tier and allows deployment in Washington D.C. or Frankfurt (Tokyo is available on paid plans).&lt;/p&gt;

&lt;p&gt;Apps go to sleep after periods of inactivity on the free tier, but that’s perfectly acceptable for my use case. Koyeb automatically builds and deploys your app by connecting to your GitHub repo and using your &lt;code&gt;Dockerfile&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Neon – Serverless PostgreSQL
&lt;/h2&gt;

&lt;p&gt;🗃️ &lt;a href="https://console.neon.tech/" rel="noopener noreferrer"&gt;https://console.neon.tech/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since persistent storage (volumes) is a paid feature on Koyeb, file-based databases like SQLite get reset on every deploy on the free plan. So, I used &lt;strong&gt;Neon&lt;/strong&gt;, a serverless PostgreSQL provider.&lt;/p&gt;

&lt;p&gt;Neon also has a free plan and offers a region (Ohio) near Koyeb’s Washington D.C. servers. While the free plan doesn't allow IP whitelisting, that's fine in this case since all data is public anyway.&lt;/p&gt;

&lt;h2&gt;
  
  
  Crystal Language and the Kemal Framework
&lt;/h2&gt;

&lt;p&gt;You might not be familiar with &lt;strong&gt;Crystal&lt;/strong&gt;, but it's a statically typed language with syntax very similar to Ruby. It offers much better performance, which is helpful for lightweight environments like free-tier servers.&lt;/p&gt;

&lt;p&gt;I used &lt;strong&gt;Kemal&lt;/strong&gt;, a Crystal framework similar to Ruby’s Sinatra:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/kemalcr/kemal" rel="noopener noreferrer"&gt;https://github.com/kemalcr/kemal&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Crystal lacks a mature ORM like ActiveRecord, so it’s not ideal for large-scale apps. But for building small services with help from AI, it’s a great fit.&lt;/p&gt;

&lt;p&gt;Note: While Crystal’s compiler requires a decent amount of memory, Koyeb handles builds in a separate environment, so it worked fine even on the free plan.&lt;/p&gt;

&lt;h2&gt;
  
  
  Domain Setup
&lt;/h2&gt;

&lt;p&gt;I used my pre-owned domain &lt;code&gt;kojix2.net&lt;/code&gt;, which I registered via Onamae.com.&lt;br&gt;&lt;br&gt;
Koyeb provides CNAME setup instructions in its UI, so configuring DNS on Onamae.com was straightforward.&lt;/p&gt;

&lt;p&gt;HTTPS was automatically enabled using Google Trust Services certificates.&lt;/p&gt;

&lt;p&gt;📝 Side note: I realized I had accidentally subscribed to an unnecessary hosting plan through Onamae.com in the past. Their UI sometimes adds items to the cart without notice—I'll be more careful going forward.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI Cost
&lt;/h2&gt;

&lt;p&gt;I tried to use as many free services as possible, but the &lt;strong&gt;AI itself wasn’t cheap&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
I mainly used &lt;strong&gt;Claude 3.7 Sonnet&lt;/strong&gt; from Anthropic for detailed refinements, and I estimate the total cost came close to ¥10,000 (~$70).&lt;/p&gt;

&lt;p&gt;That said, Claude’s output quality is top-notch and was significantly better than OpenAI’s models for this task. I now understand why so many people are turning to Claude.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This was my &lt;strong&gt;first time building and deploying a web application&lt;/strong&gt;, and I was able to get it done!&lt;/p&gt;

&lt;p&gt;It took about 6 hours for the initial implementation and another 4 hours of intermittent tweaks.&lt;/p&gt;

&lt;p&gt;As someone who’s not a professional web developer, being able to create a functional service in a single day is a testament to how much the world has changed.&lt;/p&gt;

&lt;p&gt;That said, this app only runs the &lt;code&gt;tokei&lt;/code&gt; command. For a more serious web app with authentication and sensitive data, I would absolutely consult a professional for security best practices and proper code reviews.&lt;/p&gt;

&lt;p&gt;Next, I’d like to try building an Android app or something more complex. Keeping up with the pace of AI is challenging, but I want to keep learning and building.&lt;/p&gt;

&lt;p&gt;Thanks for reading. Have a great day!&lt;/p&gt;




&lt;p&gt;Translated from my original Japanese article on Qiita, with the help of ChatGPT:&lt;br&gt;
&lt;a href="https://qiita.com/kojix2/items/b8fb4047e9547a156294" rel="noopener noreferrer"&gt;tokei-api というサイトを VSCode + Cline + Cluade 3.7 を使って Crystal + Koyeb + Neon で作成した話&lt;/a&gt;&lt;/p&gt;

</description>
      <category>crystal</category>
      <category>koyeb</category>
      <category>neon</category>
      <category>cline</category>
    </item>
    <item>
      <title>Wombat - Syntax Highlighting with Rust's Bat Called from Crystal</title>
      <dc:creator>kojix2</dc:creator>
      <pubDate>Thu, 16 Jan 2025 04:26:50 +0000</pubDate>
      <link>https://dev.to/kojix2/wombat-syntax-highlighting-with-rusts-bat-called-from-crystal-1lo4</link>
      <guid>https://dev.to/kojix2/wombat-syntax-highlighting-with-rusts-bat-called-from-crystal-1lo4</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qdu0r580nz4zgold2nv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qdu0r580nz4zgold2nv.png" alt="wombat logos" width="630" height="125"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Hello!&lt;/p&gt;

&lt;p&gt;Have you heard of the command-line tool &lt;a href="https://github.com/sharkdp/bat" rel="noopener noreferrer"&gt;&lt;strong&gt;bat&lt;/strong&gt;&lt;/a&gt;, written in Rust?&lt;br&gt;&lt;br&gt;
&lt;strong&gt;bat&lt;/strong&gt; is a command-line tool similar to &lt;strong&gt;cat&lt;/strong&gt; that displays file contents in the terminal, but with additional features like line numbering, syntax highlighting, and paging.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bat hello.rb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the other hand, Crystal currently lacks a powerful syntax highlighting library.&lt;br&gt;&lt;br&gt;
So, I thought about using &lt;strong&gt;bat&lt;/strong&gt; as a library to solve this problem.&lt;/p&gt;
&lt;h2&gt;
  
  
  Bat Can Also Be Used as a Rust Library
&lt;/h2&gt;

&lt;p&gt;In fact, &lt;strong&gt;Bat&lt;/strong&gt; can also be used as a &lt;a href="https://docs.rs/bat/latest/bat/" rel="noopener noreferrer"&gt;Rust library&lt;/a&gt;. This is possible through the &lt;a href="https://docs.rs/bat/latest/bat/struct.PrettyPrinter.html" rel="noopener noreferrer"&gt;&lt;code&gt;PrettyPrinter&lt;/code&gt;&lt;/a&gt; struct.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;bat&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;PrettyPrinter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nn"&gt;PrettyPrinter&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;.input_from_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;b"&amp;lt;span style=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;color: #ff00cc&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;Hello world!&amp;lt;/span&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.language&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"html"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.print&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Bat&lt;/strong&gt; uses a library called &lt;a href="https://github.com/trishume/syntect" rel="noopener noreferrer"&gt;Syntect&lt;/a&gt; for syntax highlighting. However, Syntect is quite complex, so I thought it would be easier to use &lt;strong&gt;Bat&lt;/strong&gt; directly as a library.&lt;/p&gt;

&lt;p&gt;As the code above shows, the Rust side handles the output itself and writes it directly to the terminal. Originally, &lt;strong&gt;Bat&lt;/strong&gt; had no function that simply returned a syntax-highlighted string.&lt;/p&gt;

&lt;p&gt;In the open-source world, agility and the willingness to get hands-on are essential. So, after consulting with ChatGPT, I added a &lt;code&gt;print_with_writer&lt;/code&gt; function to &lt;strong&gt;PrettyPrinter&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;print_with_writer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Write&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;W&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This addition allows syntax highlighting of strings, as shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;bat&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;PrettyPrinter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;output_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;String&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="nn"&gt;PrettyPrinter&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;.input_from_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;b"&amp;lt;span style=&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;color: #ff00cc&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;Hello world!&amp;lt;/span&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.language&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"html"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.print_with_writer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;output_str&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_str&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I submitted a &lt;a href="https://github.com/sharkdp/bat/pull/3070" rel="noopener noreferrer"&gt;pull request&lt;/a&gt;, and it was successfully merged. Starting from &lt;strong&gt;bat v0.25.0&lt;/strong&gt;, this &lt;code&gt;print_with_writer&lt;/code&gt; function is now available.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating a C Library to Call Bat from C
&lt;/h2&gt;

&lt;p&gt;Rust libraries cannot be called directly from Crystal. Therefore, I decided to create a lightweight wrapper library that exposes &lt;strong&gt;Bat&lt;/strong&gt; through a C interface. Because many programming languages provide a way to call C libraries, this makes &lt;strong&gt;Bat&lt;/strong&gt; usable not only from Crystal but also from various other languages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kojix2/bat-c" rel="noopener noreferrer"&gt;https://github.com/kojix2/bat-c&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since I cannot read or write Rust, almost all of the code was generated with the help of ChatGPT and Copilot.&lt;/p&gt;

&lt;p&gt;I expect there will be more opportunities to create lightweight C wrappers for Rust libraries in the future, so I’ve noted down some of the things I learned.&lt;/p&gt;

&lt;p&gt;Rust is considered a low-level programming language, but C has an even lower level of abstraction compared to Rust. This gives more flexibility when designing the API for calling functions.&lt;/p&gt;

&lt;p&gt;For example, when calling a low-level C library from a high-level language like Python, the method signatures are uniquely defined. Think of how bindings are generated using &lt;strong&gt;libffi&lt;/strong&gt;. The uniqueness of method signatures is what makes &lt;strong&gt;libffi&lt;/strong&gt; bindings possible. Of course, after that, you would design a high-level API that aligns with object-oriented principles, but at the calling level, method signatures are strictly defined.&lt;/p&gt;

&lt;p&gt;However, calling Rust from C means calling a high-level library from a low-level language. This is similar to calling Python from C—since the level of abstraction decreases, the C-side interface is not strictly defined. This gives the developer some freedom in how to design the API. (To be precise, Python does have a C API, but imagine a scenario where that is abstracted away.)&lt;/p&gt;

&lt;p&gt;Therefore, even when using ChatGPT, it's important to carefully consider what kind of API you want to design and clearly specify that to the AI.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Added a function to display the version. This allows users to identify which version of &lt;strong&gt;bat-c&lt;/strong&gt; they are using. In this implementation, the version is stored as a constant in static memory for the program's entire lifetime, and a pointer to it is returned.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[no_mangle]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;extern&lt;/span&gt; &lt;span class="s"&gt;"C"&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;bat_c_version&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="nb"&gt;c_char&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;VERSION&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;concat!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;env!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"CARGO_PKG_VERSION"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\0&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;VERSION&lt;/span&gt;&lt;span class="nf"&gt;.as_ptr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="nb"&gt;c_char&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Memory allocation and deallocation for strings can easily become problematic. If the Rust library allocates memory for a string, it must also provide a function to free that memory on the Rust side.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cargo.toml Configuration&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Specify the library types under &lt;code&gt;[lib]&lt;/code&gt;. Both of the following are set:
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cdylib&lt;/code&gt; to generate a dynamic library.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;staticlib&lt;/code&gt; to generate a static library.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;rpath = true&lt;/code&gt; allows the dynamic library to be located using a relative path.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;[profile.release]&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LTO (Link Time Optimization):&lt;/strong&gt; Enabled to optimize and speed up the binary during linking.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;codegen-units = 1:&lt;/strong&gt; Sets the number of code generation units to 1 to maximize optimization.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;debug = false&lt;/strong&gt; and &lt;strong&gt;strip = true&lt;/strong&gt; are set to reduce file size, though the impact may be minimal.
&lt;/li&gt;
&lt;li&gt;Considered setting &lt;code&gt;opt-level&lt;/code&gt; explicitly (&lt;code&gt;3&lt;/code&gt; or &lt;code&gt;"z"&lt;/code&gt;) but left it at the default for balance.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[package]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"bat-c"&lt;/span&gt;
&lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.0.7"&lt;/span&gt;
&lt;span class="py"&gt;edition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"2021"&lt;/span&gt;

&lt;span class="nn"&gt;[lib]&lt;/span&gt;
&lt;span class="py"&gt;crate-type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"cdylib"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"staticlib"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="py"&gt;bat&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.25.0"&lt;/span&gt;

&lt;span class="nn"&gt;[profile.dev]&lt;/span&gt;
&lt;span class="py"&gt;rpath&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="nn"&gt;[profile.release]&lt;/span&gt;
&lt;span class="py"&gt;lto&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"fat"&lt;/span&gt;
&lt;span class="py"&gt;codegen-units&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="py"&gt;rpath&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;debug&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="py"&gt;strip&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;To automate library version updates, I introduced &lt;a href="https://mend.io/renovate" rel="noopener noreferrer"&gt;Renovate&lt;/a&gt;. Although I'm not very familiar with it, adding the following JSON file to the repository enables it to work:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;renovate.json&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://docs.renovatebot.com/renovate-schema.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"extends"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"config:best-practices"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"schedule:quarterly"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;I configured the workflow to automatically trigger a release when a Git tag is created.
&lt;/li&gt;
&lt;li&gt;Since bat-c is meant to be used strictly as a C library, I decided not to publish it as a crate with &lt;code&gt;cargo publish&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Calling bat-c from Crystal
&lt;/h2&gt;

&lt;p&gt;At this point, calling &lt;strong&gt;bat-c&lt;/strong&gt; from Crystal becomes straightforward. For this purpose, I created a library called &lt;strong&gt;wombat&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kojix2/wombat" rel="noopener noreferrer"&gt;https://github.com/kojix2/wombat&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The main challenge here is downloading the library and placing it where the Crystal build can find it.&lt;/p&gt;

&lt;p&gt;If &lt;strong&gt;bat-c&lt;/strong&gt; were a well-developed and widely used library, packaging it for installation via a package manager would be an option. However, that is not the case this time. Therefore, I decided to simply allow the latest version of the library to be downloaded from the GitHub Releases page.&lt;/p&gt;

&lt;p&gt;Although both static and dynamic libraries are available, I chose the static library. Rust makes static libraries easy to build, and unlike Ruby or Python, Crystal can link a static library directly into the compiled binary, which is a significant advantage.&lt;/p&gt;

&lt;p&gt;In Ruby, you can freely write custom tasks in a &lt;strong&gt;Rakefile&lt;/strong&gt;, but Crystal doesn’t offer that level of flexibility. The closest mechanism is the &lt;code&gt;postinstall&lt;/code&gt; hook in &lt;strong&gt;shards&lt;/strong&gt;, so I configured it to run a script that downloads the static library.&lt;/p&gt;

&lt;p&gt;I could have written the download logic in Crystal, but unfortunately Crystal's standard library is still limited in this area, and handling redirects or proxy environments takes extra work. Therefore, I wrote a shell script using &lt;strong&gt;curl&lt;/strong&gt; for Unix-like systems and a batch file for Windows, and the appropriate script runs depending on the OS.&lt;/p&gt;
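
&lt;p&gt;As a rough illustration, the Unix side of such a download step could look like the sketch below. It is not the actual script from wombat: the repository slug &lt;code&gt;kojix2/bat-c&lt;/code&gt;, the asset naming scheme, and the helper names &lt;code&gt;asset_name&lt;/code&gt;, &lt;code&gt;release_url&lt;/code&gt;, and &lt;code&gt;download_lib&lt;/code&gt; are all assumptions made for this example. The parts it relies on are GitHub's &lt;code&gt;releases/latest/download/&lt;/code&gt; URL form and curl's &lt;code&gt;-L&lt;/code&gt; flag, which follows the redirects those URLs produce.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch of a download helper that a post-install step could call.
# The repository slug and the asset naming scheme below are assumptions.
set -eu

REPO="kojix2/bat-c"   # assumed GitHub repository slug

# Map an OS name and a CPU name (as printed by uname) to a hypothetical asset name.
asset_name() {
  case "$1" in
    Linux)  os="linux" ;;
    Darwin) os="macos" ;;
    *)      echo "unsupported OS: $1" > /dev/stderr; return 1 ;;
  esac
  case "$2" in
    x86_64|amd64)  arch="x86_64" ;;
    arm64|aarch64) arch="aarch64" ;;
    *)             echo "unsupported CPU: $2" > /dev/stderr; return 1 ;;
  esac
  echo "libbat_c-${os}-${arch}.a"
}

# Build the "latest release" download URL for a given asset.
release_url() {
  echo "https://github.com/${REPO}/releases/latest/download/$1"
}

# Download the static library for the current machine into a directory.
# -f: fail on HTTP errors, -sS: quiet but still show errors, -L: follow redirects.
download_lib() {
  dest_dir="$1"
  asset="$(asset_name "$(uname -s)" "$(uname -m)")"
  mkdir -p "$dest_dir"
  curl -fsSL -o "${dest_dir}/${asset}" "$(release_url "$asset")"
}

# Only download when explicitly asked, so the helpers can be checked offline.
if [ "${1:-}" = "--download" ]; then
  download_lib "${2:-lib}"
fi
```

&lt;p&gt;Running the script with &lt;code&gt;--download lib&lt;/code&gt; would fetch the library for the current machine into &lt;code&gt;lib/&lt;/code&gt;. Without that argument it only defines the helpers, which keeps them easy to check without network access.&lt;/p&gt;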

&lt;h2&gt;
  
  
  How to Use
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Sample Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight crystal"&gt;&lt;code&gt;&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"../src/wombat"&lt;/span&gt;

&lt;span class="c1"&gt;# Output the file content with syntax highlighting by calling the Rust function&lt;/span&gt;
&lt;span class="no"&gt;Wombat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pretty_print_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kp"&gt;__FILE__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Output the highlighted string of the input by calling the Rust function&lt;/span&gt;
&lt;span class="no"&gt;Wombat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pretty_print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;input: &lt;/span&gt;&lt;span class="sx"&gt;%{fn main() { println!("Hello, world!"); }}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;language: &lt;/span&gt;&lt;span class="s2"&gt;"Rust"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Get the highlighted string of the input&lt;/span&gt;
&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="no"&gt;Wombat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pretty_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sx"&gt;%{puts "Hello, World!"}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;language: &lt;/span&gt;&lt;span class="s2"&gt;"Crystal"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;theme: &lt;/span&gt;&lt;span class="s2"&gt;"TwoDark"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Running in GitHub Actions
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flt1qw71nauhog41857nu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flt1qw71nauhog41857nu.png" alt="GitHub Actions Wombat" width="577" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For more details, please refer to the &lt;a href="https://kojix2.github.io/wombat/" rel="noopener noreferrer"&gt;API Documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current Issues and Areas for Improvement
&lt;/h2&gt;

&lt;p&gt;Although I accomplished most of what I set out to do, a few concerns remain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The size of the generated library is quite large.
&lt;/li&gt;
&lt;li&gt;Ideally, C compatibility should have been part of the design from the start at the Rust level. Because the C wrapper was added afterward, the internal structure might not be as efficient as it could be.
&lt;/li&gt;
&lt;li&gt;Due to my limited knowledge of Rust and C, the API design might not be as refined as it could be.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That said, the main goal of calling &lt;strong&gt;Bat&lt;/strong&gt; as a library from Crystal with minimal maintenance has been mostly achieved. In addition, having an established way to call Rust from Crystal opens up new possibilities for future projects.&lt;/p&gt;

&lt;p&gt;Of course, since this is a hobby project developed by an individual, there may be inconveniences or bugs. If you notice any issues, I would greatly appreciate it if you could submit an issue or a pull request.&lt;/p&gt;

&lt;p&gt;That’s all for this article. Have a great day!&lt;/p&gt;




&lt;p&gt;Original post in Japanese on Qiita - &lt;a href="https://qiita.com/kojix2/items/1b739c2a3eaddea8a3bb" rel="noopener noreferrer"&gt;Wombat - RustのBatをCrystalから呼び出しシンタックスハイライティングする&lt;/a&gt;&lt;br&gt;
Translated into English by ChatGPT.&lt;/p&gt;

</description>
      <category>crystal</category>
      <category>rust</category>
      <category>c</category>
      <category>bat</category>
    </item>
  </channel>
</rss>
