DEV Community: kojix2

I Think Ruby Isn’t Dynamic Enough…

kojix2 — Mon, 04 May 2026 14:58:40 +0000

This article is a subjective personal essay, not a rigorous technical argument.

For the past few years, I have become something of a Crystal believer. Crystal is a language that has achieved high performance by accepting strong constraints. Ruby, by contrast, is a language whose strength lies in dynamic qualities that Crystal cannot have. Looking at recent movements in Ruby from the perspective of a Crystal believer, I sometimes find myself thinking: “That is an area Crystal people have been digging into for years, and Ruby’s real strengths are not really there, are they…?” I have not been able to share this feeling with many people, which has been frustrating.

I myself only really understand Ruby and Crystal, so I have not had much confidence in what I have been thinking, and have spent my time in a somewhat vague state. But if I do not write down these feelings, I will no longer be able to refer back to them, so I decided to summon some courage and write this personal essay.

Ruby Is Not Object-Oriented Enough Either

The incompleteness I feel in Ruby is that many operations lack reversibility. You can define a variable, but it is difficult to cleanly delete it. There is include for modules, but there is no de-include. Mechanisms such as remove_method, remove_const, undef_method, UnboundMethod, and define_method do exist, but there does not seem to be a consistent reversible model for taking methods or behavior out of one structure and safely transplanting them into another object structure.
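
The asymmetry can be seen in a few lines of Ruby. The following is only a small sketch: Greeter, Person, and reversibility_demo are hypothetical names made up for illustration, and it touches just three of the mechanisms mentioned above (include, remove_method / undef_method, and UnboundMethod).

```ruby
module Greeter
  def greet
    "hello from Greeter"
  end
end

class Person
  include Greeter

  def greet
    "hello from Person"
  end
end

def reversibility_demo
  person = Person.new
  log = {}

  log[:own_method] = person.greet

  # remove_method deletes only Person's own definition,
  # so method lookup falls back to the included module.
  Person.send(:remove_method, :greet)
  log[:after_remove] = person.greet

  # undef_method blocks the name on Person entirely,
  # even though Greeter still defines it.
  Person.send(:undef_method, :greet)
  log[:after_undef] = person.respond_to?(:greet)

  # The module stays in the ancestor chain; nothing built in undoes include.
  log[:still_included] = Person.include?(Greeter)
  log[:has_uninclude] = Person.respond_to?(:uninclude, true)

  # An UnboundMethod taken from a class cannot be bound to an unrelated object.
  log[:transplant_error] =
    begin
      String.instance_method(:upcase).bind(Object.new)
      nil
    rescue TypeError => e
      e.class
    end

  log
end

DEMO = reversibility_demo
DEMO.each { |key, value| puts "#{key}: #{value.inspect}" }
```

Each step removes or blocks something, but none of them restores the earlier state, and the last step shows that behavior cannot simply be lifted out of one class and attached to an arbitrary object.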

Ruby is considered a dynamic language, and it permits all kinds of changes at runtime. But that freedom seems to work strongly in the direction of “adding things later.” The freedom to remove what has been added, to decompose structures and reassemble them into another form, or to undo such changes, does not seem to have been systematized very much.

Perhaps Ruby does not have enough of the qualities of a dynamic language.

Ideally, I think it would be interesting if there were a Ruby implementation that, like machine learning, could be given input data and expected output, and then explore at the meta-level, at runtime, how to optimize its object structure. As a foundation for that, I imagine it would need mechanisms that allow it to observe, transform, and reconstruct its own objects, although I do not know whether such a thing is truly possible.

Even if something like that were realized, in practice it might end up being separated into two stages: “generation of object structures through learning or compilation,” and then “execution.” I feel that would be rather boring.

What I Hope to See from Ruby

I am deeply absorbed in Crystal, and have drifted a little away from Ruby. There are several people like that. Seeing this, it would not be strange if some people thought Ruby should also become capable of doing more Crystal-like things.

But what is actually needed is the opposite. Crystal has structural constraints that it simply cannot escape. Crystal is a language that achieves speed and low memory usage by placing strong constraints on Ruby. Since I am a Crystal believer, I think that if you want to make a language do Crystal-like things, Crystal is better at that. There is nothing interesting about Ruby trying to do the same thing. I want to see what only Ruby can do.

Compared with Crystal, Ruby is a language widely used in industry, so I think there are ways in which it cannot move freely. A language that can freely transform the structure of objects at runtime would be dangerous and would probably not be welcomed by industry. Still, isn’t it strange that, among mainstream languages, Ruby is treated almost as if it were the most dynamic language? I cannot shake the feeling that there remains a vast frontier in the world of languages even more dynamic than Ruby.

I hope that someday I will see an attempt to expand the very world of programming itself into an even more dynamic realm.

This post was machine-translated from Japanese into English using ChatGPT.

Porting Libraries to Crystal with AI

kojix2 — Sat, 02 May 2026 06:40:43 +0000

This post was originally written in Japanese and translated into English by the author using ChatGPT. The original post is available here.

Introduction

AI coding tools have become much better, and I now write code by hand far less often than before. I use GitHub Copilot in VSCode through the OSS benefit program, and recently I have also been trying Claude Code and Codex.

Porting libraries to the Crystal language

Compared with other programming languages, Crystal does not have a very large library ecosystem.

However, many open source libraries have licenses that allow porting, and AI can provide a lot of help with porting work. Because of this, when a library is missing, it is becoming realistic to consider porting as an option.

In this article, I want to put into words and record what I actually do when porting libraries.

Choosing a reference library

The first step is to choose the library to use as a reference.

In Crystal, compared with C or Go, a higher level of abstraction is often expected. On the other hand, Crystal cannot use metaprogramming as freely as Ruby or Python.

In that sense, Crystal is a language somewhere in the middle, and it is not always possible to choose a single reference library. Sometimes it is necessary to look at multiple libraries and think about what would be best. In my case, I often look at active Rust and Go projects, while using Ruby and Python APIs as references. When the reference is written in Rust, binding to it may sometimes be better than porting it.

Deciding between porting and binding in this way is the first step. If I decide that porting is the right approach, I move on. There may also be cases where porting and bindings are mixed.

Checking the license

The most important point is the license. I check whether the original library is under a license that allows porting, such as MIT, BSD, Apache-2.0, or another license whose conditions I can comply with. I try to make the new library inherit the license terms of the original.

I also clearly state where the original project came from. In my case, I add the reference repository as a whole using git submodule. This also fixes the version of the code that Coding Agents refer to. It helps avoid unnecessary misunderstandings and trouble.

When creating a final "tool", it may be reasonable to have multiple reference sources. But when creating a "library" by porting or binding, I think it is easier to maintain the project later if the main reference source is kept to one.

Making an overall plan with the web version of ChatGPT

First, I upload an archive of the source library, either a tarball or a zip file, to ChatGPT and ask for an overall policy for porting the whole library to Crystal. I upload an archive simply because there is a limit on the number of files that can be uploaded.

Why do I start with the web version of ChatGPT instead of a Coding Agent?

The reason is based on experience: I tend to get better results when I first discuss the whole thing with the web version. This is only a guess, but I think there are probably two reasons.

The first is the efficient use of internet search. Compared with local Coding Agents, the web version of ChatGPT is better at search. It can look through websites, technical blogs, and GitHub, and explore the policy from a wider point of view. When something is vague or unknown, it can search on its own and refine the plan.

The second reason is probably that the cloud environment has more efficient access to documentation. Compared with searching a codebase locally while spending tokens, searching code in the cloud seems to work better, at least in my experience.

Once the overall policy is produced, I ask additional questions about unclear points or my own requirements. At this stage, it may be useful to limit the amount of information by saying something like, "Please answer in three lines or less," so that ChatGPT's responses do not become too long or go off on a tangent.

When building a Crystal library as a binding to a static language such as C, C++, or Rust, it is important to discuss the boundary of the binding, the build system, and the infrastructure, and to make sure the understanding is aligned.

The desirable Crystal API design is often different from the language of the original library, such as Rust or Go. In such cases, I may also upload Ruby libraries as reference material and ask what kind of API would be desirable. However, Crystal is not Ruby, so simply making the API the same as Ruby is not necessarily the right answer. If I really want to make something good, I have to check it myself.

Once the architecture and design are agreed on, I ask ChatGPT to write it down as a self-contained and executable plan, usually in a file such as PLAN.md. I often have PLAN.md written in Japanese so that I can read it easily.

However, PLAN.md may contain rough notes, wrong assumptions, or unorganized working context. For that reason, I usually do not publish this file as-is. Instead, I try to preserve important design decisions and caveats in a form that can be read later, such as in the README, issues, or commit messages.

That said, if the tool is not just for myself and I want many people to use it, I may ask for PLAN.md to be written in English from the beginning.

Doing the porting work locally

Because I want to make use of Copilot's free quota, I use VSCode locally. Recently, however, CLI agents have also become more capable, and it is becoming possible to use editors of one's choice, such as Zed.

First, I initialize the project repository.

I decide the Crystal project name, create a skeleton with crystal init lib piyo, add the reference repository as a submodule, and place PLAN.md in the repository.

I still do not know whether PLAN.md should be made more like a TODO list.

I do not want to lose the work, so I create a new private repository on GitHub and prepare to push the project there. After that, I ask the Coding Agent to proceed with the project while looking at PLAN.md.

During the work cycle, I periodically run crystal tool format for formatting and crystal spec for tests.

In my case, I run the linter, ameba --no-color, by myself later, and then pass the result to the Agent and ask it to fix the issues in a batch.

I also periodically ask it to review the plan, update it, or delete parts that have been completed.

Coding agents make rapid progress at first, but at some point they often start taking very small steps, and the work stops moving forward.

In such cases, I stop the agent for a while and ask questions such as: "What is the current problem?", "Are there places that should be refactored before continuing?", "Is there anything unrealistic in the plan itself?", or "Is there anything you want me to do, such as setting up the environment?"

In some cases, I make a tarball of the current state, upload it to ChatGPT again, ask it to evaluate the whole situation from a broader perspective, and have it create PLAN2.md.

However, this kind of workflow is only possible because I am working alone and will eventually publish everything on GitHub. I do not think it is feasible when multiple people are coding together at work.

Writing tests with fixtures

Some reference libraries provide fixtures for tests. When the original repository is added as a submodule, those fixtures can also be referenced, so they can be used in the Crystal-side tests as well.

However, my honest impression is that aiming for parity is not easy, because of differences caused by floating point errors, random numbers, race conditions, and so on.

To absorb differences in random numbers, one possible method is to prepare a small piece of code in the original language that generates random numbers, save those generated numbers as a kind of fixture in JSON or another format, and use those values in Crystal tests. This method does not always work, but there are cases where it does.
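
As a rough sketch of this idea, assume for illustration that the reference library is written in Ruby. The helper names random_fixture and dump_fixture, the seed, and the JSON layout are all my own assumptions, not part of any particular library.

```ruby
require "json"

# Hypothetical helper: draw `count` floats from a seeded generator in the
# reference language so that the port can replay exactly the same values.
def random_fixture(seed:, count:)
  rng = Random.new(seed)
  { "seed" => seed, "values" => Array.new(count) { rng.rand } }
end

# Serialise the fixture; the resulting string can be saved next to the
# other test fixtures under any file name you like.
def dump_fixture(fixture)
  JSON.pretty_generate(fixture)
end

fixture = random_fixture(seed: 42, count: 5)
json = dump_fixture(fixture)
puts json

# Reading the JSON back yields the same numbers, so both sides of the
# port can be tested against identical "random" input.
puts JSON.parse(json) == fixture
```

The Crystal spec would then load these values instead of calling its own random number generator. This only helps when the algorithm consumes random numbers in the same order on both sides, which is one reason the method does not always work.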

Adding GitHub Actions

Once the project has taken some shape, I add several GitHub Actions workflows.

docs.yml for generating documentation
build.yml for building and releasing CLI tools
test.yml for tests
dependabot.yml, or Renovate, for updating GitHub Actions

I end up adding these almost every time. In my case, I have a toy project called lolcat.cr, and I often copy workflows from it and then modify them.

After adding them, I adjust things until the tests and builds pass, and then add badges to the README.

Keeping README.md simple

This depends on the purpose. In my case, most of the AI-ported libraries I create are for my own use. Compared with the time when I wrote libraries by hand, my feeling has changed a little: I do not really think that I want the library to become widely used. Maybe I instinctively want to avoid losing time to maintenance or spending more money on tokens.

If README.md is too decorated, it starts to look like it was made by AI. A project with a gorgeous README but no maintenance feels, to me, like the ruins of a theme park with no afterglow. It does not leave a good impression.

So I usually ask for README.md to be written in a "simple, plain, and purely practical" style.

Deciding the granularity of commits

As a general principle, in order to run a trustworthy project, it is desirable to avoid force push and leave a transparent commit log. This is especially true when developing together in an open source community. Through pull requests and reviews, you can interact with people around the world and come to know what kind of people they are. I think this is one of the pleasures of participating in an open source community.

However, best practices for Git workflows in solo personal development that depends heavily on AI have not yet been established. It is necessary to record why a certain piece of code was included, but I think it is better to lean on Git than to attach a separate memory system only for AI.

That said, Git is chronological. When reordering commits, even if the final state of the code is the same, force push becomes necessary. In the AI era, I feel that we may need a version control method based on semantics, one that can rebase without depending so strongly on chronological order.

For now, personally, I ask AI to write commit messages, and then I manually commit them myself.

I think this works as a minimal check to confirm that the human goal and the AI's work target are aligned. But there is also criticism that this is laundering or hiding AI-written code as if it had been written by a human, and I do not think it can be called a best practice.

Embarrassingly, I also use force push a lot.

Especially in the early stage of a private repository where I am progressing through a plan, I repeatedly use --amend and force push, effectively using Git as a kind of "overwrite save" without leaving much history.

Of course, this is mainly about private repositories before publication. The same thing should not be done on a shared public branch.

Conclusion

What I have written here reflects the situation as of April 2026.

I hope that when I reread this later, I will be able to think, "So that was what things were like at that time."

Coding Agents have made it possible to ask for explanations of algorithms and ideas that were previously difficult to understand, at any desired level of detail. They have also made it easy to quickly implement and try out ideas that come to mind. This really is revolutionary.

At the same time, I sometimes get too absorbed in AI coding, work for too many hours, and feel that it may harm my health. I think I need to be a little careful about that. Health comes first.

At the beginning, I wrote that this article was written by hand, not by AI.

Note added for the English version: This refers to the original Japanese version. This English translation was made with ChatGPT.

I wrote that on purpose because these days I often generate text with AI too, and I wanted to deliberately do something different this time.

That is all for this article.

Why Is Crystal Compilation So Slow?

kojix2 — Mon, 08 Dec 2025 13:30:24 +0000

Introduction

The Crystal programming language is notorious for its slow compilation times.

But have you ever wondered where Crystal actually spends most of its compilation time?

Figure: Crystal uses LLVM as its backend

The Crystal Compilation Pipeline

The Crystal compiler's compilation process consists of the following stages:

new_program - Creating the program object
parse - Lexical analysis and parsing
semantic - Semantic analysis
codegen - Generating object files

module Crystal
  class Compiler
    def compile(source : Source | Array(Source), output_filename : String) : Result
      source = [source] unless source.is_a?(Array)
      # 1 new_program
      program = new_program(source)

      # 2 parse
      node = parse program, source

      # 3 semantic
      node = program.semantic node, cleanup: !no_cleanup?

      # 4 codegen
      units = codegen program, node, source, output_filename unless @no_codegen

      # 5 cleanup
      # ... omission ...
      Result.new program, node
    end
  end
end

After this, linking is performed by the standard linker.

Command-Line Options for Compilation Statistics

Crystal provides a command-line option that displays compilation time statistics:

crystal build -s hoge.cr

However, this option does not report the time spent inside native LLVM functions, so it was insufficient for this article's investigation.

To get to the heart of the matter, I used print debugging to measure the compilation time.

Native LLVM Functions Called During Codegen

During the codegen stage, the following native LLVM functions are called:

LibLLVM.run_passes
- Applies optimization passes to LLVM IR
LibLLVM.target_machine_emit_to_file
- Generates object files

I measured the execution time of these functions using print debugging as well.

Results

Here are the results from compiling the Crystal compiler itself:

Stage	Time (seconds)
new_program	0.000388207
parse	0.000065000
semantic	12.552620028
codegen	355.245409133
- LibLLVM.run_passes	252.340241198
- LibLLVM.target_machine_emit_to_file	93.280652845
cleanup	0.000013180
total	367.798495548

Let me visualize this with a bar chart:

NOTE: This graph is from the original article and may differ slightly from the latest compiler.

Were the results what you expected?

Lexical analysis and parsing take virtually no time!
Semantic analysis (including type inference) also takes relatively little time!

In fact, the vast majority of the compilation time is spent in codegen, specifically in:

LibLLVM.run_passes
LibLLVM.target_machine_emit_to_file

These are external LLVM function calls that happen outside of Crystal's control!

In this case, where the Crystal compiler itself was built with --release, the majority of compilation time was spent on LLVM optimization and code generation.
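
To put numbers on "the majority", the shares can be computed directly from the results table. This is just arithmetic on the measured values; the constant and method names (STAGES, LLVM_CALLS, share) are mine.

```ruby
# Timings in seconds, copied from the results table.
STAGES = {
  "new_program" => 0.000388207,
  "parse"       => 0.000065000,
  "semantic"    => 12.552620028,
  "codegen"     => 355.245409133,
  "cleanup"     => 0.000013180,
}.freeze

# The two native LLVM calls measured inside codegen.
LLVM_CALLS = {
  "LibLLVM.run_passes"                  => 252.340241198,
  "LibLLVM.target_machine_emit_to_file" => 93.280652845,
}.freeze

TOTAL = STAGES.values.sum

# Percentage of the total compilation time, rounded to one decimal place.
def share(seconds)
  (100.0 * seconds / TOTAL).round(1)
end

STAGES.merge(LLVM_CALLS).each do |name, seconds|
  puts format("%-38s %5.1f%%", name, share(seconds))
end
puts format("%-38s %5.1f%%", "both LLVM calls", share(LLVM_CALLS.values.sum))
```

With these figures, codegen accounts for about 96.6% of the total, the two LLVM calls together for about 94.0%, and semantic analysis for about 3.4%.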

This might be a somewhat surprising result, don't you think?

How to Speed Up the Crystal Compiler

The parts of the Crystal compiler implemented in Crystal—namely lexical analysis, parsing, and semantic analysis—are already sufficiently fast. This means that to achieve further speedups, we would need hardcore approaches such as:

Introducing parallelization even in release builds
Optimizing LLVM itself (specifically for Crystal)
Improving Crystal to generate LLVM IR that's easier for LLVM to process

However, since these approaches aren't very practical for everyday use, let me introduce a more accessible method:

Use `-O3` Instead of `--release`

In the Crystal compiler, specifying --release is equivalent to specifying both -O3 and --single-module. If you're willing to sacrifice some optimization, you can specify only the -O3 option, which enables parallelization and can speed up compilation in many cases.

From here on, there's a bit of a speculative element to the discussion.

Why Crystal Doesn't Have Incremental Compilation or Shared Library Support

Crystal's `--release` Mode Includes `--single-module`

Crystal struggles with splitting code into separate compilation units and reusing the results. In particular, --release builds enable --single-module, which compiles everything into one massive LLVM module for optimization.

For comparison, Rust performs separate compilation for each crate even with --release. In Rust, you need to explicitly use -C lto=fat to get behavior similar to Crystal's, where the entire LLVM IR is optimized together.

Crystal's Weak Caching Mechanism

Crystal does have a mechanism that caches LLVM bitcode files (.bc) and object files on a per-type basis during normal builds, and can reuse object files only when the bitcode is completely unchanged.

This allows the compiler to skip the expensive object file generation step in some cases.

However, even in such cases, lexical analysis, parsing, and semantic analysis cannot be skipped. The comparison only happens after generating .bc files. And as we'll discuss later, cases where the bitcode is completely unchanged are actually quite rare.

Crystal Is a Statically-Typed Language Where the Caller Determines Types

Why can't Crystal split packages into multiple LLVM IR modules, precompile them, and reuse the results?

The main reason is that Crystal has strong type inference and union types, and the concrete types of methods change depending on the calling context.

Crystal is an unusual statically-typed language where the caller determines the types, enabling duck typing. However, the trade-off is that type signatures need to be inferred with every compilation.

Type IDs Change with Each Compilation

The Crystal compiler assigns a number to every class to resolve types. With each compilation, every type that appears gets assigned a "number." Let's say class A gets assigned the number "10" in one compilation. If you make a small change to the code and recompile, "10" might be assigned to a different class. Linking object files created this way causes type inconsistencies and fails, because conditional branches based on types won't work correctly.
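
The effect can be imitated with a toy model. Note that this is not the Crystal compiler's algorithm: the numbering rule (alphabetical order), the class names, and the helper names are all assumptions chosen only to make the shifting visible.

```ruby
# Toy model only: the numbering rule below (alphabetical order) is an
# assumption for illustration, not how the Crystal compiler assigns IDs.
def assign_type_ids(type_names)
  type_names.sort.each_with_index.to_h { |name, index| [name, index + 1] }
end

# Names whose ID differs between two compilations.
def renumbered(before, after)
  before.keys.select { |name| after.key?(name) && after[name] != before[name] }
end

FIRST_BUILD  = assign_type_ids(%w[Cat Dog Fish])
SECOND_BUILD = assign_type_ids(%w[Bird Cat Dog Fish]) # one class added

puts "ID 1 in the first build:  #{FIRST_BUILD.key(1)}"
puts "ID 1 in the second build: #{SECOND_BUILD.key(1)}"
puts "renumbered: #{renumbered(FIRST_BUILD, SECOND_BUILD).join(', ')}"
```

In this model, adding a single class renumbers every other class, so an object file that was compiled when "1" meant Cat would branch incorrectly once "1" means Bird.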

Additionally, when loading multiple Crystal shared libraries simultaneously, there's the problem of runtime functions being multiply defined.

This makes it difficult for Crystal to split code into parts, precompile them, and reuse them later.

But is this an inherent characteristic of the Crystal language? Let's consider this from a more social context.

The Crystal Language Community and Resource Constraints

Crystal is known as a language with Ruby-like concise syntax that delivers excellent performance.

However, the Crystal development team has limited resources. While there is a dedicated team at Manas.Tech and community contributors worldwide, the resources are still limited compared to large corporations.

For instance, imagine if Apple were developing Crystal.

Apple engineers might make changes to clang/LLVM itself to significantly improve compilation speed.

Or, like Swift, they might define a proper ABI and create an intermediate language or binary format well-suited to Crystal. Similar to how Swift has SIL (Swift Intermediate Language) as an intermediate representation before converting to LLVM IR, Crystal could have its own optimized intermediate language. This would enable comparing modules at that stage, resolving types, and generating object files from there. (Though I'm not entirely sure if this is possible within the LLVM framework.)

However, the Crystal compiler we have isn't like that. It generates monolithic, massive LLVM IR and delegates all optimization to LLVM. For package management, downloading source code directly from GitHub is the mainstream approach.

There still seems to be room for improvement.

The characteristic of slow compilation but fast execution is not purely a linguistic characteristic of Crystal, but also stems from the resource constraints of the Crystal development team. In other words, if significant resources were invested in development in the future, these issues could potentially be improved.

Conclusion

Designing an ABI specification or intermediate language for Crystal is extremely difficult. However, if someone achieves this, it could become Crystal 2.0 or Crystal 3.0.

Even without going that far, finding ways to split the generated LLVM IR into multiple modules, or mangling function names and global variables, would represent significant progress.

Crystal doesn't have as vibrant a library ecosystem as some other languages. While the reasons aren't entirely clear, as we improve the environment for code reuse, techniques for improving compilation speed may also develop.

That's all for this article. Thank you for reading to the end!

This article was originally written in 2024 and revised in December 2025. It was translated from Japanese to English using Claude Sonnet.

A Practical Guide to Parallel Programming in Crystal (2025)

kojix2 — Fri, 21 Nov 2025 07:24:48 +0000

This article is based on content created by kojix2 (a human) alternately calling DeepWiki and ChatGPT, but kojix2 (a human) has reviewed, edited, and proofread the entire text. The article was translated from Japanese to English using Claude. If you find any mistakes, please comment. Thank you.

Crystal's parallel processing is based on a hybrid model that primarily uses Fiber (cooperative and lightweight) and utilizes Thread (OS threads) when necessary.

ExecutionContext, which has been rapidly developed since around 2024-2025, provides a new abstraction layer for safely spreading Fibers across multiple threads.

This article organizes the latest parallel execution model in Crystal.

Building with Parallel Execution Enabled

As of November 19, 2025, you need to use the following two flags:

-Dpreview_mt: Enables parallel execution of Fibers
-Dexecution_context: Enables the use of ExecutionContext

crystal build -Dpreview_mt -Dexecution_context program.cr

Although Crystal's parallel execution is still a preview feature, it was first released more than six years ago and works without issues in many cases.

Overview of Crystal's Concurrency and Parallelism

Crystal has five major execution models:

Model	Execution Unit	Characteristics
Fiber (default)	Fiber (lightweight thread)	Cooperative, automatic switching on I/O, lightweight
ExecutionContext::Concurrent	Fiber group	Sequential execution on 1 thread (concurrent)
ExecutionContext::Parallel	Fiber group	Execution on multiple threads (parallel)
ExecutionContext::Isolated	1 Fiber + 1 dedicated thread	For GUI loops and blocking FFI calls
Thread	OS thread	For handling low-level operations

The standard design is as follows:

Use Fiber as the basis
Use ExecutionContext only where parallelism is needed

Cooperative Scheduling of Fiber and I/O

Fiber is a cooperative execution model that has existed for a while. By default (when parallel execution is disabled), switching occurs only at the following points:

I/O
sleep
Channel receive/send
Fiber.yield

At each of these points, Fiber.suspend is called and the Fiber is suspended.
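
For readers more familiar with Ruby, the idea of "switching only at explicit points" can be sketched with Ruby's own Fiber class. This is an analogy, not Crystal's API: in this sketch the fibers are resumed by hand, whereas Crystal's scheduler picks the next Fiber automatically. The method name cooperative_trace is hypothetical.

```ruby
# Ruby analogue for illustration; Ruby fibers are resumed by hand here,
# whereas Crystal's scheduler chooses the next Fiber automatically.
def cooperative_trace
  trace = []

  a = Fiber.new do
    trace << "a: step 1"
    Fiber.yield # explicit switch point
    trace << "a: step 2"
  end

  b = Fiber.new do
    trace << "b: step 1"
    Fiber.yield # explicit switch point
    trace << "b: step 2"
  end

  # A hand-written round-robin "scheduler": control changes hands
  # only where a fiber calls Fiber.yield or finishes.
  [a, b, a, b].each(&:resume)
  trace
end

puts cooperative_trace
```

Neither fiber is ever interrupted between its own steps; the interleaving comes entirely from the yield points.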

The basic approach in Crystal is to put I/O-bound processing on Fibers.

Each Fiber has its own stack memory. The stack has a virtual size of 8MiB, but it's only reserved, and actual memory usage starts from 4KiB.

What is a "Stack" in Crystal?

When reading Crystal documentation, you'll encounter the word "stack." Note that this differs from the general meaning of "stack" - it refers to a "memory region that behaves like a stack," which is actually memory allocated from the OS heap.

What is placed on the stack:

Value types: Struct, Tuple, StaticArray, etc.
Primitive types: Int32, Float64, Bool, Char, etc.
Pointers to reference types: Array, Hash, etc. (The reference type objects themselves are placed on the heap, but the pointers to them are placed on the stack)

Values placed on the stack are not directly targeted by GC, but they are scanned during GC execution to prevent heap objects referenced by stack variables from being mistakenly collected.

As described later, the key point is that when captured by closures like spawn do end, the above value types are exceptionally placed on the heap and become accessible from other threads.

Background Knowledge: Thread / Scheduler / Fiber

In Crystal, each thread has its own Crystal::Scheduler that manages the fibers to be executed.

Main Thread Creation and Initialization

The main thread is automatically created by the OS when the program starts. Subsequently, when Thread.current is called, a Thread object for the main thread is created. The stack address of the main thread is obtained with the stack_address method. This is the actual thread stack allocated by the OS when the process starts.

Main Fiber Creation

When the Thread object is initialized, the main Fiber is created simultaneously. The main Fiber uses a special constructor Fiber.new(stack : Void*, thread) to utilize the OS thread stack. Unlike normal Fibers, makecontext is not called, and it uses the already running context.

Lazy Initialization of Scheduler

The main thread's scheduler is initialized when Thread#scheduler is called. The scheduler has:

@event_loop: Platform-specific event loop
@stack_pool: Fiber stack reuse pool
@runnable: Queue of runnable fibers
@main: Thread's main fiber

Default Thread Configuration

Without using ExecutionContext and preview_mt, only the main thread exists. The main thread has its own Crystal::Scheduler instance, which manages all fibers.

Stack Allocation for New Fibers

When a new Fiber is created, stack memory is obtained from Fiber::StackPool. When a Fiber terminates, its stack is returned to the pool through StackPool.release for reuse by the next Fiber. Stack allocation reserves 8MiB of virtual address space. Only the bottom page of the stack (4KiB) is committed to physical memory. When the stack grows and reaches a guard page, that page's guard status is removed and a new guard page is committed. This continues until reserved pages run out.
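
A quick calculation shows why reserving 8MiB per Fiber is affordable. This is arithmetic on the figures above only, assuming a 4KiB page size and that each Fiber touches nothing beyond its first page; the constant and method names are mine.

```ruby
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

RESERVED_PER_FIBER  = 8 * MIB # virtual address space reserved per stack
COMMITTED_PER_FIBER = 4 * KIB # initially committed page (assumed page size)

# Number of 4KiB pages that fit into one reserved stack.
PAGES_PER_STACK = RESERVED_PER_FIBER / COMMITTED_PER_FIBER

# Reserved vs. initially committed memory for a given number of Fibers.
def stack_footprint(fiber_count)
  {
    reserved_gib:  fiber_count * RESERVED_PER_FIBER / GIB.to_f,
    committed_mib: fiber_count * COMMITTED_PER_FIBER / MIB.to_f,
  }
end

puts "pages per stack: #{PAGES_PER_STACK}"
puts stack_footprint(10_000)
```

Under these assumptions, 10,000 Fibers reserve about 78 GiB of virtual address space but initially commit only about 39 MiB of physical memory.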

Parallel Execution with ExecutionContext

ExecutionContext is a "virtual thread group" that executes Fibers together.

ExecutionContext::Concurrent

This is the same concurrent execution as traditional Fibers. It's safe and easy to handle.

ctx = Fiber::ExecutionContext::Concurrent.new("workers")

Only one Fiber executes at a time within the context
Therefore, contention on shared variables does not occur (however, using Mutex/Atomic is still recommended as a safety measure)

Suitable when parallelization is unnecessary but you want to use Fibers.

ExecutionContext::Parallel

Parallel execution on multiple threads.

ctx = Fiber::ExecutionContext::Parallel.new("workers", 8)

Changing parallel size during execution:

ctx.resize(count)

Each thread runs its own scheduler
- The scheduler is an instance of the Fiber::ExecutionContext::Parallel::Scheduler class, responsible for executing individual Fibers. It has a local queue and manages runnable Fibers. It searches for and executes Fibers in the main loop (run_loop).
Fibers within the context are moved to and executed on arbitrary threads
- When a Fiber moves between threads, only the execution context (registers and stack pointer) actually moves. The Fiber's stack memory (heap from the OS perspective) does not move. This memory region is fixed during the Fiber's lifetime. When a Fiber resumes on a new thread, the saved stack pointer is loaded and points to the original stack memory region.
Due to parallelism, Atomic / Mutex is mandatory for shared mutable state.
- Local variables and instance variables (pointers) captured from the closure that spawns the Fiber are placed in a closure data structure allocated on the heap, and that pointer moves with the Fiber. This means that value type local variables (like StaticArray) that would normally be allocated on the stack are exceptionally allocated on the heap.

Parallel is central to Crystal's goal of "safe and fast parallel execution."

ExecutionContext::Isolated

1 Fiber = 1 dedicated thread

gui = Fiber::ExecutionContext::Isolated.new("GUI") do
  Gtk.main
end
gui.wait

A single Fiber monopolizes an OS thread
Safe to use blocking I/O (e.g., GUI event loops, blocking FFI calls)
Cannot add additional spawns within the context (they are forced to go to the default context)

Suitable for the main loops of GUI applications and for FFI calls into C functions that perform blocking I/O.

Default Fiber Without Using ExecutionContext

When ExecutionContext is not specified, Fibers execute in the default ExecutionContext (Fiber::ExecutionContext.default). The default ExecutionContext is Parallel, but since the initial parallelism is set to 1, it behaves the same as Concurrent.

Fiber::ExecutionContext.default.size # => 1

Basic Patterns of Channel and WaitGroup

Crystal's parallel processing is based on a Channel + WaitGroup pattern similar to Go.

Producer-Consumer (Parallel)

require "wait_group"

consumers = Fiber::ExecutionContext::Parallel.new("consumers", 8)
channel    = Channel(Int32).new(64)
wg         = WaitGroup.new(32)
result     = Atomic.new(0)

32.times do
  consumers.spawn do
    while value = channel.receive?
      result.add(value)
    end
  ensure
    wg.done
  end
end

1024.times { |i| channel.send(i) }
channel.close
wg.wait

p result.get  # => 523776

Communication via Channel
Synchronization via WaitGroup
Safe updates of shared state via Atomic

This is the basic form of parallel execution in Crystal.

Here, 32 consumer Fibers running in parallel atomically add the 1024 integers (0-1023) received from the channel, producing their sum (523776).
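
As a variation on the pattern above, here is a minimal sketch that avoids shared state entirely: each worker keeps a private subtotal and sends it back over a second channel. The names worker_count, jobs, and partials are my own for illustration, and plain spawn is used, so this runs in the default context rather than in a dedicated Parallel context.

```crystal
# Hypothetical variant: no Atomic or Mutex, because nothing is shared.
worker_count = 4
jobs = Channel(Int32).new(64)
partials = Channel(Int32).new(worker_count)

worker_count.times do
  spawn do
    subtotal = 0
    # receive? returns nil once the channel is closed and drained
    while value = jobs.receive?
      subtotal += value
    end
    partials.send(subtotal)
  end
end

1024.times { |i| jobs.send(i) }
jobs.close

# Collect exactly one subtotal per worker
total = 0
worker_count.times { total += partials.receive }
puts total # => 523776
```

Whether this is preferable to Atomic probably depends on the workload; it simply trades a shared counter for one extra channel.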

Protection of Shared Variables in Concurrent

Concurrent is serial execution so contention doesn't occur, but Crystal officially states that using Atomic / Mutex is preferable.

Atomic / Mutex / SpinLock

Atomic

A variable that can safely read and write values even when accessed simultaneously from multiple threads, a basic synchronization primitive for preventing race conditions.

Directly mapped to LLVM atomic instructions
compare_and_set, add, sub, get, set
Same memory orders as C/C++: Acquire / Release / Relaxed, etc.

Types that cannot be used with Atomic include value types such as structures (Struct) and StaticArray.
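
To make the operations listed above concrete, here is a small single-threaded sketch of add, sub, compare_and_set, and get on an Atomic(Int32). The variable names are only for illustration.

```crystal
counter = Atomic(Int32).new(0)

previous = counter.add(5) # add returns the value before the addition
counter.sub(2)            # counter is now 3

# compare_and_set returns {old_value, success}
old_value, swapped = counter.compare_and_set(3, 10)
_, swapped_again = counter.compare_and_set(3, 20) # fails: the value is now 10

puts previous      # => 0
puts old_value     # => 3
puts swapped       # => true
puts swapped_again # => false
puts counter.get   # => 10
```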

Mutex

A lock that protects code regions (critical sections) that must not be executed simultaneously by multiple Fibers, controlling so that only one Fiber can execute at a time.

Fiber-safe
Three modes: Checked / Reentrant / Unchecked
Re-entry prohibited by default (safe)

mutex = Mutex.new
shared_array = [] of Int32

10.times do |i|
  spawn do
    mutex.synchronize do
      # Only one Fiber executes at a time within this block
      shared_array << i
      sleep 0.001.seconds
    end
  end
end

sleep 1.second
puts shared_array.size  # => 10

Example of manually locking/unlocking:

mutex = Mutex.new
counter = 0

10.times do
  spawn do
    mutex.lock
    begin
      counter += 1
      sleep 0.001.seconds
    ensure
      mutex.unlock  # Always unlock
    end
  end
end

sleep 1.second
puts counter  # => 10
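
The Reentrant mode mentioned above can be sketched as follows. This is a minimal example, assuming the protection mode is passed as the first argument to Mutex.new; the log array is only there to show the order of execution.

```crystal
mutex = Mutex.new(:reentrant)
log = [] of String

mutex.synchronize do
  log << "outer"
  # With the default Checked mode, locking again from the same Fiber
  # raises instead of re-entering.
  mutex.synchronize do
    log << "inner"
  end
end

puts log # => ["outer", "inner"]
```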

SpinLock

A lightweight lock specialized for very short-term locks. It continues to use CPU while waiting (spinning), so it's unsuitable for long-term locks.

For very short critical sections
Only effective with preview_mt / win32

SpinLock is used in implementations such as Crystal::Scheduler, Crystal::ThreadLocalValue, Crystal::Once, Mutex, WaitGroup, EventLoop::Polling, and Fiber::StackPool. There are almost no scenarios where users would directly use SpinLock in code.

Areas to Be Careful About in the Standard Library

The following are areas in the Crystal standard library that may not guarantee complete thread safety and require caution.

What Qualifies as a Shared Variable Subject to Contention?

While we've used the term "shared variable," Crystal doesn't have user-accessible global variables, so the most typical shared variable is a class variable.

Class variables: Always shared variables (determined by variable type)
Instance variables and local variables: Determined by whether they are referenced from multiple Fibers or threads when spawned

If captured by spawn, local variables can also become shared variables.
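
A minimal sketch of that last point: counter below is an ordinary local variable, but because the spawned blocks capture it, every Fiber updates the same variable. The names counter, lock, and done are my own, and the Mutex is there on the assumption that the code may someday run in a parallel context.

```crystal
counter = 0
lock = Mutex.new
done = Channel(Nil).new

3.times do
  spawn do
    # counter is captured by the closure, so it is shared by all three Fibers
    lock.synchronize { counter += 1 }
    done.send(nil)
  end
end

3.times { done.receive }
puts counter # => 3
```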

ENV

The safety of Unix's getenv/setenv/unsetenv is environment-dependent
Parallel modification is not recommended

This is also discussed in the Crystal Forum:

https://forum.crystal-lang.org/t/eliminate-environment-modifications/8533/29

Class Variables

In Crystal, you can use the @[ThreadLocal] annotation to make class variables thread-local.

class Foo
  @[ThreadLocal]
  @@var = 123

  def self.var
    @@var
  end
end

In this case, each thread has an independent copy of @@var, so changing the value in one thread doesn't affect other threads.

Class variables without @[ThreadLocal] are shared. In this case, you need to use Atomic / Mutex for parallel updates.
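
As a rough sketch of the shared case, the class below guards a class variable with a Mutex held in another class variable. Registry and its methods are hypothetical names for illustration, not something from the standard library.

```crystal
# Hypothetical example: a shared class variable protected by a Mutex.
class Registry
  @@lock = Mutex.new
  @@names = [] of String

  def self.add(name : String) : Nil
    @@lock.synchronize { @@names << name }
  end

  def self.size : Int32
    @@lock.synchronize { @@names.size }
  end
end

done = Channel(Nil).new
4.times do |i|
  spawn do
    Registry.add("worker-#{i}")
    done.send(nil)
  end
end
4.times { done.receive }

puts Registry.size # => 4
```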

IO (File, Socket, STDOUT/ERR)

Safety may not be guaranteed when simultaneously operating on the same IO from multiple threads.

Logger

Logger also uses IO internally. Writing to the same Logger from multiple threads may not be safe.

Report Any Issues You Find

Crystal has far fewer users than languages like Python and Java, so user reports are very valuable. You can help improve the language and its libraries by actively reporting bugs on the Crystal Forum and in GitHub issues.

Cases Where Thread Should Be Used

Thread directly represents the OS's native thread. It can be used when low-level control is needed.

There are almost no cases where you should use Thread directly without using ExecutionContext.
It may be an option in cases such as:

Want to parallelize compute-intensive tasks
FFI is blocking and cannot suspend Fiber (however, if the FFI function is CPU-intensive processing, blocking is considered desirable behavior)
C library requires thread-local initialization

Using Thread::Channel enables safe communication between threads.

FFI (C Library Calls) and Parallel Execution

Since C libraries are not necessarily thread-safe, following patterns like these is considered safe:

Wrap with Mutex
Isolate in ExecutionContext::Isolated context
Dedicated Thread + Thread::Channel
Use ThreadLocal state

Summary

Crystal's parallel execution is currently in the midst of major evolution. In addition to Fiber, which has been used for concurrent execution in I/O-bound processing, ExecutionContext::Parallel now enables full-fledged parallel processing. Using Atomic / Mutex / Channel / WaitGroup, you can build safe parallel processing similar to Go. ExecutionContext::Isolated is effective for GUI / FFI. Thread can be used in special cases where OS threads need to be handled directly. Note that thread safety in parts of the standard library remains ambiguous.

Practical Guidelines for Parallel Execution in Crystal

Leave I/O to Fiber
- No special action needed as Crystal's I/O model is tightly integrated with Fiber.
Use Parallel or Thread for CPU-bound tasks
- ExecutionContext::Parallel is the first choice.
Protect shared state with Atomic or Mutex
- Treat gray zones like ENV and Logger conservatively
Test explicitly using -Dpreview_mt and -Dexecution_context

This concludes the article. Thank you for reading to the end.

Notes on Building CLI and GUI tools with Crystal

kojix2 — Wed, 15 Oct 2025 03:25:14 +0000

This post is just me writing down some vague thoughts that are floating around in my head right now.

Sorry if you came here expecting a well-structured tutorial — but you know, if I try to organize everything perfectly, I’ll never publish anything.

Crystal originated from the Ruby community, so there are many people who want to build web applications with it.

However, the Crystal programming language itself can be described as “a statically compiled language with a Ruby-like syntax and a garbage collector, somewhat like C with GC and type inference.”

It’s not necessarily optimized for web applications.

Personally, I wanted to use Crystal for command-line tools and GUI apps.

For some reason, though, there don’t seem to be many people building CLI tools in Crystal.

The ecosystem for building and distributing binaries wasn’t very well developed for a long time.

That used to be a real pain, but after gradually solving those issues, I think we’re now at the point where most CLI tools I want can be built and distributed in Crystal without much trouble.

On the GUI side, the situation is similar — there aren’t many libraries available.

But this isn’t unique to Crystal. GUI programming, in general, depends heavily on opaque, platform-specific APIs, which don’t always play nicely with open-source development.

Still, I decided to work on it. I created libui-ng bindings for both Ruby and Crystal.

As it turned out, libui-ng doesn’t work very well with garbage-collected languages, but I managed to make it usable anyway.

Then I got curious about Tauri and Electron — the now-famous WebView-based app frameworks.

Personally, I can barely read JavaScript, so I had no real interest in those at first, but their popularity made me curious.

Crystal also has a WebView binding.

And as I mentioned earlier, web app development in the Crystal ecosystem is quite active.

So I decided to give it a try.

I learned that “WebView” isn’t a single library — each OS (Windows, Linux, macOS) provides its own.

Projects like webview/webview and Tauri’s wry act as unifying layers over these platform-specific APIs.

Tauri itself uses WebView under the hood while also providing a framework to handle security and integration with Rust backends.

Maybe it’s possible to use TypeScript and other frontend tools with Crystal too, but personally, I prefer the more old-fashioned approach — something like Kemal + ECR, the “classic amateur” way.

When I actually started building an app with Crystal + WebView, I discovered a few things.

First, you need to pay attention to event loops and thread management.

The WebView itself runs in a separate process, and at the same time you need to run a Kemal server.

That means you often have to make it multithreaded and carefully manage your execution contexts or Fibers — otherwise, things simply won’t run correctly.

Then there’s the build, linking, and packaging pain.

I sent a few pull requests to the Crystal WebView project, which helped a bit, but building on MinGW is still rough.

MSVC technically works, but it’s just too tedious to deal with, so I decided to stay away from it.

Bundling shared libraries is also tricky.

I’d prefer to lean toward static linking whenever possible, but depending on licensing and security update concerns, it’s sometimes better to link against system or bundled shared libraries.

Even if you get the build and linking sorted out, packaging is still painful — creating application packages, Apple disk images (DMG), or Windows installers with Inno Setup, or even .deb packages.

I discovered tools like fpm, which are really useful, but in the end, I still end up asking AI to help me write custom GitHub Actions YAML and shell scripts.

And then, once you finally have a working binary, Windows or macOS antivirus software will happily flag it as suspicious.

Maybe for people doing this professionally, all this doesn’t sound like a big deal, but as someone doing it for fun, it’s a lot of work.
Even so, after all the pain, I’ve started to feel like — maybe, just maybe — this setup is actually pretty cool.

This post was translated from the original Japanese version using ChatGPT.

You can read the original post here [JA]

libui and Garbage Collection - Challenges in Creating Ruby and Crystal Bindings

kojix2 — Fri, 26 Sep 2025 02:15:46 +0000

Introduction

libui is a GUI library that supports the three major operating systems: Windows, macOS, and Linux (currently, the successor project is libui-ng). Internally, it contains three different libraries that call native APIs, unified under a single ui.h header file to provide similar UI functionality across all operating systems. It can also be easily used from other languages through FFI (Foreign Function Interface). While development has slowed somewhat recently, there are few similar libraries available, and libui continues to maintain its unique value.

libui Bindings

I have been creating Ruby bindings and Crystal bindings for libui. Through this process, I have come to realize how difficult it is to combine libui with garbage collection.

The Problem of Disappearing Controls and Callback Functions

Creating Ruby or Crystal bindings for libui is not particularly difficult. The work of checking function signatures and writing matching low-level bindings can be done mechanically.

However, when you call these low-level APIs to create simple applications, the following problems occur intermittently:

Controls disappear and memory access violations occur
Callbacks disappear and memory access violations occur

Both Ruby and Crystal are languages that use garbage collection (GC), so memory that is determined to be unused gets reclaimed. As a result, pointers and callback functions that should be used in the future by the GUI main loop are mistakenly freed by the GC.

In GC languages, the timing of memory deallocation is controlled indirectly through references.

In the Ruby binding, callback functions are unconditionally stored in a dedicated array. This effectively creates a memory leak (old callbacks remain in the array even after new ones are added), but since callback functions are usually finite in number in GUI applications, this is not a practical problem.

The Crystal binding uses a more complex management approach. Each callback function is tied to the instance of its related control. For example, a callback function that fires when a button is pressed is owned by that button. Additionally, the nested relationships of controls themselves are reproduced as an ownership tree. For example, a Window contains a Box, and the Box holds a Label and Button.

By using this ownership tree, we can significantly reduce the problem of incorrect collection by the GC.
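
The idea can be sketched roughly like this. These classes are hypothetical and much simpler than the real binding (there is no native pointer and no boxing of closures for C); they only show how holding references keeps controls and callbacks reachable.

```crystal
# Hypothetical classes for illustration; not the real binding's API.
abstract class Control
  getter children = [] of Control

  # Storing the child here keeps it reachable, so the GC will not collect it
  # while the parent is alive.
  def add(child : Control) : Control
    @children << child
    child
  end
end

class Window < Control; end

class Box < Control; end

class Button < Control
  @on_clicked : Proc(Nil)?

  # The button owns its callback, so the closure lives as long as the button.
  def on_clicked(&block : ->) : Nil
    @on_clicked = block
  end

  def click : Nil
    @on_clicked.try &.call
  end
end

clicks = 0
window = Window.new
box = Box.new
button = Button.new
button.on_clicked { clicks += 1 }
box.add(button)
window.add(box)

button.click
puts clicks # => 1
```

As long as the Window is referenced, everything below it stays alive.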

By the way, why does Crystal's GC collect pointers even though controls may be referenced later in the main loop? I don't have a clear understanding of this point, but it's possible that memory tracking becomes difficult when closures are boxed.

libui's Memory Management Rules

libui is a C library designed for users to manage memory themselves. However, in practice, it introduces a mechanism where "when a parent control is freed, the memory of child controls is also freed." The controls that can be parent controls are Window, Box, Grid, Group, Tab, and Form.

When you destroy these, child controls are freed first, then the parent itself is freed. Therefore, in actual operation, you often free child controls collectively by destroying the Window.

The problem is that on the Crystal side, we cannot detect such deallocation within native libraries. NULL checks might help us guess immediately after memory deallocation (libui sets pointers to NULL before deallocation), but this is unreliable.

Window deallocation can happen automatically. When the [x] button in the Window's title bar is clicked, a callback function is triggered by uiWindowOnClosing, and if the return value is true, the Window's destroy is automatically triggered.

In contrast, uiOnShouldQuit triggered from the Quit option in the menu bar represents application termination, so it does not automatically trigger destroy for the window. The user must destroy the Window themselves and call uiQuit.

libui's Memory Leak Detection Mechanism

libui has a built-in mechanism for detecting memory leaks. This is a very useful feature, but it often doesn't work well with GC languages. This is because in GC, the timing of memory deallocation is indefinite, and we cannot guarantee that all memory has been freed at the time of checking. Therefore, implementations that hook into GC's finalize to perform deallocation should be avoided.

Table Deallocation Procedure

Table is based on Model-View architecture, with TableModel and Table separated. A TableModel can only be freed after all Tables using that model have been destroyed. Therefore, the deallocation procedure is as follows:

Remove the Table from its parent control
Explicitly destroy the Table
Finally destroy the TableModel

Area Deallocation Procedure

Unlike Table, Area can be handled by simply destroying the control.

MultilineEntry Deallocation Procedure

The cause is still under investigation, but on macOS there appear to be cases where problems occur unless you remove the MultilineEntry from its parent control and destroy it individually, similar to Table.

Summary

When using libui (libui-ng), there are many important considerations regarding memory management, especially deallocation.

In languages that use garbage collection like Crystal and Ruby, you normally don't need to worry about memory. Even with C language bindings, manual memory management often becomes unnecessary by using deallocation callback functions like finalize.

However, I learned that with libraries like GUI libraries that have interactive operations where timing and synchronization are important, there are cases where you cannot rely too much on GC and must manually free memory at appropriate times.

In such cases, Ruby and Crystal often provide APIs that use blocks based on RAII (Resource Acquisition Is Initialization) concepts. This can handle more than half of the cases.
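
A minimal sketch of such a block-based API in Crystal, assuming a hypothetical Resource class whose close method stands in for a native destroy call:

```crystal
# Hypothetical resource; close stands in for freeing native memory.
class Resource
  getter? closed = false

  # Yields a new resource and always closes it when the block finishes,
  # even if the block raises.
  def self.open(&)
    resource = new
    begin
      yield resource
    ensure
      resource.close
    end
  end

  def close : Nil
    @closed = true
  end
end

handle = Resource.open do |r|
  puts r.closed? # => false
  r
end
puts handle.closed? # => true
```

The deallocation timing is decided by the block's scope rather than by the GC, which is the point of this style.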

There seem to be cases that are difficult to handle with this alone, but I am still learning and experimenting through trial and error.

Thank you for reading. This article was translated from Japanese to English by Claude Sonnet 4.

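Original Japanese article: libui とガベージコレクション - Ruby と Crystal のバインディングを作って感じた難しさ (libui and Garbage Collection: Difficulties I Felt While Creating Ruby and Crystal Bindings)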

12 Things I Learned Writing CLI Tools in Crystal

kojix2 — Mon, 22 Sep 2025 05:45:49 +0000

I love the Crystal programming language. For the past two or three years, I have been building command-line tools with it. During this time, I often compared it with Ruby, and I encountered many differences, discoveries, and obstacles. In this article, I will share them.

1. Similarity to Ruby

Crystal looks very similar to Ruby. Many common Ruby idioms also work in Crystal. Crystal is statically typed, but most of the time you do not need to write types explicitly. Type inference will do the work for you.

2. Use DeepWiki

DeepWiki is very useful for learning Crystal. For a niche language, it is one of the best resources. You can even ask questions in your native language.

3. Arrays and Hashes cannot mix types

In Crystal, you cannot freely mix different types in an Array or Hash. Ruby allows this, but Crystal does not. You can use union types, but usually it is better to avoid them. Instead, consider one of these options:

Make a class or struct
Use a record
Use a Tuple for temporary data

At first this may feel inconvenient, but you get used to it.

# Array(Int32 | String | Symbol) - not recommended
arr = [1, "two", :three]

# OK: Tuple for fixed positions
t = {1, "two", :three}

# OK: record for structured data
record Item, id : Int32, name : String, tag : Symbol
items = [
  Item.new(1, "apple",  :fruit),
  Item.new(2, "orange", :fruit),
]

4. No `eval`

Crystal does not have eval. This is one big difference from Ruby.
If you really need dynamic evaluation, you should use Ruby. Another choice is to embed mruby or use a library like Anyolite. Crystal itself has an interpreter, but it is not practical and is slower than Ruby or mruby.

# Ruby
code = "1 + 2"
puts eval(code) # => 3

# Crystal has no eval
# You must design differently
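
One way to "design differently" is to parse the input yourself and dispatch through a fixed table of operations. This is only a sketch for a tiny "left operator right" format; OPERATIONS and evaluate are names I made up for illustration.

```crystal
# Hypothetical replacement for eval("1 + 2"): a table of known operations.
OPERATIONS = {
  "+" => ->(a : Int32, b : Int32) { a + b },
  "-" => ->(a : Int32, b : Int32) { a - b },
  "*" => ->(a : Int32, b : Int32) { a * b },
}

def evaluate(expression : String) : Int32
  # Expects exactly three whitespace-separated tokens, e.g. "1 + 2"
  left, operator, right = expression.split
  operation = OPERATIONS[operator]? || raise ArgumentError.new("unknown operator: #{operator}")
  operation.call(left.to_i, right.to_i)
end

puts evaluate("1 + 2") # => 3
puts evaluate("6 * 7") # => 42
```

Only the operations listed in the table can run, which is more restrictive than eval but fits a statically compiled language.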

5. Method overloading

In Ruby, it is common to branch on the argument type inside one method.
In Crystal, it is more natural to use method overloading. This makes the code clearer.

def square(x : Int32) : Int32
  x * x
end

def square(x : String) : Int32
  square(x.to_i)
end

puts square(12)     # => 144
puts square("12")   # => 144

6. Return types should be consistent

In Ruby, a method can return values of different types. In Crystal, if the return type is not clear, you will run into trouble. If you want to return different types, you should split the method. You can use a union type, but it is not recommended.

# not recommended
def maybe_value(flag : Bool) : Int32 | String
  flag ? 42 : "forty-two"
end

def value_int : Int32
  42
end

def value_str : String
  "forty-two"
end

7. Handling Nil

Pay attention to whether a variable can be Nil.
If it can, you need to handle it with not_nil!, if val = maybe_val, or try (Crystal's counterpart to Ruby's safe navigation operator, as in maybe_val.try &.upcase).

name : String? = nil

if n = name
  puts n.upcase
else
  raise "name is nil"
end
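
When a default value is acceptable instead of raising, try combined with || is a compact alternative. A minimal sketch, with shout as a made-up method name:

```crystal
def shout(name : String?) : String
  # try runs the block only when name is not nil; otherwise it returns nil
  name.try(&.upcase) || "ANONYMOUS"
end

puts shout("crystal") # => CRYSTAL
puts shout(nil)       # => ANONYMOUS
```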

8. Garbage collection

Crystal uses LLVM and relies on an external GC (libgc).
Performance is often close to Rust or Nim, but memory profiling and tuning can be difficult. Also, the timing of GC is not predictable, so Crystal may not be suitable for real-time systems.

9. Asynchronous I/O

Asynchronous I/O is available by default. Some developers feel it is easier to use than in Rust.

10. Linking when distributing

Crystal programs are usually linked with libgc and other libraries such as libpcre2. Be careful when distributing binaries.

Linux: You can build statically linked binaries with GitHub Actions + Docker + musl
macOS: You can prepare a Homebrew Tap, or build portable binaries with static linking for libgc, libpcre2, and others

See also: GitHub Actions workflow in lolcat.cr

11. Windows support

Crystal now works on Windows (MSVC / MinGW64) more stably than before. Parallel execution also works. However, solving C library dependencies can still be painful. If you are not familiar with Windows, you may need to ask AI for help.

12. Limitations of OptionParser

The standard OptionParser does not support combined short options.
So ls -l -h works, but ls -lh does not.
I plan to create a pull request to fix this in the future.

Update: This has already been resolved as of February 2026. Starting from version 1.20, short option bundling will be enabled!

https://github.com/crystal-lang/crystal/pull/16563
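
For reference, here is a minimal OptionParser sketch for the ls-like case above. The flags and descriptions are made up for illustration, and the arguments are passed separately ("-l", "-h") because, as described above, bundled short options depend on the Crystal version.

```crystal
require "option_parser"

long = false
human = false

parser = OptionParser.new do |opts|
  opts.banner = "Usage: ls [options]"
  opts.on("-l", "Use a long listing format") { long = true }
  opts.on("-h", "Print human-readable sizes") { human = true }
end

# Separate short options; pass ARGV instead of a literal array in a real tool.
parser.parse(["-l", "-h"])

puts long  # => true
puts human # => true
```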

Conclusion

Writing command-line tools in Crystal is sometimes painful. But at the same time, you learn a lot. I believe the “best days” of the Crystal language are not in the past or present, but in the future.

This post was originally based on my reply to a thread on Reddit, then expanded into a Japanese article on Qiita, and now translated into English with the help of ChatGPT.

Embedding the Crystal Compiler in Your Program

kojix2 — Sat, 09 Aug 2025 09:46:26 +0000

Introduction

The Crystal compiler can be used as a library.
This document explains how to set it up and use it.

Creating the Project

First, create a new Crystal project.

crystal init app duck_egg
cd duck_egg

Editing `shard.yml`

Edit the shard.yml file as follows.
In this example, we add markd and reply to the dependencies section.

name: duck_egg
version: 0.1.0

targets:
  🥚:
    main: src/duck_egg.cr

dependencies:
  markd:
    github: icyleaf/markd
  reply:
    github: I3oris/reply

Creating `duck_egg.cr`

Create src/duck_egg.cr and add the following code.

require "compiler/requires"

BIRDS = [
  { "🐔", "cluck!" },
  { "🐓", "cock-a-doodle-doo" },
  { "🦃", "gobble" },
  { "🦆", "quack" },
  { "🦉", "hoot" },
  { "🦜", "squawk" },
  { "🕊", "coo" },
  { "🦢", "honk" },
  { "🦩", "brrrrt" },
  { "🐧", "honk honk" },
  { "🦤", "boop" },
  { "🦕", "Bwooooon!!" },
  { "🦖", "Raaaaawr!!" }
]

bird, sound = BIRDS.sample

compiler = Crystal::Compiler.new
source = Crystal::Compiler::Source.new(bird, %Q(puts "#{bird}  < #{sound}"))
compiler.compile source, bird

In this program, the Crystal compiler is embedded in the target 🥚.
When 🥚 is executed, a random bird is selected.
The embedded compiler generates a binary that displays the bird and its sound.

Building and Running

First, build the program.

shards build

Next, check the CRYSTAL_PATH environment variable to find the location of the Crystal standard library.

crystal env

The Crystal compiler requires the standard library even for very simple code such as puts 0.
Therefore, CRYSTAL_PATH must be set to include the path to the standard library.

export CRYSTAL_PATH=lib:/usr/local/bin/../share/crystal/src

Run the program:

bin/🥚

Example output:

🦖

Run the generated binary:

./🦖

Output:

🦖  < Raaaaawr!!

Summary

By using the Crystal compiler as a library, you can generate and compile code dynamically. This technique can be applied in many interesting ways.

Easily Visualize Debian Package Dependencies with debtree

kojix2 — Fri, 08 Aug 2025 05:38:33 +0000

Introduction

Sometimes you might want a quick and easy way to visualize and understand which packages a given package depends on. With the debtree package and graphviz, you can do this in just a few steps.

Installation

Install both debtree and graphviz:

apt install debtree graphviz

Visualizing dependencies

First, confirm that the package you want to visualize is installed:

dpkg -l | grep ufw # Check if it exists

You can easily visualize the packages it depends on:

debtree ufw | dot -T png -o ufw_deps.png

Here, I specified -T png to output a PNG image for embedding in Qiita, but you can choose from many other formats like svg.
If you have a desktop environment, you can also visualize it instantly using x11:

debtree ufw | dot -T x11

Visualizing reverse dependencies

To visualize reverse dependencies, you can use the -R / --show-rdeps option.

However, using -R alone will also display many packages that are not actually installed on your system.
For a cleaner view, add the -I / --show-installed option to limit the output to installed packages only:

debtree -R -I iptables | dot -T x11

From this, you can see that docker-ce and ubuntu-standard depend on iptables.

That’s it for today!

Writing SIMD in Crystal with Inline Assembly

kojix2 — Thu, 07 Aug 2025 01:28:30 +0000

Introduction

In this article, we explore how to write SIMD instructions (SSE for x86_64 and NEON for AArch64) using inline assembly in the Crystal programming language.
Crystal uses LLVM as its backend, but it doesn’t yet fully optimize with SIMD.
This is not a performance tuning guide, but rather a fun exploration into low-level programming with Crystal.

`asm` Syntax

Crystal provides the asm keyword for writing inline assembly. The syntax is based on LLVM's integrated assembler.

asm("template" : outputs : inputs : clobbers : flags)

Each section:

template: LLVM-style assembly code
outputs: Output operands
inputs: Input operands
clobbers: Registers that will be modified
flags: Optional (e.g., "volatile")

For a detailed explanation, see the official docs

Types of SIMD Instructions

SSE / AVX for Intel and AMD CPUs (x86_64)
NEON for ARM CPUs (like Apple Silicon)

Types of Registers

Registers Used in x86_64

General-purpose: rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp, r8–r15
SIMD:

Name	Width	Instruction Set	Usage
`xmm0–xmm15`	128-bit	SSE	Floats, ints
`ymm0–ymm15`	256-bit	AVX	Wider SIMD
`zmm0–zmm31`	512-bit	AVX-512	Used in newer CPUs

Registers Used in AArch64 (NEON)

Vector registers: v0–v31
- v0.4s = 4 × 32-bit floats
- v1.8h = 8 × 16-bit half-precision floats

Examples of Register Specification

SSE: xmm0, xmm1, etc.
NEON: v0.4s, v1.8h, etc.

Note:

LLVM assigns SSE registers automatically
NEON requires explicit register naming in inline assembly

Prerequisites

To follow along:

Emit LLVM IR:

  crystal build --emit llvm-ir foo.cr

Emit assembly:

  crystal build --emit asm foo.cr

Benchmarking tool: hyperfine
Use of uninitialized and to_unsafe for low-level memory access

Basic Vector Operations

Vector Addition

SSE (x86_64)

a = StaticArray[1.0_f32, 2.0_f32, 3.0_f32, 4.0_f32]
b = StaticArray[5.0_f32, 6.0_f32, 7.0_f32, 8.0_f32]

def simd_vector_add(a : StaticArray(Float32, 4), b : StaticArray(Float32, 4)) : StaticArray(Float32, 4)
  result = uninitialized StaticArray(Float32, 4)
  a_ptr = a.to_unsafe
  b_ptr = b.to_unsafe
  result_ptr = result.to_unsafe

  asm(
    "movups ($1), %xmm0      // load vector a into xmm0
     movups ($2), %xmm1      // load vector b into xmm1
     addps %xmm1, %xmm0      // perform parallel addition of four 32-bit floats
     movups %xmm0, ($0)      // store result to memory"
          :: "r"(result_ptr), "r"(a_ptr), "r"(b_ptr)
          : "xmm0", "xmm1", "memory"
          : "volatile"
  )

  result
end

puts "Vector addition: #{simd_vector_add(a, b)}"

NEON (AArch64)

a = StaticArray[1.0_f32, 2.0_f32, 3.0_f32, 4.0_f32]
b = StaticArray[5.0_f32, 6.0_f32, 7.0_f32, 8.0_f32]

def simd_vector_add(a : StaticArray(Float32, 4), b : StaticArray(Float32, 4)) : StaticArray(Float32, 4)
  result = uninitialized StaticArray(Float32, 4)
  a_ptr = a.to_unsafe
  b_ptr = b.to_unsafe
  result_ptr = result.to_unsafe

  asm(
    "ld1 {v0.4s}, [$1]        // load vector a
     ld1 {v1.4s}, [$2]        // load vector b
     fadd v2.4s, v0.4s, v1.4s // add each element
     st1 {v2.4s}, [$0]        // store the result"
          :: "r"(result_ptr), "r"(a_ptr), "r"(b_ptr)
          : "v0", "v1", "v2", "memory"
          : "volatile"
  )

  result
end

puts "Vector addition: #{simd_vector_add(a, b)}"

Vector Multiplication

SSE (x86_64)

a = StaticArray[1.0_f32, 2.0_f32, 3.0_f32, 4.0_f32]
b = StaticArray[5.0_f32, 6.0_f32, 7.0_f32, 8.0_f32]

def simd_vector_multiply(a : StaticArray(Float32, 4), b : StaticArray(Float32, 4)) : StaticArray(Float32, 4)
  result = uninitialized StaticArray(Float32, 4)
  a_ptr = a.to_unsafe
  b_ptr = b.to_unsafe
  result_ptr = result.to_unsafe

  asm(
    "movups ($1), %xmm0      // load vector a into xmm0
     movups ($2), %xmm1      // load vector b into xmm1
     mulps %xmm1, %xmm0      // perform parallel multiplication of four 32-bit floats
     movups %xmm0, ($0)      // store result to memory"
          :: "r"(result_ptr), "r"(a_ptr), "r"(b_ptr)
          : "xmm0", "xmm1", "memory"
          : "volatile"
  )

  result
end

puts "Vector multiplication: #{simd_vector_multiply(a, b)}"

NEON (AArch64)

a = StaticArray[1.0_f32, 2.0_f32, 3.0_f32, 4.0_f32]
b = StaticArray[5.0_f32, 6.0_f32, 7.0_f32, 8.0_f32]

def simd_vector_multiply(a : StaticArray(Float32, 4), b : StaticArray(Float32, 4)) : StaticArray(Float32, 4)
  result = uninitialized StaticArray(Float32, 4)
  a_ptr = a.to_unsafe
  b_ptr = b.to_unsafe
  result_ptr = result.to_unsafe

  asm(
    "ld1 {v0.4s}, [$1]        // load vector a
     ld1 {v1.4s}, [$2]        // load vector b
     fmul v2.4s, v0.4s, v1.4s // multiply each element
     st1 {v2.4s}, [$0]        // store the result"
          :: "r"(result_ptr), "r"(a_ptr), "r"(b_ptr)
          : "v0", "v1", "v2", "memory"
          : "volatile"
  )

  result
end

puts "Vector multiplication: #{simd_vector_multiply(a, b)}"

Aggregation Operations

Vector Sum

SSE (x86_64)

a = StaticArray[1.0_f32, 2.0_f32, 3.0_f32, 4.0_f32]

def simd_vector_sum(vec : StaticArray(Float32, 4)) : Float32
  result = uninitialized Float32
  vec_ptr = vec.to_unsafe
  result_ptr = pointerof(result)

  asm(
    "movups ($1), %xmm0      // load vector into xmm0
     haddps %xmm0, %xmm0     // horizontal add: [a+b, c+d, a+b, c+d]
     haddps %xmm0, %xmm0     // horizontal add again: [a+b+c+d, *, *, *]
     movss %xmm0, ($0)       // store the first element of result"
          :: "r"(result_ptr), "r"(vec_ptr)
          : "xmm0", "memory"
          : "volatile"
  )

  result
end

puts "Vector sum: #{simd_vector_sum(a)}"

NEON (AArch64)

a = StaticArray[1.0_f32, 2.0_f32, 3.0_f32, 4.0_f32]

def simd_vector_sum(vec : StaticArray(Float32, 4)) : Float32
  result = uninitialized Float32
  vec_ptr = vec.to_unsafe
  result_ptr = pointerof(result)

  asm(
    "ld1 {v0.4s}, [$1]         // load vector
     faddp v1.4s, v0.4s, v0.4s // pairwise add: [a+b, c+d, a+b, c+d]
     faddp v2.2s, v1.2s, v1.2s // pairwise add again: [a+b+c+d, *]
     str s2, [$0]              // store the final sum"
          :: "r"(result_ptr), "r"(vec_ptr)
          : "v0", "v1", "v2", "memory"
          : "volatile"
  )

  result
end

puts "Vector sum: #{simd_vector_sum(a)}"

Finding Maximum Value

SSE (x86_64)

a = StaticArray[1.0_f32, 2.0_f32, 3.0_f32, 4.0_f32]

def simd_vector_max(vec : StaticArray(Float32, 4)) : Float32
  result = uninitialized Float32
  vec_ptr = vec.to_unsafe
  result_ptr = pointerof(result)

  asm(
    "movups ($1), %xmm0          // load vector into xmm0
     movaps %xmm0, %xmm1         // copy xmm0 to xmm1
     shufps $$0x4E, %xmm1, %xmm1 // swap upper and lower pairs
     maxps %xmm1, %xmm0          // compute max of each pair
     movaps %xmm0, %xmm1         // copy result to xmm1
     shufps $$0x01, %xmm1, %xmm1 // move lane 1 into lane 0
     maxps %xmm1, %xmm0          // compute final max
     movss %xmm0, ($0)           // store the result"
          :: "r"(result_ptr), "r"(vec_ptr)
          : "xmm0", "xmm1", "memory"
          : "volatile"
  )

  result
end

puts "Vector max: #{simd_vector_max(a)}"

NEON (AArch64)

a = StaticArray[1.0_f32, 2.0_f32, 3.0_f32, 4.0_f32]

def simd_vector_max(vec : StaticArray(Float32, 4)) : Float32
  result = uninitialized Float32
  vec_ptr = vec.to_unsafe
  result_ptr = pointerof(result)

  asm(
    "ld1 {v0.4s}, [$1]         // load vector
     fmaxp v1.4s, v0.4s, v0.4s // pairwise max: [max(a, b), max(c, d), ...]
     fmaxp v2.2s, v1.2s, v1.2s // final pairwise max
     str s2, [$0]              // store result"
          :: "r"(result_ptr), "r"(vec_ptr)
          : "v0", "v1", "v2", "memory"
          : "volatile"
  )

  result
end

puts "Vector max: #{simd_vector_max(a)}"

Integer Operations

Integer Addition

SSE (x86_64)

int_a = StaticArray[1, 2, 3, 4]
int_b = StaticArray[10, 20, 30, 40]

def simd_int_add(a : StaticArray(Int32, 4), b : StaticArray(Int32, 4)) : StaticArray(Int32, 4)
  result = uninitialized StaticArray(Int32, 4)
  a_ptr = a.to_unsafe
  b_ptr = b.to_unsafe
  result_ptr = result.to_unsafe

  asm(
    "movdqu ($1), %xmm0      // load integer vector a into xmm0
     movdqu ($2), %xmm1      // load integer vector b into xmm1
     paddd %xmm1, %xmm0      // perform parallel addition of four 32-bit integers
     movdqu %xmm0, ($0)      // store result to memory"
          :: "r"(result_ptr), "r"(a_ptr), "r"(b_ptr)
          : "xmm0", "xmm1", "memory"
          : "volatile"
  )

  result
end

puts "Integer addition: #{simd_int_add(int_a, int_b)}"

NEON (AArch64)

int_a = StaticArray[1, 2, 3, 4]
int_b = StaticArray[10, 20, 30, 40]

def simd_int_add(a : StaticArray(Int32, 4), b : StaticArray(Int32, 4)) : StaticArray(Int32, 4)
  result = uninitialized StaticArray(Int32, 4)
  a_ptr = a.to_unsafe
  b_ptr = b.to_unsafe
  result_ptr = result.to_unsafe

  asm(
    "ld1 {v0.4s}, [$1]        // load integer vector a
     ld1 {v1.4s}, [$2]        // load integer vector b
     add v2.4s, v0.4s, v1.4s  // perform element-wise addition
     st1 {v2.4s}, [$0]        // store result to memory"
          :: "r"(result_ptr), "r"(a_ptr), "r"(b_ptr)
          : "v0", "v1", "v2", "memory"
          : "volatile"
  )

  result
end

puts "Integer addition: #{simd_int_add(int_a, int_b)}"

Saturated Addition

SSE (x86_64)

sat_a = StaticArray[29_000_i16, 30_000_i16, 31_000_i16, 32_000_i16,
  32_000_i16, 32_000_i16, 32_000_i16, 32_000_i16]
sat_b = StaticArray[1_000_i16, 1_000_i16, 1_000_i16, 1_000_i16,
  500_i16, 600_i16, 700_i16, 800_i16]

def simd_saturated_add(a : StaticArray(Int16, 8), b : StaticArray(Int16, 8)) : StaticArray(Int16, 8)
  result = uninitialized StaticArray(Int16, 8)
  a_ptr = a.to_unsafe
  b_ptr = b.to_unsafe
  result_ptr = result.to_unsafe

  asm(
    "movdqu ($1), %xmm0      // load 8 × 16-bit integers into xmm0
     movdqu ($2), %xmm1      // load 8 × 16-bit integers into xmm1
     paddsw %xmm1, %xmm0     // perform saturated addition
     movdqu %xmm0, ($0)      // store result to memory"
          :: "r"(result_ptr), "r"(a_ptr), "r"(b_ptr)
          : "xmm0", "xmm1", "memory"
          : "volatile"
  )

  result
end

puts "Saturated addition: #{simd_saturated_add(sat_a, sat_b)}"

NEON (AArch64)

sat_a = StaticArray[29_000_i16, 30_000_i16, 31_000_i16, 32_000_i16,
  32_000_i16, 32_000_i16, 32_000_i16, 32_000_i16]
sat_b = StaticArray[1_000_i16, 1_000_i16, 1_000_i16, 1_000_i16,
  500_i16, 600_i16, 700_i16, 800_i16]

def simd_saturated_add(a : StaticArray(Int16, 8), b : StaticArray(Int16, 8)) : StaticArray(Int16, 8)
  result = uninitialized StaticArray(Int16, 8)
  a_ptr = a.to_unsafe
  b_ptr = b.to_unsafe
  result_ptr = result.to_unsafe

  asm(
    "ld1 {v0.8h}, [$1]          // load 8 × 16-bit integers from a into v0
     ld1 {v1.8h}, [$2]          // load 8 × 16-bit integers from b into v1
     sqadd v2.8h, v0.8h, v1.8h  // perform saturated addition
     st1 {v2.8h}, [$0]          // store result to memory"
          :: "r"(result_ptr), "r"(a_ptr), "r"(b_ptr)
          : "v0", "v1", "v2", "memory"
          : "volatile"
  )

  result
end

puts "Saturated addition: #{simd_saturated_add(sat_a, sat_b)}"
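
As a cross-check for the asm versions above, the same saturated addition can be written in plain Crystal. This is only a reference sketch and not part of the original code: `saturated_add_reference` is a hypothetical helper name, and it widens each lane to Int32 before clamping to the Int16 range.

```crystal
# Hypothetical scalar reference (not from the original article):
# widen each lane to Int32, add, then clamp to the Int16 range.
def saturated_add_reference(a : StaticArray(Int16, 8), b : StaticArray(Int16, 8)) : StaticArray(Int16, 8)
  StaticArray(Int16, 8).new do |i|
    sum = a[i].to_i32 + b[i].to_i32
    sum.clamp(Int16::MIN.to_i32, Int16::MAX.to_i32).to_i16
  end
end

sat_a = StaticArray[29_000_i16, 30_000_i16, 31_000_i16, 32_000_i16,
  32_000_i16, 32_000_i16, 32_000_i16, 32_000_i16]
sat_b = StaticArray[1_000_i16, 1_000_i16, 1_000_i16, 1_000_i16,
  500_i16, 600_i16, 700_i16, 800_i16]

expected = saturated_add_reference(sat_a, sat_b)

# Lanes 3 and 7 would overflow Int16, so they clamp to 32767:
# [30000, 31000, 32000, 32767, 32500, 32600, 32700, 32767]
puts "Reference: #{expected.to_a}"
```

Comparing this output with the result of `simd_saturated_add` is a quick way to confirm that the inline asm behaves as intended on a given machine.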

Examining LLVM-IR and Assembly

To inspect LLVM IR output:

crystal build your_file.cr --emit llvm-ir --no-debug

To inspect raw assembly:

crystal build your_file.cr --emit asm --no-debug

You’ll see that your inline asm blocks are preserved as-is, even with optimizations (-O3).

__crystal_once.exit.i.i:                          ; preds = %else.i.i.i, %.noexc98
  call void @llvm.lifetime.start.p0(i64 16, ptr nonnull %path.i.i.i.i.i)
  call void @llvm.lifetime.start.p0(i64 16, ptr nonnull %obj1.i.i.i.i)
  call void @llvm.lifetime.start.p0(i64 16, ptr nonnull %b2.i.i.i)
  store <4 x float> <float 1.000000e+00, float 2.000000e+00, float 3.000000e+00, float 4.000000e+00>, ptr %obj1.i.i.i.i, align 16
  store <4 x float> <float 5.000000e+00, float 6.000000e+00, float 7.000000e+00, float 8.000000e+00>, ptr %b2.i.i.i, align 16
  call void asm sideeffect "ld1 {v0.4s}, [$1] \0Ald1 {v1.4s}, [$2] \0Afadd v2.4s, v0.4s, v1.4s \0Ast1 {v2.4s}, [$0]", "r,r,r,~{v0},~{v1},~{v2},~{memory}"(ptr nonnull %path.i.i.i.i.i, ptr nonnull %obj1.i.i.i.i, ptr nonnull %b2.i.i.i) #30
  %314 = load <4 x float>, ptr %path.i.i.i.i.i, align 16
  call void @llvm.lifetime.end.p0(i64 16, ptr nonnull %path.i.i.i.i.i)
  call void @llvm.lifetime.end.p0(i64 16, ptr nonnull %obj1.i.i.i.i)
  call void @llvm.lifetime.end.p0(i64 16, ptr nonnull %b2.i.i.i)
  %315 = invoke ptr @GC_malloc(i64 80)
          to label %.noexc100 unwind label %rescue2.loopexit.split-lp.loopexit.split-lp.loopexit.split-lp

Lloh2300:
        ldr     q1, [x9, lCPI312_43@PAGEOFF]
        add     x8, sp, #164
        add     x9, sp, #128
        str     q0, [sp, #128]
        stur    q1, [x29, #-128]
        ; InlineAsm Start
        ld1.4s  { v0 }, [x9]
        ld1.4s  { v1 }, [x10]
        fadd.4s v2, v0, v1
        st1.4s  { v2 }, [x8]
        ; InlineAsm End
        ldr     q0, [x25]
        str     q0, [sp, #16]

Miscellaneous

When using SIMD with parallelism, memory bandwidth can become the bottleneck.
Crystal currently runs single-threaded by default, but true parallelism is in progress, so memory bandwidth limits may become relevant in the future.

Conclusion

We’ve explored how to write SIMD operations in Crystal using inline asm, and examined how those instructions are lowered into LLVM IR and eventually into assembly.

This was a deep dive into low-level Crystal.

Appendix: SIMD Instruction Reference

SSE (x86_64)

Instruction	Description
`movups`	Load/store 4 × Float32 (unaligned)
`movaps`	Load/store 4 × Float32 (aligned)
`movdqu`	Load/store 4 × Int32 or 8 × Int16 (unaligned)
`movss`	Store scalar Float32 (lowest lane)
`addps`	Add 4 × Float32
`mulps`	Multiply 4 × Float32
`paddd`	Add 4 × Int32
`paddsw`	Saturated add 8 × Int16
`haddps`	Horizontal add of Float32 pairs (requires SSE3)
`maxps`	Element-wise max (Float32)
`shufps`	Shuffle Float32 lanes (for reduction)

NEON (AArch64)

Instruction	Description
`ld1`	Load vector (e.g. `v0.4s`, `v0.8h`)
`st1`	Store vector
`add`	Add 4 × Int32
`sqadd`	Saturated add 8 × Int16
`fadd`	Add 4 × Float32
`fmul`	Multiply 4 × Float32
`faddp`	Pairwise add (Float32 reduction)
`fmaxp`	Pairwise max (Float32 reduction)
`addv`	Vector-wide add (integers only; NEON has no `faddv`, so Float32 sums use `faddp`)
`fmaxv`	Vector-wide max (optional)

Notes

SSE's movaps and movdqa require 16-byte alignment.
NEON's faddp, fmaxp reduce in two steps: 4 → 2 → 1.
shufps is used with masks like 0x4E, 0x01 for reordering lanes during reduction.
Saturated arithmetic (paddsw, sqadd) clamps values on overflow.
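
To make the shufps masks in the notes above less opaque, here is a small scalar model of the max reduction in plain Crystal. It is a sketch under stated assumptions, not code from the original article: `shufps_same_reg` and `maxps_model` are hypothetical helper names, and the shuffle model only covers the case used here, where source and destination are the same register. Each pair of bits in the mask selects the input lane for one output lane.

```crystal
# Hypothetical scalar model (not from the original article) of
# `shufps imm, %x, %x` when source and destination are the same register:
# lane i of the result is lane ((imm >> 2i) & 3) of the input.
def shufps_same_reg(v : StaticArray(Float32, 4), imm : UInt8) : StaticArray(Float32, 4)
  StaticArray(Float32, 4).new { |i| v[(imm >> (2 * i)) & 3] }
end

# Element-wise maximum, modelling `maxps`.
def maxps_model(x : StaticArray(Float32, 4), y : StaticArray(Float32, 4)) : StaticArray(Float32, 4)
  StaticArray(Float32, 4).new { |i| Math.max(x[i], y[i]) }
end

v = StaticArray[1.0_f32, 2.0_f32, 3.0_f32, 4.0_f32]

# 0x4E swaps the upper and lower pairs: [3.0, 4.0, 1.0, 2.0]
step1 = maxps_model(v, shufps_same_reg(v, 0x4E_u8))
# 0x01 moves lane 1 into lane 0, so lane 0 ends up with the overall max
step2 = maxps_model(step1, shufps_same_reg(step1, 0x01_u8))

puts "After first step: #{step1.to_a}" # [3.0, 4.0, 3.0, 4.0]
puts "Max in lane 0: #{step2[0]}"      # 4.0
```

Only lane 0 of the final vector is meaningful, which is why the asm version stores it with movss.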

Thanks for reading — and happy crystaling! 💎

Building Portable Crystal Binaries on macOS with GitHub Actions

kojix2 — Mon, 21 Jul 2025 02:34:19 +0000

Overview

If you’ve ever tried to share a Crystal tool you built, you may have noticed that distributing it on macOS isn’t as straightforward as on Linux. On Linux, you can just use the official Docker image with musl to build fully static binaries.

But macOS is different. Its design doesn’t allow fully static linking, so—just like with Rust or Go—you end up with binaries that must dynamically link to system libraries. These are what we call portable binaries.

By default, Crystal binaries on macOS depend on Homebrew libraries like libgc, libevent, and libpcre. That’s not really portable. In this post, I’ll show you how to avoid those dependencies and build more portable binaries for macOS using GitHub Actions.

How Crystal Resolves Libraries

Crystal looks for libraries in this order:

1. CRYSTAL_LIBRARY_PATH environment variable
2. ldflags from the @[Link] annotation
3. pkg-config
   - Tries the specified pkg_config name
   - Falls back to the library name
   - Only if both fail does it use a plain -l flag

Here’s the catch: even if you pass static libraries via --link-flags, pkg-config runs first. If it succeeds, it usually chooses shared libraries—and ignores the static ones you gave.

The Workarounds

Method 1: Use Symlinks

One way around pkg-config is to symlink the static libraries and link them directly:

brew install libgc pcre2
ln -s $(brew ls libgc | grep libgc.a) .
ln -s $(brew ls pcre2 | grep libpcre2-8.a) .
shards build --link-flags="-L $(pwd) $(pwd)/libgc.a $(pwd)/libpcre2-8.a" --release

Method 2: Disable PKG_CONFIG_PATH

Another trick is to simply disable pkg-config so it can’t interfere:

brew install libgc pcre2
unset PKG_CONFIG_PATH
shards build --link-flags="$(brew ls libgc | grep libgc.a) $(brew ls pcre2 | grep libpcre2-8.a)" --release

Combining both methods is the most reliable, especially for libraries like libcrypto and libssl.

Things to Keep in Mind

The macos-latest runner gives you an Apple Silicon (Arm) binary
For Intel builds, use the macos-13 runner
On some systems, macOS security may require users to manually approve your binary

Alternative: Homebrew Tap

If you want the easiest experience for users, publishing a Homebrew tap is the way to go. That way, they can build your tool from source and let Homebrew handle dependencies.

Still, prebuilt binaries are handy. With the approaches above, you can distribute Crystal binaries on macOS much like you would with Rust.

That’s it for today. How about sharing the Crystal tool you built over the weekend?

Writing Inline Assembly in the Crystal Programming Language

kojix2 — Fri, 20 Jun 2025 04:17:04 +0000

Introduction

When you want to make your code run significantly faster, or just want to explore how computers work at a lower level, you might find yourself curious about writing instructions directly for the CPU. In Crystal, you can do this using inline assembly.

Crystal is a programming language built on top of the LLVM compiler infrastructure. Thanks to this, it can access many of LLVM's powerful features. For low-level programming, Crystal provides both intrinsic functions and the asm syntax.

The `asm` Syntax

Crystal supports writing inline assembly using the asm keyword.

You can find the official documentation here.

The basic syntax is:

asm("template" : outputs : inputs : clobbers : flags)

template — Assembly code using LLVM’s integrated assembler syntax
outputs — Output operands
inputs — Input operands
clobbers — Registers that may be modified
flags — Optional flags (e.g., "intel")

This colon-separated syntax is quite unusual in Crystal and comes from GCC's inline assembly syntax.

Let’s look at some examples.

NOP Instruction

asm("nop")

Setting a Value Using an Output Operand

dst = uninitialized Int32

asm("mov $$10, $0" : "=r"(dst))

puts dst  # => 10

Note that $$ escapes a literal $, so $$10 is emitted as the immediate value $10, and $0 is a placeholder for the output operand.

Using uninitialized Int32 is optional; initializing with dst = 0 works as well.

Using an Input Operand

src = 10
dst = 0

asm("mov $1, $0" : "=r"(dst) : "r"(src))

puts dst  # => 10

Using Multiple Input Operands

a = 10
b = 20
c = uninitialized Int32

# "0" ties input a to the same register as output operand 0, so $0 starts out holding a
asm("add $2, $0" : "=r"(c) : "0"(a), "r"(b))

puts c  # => 30

Using Multiple Output Operands

dst1 = uninitialized Int32
dst2 = uninitialized Int32

asm("
  mov $$10, $0
  mov $$20, $1" : "=r"(dst1), "=r"(dst2))

puts dst1  # => 10
puts dst2  # => 20

Using Intel Syntax

You can also use Intel-style syntax:

dst = uninitialized Int32

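# The asm writes through a pointer, so declare a "memory" clobber and mark it
# "volatile" to keep the optimizer from dropping or reordering it
asm("mov dword ptr [$0], 10" :: "r"(pointerof(dst)) : "memory" : "volatile", "intel")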

puts dst

Intrinsics

For relatively simple operations, LLVM provides intrinsics. These functions are highly optimized, platform-independent, and often compatible with Crystal’s interpreter. However, for most basic operations, Crystal's standard library already provides efficient implementations, so using intrinsics does not always yield performance benefits.

Available intrinsics are defined in the Intrinsics module.

Common Intrinsic Functions

`memcpy` — Copy memory

src = Slice(UInt8).new(10) { |i| i.to_u8 }
dest = Slice(UInt8).new(10, 0_u8)

Intrinsics.memcpy(dest, src, 10, is_volatile: false)

puts "Copied: #{dest}"

`memmove` — Move memory with overlap support

buffer = Slice(UInt8).new(10) { |i| i.to_u8 }

Intrinsics.memmove(buffer.to_unsafe + 3, buffer.to_unsafe, 5, is_volatile: false)

puts "Moved: #{buffer}"

`memset` — Initialize memory

buffer = Slice(UInt8).new(10, 0_u8)

Intrinsics.memset(buffer, 0xFF_u8, 10, is_volatile: false)

puts "Set: #{buffer}"

`debugtrap` — Trigger debugger trap

Intrinsics.debugtrap

`pause` — CPU pause (works on x86/x64 and AArch64)

Intrinsics.pause

This is often used internally in Crystal’s Mutex or SpinLock implementations.

`read_cycle_counter` — Read the CPU cycle counter

cycles = Intrinsics.read_cycle_counter

puts "Cycles: #{cycles}"

To observe it in action:

loop do
  cycles = Intrinsics.read_cycle_counter
  puts "Cycles: #{cycles}"
  sleep 1.second
end

Bit Manipulation Intrinsics

Bit Reversal

bitreverse8, bitreverse16, bitreverse32, bitreverse64, bitreverse128

value = 0b01101001_u8
result = Intrinsics.bitreverse8(value)

puts "Reversed: #{result.to_s(2)}"  # => 10010110

Byte Swap

bswap16, bswap32, bswap64, bswap128

value = 0x12345678_u32
result = Intrinsics.bswap32(value)

puts "Swapped: 0x#{result.to_s(16)}"  # => 0x78563412

Population Count

popcount8, popcount16, popcount32, popcount64, popcount128

value = 0b11010110_i32
count = Intrinsics.popcount32(value)

puts "Bit count: #{count}"  # => 5

Count Leading Zeros

countleading8, countleading16, countleading32, countleading64, countleading128

value = 0b00001111_i32
count = Intrinsics.countleading32(value, false)

puts "Leading zeros: #{count}"  # => 28 (counted across all 32 bits)

Count Trailing Zeros

counttrailing8, counttrailing16, counttrailing32, counttrailing64, counttrailing128

value = 0b11110000_i32
count = Intrinsics.counttrailing32(value, false)

puts "Trailing zeros: #{count}"  # => 4
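
As mentioned earlier, the standard library already covers most of these operations. The sketch below shows a few counterparts that exist on the integer types and on Slice. As far as I know, the integer methods are thin wrappers around the same intrinsics, but treat that as an assumption rather than a guarantee.

```crystal
# Standard-library counterparts of some intrinsics shown above.
value = 0b00001111_i32

puts value.popcount                      # => 4
puts value.leading_zeros_count           # => 28 (Int32 is 32 bits wide)
puts 0b11110000_i32.trailing_zeros_count # => 4

# Slice helpers cover the common memcpy / memset cases.
src = Slice(UInt8).new(10) { |i| i.to_u8 }
copy = Slice(UInt8).new(10, 0_u8)
copy.copy_from(src) # like memcpy, with bounds checking

filled = Slice(UInt8).new(10, 0_u8)
filled.fill(0xFF_u8) # like memset

puts copy == src              # => true
puts filled.all? { |b| b == 0xFF_u8 } # => true
```

These versions are bounds-checked and read more naturally, so the raw intrinsics are mainly useful when you need an option such as is_volatile.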

Conclusion

Crystal still lacks extensive documentation in many languages, but DeepWiki is a reliable source for answers to most questions. This article is based on what I’ve learned from DeepWiki, and all code examples have been tested to ensure they work correctly. I highly recommend it.

That’s all for now — happy hacking with Crystal!

This post was translated from Japanese to English by ChatGPT.
Click here to see the original post.
