shayan holakouee
The BEAM Is Not Like Other Runtimes (And That's Why Elixir Scales the Way It Does)

You have been writing Elixir for a while. You know GenServers. You know supervisors. You have hit the BEAM's concurrency model enough times to trust it. But there is a level below that which most Elixir developers never look at, and it explains a lot of behavior that otherwise seems like magic.

Why does spawning a hundred thousand processes not kill your system? Why is a process crash isolated but not silent? Why does garbage collection in Elixir not stop the world? The answers are all in the BEAM, and they are worth understanding properly.

The BEAM Is Not the JVM With a Different Language

This comparison comes up constantly and it misleads people. Both are virtual machines that run bytecode. That is roughly where the similarity ends.

The JVM was designed around threads. Concurrency on the JVM means OS threads, shared mutable state, locks, and a garbage collector that operates across the entire heap. The BEAM was designed around processes. Not OS processes. Not OS threads. Its own lightweight processes, each with isolated memory, independent garbage collection, and message passing as the only communication mechanism.

This is not an implementation detail. It is the fundamental design decision that every other property of the BEAM follows from.

Processes Are Cheaper Than You Think

When people hear "process" they think OS process: expensive, slow to start, heavy on memory. BEAM processes are none of those things.

A BEAM process starts with around 2KB of memory: a single block holding both its heap and its stack. It grows as needed. Spawning one takes microseconds. The BEAM scheduler manages them across a small pool of OS threads, one scheduler per CPU core by default. You can run a million concurrent processes on a single machine without the OS knowing about most of them.

# This is not dangerous. It is idiomatic.
pids = Enum.map(1..100_000, fn i ->
  spawn(fn ->
    Process.sleep(:timer.seconds(10))
    IO.puts("Process #{i} done")
  end)
end)

IO.puts("Spawned #{length(pids)} processes")
# => Spawned 100000 processes

On most systems this runs without issue. Try the equivalent with OS threads and you will typically exhaust memory or hit OS limits long before a hundred thousand: default thread stacks are measured in megabytes, not kilobytes. The difference is not hardware. It is the scheduler.
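You can check the footprint claim directly with `Process.info/2` (a small sketch; the exact byte count varies by OTP version and emulator build, but it lands in the low kilobytes):

```elixir
# Spawn an idle process and inspect its total memory footprint
pid = spawn(fn -> Process.sleep(:infinity) end)

{:memory, bytes} = Process.info(pid, :memory)
IO.puts("Fresh process footprint: #{bytes} bytes")
# A few kilobytes: heap, stack, and process control block combined

Process.exit(pid, :kill)
```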

The Scheduler: Preemptive and Reduction-Based

The BEAM uses a preemptive scheduler, but not in the way most runtimes do it. It does not preempt based on time slices measured in milliseconds. It preempts based on reductions.

A reduction is roughly one unit of work: a function call, a pattern match, a message send. Each process gets a budget of reductions per scheduling slice: 4,000 in modern OTP releases (it was 2,000 for years, and the exact number is an implementation detail). When the budget runs out, the scheduler suspends the process and runs another one. This happens regardless of what the process is doing. A tight loop does not starve other processes.

# This does not block the scheduler
defmodule Spinner do
  def spin(n) do
    spin(n + 1)  # each call consumes a reduction
  end
end

spawn(fn -> Spinner.spin(0) end)
spawn(fn -> IO.puts("I still run") end)
# => I still run

In Node.js, a synchronous tight loop blocks the event loop entirely. In the BEAM, the scheduler preempts it as soon as its reduction budget is exhausted. Every other process continues running. The misbehaving process gets its turn again shortly, but it cannot monopolize the system.

This is why latency in Elixir systems tends to be consistent rather than spiky. No single process can hold the scheduler hostage.
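You can watch the reduction counter tick with `Process.info/2` (a sketch; the exact counts are an implementation detail and vary across OTP versions):

```elixir
{:reductions, before_count} = Process.info(self(), :reductions)

# Do some work: each function call and operation consumes reductions
Enum.each(1..10_000, fn _ -> :ok end)

{:reductions, after_count} = Process.info(self(), :reductions)
IO.puts("That loop cost roughly #{after_count - before_count} reductions")
```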

Memory Isolation and Per-Process GC

Every BEAM process has its own heap. No sharing. When a process dies, its heap is reclaimed immediately. No coordination with other processes, no waiting for a global GC cycle.

This is the source of one of the BEAM's most important properties: garbage collection does not stop the world. There is no world to stop. Each process collects its own garbage independently, on its own schedule, without affecting any other process.

# Each process allocates and collects independently
spawn(fn ->
  large_list = Enum.to_list(1..1_000_000)
  # large_list is only in this process's heap
  # GC here affects nothing else
  :ok
end)

# This process is unaffected by the GC above
spawn(fn ->
  IO.puts("Running without pause")
end)

Compare this to the JVM's stop-the-world pauses, or Go's GC, which, while mostly concurrent, still operates over a single shared heap. In the BEAM, a process doing heavy allocation and collection does not introduce latency spikes in other processes. The isolation is complete.

The tradeoff is message passing. Because processes share no memory, sending data between them requires copying. Large messages are expensive not because of the send itself but because of the copy. (The one exception: binaries larger than 64 bytes live on a shared, reference-counted binary heap and are passed by reference.) This is why Elixir idioms tend toward small, frequent messages rather than large data transfers. The design pushes you toward the pattern that performs well.
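A rough way to see the copy cost is to time the send itself with `:timer.tc/1` (a sketch; absolute numbers depend on hardware, but the gap between a tiny tuple and a million-element list is hard to miss):

```elixir
# An idle process to receive messages
sink = spawn(fn -> Process.sleep(:infinity) end)

small = {:ok, 42}
big = Enum.to_list(1..1_000_000)

{small_us, _} = :timer.tc(fn -> send(sink, small) end)
{big_us, _} = :timer.tc(fn -> send(sink, big) end)

IO.puts("Small message: #{small_us}µs, large message: #{big_us}µs")
# The large send is slower because the full list is copied into sink's heap
```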

How Message Passing Actually Works

Every process has a mailbox. Sending a message puts a copy of the message into the recipient's mailbox. The recipient reads from its mailbox using receive. This is asynchronous by default: sending does not block.

parent = self()

spawn(fn ->
  send(parent, {:result, 42})
end)

receive do
  {:result, value} -> IO.puts("Got: #{value}")
after
  5000 -> IO.puts("Timed out")
end

The receive block pattern matches against messages in the mailbox in order. If no message matches, the process blocks and waits. Unmatched messages stay in the mailbox. This is a subtle point: if you receive messages without handling all patterns, your mailbox grows unboundedly. A common bug in long-running processes is accumulating unmatched messages until memory pressure becomes a problem.

# Dangerous in a long-running process
receive do
  {:ok, value} -> handle(value)
  # no catch-all: anything else stays in mailbox forever
end

# Safer
receive do
  {:ok, value} -> handle(value)
  other -> Logger.warning("Unexpected message: #{inspect(other)}")
end
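You can observe the leak directly: `Process.info/2` exposes `:message_queue_len`, so a sketch like this shows unmatched messages piling up:

```elixir
pid = spawn(fn ->
  # Only ever matches :ping; everything else accumulates forever
  receive do
    :ping -> :ok
  end
end)

Enum.each(1..1_000, fn i -> send(pid, {:noise, i}) end)

{:message_queue_len, n} = Process.info(pid, :message_queue_len)
IO.puts("Unmatched messages in mailbox: #{n}")
# => Unmatched messages in mailbox: 1000

Process.exit(pid, :kill)
```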

Preemption vs Cooperation: The NIF Problem

The BEAM's scheduler is preemptive for Elixir and Erlang code. It is not preemptive for NIFs.

A NIF (Native Implemented Function) is C code called directly from Elixir. When a NIF runs, the BEAM scheduler cannot preempt it. The OS thread running that NIF is blocked for the duration of the call. If your NIF takes 100ms, every process scheduled on that OS thread waits 100ms.

# If :my_nif.heavy_computation/1 takes 500ms,
# it blocks the scheduler thread for 500ms
result = :my_nif.heavy_computation(data)

This is why the BEAM documentation is so emphatic that NIFs must be fast. Anything that may run longer than about a millisecond should use a dirty scheduler: a separate pool of OS threads, reserved for long-running native calls, that runs outside the normal schedulers.

# In your NIF definition (C side), mark it dirty:
# ERL_NIF_DIRTY_JOB_CPU_BOUND or ERL_NIF_DIRTY_JOB_IO_BOUND
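You can see both scheduler pools from Elixir via `:erlang.system_info/1` (the counts shown depend on your core count and VM flags):

```elixir
# Normal schedulers run Elixir/Erlang code; dirty schedulers run long NIFs
IO.puts("Normal schedulers:    #{:erlang.system_info(:schedulers_online)}")
IO.puts("Dirty CPU schedulers: #{:erlang.system_info(:dirty_cpu_schedulers_online)}")
IO.puts("Dirty IO schedulers:  #{:erlang.system_info(:dirty_io_schedulers)}")
```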

Understanding this explains why some Elixir libraries that wrap C code (image processing, certain crypto operations, database drivers with native extensions) can introduce latency spikes that seem inconsistent with the BEAM's concurrency model. They are bypassing the model.

The Scheduler's Relationship With IO

IO in the BEAM is non-blocking at the scheduler level. When a process does a network read or a file operation, the BEAM does not block the OS thread. It suspends the process, registers the IO operation with an internal polling mechanism (built on epoll or kqueue depending on the OS), and runs other processes. When the IO completes, the process is rescheduled.

This is why Elixir handles tens of thousands of concurrent connections without a thread-per-connection model. Each connection maps to a process. Each process consumes a BEAM process worth of resources (a few kilobytes), not an OS thread. The IO multiplexing is handled by the runtime, not by the application code.

# Each of these connections is a lightweight process
# The BEAM handles the IO multiplexing
{:ok, socket} = :gen_tcp.accept(listen_socket)
pid = spawn(fn -> handle_connection(socket) end)
# Hand socket ownership to the new process so it, not the acceptor, does the IO
:ok = :gen_tcp.controlling_process(socket, pid)

Phoenix handling millions of concurrent WebSocket connections is not a framework trick. It is the BEAM's process model applied to network IO with no special configuration required.
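Here is what the process-per-connection pattern looks like end to end, as a minimal sketch of a line-based echo server (`Echo` and its function names are illustrative, not a library API; the `:ready` handshake ensures the connection process only touches the socket after it owns it):

```elixir
defmodule Echo do
  # Minimal sketch: one lightweight process per TCP connection.
  def start(port) do
    {:ok, listen} =
      :gen_tcp.listen(port, [:binary, packet: :line, active: false, reuseaddr: true])

    accept_loop(listen)
  end

  defp accept_loop(listen) do
    {:ok, socket} = :gen_tcp.accept(listen)

    pid =
      spawn(fn ->
        # Wait until ownership has been transferred before reading
        receive do
          :ready -> serve(socket)
        end
      end)

    :ok = :gen_tcp.controlling_process(socket, pid)
    send(pid, :ready)
    accept_loop(listen)
  end

  defp serve(socket) do
    case :gen_tcp.recv(socket, 0) do
      {:ok, line} ->
        :gen_tcp.send(socket, line)
        serve(socket)

      {:error, _reason} ->
        :gen_tcp.close(socket)
    end
  end
end
```

Every accepted connection costs one BEAM process; the runtime suspends it during `recv` and reschedules it when data arrives.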

What This Means for How You Write Code

Once you understand the BEAM's process model, certain Elixir patterns stop looking like conventions and start looking like direct expressions of the runtime's capabilities.

Spawning a process per request is not wasteful; it is the intended usage. Letting a process crash instead of defensively handling every possible error is not lazy; it is correct: the process's isolated heap disappears cleanly, the supervisor restarts it in a known good state, and no other process is affected.
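The "isolated but not silent" part is directly observable with `spawn_monitor/1`: the crash stays inside the worker, but the monitoring process (in production, a supervisor) is told exactly what happened. A minimal sketch:

```elixir
# The worker exits abnormally; only its own heap is affected
{pid, ref} = spawn_monitor(fn -> exit(:boom) end)

receive do
  {:DOWN, ^ref, :process, ^pid, reason} ->
    IO.puts("Worker died with reason #{inspect(reason)}; a supervisor would restart it")
after
  1_000 -> IO.puts("No DOWN message (unexpected)")
end
```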

The things that are expensive in other runtimes (concurrency, isolation, failure recovery) are cheap in the BEAM because the runtime was built to make them cheap. The things that are expensive in the BEAM (large message copies, long-running NIFs) are expensive because they work against the model.

The more directly your code expresses the model, the more you get out of it.
