Renato Valim
TIL: BEAM Dirty Work!!

I’ve been studying operating systems again — processes, threads, CPU scheduling — and wanted to connect that knowledge to my day-to-day work as an Elixir developer. So I fired up iex -S mix phx.server and started poking around.

First surprise: running pgrep -f beam.smp showed 3 OS processes. Turns out two were orphaned ElixirLS language servers from old Neovim sessions (oops), and one was my Phoenix app.

Second (not so) surprise: ps -M <phoenix app pid> | wc -l revealed my Phoenix app was running 32 OS threads (on my M1 Mac).

I wanted to understand exactly what all those threads were for, and after some research I ended up reading the :erlang.system_info/1 documentation. Scrolling through the available options, I noticed some intriguing entries: dirty_cpu_schedulers, dirty_io_schedulers, ...

“Dirty” schedulers? I’d never heard that term before. Very interesting:

iex> :erlang.system_info(:schedulers)
8
iex> :erlang.system_info(:dirty_cpu_schedulers)
8
iex> :erlang.system_info(:dirty_io_schedulers)
10

That rabbit hole led me to discover one of the BEAM VM’s most clever architectural decisions — and revealed a fundamental challenge at the boundary between managed VM code and native code.

The Problem: When the BEAM Loses Control

The BEAM VM is famous for its ability to run gazillions!!! of lightweight processes concurrently. It does this through cooperative multitasking — each process runs for a bit, then yields control so others can run. The BEAM VM can preempt any process after some "reductions" (think of these as instruction counts).

But there’s a catch: this only works for BEAM bytecode.

When you call a NIF (Native Implemented Function) — C/Rust code compiled into a .so/.dylib/.dll shared library — something dangerous happens:

# This is Elixir code — BEAM has full control
def some_func(data) do
  Enum.reduce(data, 0, &other_func/2) # Can be preempted by BEAM
end

# This calls C code — BEAM loses control
def hash(data) do
  :crypto.hash(:sha256, data) # Cannot be interrupted by BEAM
end

The Journey of the Program Counter

Here’s what happens when a scheduler thread executes a NIF:

Scheduler Thread 1 (single OS thread):
├─ Process A: needs to hash password (bcrypt NIF)
├─ Process B: handle HTTP request (waiting…)
├─ Process C: send email (waiting…)
└─ Process D: database query (waiting…)

Timeline:
0ms: Scheduler picks Process A
1ms: Process A calls bcrypt NIF
 PC (Program Counter) jumps from BEAM bytecode → C code in .so file
2ms: [PC executing C code — "BEAM cannot see inside"]
10ms: [Still in C code…]
50ms: [Still in C code…]
100ms: bcrypt returns! PC jumps back to BEAM
101ms: Scheduler can FINALLY pick Process B

Result: Process B, C, D waited 100ms even though they had work to do!

Why Can’t BEAM Interrupt C Code?

BEAM bytecode runs in the VM’s interpreter loop, something like this (merely illustrative!!!):

while (true) {
  instruction = fetch();
  execute(instruction);
  reductions++;

  if (reductions >= MAX_REDUCTIONS) {
    // That's enough!! Let your brother play the videogame now
    reductions = 0;
    yield_to_next_process();
  }
}

But C code in a NIF executes as raw CPU instructions:

// crypto.so (illustrative)
int hash_password(const char* pwd) {
    int result = 0;
    for (int i = 0; i < 1000000; i++) {
        // Complex hashing logic
        // BEAM’s while loop isn’t running!
        // Can’t check a reduction count
        // Can’t yield control
    }
    return result;
}

The program counter has left BEAM’s interpreter and is running native machine code directly. BEAM just has to wait.

Wait, Doesn’t the OS Share CPU Time?

Great question! Yes, the OS scheduler does time-slice CPU between threads:

OS Level (works fine):
Thread 1 (Scheduler 1): [10ms] [pause] [10ms] [pause]
Thread 2 (Scheduler 2): [pause] [10ms] [pause] [10ms]

But that doesn’t help inside a single thread:

Inside Scheduler Thread 1:
├─ Process A [running bcrypt NIF — 100ms]
│ └─ OS gives thread CPU time 
│ └─ But thread is busy executing C code
│ └─ Processes B, C, D are stuck in queue

The OS is sharing CPU time between threads, but the BEAM can’t share scheduler time between Erlang processes while stuck in that native code.

The Solution: Dirty Schedulers

BEAM has dirty schedulers: separate thread pools for running potentially blocking operations:

┌────────────────────────────────────────────────┐
│ Normal Schedulers (8 threads on my M1 Mac)     │
├────────────────────────────────────────────────┤
│ Thread 1: [Process A][Process B][Process C]    │
│ Thread 2: [Process D][Process E][Process F]    │
│ ...                                            │
│ > Handle regular Erlang processes              │
│ > Stay responsive and preemptible              │
└────────────────────────────────────────────────┘
┌────────────────────────────────────────────────┐
│ Dirty CPU Schedulers (8 threads on my M1 Mac)  │
├────────────────────────────────────────────────┤
│ Thread 1: [bcrypt NIF — can block for 100ms]   │
│ Thread 2: [image compression NIF]              │
│ Thread 3: [crypto operations]                  │
│ > Run CPU-intensive NIFs                       │
│ > Can block without affecting normal schedulers│
└────────────────────────────────────────────────┘
┌────────────────────────────────────────────────┐
│ Dirty IO Schedulers (10 threads on my M1 Mac)  │
├────────────────────────────────────────────────┤
│ > Run IO-heavy NIFs (file operations, etc.)    │
└────────────────────────────────────────────────┘

See It In Action

On my machine running a Phoenix app:

# In IEx
:erlang.system_info(:schedulers) # => 8
:erlang.system_info(:dirty_cpu_schedulers) # => 8
:erlang.system_info(:dirty_io_schedulers) # => 10

Total: 26 worker threads just for scheduling!

From the OS perspective:

$ ps -M <beam_pid> | wc -l
33 # 1 header line + 32 threads (8 + 8 + 10 scheduler threads, plus system threads)

How NIFs Use Dirty Schedulers

NIF authors mark functions as "dirty":

static ERL_NIF_TERM slow_hash(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[]) {
  // ... CPU-intensive work that builds `result` ...
  return result;
}

static ErlNifFunc nif_funcs[] = {
  {"hash", 2, slow_hash, ERL_NIF_DIRTY_JOB_CPU_BOUND}
  //                     ^^^^ This flag tells BEAM to use a dirty CPU scheduler
};

When you call it from Elixir:

:crypto.hash(:sha256, data)
# ↓
# BEAM sees the NIF is flagged as dirty work
# ↓
# Schedules it on a dirty CPU scheduler instead
# ↓
# Normal schedulers stay responsive

The Real-World Impact

Without dirty schedulers:

User A: Registers account (triggers bcrypt)
Result: Entire Phoenix app freezes for 100ms
 — Health checks time out
 — WebSockets disconnect
 — Simple GET requests stall

With dirty schedulers:

User A: Registers account (triggers bcrypt on dirty scheduler)
Result: Rest of app stays responsive
 — Other requests process normally
 — WebSockets maintain connection
 — Only password hashing takes 100ms (expected)

Key Takeaway

The BEAM VM’s responsiveness comes from cooperative multitasking, but that breaks down when calling native code. Dirty schedulers solve this by isolating potentially blocking operations on separate threads, keeping your application responsive even when running expensive C/Rust operations.

Next time you use :crypto, :bcrypt, or any NIF-based library, remember: there’s a whole separate pool of threads handling that work so your web requests don’t freeze!
