This article was originally written in Russian and has been translated to English with the help of DeepSeek. You can find the original version here: Habr.
Elixir – a language created to reintroduce Erlang to the modern world. A syntax free of beloved but archaic punctuation marks; a development culture that emphasizes tooling quality and developer comfort; a comprehensive suite for building web services; a standard library unburdened by decades of legacy; and real macros.
At first glance, Elixir doesn’t introduce radically new concepts at its core. Indeed, developers familiar with both Elixir and Erlang can often imagine how code in one language would translate to the other. But not always — Elixir includes constructs that have no direct Erlang equivalents. How do they work? Obviously, Elixir expands them into some additional Erlang code during the compilation phase. Some transformations are intuitive to predict; others (spoiler alert) might reveal surprising compiler tricks.
This article explores the transformations Elixir code undergoes before reaching the Erlang compiler. We’ll dissect conditional constructs like if
and cond
, unravel the mysteries of the dot operator, trace the adventures of with
and for
comprehensions, demystify protocols, and marvel at the optimizations Elixir achieves.
Since the final result of the Elixir compiler's work is Erlang Abstract Code — an Erlang syntax tree — it is easy to reconstruct Erlang code from it. The following function will assist us in this process:
@spec to_abstract(module()) :: String.t()
def to_abstract(module) do
module
|> :code.get_object_code() # Get the module BEAM code
|> then(fn {_module, beam, _path} -> beam end)
|> :beam_lib.chunks([:abstract_code]) # Extract Abstract Code from debug section
|> then(fn result ->
{:ok, {_, [abstract_code: {:raw_abstract_v1, abstract_code}]}} = result
abstract_code
end)
|> :erl_syntax.form_list()
|> :erl_prettypr.format() # Translate Abstract Code back to Erlang
|> List.to_string()
end
Feel free to use it yourself if the article doesn’t cover topics you’re curious about. The full code is available on GitHub.
A heads-up: There will be plenty of Erlang code ahead, so familiarity with the language will help. But don’t worry if you’ve never encountered Erlang before — no overly complex syntax will appear. A basic grasp of Elixir is all you’ll need to follow along.
Conditional expressions
Let’s start with the basics. Unlike Erlang, which strictly operates with boolean values, Elixir introduces the concept of truthiness, where nil
and false
are considered falsy
, and all other values are truthy
.
Accordingly, all expressions that rely on this logic must expand into BEAM-compatible code during compilation:
Kernel.if/2
def if_thing(thing) do
if thing, do: :thing, else: :other_thing
end
if_thing(_thing@1) ->
case _thing@1 of
_@1 when _@1 =:= false orelse _@1 =:= nil -> other_thing;
_ -> thing
end.
Kernel.SpecialForms.cond/1
def cond_example do
cond do
:erlang.phash2(1) -> 1
:erlang.phash2(2) -> 2
:otherwise -> :ok
end
end
cond_example() ->
case erlang:phash2(1) of
_@3 when _@3 /= nil andalso _@3 /= false -> 1;
_ ->
case erlang:phash2(2) of
_@2 when _@2 /= nil andalso _@2 /= false -> 2;
_ ->
case otherwise of
_@1 when _@1 /= nil andalso _@1 /= false -> ok;
_ -> erlang:error(cond_clause)
end
end
end.
Kernel.!/1
def negate(thing), do: !thing
negate(_thing@1) ->
case _thing@1 of
_@1 when _@1 =:= false orelse _@1 =:= nil -> true;
_ -> false
end.
Indeed, every conditional expression relying on truthy/falsy evaluation expands into a case
statement where the negative clause checks for membership in the falsy
category.
But this isn’t always the case. For example:
def if_bool(thing) do
if is_nil(thing), do: :thing, else: :other_thing
if thing != nil, do: :thing, else: :other_thing
end
if_bool(_thing@1) ->
case _thing@1 == nil of
false -> other_thing;
true -> thing
end,
case _thing@1 /= nil of
false -> other_thing;
true -> thing
end.
This reveals the first optimization implemented by the compiler: if Elixir is certain that the evaluation will only involve boolean values, it generates a case
statement
case Value of
true -> success;
false -> failure
end
without considering truthiness semantics as in the general case
case Value of
Value when Value =:= false orelse Value =:= nil -> failure;
_ -> success
end
The optimization condition is defined here:
case lists:member({optimize_boolean, true}, Meta) andalso elixir_utils:returns_boolean(EExpr) of
true -> rewrite_case_clauses(Opts);
false -> Opts
end,
For this optimization to apply, the following conditions must be met:
a) The expression must be marked with the optimize_boolean
flag. This flag is automatically set by the compiler for if
, !
, !!
, and
, and or
expressions. !!
represents another optimization: collapsing double negation into a truthiness check. And while and
/or
operate strictly on booleans, the optimize_boolean
flag avoids generating a third case
clause that would raise a BadBooleanError
.
b) Elixir must be able to determine that the operation will return a boolean. This primarily occurs when using operators or is_
guards from the :erlang
module, but more complex cases are also analyzed. For example if the compiler observes that every clause in a case
or cond
returns boolean values, it infers that the entire expression must return booleans.
The full logic can be seen here. Below is an excerpt of it:
returns_boolean(Bool) when is_boolean(Bool) -> true;
returns_boolean({{'.', _, [erlang, Op]}, _, [_]}) when Op == 'not' -> true;
returns_boolean({{'.', _, [erlang, Op]}, _, [_, _]}) when
Op == 'and'; Op == 'or'; Op == 'xor';
Op == '=='; Op == '/='; Op == '=<'; Op == '>=';
Op == '<'; Op == '>'; Op == '=:='; Op == '=/=' -> true;
returns_boolean({{'.', _, [erlang, Op]}, _, [_, Right]}) when
Op == 'andalso'; Op == 'orelse' ->
returns_boolean(Right);
returns_boolean({{'.', _, [erlang, Fun]}, _, [_]}) when
Fun == is_atom; Fun == is_binary; Fun == is_bitstring; Fun == is_boolean;
Fun == is_float; Fun == is_function; Fun == is_integer; Fun == is_list;
Fun == is_number; Fun == is_pid; Fun == is_port; Fun == is_reference;
Fun == is_tuple; Fun == is_map; Fun == is_process_alive -> true;
...
Key-based access
A modern convenience absent in Erlang is dedicated syntax for key-based access in data structures.
Elixir provides square bracket access []
, which simply compiles to a call to Access.get/3:
def brackets(data) do
data[:field]
end
brackets(_data@1) ->
'Elixir.Access':get(_data@1, field).
Then there’s also dot notation access, which works exclusively for maps with atom keys, and here’s where things get more interesting:
def dot(map) when is_map(map) do
map.field
end
dot(_map@1) when erlang:is_map(_map@1) ->
case _map@1 of
#{field := _@2} -> _@2;
_@2 ->
case elixir_erl_pass:no_parens_remote(_@2, field) of
{ok, _@1} -> _@1;
_ -> erlang:error({badkey, field, _@2})
end
end;
Instead of compiling directly into, say, erlang:map_get/2, this notation expands into two nested case
statements.
The first clause retrieves the value from the map, while the nested case
is a consequence of earlier design decisions.
The issue stems from Elixir allowing parentheses to be omitted in function calls:
iex(1)> DateTime.utc_now
~U[2025-02-17 12:35:39.575764Z]
This also applies when the module is determined at runtime:
iex(2)> mod = DateTime
DateTime
iex(3)> mod.utc_now
warning: using map.field notation (without parentheses) to invoke function DateTime.utc_now() is deprecated, you must add parentheses instead: remote.function()
(elixir 1.18.2) src/elixir.erl:386: :elixir.eval_external_handler/3
(stdlib 6.2) erl_eval.erl:919: :erl_eval.do_apply/7
(stdlib 6.2) erl_eval.erl:479: :erl_eval.expr/6
(elixir 1.18.2) src/elixir.erl:364: :elixir.eval_forms/4
(elixir 1.18.2) lib/module/parallel_checker.ex:120: Module.ParallelChecker.verify/1
~U[2025-02-17 12:36:23.248233Z]
Notice that the notation mod.utc_now
is ambiguous — it could be a function call or a key access. As a result, Elixir must generate code that checks at runtime whether the value being accessed is a function or a map.
Starting with this commit, a warning is now emitted, but the code still works.
Interestingly, the inverse scenario also requires additional logic:
def dot(module) when is_atom(module) do
module.function()
end
dot(_module@1) when erlang:is_atom(_module@1) ->
case _module@1 of
#{function := _@2} ->
elixir_erl_pass:parens_map_field(function, _@2);
_@2 -> _@2:function()
end.
Because map key access can also end with parentheses:
iex(1)> map = %{field: :value}
%{field: :value}
iex(2)> map.field()
warning: using module.function() notation (with parentheses) to fetch map field :field is deprecated, you must remove the parentheses: map.field
(elixir 1.18.2) src/elixir.erl:386: :elixir.eval_external_handler/3
(stdlib 6.2) erl_eval.erl:919: :erl_eval.do_apply/7
(elixir 1.18.2) src/elixir.erl:364: :elixir.eval_forms/4
(elixir 1.18.2) lib/module/parallel_checker.ex:120: Module.ParallelChecker.verify/1
(iex 1.18.2) lib/iex/evaluator.ex:336: IEx.Evaluator.eval_and_inspect/3
:value
Erlang to the Rescue!
The Erlang compiler optimizes code more aggressively, leveraging type information known at compile time. It can eliminate both case
clauses entirely if confident the value is a map.
The simplest way to provide this type information is via function annotations:
def function(data) when is_map(data)
or
def function(%{} = data)
As a bonus, we’ll explore how this redundant code generation impacts the final program’s performance at the end of the article. Stay tuned 👍
With
Kernel.SpecialForms.with/1 is, in my opinion, one of the most valuable additions Elixir has brought to the BEAM ecosystem. So much so that Erlang adopted it almost verbatim, renaming it to maybe. One might hope that with
could eventually compile directly to maybe
in future compiler versions, but there’s a critical difference: maybe
does not support guards in its pattern matches. As a result, Elixir must continue to manually translate with
into lower-level constructs, both now and for the foreseeable future:
The example below represents an anti-pattern, but we need this specific code to examine with
clauses containing else
def with_else(map) do
with {_, {:ok, data}} <- {:map, Map.fetch(map, "data")},
{_, {int, ""}} <- {:int, Integer.parse(data)} do
int
else
{:map, :error} -> {:error, :no_data}
{:int, _} -> {:error, :not_an_int}
end
end
with_else(_map@1) ->
_@2 = fun ({map, error}) -> {error, no_data};
({int, _}) -> {error, not_an_int};
(_@1) -> erlang:error({else_clause, _@1})
end,
case {map, maps:find(<<"data">>, _map@1)} of
{_, {ok, _data@1}} ->
case {int, 'Elixir.Integer':parse(_data@1)} of
{_, {_int@1, <<>>}} -> _int@1;
_@3 -> _@2(_@3)
end;
_@3 -> _@2(_@3)
end.
Here, the compiler extracts the else
clauses into a separate lambda, passing values from mismatched case
patterns into it. Interestingly, as recently as Elixir v1.16.3, the generated code was far more verbose:
with_else(_map@1) ->
case {map, maps:find(<<"data">>, _map@1)} of
{_, {ok, _data@1}} ->
case {int, 'Elixir.Integer':parse(_data@1)} of
{_, {_int@1, <<>>}} -> _int@1;
_@1 ->
case _@1 of
{map, error} -> {error, no_data};
{int, _} -> {error, not_an_int};
_@2 -> erlang:error({else_clause, _@2})
end
end;
_@1 ->
case _@1 of
{map, error} -> {error, no_data};
{int, _} -> {error, not_an_int};
_@2 -> erlang:error({else_clause, _@2})
end
end.
In earlier versions, the compiler duplicated all else
clauses for each branching path. While the Erlang compiler (to which Elixir delegates code generation) could theoretically optimize this by eliminating redundant clauses, such analysis was neither consistent nor universally applied.
As an aside, a with
expression without an else
clause compiles into straightforward, unnuanced code:
def without_else(map) do
with {:ok, data} <- fetch_data(map),
{:ok, int} <- parse_int(data) do
int
end
end
without_else(_map@1) ->
case fetch_data(_map@1) of
{ok, _data@1} ->
case parse_int(_data@1) of
{ok, _int@1} -> _int@1;
_@1 -> _@1
end;
_@1 -> _@1
end.
For
Kernel.SpecialForms.for/1 is arguably the most intricate syntactic sugar in Elixir. Personally, I avoid using it when possible, though some may find it appealing.
A basic for
comprehension compiles into an Enum.map
:
def basic do
for i <- 1..5 do
i * 1
end
end
basic() ->
'Elixir.Enum':map(
#{'__struct__' => 'Elixir.Range', first => 1, last => 5, step => 1},
fun (_i@1) -> _i@1 * 1 end).
When a filter is added, it expands into a reduce
call. Notably, the result is reversed via lists:reverse/1
instead of Enum.reverse/1
, saving a function call.
def filter do
for i <- 1..10, div(i, 2) == 0 do
i * 1
end
end
filter() ->
lists:reverse(
'Elixir.Enum':reduce(
#{'__struct__' => 'Elixir.Range', first => 1, last => 10, step => 1},
[],
fun (_i@1, _@1) ->
case _i@1 div 2 == 0 of
true -> [_i@1 * 1 | _@1];
false -> _@1
end
end)).
Pattern-matching clauses are lifted into the head of the generated function. However, guards are expanded into nested case
statements — a curious choice, given that guards here are subject to the same limitations as anywhere else.
def match do
users = [user: "john", admin: "meg", guest: "barbara"]
for {type, name} when type != :guest <- users do
String.upcase(name)
end
end
match() ->
_users@1 = [{user, <<"john">>},
{admin, <<"meg">>},
{guest, <<"barbara">>}],
lists:reverse(
'Elixir.Enum':reduce(
_users@1,
[],
fun
({_type@1, _name@1}, _@1) ->
case _type@1 /= guest of
true -> ['Elixir.String':upcase(_name@1) | _@1];
false -> _@1 end;
(_, _@1) -> _@1
end)).
An optimization worth knowing about: if the compiler detects that the result of for
expression is not used, it generates code that skips collecting it. For instance, while a basic for
compiles to map
, the same for
that is not assigned to anything compiles to reduce
with a nil
accumulator:
def no_collect do
for i <- 1..5 do
i
end
:ok
end
no_collect() ->
'Elixir.Enum':reduce(#{'__struct__' => 'Elixir.Range',
first => 1, last => 5, step => 1},
[],
fun (_i@1, _@1) -> begin _i@1, nil end end),
ok.
Finally, when uniq: true
is used, the compiler additionally stores values in MapSet
to check them for uniqueness:
def unique do
for i <- 1..10, uniq: true do
i
end
end
I split the code into intermediate variables for clarity; the original code is written in a single line.
unique() ->
Range = #{'__struct__' => 'Elixir.Range', first => 1, last => 10, step => 1},
Function = fun (Elem, Acc) ->
{List, MapSet} = Acc,
Key = Elem,
case MapSet of
#{Key := true} -> {List, MapSet};
#{} -> {[Key | List], MapSet#{Key => true}}
end
end,
Result = 'Elixir.Enum':reduce(Range, {[], #{}}, Function),
lists:reverse(erlang:element(1, Result)).
Protocols
To explore protocols, let’s use an example from the documentation:
defmodule ElixirJourney.Protocols do
defprotocol Size do
def size(data)
end
defimpl Size, for: BitString do
def size(binary), do: byte_size(binary)
end
defimpl Size, for: Map do
def size(map), do: map_size(map)
end
defimpl Size, for: Tuple do
def size(tuple), do: tuple_size(tuple)
end
def protocol(value) do
Size.size(value)
end
end
Invoking a protocol implementation is straightforward — it’s simply a function call to the protocol’s module (defprotocol
and each defimpl
generate their own modules):
-module('Elixir.ElixirJourney.Protocols').
...
protocol(_value@1) ->
'Elixir.ElixirJourney.Protocols.Size':size(_value@1).
Implementation modules are also compiled into expected forms, with two additions:
- A
behaviour
declaration referencing the protocol - A meta-function
__impl__
that provides metadata about the implementation (e.g., target protocol and module)
-module('Elixir.ElixirJourney.Protocols.Size.Map').
...
-behaviour('Elixir.ElixirJourney.Protocols.Size').
...
'__impl__'(for) -> 'Elixir.Map';
'__impl__'(protocol) ->
'Elixir.ElixirJourney.Protocols.Size'.
size(_map@1) -> erlang:map_size(_map@1).
The core logic resides in the protocol module itself.
The implementation varies depending on whether protocols are consolidated or not (via the :consolidate_protocols
Mix flag). Let’s examine the consolidated case first:
-module('Elixir.ElixirJourney.Protocols.Size').
-behaviour('Elixir.Protocol').
-export_type([t/0]).
-type t() :: term().
-callback size(t()) -> term().
The protocol module declares itself as a behaviour
, with protocol functions defined as callback
s. By default, callbacks use term()
type specs, but these can be refined in defprotocol
:
defprotocol Size do
@type t :: bitstring() | map() | tuple()
@spec size(t()) :: non_neg_integer()
def size(data)
end
-export_type([t/0]).
-type t() :: bitstring() | map() | tuple().
-callback size(t()) -> non_neg_integer().
The meta-function __protocol__
is generated, enabling runtime inspection of protocol details:
'__protocol__'(module) -> 'Elixir.ElixirJourney.Protocols.Size';
'__protocol__'(functions) -> [{size, 1}];
'__protocol__'('consolidated?') -> true;
'__protocol__'(impls) -> {consolidated, ['Elixir.BitString', 'Elixir.Map', 'Elixir.Tuple']}.
Protocol function calls look like this:
size(_@1) -> ('impl_for!'(_@1)):size(_@1).
where impl_for!
ensures an error is raised for unimplemented types:
'impl_for!'(_@1) ->
case impl_for(_@1) of
_@2 when _@2 =:= false orelse _@2 =:= nil ->
erlang:error(
'Elixir.Protocol.UndefinedError':exception(
[
{protocol, 'Elixir.ElixirJourney.Protocols.Size'},
{value, _@1},
{description, <<>>}
]));
_@3 -> _@3
end.
The actual implementation selection happens in impl_for
:
impl_for(#{'__struct__' := _@1})
when erlang:is_atom(_@1) ->
struct_impl_for(_@1);
impl_for(_@1) when erlang:is_tuple(_@1) ->
'Elixir.ElixirJourney.Protocols.Size.Tuple';
impl_for(_@1) when erlang:is_map(_@1) ->
'Elixir.ElixirJourney.Protocols.Size.Map';
impl_for(_@1) when erlang:is_bitstring(_@1) ->
'Elixir.ElixirJourney.Protocols.Size.BitString';
impl_for(_) -> nil.
struct_impl_for(_) -> nil.
if the protocol had struct implementations, they would be enumerated in struct_impl_for
And that's it. During consolidation, the compiler gathers all protocol implementations across the project and generates a consolidated impl_for
dispatch table.
If consolidation is disabled, the compiler generates a runtime impl_for
check for every possible type, including dynamic struct name resolution via struct_impl_for
:
impl_for(#{'__struct__' := _@2 = _@1})
when erlang:is_atom(_@2) ->
struct_impl_for(_@1);
impl_for(_@1) when erlang:is_tuple(_@1) ->
case 'Elixir.Code':ensure_compiled('Elixir.ElixirJourney.Protocols.Size.Tuple') of
{module, _@2} -> _@2;
{error, _} -> nil
end;
impl_for(_@1) when erlang:is_atom(_@1) ->
case 'Elixir.Code':ensure_compiled('Elixir.ElixirJourney.Protocols.Size.Atom') of
{module, _@2} -> _@2;
{error, _} -> nil
end;
impl_for(_@1) when erlang:is_list(_@1) ->
case 'Elixir.Code':ensure_compiled('Elixir.ElixirJourney.Protocols.Size.List') of
{module, _@2} -> _@2;
{error, _} -> nil
end;
...
struct_impl_for(_@1) ->
case 'Elixir.Code':ensure_compiled('Elixir.Module':concat('Elixir.ElixirJourney.Protocols.Size', _@1)) of
{module, _@2} -> _@2;
{error, _} -> nil
end.
That is why unconsolidated protocols allow dynamic implementation injection at runtime but incur performance overhead.
String interpolation
Let's briefly examine string interpolation:
def interpolation do
"This #{:will} #{"be"} #{[97]} #{"str" <> "ing"}"
end
interpolation() ->
<<"This ",
case will of
_@1 when erlang:is_binary(_@1) -> _@1;
_@1 -> 'Elixir.String.Chars':to_string(_@1)
end/binary,
" ", "be", " ",
case [97] of
_@2 when erlang:is_binary(_@2) -> _@2;
_@2 -> 'Elixir.String.Chars':to_string(_@2)
end/binary,
" ",
case <<"str", "ing">> of
_@3 when erlang:is_binary(_@3) -> _@3;
_@3 -> 'Elixir.String.Chars':to_string(_@3)
end/binary>>.
Each value within curly braces is converted into a call to the String.Chars
protocol. An interesting detail here is that Elixir does not fully rely on the protocol. Instead, it generates a separate case
statement with an early return for scenarios where the value is already a string.
It appears Elixir anticipates that the Erlang compiler might optimize away the entire case
statement in certain situations, if the compiler can determine that the value will always be a string.
By the way Elixir itself is capable of such optimization, but only in the simplest case — when a string literal is passed directly, like the "be"
string in our example.
Compile-time computations
Here we encounter a fresh set of optimizations (introduced in 2024) that I stumbled upon by accident. In Elixir, we are accustomed to compile-time computations being as accessible as runtime ones. If you want something to be calculated during compilation, you simply move it out of a function into the module body and attach the result to an @attribute
.
Now, however, Elixir additionally maintains a list of pure functions, based on which it can automatically hoist computations of these functions to compile time, regardless of their location in the code.
For Elixir v1.18.2, this list looks like this:
inline_pure_function('Elixir.Duration', 'new!') -> true;
inline_pure_function('Elixir.MapSet', new) -> true;
inline_pure_function('Elixir.String', length) -> true;
inline_pure_function('Elixir.String', graphemes) -> true;
inline_pure_function('Elixir.String', codepoints) -> true;
inline_pure_function('Elixir.String', split) -> true;
inline_pure_function('Elixir.Kernel', to_timeout) -> true;
inline_pure_function('Elixir.URI', new) -> true;
inline_pure_function('Elixir.URI', 'new!') -> true;
inline_pure_function('Elixir.URI', parse) -> true;
inline_pure_function('Elixir.URI', encode_query) -> true;
inline_pure_function('Elixir.URI', encode_www_form) -> true;
inline_pure_function('Elixir.URI', decode) -> true;
inline_pure_function('Elixir.URI', decode_www_for) -> true;
inline_pure_function('Elixir.Version', parse) -> true;
inline_pure_function('Elixir.Version', 'parse!') -> true;
inline_pure_function('Elixir.Version', parse_requirement) -> true;
inline_pure_function('Elixir.Version', 'parse_requirement!') -> true;
inline_pure_function(_Left, _Right) -> false.
With this mechanism, constructing a new MapSet
can immediately resolve to a precomputed structure:
set = MapSet.new([1, 2, 3])
_set@1 = #{'__struct__' => 'Elixir.MapSet', map => #{1 => [], 2 => [], 3 => []}}
Similarly, calculating the length of a string can be "baked" directly into the module as a literal value:
length = String.length("static string")
_length@1 = 13
These optimizations are somewhat short-sighted, relying on functions being called with literals. For example, a scenario like this:
length = String.length("dynamic" <> " " <> "string")
would not be optimized
_length@2 = 'Elixir.String':length(<<"dynamic", " ", "string">>),
If a computation returns an error, that error is faithfully baked into the compiled code:
version = Version.parse("static invalid")
_version@1 = error,
However, if the computation raises an exception, the compiler leaves the code untouched, ensuring the exception is thrown at runtime as normal:
version = Version.parse!("static invalid")
_version@2 = 'Elixir.Version':'parse!'(<<"static invalid">>)
Beyond the generic list of pure functions, the compiler includes several special-case optimizations. For example, converting the offset argument in shift
functions from the Date
, DateTime
, NaiveDateTime
, and Time
modules into a Duration
structure:
shifted = Date.shift(~D[2025-01-01], day: 1)
_shifted@1 = 'Elixir.Date':shift(
#{'__struct__' => 'Elixir.Date', calendar => 'Elixir.Calendar.ISO', year => 2025, month => 1, day => 1},
#{
'__struct__' => 'Elixir.Duration', day => 1, hour => 0, microsecond => {0, 0},
minute => 0, month => 0, second => 0, week => 0, year => 0
}
)
Since these optimizations rely solely on module and function names, we can fool around a bit by replacing a built-in module:
defmodule Version do
def parse(version) do
IO.puts(version)
version
end
end
defmodule M do
def run do
Version.parse("This will be printed at compiletime")
end
end
Elixir will complain about the replacement, but will execute the code nonetheless:
❯ elixir script.exs
warning: redefining module Version (current version loaded from /usr/lib/elixir/lib/elixir/ebin/Elixir.Version.beam)
│
1 │ defmodule Version do
│ ~~~~~~~~~~~~~~~~~~~~
│
└─ script.exs:1: Version (module)
This will be printed at compiletime
Bonus: "fixing" dot access
As promised, here's a small bonus. If you’ve already forgotten the context: the compiler is forced to generate additional code that dynamically interprets data.field
access due to syntax ambiguity. This extra code selects how to handle the expression based on the value at runtime:
def dot(map) when is_map(map) do
map.field
end
def dot(module) when is_atom(module) do
module.function()
end
dot(_map@1) when erlang:is_map(_map@1) ->
case _map@1 of
#{field := _@2} -> _@2;
_@2 ->
case elixir_erl_pass:no_parens_remote(_@2, field) of
{ok, _@1} -> _@1;
_ -> erlang:error({badkey, field, _@2})
end
end;
dot(_module@1) when erlang:is_atom(_module@1) ->
case _module@1 of
#{function := _@2} ->
elixir_erl_pass:parens_map_field(function, _@2);
_@2 -> _@2:function()
end.
First, let’s measure how this affects performance:
Mix.install([:benchee])
# Hide the literal behind runtime computation, otherwise BEAM will be able to "bake" the value when accessing with map_get
map = Enum.random([%{name: "John Doe", age: 30, email: "john.doe@example.com"}])
Benchee.run(
%{
dot: fn ->
name = map.name
age = map.age
email = map.email
{name, age, email}
end,
pattern_match: fn ->
%{name: name, age: age, email: email} = map
{name, age, email}
end,
fetch!: fn ->
name = Map.fetch!(map, :name)
age = Map.fetch!(map, :age)
email = Map.fetch!(map, :email)
{name, age, email}
end
},
measure_function_call_overhead: true
)
We’ll compare three approaches: dot access, pattern-matching all three fields at once, and Map.fetch!
(which, through collaboration between the Elixir and Erlang compilers, ultimately compiles down to :erlang.map_get
).
Intuitively, I’d expect pattern-matching to be fastest since in a dynamically typed environment it only needs to typecheck the value once.
But the results surprised me:
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
Number of Available Cores: 8
Available memory: 15.31 GB
Elixir 1.18.2
Erlang 27.2.1
JIT enabled: true
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 21 s
Measured function call overhead as: 19 ns
Benchmarking dot ...
Benchmarking fetch! ...
Benchmarking pattern_match ...
Calculating statistics...
Formatting results...
Name ips average deviation median 99th %
fetch! 18.55 M 53.90 ns ±56533.65% 15 ns 22 ns
pattern_match 18.47 M 54.15 ns ±59289.07% 13 ns 20 ns
dot 18.27 M 54.73 ns ±56682.74% 14 ns 18 ns
Comparison:
fetch! 18.55 M
pattern_match 18.47 M - 1.00x slower +0.25 ns
dot 18.27 M - 1.02x slower +0.82 ns
Microbenchmarks like this should always be taken with a grain of salt. The values we are measuring here are minuscule, and any fluctuation: hardware interrupts, kernel scheduler activity and lots of other things — can introduce massive distortions.
However, across multiple runs, the results remained consistent. fetch!
and pattern_match
stayed neck-and-neck, while dot
lagged slightly behind (by single-digit percentages). It seems the extra code barely impacts execution time.
Still, let’s imagine the compiler didn’t generate that extra code. Would it shrink the final binary?
To test this, we dive into the compiler internals. The dot operator, like all special forms, is expanded into Erlang Abstract Code in lib/elixir/src/elixir_erl_pass.erl.
Even without deep familiarity with the compiler’s internals or Erlang’s abstract code representation, it’s clear that this code handles the data.field
case (without parentheses):
TError = {tuple, Ann, [{atom, Ann, badkey}, TRight, TVar]},
{{'case', Generated, TLeft, [
{clause, Generated,
[{map, Ann, [{map_field_exact, Ann, TRight, TVar}]}],
[],
[TVar]},
{clause, Generated,
[TVar],
[],
[{'case', Generated, ?remote(Generated, elixir_erl_pass, no_parens_remote, [TVar, TRight]), [
{clause, Generated,
[{tuple, Generated, [{atom, Generated, ok}, TInnerVar]}], [], [TInnerVar]},
{clause, Generated,
[{var, Generated, '_'}], [], [?remote(Ann, erlang, error, [TError])]}
]}]}
]}, SV};
A separate branch exists to handle data.field()
, but the logic is similar.
Our goal is to eliminate branching. For data.field()
, we want a direct function call to field
function of data
module, and for data.field
, a direct :erlang.map_get(field, data)
. map_get
fits perfectly here as it also raises the same KeyError
for missing keys.
The following patch achieves exactly this:
diff --git a/lib/elixir/src/elixir_erl_pass.erl b/lib/elixir/src/elixir_erl_pass.erl
index f1c13ca24..7b3358011 100644
--- a/lib/elixir/src/elixir_erl_pass.erl
+++ b/lib/elixir/src/elixir_erl_pass.erl
@@ -237,40 +237,12 @@ translate({{'.', _, [Left, Right]}, Meta, []}, _Ann, S)
TRight = {atom, Ann, Right},
Generated = erl_anno:set_generated(true, Ann),
- {InnerVar, SI} = elixir_erl_var:build('_', SL),
- TInnerVar = {var, Generated, InnerVar},
- {Var, SV} = elixir_erl_var:build('_', SI),
- TVar = {var, Generated, Var},
case proplists:get_value(no_parens, Meta, false) of
true ->
- TError = {tuple, Ann, [{atom, Ann, badkey}, TRight, TVar]},
- {{'case', Generated, TLeft, [
- {clause, Generated,
- [{map, Ann, [{map_field_exact, Ann, TRight, TVar}]}],
- [],
- [TVar]},
- {clause, Generated,
- [TVar],
- [],
- [{'case', Generated, ?remote(Generated, elixir_erl_pass, no_parens_remote, [TVar, TRight]), [
- {clause, Generated,
- [{tuple, Generated, [{atom, Generated, ok}, TInnerVar]}], [], [TInnerVar]},
- {clause, Generated,
- [{var, Generated, '_'}], [], [?remote(Ann, erlang, error, [TError])]}
- ]}]}
- ]}, SV};
+ {{call, Generated, {remote, Generated, {atom, Ann, erlang}, {atom, Ann, map_get}}, [TRight, TLeft]}, SL};
false ->
- {{'case', Generated, TLeft, [
- {clause, Generated,
- [{map, Ann, [{map_field_exact, Ann, TRight, TVar}]}],
- [],
- [?remote(Generated, elixir_erl_pass, parens_map_field, [TRight, TVar])]},
- {clause, Generated,
- [TVar],
- [],
- [{call, Generated, {remote, Generated, TVar, TRight}, []}]}
- ]}, SV}
+ {{call, Generated, {remote, Generated, TLeft, TRight}, []}, SL}
end;
translate({{'.', _, [Left, Right]}, Meta, Args}, _Ann, S)
Apply it and assemble the compiler:
❯ git apply dot-to-maps-get.patch
❯ make compile
Run tests to verify nothing breaks:
❯ make test
Only one test fails, complaining about an exception message mismatch:
1) test blaming annotates undefined key error with nil hints (ExceptionTest)
test/elixir/exception_test.exs:678
Assertion with == failed
code: assert blame_message(nil, & &1.foo) ==
"key :foo not found in: nil\n\nIf you are using the dot syntax, " <>
"such as map.field, make sure the left-hand side of the dot is a map"
left: "expected a map, got: nil"
right: "key :foo not found in: nil\n\nIf you are using the dot syntax, such as map.field, make sure the left-hand side of the dot is a map"
stacktrace:
test/elixir/exception_test.exs:679: (test)
We can live with that. Everything else seems to be in order.
Add the patched compiler’s bin
directory to the front of PATH
:
❯ PATH="path_to_elixir/elixir/bin:$PATH"
and check that new compiler generates the expected code:
dot(_map@1) when erlang:is_map(_map@1) ->
erlang:map_get(field, _map@1);
dot(_module@1) when erlang:is_atom(_module@1) ->
_module@1:function().
Yes, it does.
Run the benchmarks:
...
Name ips average deviation median 99th %
pattern_match 19.09 M 52.37 ns ±60540.61% 12 ns 22 ns
dot 18.66 M 53.60 ns ±57166.61% 14 ns 26 ns
fetch! 18.08 M 55.30 ns ±56749.55% 14 ns 28 ns
Comparison:
pattern_match 19.09 M
dot 18.66 M - 1.02x slower +1.23 ns
fetch! 18.08 M - 1.06x slower +2.93 ns
No dramatic changes. The access methods remain similar in performance.
Finally, let’s measure the impact on compiled code size. We'll take the Phoenix v1.7.20 source code, compile it with both compiler versions and compare the total size of BEAM files:
❯ git clone git@github.com:phoenixframework/phoenix.git && cd phoenix
...
❯ mix deps.get
...
❯ mix compile && du -s _build/dev
...
5400
❯ mix clean --deps
❯ PATH="path_to_elixir/elixir/bin:$PATH" mix compile && du -s _build/dev
...
5336
The difference exists but is minimal — a ~1% size reduction.
Thus, we can confidently conclude that the extra code generated for dot access has negligible impact on both performance and code size 👌
Conclusion
This concludes our journey. As you can see, even when operating purely at the syntactic level (translating Elixir AST into Erlang AST), the compiler is capable of many fascinating optimizations and transformations.
There is sometimes a perception that using Elixir will result in performance loss compared to Erlang, since the latter is considered a "lower-level" language. While this is technically true, in practice these losses are either negligible or completely eliminated during compilation.
Another thought that has been on my mind for a long time and has only strengthened as I delved deeper into the compiler: the complexity of code in Erlang/Elixir remains roughly the same regardless of the complexity of the project it is part of. It’s always just functions working with data. The data could be plain structures, Ecto schemas in a web application, or even a syntax tree in a compiler — it is still just data being processed by ordinary functions. This is a rare quality. In my experience, the code of a language’s "internals" or its core frameworks is usually worlds away from what an average programmer writes in that language.
So when using Elixir, never be afraid to explore how things work under the hood. Don’t hesitate to dive into the code behind the documentation — the language makes it as easy as possible!
Top comments (0)