Elixir uses recursion for looping over data structures like lists. Although most of it's hidden away through higher-level abstractions on the Enum
class, in some cases writing a recursive function yourself can be more performant. In this episode of Elixir Alchemy, we’ll try to find some of these cases.
Along the way, we’ll learn about body-recursive and tail-recursive functions. Although the latter is often touted as being faster, we’ll also look at cases where that’s not necessarily true. Let’s dive right in!
Enumeration
Elixir provides functions on the Enum
module to enumerate over collections.
def sum_numbers1(list) do
list
|> Enum.filter(&is_number/1)
|> Enum.reduce(0, fn n, sum -> n + sum end)
end
This example takes a list and returns the sum of all numbers in that list. Because there might be non-numeric items in the input list, it uses Enum.filter/2
to select only the items that is_number/1
returns true on. After that, the remaining values are added together through Enum.reduce/3
.
sum_numbers1([1, 2, "three", 4, %{}]) # => 7
While this solution works, it iterates over the whole list twice to get to the result. To make this more performant, we might instead choose to write a recursive function that can do this in one go.
Recursion
A recursive function can do both tasks (removing the non-numeric items and adding the rest together) in one run. It will call itself for every item in the list, and use pattern matching to decide what to do with each item.
# 1
def sum_numbers2([head | tail]) when is_number(head) do
sum_numbers2(tail) + head
end
# 2
def sum_numbers2([_head | tail]) do
sum_numbers2(tail)
end
# 3
def sum_numbers2([]) do
0
end
Instead of using the functions provided by the Enum
module, this solution calls itself until the list is empty. It does so using three function heads:
- When the first item in the list (the head) is a number, we'll call
sum_numbers2/1
with the rest of list (anything but the head, called the tail). This call returns a number we'll add our current head to. - When the head is not a number, we'll drop it by calling the
sum_numbers2/1
function with just the tail. - When the list is empty, we'll return 0.
sum_numbers2([1, 2, "three", 4, %{}]) # => 7
When calling this function with [1, 2, "three", 4, %{}]
, the sum_numbers2/1
function is called repeatedly in the following order to get to the result:
sum_numbers2([1, 2, "three", 4, %{}])
sum_numbers2([2, "three", 4, %{}]) + 1
sum_numbers2(["three", 4, %{}]) + 1 + 2
sum_numbers2([4, %{}]) + 1 + 2
sum_numbers2([%{}]) + 1 + 2 + 4
sum_numbers2([]) + 1 + 2 + 4
0 + 1 + 2 + 4
This list shows the how the function is simplified until we get to a result, which shows the function being called six times. Once for each item, plus once more for the empty list at the end.
Because the function calls itself for each iteration, each item in the list causes a stack frame to be added to the call stack. That’s because each item warrants a call to a function, which needs to be executed to claim its result. Adding a stack frame to the call stack requires some memory to be allocated, and can cause some overhead.
Tail recursion and tail-call optimization
To keep the memory footprint to a minimum, some languages—like Erlang and thus Elixir—implement tail-call optimization. With a small rewrite of our code, we can prevent the stack frame being added and that memory allocated.
# 1
def sum_numbers3(list) do
do_sum_numbers3(list, 0)
end
# 2
defp do_sum_numbers3([head | tail], sum) when is_number(head) do
do_sum_numbers3(tail, sum + head)
end
# 3
defp do_sum_numbers3([_head | tail], sum) do
do_sum_numbers3(tail, sum)
end
# 4
defp do_sum_numbers3([], sum) do
sum
end
This example is yet another implementation of the function from before. However, this example is tail-recursive, meaning it doesn’t need to await a call to itself before continuing. It does so by keeping an accumulator (sum
) that keeps the current value instead of stacking function calls.
- The
sum_numbers3/1
function is the public function, which calls the privatedo_sum_numbers3/2
function with the passed list and an accumulator of0
to start with. - Much like the previous example,
do_sum_numbers3/2
's first function head matches lists with a numeric head. It calls itself with the list's tail, and the head added to the accumulator. - When the head is not a number, the third function head drops it by calling itself with the tail and the current accumulator, without changing anything.
- Finally, the exit clause returns the accumulator.
By calling itself at the end of the function and passing the accumulator with all calculated values, the Erlang VM can execute a trick called tail-call optimization, which allows it to circumvent adding new stack frames. Instead, it uses the current frame by jumping back up to the beginning of the function.
Tail call optimization allows functions calling themselves without running into stack problems.
def sleep, do: sleep
This example implements a function that continually calls itself. Without tail-call optimization, this function would produce a stack error.
Which is faster?
We've written three implementations of the same function. The first used Enum.filter/2
and Enum.reduce/3
to filter and sum the list, and iterated over the list twice. To fix that, we used a recursive function that did it in one go, which we further improved using tail-call optimization.
letters = Enum.map(?a..?z, fn x -> <<x::utf8>> end)
numbers = Enum.to_list(0..10_000)
list = Enum.shuffle(letters ++ numbers)
Benchee.run(
%{
"Enum.filter/2 and Enum.reduce/3" => fn -> Recursion.sum_numbers1(list) end,
"Body-recursive" => fn -> Recursion.sum_numbers2(list) end,
"Tail-recursive" => fn -> Recursion.sum_numbers3(list) end
},
time: 10,
memory_time: 2
)
To benchmark our solutions, we'll use Benchee. We'll prepare a shuffled list of strings and numbers for each function to sum for ten seconds.
Name ips average deviation median 99th %
Tail-recursive 107.35 K 9.32 μs ±21.39% 9 μs 14 μs
Body-recursive 55.65 K 17.97 μs ±662.77% 14 μs 34 μs
Enum.filter/2 and Enum.reduce/3 20.86 K 47.94 μs ±51.46% 46 μs 85 μs
Comparison:
Tail-recursive 107.35 K
Body-recursive 55.65 K - 1.93x slower
Enum.filter/2 and Enum.reduce/3 20.86 K - 5.15x slower
Memory usage statistics:
Name Memory usage
Tail-recursive 0 B
Body-recursive 0 B
Enum.filter/2 and Enum.reduce/3 16064 B
In this test on this machine, we see that the body-recursive version is almost three times as fast as the original implementation that used Enum.filter/2
and Enum.reduce/3
. The tail-recursive version was almost twice as fast ad the body-recursive one.
Is tail-call recursion always faster?
While the results of that benchmark look quite convincing, tail-recursion isn't always faster than body recursion. In fact, that's one of the 7 myths of Erlang performance.
While tail-recursive calls are usually faster for list reductions—like the example we’ve seen before—body-recursive functions can be faster in some situations. That's often true when mapping over a list, where the function takes a list and returns a list without reducing it to a single value.
def double_numbers1(list) do
list
|> Enum.filter(&is_number/1)
|> Enum.map(fn n -> n * 2 end)
end
def double_numbers2([]) do
[]
end
def double_numbers2([head | tail]) when is_number(head) do
[head * 2 | double_numbers2(tail)]
end
def double_numbers2([_head | tail]) do
double_numbers2(tail)
end
def double_numbers3(list) do
list
|> do_double_numbers3([])
|> Enum.reverse()
end
defp do_double_numbers3([], acc) do
acc
end
defp do_double_numbers3([head | tail], acc) when is_number(head) do
do_double_numbers3(tail, [head * 2 | acc])
end
defp do_double_numbers3([_head | tail], acc) do
do_double_numbers3(tail, acc)
end
This example is a lot like the example we've seen before, but instead of adding all numbers together and reducing them to a single number, the double_numbers/1
function doubles all numbers and returns them in a list.
Like before, we have three implementations. The first uses Enum.filter/2
and Enum.map/2
to iterate over the list twice, the second is body-recursive and the last is tail-recursive. Before returning, the tail-recursive function needs to reverse the list, because the values get prepended to the accumulator.
Note: Check out the example project for a comparison of more possible implementations, like list comprehensions and a tail-recursive version that doesn't reverse the list.
Name ips average deviation median 99th %
body-recursive 36.86 K 27.13 μs ±39.82% 26 μs 47 μs
tail-recursive 27.46 K 36.42 μs ±1176.74% 27 μs 80 μs
Enum.filter/2 and Enum.map/2 12.62 K 79.25 μs ±194.81% 62 μs 186 μs
Comparison:
body-recursive 36.86 K
tail-recursive 27.46 K - 1.34x slower
Enum.filter/2 and Enum.map/2 12.62 K - 2.92x slower
Memory usage statistics:
Name Memory usage
body-recursive 15.64 KB
tail-recursive 20.09 KB - 1.28x memory usage
Enum.filter/2 and Enum.map/2 31.33 KB - 2.00x memory usage
When running a benchmark for each of these implementations, we can see that the body-recursive version is fastest in this case. Having to reverse the list in the tail-recursive version negates the improved performance tail-call optimization adds in this specific case.
As always, it depends
This concludes our look into recursion in Elixir. As a rule of thumb; tail-recursive functions are faster if they don't need to reverse the result before returning it. That's because that requires another iteration over the whole list. Tail-recursive functions are usually faster at reducing lists, like our first example.
Which is the best approach depends on the situation. For mission-critical loops, writing a benchmark to measure which solution is faster is usually your best bet, as it's not immediately obvious which will perform better.
To learn more about recursion, also be sure to read this overview of the available methods, and when to use which. If you liked this article, don't forget to subscribe to the mailing list if you'd like to read more Elixir Alchemy!
Top comments (0)