Allan MacGregor 🇨🇦 for AppSignal

Originally published at blog.appsignal.com

Benchmark Your Elixir App's Performance with Benchee

At some point, every software engineer needs to benchmark system performance and test the limits of what a given system can handle. This is a common problem in software engineering, and even more so in the kinds of high-throughput applications that Elixir is well suited for.

Finding bottlenecks early on in an application can save a lot of time, money, and effort in the long run, and give developers confidence in the upper limit of a system.

In this post, we will introduce a tool called Benchee to benchmark parts of an Elixir application. We will also show you how to integrate Benchee with your automated test suite.

By the end of the article, you'll understand Benchee's core functionality and be able to use it to measure your application's performance.

Let's get going!

Why Do I Need to Benchmark My Elixir Application?

Benchmarking, in a nutshell, is the process of measuring the performance of a system under specific loads or conditions. As part of the benchmarking process, you will be able to identify potential bottlenecks in your system and areas to improve.

For example, we can use benchmarking tools to answer questions like:

  • Can a system handle ten times the load of normal traffic?
  • Can the system run on a smaller infrastructure to handle the same load?
  • How long does it take to process 10, 100, 1,000, or 10,000 requests? Does the processing time scale linearly with the number of requests?

Answering these questions can help your team avoid costly mistakes and proactively identify areas for improvement, preventing downtime and unhappy users.

Benchmarking doesn't have to be expensive or time-consuming; it can be simple to get the right tools in place and make them part of an application's natural development life cycle.

What Is Benchee?

This is where Benchee comes in. Benchee is a tool that you can use to benchmark parts of an Elixir application. It is versatile and extensible, with more than a few plugins to enhance its functionality.

Prerequisites

Elixir Environment

To follow along, you will need a local Elixir installation. The easiest way to set one up is to follow the official Elixir installation instructions, which offer a few options:

  • Local installation on Linux, Windows, and macOS
  • Dockerized versions of Elixir
  • Version manager setups (such as asdf)

I recommend a local installation for the best results.
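
Once Elixir is installed, you can confirm the toolchain is available from your terminal. The output below is only indicative; the exact versions will differ on your machine:

elixir --version
# Erlang/OTP 24 [erts-12.x] [source] [64-bit] ...
# Elixir 1.13.3 (compiled with Erlang/OTP 24)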

Setting Up Our Elixir Application

For this article's purposes, we will set up a simple Elixir application that can calculate the Fibonacci sequence.

Start by creating a new application:

mix new fibonacci_benchmarking

As output, you will see the following:

* creating README.md
* creating .formatter.exs
* creating .gitignore
* creating mix.exs
* creating lib
* creating lib/fibonacci_benchmarking.ex
* creating test
* creating test/test_helper.exs
* creating test/fibonacci_benchmarking_test.exs

Your Mix project was created successfully.
You can use "mix" to compile it, test it, and more:

    cd fibonacci_benchmarking
    mix test

Run "mix help" for more commands.

Next, in your favorite editor, add the following code to the lib/fibonacci_benchmarking.ex file:

defmodule FibonacciBenchmarking do
  # Builds the list [fib(0), fib(1), ..., fib(number)].
  def list(number), do: Enum.map(0..number, &fibonacci/1)

  def fibonacci(0), do: 0
  def fibonacci(1), do: 1
  def fibonacci(n), do: fibonacci(0, 1, n - 2)

  # Tail-recursive helper carrying the previous two values.
  def fibonacci(_, prv, -1), do: prv

  def fibonacci(prvprv, prv, n) do
    next = prv + prvprv
    fibonacci(prv, next, n - 1)
  end
end

Note: The original code can be found on rosettacode.org.

Go to the fibonacci_benchmarking directory and run the following commands:

mix deps.get
iex -S mix

Once inside the Elixir shell, you can run:

iex(1)> FibonacciBenchmarking.list(10)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

If you see the above output, you have successfully set up your application, and are ready to proceed with Benchee.

Implement Benchmarking on an Elixir Application

First, we need to install Benchee. Start by adding the following to the deps in your mix.exs file (we include the :test environment because we will call Benchee from our ExUnit tests later on):

defp deps do
  [
    {:benchee, "~> 1.0", only: [:dev, :test]}
  ]
end

Next, run the following command:

mix deps.get

We can validate that Benchee is installed by starting the Elixir shell with iex -S mix and running the following snippet:

Benchee.run(%{
    "10_seq" => fn -> FibonacciBenchmarking.list(10) end
})

Benchee warms up for a couple of seconds before it starts measuring, and the whole run takes around seven seconds. On completion, you should see output like the following:

Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
Number of Available Cores: 8
Available memory: 62.76 GB
Elixir 1.13.3
Erlang 24.2.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 7 s

Benchmarking 10_seq ...

Name             ips        average  deviation         median         99th %
10_seq      973.04 K        1.03 μs  ±2735.15%        0.77 μs        1.60 μs

The code above makes a call to FibonacciBenchmarking.list(10), and Benchee measures the time it takes to execute the function.

Let's take a moment to understand the output of Benchee. By default, Benchee will output the following information:

  • ips stands for iterations per second. This number represents how many times a given function can be executed in a second. Higher is better.
  • average is the average time it takes to execute the function. Lower is better.
  • deviation is the standard deviation of the measurements, expressed as a percentage of the average. It tells you how much individual runs vary.
  • median is the middle value of all measurements, which is often more representative than the average when there are outliers.
  • 99th % is the value that 99% of all measurements fall below.
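
These numbers are also available programmatically: Benchee.run/2 returns a %Benchee.Suite{} struct whose scenarios carry the collected statistics, and the warmup and measurement times can be shortened via options. Here is a minimal sketch (field names follow Benchee 1.x; 10_seq is just the scenario name from the example above):

# Shorter run: 1 second of warmup, 2 seconds of measurements (values are in seconds).
suite =
  Benchee.run(
    %{"10_seq" => fn -> FibonacciBenchmarking.list(10) end},
    warmup: 1,
    time: 2
  )

# Each scenario holds its run time statistics; times are reported in nanoseconds.
scenario = hd(suite.scenarios)
IO.inspect(scenario.run_time_data.statistics.average, label: "average (ns)")
IO.inspect(scenario.run_time_data.statistics.ips, label: "iterations per second")

We will rely on exactly this shape of data below, when we write assertions against the benchmark results in our test suite.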

While running Benchee in this fashion can be useful for ad-hoc benchmarks, a much better method is to include Benchee as part of our unit tests.

Automate Benchee for Elixir and Run Tests

By default, all Elixir and Phoenix applications have a test directory and use ExUnit to run tests. Our goal is to get Benchee running as part of our test suite and test a different implementation of the Fibonacci sequence.

Start by creating a new file called test/benchee_unit_test.exs, and copy the following code into it:

defmodule BencheeUnitTest do
  use ExUnit.Case

  @tag :benchmark
  test "benchmark fibonacci list generation" do
    # capture benchee output to run assertions
    output = Benchee.run(%{
      "case_10_numbers" => fn() ->
        FibonacciBenchmarking.list(10)
      end
    })

    results = Enum.at(output.scenarios, 0)
    assert results.run_time_data.statistics.average <= 50_000_000
  end
end

Go ahead and run mix test on the console. Validate that the output looks like the following:

Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
Number of Available Cores: 8
Available memory: 62.76 GB
Elixir 1.13.3
Erlang 24.2.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 7 s

Benchmarking case_10_numbers ...

Name                      ips        average  deviation         median         99th %
case_10_numbers        1.31 M      765.77 ns  ±3552.51%         599 ns        1165 ns
.

Finished in 10.7 seconds (0.00s async, 10.7s sync)
1 test, 0 failures

Randomized with seed 612867

So far, we have integrated Benchee into our test suite and added a first benchmark scenario. Let's add a second scenario to compare against. Update the test to:

defmodule BencheeUnitTest do
  use ExUnit.Case

  @tag :benchmark
  test "benchmark fibonacci list generation" do
    # capture benchee output to run assertions
    output = Benchee.run(%{
      "case_10_numbers" => fn() ->
        FibonacciBenchmarking.list(10)
      end,
      "case_1000_numbers" => fn() ->
        FibonacciBenchmarking.list(1000)
      end
    })

    results = Enum.at(output.scenarios, 0)
    assert results.run_time_data.statistics.average <= 50_000_000
  end
end

Just like we did before, we can run the test suite with mix test, and validate that the output looks like the following:

Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
Number of Available Cores: 8
Available memory: 62.76 GB
Elixir 1.13.3
Erlang 24.2.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 14 s

Benchmarking case_1000_numbers ...
Benchmarking case_10_numbers ...

Name                        ips        average  deviation         median         99th %
case_10_numbers          1.33 M     0.00075 ms  ±3419.36%     0.00059 ms     0.00109 ms
case_1000_numbers     0.00010 M       10.21 ms    ±11.54%        9.83 ms       15.29 ms

Comparison:
case_10_numbers          1.33 M
case_1000_numbers     0.00010 M - 13598.03x slower +10.21 ms
.

Our second scenario measures the performance of generating the Fibonacci sequence for 1,000 numbers; however, duplicating scenarios like this is not a very practical way to test multiple inputs. Instead, we can use Benchee's inputs option to provide a set of inputs that every scenario runs against.

Go ahead and open the test/benchee_unit_test.exs file and replace the contents with this code:

defmodule BencheeUnitTest do
  use ExUnit.Case

  @tag :benchmark
  test "benchmark fibonacci list generation" do
    # capture benchee output to run assertions
    output = Benchee.run(%{
      "generate_list" => fn(input) ->
        FibonacciBenchmarking.list(input)
      end
    },
    inputs: %{
      "case_10" => 10,
      "case_100" => 100,
      "case_1000" => 1000,
      "case_10000" => 10000,
      "case_100000" => 100000
    })

    results = Enum.at(output.scenarios, 0)
    assert results.run_time_data.statistics.average <= 50_000_000
  end
end

In this new version of the code, we have generalized our generate_list scenario to accept an input and supplied a map of inputs, and we can now run the test suite with mix test.

However, because of the size of our last input, you will hit a timeout error like this:

 mix test

Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
Number of Available Cores: 8
Available memory: 62.76 GB
Elixir 1.13.3
Erlang 24.2.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: case_10, case_100, case_1000, case_10000, case_100000
Estimated total run time: 35 s

Benchmarking generate_list with input case_10 ...
Benchmarking generate_list with input case_100 ...
Benchmarking generate_list with input case_1000 ...
Benchmarking generate_list with input case_10000 ...
Benchmarking generate_list with input case_100000 ...


  1) test benchmark fibonacci list generation (BencheeUnitTest)
     test/benchee_unit_test.exs:6
     ** (ExUnit.TimeoutError) test timed out after 60000ms. You can change the timeout:

       1. per test by setting "@tag timeout: x" (accepts :infinity)
       2. per test module by setting "@moduletag timeout: x" (accepts :infinity)
       3. globally via "ExUnit.start(timeout: x)" configuration
       4. by running "mix test --timeout x" which sets timeout
       5. or by running "mix test --trace" which sets timeout to infinity
          (useful when using IEx.pry/0)

     where "x" is the timeout given as integer in milliseconds (defaults to 60_000).

     code: output = Benchee.run(%{
     stacktrace:
       (elixir 1.13.3) lib/task.ex:794: Task.await/2
       (elixir 1.13.3) lib/enum.ex:1593: Enum."-map/2-lists^map/1-0-"/2
       (benchee 1.1.0) lib/benchee/benchmark/runner.ex:77: Benchee.Benchmark.Runner.parallel_benchmark/2
       (elixir 1.13.3) lib/enum.ex:1593: Enum."-map/2-lists^map/1-0-"/2
       (elixir 1.13.3) lib/enum.ex:1593: Enum."-map/2-lists^map/1-0-"/2
       (benchee 1.1.0) lib/benchee/benchmark.ex:103: Benchee.Benchmark.collect/3
       (benchee 1.1.0) lib/benchee.ex:48: Benchee.run/2
       test/benchee_unit_test.exs:8: (test)
       (ex_unit 1.13.3) lib/ex_unit/runner.ex:500: ExUnit.Runner.exec_test/1
       (stdlib 3.17) timer.erl:166: :timer.tc/1
       (ex_unit 1.13.3) lib/ex_unit/runner.ex:451: anonymous fn/4 in ExUnit.Runner.spawn_test_monitor/4



Finished in 60.0 seconds (0.00s async, 60.0s sync)
1 test, 1 failure

Randomized with seed 567196

As it happens, we hit a timeout error after 60 seconds. Fortunately, the stack trace includes several suggestions on how to solve this problem. For now, we will set the test's timeout to :infinity. Update the test suite with this code:

defmodule BencheeUnitTest do
  use ExUnit.Case

  @tag :benchmark
  @tag timeout: :infinity
  test "benchmark fibonacci list generation" do
    # capture benchee output to run assertions
    output = Benchee.run(%{
      "generate_list" => fn(input) ->
        FibonacciBenchmarking.list(input)
      end
    },
    inputs: %{
      "case_10" => 10,
      "case_100" => 100,
      "case_1000" => 1000,
      "case_10000" => 10000,
      "case_100000" => 100000
    })

    results = Enum.at(output.scenarios, 0)
    assert results.run_time_data.statistics.average <= 50_000_000
  end
end

Note: Depending on your system, running that last scenario will take a while; feel free to remove it to continue with the tutorial.

Now that we have a baseline of our Fibonacci sequence generator's performance, a common and useful exercise is to compare the performance of different implementations of the same algorithm. In this case, we will add an alternative implementation based on Stream.unfold/2.

Start by updating the lib/fibonacci_benchmarking.ex file with the following code:

defmodule FibonacciBenchmarking do
  def list(number), do: Enum.map(0..number, &fibonacci/1)

  # Stream-based alternative: lazily unfolds the sequence, reusing the previous
  # two values; takes number + 1 elements so it matches list/1 (fib(0)..fib(number)).
  def list_alternate(number) do
    {0, 1}
    |> Stream.unfold(fn {a, b} -> {a, {b, a + b}} end)
    |> Enum.take(number + 1)
  end

  def fibonacci(0), do: 0
  def fibonacci(1), do: 1
  def fibonacci(n), do: fibonacci(0, 1, n - 2)

  def fibonacci(_, prv, -1), do: prv

  def fibonacci(prvprv, prv, n) do
    next = prv + prvprv
    fibonacci(prv, next, n - 1)
  end
end

Following that, we will update the test/benchee_unit_test.exs file to account for both implementations:

defmodule BencheeUnitTest do
  use ExUnit.Case

  @tag :benchmark
  @tag timeout: :infinity
  test "benchmark fibonacci list generation" do
    # capture benchee output to run assertions
    output = Benchee.run(%{
      "generate_list_enum" => fn(input) ->
        FibonacciBenchmarking.list(input)
      end,
      "generate_list_stream" => fn(input) ->
        FibonacciBenchmarking.list_alternate(input)
      end
    },
    inputs: %{
      "case_10" => 10,
      "case_100" => 100,
      "case_1000" => 1000,
      "case_10000" => 10000,
    })

    results = Enum.at(output.scenarios, 0)
    assert results.run_time_data.statistics.average <= 50_000_000
  end
end

The updated test case runs both scenarios side by side for each of the prescribed inputs, letting us compare their overall performance. Go ahead and run mix test to see the results.

Compiling 1 file (.ex)
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
Number of Available Cores: 8
Available memory: 62.76 GB
Elixir 1.13.3
Erlang 24.2.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: case_10, case_100, case_1000, case_10000
Estimated total run time: 56 s

Benchmarking generate_list_enum with input case_10 ...
Benchmarking generate_list_enum with input case_100 ...
Benchmarking generate_list_enum with input case_1000 ...
Benchmarking generate_list_enum with input case_10000 ...
Benchmarking generate_list_stream with input case_10 ...
Benchmarking generate_list_stream with input case_100 ...
Benchmarking generate_list_stream with input case_1000 ...
Benchmarking generate_list_stream with input case_10000 ...

##### With input case_10 #####
Name                           ips        average  deviation         median         99th %
generate_list_stream        1.66 M      601.29 ns  ±3858.34%         441 ns         955 ns
generate_list_enum          1.35 M      742.83 ns  ±3201.26%         610 ns        1164 ns

Comparison:
generate_list_stream        1.66 M
generate_list_enum          1.35 M - 1.24x slower +141.55 ns

##### With input case_100 #####
Name                           ips        average  deviation         median         99th %
generate_list_stream      258.49 K        3.87 μs   ±512.17%        3.29 μs        8.04 μs
generate_list_enum         31.71 K       31.54 μs    ±26.84%       30.57 μs       41.17 μs

Comparison:
generate_list_stream      258.49 K
generate_list_enum         31.71 K - 8.15x slower +27.67 μs

##### With input case_1000 #####
Name                           ips        average  deviation         median         99th %
generate_list_stream       17.67 K      0.0566 ms    ±14.14%      0.0550 ms      0.0988 ms
generate_list_enum         0.102 K        9.84 ms     ±9.92%        9.60 ms       14.14 ms

Comparison:
generate_list_stream       17.67 K
generate_list_enum         0.102 K - 173.90x slower +9.78 ms

##### With input case_10000 #####
Name                           ips        average  deviation         median         99th %
generate_list_stream        501.16      0.00200 s    ±33.05%      0.00190 s      0.00370 s
generate_list_enum            0.27         3.75 s    ±12.05%         3.75 s         4.06 s

Comparison:
generate_list_stream        501.16
generate_list_enum            0.27 - 1876.93x slower +3.74 s
.

Finished in 65.3 seconds (0.00s async, 65.3s sync)
1 test, 0 failures

Randomized with seed 839542

As you can see, the Enum-based implementation of our list function is much slower than the Stream-based one, especially as the input grows. That is because the Enum version calls fibonacci/1 from scratch for every element of the range, so the total work grows quadratically with the list length, while the Stream version carries the previous two values forward and only performs one addition per element. Comparing the performance of two implementations like this helps you understand the trade-offs and develop more performant applications.

When adding benchmarking tests to your automated test suite, consider the potential drawbacks, such as the increased time it takes to run the tests. In this case, the benchmarking tests are tagged with :benchmark, so they can be excluded from the default test run and executed only when we need them, as shown in the sketch below.
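
Here is a minimal sketch of that exclusion, using the :benchmark tag from our test. First, exclude the tag by default in test/test_helper.exs:

# Skip :benchmark-tagged tests unless they are explicitly included.
ExUnit.start(exclude: [:benchmark])

Then opt in from the command line whenever you actually want the benchmarks to run:

# Regular run: benchmark tests are skipped.
mix test

# Run only the benchmark tests ...
mix test --only benchmark

# ... or run everything, including the benchmarks.
mix test --include benchmark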

An even better approach is to take advantage of CI/CD integration, such as GitHub Actions, and run the benchmarking tests as part of the pull request validation process. This way, the pipeline produces the benchmark results for every change, without anyone having to run the tests locally.

Improving Benchee Reporting

Now, while seeing results on the console can be useful for a quick glance, the console is not the most convenient way to share results with your team. Benchee provides a number of different ways to export your results to a file.

For this example, we will use benchee_html to generate an HTML report with our benchmarking test results. To do this, we will add the benchee_html dependency to our mix.exs file:

defp deps do
  [
    ...
    {:benchee_html, "~> 1.0", only: [:dev, :test]}
  ]
end

Next, we will update the test/benchee_unit_test.exs file to generate the HTML report:

defmodule BencheeUnitTest do
  use ExUnit.Case

  @tag :benchmark
  @tag timeout: :infinity
  test "benchmark fibonacci list generation" do
    # capture benchee output to run assertions
    output = Benchee.run(%{
      "generate_list_enum" => fn(input) ->
        FibonacciBenchmarking.list(input)
      end,
      "generate_list_stream" => fn(input) ->
        FibonacciBenchmarking.list_alternate(input)
      end
    },
    inputs: %{
      "case_10" => 10,
      "case_100" => 100,
      "case_1000" => 1000,
      "case_10000" => 10000,
    },
    formatters: [
      Benchee.Formatters.HTML,
      Benchee.Formatters.Console
    ])

    results = Enum.at(output.scenarios, 0)
    assert results.run_time_data.statistics.average <= 50_000_000
  end
end

Let's go ahead and run the tests again:

mix test

On completion, you should see the following report open in your browser:

Benchee HTML Report

The HTML report provides a much more detailed view of the benchmarking results and allows us to share results with our team easily. For example:

Run Time Comparison

In addition to the HTML report, Benchee has formatter packages for other outputs, such as JSON, CSV, and Markdown. Exporting results to a file is a great way to feed them into automation, such as CI/CD pipelines.
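
As a rough sketch of the JSON export (the version constraint below is an assumption, so check Hex for the current benchee_json release), add the formatter package to your deps:

{:benchee_json, "~> 1.0", only: [:dev, :test]}

Then pass the formatter as a {module, options} tuple pointing at an output file:

Benchee.run(
  %{"generate_list_stream" => fn -> FibonacciBenchmarking.list_alternate(1000) end},
  formatters: [
    {Benchee.Formatters.JSON, file: "benchmarks/output/results.json"},
    Benchee.Formatters.Console
  ]
)

The resulting JSON file can then be archived as a CI artifact or compared between runs.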

Monitoring Your Elixir App in Production

Benchee can help you discover potential performance bottlenecks, but what about how fast things really are in your production app?

To be able to discover new and existing bottlenecks, and solve bugs and other issues your users may face, you need to use an APM. AppSignal has been supporting Elixir developers for years and seamlessly integrates with your app. Bonus: We're the only APM that ships stroopwafels to new users 😎

Wrapping Up and Next Steps

In this tutorial, we discovered how to benchmark Elixir applications with the Benchee library.

We also learned how to compare the performance of different implementations of the same algorithm.

Yet we have only scratched the surface of Benchee's capabilities. As a next step, I highly encourage you to explore the available Benchee configuration options and visualization plugins.

Happy coding!

P.S. If you'd like to read Elixir Alchemy posts as soon as they get off the press, subscribe to our Elixir Alchemy newsletter and never miss a single post!
