
Allan MacGregor 🇨🇦

Originally published at allanmacgregor.com

Circuit Breaker Pattern in Elixir

A circuit breaker detects failures and encapsulates the logic that prevents a failure from constantly recurring during maintenance, temporary external system outages, or unexpected system difficulties.

In the age of microservices, we are more than likely to have services that call and depend on external services outside of our control.

Remote services can hang, fail, or become unresponsive. How can we prevent those failures from cascading through the system and tying up critical resources?

Enter the Circuit Breaker pattern, popularized by Michael Nygard in the book Release It! and by thought leaders like Martin Fowler.




Circuit Breaker Pattern

The idea behind this pattern is very simple: failures are inevitable, and trying to prevent them altogether is not realistic.

One way to handle these failures is to wrap the risky operations in some kind of proxy. This proxy is responsible for monitoring recent failures and using that information to decide whether to allow the operation to proceed or to return an early failure instead.

This proxy is typically implemented as a state machine that mimics the functionality of a physical circuit breaker, which may have 3 states:

  • Closed: In this state the circuit breaker lets all requests go through while keeping track of the number of recent failures. If the number of failures exceeds a specific threshold within a specific timeframe, it switches to the Open state.
  • Open: In this state, requests are not sent to the external service. Instead, we either fail immediately by returning an exception or fall back to a secondary system like a cache.
  • Half-Open: In this state, a limited number of requests from the application are allowed to pass through and call our external service. Depending on the result of these requests, the circuit breaker either flips to the Closed state or goes back to the Open state, resetting the counter before trying again.
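
The three states and their transitions can be sketched as a pure function, before any process or timer is involved. This is only an illustrative model: the module name BreakerSketch, the next/3 function, the event atoms, and the failure limit of 3 are all assumptions made for this sketch and are not part of the example built later in the article.

```elixir
defmodule BreakerSketch do
  # Hypothetical pure model of the breaker; not the article's Switch module.

  # Assumed threshold, chosen only for illustration.
  @failure_limit 3

  # Closed: count failures and trip to :open once the limit is reached.
  def next(:closed, :failure, failures) when failures + 1 >= @failure_limit,
    do: {:open, 0}

  def next(:closed, :failure, failures), do: {:closed, failures + 1}
  def next(:closed, :success, _failures), do: {:closed, 0}

  # Open: calls are rejected; only the timeout moves the breaker on.
  def next(:open, :timeout, _failures), do: {:half_open, 0}
  def next(:open, _event, failures), do: {:open, failures}

  # Half-open: a trial call decides whether to close or reopen.
  def next(:half_open, :success, _failures), do: {:closed, 0}
  def next(:half_open, :failure, _failures), do: {:open, 0}
end

# Three failures trip the breaker, the timeout half-opens it,
# and a successful trial call closes it again.
events = [:failure, :failure, :failure, :timeout, :success]

final =
  Enum.reduce(events, {:closed, 0}, fn event, {state, failures} ->
    BreakerSketch.next(state, event, failures)
  end)

IO.inspect(final)
```

Keeping the transition logic pure like this makes it easy to reason about and test separately from the process that eventually owns the state.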

The Circuit Breaker pattern offers a few key advantages worth noting:

  • The Half-Open state gives the external system a chance to recover without getting flooded.
  • The Open state gives us options for how to handle failure, whether by failing right away or by falling back to a caching layer or secondary system.
  • The pattern can also help maintain response times by quickly rejecting calls that are likely to fail or time out.

Example

For our example, let’s imagine that we have the following scenario:

We are running a job board aggregator that consumes job postings from Github and other sources. However, since we are consuming several APIs, we run the risk of any one of them hitting rate limits or being down.

Let’s start by creating an example API connector to Github Jobs that retrieves the latest 50 jobs posted:

defmodule CircuitBreaker.Api.GithubJobs do
  ...
  # Fetches the latest postings; returns {:ok, jobs} or {:error, reason}.
  @spec get_positions() :: {:ok, list()} | {:error, term()}
  def get_positions do
    case HTTPoison.get(url()) do
      {:ok, response} -> {:ok, parse_fields(response.body)}
      {:error, %HTTPoison.Error{reason: reason}} -> {:error, reason}
    end
  end
  ...
end

file : lib/circuit_breaker/api/github_jobs.ex

All this connector does is make a request to jobs.github.com, retrieve the JSON, parse it, and return the list of jobs. If we want to test this, we can manually call get_positions in our console:

iex(1)> CircuitBreaker.Api.GithubJobs.get_positions
{:ok,
 ["Software Engineer", "Backend Engineer (w/m/d)",
  "Senior Frontend Engineer (f/m/d)", ...]}

Circuit Breaker Switch

Now that we have the ability to make calls to get the job postings, we need to build our circuit breaker to wrap around the Api adapter. Let’s take a look at a skeleton for our switch.

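defmodule CircuitBreaker.Api.Switch do
  use GenStateMachine, callback_mode: :state_functions

  @name :circuit_breaker_switch
  # Number of failures tolerated before the circuit opens
  @error_count_limit 8
  # Milliseconds to stay open before moving to half-open
  @time_to_half_open_delay 8000

  # Starts the breaker in the :closed state with a zeroed error counter
  def start_link do
    GenStateMachine.start_link(__MODULE__, {:closed, %{error_count: 0}}, name: @name)
  end

  # Public client API; handled by the function named after the current state
  def get_positions do
    GenStateMachine.call(@name, :get_positions)
  end
  ...
end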

file : lib/circuit_breaker/api/switch.ex

To implement our circuit breaker we could use the gen_statem behavior directly or, as in this case, leverage the GenStateMachine package, which gives us tracing and error reporting and fits into a supervision tree.

The first two functions we added are:

  • start_link: will start the circuit breaker with an initial state and a specific name.
  • get_positions: this is our public client api that wraps around the Github Jobs adapter we just built.

An important thing to note here is the first line:

  use GenStateMachine, callback_mode: :state_functions

In this callback mode, every time you do a call/3 or a cast/2, the message is handled by the state_name/3 function that has the same name as the current state. In this case our state functions will be open, closed, and half_open.
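
To see the state-function dispatch in isolation, here is a minimal sketch that uses OTP's :gen_statem directly so it runs without any dependencies. The LightSwitch module, its flip/1 function, and the :on and :off states are hypothetical and exist only for this illustration; with the GenStateMachine package the shape of the state functions is the same.

```elixir
defmodule LightSwitch do
  # Hypothetical example built on OTP's :gen_statem, not on GenStateMachine.
  @behaviour :gen_statem

  def start_link, do: :gen_statem.start_link(__MODULE__, :ok, [])
  def flip(pid), do: :gen_statem.call(pid, :flip)

  @impl true
  def callback_mode, do: :state_functions

  # Start in the :off state; the data is a simple flip counter.
  @impl true
  def init(:ok), do: {:ok, :off, 0}

  # Invoked only while the current state is :off.
  def off({:call, from}, :flip, flips),
    do: {:next_state, :on, flips + 1, [{:reply, from, :on}]}

  # Invoked only while the current state is :on.
  def on({:call, from}, :flip, flips),
    do: {:next_state, :off, flips + 1, [{:reply, from, :off}]}
end

{:ok, pid} = LightSwitch.start_link()
first = LightSwitch.flip(pid)
second = LightSwitch.flip(pid)
IO.inspect({first, second})
```

The same :flip call is answered by a different function each time, purely because the current state changed between calls.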

Let’s go ahead and start by adding our closed state code:

  def closed({:call, from}, :get_positions, data) do
    case CircuitBreaker.Api.GithubJobs.get_positions() do
      {:ok, positions} ->
        {:keep_state, %{error_count: 0}, {:reply, from, {:ok, positions}}}
      {:error, reason} ->
        handle_error(reason, from, %{ data | error_count: data.error_count + 1 })
    end
  end

file : lib/circuit_breaker/api/switch.ex

All we are doing is calling the Api adapter get_positions and depending on the results we are either returning the positions list or handling the error.

Let’s go ahead and jump into the terminal and try to get the list of positions through our circuit breaker:

iex(1)> CircuitBreaker.Api.Switch.start_link
{:ok, #PID<0.231.0>}
iex(2)> CircuitBreaker.Api.Switch.get_positions
{:ok,
 ["Software Engineer", "Backend Engineer (w/m/d)",
  "Senior Frontend Engineer (f/m/d)", ...]}

Let’s add the functions for the other two states and review how the circuit state change works:

  def half_open({:call, from}, :get_positions, data) do
    case CircuitBreaker.Api.GithubJobs.get_positions() do
      {:ok, positions} ->
        {:next_state, :closed, %{error_count: 0}, {:reply, from, {:ok, positions}}}
      {:error, reason} ->
        open_circuit(from, data, reason, @time_to_half_open_delay)
    end
  end

  def open({:call, from}, :get_positions, data) do
    {:keep_state, data, {:reply, from, {:error, :circuit_open}}}
  end

  def open(:info, :to_half_open, data) do
    {:next_state, :half_open, data}
  end

And let’s add a couple of private utility functions:

  defp handle_error(reason, from, %{error_count: error_count} = data)
       when error_count > @error_count_limit do
    open_circuit(from, data, reason, @time_to_half_open_delay)
  end

  defp handle_error(reason, from, data) do
    {:keep_state, data, {:reply, from, {:error, reason}}}
  end

  defp open_circuit(from, data, reason, delay) do
    Process.send_after(@name, :to_half_open, delay)
    {:next_state, :open, data, {:reply, from, {:error, reason}}}
  end

Most of the magic happens in the open_circuit function, which does two things:

  • First, we schedule a message to set our circuit breaker state to half_open after our specified delay
  • Second, we return a new state setting the circuit breaker fully open

After 8000 milliseconds, the circuit breaker, now in the open state, receives our scheduled message and changes its state to half_open.
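
The scheduling relies on Process.send_after/3, which delivers an ordinary message to the target process once the delay has elapsed. The following sketch shows that mechanism on its own; the TimerSketch module and its wait_for_half_open/1 function are hypothetical, and a short delay stands in for the 8000 milliseconds used above.

```elixir
defmodule TimerSketch do
  # Hypothetical demo: schedule a message to the current process and wait for it.
  def wait_for_half_open(delay_ms) do
    Process.send_after(self(), :to_half_open, delay_ms)

    receive do
      :to_half_open -> :half_open
    after
      # Safety net so the demo never blocks forever.
      delay_ms + 1_000 -> :timed_out
    end
  end
end

result = TimerSketch.wait_for_half_open(50)
IO.inspect(result)
```

In the circuit breaker the message is not picked up by a receive block; it arrives as an :info event and is handled by the open/3 clause shown earlier.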

Finally, in the half_open state, we try calling the API endpoint again. On success the circuit closes; on failure we switch back to fully open and wait for the delay before trying again.

Conclusions

Circuit breakers are a valuable pattern to have in our arsenal, as they can help increase system stability and give us a more reliable way to handle errors with remote services.

This example just scratches the surface of what you can do with circuit breakers. There are plenty of opportunities to expand this pattern further, e.g.

  • Improve the logic for tripping the breaker by also looking at the type and frequency of errors.
  • Add monitoring and logging whenever the circuit breaker changes state.
  • Fall back to a secondary service or cache layer before returning a failure.
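
As one possible take on the fallback idea, the sketch below wraps any fetch function and serves a previously cached value when the call fails. Everything here is an assumption made for illustration: the FallbackSketch module, the with_fallback/2 function, and the {:stale, value} tag are hypothetical and are not part of the example project.

```elixir
defmodule FallbackSketch do
  # Hypothetical helper. `fetch` is any zero-arity function returning
  # {:ok, value} or {:error, reason}; `cached` is the last known good
  # value, or nil when nothing has been cached yet.
  def with_fallback(fetch, cached) when is_function(fetch, 0) do
    case fetch.() do
      {:ok, value} -> {:ok, value}
      # Serve stale data, tagged so callers can tell it apart from fresh data.
      {:error, _reason} when not is_nil(cached) -> {:ok, {:stale, cached}}
      {:error, reason} -> {:error, reason}
    end
  end
end

# Simulate an open circuit with a function that always fails.
open_circuit = fn -> {:error, :circuit_open} end

IO.inspect(FallbackSketch.with_fallback(open_circuit, ["Software Engineer"]))
IO.inspect(FallbackSketch.with_fallback(open_circuit, nil))
```

In the job board example, the fetch function could be a capture such as &CircuitBreaker.Api.Switch.get_positions/0, with the cached list coming from wherever the application keeps its last successful result.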

Finally, as with any pattern, it is important to keep the use case in mind and decide whether this kind of behavior is desired.

The full code for this example can be found in circuit_breaker_example
