
Circuit Breaker Pattern and its States: Build better software

By BOX PIPER · Originally published at boxpiper.com

Init

In any application, different sets of services and third-party APIs communicate either asynchronously (out of scope for the current context), synchronously, or sometimes both (rare cases).

In the synchronous paradigm, a service (the caller) calls another service (the supplier) and waits until a response is available.
Here lies the issue: it is very likely that the supplier is in an unusable or unresponsive state due to high latency, or is offline altogether. Resources such as threads then become exhausted while they wait, so the caller can no longer handle further incoming requests and starts failing too.

This leads to a cascading effect across other parts of the system and, finally, the application breaks down.


What's the solution for building a resilient system?

The solution to this problem is the Circuit Breaker Pattern.

Note: This article is taken from its original source, the BoxPiper blog.

The circuit breaker design originated as a way to protect electrical circuits from damage. It is a switch designed to stop the flow of current in an electric circuit as a safety measure, preventing overload or short circuit when a fault is detected.

As per Martin Fowler: The basic idea behind the circuit breaker is very simple. You wrap a protected function call in a circuit breaker object, which monitors for failures. Once the failures reach a certain threshold, the circuit breaker trips, and all further calls to the circuit breaker return with an error, without the protected call being made at all.

- https://martinfowler.com/bliki/CircuitBreaker.html

Explanation:

A caller client invokes a remote service via a proxy or protected function that behaves like an electrical circuit breaker. When the number of consecutive failures crosses a threshold, the circuit breaker trips, and for the duration of a timeout period all attempts to invoke the remote service fail immediately.

After the timeout expires, the circuit breaker allows a limited number of test requests to pass through. If those requests succeed, the circuit breaker resumes normal operation. Otherwise, if there is a failure, the timeout period begins again. The circuit breaker defines 3 states for processing.


Circuit Breaker States:

  • Closed: The default state, where requests pass freely and things are working as expected (running smoothly). The state moves from closed to open when the number of failures exceeds the defined threshold and the breaker trips.

                    ->->->O->->->
    
  • Open: This state rejects all requests for a defined amount of time, returning an error for each call without executing the protected function or contacting the remote service at all. Once the breaker trips, it enters the open state; at this point, any request to the service fails automatically.

                    ->->->O
    
  • Half-Open: This state acts as a testing state. After the open state's timeout period expires, the breaker allows a set number of requests through in order to test the status of the resource. The half-open state determines whether the circuit returns to closed or open: if the circuit stabilises, i.e. the trial requests complete successfully, it moves to the closed state; otherwise the breaker trips again and moves back to the open state.

                    ->->->O->
    

Generally, these states are represented with colours as well: Closed as Green, Open as Red, and Half-Open as Yellow.
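Putting the three states together, here is a minimal, illustrative circuit-breaker sketch in plain Node.js (all names are hypothetical, not from any library; the clock is injected so the transitions can be followed deterministically instead of waiting on real time):

```javascript
// Minimal circuit-breaker sketch (illustrative only, not a real library).
class SimpleBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 30000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
    this.state = "CLOSED"; // Green: requests pass freely
    this.failures = 0;
    this.openedAt = 0;
  }

  call(fn) {
    if (this.state === "OPEN") {
      // Red: fail fast until the reset timeout elapses
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error("circuit open: failing fast");
      }
      this.state = "HALF_OPEN"; // Yellow: let a trial request through
    }
    try {
      const result = fn();
      this.reset(); // a successful call closes the circuit again
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }

  recordFailure() {
    this.failures += 1;
    // A failed trial in half-open, or too many failures in closed, opens the circuit.
    if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = this.now();
    }
  }

  reset() {
    this.failures = 0;
    this.state = "CLOSED";
  }
}
```

A real implementation would add concurrency control, event hooks, and percentage-based thresholds, but the state transitions are the core of the pattern.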

Determining thresholds:

In the open state, a simple circuit breaker needs external intervention to reset it to the closed state once the supplier service is up and running again.

A software circuit breaker therefore needs self-resetting behaviour: it retries after a suitable interval so that the state can move from open back to closed. This is why we need threshold values.

Resource utilisation, uptime, latency, traffic, error rates in a given time frame, server timeouts, increases in errors or failures, failing status codes, and unexpected response types are some of the criteria that determine the thresholds for these states, derived from monitoring solutions and data analysis of an application.
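As an illustration of a percentage-based threshold (similar in spirit to opossum's errorThresholdPercentage, though not its actual internals), the error rate can be computed over a rolling time window; shouldTrip and its parameters are hypothetical names:

```javascript
// Sketch: decide whether to trip based on the error rate inside a
// rolling time window. Each sample is { at: timestampMs, ok: boolean }.
function shouldTrip(samples, { windowMs, errorThresholdPercentage, minRequests = 5, now = Date.now() }) {
  // Keep only the samples that fall inside the window.
  const recent = samples.filter((s) => now - s.at <= windowMs);
  if (recent.length < minRequests) return false; // not enough traffic to judge
  const failures = recent.filter((s) => !s.ok).length;
  const errorRate = (failures / recent.length) * 100;
  return errorRate >= errorThresholdPercentage;
}
```

The minimum-request guard matters in practice: with only one or two requests in the window, a single failure would otherwise look like a 50–100% error rate and trip the breaker on noise.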

Implementation:

Opossum is a Node.js circuit breaker that executes asynchronous functions and monitors their execution status. When things start failing, opossum plays dead and fails fast.

Working Code:

Success Example:

RunKit Notebook: https://runkit.com/sanmak/opossum-success-request

```javascript
const CircuitBreaker = require("opossum");
const axios = require("axios");

async function asyncFunctionThatCouldFail(x, y) {
  const apiCall = await axios
    .get("https://api.jsonapi.co/rest/v1/speech-to-text/news")
    .then(function (response) {
      // handle success
      console.log(response);
    })
    .catch(function (error) {
      // handle error
      console.log(error);
    })
    .then(function () {
      // always executed
    });
}

const options = {
  timeout: 3000, // If our function takes longer than 3 seconds, trigger a failure
  errorThresholdPercentage: 50, // When 50% of requests fail, trip the circuit
  resetTimeout: 30000, // After 30 seconds, try again.
};

const breaker = new CircuitBreaker(asyncFunctionThatCouldFail, options);

breaker.fallback(() => "Sorry, out of service right now");
breaker.on("fallback", (result) => {
  console.log(result);
});
breaker.on("success", () => console.log("success"));
breaker.on("failure", () => console.log("failed"));
breaker.on("timeout", () => console.log("timed out"));
breaker.on("reject", () => console.log("rejected"));
breaker.on("open", () => console.log("opened"));
breaker.on("halfOpen", () => console.log("halfOpened"));
breaker.on("close", () => console.log("closed"));

breaker
  .fire()
  .then(console.log)
  .catch(console.error);
```

Failure Example:

RunKit Notebook: https://runkit.com/sanmak/opossum-failure-request

```javascript
const CircuitBreaker = require("opossum");
const axios = require("axios");

async function asyncFunctionThatCouldFail(x, y) {
  const apiCall = await axios
    .get("https://apii.jsonapi.co/rest/v1/speech-to-text/news")
    .then(function (response) {
      // handle success
      console.log(response);
    })
    .catch(function (error) {
      // handle error
      console.log(error);
    })
    .then(function () {
      // always executed
    });
}

const options = {
  timeout: 1, // If our function takes longer than 1 millisecond, trigger a failure
  errorThresholdPercentage: 50, // When 50% of requests fail, trip the circuit
  resetTimeout: 30000, // After 30 seconds, try again.
};

const breaker = new CircuitBreaker(asyncFunctionThatCouldFail, options);

breaker.fallback(() => "Sorry, out of service right now");
breaker.on("fallback", (result) => {
  console.log(result);
});
breaker.on("success", () => console.log("success"));
breaker.on("failure", () => console.log("failed"));
breaker.on("timeout", () => console.log("timed out"));
breaker.on("reject", () => console.log("rejected"));
breaker.on("open", () => console.log("opened"));
breaker.on("halfOpen", () => console.log("halfOpened"));
breaker.on("close", () => console.log("closed"));

breaker
  .fire()
  .then(console.log)
  .catch(console.error);
```

Opossum offers a number of events for processing the different states of the breaker. As seen in the examples above, there are events like fallback and success, as well as state events like open, halfOpen, and close. Together they offer an extensive set of hooks through which we can react to failures and handle them gracefully with logging, retries, notifications, and so on.
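In the same spirit as breaker.fallback(...) above, the fallback idea can be sketched as a wrapper that serves a default value when the protected call fails (withFallback is a hypothetical helper for illustration, not part of opossum):

```javascript
// Hypothetical helper: run fn, and if it throws (for example because the
// circuit is open and failing fast), log the error and serve a fallback.
function withFallback(fn, fallbackValue) {
  try {
    return fn();
  } catch (err) {
    console.error("call failed, serving fallback:", err.message); // hook for logging/alerting
    return fallbackValue;
  }
}
```

With opossum itself, breaker.fallback(...) plays this role, and the "fallback" event fires whenever the fallback value is served.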

States Workflow:

  • State Closed. The service works as expected.
  • Failures start coming in; they could be timeouts, server errors, or anything else. State Open. The circuit breaker trips and the reset timeout starts.
  • All incoming requests now fail immediately.
  • The reset timeout ends; the state changes to half-open.
  • A small number of requests are now allowed through. On a set number of failures, the breaker moves back to the open state and the reset timeout starts again.
  • The service moves back to the closed state only if the requests made in the half-open state succeed.
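The steps above can be sketched as a pure state-transition function over (state, event) pairs (the event names here are illustrative, not from any library):

```javascript
// Transition table for the circuit-breaker workflow described above.
function nextState(state, event) {
  switch (`${state}:${event}`) {
    case "CLOSED:failure-threshold-exceeded": return "OPEN";      // breaker trips
    case "OPEN:reset-timeout-elapsed":        return "HALF_OPEN"; // allow trial requests
    case "HALF_OPEN:trial-success":           return "CLOSED";    // service recovered
    case "HALF_OPEN:trial-failure":           return "OPEN";      // back to failing fast
    default:                                  return state;       // all other events: stay put
  }
}
```

Modelling the transitions as data like this makes the workflow easy to test in isolation, independent of timers and network calls.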

With the above workflow, circuit breakers optimise resource usage that would otherwise be tied up in operations likely to fail. Client-side timeouts are avoided, and a struggling server is spared the extra load thanks to the circuit breaking.

Libraries:

Widely used circuit-breaker libraries include opossum (Node.js), Hystrix and Resilience4j (Java), and Polly (.NET).

Importance:

Every service or third-party API integrated into a system adds uncertainty around it. The Circuit Breaker pattern helps in building resilient systems, handling errors gracefully, and preventing applications from failing through cascading errors.

Ending Note:

Circuit breakers are relevant in the context of a microservice-heavy architecture, as well as in applications that rely heavily on third-party APIs, because a single failure can cascade to other services.

They have grown in popularity with libraries like Hystrix from Netflix, a latency and fault tolerance library designed to enable resilience in complex distributed systems where failure is inevitable.

Circuit breakers are gold for monitoring. With proper logging, deeper monitoring can extensively reveal details such as early warnings about errors and issues in an application.

Read more about the pattern and its strategies in Martin Fowler's write-up linked above.

To read more such interesting topics, follow and read BoxPiper blog.

Support my work and buy me a Coffee. It'll mean the world to me. 😇
