Matt Coulter for CDK Patterns

Add Resiliency To Your Lambda Function with a Circuit Breaker

The Lambda Circuit Breaker

This pattern takes advantage of the awesome circuitbreaker-lambda library from Gunnar Grosch.

View Codebase:
https://github.com/cdk-patterns/serverless/tree/master/the-lambda-circuit-breaker

To Clone:

# TypeScript
npx cdkp init the-lambda-circuit-breaker
# Python
npx cdkp init the-lambda-circuit-breaker --lang=python

AWS Well Architected

The AWS Well-Architected Framework helps you understand the pros and cons of decisions you make while building systems on AWS. By using the Framework, you will learn architectural best practices for designing and operating reliable, secure, efficient, and cost-effective systems in the cloud. It provides a way for you to consistently measure your architectures against best practices and identify areas for improvement.

We believe that having well-architected systems greatly increases the likelihood of business success.

The Reliability Pillar

Note - The content for this section is a subset of the Serverless Lens Whitepaper with some minor tweaks.

The reliability pillar includes the ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.

REL 2: How are you building resiliency into your serverless application?

Evaluate scaling mechanisms for Serverless and non-Serverless resources to meet customer demand, and build resiliency to withstand partial and intermittent failures across dependencies.

Best Practices:

1 / Manage transaction, partial, and intermittent failures: Transaction failures might occur when components are under high load. Partial failures can occur during batch processing, while intermittent failures might occur due to network or other transient issues.

What's Included In This Pattern?

This is an implementation of the simple webservice pattern with one difference: instead of using DynamoDB to store and retrieve data for the user, our Lambda function uses it to find out whether the webservice it wants to call is reliable right now or whether it should use a fallback function.

To demonstrate this behaviour the Lambda function has some logic in it to simulate failure. The logic below fails randomly, roughly 40% of the time:

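// Shared with the handler so it can report which path produced the result
let message = ''

// Simulates a flaky dependency: resolves about 60% of the time, rejects otherwise
function unreliableFunction () {
  return new Promise((resolve, reject) => {
    if (Math.random() < 0.6) {
      message = 'Success'
      resolve({ data: 'Success' })
    } else {
      message = 'Failed'
      reject({ data: 'Failed' })
    }
  })
}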

Then we have a circuit breaker configured with a fallback mechanism for when the call has failed too many times recently:

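// Import path assumed from the package name of the library credited above
const CircuitBreaker = require('circuitbreaker-lambda')

// Called instead of unreliableFunction while the circuit is open
function fallbackFunction () {
  return new Promise((resolve) => {
    message = 'Fallback'
    resolve({ data: 'Expensive Fallback Successful' })
  })
}

const options = {
  fallback: fallbackFunction,
  failureThreshold: 3,
  successThreshold: 2,
  timeout: 10000
}

exports.handler = async (event: any) => {
  const circuitBreaker = new CircuitBreaker(unreliableFunction, options)
  await circuitBreaker.fire()
  // message was set by whichever function the circuit breaker ended up calling
  const response = {
    statusCode: 200,
    body: JSON.stringify({
      message: message
    })
  }
  return response
}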
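
To make the role of failureThreshold, successThreshold, and timeout more concrete, here is a minimal in-memory sketch of a circuit breaker state machine. It is not the circuitbreaker-lambda implementation: the class name SketchBreaker, its fields, and the injectable clock are all hypothetical, the state lives in memory rather than in DynamoDB, and the exact threshold comparisons are assumptions about how such options are typically interpreted.

```typescript
// Illustrative only: a simplified breaker, not the library's code.
type State = 'CLOSED' | 'OPEN' | 'HALF'

interface BreakerOptions {
  failureThreshold: number // consecutive failures before the circuit opens
  successThreshold: number // successes in HALF before the circuit closes
  timeout: number          // ms to stay OPEN before letting a trial request through
}

class SketchBreaker {
  state: State = 'CLOSED'
  private failures = 0
  private successes = 0
  private nextAttempt = 0

  constructor (
    private request: () => Promise<string>,
    private fallback: () => Promise<string>,
    private options: BreakerOptions,
    private now: () => number = Date.now // injectable clock, handy for tests
  ) {}

  async fire (): Promise<string> {
    if (this.state === 'OPEN') {
      // Still inside the cool-down window: skip the request entirely
      if (this.now() < this.nextAttempt) return this.fallback()
      this.state = 'HALF'
    }
    try {
      const result = await this.request()
      this.onSuccess()
      return result
    } catch {
      this.onFailure()
      // Design choice of this sketch: a failed call is also answered by the fallback
      return this.fallback()
    }
  }

  private onSuccess (): void {
    this.failures = 0
    if (this.state === 'HALF' && ++this.successes >= this.options.successThreshold) {
      this.state = 'CLOSED'
      this.successes = 0
    }
  }

  private onFailure (): void {
    this.failures++
    // Any failure during a HALF trial reopens the circuit immediately
    if (this.state === 'HALF' || this.failures >= this.options.failureThreshold) {
      this.state = 'OPEN'
      this.successes = 0
      this.nextAttempt = this.now() + this.options.timeout
    }
  }
}
```

With the options from the handler above, three failures in a row would open this sketch's circuit, calls during the next 10 seconds would go straight to the fallback, and two successful trial calls after that would close it again.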

When You Would Use This Pattern

Use this pattern when you integrate with an external service and want to provide a cost-effective, resilient service for your consumers. Without a circuit breaker, if the external service is down you will be paying Lambda invocation costs for the full request timeout, and your consumers will sit through that wait only to end up with a failure.

How To Test This Pattern

After you deploy this pattern you will have a URL for an API Gateway. Open it in a browser and you will get a JSON payload back with one of two messages:

// when the circuit is closed and the unreliable function worked
{
   "message": "Success"
}
// when the circuit is open
{
   "message": "Fallback"
}

Refresh the browser a few times and you will see the message switch to Fallback. If you then open the CloudWatch logs for your Lambda function and hit the URL again, you should be able to watch the circuit breaker change state.

You will see messages like:

  • INFO CircuitBreaker state: OPEN
  • INFO CircuitBreaker state: HALF
  • INFO CircuitBreaker state: CLOSED

OPEN means that no requests are going through to the unreliable function, HALF means that some requests are let through to test the stability of the unreliable function, and CLOSED means operating as normal.
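
If refreshing the browser by hand gets tedious, the same check can be scripted. The sketch below is one possible approach, not part of the pattern's codebase: tallyMessages is a hypothetical helper that calls a supplied function a number of times, one call after another, and counts the messages that come back. The commented wiring assumes Node 18+ (for the built-in fetch) and uses a placeholder URL that you would replace with your own API Gateway endpoint.

```typescript
// Hypothetical helper: invoke fetchMessage `attempts` times sequentially
// (like repeated browser refreshes) and count how often each message appears.
async function tallyMessages (
  fetchMessage: () => Promise<string>,
  attempts: number
): Promise<Record<string, number>> {
  const counts: Record<string, number> = {}
  for (let i = 0; i < attempts; i++) {
    const message = await fetchMessage()
    counts[message] = (counts[message] ?? 0) + 1
  }
  return counts
}

// Example wiring against a deployed API (placeholder URL, Node 18+):
// const url = 'https://<your-api-id>.execute-api.<region>.amazonaws.com/'
// tallyMessages(async () => (await (await fetch(url)).json()).message, 20)
//   .then(counts => console.log(counts))
```

A growing share of Fallback responses across runs suggests the circuit has opened, which you can then confirm against the state messages in the CloudWatch logs.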
