Modern distributed systems rely on many external dependencies: payments,
user verification, notifications, data enrichment. These often live
outside your control. When one of those services becomes unhealthy, your
architecture needs to protect itself.
In this article, we will look at how the Circuit Breaker pattern helps you prevent cascading failures, reduce wasted retries, and allow systems to recover gracefully. We will also walk through an AWS Lambda implementation using DynamoDB as the state store and see how the entire flow works in real serverless applications.
What the Circuit Breaker Pattern Helps You Achieve
- Detects when a dependency becomes unhealthy
- Stops sending requests that are likely to fail
- Gives downstream systems time to recover
- Performs periodic test calls to check if the service is healthy again
- Works extremely well with a storage-first approach
- Prevents cascading failures and wasted retries
- Trade-off: introduces a small amount of latency and state handling
- Trade-off: requires tuning of thresholds and cooldown settings
Retries are helpful only up to a point. Once a dependency is clearly
failing, additional retries simply increase the pressure on an already struggling service.
This is where the Circuit Breaker protects the rest of your architecture.
A Simple Way to Think About It
Think of it like an electrical circuit:
Closed → electricity flows and everything works
Open → flow stops until it is safe again
When the path is closed, electricity flows freely and the system stays healthy.
When the circuit opens, the flow stops in order to prevent damage.
Your architecture behaves the same way. If a downstream dependency fails too often, you open the circuit and stop sending calls for a while. After a cooldown period, you allow a few test calls in a half-open state to check if things recovered.
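The three states and the cooldown check can be sketched as a small pure function. This is only an illustration: the `CircuitStatus` shape and the `decide` helper are hypothetical names, not part of any library, and timestamps are assumed to be in epoch seconds.

```typescript
// Hypothetical types for illustration; not taken from any library.
type CircuitState = "closed" | "open" | "half-open";

interface CircuitStatus {
  state: CircuitState;
  openedAt: number; // epoch seconds when the circuit last opened
}

type Decision = "call" | "probe" | "fail-fast";

// Decide what to do with an incoming request given the current state.
function decide(
  status: CircuitStatus,
  nowSeconds: number,
  cooldownSeconds: number
): Decision {
  if (status.state === "closed") return "call";
  if (status.state === "half-open") return "probe";
  // State is "open": allow a test call only once the cooldown has elapsed.
  const elapsed = nowSeconds - status.openedAt;
  return elapsed >= cooldownSeconds ? "probe" : "fail-fast";
}
```

Keeping the decision free of I/O makes it easy to unit test separately from the storage layer.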
Architecture: Circuit Breaker with AWS Lambda, DynamoDB and an External API
OPEN State Diagram
Flow Description
- A request enters through Amazon API Gateway which triggers the Lambda function
- The Lambda function checks the circuit breaker status stored in DynamoDB
- If the circuit is open, the Lambda function fails fast and returns an error without calling the dependency
- If the circuit is closed, the Lambda function calls the external dependency (Stripe in this example)
- If the call fails, the failure counter is increased
- If the failure counter crosses the configured threshold, the circuit opens and calls stop for a while
- After the cooldown period, the system transitions into half-open state
- A few test calls decide if the dependency recovered. If successful, the circuit closes again. If not, it reopens
Closed and Half-Open Flow
Description
- A request enters through API Gateway
- The Lambda function reads the current circuit status
- Since it is in closed or half-open state, it makes a call to the external API
- If the call fails, the failure count increments
- An error is returned to the customer
- If the call succeeds, the failure count is decreased or reset
- The value is returned to the caller
Circuit Recovery Flow
- When the number of failures exceeds the threshold, the circuit opens
- After a cooldown window, the system moves into half-open state
- A few test calls determine if the dependency is healthy again
- If successful, the circuit closes and traffic resumes
- If failures continue, the circuit reopens and the cycle repeats
This makes the system self-healing without manual intervention.
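The recovery cycle described above can be modelled as a transition function that takes the outcome of one call and returns the next circuit state. This is a minimal sketch under assumptions: `Breaker` and `onResult` are hypothetical names, a single successful test call is enough to close the circuit, and a failed test call reopens it immediately.

```typescript
// Hypothetical in-memory model of the breaker; names are illustrative only.
type BreakerState = "closed" | "open" | "half-open";

interface Breaker {
  state: BreakerState;
  failureCount: number;
  openedAt: number; // epoch seconds, 0 when not open
}

// Compute the next breaker value from the outcome of one dependency call.
function onResult(
  b: Breaker,
  succeeded: boolean,
  nowSeconds: number,
  threshold: number
): Breaker {
  if (succeeded) {
    // A successful test call closes the circuit; a success while closed changes nothing.
    return b.state === "half-open"
      ? { state: "closed", failureCount: 0, openedAt: 0 }
      : b;
  }
  const failureCount = b.failureCount + 1;
  // A failed test call reopens at once; otherwise open when the threshold is reached.
  if (b.state === "half-open" || failureCount >= threshold) {
    return { state: "open", failureCount, openedAt: nowSeconds };
  }
  return { ...b, failureCount };
}
```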
Implementing a Circuit Breaker in AWS Lambda
Below is a simple TypeScript example showing how this pattern can be
implemented using Lambda and DynamoDB.
How the Code Works
- Load circuit state from DynamoDB
  - If no record exists, a new one is created with closed state
- If the circuit is OPEN
  - Check if the cooldown has passed
  - If cooldown expired → move to HALF-OPEN
  - Otherwise → fail fast without calling the dependency
- Make the dependency call (e.g., Stripe)
  - If successful:
    - If HALF-OPEN → close the circuit and reset the failure counter
    - If CLOSED → keep the circuit closed
  - If failed:
    - Increment failure counter atomically
    - If threshold reached → open the circuit and set openedAt
- Update DynamoDB with atomic update expressions
  - Prevents concurrent Lambdas from overwriting each other's failure counts
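A state transition can additionally be guarded with a conditional write, so that only one of several concurrent Lambdas actually opens the circuit. The sketch below only builds the request parameters. `UpdateParams` and `buildOpenCircuitUpdate` are hypothetical names, and the object is assumed to be passed to `new UpdateItemCommand(...)` from `@aws-sdk/client-dynamodb` against an item that already exists.

```typescript
// Minimal local type mirroring the fields of a DynamoDB UpdateItem input
// that this sketch uses. Defined here so the example has no dependencies.
interface UpdateParams {
  TableName: string;
  Key: Record<string, { S: string }>;
  UpdateExpression: string;
  ConditionExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, { S: string } | { N: string }>;
}

// Build an update that opens the circuit only if it is not already open.
function buildOpenCircuitUpdate(
  table: string,
  service: string,
  nowSeconds: number
): UpdateParams {
  return {
    TableName: table,
    Key: { serviceName: { S: service } },
    UpdateExpression: "SET #s = :open, openedAt = :now, lastUpdated = :now",
    // The write is rejected when the stored state is already "open".
    ConditionExpression: "#s <> :open",
    // "state" is a DynamoDB reserved word, so it needs an alias.
    ExpressionAttributeNames: { "#s": "state" },
    ExpressionAttributeValues: {
      ":open": { S: "open" },
      ":now": { N: String(nowSeconds) },
    },
  };
}
```

When the condition is not met, DynamoDB rejects the write with a `ConditionalCheckFailedException`. The caller can treat that as "another invocation already opened the circuit" and carry on.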
DynamoDB Table Structure
{
  "serviceName": "stripe",
  "state": "closed",
  "failureCount": 0,
  "openedAt": 0,
  "lastUpdated": 1732098200
}
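The JSON above is the logical shape of the record. The low-level `DynamoDBClient` works with typed attribute values instead, where strings are wrapped as `{ S: ... }` and numbers as `{ N: "..." }`. A minimal sketch of that mapping follows, with `CircuitRecord` and `toItem` as hypothetical names introduced for illustration.

```typescript
// Logical shape of one circuit record, matching the JSON structure.
interface CircuitRecord {
  serviceName: string;
  state: "closed" | "open" | "half-open";
  failureCount: number;
  openedAt: number;    // epoch seconds, 0 when the circuit is not open
  lastUpdated: number; // epoch seconds
}

// Subset of DynamoDB attribute value shapes used by this table.
type Attr = { S: string } | { N: string };

// Convert a record into the attribute-value form the low-level client expects.
function toItem(r: CircuitRecord): Record<string, Attr> {
  return {
    serviceName: { S: r.serviceName },
    state: { S: r.state },
    // DynamoDB numbers travel as strings on the wire.
    failureCount: { N: String(r.failureCount) },
    openedAt: { N: String(r.openedAt) },
    lastUpdated: { N: String(r.lastUpdated) },
  };
}
```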
Lambda Function (TypeScript)
import {
  DynamoDBClient,
  GetItemCommand,
  UpdateItemCommand,
} from "@aws-sdk/client-dynamodb";
import axios from "axios";

const TABLE = process.env.CIRCUIT_TABLE!;
const FAILURE_THRESHOLD = 5;
const COOLDOWN_SECONDS = 60;
const db = new DynamoDBClient({});

export async function handler(event: any) {
  const status = await getCircuitStatus("stripe");

  // OPEN → maybe move to HALF-OPEN
  if (status.state === "open") {
    const now = Math.floor(Date.now() / 1000);
    const elapsed = now - status.openedAt;
    if (elapsed >= COOLDOWN_SECONDS) {
      await setState("stripe", "half-open", status.failureCount, status.openedAt);
      status.state = "half-open";
    } else {
      return failFast("Circuit is open, failing fast");
    }
  }

  // Attempt dependency call
  try {
    const response = await axios.get(
      "https://api.stripe.com/health-check",
      { timeout: 3000 }
    );
    // HALF-OPEN success → CLOSE circuit
    if (status.state === "half-open") {
      await setState("stripe", "closed", 0, 0);
    }
    // CLOSED → remain closed (do not increase failures)
    return response.data;
  } catch (err) {
    // Failure: increment counter atomically
    const updatedCount = await incrementFailure("stripe");
    // Trip the circuit
    if (updatedCount >= FAILURE_THRESHOLD) {
      const now = Math.floor(Date.now() / 1000);
      await setState("stripe", "open", updatedCount, now);
    }
    throw new Error("Dependency call failed");
  }
}

// Load or initialize circuit state
async function getCircuitStatus(service: string) {
  const result = await db.send(new GetItemCommand({
    TableName: TABLE,
    Key: { serviceName: { S: service } },
  }));
  if (!result.Item) {
    // Initialize default state
    await setState(service, "closed", 0, 0);
    return {
      state: "closed",
      failureCount: 0,
      openedAt: 0,
      lastUpdated: 0
    };
  }
  return {
    state: result.Item.state.S,
    failureCount: Number(result.Item.failureCount.N),
    openedAt: Number(result.Item.openedAt.N),
    lastUpdated: Number(result.Item.lastUpdated.N)
  };
}

// Atomic increment of failureCount; returns the new count
async function incrementFailure(service: string) {
  const now = Math.floor(Date.now() / 1000);
  const result = await db.send(new UpdateItemCommand({
    TableName: TABLE,
    Key: { serviceName: { S: service } },
    UpdateExpression: "SET failureCount = failureCount + :inc, lastUpdated = :now",
    ExpressionAttributeValues: {
      ":inc": { N: "1" },
      ":now": { N: String(now) },
    },
    ReturnValues: "UPDATED_NEW",
  }));
  // Attributes is optional in the SDK types, so guard the access
  return Number(result.Attributes?.failureCount?.N ?? 0);
}

// Controlled state changes (atomic write)
async function setState(
  service: string,
  state: string,
  failureCount: number,
  openedAt: number
) {
  const now = Math.floor(Date.now() / 1000);
  await db.send(new UpdateItemCommand({
    TableName: TABLE,
    Key: { serviceName: { S: service } },
    UpdateExpression:
      "SET #s = :s, failureCount = :fc, openedAt = :oa, lastUpdated = :now",
    ExpressionAttributeNames: {
      "#s": "state",
    },
    ExpressionAttributeValues: {
      ":s": { S: state },
      ":fc": { N: String(failureCount) },
      ":oa": { N: String(openedAt) },
      ":now": { N: String(now) },
    }
  }));
}

// Fast failure response
function failFast(message: string) {
  return {
    statusCode: 503,
    body: JSON.stringify({ message }),
  };
}
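Because the threshold and cooldown usually need tuning per dependency, one option is to read them from configuration rather than hardcoding them. The sketch below assumes environment variables named `FAILURE_THRESHOLD` and `COOLDOWN_SECONDS`. Those names, along with `BreakerConfig` and `readConfig`, are hypothetical, and the defaults mirror the constants used in the function above.

```typescript
// Hypothetical configuration shape for the breaker.
interface BreakerConfig {
  failureThreshold: number;
  cooldownSeconds: number;
}

// Parse a positive integer, falling back to a default when unset or invalid.
function positiveInt(raw: string | undefined, fallback: number): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Read breaker settings from an environment-like object.
function readConfig(env: Record<string, string | undefined>): BreakerConfig {
  return {
    failureThreshold: positiveInt(env.FAILURE_THRESHOLD, 5),
    cooldownSeconds: positiveInt(env.COOLDOWN_SECONDS, 60),
  };
}
```

In a Lambda function this could be called once at module load as `readConfig(process.env)`, so the values can be changed per deployment without touching the code.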
Final Thoughts
The Circuit Breaker pattern is one of the simplest ways to make your architecture more resilient, cost-efficient, and self-healing. Combined with storage-first ingestion, exponential backoff, and reasonable retry policies, it becomes a powerful tool in any distributed system.
If you are building production systems that depend on external services or shared infrastructure, this pattern deserves a place in your toolbox.

