We migrated 15 Lambdas to Step Functions to solve cold starts, but what we gained in warmness, we lost in EventBridge message delays. Here's the hidden pattern that changed everything.
Introduction to Step Functions and EventBridge
Step Functions and EventBridge are two AWS services that can be used to build scalable and event-driven architectures. Step Functions provides a way to orchestrate the components of distributed applications and microservices, while EventBridge provides a way to handle events from various sources.
// SFNClient is the Step Functions client class exported by @aws-sdk/client-sfn.
import { SFNClient, DescribeStateMachineCommand } from '@aws-sdk/client-sfn';
import { EventBridgeClient, DescribeEventBusCommand } from '@aws-sdk/client-eventbridge';
const stepFunctionsClient = new SFNClient({ region: 'us-east-1' });
const eventBridgeClient = new EventBridgeClient({ region: 'us-east-1' });
// Look up the state machine's definition and configuration.
const describeStateMachineCommand = new DescribeStateMachineCommand({
  stateMachineArn: 'arn:aws:states:us-east-1:123456789012:stateMachine:MyStateMachine',
});
// Look up the default event bus.
const describeEventBusCommand = new DescribeEventBusCommand({
  Name: 'default',
});
stepFunctionsClient.send(describeStateMachineCommand)
  .then((data) => console.log(data))
  .catch((err) => console.error('DescribeStateMachine failed:', err));
eventBridgeClient.send(describeEventBusCommand)
  .then((data) => console.log(data))
  .catch((err) => console.error('DescribeEventBus failed:', err));
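Be careful with failure handling when using Step Functions and EventBridge. A Step Functions Task state does not retry at all unless you add a Retry field, so an unhandled error fails the whole execution. EventBridge does retry delivery to a rule target (by default for up to 24 hours and 185 attempts), but without a dead-letter queue on the target the event is discarded once those retries are exhausted. Either gap can lead to data loss and processing delays.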
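As a minimal sketch of explicit failure handling, the helper below builds an Amazon States Language Task state with Retry and Catch fields. The helper name `buildTaskWithRetry`, the Lambda ARN, and the state names `Done` and `NotifyFailure` are hypothetical placeholders; the retry values are illustrative, not a recommendation.

```javascript
// Hypothetical helper: returns an Amazon States Language Task state with
// explicit Retry and Catch fields, so failure handling is never implicit.
function buildTaskWithRetry(resourceArn, nextState, failureState) {
  return {
    Type: 'Task',
    Resource: resourceArn,
    Retry: [
      {
        ErrorEquals: ['States.TaskFailed', 'States.Timeout'],
        IntervalSeconds: 2, // wait before the first retry
        MaxAttempts: 3,     // number of retries after the first attempt
        BackoffRate: 2.0,   // multiplier applied to the wait on each retry
      },
    ],
    // Anything still failing after the retries is routed to a failure state
    // instead of failing the whole execution.
    Catch: [{ ErrorEquals: ['States.ALL'], Next: failureState }],
    Next: nextState,
  };
}

const taskState = buildTaskWithRetry(
  'arn:aws:lambda:us-east-1:123456789012:function:MyFunction', // placeholder ARN
  'Done',
  'NotifyFailure',
);
console.log(JSON.stringify(taskState, null, 2));
```

The returned object would go under a state name in the `States` map of a state machine definition, which must also define the `Done` and `NotifyFailure` states.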
Our Initial Migration: From Lambda to Step Functions
When we initially migrated our Lambdas to Step Functions, we noticed a significant reduction in cold starts. However, we also noticed that EventBridge messages were being delayed, which was affecting the overall performance of our application.
// SFNClient is the Step Functions client class exported by @aws-sdk/client-sfn.
import { SFNClient, StartExecutionCommand } from '@aws-sdk/client-sfn';
import { EventBridgeClient, PutEventsCommand } from '@aws-sdk/client-eventbridge';
const stepFunctionsClient = new SFNClient({ region: 'us-east-1' });
const eventBridgeClient = new EventBridgeClient({ region: 'us-east-1' });
// Start one execution; input must be a JSON string.
const startExecutionCommand = new StartExecutionCommand({
  stateMachineArn: 'arn:aws:states:us-east-1:123456789012:stateMachine:MyStateMachine',
  input: '{}',
});
// Publish one custom event to the default bus; Detail must be a JSON string.
const putEventsCommand = new PutEventsCommand({
  Entries: [
    {
      EventBusName: 'default',
      Source: 'my-source',
      DetailType: 'my-detail-type',
      Detail: '{"key": "value"}',
    },
  ],
});
stepFunctionsClient.send(startExecutionCommand)
  .then((data) => console.log(data))
  .catch((err) => console.error('StartExecution failed:', err));
eventBridgeClient.send(putEventsCommand)
  .then((data) => console.log(data))
  .catch((err) => console.error('PutEvents failed:', err));
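One thing the snippet above does not check: a PutEvents call can succeed as a request while individual entries are rejected, which the response reports through `FailedEntryCount` and a per-entry `ErrorCode`. The sketch below pairs each failed result with the entry that was sent. The helper name `findFailedEntries` and the sample response are hypothetical, written by hand in the response shape rather than captured from a real call.

```javascript
// Hypothetical helper: pairs each failed PutEvents result with the entry that
// was sent. Results come back in the same order as the request entries.
function findFailedEntries(entries, response) {
  if (!response.FailedEntryCount) return [];
  return response.Entries
    .map((result, index) => ({ entry: entries[index], result }))
    .filter(({ result }) => result.ErrorCode !== undefined)
    .map(({ entry, result }) => ({
      entry,
      errorCode: result.ErrorCode,
      errorMessage: result.ErrorMessage,
    }));
}

// Illustrative data only: two entries sent, the second one rejected.
const sentEntries = [
  { Source: 'my-source', DetailType: 'my-detail-type', Detail: '{"key": "value"}' },
  { Source: 'my-source', DetailType: 'my-detail-type', Detail: '{"key": "other"}' },
];
const sampleResponse = {
  FailedEntryCount: 1,
  Entries: [
    { EventId: '11111111-2222-3333-4444-555555555555' },
    { ErrorCode: 'InternalFailure', ErrorMessage: 'illustrative failure' },
  ],
};

const failedEntries = findFailedEntries(sentEntries, sampleResponse);
console.log(failedEntries);
```

Whatever comes back from this helper is a candidate for resending; logging only the raw response makes these partial failures easy to miss.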
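The 25,000-event execution history limit for Standard workflows in Step Functions can cause issues if your workflows are long-running or have many state transitions, because an execution that reaches the limit fails. To see how close an execution is, page through its history with the GetExecutionHistory API; DescribeExecution returns only the execution's status and metadata. To stay under the limit, split long workflows into child executions.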
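A minimal sketch of counting history events page by page. To keep it runnable without AWS credentials, the page fetcher is injected: `countHistoryEvents` and `stubFetchPage` are hypothetical names, and the stub stands in for a real call. With the SDK, the fetcher would wrap `GetExecutionHistoryCommand`, as shown in the comment.

```javascript
const HISTORY_EVENT_LIMIT = 25000; // Standard workflow limit quoted above

// fetchPage is injected: (nextToken) => Promise<{ events, nextToken }>.
// With the real SDK it could be (assuming an SFNClient named sfnClient):
//   (nextToken) => sfnClient.send(new GetExecutionHistoryCommand({
//     executionArn, maxResults: 1000, nextToken,
//   }))
async function countHistoryEvents(fetchPage) {
  let total = 0;
  let nextToken;
  do {
    const page = await fetchPage(nextToken);
    total += page.events.length;
    nextToken = page.nextToken; // undefined on the last page
  } while (nextToken);
  return total;
}

// Stub with two pages (3 events, then 2) standing in for the API.
const stubPages = {
  first: { events: [{ id: 1 }, { id: 2 }, { id: 3 }], nextToken: 'page-2' },
  'page-2': { events: [{ id: 4 }, { id: 5 }] },
};
const stubFetchPage = async (nextToken) => stubPages[nextToken ?? 'first'];

countHistoryEvents(stubFetchPage).then((total) => {
  console.log(`history events: ${total} of ${HISTORY_EVENT_LIMIT}`);
});
```

Counting this way only reports usage; it does not raise the limit.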
The Unseen Delay: EventBridge Message Queuing in Step Functions
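When we dug deeper, we found that the delay in EventBridge messages was due to the way Step Functions was handling failures. Step Functions does not retry a failed state by default, but once a Retry field is configured, its backoff intervals add latency to every event that hits an error. An error that matches no retrier or catcher fails the execution, so the event it was processing is effectively dropped.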
// SFNClient is the Step Functions client class exported by @aws-sdk/client-sfn.
import { SFNClient, DescribeExecutionCommand } from '@aws-sdk/client-sfn';
import { EventBridgeClient, DescribeEventBusCommand } from '@aws-sdk/client-eventbridge';
const stepFunctionsClient = new SFNClient({ region: 'us-east-1' });
const eventBridgeClient = new EventBridgeClient({ region: 'us-east-1' });
// Check the status and start/stop dates of a single execution.
const describeExecutionCommand = new DescribeExecutionCommand({
  executionArn: 'arn:aws:states:us-east-1:123456789012:execution:MyStateMachine:MyExecution',
});
// Look up the default event bus.
const describeEventBusCommand = new DescribeEventBusCommand({
  Name: 'default',
});
stepFunctionsClient.send(describeExecutionCommand)
  .then((data) => console.log(data))
  .catch((err) => console.error('DescribeExecution failed:', err));
eventBridgeClient.send(describeEventBusCommand)
  .then((data) => console.log(data))
  .catch((err) => console.error('DescribeEventBus failed:', err));
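To put a number on how much delay retries can add, the sketch below sums the waits of a single retrier. It assumes the Amazon States Language defaults that apply when a Retry entry omits them (`IntervalSeconds` 1, `MaxAttempts` 3, `BackoffRate` 2.0) and ignores `MaxDelaySeconds` and jitter; the function name `totalRetryWaitSeconds` is hypothetical.

```javascript
// Sums the waits between attempts for one retrier:
// IntervalSeconds * BackoffRate^0 + ... + IntervalSeconds * BackoffRate^(MaxAttempts - 1).
// Task run time is not included, only the time spent waiting to retry.
function totalRetryWaitSeconds({ IntervalSeconds = 1, MaxAttempts = 3, BackoffRate = 2.0 } = {}) {
  let total = 0;
  for (let retry = 0; retry < MaxAttempts; retry += 1) {
    total += IntervalSeconds * BackoffRate ** retry;
  }
  return total;
}

const defaultWait = totalRetryWaitSeconds(); // 1 + 2 + 4
const customWait = totalRetryWaitSeconds({ IntervalSeconds: 5, MaxAttempts: 4, BackoffRate: 2 }); // 5 + 10 + 20 + 40
console.log(defaultWait); // 7
console.log(customWait);  // 75
```

A retrier that looks modest on paper can hold an event for over a minute before the state finally succeeds or fails.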
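If you're using EventBridge Pipes, be aware of the 5-second filter evaluation limit. If filter evaluation takes longer than this, it fails silently and your messages are dropped. Events that simply do not match the pipe's filter pattern are also discarded without an error, so a wrong pattern looks the same as a dropped message.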
Rethinking State Machine Design for Timely Event Processing
To mitigate the delay in EventBridge messages, we rethought our state machine design. We implemented a message queueing mechanism that allowed us to handle messages in a timely manner.
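// SFNClient is the Step Functions client class exported by @aws-sdk/client-sfn.
import { SFNClient, UpdateStateMachineCommand } from '@aws-sdk/client-sfn';
import { EventBridgeClient, PutRuleCommand } from '@aws-sdk/client-eventbridge';
const stepFunctionsClient = new SFNClient({ region: 'us-east-1' });
const eventBridgeClient = new EventBridgeClient({ region: 'us-east-1' });
// An empty definition ('{}') is rejected; this is a minimal valid placeholder.
const updateStateMachineCommand = new UpdateStateMachineCommand({
  stateMachineArn: 'arn:aws:states:us-east-1:123456789012:stateMachine:MyStateMachine',
  definition: JSON.stringify({ StartAt: 'Done', States: { Done: { Type: 'Succeed' } } }),
});
// An empty event pattern ('{}') is rejected too; match on the event source instead.
const putRuleCommand = new PutRuleCommand({
  Name: 'MyRule',
  EventBusName: 'default',
  EventPattern: JSON.stringify({ source: ['my-source'] }),
});
stepFunctionsClient.send(updateStateMachineCommand)
  .then((data) => console.log(data))
  .catch((err) => console.error('UpdateStateMachine failed:', err));
eventBridgeClient.send(putRuleCommand)
  .then((data) => console.log(data))
  .catch((err) => console.error('PutRule failed:', err));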
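The article does not say which queue was used. One common arrangement, assumed here, is to point the rule at an SQS queue and attach a dead-letter queue and retry policy to the target. The sketch builds the input for `PutTargetsCommand`; the helper name `buildQueueTargetInput`, the target Id, both queue ARNs, and the retry values are hypothetical.

```javascript
// Hypothetical helper: builds PutTargets input that routes a rule to an SQS
// queue and sends undeliverable events to a dead-letter queue.
function buildQueueTargetInput({ ruleName, eventBusName, queueArn, deadLetterQueueArn }) {
  return {
    Rule: ruleName,
    EventBusName: eventBusName,
    Targets: [
      {
        Id: 'buffer-queue', // any identifier unique within the rule
        Arn: queueArn,
        DeadLetterConfig: { Arn: deadLetterQueueArn },
        RetryPolicy: {
          MaximumRetryAttempts: 10,       // illustrative value
          MaximumEventAgeInSeconds: 3600, // illustrative value
        },
      },
    ],
  };
}

const putTargetsInput = buildQueueTargetInput({
  ruleName: 'MyRule',
  eventBusName: 'default',
  queueArn: 'arn:aws:sqs:us-east-1:123456789012:my-buffer-queue', // placeholder
  deadLetterQueueArn: 'arn:aws:sqs:us-east-1:123456789012:my-dlq', // placeholder
});
console.log(JSON.stringify(putTargetsInput, null, 2));
// With the SDK: eventBridgeClient.send(new PutTargetsCommand(putTargetsInput))
```

Both queues also need resource policies that allow EventBridge to send messages to them, which this sketch does not cover.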
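When implementing a message queueing mechanism, be aware of the 10,000 concurrent child execution limit in Step Functions, which applies to a Distributed Map state. Set MaxConcurrency on the Map state to stay well under it. Separately, StartExecution fails with ExecutionLimitExceeded when the quota for running executions is reached.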
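A minimal sketch of capping child concurrency, assuming the limit above refers to child workflow executions started by a Distributed Map state. The helper name `buildDistributedMap` and the child state `ProcessItem` are hypothetical. The helper rejects 0 deliberately, since a MaxConcurrency of 0 in Amazon States Language means no explicit cap.

```javascript
const MAX_CHILD_CONCURRENCY = 10000; // limit quoted in the text above

// Hypothetical helper: returns a Distributed Map state with an explicit cap.
function buildDistributedMap(maxConcurrency, startAt, childStates) {
  const valid = Number.isInteger(maxConcurrency)
    && maxConcurrency >= 1
    && maxConcurrency <= MAX_CHILD_CONCURRENCY;
  if (!valid) {
    throw new RangeError(`MaxConcurrency must be an integer from 1 to ${MAX_CHILD_CONCURRENCY}`);
  }
  return {
    Type: 'Map',
    MaxConcurrency: maxConcurrency,
    ItemProcessor: {
      ProcessorConfig: { Mode: 'DISTRIBUTED', ExecutionType: 'STANDARD' },
      StartAt: startAt,
      States: childStates,
    },
    End: true,
  };
}

const mapState = buildDistributedMap(500, 'ProcessItem', {
  ProcessItem: { Type: 'Pass', End: true }, // placeholder child workflow
});
console.log(JSON.stringify(mapState, null, 2));
```

A cap well below the limit also protects whatever the child workflows call, which usually throttles long before Step Functions does.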
Best Practices for Integrating Step Functions with EventBridge
When integrating Step Functions with EventBridge, it's essential to follow best practices to avoid common pitfalls.
// SFNClient is the Step Functions client class exported by @aws-sdk/client-sfn.
import { SFNClient, DescribeStateMachineCommand } from '@aws-sdk/client-sfn';
import { EventBridgeClient, DescribeEventBusCommand } from '@aws-sdk/client-eventbridge';
const stepFunctionsClient = new SFNClient({ region: 'us-east-1' });
const eventBridgeClient = new EventBridgeClient({ region: 'us-east-1' });
// Look up the state machine's definition and configuration.
const describeStateMachineCommand = new DescribeStateMachineCommand({
  stateMachineArn: 'arn:aws:states:us-east-1:123456789012:stateMachine:MyStateMachine',
});
// Look up the default event bus.
const describeEventBusCommand = new DescribeEventBusCommand({
  Name: 'default',
});
stepFunctionsClient.send(describeStateMachineCommand)
  .then((data) => console.log(data))
  .catch((err) => console.error('DescribeStateMachine failed:', err));
eventBridgeClient.send(describeEventBusCommand)
  .then((data) => console.log(data))
  .catch((err) => console.error('DescribeEventBus failed:', err));
When using EventBridge, be aware that delivery delay under high load can reach 30+ seconds. To mitigate this, buffer events in a queue so that consumers can drain bursts at their own pace.
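To know whether delays of this size are actually happening, a consumer can compare the event's `time` field with the moment it receives the event. This is a minimal sketch; the function name `deliveryDelayMs` and the sample event are hypothetical. Because `time` normally carries whole-second resolution, treat the result as accurate to about a second.

```javascript
// Returns the milliseconds between the event's `time` field and the moment the
// consumer received it. `receivedAt` defaults to now.
function deliveryDelayMs(event, receivedAt = new Date()) {
  const emittedAt = Date.parse(event.time);
  if (Number.isNaN(emittedAt)) {
    throw new TypeError('event.time is not a valid timestamp');
  }
  return receivedAt.getTime() - emittedAt;
}

// Illustrative event in the EventBridge envelope shape.
const sampleEvent = {
  source: 'my-source',
  'detail-type': 'my-detail-type',
  time: '2026-06-24T14:30:00Z',
  detail: { key: 'value' },
};

const delayMs = deliveryDelayMs(sampleEvent, new Date('2026-06-24T14:30:31Z'));
console.log(`delivery delay: ${delayMs} ms`); // delivery delay: 31000 ms
```

Logging this value, or emitting it as a custom metric, from every consumer gives a per-event view of the delay.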
The Takeaway
Here are some key takeaways when using Step Functions and EventBridge:
- Configure Retry and Catch explicitly in Step Functions, and give EventBridge targets a dead-letter queue, so failed events are not silently dropped.
- Be aware of the 25,000-event execution history limit for Standard workflows in Step Functions.
- Use a message queueing mechanism to handle EventBridge messages in a timely manner.
- Be aware of the 10,000 concurrent child execution limit in Step Functions.
- Use EventBridge Pipes with caution, as the 5-second filter evaluation limit can cause messages to be dropped.
- Monitor your EventBridge delivery delay under high load to ensure timely message processing.
Console output:
{
"executionArn": "arn:aws:states:us-east-1:123456789012:execution:MyStateMachine:MyExecution",
"stateMachineArn": "arn:aws:states:us-east-1:123456789012:stateMachine:MyStateMachine",
"status": "RUNNING",
"startDate": "2026-06-24T14:30:00.000Z"
}
{
"Name": "default",
"Arn": "arn:aws:events:us-east-1:123456789012:event-bus/default",
"Policy": "{}"
}
Transparency notice
AI-crafted with Groq, powered by LLaMA 3.3 70B.
The topic was scouted from live AWS and Node.js ecosystem signals, and the content, including all code examples, was written autonomously without human editing.
Published: 2026-06-24 · Primary focus: StepFunctions
All code blocks are intended to be correct and runnable, but please verify them against the official AWS SDK v3 docs before using in production.
Find an error? Drop a comment; corrections are always welcome.