Taming the Map State Beast in AWS Step Functions with CDK

So, let me tell you about one of the trickier corners of AWS Step Functions: the Map State. If you’ve ever wrestled with this guy before, you know it can be like trying to untangle Christmas lights: infuriating, bewildering, and eventually enough to make you doubt your existence. Don’t worry, you’ve come to the right place. Here’s how I tackled this bad boy using AWS CDK, and I even threw in some jitter for kicks! (Yeah, you read that right: jitter. We’re getting fancy!)

The Setup: Why Map State?

Map State in Step Functions is like that coworker who wants to do every task at once, even when that isn’t productive. It’s best known for running the same task against every item in a list. But (and there’s always a “but”) it comes with its eccentricities. For instance, it insists on collecting the results of every iteration and handing them back as its own payload, whether you asked for them or not. It’s like, hey man, I didn’t sign up for this, but somehow here I am.

In my project, I’m analyzing laboratory tests. I used the AWS CDK to create a Distributed Map with a max concurrency of 10: we’re efficient, but not harebrained. Here’s the kicker: by default, the Map State replaces the state’s input with its own output (an array holding the result of every iteration), which quickly becomes a hot mess. My solution? resultPath: JsonPath.DISCARD. Boom. Problem solved. The results are thrown away and the original input passes through untouched, which matters here because the step after the map still needs to read $.Payload.hasMorePages. No more unnecessary clutter in my state machine. It’s like telling the Map State, “We don’t want what you’re handing out. Just do your job and don’t mess it up like the people I work with do!”
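To make the effect of resultPath a bit more concrete, here is a minimal sketch in plain TypeScript that models how a state’s output is assembled from its input and its result. It is a simplified model and not the Step Functions implementation: applyResultPath and the DISCARD constant are hypothetical names, only one-level paths such as $.mapResults are handled, and the sample input is made up.

```typescript
type Json = Record<string, unknown>;

// Stand-in for JsonPath.DISCARD, which ends up as a null ResultPath
// in the rendered state machine definition.
const DISCARD = null;

// Hypothetical helper: a simplified model of ResultPath handling.
function applyResultPath(
  input: Json,
  result: unknown,
  resultPath: string | null = '$',
): unknown {
  if (resultPath === DISCARD) {
    return input; // drop the result, pass the input through unchanged
  }
  if (resultPath === '$') {
    return result; // default: the result replaces the whole input
  }
  if (!resultPath.startsWith('$.') || resultPath.slice(2).includes('.')) {
    throw new Error('this sketch only supports one-level paths like $.key');
  }
  // '$.key': keep the input and attach the result under that key
  return { ...input, [resultPath.slice(2)]: result };
}

// Made-up sample data shaped like the payload used in this post.
const stateInput: Json = {
  Payload: { hasMorePages: true, serviceRequests: ['sr-1', 'sr-2'] },
};
const mapResult = [{ status: 'ok' }, { status: 'ok' }];

const replaced = applyResultPath(stateInput, mapResult);
const merged = applyResultPath(stateInput, mapResult, '$.mapResults');
const discarded = applyResultPath(stateInput, mapResult, DISCARD);

console.log('default   :', JSON.stringify(replaced));
console.log('$.results :', JSON.stringify(merged));
console.log('DISCARD   :', JSON.stringify(discarded));
```

With the default path the pagination flag is gone after the map runs; with DISCARD the input survives as-is, which is what the “has more pages?” check relies on.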

The Code: Map State Done Right

Here’s how I handled it in CDK:

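import * as core from 'aws-cdk-lib';
import { LogGroup, RetentionDays } from 'aws-cdk-lib/aws-logs';
import {
  Choice,
  Condition,
  DefinitionBody,
  DistributedMap,
  Fail,
  ItemBatcher,
  JitterType,
  JsonPath,
  LogLevel,
  StateMachine,
  StateMachineType,
  Succeed,
} from 'aws-cdk-lib/aws-stepfunctions';

// The snippet below lives inside a construct or stack.
// serviceRequestPaginationTask and labResultsTask are tasks defined elsewhere (not shown).

const distributedMap = new DistributedMap(this, 'lab-results-distributed-map', {
  itemsPath: '$.Payload.serviceRequests', // the list to iterate over
  resultPath: JsonPath.DISCARD, // drop the payload like it’s hot
  maxConcurrency: 10, // at most 10 child executions at a time
  itemBatcher: new ItemBatcher({
    maxItemsPerBatch: 10, // each child execution gets up to 10 items
  }),
  mapExecutionType: StateMachineType.EXPRESS, // more on this later!
});

distributedMap.itemProcessor(labResultsTask);
distributedMap.addRetry({
  backoffRate: 2,
  interval: core.Duration.seconds(1),
  maxAttempts: 3,
  jitterStrategy: JitterType.FULL, // Jitter FTW
});

// Fetch a page, process it, then loop back while there are more pages.
const definition = serviceRequestPaginationTask.next(distributedMap).next(
  new Choice(this, 'has more pages?')
    .when(Condition.booleanEquals('$.Payload.hasMorePages', true), serviceRequestPaginationTask)
    .when(
      Condition.booleanEquals('$.Payload.hasMorePages', false),
      new Succeed(this, 'service-request-pagination-no-more-pages'),
    )
    .otherwise(
      new Fail(this, 'lab-result-sfn-job-failed', {
        cause: 'Unexpected Error',
      }),
    ),
);

const logGroup = new LogGroup(this, 'lab-result-sfn-log-group', {
  retention: RetentionDays.ONE_MONTH,
  removalPolicy: core.RemovalPolicy.DESTROY,
});

const stateMachine = new StateMachine(this, 'get-lab-results-sfn', {
  // `definition` is deprecated in recent CDK versions in favor of `definitionBody`
  definitionBody: DefinitionBody.fromChainable(definition),
  // note: Express executions are capped at 5 minutes, so that limit applies first
  timeout: core.Duration.minutes(15),
  stateMachineName: 'get-lab-results-sfn',
  stateMachineType: StateMachineType.EXPRESS,
  logs: {
    destination: logGroup,
    includeExecutionData: true,
    level: LogLevel.ALL,
  },
});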

Why Jitter Is Your Best Friend

Let’s talk about jitter. No, not the kind you get after three cups of espresso. I’m talking about adding some randomness to your retry delays. Why? Because when multiple executions fail and retry on the same schedule, they all pile back in at the same moment, like a traffic jam at rush hour on a weekday morning. Adding jitter spreads those retries out, so your system is less crowded and, therefore, more reliable. In my case, I used JitterType.FULL, which makes the retry delays about as random as my Netflix recommendations. I’m telling you, it’s a game changer.
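If you want to see what that randomness looks like in numbers, here is a rough sketch of the general “full jitter” idea using the same settings as my retry config (interval 1s, backoff rate 2, 3 attempts). It assumes full jitter means picking a random delay between zero and the plain exponential backoff delay; backoffSeconds and fullJitterSeconds are hypothetical helpers, not part of the CDK or the Step Functions API, and the service’s internal behavior may differ in detail.

```typescript
// Plain exponential backoff: interval, interval * rate, interval * rate^2, ...
function backoffSeconds(
  attempt: number,
  intervalSeconds: number,
  backoffRate: number,
): number {
  return intervalSeconds * Math.pow(backoffRate, attempt - 1);
}

// Full jitter (as assumed here): a random delay between 0 and the backoff delay.
// The random source is injectable so the function is easy to test.
function fullJitterSeconds(
  attempt: number,
  intervalSeconds: number,
  backoffRate: number,
  random: () => number = Math.random,
): number {
  return random() * backoffSeconds(attempt, intervalSeconds, backoffRate);
}

for (let attempt = 1; attempt <= 3; attempt++) {
  const ceiling = backoffSeconds(attempt, 1, 2);
  const delay = fullJitterSeconds(attempt, 1, 2);
  console.log(`retry ${attempt}: up to ${ceiling}s, waiting ${delay.toFixed(2)}s this time`);
}
```

Without jitter, every failed batch would retry after exactly 1s, 2s, and 4s. With it, each one lands somewhere inside those windows, so the retries stop arriving in lockstep.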

Express vs. Standard: The Battle of the State Machines

Now, let’s address the elephant in the room: Standard vs. Express. For this use case, I opted for Express. Why? Because it’s faster and more cost-efficient for workloads that involve a high number of short transactions. With Standard, you get execution history and nice-sounding durability, but here it’s like bringing an M1 Abrams to a go-kart race. My task finishes in less than two minutes, well within the five-minute limit on Express executions, which makes Express the right fit for processing lab results. Plus, it’s cheaper, and who doesn’t love a bargain these days?

Additionally, you can combine the best of both approaches, mixing Express and Standard workflows, as The Burning Monk explains in his insightful post.

Lessons Learned

Here’s the TL;DR:

  1. Map State Payload: Use resultPath: JsonPath.DISCARD to keep your workflow clean.
  2. Concurrency & Batching: Fine-tune maxConcurrency and itemBatcher to strike a balance between efficiency and resource usage.
  3. Jitter: Always use jitter for retries. Your future self will thank you.
  4. Express > Standard: Express is the way to go for short-lived, high-throughput tasks. Don’t overcomplicate it.
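For point 2, a quick back-of-the-envelope helper can show how maxConcurrency and maxItemsPerBatch play together. This is only a rough model under my own assumptions: planDistributedMap is a hypothetical function, the 250-item page is a made-up number, and “waves” is a simplification, since in practice a new child execution can start as soon as a slot frees up.

```typescript
interface MapPlan {
  childExecutions: number; // how many batches (child executions) are needed
  minWaves: number; // lower bound on rounds if every batch took the same time
  peakItemsInFlight: number; // most items being processed at once
}

// Hypothetical helper for rough sizing; not part of the CDK.
function planDistributedMap(
  totalItems: number,
  maxItemsPerBatch: number,
  maxConcurrency: number,
): MapPlan {
  const childExecutions = Math.ceil(totalItems / maxItemsPerBatch);
  return {
    childExecutions,
    minWaves: Math.ceil(childExecutions / maxConcurrency),
    peakItemsInFlight: Math.min(totalItems, maxItemsPerBatch * maxConcurrency),
  };
}

// Example: a made-up page of 250 items with the settings from this post.
const plan = planDistributedMap(250, 10, 10);
console.log(plan);
```

Raising maxItemsPerBatch cuts the number of child executions, while raising maxConcurrency cuts the number of rounds, and both push up how many items hit your downstream service at the same time.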

Cheers!
