<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kyle Stratis</title>
    <description>The latest articles on DEV Community by Kyle Stratis (@kyle_stratis).</description>
    <link>https://dev.to/kyle_stratis</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F20422%2F3e3f8672-1aba-4951-bbe5-49b39482d46b.jpg</url>
      <title>DEV Community: Kyle Stratis</title>
      <link>https://dev.to/kyle_stratis</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kyle_stratis"/>
    <language>en</language>
    <item>
      <title>Trigger an AWS Step Function with an API Gateway REST API using CDK</title>
      <dc:creator>Kyle Stratis</dc:creator>
      <pubDate>Sat, 25 Sep 2021 20:38:22 +0000</pubDate>
      <link>https://dev.to/kyle_stratis/trigger-an-aws-step-function-with-an-api-gateway-rest-api-using-cdk-1nb3</link>
      <guid>https://dev.to/kyle_stratis/trigger-an-aws-step-function-with-an-api-gateway-rest-api-using-cdk-1nb3</guid>
      <description>&lt;p&gt;AWS documentation can be rough. Have you ever looked for an example of something you're trying to set up but only finding bits and pieces of what you need across several different sites? That was my experience recently when trying to set up a REST API with API Gateway that would trigger a Step Function.&lt;/p&gt;

&lt;p&gt;There are some good tutorials and examples for doing this in the AWS console, but for infrastructure-as-code geeks there isn't nearly as much. And so, this tutorial. In it, you will learn how to use CDK to set up the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an API Gateway REST API that takes a single parameter&lt;/li&gt;
&lt;li&gt;an IAM role that allows the API to connect to your Step Function&lt;/li&gt;
&lt;li&gt;an API Gateway integration to connect your API to your Step Function, passing along a parameter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This tutorial is for current CDK users looking for examples of connecting AWS services like Step Functions to APIs set up in CDK. While it uses the Python CDK, translating to TypeScript or other languages should be trivial.&lt;/p&gt;

&lt;p&gt;🚨 &lt;strong&gt;WARNING:&lt;/strong&gt; 🚨 Deploying to AWS may incur charges. To avoid lingering charges, tear down any deployed resources with &lt;code&gt;cdk destroy&lt;/code&gt; when you're done.&lt;/p&gt;

&lt;h2&gt;Define the Step Function&lt;/h2&gt;

&lt;p&gt;Define a step function as you usually would. For this article, let's assume you created a step function called &lt;code&gt;item_step_function&lt;/code&gt;.&lt;/p&gt;
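&lt;p&gt;As a rough sketch (hedged: the single &lt;code&gt;Pass&lt;/code&gt; state and the construct IDs below are placeholders, not this tutorial's real workflow), a minimal definition might look like:&lt;/p&gt;

```python
from aws_cdk import aws_stepfunctions as sfn

# Placeholder state machine: a single Pass state that echoes its input.
# "self" is the surrounding Stack, as in any CDK construct definition.
item_step_function = sfn.StateMachine(
    self,
    "item-step-function",
    definition=sfn.Pass(self, "pass-through"),
)
```

&lt;p&gt;Any state machine works here; the rest of the tutorial only needs the construct's &lt;code&gt;state_machine_arn&lt;/code&gt;.&lt;/p&gt;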

&lt;h2&gt;Create the API&lt;/h2&gt;

&lt;p&gt;Use &lt;code&gt;aws_cdk.aws_apigateway&lt;/code&gt;'s &lt;code&gt;RestApi&lt;/code&gt; constructor to create the base API object. You will use this for all further API setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;aws_cdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;core&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;aws_apigateway&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;aws_iam&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;aws_stepfunctions&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sfn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;item_step_function&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sfn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StateMachine&lt;/span&gt;&lt;span class="p"&gt;([...])&lt;/span&gt;
&lt;span class="n"&gt;item_api&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RestApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"item-api"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🚨 &lt;strong&gt;WARNING&lt;/strong&gt; 🚨 This example code does no authorization beyond what AWS does by default. You may wish to add further security measures for a production workload.&lt;/p&gt;

&lt;h3&gt;Set up Role&lt;/h3&gt;

&lt;p&gt;Set up IAM permissions as early as you can: the proper permissions are what allow your API to trigger your step function. You will use the &lt;code&gt;Role&lt;/code&gt; construct in the &lt;code&gt;aws_iam&lt;/code&gt; package for this.&lt;/p&gt;

&lt;p&gt;First, you will instantiate the &lt;code&gt;Role&lt;/code&gt;, give it a name, and pick the service that will assume it. Since you want API Gateway to have access to Step Functions, you will use &lt;code&gt;"apigateway.amazonaws.com"&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;item_api_role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Role&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"item-api-role"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;role_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"item-api-role"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;assumed_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServicePrincipal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"apigateway.amazonaws.com"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once that is set up, you will need to add a policy to the Role, which defines the permissions that the Role grants to the service that assumes it. AWS IAM provides several managed policies that cover most use cases, so it is unlikely that you will need to craft your own. For this guide, you will use the &lt;code&gt;AWSStepFunctionsFullAccess&lt;/code&gt; managed policy, but in most cases, you will want a more restrictive managed or custom-built policy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;item_api_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_managed_policy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ManagedPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_aws_managed_policy_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"AWSStepFunctionsFullAccess"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With these two calls, you've created a role with a policy that will allow your API to interact with your Step Function. You will still need to link this Role with the API itself, but that comes later.&lt;/p&gt;

&lt;h2&gt;Set up resources&lt;/h2&gt;

&lt;p&gt;Resources are any path-based pieces of your request URI. While you can nest resources using the &lt;code&gt;.add_resource()&lt;/code&gt; method, you will add a single resource level for this tutorial. To use a resource as a parameter, surround your parameter name with curly braces. Note that you have to add it to the &lt;code&gt;root&lt;/code&gt; of the API object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;step_function_trigger_resource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;item_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{item_id}"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your request URI will look something like &lt;code&gt;https://aws-generated-tld.com/1337&lt;/code&gt;, where &lt;code&gt;1337&lt;/code&gt; is the &lt;code&gt;item_id&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;🚨 &lt;strong&gt;Note&lt;/strong&gt; 🚨 You can also set up a query string parameter if you wish. For this tutorial, we will stick with path-based parameters.&lt;/p&gt;
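&lt;p&gt;As a hedged sketch of the query string variant (the resource name and parameter mapping below are illustrative, and &lt;code&gt;item_sfn_integration&lt;/code&gt; is the integration you'll build later in this tutorial): declare the parameter on the method instead of in the path. Conveniently, &lt;code&gt;$input.params('item_id')&lt;/code&gt; searches path, query string, and header parameters, so the request template can stay the same:&lt;/p&gt;

```python
# Illustrative query-string variant: POST /items?item_id=1337
items_resource = item_api.root.add_resource("items")
items_resource.add_method(
    "POST",
    item_sfn_integration,
    # Declare (and require) the query string parameter on the method.
    request_parameters={"method.request.querystring.item_id": True},
)
```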

&lt;h2&gt;Connect Your API to Your Step Function&lt;/h2&gt;

&lt;p&gt;CDK provides an &lt;code&gt;AwsIntegration&lt;/code&gt; construct that is supposed to make it easier to integrate with other AWS services. It does not. At least, not by itself.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;AwsIntegration&lt;/code&gt; construct is difficult to use because implementations for different services aren't well-documented. You may not even know the internal service name for Step Functions, or for whatever other service you wish to integrate, and it can be hard to track down. (If you do have trouble, the AWS CLI is here to help: &lt;code&gt;aws list-services&lt;/code&gt;.)&lt;/p&gt;

&lt;h3&gt;Request Templates&lt;/h3&gt;

&lt;p&gt;Before creating the integration itself, you need to set up a request template. A request template lets you build the request you are making to your Step Function, including passing your API parameters along to it.&lt;/p&gt;

&lt;p&gt;The template is a dictionary and should have a single key of &lt;code&gt;"application/json"&lt;/code&gt;. Its value is a JSONified dictionary with your step function's ARN and &lt;code&gt;input&lt;/code&gt;, which is part of the &lt;a href="https://docs.aws.amazon.com/step-functions/latest/apireference/API_StartExecution.html"&gt;Step Function StartExecution request syntax&lt;/a&gt;. You will use some methods built into the request from Amazon, specifically &lt;code&gt;$input.params()&lt;/code&gt;, which &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-mapping-template-reference.html#input-variable-reference"&gt;allows you to grab some or all of your request's parameters&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I strongly recommend escaping any JavaScript by wrapping your &lt;code&gt;$input.params()&lt;/code&gt; call in &lt;code&gt;$util.escapeJavaScript()&lt;/code&gt;:&lt;br&gt;
&lt;code&gt;"$util.escapeJavaScript($input.params('item_id'))"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Your template should look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;request_template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"stateMachineArn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item_state_machine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state_machine_arn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;item_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;$util.escapeJavaScript($input.params('item_id'))&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
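&lt;p&gt;Note that &lt;code&gt;input&lt;/code&gt; is itself a JSON-encoded &lt;em&gt;string&lt;/em&gt;, not a nested object. A quick, runnable illustration of roughly what API Gateway hands to &lt;code&gt;StartExecution&lt;/code&gt; after rendering the template (the ARN is made up, and this assumes the &lt;code&gt;item_id&lt;/code&gt; path parameter resolved to &lt;code&gt;1337&lt;/code&gt;):&lt;/p&gt;

```python
import json

# Roughly what the rendered StartExecution request body looks like once
# API Gateway substitutes the path parameter (ARN and value are made up):
rendered = {
    "stateMachineArn": "arn:aws:states:us-east-1:123456789012:stateMachine:item",
    "input": "{\"item_id\": \"1337\"}",  # a JSON string, not a nested object
}

# The state machine execution receives the parsed inner document as its input:
execution_input = json.loads(rendered["input"])
print(execution_input)  # → {'item_id': '1337'}
```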



&lt;h3&gt;Step Function Integration&lt;/h3&gt;

&lt;p&gt;Next, you will define the integration itself. The integration requires you to set a few parameters, but it's not always clear from the CDK documentation which ones to set. This ambiguity comes from how general the &lt;code&gt;AwsIntegration&lt;/code&gt; construct is: it allows you to use any service, but you have to know what parameters that service's requests need.&lt;/p&gt;

&lt;p&gt;For all integrations, you need to provide the service name. For Step Functions, it's &lt;code&gt;"states"&lt;/code&gt;. Then you need to provide the action you want to perform. To start a Step Function execution, you'll use &lt;code&gt;"StartExecution"&lt;/code&gt;. Again, this is determined by the service's API, and you can read more about &lt;code&gt;"StartExecution"&lt;/code&gt; &lt;a href="https://docs.aws.amazon.com/step-functions/latest/apireference/API_StartExecution.html"&gt;in the AWS Step Functions documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Then, you'll provide options. In CDK, these are &lt;a href="https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-apigateway.IntegrationOptions.html"&gt;IntegrationOptions&lt;/a&gt;. Here, you can define many options, but for this tutorial, the important ones are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;credentials_role&lt;/code&gt;: This takes the &lt;code&gt;item_api_role&lt;/code&gt; you set up earlier, attaching the Role (and its policy) to the API itself via the integration.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;integration_responses&lt;/code&gt;: This is a list of possible responses to API requests. At the very least, you'll want to return an &lt;code&gt;IntegrationResponse&lt;/code&gt; object with a 200 status code, but you can define all sorts of situations that would trigger different status codes.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;request_templates&lt;/code&gt;: This is where you'll attach the request template you made in the previous section.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Investigate these options and determine which are right for your use case. To keep things simple, this example will pass credentials through the integration to the integrated service and only return a 200 response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;item_sfn_integration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AwsIntegration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"states"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"StartExecution"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IntegrationOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;credentials_role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;item_api_role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;integration_responses&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IntegrationResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"200"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;request_templates&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;request_template&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, you created an integration between the Step Functions service and the API itself. You can think of the integration as the portal that your parameters pass through when traveling from the API to your integrated service.&lt;/p&gt;

&lt;h2&gt;Connect Integration to REST verbs&lt;/h2&gt;

&lt;p&gt;Do you remember the &lt;code&gt;resource&lt;/code&gt; you set up earlier, defining the &lt;code&gt;item_id&lt;/code&gt; parameter for the API? This object comes with an &lt;code&gt;add_method()&lt;/code&gt; function, which you can use to connect a REST verb (such as &lt;code&gt;GET&lt;/code&gt;, &lt;code&gt;POST&lt;/code&gt;, &lt;code&gt;PUT&lt;/code&gt;, etc.) to your integration. Doing this will allow a request using the correct verb to reach your integration.&lt;/p&gt;

&lt;p&gt;Since you're sending data via the API, you'll use &lt;code&gt;POST&lt;/code&gt; and wire it to &lt;code&gt;item_sfn_integration&lt;/code&gt; like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;step_function_trigger_resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"POST"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;item_sfn_integration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;method_responses&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MethodResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"200"&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, you connected your integration to your &lt;code&gt;step_function_trigger_resource&lt;/code&gt; via the &lt;code&gt;POST&lt;/code&gt; verb, and you set it to respond with a &lt;code&gt;200&lt;/code&gt; response status. Like the integration, you can set multiple method response statuses.&lt;/p&gt;
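&lt;p&gt;As an illustrative sketch of a second response pair (hedged: the selection pattern below is a guessed-at default, not taken from this tutorial), you could map backend errors to a &lt;code&gt;400&lt;/code&gt; on both the integration and the method:&lt;/p&gt;

```python
# Illustrative: map backend 4xx errors to a 400 API response.
# selection_pattern is a regex API Gateway tests against the backend response
# to decide which IntegrationResponse applies.
error_integration_response = apigateway.IntegrationResponse(
    status_code="400",
    selection_pattern="4\\d{2}",
)
# Include it alongside the 200 in integration_responses=[...], and add a
# matching apigateway.MethodResponse(status_code="400") to method_responses.
```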

&lt;h2&gt;Testing and Wrapping Up&lt;/h2&gt;

&lt;p&gt;To test this, deploy with &lt;code&gt;cdk deploy &amp;lt;stack-name&amp;gt;&lt;/code&gt;, open the &lt;a href="https://console.aws.amazon.com/apigateway"&gt;API Gateway console&lt;/a&gt;, and navigate to your API and the REST verb you set up. Click &lt;strong&gt;Test&lt;/strong&gt;, add your parameter, and check the output. You can also navigate to the Step Functions console and check on your execution.&lt;/p&gt;
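&lt;p&gt;If you'd rather poke at it from code, something like this works too (hedged: the invoke URL below is a placeholder; &lt;code&gt;cdk deploy&lt;/code&gt; prints the real one for your API):&lt;/p&gt;

```python
import urllib.request

# Placeholder invoke URL; substitute the one cdk deploy prints for your API.
api_url = "https://abc123.execute-api.us-east-1.amazonaws.com/prod"
item_id = "1337"

# POST to /{item_id}, matching the resource and verb wired up above.
req = urllib.request.Request(f"{api_url}/{item_id}", method="POST")
print(req.get_method(), req.full_url)

# Uncomment to actually send the request to the deployed API:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read())
```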

&lt;h2&gt;What's Next?&lt;/h2&gt;

&lt;p&gt;Now that you've learned how to wire an API up to a Step Function, you can do several things to dive deeper. Here are some suggestions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Change your path parameter to a query string. How does this change how you set up the API and the service integration?&lt;/li&gt;
&lt;li&gt;Make a more complex API. Multiple resources and multiple levels of resources. Can you mimic the structure of a public API, like &lt;a href="https://www.reddit.com/dev/api/"&gt;Reddit's&lt;/a&gt;?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Full Example&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;aws_cdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;core&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;aws_apigateway&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;aws_iam&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;aws_stepfunctions&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sfn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;item_step_function&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sfn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StateMachine&lt;/span&gt;&lt;span class="p"&gt;([...])&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the API
&lt;/span&gt;&lt;span class="n"&gt;item_api&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RestApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"item-api"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Set up IAM role and policy
&lt;/span&gt;&lt;span class="n"&gt;item_api_role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Role&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"item-api-role"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;role_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"item-api-role"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;assumed_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServicePrincipal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"apigateway.amazonaws.com"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;item_api_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_managed_policy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ManagedPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_aws_managed_policy_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"AWSStepFunctionsFullAccess"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Set up API resources
&lt;/span&gt;&lt;span class="n"&gt;step_function_trigger_resource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;item_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{item_id}"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Set up request template and integration
&lt;/span&gt;&lt;span class="n"&gt;request_template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s"&gt;"stateMachineArn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item_state_machine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state_machine_arn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;item_id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;$util.escapeJavaScript($input.params('item_id'))&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;item_sfn_integration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AwsIntegration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"states"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"StartExecution"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IntegrationOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;credentials_role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;item_api_role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;integration_responses&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IntegrationResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"200"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;request_templates&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;request_template&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Connect integrations to REST verbs
&lt;/span&gt;&lt;span class="n"&gt;step_function_trigger_resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"POST"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;item_sfn_integration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;method_responses&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MethodResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"200"&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>aws</category>
      <category>infrastructureascode</category>
      <category>cdk</category>
      <category>python</category>
    </item>
    <item>
      <title>How a Side Project Helped Me Double My Salary</title>
      <dc:creator>Kyle Stratis</dc:creator>
      <pubDate>Sun, 04 Feb 2018 05:44:57 +0000</pubDate>
      <link>https://dev.to/kyle_stratis/how-a-side-project-helped-me-double-my-salary-8g1</link>
      <guid>https://dev.to/kyle_stratis/how-a-side-project-helped-me-double-my-salary-8g1</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://kylestratis.com"&gt;my personal blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The dust has settled. The boxes are (mostly) unpacked. The cats have claimed their perches. At the beginning of January, Tallahassee got its first real snow in decades, and my wife and I prepared to take our adventure to our mutual dream city, Boston. While the 1,300-mile trip could consume a post or two on its own, today I'd like to talk about what brought me from Tallahassee to Boston, and how I got there.&lt;/p&gt;

&lt;p&gt;Like everything in life, there was an element of luck involved. I was lucky to have a great recruiter working for me, and I was lucky to interview with people who saw the value in my projects in particular (and side projects in general) and saw a fit for me on a fast-moving team working on highly experimental data tooling. However, the harder you work, the luckier you seem to get - and there are elements of this experience that I think can aid anyone in a job search.&lt;/p&gt;

&lt;h2&gt;The Beginning of the Job Search&lt;/h2&gt;

&lt;p&gt;What motivated me to look in the first place? I enjoyed the team I worked with (the building we were in? Not so much), and while we never saw Tallahassee as our forever home, my wife and I had both made great lifelong friends there and had a great routine. Well, as we were preparing for our first-anniversary trip, a recruiter for Amazon reached out to me on LinkedIn. I wasn't planning on pursuing it, but after the results of pitching &lt;a href="https://danqex.com"&gt;Danqex&lt;/a&gt; (formerly NASDANQ - and fodder for another post) and with the encouragement of my career sherpa (who also works for Amazon), I decided to go for it. That got me itching to see what was out there given my interests and experience - and there was a lot. I figured I'd look for remote opportunities, save up some money, then move to Boston in the early summer or fall. The best laid plans, yadda yadda yadda.&lt;/p&gt;

&lt;h2&gt;Updating My Resume&lt;/h2&gt;

&lt;p&gt;To prepare for this, I had to update my resume. I added some projects and experience from my time at Homes.com, but I think the most important addition was my work as Danqex cofounder, CEO, and data lead. This was something I worried about - I didn't want to give the impression that I'd up and leave right away for Danqex, but at the same time it was (and continues to be) a source of super-concentrated experience in a number of areas: team management, project management, development, technology selection, even dealing with investors and knowing a bit about how funding works. So I added it to my resume - one thing I've learned in my career is to be completely upfront in the job search, because the hunt is a lot like dating: a good fit is far more important than the quickest almost-fit. Danqex also worked as a great conversation starter - who doesn't like to talk memes?&lt;/p&gt;

&lt;h2&gt;
  
  
  Mismatches and Encouragement
&lt;/h2&gt;

&lt;p&gt;The Amazon interview came and went. My wife and I had just gotten back from our trip, and I hadn't had much time to review data structures and algorithms. I wasn't aware that I'd have a coding test (always a source of anxiety for me) at the first phone interview, so when it came I powered through the anxiety because I had no choice. Getting that out of the way was great, and I actually enjoyed the problem - but unfortunately, Amazon decided to pass on me. It was disappointing, though it did mean I didn't have to move to Seattle. &lt;/p&gt;

&lt;p&gt;I had quite a few other interviews, which was actually very encouraging because my previous job searches rarely got past the resume submission. These were often great - geeking out with someone about technologies we loved and the work we'd do. Unfortunately, a lot of them were not good matches, often because I lacked experience with their specific technologies. The types of companies that don't (or can't) encourage on-the-job learning aren't the ones I necessarily want to work for - picking up a language well enough to work in it is pretty quick, even though mastering it takes many hours. Picking up supporting technologies (think Kafka, etc.) is quicker still.&lt;/p&gt;

&lt;p&gt;One job I applied to turned out to be in Boston and was handled through an external recruitment firm. I was nervous when this became clear, because I'd only heard bad things about such firms, but I am incredibly grateful for the experience - it worked out perfectly. The business model is interesting: companies that don't have the resources to do their own recruitment at scale - in this industry, mostly startups or companies undergoing very rapid growth - pay another firm to do it for them. The firm behind the ad I answered was &lt;a href="https://www.winterwyman.com"&gt;WinterWyman&lt;/a&gt;, which has teams dedicated to different fields. The recruiter, Jamie, contacted me and said he didn't think the role I'd applied to would be the ideal fit, but that he'd talk to the company anyway - in the meantime, he wanted to know my priorities for my career and what mattered to me in a company. I told him I was ready to make an impact on society in some way - one of the jobs I had applied for was with a company researching mental health issues as detected in social media posts. I wanted to take point on my projects and have ownership, have the opportunity to advance, work with interesting technologies, and process lots of data, but my most important priority was impactful work. He returned with a list of companies; I did some research and picked a few. One didn't want me because I didn't have a CS degree (their loss - and I'm happy not to work in a culture with that attitude), a few others dropped off, but one in particular was right up my alley and was really interested in me.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Match
&lt;/h2&gt;

&lt;p&gt;This company was &lt;a href="https://patientslikeme.com"&gt;PatientsLikeMe&lt;/a&gt;, which has a track record not only of improving patients' lives by connecting them in support networks, but also of doing groundbreaking research with the data that users of the platform provide. Impactful? You bet. They wanted a data engineer for a new data engineering team that supports the research team and builds tools to trial before bringing them to full production status. Ownership? Plenty. I had two phone interviews, one with my future boss and one with a teammate. Both were a lot of fun - we talked about Danqex/NASDANQ, my experience, my educational background, and more. Jamie helped me prepare for both, and he called quickly to let me know that the team was really excited and would be setting up a trip to Cambridge for an in-person interview. &lt;/p&gt;

&lt;p&gt;I flew into Logan (almost missing my flight because my car greeted us that morning with a flat), spent a few days with my dad, and then got settled in my hotel in Cambridge. My recruiter had given me plenty of detail on what to expect for the interview, which helped put me at ease, and after a good night of sleep I got dressed and headed over to the PLM offices. There I went through a few rounds of interviewing: two with my future boss, one with the other team members, one with HR, and one with one of the scientists on the biocomputing team, whom we'd be supporting. The topics ranged from the function of outer hair cells (the subject of my research in grad school) to designing database tables for a given set of features, to Raspberry Pis, to how to trade memes for profit. Rather than an interview, it felt like meeting a bunch of interesting, smart people who geek out over the same things I do and chatting about our passions. It was fun. &lt;/p&gt;

&lt;p&gt;After the interview I met with some family, and before leaving I got to meet Jamie in person for breakfast. He informed me that another candidate was being interviewed, but that I'd hear something within the next couple of weeks. A little over one long week later, Jamie called to let me know an offer was coming. It arrived, it was exactly what I was looking for, and the match was officially made. &lt;/p&gt;

&lt;p&gt;One mild issue - the job was not remote. We'd be moving to Boston on the heels of record-breaking cold with plenty of winter left for us.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons learned
&lt;/h2&gt;

&lt;p&gt;Wrapped up in all of that are a few lessons worth pulling out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Work on your side projects, take them farther than makes sense, and be a cheerleader for them and the work you did on them. &lt;/li&gt;
&lt;li&gt;Be prepared to answer any hard questions about those projects truthfully. One question I was asked was how my priorities would shift with this job in relation to Danqex. Of course, the job would come first; Danqex was born as a side project, and that's how we're equipped to work on it. You'll likely get those questions and more about your projects, and you should know them inside and out. Pitching to investors, while not feasible for everyone, was great preparation for this.&lt;/li&gt;
&lt;li&gt;Get outside of your comfort zone with technologies you work with. This is especially useful if you, like I was, work with a less in-demand language at your current job. &lt;/li&gt;
&lt;li&gt;Find the &lt;strong&gt;best&lt;/strong&gt; match, not the first yes. I've been working at PLM for just under 3 weeks now, and it has been the best match for me: a fun, stimulating culture (we have journal club 1 to 3 times a week, and also a well-stocked beer fridge!), brilliant people to work with (2 of my teammates have PhDs from MIT; the other studied at WPI), a team that's all about rapidly prototyping a tool, proving it out, and then handing it off to another team to maintain, and a shared drive to truly push the science of chronic illness and improve the lives of our patients in tangible ways. Work that truly matters is one of the greatest motivators of all. &lt;/li&gt;
&lt;li&gt;To help with the above - don't wait (if you can help it) until you absolutely need a job to start looking for your next step. This keeps you from making spur-of-the-moment emotional decisions, and it keeps the ball in your court as you wade through rejections and negotiations. &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>salary</category>
      <category>career</category>
      <category>hustle</category>
    </item>
    <item>
      <title>A MongoDB Optimization</title>
      <dc:creator>Kyle Stratis</dc:creator>
      <pubDate>Sun, 29 Oct 2017 21:43:51 +0000</pubDate>
      <link>https://dev.to/kyle_stratis/a-mongodb-optimization-e64</link>
      <guid>https://dev.to/kyle_stratis/a-mongodb-optimization-e64</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://www.kylestratis.com/post/mongodb-aggregation-pipelines-to-reduce-time-of-data-operations"&gt;my personal blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Recently at Homes.com, one of my coworkers was charged with speeding up a batch process that we were required to run at a scheduled interval. No big deal, but he was stuck: the process required a number of steps at each typical stage - identifying the data we needed to pull, pulling it, transforming it, and writing the transformed data back to Mongo. As he was describing the process, I realized it was a perfect use case for Mongo's aggregation framework. I offered to help, based on the experience with the framework I'd gained while working on &lt;a href="https://nasdanq.com"&gt;NASDANQ&lt;/a&gt;, and immediately got to work designing an aggregation pipeline to handle the process.&lt;/p&gt;

&lt;h2&gt;
  
  
  Original Solution
&lt;/h2&gt;

&lt;p&gt;This batch process exists to update mean and median home value data for a given area. A rough overview is laid out below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query a large collection (&amp;gt; 1 TB) for two fields that together identify the area in which a property resides.&lt;/li&gt;
&lt;li&gt;Join the two fields into a single ID.&lt;/li&gt;
&lt;li&gt;Use that ID to pull location data from another collection (&amp;gt; 25 GB) on a separate database.&lt;/li&gt;
&lt;li&gt;For each property in the area, pull the price from yet another collection.&lt;/li&gt;
&lt;li&gt;Load the prices into an array in our script and iterate over it to find the mean and median.&lt;/li&gt;
&lt;/ul&gt;
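&lt;p&gt;That last in-code step is simple enough to sketch in a few lines of Python; the prices below are made-up placeholder values, not real data:&lt;/p&gt;

```python
# Minimal sketch of the original in-code statistics step; the prices
# here are made-up placeholder values standing in for one area's data.
from statistics import mean, median

prices = [250_000, 310_000, 275_000, 405_000, 298_000]
area_mean = mean(prices)      # arithmetic mean of the area's prices
area_median = median(prices)  # middle value after sorting
```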

&lt;p&gt;At our estimates, if we could run this process to completion with no interruption, it would take ~104 days to finish. This is unacceptable for obvious reasons. &lt;/p&gt;

&lt;h2&gt;
  
  
  Attempt the First
&lt;/h2&gt;

&lt;p&gt;We ran into an architectural issue early on. MongoDB's aggregation pipeline doesn't support working across multiple databases, and the collections we needed were split between a few of them. Luckily, we were able to move the collections onto a single database so we could start testing the pipeline. We started with a 9-stage monster - the multiple collections we had to match on required multiple match stages plus a stage to perform what was essentially a join. Because the data is held in memory at each stage, and each stage is &lt;a href="https://docs.mongodb.com/manual/core/aggregation-pipeline-limits/"&gt;limited to 100MB of memory&lt;/a&gt;, we first switched on the &lt;code&gt;allowDiskUse&lt;/code&gt; option to let the pipeline at least run. And run it did. Our DBA team notified us that we were spiking memory usage to unacceptable levels while the pipeline was running.&lt;/p&gt;

&lt;p&gt;We reduced the pipeline to 7 stages: we take location data as input to the pipeline, match it against the 25GB collection, use &lt;code&gt;$lookup&lt;/code&gt; to join in the 1TB collection (which holds the ID fields) on one of those fields, project the fields we actually want, unwind, redact, sort by value, group by location (2 fields) while taking an average, and sort again. The pipeline looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$match-&amp;gt;$lookup-&amp;gt;$project-&amp;gt;$unwind-&amp;gt;$redact
                                      |
                                      V
       VALUES&amp;lt;-$sort&amp;lt;-$avg&amp;lt;-$group&amp;lt;-$sort

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
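&lt;p&gt;For the code-minded, here is one plausible rendering of that diagram as a PyMongo-style pipeline definition in Python. The collection and field names are hypothetical stand-ins, not our production schema, and note that &lt;code&gt;$avg&lt;/code&gt; is an accumulator inside &lt;code&gt;$group&lt;/code&gt; rather than a stage of its own:&lt;/p&gt;

```python
# One plausible rendering of the pipeline diagrammed above.
# Collection and field names are illustrative, not the real schema.
pipeline = [
    # Match incoming location data against the 25GB collection.
    {"$match": {"location_id": {"$in": ["loc_a", "loc_b"]}}},
    # Join in the (1TB) collection holding the ID fields.
    {"$lookup": {
        "from": "properties",        # hypothetical 1TB collection
        "localField": "property_id",
        "foreignField": "property_id",
        "as": "props",
    }},
    # Keep only the fields we actually want.
    {"$project": {"location_id": 1, "props.value": 1}},
    # One document per joined property.
    {"$unwind": "$props"},
    # Prune documents that shouldn't contribute to the averages.
    {"$redact": {"$cond": {
        "if": {"$gt": ["$props.value", 0]},
        "then": "$$KEEP",
        "else": "$$PRUNE",
    }}},
    # Sort by value, group by location while averaging, sort again.
    {"$sort": {"props.value": 1}},
    {"$group": {"_id": "$location_id",
                "avg_value": {"$avg": "$props.value"}}},
    {"$sort": {"avg_value": 1}},
]
```

&lt;p&gt;With a live connection, this would run as &lt;code&gt;db.locations.aggregate(pipeline, allowDiskUse=True)&lt;/code&gt;.&lt;/p&gt;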



&lt;p&gt;This failed at &lt;code&gt;$lookup&lt;/code&gt;. Why? For many locations, we were joining in nearly 1 million documents per month of data, and we wanted multiple years of it. Across all locations, this not only failed to solve our performance problem - it failed to run at all. &lt;/p&gt;

&lt;h2&gt;
  
  
  Attempt the Second
&lt;/h2&gt;

&lt;p&gt;Our first thought was to add the unique identifier (a combination of two fields from different collections) to the 1TB collection, but that wasn't workable due to the collection's size. Instead, we projected a concatenated version of the two fields we were using as a UID and joined in the 25GB collection on that with &lt;code&gt;$lookup&lt;/code&gt; - it's much faster to make this change against the smaller collection. Simultaneously, we tested the performance of sorting in our ETL code vs. within the pipeline itself; since the time taken to run these sorts in code was trivial, we could remove those stages from the pipeline. At this point, processing all IDs for a single month was projected to take 10 days - and we needed to run multiple years, which comes out to worse performance than the original solution. &lt;/p&gt;
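&lt;p&gt;The shape of that change can be sketched in Python as a two-stage fragment - field and collection names here are hypothetical, not our real schema:&lt;/p&gt;

```python
# Hypothetical sketch of the attempt-2 change: build the UID with
# $concat in a $project stage, then join on it with $lookup.
# Field and collection names are illustrative.
uid_pipeline = [
    # Concatenate the two identifying fields into a single UID.
    {"$project": {
        "uid": {"$concat": ["$region_code", "-", "$area_code"]},
        "value": 1,
    }},
    # Join in the smaller (25GB) location collection on that UID.
    {"$lookup": {
        "from": "locations",    # hypothetical 25GB collection
        "localField": "uid",
        "foreignField": "uid",
        "as": "location",
    }},
]
```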

&lt;p&gt;However, an interesting finding came when we ran a single identifier at a time: one identifier for one month took about a minute, but one identifier over 3 years took only 10 minutes. I mention this because it demonstrates how nicely aggregation pipeline performance scales as the dataset grows. And remember the projection of 10 days for all IDs over one month? In light of the results showing that pipeline time does not scale linearly with data size, we ran the pipeline and found that a single month for all identifiers actually took about 14 hours. A big improvement, but not enough - so we looked for more optimizations. &lt;/p&gt;

&lt;h2&gt;
  
  
  Attempt the Third
&lt;/h2&gt;

&lt;p&gt;This was a smaller change, but on its own it made a big difference in processing time. We re-architected our data so that we had a temporary collection holding a single month of data - we generally process one month at a time, regardless of the overall time span we want. This cut the time in half from the previous attempt: a single month for all identifiers now took only 7 hours, since we were no longer querying the full 1TB collection when we only wanted a small piece of it. Creating the temporary collections is trivial and is done by our counterparts on the DBA team as part of their data-loading procedures. &lt;/p&gt;

&lt;h2&gt;
  
  
  Attempt the Fourth
&lt;/h2&gt;

&lt;p&gt;After seeing these results, management was fully convinced of the need to redesign how we stored this data. Let this be a lesson: hard data is &lt;em&gt;very&lt;/em&gt; convincing. Now each month's data lives in its own collection, and when we load data, the location data is added to these monthly collections, avoiding costly joins via &lt;code&gt;$lookup&lt;/code&gt;. Surprisingly, in testing, adding this information did not impact our overall data-preloading times. The location data was also indexed for quicker querying. All of this let us go from a 7-stage aggregation pipeline to 3 stages: start with the split collections, project the fields we are interested in (location, value, etc.), group on location - averaging the values and also pushing each individual value into an array (for sorting and finding the median in code) - and output to a temp collection. To process a period longer than a month, we rinse and repeat. &lt;/p&gt;

&lt;p&gt;For all identifiers in each month, our processing time went from 7 hours to 8 &lt;em&gt;minutes&lt;/em&gt;. The queries made against the generated collections - fetching the computed averages plus the arrays of individual values for the in-code median - added a minute per output collection when done serially. Being primarily ETL pipeline builders, we do nothing serially: tested with 7 workers, the added processing time dropped from a minute to 30 seconds, and in production our worker pool numbers in the hundreds, so this overhead was satisfactory. Assuming a conservative 7.5 minutes per month of data, projecting to 3 years gave an estimated runtime of around 4.5 hours. We were happy with that, especially considering the original process was projected to take 104 days.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$project-&amp;gt;$group-&amp;gt;$out
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
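&lt;p&gt;As a Python sketch, with hypothetical field names, the final pipeline might look like this (the average is computed by the &lt;code&gt;$avg&lt;/code&gt; accumulator inside &lt;code&gt;$group&lt;/code&gt;, and &lt;code&gt;$push&lt;/code&gt; collects the raw values for the in-code median):&lt;/p&gt;

```python
# Hypothetical sketch of the final 3-stage pipeline over one monthly
# collection (location data is preloaded, so no $lookup is needed).
monthly_pipeline = [
    # Keep only the fields of interest.
    {"$project": {"location_id": 1, "value": 1}},
    # Average in the pipeline; push raw values for the in-code median.
    {"$group": {
        "_id": "$location_id",
        "avg_value": {"$avg": "$value"},
        "values": {"$push": "$value"},
    }},
    # Write results to a temp collection for the ETL workers to query.
    {"$out": "monthly_values_tmp"},
]
```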


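&lt;p&gt;The serial-vs-parallel follow-up queries can be sketched with a small worker pool; &lt;code&gt;fetch_medians&lt;/code&gt; here is a hypothetical stand-in for the real per-collection read and median computation:&lt;/p&gt;

```python
# Hypothetical sketch of querying the generated monthly collections in
# parallel rather than serially. fetch_medians stands in for the real
# per-collection Mongo read plus median computation.
from concurrent.futures import ThreadPoolExecutor
from statistics import median

def fetch_medians(name):
    # In production this would read the $out collection `name` and
    # compute medians from its pushed value arrays; faked here to show
    # only the fan-out shape.
    return name, median([1, 2, 3])

# 7 hypothetical monthly output collections, 7 workers.
months = [f"monthly_values_2017_{m:02d}" for m in range(1, 8)]
with ThreadPoolExecutor(max_workers=7) as pool:
    results = dict(pool.map(fetch_medians, months))
```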

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I learned a lot of lessons from this little project, and I wanted to distill them here as a sort of tl;dr so they can be passed on to the reader. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Management is convinced by data. Run your proposed changes and show them graphs or numbers demonstrating the performance improvements - you're all working toward the same goal. &lt;/li&gt;
&lt;li&gt;ABB. Always Be Building. In my case, my work on &lt;a href="https://nasdanq.com"&gt;NASDANQ&lt;/a&gt; gave me the knowledge I needed to hear a teammate's struggles, identify a use case, and implement a plan to alleviate those issues.&lt;/li&gt;
&lt;li&gt;Standups suck, but they're useful. Hearing my teammate's struggles with this code during a standup meeting allowed me to assist him and come up with a workable solution. &lt;/li&gt;
&lt;li&gt;More generally, communication is important. Not only was understanding my teammate's needs important, but this project required constant contact between our DBA team, me and my teammate (who was running many of our tests), and management of our team, the DBA team, and the larger team we report to. &lt;/li&gt;
&lt;li&gt;And, finally, MongoDB's aggregation pipelines are incredibly powerful. They're worth learning and getting familiar with if you work with sizable datasets at all. &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>mongodb</category>
      <category>optimization</category>
      <category>nosql</category>
    </item>
    <item>
      <title>Hi, I'm Kyle Stratis</title>
      <dc:creator>Kyle Stratis</dc:creator>
      <pubDate>Fri, 16 Jun 2017 14:22:15 +0000</pubDate>
      <link>https://dev.to/kyle_stratis/hi-im-kyle-stratis</link>
      <guid>https://dev.to/kyle_stratis/hi-im-kyle-stratis</guid>
      <description>&lt;p&gt;I have been coding professionally for 3 years.&lt;/p&gt;

&lt;p&gt;You can find me on Twitter as &lt;a href="https://twitter.com/KyleStratis" rel="noopener noreferrer"&gt;@KyleStratis&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I live in Boston, MA.&lt;/p&gt;

&lt;p&gt;I work for &lt;del&gt;NASDANQ&lt;/del&gt; Danqex and PatientsLikeMe.&lt;/p&gt;

&lt;p&gt;I mostly program in these languages: Python, Perl, and Go.&lt;/p&gt;

&lt;p&gt;I am currently learning more about Python and data science.&lt;/p&gt;

&lt;p&gt;Nice to meet you.&lt;/p&gt;

</description>
      <category>introduction</category>
    </item>
  </channel>
</rss>
