<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Monica Colangelo</title>
    <description>The latest articles on DEV Community by Monica Colangelo (@monica_colangelo).</description>
    <link>https://dev.to/monica_colangelo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F788307%2Fb3a510b1-a027-4df1-a6b0-1cb9bb179f61.JPG</url>
      <title>DEV Community: Monica Colangelo</title>
      <link>https://dev.to/monica_colangelo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/monica_colangelo"/>
    <language>en</language>
    <item>
      <title>Automated Mass Tagging in AWS Across Accounts and Organizations</title>
      <dc:creator>Monica Colangelo</dc:creator>
      <pubDate>Thu, 17 Aug 2023 16:44:47 +0000</pubDate>
      <link>https://dev.to/aws-builders/automated-mass-tagging-in-aws-across-accounts-and-organizations-2ehn</link>
      <guid>https://dev.to/aws-builders/automated-mass-tagging-in-aws-across-accounts-and-organizations-2ehn</guid>
      <description>&lt;h1&gt;
  
  
  Tagging strategy: easier said...
&lt;/h1&gt;

&lt;p&gt;In the expansive world of AWS, tagging resources stands out as both a straightforward task and an essential one. On the surface, it's about assigning a label, a seemingly simple action. Yet, the implications of this action are profound. Tags are not just mere identifiers; they're pivotal tools in organizing, managing, and optimizing your cloud environment, because they cater to a range of organizational needs, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Expense Tracking&lt;/strong&gt;: for instance, with &lt;a href="https://aws.amazon.com/blogs/aws-cloud-financial-management/cost-allocation-blog-series-2-aws-generated-vs-user-defined-cost-allocation-tag/"&gt;&lt;strong&gt;Cost Allocation Tags&lt;/strong&gt;&lt;/a&gt;, you can monitor specific costs tied to a particular project or department.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Infrastructure Automation&lt;/strong&gt;: tags can trigger &lt;strong&gt;Automated Infrastructure Activities&lt;/strong&gt;. Think of an instance that's tagged as 'development' being automatically shut down outside of working hours to save costs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Project Phases&lt;/strong&gt;: with &lt;strong&gt;Workload Lifecycle&lt;/strong&gt; tags, you can easily identify whether a particular resource is in the 'testing', 'development', or 'production' phase.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Issue Resolution&lt;/strong&gt;: &lt;strong&gt;incident management&lt;/strong&gt; tags can help in quickly identifying resources that might be affected during an outage or incident.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Maintenance&lt;/strong&gt;: &lt;strong&gt;update management&lt;/strong&gt; tags can indicate when a resource was last patched or updated, ensuring timely maintenance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Operational Insights&lt;/strong&gt;: for a clear view of your operations, &lt;strong&gt;Operational Observability&lt;/strong&gt; tags can denote the health or status of resources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Protection&lt;/strong&gt;: &lt;strong&gt;risk and security management&lt;/strong&gt; tags can highlight resources that contain sensitive data, ensuring they have tighter security controls.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Access Management&lt;/strong&gt;: &lt;strong&gt;identity and access tags&lt;/strong&gt; can dictate who within your organization can access specific resources, reinforcing security protocols.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
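&lt;p&gt;To make the automation example concrete, here's a minimal boto3 sketch of the "shut down development instances" idea. It's only an illustration: the tag key &lt;code&gt;environment&lt;/code&gt;, the value &lt;code&gt;development&lt;/code&gt;, and the region are assumptions, not a prescribed scheme.&lt;/p&gt;

```python
def find_dev_instances(pages):
    """Collect IDs of running instances tagged environment=development from
    describe_instances response pages (tag key/value are assumptions)."""
    ids = []
    for page in pages:
        for reservation in page.get("Reservations", []):
            for instance in reservation.get("Instances", []):
                tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
                if (instance.get("State", {}).get("Name") == "running"
                        and tags.get("environment") == "development"):
                    ids.append(instance["InstanceId"])
    return ids

def stop_dev_instances(region="eu-west-1"):
    """Stop every running 'development' instance, e.g. from a scheduled Lambda."""
    import boto3  # lazy import keeps find_dev_instances unit-testable without AWS deps
    ec2 = boto3.client("ec2", region_name=region)
    pages = ec2.get_paginator("describe_instances").paginate()
    ids = find_dev_instances(pages)
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return ids
```

&lt;p&gt;Scheduled nightly (for example via EventBridge), a function like this turns the tag into an automatic cost-saving action.&lt;/p&gt;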

&lt;p&gt;A well-defined tagging strategy is paramount. AWS itself recognizes the significance of this and has published &lt;a href="https://docs.aws.amazon.com/whitepapers/latest/tagging-best-practices/tagging-best-practices.html"&gt;an extensive whitepaper&lt;/a&gt; detailing best practices and guidelines. This strategy isn't just about knowing what to tag, but understanding the 'why' and 'how' behind each tag.&lt;/p&gt;

&lt;h2&gt;
  
  
  ...than done
&lt;/h2&gt;

&lt;p&gt;So, once we've established our tagging strategy, is it smooth sailing from there? Well, as the saying goes, "easier said than done." Indeed, a strategy is only as good as its &lt;strong&gt;execution&lt;/strong&gt; plan. A comprehensive strategy must be paired with a &lt;strong&gt;pragmatic&lt;/strong&gt; action plan detailing its implementation.&lt;/p&gt;

&lt;p&gt;From our list of tagging use cases, one aspect becomes abundantly clear: tags, while seemingly simple tools, cater to a &lt;strong&gt;myriad&lt;/strong&gt; of distinct purposes. These &lt;strong&gt;purposes&lt;/strong&gt;, in turn, address the needs of &lt;strong&gt;diverse teams&lt;/strong&gt; within an organization. Whether it's Finance, Operations, Security, or Development teams, each has its &lt;strong&gt;unique requirements&lt;/strong&gt;. These teams might possess different skill sets, operate on varying timelines, and even employ distinct &lt;strong&gt;tools&lt;/strong&gt; for their tagging activities. The challenge then is &lt;strong&gt;coordination&lt;/strong&gt;: how do these teams work in tandem without stepping on each other's toes?&lt;/p&gt;

&lt;h1&gt;
  
  
  Organizational complexity
&lt;/h1&gt;

&lt;p&gt;To truly grasp the intricacies of tagging, let's delve into a &lt;strong&gt;real-world scenario&lt;/strong&gt; I encountered. In this setup, there's a team, aptly named "&lt;strong&gt;Cloud Center&lt;/strong&gt;", responsible for managing the AWS &lt;strong&gt;Organizations&lt;/strong&gt;. Their tasks encompass creating, overseeing, and auditing the AWS accounts affiliated with the organization.&lt;/p&gt;

&lt;p&gt;Within this Organization, there are several Organizational Units (&lt;strong&gt;OUs&lt;/strong&gt;) mirroring the company's internal structure. Each OU houses multiple &lt;strong&gt;projects&lt;/strong&gt;, and each project might have separate &lt;strong&gt;accounts&lt;/strong&gt; for development, testing, and production.&lt;/p&gt;

&lt;p&gt;Each OU is backed by a &lt;strong&gt;Platform Team&lt;/strong&gt;, providing project teams with essential tools, such as CI/CD &lt;strong&gt;pipelines&lt;/strong&gt; and Infrastructure as Code (&lt;strong&gt;IaC&lt;/strong&gt;) execution tools like &lt;strong&gt;Terraform&lt;/strong&gt;. Then, every project has a &lt;strong&gt;separate team&lt;/strong&gt;, which is diverse, comprising roles like backend engineers, DevOps specialists, QAs, and more. Notably, many team members are often &lt;strong&gt;consultants or contractors&lt;/strong&gt; dedicated to specific projects rather than company employees.&lt;/p&gt;

&lt;p&gt;Additionally, there are &lt;strong&gt;shared OUs&lt;/strong&gt; and accounts dedicated to operational services, like &lt;strong&gt;networking&lt;/strong&gt;, Transit Gateway, Network Firewall, and DNS, or &lt;strong&gt;security&lt;/strong&gt;-centric tasks: each of these is managed by a different team.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iLvxYVnm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/59q4fwhdatzc62s82cf2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iLvxYVnm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/59q4fwhdatzc62s82cf2.png" alt="" width="720" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On &lt;strong&gt;top&lt;/strong&gt; of all that, this Organization belongs to a &lt;strong&gt;larger corporate group&lt;/strong&gt;. At the group level, there's a need to monitor the spending of each subsidiary company. This requires &lt;strong&gt;data extraction&lt;/strong&gt; from each Organization, necessitating the &lt;strong&gt;tagging of AWS accounts themselves&lt;/strong&gt;, with unique labels and values consistent across the entire corporate group.&lt;/p&gt;
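&lt;p&gt;Tagging an account itself boils down to a single Organizations API call. Here's a minimal sketch; the tag keys and helper names are placeholders, not the corporate group's real scheme:&lt;/p&gt;

```python
def format_tags(tags):
    """Convert a plain {key: value} dict into the [{'Key': ..., 'Value': ...}]
    shape the Organizations API expects, sorted for deterministic output."""
    return [{"Key": k, "Value": v} for k, v in sorted(tags.items())]

def tag_account(account_id, tags):
    """Attach tags to an AWS account via the Organizations API.
    Example (hypothetical keys): tag_account("111122223333",
    {"cost_centre": "CC-1234", "subsidiary": "acme-cloud"})."""
    import boto3  # lazy import keeps format_tags unit-testable without AWS deps
    org = boto3.client("organizations")
    org.tag_resource(ResourceId=account_id, Tags=format_tags(tags))
```

&lt;p&gt;This call must run from the Organization's management account (or a delegated administrator), which is exactly why it sits with the Cloud Center described below.&lt;/p&gt;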

&lt;h1&gt;
  
  
  The mass-tagging hierarchy
&lt;/h1&gt;

&lt;p&gt;In such a multifaceted environment, expecting every individual to simply read a tagging strategy manual and apply it flawlessly is &lt;strong&gt;wishful thinking&lt;/strong&gt;. Each team has its tagging objectives aligned with its goals. However, keeping track of everyone's tagging needs would be a Herculean task. While enforcing a &lt;a href="https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_tag-policies.html"&gt;Tagging Policy&lt;/a&gt; can provide some structure, manually &lt;strong&gt;reconciling&lt;/strong&gt; the requirements of so many teams would be a colossal &lt;strong&gt;drain&lt;/strong&gt; on time and resources.&lt;/p&gt;

&lt;p&gt;The reality of the matter is that tags, in many instances, aren't particularly volatile entities. In an ideal setup, they act as labels assigned during the creation of a resource. Once in place, these tags seldom change, except for specific use cases. Given this nature, &lt;strong&gt;it's counterproductive to burden individuals&lt;/strong&gt; with a task that, with the right precautions, can be seamlessly automated. After all, machines are inherently better suited for &lt;strong&gt;repetitive&lt;/strong&gt; and mundane tasks than humans.&lt;/p&gt;

&lt;p&gt;This realization led to the adoption of a &lt;strong&gt;multi-tiered mass-tagging strategy&lt;/strong&gt;. Each "tier" or "level" employed a recurring &lt;strong&gt;Lambda&lt;/strong&gt; function to tag all resources under its &lt;strong&gt;purview&lt;/strong&gt;. Care was taken to ensure that tags from one level didn't overwrite or remove those from another.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cloud Center Command
&lt;/h2&gt;

&lt;p&gt;At the &lt;strong&gt;topmost tier&lt;/strong&gt;, the Cloud Center took on the responsibility of tagging AWS accounts directly. This was primarily for billing, finance, and cost allocation purposes. They utilized tags that were &lt;strong&gt;globally unique&lt;/strong&gt; within the corporate group, along with additional tags indicating the &lt;strong&gt;OU, project, and environment&lt;/strong&gt; specifics. Here's how the process was streamlined:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;every night, an &lt;strong&gt;EventBridge&lt;/strong&gt;-scheduled &lt;strong&gt;Lambda&lt;/strong&gt; function would activate. This function would:&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;read&lt;/strong&gt; all the tags (both key and value) for each account&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def list_accounts():
    existing_accounts = [
        account
        for accounts in _org_client.get_paginator("list_accounts").paginate()
        for account in accounts['Accounts']
    ]
    return existing_accounts

def get_account_tags(account_id):
    formatted_tags = _org_client.list_tags_for_resource(
        ResourceId=account_id)
    return formatted_tags

def handler(event, context):
    for account in list_accounts():
        account_id = account.get('Id')

        try:
            tags = get_account_tags(account_id)
        except Exception as ce:
            logger.error(
                f'Exception retrieving tags in Organization for account {account_id}: {ce}')
            continue
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;create a message for each account in an &lt;strong&gt;SQS&lt;/strong&gt; queue&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sqs.send_message(
    QueueUrl=sqs_queue_url,
    DelaySeconds=15,
    MessageAttributes={
        'Account': {
            'DataType': 'String',
            'StringValue': account_id
        },
        'Tags': {
            'DataType': 'String',
            'StringValue': json.dumps(response_json)
        },
        'Region': {
            'DataType': 'String',
            'StringValue': reg
        }
    },
    MessageBody=(
        f'Tag value for account {account_id} in region {reg}'
    )
)
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;each message in the queue would then &lt;strong&gt;trigger a second Lambda&lt;/strong&gt; function. This function would:&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;read the message content&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for record in event['Records']:
    account_id = record['messageAttributes']['Account']['stringValue']
    tags_raw = record['messageAttributes']['Tags']['stringValue']
    reg = record['messageAttributes']['Region']['stringValue']
    receipt_handle = record['receiptHandle']
    tags = json.loads(tags_raw)['Tags']
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;assume an IAM role in the target account&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;save the tag list in an &lt;strong&gt;SSM parameter&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;list resources to be tagged using the &lt;code&gt;resourcegroupstaggingapi&lt;/code&gt;&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;client = create_boto3_client(account_id, 'resourcegroupstaggingapi', assume_role(account_id), reg)
map = client.get_resources(ResourcesPerPage=50)
list = get_resources_to_tag(map['ResourceTagMappingList'], tagkey, tagvalue)

[...]

def get_resources_to_tag(map, tagkey, tagvalue):
    resourcelist = []
    for resource in map:
        logger.debug(f'Resource: {resource}')
        if resource['ResourceARN'].startswith('arn:aws:cloudformation'):
            logger.debug(
                f'Resource {resource} is a cloudformation stack, we do not need to tag it')
            continue
        to_be_tagged = True
        for tag in resource['Tags']:
            if tag['Key'] == tagkey and tag['Value'] == tagvalue:
                to_be_tagged = False
                logger.debug(
                    f'Found tag {tagkey} with value {tagvalue} in resource, no need to retag')
                break
        if to_be_tagged == True:
            logger.debug(
                f'NOT FOUND tag {tagkey} with value {tagvalue} in resource, need to tag')
            resourcelist.append(resource['ResourceARN'])
    return resourcelist
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;finally, apply the tags.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
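&lt;p&gt;Steps 2, 3, and 5 are not shown above; here's a condensed sketch of how they might fit together. The role name, SSM parameter name, and helper names are illustrative assumptions, not the actual setup:&lt;/p&gt;

```python
import json

def chunked(items, size):
    """Split a list into batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def apply_central_tags(account_id, reg, tags, resource_arns):
    """Assume a role in the target account, record the centrally managed tag
    keys in SSM, then tag the listed resources. `tags` is a {key: value} dict;
    the role and parameter names are hypothetical."""
    import boto3  # lazy import keeps chunked unit-testable without AWS deps
    creds = boto3.client("sts").assume_role(
        RoleArn=f"arn:aws:iam::{account_id}:role/central-tagging-role",
        RoleSessionName="mass-tagging",
    )["Credentials"]
    session = boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
        region_name=reg,
    )
    # Step 3: persist the centrally managed tag keys for the Platform Teams to read
    session.client("ssm").put_parameter(
        Name="/cloud-center/managed-tag-keys",
        Value=json.dumps(sorted(tags)),
        Type="String",
        Overwrite=True,
    )
    # Step 5: TagResources accepts at most 20 ARNs per call, so batch the list
    tagging = session.client("resourcegroupstaggingapi")
    for batch in chunked(resource_arns, 20):
        tagging.tag_resources(ResourceARNList=batch, Tags=tags)
```

&lt;p&gt;The batching matters because &lt;code&gt;TagResources&lt;/code&gt; rejects requests with more than 20 ARNs.&lt;/p&gt;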

&lt;p&gt;This approach ensured that the Cloud Center team maintained tags in a centralized manner, eliminating the need for disparate synchronization efforts.&lt;/p&gt;

&lt;p&gt;💡&lt;br&gt;
&lt;em&gt;For those interested, you can find a Python version of these Lambda functions &lt;/em&gt;&lt;a rel="noopener noreferrer nofollow" href="https://github.com/theonlymonica/aws-multi-level-tagging"&gt;&lt;strong&gt;&lt;em&gt;here&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1ixjXoGf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/226d9cwg0sik89swa49b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1ixjXoGf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/226d9cwg0sik89swa49b.png" alt="" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Platform Team Playbook
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;second "tier"&lt;/strong&gt; in this tagging hierarchy is occupied by the &lt;strong&gt;Platform Teams of each OU&lt;/strong&gt;, and their approach mirrors that of the Cloud Center, albeit with some tailored modifications.&lt;/p&gt;

&lt;p&gt;In the case of the Platform Teams, their management account has read delegation over the Organization. The nightly process for them unfolds as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Account Listing from Organization&lt;/strong&gt;: Triggered by &lt;strong&gt;EventBridge&lt;/strong&gt; at a different time than the Cloud Center's process, a &lt;strong&gt;Lambda&lt;/strong&gt; function initiates and reads the &lt;strong&gt;list of accounts&lt;/strong&gt; specific to its OU from the Organization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mass Tagging (If Necessary)&lt;/strong&gt;: If the Platform Team has its specific tags to apply, it employs a mass-tagging approach &lt;strong&gt;identical&lt;/strong&gt; to the Cloud Center's. It's worth noting that &lt;strong&gt;not all Platform Teams have this requirement&lt;/strong&gt;. Some use this technique to assign tags that indicate, for instance, which EBS volumes need &lt;strong&gt;backups&lt;/strong&gt; or which non-production EC2 instances can be &lt;strong&gt;shut down&lt;/strong&gt; during nights and weekends.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Terraform Pipeline Integration&lt;/strong&gt;: Given that these Platform Teams provide project teams with Terraform execution pipelines, they adopt a methodology (detailed in &lt;a href="https://letsmake.cloud/automating-the-injection-of-cicd-runtime-information-into-terraform-code"&gt;this article&lt;/a&gt;) that &lt;strong&gt;dynamically&lt;/strong&gt; instructs the AWS provider in Terraform to "ignore" certain tags. This list of "ignored" tags is a merger of the Cloud Center's tags (read from the SSM Parameter saved by the Cloud Center's Lambda function) and their own.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;provider "aws" {

  ignore_tags {
    keys = ["cost_centre","environment","territory","service","billing_team"]
  }

  [...]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;
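&lt;p&gt;As a rough illustration of step 3, the "ignored keys" list can be assembled at pipeline runtime by merging the Cloud Center's SSM parameter with the team's own keys. The parameter name and helper names here are assumptions for the sake of the example:&lt;/p&gt;

```python
import json

def merge_keys(*key_lists):
    """Deduplicate and sort tag keys from any number of lists."""
    merged = set()
    for keys in key_lists:
        merged.update(keys)
    return sorted(merged)

def build_ignore_keys(team_keys, parameter_name="/cloud-center/managed-tag-keys"):
    """Merge the Cloud Center's tag keys (read from SSM, where its Lambda saved
    them as a JSON list) with the Platform Team's own keys, producing the list
    injected into the provider's ignore_tags block."""
    import boto3  # lazy import keeps merge_keys unit-testable without AWS deps
    ssm = boto3.client("ssm")
    central = json.loads(ssm.get_parameter(Name=parameter_name)["Parameter"]["Value"])
    return merge_keys(central, team_keys)
```

&lt;p&gt;The pipeline then renders the merged list into the &lt;code&gt;ignore_tags&lt;/code&gt; block before running Terraform.&lt;/p&gt;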

&lt;h2&gt;
  
  
  Project Team Precision
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;final tier&lt;/strong&gt; in this tagging hierarchy is the &lt;strong&gt;project teams&lt;/strong&gt;. Their primary focus is on their specific projects, and they shouldn't be burdened with the complexities of the overarching tagging strategy. While &lt;strong&gt;transparency&lt;/strong&gt; is essential, and indeed, tag information is openly shared (given that tags are visible and not shrouded in secrecy), it's not the project teams' &lt;strong&gt;responsibility&lt;/strong&gt; to manage or be overly concerned with them.&lt;/p&gt;

&lt;p&gt;These teams have the liberty to add &lt;strong&gt;project-specific tags&lt;/strong&gt; using their primary tool, Terraform. However, there's a catch: they can &lt;strong&gt;only&lt;/strong&gt; use Terraform through the &lt;strong&gt;pipeline&lt;/strong&gt; provided by their respective Platform Team. This constraint is in place because individual user accounts have very limited permissions, typically read-only. This restriction ensures that resources are not proliferated haphazardly without version control on Git, avoiding the pitfalls of &lt;a href="https://docs.cloudposse.com/glossary/clickops/"&gt;ClickOps&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The beauty of the pipeline's design is its runtime instruction to Terraform to ignore specific tags. This feature acts as a &lt;strong&gt;safeguard&lt;/strong&gt;. Even if a team member inadvertently adds a tag in Terraform that matches an existing tag but with a different value, the pipeline ensures that the original value remains untouched and the new value is disregarded.&lt;/p&gt;

&lt;h1&gt;
  
  
  Making Sense of the Tagging Puzzle
&lt;/h1&gt;

&lt;p&gt;Tagging in AWS might seem like a small task, but as we've seen, it's a big deal. Getting from a plan on paper to actually tagging everything right is no walk in the park. But with a good system in place and everyone &lt;strong&gt;on the same page&lt;/strong&gt;, it becomes a lot easier.&lt;/p&gt;

&lt;p&gt;What's the &lt;strong&gt;main lesson&lt;/strong&gt; here? Keep things &lt;strong&gt;automated&lt;/strong&gt; and work together. Machines are great at repetitive tasks, so let's let them handle that. And when teams &lt;strong&gt;collaborate&lt;/strong&gt;, the whole tagging process becomes smoother and more efficient.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>organization</category>
      <category>lambda</category>
      <category>tagging</category>
    </item>
    <item>
      <title>Guardian of the Functions: Keeping an Eye on your Galaxy of AWS Step Functions with Custom Metrics on CloudWatch</title>
      <dc:creator>Monica Colangelo</dc:creator>
      <pubDate>Tue, 18 Jul 2023 18:09:21 +0000</pubDate>
      <link>https://dev.to/aws-builders/guardian-of-the-functions-keeping-an-eye-on-your-galaxy-of-aws-step-functions-with-custom-metrics-on-cloudwatch-4kg7</link>
      <guid>https://dev.to/aws-builders/guardian-of-the-functions-keeping-an-eye-on-your-galaxy-of-aws-step-functions-with-custom-metrics-on-cloudwatch-4kg7</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Managing &lt;strong&gt;multiple AWS Step Functions&lt;/strong&gt; can quickly turn into a complex task, especially when each function forms a crucial link in a broader process. For instance, consider a data processing system where numerous files are uploaded, analyzed, and then relocated. Each step of this process could be orchestrated by its own Step Function, executing a variety of tasks in sequence.&lt;/p&gt;

&lt;p&gt;For a team &lt;strong&gt;monitoring&lt;/strong&gt; this process, an error in any of these functions could disrupt the entire sequence and halt the processing of subsequent files. Therefore, having a clear, real-time understanding of the status of each Step Function's latest execution is not just a nice-to-have—it's essential.&lt;/p&gt;

&lt;p&gt;Now, imagine a scenario where your team is handling not just one, but dozens or even hundreds of such sequences—each represented by an AWS Step Function. Manually monitoring the status of each function's latest execution becomes an incredibly time-consuming task, and the risk of missing a crucial error increases.&lt;/p&gt;

&lt;p&gt;This is where our &lt;strong&gt;Guardian&lt;/strong&gt; comes into play 🧑‍🚒&lt;/p&gt;

&lt;p&gt;Our goal is to create an intuitive &lt;strong&gt;dashboard&lt;/strong&gt; that offers an at-a-glance overview of the status of each Step Function. Think of it as a &lt;strong&gt;traffic light system&lt;/strong&gt;: green for successful executions, red for failures. At any moment, a &lt;strong&gt;quick look&lt;/strong&gt; at this dashboard will tell us if all our functions are operating correctly or if there's a hitch in our sequence that needs our immediate attention.&lt;/p&gt;

&lt;p&gt;In this blog post, we will outline how to use &lt;strong&gt;Terraform&lt;/strong&gt; and AWS &lt;strong&gt;CloudWatch&lt;/strong&gt; to achieve this. Terraform will help us set up and manage our infrastructure, while AWS CloudWatch will provide the platform for our monitoring dashboard. With these tools at our disposal, we'll turn the daunting task of overseeing a multitude of AWS Step Functions into a manageable, even effortless process.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Missing Piece in AWS&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When dealing with AWS Step Functions, one might assume that AWS would offer a native &lt;strong&gt;metric&lt;/strong&gt; in CloudWatch for monitoring the status of the most recent execution of a function. After all, AWS provides a plethora of such metrics out of the box for many of its services.&lt;/p&gt;

&lt;p&gt;Unfortunately, this isn't the case for Step Functions. While &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/procedure-cw-metrics.html#cloudwatch-step-functions-execution-metrics"&gt;AWS does offer metrics&lt;/a&gt; like the total number of executions, succeeded executions, failed executions, and throttled executions, these are all aggregate metrics. They provide a broad view of a function's performance but do not offer insight into the status of each function's &lt;strong&gt;&lt;em&gt;latest&lt;/em&gt;&lt;/strong&gt; execution.&lt;/p&gt;

&lt;p&gt;This lack of granularity can be a significant hurdle when monitoring a large number of Step Functions, especially when the status of the most recent execution is the key metric we're interested in.&lt;/p&gt;

&lt;p&gt;So how can we fill this gap? The solution is to create our own &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html"&gt;&lt;strong&gt;custom metric&lt;/strong&gt;&lt;/a&gt;, and in the next section, we'll dive into how we can use AWS Lambda and CloudWatch to do just that.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Creating a Custom Metric with AWS Lambda&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Since AWS doesn't offer a native metric for the &lt;strong&gt;status of the latest execution&lt;/strong&gt; of a Step Function, we need to create this metric ourselves. To do this, we'll use AWS &lt;strong&gt;Lambda&lt;/strong&gt;, a service that lets you run your code without provisioning or managing servers.&lt;/p&gt;

&lt;p&gt;The idea is straightforward: we'll create a Lambda function that periodically checks the status of the latest execution of each of our Step Functions and then publishes this information as a custom metric to CloudWatch.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Configuring IAM permissions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The first thing we need to do is ensure our Lambda function has the necessary permissions to both read the status of our Step Functions and publish custom metrics to CloudWatch. To do this, we can create an IAM role with the following policy (or see how I create it with Terraform in the next chapter):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"states:ListStateMachines"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"states:ListExecutions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"states:DescribeExecution"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"cloudwatch:PutMetricData"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This policy allows our function to list all state machines (i.e., Step Functions), describe their executions, and put metric data into CloudWatch.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Creating the Lambda function&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;With our permissions configured, we can now create our Lambda function. Here's a high-level overview of what our function will do:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;List all Step Functions in our account using the &lt;code&gt;ListStateMachines&lt;/code&gt; API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For each Step Function, fetch its most recent execution using the &lt;code&gt;ListExecutions&lt;/code&gt; API and read that execution's status.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Map the status of each execution to a numerical value with a function &lt;code&gt;status_to_number&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Publish these numerical statuses as custom metrics to CloudWatch using the &lt;code&gt;PutMetricData&lt;/code&gt; API.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's an example of what the Python code for this Lambda function might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;boto3&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Initialize clients
&lt;/span&gt;    &lt;span class="n"&gt;sf_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'stepfunctions'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cw_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'cloudwatch'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# List all state machines
&lt;/span&gt;    &lt;span class="n"&gt;state_machines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sf_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;list_state_machines&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="s"&gt;'stateMachines'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Loop through all state machines
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sm&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;state_machines&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Get latest execution status
&lt;/span&gt;        &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sf_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;describe_execution&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;executionArn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sm&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'stateMachineArn'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="s"&gt;'status'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Map status to a numerical value
&lt;/span&gt;        &lt;span class="n"&gt;status_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;status_to_number&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Publish custom metric to CloudWatch
&lt;/span&gt;        &lt;span class="n"&gt;cw_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;put_metric_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;Namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'StepFunctions'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;MetricData&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="s"&gt;'MetricName'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sm&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="s"&gt;'Value'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status_value&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;status_to_number&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;mapping&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;'RUNNING'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;'SUCCEEDED'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;'FAILED'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;'TIMED_OUT'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;'ABORTED'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;mapping&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# return 0 if status is not in the mapping
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way, each state is represented by a unique numerical value, providing more granular information about the status of your Step Function executions.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Scheduling the Lambda function&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The final piece of the puzzle is to ensure our Lambda function &lt;strong&gt;runs periodically&lt;/strong&gt; to keep our custom metrics up-to-date. To provide the most recent status of our Step Functions, we will &lt;strong&gt;schedule&lt;/strong&gt; our Lambda function to run at regular intervals using Amazon &lt;strong&gt;EventBridge&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;How do we ensure that our CloudWatch alarm reflects only the most recent state of the Step Function, not past states? This is a valid concern, as CloudWatch alarms often aggregate data over a certain time period, potentially mixing up the statuses of different Step Function executions.&lt;/p&gt;

&lt;p&gt;This is where choosing the &lt;strong&gt;right statistic&lt;/strong&gt; for our CloudWatch alarm comes into play. We will use the '&lt;strong&gt;&lt;em&gt;Maximum&lt;/em&gt;&lt;/strong&gt;' statistic with a period of 1 hour for our alarm. This ensures that the alarm state always reflects the highest (i.e., most severe) status reported by the Lambda function in the past hour.&lt;/p&gt;

&lt;p&gt;Why 'Maximum', and why a period of 1 hour? The 'Maximum' statistic ensures that if there is any failed execution (which we mapped to a higher value), that is the status the alarm takes into account. The 1-hour period is shorter than the interval at which our Lambda function runs (every 3 hours), so each evaluation period of the alarm contains at most one data point: the most recent execution status.&lt;/p&gt;

&lt;p&gt;💡&lt;br&gt;
&lt;em&gt;Remember, the right frequency and period may depend on your use case, and you may need to adjust these values to fit your specific needs.&lt;/em&gt;&lt;/p&gt;
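The effect of choosing 'Maximum' can be sketched in a few lines of Python. This is illustrative only; the mapping mirrors the `status_to_number` function shown earlier, and `most_severe` stands in for what CloudWatch's Maximum statistic computes over the data points of one period:

```python
# Same mapping the Lambda publishes: higher number = more severe state
STATUS_VALUES = {
    'RUNNING': 1,
    'SUCCEEDED': 2,
    'FAILED': 3,
    'TIMED_OUT': 4,
    'ABORTED': 5,
}

def most_severe(statuses):
    """Value CloudWatch's 'Maximum' statistic would report for the period."""
    return max(STATUS_VALUES.get(s, 0) for s in statuses)

# A single FAILED execution dominates the period, even among successes,
# so the alarm (threshold 3) fires:
print(most_severe(['SUCCEEDED', 'FAILED', 'SUCCEEDED']))  # prints 3
```

Because failures map to the largest values, a failed run can never be masked by a later successful data point inside the same period.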

&lt;h2&gt;
  
  
  Step-by-Step: Creating our Monitoring Tool with Terraform
&lt;/h2&gt;

&lt;p&gt;You have many AWS Step Functions to monitor and you're probably thinking, 'Surely, I don't have to set all of this up manually, right?' Fear not, because that's where &lt;strong&gt;Terraform&lt;/strong&gt; comes in. By leveraging Infrastructure as Code, we can automate the process of creating our monitoring dashboard, saving time and ensuring consistent configuration. Let's dive into how we can use Terraform to solve our monitoring problem without having to resort to endless manual setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Terraforming the Custom Metrics Lambda function
&lt;/h3&gt;

&lt;p&gt;Let's first create the Lambda function using the &lt;a href="https://registry.terraform.io/modules/terraform-aws-modules/lambda/aws/latest"&gt;Terraform AWS Lambda Module&lt;/a&gt;. The AWS Lambda function will be responsible for checking the status of each Step Function and then pushing the corresponding status value to CloudWatch.&lt;/p&gt;

&lt;p&gt;Below is a possible Terraform configuration that creates the Lambda function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;module&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"lambda_step_function_status"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;source&lt;/span&gt;&lt;span class="w"&gt;                   &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"terraform-aws-modules/lambda/aws"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;version&lt;/span&gt;&lt;span class="w"&gt;                  &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"4.16.0"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;function_name&lt;/span&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${local.project}-${var.env}-step-function-status-check"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;handler&lt;/span&gt;&lt;span class="w"&gt;                  &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"step_function_status.lambda_handler"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;runtime&lt;/span&gt;&lt;span class="w"&gt;                  &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python3.8"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;memory_size&lt;/span&gt;&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;timeout&lt;/span&gt;&lt;span class="w"&gt;                  &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;architectures&lt;/span&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"x86_64"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;publish&lt;/span&gt;&lt;span class="w"&gt;                  &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;source_path&lt;/span&gt;&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${path.module}/../source/lambda/step_function_status.py"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;artifacts_dir&lt;/span&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${path.root}/.terraform/lambda-builds/"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;attach_policy_statements&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;policy_statements&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;step_functions&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;effect&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;actions&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"states:ListStateMachines"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"states:DescribeStateMachine"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"states:DescribeStateMachineForExecution"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;resources&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;cloudwatch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;effect&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;actions&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"cloudwatch:PutMetricData"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;resources&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This Terraform configuration creates a new AWS Lambda function whose name ends in &lt;code&gt;step-function-status-check&lt;/code&gt; (prefixed with the project and environment). The function uses Python 3.8 as its runtime, and its handler is set to &lt;code&gt;step_function_status.lambda_handler&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;source_path&lt;/code&gt; parameter specifies the location of the Python script that checks the status of the Step Functions and pushes the results to CloudWatch. The Lambda function is granted the permissions it needs to enumerate state machines, read their execution status, and publish metric data to CloudWatch.&lt;/p&gt;

&lt;p&gt;We can then reference this module's outputs (such as the function ARN) in subsequent steps, for example when wiring up the scheduled trigger.&lt;/p&gt;

&lt;h3&gt;
  
  
  Terraforming a trigger event for Lambda in Eventbridge
&lt;/h3&gt;

&lt;p&gt;Now, let's use an &lt;a href="https://registry.terraform.io/modules/terraform-aws-modules/eventbridge/aws/latest"&gt;EventBridge module&lt;/a&gt; to schedule our Lambda function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;module&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"step_function_status_cron_event"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;source&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"terraform-aws-modules/eventbridge/aws"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;version&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.17.2"&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;create_bus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;bus_name&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"default"&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;rules&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;step_function_status_cron&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;description&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Trigger to Step Function Status Check Lambda"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;schedule_expression&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cron(0 */3 * * ? *)"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Every&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;hours&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;targets&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;step_function_status_cron&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;name&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"lambda_step_function_status_check_cron"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;arn&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;module.lambda_step_function_status.lambda_function_arn&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;input&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;jsonencode(&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"trigger"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cron"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;create_role&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above code, we're creating an &lt;strong&gt;EventBridge rule&lt;/strong&gt; that triggers our Lambda function every 3 hours. The schedule expression, &lt;code&gt;cron(0 */3 * * ? *)&lt;/code&gt;, translates to "at minute 0 past every 3rd hour."&lt;/p&gt;

&lt;p&gt;The target of this rule is the Lambda function we created earlier, referenced through &lt;code&gt;module.lambda_step_function_status.lambda_function_arn&lt;/code&gt;. When the rule fires, EventBridge invokes that function. The &lt;code&gt;input&lt;/code&gt; is optional and can be used to pass specific event data to the Lambda function.&lt;/p&gt;
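On the Lambda side, that payload arrives as the handler's event object. A minimal, hypothetical sketch of how a handler could inspect it (the `"trigger"` key matches the `jsonencode()` input set on the EventBridge target above):

```python
# Illustrative only: reading the EventBridge-supplied input inside a handler.
def lambda_handler(event, context):
    # EventBridge delivers the configured "input" JSON as the event object
    trigger = event.get("trigger", "unknown")
    print(f"Invoked via: {trigger}")
    return trigger
```

This can be handy for logging, or for branching if the same function is ever invoked from more than one source.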

&lt;p&gt;We set &lt;code&gt;create_role&lt;/code&gt; to &lt;code&gt;false&lt;/code&gt; because EventBridge does not use an IAM role to invoke a Lambda target; instead, the Lambda function must allow invocation from EventBridge through its resource-based policy (for example via an &lt;code&gt;aws_lambda_permission&lt;/code&gt; resource, or the Lambda module's &lt;code&gt;allowed_triggers&lt;/code&gt; argument).&lt;/p&gt;
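For a Lambda target, EventBridge is authorized through the function's resource-based policy, so the rule must be granted invoke permission explicitly. A minimal, hypothetical sketch (the output names `lambda_function_name` and `eventbridge_rule_arns` are assumed from the respective terraform-aws-modules modules; adjust to your setup):

```hcl
# Illustrative sketch: allow the EventBridge rule to invoke the Lambda.
resource "aws_lambda_permission" "allow_eventbridge" {
  statement_id  = "AllowExecutionFromEventBridge"
  action        = "lambda:InvokeFunction"
  function_name = module.lambda_step_function_status.lambda_function_name
  principal     = "events.amazonaws.com"
  source_arn    = module.step_function_status_cron_event.eventbridge_rule_arns["step_function_status_cron"]
}
```

The Lambda module's `allowed_triggers` argument can achieve the same result without a standalone resource.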

&lt;h3&gt;
  
  
  Terraforming the Step Functions
&lt;/h3&gt;

&lt;p&gt;For this example, I'll assume we have a list of step function names stored in a Terraform variable. This list will be used to generate each step function and its corresponding alarm. Here's a simplified example, using an &lt;a href="https://registry.terraform.io/modules/terraform-aws-modules/step-functions/aws/latest"&gt;AWS Step Functions module&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;variable&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"step_functions"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;description&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A list of step function names"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;type&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;list(string)&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;default&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"step1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"step2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"step3"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;module&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"step_functions"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;source&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"terraform-aws-modules/step-functions/aws"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;version&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2.7.3"&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;for_each&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;toset(var.step_functions)&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;name&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${each.value}-step-function"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;definition&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;file(&lt;/span&gt;&lt;span class="s2"&gt;"${path.module}/definitions/${each.value}.json"&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;logging_configuration&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;include_execution_data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;level&lt;/span&gt;&lt;span class="w"&gt;                  &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ALL"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;cloudwatch_log_group_name&lt;/span&gt;&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/aws/stepfunctions/${each.value}-step-function"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;cloudwatch_log_group_retention_in_days&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, we're using the &lt;code&gt;for_each&lt;/code&gt; construct in Terraform to iterate over the list of step function names and create a step function for each. The definition for each step function is assumed to be stored in a separate JSON file in the &lt;code&gt;definitions&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;The output of this module is a map of step function resources, indexed by their name.&lt;/p&gt;
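For example, assuming the module exposes a `state_machine_arn` output (as the terraform-aws-modules implementation does), a single state machine can be looked up by its key:

```hcl
# Illustrative: reference one state machine out of the for_each map
output "step1_state_machine_arn" {
  value = module.step_functions["step1"].state_machine_arn
}
```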

&lt;p&gt;💡&lt;br&gt;
&lt;em&gt;Please remember to replace the placeholders with your actual step function definitions and settings. This is a simplified example, and in a real-world scenario you would probably need to customize this further to match your actual infrastructure and business needs.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Terraforming Cloudwatch Alarms based on the Custom Metrics
&lt;/h3&gt;

&lt;p&gt;Let's proceed to the &lt;strong&gt;CloudWatch metric and alarm&lt;/strong&gt; setup. For this, we will use the &lt;code&gt;aws_cloudwatch_metric_alarm&lt;/code&gt; resource, which will create an alarm for each of our Step Functions. We will use the &lt;em&gt;Maximum&lt;/em&gt; statistic of our custom metric and set a &lt;strong&gt;threshold&lt;/strong&gt;, so that an alarm is triggered if the Step Function fails:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;resource&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"aws_cloudwatch_metric_alarm"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"step_function_alarm"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;for_each&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;module.step_functions&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;alarm_name&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"StepFunctionStatusAlarm-${each.key}"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;comparison_operator&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GreaterThanOrEqualToThreshold"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;evaluation_periods&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;metric_name&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${each.key}_Status"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;namespace&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"StepFunctions"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;period&lt;/span&gt;&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"3600"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;This&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;should&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;be&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;less&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;than&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;execution&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;time&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Step&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Function&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;statistic&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Maximum"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;threshold&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;The&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;status&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;code&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;failure&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;alarm_description&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"This metric checks status of Step Function ${each.key}"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;alarm_actions&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Add&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;any&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;actions&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;want&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;be&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;triggered&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;when&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;alarm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;goes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;off&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;treat_missing_data&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"missing"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will create an &lt;strong&gt;alarm&lt;/strong&gt; for each of the Step Functions, and trigger it if the status of the last execution is higher than the number corresponding to 'SUCCEEDED'.&lt;/p&gt;

&lt;p&gt;💡&lt;br&gt;
&lt;em&gt;Please be aware that the period of the alarm should be set to a value that is less than the execution time of the Step Function. This is to ensure that the alarm always considers only the last execution of the Step Function.&lt;/em&gt;&lt;/p&gt;
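&lt;p&gt;As a quick sanity check of the alarm logic, here is a small Python sketch. The numeric encoding of execution statuses below is an &lt;em&gt;assumption&lt;/em&gt; made purely for illustration (the setup only guarantees that 'SUCCEEDED' maps to a value at or below the threshold of 3); adjust it to whatever your metric actually publishes:&lt;/p&gt;

```python
# Hypothetical numeric encoding of Step Function execution statuses.
# Assumed convention: values above the alarm threshold (3) indicate
# a non-successful last execution.
STATUS_VALUES = {
    "SUCCEEDED": 0,
    "RUNNING": 1,
    "TIMED_OUT": 4,
    "ABORTED": 5,
    "FAILED": 6,
}

ALARM_THRESHOLD = 3  # mirrors `threshold = 3` in the Terraform alarm


def alarm_fires(last_execution_status: str) -> bool:
    """Return True when the encoded status breaches the alarm threshold."""
    return STATUS_VALUES[last_execution_status] > ALARM_THRESHOLD
```

&lt;p&gt;With this encoding, a 'SUCCEEDED' or 'RUNNING' execution stays below the threshold and keeps the alarm quiet, while any failure state breaches it.&lt;/p&gt;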

&lt;h3&gt;
  
  
  Terraforming the Guardian dashboard
&lt;/h3&gt;

&lt;p&gt;Finally, let's create our &lt;strong&gt;dashboard&lt;/strong&gt; to keep an eye on all our Step Functions. We will use the &lt;code&gt;aws_cloudwatch_dashboard&lt;/code&gt; resource to do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;resource&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"aws_cloudwatch_dashboard"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"step_function_dashboard"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;dashboard_name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"StepFunctionStatusDashboard"&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;dashboard_body&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;jsonencode(&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;widgets&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"alarm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"width"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"height"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Step Functions Last Execution Status"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"alarms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;values(aws_cloudwatch_metric_alarm.step_function_alarm&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="err"&gt;.arn)&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will create a CloudWatch Dashboard with a single widget that shows the status of all our Step Function alarms. This way, you can quickly glance at the dashboard and see if there are any issues with your Step Functions, as shown in the screenshot below.&lt;/p&gt;
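&lt;p&gt;If you want to preview the &lt;code&gt;dashboard_body&lt;/code&gt; payload outside of Terraform, the same JSON that &lt;code&gt;jsonencode()&lt;/code&gt; produces can be assembled in plain Python; the alarm ARNs here are placeholders standing in for the ones Terraform resolves:&lt;/p&gt;

```python
import json

# Placeholder ARNs standing in for
# values(aws_cloudwatch_metric_alarm.step_function_alarm)[*].arn
alarm_arns = [
    "arn:aws:cloudwatch:eu-west-1:123456789012:alarm:step-function-a",
    "arn:aws:cloudwatch:eu-west-1:123456789012:alarm:step-function-b",
]

# Same shape as the jsonencode(...) argument in the Terraform resource
dashboard_body = json.dumps({
    "widgets": [
        {
            "type": "alarm",
            "x": 0,
            "y": 0,
            "width": 24,
            "height": 6,
            "properties": {
                "title": "Step Functions Last Execution Status",
                "alarms": alarm_arns,
            },
        }
    ]
})
```

&lt;p&gt;The resulting string is exactly what ends up as the dashboard definition, which makes it easy to eyeball the widget layout before applying the Terraform.&lt;/p&gt;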

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1_euDIYY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rw6kzkb2x6relu7ykwpn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1_euDIYY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rw6kzkb2x6relu7ykwpn.png" alt="" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In conclusion, we've crafted an efficient, &lt;strong&gt;automated system to monitor&lt;/strong&gt; the status of numerous AWS Step Functions, and &lt;strong&gt;visualized&lt;/strong&gt; this data in an easily &lt;strong&gt;digestible&lt;/strong&gt; dashboard. This solution not only saves you time by avoiding manual checks but also provides a real-time representation of the health of your processes.&lt;/p&gt;

&lt;p&gt;This system is flexible, customizable, and can be adapted to monitor different types of Step Functions or to include multiple alarms per function. We've utilized AWS services and Terraform to ensure it can keep up with dynamic cloud environments and be easily &lt;strong&gt;adjustable&lt;/strong&gt; to meet your specific needs.&lt;/p&gt;

&lt;p&gt;By 'keeping an eye on the &lt;strong&gt;herd&lt;/strong&gt;', we emphasize the importance of reliable, automated monitoring in today's complex IT landscapes. The goal of this solution is to enhance &lt;strong&gt;operational efficiency&lt;/strong&gt;, aid in troubleshooting, and ensure the smooth running of your business processes. Keep coding and monitoring smart!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>stepfunctions</category>
      <category>monitoring</category>
      <category>lambda</category>
    </item>
    <item>
      <title>Push the Green Button: Creating Event Gadgets with IoT and Serverless Architecture</title>
      <dc:creator>Monica Colangelo</dc:creator>
      <pubDate>Thu, 22 Jun 2023 20:32:46 +0000</pubDate>
      <link>https://dev.to/aws-builders/push-the-green-button-creating-event-gadgets-with-iot-and-serverless-architecture-3dlk</link>
      <guid>https://dev.to/aws-builders/push-the-green-button-creating-event-gadgets-with-iot-and-serverless-architecture-3dlk</guid>
      <description>&lt;p&gt;Preparing for the &lt;a href="https://aws.amazon.com/it/events/summits/milano/"&gt;AWS Summit Milano 2023&lt;/a&gt; not only as an &lt;a href="https://aws.amazon.com/developer/community/community-builders/community-builders-directory/"&gt;AWS Community Builder&lt;/a&gt; but as a representative of my company (sponsor of the event), I found myself grappling with an issue that seems small but is indeed profound - the ubiquitous, forgettable, and somewhat outdated practice of &lt;strong&gt;corporate giveaways&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of contributing to the mountain of corporate freebies that usually end up in a drawer somewhere, I wondered, why not leverage my tech know-how for something more meaningful and sustainable? Thus was born the idea to create a simple but compelling swag: a unique, sustainable memento, in the form of &lt;strong&gt;a tree planted for our visitors.&lt;/strong&gt; 🌱&lt;/p&gt;

&lt;h2&gt;
  
  
  Greening the Gadget: An Unconventional Approach to Event Giveaways
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;TL;DR: This chapter details my journey to develop an idea for an eco-friendly gadget for the AWS Summit. The project involves creating a physical button that, when pressed, starts a process to plant a tree through Tree-nation, yielding a unique URL for each tree. This URL is then transformed into a QR code through AWS Lambda, giving me a tangible, scannable memento that participants can take home and redeem at their convenience. If you're primarily interested in the technical side of this endeavor, feel free to skip ahead to the next section.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Tree-gifting Made Easy
&lt;/h3&gt;

&lt;p&gt;As I explored potential platforms for my project, I found many that offered the opportunity to &lt;strong&gt;fund reforestation&lt;/strong&gt; efforts across the globe. Some even had options for &lt;strong&gt;gifting trees&lt;/strong&gt; - a sweet gesture, isn't it? But here's where the challenge arose: nearly all required prior registration of the gift recipient, complete with name and email address.&lt;/p&gt;

&lt;p&gt;This didn't quite fit the &lt;strong&gt;vision&lt;/strong&gt; I had. I wanted the process to be swift, convenient and not involve me handling personal data or permissions. Who wants to fill out forms in a crowded event?&lt;/p&gt;

&lt;p&gt;Luckily, I found &lt;a href="https://tree-nation.com/"&gt;Tree-nation&lt;/a&gt;. Not only do they have &lt;a href="https://kb.tree-nation.com/knowledge/api-manual"&gt;a well-documented API platform&lt;/a&gt; to interact with, but they also offer the option to gift trees 'anonymously'. This meant that I could buy a tree as a gift and provide my user a URL; the user could then independently redeem their tree. No exchange of personal information and no long sign-ups at my booth during the event.&lt;/p&gt;

&lt;p&gt;Now, having a URL for each tree was handy, but the challenge was: how to efficiently share these URLs at an event? The obvious answer is a &lt;strong&gt;QR code&lt;/strong&gt;, printed and handed out to the user, so that it would serve as the physical gadget from the event. An eco-friendly token of their contribution to the greener cause, that they could hold in their hands, take home, and scan whenever they chose. No need to scan it right then and there, no strings attached. They could redeem their tree whenever they were ready.&lt;/p&gt;

&lt;p&gt;Turning a URL into a QR code seemed like a task tailor-made for an &lt;strong&gt;AWS Lambda function&lt;/strong&gt;. So, armed with the plan to use QR codes and AWS Lambda, the user journey was starting to shape up quite nicely.&lt;/p&gt;
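&lt;p&gt;As a rough sketch of that Lambda's shape (the event format and helper names here are &lt;em&gt;hypothetical&lt;/em&gt;, and the fake QR encoder merely stands in for a real one such as the third-party &lt;code&gt;qrcode&lt;/code&gt; package):&lt;/p&gt;

```python
import base64


def make_qr_png(url: str) -> bytes:
    """Stand-in for a real QR encoder; the actual function could use the
    third-party `qrcode` package. Here we just return fake PNG bytes."""
    return b"\x89PNG..." + url.encode()


def lambda_handler(event, context, qr_encoder=make_qr_png):
    """Hypothetical handler: take a tree URL, hand back a base64 PNG."""
    tree_url = event["tree_url"]  # assumed event shape
    png = qr_encoder(tree_url)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "image/png"},
        "isBase64Encoded": True,
        "body": base64.b64encode(png).decode("ascii"),
    }
```

&lt;p&gt;Injecting the encoder as a parameter keeps the handler testable without the imaging dependency installed.&lt;/p&gt;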

&lt;h3&gt;
  
  
  Button Chronicles: Swapping Digital for the Real Deal
&lt;/h3&gt;

&lt;p&gt;With the destination locked in thanks to Tree-nation, it was time to figure out the journey. The original plan? A virtual button on a &lt;strong&gt;web page or app&lt;/strong&gt;, ready to be displayed on a tablet at the AWS Summit. With just a tap from a visitor, the tree-planting process would be set in motion. I quickly started sketching out a prototype to bring this concept to life.&lt;/p&gt;

&lt;p&gt;However, as the prototype took shape, I realized it wasn't hitting the mark. It was essentially a webpage interacting with an API, which felt a tad &lt;strong&gt;ordinary&lt;/strong&gt;. I was looking for something with a bit more spark, a little more 'wow' factor, or at least more unusual.&lt;/p&gt;

&lt;p&gt;So, the virtual button was out, and a real, physical button took its place in my idea. It felt risky as I hadn't ventured into programming an electronic board before. But it was a thrilling challenge and an opportunity to break new ground. Anyway, this switch wasn't just about making the project more exciting; it was about adding a tangible touch to the user experience, and giving the project an original edge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pressing Forward: The Button and Code Considerations
&lt;/h2&gt;

&lt;p&gt;Navigating through my idea, I immediately liked an &lt;a href="https://www.rowse-automation.co.uk/abb-1sfa619101r1022"&gt;ABB normally-open push button&lt;/a&gt; that belonged to my household. To connect this button to the digital world, I opted for an &lt;a href="https://en.wikipedia.org/wiki/ESP8266"&gt;ESP8266 electronic board&lt;/a&gt; and used &lt;a href="https://support.arduino.cc/hc/en-us/articles/360019833020-Download-and-install-Arduino-IDE"&gt;Arduino IDE&lt;/a&gt; as the development platform to create and flash the necessary code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JSkVXgSw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bk9s9g6jbi5jati73d1t.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JSkVXgSw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bk9s9g6jbi5jati73d1t.jpg" alt="" width="500" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From an architectural perspective, I ventured down several paths before finding the best solution for my specific use case:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Simplified Approach with Hardcoded Credentials&lt;/strong&gt;: The original blueprint of my project was pretty straightforward. I created an &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/urls-invocation.html"&gt;AWS Lambda function directly exposed to the Internet&lt;/a&gt;, acting as an endpoint for a simple HTTPS request.&lt;/p&gt;

&lt;p&gt;In this initial "&lt;strong&gt;No-Auth&lt;/strong&gt;" setup, a user/password combination was directly verified by the Lambda code. &lt;strong&gt;AWS API Gateway&lt;/strong&gt; was also considered in this scenario, since it could serve as a robust and secure front door to manage the Lambda function. However, it seemed like an &lt;strong&gt;overkill&lt;/strong&gt; for a single-function project of this nature, pushing me to consider a more streamlined approach.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;IAM-Authenticated Approach&lt;/strong&gt;: I then evaluated an approach that hinged on &lt;strong&gt;AWS IAM&lt;/strong&gt; to authenticate requests. This would ensure robust security but would also add a layer of complexity to the solution.&lt;/p&gt;

&lt;p&gt;Whether directly invoking the AWS Lambda function or sending a message to an &lt;strong&gt;Amazon SQS queue&lt;/strong&gt;, the device would need to include &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html"&gt;a Sigv4 signature in its request&lt;/a&gt;. This signature, generated using an Access Key and Secret Key combination, would be verified by AWS before proceeding with the request.&lt;/p&gt;

&lt;p&gt;While this approach certainly enhanced security, it did not come without drawbacks. Notably, the time taken to verify the Sigv4 signature &lt;strong&gt;significantly affected performance&lt;/strong&gt;. The authentication process alone took around 10 seconds, and when adding the 2-second execution time of the Lambda function, the total operation time ballooned to 12 seconds. Given this drawback, I moved on to explore another option.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AWS IoT Core:&lt;/strong&gt; Given that my system was physical, I turned to &lt;strong&gt;AWS IoT Core&lt;/strong&gt;. Devices registered on IoT Core have the native ability to communicate through queues with MQTT protocol (both publish and subscribe) and HTTPS (publish only). This interaction occurs through &lt;strong&gt;SSL certificates&lt;/strong&gt; signed by AWS and installed on the device. Initially, I thought the MQTT protocol wasn't suitable for my use case because it required a constantly open connection. My button was a one-shot device and thus &lt;strong&gt;HTTPS seemed more appropriate&lt;/strong&gt;. However, after rewriting the code, I found that inserting a message into the queue took around 7 seconds, which was better than before, but &lt;strong&gt;not exactly impressive&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MQTT with open connection:&lt;/strong&gt; Finally, I wrote another version of the code, which established an &lt;strong&gt;MQTT connection&lt;/strong&gt; authenticated through certificates at device startup and kept it open. With this approach, sending a message at the button press was almost &lt;strong&gt;instantaneous&lt;/strong&gt;! This was a significant improvement. The execution time for Lambda remained the same (2 seconds), but with a better authentication system.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After weighing the pros and cons, I settled on the MQTT solution. Some might argue that a button should not maintain an open connection, as it would be more suitable for devices sending continuous sensor data. However, for my particular use case, I deemed it an &lt;strong&gt;acceptable compromise&lt;/strong&gt;. As for the HTTPS solutions, while I could have implemented better systems (JWT or other types of authentication), I found it &lt;strong&gt;out of scope&lt;/strong&gt; for my project and wanted to use something &lt;strong&gt;readily available&lt;/strong&gt;, such as IAM or SSL certificates.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;In light of the performance measured, I chose the compromise that seemed most acceptable to me. This doesn't mean it will work for everyone. I would always recommend considering your specific use case and determining what would work best for your situation.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For a visitor to physically take away a token of their participation, the project needed to incorporate a mechanism to create a &lt;strong&gt;physical output&lt;/strong&gt;. This requirement introduced a printer into the mix. More specifically, a printer that could generate QR codes linked to the newly planted trees. To orchestrate this, I decided to leverage AWS IoT again, by registering a &lt;strong&gt;label printer&lt;/strong&gt; as a device and using the AWS IoT Jobs service to send print requests as needed. Thus, the final step of our Lambda function involves the creation of an &lt;a href="https://docs.aws.amazon.com/iot/latest/developerguide/iot-jobs.html"&gt;&lt;strong&gt;AWS IoT Job&lt;/strong&gt;&lt;/a&gt; that instructs the printer to print the respective QR code.&lt;/p&gt;
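&lt;p&gt;To make that last step concrete, here is a hedged Python sketch of how the print request could be assembled as an AWS IoT job. Note that IoT Jobs does not impose a schema on the job document, so the &lt;code&gt;action&lt;/code&gt;/&lt;code&gt;url&lt;/code&gt; keys below are an &lt;em&gt;assumption&lt;/em&gt;: the printer-side code defines what it parses. In the real Lambda this dict would be passed to boto3's &lt;code&gt;iot.create_job&lt;/code&gt;:&lt;/p&gt;

```python
import json
import uuid


def build_print_job(qr_png_url: str, printer_thing_arn: str) -> dict:
    """Assemble the arguments for a hypothetical iot.create_job call.

    The job document keys ("action", "url") are an assumption: AWS IoT
    Jobs accepts any JSON document, and the printer firmware decides
    how to interpret it.
    """
    return {
        "jobId": f"print-qr-{uuid.uuid4()}",
        "targets": [printer_thing_arn],
        "document": json.dumps({"action": "print_qr", "url": qr_png_url}),
    }
```

&lt;p&gt;A fresh &lt;code&gt;jobId&lt;/code&gt; per press keeps every print request independent, so a second visitor never collides with a job that is still in flight.&lt;/p&gt;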

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--olNDKf16--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lrq6z8ghv49dculqqndw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--olNDKf16--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lrq6z8ghv49dculqqndw.jpg" alt="" width="800" height="865"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Green Code Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Unrolling the AWS infrastructure
&lt;/h3&gt;

&lt;p&gt;To manage the AWS infrastructure, I opted for &lt;a href="https://aws.amazon.com/cdk/"&gt;AWS Cloud Development Kit (AWS CDK)&lt;/a&gt; in Python. You can find the &lt;a href="https://github.com/theonlymonica/aws-iot-tree-button"&gt;complete solution on my Github repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I started by creating an Internet of Things (IoT) &lt;strong&gt;thing&lt;/strong&gt;, which is a representation of a specific device or logical entity. In this case, it represents the button in my system.&lt;/p&gt;

&lt;p&gt;Next, I generated a certificate signing request (CSR) for my IoT thing, which is used to request a device &lt;strong&gt;certificate&lt;/strong&gt;. This certificate allows the device to connect to AWS IoT. To manage permissions, I created an IoT policy and attached it to my IoT thing.&lt;/p&gt;

&lt;p&gt;After creating the virtual resource on AWS, I can &lt;strong&gt;download&lt;/strong&gt; the SSL certificate that AWS generated through the CSR I provided.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Diving into the Device Code: AWS IoT Connectivity with ESP8266&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Next, we move on to the C++ code written for the &lt;strong&gt;ESP8266&lt;/strong&gt; board. I started by creating a &lt;code&gt;Secrets.h&lt;/code&gt; file where I inserted the certificate we just downloaded, the corresponding private key (created by CDK), and AWS's root CA (&lt;a href="https://www.amazontrust.com/repository/AmazonRootCA1.pem"&gt;download it here&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;pgmspace.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cp"&gt;#define SECRET
&lt;/span&gt;
&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;WIFI_SSID&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"XXXXXXXXX"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;WIFI_PASSWORD&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"YYYYYYYYY"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cp"&gt;#define THINGNAME "button"
&lt;/span&gt;
&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;MQTT_HOST&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"abcdefghijkl-ats.iot.eu-west-1.amazonaws.com"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Amazon Root CA 1&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;cacert&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;PROGMEM&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;R"EOF(
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
)EOF"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Device Certificate&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;client_cert&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;PROGMEM&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;R"KEY(
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
)KEY"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Device Private Key&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;privkey&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;PROGMEM&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;R"KEY(
-----BEGIN PRIVATE KEY-----
...
-----END PRIVATE KEY-----
)KEY"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;MQTT_HOST&lt;/code&gt; value can be retrieved by executing &lt;code&gt;aws iot describe-endpoint --endpoint-type iot:Data-ATS&lt;/code&gt; in the AWS CLI.&lt;/p&gt;

&lt;p&gt;Several key functions that power the IoT button application are included in the &lt;code&gt;.ino&lt;/code&gt; file that I flash on my device:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The setup() Function&lt;/strong&gt;: this function is called once at the beginning when the device is powered up:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;Serial&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;begin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;115200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="n"&gt;pinMode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BUTTON_PIN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;INPUT_PULLUP&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="n"&gt;connectAWS&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;The function begins by setting up the serial communication for debugging purposes and configuring the &lt;strong&gt;button's input pin&lt;/strong&gt; (corresponding to the physical pin on the device where the button is wired). Then, it invokes the &lt;code&gt;connectAWS()&lt;/code&gt; function.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The connectAWS() Function&lt;/strong&gt;: this function is responsible for establishing a connection to the AWS IoT Core service:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;connectAWS&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;WiFi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WIFI_STA&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="n"&gt;WiFi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;begin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WIFI_SSID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WIFI_PASSWORD&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;//... Connecting to WiFi ...&lt;/span&gt;
  &lt;span class="n"&gt;NTPConnect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setTrustAnchors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;cert&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setClientRSACert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;client_crt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MQTT_HOST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8883&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setCallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messageReceived&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="n"&gt;reconnectAWS&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;This function connects the ESP8266 to the WiFi network, synchronizes the device's clock via NTP, &lt;strong&gt;sets the trust anchors and the client certificate/key&lt;/strong&gt; on the secure WiFi client &lt;code&gt;net&lt;/code&gt;, and configures the MQTT server host and port on the MQTT client &lt;code&gt;client&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The reconnectAWS() Function&lt;/strong&gt;: as the name suggests, this function establishes (and, when needed, re-establishes) the connection to the AWS IoT Core service:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;reconnectAWS&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connected&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;THINGNAME&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subscribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AWS_IOT_SUBSCRIBE_TOPIC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;This function keeps trying to reconnect to the AWS IoT Core service as long as the client is not connected. If the connection is successful, it subscribes to the MQTT topic defined by &lt;code&gt;AWS_IOT_SUBSCRIBE_TOPIC&lt;/code&gt; (this serves as the first "connection" with the platform, even if we are not waiting for any messages). If the connection is not successful, the function waits for 5 seconds before retrying the connection.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The loop() Function&lt;/strong&gt;: The &lt;code&gt;loop()&lt;/code&gt; function runs in a loop after the &lt;code&gt;setup()&lt;/code&gt; function completes:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connected&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; 
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;reconnectAWS&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="c1"&gt;// ... button reading and debouncing code ...&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;button_state&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;LOW&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;message_sent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;publishMessage&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="n"&gt;message_sent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;message_sent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;In this function, I first check if the device is still connected to AWS IoT Core. If not, it tries to reconnect using the &lt;code&gt;reconnectAWS()&lt;/code&gt; function. After that, it checks the button's state. If the button is &lt;strong&gt;pressed&lt;/strong&gt; (indicated by a state of &lt;strong&gt;LOW&lt;/strong&gt;), a message is published to AWS IoT Core if it hasn't been sent already. This message sending is guarded by the &lt;code&gt;message_sent&lt;/code&gt; variable to ensure that only &lt;strong&gt;one message is sent per button press&lt;/strong&gt;. After the button is released, &lt;code&gt;message_sent&lt;/code&gt; is reset to false, enabling the next button press to send a message. Lastly, &lt;code&gt;client.loop()&lt;/code&gt; is called to allow the MQTT client to &lt;strong&gt;maintain&lt;/strong&gt; its connection to the server.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
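&lt;p&gt;The one-message-per-press guard is easy to model in isolation. Here is a minimal Python sketch (not the firmware itself; the function name and the 0/1 encoding of LOW/HIGH are illustrative) of the same edge-detection logic:&lt;/p&gt;

```python
# Sketch of the "one message per press" guard from loop(),
# modeled as a pure function over a sequence of button readings.
# LOW (pressed) is represented as 0, HIGH (released) as 1.

def count_publishes(button_states):
    """Return how many messages would be published for the readings."""
    message_sent = False
    published = 0
    for state in button_states:
        if state == 0:          # button pressed (LOW)
            if not message_sent:
                published += 1  # publishMessage() would fire here
                message_sent = True
        else:                   # button released: re-arm the guard
            message_sent = False
    return published

# A long press publishes once; releasing and pressing again publishes again.
print(count_publishes([1, 0, 0, 0, 1, 0, 1]))  # 2
```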

&lt;h3&gt;
  
  
  Lambda in the middle
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;workhorse&lt;/strong&gt; responsible for executing my project's logic is an &lt;strong&gt;AWS Lambda function&lt;/strong&gt; that sends a request to the &lt;a href="https://documenter.getpostman.com/view/6643991/S17m1X5P#4d57b1af-95c4-4b62-9644-a719d49d32ce"&gt;Tree-nation API to plant a tree&lt;/a&gt;. The request includes the Tree-nation &lt;code&gt;planter_id&lt;/code&gt;, the ID associated with the Tree-nation &lt;strong&gt;account&lt;/strong&gt; (provided by Tree-nation's support team); together with the token obtained from Tree-nation and stored as an SSM parameter, it is used to authenticate API requests. The request also includes the selected &lt;code&gt;species_id&lt;/code&gt;: I picked several &lt;strong&gt;tree species&lt;/strong&gt; from the Tree-nation catalogue, and on each operation one species ID is &lt;strong&gt;randomly&lt;/strong&gt; selected from that list. If the request is not successful, the function retries after a brief wait.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="s"&gt;"recipients"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="s"&gt;"internal_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;imageid&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="s"&gt;"planter_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;planter_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"species_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"quantity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'Content-Type'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'application/json'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'Authorization'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'Bearer '&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"POST"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
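&lt;p&gt;The retry behaviour mentioned above can be sketched as a small wrapper. This is a conceptual Python sketch, not the exact Lambda code; &lt;code&gt;send_request&lt;/code&gt;, the attempt count and the wait time are assumptions:&lt;/p&gt;

```python
import time

def post_with_retry(send_request, max_attempts=3, wait_seconds=2):
    """Call send_request() until it reports success, waiting briefly
    between attempts. send_request must return an object with a
    status_code attribute, like a requests.Response."""
    response = None
    for _ in range(max_attempts):
        response = send_request()
        if response.status_code == 200:
            return response
        time.sleep(wait_seconds)  # brief wait before the next attempt
    return response  # last failed response after exhausting attempts
```

In the real function, the wrapped call is the `requests.request("POST", ...)` shown above.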



&lt;p&gt;Upon receiving a successful response from Tree-nation, the function generates a &lt;strong&gt;QR code&lt;/strong&gt; that links to the newly planted tree's page on the Tree-nation website.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;collect_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="s"&gt;'trees'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s"&gt;'collect_url'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;input_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;collect_url&lt;/span&gt;
&lt;span class="n"&gt;qr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;qrcode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QRCode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;box_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;border&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;qr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;qr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;qr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;make_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fill&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'black'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;back_color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'white'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generated QR code is then saved as an &lt;strong&gt;image&lt;/strong&gt; in an Amazon S3 bucket. Additionally, the function stores the image's S3 URL in a DynamoDB table for future reference.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;imageid&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;".png"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;put_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Item&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;'ID'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;imageid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;'treenation_id'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;treenation_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;'payment_id'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;payment_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;'timestamp'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;'URL'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"https://"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;".s3.amazonaws.com/"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;imageid&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;".png"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
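&lt;p&gt;Note that the &lt;code&gt;URL&lt;/code&gt; attribute is assembled by plain string concatenation into the virtual-hosted-style S3 object URL (the bucket name below is a made-up example; the URL only resolves publicly if the bucket policy allows it):&lt;/p&gt;

```python
def object_url(bucket, imageid):
    # Virtual-hosted-style S3 URL, as stored in the DynamoDB item
    return "https://" + bucket + ".s3.amazonaws.com/" + imageid + ".png"

print(object_url("my-qrcode-bucket", "abc123"))
# https://my-qrcode-bucket.s3.amazonaws.com/abc123.png
```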



&lt;p&gt;Lastly, the function creates an &lt;strong&gt;AWS IoT Job&lt;/strong&gt; that sends a command to the registered printer device. For this purpose, a &lt;a href="https://docs.aws.amazon.com/iot/latest/developerguide/job-templates.html"&gt;&lt;strong&gt;job template&lt;/strong&gt;&lt;/a&gt; provided by AWS is used, which runs a command — in this case, a shell script hosted on the device with a parameter value corresponding to the S3 URL of the image to print.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;iot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_job&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;jobId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'RunCommand-'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;imageid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;targets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;jobTemplateArn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'arn:aws:iot:'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;'::jobtemplate/AWS-Run-Command:1.0'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;documentParameters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;'command'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"/opt/print.sh,s3://"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"/"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;imageid&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;".png"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
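&lt;p&gt;Note the format of the &lt;code&gt;command&lt;/code&gt; parameter: the &lt;code&gt;AWS-Run-Command&lt;/code&gt; job template packs the executable and its arguments into a single comma-separated string. On the device, this is split back into an argv-style command line, conceptually like this (bucket and image ID are made-up examples):&lt;/p&gt;

```python
# The job document's "command" value packs the script path and its
# argument into one comma-separated string; the device-side handler
# splits it back into a list before executing it.
command = "/opt/print.sh,s3://my-qrcode-bucket/abc123.png"
argv = command.split(",")
print(argv)  # ['/opt/print.sh', 's3://my-qrcode-bucket/abc123.png']
```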



&lt;h3&gt;
  
  
  Printing the QR Code automatically
&lt;/h3&gt;

&lt;p&gt;The final step in our process is the physical output: the printed QR code that we hand to our event attendee. For this task, I borrowed a &lt;strong&gt;Brother QL-500 label printer&lt;/strong&gt; from the equipment available at the company. An open-source Python library on GitHub, &lt;a href="https://github.com/pklaus/brother_ql"&gt;brother_ql&lt;/a&gt;, makes interacting with this printer incredibly straightforward from a Linux command line.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--frYlI8Yy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1sryowgrkw1a70u2pqlp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--frYlI8Yy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1sryowgrkw1a70u2pqlp.jpg" alt="" width="500" height="482"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My interaction device was an old &lt;strong&gt;Raspberry Pi 2&lt;/strong&gt; equipped with a &lt;strong&gt;Wi-Fi dongle&lt;/strong&gt;, connected to the Brother QL-500 label printer with a &lt;strong&gt;standard USB cable&lt;/strong&gt;. Since the &lt;code&gt;brother_ql&lt;/code&gt; library enables easy command-line usage, the Raspberry Pi was a perfect choice for this setup: it allowed the device to receive AWS IoT Jobs, process the commands, and consequently drive the printer to produce the QR code labels.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IPfgs1YT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hx115mhymahjvy3i5smo.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IPfgs1YT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hx115mhymahjvy3i5smo.jpg" alt="" width="500" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I used the AWS CDK again to create the necessary resources on &lt;strong&gt;AWS IoT&lt;/strong&gt;. The setup for this process was very similar to the one I performed for the button. I saved the &lt;strong&gt;SSL certificates&lt;/strong&gt; onto the Raspberry Pi to make them available for the device's connection with AWS IoT.&lt;/p&gt;

&lt;p&gt;To receive commands from AWS IoT and translate them into actions performed by the printer, the Raspberry Pi uses the &lt;a href="https://docs.aws.amazon.com/iot/latest/developerguide/iot-sdks.html"&gt;aws-iot-device-client&lt;/a&gt;, which I downloaded and installed on it. You can follow &lt;a href="https://github.com/awslabs/aws-iot-device-client"&gt;these instructions&lt;/a&gt; to do the same.&lt;/p&gt;

&lt;p&gt;This installation includes a &lt;strong&gt;systemd&lt;/strong&gt; service that makes your device start communicating with AWS IoT automatically whenever it boots up.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;configuration file&lt;/strong&gt; &lt;code&gt;/etc/.aws-iot-device-client/aws-iot-device-client.conf&lt;/code&gt; sets up the connection between the Raspberry Pi device and the AWS IoT Core, enabling the necessary functionalities for the system. It uses the SSL certificates previously saved onto the Raspberry Pi to establish a secure connection. &lt;strong&gt;Jobs are enabled as the primary functionality&lt;/strong&gt;, as that's how the printing instructions are received from the Lambda function.&lt;/p&gt;

&lt;p&gt;A sample setup can be seen in the official configuration file template for the AWS IoT Device Client, available at this &lt;a href="https://github.com/awslabs/aws-iot-device-client/blob/main/config-template.json"&gt;link&lt;/a&gt;: it is fully annotated, with each section's purpose explained in detail.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"endpoint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"abcdefghijkl-ats.iot.eu-west-1.amazonaws.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cert"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/opt/certs/greengadget_printer.public.crt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/opt/certs/greengadget_printer.private.key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"root-ca"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/opt/certs/root-CA.crt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"thing-name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"printer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"logging"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"enable-sdk-logging"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DEBUG"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"STDOUT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jobs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"handler-directory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this setup, the device is ready to securely connect to AWS IoT and receive the Jobs containing the instructions to print the QR codes.&lt;/p&gt;

&lt;p&gt;As a final detail, the &lt;code&gt;print.sh&lt;/code&gt; shell script that the Job is instructed to run is exceptionally straightforward. Its function is to set the necessary parameters for the Brother printer, &lt;strong&gt;fetch&lt;/strong&gt; the QR code image from the provided S3 URL, print it using the &lt;code&gt;brother_ql&lt;/code&gt; command line tool, and finally remove the temporary file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BROTHER_QL_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;QL-500
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BROTHER_QL_PRINTER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;file:///dev/usb/lp0
&lt;span class="nv"&gt;TMP_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/tmp/file.png
aws s3 &lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nv"&gt;$1&lt;/span&gt; &lt;span class="nv"&gt;$TMP_FILE&lt;/span&gt;
brother_ql print &lt;span class="nt"&gt;-l&lt;/span&gt; 38 &lt;span class="nv"&gt;$TMP_FILE&lt;/span&gt;
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="nv"&gt;$TMP_FILE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;BROTHER_QL_MODEL&lt;/code&gt; and &lt;code&gt;BROTHER_QL_PRINTER&lt;/code&gt; environment variables are set to specify the printer model and its connection interface respectively. Here, &lt;code&gt;file:///dev/usb/lp0&lt;/code&gt; represents a connection through the first USB printer device file in a Linux system.&lt;/p&gt;

&lt;p&gt;With this script, &lt;strong&gt;the printing process becomes fully automated&lt;/strong&gt;. Every time a new Job is created by the Lambda function, the device receives the Job, runs the script, fetches the correct QR code from S3, and prints it out. It's a seamless, hands-off process, right from the button press to the physical QR code label in hand.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;BONUS Section: The Art of DIY&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In addition to the technical details, there is an equally fun part of the project: &lt;strong&gt;DIY crafting&lt;/strong&gt;! After all, for the event, I didn't want the button and its electronic board to be simply left unprotected on a table. It needed a touch of &lt;strong&gt;design&lt;/strong&gt;, albeit delivered with a playful tone.&lt;/p&gt;

&lt;p&gt;The idea was to construct a &lt;strong&gt;small box&lt;/strong&gt; to house the electronic board, with a hole for the button to protrude through. 3D printing was the first idea that came to my mind; however, it soon dawned on me that producing a &lt;strong&gt;plastic&lt;/strong&gt; object was at odds with our goal of &lt;strong&gt;sustainable&lt;/strong&gt; gifting. It was then that I pivoted to a more eco-friendly material - &lt;strong&gt;wood&lt;/strong&gt;. This choice added a touch of warmth and charm to the final product.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zQLf7xlr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bsf3mict61g27873slm1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zQLf7xlr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bsf3mict61g27873slm1.jpg" alt="" width="500" height="685"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The box ended up with a drawer that can be opened to reveal the board:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ba29saRy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ff8h4njgpgdy3bk9eoya.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ba29saRy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ff8h4njgpgdy3bk9eoya.jpg" alt="" width="500" height="667"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All of this was painted with water-based paint:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xqSuHONZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uyveiiv2e8psr0qpxx1c.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xqSuHONZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uyveiiv2e8psr0qpxx1c.jpg" alt="" width="500" height="693"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I carved out a pocket in the wood to fit another piece of wood that symbolizes a stylized tree (one I already owned and simply repainted to match).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is the final result!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OP0EIDZ3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6p9wdbxtxqxffrbjys9v.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OP0EIDZ3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6p9wdbxtxqxffrbjys9v.jpg" alt="" width="500" height="667"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm not a crafting pro, but I had immense &lt;strong&gt;fun&lt;/strong&gt; with this DIY part of the project; it's incredibly rewarding work! Do you like it?&lt;/p&gt;

&lt;p&gt;So, I'm happy to think that our job is not just about creating high-tech applications or services - it can also be a way to promote &lt;strong&gt;sustainability&lt;/strong&gt; in creative and fun ways. I hope this journey at the crossroads of technology and the environment has inspired you as much as it did me. In the end, I transformed a simple button press into a &lt;strong&gt;real-world positive impact&lt;/strong&gt;. And let me tell you, there's nothing quite like seeing your code come to life... and plant a tree! 🌱&lt;/p&gt;

</description>
      <category>aws</category>
      <category>iot</category>
      <category>serverless</category>
      <category>greentech</category>
    </item>
    <item>
      <title>Automating the injection of CI/CD runtime information into Terraform provider</title>
      <dc:creator>Monica Colangelo</dc:creator>
      <pubDate>Fri, 31 Mar 2023 18:35:29 +0000</pubDate>
      <link>https://dev.to/aws-builders/automating-the-injection-of-cicd-runtime-information-into-terraform-code-3ph4</link>
      <guid>https://dev.to/aws-builders/automating-the-injection-of-cicd-runtime-information-into-terraform-code-3ph4</guid>
      <description>&lt;p&gt;As a DevOps engineer or software developer, you may have encountered scenarios where you must inject CI/CD &lt;strong&gt;runtime information into your Terraform provider code&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This information could be anything from &lt;strong&gt;environment-specific variables&lt;/strong&gt; to runtime configuration values &lt;strong&gt;only available during the CI/CD process&lt;/strong&gt;. However, provider usage is evaluated very early on in the Terraform run, before we have enough context to do variable interpolations, so you can't use variables there (like you can normally do with standard resources and &lt;a href="https://developer.hashicorp.com/terraform/language/values/variables#environment-variables"&gt;environment variables&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;I stumbled upon a real-world use case dealing with some Terraform code to create &lt;strong&gt;AWS&lt;/strong&gt; resources, where I need to add information about the role ARN to be assumed, and this information &lt;strong&gt;cannot be statically inserted&lt;/strong&gt; in the code, because this code needs to be executed with different roles depending on some condition and/or constraints.&lt;/p&gt;

&lt;p&gt;Another use case is to add default tags to all providers, such as the &lt;strong&gt;build number&lt;/strong&gt;, to ensure consistency across all created AWS resources (and maybe you can't be sure that a &lt;code&gt;default_tags&lt;/code&gt; entry is present in every provider).&lt;/p&gt;

&lt;p&gt;To automate the process of injecting CI/CD runtime information into our Terraform providers, we'll introduce the tool &lt;a href="https://github.com/minamijoyo/hcledit"&gt;hcledit&lt;/a&gt;. With &lt;code&gt;hcledit&lt;/code&gt; and a bit of extra shell manipulation, we can insert data into the Terraform code, using &lt;code&gt;grep&lt;/code&gt; with a regular expression to find the correct place to add it.&lt;/p&gt;

&lt;p&gt;How did I do it? Let's see the code! Here's my Bash function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;initialize_provider&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="c"&gt;# allows the last command in a pipeline to be executed in the current shell environment, rather than a subshell&lt;/span&gt;
  &lt;span class="nb"&gt;shopt&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; lastpipe
  &lt;span class="c"&gt;# this is a simple regex to match every provider&lt;/span&gt;
  &lt;span class="nv"&gt;regex&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'provider[[:blank:]]\+"\([[:alnum:]_-]\+\)"[[:blank:]]*{'&lt;/span&gt;

  &lt;span class="c"&gt;# Search for files in the current directory that contain the regular expression&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;file &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-rl&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$regex&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; .&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Processing file: &lt;/span&gt;&lt;span class="nv"&gt;$file&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
      &lt;span class="c"&gt;# Create temporary file with the same name as the original file&lt;/span&gt;
      &lt;span class="nv"&gt;temp_file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;file&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.tmp"&lt;/span&gt;
      &lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$file&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$temp_file&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
      &lt;span class="c"&gt;# declare an array variable&lt;/span&gt;
      &lt;span class="nb"&gt;declare&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; arr

      &lt;span class="c"&gt;# Modify file contents and write to temporary file&lt;/span&gt;
      &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$regex&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$file&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nv"&gt;IFS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; line&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
          &lt;span class="c"&gt;# Extract the provider name, which is the second word without the quotes and braces&lt;/span&gt;
          &lt;span class="nv"&gt;pn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$line&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s2"&gt;"s/&lt;/span&gt;&lt;span class="nv"&gt;$regex&lt;/span&gt;&lt;span class="s2"&gt;/ &lt;/span&gt;&lt;span class="se"&gt;\1&lt;/span&gt;&lt;span class="s2"&gt;/g"&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'"{}'&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;' '&lt;/span&gt; &lt;span class="nt"&gt;-f2&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
          &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Found provider: &lt;/span&gt;&lt;span class="nv"&gt;$pn&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
          &lt;span class="c"&gt;#add the provider name to the array&lt;/span&gt;
          arr+&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$pn&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;done&lt;/span&gt;

      &lt;span class="c"&gt;# Iterate over the array&lt;/span&gt;
      &lt;span class="k"&gt;for &lt;/span&gt;provider_name &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
          &lt;span class="c"&gt;# Add the assume_role block to the provider&lt;/span&gt;
          hcledit block append provider.&lt;span class="nv"&gt;$provider_name&lt;/span&gt; assume_role &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$temp_file&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="nt"&gt;--newline&lt;/span&gt;
          hcledit attribute append provider.&lt;span class="nv"&gt;$provider_name&lt;/span&gt;.assume_role.role_arn &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt;$ROLE_ARN&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$temp_file&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt;
          &lt;span class="c"&gt;# Add the default_tags block to the provider&lt;/span&gt;
          hcledit block append provider.&lt;span class="nv"&gt;$provider_name&lt;/span&gt; default_tags &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$temp_file&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="nt"&gt;--newline&lt;/span&gt;
          hcledit attribute append provider.&lt;span class="nv"&gt;$provider_name&lt;/span&gt;.default_tags.tags &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt;$BUILD_ID&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$temp_file&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt;
      &lt;span class="k"&gt;done
      &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$temp_file&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;done&lt;/span&gt;

  &lt;span class="c"&gt;# Move temporary files to original file names&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;temp_file &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt;.tmp&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
      &lt;/span&gt;&lt;span class="nb"&gt;mv&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$temp_file&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;temp_file&lt;/span&gt;&lt;span class="p"&gt;%.tmp&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;done&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's clarify what it does:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The first line enables the &lt;code&gt;lastpipe&lt;/code&gt; shell option, which allows the last command in a pipeline to run in the current shell environment rather than in a subshell. Without it, the provider names collected inside the &lt;code&gt;while&lt;/code&gt; loop would be lost when the pipeline ends. Note that &lt;code&gt;lastpipe&lt;/code&gt; only takes effect when job control is off, so it works best when this function is called from another process (e.g. your CI/CD pipeline).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;regex&lt;/code&gt; is a regular expression that matches &lt;strong&gt;every&lt;/strong&gt; provider in the Terraform code. I need to match multiple providers because you can have more than one, for example when you deploy in different regions. &lt;strong&gt;Beware&lt;/strong&gt; that, if you use non-AWS providers, you may need to change this regex to exclude them or, more generally, to better match your needs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The script searches for files in the current directory that contain the &lt;code&gt;regex&lt;/code&gt; regular expression.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For each file found, the script creates a temporary file with the same name as the original file and copies the contents of the original file to the temporary file.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The script declares an empty array called &lt;code&gt;arr&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The script reads through the contents of the original file and extracts the &lt;strong&gt;names of all the providers&lt;/strong&gt; matching the &lt;code&gt;regex&lt;/code&gt; regular expression. Each provider name is added to the &lt;code&gt;arr&lt;/code&gt; array.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The script iterates over the &lt;code&gt;arr&lt;/code&gt; array and appends an &lt;code&gt;assume_role&lt;/code&gt; block to each provider in the Terraform code, using &lt;code&gt;hcledit&lt;/code&gt;, which is much more convenient than manipulating HCL with Bash directly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For each &lt;code&gt;assume_role&lt;/code&gt; block, the script appends a &lt;code&gt;role_arn&lt;/code&gt; attribute with the value of &lt;code&gt;$ROLE_ARN&lt;/code&gt;, using &lt;code&gt;hcledit&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Then the script uses &lt;code&gt;hcledit&lt;/code&gt; again to add a &lt;code&gt;default_tags&lt;/code&gt; block to each provider in the Terraform code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For each &lt;code&gt;default_tags&lt;/code&gt; block, the script appends a &lt;code&gt;tags&lt;/code&gt; attribute with the value of &lt;code&gt;$BUILD_ID&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finally, the script moves the temporary files to their original file names by removing the &lt;code&gt;.tmp&lt;/code&gt; extension.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
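&lt;p&gt;To see steps 2 and 6 in isolation, here is a minimal, self-contained sketch of the provider-name extraction, run against a made-up two-provider snippet (the sample input is mine, not from a real project):&lt;/p&gt;

```shell
# Same BRE the function uses to match provider blocks
regex='provider[[:blank:]]\+"\([[:alnum:]_-]\+\)"[[:blank:]]*{'

# Hypothetical sample: two provider blocks in one file
sample='provider "aws" {
  region = "eu-west-1"
}
provider "random" {
}'

# grep keeps only the matching lines; sed captures the name; tr/cut clean it up
names=$(printf '%s\n' "$sample" | grep "$regex" \
  | sed "s/$regex/ \1/g" | tr -d '"{}' | cut -d' ' -f2)
echo "$names"
```

&lt;p&gt;Running this prints &lt;code&gt;aws&lt;/code&gt; and &lt;code&gt;random&lt;/code&gt;, one per line: exactly the values that end up in the &lt;code&gt;arr&lt;/code&gt; array.&lt;/p&gt;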

&lt;p&gt;In conclusion, by automating the injection of CI/CD runtime information into your Terraform code with tools like &lt;code&gt;hcledit&lt;/code&gt; and a little scripting know-how, you can add environment-specific variables and runtime configuration values reliably, making your deployments more efficient and less error-prone.&lt;/p&gt;

&lt;p&gt;Happy automating!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>terraform</category>
      <category>bash</category>
    </item>
    <item>
      <title>Continuous Delivery for the rest of us</title>
      <dc:creator>Monica Colangelo</dc:creator>
      <pubDate>Wed, 04 Jan 2023 11:16:21 +0000</pubDate>
      <link>https://dev.to/monica_colangelo/continuous-delivery-for-the-rest-of-us-37ik</link>
      <guid>https://dev.to/monica_colangelo/continuous-delivery-for-the-rest-of-us-37ik</guid>
      <description>&lt;h2&gt;
  
  
  Getting the bigger picture
&lt;/h2&gt;

&lt;p&gt;DevOps methodologies have now taken hold; the use of &lt;strong&gt;pipelines&lt;/strong&gt; for code builds is a well-established practice adopted by any modern development team. But when we change the conversation from &lt;strong&gt;Integration&lt;/strong&gt; to &lt;strong&gt;Deployment&lt;/strong&gt;, I often find myself looking at extremely simplified examples, where newly built code is released into production with a single command at the end of the build and test stages. Which is, of course, the definition of &lt;a href="https://www.atlassian.com/continuous-delivery/principles/continuous-integration-vs-delivery-vs-deployment" rel="noopener noreferrer"&gt;Continuous Deployment&lt;/a&gt;. But in my daily work, I have learned that reality is rarely that simple.&lt;/p&gt;

&lt;p&gt;If your team is super smart and has every possible test in place, and your business structure allows you to apply Continuous Deployment directly from commit to production, first of all, congratulations! Unfortunately, this is not the case everywhere. The rest of us often need to deploy our code to different environments at different moments in time (at intervals of days, or weeks! 😰 &lt;em&gt;I know, I know...&lt;/em&gt;); there are &lt;strong&gt;release windows&lt;/strong&gt; to meet before releasing to production; there are &lt;strong&gt;acceptance tests&lt;/strong&gt; performed by other teams, so the testing environment can only be updated at agreed-upon times; and there is a variety of other constraints, especially in large and very structured companies. Still, we don't want to give up the benefits of automation.&lt;/p&gt;

&lt;p&gt;Automation, yes, I said it. But, how? When your artefacts need to be deployed in &lt;strong&gt;multiple environments,&lt;/strong&gt; you can't just repeat the same process for each environment: you need to deploy the software &lt;strong&gt;without rebuilding&lt;/strong&gt; it. This aligns with the &lt;strong&gt;&lt;em&gt;"build once, deploy anywhere"&lt;/em&gt;&lt;/strong&gt; principle, which states that once a release candidate for a software component has been created, it should not be altered in any way before it is deployed to production. And if you find yourself in the situation that I just described, with different timing for each environment, you can't just put your deployments in line and execute them one after another in the same pipeline execution.&lt;/p&gt;

&lt;p&gt;I have come across various articles on the Web about &lt;strong&gt;GitOps&lt;/strong&gt;, and while they can be useful, they focus on specific, isolated aspects of configuration, or they often oversimplify, leaving me feeling like I'm missing the &lt;strong&gt;bigger picture&lt;/strong&gt;: which is, of course, the &lt;strong&gt;process&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this article, I want to illustrate an approach that I have successfully applied in several projects, &lt;strong&gt;combining&lt;/strong&gt; a classic Continuous Integration &lt;strong&gt;pipeline&lt;/strong&gt; with the Continuous Deployment practices enabled by &lt;strong&gt;GitOps&lt;/strong&gt;. The workflow goes directly from committing the application code to Continuous Deployment in a development environment, while also managing &lt;strong&gt;multiple Kubernetes environments&lt;/strong&gt; where you can release your code at different moments in time. Or, as I like to call it: &lt;strong&gt;Continuous Delivery for the rest of us&lt;/strong&gt; 🤓&lt;/p&gt;

&lt;h2&gt;
  
  
  A brief definition of GitOps
&lt;/h2&gt;

&lt;p&gt;If you have never heard of GitOps before: it is a way of implementing Continuous Deployment for cloud-native applications.&lt;/p&gt;

&lt;p&gt;The term "&lt;strong&gt;GitOps&lt;/strong&gt;" refers to the use of &lt;a href="https://git-scm.com/" rel="noopener noreferrer"&gt;Git&lt;/a&gt; as a single source of truth for &lt;strong&gt;declarative infrastructure&lt;/strong&gt; and application code in a continuous deployment workflow; it reflects the central role that Git plays in this approach to Continuous Deployment. By using Git as the foundation for their deployment process, teams can leverage the power and flexibility of Git to manage and deploy their applications and infrastructure in a reliable and scalable way.&lt;/p&gt;

&lt;p&gt;In a GitOps &lt;strong&gt;workflow&lt;/strong&gt;, developers commit code changes to a Git repository, and &lt;strong&gt;automated processes pull those changes&lt;/strong&gt; and deploy them in a reliable and repeatable manner. This approach enables teams to deploy applications and infrastructure changes with confidence, as the entire deployment process is &lt;strong&gt;version controlled and auditable&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In the following sections I will explain the various steps in detail, starting from the end: it may seem counter-intuitive, but I find it more useful to start from the final goal and work backwards to "how to get there".&lt;/p&gt;

&lt;h2&gt;
  
  
  Argo CD: the GitOps tool
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://argo-cd.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;Argo CD&lt;/a&gt; is a Continuous Deployment tool for Kubernetes. It helps developers and operations teams &lt;strong&gt;automate&lt;/strong&gt; the deployment of applications to Kubernetes clusters.&lt;/p&gt;

&lt;p&gt;Here's how it works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;You define your application's desired state in a &lt;strong&gt;declarative configuration file&lt;/strong&gt;, usually written in the Kubernetes resource manifest format (e.g., YAML).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You commit this configuration file to a &lt;strong&gt;Git repository&lt;/strong&gt;, which serves as the source of truth for your application's desired state.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Argo CD &lt;strong&gt;monitors&lt;/strong&gt; the Git repository for changes to the configuration file. When it detects a change, it &lt;strong&gt;synchronizes&lt;/strong&gt; the desired state of the application with the actual state of the application in the cluster.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the actual state of the application differs from the desired state, Argo CD will apply the necessary &lt;strong&gt;changes&lt;/strong&gt; to bring the application back into alignment. This includes creating, updating, or deleting resources in the cluster as needed.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
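&lt;p&gt;The heart of steps 3 and 4 can be boiled down to a toy comparison (a conceptual sketch only, nothing like Argo CD's actual implementation; the states are illustrative):&lt;/p&gt;

```shell
# Toy model of reconciliation: desired state comes from Git,
# live state from the cluster; any difference means OutOfSync.
desired='replicas: 2'   # what the manifest in Git declares
live='replicas: 1'      # what is currently running

if [ "$desired" = "$live" ]; then
  status='Synced'
else
  status='OutOfSync'    # Argo CD would now apply the manifest to converge
fi
echo "$status"
```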

&lt;p&gt;I will not cover the details of the Argo CD installation or configuration procedure here; you can easily find &lt;a href="https://www.eksworkshop.com/intermediate/290_argocd/install/" rel="noopener noreferrer"&gt;many&lt;/a&gt; &lt;a href="https://argo-cd.readthedocs.io/en/stable/operator-manual/installation/" rel="noopener noreferrer"&gt;guides&lt;/a&gt; on this.&lt;/p&gt;

&lt;p&gt;In this discussion, I will use a configuration consisting of a single cluster, with Argo CD installed in a dedicated namespace, and three environments, &lt;em&gt;develop&lt;/em&gt;, &lt;em&gt;staging&lt;/em&gt; and &lt;em&gt;production&lt;/em&gt;, each installed in its own namespace. Depending on your level of experience and the needs of your use case, your topology may vary.&lt;/p&gt;

&lt;p&gt;Argo CD is itself configured via a dedicated Git repository and a pipeline that performs configuration synchronization. These configurations, specifically, include three very important pieces of information:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;the &lt;strong&gt;repository&lt;/strong&gt; containing the Kubernetes configurations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the &lt;strong&gt;directory&lt;/strong&gt; within that repository (we'll learn more in the Kustomize chapter)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the repository &lt;strong&gt;branch or tag&lt;/strong&gt; to use, which will be the only information to be updated when a release needs to be delivered in an environment (we'll learn more in the Release Captain chapter)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So for example, a configuration for &lt;em&gt;develop&lt;/em&gt; environment (my environment is an "application" in Argo CD terms) can be like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argoproj.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Application&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;develop&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argocd&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;develop&lt;/span&gt;
  &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;repoURL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/mysupercoolk8srepository&lt;/span&gt;
    &lt;span class="na"&gt;targetRevision&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;develop&lt;/span&gt;
  &lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://kubernetes.default.svc&lt;/span&gt;
    &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;develop&lt;/span&gt;
  &lt;span class="na"&gt;syncPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;automated&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;prune&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;selfHeal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;syncOptions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;PrunePropagationPolicy=foreground&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;PruneLast=true&lt;/span&gt;
    &lt;span class="na"&gt;retry&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;backoff&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
        &lt;span class="na"&gt;factor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
        &lt;span class="na"&gt;maxDuration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3m&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here you can see the three pieces of information:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;...&lt;/span&gt;    
    &lt;span class="na"&gt;repoURL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/mysupercoolk8srepository&lt;/span&gt;
    &lt;span class="na"&gt;targetRevision&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;overlays/develop&lt;/span&gt;
&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;targetRevision&lt;/code&gt; is, in my case, the &lt;em&gt;main&lt;/em&gt; branch of the Kubernetes YAML files repository, where the integration pipeline of each microservice pushes its updated container image tag (which corresponds to the commit hash).&lt;/p&gt;

&lt;p&gt;In my topology, the staging and production environments have the very same configuration, except for the &lt;code&gt;targetRevision&lt;/code&gt; and &lt;code&gt;path&lt;/code&gt; properties. These configurations are saved, as I mentioned, in a dedicated repository, with a corresponding very simple pipeline that runs &lt;code&gt;kubectl apply -f&lt;/code&gt; against the cluster whenever a commit is made.&lt;/p&gt;

&lt;p&gt;As for the &lt;em&gt;develop&lt;/em&gt; environment, its configuration on Argo CD always points to the default branch (&lt;em&gt;main&lt;/em&gt; in this case) used by the integration pipeline to update container image tags. In this way, this environment follows a continuous deployment approach: a new image version is deployed as soon as it is available.&lt;/p&gt;

&lt;p&gt;For the other environments, however, I explained above why I've decided to take a more conservative approach and make releases in a controlled manner, applying continuous delivery. Therefore, the configuration of these environments on Argo CD points to a &lt;strong&gt;specific tag&lt;/strong&gt; applied to the commit on the default branch when an application version, as a whole, is considered ready to be promoted to the next environment (as we'll see in the Release Captain chapter).&lt;/p&gt;

&lt;p&gt;So for example, the &lt;em&gt;staging&lt;/em&gt; environment is configured as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;...&lt;/span&gt;    
    &lt;span class="na"&gt;repoURL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/mysupercoolk8srepository&lt;/span&gt;
    &lt;span class="na"&gt;targetRevision&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;release/2.7.0&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;overlays/staging&lt;/span&gt;
&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and similarly for the &lt;em&gt;production&lt;/em&gt; environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;...&lt;/span&gt;    
    &lt;span class="na"&gt;repoURL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/mysupercoolk8srepository&lt;/span&gt;
    &lt;span class="na"&gt;targetRevision&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;release/2.6.3&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;overlays/production&lt;/span&gt;
&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Argo CD &lt;a href="https://argo-cd.readthedocs.io/en/stable/user-guide/tracking_strategies/#git" rel="noopener noreferrer"&gt;supports three kinds of reference&lt;/a&gt; as &lt;code&gt;targetRevision&lt;/code&gt;: a &lt;strong&gt;tag&lt;/strong&gt;, a &lt;strong&gt;branch&lt;/strong&gt;, or a &lt;strong&gt;commit&lt;/strong&gt;. Pinning a commit hash gives the strongest assurance of &lt;strong&gt;immutability&lt;/strong&gt;; however, it makes releases harder to &lt;strong&gt;track&lt;/strong&gt;, which a release branch or tag makes easier. Neither of those is immutable, though, so it is important for the team to be &lt;strong&gt;disciplined&lt;/strong&gt; and respect the process, i.e., never move or rewrite tags and release branches. At the end of the day, the choice is up to you and what works better for your team.&lt;/p&gt;
&lt;/blockquote&gt;
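&lt;p&gt;Promoting a release to an environment then boils down to changing &lt;code&gt;targetRevision&lt;/code&gt; in that environment's Application manifest and committing the change. A minimal sketch with &lt;code&gt;sed&lt;/code&gt; (the manifest line and version numbers are illustrative, not from a real repository):&lt;/p&gt;

```shell
# Hypothetical promotion step: point staging at the new release ref.
new_release='release/2.7.0'
manifest='    targetRevision: release/2.6.3'

# Rewrite the targetRevision line; in a real pipeline this would be a
# sed -i on the Application file, followed by git commit and push.
updated=$(printf '%s\n' "$manifest" \
  | sed "s|targetRevision: .*|targetRevision: $new_release|")
echo "$updated"
```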

&lt;h2&gt;
  
  
  Kustomize: managing multiple environments without duplicating code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://kustomize.io/" rel="noopener noreferrer"&gt;Kustomize&lt;/a&gt; is a tool that allows developers to customize and deploy their Kubernetes applications, creating &lt;strong&gt;customized versions of their applications&lt;/strong&gt; by modifying and extending existing resources, without having to write new YAML files from scratch. This can be useful in a variety of scenarios, such as creating &lt;strong&gt;different environments&lt;/strong&gt; (e.g. staging, production), or deploying the same application to different clusters with &lt;strong&gt;slight variations&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To use Kustomize, you create a &lt;strong&gt;base&lt;/strong&gt; directory containing your Kubernetes resources and then create one or more &lt;strong&gt;overlays&lt;/strong&gt; that contain the customizations you want to apply. Kustomize then &lt;strong&gt;merges&lt;/strong&gt; the overlays with the base resources to generate the final, customized resources that can be deployed to your cluster. You can find more information about Kustomize logic and syntax &lt;a href="https://kubectl.docs.kubernetes.io/guides/introduction/kustomize/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;How is Kustomize configured in my use case? My filesystem structure for the Kubernetes files repository using Kustomize is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;.&lt;/span&gt;
|-- base
|   |-- microservice1
|   |   |-- deployment.yaml
|   |   |-- kustomization.yaml
|   |   &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--&lt;/span&gt; service.yaml
|   &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--&lt;/span&gt; microservice2
|       |-- deployment.yaml
|       |-- kustomization.yaml
|       &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--&lt;/span&gt; service.yaml
&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--&lt;/span&gt; overlays
    |-- develop
    |   |-- kustomization.yaml
    |   |-- microservice1
    |   |   &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--&lt;/span&gt; deployment.yaml
    |   &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--&lt;/span&gt; microservice2
    |       &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--&lt;/span&gt; deployment.yaml
    |-- production
    |   |-- kustomization.yaml
    |   |-- microservice1
    |   |   &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--&lt;/span&gt; deployment.yaml
    |   &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--&lt;/span&gt; microservice2
    |       &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--&lt;/span&gt; deployment.yaml
    &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--&lt;/span&gt; staging
        |-- kustomization.yaml
        |-- microservice1
        |   &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--&lt;/span&gt; deployment.yaml
        &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--&lt;/span&gt; microservice2
            &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--&lt;/span&gt; deployment.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
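&lt;p&gt;If you want to reproduce this skeleton locally, a few lines of shell are enough (microservice and environment names as in the example above):&lt;/p&gt;

```shell
# Scaffold the base/overlays layout shown above
for ms in microservice1 microservice2; do
  mkdir -p "base/$ms"
  touch "base/$ms/deployment.yaml" "base/$ms/kustomization.yaml" "base/$ms/service.yaml"
  for env in develop staging production; do
    mkdir -p "overlays/$env/$ms"
    touch "overlays/$env/kustomization.yaml" "overlays/$env/$ms/deployment.yaml"
  done
done
ls overlays
```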



&lt;p&gt;Let's say that I have a microservice2 &lt;code&gt;deployment.yaml&lt;/code&gt; like this (some properties are omitted for brevity):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;cat base/microservice2/deployment.yaml&lt;/span&gt; 
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-super-ms&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-super-ms&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-super-ms&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-super-ms&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-super-ms&lt;/span&gt;
          &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can notice that I did not put any container &lt;strong&gt;image tag&lt;/strong&gt; here. That's because it is a piece of information that will come &lt;strong&gt;from the build pipeline&lt;/strong&gt; when an image is actually built.&lt;/p&gt;

&lt;p&gt;To better understand this concept, let's see the corresponding &lt;code&gt;kustomization.yaml&lt;/code&gt; file in the base directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;cat base/microservice2/kustomization.yaml&lt;/span&gt; 
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kustomize.config.k8s.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Kustomization&lt;/span&gt;
&lt;span class="na"&gt;commonLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-super-ms&lt;/span&gt;
&lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;service.yaml&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;deployment.yaml&lt;/span&gt;
&lt;span class="na"&gt;images&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-super-ms&lt;/span&gt;
  &lt;span class="na"&gt;newTag&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;6ce74723&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What are those last three lines? Well, those are the &lt;strong&gt;changes made by the build pipeline&lt;/strong&gt; when a new container image is created (as we'll see in the next chapter). In the initial version of this file, when I created it, I didn't include them, but merely indicated which files inside the directory to consider. So &lt;strong&gt;this change is exactly what the build pipeline does&lt;/strong&gt; as its last action.&lt;/p&gt;
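&lt;p&gt;For clarity, this is what the initial version of the file looked like before any build ran, with just the resource list and no &lt;code&gt;images&lt;/code&gt; block:&lt;/p&gt;

```yaml
# base/microservice2/kustomization.yaml - initial version, before any build
# (the images block is appended later by the build pipeline)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
commonLabels:
  app: my-super-ms
resources:
- service.yaml
- deployment.yaml
```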

&lt;p&gt;What about the &lt;strong&gt;overlays&lt;/strong&gt;? As I said before, a Kustomize overlay is a directory that contains customizations that you want to apply to your Kubernetes resources. It is called an "overlay" because it is layered on top of a base directory containing your base resources.&lt;/p&gt;

&lt;p&gt;An overlay directory typically contains one or more Kubernetes resource files, as well as a Kustomization file. The resource files in the overlay directory contain the customizations that you want to apply to your resources, such as &lt;strong&gt;changing the number of replicas&lt;/strong&gt; for a deployment, adding a label to a pod, or maybe having &lt;strong&gt;different ConfigMap contents&lt;/strong&gt; because some parameters differ among environments. The Kustomization file is a configuration file that specifies how the customizations in the overlay should be applied to the base resources.&lt;/p&gt;

&lt;p&gt;Let's see a simple example: as seen before I've set a &lt;code&gt;replicas: 1&lt;/code&gt; spec in my &lt;code&gt;deployment.yaml&lt;/code&gt;, but let's say I want to change this property in the &lt;em&gt;staging&lt;/em&gt; environment to test HA.&lt;/p&gt;

&lt;p&gt;My overlay configuration will be like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;cat overlays/staging/microservice2/deployment.yaml&lt;/span&gt; 
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-super-ms&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-super-ms&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it, this is my &lt;strong&gt;complete&lt;/strong&gt; file: I don't need to replicate my entire Deployment. I just provide different values for the parameters I'd like to change.&lt;/p&gt;
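&lt;p&gt;To check the result, you can render the overlay with &lt;code&gt;kustomize build overlays/staging&lt;/code&gt;; the merged Deployment would look roughly like this (a sketch based on the base manifests shown earlier, not verbatim tool output):&lt;/p&gt;

```yaml
# Sketch of the merged output: the base deployment.yaml plus the staging
# patch. Only replicas changes; everything else comes from the base.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-super-ms
  namespace: staging        # injected by the overlay kustomization
  labels:
    app: my-super-ms
spec:
  replicas: 3               # overridden by the overlay (the base had 1)
  selector:
    matchLabels:
      app: my-super-ms
  template:
    metadata:
      labels:
        app: my-super-ms
    spec:
      containers:
      - name: my-super-ms
        image: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-super-ms:6ce74723
```

&lt;p&gt;Note how the image tag comes from the &lt;code&gt;images&lt;/code&gt; block in the base kustomization, while the replica count comes from the overlay patch.&lt;/p&gt;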

&lt;p&gt;What about my overlay Kustomize file? It just needs to know which files have to be merged. In my case it looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;cat overlays/staging/kustomization.yaml&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;                
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kustomize.config.k8s.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Kustomization&lt;/span&gt;

&lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;staging&lt;/span&gt;

&lt;span class="na"&gt;bases&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;../../base/microservice1&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;../../base/microservice2&lt;/span&gt;

&lt;span class="na"&gt;patchesStrategicMerge&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;microservice1/deployment.yaml&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;microservice2/deployment.yaml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
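&lt;p&gt;A note for newer setups: recent Kustomize releases deprecate the &lt;code&gt;bases&lt;/code&gt; and &lt;code&gt;patchesStrategicMerge&lt;/code&gt; fields; an equivalent overlay written against current versions would use &lt;code&gt;resources&lt;/code&gt; and &lt;code&gt;patches&lt;/code&gt; instead:&lt;/p&gt;

```yaml
# Equivalent overlays/staging/kustomization.yaml for newer Kustomize
# versions, where bases and patchesStrategicMerge are deprecated in
# favour of resources and patches
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: staging

resources:
  - ../../base/microservice1
  - ../../base/microservice2

patches:
  - path: microservice1/deployment.yaml
  - path: microservice2/deployment.yaml
```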



&lt;blockquote&gt;
&lt;p&gt;For this YAML code, a &lt;strong&gt;pipeline&lt;/strong&gt; is perhaps not strictly necessary for the process to work, but I &lt;strong&gt;recommend&lt;/strong&gt; providing one.&lt;/p&gt;

&lt;p&gt;Specifically, a pipeline should be run every time a &lt;strong&gt;pull request&lt;/strong&gt; is opened, and it should check the code for errors and security bugs; you can use tools such as &lt;a href="https://www.checkov.io/" rel="noopener noreferrer"&gt;Checkov&lt;/a&gt; or similar.&lt;/p&gt;

&lt;p&gt;In fact, in this case, the default branch should be "armoured" and &lt;strong&gt;no one should push directly to it, except the build pipeline&lt;/strong&gt;. A developer who intends to make additions or changes to Kubernetes files should commit them to a &lt;strong&gt;temporary branch&lt;/strong&gt; (NOT a release branch) and open a pull request, triggering the pipeline execution. If the checks pass, the pull request can be accepted and the code merged, making it immediately available to the &lt;em&gt;develop&lt;/em&gt; environment (which, remember, is configured via Argo CD to stay constantly aligned with the default branch).&lt;/p&gt;

&lt;p&gt;The build pipeline, in my opinion, can instead write its changes directly to the &lt;em&gt;main&lt;/em&gt; branch, since the only detail it changes is the container image tag, and there is no point in running checks on Kubernetes files for such changes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Building bridges
&lt;/h2&gt;

&lt;p&gt;Going backwards, we have finally arrived at the starting point: the &lt;strong&gt;build pipeline&lt;/strong&gt;. As I said before, I will not explain here what a Continuous Integration pipeline is - there are plenty of examples and explanations on the Web, and I assume that, if you've made it this far, you probably already know. For our process, what matters is that this pipeline, after pushing the new container image to the registry, updates the Kubernetes file repository to communicate the new tag.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Although I used AWS in my design to illustrate this approach, the process is usable with &lt;strong&gt;any CI/CD platform&lt;/strong&gt; and wherever Kubernetes is hosted. I have used this approach in different projects, with AWS Code Suite and EKS as well as with GitLab or Bitbucket and Rancher. The technicalities don't matter; what really matters is applying a &lt;strong&gt;structured process&lt;/strong&gt;, whatever software products you choose to use and whatever constraints you happen to have.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In my example, using &lt;a href="https://docs.aws.amazon.com/codebuild/latest/userguide/getting-started.html" rel="noopener noreferrer"&gt;CodeBuild&lt;/a&gt; as executor and &lt;a href="https://docs.aws.amazon.com/codecommit/latest/userguide/welcome.html" rel="noopener noreferrer"&gt;CodeCommit&lt;/a&gt; as a repository, this last stage is run by this &lt;a href="https://docs.aws.amazon.com/codebuild/latest/userguide/build-spec-ref.html" rel="noopener noreferrer"&gt;&lt;code&gt;buildspec.yaml&lt;/code&gt;&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.2&lt;/span&gt;

&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;git-credential-helper&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yes&lt;/span&gt;

&lt;span class="na"&gt;phases&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pre_build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;commands&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;TAG=`echo $CODEBUILD_RESOLVED_SOURCE_VERSION | head -c 8`&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;REPOSITORY_URI=$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$TAG&lt;/span&gt;

  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;commands&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;cd $CODEBUILD_SRC_DIR_k8s_repo&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;cd base/$IMAGE_REPO_NAME&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;kustomize edit set image $REPOSITORY_URI&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;git config --global user.email "noreply@codebuild.codepipeline"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;git config --global user.name "CodeBuild"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;git commit -am "updated image $IMAGE_REPO_NAME with tag $TAG"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;git push origin HEAD:main&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;I want to emphasize that the update of the image tag is done in the &lt;em&gt;base&lt;/em&gt; directory of the Kubernetes repository, and &lt;strong&gt;not in the overlays&lt;/strong&gt;: the management of different versions of the application in different environments is done with release branches, as we see in the next chapter. The overlays are only meant to allow for small differences in configurations, not versions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To summarize, the whole process works as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;A developer makes &lt;strong&gt;changes&lt;/strong&gt; to an application and pushes a new version of the software to a Git repository.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A continuous integration &lt;strong&gt;pipeline&lt;/strong&gt; is triggered, which results in a new container image being saved to a &lt;strong&gt;registry&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The last step of the integration pipeline changes the Kubernetes manifests hosted in a dedicated Git repository, automatically updating the specific image with the &lt;strong&gt;newly created tag&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Argo CD constantly &lt;strong&gt;compares&lt;/strong&gt; the application state with the current state of the Kubernetes cluster. It then applies the necessary changes to the cluster configuration; Kubernetes uses its controllers to reconcile the cluster resources until the desired configuration is reached.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All this works seamlessly in the development environment. But &lt;strong&gt;how do you deliver to environments that need to be updated at different times?&lt;/strong&gt; This is where the important role of the Release Captain comes in.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Release Captain👩‍✈️: coordinating releases
&lt;/h2&gt;

&lt;p&gt;A so-called "Release Captain" is a &lt;strong&gt;role&lt;/strong&gt; within a software development team that is responsible for coordinating the release of code from &lt;em&gt;develop&lt;/em&gt; to &lt;em&gt;production&lt;/em&gt; (and every other environment in between).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In teams capable of doing Continuous Deployment directly to production, this role is played entirely by one or more pipelines that execute extensive tests, automatically open and approve merge requests, and tag commits properly.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ideally, every team member should be able to act as a Release Captain on a rotating basis. What is the task of the Release Captain within our GitOps process? It is a relevant task, but fortunately not too onerous.&lt;/p&gt;

&lt;p&gt;Let's say that, up to a certain point, development has been focused on the &lt;em&gt;develop&lt;/em&gt; environment. At some point, a release to the &lt;em&gt;staging&lt;/em&gt; environment must finally be established and scheduled. This is where the Release Captain comes in: they assign a &lt;strong&gt;release number and tag the commit&lt;/strong&gt; in the default branch of the Kubernetes file repository. Once this tag is created, the Release Captain will &lt;strong&gt;modify the configuration&lt;/strong&gt; of the &lt;em&gt;staging&lt;/em&gt; environment in the Argo CD repository, replacing the previous value of &lt;code&gt;targetRevision&lt;/code&gt; with the new tag. Once this change is pushed, the triggered pipeline will apply the configuration change directly to Argo CD, which, in turn, will &lt;strong&gt;synchronize&lt;/strong&gt; with the contents of the new tag, effectively deploying the new release.&lt;/p&gt;
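&lt;p&gt;To make the Release Captain's two actions concrete, here is a sketch in shell (the repository layout, the &lt;code&gt;staging-app.yaml&lt;/code&gt; file and the &lt;code&gt;v1.2.0&lt;/code&gt; tag are made-up examples, with throwaway local repositories standing in for the real remotes):&lt;/p&gt;

```shell
# Sketch of the Release Captain's two actions, using throwaway local
# repositories in place of the real remotes (all names are made up).
set -e
workdir=$(mktemp -d)
cd "$workdir"

# 1) Tag the current commit of the Kubernetes file repository
git init -q k8s-repo
cd k8s-repo
git -c user.email=rc@example.com -c user.name=RC commit -q --allow-empty -m "state to release"
git tag v1.2.0

# 2) Point the staging Application at the new tag in the Argo CD repository
cd "$workdir"
mkdir -p argocd-repo
printf 'spec:\n  source:\n    targetRevision: v1.1.0\n' | tee argocd-repo/staging-app.yaml
sed -i 's/targetRevision: .*/targetRevision: v1.2.0/' argocd-repo/staging-app.yaml
cat argocd-repo/staging-app.yaml
```

&lt;p&gt;In a real setup the tag and the modified file would of course be pushed, and the push on the Argo CD repository would trigger the pipeline described earlier.&lt;/p&gt;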

&lt;blockquote&gt;
&lt;p&gt;This approach treats microservice applications as a single block to be released &lt;strong&gt;all at once&lt;/strong&gt; in a given release. This may seem superfluous in many circumstances, especially if there are only a few microservices, but, in my opinion, it is important for two reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;if acceptance testing is done on a given release, you are assured of passing to production &lt;strong&gt;exactly the same versions&lt;/strong&gt; of all microservices that have been certified as inter-working. In other words, if I have certified that microservice A in version 1.2.3 works with microservice B in version 4.5.6, at the time of promotion to the next environment I need to be sure that I release exactly the same versions &lt;strong&gt;together&lt;/strong&gt;;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;one thing that is often underestimated from a development point of view, but very problematic from an operations point of view, is the &lt;strong&gt;rollback process&lt;/strong&gt;. In case of problems, rolling back to the previous version by returning &lt;code&gt;targetRevision&lt;/code&gt; to its previous value is extremely quick and safe, and saves a lot of headaches.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's it! For each release to the &lt;em&gt;staging&lt;/em&gt; environment, simply repeat this process. For releases to the &lt;em&gt;production&lt;/em&gt; environment, it is even simpler: once a release has been tested and judged suitable for deployment to production, there is no new tag to create. You reuse the tag that has already been tested, and the only action to take is to &lt;strong&gt;edit the production environment configuration file&lt;/strong&gt; in the Argo CD repository.&lt;/p&gt;

&lt;p&gt;The following drawing summarizes the entire workflow, starting from the push of application code and ending with deployment to the various environments in the Kubernetes cluster.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6cx727ochy7na4fqd9tz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6cx727ochy7na4fqd9tz.png" alt="gitops-workflow" width="800" height="474"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Several aspects of this process can be adapted to the needs of the team; however, I have found it effective in quite different situations, especially, as I mentioned, for projects within large companies that have constraints and yet do not want to give up the benefits of automation.&lt;/p&gt;

</description>
      <category>cybersecurity</category>
      <category>discuss</category>
    </item>
    <item>
      <title>A Year of Growth and Impact: Reflections on 2022 as a Woman in Tech</title>
      <dc:creator>Monica Colangelo</dc:creator>
      <pubDate>Tue, 27 Dec 2022 10:30:16 +0000</pubDate>
      <link>https://dev.to/monica_colangelo/a-year-of-growth-and-impact-reflections-on-2022-as-a-woman-in-tech-5go</link>
      <guid>https://dev.to/monica_colangelo/a-year-of-growth-and-impact-reflections-on-2022-as-a-woman-in-tech-5go</guid>
      <description>&lt;p&gt;As the year 2022 comes to a close, it's time to reflect on the accomplishments and successes of the past year. As a &lt;strong&gt;female&lt;/strong&gt; voice in the tech industry, I have made it my mission to inspire, educate, and empower others through my blog and newsletter.&lt;/p&gt;

&lt;p&gt;Throughout the year, I have written a number of &lt;strong&gt;technical articles&lt;/strong&gt; that have received a great deal of positive feedback. These articles have aimed to demystify &lt;strong&gt;complex topics&lt;/strong&gt; and make them accessible to a wider audience.&lt;/p&gt;

&lt;p&gt;In addition to sharing my knowledge through my writing, I have also launched a &lt;a href="https://letsmakecloud.beehiiv.com/subscribe" rel="noopener noreferrer"&gt;&lt;strong&gt;newsletter&lt;/strong&gt;&lt;/a&gt; to further disseminate my ideas and insights. This has allowed me to reach an even larger audience and help more people learn about the exciting world of technology.&lt;/p&gt;

&lt;p&gt;As a woman in tech, I understand the importance of &lt;strong&gt;representation&lt;/strong&gt; and &lt;strong&gt;diversity&lt;/strong&gt; in the industry. It's vital that all voices are &lt;strong&gt;heard&lt;/strong&gt; and that everyone has the &lt;strong&gt;opportunity&lt;/strong&gt; to learn and grow. That's why I strive to be a positive role model and to use my platform to &lt;strong&gt;inspire&lt;/strong&gt; and &lt;strong&gt;empower&lt;/strong&gt; others, particularly other &lt;strong&gt;women&lt;/strong&gt; and &lt;strong&gt;girls&lt;/strong&gt; interested in tech.&lt;/p&gt;

&lt;p&gt;Overall, the year 2022 has been a fulfilling and successful one, and I am grateful for the opportunity to &lt;strong&gt;share&lt;/strong&gt; my knowledge and insights with the world. I look forward to continuing this work in the coming year and beyond, and to helping make the tech industry a more &lt;strong&gt;inclusive&lt;/strong&gt; and &lt;strong&gt;equitable&lt;/strong&gt; place for all.&lt;/p&gt;

</description>
      <category>security</category>
    </item>
    <item>
      <title>4 ultimate reasons to prefer AWS CDK over Terraform</title>
      <dc:creator>Monica Colangelo</dc:creator>
      <pubDate>Mon, 05 Dec 2022 14:38:29 +0000</pubDate>
      <link>https://dev.to/monica_colangelo/4-ultimate-reasons-to-prefer-aws-cdk-over-terraform-34pf</link>
      <guid>https://dev.to/monica_colangelo/4-ultimate-reasons-to-prefer-aws-cdk-over-terraform-34pf</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;There is an Italian version of this article; if you'd like to read it &lt;a href="https://letsmake.cloud/4-motivi-fondamentali-per-preferire-aws-cdk-a-terraform"&gt;click here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Over the past few months I have been using &lt;a href="https://aws.amazon.com/cdk/"&gt;AWS CDK&lt;/a&gt; for some projects, and every time I started talking about it, someone would ask: why should I abandon the tool I am using and switch to CDK? What advantages does it offer?&lt;/p&gt;

&lt;p&gt;I will not dwell on implementation details in this post; there are many useful resources to be found online, from &lt;a href="https://cdkworkshop.com/"&gt;tutorials for beginners&lt;/a&gt; to very advanced articles.&lt;/p&gt;

&lt;p&gt;Instead, I want to summarise what I consider to be very interesting features of the framework.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I am a passionate advocate of Infrastructure as Code and have been using it extensively since the earliest versions of the tools that have become established leaders in this field today. What you learn with experience is that there is no such thing as the perfect tool that solves every problem or that fits all occasions; there are tools that are adapted to many different situations, or that are selected for certain specific characteristics of the company you work for, its processes, the risks you accept to face, the problems you take on, and so on.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In order to explain the advantages (and limitations) I have found in CDK, it is necessary to take a step back and recall the characteristics of some of the most widely used Infrastructure as Code tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  CloudFormation
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/cloudformation/"&gt;Cloudformation&lt;/a&gt; is the Infrastructure as Code service of AWS. It has been active since 2011 (it seems like yesterday, but in the cloud era we are talking about geological eras before that), free of charge, and uses descriptive languages such as JSON and YAML (the latter as of 2016, to the relief of many) to create templates in which the resources to be created on AWS are defined. These templates are processed by the Cloudformation service, which creates the resources as described. If we want to change our infrastructure, we simply re-execute the modified template.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantages
&lt;/h3&gt;

&lt;p&gt;The unbeatable advantage of CloudFormation is its &lt;strong&gt;automatic rollback management&lt;/strong&gt;. If my template contains errors, CloudFormation stops the infrastructure update and automatically &lt;strong&gt;returns&lt;/strong&gt; to the previous state, i.e. to the last "working" version of my template.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limits
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Over the years, CloudFormation has undergone many evolutions and introduced new features, cross-account usage and more... and yet, nobody loves it. At most, it is tolerated. Why? Because of the languages it uses. JSON and YAML are essentially data serialisation formats: they work well with machines... less well with humans. They are certainly easy to read, but extremely tedious to write. Since they are not programming languages, there are no practical (as well as basic) mechanisms such as loops for repetitive operations: if I need to create 10 security groups, I have to list them all, one by one, without fail. If you have ever used CloudFormation, you know what I am talking about.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It works exclusively on AWS.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Terraform
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.terraform.io/"&gt;Terraform&lt;/a&gt; is an open-source tool from Hashicorp for Infrastructure as Code, initially released in 2014. It uses the declarative HashiCorp Configuration Language (&lt;strong&gt;HCL&lt;/strong&gt;), which from the earliest releases immediately seemed friendlier to the writing of infrastructure. Once a user invokes Terraform on a given resource, Terraform performs CRUD actions via the cloud provider's API to obtain the desired state. The code can be factored into modules, promoting reusability and maintainability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantages
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Terraform manages external resources with 'providers'. Users can interact with Terraform providers by declaring resources or using data sources; there are many providers maintained both by HashiCorp and by the community, and AWS is one of them. The first advantage is therefore that it is a cross-provider tool.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;As the HCL language has evolved over the years, Terraform allows the use of several constructs that function as loops in order to shorten the repetitive writing of similar resources. For example, one of the most common constructs is to cycle through a list:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_ecr_repository" "ecr_repo" {
  count                = length(local.repo_list)
  name                 = local.repo_list[count.index]
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;but with the latest versions of Terraform, it is possible to use more complex constructs, such as extracting keys and values from a &lt;strong&gt;map&lt;/strong&gt; to be used as required:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
  dynamic "predicates" {
    for_each = [for k, v in each.value["sets"] : {
      set = v
    } if contains(keys(aws_waf_ipset.waf_ipset), v)]
    content {
      data_id = aws_waf_ipset.waf_ipset[predicates.value.set].id
      negated = false
      type    = "IPMatch"
    }
  }
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Probably not the clearest code in the world, though still better than the endless lists of attributes in CloudFormation...&lt;/p&gt;

&lt;h3&gt;
  
  
  Limits
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The infamous &lt;strong&gt;state file&lt;/strong&gt;! Terraform saves the state of the infrastructure in a JSON file that is updated at each execution. Keeping this file is extremely important because it is the "source of truth": Terraform consults it before each run to establish the discrepancy between the desired state (i.e. the code we want to execute) and the current state, and from this comparison decides what actions to take to close the gap. If the state file is lost, Terraform cannot know that part of the infrastructure was already created and will try to create everything from scratch.&lt;/p&gt;

&lt;p&gt;Furthermore, keeping all the code of a very large infrastructure together is a bad practice, for several reasons: operational risk, shared management, handovers, and general maintainability of the code. Typically, each infrastructure 'stack' is created with blocks of code executed separately: this means that each stack will have its own state file, and consequently the preservation of these state files, in the long run, with large teams and very large infrastructures, becomes a very important and delicate issue.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No rollback management&lt;/strong&gt;. The CRUD operations performed by Terraform, as I mentioned earlier, are sequential calls to the cloud provider's API; if for some reason a call fails mid-execution... Terraform stops and leaves it to the user to clean up the changes left half-applied. Not the best behaviour, especially in production environments.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
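&lt;p&gt;For completeness: the usual mitigation for the state-file risk described above is a remote backend with locking. A typical AWS setup (the bucket and table names here are made up) looks roughly like this:&lt;/p&gt;

```hcl
# Remote state with locking: the state file lives in S3 and a DynamoDB
# table prevents two concurrent runs from corrupting it.
terraform {
  backend "s3" {
    bucket         = "my-company-terraform-state"
    key            = "network/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
```

&lt;p&gt;This removes the risk of losing a local file, but the operational burden of organising one state per stack remains.&lt;/p&gt;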

&lt;h2&gt;
  
  
  CDK
&lt;/h2&gt;

&lt;p&gt;OK, finally to the point: what are the characteristics of CDK that make it preferable to the instruments just mentioned? Personally, I see at least four! Let's look at them in order of importance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantage #1: Rollback
&lt;/h3&gt;

&lt;p&gt;CDK is a framework that, when executed, "synthesises" a CloudFormation template and then applies it. Consequently, it inherits all the positive features of CloudFormation and, in particular, the ability to &lt;strong&gt;automatically roll back&lt;/strong&gt; to the previous state. This is a very important feature in my opinion, above all when making changes to previously created stacks in a production environment. Rollback is a step too often underestimated... until something goes wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantage #2: No state file
&lt;/h3&gt;

&lt;p&gt;As I said, since the framework synthesises CloudFormation templates, the management of the infrastructure state is left to CloudFormation itself, and there are no state files to manage. In addition, it is much easier to inspect the status of resources directly from the console of the same AWS account. Given the risks I listed earlier regarding state file management, this is no small advantage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantage #3: Friendly/familiar programming language
&lt;/h3&gt;

&lt;p&gt;AWS CDK is available for the most popular languages: TypeScript, Python, Java, .NET, and Go. There are no particular differences between these implementations: the choice can be based solely on the user's familiarity with one language or another. In my case, I used Python and my experience was pleasantly simple and smooth, thanks also to extremely comprehensive documentation and support for the main IDEs.&lt;/p&gt;

&lt;p&gt;The use of an actual programming language also has the considerable advantage of being able to perform any type of operation not necessarily linked to CDK, such as requests to external APIs to retrieve information or notifications, manipulation of strings, files, JSON and so on... the limit is your imagination!&lt;/p&gt;
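&lt;p&gt;To make the contrast with plain JSON/YAML templates concrete, here is the idea in plain Python (CDK construct calls are deliberately left out so the sketch runs standalone; in real CDK code each dict would be an &lt;code&gt;ecr.Repository&lt;/code&gt; construct instead): the ten near-identical resources that CloudFormation forces you to list by hand become a simple loop.&lt;/p&gt;

```python
# The repetition problem of plain CloudFormation YAML disappears with a
# real language: generate ten near-identical resource definitions in a
# loop. (Plain Python here so the sketch runs standalone; repo names
# are made up for illustration.)
import json

def make_repo(name: str) -> dict:
    """Build one ECR-repository resource definition."""
    return {
        "Type": "AWS::ECR::Repository",
        "Properties": {
            "RepositoryName": name,
            "ImageScanningConfiguration": {"ScanOnPush": True},
        },
    }

repos = [f"team-repo-{i}" for i in range(1, 11)]
template = {"Resources": {f"Repo{i}": make_repo(name)
                          for i, name in enumerate(repos, start=1)}}

print(len(template["Resources"]))   # ten resources from a few lines of logic
print(json.dumps(template["Resources"]["Repo1"], indent=2))
```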

&lt;h3&gt;
  
  
  Advantage #4: Automatic generation of IAM policies
&lt;/h3&gt;

&lt;p&gt;Finally, there is hardly any need to write IAM roles and policies at all. Based on the relationships between the resources declared in the code, the framework is able to automatically calculate the necessary permissions and create the roles and policies itself, following the principle of least privilege: only the permissions that are strictly necessary are assigned.&lt;/p&gt;

&lt;p&gt;This is by no means a trivial advantage, considering that this mechanism ensures that you do not forget any permissions and, above all, avoid assigning more permissions than you need, either by mistake or out of haste.&lt;/p&gt;

&lt;p&gt;Of course, it is always possible to add permissions that the framework is unable to calculate. For example, it may happen that a Lambda function is created that internally makes API calls to AWS services, in which case the Lambda code is not part of the CDK code and is therefore excluded from the 'calculation'. The permissions required by the function for its calls must therefore be added to the role that the CDK automatically creates.&lt;/p&gt;

&lt;p&gt;In addition to the advantage from the point of view of security, there is also the enormous time-saving in the development of infrastructure code. An example? The creation of a CodePipeline resource with its CodeCommit repository and CodeBuild stage required me to write about 500 lines of Terraform code; in CDK, the IAM part is about ten lines. Impressive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final considerations
&lt;/h2&gt;

&lt;p&gt;AWS CDK is a tool that solves the problems of CloudFormation without losing its positive features, while adding further advantages over other tools. Its greatest limitation, however, is that it can, of course, only be used on AWS.&lt;/p&gt;

&lt;p&gt;There are other tools that use programming languages for writing infrastructure code and that are available on other cloud providers: for example, &lt;a href="https://www.pulumi.com/"&gt;Pulumi&lt;/a&gt; or &lt;a href="https://developer.hashicorp.com/terraform/cdktf"&gt;cdktf&lt;/a&gt;. However, these tools do not offer the same advantages, as they still rely on direct API calls (so there is no rollback) and save the state of the infrastructure in dedicated state files that must be managed.&lt;/p&gt;

&lt;p&gt;The persistence of these limitations has always put me off the idea of changing Infrastructure as Code tools because the change of habits, paradigm and especially code base seemed not worth it. AWS CDK, on the other hand, has such advantages that I would seriously consider abandoning other tools.&lt;/p&gt;

&lt;p&gt;And what do you think? Have you tried AWS CDK? Would you consider switching tools in light of the advantages? Let me know in the comments!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>cloud</category>
      <category>programming</category>
    </item>
    <item>
      <title>The only newsletter you’ll ever need to read</title>
      <dc:creator>Monica Colangelo</dc:creator>
      <pubDate>Tue, 22 Nov 2022 17:36:51 +0000</pubDate>
      <link>https://dev.to/monica_colangelo/the-only-newsletter-youll-ever-need-to-read-2pom</link>
      <guid>https://dev.to/monica_colangelo/the-only-newsletter-youll-ever-need-to-read-2pom</guid>
      <description>&lt;p&gt;Okay, maybe I was a bit bold in the title... but maybe not, and it's up to you!&lt;/p&gt;

&lt;p&gt;I had been thinking about creating my own &lt;strong&gt;newsletter&lt;/strong&gt; for a while, but couldn't make up my mind. &lt;/p&gt;

&lt;p&gt;Then I joined &lt;strong&gt;Mastodon&lt;/strong&gt; recently (you can follow me here: &lt;a href="https://hachyderm.io/@monica"&gt;https://hachyderm.io/@monica&lt;/a&gt;). Now that I am experiencing a new, much less toxic social network, the idea of creating connections and community has finally made me decide!&lt;/p&gt;

&lt;p&gt;My idea is to periodically share some interesting readings about Cloud, DevOps, and Architecture that I find online (of course, on Dev.to too!). Nothing for sale, just knowledge sharing. &lt;/p&gt;

&lt;p&gt;If you like, you can also share with me some brilliant content you may find online, and you could see it in the newsletter itself.&lt;/p&gt;

&lt;p&gt;If you like the idea, you're very welcome!&lt;/p&gt;

&lt;p&gt;Please visit &lt;a href="https://letsmake.cloud/newsletter-subscription"&gt;my newsletter subscription page&lt;/a&gt; and let's start making cloud together!&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>devops</category>
      <category>aws</category>
      <category>newsletter</category>
    </item>
    <item>
      <title>Including an existing virtual machine in a CI/CD pipeline</title>
      <dc:creator>Monica Colangelo</dc:creator>
      <pubDate>Sat, 20 Aug 2022 15:51:00 +0000</pubDate>
      <link>https://dev.to/monica_colangelo/include-an-existing-virtual-machine-in-a-cicd-pipeline-52do</link>
      <guid>https://dev.to/monica_colangelo/include-an-existing-virtual-machine-in-a-cicd-pipeline-52do</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This post was originally published at &lt;a href="https://letsmake.cloud/pipeline-with-vm-step"&gt;https://letsmake.cloud&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this article, we will see how to execute one or more steps of a CI/CD pipeline directly on a "traditional" virtual machine.&lt;/p&gt;

&lt;p&gt;Think of legacy and/or proprietary applications, with license or support &lt;strong&gt;constraints&lt;/strong&gt;, or that you cannot or do not want to re-engineer for any other reason, but which are necessary to perform specific tests or analyses with particular software: for example, scans with software belonging to the security team that prefers to centralize information in a hybrid environment, running simulations with software such as &lt;a href="https://www.mathworks.com/help/simulink/"&gt;MATLAB and Simulink&lt;/a&gt; installed centrally for cross-team use, and so on.&lt;/p&gt;

&lt;p&gt;Being constrained to use this software does not mean that modern DevOps methodologies, such as CI/CD pipelines, cannot be applied to code development. As we will see, a pipeline can include a step in which commands or scripts are executed directly &lt;strong&gt;on a virtual machine&lt;/strong&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Solution architecture
&lt;/h1&gt;

&lt;p&gt;In today's use case, the virtual machine is a Windows EC2 instance on AWS; the goal is to run some commands on the instance every time my code is modified and pushed to a Git repository.&lt;/p&gt;

&lt;p&gt;It is worth mentioning, however, that the &lt;a href="https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-managedinstances.html"&gt;AWS Systems Manager Agent can also be installed on on-premises machines or machines hosted elsewhere&lt;/a&gt;. This solution can therefore be extended to many applications even if they are not hosted directly on AWS.&lt;/p&gt;
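&lt;p&gt;For completeness, registering an external machine goes through an SSM hybrid activation. A minimal sketch with boto3 (role and instance names are placeholders; the client is passed in to keep the example self-contained):&lt;/p&gt;

```python
def create_hybrid_activation(ssm_client, iam_role: str, name: str, limit: int = 1):
    """Create an SSM hybrid activation for machines outside EC2.

    `ssm_client` would be boto3.client("ssm"); it is passed in so the
    function can be exercised without AWS credentials.
    """
    response = ssm_client.create_activation(
        DefaultInstanceName=name,
        IamRole=iam_role,  # a service role assumable by ssm.amazonaws.com
        RegistrationLimit=limit,
    )
    return response["ActivationId"], response["ActivationCode"]
```

&lt;p&gt;The returned activation ID and code are then used by the SSM Agent on the machine to register itself with the service.&lt;/p&gt;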

&lt;p&gt;Solution architecture details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a CodeCommit repository contains both the application code and a JSON file that includes information about the commands to be executed on the virtual machine;&lt;/li&gt;
&lt;li&gt;a push on this repository will trigger the execution of a CodePipeline, which in turn &lt;a href="https://docs.aws.amazon.com/codepipeline/latest/userguide/tutorials-step-functions.html"&gt;calls a StepFunction&lt;/a&gt;;&lt;/li&gt;
&lt;li&gt;the StepFunction initializes a workflow to execute a command specified in an SSM Document;&lt;/li&gt;
&lt;li&gt;the Document is "sent" from the StepFunction to the virtual machine through a Lambda function;&lt;/li&gt;
&lt;li&gt;a second Lambda, also controlled by the StepFunction, verifies the outcome of the execution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--X8uUYrc---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mz17x30c9jgndiszdonk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--X8uUYrc---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mz17x30c9jgndiszdonk.png" alt="pipeline-stepfunction" width="880" height="524"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Prerequisites
&lt;/h1&gt;

&lt;p&gt;The virtual machine is already installed and configured with the software to run, and it has an Instance Profile with the necessary permissions to allow the SSM Document to run (and possibly to access the AWS services needed by the use case).&lt;/p&gt;

&lt;p&gt;In my case, the policies associated with the Instance Profile are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS Managed Policy &lt;code&gt;AmazonSSMManagedInstanceCore&lt;/code&gt; (to be managed by SSM)&lt;/li&gt;
&lt;li&gt;AWS Managed Policy &lt;code&gt;AWSCodeCommitReadOnly&lt;/code&gt; (to access the code repository)&lt;/li&gt;
&lt;li&gt;custom policy to allow &lt;code&gt;s3:PutObject&lt;/code&gt; on the output bucket&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  AWS Cloud Development Kit (AWS CDK)
&lt;/h1&gt;

&lt;p&gt;To create this architecture, I used &lt;a href="https://aws.amazon.com/cdk/"&gt;AWS CDK for Python&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;AWS CDK is an open-source software development framework, introduced in July 2019, for defining AWS cloud infrastructure. Since AWS CDK uses CloudFormation as its foundation, it retains all the benefits of CloudFormation while allowing you to provision cloud resources using modern programming languages such as TypeScript, C#, Java, and Python.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you are not familiar with AWS CDK, you can follow a great &lt;a href="https://cdkworkshop.com/30-python.html"&gt;tutorial here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Using AWS CDK is also advantageous because it allows you to write less code than other "classic" Infrastructure as Code tools (in my example, my approximately 150 lines of Python code generate 740 CloudFormation YAML lines); in particular, many IAM roles and policies are &lt;strong&gt;deduced&lt;/strong&gt; directly from the framework without having to write them explicitly.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You can find the complete example &lt;a href="https://github.com/theonlymonica/pipeline-existing-vm-integration-examples"&gt;at this link&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  SSM Document
&lt;/h2&gt;

&lt;p&gt;To start developing my solution, I first create an SSM Document for my EC2 Windows, which is the script that needs to be run on the virtual machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;schemaVersion: "2.2"
description: "Example document"
parameters:
  Message:
    type: "String"
    description: "Message to write"
  OutputBucket:
    type: "String"
    description: "Bucket to save output"
  CodeRepository:
    type: "String"
    description: "Git repository to clone"
mainSteps:
  - action: "aws:runPowerShellScript"
    name: "SampleStep"
    precondition:
      StringEquals:
        - platformType
        - Windows
    inputs:
      timeoutSeconds: "60"
      runCommand:
        - Import-Module AWSPowerShell
        - Write-Host "Create temp dir"
        - $tempdir = Join-Path $env:temp (-join ((48..57) + (97..122) | Get-Random -Count 32 | % {[char]$_}))
        - New-Item $tempdir -ItemType Directory
        - Write-Host "Cloning repository"
        - "git clone {{CodeRepository}} $tempdir"
        - $fname = $(((get-date).ToUniversalTime()).ToString("yyyyMMddTHHmmssZ"))
        - Write-Host "Writing file on S3"
        - "Write-S3Object -BucketName {{OutputBucket}} -Key ($fname + '.txt') -Content {{Message}}"
        - Write-Host "Removing temp dir"
        - Remove-Item -path $tempdir -Recurse -Force -EA SilentlyContinue
        - Write-Host "All done!"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example script has 3 parameters: the Git repository to clone, the message to write to the output file, and the S3 bucket to save that file; and then it uses these parameters with the commands to execute. Of course, this is a straightforward example, which can be modified as needed.&lt;/p&gt;

&lt;p&gt;I use this YAML file directly in my Python code to create the Document on AWS SSM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;with open("ssm/windows.yml") as openFile:
    documentContent = yaml.load(openFile, Loader=yaml.FullLoader)
    cfn_document = ssm.CfnDocument(self, "MyCfnDocument",
        content=documentContent,
        document_format="YAML",
        document_type="Command",
        name="pipe-sfn-ec2Win-GitS3",
        update_method="NewVersion",
        target_type="/AWS::EC2::Instance"
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Lambda
&lt;/h2&gt;

&lt;p&gt;I create the CodeCommit repository where I will save the application code, the S3 bucket to write the processing results, and then the two Lambdas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;repo = codecommit.Repository(self, "pipe-sfn-ec2Repo",
            repository_name="pipe-sfn-ec2-repo"
        )

output_bucket = s3.Bucket(self, 'ExecutionOutputBucket')

submit_lambda = _lambda.Function(self, 'submitLambda',
                    handler='lambda_function.lambda_handler',
                    runtime=_lambda.Runtime.PYTHON_3_9,
                    code=_lambda.Code.from_asset('lambdas/submit'),
                    environment={
                        "OUTPUT_BUCKET": output_bucket.bucket_name,
                        "SSM_DOCUMENT": cfn_document.name,
                        "CODE_REPOSITORY": repo.repository_clone_url_http
                        })

status_lambda = _lambda.Function(self, 'statusLambda',
                    handler='lambda_function.lambda_handler',
                    runtime=_lambda.Runtime.PYTHON_3_9,
                    code=_lambda.Code.from_asset('lambdas/status'))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, the Lambda "submit" has 3 environment variables that will serve as parameters for the commands to be executed on the virtual machine. The Lambda code is also in Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
import os
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

ssm_client = boto3.client('ssm')

document_name = os.environ["SSM_DOCUMENT"]
output_bucket = os.environ["OUTPUT_BUCKET"]
code_repository = os.environ["CODE_REPOSITORY"]

def lambda_handler(event, context):
    logger.debug(event)

    instance_id = event["instance_id"]
    message = event["message"]

    response = ssm_client.send_command(
                InstanceIds=[instance_id],
                DocumentName=document_name,
                Parameters={
                    "Message": [message],
                    "OutputBucket": [output_bucket],
                    "CodeRepository": [code_repository]})

    logger.debug(response)

    command_id = response['Command']['CommandId']
    data = {
        "command_id": command_id, 
        "instance_id": instance_id
    }

    return data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This first Lambda "submit" output becomes the second Lambda "status" input: it checks the status of the just started execution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

ssm_client = boto3.client('ssm')

def lambda_handler(event, context):

    instance_id = event['Payload']['instance_id']
    command_id = event['Payload']['command_id']

    logger.debug(instance_id)
    logger.debug(command_id)

    response = ssm_client.get_command_invocation(CommandId=command_id, InstanceId=instance_id)

    logger.debug(response)

    execution_status = response['StatusDetails']
    logger.debug(execution_status)

    if execution_status == "Success":
        return {"status": "SUCCEEDED", "event": event}
    elif execution_status in ('Pending', 'InProgress', 'Delayed'):
        data = {
            "command_id": command_id, 
            "instance_id": instance_id,
            "status": "RETRY", 
            "event": event
        }
        return data
    else:
        return {"status": "FAILED", "event": event}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This Lambda "status" output will determine the StepFunction &lt;strong&gt;workflow&lt;/strong&gt;: if the SSM Document execution is completed, the StepFunction will terminate with a corresponding status (&lt;em&gt;Success&lt;/em&gt; or &lt;em&gt;Failed&lt;/em&gt;); instead, if the execution is still &lt;em&gt;in progress&lt;/em&gt;, the StepFunction will wait some time and then will re-execute the Lambda again for a new status check.&lt;/p&gt;

&lt;p&gt;I also need to grant the necessary permissions. The first Lambda must be able to launch the execution of the SSM Document on EC2; the second Lambda instead needs the permissions to consult the SSM executions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ec2_arn = Stack.of(self).format_arn(
    service="ec2",
    resource="instance",
    resource_name="*"
)

cfn_document_arn = Stack.of(self).format_arn(
    service="ssm",
    resource="document",
    resource_name=cfn_document.name
)

ssm_arn = Stack.of(self).format_arn(
    service="ssm",
    resource="*"
)

submit_lambda.add_to_role_policy(iam.PolicyStatement(
    resources=[cfn_document_arn, ec2_arn],
    actions=["ssm:SendCommand"]
))

status_lambda.add_to_role_policy(iam.PolicyStatement(
    resources=[ssm_arn],
    actions=["ssm:GetCommandInvocation"]
))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Please note that these are the only permissions I have explicitly written in my code, as these are capabilities deriving from Lambdas' internal logic. All other permissions (for example, reading the CodeCommit repository, executing the StepFunction, triggering the CodePipeline, etc.) are &lt;strong&gt;implicitly inferred&lt;/strong&gt; from the CDK framework, greatly shortening the writing of my IaC code.&lt;/p&gt;

&lt;h2&gt;
  
  
  StepFunction
&lt;/h2&gt;

&lt;p&gt;The StepFunction workflow is shown in the following diagram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9y8qs3au--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/a6h8j2nhfobcovjtg2a2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9y8qs3au--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/a6h8j2nhfobcovjtg2a2.png" alt="stepfunction-graph" width="203" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is its definition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;submit_job = _aws_stepfunctions_tasks.LambdaInvoke(
    self, "Submit Job",
    lambda_function=submit_lambda
)

wait_job = _aws_stepfunctions.Wait(
    self, "Wait 10 Seconds",
    time=_aws_stepfunctions.WaitTime.duration(
        Duration.seconds(10))
)

status_job = _aws_stepfunctions_tasks.LambdaInvoke(
    self, "Get Status",
    lambda_function=status_lambda
)

fail_job = _aws_stepfunctions.Fail(
    self, "Fail",
    cause='AWS SSM Job Failed',
    error='Status Job returned FAILED'
)

succeed_job = _aws_stepfunctions.Succeed(
    self, "Succeeded",
    comment='AWS SSM Job succeeded'
)

definition = submit_job.next(wait_job)\
    .next(status_job)\
    .next(_aws_stepfunctions.Choice(self, 'Job Complete?')
            .when(_aws_stepfunctions.Condition.string_equals('$.Payload.status', 'FAILED'), fail_job)
            .when(_aws_stepfunctions.Condition.string_equals('$.Payload.status', 'SUCCEEDED'), succeed_job)
            .otherwise(wait_job))

sfn = _aws_stepfunctions.StateMachine(
    self, "StateMachine",
    definition=definition,
    timeout=Duration.minutes(5)
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  CodePipeline
&lt;/h2&gt;

&lt;p&gt;In this example, for the sake of simplicity, I define my pipeline by creating only two steps, the source one and the StepFunction execution one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pipeline = codepipeline.Pipeline(self, "pipe-sfn-ec2Pipeline",
    pipeline_name="pipe-sfn-ec2Pipeline",
    cross_account_keys=False
)

source_output = codepipeline.Artifact("SourceArtifact")

source_action = codepipeline_actions.CodeCommitSourceAction(
    action_name="CodeCommit",
    repository=repo,
    branch="main",
    output=source_output
)

step_function_action = codepipeline_actions.StepFunctionInvokeAction(
    action_name="Invoke",
    state_machine=sfn,
    state_machine_input=codepipeline_actions.StateMachineInput.file_path(source_output.at_path("abc.json"))
)

pipeline.add_stage(
    stage_name="Source",
    actions=[source_action]
)

pipeline.add_stage(
    stage_name="StepFunctions",
    actions=[step_function_action]
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I draw your attention to the &lt;code&gt;state_machine_input&lt;/code&gt; definition: in this code, I have indicated that the StepFunction input parameters must be read from the &lt;code&gt;abc.json&lt;/code&gt; file contained directly in the CodeCommit repository.&lt;/p&gt;

&lt;h1&gt;
  
  
  Execution
&lt;/h1&gt;

&lt;p&gt;To test the solution, push the &lt;code&gt;abc.json&lt;/code&gt; file with the following content into the repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "instance_id": "i-1234567890abcdef",
    "message": "aSampleMessage"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this way, the developer who writes the code and has to execute commands on the virtual machine can specify both the target machine and the execution parameters.&lt;/p&gt;

&lt;p&gt;That's all! Once pushed, the pipeline starts automatically, downloads the code from the repository and launches the StepFunction:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BdTmJN4u--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/38exp7jhrfdii0l8oyp2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BdTmJN4u--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/38exp7jhrfdii0l8oyp2.png" alt="codepipeline" width="854" height="1442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is possible to consult the flow of the StepFunction execution:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7Zh0UYMp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xzamt5qwtyhwnsihdnn2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7Zh0UYMp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xzamt5qwtyhwnsihdnn2.png" alt="stepfunction-list" width="880" height="519"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also consult the execution of the SSM Document:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TwFQwrAw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bsoqzijs1ezfter78emh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TwFQwrAw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bsoqzijs1ezfter78emh.png" alt="ssm-document" width="880" height="525"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Considerations
&lt;/h1&gt;

&lt;p&gt;Having &lt;strong&gt;constraints&lt;/strong&gt;, imposed by organizational choices or by some kinds of software, is a very common situation, especially in large companies: this should not discourage the introduction of modern methodologies and technologies, because these technologies allow solutions for (almost) any &lt;strong&gt;integration&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The introduction of a StepFunction into the pipeline, which may seem like overengineering when commands take only a few seconds to execute on the virtual machine, is actually indispensable when execution takes a relatively long time.&lt;/p&gt;

&lt;p&gt;Using AWS CDK dramatically shortens code writing time, as long as you are familiar with one of the supported programming languages.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>serverless</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Blue/green deployment of a web server on ECS Fargate</title>
      <dc:creator>Monica Colangelo</dc:creator>
      <pubDate>Sat, 20 Aug 2022 15:44:00 +0000</pubDate>
      <link>https://dev.to/monica_colangelo/bluegreen-deployment-of-a-web-server-on-ecs-fargate-2kif</link>
      <guid>https://dev.to/monica_colangelo/bluegreen-deployment-of-a-web-server-on-ecs-fargate-2kif</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This post was originally published at &lt;a href="https://letsmake.cloud/bluegreen-fargate"&gt;https://letsmake.cloud&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Seamless technological upgrades of legacy infrastructures
&lt;/h1&gt;

&lt;p&gt;A frontline &lt;strong&gt;web server&lt;/strong&gt; exposing a backend application - who hasn't seen one?&lt;/p&gt;

&lt;p&gt;This apparently simple logical architecture is obviously based on multiple instances that can guarantee high reliability and load balancing.&lt;/p&gt;

&lt;p&gt;It is a model that has existed for decades, but the technologies that implement it evolve. Sometimes, for time, budget, or organizational reasons, it is not possible to &lt;strong&gt;modernize&lt;/strong&gt; specific applications: for example, because they often belong to different teams with different priorities, or because the project group has been dissolved and the application must be "kept alive" as it is.&lt;/p&gt;

&lt;p&gt;These situations are super common, and the result is that many old configurations are never deleted from web servers but instead continue to &lt;strong&gt;stratify&lt;/strong&gt; more and more, even becoming extremely complex.&lt;/p&gt;

&lt;p&gt;This complexity exponentially increases the &lt;strong&gt;risk&lt;/strong&gt; of making a mistake when introducing a change, and in this situation the &lt;strong&gt;&lt;em&gt;blast radius&lt;/em&gt;&lt;/strong&gt; is potentially enormous.&lt;/p&gt;

&lt;p&gt;To summarize, the use case I want to describe has the &lt;strong&gt;constraint&lt;/strong&gt; of not touching the configurations and of maintaining the logical architecture, but we want to act at the technological level to improve security and reliability and minimize operational risk.&lt;/p&gt;

&lt;p&gt;The solution I created is based on &lt;strong&gt;ECS Fargate&lt;/strong&gt;, where I transformed old virtual machines into containers, and uses the same methodology that usually applies to backend applications, that is the &lt;strong&gt;blue/green deployment&lt;/strong&gt; technique, with the execution of tests to decide whether a new configuration can go online safely.&lt;/p&gt;

&lt;h1&gt;
  
  
  Blue/green deployment
&lt;/h1&gt;

&lt;p&gt;Blue/green deployment is an application &lt;strong&gt;release model&lt;/strong&gt; that swaps traffic from an older version of an app or microservice to a new release. The previous version is called the blue environment, while the new version is called the green environment.&lt;/p&gt;

&lt;p&gt;In this model, it is essential to &lt;strong&gt;test&lt;/strong&gt; the green environment to ensure its readiness to handle production traffic. Once the tests are passed, this new version is &lt;strong&gt;promoted&lt;/strong&gt; to production by reconfiguring the load balancer to transfer the incoming traffic from the blue environment to the green environment, running the latest version of the application at last.&lt;/p&gt;

&lt;p&gt;Using this strategy increases application &lt;strong&gt;availability&lt;/strong&gt; and reduces &lt;strong&gt;operational risk&lt;/strong&gt;, and it also simplifies the rollback process.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fully managed updates with ECS Fargate
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;AWS CodePipeline&lt;/strong&gt; &lt;a href="https://docs.aws.amazon.com/codepipeline/latest/userguide/action-reference-ECSbluegreen.html"&gt;supports&lt;/a&gt; fully automated blue/green releases on &lt;strong&gt;Amazon Elastic Container Service (ECS)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Normally, when you create an ECS service with an &lt;strong&gt;Application Load Balancer&lt;/strong&gt; in front of it, you need to designate a target group that contains the microservices to receive the requests. The blue/green approach involves the creation of &lt;strong&gt;two&lt;/strong&gt; target groups: one for the blue version and one for the green version of the service. It also uses a different listening port for each target group, so that you can test the green version of the service using the same path as the blue version.&lt;/p&gt;

&lt;p&gt;With this configuration, you run both environments in &lt;strong&gt;parallel&lt;/strong&gt; until you are ready to switch to the green version of the service.&lt;/p&gt;

&lt;p&gt;When you are ready to replace the old blue version with the new green version, you swap the listener rules with the target group rules; this change takes place in seconds. At this point, the green service runs in the target group behind the listener on the "original" port (which previously belonged to the blue version), while the blue service runs in the target group behind the listener on the port previously used by the green version, until it is terminated.&lt;/p&gt;
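&lt;p&gt;Under the hood, the swap boils down to repointing the listeners' default actions. CodeDeploy does this for you, but a boto3 sketch of the operation helps to see the mechanism (ARNs are placeholders, and the client is passed in to keep the example self-contained):&lt;/p&gt;

```python
def swap_listener_target(elbv2_client, listener_arn: str, green_tg_arn: str):
    """Repoint a listener's default action to the green target group.

    This is, in essence, the operation CodeDeploy performs during the
    blue/green swap; `elbv2_client` would be boto3.client("elbv2").
    """
    return elbv2_client.modify_listener(
        ListenerArn=listener_arn,
        DefaultActions=[{"Type": "forward", "TargetGroupArn": green_tg_arn}],
    )
```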

&lt;p&gt;At this point, the real &lt;strong&gt;question&lt;/strong&gt; is: how can this system decide if and when the green version is ready to replace the blue version?&lt;/p&gt;

&lt;p&gt;You need a control logic that executes &lt;strong&gt;tests&lt;/strong&gt; to evaluate whether the new version can replace the old one with a high degree of confidence. Swapping from the old to the new version is only allowed after passing these tests.&lt;/p&gt;

&lt;p&gt;All these steps are &lt;strong&gt;fully automatic&lt;/strong&gt; on ECS thanks to the complete integration of AWS CodePipeline + CodeBuild + CodeDeploy services. The control tests in my case are performed by a Lambda.&lt;/p&gt;
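&lt;p&gt;With this setup, the test Lambda is invoked by CodeDeploy as a lifecycle hook (for example on the &lt;em&gt;AfterAllowTestTraffic&lt;/em&gt; event) and must report a verdict back before the deployment proceeds. A minimal handler sketch follows; &lt;code&gt;run_checks&lt;/code&gt; is a hypothetical stand-in for the real tests, and the client is injectable so the example stays self-contained:&lt;/p&gt;

```python
def lambda_handler(event, context, codedeploy_client=None, run_checks=lambda: True):
    """Report the green environment's test outcome back to CodeDeploy.

    CodeDeploy pauses the deployment at the lifecycle hook until this
    handler reports Succeeded or Failed. In a real deployment,
    `codedeploy_client` would be boto3.client("codedeploy").
    """
    status = "Succeeded" if run_checks() else "Failed"
    codedeploy_client.put_lifecycle_event_hook_execution_status(
        deploymentId=event["DeploymentId"],
        lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
        status=status,
    )
    return status
```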

&lt;p&gt;The following diagram illustrates the approach described.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--plBkiU5a--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/c5uom73lr2fkpiaz919e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--plBkiU5a--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/c5uom73lr2fkpiaz919e.png" alt="bluegreen" width="825" height="777"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Using Terraform to build a blue/green deployment system on ECS
&lt;/h1&gt;

&lt;p&gt;Creating an ECS cluster and a pipeline that builds the new version of the container image to deploy in blue/green mode is not difficult in itself but requires creating many cloud resources to coordinate. Below we will look at some of the key details in creating these assets with Terraform.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You can find the complete example &lt;a href="https://github.com/theonlymonica/bluegreen-ecs-fargate-examples"&gt;at this link&lt;/a&gt;. Here are just some code snippets useful for examining the use case.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Application Load Balancer
&lt;/h2&gt;

&lt;p&gt;One of the basic resources for our architecture is the load balancer. I enable access logs stored in a bucket because they will be used indirectly for the testing Lambda:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_alb" "load_balancer" {
  name            = replace(local.name, "_", "-")
  internal        = false
  access_logs {
    bucket  = aws_s3_bucket.logs_bucket.bucket
    prefix  = "alb_access_logs"
    enabled = true
  }
  ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So I create two target groups, identical to each other:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_alb_target_group" "tg_blue" {
  name        = join("-", [replace(local.name, "_", "-"), "blue"])
  port        = 80
  protocol    = "HTTP"
  target_type = "ip"
  ...
}

resource "aws_alb_target_group" "tg_green" {
  name        = join("-", [replace(local.name, "_", "-"), "green"])
  port        = 80
  protocol    = "HTTP"
  target_type = "ip"
  ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As mentioned before, I create two listeners on two different ports, 80 and 8080 in this example. The &lt;code&gt;ignore_changes&lt;/code&gt; meta-argument makes Terraform ignore future changes to the &lt;code&gt;default_action&lt;/code&gt;, which will be modified by the blue/green deployments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_alb_listener" "lb_listener_80" {
  load_balancer_arn = aws_alb.load_balancer.id
  port              = "80"
  protocol          = "HTTP"

  default_action {
    target_group_arn = aws_alb_target_group.tg_blue.id
    type             = "forward"
  }

  lifecycle {
    ignore_changes = [default_action]
  }
}

resource "aws_alb_listener" "lb_listener_8080" {
  load_balancer_arn = aws_alb.load_balancer.id
  port              = "8080"
  protocol          = "HTTP"

  default_action {
    target_group_arn = aws_alb_target_group.tg_green.id
    type             = "forward"
  }

  lifecycle {
    ignore_changes = [default_action]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  ECS
&lt;/h2&gt;

&lt;p&gt;In the ECS cluster configuration, the service definition specifies the target group to associate at creation time. This setting will be ignored in any subsequent Terraform run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_ecs_service" "ecs_service" {
  name             = local.name
  cluster          = aws_ecs_cluster.ecs_cluster.id
  task_definition  = aws_ecs_task_definition.task_definition.arn
  desired_count    = 2
  launch_type      = "FARGATE"

  deployment_controller {
    type = "CODE_DEPLOY"
  }

  load_balancer {
    target_group_arn = aws_alb_target_group.tg_blue.arn
    container_name   = local.name
    container_port   = 80
  }

  lifecycle {
    ignore_changes = [task_definition, load_balancer, desired_count]
  }
  ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  CodeCommit
&lt;/h2&gt;

&lt;p&gt;The application code - in my case, the webserver configurations and the &lt;code&gt;Dockerfile&lt;/code&gt; used to build the image - is stored in a Git repository on CodeCommit. An EventBridge rule associated with this repository intercepts every push event and triggers the pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_codecommit_repository" "repo" {
  repository_name = local.name
  description     = "${local.name} Repository"
}

resource "aws_cloudwatch_event_rule" "commit" {
  name        = "${local.name}-capture-commit-event"
  description = "Capture ${local.name} repo commit"

  event_pattern = &amp;lt;&amp;lt;EOF
{
  "source": [
    "aws.codecommit"
  ],
  "detail-type": [
    "CodeCommit Repository State Change"
  ],
  "resources": [
   "${aws_codecommit_repository.repo.arn}"
  ],
  "detail": {
    "referenceType": [
      "branch"
    ],
    "referenceName": [
      "${aws_codecommit_repository.repo.default_branch}"
    ]
  }
}
EOF
}

resource "aws_cloudwatch_event_target" "event_target" {
  target_id = "1"
  rule      = aws_cloudwatch_event_rule.commit.name
  arn       = aws_codepipeline.codepipeline.arn
  role_arn  = aws_iam_role.codepipeline_role.arn
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;Dockerfile&lt;/code&gt; that will be saved on this repository depends of course on the application. In my case I have a code structure like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.
├── version.txt
├── Dockerfile
└── etc
    └── nginx
        ├── nginx.conf
        └── conf.d
            ├── file1.conf
            └── ...
        └── projects.d
            ├── file2.conf
            └── ...
        └── upstream.d
            └── file3.conf
            └── ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My &lt;code&gt;Dockerfile&lt;/code&gt; will be therefore very simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM nginx:latest

COPY etc/nginx/nginx.conf /etc/nginx/nginx.conf 
COPY etc/nginx/conf.d /etc/nginx/conf.d 
COPY etc/nginx/projects.d /etc/nginx/projects.d/ 
COPY etc/nginx/upstream.d /etc/nginx/upstream.d/
COPY version.txt /usr/share/nginx/html/version.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
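&lt;p&gt;Since &lt;code&gt;version.txt&lt;/code&gt; ends up in the nginx document root, it gives a quick way to check which build each listener is serving during a deployment. A minimal sketch (the endpoint names are hypothetical, and the &lt;code&gt;opener&lt;/code&gt; parameter exists only to make the function testable offline):&lt;/p&gt;

```python
import urllib.request

def get_deployed_version(endpoint, opener=urllib.request.urlopen):
    """Fetch the contents of version.txt served by nginx at the given endpoint."""
    with opener("http://" + endpoint + "/version.txt") as resp:
        return resp.read().decode().strip()

# During a blue/green deployment, the test listener (port 8080) serves the
# green tasks while port 80 still serves blue, so the two can be compared:
#   prod = get_deployed_version("my-alb-1234.eu-west-1.elb.amazonaws.com")
#   test = get_deployed_version("my-alb-1234.eu-west-1.elb.amazonaws.com:8080")
```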



&lt;h2&gt;
  
  
  CodeBuild
&lt;/h2&gt;

&lt;p&gt;I then configure the CodeBuild step that generates the new version of the container image. The &lt;code&gt;privileged_mode = true&lt;/code&gt; property enables the Docker daemon within the CodeBuild container.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_codebuild_project" "codebuild" {
  name          = local.name
  description   = "${local.name} Codebuild Project"
  build_timeout = "5"
  service_role  = aws_iam_role.codebuild_role.arn

  artifacts {
    type = "CODEPIPELINE"
  }

  environment {
    compute_type                = "BUILD_GENERAL1_SMALL"
    image                       = "aws/codebuild/standard:6.0"
    type                        = "LINUX_CONTAINER"
    image_pull_credentials_type = "CODEBUILD"
    privileged_mode             = true

    environment_variable {
      name  = "IMAGE_REPO_NAME"
      value = aws_ecr_repository.ecr_repo.name
    }

    environment_variable {
      name  = "AWS_ACCOUNT_ID"
      value = data.aws_caller_identity.current.account_id
    }
  }

  source {
    type      = "CODEPIPELINE"
    buildspec = "buildspec.yml"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;buildspec.yml&lt;/code&gt; file in the CodeBuild configuration is used to define how to generate the container image. This file is included in the repository along with the code, and it looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: 0.2
env:
  shell: bash

phases:
  install:
    runtime-versions:
      docker: 19

  pre_build:
    commands:
      - IMAGE_TAG=$CODEBUILD_BUILD_NUMBER
      - echo Logging in to Amazon ECR...
      - aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com

  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker image...
      - docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG .
      - docker tag $IMAGE_REPO_NAME:$IMAGE_TAG $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
      - docker tag $IMAGE_REPO_NAME:$IMAGE_TAG $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:latest

  post_build:
    commands:
      - echo Build completed on `date`
      - echo Pushing to repo
      - docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:latest
      - docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  CodeDeploy
&lt;/h2&gt;

&lt;p&gt;The CodeDeploy configuration includes three different resources. The first two are relatively simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_codedeploy_app" "codedeploy_app" {
  compute_platform = "ECS"
  name             = local.name
}

resource "aws_codedeploy_deployment_config" "config_deploy" {
  deployment_config_name = local.name
  compute_platform       = "ECS"

  traffic_routing_config {
    type = "AllAtOnce"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I finally configure the blue/green deployment. With this code I instruct CodeDeploy to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;perform an automatic rollback in case of deployment failure&lt;/li&gt;
&lt;li&gt;if the deployment succeeds, terminate the old version after 5 minutes&lt;/li&gt;
&lt;li&gt;use the listeners for "normal" (prod) traffic and for test traffic
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_codedeploy_deployment_group" "codedeploy_deployment_group" {
  app_name               = aws_codedeploy_app.codedeploy_app.name
  deployment_group_name  = local.name
  service_role_arn       = aws_iam_role.codedeploy_role.arn
  deployment_config_name = aws_codedeploy_deployment_config.config_deploy.deployment_config_name

  ecs_service {
    cluster_name = aws_ecs_cluster.ecs_cluster.name
    service_name = aws_ecs_service.ecs_service.name
  }

  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE"]
  }

  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }

  blue_green_deployment_config {
    deployment_ready_option {
      action_on_timeout    = "CONTINUE_DEPLOYMENT"
      wait_time_in_minutes = 0
    }

    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 5
    }
  }

  load_balancer_info {
    target_group_pair_info {
      target_group {
        name = aws_alb_target_group.tg_blue.name
      }

      target_group {
        name = aws_alb_target_group.tg_green.name
      }

      prod_traffic_route {
        listener_arns = [aws_alb_listener.lb_listener_80.arn]
      }

      test_traffic_route {
        listener_arns = [aws_alb_listener.lb_listener_8080.arn]
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two additional files, essential for CodeDeploy to work correctly, must be added to the Git repository alongside the code.&lt;/p&gt;

&lt;p&gt;The first file is &lt;code&gt;taskdef.json&lt;/code&gt;, which contains the task definition for our ECS service; the &lt;code&gt;container image&lt;/code&gt;, &lt;code&gt;executionRole&lt;/code&gt; and &lt;code&gt;logConfiguration&lt;/code&gt; values must match the resources created by Terraform. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "executionRoleArn": "arn:aws:iam::123456789012:role/ECS_role_BlueGreenDemo",
  "containerDefinitions": [
    {
      "name": "BlueGreenDemo",
      "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/BlueGreenDemo_repository:latest",
      "essential": true,
      "portMappings": [
        {
          "hostPort": 80,
          "protocol": "tcp",
          "containerPort": 80
        }
      ],
      "ulimits": [
        {
          "name": "nofile",
          "softLimit": 4096,
          "hardLimit": 4096
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/aws/ecs/BlueGreenDemo",
          "awslogs-region": "eu-west-1",
          "awslogs-stream-prefix": "nginx"
        }
      }
    }
  ],
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "networkMode": "awsvpc",
  "cpu": "256",
  "memory": "512",
  "family": "BlueGreenDemo"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second file to include in the repository is &lt;code&gt;appspec.yml&lt;/code&gt;, the file CodeDeploy uses to perform the release operations. The task definition is set with a placeholder (because the real file path is referenced in the CodePipeline configuration), and the Lambda to run for the tests is indicated by name.&lt;/p&gt;

&lt;p&gt;In our case, the lambda must be executed when the &lt;code&gt;AfterAllowTestTraffic&lt;/code&gt; event arrives, that is, when the new version is ready to receive the test traffic. Other possible hooks are documented &lt;a href="https://docs.aws.amazon.com/codedeploy/latest/userguide/reference-appspec-file-structure-hooks.html#appspec-hooks-ecs"&gt;on this page&lt;/a&gt;; my choice depended on my use case and how I decided to implement my tests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "&amp;lt;TASK_DEFINITION&amp;gt;"
        LoadBalancerInfo:
          ContainerName: "BlueGreenDemo"
          ContainerPort: 80
Hooks:
  - AfterAllowTestTraffic: "BlueGreenDemo_lambda"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Lambda
&lt;/h2&gt;

&lt;p&gt;The Lambda function that performs the tests is created by Terraform, and some environment variables are also configured, the purpose of which will be explained later:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_lambda_function" "lambda" {
  function_name = "${local.name}_lambda"
  role          = aws_iam_role.lambda_role.arn
  handler       = "app.lambda_handler"
  runtime       = "python3.8"
  timeout       = 300
  s3_bucket     = aws_s3_bucket.lambda_bucket.bucket
  s3_key        = aws_s3_object.lambda_object.key
  environment {
    variables = {
      BUCKET               = aws_s3_bucket.testdata_bucket.bucket
      FILEPATH             = "acceptance_url_list.csv"
      ENDPOINT             = "${local.custom_endpoint}:8080"
      ACCEPTANCE_THRESHOLD = "90"
    }
  }
}

resource "aws_s3_object" "lambda_object" {
  key    = "${local.name}/dist.zip"
  bucket = aws_s3_bucket.lambda_bucket.bucket
  source = data.archive_file.lambda_zip_file.output_path
}

data "archive_file" "lambda_zip_file" {
  type        = "zip"
  output_path = "${path.module}/${local.name}-lambda.zip"
  source_file = "${path.module}/../lambda/app.py"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  CodePipeline
&lt;/h2&gt;

&lt;p&gt;Finally, to make all the resources seen so far interact correctly, I configure CodePipeline to orchestrate the three stages corresponding to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the source code download from CodeCommit&lt;/li&gt;
&lt;li&gt;the container image build performed by CodeBuild&lt;/li&gt;
&lt;li&gt;the release performed by CodeDeploy
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_codepipeline" "codepipeline" {
  name     = local.name
  role_arn = aws_iam_role.codepipeline_role.arn

  artifact_store {
    location = aws_s3_bucket.codepipeline_bucket.bucket
    type     = "S3"
  }

  stage {
    name = "Source"
    action {
      name             = "Source"
      category         = "Source"
      owner            = "AWS"
      provider         = "CodeCommit"
      version          = "1"
      output_artifacts = ["source_output"]

      configuration = {
        RepositoryName        = aws_codecommit_repository.repo.repository_name
        BranchName            = aws_codecommit_repository.repo.default_branch
        PollForSourceChanges  = false
      }
    }
  }

  stage {
    name = "Build"
    action {
      name             = "Build"
      category         = "Build"
      owner            = "AWS"
      provider         = "CodeBuild"
      input_artifacts  = ["source_output"]
      output_artifacts = ["build_output"]
      version          = "1"

      configuration = {
        ProjectName = aws_codebuild_project.codebuild.name
      }
    }
  }

  stage {
    name = "Deploy"
    action {
      category        = "Deploy"
      name            = "Deploy"
      owner           = "AWS"
      provider        = "CodeDeployToECS"
      version         = "1"
      input_artifacts = ["source_output"]

      configuration = {
        ApplicationName                = local.name
        DeploymentGroupName            = local.name
        AppSpecTemplateArtifact        = "source_output"
        AppSpecTemplatePath            = "appspec.yml"
        TaskDefinitionTemplateArtifact = "source_output"
        TaskDefinitionTemplatePath     = "taskdef.json"
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
   Purpose of the tests and implementation logic of Lambda
&lt;/h1&gt;

&lt;p&gt;The purpose of the control test on the new version of the service to be deployed is not to verify that the new configurations are working and conform to expectations, but rather that they do not introduce "&lt;strong&gt;regressions&lt;/strong&gt;" on the previous behaviour of the web server. In essence, this is a mechanism to reduce the &lt;strong&gt;risk&lt;/strong&gt; of "breaking" something that used to work - which is extremely important in the case of an infrastructure shared by many applications.&lt;/p&gt;

&lt;p&gt;The idea behind this Lambda is to make a series of requests to the new version of the service when it has been created and is ready to receive traffic, but the load balancer is still configured with the old version (trigger event of the &lt;code&gt;AfterAllowTestTraffic&lt;/code&gt; hook configured in the CodeDeploy &lt;code&gt;appspec.yml&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The list of URLs to be tested must be &lt;strong&gt;prepared&lt;/strong&gt; by a separate process. It may be a static list, but in my case, having no control or visibility over the URLs served dynamically by the backend (think of a CMS: tons of URLs that can change at any time), I created a nightly job whose starting point is the previous day's webserver access logs. I also had to account for URLs that are rarely visited and may not appear in the access logs every day. Furthermore, since the execution time of a Lambda is limited, a significant subset of URLs must be chosen carefully so as not to prolong the execution excessively and risk a timeout.&lt;/p&gt;

&lt;p&gt;The process of creating this list depends on how many and which URLs the web server serves, so it is not possible to suggest a single way to generate it: it is strictly dependent on the use case. In my example, the list of URLs to request is contained in the file &lt;code&gt;acceptance_url_list.csv&lt;/code&gt; on an S3 bucket.&lt;/p&gt;
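&lt;p&gt;As an illustration only (the log format, file name and &lt;code&gt;top_n&lt;/code&gt; value here are assumptions, and a real job would read the previous day's logs from S3), a nightly job of this kind might extract the most requested paths like this:&lt;/p&gt;

```python
import csv
from collections import Counter

def build_url_list(log_lines, top_n=100):
    """Extract the most requested paths from combined-log-style access log
    lines, where the quoted request field looks like: "GET /path HTTP/1.1"."""
    counter = Counter()
    for line in log_lines:
        try:
            request = line.split('"')[1]   # the quoted request field
            path = request.split()[1]      # METHOD PATH PROTOCOL
        except IndexError:
            continue                       # skip malformed lines
        counter[path] += 1
    return [path for path, _ in counter.most_common(top_n)]

def write_url_csv(paths, filename="acceptance_url_list.csv"):
    """Write one path per row, the format read back by the testing Lambda."""
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        for path in paths:
            writer.writerow([path])
```

Capping the list at the `top_n` most frequent paths also keeps the Lambda execution time bounded.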

&lt;p&gt;The environment variables used by my Lambda include the bucket and path of this file, the endpoint to send requests to, and a parameter introduced to allow for a &lt;strong&gt;margin of error&lt;/strong&gt;. The applications behind the web server may change as a result of application releases, so URLs that worked the day before may no longer be reachable. Since complete control is impossible, especially in very complex infrastructures, I chose to introduce a threshold: the percentage of requests that must receive an HTTP 200 response for the test to be considered passed.&lt;/p&gt;

&lt;p&gt;Once this logic is understood, the Lambda code is not particularly complex: the function makes the requests in the list, calculates the percentage of successful responses, and finally notifies CodeDeploy of the outcome of the test.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
import urllib.request
import os
import csv
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event,context):
    codedeploy = boto3.client('codedeploy')

    endpoint = os.environ['ENDPOINT']
    bucket = os.environ['BUCKET']
    file = os.environ['FILEPATH']
    source_file = "s3://"+os.environ['BUCKET']+"/"+os.environ['FILEPATH']
    perc_min = os.environ['ACCEPTANCE_THRESHOLD']

    count_200 = 0
    count_err = 0

    s3client = boto3.client('s3')
    try:
        s3client.download_file(bucket, file, "/tmp/"+file)
    except Exception as e:
        # fail fast: without the URL list the test cannot run
        logger.error("Unable to download "+source_file+": "+str(e))
        raise

    with open("/tmp/"+file, newline='') as f:
        reader = csv.reader(f)
        list1 = list(reader)

    for url_part in list1:
        code = 0
        url = "http://"+endpoint+url_part[0]
        try:
            request = urllib.request.urlopen(url)
            code = request.code
            if code == 200:
                count_200 = count_200 + 1
            else:
                count_err = count_err + 1
        except:
            count_err = count_err + 1
        if code == 0:
            logger.info(url+" Error")
        else:
            logger.info(url+" "+str(code))

    status = 'Failed'
    perc_200 = int((count_200 / (count_200 + count_err)) * 100)
    logger.info("HTTP 200 response percentage: "+str(perc_200))
    if perc_200 &amp;gt;= int(perc_min):
        status = "Succeeded"

    logger.info("TEST RESULT: ")
    logger.info(status)

    codedeploy.put_lifecycle_event_hook_execution_status(
        deploymentId=event["DeploymentId"],            
        lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
        status=status
    )
    return True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Integration is a "tailoring" activity
&lt;/h1&gt;

&lt;p&gt;In this article, we have seen how to integrate and coordinate many different components to make them converge towards end-to-end automation, including resources that need to be tailored to the specific use case.&lt;/p&gt;

&lt;p&gt;Automation of releases is an activity that I find very rewarding. Historically, releases have always been a thorn in the side, precisely because the activity was manual, not subject to any tests, with many unexpected variables: fortunately, as we have seen, it is now possible to rely on a well-defined, clear and repeatable process.&lt;/p&gt;

&lt;p&gt;The tools we have available for automation are very interesting and versatile, but the cloud doesn't do it all by itself: some important &lt;strong&gt;integration&lt;/strong&gt; work is necessary (the code we have seen is only a part; the &lt;a href="https://github.com/theonlymonica/bluegreen-ecs-fargate-examples"&gt;complete example is here&lt;/a&gt;), and above all you need to know how to &lt;strong&gt;adapt&lt;/strong&gt; the resources to the use case, always looking for the best solution to the specific problem.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>containers</category>
      <category>cloud</category>
    </item>
    <item>
      <title>How to expose multiple applications on Google Kubernetes Engine with a single Cloud Load Balancer</title>
      <dc:creator>Monica Colangelo</dc:creator>
      <pubDate>Sat, 20 Aug 2022 15:38:00 +0000</pubDate>
      <link>https://dev.to/monica_colangelo/how-to-expose-multiple-applications-on-google-kubernetes-engine-with-a-single-cloud-load-balancer-43c4</link>
      <guid>https://dev.to/monica_colangelo/how-to-expose-multiple-applications-on-google-kubernetes-engine-with-a-single-cloud-load-balancer-43c4</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This post was originally published at &lt;a href="https://letsmake.cloud/multiple-gke-single-lb"&gt;https://letsmake.cloud&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In my &lt;a href="https://letsmake.cloud/multiple-eks-single-alb"&gt;previous article&lt;/a&gt;, I talked about how to expose multiple applications hosted on AWS EKS via a single Application Load Balancer.&lt;/p&gt;

&lt;p&gt;In this article, we will see how to do the same thing, this time not on AWS but &lt;strong&gt;Google Cloud&lt;/strong&gt;!&lt;/p&gt;

&lt;h1&gt;
  
  
  Network Endpoint Group and Container-native load balancing
&lt;/h1&gt;

&lt;p&gt;On GCP, configurations called &lt;a href="https://cloud.google.com/load-balancing/docs/negs"&gt;Network Endpoint Groups (NEGs)&lt;/a&gt; specify a group of backend endpoints or services. A common use case for NEGs is deploying containerized services and using the NEGs as backends for a load balancer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/container-native-load-balancing"&gt;Container-native load balancing&lt;/a&gt; uses &lt;code&gt;GCE_VM_IP_PORT&lt;/code&gt; NEGs (where NEG endpoints are pod IP addresses) and allows the load balancer to target pods, and distribute traffic among them directly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Vk-Yk-_j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ivf0nn4rv8jkhxiu85jg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Vk-Yk-_j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ivf0nn4rv8jkhxiu85jg.png" alt="neg" width="528" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Commonly, container-native load balancing is used with the GKE &lt;em&gt;Ingress&lt;/em&gt; resource. In that case, the &lt;em&gt;ingress-controller&lt;/em&gt; takes care of creating the entire chain of necessary resources, including the load balancer; this means that each application on GKE corresponds to an &lt;em&gt;Ingress&lt;/em&gt; and, consequently, a load balancer.&lt;/p&gt;

&lt;p&gt;Without the &lt;em&gt;ingress-controller&lt;/em&gt;, GCP allows you to create standalone NEGs; in that case, you have to act manually, and you lose the advantages of the elasticity and speed of a cloud-native architecture.&lt;/p&gt;

&lt;p&gt;To summarize my use case: I want a single load balancer, configured independently from GKE, that routes traffic to different GKE applications according to the rules established by my architecture; at the same time, I want to take advantage of cloud-native automation without performing manual configuration updates.&lt;/p&gt;

&lt;h1&gt;
  
  
  AWS ALB vs GCP Load Balancing
&lt;/h1&gt;

&lt;p&gt;Implementing the same use case on two different cloud providers, the most noteworthy difference is the "boundary" that Kubernetes reaches in managing resources; or, seen from the other side, the configurations that must be prepared on the cloud provider (manually or, as we will see, with Terraform).&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://letsmake.cloud/multiple-eks-single-alb"&gt;article about EKS&lt;/a&gt;, on AWS I configured, in addition to the ALB, the &lt;em&gt;target groups&lt;/em&gt;, one for each application to be exposed; these target groups were created as "empty boxes". Subsequently, I created the &lt;em&gt;deployments&lt;/em&gt; and their related &lt;em&gt;services&lt;/em&gt; on EKS; finally, I made a &lt;code&gt;TargetGroupBinding&lt;/code&gt; configuration (&lt;em&gt;lb-controller&lt;/em&gt; custom resource) to indicate to the pods belonging to a specific service which was the correct target group to register with.&lt;/p&gt;

&lt;p&gt;In GCP, the &lt;strong&gt;Backend Service&lt;/strong&gt; resource (roughly analogous to an AWS target group) cannot be created as an "empty box": it must know, at creation time, the targets to forward traffic to. As I said before, in my use case the targets are the NEGs that GKE generates automatically when a Kubernetes service is created; consequently, I will create these &lt;em&gt;services&lt;/em&gt; at the same time as the infrastructure (they will be my "empty boxes"), and I will manage only the application &lt;em&gt;deployments&lt;/em&gt; separately.&lt;/p&gt;

&lt;p&gt;This apparent difference is purely &lt;strong&gt;operational&lt;/strong&gt;: it is just a matter of configuring the Kubernetes service with different tools, and it can be noteworthy if the configuration of the cloud resources (for example, with Terraform) is made by a different team than the one that deploys the applications in the cluster.&lt;/p&gt;

&lt;p&gt;From a functional point of view, the two solutions are exactly equivalent.&lt;/p&gt;

&lt;p&gt;The other difference is that in GKE the VPC IP addresses to be assigned to the pods are managed natively, and they do not require any add-on, while on EKS the &lt;em&gt;VPC CNI&lt;/em&gt; plugin or other similar third-party plugins must be used.&lt;/p&gt;

&lt;h1&gt;
  
  
  Component configuration
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;GKE cluster and network configuration are considered a prerequisite and will not be covered here. The code shown here is partial; a complete example can be &lt;a href="https://github.com/theonlymonica/multiple-app-single-lb-examples"&gt;found here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Kubernetes Services
&lt;/h2&gt;

&lt;p&gt;In this example I create two different applications, represented by Nginx and Apache, to show traffic routing to two different endpoints.&lt;/p&gt;

&lt;p&gt;With Terraform I create the Kubernetes services related to the two applications; the use of &lt;strong&gt;annotations&lt;/strong&gt; allows the automatic creation of NEGs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "kubernetes_service" "apache" {
  metadata {
    name      = "apache"
    namespace = local.namespace
    annotations = {
      "cloud.google.com/neg" = "{\"exposed_ports\": {\"80\":{\"name\": \"${local.neg_name_apache}\"}}}"
      "cloud.google.com/neg-status" = jsonencode(
        {
          network_endpoint_groups = {
            "80" = local.neg_name_apache
          }
          zones = data.google_compute_zones.available.names
        }
      )
    }
  }
  spec {
    port {
      name        = "http"
      protocol    = "TCP"
      port        = 80
      target_port = "80"
    }
    selector = {
      app = "apache"
    }
    type = "ClusterIP"
  }
}

resource "kubernetes_service" "nginx" {
  metadata {
    name      = "nginx"
    namespace = local.namespace
    annotations = {
      "cloud.google.com/neg" = "{\"exposed_ports\": {\"80\":{\"name\": \"${local.neg_name_nginx}\"}}}"
      "cloud.google.com/neg-status" = jsonencode(
        {
          network_endpoint_groups = {
            "80" = local.neg_name_nginx
          }
          zones = data.google_compute_zones.available.names
        }
      )
    }
  }
  spec {
    port {
      name        = "http"
      protocol    = "TCP"
      port        = 80
      target_port = "80"
    }
    selector = {
      app = "nginx"
    }
    type = "ClusterIP"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  NEG
&lt;/h2&gt;

&lt;p&gt;NEG links always have the same structure, so it's easy to build a list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;locals {
  neg_name_apache = "apache"
  neg_apache      = formatlist("https://www.googleapis.com/compute/v1/projects/%s/zones/%s/networkEndpointGroups/%s", module.enabled_google_apis.project_id, data.google_compute_zones.available.names, local.neg_name_apache)
  neg_name_nginx  = "nginx"
  neg_nginx       = formatlist("https://www.googleapis.com/compute/v1/projects/%s/zones/%s/networkEndpointGroups/%s", module.enabled_google_apis.project_id, data.google_compute_zones.available.names, local.neg_name_nginx)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---tfe_ZZy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/r837h0z7ek4dnlob6jw8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---tfe_ZZy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/r837h0z7ek4dnlob6jw8.png" alt="gcp-neg" width="880" height="279"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Backend
&lt;/h2&gt;

&lt;p&gt;At this point it is easy to create the backend services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "google_compute_backend_service" "backend_apache" {
  name    = "${local.name}-backend-apache"

  dynamic "backend" {
    for_each = local.neg_apache
    content {
      group          = backend.value
      balancing_mode = "RATE"
      max_rate       = 100
    }
  }
...
}

resource "google_compute_backend_service" "backend_nginx" {
  name    = "${local.name}-backend-nginx"

  dynamic "backend" {
    for_each = local.neg_nginx
    content {
      group          = backend.value
      balancing_mode = "RATE"
      max_rate       = 100
    }
  }
...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--L5XW-R7L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8q6bdhbzdqk19miz7dey.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--L5XW-R7L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8q6bdhbzdqk19miz7dey.png" alt="gcp-backend" width="880" height="661"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  URL Map
&lt;/h2&gt;

&lt;p&gt;I then define the &lt;code&gt;url_map&lt;/code&gt; resource, which represents the traffic routing logic. In this example, I use a set of rules that are the same for all domains to which my load balancer responds, and I address the traffic according to the &lt;em&gt;path&lt;/em&gt;; you can customize the routing rules following the &lt;a href="https://cloud.google.com/load-balancing/docs/url-map"&gt;documentation&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "google_compute_url_map" "http_url_map" {
  project         = module.enabled_google_apis.project_id
  name            = "${local.name}-loadbalancer"
  default_service = google_compute_backend_bucket.static_site.id

  host_rule {
    hosts        = local.domains
    path_matcher = "all"
  }

  path_matcher {
    name            = "all"
    default_service = google_compute_backend_bucket.static_site.id

    path_rule {
      paths = ["/apache"]
      route_action {
        url_rewrite {
          path_prefix_rewrite = "/"
        }
      }
      service = google_compute_backend_service.backend_apache.id
    }

    path_rule {
      paths = ["/nginx"]
      route_action {
        url_rewrite {
          path_prefix_rewrite = "/"
        }
      }
      service = google_compute_backend_service.backend_nginx.id
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ORB3h7NN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fnqnz7tazpk6oo311ua4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ORB3h7NN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fnqnz7tazpk6oo311ua4.png" alt="gcp-frontend" width="880" height="772"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting it all together
&lt;/h2&gt;

&lt;p&gt;Finally, the resources that bind the created components together are a &lt;code&gt;target_http_proxy&lt;/code&gt; and a &lt;code&gt;global_forwarding_rule&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "google_compute_target_http_proxy" "http_proxy" {
  project = module.enabled_google_apis.project_id
  name    = "http-proxy"
  url_map = google_compute_url_map.http_url_map.self_link
}

resource "google_compute_global_forwarding_rule" "http_fw_rule" {
  project               = module.enabled_google_apis.project_id
  name                  = "http-fw-rule"
  port_range            = 80
  target                = google_compute_target_http_proxy.http_proxy.self_link
  load_balancing_scheme = "EXTERNAL"
  ip_address            = google_compute_global_address.ext_lb_ip.address
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Use on Kubernetes
&lt;/h1&gt;

&lt;p&gt;Once the setup on GCP is complete, using this technique on GKE is even easier than on EKS. It is sufficient to add a &lt;em&gt;deployment&lt;/em&gt; resource that matches the &lt;em&gt;service&lt;/em&gt; already created on the load balancer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nginx
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: apache
  labels:
    app: apache
spec:
  selector:
    matchLabels:
      app: apache
  strategy:
    type: Recreate
  replicas: 3
  template:
    metadata:
      labels:
        app: apache
    spec:
      containers:
        - name: httpd
          image: httpd:2.4
          ports:
            - containerPort: 80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From now on, each new pod belonging to the deployment associated with that service will automatically be registered in its NEG. To test it, just scale the number of replicas of the deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl scale deployment nginx --replicas 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and within a few seconds the new pods will appear as targets of the NEG.&lt;/p&gt;
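
&lt;p&gt;To verify, you can list the endpoints registered in the NEG with the gcloud CLI; here the NEG name is the one from this example, while the zone is just an illustrative value to replace with one of your cluster's zones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud compute network-endpoint-groups list-network-endpoints nginx \
  --zone europe-west1-b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The output lists the IP and port of each pod currently attached to the NEG in that zone.&lt;/p&gt;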

&lt;blockquote&gt;
&lt;p&gt;Thanks to &lt;a href="https://www.linkedin.com/in/cristian-conte-27a93565"&gt;Cristian Conte&lt;/a&gt; for contributing with his GCP knowledge!&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>gcp</category>
      <category>devops</category>
      <category>kubernetes</category>
      <category>cloud</category>
    </item>
    <item>
      <title>How to expose multiple applications on Amazon EKS with a single Application Load Balancer</title>
      <dc:creator>Monica Colangelo</dc:creator>
      <pubDate>Sat, 20 Aug 2022 15:32:00 +0000</pubDate>
      <link>https://dev.to/monica_colangelo/how-to-expose-multiple-applications-on-amazon-eks-with-a-single-application-load-balancer-ond</link>
      <guid>https://dev.to/monica_colangelo/how-to-expose-multiple-applications-on-amazon-eks-with-a-single-application-load-balancer-ond</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This post was originally published at &lt;a href="https://letsmake.cloud/multiple-eks-single-alb"&gt;https://letsmake.cloud&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Expose microservices to the Internet with AWS
&lt;/h1&gt;

&lt;p&gt;One of the defining moments in building a microservices application is deciding how to &lt;strong&gt;expose&lt;/strong&gt; endpoints so that a client or API can send requests and get responses.&lt;/p&gt;

&lt;p&gt;Usually, each microservice has its own endpoint. For example, each URL path will point to a different microservice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;www.example.com/service1 &amp;gt; microservice1
www.example.com/service2 &amp;gt; microservice2
www.example.com/service3 &amp;gt; microservice3
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This type of routing is known as &lt;strong&gt;&lt;em&gt;path-based routing&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This approach has the advantage of being &lt;strong&gt;low-cost&lt;/strong&gt; and simple, even when exposing dozens of microservices.&lt;/p&gt;

&lt;p&gt;On AWS, both &lt;strong&gt;Application Load Balancer (ALB)&lt;/strong&gt; and &lt;strong&gt;Amazon API Gateway&lt;/strong&gt; support this feature. Therefore, with a &lt;strong&gt;single ALB&lt;/strong&gt; or API Gateway, you can expose microservices running as containers with Amazon EKS or Amazon ECS, or serverless functions with AWS Lambda.&lt;/p&gt;

&lt;p&gt;AWS recently proposed a &lt;a href="https://aws.amazon.com/blogs/containers/how-to-expose-multiple-applications-on-amazon-eks-using-a-single-application-load-balancer/"&gt;solution to expose EKS orchestrated microservices via an Application Load Balancer&lt;/a&gt;. Their solution is based on the use of &lt;em&gt;NodePort&lt;/em&gt; exposed by Kubernetes.&lt;/p&gt;

&lt;p&gt;Instead, I want to propose a different solution that uses the EKS cluster &lt;strong&gt;VPC CNI add-on&lt;/strong&gt; and allows the pods to automatically connect to their &lt;em&gt;target group&lt;/em&gt;, without using any &lt;em&gt;NodePort&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Also, in my use case, the Application Load Balancer is managed &lt;strong&gt;independently&lt;/strong&gt; of EKS, i.e. it is not Kubernetes that has control over it. This way you can use other types of routing on the load balancer; for example, you could have an SSL certificate with more than one domain (&lt;em&gt;SNI&lt;/em&gt;) and base the routing not only on the path but also on the domain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--L4kDMDMk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/69qd185pmzyrnjwcpuv8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--L4kDMDMk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/69qd185pmzyrnjwcpuv8.png" alt="eks-lb" width="880" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Component configuration
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;The code shown here is partial. A complete example can be found &lt;a href="https://github.com/theonlymonica/multiple-app-single-lb-examples"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  EKS cluster
&lt;/h2&gt;

&lt;p&gt;In this article, the EKS cluster is a prerequisite and is assumed to be already in place. If you want, you can read how to install an EKS cluster with Terraform in &lt;a href="https://letsmake.cloud/eks-cluster-autoscaler"&gt;my article on autoscaling&lt;/a&gt;. A complete example can be found in my &lt;a href="https://github.com/theonlymonica/multiple-app-single-lb-examples"&gt;repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  VPC CNI add-on
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/pod-networking.html"&gt;VPC CNI (Container Network Interface) add-on&lt;/a&gt; allows you to automatically assign a VPC IP address directly to a pod within the EKS cluster.&lt;/p&gt;

&lt;p&gt;Since we want pods to &lt;strong&gt;self-register&lt;/strong&gt; on their target group (which is a resource outside of Kubernetes and inside the VPC), the use of this add-on is imperative. Its installation is natively integrated on EKS, &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html"&gt;as explained here&lt;/a&gt;.&lt;/p&gt;
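
&lt;p&gt;With Terraform, the managed add-on can be enabled on an existing cluster. A minimal sketch, assuming the cluster id is exposed by an &lt;code&gt;eks_cluster&lt;/code&gt; module as in the examples below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_eks_addon" "vpc_cni" {
  cluster_name = module.eks_cluster.cluster_id
  addon_name   = "vpc-cni"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;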

&lt;h2&gt;
  
  
  AWS Load Balancer Controller plugin
&lt;/h2&gt;

&lt;p&gt;AWS Load Balancer Controller is a controller that helps manage an Elastic Load Balancer for a Kubernetes cluster.&lt;/p&gt;

&lt;p&gt;It is typically used to provision an Application Load Balancer for an Ingress resource, or a Network Load Balancer for a Service resource.&lt;/p&gt;

&lt;p&gt;In our case provisioning is not required, because our Application Load Balancer is managed &lt;strong&gt;independently&lt;/strong&gt;. However, we will use another resource type installed by the controller's CRDs to make the pods register to their target group.&lt;/p&gt;

&lt;p&gt;This plugin is not included in the EKS installation, so it must be installed following the &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html"&gt;instructions from the AWS documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you use Terraform, like me, you can consider using a module:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;module "load_balancer_controller" {
  source  = "DNXLabs/eks-lb-controller/aws"
  version = "0.6.0"

  cluster_identity_oidc_issuer     = module.eks_cluster.cluster_oidc_issuer_url
  cluster_identity_oidc_issuer_arn = module.eks_cluster.oidc_provider_arn
  cluster_name                     = module.eks_cluster.cluster_id

  namespace = "kube-system"
  create_namespace = false
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Load Balancer and Security Group
&lt;/h2&gt;

&lt;p&gt;With Terraform, I create an Application Load Balancer in the public subnets of our VPC, together with its Security Group. The VPC is the same one where the EKS cluster is installed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_lb" "alb" {
  name                       = "${local.name}-alb"
  internal                   = false
  load_balancer_type         = "application"
  subnets                    = module.vpc.public_subnets
  enable_deletion_protection = false
  security_groups            = [aws_security_group.alb.id]
}

resource "aws_security_group" "alb" {
  name        = "${local.name}-alb-sg"
  description = "Allow ALB inbound traffic"
  vpc_id      = module.vpc.vpc_id

  tags = {
    "Name" = "${local.name}-alb-sg"
  }

  ingress {
    description = "allowed IPs"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "allowed IPs"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;It is important to remember to authorize this Security Group as a source in the Security Group inbound rules of the cluster nodes.&lt;/p&gt;
&lt;/blockquote&gt;
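
&lt;p&gt;With Terraform this can be a dedicated rule. A sketch, assuming the node Security Group id is exposed by the cluster module (the output name may differ in your module):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_security_group_rule" "alb_to_nodes" {
  description              = "Allow traffic from the ALB to the cluster nodes"
  type                     = "ingress"
  from_port                = 0
  to_port                  = 65535
  protocol                 = "tcp"
  security_group_id        = module.eks_cluster.node_security_group_id
  source_security_group_id = aws_security_group.alb.id
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;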

&lt;p&gt;At this point, I create the target groups to which the pods will bind themselves. In this example I use two:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_lb_target_group" "alb_tg1" {
  port        = 8080
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = module.vpc.vpc_id

  tags = {
    Name = "${local.name}-tg1"
  }

  health_check {
    path = "/"
  }
}

resource "aws_lb_target_group" "alb_tg2" {
  port        = 9090
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = module.vpc.vpc_id

  tags = {
    Name = "${local.name}-tg2"
  }

  health_check {
    path = "/"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last configuration on the Application Load Balancer is the &lt;strong&gt;listeners&lt;/strong&gt;' definition, which contains the traffic routing rules.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;default&lt;/strong&gt; rule on each listener, which handles requests that match no other rule, returns a fixed error response; I set it as a &lt;strong&gt;security&lt;/strong&gt; measure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_lb_listener" "alb_listener_http" {
  load_balancer_arn = aws_lb.alb.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type = "fixed-response"

    fixed_response {
      content_type = "text/plain"
      message_body = "Internal Server Error"
      status_code  = "500"
    }
  }
}

resource "aws_lb_listener" "alb_listener_https" {
  load_balancer_arn = aws_lb.alb.arn
  port              = "443"
  protocol          = "HTTPS"
  certificate_arn   = aws_acm_certificate.certificate.arn
  ssl_policy        = "ELBSecurityPolicy-2016-08"

  default_action {
    type = "fixed-response"

    fixed_response {
      content_type = "text/plain"
      message_body = "Internal Server Error"
      status_code  = "500"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The actual rules are then associated with the listeners. The listener on port 80 has a simple redirect to the HTTPS listener. The listener on port 443 has rules to route traffic according to the path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_lb_listener_rule" "alb_listener_http_rule_redirect" {
  listener_arn = aws_lb_listener.alb_listener_http.arn
  priority     = 100

  action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }

  condition {
    host_header {
      values = local.all_domains
    }
  }
}

resource "aws_lb_listener_rule" "alb_listener_rule_forwarding_path1" {
  listener_arn = aws_lb_listener.alb_listener_https.arn
  priority     = 100

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.alb_tg1.arn
  }

  condition {
    host_header {
      values = local.all_domains
    }
  }

  condition {
    path_pattern {
      values = [local.path1]
    }
  }
}

resource "aws_lb_listener_rule" "alb_listener_rule_forwarding_path2" {
  listener_arn = aws_lb_listener.alb_listener_https.arn
  priority     = 101

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.alb_tg2.arn
  }

  condition {
    host_header {
      values = local.all_domains
    }
  }

  condition {
    path_pattern {
      values = [local.path2]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Getting things to work on Kubernetes
&lt;/h1&gt;

&lt;p&gt;Once setup on AWS is complete, using this technique on EKS is super easy! It is sufficient to insert a &lt;strong&gt;TargetGroupBinding&lt;/strong&gt; type resource for each deployment/service we want to expose on the load balancer through the target group.&lt;/p&gt;

&lt;p&gt;Let's see an example. Let's say I have a deployment with a service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app.kubernetes.io/name: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app.kubernetes.io/name: nginx
spec:
  selector:
    app.kubernetes.io/name: nginx
  ports:
    - port: 8080
      targetPort: 80
      protocol: TCP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The only configuration I need to add is this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: nginx
spec:
  serviceRef:
    name: nginx
    port: 8080
  targetGroupARN: "arn:aws:elasticloadbalancing:eu-south-1:123456789012:targetgroup/tf-20220726090605997700000002/a6527ae0e19830d2"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From now on, each new pod that belongs to the deployment associated with that service will &lt;strong&gt;self-register&lt;/strong&gt; on the indicated target group. To test it, just scale the number of replicas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl scale deployment nginx --replicas 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and within a few seconds the new pods' IPs will be visible in the target group.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_Lee3jOr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4js3jzd413ykq8gpqi4i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_Lee3jOr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4js3jzd413ykq8gpqi4i.png" alt="target-groups" width="880" height="508"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>kubernetes</category>
      <category>devops</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
