<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: pmalmirae</title>
    <description>The latest articles on DEV Community by pmalmirae (@pmalmirae).</description>
    <link>https://dev.to/pmalmirae</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F915073%2F014464da-0ceb-4159-956e-1ff1691f55df.jpg</url>
      <title>DEV Community: pmalmirae</title>
      <link>https://dev.to/pmalmirae</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pmalmirae"/>
    <language>en</language>
    <item>
      <title>Speeding Up Data on AWS: From Ingestion to Insights</title>
      <dc:creator>pmalmirae</dc:creator>
      <pubDate>Wed, 07 Aug 2024 11:37:02 +0000</pubDate>
      <link>https://dev.to/aws-builders/speeding-up-data-on-aws-from-ingestion-to-insights-2am9</link>
      <guid>https://dev.to/aws-builders/speeding-up-data-on-aws-from-ingestion-to-insights-2am9</guid>
      <description>&lt;p&gt;&lt;em&gt;In a production-scale cloud environment, data is scattered across various storage formats and locations, such as RDS databases, DynamoDB tables, time series databases, S3 files, and external systems. While Amazon QuickSight can directly connect to many data sources, it is often not preferred due to design principles, costs, performance, and user experience. Instead, the best practice is to build a centralized data lake with tools to consolidate and transform data for business intelligence tools. But how can you optimize the data pipeline from ingestion to insights to ensure processed data is ready for analysis as quickly as possible?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this article, we use real-world open datasets on Helsinki region public traffic, imported as DynamoDB tables. We showcase how to transform data from the source to a data lake in S3, how to combine the datasets in QuickSight to create interesting and actionable insights, and eventually, how to speed up the data pipeline so that the insights are always as up-to-date as possible.&lt;/p&gt;

&lt;h2&gt;
  Anatomy of a typical Serverless Data Pipeline on AWS
&lt;/h2&gt;

&lt;p&gt;NordHero has implemented data pipelines for various customers on AWS utilizing our &lt;a href="https://www.nordhero.com/offerings/data-to-insights" rel="noopener noreferrer"&gt;Data to Insights Jump Start offering&lt;/a&gt;. The solution uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS Glue Jobs&lt;/strong&gt; to extract data from their sources, transform the data to be efficiently utilized with BI tools, and load the data in Parquet or ORC format to a data lake based on &lt;strong&gt;Amazon S3&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Glue Crawlers&lt;/strong&gt; to determine the data lake schemas and to store the schemas in &lt;strong&gt;AWS Glue Data Catalog&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Athena&lt;/strong&gt; to provide a scalable and super-fast SQL interface to the data stored in the S3 data lake&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon QuickSight&lt;/strong&gt; to analyze the data, build actionable insights on it, and deliver those insights to business users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Focm2mbi4pwd6afy9ojrd.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Focm2mbi4pwd6afy9ojrd.jpg" alt="An example of Serverless Data Pipeline on AWS" width="800" height="555"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AWS Glue is an AWS-managed service, meaning that AWS manages the needed compute instances, their software, and the scaling of the resources. You only pay for your data's processing time. You can create and run several AWS Glue jobs to extract, transform, and load (ETL) data from various data sources into the data lake and build different curated datasets in the data lake for various data consumption needs. &lt;/p&gt;

&lt;p&gt;Amazon S3 is an ideal service to be used as the storage foundation for a data lake, providing several benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scalability and Elasticity:&lt;/strong&gt; Amazon S3 can scale massively to store virtually unlimited amounts of data, without the need for provisioning or managing storage infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Lake Architecture:&lt;/strong&gt; S3 enables a decoupled storage and compute architecture, allowing you to store data in its raw form and use various analytics services and tools to process and analyze the data without being tied to a specific compute engine. S3 integrates seamlessly with various AWS analytics services like Amazon Athena, AWS Glue, Amazon EMR, Amazon QuickSight, and AWS Lake Formation, enabling you to build end-to-end data processing and analytics pipelines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost-Effective:&lt;/strong&gt; Amazon S3 offers a cost-effective storage solution, with pricing based on the amount of data stored and accessed. You can also leverage different storage classes (e.g., S3 Glacier) for cost optimization based on data access patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Durability and Availability:&lt;/strong&gt; Amazon S3 is designed for 99.999999999% durability and 99.99% availability, ensuring your data is safe and accessible when needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Lake Security and Compliance:&lt;/strong&gt; Amazon S3 provides robust security features, including access control, encryption at rest and in transit, and integration with AWS Identity and Access Management (IAM) for granular permissions management. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Sharing and Collaboration:&lt;/strong&gt; With Amazon S3, you can easily share data across teams, projects, or even with external parties, enabling collaboration and data monetization opportunities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Centralized Data Repository:&lt;/strong&gt; A data lake on Amazon S3 serves as a centralized repository for all your structured, semi-structured, and unstructured data, breaking down data silos and enabling data democratization within your organization.&lt;/li&gt;
&lt;/ul&gt;
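&lt;p&gt;As a sketch of the storage-class point above: S3 lifecycle rules can move colder data to S3 Glacier automatically. The following Python helper builds such a configuration for boto3; the bucket name, the &lt;em&gt;raw/&lt;/em&gt; prefix, and the 90-day threshold are illustrative assumptions, not values from this pipeline.&lt;/p&gt;

```python
def build_lifecycle_rules(prefix: str = "raw/", glacier_after_days: int = 90) -> dict:
    """Lifecycle configuration that transitions objects under `prefix`
    to S3 Glacier after the given number of days (values are illustrative)."""
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


def apply_lifecycle(bucket: str) -> None:
    """Applies the rules to a bucket; requires AWS credentials."""
    import boto3  # imported lazily so the pure helper above has no dependencies

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=build_lifecycle_rules()
    )
```

&lt;p&gt;Rules like this keep rarely accessed raw data cheap while the curated, frequently queried datasets stay in S3 Standard.&lt;/p&gt;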

&lt;p&gt;Here's an example AWS Glue Job script, written in Python, that extracts passenger data from a DynamoDB table named &lt;em&gt;hsl-passengers&lt;/em&gt;, transforms the column names from uppercase to lowercase, casts the &lt;em&gt;passenger_count&lt;/em&gt; field from &lt;em&gt;String&lt;/em&gt; to &lt;em&gt;Integer&lt;/em&gt;, and finally writes the transformed data to an S3 bucket in &lt;em&gt;Parquet&lt;/em&gt; format.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timedelta&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.transforms&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;getResolvedOptions&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark.context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SparkContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GlueContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.job&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Job&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark.sql.dataframe&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark.sql.types&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;IntegerType&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark.sql.functions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;


&lt;span class="n"&gt;SPARK_CONTEXT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SparkContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOrCreate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;GLUE_CONTEXT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GlueContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SPARK_CONTEXT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;spark&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GLUE_CONTEXT&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;spark_session&lt;/span&gt;
&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GLUE_CONTEXT&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_logger&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read_dynamo_db&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tablename&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Reads DynamoDB table into a DynamicFrame&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;dyf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GLUE_CONTEXT&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_dynamic_frame&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_options&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;connection_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamodb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;connection_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamodb.input.tableName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tablename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamodb.throughput.read.percent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamodb.splits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dyf&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Writes log to multiple outputs&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_to_s3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s3_output_base_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Writes data to specific folder in S3&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;s3_output_base_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Output folder must contain path element &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;processed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; to be valid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;write_log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Writing output to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;overwrite&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parquet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;partitionBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;

    &lt;span class="c1"&gt;# @params: [JOB_NAME]
&lt;/span&gt;    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getResolvedOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JOB_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3_output_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GLUE_CONTEXT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JOB_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Let's only overwrite partitions that have changed, even though we store all data
&lt;/span&gt;    &lt;span class="n"&gt;spark&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spark.sql.sources.partitionOverwriteMode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;s3_output_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3_output_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="nf"&gt;write_log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Parameter &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3_output_path&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;s3_output_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Reading data from DynamoDb
&lt;/span&gt;    &lt;span class="n"&gt;passengers_raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;read_dynamo_db&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hsl-passengers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toDF&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;passengers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;passengers_raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumnRenamed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OBJECTID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumnRenamed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SHORTID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;short_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumnRenamed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STOPNAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stop_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;passenger_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PASSENGERCOUNT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;IntegerType&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PASSENGERCOUNT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;write_to_s3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s3_output_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;passengers-data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;passengers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you trigger the AWS Glue Job, AWS Glue fires up the needed Apache Spark compute instances, manages the parallel job execution across cluster nodes, and tears down the compute resources after the job has finished.&lt;/p&gt;
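&lt;p&gt;A job like the one above can also be started programmatically. Here is a minimal boto3 sketch; the job name is a placeholder, and the &lt;em&gt;--s3_output_path&lt;/em&gt; argument matches the parameter that the example script reads with &lt;em&gt;getResolvedOptions&lt;/em&gt;.&lt;/p&gt;

```python
def build_job_arguments(s3_output_path: str) -> dict:
    """Glue passes job parameters as '--name' keys; the example script
    resolves them with getResolvedOptions(sys.argv, [... "s3_output_path"])."""
    return {"--s3_output_path": s3_output_path}


def start_passenger_job(job_name: str, s3_output_path: str) -> str:
    """Starts the Glue job and returns the run id for status polling."""
    import boto3  # lazy import; only needed when actually calling AWS

    glue = boto3.client("glue")
    response = glue.start_job_run(
        JobName=job_name,
        Arguments=build_job_arguments(s3_output_path),
    )
    return response["JobRunId"]
```

&lt;p&gt;The returned run id can be passed to &lt;em&gt;get_job_run&lt;/em&gt; to follow the execution state.&lt;/p&gt;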

&lt;h2&gt;
  Optimizing the data delivery to the data lake with AWS Glue
&lt;/h2&gt;

&lt;p&gt;You need to consider several things when optimizing the ETL process with AWS Glue, but eventually, it comes down to two criteria. The primary criterion is &lt;strong&gt;Time&lt;/strong&gt;. The data lake is never 100% up-to-date with the source data. So the key question is, how often should the data be updated? The secondary criterion is always &lt;strong&gt;Money&lt;/strong&gt;. After setting the Time criterion, how can the costs of the ETL process be optimized?&lt;/p&gt;

&lt;p&gt;To meet both criteria, you need to plan your data pipeline well. Here are a few techniques to compete against the clock:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Scale cluster capacity:&lt;/strong&gt; Adjust the number of Data Processing Units (DPUs) and worker types based on your workload requirements. AWS Glue allows you to scale resources up or down to match the demands of your ETL jobs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use the latest AWS Glue version:&lt;/strong&gt; AWS regularly releases new versions of AWS Glue with performance improvements and new features. Upgrade to the latest version to take advantage of these enhancements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduce data scan:&lt;/strong&gt; Minimize the amount of data your jobs scan by using techniques like partitioning, caching, and filtering data early in the ETL process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallelize tasks:&lt;/strong&gt; Divide your ETL tasks into smaller parts and process them concurrently to improve throughput. AWS Glue supports parallelization through features like repartitioning and coalesce operations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimize planning overhead:&lt;/strong&gt; Reduce the time spent on planning by optimizing your AWS Glue Data Catalog, using the correct data types, and avoiding unnecessary schema changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimize shuffles:&lt;/strong&gt; Minimize the amount of data shuffled between tasks, as shuffles can be resource-intensive. Use techniques like repartitioning and coalescing to reduce shuffles.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimize user-defined functions (UDFs):&lt;/strong&gt; If you're using UDFs, ensure they are efficient and optimize their execution using vectorization and caching.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use AWS Glue Auto Scaling:&lt;/strong&gt; Enable AWS Glue Auto Scaling to adjust the number of workers based on your workload automatically, ensuring efficient resource utilization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor and tune:&lt;/strong&gt; Use AWS Glue's monitoring capabilities, such as the Spark UI and CloudWatch metrics, to identify bottlenecks and tune your jobs accordingly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leverage AWS Glue Workflow:&lt;/strong&gt; Use AWS Glue Workflow to orchestrate and manage your ETL pipelines, ensuring efficient execution and resource utilization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimize data formats:&lt;/strong&gt; Use columnar data formats like Parquet or ORC, which are optimized for analytical workloads and can improve query performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leverage AWS Glue Data Catalog:&lt;/strong&gt; Use the AWS Glue Data Catalog to store and manage your data schemas, which can improve planning and reduce overhead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimize data compression:&lt;/strong&gt; Use appropriate compression techniques to reduce the amount of data transferred and stored, improving performance and reducing costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid processing the same data multiple times:&lt;/strong&gt; Use AWS Glue Job bookmarks to track the data already processed by the ETL job, and update only the changed partitions when loading data to the data lake.&lt;/li&gt;
&lt;/ol&gt;
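&lt;p&gt;Several of the tips above (a recent Glue version, right-sized workers, and job bookmarks) are plain job settings. The following sketch builds the keyword arguments for boto3's &lt;em&gt;create_job&lt;/em&gt; call; the job name, script location, and IAM role are hypothetical.&lt;/p&gt;

```python
def build_job_config(name: str, script_location: str, role_arn: str) -> dict:
    """Keyword arguments for glue.create_job() reflecting the tips above:
    a recent Glue version, explicit worker sizing, and job bookmarks enabled."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job, as in the example script
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",   # use a recent Glue release for performance fixes
        "WorkerType": "G.1X",   # scale to G.2X or more workers for heavier loads
        "NumberOfWorkers": 2,
        "DefaultArguments": {
            # Job bookmarks skip data already processed by previous runs
            "--job-bookmark-option": "job-bookmark-enable",
        },
    }
```

&lt;p&gt;The dictionary can then be applied with &lt;em&gt;boto3.client("glue").create_job(**build_job_config(...))&lt;/em&gt;, or translated into the equivalent infrastructure-as-code definition.&lt;/p&gt;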

&lt;p&gt;Here's an example of AWS Glue Workflow. The workflow processes Helsinki Region Transport (HSL) open data on passenger amounts and public transport stops and shifts. The workflow has a trigger named &lt;em&gt;hsl-data-glue-workflow-trigger&lt;/em&gt; that is configured to start once per hour. The trigger will fire up five parallel AWS Glue Jobs to process data related to &lt;em&gt;shifts&lt;/em&gt;, &lt;em&gt;passengers&lt;/em&gt;, &lt;em&gt;stoptypes&lt;/em&gt;, &lt;em&gt;network&lt;/em&gt; and &lt;em&gt;stops&lt;/em&gt;. When all these Jobs end up in the SUCCESS state, the &lt;em&gt;hsl-data-glue-crawler-trigger&lt;/em&gt; is triggered to start an AWS Glue Crawler to update the data schemas in the AWS Glue Data Catalog.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffl0hv8aj561ehyid6omg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffl0hv8aj561ehyid6omg.jpg" alt="An example of AWS Glue Workflow that processes open HSL transport data" width="727" height="636"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AWS Glue Workflows support three types of start triggers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schedule:&lt;/strong&gt; The workflow is started according to a defined schedule (e.g., daily, weekly, monthly, or a custom cron expression).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On-demand:&lt;/strong&gt; The workflow is started manually from the AWS Glue console, API, or AWS CLI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EventBridge event:&lt;/strong&gt; The workflow starts with a single Amazon EventBridge event or a batch of Amazon EventBridge events.&lt;/li&gt;
&lt;/ul&gt;
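&lt;p&gt;A scheduled start trigger like the hourly one described above can be created through the Glue API. This is a sketch of the arguments for boto3's &lt;em&gt;create_trigger&lt;/em&gt;; the workflow and job names are placeholders, and note that Glue cron schedules are evaluated in UTC.&lt;/p&gt;

```python
def build_hourly_trigger(workflow_name: str, job_names: list) -> dict:
    """Arguments for glue.create_trigger(): a scheduled trigger that starts
    the given jobs in parallel at the top of every hour."""
    return {
        "Name": f"{workflow_name}-trigger",
        "WorkflowName": workflow_name,
        "Type": "SCHEDULED",
        "Schedule": "cron(0 * * * ? *)",  # once per hour, on the hour (UTC)
        "Actions": [{"JobName": name} for name in job_names],
        "StartOnCreation": True,
    }
```

&lt;p&gt;A second, conditional trigger (Type &lt;em&gt;CONDITIONAL&lt;/em&gt; with a predicate on the jobs' SUCCEEDED states) would then start the crawler once all five jobs have finished, mirroring the workflow in the figure above.&lt;/p&gt;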

&lt;h2&gt;
  Optimizing the data insights experience with Amazon QuickSight
&lt;/h2&gt;

&lt;p&gt;From a data consumption perspective, the key criterion is that the data is up-to-date and instantly available. If there's a lot of data in the data lake, updating an analysis or dashboard view in Amazon QuickSight can take tens of seconds. That makes for a very poor business analytics experience and generates significant query costs. &lt;/p&gt;

&lt;p&gt;Amazon QuickSight has solved this issue with a lightning-fast in-memory caching solution called &lt;em&gt;SPICE&lt;/em&gt; (Super-fast, Parallel, In-memory Calculation Engine). When configuring QuickSight DataSets, you have the option to either query the underlying data directly or utilize SPICE. QuickSight comes with a 10 GB SPICE allocation per QuickSight Author license, and additional SPICE capacity can be purchased with GB/month pricing.&lt;/p&gt;

&lt;p&gt;When using SPICE, the underlying data from the data sources, such as a data lake, is loaded into SPICE. QuickSight Analyses and Dashboards utilize only the version of data available in SPICE. SPICE can be refreshed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;manually&lt;/li&gt;
&lt;li&gt;by a preconfigured schedule&lt;/li&gt;
&lt;li&gt;through QuickSight API&lt;/li&gt;
&lt;/ul&gt;
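&lt;p&gt;A refresh through the QuickSight API means creating an ingestion and polling its status. Below is a minimal sketch of that flow; the account and dataset ids are placeholders, and it assumes a full (rather than incremental) SPICE refresh.&lt;/p&gt;

```python
import time
import uuid

# Ingestion states after which polling can stop
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def is_terminal(status: str) -> bool:
    """True when an ingestion has reached a final state."""
    return status in TERMINAL_STATES


def start_spice_refresh(account_id: str, dataset_id: str) -> str:
    """Kicks off a full SPICE refresh for one dataset, returning the ingestion id."""
    import boto3  # lazy import; only needed when actually calling AWS

    quicksight = boto3.client("quicksight")
    ingestion_id = str(uuid.uuid4())
    quicksight.create_ingestion(
        AwsAccountId=account_id,
        DataSetId=dataset_id,
        IngestionId=ingestion_id,
    )
    return ingestion_id


def wait_for_ingestion(account_id: str, dataset_id: str,
                       ingestion_id: str, poll_seconds: int = 30) -> str:
    """Polls until the refresh finishes and returns the final status."""
    import boto3

    quicksight = boto3.client("quicksight")
    while True:
        status = quicksight.describe_ingestion(
            AwsAccountId=account_id,
            DataSetId=dataset_id,
            IngestionId=ingestion_id,
        )["Ingestion"]["IngestionStatus"]
        if is_terminal(status):
            return status
        time.sleep(poll_seconds)
```

&lt;p&gt;Waiting for one ingestion to complete before starting the next is what makes ordered, dependency-aware refreshes possible.&lt;/p&gt;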

&lt;p&gt;The SPICE refresh timing becomes an issue when the goal is to have the most recent data available in QuickSight. Consider a situation where the Glue Workflow, containing multiple ETL jobs, runs once per hour and updates several datasets in the S3 data lake. In our imaginary example, the workflow typically takes 20 minutes. Still, depending on how much the source data has changed since the last run and the current utilization of the AWS-managed Glue hardware, a run can take anywhere between 14 and 40 minutes. &lt;/p&gt;

&lt;p&gt;In addition, the QuickSight SPICE refresh process runs on AWS-managed computing resources, and in our case, refreshing one QuickSight DataSet might take 2-8 minutes. &lt;/p&gt;

&lt;p&gt;And in a typical production-scale QuickSight environment, the order of DataSet refreshes matters. There are "simple" DataSets that do not depend on any other DataSet, and then there are combined DataSets that build on the simple ones. Before refreshing a DataSet in QuickSight, we need to be sure that all the DataSets it depends on have been refreshed first.&lt;/p&gt;
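&lt;p&gt;The dependency ordering can be computed generically. The following sketch (with a hypothetical helper name; a real dependency graph would come from your own DataSet definitions) groups DataSets into refresh phases, where each phase depends only on DataSets refreshed in earlier phases:&lt;/p&gt;

```python
# Group DataSets into refresh phases: each phase may only depend on DataSets
# refreshed in earlier phases (a layered topological sort). The DataSet names
# mirror the article's example; the helper itself is a generic sketch, not a
# QuickSight API.

def compute_refresh_phases(graph):
    """graph maps a DataSet name to the list of DataSets it depends on."""
    phases = []
    done = set()
    remaining = set(graph)
    while remaining:
        # A DataSet is ready when all of its dependencies are refreshed.
        ready = sorted(d for d in remaining if all(dep in done for dep in graph[d]))
        if not ready:
            raise ValueError("cyclic DataSet dependency detected")
        phases.append(ready)
        done.update(ready)
        remaining.difference_update(ready)
    return phases

phases = compute_refresh_phases({
    "network": [],
    "stops": [],
    "passengers": [],
    "shifts": [],
    "stoptypes": [],
    "passengers-and-stops": ["passengers", "stops", "stoptypes", "network", "shifts"],
})
```

&lt;p&gt;With the article's example graph, the function returns two phases: the five simple DataSets first, then the combined &lt;em&gt;passengers-and-stops&lt;/em&gt; DataSet.&lt;/p&gt;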

&lt;p&gt;Here's an example of a combined DataSet in QuickSight. All datasets available in the data lake have first been brought into QuickSight. The &lt;em&gt;passengers&lt;/em&gt; data is first joined with the &lt;em&gt;stops&lt;/em&gt; data while passenger amounts are counted per stop. Then the &lt;em&gt;stops&lt;/em&gt; data is joined with the &lt;em&gt;stoptypes&lt;/em&gt; data (whether the stop is a glass shelter, steel shelter, post,...), the &lt;em&gt;network&lt;/em&gt; data (is it a bus stop, subway stop, tram stop,...) and the &lt;em&gt;shifts&lt;/em&gt; data (the number of public transport shifts between different stops).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1xjodfqonz3hsgwt04m.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1xjodfqonz3hsgwt04m.jpg" alt="An example of a combined DataSet in QuickSight" width="800" height="195"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimizing the whole data pipeline from ingestion to insights with event triggering and AWS Step Functions
&lt;/h2&gt;

&lt;p&gt;So how can we manage this all automatically and with optimal timing? To solve the issues, we need to &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use AWS Glue Workflow to automate and order the Jobs and Crawlers within the Glue ETL process&lt;/li&gt;
&lt;li&gt;refresh the QuickSight DataSets into SPICE immediately after the Glue Workflow has finished its execution&lt;/li&gt;
&lt;li&gt;refresh the QuickSight DataSets in the correct order, so that the "simple" DataSets get updated first and the combined DataSets right after them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Luckily, we can achieve the latter two by utilizing AWS Step Functions and CloudWatch event triggering! &lt;/p&gt;

&lt;h3&gt;
  
  
  Triggering a Step Function when Glue Workflow has finished
&lt;/h3&gt;

&lt;p&gt;AWS Glue Crawlers create CloudWatch Events on their lifecycle changes, and we can trigger an AWS Step Function State Machine execution when the last Crawler in our Glue Workflow sends a &lt;em&gt;Succeeded&lt;/em&gt; event. Here's a CDK/TypeScript snippet on creating the event rule to watch for Glue Crawler state change events and to start the Step Function execution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="c1"&gt;// Event rule to trigger the Step Function&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnRule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;CrawlerSucceededRule&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Glue crawler succeeded&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;roleArn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;smTriggerRole&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;roleArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`statemachine-trigger-rule`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;eventPattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;aws.glue&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;detail-type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Glue Crawler State Change&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="na"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Succeeded&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
          &lt;span class="na"&gt;crawlerName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;equals-ignore-case&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`hsl-data-glue-crawler`&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;targets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cfnStateMachine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attrArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cfnStateMachine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attrName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;roleArn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;smTriggerRole&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;roleArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Refreshing QuickSight DataSets with AWS Step Functions State Machine
&lt;/h3&gt;

&lt;p&gt;AWS Step Functions has built-in integrations with a wide range of AWS services, including Amazon QuickSight. Therefore, it is straightforward to build a State Machine that refreshes the SPICE data of a QuickSight DataSet - a process called DataSet Ingestion. The following image shows an AWS Step Functions State Machine that processes QuickSight DataSet Ingestions in two phases. In the first phase, it ingests five QuickSight DataSets in parallel: &lt;em&gt;network&lt;/em&gt;, &lt;em&gt;stops&lt;/em&gt;, &lt;em&gt;passengers&lt;/em&gt;, &lt;em&gt;shifts&lt;/em&gt; and &lt;em&gt;stoptypes&lt;/em&gt;. When all those ingestions have finished successfully, the State Machine continues to ingest the second set of QuickSight DataSets, which in this example contains only one DataSet: &lt;em&gt;passengers-and-stops&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvzi9ru87hablf9mnxlvq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvzi9ru87hablf9mnxlvq.jpg" alt="An example of a Step Function that ingests QuickSight DataSet refreshes in two phases" width="800" height="229"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For each QuickSight DataSet, the State Machine&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Starts the Ingestion process with &lt;strong&gt;CreateIngestion&lt;/strong&gt; call and saves the IngestionId value of the started Ingestion process&lt;/li&gt;
&lt;li&gt;Checks the Ingestion status with &lt;strong&gt;DescribeIngestion&lt;/strong&gt; call&lt;/li&gt;
&lt;li&gt;If IngestionStatus is &lt;em&gt;COMPLETED&lt;/em&gt;, &lt;em&gt;CANCELLED&lt;/em&gt; or &lt;em&gt;FAILED&lt;/em&gt;, it will pass the phase&lt;/li&gt;
&lt;li&gt;Otherwise, it will wait for 20 seconds and check the Ingestion status again&lt;/li&gt;
&lt;/ol&gt;
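&lt;p&gt;The per-DataSet loop above can be sketched in plain Python. The status function is injected so the logic can be shown without a live QuickSight connection; in reality it would call the &lt;em&gt;DescribeIngestion&lt;/em&gt; API and read the &lt;em&gt;IngestionStatus&lt;/em&gt; field from the response:&lt;/p&gt;

```python
# A sketch of the per-DataSet loop the State Machine implements. The
# describe_status function is injected so the logic can run without a live
# QuickSight connection; in reality it would call DescribeIngestion.
import time

TERMINAL_STATES = ("COMPLETED", "CANCELLED", "FAILED")

def wait_for_ingestion(describe_status, poll_seconds=20, sleep=time.sleep):
    """Poll an ingestion until it reaches a terminal state, then return it."""
    while True:
        status = describe_status()
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)

# Simulated ingestion: two in-progress states, then COMPLETED.
statuses = iter(["INITIALIZED", "RUNNING", "COMPLETED"])
result = wait_for_ingestion(lambda: next(statuses), sleep=lambda s: None)
# result is "COMPLETED"
```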

&lt;h2&gt;
  
  
  Summing it up
&lt;/h2&gt;

&lt;p&gt;As an end result, we now have a data pipeline that is triggered automatically by a predefined schedule or by an EventBridge event, and that starts ingesting QuickSight DataSets in the correct order as soon as the underlying data is updated. And now we can enjoy actionable, up-to-date insights:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7xrylhabrwdq97q9x8v.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7xrylhabrwdq97q9x8v.jpg" alt="An example of AWS QuickSight Analysis that utilizes combined QuickSight DataSets" width="800" height="589"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this article, we reviewed the components of AWS's serverless data lake solution and explored ways to optimize its performance and user experience. Lastly, we learned how to automate the whole process from data ingestion to data insights with AWS Step Functions and AWS Glue Crawler Event Triggering. &lt;/p&gt;

&lt;p&gt;We hope you enjoyed the journey. If you would like to set up a Serverless Data Pipeline and Data Lake on AWS, we are here to help. Just contact &lt;a href="https://www.nordhero.com/" rel="noopener noreferrer"&gt;NordHero&lt;/a&gt; or &lt;a href="https://calendly.com/pekka-malmirae-nordhero/30min" rel="noopener noreferrer"&gt;book a meeting with me&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;The examples in this article were built using data adapted from Helsinki Region Transport's (HSL) public data on transport stations, shifts and passengers. The original data is available on the &lt;a href="https://hri.fi" rel="noopener noreferrer"&gt;Helsinki Region Infoshare&lt;/a&gt; site.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>aws</category>
      <category>awsdatalake</category>
      <category>etl</category>
    </item>
    <item>
      <title>What's new and noteworthy on AWS - Summer 2023 edition</title>
      <dc:creator>pmalmirae</dc:creator>
      <pubDate>Wed, 23 Aug 2023 06:43:47 +0000</pubDate>
      <link>https://dev.to/aws-builders/whats-new-and-noteworthy-on-aws-summer-2023-edition-2i8p</link>
      <guid>https://dev.to/aws-builders/whats-new-and-noteworthy-on-aws-summer-2023-edition-2i8p</guid>
      <description>&lt;p&gt;&lt;em&gt;During my summer vacation, I occasionally glimpsed the AWS announcements feed but didn’t have time to dig into the list. I still got the feeling that there were some major releases, so I decided to go through all those hundreds of announcements and put together a comprehensive list of the most remarkable and noteworthy releases and new features around my favorite topics: Data &amp;amp; Analytics, Serverless Architecture &amp;amp; App Development, AWS Management &amp;amp; DevOps &amp;amp; IaC, and Security. I hope that also you find some gems from the list!&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;The summer of 2023 was indeed scorching hot when it comes to AWS releasing new services and features. From the start of June until late August 2023, the list of announcements is impressive: I read through over 600 of them and hand-picked 45 top news items in the Data, Serverless, DevOps, and Security spaces. Several features that had already been announced with flashing lights at AWS re:Invent in November 2022 have now been released as GA or in Preview. But talk is cheap, so here are my top picks from the summer releases, ordered by solution area and release date!&lt;/p&gt;

&lt;h2&gt;
  
  
  Data &amp;amp; Analytics
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Amazon QuickSight launches geospatial heatmap for points on maps
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 5, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It has already been possible to create analyses and dashboards with geospatial map visuals, but now it is possible to have a geospatial heat map with your own data.&lt;br&gt;
Please see the following example image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SGKeZZzV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.aws.amazon.com/images/quicksight/latest/user/images/heat-map-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SGKeZZzV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://docs.aws.amazon.com/images/quicksight/latest/user/images/heat-map-1.png" alt="Amazon QuickSight geospatial heat map, image from AWS QuickSight Documentation" width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The geospatial heat map style uses color gradations to indicate areas of high and low data point concentration, allowing readers to zoom in and out, pan across the map, and explore the data in detail. When the map is zoomed in to a certain level, the heat layer automatically reverts to the basic points, allowing readers to interact with the underlying points.&lt;/p&gt;

&lt;p&gt;See here for more details: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-quicksight-geospatial-heatmap-points-maps/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-quicksight-geospatial-heatmap-points-maps/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS Glue for Ray is now generally available
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 5, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;AWS Glue for Ray is now generally available. It is based on the open-source compute framework &lt;strong&gt;&lt;a href="https://ray.io"&gt;ray.io&lt;/a&gt;&lt;/strong&gt; and combines Glue's serverless capability for data integration with the possibility to develop ETL jobs in the Python programming language.&lt;/p&gt;

&lt;p&gt;AWS Glue for Ray facilitates the distributed processing of your Python code over multi-node clusters. You can create and run Ray jobs anywhere that you can run AWS Glue ETL jobs. This includes existing AWS Glue jobs, command line interfaces (CLIs), and APIs.&lt;/p&gt;

&lt;p&gt;AWS Glue for Ray is generally available currently in the following AWS Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Tokyo), and Europe (Ireland).&lt;/p&gt;

&lt;p&gt;More info: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/aws-glue-ray-generally-available/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/aws-glue-ray-generally-available/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS Glue Data Quality is now generally available
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 6, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;AWS announces general availability of AWS Glue Data Quality, a capability that automatically measures and monitors data lake and data pipeline quality. AWS Glue Data Quality helps reduce the need for manual data quality work by using open-source &lt;strong&gt;&lt;a href="https://github.com/awslabs/deequ"&gt;Deequ&lt;/a&gt;&lt;/strong&gt; to evaluate rules and measure and monitor the data quality of petabyte-scale data lakes. It then recommends data quality rules to get started. You can update recommended rules or add new rules. If facing any issues with data quality, you can configure actions to alert users.&lt;/p&gt;

&lt;p&gt;You can validate the data quality of Amazon Redshift, Apache Iceberg, Apache HUDI, and Delta Lake datasets that are cataloged in the AWS Glue Data Catalog. The quality results are published to Amazon EventBridge, simplifying how users are alerted and integrating data quality results with other applications.&lt;/p&gt;

&lt;p&gt;AWS Glue Data Quality is generally available in all AWS Regions where AWS Glue is available. To learn more: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/aws-glue-data-quality-generally-available/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/aws-glue-data-quality-generally-available/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Amazon Redshift Serverless now supports query scheduling and Single sign-on support
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 7, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Amazon Redshift Serverless now allows scheduling of SQL queries. With scheduling, you can automate time-sensitive or long-running queries. You can utilize scheduled queries with the Amazon Redshift Query Editor V2 or the Amazon Redshift Data API.&lt;/p&gt;

&lt;p&gt;Amazon Redshift Serverless now also supports single sign-on with Identity Providers (IdP). It works by passing a list of database roles granted to a user based on their IdP group membership. The Redshift administrator configures the Identity Provider (IdP) to pass in database roles by adding specific principal tags as SAML attributes. The single sign-on support can be used with Amazon Redshift Query Editor V2, JDBC/ODBC clients, and the Data API.&lt;/p&gt;

&lt;p&gt;The features are available in all regions that support Amazon Redshift Serverless. Read more here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-redshift-query-scheduling-single-sign-on/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-redshift-query-scheduling-single-sign-on/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Amazon QuickSight now supports APIs to automate and accelerate assets deployment
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 7, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is a long-awaited feature! QuickSight has earlier been its own island inside an AWS Account, and it has been extremely difficult to automate the deployment of QuickSight assets from one environment to another. It has been possible through the API or CLI, but it required heavy-duty coding to manage all the dependencies between assets and environments. &lt;/p&gt;

&lt;p&gt;So now it is possible to export and import a QuickSight asset with all required dependencies. The feature supports all essential QuickSight assets such as dashboards, analyses, datasets (including ingestion schedules), data sources, themes, and VPC configurations. You can even select whether to export the assets as plain JSON or as CloudFormation templates.&lt;/p&gt;

&lt;p&gt;The new APIs are available with the Amazon QuickSight Enterprise edition in the following AWS Regions: US East (N. Virginia and Ohio), US West (Oregon), Canada, Sao Paulo, Europe (Frankfurt, Ireland and London), Asia Pacific (Mumbai, Seoul, Singapore, Sydney and Tokyo).&lt;/p&gt;

&lt;p&gt;Read the announcement here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-quicksight-apis-automate-accelerate-assets-deployment/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-quicksight-apis-automate-accelerate-assets-deployment/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Amazon Athena for Apache Spark now supports custom Java libraries
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 8, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Amazon Athena for Apache Spark was first released at the re:Invent 2022 conference. It is a feature of Amazon Athena that lets you run interactive analytics on Apache Spark in under a second to analyze petabytes of data - so basically, it is Athena but turbo-charged. With the new release, you can now include your own Java libraries and modules as JAR files in Spark workloads to connect to different data sources and run advanced calculations using user-defined functions for feature exploration.&lt;/p&gt;

&lt;p&gt;The release also includes a set of reference connector packages for Amazon CloudWatch Logs, CloudWatch metrics, and Amazon DynamoDB so that you can use data from those services in your insights.&lt;/p&gt;

&lt;p&gt;The new features are currently supported in 9 AWS regions where Amazon Athena for Apache Spark is available: US East (Ohio), US East (N. Virginia), US West (Oregon), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Mumbai). To learn more: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-athena-apache-spark-custom-java-libraries/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-athena-apache-spark-custom-java-libraries/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Amazon Athena for Apache Spark now supports Apache Hudi, Apache Iceberg, and Delta Lake
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 8, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Amazon Athena for Apache Spark now supports open-source data lake storage frameworks Apache Hudi 0.13, Apache Iceberg 1.2.1, and Linux Foundation Delta Lake 2.0.2. These frameworks simplify incremental data processing of large data sets using ACID (atomicity, consistency, isolation, durability) transactions and make it simpler to store and process large data sets in your data lakes.&lt;/p&gt;

&lt;p&gt;Apache Iceberg, Apache Hudi and Delta Lake support is available in all AWS regions where Amazon Athena for Apache Spark is available. Read more here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-athena-apache-spark-hudi-iceberg-delta-lake/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-athena-apache-spark-hudi-iceberg-delta-lake/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Amazon Kinesis Data Firehose adds support for data stream delivery to Amazon Redshift Serverless
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 19, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This release is part of the "Zero-ETL" initiative announced by AWS CEO Adam Selipsky at re:Invent 2022. He stated that AWS is putting its efforts into connecting the various AWS services so that builders can concentrate on creating value instead of spending their time trying to get services integrated.&lt;/p&gt;

&lt;p&gt;With the new release, Amazon Kinesis Data Firehose can now deliver streaming data directly to Amazon Redshift Serverless. With a few clicks, you can easily ingest, transform, and reliably deliver streaming data into Amazon Redshift Serverless without building and managing your own data ingestion and delivery infrastructure.&lt;/p&gt;

&lt;p&gt;Amazon Kinesis Data Firehose with Amazon Redshift Serverless is generally available in the regions listed under the Redshift Serverless API section of the announcement.&lt;/p&gt;

&lt;p&gt;Read more from the announcement: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-kinesis-data-firehose-data-stream-delivery-redshift-serverless/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-kinesis-data-firehose-data-stream-delivery-redshift-serverless/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS Glue now can detect 250 sensitive entity types from over 50 countries
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 23, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Sensitive data detection feature in AWS Glue can now detect over 250 sensitive entity types from 50 countries out-of-the-box - all Nordic countries included!&lt;/p&gt;

&lt;p&gt;Sensitive data detection feature in AWS Glue identifies a variety of sensitive data elements like social security numbers, credit card numbers, names, driver license numbers and other entities. Once detected, customers can take actions to redact the sensitive information before writing records into their data repositories. Customers can also create custom detectors to detect entities specific to their organizations.&lt;/p&gt;

&lt;p&gt;This feature is available in the same commercial Regions as AWS Glue. Check the supported countries here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/aws-glue-250-entity-types-50-countries/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/aws-glue-250-entity-types-50-countries/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS announces Amazon Aurora MySQL zero-ETL integration with Amazon Redshift (Public Preview)
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 28, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes, another Zero-ETL announcement!&lt;/p&gt;

&lt;p&gt;Amazon Aurora MySQL zero-ETL integration with Amazon Redshift is now available in public preview. The feature enables near real-time analytics and machine learning (ML) on petabytes of transactional data stored in Amazon Aurora MySQL-Compatible Edition. Data written into Aurora is available in Amazon Redshift within seconds, so you can quickly act on it without having to build and maintain complex data pipelines. Amazon Aurora MySQL zero-ETL integration with Amazon Redshift is available for Amazon Aurora Serverless v2 and Provisioned as well as Amazon Redshift Serverless and RA3 instance types.&lt;/p&gt;

&lt;p&gt;Check the details: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-aurora-mysql-zero-etl-integration-redshift-public-preview/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-aurora-mysql-zero-etl-integration-redshift-public-preview/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Amazon Athena now supports querying restored data in S3 Glacier
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 29, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is fun! You can now use Amazon Athena to query data stored in Amazon S3 Glacier (Glacier Flexible Retrieval &amp;amp; Deep Archive storage classes supported). With this launch, you can use Athena to directly query restored data in the S3 Glacier for use cases such as log analytics and long-term trend analysis, saving you time by removing the need to move and duplicate data.&lt;/p&gt;

&lt;p&gt;This feature is available with Athena Engine V3 in all Amazon Athena supported regions. To learn more: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-athena-querying-restored-data-s3-glacier/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-athena-querying-restored-data-s3-glacier/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Amazon OpenSearch Service now supports OpenSearch version 2.7
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jul 10, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You can now run features from open-source OpenSearch version 2.7 in Amazon OpenSearch Service. Key improvements include the introduction of a unified schema for OpenSearch, the ability to add map visualizations to Dashboard panels, and the ability to filter geospatial data. The new version also includes support for five new security log types.&lt;/p&gt;

&lt;p&gt;Read the announcement: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/07/amazon-opensearch-service-opensearch-version-2-7/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/07/amazon-opensearch-service-opensearch-version-2-7/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Amazon Redshift announces automatic mounting of AWS Glue Data Catalog
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jul 25, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Amazon Redshift released automatic mounting of the AWS Glue Data Catalog, making it easier for customers to run queries on their data lakes. There is no longer a need to create an external schema in Amazon Redshift to use the data lake tables cataloged in the AWS Glue Data Catalog. Now, you can query data lake tables directly from Amazon Redshift Query Editor v2 or your favorite SQL editor. Again, a release that makes the life of data specialists much more fun!&lt;/p&gt;

&lt;p&gt;Read more here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/07/amazon-redshift-automatic-mounting-aws-glue-data-catalog/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/07/amazon-redshift-automatic-mounting-aws-glue-data-catalog/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS Glue Studio now supports Amazon Redshift Serverless
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jul 25, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;AWS Glue Studio now supports Amazon Redshift Serverless as a data source or target out-of-the-box. Earlier, only Amazon Redshift clusters were supported out-of-the-box in AWS Glue Studio. As the serverless edition of Redshift gains ground among customers, this update is surely well anticipated.&lt;/p&gt;

&lt;p&gt;To learn more, here's the announcement: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/07/aws-glue-studio-amazon-redshift-serverless/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/07/aws-glue-studio-amazon-redshift-serverless/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Amazon EMR Serverless now supports retrieving secrets from AWS Secrets Manager
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jul 27, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A small but very important update: no more playing around with passwords and other secrets in your Amazon EMR Serverless jobs. You can now get to the good side by utilizing AWS Secrets Manager for secrets like passwords, API keys, and so forth.&lt;/p&gt;

&lt;p&gt;Read here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/07/amazon-emr-serverless-retrieving-secrets-aws-secrets-manager/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/07/amazon-emr-serverless-retrieving-secrets-aws-secrets-manager/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Amazon SageMaker announces a new direct integration with Salesforce Data Cloud
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Aug 4, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;August started with an announcement that Amazon SageMaker now has a direct integration with Salesforce Data Cloud. This means you can access Salesforce Data Cloud from SageMaker without any extra hassle, using OAuth-2.0-based authentication, to build, train, and deploy ML models on SageMaker. So you can easily train ML models with Salesforce data and turbo-charge Salesforce Einstein with ML-driven wisdom.&lt;/p&gt;

&lt;p&gt;Salesforce Data Cloud direct integration is supported in all AWS regions where SageMaker is available. To learn more, read the announcement here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/08/amazon-sagemaker-direct-integration-salesforce-data-cloud/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/08/amazon-sagemaker-direct-integration-salesforce-data-cloud/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS IAM Identity Center integration is now generally available for Amazon QuickSight
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Aug 14, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is again one of those announcements that many have been waiting for! As mentioned earlier, QuickSight has been quite an isolated island inside an AWS account, having its own user and group management. With this update, QuickSight administrators can configure QuickSight to use IAM Identity Center so that their users can log in with their existing credentials. Administrators can select IAM Identity Center to configure QuickSight with their organization’s supported identity provider or with the IAM Identity Center identity store, without requiring additional single sign-on configuration in QuickSight. Furthermore, they can use their identity provider groups to assign QuickSight roles (administrator, author and reader) to users.&lt;/p&gt;

&lt;p&gt;This new feature is available in all AWS Regions where QuickSight and IAM Identity Center are available. Read more here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/08/aws-iam-identity-center-integration-amazon-quicksight/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/08/aws-iam-identity-center-integration-amazon-quicksight/&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Serverless architecture &amp;amp; app development
&lt;/h2&gt;
&lt;h3&gt;
  
  
  AWS Lambda adds support for Ruby 3.2
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 7, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;AWS Lambda now supports Ruby 3.2 as both a managed runtime and a container base image. The new Ruby version brings new features such as endless methods, a new Data class, improved pattern matching, and performance improvements. The Ruby 3.2 runtime is available in all regions where Lambda is available. The announcement can be found here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/aws-lambda-support-ruby-3-2/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/aws-lambda-support-ruby-3-2/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Amazon SQS announces support for dead-letter queue redrive via AWS SDK or CLI
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 8, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Amazon Simple Queue Service (SQS) announced support for dead-letter queue redrive via AWS SDK or Command Line Interface (CLI). The new feature improves dead-letter queue management by giving users the ability to move messages out of the dead-letter queue and programmatically manage the lifecycle of unconsumed messages at scale.&lt;/p&gt;

&lt;p&gt;To programmatically automate dead-letter queue message redrive workflows, customers can now use the following actions: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;StartMessageMoveTask&lt;/em&gt;, to start a new message movement task from the dead-letter queue;&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;CancelMessageMoveTask&lt;/em&gt;, to cancel the message movement task;&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;ListMessageMoveTasks&lt;/em&gt;, to list the 10 most recent message movement tasks for a specified source queue.&lt;/li&gt;
&lt;/ol&gt;
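
&lt;p&gt;As a rough sketch of how these actions fit together with boto3 (the helper function and ARNs below are illustrative, not from the announcement):&lt;/p&gt;

```python
def redrive_params(dlq_arn, destination_arn=None, rate_per_second=None):
    """Build a StartMessageMoveTask request. Omitting DestinationArn
    moves messages back to their original source queue(s)."""
    params = {"SourceArn": dlq_arn}
    if destination_arn:
        params["DestinationArn"] = destination_arn
    if rate_per_second:
        params["MaxNumberOfMessagesPerSecond"] = rate_per_second
    return params

# With real AWS credentials you would run something like:
# import boto3
# sqs = boto3.client("sqs")
# task = sqs.start_message_move_task(**redrive_params(dlq_arn))
# sqs.list_message_move_tasks(SourceArn=dlq_arn, MaxResults=10)
# sqs.cancel_message_move_task(TaskHandle=task["TaskHandle"])
```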

&lt;p&gt;See SQS Documentation for more information. &lt;/p&gt;

&lt;p&gt;Dead-letter queue redrive via AWS SDK and CLI is available in all AWS Regions where Amazon SQS is available. The announcement can be found here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-sqs-dead-letter-queue-redrive-aws-sdk-cli/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-sqs-dead-letter-queue-redrive-aws-sdk-cli/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS Step Functions adds integration for 7 services including Amazon VPC Lattice
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 15, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The more integrations the better. AWS Step Functions really has momentum, and seven new integrations are now available through the SDK integration. Overall, Step Functions supports over 12,000 API actions from over 320 AWS services. That is really impressive and brings a considerable advantage when building solutions that connect different AWS services. The new integrations include services such as Amazon VPC Lattice, Amazon CloudWatch Internet Monitor, AWS IoT TwinMaker, and Amazon OpenSearch Ingestion.&lt;/p&gt;

&lt;p&gt;These enhancements are now generally available in all regions where AWS Step Functions is available. Please read more here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/aws-step-functions-7-services-vpc-lattice/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/aws-step-functions-7-services-vpc-lattice/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS Step Functions launches Versions and Aliases
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 22, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yet another major update for AWS Step Functions. In June 2023, AWS Step Functions announced the availability of Versions and Aliases, improving resiliency for deployments of serverless workflows. The new capabilities make it easier to set up continuous deployment, helping you iterate faster and release safely into production. You can now maintain multiple versions of your workflows, track which version was used for each execution, and create aliases that route traffic between workflow versions. You can deploy your workflows gradually using industry-standard techniques such as blue-green and canary-style deployments, with fast rollbacks, increasing deployment safety and reducing downtime and risk.&lt;/p&gt;
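
&lt;p&gt;A canary rollout with versions and aliases might look roughly like this in boto3 (the ARNs and helper are illustrative; the routing configuration is the documented two-version weighted form):&lt;/p&gt;

```python
def canary_routing(stable_version_arn, canary_version_arn, canary_weight):
    """Routing configuration for a Step Functions alias: send
    canary_weight percent of new executions to the canary version
    and the rest to the stable one. Weights must total 100."""
    assert 0 <= canary_weight <= 100
    return [
        {"stateMachineVersionArn": stable_version_arn, "weight": 100 - canary_weight},
        {"stateMachineVersionArn": canary_version_arn, "weight": canary_weight},
    ]

# Against a real state machine you would publish a version and shift traffic:
# import boto3
# sfn = boto3.client("stepfunctions")
# v2 = sfn.publish_state_machine_version(stateMachineArn=sm_arn)["stateMachineVersionArn"]
# sfn.update_state_machine_alias(stateMachineAliasArn=alias_arn,
#     routingConfiguration=canary_routing(v1_arn, v2, 10))
```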

&lt;p&gt;More info here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/aws-step-functions-versions-aliases/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/aws-step-functions-versions-aliases/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Announcing general availability for watchOS and tvOS support on AWS Amplify Library for Swift
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 27, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Normally I tend to bypass news about UI development, but this one really caught my eye. You can now use AWS Amplify to build applications for Apple Watch and Apple TV! In late June 2023, AWS announced the general availability of watchOS and tvOS support in the AWS Amplify Library for Swift (&amp;gt;= v2.12.0). This launch enables developers to build cloud-connected apps for Apple Watch (watchOS) and Apple TV (tvOS) devices, in addition to the iOS and macOS platforms.&lt;/p&gt;

&lt;p&gt;Learn more here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/watchos-tvos-aws-amplify-library-swift/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/watchos-tvos-aws-amplify-library-swift/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Amazon ECS now launches tasks faster alongside tasks with prolonged shutdown
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 30, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Amazon ECS is a platform that takes care of running containerized services called tasks. If a task becomes unhealthy, it is stopped and a new task is launched based on your configuration. Sometimes the shutdown of a task takes a long time, and new task launches can get blocked on the instance. To address this, Amazon ECS now enables faster task launches on container instances that have tasks with prolonged shutdown. This enables customers to scale their workloads faster and improve infrastructure utilization.&lt;/p&gt;

&lt;p&gt;Previously, to enable higher task provisioning throughput, ECS optimistically considered instance resources (e.g., CPU, memory, ports) as free for launching new tasks whenever a running task transitioned to the stopping state. When a stopping task took a long time to shut down, new task launches could get blocked on the instance, because the ECS Agent waited for all stopping tasks to shut down before starting new tasks. With the new release from the end of June 2023, the ECS Agent can start new tasks on an instance if the requisite resources are available, even if there are tasks pending shutdown, enabling faster task launches and improving infrastructure utilization.&lt;/p&gt;

&lt;p&gt;The new experience is available for customers using Amazon ECS on EC2 or ECS Anywhere in all AWS regions on Amazon ECS Optimized AMIs with ECS Agent version 1.73.0 or later. To learn more, read the announcement: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-ecs-tasks-faster-prolonged-shutdown/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-ecs-tasks-faster-prolonged-shutdown/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS Lambda now detects and stops recursive loops in Lambda functions
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jul 13, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is a small but very important update. I'm sure everyone who has accidentally built a solution that recursively calls the same Lambda function over and over again has included this feature in their evening prayers.&lt;/p&gt;

&lt;p&gt;AWS Lambda can now detect and stop recursive loops in Lambda functions. Lambda functions are commonly used to process events from sources like Amazon SQS and Amazon SNS. However, in certain scenarios, due to a resource misconfiguration or a code defect, a processed event may be sent back to the same service or resource that invoked the Lambda function. This can cause an unintended recursive loop and result in unintended usage and costs for customers. With this launch, Lambda will stop recursive invocations between Amazon SQS, AWS Lambda, and Amazon SNS after 16 recursive calls.&lt;/p&gt;

&lt;p&gt;When it detects such a loop, Lambda stops the 17th invocation and sends the event to a dead-letter queue or on-failure destination, if one is configured. Customers also receive an AWS Health Dashboard notification with troubleshooting steps.&lt;/p&gt;

&lt;p&gt;Please see more details and available regions here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/07/aws-lambda-detects-recursive-loops-lambda-functions/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/07/aws-lambda-detects-recursive-loops-lambda-functions/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS Fargate enables faster container startup using Seekable OCI
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jul 17, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Nice development again from the ECS team! Customers running applications on Amazon ECS with AWS Fargate can now leverage Seekable OCI (SOCI), a technology open-sourced by AWS that helps applications deploy and scale out faster by enabling containers to start without waiting for the entire container image to be downloaded.&lt;/p&gt;

&lt;p&gt;Waiting for the entire container image to download from the image repository is often unnecessary, as typically only a small portion of it is needed for startup. SOCI reduces this wait time by lazily loading the image data in parallel with application startup, enabling containers to start with only a fraction of the image.&lt;/p&gt;

&lt;p&gt;To SOCI-enable your container images, start from this announcement: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/07/aws-fargate-container-startup-seekable-oci/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/07/aws-fargate-container-startup-seekable-oci/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Amazon SNS now supports mobile push notifications in twelve new AWS regions
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jul 20, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Mobile client communications just got more comprehensive as Amazon SNS mobile push notifications are now available in twelve additional AWS regions, including Africa (Cape Town), Asia Pacific (Hong Kong), Asia Pacific (Jakarta), Asia Pacific (Osaka), Canada (Central), Europe (London), Europe (Milan), Europe (Paris), Europe (Stockholm), Middle East (Bahrain), Middle East (UAE), and US East (Ohio). With this expansion, Amazon SNS now supports the ability to send mobile push notifications from 24 regions.&lt;/p&gt;

&lt;p&gt;Amazon SNS can send mobile push notifications on your behalf to mobile devices and desktops using one of the following supported push notification services: Amazon Device Messaging (ADM), Apple Push Notification Service (APNs) for iOS and Mac OS X, Baidu Cloud Push (Baidu), Firebase Cloud Messaging (FCM) to Android devices, Microsoft Push Notification Service for Windows Phone (MPNS), and Windows Push Notification Services (WNS). &lt;/p&gt;

&lt;p&gt;Read the news: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/07/amazon-sns-mobile-notifications-twelve-regions/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/07/amazon-sns-mobile-notifications-twelve-regions/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS Lambda adds support for Python 3.11
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jul 27, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yet another Lambda programming language update: AWS Lambda now supports creating serverless applications using Python 3.11. Developers can use Python 3.11 as both a managed runtime and a container base image, and AWS will automatically apply updates to the managed runtime and base image as they become available.&lt;/p&gt;
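
&lt;p&gt;Picking up the new runtime is just a matter of setting the runtime identifier when you create (or update) a function. A minimal sketch with boto3, where the function name, handler, and role are placeholders:&lt;/p&gt;

```python
def function_params(name, role_arn, zip_bytes):
    """A minimal CreateFunction request pinning the python3.11 managed
    runtime (function name and handler here are placeholders)."""
    return {
        "FunctionName": name,
        "Runtime": "python3.11",
        "Role": role_arn,
        "Handler": "app.handler",
        "Code": {"ZipFile": zip_bytes},
    }

# import boto3
# boto3.client("lambda").create_function(**function_params("demo", role_arn, zip_bytes))
```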

&lt;p&gt;The Python 3.11 runtime is available in all Regions where Lambda is available, except for China and GovCloud Regions. Read more here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/07/aws-lambda-python-3-11/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/07/aws-lambda-python-3-11/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Announcing preview of JSON protocol support for Amazon SQS
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jul 28, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is huge! The grand old SQS is turning from XML to JSON! At the end of July 2023, Amazon SQS announced a preview of JSON protocol support, enabling lower latency and improved performance for SQS customers. Based on AWS performance tests with a 5KB message payload, the JSON protocol for Amazon SQS reduces end-to-end message processing latency by up to 23% and reduces application client-side CPU and memory usage.&lt;/p&gt;

&lt;p&gt;Amazon SQS customers can take advantage of the lower latency by upgrading to the specified AWS SDK versions, which switch the default wire protocol from AWS Query to JSON when making SQS API requests. Customers can also revert back to the AWS Query protocol by changing the SDK version. &lt;/p&gt;

&lt;p&gt;For more information, here's the announcement: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/07/json-protocol-support-amazon-sqs/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/07/json-protocol-support-amazon-sqs/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Amazon EventBridge Scheduler adds schedule deletion after completion
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Aug 2, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Amazon EventBridge Scheduler can invoke more than 270 AWS services and over 6,000 API operations, and it scales out to enable scheduling of millions of tasks. No wonder it has become the de facto scheduler for time-based or recurring solutions on the AWS platform. EventBridge Scheduler's new delete-after-completion capability helps manage and clean up schedules that have completed their last invocation. It removes the need for manual processes or custom code to delete completed schedules, saving you time and making it easier to scale.&lt;/p&gt;
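
&lt;p&gt;A one-shot, self-deleting schedule can be sketched like this with boto3; the schedule name, timestamp, and ARNs are placeholders, and the key piece is &lt;em&gt;ActionAfterCompletion&lt;/em&gt; set to DELETE:&lt;/p&gt;

```python
def one_shot_schedule(name, when_utc, target_arn, role_arn):
    """A one-time schedule that deletes itself after its last invocation
    (ActionAfterCompletion='DELETE'). Target and role ARNs are placeholders."""
    return {
        "Name": name,
        "ScheduleExpression": f"at({when_utc})",  # e.g. at(2023-09-01T12:00:00)
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {"Arn": target_arn, "RoleArn": role_arn},
        "ActionAfterCompletion": "DELETE",
    }

# import boto3
# boto3.client("scheduler").create_schedule(**one_shot_schedule(
#     "send-reminder", "2023-09-01T12:00:00", lambda_arn, scheduler_role_arn))
```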

&lt;p&gt;If interested, start with the announcement: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/08/amazon-eventbridge-scheduler-deletion-completion/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/08/amazon-eventbridge-scheduler-deletion-completion/&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  AWS management &amp;amp; DevOps &amp;amp; IaC
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Single Region Terraform support now available for AWS Control Tower Account Factory
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 8, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;AWS Control Tower is a great tool for managing AWS Organizations with multiple organizational units, AWS accounts, and related guardrails. With the new release, AWS Control Tower now lets you define account templates with Terraform and use those templates when provisioning new or existing accounts from AWS Control Tower.&lt;/p&gt;

&lt;p&gt;To get started, you can use the AWS-provided Terraform Reference Engine on GitHub, which configures the code and infrastructure required for the Terraform open source engine. After the one-time setup, customers can define their account requirements using Terraform and deploy them to their accounts as part of the well-defined account factory workflow. &lt;/p&gt;

&lt;p&gt;Read more from the announcement: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/single-region-terraform-control-tower-account-factory/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/single-region-terraform-control-tower-account-factory/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS Control Tower adds 10 new AWS Security Hub controls
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 12, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;More Control Tower news! AWS has added 10 new AWS Security Hub detective controls to the AWS Control Tower controls library. These new controls target services such as Amazon API Gateway, AWS CodeBuild, Amazon Elastic Compute Cloud, Elastic Load Balancing, Amazon Redshift, Amazon SageMaker, and AWS WAF. They help you meet control objectives such as establishing logging and monitoring, limiting network access, and encrypting data at rest, enhancing your governance posture. &lt;/p&gt;

&lt;p&gt;With this addition, AWS Control Tower now supports over 170 detective controls from AWS Security Hub. Read more from the announcement: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/aws-control-tower-new-aws-security-hub-controls/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/aws-control-tower-new-aws-security-hub-controls/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Announcing general availability of AWS Control Tower's integration with Security Hub
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 19, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And one more for AWS Control Tower! In June 2023, AWS announced the general availability of the integration between AWS Control Tower and AWS Security Hub. You can enable over 170 Security Hub detective controls that map to related control objectives from AWS Control Tower. With the new release, AWS Control Tower now detects when you disable a control from Security Hub, which results in a ‘Drifted’ control state. With this drift detection capability, it is simpler for you to monitor the deployment state of your controls and take appropriate actions to manage the security posture of your AWS Control Tower environment.&lt;/p&gt;

&lt;p&gt;The drift detection capability for Security Hub controls requires updating to the new version of the AWS Control Tower Landing Zone, 3.2. The new Landing Zone version also includes updates to the Region Deny control for multiple AWS services. &lt;/p&gt;

&lt;p&gt;Read all about the announcement here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/aws-control-tower-account-integration-security-hub/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/aws-control-tower-account-integration-security-hub/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS CloudFormation accelerates dev-test cycle with new ChangeSets parameter
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 20, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Sometimes a small update is actually a BIG one. AWS CloudFormation launched a new parameter &lt;em&gt;OnStackFailure&lt;/em&gt; for the CreateChangeSet API that allows customers to control the rollback behavior of ChangeSets. Customers use ChangeSets to preview the impact of a stack operation on active resources. With this launch, customers can modify the actions that CloudFormation will take when ChangeSet execution is unsuccessful.&lt;/p&gt;

&lt;p&gt;Customers can set OnStackFailure to &lt;em&gt;ROLLBACK&lt;/em&gt;, &lt;em&gt;DELETE&lt;/em&gt;, or &lt;em&gt;DO_NOTHING&lt;/em&gt;. &lt;em&gt;ROLLBACK&lt;/em&gt; is the default and reverts the stack to its last stable state if ChangeSet execution fails. With &lt;em&gt;DELETE&lt;/em&gt;, CloudFormation deletes the new stack if ChangeSet execution fails, eliminating the need for manual clean-up of stacks and allowing customers to retry stack creation from CI/CD actions. &lt;em&gt;DO_NOTHING&lt;/em&gt; preserves the state of the stack if ChangeSet execution fails.&lt;/p&gt;
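
&lt;p&gt;In a pipeline, the DELETE behavior can be requested like this (a minimal boto3 sketch; stack and change set names are placeholders):&lt;/p&gt;

```python
def change_set_params(stack_name, change_set_name, template_body,
                      on_stack_failure="DELETE"):
    """CreateChangeSet request for a brand-new stack. With
    OnStackFailure='DELETE', a failed execution removes the stack so a
    CI/CD pipeline can simply retry instead of cleaning up manually."""
    assert on_stack_failure in ("ROLLBACK", "DELETE", "DO_NOTHING")
    return {
        "StackName": stack_name,
        "ChangeSetName": change_set_name,
        "ChangeSetType": "CREATE",
        "TemplateBody": template_body,
        "OnStackFailure": on_stack_failure,
    }

# import boto3
# boto3.client("cloudformation").create_change_set(**change_set_params(
#     "demo-stack", "initial", template_body))
```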

&lt;p&gt;To learn more about OnStackFailure, click here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/aws-cloudformation-accelerates-dev-test-cycle-changesets-parameter/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/aws-cloudformation-accelerates-dev-test-cycle-changesets-parameter/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS CodeBuild now supports GitHub Actions
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jul 7, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;AWS CodeBuild customers can now use GitHub Actions during the building and testing of software packages. AWS CodeBuild is a fully managed continuous integration service that compiles source code, runs tests, and produces ready-to-deploy software packages. Customers’ CodeBuild projects can now leverage many of the pre-built actions available in GitHub’s marketplace. With CodeBuild’s integration with GitHub Actions, you can extend your buildspec definition to invoke third-party solutions. There is no need to author and maintain custom integrations, or to learn how to integrate others’ solutions into your build process. &lt;/p&gt;

&lt;p&gt;Read more here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/07/aws-codebuild-github-actions/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/07/aws-codebuild-github-actions/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Amazon CodeCatalyst now supports workflows triggered by GitHub pull requests
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jul 19, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Another AWS DevOps tool integrating with GitHub! AWS announced in July 2023 their support for starting Amazon CodeCatalyst workflows based on pull request events in linked GitHub repositories. When a workflow is triggered by a GitHub-based pull request, users will also be able to see the name of the PR that triggered it in the CodeCatalyst workflows UI, and click a link that takes them directly to the pull request in GitHub.&lt;/p&gt;

&lt;p&gt;To learn more, see the announcement: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/07/amazon-codecatalyst-workflows-triggered-github-pull-requests/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/07/amazon-codecatalyst-workflows-triggered-github-pull-requests/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS Control Tower launches additional proactive controls
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jul 24, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Trust is good, control is better! In July 2023, AWS announced the launch of 28 new proactive controls in AWS Control Tower. This launch enhances AWS Control Tower’s governance capabilities for services such as Amazon CloudWatch, Amazon Neptune, Amazon ElastiCache, AWS Step Functions, and Amazon DocumentDB. Read more here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/07/aws-control-tower-proactive-controls/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/07/aws-control-tower-proactive-controls/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Accelerate your CloudFormation authoring experience with looping function
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jul 26, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This might be the biggest announcement for AWS CloudFormation fans for the time being. At the end of July 2023, AWS CloudFormation announced a looping capability with the Fn::ForEach intrinsic function. With Fn::ForEach, you can replicate parts of your templates with minimal lines of code. &lt;/p&gt;

&lt;p&gt;To use Fn::ForEach, you have to declare the AWS::LanguageExtensions transform. The language extensions transform expands the functionality of the base CloudFormation JSON/YAML template language. With this launch, you can use Fn::ForEach in the Resources, Resource properties, Conditions, and Outputs sections of your templates. Here's an example of a CloudFormation YAML template that creates four different SNS FIFO topics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;AWSTemplateFormatVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2010-09-09&lt;/span&gt;
&lt;span class="na"&gt;Transform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AWS::LanguageExtensions'&lt;/span&gt;
&lt;span class="na"&gt;Resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Fn::ForEach::Topics'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;TopicName&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Success&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Failure&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Timeout&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Unknown&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SnsTopic${TopicName}'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AWS::SNS::Topic'&lt;/span&gt;
        &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;TopicName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;TopicName&lt;/span&gt;
          &lt;span class="na"&gt;FifoTopic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read the whole announcement here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/07/accelerate-cloudformation-authoring-experience-looping-function/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/07/accelerate-cloudformation-authoring-experience-looping-function/&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS CodePipeline now supports GitLab
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Aug 14, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yet one more AWS DevOps service that integrates with a popular 3rd-party platform. You can now use your GitLab.com source repository to build, test, and deploy code changes using AWS CodePipeline. Connect your GitLab.com account using AWS CodeStar Connections, and use the connection in your pipeline to automatically start a pipeline execution on changes in your repository.&lt;/p&gt;

&lt;p&gt;More here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/08/aws-codepipeline-supports-gitlab/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/08/aws-codepipeline-supports-gitlab/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Security
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AWS WAF now supports Header Order match statement for request inspection
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 5, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;AWS WAF now supports the Header Order match statement, enabling customers to inspect and match on the order in which HTTP headers appear in a request. With this feature, customers can further strengthen their access control measures by verifying additional dimensions of request metadata.&lt;/p&gt;
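
&lt;p&gt;To make this concrete, here's a sketch of what such a rule might look like as a WAF rule JSON object built in Python. The field names follow the WAF rule schema as I understand it; the rule name, search string, and action are illustrative assumptions, not from the announcement:&lt;/p&gt;

```python
def header_order_rule(expected_prefix, priority=0):
    """Sketch of a WAF rule (JSON as a Python dict) that byte-matches
    against the comma-delimited list of header names in the order the
    client sent them, and blocks on a match."""
    return {
        "Name": "header-order-check",     # illustrative rule name
        "Priority": priority,
        "Statement": {
            "ByteMatchStatement": {
                "FieldToMatch": {"HeaderOrder": {"OversizeHandling": "CONTINUE"}},
                "PositionalConstraint": "STARTS_WITH",
                "SearchString": expected_prefix,
                "TextTransformations": [{"Priority": 0, "Type": "LOWERCASE"}],
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "HeaderOrderCheck",
        },
    }
```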

&lt;p&gt;There is no additional cost for using this feature, however, standard AWS WAF charges still apply. It is available in all AWS Regions where AWS WAF is available and for each supported service, including Amazon CloudFront, Application Load Balancer, Amazon API Gateway, AWS AppSync, and Amazon Cognito. To learn more, see here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/aws-waf-header-order-match-statement-request-inspection/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/aws-waf-header-order-match-statement-request-inspection/&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS KMS now supports importing asymmetric and HMAC keys
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 5, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You can now import asymmetric and HMAC keys into AWS Key Management Service (AWS KMS) and use them within supported KMS-integrated AWS services and your own applications. Importing your own key gives you direct control over the generation, lifecycle management, and durability of your keys. You can control the availability of your imported keys by setting an expiration period, or deleting and re-importing them at any time. These controls help you meet your specific compliance requirements if you must generate and store copies of keys outside of AWS.&lt;/p&gt;

&lt;p&gt;Importing your own keys into AWS KMS can also be useful in situations where keys need to exist in multiple environments, including hybrid (on-premises) and multi-cloud workflows. This lets you safely migrate workloads to AWS while expanding your options for how you authorize, audit, and protect keys through AWS KMS.&lt;/p&gt;
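
&lt;p&gt;The import flow has three steps: create a key with external origin, fetch a wrapping key and import token, then upload the wrapped key material. A hedged boto3 sketch (the key spec, wrapping parameters, and expiration model are illustrative choices):&lt;/p&gt;

```python
def external_key_request(key_spec="HMAC_256"):
    """CreateKey request for a KMS key whose material will be imported
    (Origin='EXTERNAL'); the key is unusable until ImportKeyMaterial
    succeeds. The KeyUsage chosen below fits HMAC specs; adjust it for
    asymmetric key specs."""
    usage = "GENERATE_VERIFY_MAC" if key_spec.startswith("HMAC") else "SIGN_VERIFY"
    return {"KeySpec": key_spec, "Origin": "EXTERNAL", "KeyUsage": usage}

# The import flow itself (requires real credentials and key material):
# import boto3
# kms = boto3.client("kms")
# key_id = kms.create_key(**external_key_request())["KeyMetadata"]["KeyId"]
# wrap = kms.get_parameters_for_import(KeyId=key_id,
#     WrappingAlgorithm="RSAES_OAEP_SHA_256", WrappingKeySpec="RSA_2048")
# # ...encrypt your key material with wrap["PublicKey"], then:
# kms.import_key_material(KeyId=key_id, ImportToken=wrap["ImportToken"],
#     EncryptedKeyMaterial=wrapped, ExpirationModel="KEY_MATERIAL_DOES_NOT_EXPIRE")
```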

&lt;p&gt;Check more details at: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/aws-kms-importing-asymmetric-hmac-keys/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/aws-kms-importing-asymmetric-hmac-keys/&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS introduces container image signing
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 6, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In early June 2023, AWS Signer and Amazon Elastic Container Registry (ECR) launched image signing, a new feature that enables you to sign and verify container images. You can now use AWS Signer to validate that only container images you have approved are deployed in your Amazon Elastic Kubernetes Service (EKS) clusters.&lt;/p&gt;

&lt;p&gt;For more information: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/aws-container-image-signing/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/aws-container-image-signing/&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS announces AWS Payment Cryptography
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 12, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;An interesting new service release touching the eCommerce space! In June 2023, AWS announced a new service called AWS Payment Cryptography. The service simplifies the implementation of cryptographic operations used to secure data in payment processing applications for debit, credit, and stored-value cards, in accordance with various payment card industry (PCI), network, and ANSI standards and rules. Financial service providers and processors can replace their on-premises hardware security modules (HSMs) with this elastic service and move their payments-specific cryptography and key management functions to the cloud.&lt;/p&gt;

&lt;p&gt;AWS Payment Cryptography is currently available only in the following US Regions: US East (N. Virginia) and US West (Oregon).&lt;/p&gt;

&lt;p&gt;Read more about the service launch here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/aws-payment-cryptography/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/aws-payment-cryptography/&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Amazon Verified Permissions is now generally available
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 13, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Originally announced back at AWS re:Invent 2022, AWS has now released the general availability of Amazon Verified Permissions, a service for fine-grained authorization and permissions management for applications that you build. Verified Permissions uses &lt;strong&gt;&lt;a href="https://www.cedarpolicy.com/"&gt;Cedar&lt;/a&gt;&lt;/strong&gt;, an open-source language for access control, allowing you to define permissions as easy-to-understand policies. Use Verified Permissions to support role- and attribute-based access control in your applications.&lt;/p&gt;

&lt;p&gt;Read more about Amazon Verified Permissions here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-verified-permissions-generally-available/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-verified-permissions-generally-available/&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS IAM Identity Center now supports automated user provisioning from Google Workspace
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Jun 13, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is big news for all organizations using Google Workspace! It has been possible for some time to integrate Google Workspace with AWS IAM Identity Center and use single sign-on to AWS services with Google identities, but managing those identities between Google and AWS has required either manual administrative work or developing an additional custom integration service.&lt;/p&gt;

&lt;p&gt;The new integration features help administrators simplify AWS access management across multiple accounts while maintaining the familiar Google Workspace experience for end users as they sign in. IAM Identity Center and Google Workspace now use Google automatic provisioning to securely provision users into IAM Identity Center, saving administrative time.&lt;/p&gt;

&lt;p&gt;Read more about the new feature here: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/06/aws-iam-identity-center-automated-user-provisioning-google-workspace/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/06/aws-iam-identity-center-automated-user-provisioning-google-workspace/&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Network Load Balancer now supports security groups
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Released: Aug 10, 2023&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Network Load Balancers (NLBs) now support security groups, enabling you to filter the traffic that your NLB accepts and forwards to your application. Using security groups, you can configure rules to help ensure that your NLB only accepts traffic from trusted IP addresses and centrally enforce access control policies. This improves your application's security posture and simplifies operations.&lt;/p&gt;

&lt;p&gt;To learn more, please read the announcement: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/08/network-load-balancer-supports-security-groups/"&gt;https://aws.amazon.com/about-aws/whats-new/2023/08/network-load-balancer-supports-security-groups/&lt;/a&gt; &lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>serverless</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>Mastering AWS deployments with Terragrunt</title>
      <dc:creator>pmalmirae</dc:creator>
      <pubDate>Fri, 16 Sep 2022 07:22:36 +0000</pubDate>
      <link>https://dev.to/aws-builders/mastering-aws-deployments-with-terragrunt-45pj</link>
      <guid>https://dev.to/aws-builders/mastering-aws-deployments-with-terragrunt-45pj</guid>
<description>&lt;p&gt;&lt;em&gt;Terraform offers a robust, declarative way to describe your cloud infrastructure as code. And unlike some other IaC tools, Terraform also does a decent job of comparing the differences between your current version of the IaC code, the last deployment stored in the Terraform state, and the current state of the deployed cloud resources. But when it comes to managing and deploying multiple copies of the same infrastructure, an additional tool is needed. And that tool is called Terragrunt.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this article, I assume that you understand the basics of Terraform. If you’d like to check the Terraform basics first, this article is for you (article in Finnish): &lt;a href="https://www.nordhero.com/posts/google-cloud-terraformilla/" rel="noopener noreferrer"&gt;https://www.nordhero.com/posts/google-cloud-terraformilla/&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Alright. So first, let’s get two acronyms right before moving forward.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IaC&lt;/strong&gt; means Infrastructure as Code. The idea is that you don’t have to manually log in to the AWS Console and set up the infrastructure by selecting services and clicking the systems together. The manual approach could be acceptable if you only had one environment for testing purposes. But if you ever needed to set it up again, or if you had more than one environment that should have the same resources and configurations (e.g., development, testing, and production environments), you shouldn’t try to manage those manually. Instead, you would probably want to use an IaC tool like Terraform to express the infrastructure configuration as code that can be deployed quickly and repeatably.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DRY&lt;/strong&gt; means Don’t Repeat Yourself. The downside of a declarative language like Terraform is that it’s not that easy to manage variations of the same code if you need to deploy the same infrastructure with a few different flavors depending on the use case. You quickly end up making multiple copies of the infrastructure code to manage various similar kinds of deployments. And that’s what Terragrunt is here to solve.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to keep Terraform code DRY?
&lt;/h2&gt;

&lt;p&gt;The key idea of Terragrunt is to write the needed infrastructure code only once using Terraform (.tf files) and to separate environment-specific values as variables defined in Terragrunt configuration files (.hcl files). I prefer a folder structure where I have the Terraform infrastructure code in the project’s &lt;strong&gt;infrastructure&lt;/strong&gt; folder and the Terragrunt configurations in the &lt;strong&gt;deployments&lt;/strong&gt; folder.&lt;/p&gt;

&lt;p&gt;A typical Terragrunt configuration setup is to save environment-specific configuration files in a three-level folder structure that describes the AWS Accounts, AWS Regions under the accounts, and deployable environments under the regions. Each folder level can contain Terragrunt configuration files for account, region, or environment-specific variables.&lt;/p&gt;

&lt;p&gt;The following figure illustrates an imaginary deployments folder setup where one production account has production environments both in the us-east-1 (N. Virginia) and eu-central-1 (Frankfurt) regions and an additional demo environment in the eu-central-1 region. There’s also a staging environment on its own account utilizing the eu-central-1 region and a dev account with a similar setup. As I’m working with both feature development and performance testing, I have two environments deployed on my sandbox account. The sandbox-pekka-perftest environment has production-kind infrastructure resources configured on it — for example, having larger Fargate clusters or heavier EC2 instances with provisioned IOPS SSD volumes. As the performance testing environment has more resources, it also generates more costs. Therefore it is easy for me to tear down the stack when the test session ends and re-deploy it again when needed without affecting the standard sandbox environment residing on the same account and region.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7b6eypdt23ekffbiop1m.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7b6eypdt23ekffbiop1m.jpg" alt="An example deployment folder structure for Terragrunt"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;An example deployment folder structure for Terragrunt&lt;/em&gt;&lt;/p&gt;
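&lt;p&gt;The same layout, sketched as a plain folder tree (the account and environment names are the imaginary ones from the figure):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;deployments/
├── production/
│   ├── us-east-1/
│   │   └── production/
│   └── eu-central-1/
│       ├── production/
│       └── demo/
├── staging/
│   └── eu-central-1/
│       └── staging/
├── dev/
│   └── eu-central-1/
│       └── dev/
└── sandbox-pekka/
    └── eu-central-1/
        ├── sandbox-pekka/
        └── sandbox-pekka-perftest/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
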
&lt;h2&gt;
  
  
  Install Terraform and Terragrunt
&lt;/h2&gt;

&lt;p&gt;First, install Terraform on your desktop. I’m on macOS, and I like to use the Homebrew package manager (&lt;a href="https://brew.sh/" rel="noopener noreferrer"&gt;https://brew.sh/&lt;/a&gt;) for the job, so the commands below use &lt;strong&gt;brew&lt;/strong&gt;. There are multiple ways to install Terraform on different operating systems, and you can find the right ingredients at &lt;a href="https://www.terraform.io/" rel="noopener noreferrer"&gt;https://www.terraform.io/&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here are the magic commands for macOS/Homebrew installation of Terraform:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;brew tap hashicorp/tap
brew install hashicorp/tap/terraform
terraform -version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you get the Terraform version info, the installation succeeded.&lt;/p&gt;

&lt;p&gt;And the next one goes for Terragrunt (more installation options at &lt;a href="https://terragrunt.gruntwork.io/" rel="noopener noreferrer"&gt;https://terragrunt.gruntwork.io/&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;brew install terragrunt
terragrunt -version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you get the Terragrunt version number, you are good to go to the next phase.&lt;/p&gt;

&lt;h2&gt;
  
  
  The infrastructure with Terraform IaC
&lt;/h2&gt;

&lt;p&gt;Let’s first create our infrastructure code. In this example, I have a small stack consisting of just one S3 bucket with data encryption using a customer-managed KMS key. I have included only a few lines of Terraform code here to point out how Terraform works with Terragrunt. If you wish to try out my example code, you can find the whole Terraform code example in the &lt;a href="https://github.com/pmalmirae/terragrunt-demo" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;First, create a new folder on your computer for the project (you can name it as you wish) and create a folder named &lt;strong&gt;infrastructure&lt;/strong&gt; in the project root. Under the newly created folder, add all the needed Terraform files. For example, here is the code for the S3 bucket configuration (in the file &lt;strong&gt;infrastructure/s3.tf&lt;/strong&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/******************************
S3 bucket with encryption and public access block configurations 
******************************/
# The S3 bucket
resource "aws_s3_bucket" "demo_bucket" {
  bucket_prefix = "${var.name}-${var.environment}-demo-bucket"
}
# Let's make the bucket private
resource "aws_s3_bucket_acl" "demo_bucket_acl" {
  bucket = aws_s3_bucket.demo_bucket.id
  acl    = "private"
}
/******************************
More configurations in the actual file, please check the Github repo.
******************************/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you see, I have used Terraform variables &lt;strong&gt;name&lt;/strong&gt; and &lt;strong&gt;environment&lt;/strong&gt; in the bucket’s prefix. That will let us automatically change the bucket prefix per environment. And as you might already guess, those variables are managed with Terragrunt.&lt;/p&gt;

&lt;p&gt;In addition to utilizing the variables in the Terraform code, we need to declare them on the Terraform side. Create a file named &lt;strong&gt;infrastructure/vars.tf&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/******************************
Variables to be used with the infrastructure code
******************************/
variable "name" {
  type        = string
  description = "Name of the company or the platform to build, etc." 
}
variable "environment" {
  type        = string
  description = "Name of the environment/stack"
}
/******************************
More variables in the actual file, please check the Github repo
******************************/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, on to the main course!&lt;/p&gt;

&lt;h2&gt;
  
  
  Set up Terragrunt configurations
&lt;/h2&gt;

&lt;p&gt;Let’s start the Terragrunt part by creating a folder named deployments in the project root. As you remember from the DRY chapter, a typical Terragrunt configuration set has a three-level folder structure: &lt;em&gt;account/region/environment&lt;/em&gt;. So, create the folder structure under the &lt;strong&gt;deployments&lt;/strong&gt; folder for your first environment. In my example, I have created a folder structure &lt;strong&gt;deployments/sandbox-pekka/eu-central-1/sandbox-pekka&lt;/strong&gt;.&lt;/p&gt;
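&lt;p&gt;If you prefer the command line, the whole structure can be created in one go (using my example names; replace them with your own):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create the account/region/environment folder structure in one command
mkdir -p deployments/sandbox-pekka/eu-central-1/sandbox-pekka
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
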

&lt;p&gt;The &lt;strong&gt;deployments/terragrunt.hcl&lt;/strong&gt; file is a key configuration file for Terragrunt. Please go ahead and create the file with the following contents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/******************************
TERRAGRUNT CONFIGURATION
******************************/
locals {
  # Load account, region and environment variables 
  account_vars      = read_terragrunt_config(find_in_parent_folders("account.hcl"))
  region_vars       = read_terragrunt_config(find_in_parent_folders("region.hcl"))
  environment_vars  = read_terragrunt_config(find_in_parent_folders("env.hcl"))
  # Extract the variables we need with the backend configuration
  aws_region      = local.region_vars.locals.aws_region
  environment     = local.environment_vars.locals.environment
  state_bucket    = local.environment_vars.locals.state_bucket
  dynamodb_table  = local.environment_vars.locals.dynamodb_table
}
/******************************
Configure the Terragrunt remote state to utilize a S3 bucket and state lock information in a DynamoDB table. 
And encrypt the state data.
******************************/
remote_state {
  backend   = "s3"
  generate  = {
    path      = "backend.tf"
    if_exists = "overwrite"
  }
  config    = {
    bucket         = "${local.state_bucket}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "${local.aws_region}"
    encrypt        = true
    dynamodb_table = "${local.dynamodb_table}"
  }
}
/******************************
Combine all account, region and environment variables as Terragrunt input parameters.
The input parameters can be used in Terraform configurations as Terraform variables.  
******************************/
inputs = merge(
  local.account_vars.locals,
  local.region_vars.locals,
  local.environment_vars.locals,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;deployments/terragrunt.hcl&lt;/strong&gt; file&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;loads account, region, and environment level variables (we will create those files next)&lt;/li&gt;
&lt;li&gt;extracts the variables that are needed in Terragrunt backend configurations&lt;/li&gt;
&lt;li&gt;configures the Terragrunt backend to utilize the state bucket and state lock table&lt;/li&gt;
&lt;li&gt;merges all variables as input parameters to be fed to Terraform.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next, let’s create the needed files for Terragrunt variables. Please create the following .hcl files and replace the folder names and variable values with your own &lt;em&gt;account/region/environment&lt;/em&gt; information:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;deployments/sandbox-pekka/account.hcl&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Set AWS Account -wide variables
locals {
  account_name   = "sandbox-pekka"
  aws_account_id = "REPLACE_WITH_YOUR_ACCOUNT_ID"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;deployments/sandbox-pekka/eu-central-1/region.hcl&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Set common variables for the AWS Region
locals {
  aws_region = "eu-central-1"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;deployments/sandbox-pekka/eu-central-1/sandbox-pekka/env.hcl&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Set common variables for the environment
locals {
  name           = "nordhero"
  environment    = "sandbox-pekka"
  state_bucket   = "nordhero-terragrunt-demo-state-sandbox-pekka" # Replace with your preferred unique S3 bucket name 
  dynamodb_table = "nordhero-terragrunt-demo-locks-sandbox-pekka" # Replace with your preferred dynamodb table name
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;name&lt;/strong&gt; and &lt;strong&gt;environment&lt;/strong&gt; variables will be utilized in the Terraform code created in the chapter &lt;em&gt;The infrastructure with Terraform IaC&lt;/em&gt;. And the &lt;strong&gt;state_bucket&lt;/strong&gt; and &lt;strong&gt;dynamodb_table&lt;/strong&gt; values will be used to create the Terragrunt state bucket and state lock table for the environment.&lt;/p&gt;

&lt;p&gt;And one last thing for the Terragrunt configurations. Create one more folder and file, &lt;strong&gt;deployments/sandbox-pekka/eu-central-1/sandbox-pekka/infra/terragrunt.hcl&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/******************************
TERRAGRUNT CONFIGURATIONS
******************************/
/******************************
Include the root terragrunt.hcl configurations gathering together
the needed variables and backend configurations
******************************/
include "root" {
  path = find_in_parent_folders()
}
locals {
  # Expose the base source path
  base_source = "${dirname(find_in_parent_folders())}/..//infrastructure"
}
# Set the location of Terraform configurations
terraform {
  source = local.base_source
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the Terragrunt configuration file that we will execute later on. It first takes in the common &lt;strong&gt;terragrunt.hcl&lt;/strong&gt; configurations we created in the &lt;strong&gt;deployments&lt;/strong&gt; folder and then points the deployment to the Terraform &lt;strong&gt;infrastructure&lt;/strong&gt; folder. As you can see, it would be possible to configure different versions of the Terraform code by using a different &lt;strong&gt;base_source&lt;/strong&gt; for different environments (not that DRY an approach, though). Or, more importantly, you could split your Terraform code into multiple modules and select which modules to deploy to a particular environment. For example, if your full stack includes the OpenSearch Service (formerly Elasticsearch Service), but you don’t need OpenSearch in your development sandbox, you could choose to deploy all the other modules but skip the module that contains the OpenSearch configurations.&lt;/p&gt;
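&lt;p&gt;As a sketch of that idea (the &lt;strong&gt;infrastructure/modules/opensearch&lt;/strong&gt; path is hypothetical and not part of the demo repository), a module-specific &lt;strong&gt;terragrunt.hcl&lt;/strong&gt; could point its source at a single module folder using the same pattern as above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/******************************
Hypothetical sketch: deploy only a single Terraform module.
The infrastructure/modules/opensearch folder is not part of the demo repo.
******************************/
include "root" {
  path = find_in_parent_folders()
}
# Point the source at one module folder instead of the whole infrastructure
terraform {
  source = "${dirname(find_in_parent_folders())}/..//infrastructure/modules/opensearch"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
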

&lt;p&gt;Now we are ready. You should now have the following folder structure and files in your &lt;strong&gt;deployments&lt;/strong&gt; folder:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deployments/terragrunt.hcl&lt;/li&gt;
&lt;li&gt;deployments/sandbox-pekka/account.hcl&lt;/li&gt;
&lt;li&gt;deployments/sandbox-pekka/eu-central-1/region.hcl&lt;/li&gt;
&lt;li&gt;deployments/sandbox-pekka/eu-central-1/sandbox-pekka/env.hcl&lt;/li&gt;
&lt;li&gt;deployments/sandbox-pekka/eu-central-1/sandbox-pekka/infra/terragrunt.hcl&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deploying with Terragrunt
&lt;/h2&gt;

&lt;p&gt;Deploying the infrastructure with Terragrunt is very similar to deploying with Terraform; the same commands are in use. The main difference is that you need to run the Terragrunt commands from the &lt;strong&gt;deployments/your_account/your_region/your_env/infra&lt;/strong&gt; folder of the environment you wish to deploy. So please cd to your infra folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd deployments/sandbox-pekka/eu-central-1/sandbox-pekka/infra
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before running any Terragrunt commands, we must ensure we are connected to the right AWS account with the AWS CLI. If you don’t already have the AWS CLI configured, please follow the instructions at &lt;a href="http://docs.aws.amazon.com/cli/latest/userguide/" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/cli/latest/userguide/&lt;/a&gt;. After configuring the AWS CLI, test the connection by running the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws sts get-caller-identity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should get a response containing your UserId, AWS Account Id, and an IAM Role ARN if successfully connected. Check once more that the Account is the one you desire to deploy infrastructure to and that you have the same information in your &lt;strong&gt;deployments/your_account/account.hcl&lt;/strong&gt; file.&lt;/p&gt;
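&lt;p&gt;The response looks roughly like this (the values below are placeholders, not real identifiers):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "UserId": "AROAEXAMPLEID123456:pekka",
    "Account": "123456789012",
    "Arn": "arn:aws:sts::123456789012:assumed-role/ExampleRole/pekka"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
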

&lt;p&gt;Next, we are ready to rock ’n’ roll. Let’s first initialize Terragrunt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terragrunt init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When running init for the first time, Terragrunt recognizes that the S3 state bucket does not yet exist and asks whether it should create the bucket for you. Please allow Terragrunt to create the state bucket. Terragrunt will also create the state-lock DynamoDB table at the same time. When everything is ready, you should receive a green response stating that the backend has been successfully configured to use S3 and that Terraform has been successfully initialized to use the hashicorp/aws provider plugin.&lt;/p&gt;

&lt;p&gt;Next, let’s plan our deployment and save the plan in the &lt;strong&gt;tfplan&lt;/strong&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terragrunt plan -out tfplan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Terragrunt will now describe the resources it plans to create. Please check that there are six resources to be added and that no errors or warnings have been raised. If everything looks ok, we can next deploy the infrastructure plan saved in the &lt;strong&gt;tfplan&lt;/strong&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terragrunt apply "tfplan"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should now get a green message stating “Apply complete!” with the number of resources created, changed, and destroyed.&lt;/p&gt;

&lt;p&gt;Congratulations! You have now successfully set up Terragrunt and deployed your first infrastructure stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying the infrastructure to another region
&lt;/h2&gt;

&lt;p&gt;Now to the dessert. The benefits of Terragrunt start accumulating when you set up the next copy of the infrastructure. Let’s assume we would like to set up the same infrastructure in the same account but in a different region. The great thing is that we don’t need to touch the &lt;strong&gt;infrastructure/*.tf&lt;/strong&gt; files at all.&lt;/p&gt;

&lt;p&gt;What we need to do is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Copy the current &lt;strong&gt;deployments/your_account/your_region&lt;/strong&gt; folder and rename the copied folder with the new region name, e.g. &lt;strong&gt;eu-north-1&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Edit the &lt;strong&gt;deployments/your_account/new_region/region.hcl&lt;/strong&gt; file and replace &lt;strong&gt;aws_region&lt;/strong&gt; value with the new region name&lt;/li&gt;
&lt;li&gt;Edit the &lt;strong&gt;deployments/your_account/new_region/your_environment/env.hcl&lt;/strong&gt; file and replace the &lt;strong&gt;state_bucket&lt;/strong&gt; and &lt;strong&gt;dynamodb_table&lt;/strong&gt; values with new bucket and table names to store the state of the new environment&lt;/li&gt;
&lt;li&gt;cd to the &lt;strong&gt;deployments/your_account/new_region/your_environment/infra&lt;/strong&gt; folder and repeat the &lt;strong&gt;terragrunt init/plan/apply&lt;/strong&gt; commands&lt;/li&gt;
&lt;/ol&gt;
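&lt;p&gt;The steps above can be sketched as shell commands (using my example folders; the &lt;strong&gt;eu-north-1&lt;/strong&gt; target region is just an illustration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# 1. Copy the region folder under the same account
cp -r deployments/sandbox-pekka/eu-central-1 deployments/sandbox-pekka/eu-north-1
# 2. In the copy, set aws_region = "eu-north-1" in region.hcl
# 3. In the copy, set new state_bucket and dynamodb_table values in env.hcl
# 4. Deploy the new environment
cd deployments/sandbox-pekka/eu-north-1/sandbox-pekka/infra
terragrunt init
terragrunt plan -out tfplan
terragrunt apply "tfplan"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
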

&lt;p&gt;Now try it yourself!&lt;/p&gt;

&lt;h2&gt;
  
  
  Last lines
&lt;/h2&gt;

&lt;p&gt;That was a nice ride! We needed to create a bunch of configuration files. Still, in a typical development project where you have multiple sandbox environments and a deployment pipeline with development, staging, production, and demo environments, the setup starts to pay off quickly. And in real life, you would probably want to automate the deployment pipeline from your source code repository so that, depending on the repository branch, the right deployment folder is automatically selected for the Terragrunt deployment.&lt;/p&gt;

&lt;p&gt;NordHero is here to help you set up the infrastructure, manage multi-environment platforms, and automate the deployments with your selected GitOps platform. Give us a call/email/LinkedIn message if you would like to hear more!&lt;/p&gt;

&lt;p&gt;P.S. You can download the whole demo project from the &lt;a href="https://github.com/pmalmirae/terragrunt-demo" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>iac</category>
      <category>terragrunt</category>
      <category>terraform</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
