<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Daniel Kneipp</title>
    <description>The latest articles on DEV Community by Daniel Kneipp (@danielkneipp).</description>
    <link>https://dev.to/danielkneipp</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F557810%2F08b38316-9fa0-4493-8966-275fc2d2a621.png</url>
      <title>DEV Community: Daniel Kneipp</title>
      <link>https://dev.to/danielkneipp</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/danielkneipp"/>
    <language>en</language>
    <item>
      <title>Global Service on AWS</title>
      <dc:creator>Daniel Kneipp</dc:creator>
      <pubDate>Tue, 30 Jan 2024 20:29:21 +0000</pubDate>
      <link>https://dev.to/aws-builders/global-service-on-aws-1e2b</link>
      <guid>https://dev.to/aws-builders/global-service-on-aws-1e2b</guid>
      <description>&lt;p&gt;In a &lt;a href="https://dev.to/aws-builders/global-endpoint-for-a-multi-region-service-1oae"&gt;previous post&lt;/a&gt; I showed how you can have a multi-region service running while keeping response times low using an architectural pattern called &lt;a href="https://aws.amazon.com/blogs/architecture/improving-performance-and-reducing-cost-using-availability-zone-affinity/" rel="noopener noreferrer"&gt;Availability Zone Affinity&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, the previous design has a considerable issue: it doesn't perform a regional failover. In other words, if an entire region goes down, the service will become inoperable for the customers closer to that specific region.&lt;/p&gt;

&lt;p&gt;To overcome this problem, this project shows how &lt;a href="https://aws.amazon.com/global-accelerator/" rel="noopener noreferrer"&gt;Global Accelerator&lt;/a&gt; can be used to provide a single point of entry to your service with static IPs available globally.&lt;/p&gt;

&lt;p&gt;The code of this project is available &lt;a href="https://github.com/DanielKneipp/aws-global-endpoint" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Working with Global Accelerator&lt;/h2&gt;

&lt;p&gt;To improve the previous design, we will add a new entry point on top of it using Global Accelerator (GA). This AWS service provides a fixed global endpoint with two static IPs.&lt;/p&gt;

&lt;p&gt;When a web client uses this endpoint, its traffic is sent to the nearest point of presence of the AWS edge network, and from there it travels over the AWS backbone, instead of going through the public internet all the way to the intended resource (which can be a load balancer or an EC2 instance).&lt;/p&gt;

&lt;p&gt;GA is used by several &lt;a href="https://aws.amazon.com/global-accelerator/customers/" rel="noopener noreferrer"&gt;customers&lt;/a&gt;. Let's take &lt;a href="https://www.okta.com/" rel="noopener noreferrer"&gt;Okta&lt;/a&gt; as an example. Okta follows a multi-tenant architecture with a subdomain per customer, and for LinkedIn's subdomain you can see the GA endpoint exposed as a &lt;code&gt;CNAME&lt;/code&gt; record, as shown in the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyxqim2rel0i9jp2hakjf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyxqim2rel0i9jp2hakjf.png" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Feel free to test this on other &lt;a href="https://www.okta.com/customers/" rel="noopener noreferrer"&gt;Okta customers&lt;/a&gt;, such as Zoom, to see a different GA endpoint.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AWS also provides a webpage that allows you to see the differences in response times from different regions when you use GA as opposed to going via the public Internet to reach an AWS endpoint: &lt;a href="https://speedtest.globalaccelerator.aws/" rel="noopener noreferrer"&gt;https://speedtest.globalaccelerator.aws/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Implementing the Design&lt;/h2&gt;

&lt;p&gt;GA alone can provide the same features that Route53 offers through latency-based and failover records, so it could replace them entirely. However, in the code we will keep everything deployed previously, to allow some comparisons between the two approaches.&lt;/p&gt;

&lt;p&gt;In summary, using GA in place of Route53, the design looks something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4dfsa463hp743p6enxgd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4dfsa463hp743p6enxgd.png" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To understand step by step how the design was built, please visit the &lt;a href="https://github.com/DanielKneipp/aws-route53-global-dns" rel="noopener noreferrer"&gt;&lt;code&gt;aws-route53-global-dns&lt;/code&gt;&lt;/a&gt; repository.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A new file has been added at &lt;code&gt;aws-route53-global-dns/terraform/ga.tf&lt;/code&gt; with all the relevant code. The change was made in a separate branch so we can keep track of what changed and leave the previous project untouched.&lt;/p&gt;

&lt;p&gt;GA follows a component hierarchy of listener -&amp;gt; endpoint group -&amp;gt; endpoint.&lt;/p&gt;

&lt;p&gt;Listeners define the port and network protocol to listen to and which endpoint groups should receive the traffic.&lt;/p&gt;
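
&lt;p&gt;As a rough Terraform sketch (resource names and the port are illustrative, not taken from the project code), the accelerator and its listener could look like this:&lt;/p&gt;

```terraform
# Hypothetical sketch: a Global Accelerator with a single TCP listener on 443.
resource "aws_globalaccelerator_accelerator" "this" {
  name            = "global-service"
  ip_address_type = "IPV4"
  enabled         = true
}

resource "aws_globalaccelerator_listener" "tls" {
  # The accelerator's id attribute is its ARN.
  accelerator_arn = aws_globalaccelerator_accelerator.this.id
  protocol        = "TCP"

  port_range {
    from_port = 443
    to_port   = 443
  }
}
```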

&lt;p&gt;An endpoint group describes a regional group of endpoints, which can be Application Load Balancers, EC2 instances, or, in this case, Network Load Balancers (NLBs). For each endpoint you can set a weight that defines how traffic is balanced within the endpoint group.&lt;/p&gt;

&lt;p&gt;In the code you can see endpoints defined as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="nx"&gt;endpoint_configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;client_ip_preservation_enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="nx"&gt;endpoint_id&lt;/span&gt;                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;services_eu&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"eu1"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;nlb_arn&lt;/span&gt;
  &lt;span class="nx"&gt;weight&lt;/span&gt;                         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;endpoint_configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;client_ip_preservation_enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="nx"&gt;endpoint_id&lt;/span&gt;                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;services_eu&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"eu2"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;nlb_arn&lt;/span&gt;
  &lt;span class="nx"&gt;weight&lt;/span&gt;                         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the above configuration, we are defining that the primary endpoint in the EU (&lt;code&gt;eu1&lt;/code&gt;) should receive 255/256 of the traffic, while the secondary endpoint used for failover receives 1/256.&lt;/p&gt;

&lt;p&gt;An endpoint with weight 0 doesn't receive traffic as long as another endpoint group has healthy endpoints. In other words, if &lt;code&gt;eu2&lt;/code&gt; had its weight set to 0 and &lt;code&gt;eu1&lt;/code&gt; stopped working, traffic would fail over to the other region, not to &lt;code&gt;eu2&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;⚠️ Note: the failover cluster receives a small portion of traffic (1/256 ≈ 0.39%), which makes the design an &lt;a href="https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover-types.html#dns-failover-types-active-active" rel="noopener noreferrer"&gt;active-active setup&lt;/a&gt;, in contrast to the active-passive configuration we had before. This has the benefit of ensuring that the failover cluster is always operational, since it receives a portion of customer traffic at all times.&lt;/p&gt;
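
&lt;p&gt;You can sanity-check the traffic split implied by the weights with a quick one-liner (using the weights 255 and 1 from the configuration above):&lt;/p&gt;

```shell
# Traffic share per endpoint is weight / sum(weights).
primary=255
secondary=1
awk -v p="$primary" -v s="$secondary" \
  'BEGIN { printf "primary: %.2f%%  secondary: %.2f%%\n", 100*p/(p+s), 100*s/(p+s) }'
# prints: primary: 99.61%  secondary: 0.39%
```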

&lt;blockquote&gt;
&lt;p&gt;As another implementation detail: as of now, GA doesn't support &lt;a href="https://docs.aws.amazon.com/global-accelerator/latest/dg/preserve-client-ip-address.html" rel="noopener noreferrer"&gt;client IP preservation&lt;/a&gt; when traffic is forwarded to an NLB with a TLS listener. That is why you see &lt;code&gt;client_ip_preservation_enabled = false&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This addition alone is enough to test GA without impacting the existing infrastructure, which shows the benefit of a progressive design that allows improvement by composition, with minimal change to existing components.&lt;/p&gt;

&lt;h2&gt;Testing the Design&lt;/h2&gt;

&lt;p&gt;As mentioned before, a new branch &lt;code&gt;global_accelerator&lt;/code&gt; has been created on the &lt;code&gt;aws-route53-global-dns&lt;/code&gt; project with the changes required to add GA.&lt;/p&gt;

&lt;p&gt;To deploy everything, just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;aws-route53-global-dns/terraform/
terraform apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The deployment can take several minutes. For more information regarding the deployment procedure, please refer to &lt;a href="https://github.com/DanielKneipp/aws-route53-global-dns?tab=readme-ov-file#deploy" rel="noopener noreferrer"&gt;this more detailed description&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The previous domain names still work, so we can test them and compare the differences. To hit the GA, use a domain name such as &lt;code&gt;service.dkneipp.com&lt;/code&gt;; to reach the closest primary NLB directly, use &lt;code&gt;www.service.dkneipp.com&lt;/code&gt;, as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3c6nuw4buu6bcv0xrb0w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3c6nuw4buu6bcv0xrb0w.png" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💡 From the domain name using the Global Accelerator, we can see the two &lt;a href="https://aws.amazon.com/global-accelerator/features/#Static_anycast_IP_addresses" rel="noopener noreferrer"&gt;anycast IPs&lt;/a&gt;. So, even in the event of a regional failover, the IPs of your service don't change, which allows customers to define network policies for your service based solely on IPs if required.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: &lt;code&gt;service.dkneipp.com&lt;/code&gt; now works. Unlike a &lt;code&gt;CNAME&lt;/code&gt; record, which cannot coexist with the zone's &lt;code&gt;SOA&lt;/code&gt; and &lt;code&gt;NS&lt;/code&gt; records at the apex, the &lt;code&gt;A&lt;/code&gt; record of type &lt;code&gt;alias&lt;/code&gt; used for the GA endpoint can live at the apex of the zone.&lt;/p&gt;
&lt;/blockquote&gt;
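
&lt;p&gt;For reference, such a zone-apex alias record could be declared roughly like this (zone and resource names are hypothetical, and the &lt;code&gt;hosted_zone_id&lt;/code&gt; attribute on the accelerator depends on your AWS provider version):&lt;/p&gt;

```terraform
# Hypothetical sketch: apex A record aliasing the Global Accelerator.
# Unlike a CNAME, an alias A record is allowed at the apex of the zone.
resource "aws_route53_record" "apex" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "service.dkneipp.com"
  type    = "A"

  alias {
    name                   = aws_globalaccelerator_accelerator.this.dns_name
    zone_id                = aws_globalaccelerator_accelerator.this.hosted_zone_id
    evaluate_target_health = true
  }
}
```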

&lt;p&gt;Now, let's do some testing.&lt;/p&gt;

&lt;h3&gt;Failover&lt;/h3&gt;

&lt;p&gt;To simulate an issue in one of the web servers, the instance is removed from the associated target group (as shown &lt;a href="https://github.com/DanielKneipp/aws-route53-global-dns?tab=readme-ov-file#test-failover" rel="noopener noreferrer"&gt;here&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;After that, you can see that external web clients are seamlessly redirected to the secondary web server, as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frau622qk09iebakr2113.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frau622qk09iebakr2113.png" width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is the command used to perform the test shown above:&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;&lt;span class="nb"&gt;sleep &lt;/span&gt;2 &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; curl &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s1"&gt;'Total: %{time_total}s\n'&lt;/span&gt; &lt;span class="s1"&gt;'https://service.dkneipp.com'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; +%T &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;

&lt;p&gt;This was also performed in the &lt;a href="https://github.com/DanielKneipp/aws-route53-global-dns" rel="noopener noreferrer"&gt;previous project&lt;/a&gt;; the interesting addition here is the cross-region failover. After all the web servers in the region are taken down:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu65lc7jcnup1yifcpzpr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu65lc7jcnup1yifcpzpr.png" width="800" height="763"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see the failover happening automatically, and the web server in the other region starts responding to the traffic (now with much higher response times, but with the service still operational). &lt;em&gt;However, in this case, the failover was not transparent, and the end user would have experienced issues for around 17 seconds.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And finally, once the primary web server is live again, the recovery is also automatic after a transition window, as seen below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhl2w1vyyq0qfiibjfjh2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhl2w1vyyq0qfiibjfjh2.png" width="800" height="1009"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Response times&lt;/h3&gt;

&lt;p&gt;To get more interesting statistics over response times (such as the mean with standard deviation, and percentiles), I've created a small utility in Go that computes them from the response times of &lt;code&gt;GET&lt;/code&gt; requests to a specified URL.&lt;/p&gt;

&lt;p&gt;The code of the utility is in &lt;code&gt;http-latency-test/&lt;/code&gt;. Binaries for macOS on ARM and x86 Linux have already been built, and you can use the &lt;code&gt;Makefile&lt;/code&gt; to build a binary from source if required.&lt;/p&gt;

&lt;p&gt;The utility accepts the following arguments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;-count&lt;/code&gt;: Max number of requests. Pass 0 to keep it running forever;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-sleep&lt;/code&gt;: The amount of time in milliseconds to wait between requests (default is 500ms);&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-url&lt;/code&gt;: The endpoint to make the request.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So, a simple test can be &lt;code&gt;./http-latency-test --url https://service.dkneipp.com --count 1000 --sleep 100&lt;/code&gt;.&lt;/p&gt;
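
&lt;p&gt;If you'd rather not build the Go binary, a rough shell equivalent can collect &lt;code&gt;curl&lt;/code&gt; timings and summarize them with &lt;code&gt;awk&lt;/code&gt; (the URL and request count below are placeholders):&lt;/p&gt;

```shell
# Rough sketch: collect total-time samples with curl, then print mean and p95.
url="https://example.com"
count=20
for _ in $(seq "$count"); do
  curl -s -o /dev/null -w '%{time_total}\n' "$url"
  sleep 0.1
done | sort -n | awk '
  { v[NR] = $1; sum += $1 }
  END {
    p = int(NR * 0.95); if (p == 0) p = 1
    printf "mean: %.3fs  p95: %.3fs\n", sum / NR, v[p]
  }'
```

&lt;p&gt;It lacks the standard deviation and nicer output of the Go utility, but it needs nothing beyond &lt;code&gt;curl&lt;/code&gt; and a POSIX shell.&lt;/p&gt;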

&lt;p&gt;With this tool, a test was performed comparing the response times of GA against hitting the closest NLB directly. This was done for both the European and South American regions, and the results are shown below.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Europe&lt;/th&gt;
&lt;th&gt;South America&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa2w4x4uuafzlagge97u5.png" width="800" height="520"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3mrn3148rtrdzeqwjtbw.png" width="800" height="550"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The interesting thing to point out is that Global Accelerator delivers the same or better response times than hitting the NLB directly. For the European region, we can see an average improvement of 22% in response times! 🤩&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;However, this varies by region and also depends on several networking factors, such as the location of the web client and its Internet connection conditions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As mentioned before, this improvement comes from the network path the packets take. When using the NLB directly, traffic goes through the public internet before reaching the AWS resource; when using GA, it travels over the AWS backbone as much as possible.&lt;/p&gt;

&lt;h3&gt;Cleanup&lt;/h3&gt;

&lt;p&gt;A simple &lt;code&gt;terraform destroy&lt;/code&gt; should delete all 127 resources.&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;This project shows how you can use the AWS Global Accelerator to make a service globally available, resilient to failures in availability zones and entire regions, while also keeping response times low for users worldwide.&lt;/p&gt;

&lt;p&gt;AWS Global Accelerator can be used for many other use-cases, such as &lt;a href="https://aws.amazon.com/blogs/networking-and-content-delivery/using-aws-global-accelerator-to-achieve-blue-green-deployments/" rel="noopener noreferrer"&gt;Blue-Green deployments&lt;/a&gt;, custom routing to build &lt;a href="https://docs.aws.amazon.com/global-accelerator/latest/dg/about-custom-routing-how-it-works.html" rel="noopener noreferrer"&gt;sessions for online games&lt;/a&gt;, or even just to provide a static IP and endpoint to customers without having to rely on DNS.&lt;/p&gt;

&lt;p&gt;I encourage you to have a look at it if you manage AWS environments and didn't know about this service.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>networking</category>
      <category>dns</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Global Endpoint For a Multi-region Service</title>
      <dc:creator>Daniel Kneipp</dc:creator>
      <pubDate>Fri, 19 Jan 2024 18:45:23 +0000</pubDate>
      <link>https://dev.to/aws-builders/global-endpoint-for-a-multi-region-service-1oae</link>
      <guid>https://dev.to/aws-builders/global-endpoint-for-a-multi-region-service-1oae</guid>
      <description>&lt;p&gt;Making a web application highly available requires several components working together: automated health checks, failover mechanisms, backup infrastructure, you name it. And when it comes to keeping response times low globally, the complexity increases even further.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/DanielKneipp/aws-route53-global-dns" rel="noopener noreferrer"&gt;This project&lt;/a&gt; presents progressively how to go from a simple application available on the internet to a full-fledged setup with automated failover, together with user traffic segmented per region to keep response times low.&lt;/p&gt;

&lt;h2&gt;Progressive Architecture&lt;/h2&gt;

&lt;p&gt;Here we have three infrastructure designs of the same service, at increasing levels of complexity.&lt;/p&gt;

&lt;p&gt;The first one is a simple web server running on an EC2 machine.&lt;/p&gt;

&lt;p&gt;The second is a more reliable design, with another web server running in passive mode. This second web server would only respond to traffic in case the first one becomes inoperable.&lt;/p&gt;

&lt;p&gt;Finally, the last one replicates the previous design in a different region, to provide the same service to a customer base in a geographically distant location.&lt;/p&gt;

&lt;h3&gt;As simple as it gets&lt;/h3&gt;

&lt;p&gt;This design is, as the section title implies, as simple as it gets. Essentially, we have just a two-tier application running, with three main components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A DNS record pointing to an &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html" rel="noopener noreferrer"&gt;AWS Network Load Balancer (NLB)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;This NLB is deployed in the public subnet to receive traffic from the internet. This first layer/tier keeps the web server from being publicly exposed to the internet, and also enables load balancing across multiple servers if more become available&lt;/li&gt;
&lt;li&gt;Finally, we have the web server itself receiving traffic only from the NLB on a private subnet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv17xmb404qcmb8dl2lpa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv17xmb404qcmb8dl2lpa.png" alt="Simple Architecture" width="800" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Everything is deployed on the same &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-availability-zones" rel="noopener noreferrer"&gt;availability zone (AZ)&lt;/a&gt;, which represents one or more AWS data centers geographically close to each other. This is to ensure fast communication between the load balancer and the web server. This also avoids &lt;a href="https://aws.amazon.com/blogs/architecture/overview-of-data-transfer-costs-for-common-architectures/" rel="noopener noreferrer"&gt;costs related to data transfer between AZs on AWS&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The obvious problem with this design is that if the web server goes out of service, or if the AZ becomes inoperable, the application stops responding to user requests. To overcome this, we will move on to the next design.&lt;/p&gt;

&lt;h3&gt;Going active-passive&lt;/h3&gt;

&lt;p&gt;Here we replicate the same infrastructure in a different AZ. The key difference is that this secondary deployment should not receive user traffic unless the primary one stops working. This is also known as an active-passive deployment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fguw1csdbqwbg1jjcgwyf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fguw1csdbqwbg1jjcgwyf.png" alt="Highly Available Architecture" width="800" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this scenario we use AWS Route53 records of type &lt;a href="https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover-types.html#dns-failover-types-active-passive-one-resource" rel="noopener noreferrer"&gt;&lt;code&gt;failover&lt;/code&gt;&lt;/a&gt;. Those records work in pairs of primary and secondary records. As the name implies, the primary record should point to the server that responds to traffic under normal circumstances.&lt;/p&gt;

&lt;p&gt;The secondary DNS record will only take precedence if the primary server is not working properly, or, in other words, it is unhealthy.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;And how can a server be detected as unhealthy?&lt;/em&gt; Using &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-health-checks.html" rel="noopener noreferrer"&gt;health checks&lt;/a&gt;. These can be configured so that the load balancer performs regular checks to confirm the web server is working properly.&lt;/p&gt;

&lt;p&gt;If those checks fail beyond a certain threshold, the target (the EC2 instance) is labeled as unhealthy. If a certain number of targets are considered unhealthy, the endpoint of the load balancer itself is considered unhealthy.&lt;/p&gt;
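
&lt;p&gt;As an illustration, those thresholds live on the target group's health check block; something along these lines (values and names are illustrative, not the project's actual settings):&lt;/p&gt;

```terraform
# Hypothetical sketch: TCP health check on an NLB target group.
# After 3 consecutive failed checks a target is marked unhealthy.
resource "aws_lb_target_group" "web" {
  name        = "web"
  port        = 443
  protocol    = "TCP"
  vpc_id      = module.vpc_sa.vpc_id
  target_type = "instance"

  health_check {
    protocol            = "TCP"
    interval            = 10
    healthy_threshold   = 3
    unhealthy_threshold = 3
  }
}
```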

&lt;p&gt;This status is reported to Route53 (when using records of type &lt;code&gt;alias&lt;/code&gt;), which uses this information to decide whether DNS requests should resolve to the secondary load balancer instead, consequently performing the failover.&lt;/p&gt;
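
&lt;p&gt;A failover pair of alias records can be sketched like this (resource, zone, and record names are hypothetical; the &lt;code&gt;nlb_domain_name&lt;/code&gt; and &lt;code&gt;nlb_zone_id&lt;/code&gt; outputs come from the service module):&lt;/p&gt;

```terraform
# Hypothetical sketch: primary/secondary failover alias records.
# evaluate_target_health lets Route53 react to the NLB's health status.
resource "aws_route53_record" "primary" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "sa.service.dkneipp.com"
  type           = "A"
  set_identifier = "primary"

  failover_routing_policy {
    type = "PRIMARY"
  }

  alias {
    name                   = module.services_sa["sa1"].nlb_domain_name
    zone_id                = module.services_sa["sa1"].nlb_zone_id
    evaluate_target_health = true
  }
}

resource "aws_route53_record" "secondary" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "sa.service.dkneipp.com"
  type           = "A"
  set_identifier = "secondary"

  failover_routing_policy {
    type = "SECONDARY"
  }

  alias {
    name                   = module.services_sa["sa2"].nlb_domain_name
    zone_id                = module.services_sa["sa2"].nlb_zone_id
    evaluate_target_health = true
  }
}
```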

&lt;blockquote&gt;
&lt;p&gt;ℹ️ We are also making use of the &lt;a href="https://aws.amazon.com/blogs/architecture/improving-performance-and-reducing-cost-using-availability-zone-affinity/" rel="noopener noreferrer"&gt;Availability Zone Affinity&lt;/a&gt; architectural pattern to, as mentioned previously, improve response times and reduce costs. This way, traffic that reaches an AZ never leaves it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So, now we have a more reliable design with an automated failover mechanism between distinct data centers. However, as a single-region deployment, users in a geographically distant region will suffer from slow response times. Although a &lt;a href="https://aws.amazon.com/what-is/cdn/" rel="noopener noreferrer"&gt;CDN&lt;/a&gt; exists for this use case, it serves static assets, not dynamic APIs.&lt;/p&gt;

&lt;h3&gt;Multi-region setup&lt;/h3&gt;

&lt;p&gt;To achieve a multi-region architecture, while retaining the previous features, the strategy is still the same: replicate the previous design, this time in a different region.&lt;/p&gt;

&lt;p&gt;However, in this case, the two deployments are connected via &lt;code&gt;CNAME&lt;/code&gt; DNS records with a &lt;a href="https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy-latency.html" rel="noopener noreferrer"&gt;latency&lt;/a&gt; routing policy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhymh5sqz4atpelu5ujvq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhymh5sqz4atpelu5ujvq.png" alt="Complete Architecture" width="800" height="770"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The records are assigned to different AWS regions, and AWS measures the latency between users and those regions. The record of the region with the lowest latency is used to resolve the user's DNS requests.&lt;/p&gt;
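
&lt;p&gt;These latency records can be sketched as follows (regions and record names are hypothetical; each record carries the region it represents):&lt;/p&gt;

```terraform
# Hypothetical sketch: latency-based CNAME records, one per region.
# Route53 answers with the record whose region has the lowest measured latency.
resource "aws_route53_record" "latency_sa" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "www.service.dkneipp.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "sa-east-1"
  records        = ["sa.service.dkneipp.com"]

  latency_routing_policy {
    region = "sa-east-1"
  }
}

resource "aws_route53_record" "latency_eu" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "www.service.dkneipp.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "eu-west-1"
  records        = ["eu.service.dkneipp.com"]

  latency_routing_policy {
    region = "eu-west-1"
  }
}
```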

&lt;p&gt;This design is highly available and provides fast response times to users in different regions of the world. Now, it's time to deploy it!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Note: this design provides resilience against AZ failure, but not regional outages. If the entire region fails, users of that region won't have the service operational.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;Implementation&lt;/h2&gt;

&lt;p&gt;Now that we have discussed the architectural design, it is time to implement it. To access the source code, please go to the &lt;a href="https://github.com/DanielKneipp/aws-route53-global-dns" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;Infrastructure as code&lt;/h3&gt;

&lt;p&gt;Here we use Terraform to define all the infrastructure as code. This makes the design replications mentioned previously relatively easy: we can define pieces of the infrastructure as modules and reuse them as many times as we need.&lt;/p&gt;

&lt;p&gt;Here is a brief description of the main directories and files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.
├── 📁 regional_domain/ -&amp;gt; Per-region DNS config module
├── 📁 service/         -&amp;gt; NLB + EC2 + web server module
├── 📄 dns.tf           -&amp;gt; Hosted zone and DNS module usage
├── 📄 locals.tf        -&amp;gt; Variables used for replicating server resources
├── 📄 services.tf      -&amp;gt; Loop over the variables to deploy the servers
└── 📄 vpc.tf           -&amp;gt; Network config for public/private subnets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The specifications of the service deployments are defined in &lt;code&gt;./terraform/locals.tf&lt;/code&gt;: for each region and AZ, a service is defined. As shown below, a public subnet is passed for the NLB and a private one for the server. The subnets determine which AZ the service is deployed in.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="nx"&gt;sa1&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"sa1-service"&lt;/span&gt;
  &lt;span class="nx"&gt;private_subnet&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_sa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;public_subnet&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_sa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public_subnets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;The name defines what you will see as a response if that server specifically responds to the request.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Those variables are looped over in &lt;code&gt;services.tf&lt;/code&gt; by region via &lt;code&gt;for_each = local.services.&amp;lt;REGION&amp;gt;&lt;/code&gt;. This is a nice example of how terraform features let us replicate infrastructure with next to no code duplication.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;dns.tf&lt;/code&gt; defines which service deployments are the primary and the secondary. DNS records are deployed in primary/secondary pairs via the &lt;code&gt;regional_domain/&lt;/code&gt; module, together with a latency record associated with the region the module is deployed to.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="nx"&gt;elb_target_primary&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;domain_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;services_sa&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"sa1"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;nlb_domain_name&lt;/span&gt;
  &lt;span class="nx"&gt;zone_id&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;services_sa&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"sa1"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;nlb_zone_id&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Those are the main components of the code. Feel free to dive into the modules to see how they are implemented. Now let's jump into how to deploy this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deploy
&lt;/h3&gt;

&lt;p&gt;The very first thing needed for this demo is a domain name. If you already have one, remember to configure your domain's registrar to use Route53 as the DNS server once the infrastructure has been deployed.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In my case, I added the name servers assigned to the Hosted Zone as &lt;code&gt;NS&lt;/code&gt; records for the &lt;code&gt;locals.domain_name&lt;/code&gt; on Cloudflare.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And if you don't have one, remember you can also buy one from AWS itself, as shown in the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foyx4aft78x50jhl4rscy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foyx4aft78x50jhl4rscy.png" alt="Buy domain name from AWS" width="800" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Just remember that this will automatically create a Hosted Zone for the domain you bought. You will need to change it to make use of the new one created by this project.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In order to run this project you essentially only need terraform. However, I highly suggest installing it via &lt;a href="https://asdf-vm.com/guide/getting-started.html" rel="noopener noreferrer"&gt;&lt;code&gt;asdf&lt;/code&gt;&lt;/a&gt;, as it allows you to automate the installation of several other tools and keep multiple versions of them installed at the same time.&lt;/p&gt;

&lt;p&gt;Once &lt;code&gt;asdf&lt;/code&gt; is installed, terraform can be installed in its correct version via&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;asdf plugin-add terraform https://github.com/asdf-community/asdf-hashicorp.git
asdf &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This project obviously also requires that you have properly configured access to your AWS account. If you have configured the credentials using a &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html#cli-configure-files-format-profile" rel="noopener noreferrer"&gt;file with a profile name&lt;/a&gt;, the only thing needed is to change the profile name in the &lt;code&gt;providers.tf&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;And lastly, change the domain name defined in &lt;code&gt;locals.tf&lt;/code&gt; to your own.&lt;/p&gt;

&lt;p&gt;With that, you can run the following command:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cd terraform &amp;amp;&amp;amp; terraform init &amp;amp;&amp;amp; terraform apply&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;You should see the output:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Plan: 123 to add, 0 to change, 0 to destroy.&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The deployment of all resources can take from 5 to around 10 minutes.&lt;/p&gt;

&lt;p&gt;After all has been deployed, you can already try to reach the service via &lt;code&gt;www.&amp;lt;DOMAIN-NAME&amp;gt;&lt;/code&gt;, in my case, &lt;code&gt;www.service.dkneipp.com&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You should see &lt;code&gt;sa1-service&lt;/code&gt; or &lt;code&gt;eu1-service&lt;/code&gt;, depending on the region you are currently in 😉. A &lt;code&gt;dig&lt;/code&gt; command also lets us identify which region is responding to the request.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Web UI&lt;/th&gt;
&lt;th&gt;DNS Resolution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frqwyikz5ua0yw3bjf01y.png" alt="Web service" width="800" height="391"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqya59m9pbddin0xe05y1.png" alt="Dig" width="800" height="513"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Also, the IP returned by the DNS resolution should match that of the NLB in the primary AZ of the region you are closest to.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 There is only one IP returned because the NLB has only one AZ associated. This ensures the traffic always goes to the designated AZ unless a failure happens in that AZ.&lt;/p&gt;
&lt;/blockquote&gt;
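&lt;p&gt;Conceptually, the latency-based resolution happening here just picks the regional record with the lowest measured latency to the client. Below is an illustrative toy model in python (the function name and latency numbers are made up; Route53's real algorithm is more involved):&lt;/p&gt;

```python
# Toy model of Route53 latency-based routing: each region owns one record
# pointing at its single-AZ NLB, and the record with the lowest measured
# latency to the client is the one returned.
def pick_latency_record(records, client_latencies_ms):
    # records: region name mapped to the NLB IP of that region
    # client_latencies_ms: region name mapped to the measured latency
    best_region = min(records, key=lambda region: client_latencies_ms[region])
    return records[best_region]
```

&lt;p&gt;With this model, a client measured closer to South America gets the South American NLB's IP, while a European client gets the European one.&lt;/p&gt;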

&lt;p&gt;And by using another computer closer to the other region, we can see the response from that region.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Web UI&lt;/th&gt;
&lt;th&gt;DNS Resolution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fteha4kfmy57mo29xevf9.png" alt="Web service SA" width="" height=""&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feon63pdl40fjxmttf7gn.png" alt="Dig SA" width="794" height="466"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Test failover
&lt;/h3&gt;

&lt;p&gt;In order to test whether the failover from one AZ to the other happens as expected, remove the instance from the target group associated with the primary NLB of the region you want to test. The image below shows how the instance can be removed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd8hbfpj4t8n2chcpnvqf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd8hbfpj4t8n2chcpnvqf.png" alt="AWS Console" width="800" height="568"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also identify the correct target group by checking the listener of the primary NLB. The listener will be forwarding the traffic to the target group that should be changed.&lt;/p&gt;

&lt;p&gt;After this, the NLB has no healthy targets left and is itself considered unhealthy. This status is reported to Route53, which automatically starts resolving DNS requests to the secondary NLB.&lt;/p&gt;
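&lt;p&gt;This health-driven failover can be sketched as pure logic: an NLB with no healthy targets is reported unhealthy, and resolution falls back to the secondary record. A toy model in python (the names are illustrative, not real Route53 or NLB APIs):&lt;/p&gt;

```python
# Toy model of the AZ failover described above. An NLB is healthy while at
# least one registered target passes its health checks; Route53 answers with
# the primary record only while its health check passes.
def nlb_is_healthy(target_states):
    return any(state == "healthy" for state in target_states)

def resolve(primary, secondary):
    # primary / secondary: {"ip": str, "targets": [state, ...]}
    if nlb_is_healthy(primary["targets"]):
        return primary["ip"]
    return secondary["ip"]
```

&lt;p&gt;Deregistering the instance empties the primary's target list, so resolution flips to the secondary IP, which is exactly what the next screenshots show.&lt;/p&gt;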

&lt;p&gt;Wait around 2 minutes, and you should see the following while accessing the same &lt;code&gt;www.service.dkneipp.com&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Web UI&lt;/th&gt;
&lt;th&gt;DNS Resolution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fza43aur4pb0e24lj2wmz.png" alt="Web service failover" width="800" height="426"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu8hfqfiya9mmfuv0vmxv.png" alt="Dig failover" width="800" height="513"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Now, the secondary server is responding to the traffic, as we can identify from the Web UI response and the fact that the IP of the server changed.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Note: since different regions effectively serve different applications, this could also be used as a way to roll out service upgrades segmented by region.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And now, if you add the instance back to the target group of the primary NLB, in a couple of minutes you should see the previous response back again.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cleanup
&lt;/h3&gt;

&lt;p&gt;A simple &lt;code&gt;terraform destroy&lt;/code&gt; should delete all 123 resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This project shows a step-by-step design evolution from a basic web server available on the web, to a resilient design capable of handling failures in data centers and keeping response times low for users in different parts of the globe.&lt;/p&gt;

&lt;p&gt;A lot more can still be done, of course, like using multiple servers in the same AZ to handle more load, or adopting a microservice approach with Kubernetes or AWS ECS to handle the web-server code deployment.&lt;/p&gt;

&lt;p&gt;However, the goal of this project is to show some interesting AWS Route53 and NLB features we can use to build a fast, reliable, and cost-effective web service spread across different parts of the world.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>infrastructureascode</category>
      <category>dns</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Your own Stable Diffusion endpoint with AWS SageMaker</title>
      <dc:creator>Daniel Kneipp</dc:creator>
      <pubDate>Thu, 13 Oct 2022 20:49:09 +0000</pubDate>
      <link>https://dev.to/aws-builders/your-own-stable-diffusion-endpoint-with-aws-sagemaker-1534</link>
      <guid>https://dev.to/aws-builders/your-own-stable-diffusion-endpoint-with-aws-sagemaker-1534</guid>
      <description>&lt;p&gt;&lt;a href="https://stability.ai/blog/stable-diffusion-public-release" rel="noopener noreferrer"&gt;Stable Diffusion&lt;/a&gt; is the name of a Deep Learning model created by stability.ai that allows you to generate images from their description. In short, you feed a textual scene description to the model and it returns an image that fits that description. What is cool about it is that it can generate very artistic and arguable beautiful images resembling pieces of art like the following&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqz2syxl5qvzw8as3ogwq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqz2syxl5qvzw8as3ogwq.png" alt="Taken from official Stable Diffusion release page" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fke9w2lgbgcoyit52m078.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fke9w2lgbgcoyit52m078.png" alt="Taken from official Stable Diffusion release page" width="800" height="590"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://github.com/DanielKneipp/aws-sagemaker-stable-diffusion" rel="noopener noreferrer"&gt;aws-sagemaker-stable-diffusion repo&lt;/a&gt; you will find everything needed in order to spin-up your own personal public endpoint with a Stable Diffusion model deployed using AWS SageMaker to show your friends 😎&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
Your own Stable Diffusion endpoint with AWS SageMaker

&lt;ul&gt;
&lt;li&gt;TL;DR&lt;/li&gt;
&lt;li&gt;Setting things up&lt;/li&gt;
&lt;li&gt;Trying the model locally&lt;/li&gt;
&lt;li&gt;Going to the cloud&lt;/li&gt;
&lt;li&gt;Initial setup&lt;/li&gt;
&lt;li&gt;Deploy on AWS using SageMaker&lt;/li&gt;
&lt;li&gt;Lambda + API Gateway&lt;/li&gt;
&lt;li&gt;Clean-up&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Assuming you already have &lt;code&gt;asdf&lt;/code&gt; and &lt;code&gt;pyenv&lt;/code&gt; installed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Define the bucket name (where the model will be sent) in &lt;code&gt;./terraform/variables.tf&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Run:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Here you will need to provide your huggingface credentials in order to confirm&lt;/span&gt;
&lt;span class="c"&gt;# the model license has been accepted&lt;/span&gt;
&lt;span class="nv"&gt;INSTALL_TOOLING&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true &lt;/span&gt;bash setup.sh

&lt;span class="nb"&gt;cd &lt;/span&gt;terraform/ &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; terraform init &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; terraform apply &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd&lt;/span&gt; ../
&lt;span class="nb"&gt;cd &lt;/span&gt;sagemaker/ &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; bash zip-model.sh &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd&lt;/span&gt; ../
&lt;span class="nb"&gt;cd &lt;/span&gt;lambda/sd-public-endpoint/ &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; bash deploy.sh &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd&lt;/span&gt; ../../
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The endpoint that appears in the output can be used for inference. Here's how to use it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;endpoint&amp;gt;/&lt;/code&gt;              -&amp;gt; Generates a random description and feeds it into the model&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;endpoint&amp;gt;/default&lt;/code&gt;       -&amp;gt; Uses the default description "a photo of an astronaut riding a horse on mars"&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;endpoint&amp;gt;/&amp;lt;description&amp;gt;&lt;/code&gt; -&amp;gt; Uses the &lt;code&gt;&amp;lt;description&amp;gt;&lt;/code&gt; as input for the model. You can use spaces here.&lt;/li&gt;
&lt;/ul&gt;
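&lt;p&gt;The routing above can be sketched as a small pure function. This is only an illustration of the described behavior; the helper name, the random subject list, and the URL-decoding detail are assumptions, and the actual chalice app in &lt;code&gt;lambda/sd-public-endpoint/&lt;/code&gt; may differ:&lt;/p&gt;

```python
# Illustrative sketch of the endpoint's path-to-prompt routing rules.
import random

DEFAULT_PROMPT = "a photo of an astronaut riding a horse on mars"
RANDOM_SUBJECTS = [
    "a watercolor painting of a lighthouse at dawn",
    "a robot reading a book in a library",
]

def path_to_prompt(path):
    path = path.strip("/")
    if path == "":
        # bare endpoint: generate a random description
        return random.choice(RANDOM_SUBJECTS)
    if path == "default":
        return DEFAULT_PROMPT
    # free-form description; spaces arrive URL-encoded
    return path.replace("%20", " ")
```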
&lt;h2&gt;
  
  
  Setting things up
&lt;/h2&gt;

&lt;p&gt;For this repo, you will need these tools:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;awscli&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2.8.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;terraform&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1.3.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;python&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;3.9.13&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For python, it's recommended to use &lt;a href="https://github.com/pyenv/pyenv-installer" rel="noopener noreferrer"&gt;&lt;code&gt;pyenv&lt;/code&gt;&lt;/a&gt;, which allows you to install several versions of python at the same time with simple commands like this: &lt;code&gt;pyenv install 3.9.13&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;For the rest, you can use a tool called &lt;a href="https://asdf-vm.com/guide/getting-started.html" rel="noopener noreferrer"&gt;&lt;code&gt;asdf&lt;/code&gt;&lt;/a&gt;, which allows basically the same but for several other tools. If you have it installed, you can install the rest with just &lt;code&gt;asdf install&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To be able to clone the repo with the ML model (&lt;code&gt;stable-diffusion-v1-4/&lt;/code&gt;), we will also need &lt;a href="https://git-lfs.github.com/" rel="noopener noreferrer"&gt;&lt;code&gt;git-lfs&lt;/code&gt;&lt;/a&gt;, which allows versioning of large files. The model is defined as a submodule of this repo, and when cloning the submodule you will be asked for your huggingface account credentials. This is to confirm that you have accepted the license required to access the model weights.&lt;/p&gt;

&lt;p&gt;And finally, &lt;a href="https://zlib.net/pigz/" rel="noopener noreferrer"&gt;&lt;code&gt;pigz&lt;/code&gt;&lt;/a&gt; is also necessary as it allows parallel gzip compression. This makes the model packaging for SageMaker run much faster.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;setup.sh&lt;/code&gt; script can do all of this for you. It works on Mac OS X and Debian-based Linux distros. If you already have &lt;code&gt;asdf&lt;/code&gt; and &lt;code&gt;pyenv&lt;/code&gt; installed and want to install the rest, just run&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;INSTALL_TOOLING&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true &lt;/span&gt;bash setup.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, if you already have the tooling, and just want &lt;code&gt;git-lfs&lt;/code&gt; and &lt;code&gt;pigz&lt;/code&gt; with the submodule cloned, please run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bash setup.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Trying the model locally
&lt;/h2&gt;

&lt;p&gt;To better understand how to use the model and the inference code we'll use later on, we first should try to run the model locally. Based on the &lt;a href="https://stability.ai/blog/stable-diffusion-public-release" rel="noopener noreferrer"&gt;official release page&lt;/a&gt; you should be able to access the model weights on their &lt;a href="https://huggingface.co/CompVis/stable-diffusion-v1-4" rel="noopener noreferrer"&gt;huggingface page&lt;/a&gt;. Here we will use version &lt;code&gt;v1.4&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📄 In order to have access to the weights, you have to accept their terms of use&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On the page we can see code samples showing how to run an inference on the model. There you will see examples of a standard inference; however, it's recommended to have an NVIDIA GPU with at least 10GB of VRAM. To let my weaker hardware make an inference, I chose to test the configuration using &lt;code&gt;float16&lt;/code&gt; precision instead, which is the code you can find in &lt;code&gt;local/code.py&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ There is also a &lt;code&gt;local/code-mac.py&lt;/code&gt; available for those who want to try it out on a Mac. However, bear in mind that one inference can take several minutes (~30 min on an M1 Pro), whereas on a GPU it might take 5 seconds using the low-VRAM configuration on an RTX 3070.&lt;/p&gt;
&lt;/blockquote&gt;
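&lt;p&gt;For reference, the &lt;code&gt;float16&lt;/code&gt; configuration boils down to roughly the following sketch (assuming the &lt;code&gt;diffusers&lt;/code&gt; &lt;code&gt;StableDiffusionPipeline&lt;/code&gt; API; actually running it requires &lt;code&gt;torch&lt;/code&gt;, &lt;code&gt;diffusers&lt;/code&gt;, a CUDA GPU, and huggingface credentials with the model license accepted):&lt;/p&gt;

```python
# Sketch of a float16 (low-VRAM) Stable Diffusion inference, roughly what
# local/code.py does. Output path and function name are illustrative.
def generate(prompt, output_path="output/output.png"):
    # Imports live inside the function so the sketch is readable without
    # the heavy dependencies installed.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16,  # roughly halves VRAM usage vs. float32
    )
    pipe = pipe.to("cuda")
    image = pipe(prompt).images[0]
    image.save(output_path)
    return output_path
```

&lt;p&gt;Calling &lt;code&gt;generate("a photo of an astronaut riding a horse on mars")&lt;/code&gt; on suitable hardware writes the generated PNG to the output path.&lt;/p&gt;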

&lt;p&gt;On my local computer, I used &lt;a href="https://github.com/NVIDIA/nvidia-docker" rel="noopener noreferrer"&gt;&lt;code&gt;nvidia-docker&lt;/code&gt;&lt;/a&gt; to run the code inside a container. This way I don't have to worry about installing the right version of CUDA (matching pytorch) on my machine.&lt;/p&gt;

&lt;p&gt;So, going to the fun part, to make an inference, you just need to run &lt;code&gt;cd local &amp;amp;&amp;amp; bash build-run.sh&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;All scripts in this repo assume they are executed from their own directories.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The whole execution can take several minutes the first time, as it builds the container image with the model inside. After everything has finished, you should be able to see the generated image at &lt;code&gt;local/output/output.png&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You might see something like this on the default description of &lt;code&gt;"a photo of an astronaut riding a horse on mars. VFX, octane renderer"&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvc4i0g2lq6raoll4kmf2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvc4i0g2lq6raoll4kmf2.png" alt="Stable Diffusion local output" width="512" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And you can change the description by editing the &lt;code&gt;prompt&lt;/code&gt; variable in &lt;code&gt;local/code.py&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Going to the cloud
&lt;/h2&gt;

&lt;p&gt;Cool, so now that we have the inference code working, it's time to put it in the cloud to, ultimately, make it available to others. Let's start with the resources we'll need and the overall architecture.&lt;/p&gt;

&lt;p&gt;Firstly, about how the ML model will be executed: although we could simply spin up an EC2 instance and attach a web server to it to receive requests, we will use &lt;a href="https://aws.amazon.com/pt/sagemaker/" rel="noopener noreferrer"&gt;AWS SageMaker&lt;/a&gt;, which lets us do exactly that, and much more, in a managed way (meaning several components, e.g. the web-server implementation, are managed by AWS). SageMaker will manage the GPU-backed EC2 instance and give us a &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-deployment.html" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;private&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt; endpoint to interact with the model. Since we are using huggingface, the two have a nice integration, and you can learn more about it from the &lt;a href="https://huggingface.co/docs/sagemaker/inference" rel="noopener noreferrer"&gt;huggingface docs about SageMaker&lt;/a&gt; or the &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/hugging-face.html" rel="noopener noreferrer"&gt;AWS docs about huggingface&lt;/a&gt; 🤝.&lt;/p&gt;

&lt;p&gt;However, one thing to note is that SageMaker provides a private endpoint, only accessible if you have access to the AWS account associated with the resource. Since we want a public endpoint, we need to put on top of it a &lt;a href="https://aws.amazon.com/lambda/?nc1=h_ls" rel="noopener noreferrer"&gt;lambda&lt;/a&gt; that forwards requests from an &lt;a href="https://aws.amazon.com/api-gateway/" rel="noopener noreferrer"&gt;API Gateway&lt;/a&gt;. This combination follows the recommended approach from an official &lt;a href="https://aws.amazon.com/blogs/machine-learning/call-an-amazon-sagemaker-model-endpoint-using-amazon-api-gateway-and-aws-lambda/" rel="noopener noreferrer"&gt;AWS blog post&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ One important limitation of API Gateway to note is the hard &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/limits.html#http-api-quotas" rel="noopener noreferrer"&gt;30-second timeout&lt;/a&gt; on requests. Because of this, the same low-VRAM configuration (&lt;code&gt;float16&lt;/code&gt; precision) was needed to guarantee a lower response time.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Putting it all together, the architecture goes like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm8f4o5d3387on7qvf1e6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm8f4o5d3387on7qvf1e6.png" alt="Overall architecture" width="758" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The diagram shows all the resources needed and which directory manages each of them. Here is a brief description:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Directory&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;lambda&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Has the lambda + API Gateway implementation using a framework called &lt;a href="https://aws.github.io/chalice/#" rel="noopener noreferrer"&gt;chalice&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sagemaker&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Contains the python code to manage the SageMaker Model, Endpoint, and the custom inference code. It also has the script used to pack and send the model to an S3 bucket&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;terraform&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Manages the S3 bucket itself and the IAM roles required by the lambda code (to access the SageMaker endpoint) and for the Sagemaker endpoint (to access the model on the S3 bucket)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Initial setup
&lt;/h3&gt;

&lt;p&gt;Before anything, we need to deploy the roles and the S3 bucket mentioned in order to set the stage to the rest of the resources. You can configure the bucket name used in the file &lt;code&gt;terraform/variables.tf&lt;/code&gt;. After that, you can run (assuming that you already have the access to your AWS account properly configured):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;terraform
terraform init
terraform apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In total, 11 resources should be created. In this process, a couple of &lt;code&gt;.txt&lt;/code&gt; files will be created as well. Those files are used by other parts of this repo to get the AWS ARNs and the bucket name.&lt;/p&gt;

&lt;p&gt;A brief explanation of the roles is as follows:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Name on IAM&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SageMaker Endpoint Access&lt;/td&gt;
&lt;td&gt;&lt;code&gt;lambda-sagemaker-access&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;This role basically allows the lambda to execute properly (with the &lt;code&gt;AWSLambdaBasicExecutionRole&lt;/code&gt; managed policy) and also allows it to invoke a SageMaker endpoint&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SageMaker Full Access&lt;/td&gt;
&lt;td&gt;&lt;code&gt;sagemaker-admin&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;This one has the &lt;code&gt;AmazonSageMakerFullAccess&lt;/code&gt; managed policy attached to it, which allows, &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html#sagemaker-roles-create-execution-role" rel="noopener noreferrer"&gt;among other things&lt;/a&gt;, the SageMaker endpoint to access the S3 bucket to load the model, and also to publish its own logs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Deploy on AWS using SageMaker
&lt;/h3&gt;

&lt;p&gt;With the IAM roles and the S3 bucket in place, now it's time to create the SageMaker endpoint itself. For that, we use the python &lt;a href="https://github.com/aws/sagemaker-huggingface-inference-toolkit" rel="noopener noreferrer"&gt;package from AWS&lt;/a&gt; which allows us to use Transformers models with the huggingface SDK.&lt;/p&gt;

&lt;p&gt;However, since we will use a diffusion model rather than a transformer, the default inference code needs to be overridden to use the &lt;a href="https://github.com/huggingface/diffusers" rel="noopener noreferrer"&gt;&lt;code&gt;diffusers&lt;/code&gt;&lt;/a&gt; package (also from huggingface), just like we do in &lt;code&gt;local/code.py&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/aws/sagemaker-huggingface-inference-toolkit#-user-defined-codemodules" rel="noopener noreferrer"&gt;package readme&lt;/a&gt; has general information about overriding the inference code, and there is also an example in &lt;a href="https://github.com/huggingface/notebooks/blob/main/sagemaker/17_custom_inference_script/sagemaker-notebook.ipynb" rel="noopener noreferrer"&gt;this jupyter notebook&lt;/a&gt;. We do what is necessary via the files inside &lt;code&gt;sagemaker/code&lt;/code&gt;, which contain the inference code following SageMaker requirements and a &lt;code&gt;requirements.txt&lt;/code&gt; with the dependencies that will be installed when the endpoint gets created.&lt;/p&gt;
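&lt;p&gt;A custom inference script using the toolkit's override hooks looks roughly like the sketch below (assuming the &lt;code&gt;model_fn&lt;/code&gt;/&lt;code&gt;predict_fn&lt;/code&gt; hook names from the toolkit readme; the actual files in &lt;code&gt;sagemaker/code&lt;/code&gt; may be structured differently):&lt;/p&gt;

```python
# Sketch of a custom inference script overriding the huggingface toolkit
# defaults to load a diffusers pipeline instead of a transformers model.
def model_fn(model_dir):
    # Runs once at endpoint start-up: load the pipeline from the
    # unpacked model artifact.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_dir, torch_dtype=torch.float16)
    return pipe.to("cuda")

def predict_fn(data, pipe):
    # Runs per request: turn the prompt into a PNG, returned base64-encoded
    # so it survives the JSON response. Payload keys are illustrative.
    import base64
    import io

    prompt = data.get("inputs", "a photo of an astronaut riding a horse on mars")
    image = pipe(prompt).images[0]
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return {"image": base64.b64encode(buffer.getvalue()).decode("utf-8")}
```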

&lt;p&gt;With the inference code ready, it's time to ship the model with the code to the created S3 bucket (as the SageMaker endpoint will access the model via this bucket). For this, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;sagemaker/
bash zip-model.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Keep in mind that this process will send around 4.2GB of data to S3. Just make sure that's a cost you are willing to pay 😉&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;With that, we should have what we need in the bucket. Now, let's create the SageMaker endpoint. To manage the endpoint, as stated before, we use the &lt;a href="https://github.com/aws/sagemaker-huggingface-inference-toolkit" rel="noopener noreferrer"&gt;SageMaker toolkit for huggingface&lt;/a&gt;. As such, we have three Python scripts to create, use, and delete the endpoint.&lt;/p&gt;
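&lt;p&gt;Roughly, the creation script boils down to a few SDK calls. Here is a minimal sketch, assuming the &lt;code&gt;sagemaker&lt;/code&gt; SDK; the S3 URI, framework versions, and instance type are assumptions, not the repo's exact values:&lt;/p&gt;

```python
# Hypothetical sketch of sagemaker-create-endpoint.py
def save_endpoint_name(name, path="endpoint-name.txt"):
    # Persist the endpoint name so the use/delete scripts can reference it.
    with open(path, "w") as f:
        f.write(name)


def create_endpoint(model_s3_uri, role_arn):
    # Lazy import: only needed when actually deploying to AWS.
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(
        model_data=model_s3_uri,      # s3://BUCKET/model.tar.gz (model + code/)
        role=role_arn,                # the SageMaker execution role
        transformers_version="4.17",  # assumed versions; check the repo for the real ones
        pytorch_version="1.10",
        py_version="py38",
    )
    predictor = model.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge")
    save_endpoint_name(predictor.endpoint_name)
    return predictor
```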

&lt;p&gt;So let's go ahead and create the endpoint with&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements  &lt;span class="c"&gt;# To install the sagemaker pkg&lt;/span&gt;
python sagemaker-create-endpoint.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After this, the endpoint will be created, along with a file &lt;code&gt;endpoint-name.txt&lt;/code&gt; that the other Python scripts use to keep a reference to it. When the endpoint is ready, you will be able to see it in the AWS console, as in the following image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkeqtzpnqpx531am4nx8d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkeqtzpnqpx531am4nx8d.png" alt="AWS console with SageMaker endpoints" width="800" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we should be ready to run an inference. To do it, execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bash sagemaker-use-endpoint.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
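&lt;p&gt;Under the hood, this amounts to an &lt;code&gt;invoke_endpoint&lt;/code&gt; call against the SageMaker runtime. A hedged sketch (the payload shape and output path are assumptions matching the custom handler described above):&lt;/p&gt;

```python
# Hypothetical sketch of sagemaker-use-endpoint.py
import base64
import json


def build_payload(prompt):
    # Matches the {"inputs": ...} shape expected by the custom predict_fn.
    return json.dumps({"inputs": prompt}).encode("utf-8")


def run_inference(prompt, endpoint_file="endpoint-name.txt"):
    import boto3  # lazy import: only needed when talking to AWS

    with open(endpoint_file) as f:
        endpoint_name = f.read().strip()
    client = boto3.client("sagemaker-runtime")
    resp = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_payload(prompt),
    )
    body = json.loads(resp["Body"].read())
    # Decode the base64-encoded image returned by the endpoint.
    with open("output/image.jpg", "wb") as f:
        f.write(base64.b64decode(body["image"]))
```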



&lt;p&gt;And we should get a cool image like this one in &lt;code&gt;sagemaker/output/image.jpg&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xc6b2h9fkn1lmqdzgv6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xc6b2h9fkn1lmqdzgv6.png" alt="Darth vader dancing on top of the millennium falcon" width="512" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And it's nice that, with just this amount of code, we now have a deployed ML model server with metrics, logs, health checks, etc. available, as we can see in the following images.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metrics&lt;/th&gt;
&lt;th&gt;Logs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi56y85lyy4onjugn7mpq.png" alt="AWS SageMaker metrics" width="800" height="473"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy1vc30uy14i24tedb0b1.png" alt="AWS SageMaker logs" width="800" height="473"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Yes, to get that cool image from Darth Vader, lots of attempts were required 😅&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So far so good: we have a Python script capable of running inference on the Stable Diffusion model, but this script interacts directly with your AWS resources. To achieve public access, it's time to work on the Lambda + API Gateway combo&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda + API Gateway
&lt;/h3&gt;

&lt;p&gt;For this part we have a couple of options to choose from, like &lt;a href="https://www.serverless.com/framework/docs/providers/aws/events/http-api" rel="noopener noreferrer"&gt;serverless&lt;/a&gt; and &lt;a href="https://github.com/Miserlou/Zappa?ref=thechiefio" rel="noopener noreferrer"&gt;Zappa&lt;/a&gt;, both serverless frameworks that would allow us to deploy a serverless app (fulfilling the Lambda + API Gateway combo).&lt;/p&gt;

&lt;p&gt;However, for this project we will go with &lt;a href="https://aws.github.io/chalice/index.html" rel="noopener noreferrer"&gt;chalice&lt;/a&gt;, a serverless framework for Python designed by AWS itself that resembles the API design of &lt;a href="https://flask.palletsprojects.com/en/2.2.x/" rel="noopener noreferrer"&gt;Flask&lt;/a&gt;. Since I had never heard of it before, I decided to give it a try.&lt;/p&gt;
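&lt;p&gt;To get a feel for that Flask-like API, here is a rough sketch of what such a &lt;code&gt;chalice&lt;/code&gt; app could look like (hypothetical, not the repo's exact &lt;code&gt;app.py&lt;/code&gt;; the route names and handler bodies are assumptions):&lt;/p&gt;

```python
# Hypothetical sketch of lambda/sd-public-endpoint/app.py
from urllib.parse import unquote

try:
    from chalice import Chalice  # available after installing the app requirements
except ImportError:  # keep the sketch importable without chalice installed
    Chalice = None


def normalize_prompt(raw):
    # API Gateway URL-encodes path segments, so spaces arrive as %20.
    return unquote(raw)


def build_app():
    app = Chalice(app_name="sd-public-endpoint")

    @app.route("/")
    def index():
        # Home page, just for testing that the app is up.
        return {"status": "ok"}

    @app.route("/inference/{text}")
    def inference(text):
        prompt = normalize_prompt(text)
        # ...here the handler would call the SageMaker endpoint and return the image...
        return {"prompt": prompt}

    return app
```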

&lt;p&gt;The code for the serverless app is all inside &lt;code&gt;lambda/&lt;/code&gt;. To manage the &lt;code&gt;chalice&lt;/code&gt; app, run&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;lambda
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="nb"&gt;cd &lt;/span&gt;sd-public-endpoint
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way, you will have the chalice tool installed and will be able to deploy the app with&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
bash deploy.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script will get the proper IAM role and deploy the app.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ You need to install the dependencies of the serverless app locally because they are packaged and sent with the app code when the lambda is published&lt;/p&gt;

&lt;p&gt;⚠️ Just a reminder that API Gateway has a non-configurable 30-second timeout that might impact your inferences depending on the configuration of the model. However, it seems there is a &lt;a href="https://stackoverflow.com/a/71778537/4097211" rel="noopener noreferrer"&gt;workaround for API Gateway&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now you should have a public endpoint to access the model, and you should be able to use the app like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Page&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Image&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/api/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The home page, just for testing&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffzhn4yxqer2cc9ylgxel.png" alt="App Index" width="800" height="465"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/api/inference/&amp;lt;text&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;You can type any text replacing &lt;code&gt;&amp;lt;text&amp;gt;&lt;/code&gt; and it will become the input for the model. Go wild 🦄 (and you can use spaces)&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqniymw02bru5djizogo.gif" alt="App manual inference" width="" height=""&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/api/inference/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;And that's where the fun actually begins!! If you leave &lt;code&gt;&amp;lt;text&amp;gt;&lt;/code&gt; empty, a &lt;a href="https://pypi.org/project/essential-generators/" rel="noopener noreferrer"&gt;random sentence generator&lt;/a&gt; will be used to automatically generate the input for the model. Let programs "talk" to each other, and you will be able to see the sentence that was used below the image&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxfk9b3jfh4ghlmddrb4.png" alt="App Random Inference" width="800" height="473"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Clean-up
&lt;/h2&gt;

&lt;p&gt;To clean up everything (and avoid unwanted AWS costs 💸), you can do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete the serverless app&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;lambda/sd-public-endpoint/
chalice delete

&lt;span class="c"&gt;# Delete the SageMaker resources&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ../../sagemaker
python sagemaker-delete-endpoint.py
bash delete-model.sh

&lt;span class="nb"&gt;cd&lt;/span&gt; ../terraform/
terraform destroy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you will only need to delete the related CloudWatch Log groups via the console. Those will be &lt;code&gt;/aws/lambda/sd-public-endpoint-dev&lt;/code&gt; and &lt;code&gt;/aws/sagemaker/Endpoints/huggingface-pytorch-inference&lt;/code&gt;&lt;/p&gt;
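&lt;p&gt;If you would rather script that last step than click through the console, the same clean-up can be done with a few lines of &lt;code&gt;boto3&lt;/code&gt; (a sketch, assuming default AWS credentials are configured):&lt;/p&gt;

```python
# Hypothetical helper to remove the leftover CloudWatch log groups
LOG_GROUPS = [
    "/aws/lambda/sd-public-endpoint-dev",
    "/aws/sagemaker/Endpoints/huggingface-pytorch-inference",
]


def delete_log_groups(groups=LOG_GROUPS):
    import boto3  # lazy import: only needed when talking to AWS

    logs = boto3.client("logs")
    for name in groups:
        # delete_log_group raises ResourceNotFoundException if the group is already gone
        logs.delete_log_group(logGroupName=name)
```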

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;So here we created our own personal cloud deployment of the &lt;a href="https://www.businessinsider.com/stable-diffusion-stability-ai-1b-funding-round-midjourney-dalle-openai-2022-10?international=true&amp;amp;r=US&amp;amp;IR=T" rel="noopener noreferrer"&gt;popular&lt;/a&gt; Stable Diffusion ML model and played a little with several cloud resources in the process.&lt;/p&gt;

&lt;p&gt;The architecture works fine to play around with, but an &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference.html" rel="noopener noreferrer"&gt;asynchronous implementation&lt;/a&gt; might become necessary to allow inferences that take more than 30 seconds, and also for batch processing.&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>deeplearning</category>
      <category>aws</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>App with self-contained infrastructure on AWS</title>
      <dc:creator>Daniel Kneipp</dc:creator>
      <pubDate>Sun, 02 Oct 2022 20:39:50 +0000</pubDate>
      <link>https://dev.to/aws-builders/app-with-self-contained-infrastructure-on-aws-l22</link>
      <guid>https://dev.to/aws-builders/app-with-self-contained-infrastructure-on-aws-l22</guid>
      <description>&lt;p&gt;Here is an example app with self-contained infrastructure-as-code. With less places to go, the developers has access to not only to the app, but also relevant parts of the infrastructure used to deploy it, allowing him/her to evaluate and change the deployment configuration.&lt;/p&gt;

&lt;p&gt;The repo with the implementation is available at &lt;a href="https://github.com/DanielKneipp/aws-self-infra-app" rel="noopener noreferrer"&gt;https://github.com/DanielKneipp/aws-self-infra-app&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
App with self-contained infrastructure on AWS

&lt;ul&gt;
&lt;li&gt;Context&lt;/li&gt;
&lt;li&gt;Overall Architecture&lt;/li&gt;
&lt;li&gt;Platform Side&lt;/li&gt;
&lt;li&gt;
Giving Github access to AWS

&lt;ul&gt;
&lt;li&gt;How to get thumbprint value&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Giving AWS App Runner Access to Github&lt;/li&gt;

&lt;li&gt;Wrap-up&lt;/li&gt;

&lt;li&gt;Development Side&lt;/li&gt;

&lt;li&gt;Code quality checks&lt;/li&gt;

&lt;li&gt;Sync infrastructure&lt;/li&gt;

&lt;li&gt;

Trigger the app deployment

&lt;ul&gt;
&lt;li&gt;Why API-triggered deployment?&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Putting everything together&lt;/li&gt;

&lt;li&gt;Update procedure&lt;/li&gt;

&lt;li&gt;About how to refine the AWS role for Github&lt;/li&gt;

&lt;li&gt;Clean-up&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Context
&lt;/h2&gt;

&lt;p&gt;More often than not, we see developers in companies struggling to understand how their apps are actually being deployed. Usually they're limited to their local deployments with tools like &lt;code&gt;docker-compose&lt;/code&gt; and testing environments, which can diverge significantly from production environments (to which they sometimes don't even have access).&lt;/p&gt;

&lt;p&gt;At the same time, the platform/infrastructure team responsible for preparing the production environments doesn't have specific knowledge about individual apps, forcing them to apply sensible defaults that might work, but won't be optimized for each app.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This creates the infamous Dev and Ops silos, where those two teams struggle to communicate to achieve production-ready and optimized deployments.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;DevOps practices should help avoid this exact situation. Through automation, here we show how an application deployment can be managed by developers directly from their own app repository.&lt;/p&gt;

&lt;p&gt;This brings the infrastructure and deployment configuration closer to where the know-how about the app is: the development team. This way they can take ownership and control not only of the app itself, but of how it should be deployed, including which cloud resources it needs, in an independent fashion.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overall Architecture
&lt;/h2&gt;

&lt;p&gt;The way we're going to achieve that is by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Preparing a base infrastructure that allows automation pipelines to manage resources in the cloud&lt;/li&gt;
&lt;li&gt;Creating a pipeline that allows the management of the necessary cloud resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows us to have a clear separation of duties between teams (the platform team manages cloud governance and the development team manages the workloads), while giving the development team arguably full control of their app (code and infrastructure).&lt;/p&gt;

&lt;p&gt;Translating this strategy to the directories we have in this repo, this is what we have:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+---------------+    +------------------+
| Development   |    | Platform         |
|               |    |                  |
|   +---------+ |    |   +------------+ |
|   | aws     | |    |   | terraform  | |
|   +---------+ |    |   +------------+ |
|               |    |                  |
|   +---------+ |    +------------------+
|   | app     | |
|   +---------+ |
|               |
|   +---------+ |
|   | .github | |
|   +---------+ |
|               |
+---------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The breakdown is:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Directory&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;terraform/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Contains the necessary infrastructure to allow the Github Workflow (our automation tool for this example) to access and manage resources in AWS (our cloud provider). It usually would be in an independent repo managed by the platform team, but we will keep it here for simplicity.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;aws/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Has the Cloudformation stack definition (the cloud resources management tool used by the development team, which can, and in this case will, diverge from the tool used by the platform team). There we can see the cloud resources used by the app (an &lt;a href="https://aws.amazon.com/pt/apprunner/" rel="noopener noreferrer"&gt;AWS App Runner&lt;/a&gt; service), the compute resources required, the listening port, and how it should be built and executed.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;app/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The app itself, which is just a python Front-end app that has one static index page and one dynamic page.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.github/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Contains the Github workflow used to sync the code in the &lt;code&gt;main&lt;/code&gt; branch (app and its infrastructure) with the AWS account.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In a nutshell, the way that it works is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Github workflow assumes an AWS role provided by the platform team&lt;/li&gt;
&lt;li&gt;The workflow deploys and keeps the Cloudformation stack with the App Runner service up to date&lt;/li&gt;
&lt;li&gt;And if something changes in the app code, the workflow also triggers the service redeployment via the AWS API.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are more of a visual learner, here is a diagram of how the whole integration is supposed to work, together with the separation of duties:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4n8sw14n43kr6qnrpmzc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4n8sw14n43kr6qnrpmzc.png" alt="Architecture diagram" width="711" height="672"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So let's jump into the implementation details!&lt;/p&gt;

&lt;h2&gt;
  
  
  Platform Side
&lt;/h2&gt;

&lt;p&gt;As stated before, the platform team's responsibility will be to provide a way for the Github workflow to manage the app's infrastructure. Also, one thing that hasn't been mentioned yet: App Runner needs access to the code repo in order to deploy the app. Both will be addressed by the platform team.&lt;/p&gt;

&lt;h3&gt;
  
  
  Giving Github access to AWS
&lt;/h3&gt;

&lt;p&gt;Following security best practices, we will use an AWS role assumed by the workflow to interact with the cloud resources. This avoids the usage of static credentials, which could be leaked and would require the time-consuming manual task of being rotated regularly.&lt;/p&gt;

&lt;p&gt;To give permissions to the workflow without the use of any kind of static credentials, we will set up on AWS an identity provider that will allow the workflows of this repo (&lt;code&gt;main&lt;/code&gt; branch specifically) to assume a role on AWS. This restriction is defined as a role condition that verifies the value of the &lt;code&gt;token.actions.githubusercontent.com:sub&lt;/code&gt; attribute (&lt;code&gt;terraform/iam-gh.tf&lt;/code&gt; file). Patterns can be used to grant broader permissions to all branches or to all repos of the owner. You can go to the &lt;a href="https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services" rel="noopener noreferrer"&gt;Github docs&lt;/a&gt; to learn more about the OIDC configurations (and &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-idp_oidc.html" rel="noopener noreferrer"&gt;here&lt;/a&gt; is also the AWS documentation about the subject)&lt;/p&gt;

&lt;p&gt;With the role defined, it's only a matter of defining a policy that will allow the workflow to do its job and associate it with the role. In this example, we're giving very broad permissions (basically Cloudformation and App Runner administrator permissions) to keep the policy small and simple.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;However, in practice, you would want to restrict this policy to specific actions (maybe deny &lt;code&gt;Delete*&lt;/code&gt; actions) and to resources of a specific app (using pattern matching to only allow the manipulation of a Cloudformation stack that has a specific name)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To put everything into action, just run &lt;code&gt;cd terraform/ &amp;amp;&amp;amp; terraform apply&lt;/code&gt;. 6 resources will be created in total. The one value you will need is the output &lt;code&gt;gh_role_arn&lt;/code&gt;, which is the ARN of the role and will be used by the workflow for authentication. You will be able to see the ARN after applying the code, or you can run &lt;code&gt;terraform output gh_role_arn&lt;/code&gt; after the fact. This value needs to be shared with the development team so they can configure the workflow properly.&lt;/p&gt;

&lt;h4&gt;
  
  
  How to get thumbprint value
&lt;/h4&gt;

&lt;p&gt;One configuration that is worth a bit of explanation is the thumbprint that needs to be specified for the OIDC identity provider. It is basically the fingerprint of Github's certificate: AWS needs it in order to trust Github as an identity provider (given the current certificate used by Github's domain).&lt;/p&gt;

&lt;p&gt;AWS has a &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_create_oidc_verify-thumbprint.html" rel="noopener noreferrer"&gt;step-by-step guide&lt;/a&gt; on how to get this value. And here is an automated script for the process described there:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# The issuer can be obtained at https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/about-security-hardening-with-openid-connect&lt;/span&gt;
&lt;span class="nv"&gt;issuer_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'https://token.actions.githubusercontent.com/.well-known/openid-configuration'&lt;/span&gt;
&lt;span class="nv"&gt;servername&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;issuer_url&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.jwks_uri'&lt;/span&gt; | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s1"&gt;'s|^[^/]*//||'&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s1"&gt;'s|/.*$||'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;cert_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; | openssl s_client &lt;span class="nt"&gt;-servername&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;servername&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-showcerts&lt;/span&gt; &lt;span class="nt"&gt;-connect&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;servername&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;:443 2&amp;gt;/dev/null&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;last_cert&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;cert_info&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'/-----BEGIN CERTIFICATE-----/{s=""} {s=s$0"\n"} /-----END CERTIFICATE-----/{cert=s} END {print cert}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;last_cert&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | openssl x509 &lt;span class="nt"&gt;-fingerprint&lt;/span&gt; &lt;span class="nt"&gt;-noout&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"="&lt;/span&gt; &lt;span class="nt"&gt;-f2&lt;/span&gt; | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s1"&gt;'s/://g'&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="s1"&gt;'[:upper:]'&lt;/span&gt; &lt;span class="s1"&gt;'[:lower:]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Giving AWS App Runner Access to Github
&lt;/h3&gt;

&lt;p&gt;So far we have allowed Github to talk to AWS. Now it's time to allow AWS to talk to Github. This is something specific to &lt;a href="https://aws.amazon.com/apprunner/?nc1=h_ls" rel="noopener noreferrer"&gt;App Runner&lt;/a&gt;, which is just one of many ways to deploy a production-ready web application on AWS. With automatic scaling, logging management and ingress configuration with a public domain, we will go with this one for today 😅&lt;/p&gt;

&lt;p&gt;App Runner allows us to deploy services using container images or directly from the code. We will go with the latter in this example to simplify both the resources required on AWS by the development team and the upgrade procedure for the app.&lt;/p&gt;

&lt;p&gt;To allow App Runner to access the code on Github, the "AWS Connector for Github" needs to be installed in your Github account. In order to do that, follow the steps below&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step number&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Image&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Go to App Runner console page and click on "Create an App Runner service"&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyunsl2fdb5d9om1lqiw7.png" alt="Github connection - step 1" width="800" height="553"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Select "Source code repository" and "Add new" to create a connection to a Github account or organization&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9vluudra4mwbebgaetx5.png" alt="Github connection - step 2" width="800" height="553"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;A new window will pop up. There you can give the connection a name and select "Install another" to initiate the installation of the AWS app for Github&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ghleu9w8qrrekt55lwj.png" alt="Github connection - step 3" width="800" height="903"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;After logging in to your Github account, review and authorize the connector to perform its duties&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feaeb3jbzjyluf4y49059.png" alt="Github connection - step 4" width="800" height="903"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;After authorizing the app, confirm its permissions and confirm the AWS app installation&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fifjnmdbrvmr6khobaayb.png" alt="Github connection - step 5" width="800" height="1060"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;The Github connector has been installed and now it can be selected. Select it and press "Next" to close the window&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vysyxlgmh1p56s8mi8p.png" alt="Github connection - step 6" width="800" height="579"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Now go to the Github connections page on the top-left of the previous page&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5pm3wxxv2tqqzj14qepz.png" alt="Github connection - step 7" width="800" height="591"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;There you can see the ARN of the connector, &lt;em&gt;which will also be required by the development team&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkbntu5cncnr5ntvkj6ov.png" alt="Github connection - step 8" width="800" height="591"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: this process could be partially automated by using a terraform resource called &lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/apprunner_connection" rel="noopener noreferrer"&gt;&lt;code&gt;aws_apprunner_connection&lt;/code&gt;&lt;/a&gt;; however, the connector installation on your Github account would still need to be performed manually (steps 4 and 5)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Wrap-up
&lt;/h3&gt;

&lt;p&gt;After those two actions, three pieces of information need to be shared with the development team:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The AWS region name available&lt;/li&gt;
&lt;li&gt;The ARN of the role that will be used by the Github workflow&lt;/li&gt;
&lt;li&gt;The ARN of the Github connector created for App Runner&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Development Side
&lt;/h2&gt;

&lt;p&gt;With the base infrastructure ready to go, now it's time to talk about the automated pipeline with Github Actions. In summary, it has three main functionalities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run code quality checks with &lt;code&gt;pre-commit&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Sync infrastructure changes made on &lt;code&gt;aws/app-template.yaml&lt;/code&gt; with the existing cloud resources&lt;/li&gt;
&lt;li&gt;Trigger the app redeployment if changes happened inside &lt;code&gt;app/&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;The workflow only runs on the &lt;code&gt;main&lt;/code&gt; branch, but you can adapt its logic to whatever process you and your team might be following.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's talk about each individual functionality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code quality checks
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://pre-commit.com/" rel="noopener noreferrer"&gt;&lt;code&gt;pre-commit&lt;/code&gt;&lt;/a&gt; is a really good tool to avoid mistakes before committing (hence, the name). Locally the developer should run &lt;code&gt;pre-commit install&lt;/code&gt; after cloning the repo and installing the &lt;code&gt;pre-commit&lt;/code&gt; tool. After this, automatic checks will be execute right after running a &lt;code&gt;git commit&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In this repo we have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some standard checks, ensuring files end with a newline and contain no trailing whitespace. These run on all files of the repo.&lt;/li&gt;
&lt;li&gt;A Python formatter called &lt;code&gt;black&lt;/code&gt;, since our front-end app is written in Python.&lt;/li&gt;
&lt;li&gt;A linter for our AWS Cloudformation stack called &lt;a href="https://github.com/aws-cloudformation/cfn-lint" rel="noopener noreferrer"&gt;&lt;code&gt;cfn-lint&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Security checks for the Cloudformation stack using &lt;a href="https://github.com/stelligent/cfn_nag" rel="noopener noreferrer"&gt;&lt;code&gt;cfn-nag&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;In the future a cost estimation could be made using the &lt;a href="https://awscli.amazonaws.com/v2/documentation/api/latest/reference/cloudformation/estimate-template-cost.html" rel="noopener noreferrer"&gt;aws cli&lt;/a&gt;. However, App Runner is not currently supported by the cost estimator. Potential command for future reference: &lt;code&gt;aws cloudformation estimate-template-cost --template-body file://aws/app-template.yaml --parameters "ParameterKey=RepoUrl,ParameterValue='',UsePreviousValue=true" "ParameterKey=GithubConnArn,ParameterValue='',UsePreviousValue=true"&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Normally those checks are executed on the local machine right before the commit. However, since developers sometimes forget to run &lt;code&gt;pre-commit install&lt;/code&gt;, the checks are also executed in the workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sync infrastructure
&lt;/h3&gt;

&lt;p&gt;This is the moment where the Cloudformation stack gets updated (if it already exists and something changed) or created (if there is no stack with the specified name).&lt;/p&gt;

&lt;p&gt;In order to achieve this, AWS credentials need to be properly configured. Here we use a handy Github action called &lt;a href="https://github.com/aws-actions/configure-aws-credentials" rel="noopener noreferrer"&gt;&lt;code&gt;configure-aws-credentials&lt;/code&gt;&lt;/a&gt;, from AWS itself. You can also read more about the &lt;a href="https://github.com/aws-actions/configure-aws-credentials" rel="noopener noreferrer"&gt;many methods of authentication&lt;/a&gt; available. This step requires the &lt;code&gt;AWS_REGION&lt;/code&gt; and &lt;code&gt;AWS_ROLE_ARN&lt;/code&gt; secrets to be properly configured in the repo, both of which should have been shared by the platform team.&lt;/p&gt;

&lt;p&gt;With the authentication in order, it's time to trigger the Cloudformation creation/update. The &lt;a href="https://github.com/aws-actions/aws-cloudformation-github-deploy" rel="noopener noreferrer"&gt;&lt;code&gt;aws-cloudformation-github-deploy&lt;/code&gt;&lt;/a&gt; action is used for that. Although the action has been archived, it still works just fine, so I'll keep it for now. This action will deploy our &lt;code&gt;aws/app-template.yaml&lt;/code&gt; stack, which has the configuration to build and run the service, as well as its resource usage.&lt;/p&gt;
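&lt;p&gt;For reference, roughly the same deployment can be reproduced from a terminal with the AWS CLI. The stack name, repo URL and connection ARN below are hypothetical placeholders; the parameter names come from &lt;code&gt;aws/app-template.yaml&lt;/code&gt;, and &lt;code&gt;CAPABILITY_IAM&lt;/code&gt; is likely needed because the template creates IAM resources:&lt;/p&gt;

```shell
STACK_NAME="my-app"                 # hypothetical stack name
TEMPLATE="aws/app-template.yaml"    # the template deployed by the workflow
REPO_URL="https://github.com/org/repo"
GITHUB_CONN_ARN="arn:aws:apprunner:eu-west-1:123456789012:connection/github-conn"

# Guarded so this sketch is a no-op without the AWS CLI and credentials
if command -v aws >/dev/null; then
  aws cloudformation deploy \
    --template-file "$TEMPLATE" \
    --stack-name "$STACK_NAME" \
    --capabilities CAPABILITY_IAM \
    --parameter-overrides RepoUrl="$REPO_URL" GithubConnArn="$GITHUB_CONN_ARN"
fi
```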

&lt;blockquote&gt;
&lt;p&gt;⭐️ Notice that this empowers the developers to configure many more things about the deployment (please see the &lt;a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-apprunner-service.html" rel="noopener noreferrer"&gt;AWS docs&lt;/a&gt; for other App Runner configurations). Also, since this is a Cloudformation template, given enough permissions, the development team could specify many more cloud resources that the app might need, like dedicated message queues, S3 buckets, databases, etc.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Trigger the app deployment
&lt;/h3&gt;

&lt;p&gt;If the Python code has changed, the AWS CLI is installed to trigger a redeployment of the App Runner service, effectively updating it.&lt;/p&gt;

&lt;p&gt;The script first gets the ARN of the App Runner service from one of the outputs of the Cloudformation stack. With that, it triggers the redeployment.&lt;/p&gt;
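&lt;p&gt;A minimal sketch of those two steps with the AWS CLI (the stack name and the output key are hypothetical and should match your template):&lt;/p&gt;

```shell
STACK_NAME="my-app"            # hypothetical stack name
OUTPUT_KEY="ServiceArn"        # hypothetical output key of the stack
QUERY="Stacks[0].Outputs[?OutputKey=='${OUTPUT_KEY}'].OutputValue"

# Guarded so this sketch is a no-op without the AWS CLI and credentials
if command -v aws >/dev/null; then
  # 1. Read the service ARN from the stack outputs
  SERVICE_ARN=$(aws cloudformation describe-stacks \
    --stack-name "$STACK_NAME" --query "$QUERY" --output text)
  # 2. Trigger the redeployment of the App Runner service
  aws apprunner start-deployment --service-arn "$SERVICE_ARN"
fi
```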

&lt;h4&gt;
  
  
  Why API-triggered deployment?
&lt;/h4&gt;

&lt;p&gt;You might be wondering why the App Runner service hasn't been configured with automatic deployments enabled. &lt;em&gt;Because if we had enabled it, every change on the infrastructure code would yield a failed workflow execution.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;With automatic deployments enabled, the App Runner service would enter a state of "Operation in progress" (which can take several minutes) on every code change. If the workflow tries to change the App Runner service (part of the infrastructure code) while the service is in that state, Cloudformation receives an error like &lt;code&gt;Resource handler returned message: "Service cannot be updated in the current state: OPERATION_IN_PROGRESS. ..."&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So, in order to avoid that, the app redeployment/update can only happen after an infrastructure change has concluded. This way the infrastructure can be changed without worrying about whether the service is in an "Operation in progress" state.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting everything together
&lt;/h2&gt;

&lt;p&gt;Now we have everything in place. Just push this repo to Github (with the proper name, matching the AWS role), and the workflow should be able to create the Cloudformation stack. If everything goes according to plan, the Cloudformation console should look similar to this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zoyp2c70hrv0jsmtxaa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zoyp2c70hrv0jsmtxaa.png" alt="AWS Cloudformation" width="800" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And going to the "Outputs" tab, you should be able to see the URL of the app. If the app is running properly, you will see:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Page Description&lt;/th&gt;
&lt;th&gt;Image&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Index page&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkzylucuohavui2dkgeh6.png" alt="App home" width="800" height="547"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dynamic page accessed via custom path&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4ex8zw39667oleflo0r.png" alt="App dynamic page" width="800" height="547"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;And you can also see general information about the app just deployed in the AWS App Runner console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fef8be2l2hdzryztee1sa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fef8be2l2hdzryztee1sa.png" alt="AWS App Runner" width="800" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There you will be able to see lots of useful information, like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Logs of the deployment/update procedure, as well as the logs of the app itself&lt;/li&gt;
&lt;li&gt;Metrics of app usage like request count (with status code), compute resource usage and request latency, to name a few&lt;/li&gt;
&lt;li&gt;Activity log with the changes applied to the service, allowing later auditing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And since those logs and metrics are integrated with CloudWatch, you can later filter, query, and post-process that telemetry data as you wish. Here is an example, on CloudWatch, of the CPU usage of several different instances of the app.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdfe171wfpxk7b6w10ep.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdfe171wfpxk7b6w10ep.png" alt="AWS metrics" width="800" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And to see the app logs, you can do the following:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Image&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Go to the available Log Groups on CloudWatch. There you will spot log groups associated with the &lt;code&gt;my-app-service&lt;/code&gt; app: &lt;code&gt;application&lt;/code&gt; and &lt;code&gt;service&lt;/code&gt;. &lt;code&gt;service&lt;/code&gt; relates to the build process performed on deployment/update events. We want to access the &lt;code&gt;application&lt;/code&gt; logs&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzcgdt15ybov5m75rf2x.png" alt="AWS logs - step 1" width="800" height="473"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;There you can add a new column to sort the available streams by creation date. This way you will easily get the latest logs&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd7wzuqdvlbk9jlygr3x0.png" alt="AWS logs - step 2" width="800" height="473"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Clicking on one of those streams, you should be able to see the logs of the python web server&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg4idpc8f2knpipgus9va.png" alt="AWS logs - step 3" width="800" height="473"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip&lt;/strong&gt;: The log groups created by App Runner don't have a retention period. Remember to configure a retention period on your log groups to avoid unnecessary recurrent costs 😉&lt;/p&gt;
&lt;/blockquote&gt;
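&lt;p&gt;Setting a retention period takes a single AWS CLI call per log group. The group name below is a hypothetical example; App Runner names groups as &lt;code&gt;/aws/apprunner/&amp;lt;service-name&amp;gt;/&amp;lt;service-id&amp;gt;/application&lt;/code&gt;:&lt;/p&gt;

```shell
LOG_GROUP="/aws/apprunner/my-app-service/0123456789/application"  # hypothetical
RETENTION_DAYS=30

# Guarded so this sketch is a no-op without the AWS CLI and credentials
if command -v aws >/dev/null; then
  aws logs put-retention-policy \
    --log-group-name "$LOG_GROUP" \
    --retention-in-days "$RETENTION_DAYS"
fi
```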

&lt;p&gt;Because of those features and others that could be enabled (like tracing), App Runner shows itself to be a quick way to deploy reliable and maintainable web apps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Update procedure
&lt;/h3&gt;

&lt;p&gt;And what needs to be done in order to update the running app? Just change the python or html code and &lt;strong&gt;push it to the &lt;code&gt;main&lt;/code&gt; branch&lt;/strong&gt;! The workflow will do the rest. The developer only needs view access to the AWS console. Any write interaction is made either by the workflow or the platform team.&lt;/p&gt;

&lt;h3&gt;
  
  
  About how to refine the AWS role for Github
&lt;/h3&gt;

&lt;p&gt;Here's a quick tip on how to refine the role created for the Github workflow. Starting with a more permissive role, you can see on CloudTrail exactly which actions the role performs during normal operation. Filtering the events by the user name &lt;code&gt;GithubActions&lt;/code&gt;, you can determine each individual event generated by the role, as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmcv088ktrti4318t0vl3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmcv088ktrti4318t0vl3.png" alt="AWS CloudTrail" width="800" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From there you can get the actions that need to be allowed and the resources associated with those actions. This will allow you to specify a well-refined role that follows the &lt;a href="https://en.wikipedia.org/wiki/Principle_of_least_privilege" rel="noopener noreferrer"&gt;least privilege principle&lt;/a&gt;.&lt;/p&gt;
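&lt;p&gt;The same filtering can be scripted with the AWS CLI, assuming the role session shows up under the user name &lt;code&gt;GithubActions&lt;/code&gt;:&lt;/p&gt;

```shell
USER_NAME="GithubActions"   # user name used to filter the CloudTrail events

# Guarded so this sketch is a no-op without the AWS CLI and credentials
if command -v aws >/dev/null; then
  aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=Username,AttributeValue="$USER_NAME" \
    --max-results 50
fi
```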

&lt;h2&gt;
  
  
  Clean-up
&lt;/h2&gt;

&lt;p&gt;In order to clean up this whole demo, just:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Delete the Cloudformation stack created&lt;/li&gt;
&lt;li&gt;Delete the CloudWatch log groups associated with the app&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;cd terraform &amp;amp;&amp;amp; terraform destroy&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
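&lt;p&gt;The first two steps can also be done from a terminal. Stack name and service id below are hypothetical; App Runner creates one &lt;code&gt;application&lt;/code&gt; and one &lt;code&gt;service&lt;/code&gt; log group per service:&lt;/p&gt;

```shell
STACK_NAME="my-app"           # hypothetical stack name
SERVICE_NAME="my-app-service"
SERVICE_ID="0123456789"       # hypothetical App Runner service id

# Guarded so this sketch is a no-op without the AWS CLI and credentials
if command -v aws >/dev/null; then
  # 1. Delete the Cloudformation stack created by the workflow
  aws cloudformation delete-stack --stack-name "$STACK_NAME"
  # 2. Delete the log groups left behind by App Runner
  aws logs delete-log-group \
    --log-group-name "/aws/apprunner/${SERVICE_NAME}/${SERVICE_ID}/application"
  aws logs delete-log-group \
    --log-group-name "/aws/apprunner/${SERVICE_NAME}/${SERVICE_ID}/service"
fi
```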

&lt;p&gt;You can also delete the AWS connector on Github and the Github connection on the App Runner console if you don't intend to use that integration again.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>github</category>
      <category>terraform</category>
    </item>
    <item>
      <title>Monitoring C++ Applications</title>
      <dc:creator>Daniel Kneipp</dc:creator>
      <pubDate>Wed, 31 Aug 2022 00:45:45 +0000</pubDate>
      <link>https://dev.to/danielkneipp/monitoring-c-applications-4152</link>
      <guid>https://dev.to/danielkneipp/monitoring-c-applications-4152</guid>
<description>&lt;p&gt;This document describes in general terms what is expected from a monitoring solution, together with suggestions of tools that could be used for C++ applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://opentelemetry.io/docs/concepts/signals/" rel="noopener noreferrer"&gt;OpenTelemetry&lt;/a&gt; has a broad and generic terminology to define concepts around possible types of telemetry data. Here we will focus on traces, metrics, logs and something that is not covered there: crash reports.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
Monitoring C++ Applications

&lt;ul&gt;
&lt;li&gt;Overall Architecture&lt;/li&gt;
&lt;li&gt;
Tools and services available

&lt;ul&gt;
&lt;li&gt;Logs&lt;/li&gt;
&lt;li&gt;Metrics&lt;/li&gt;
&lt;li&gt;Traces&lt;/li&gt;
&lt;li&gt;Crash reports&lt;/li&gt;
&lt;li&gt;Visualizing&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Conclusion&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Overall Architecture
&lt;/h2&gt;

&lt;p&gt;Generally speaking, any software application generates the following kinds of data for monitoring purposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Logs&lt;/strong&gt;: text records with metadata and potentially semantic information about the action being performed by the software&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics&lt;/strong&gt;: numeric measurements that can describe useful information (e.g. an execution count or a timer) about an action or event being triggered&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Traces&lt;/strong&gt;: metadata that can be correlated between applications that interact with each other (used to build distributed profilers)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crash reports&lt;/strong&gt;: artifacts generated by the application when an unrecoverable failure happens (usually in the form of memory dumps)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each kind of data usually has a service (or an independent functionality of a service) that supports and stores it for future queries and analyses. And depending on the tool used, the mechanism to obtain the data can vary. However, most services in the market follow a similar approach where the data is pushed, with the exception of logs, which rely on a separate piece of software (called an agent) that is responsible for shipping the logs from a source (e.g. a file in the filesystem) to the destination service.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: this is a simple analysis of just an independent application being monitored. It doesn't cover the possibilities that container orchestration systems like Kubernetes offer&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The general architecture of what was described can be shown by the following diagram:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                                                           +-----------------------------------------+
                                                           | Company Infrastructure                  |
                                                           |                                         |
+-----------------------------------+                      |     +-----------------+                 |
| Machine                           |                      |     |                 |                 |
|                                   |             +--------+-----&amp;gt;   Logs Server   |                 |
|                 +-------------+   |             |        |     |                 |                 |
|                 |             |   | Push        |        |     +-----------------+                 |
|         Pull    |  Log agent  +---+-------------+        |                                         |
|        +--------&amp;gt;             |   |                      |     +------------------------------+    |
|        |        +-------------+   |                      |     |                              |    |
|        |                          |               +------+-----&amp;gt;  Distributed Tracing System  |    |
|   +----v--+                       | Push traces   |      |     |                              |    |
|   |       +-----------------------+---------------+      |     +------------------------------+    |
|   |  App  |                       |                      |                                         |
|   |       +-----------------------+---------------+      |     +------------------+                |
|   +-----+-+                       | Push metrics  |      |     |                  |                |
|         |                         |               +------+-----&amp;gt;  Metrics Server  |                |
|         |                         |                      |     |                  |                |
|         |                         |                      |     +------------------+                |
+---------+-------------------------+                      |                                         |
          |                                                |     +----------------------------+      |
          |                                                |     |                            |      |
          +------------------------------------------------+-----&amp;gt;  Crash Reporting System    |      |
                                      Push minidumps       |     |                            |      |
                                                           |     +----------------------------+      |
                                                           |                                         |
                                                           |                                         |
                                                           +-----------------------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For traces, metrics and crash reports, changes in the code are needed to send the data to the related services. And for logs, as said before, an agent is responsible for that. So the application only needs to send the logs to a file, for example, and a properly configured agent will do the rest.&lt;/p&gt;

&lt;p&gt;With this approach, an application running in a customers environment can publish telemetry data without having to expose its internal network by allowing external incoming traffic. All traffic is outgoing.&lt;/p&gt;

&lt;p&gt;For each component present in the diagram, there are several (and I do mean, &lt;strong&gt;several&lt;/strong&gt; 😅) tools and services available in the market that can help accomplish this monitoring architecture. To keep the discussion brief, here I'll mention just a few for each kind of data we want to track.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tools and services available
&lt;/h2&gt;

&lt;p&gt;The focus will be given to the OpenTelemetry standard and to tools in the Grafana ecosystem. This will keep the application compatible with a variety of tools and services in the market, while allowing easy visualization and management of almost all telemetry data (with the exception of crash reports, which will be discussed later).&lt;/p&gt;

&lt;h3&gt;
  
  
  Logs
&lt;/h3&gt;

&lt;p&gt;It's recommended to use a logging formatter to add severity, timestamp, correlation ids, and other metadata, to allow correlating the logs with other telemetry data. It's also good to emit the logs in a structured format (e.g. JSON) to be able to quickly query them afterwards without too many preprocessing rules.&lt;/p&gt;

&lt;p&gt;For that the &lt;a href="https://github.com/open-telemetry/opentelemetry-cpp/blob/main/examples/common/logs_foo_library/foo_library.cc" rel="noopener noreferrer"&gt;OpenTelemetry SDK&lt;/a&gt; could be used.&lt;/p&gt;
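&lt;p&gt;Just to make the idea concrete, a single structured log line could look like this (the field names are illustrative; in practice the application's logging library would produce it):&lt;/p&gt;

```shell
# Sketch of one JSON-formatted log line carrying severity, a timestamp
# and a hypothetical correlation id, ready to be shipped by a log agent
TS=$(date -u +%Y-%m-%dT%H:%M:%SZ)
LINE="{\"ts\":\"$TS\",\"severity\":\"INFO\",\"correlation_id\":\"req-42\",\"msg\":\"export started\"}"
echo "$LINE"
```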

&lt;p&gt;As for the Logs Server, &lt;a href="https://grafana.com/oss/loki/" rel="noopener noreferrer"&gt;Grafana Loki&lt;/a&gt; could be used (also, the &lt;a href="https://www.elastic.co/pt/what-is/elk-stack" rel="noopener noreferrer"&gt;ELK stack&lt;/a&gt; and &lt;a href="https://www.datadoghq.com/" rel="noopener noreferrer"&gt;Datadog&lt;/a&gt; are well-known options that support not only logs, but traces and metrics as well).&lt;/p&gt;

&lt;p&gt;As the agent must be compatible with the Logs Server, we need to follow &lt;a href="https://grafana.com/docs/loki/latest/clients/" rel="noopener noreferrer"&gt;Grafana's documentation&lt;/a&gt;, which shows several options to choose from, like Promtail or Fluent Bit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Metrics
&lt;/h3&gt;

&lt;p&gt;Following Grafana's ecosystem, &lt;a href="https://prometheus.io/" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt; is a widely used metrics server, and &lt;a href="https://github.com/open-telemetry/opentelemetry-cpp/tree/main/examples/prometheus" rel="noopener noreferrer"&gt;OpenTelemetry already has a nice example&lt;/a&gt; on how to use it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Traces
&lt;/h3&gt;

&lt;p&gt;And here again we can leverage a tool from the Grafana ecosystem and OpenTelemetry. With &lt;a href="https://grafana.com/oss/tempo/" rel="noopener noreferrer"&gt;Tempo&lt;/a&gt; as the Distributed Tracing System, we can use OpenTelemetry (&lt;a href="https://github.com/open-telemetry/opentelemetry-cpp/blob/main/examples/simple/main.cc" rel="noopener noreferrer"&gt;example&lt;/a&gt;) to send traces to it, since &lt;a href="https://grafana.com/docs/tempo/latest/getting-started/#1-instrumentation" rel="noopener noreferrer"&gt;Tempo is compatible&lt;/a&gt; with the OpenTelemetry standard.&lt;/p&gt;

&lt;h3&gt;
  
  
  Crash reports
&lt;/h3&gt;

&lt;p&gt;So, for crash reports it gets more interesting. There is nothing in OpenTelemetry or Grafana dedicated to handling this kind of telemetry data. We are still able to publish &lt;a href="https://opentelemetry.io/docs/reference/specification/error-handling/" rel="noopener noreferrer"&gt;traces with error information&lt;/a&gt; for exceptions that can be handled, but to get information about crashes like &lt;code&gt;segfault&lt;/code&gt;s, another tool must be used.&lt;/p&gt;

&lt;p&gt;One interesting option is &lt;a href="https://sentry.io/for/c-plus-plus/" rel="noopener noreferrer"&gt;Sentry&lt;/a&gt;. It also has integration with &lt;a href="https://docs.sentry.io/platforms/native/guides/qt/" rel="noopener noreferrer"&gt;Qt-based applications&lt;/a&gt;, a framework widely used for C++ GUIs.&lt;/p&gt;

&lt;p&gt;Another one is &lt;a href="https://raygun.com/documentation/language-guides/cpp/crash-reporting/installation/" rel="noopener noreferrer"&gt;Raygun&lt;/a&gt;. Although it doesn't have an SDK itself, it shows how you can integrate your software with &lt;a href="https://github.com/google/breakpad" rel="noopener noreferrer"&gt;Google's Breakpad&lt;/a&gt; and send the crash report via an HTTP request.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Both options have their own GUIs, so you won't access them in the same place you access the rest of the telemetry data&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Visualizing
&lt;/h3&gt;

&lt;p&gt;As a closing point, to visualize all this data (except crash reports), Grafana itself can be used to query and manage it, and to create dashboards, alerts, etc. And with everything properly configured, correlation ids can be used to let the developer grab all kinds of telemetry data related to a user interaction, getting a better idea of how the application is being used and how performant it is.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;For sure this is not the only way to achieve a good observability level for your application, but it's one with good synergy between the chosen tools and services, and a succinct tech stack.&lt;/p&gt;

</description>
      <category>monitoring</category>
      <category>cpp</category>
    </item>
    <item>
      <title>Share a GPU between pods on AWS EKS</title>
      <dc:creator>Daniel Kneipp</dc:creator>
      <pubDate>Thu, 04 Nov 2021 22:29:51 +0000</pubDate>
      <link>https://dev.to/danielkneipp/share-a-gpu-between-pods-on-aws-eks-1519</link>
      <guid>https://dev.to/danielkneipp/share-a-gpu-between-pods-on-aws-eks-1519</guid>
<description>&lt;p&gt;In this post we discuss the necessary IaC (Infrastructure as Code) files to provision an EKS cluster capable of sharing a single GPU between multiple pods (code available &lt;a href="https://github.com/DanielKneipp/aws-eks-share-gpu" rel="noopener noreferrer"&gt;here&lt;/a&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;If you have ever tried to use GPU-based instances with AWS ECS, or on EKS using the default &lt;a href="https://github.com/NVIDIA/k8s-device-plugin" rel="noopener noreferrer"&gt;Nvidia plugin&lt;/a&gt;, you know that it's not possible to make tasks/pods share the same GPU on an instance. If you want to add more replicas to your service (for redundancy or load balancing), you need one GPU for each replica.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;And this doesn't seem likely to change in the near future for ECS (see this &lt;a href="https://github.com/aws/containers-roadmap/issues/327#issuecomment-580455803" rel="noopener noreferrer"&gt;feature request&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;GPU-based instances are expensive, and although some Machine Learning frameworks (e.g. Tensorflow) are pre-configured to use the entire GPU by default, that's not always needed. ML services can be configured to make independent inferences per request instead of batch processing, and this may require just a fraction of the 16 GiB of VRAM that comes with some instances.&lt;/p&gt;

&lt;p&gt;Currently, GPU-based instances only publish to ECS/EKS the number of GPUs they have. This means that a task/pod can only request a whole GPU, not a share of GPU resources (as is possible with CPU and RAM). The solution is to make the instance publish the amount of GPU resources (processing cores, memory, etc.) so that a pod can request only a fraction of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution
&lt;/h2&gt;

&lt;p&gt;This project (available &lt;a href="https://github.com/DanielKneipp/aws-eks-share-gpu" rel="noopener noreferrer"&gt;here&lt;/a&gt;) uses the k8s device plugin described in &lt;a href="https://aws.amazon.com/blogs/opensource/virtual-gpu-device-plugin-for-inference-workload-in-kubernetes/" rel="noopener noreferrer"&gt;this AWS blog post&lt;/a&gt; to make GPU-based nodes publish the amount of GPU resources they have available. Instead of the amount of VRAM available or some abstract metric, this plugin advertises the number of pods/processes that can be connected to the GPU. This is controlled by what NVIDIA calls the &lt;a href="https://docs.nvidia.com/deploy/mps/index.html" rel="noopener noreferrer"&gt;Multi-Process Service&lt;/a&gt; (MPS).&lt;/p&gt;

&lt;p&gt;MPS manages workloads submitted by different processes to allow them to be scheduled and executed concurrently in a GPU. On Volta and newer architectures we can also limit the &lt;a href="https://docs.nvidia.com/deploy/mps/index.html#topic_5_2_5" rel="noopener noreferrer"&gt;amount of threads&lt;/a&gt; a process can use from the GPU to limit the shareability of resources and ensure some Quality of Service (QoS) level.&lt;/p&gt;
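&lt;p&gt;As a sketch, that thread limit is controlled through an environment variable read by the MPS control daemon (the 25% value below is just an example):&lt;/p&gt;

```shell
# Limit each MPS client to ~25% of the GPU's threads (Volta and newer).
# This is a QoS hint, not a hard partition of the hardware.
export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=25

# Start the MPS control daemon
# (guarded so this sketch is a no-op without the NVIDIA tooling)
if command -v nvidia-cuda-mps-control >/dev/null; then
  nvidia-cuda-mps-control -d
fi
```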

&lt;h2&gt;
  
  
  How to use it
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/DanielKneipp/aws-eks-share-gpu" rel="noopener noreferrer"&gt;Here&lt;/a&gt; we put it all together to deliver an infrastructure and deployment lifecycle which all can be managed by terraform. Integrally, here is the list of tools needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;terraform&lt;/code&gt;: for infrastructure provisioning and service deployment (including the &lt;code&gt;DaemonSet&lt;/code&gt; for the device plugin and the &lt;code&gt;Deployment&lt;/code&gt; for testing);&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;packer&lt;/code&gt;: to create an instrumented AMI for GPU usage monitoring in CloudWatch&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;asdf&lt;/code&gt;: really handy tool used to install other tools in a version-controlled way&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The rest will come along in the next steps ;)&lt;/p&gt;

&lt;p&gt;At the end, you should have an infrastructure with the following features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;✔️ EKS cluster with encrypted volumes and secrets using KMS&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;✔️ All workers reside on private subnets and access the control plane only from within the VPC (no internet communication)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;✔️ IP whitelist configured for accessing the k8s API from the internet&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;✔️ Instrumented instances with GPU usage monitored in Cloudwatch&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;✔️ Nodes can be access with AWS SSM Session Manager (no &lt;code&gt;ssh&lt;/code&gt; required).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Installing the tooling
&lt;/h3&gt;

&lt;p&gt;The first tool to be installed is &lt;code&gt;asdf&lt;/code&gt;. With it, all the others come easily. &lt;code&gt;asdf&lt;/code&gt; can be installed following &lt;a href="https://asdf-vm.com/guide/getting-started.html" rel="noopener noreferrer"&gt;this guide&lt;/a&gt; from its documentation page. After that, you should be able to run the following commands to install the rest of the tooling.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;asdf plugin-add terraform https://github.com/asdf-community/asdf-hashicorp.git
asdf plugin-add pre-commit git@github.com:jonathanmorley/asdf-pre-commit.git
asdf plugin-add tflint https://github.com/skyzyx/asdf-tflint
asdf plugin-add awscli https://github.com/MetricMike/asdf-awscli.git

asdf install
pre-commit install
tflint --init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This project also comes with &lt;code&gt;pre-commit&lt;/code&gt; configured, serving as a reference on how terraform-based projects can be set up to check for syntax and linting errors before a commit is even made (so you don't have to wait for a CI pipeline).&lt;/p&gt;
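&lt;p&gt;A minimal &lt;code&gt;.pre-commit-config.yaml&lt;/code&gt; for a terraform project could look like the sketch below. The hook ids come from the &lt;code&gt;pre-commit-terraform&lt;/code&gt; project; the &lt;code&gt;rev&lt;/code&gt; is a placeholder, so check that repo for a current release:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.50.0  # placeholder; pin to a real release
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_tflint
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;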

&lt;h3&gt;
  
  
  Creating the AMI
&lt;/h3&gt;

&lt;p&gt;For details about how the AMI is created and what comes with it, I highly suggest my &lt;a href="https://github.com/DanielKneipp/aws-ami-gpu-monitoring" rel="noopener noreferrer"&gt;other repo&lt;/a&gt;, which explains in detail how the AMI works and what IAM permissions it requires.&lt;/p&gt;

&lt;p&gt;From &lt;a href="https://github.com/DanielKneipp/aws-ami-gpu-monitoring" rel="noopener noreferrer"&gt;that repo&lt;/a&gt;, the only thing changed is the base AMI: in this case, an AMI tailored for accelerated hardware on EKS was used. The list of compatible AMIs for EKS can be found at &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html" rel="noopener noreferrer"&gt;this link&lt;/a&gt;, updated regularly by AWS. Also, the AMI from AWS already comes with the &lt;a href="https://github.com/aws/containers-roadmap/issues/593" rel="noopener noreferrer"&gt;SSM agent&lt;/a&gt; in it, so nothing needs to change regarding that.&lt;/p&gt;

&lt;p&gt;The following commands will create an AMI named &lt;code&gt;packer-gpu-ami-0-1&lt;/code&gt;, which should be picked up automatically by the terraform code of the cluster. &lt;em&gt;All &lt;code&gt;terraform&lt;/code&gt; and &lt;code&gt;packer&lt;/code&gt; commands assume that you have already configured your AWS credentials properly&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd ami/
packer build .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
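&lt;p&gt;For reference, picking an AMI up by name from terraform can be done with an &lt;code&gt;aws_ami&lt;/code&gt; data source along these lines (a hedged sketch; the actual lookup lives in the cluster code):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data "aws_ami" "gpu" {
  most_recent = true
  owners      = ["self"]  # only AMIs built in your own account

  filter {
    name   = "name"
    values = ["packer-gpu-ami-0-1"]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;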



&lt;h3&gt;
  
  
  About the infrastructure
&lt;/h3&gt;

&lt;p&gt;The cluster and network resources are defined together in the &lt;code&gt;cluster&lt;/code&gt; directory. Here is a small description of them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;main.tf&lt;/code&gt;: defines the versions and configuration of the main providers, as well as set values for variables that can be used on other files (e.g. name of the cluster);&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;vpc.tf&lt;/code&gt;: encompasses the network configuration where the EKS cluster will be provisioned. It doesn't contain a subnet for &lt;code&gt;us-east-1e&lt;/code&gt; because, at the time of this writing, there were no &lt;code&gt;g4dn.xlarge&lt;/code&gt; instances available in that availability zone;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;eks.tf&lt;/code&gt;: contains the cluster definition using managed workers. &lt;a href="https://github.com/DanielKneipp/aws-eks-share-gpu/blob/master/cluster/eks.tf#L54" rel="noopener noreferrer"&gt;Here&lt;/a&gt; is also where the &lt;code&gt;node-label&lt;/code&gt; &lt;code&gt;k8s.amazonaws.com/accelerator&lt;/code&gt; is defined, which is important to tell the device plugin where it should be deployed;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;kms.tf&lt;/code&gt;: here we have the definition of the Customer Managed Keys (CMKs), alongside the policies necessary to make them work for the encryption of the cluster nodes' volumes and k8s secrets;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;iam.tf&lt;/code&gt;: has the permissions necessary to make the Session Manager access work and to allow the nodes to publish metrics on CloudWatch regarding CPU, RAM, swap, disk and GPU usage (go &lt;a href="https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-getting-started-instance-profile.html" rel="noopener noreferrer"&gt;here&lt;/a&gt; to learn more about the permissions for Session Manager and &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/create-iam-roles-for-cloudwatch-agent.html" rel="noopener noreferrer"&gt;here&lt;/a&gt; about the permissions required by the CloudWatch Agent);&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;aws-virtual-gpu-device-plugin.tf&lt;/code&gt;: generated from the &lt;code&gt;yaml&lt;/code&gt; file of the same name obtained from the &lt;a href="https://aws.amazon.com/blogs/opensource/virtual-gpu-device-plugin-for-inference-workload-in-kubernetes/" rel="noopener noreferrer"&gt;AWS blog post&lt;/a&gt;. Some modifications needed to be made in order to make this &lt;code&gt;DaemonSet&lt;/code&gt; work. Here they are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The image &lt;code&gt;nvidia/cuda:latest&lt;/code&gt; doesn't exist anymore as the tag &lt;code&gt;latest&lt;/code&gt; is now deprecated (&lt;a href="https://hub.docker.com/r/nvidia/cuda/" rel="noopener noreferrer"&gt;source&lt;/a&gt;). Because of that, the image &lt;code&gt;nvidia/cuda:11.4.2-base-ubuntu20.04&lt;/code&gt; is being used instead.&lt;/li&gt;
&lt;li&gt;The number of &lt;code&gt;vgpu&lt;/code&gt; configured for the container &lt;code&gt;aws-virtual-gpu-device-plugin-ctr&lt;/code&gt; was modified from its default of &lt;code&gt;16&lt;/code&gt; to &lt;code&gt;42&lt;/code&gt;, since NVIDIA architectures from Volta onward can handle up to &lt;code&gt;48&lt;/code&gt; connections to MPS (&lt;a href="https://docs.nvidia.com/deploy/mps/index.html#topic_3_3_5_1" rel="noopener noreferrer"&gt;source&lt;/a&gt;). This has been done to increase how finely the GPU can be fractioned. Theoretically (&lt;em&gt;not tested&lt;/em&gt;), 42 pods could share the same GPU (as long as they don't exceed the amount of VRAM available). At this point, instance networking limits are more restrictive than GPU shareability.&lt;/li&gt;
&lt;li&gt;Because this &lt;code&gt;vgpu&lt;/code&gt; configuration can have different limits depending on the GPU architecture, the plugin was also configured to be deployed only on &lt;code&gt;g4dn.xlarge&lt;/code&gt; instances (see how &lt;a href="https://github.com/DanielKneipp/aws-eks-share-gpu/blob/master/cluster/aws-virtual-gpu-device-plugin.tf#L104" rel="noopener noreferrer"&gt;here&lt;/a&gt;), which use a newer architecture (Turing) and are what this demo was tested on.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
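&lt;p&gt;As a hedged sketch of how that node label fits into the node group definition (attribute names depend on the version of the terraform-aws-eks module in use, and the label value here is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;node_groups = {
  gpu = {
    instance_type = "g4dn.xlarge"

    # Label the GPU nodes so the device plugin's nodeSelector matches them
    k8s_labels = {
      "k8s.amazonaws.com/accelerator" = "vgpu"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;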

&lt;p&gt;&lt;strong&gt;Pro tip&lt;/strong&gt;: If you want to convert k8s &lt;code&gt;yaml&lt;/code&gt; files to &lt;code&gt;.tf&lt;/code&gt;, you can use &lt;code&gt;k2tf&lt;/code&gt; (&lt;a href="https://github.com/sl1pm4t/k2tf" rel="noopener noreferrer"&gt;repo&lt;/a&gt;), which is able to convert the resource types of the &lt;code&gt;yaml&lt;/code&gt; to their appropriate counterparts in the k8s provider for terraform. To install it, just:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wget https://github.com/sl1pm4t/k2tf/releases/download/v0.6.3/k2tf_0.6.3_Linux_x86_64.tar.gz
tar zxvf k2tf_0.6.3_Linux_x86_64.tar.gz k2tf
sudo mv k2tf /usr/local/bin/
rm k2tf_0.6.3_Linux_x86_64.tar.gz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, you should be able to convert a &lt;code&gt;yaml&lt;/code&gt; manifest with a simple command like &lt;code&gt;cat file.yaml | k2tf &amp;gt; file.tf&lt;/code&gt;. This has been done for &lt;code&gt;cluster/aws-virtual-gpu-device-plugin.yaml&lt;/code&gt; and &lt;code&gt;app/app.yaml&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Provisioning the infrastructure
&lt;/h3&gt;

&lt;p&gt;To provision all of this, the following command should be sufficient:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd cluster/
terraform init
terraform apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;apply&lt;/code&gt; should show &lt;code&gt;Plan: 59 to add, 0 to change, 0 to destroy.&lt;/code&gt;. If that's the case, hit &lt;code&gt;yes&lt;/code&gt; and go grab a cup of coffee, as this can take tens of minutes.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;After the resources are provisioned, you might want to run &lt;code&gt;terraform apply -refresh-only&lt;/code&gt; to refresh your local state, as the creation of some resources changes the state of others within AWS. Also, state differences on &lt;code&gt;metadata.resource_version&lt;/code&gt; of k8s resources almost always show up after an &lt;code&gt;apply&lt;/code&gt;. This seems to be related to &lt;a href="https://github.com/hashicorp/terraform-provider-kubernetes/issues/1087" rel="noopener noreferrer"&gt;this issue&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now you should see an EKS cluster with the following workloads:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjh5o6cqbxcb7781z7tjn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjh5o6cqbxcb7781z7tjn.png" alt="eks-default-workloads" width="800" height="346"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  About the app
&lt;/h3&gt;

&lt;p&gt;The app is a &lt;code&gt;Deployment&lt;/code&gt;, also obtained from the &lt;a href="https://aws.amazon.com/blogs/opensource/virtual-gpu-device-plugin-for-inference-workload-in-kubernetes/" rel="noopener noreferrer"&gt;AWS blog post&lt;/a&gt;, that spawns 3 replicas of a resnet model in the cluster. &lt;a href="https://github.com/DanielKneipp/aws-eks-share-gpu/blob/master/app/app.tf#L65" rel="noopener noreferrer"&gt;This line&lt;/a&gt; defines "how much" GPU it needs. Because of this requirement, k8s will not schedule a pod of this deployment onto a node that doesn't have a GPU.&lt;/p&gt;

&lt;p&gt;This deployment is configured to use 20% of the GPU memory (using a tensorflow feature &lt;a href="https://github.com/DanielKneipp/aws-eks-share-gpu/blob/master/app/app.tf#L50" rel="noopener noreferrer"&gt;here&lt;/a&gt;). Based on this VRAM usage, we need to configure how many of the 48 process slots from MPS of an instance we want to reserve. Let's use &lt;code&gt;ceil&lt;/code&gt; to be conservative, so &lt;code&gt;ceil(48 * 0.2) = 10&lt;/code&gt;. With this, we should be able to schedule even 4 replicas on the same instance.&lt;/p&gt;
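&lt;p&gt;The slot calculation above can be reproduced with integer arithmetic in plain shell (a small illustrative snippet, not part of the repo):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Slots to request per replica, given the MPS slot count advertised by the
# device plugin and the fraction of VRAM each replica uses (20%).
limit=48
percent=20

# Integer ceiling of limit * percent / 100
slots=$(( (limit * percent + 99) / 100 ))
echo "vgpu per replica: ${slots}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;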

&lt;h3&gt;
  
  
  Deploying the app
&lt;/h3&gt;

&lt;p&gt;Since we're using the same tool for infrastructure management and app deployment, we can now leverage this by following the exact same procedure to deploy the app.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd app/
terraform init
terraform apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And now you should see the resnet workload deployed like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwvvr5ukoqo38bzz4u5wn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwvvr5ukoqo38bzz4u5wn.png" alt="eks-resnet-workloads" width="800" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Also, we can see on CloudWatch the amount of VRAM used on that instance to confirm that more than one replica&lt;br&gt;
is actually allocating resources there. To learn more about the new metrics available in CloudWatch published by instances using this custom AMI, please go &lt;a href="https://github.com/DanielKneipp/aws-ami-gpu-monitoring" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcmbn4r0jvgzoxzcdlbs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcmbn4r0jvgzoxzcdlbs.png" alt="cw-vram-usage" width="800" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, what if we scale the deployment to &lt;code&gt;4&lt;/code&gt; replicas? Go to &lt;a href="https://github.com/DanielKneipp/aws-eks-share-gpu/blob/master/app/app.tf#L19" rel="noopener noreferrer"&gt;this line&lt;/a&gt;, change the number of replicas from &lt;code&gt;3&lt;/code&gt; to &lt;code&gt;4&lt;/code&gt;, and run another &lt;code&gt;tf apply&lt;/code&gt;. After some time (~3-5 minutes) you should see the VRAM usage of that instance increase a bit more, like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feeaiev2wjwbvouzk75wj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feeaiev2wjwbvouzk75wj.png" alt="cw-vram-usage-after-scale" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Clean-up
&lt;/h3&gt;

&lt;p&gt;Leveraging again the fact that we interact mostly with terraform, cleaning everything up should be as simple as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd app/
tf destroy

cd ../cluster/
tf destroy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The order matters because you can't delete the EKS cluster before removing the resources allocated in it; otherwise, you will get error messages from the AWS API about &lt;em&gt;resources&lt;/em&gt; still being in use.&lt;/p&gt;

&lt;p&gt;Also, don't forget to follow the clean-up procedure of the &lt;a href="https://github.com/DanielKneipp/aws-ami-gpu-monitoring" rel="noopener noreferrer"&gt;AMI repo&lt;/a&gt; to delete the created AMI and avoid EBS costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Todo next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Implement/test autoscaling features making a load test to resnet&lt;/li&gt;
&lt;li&gt;[ ] Enable and use &lt;a href="https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/" rel="noopener noreferrer"&gt;IRSA&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[ ] Add &lt;a href="https://www.infracost.io/" rel="noopener noreferrer"&gt;Infracost&lt;/a&gt; on pre-commit config&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Wrap-up
&lt;/h2&gt;

&lt;p&gt;Here we've implemented a complete infrastructure for an EKS cluster with shared GPU-based instances.&lt;/p&gt;

&lt;p&gt;Please, feel free to reach out to me on my &lt;a href="https://github.com/DanielKneipp" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; or &lt;a href="https://www.linkedin.com/in/daniel-kneipp/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; accounts with suggestions or questions. ✌️&lt;/p&gt;

</description>
      <category>devops</category>
      <category>aws</category>
      <category>machinelearning</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Instrumenting AMIs for GPU monitoring on CloudWatch</title>
      <dc:creator>Daniel Kneipp</dc:creator>
      <pubDate>Sun, 01 Aug 2021 21:12:30 +0000</pubDate>
      <link>https://dev.to/aws-builders/instrumenting-amis-for-gpu-monitoring-on-cloudwatch-105m</link>
      <guid>https://dev.to/aws-builders/instrumenting-amis-for-gpu-monitoring-on-cloudwatch-105m</guid>
<description>&lt;p&gt;If you have used provisioned instances on AWS before, you know that the default monitored metrics are kind of limited. You only have access to CPU utilization, network transfer rates, and disk reads/writes. By default, you don't have monitoring of some basic information, like RAM and filesystem usage (which can be valuable information to prevent an instance malfunction due to lack of resources).&lt;/p&gt;

&lt;p&gt;In the case of GPU-accelerated applications (like Machine Learning apps), this problem goes even further, since you also don't have access to GPU metrics, which are critical to guarantee the reliability of the system (e.g., total GPU memory consumption can crash any application running on the GPU).&lt;/p&gt;

&lt;p&gt;I've created a project (available &lt;a href="https://github.com/DanielKneipp/aws-ami-gpu-monitoring" rel="noopener noreferrer"&gt;here&lt;/a&gt;) showing how we can create an AMI with CloudWatch agent for RAM and filesystem monitoring, and a custom service called &lt;code&gt;gpumon&lt;/code&gt; to gather GPU metrics and send them to AWS CloudWatch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project structure
&lt;/h2&gt;

&lt;p&gt;In the project we have two main directories like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.
├── packer  ==&amp;gt; AMI creation
└── tf      ==&amp;gt; AMI usage example
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first one contains all the files necessary to create the AMI based on Amazon Linux 2 using a tool called &lt;code&gt;packer&lt;/code&gt;. The second one has infrastructure as code in &lt;code&gt;terraform&lt;/code&gt; to provision an instance using the newly created AMI for testing purposes.&lt;/p&gt;

&lt;h2&gt;
  
  
  AMI creation
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;packer&lt;/code&gt; is a great tool for applying Infrastructure as Code principles to the AMI creation step. It can provision an instance with the specified base AMI, run scripts through ssh, start the AMI creation process, and clean everything up (e.g. instance, EBS volume, ssh key pair) afterwards.&lt;/p&gt;

&lt;p&gt;The file &lt;code&gt;packer/gpu.pkr.hcl&lt;/code&gt; contains the specification of the AMI. There we can find the base AMI, the instance used to create the AMI, the storage configuration, and the scripts used to configure the instance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Base AMI
&lt;/h3&gt;

&lt;p&gt;In order to make my life a bit easier, I looked for AMIs that already have NVIDIA drivers installed, so that I don't have to install them myself. Looking through the &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html#preinstalled-nvidia-driver" rel="noopener noreferrer"&gt;AWS documentation about installing NVIDIA drivers&lt;/a&gt;, we can see that the marketplace already offers AMIs with pre-shipped NVIDIA drivers. Among the options, we're going to use &lt;a href="https://aws.amazon.com/marketplace/pp/prodview-64e4rx3h733ru?qid=1627738530182&amp;amp;sr=0-3&amp;amp;ref_=srh_res_product_title" rel="noopener noreferrer"&gt;Amazon Linux 2&lt;/a&gt;, because it already comes with the AWS Systems Manager agent, which we will use later on.&lt;/p&gt;

&lt;p&gt;A couple of notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You don't need to subscribe to the marketplace product in order to have access to the AMI currently selected. However, you will need to subscribe to have access to the AMI id of new releases.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You will &lt;strong&gt;need a GPU-based instance&lt;/strong&gt; to build the AMI (as it's required by the marketplace product specifications). I've tested this project in a new AWS account and it seems that the default limits don't allow the provisioning of GPU-based instances (G family). &lt;code&gt;packer&lt;/code&gt; will show an error if that's your case as well. If it is, you can request a limit increase &lt;a href="http://aws.amazon.com/contact-us/ec2-request" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  CloudWatch Agent
&lt;/h3&gt;

&lt;p&gt;The first addon that we're going to make to the base AMI is to install and configure the AWS CloudWatch Agent.&lt;/p&gt;

&lt;p&gt;The process of installation of the agent is well documented by AWS and you can see more details and methods of installation in other Linux distributions &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/install-CloudWatch-Agent-commandline-fleet.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The agent configuration is done via a &lt;code&gt;.json&lt;/code&gt; file that the agent reads to know which metrics to monitor and how to publish them on CloudWatch. You can read more about it on the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html" rel="noopener noreferrer"&gt;documentation page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The process is automated by the script &lt;code&gt;packer/scripts/install-cloudwatch-agent.sh&lt;/code&gt;. It installs the agent and configures it with some relevant metrics like filesystem, RAM, and swap usage.&lt;/p&gt;
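&lt;p&gt;For illustration, a minimal agent configuration covering those metrics could look like the sketch below (the field names follow the agent's documented schema, but the actual file shipped in the repo may differ):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "metrics": {
    "metrics_collected": {
      "mem":  { "measurement": ["mem_used_percent"] },
      "swap": { "measurement": ["swap_used_percent"] },
      "disk": {
        "measurement": ["used_percent"],
        "resources": ["*"]
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;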

&lt;blockquote&gt;
&lt;p&gt;Note that the agent is configured to publish metrics with a period of 60 seconds. This can incur costs since it's considered a detailed metric (go to the &lt;a href="https://aws.amazon.com/cloudwatch/pricing/" rel="noopener noreferrer"&gt;CloudWatch pricing page&lt;/a&gt; to know more).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Gathering the GPU metrics
&lt;/h3&gt;

&lt;p&gt;AWS already have &lt;a href="https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-gpu-monitoring.html" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; talking about ways to monitor GPU usage. There is a &lt;a href="https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-gpu-monitoring-gpumon.html" rel="noopener noreferrer"&gt;brief description&lt;/a&gt; about a tool called &lt;code&gt;gpumon&lt;/code&gt; and also a more extended &lt;a href="https://aws.amazon.com/blogs/machine-learning/monitoring-gpu-utilization-with-amazon-cloudwatch/" rel="noopener noreferrer"&gt;blog post&lt;/a&gt; about it.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;gpumon&lt;/code&gt; is a (kind of old) &lt;a href="https://s3.amazonaws.com/aws-bigdata-blog/artifacts/GPUMonitoring/gpumon.py" rel="noopener noreferrer"&gt;python script&lt;/a&gt; developed by AWS that makes use of an NVIDIA library called NVML (NVIDIA Management Library) to gather metrics from the GPUs of the instance and publish them on CloudWatch. In this project the script was turned into a &lt;code&gt;systemd&lt;/code&gt; unit. The script itself was also modified to make the error handling more readable and to capture memory usage correctly.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;gpumon&lt;/code&gt; service resides in &lt;code&gt;packer/addons/gpumon&lt;/code&gt;, and the &lt;code&gt;install-cloudwatch-gpumon.sh&lt;/code&gt; automates the installation process. The service is configured to start the python script at boot and restart it if it stops working for some reason. Since &lt;code&gt;systemd&lt;/code&gt; manages the service, its logs can be seen with &lt;code&gt;journalctl --unit gpumon&lt;/code&gt;.&lt;/p&gt;
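&lt;p&gt;A unit file for such a service could look like the hedged sketch below (the paths are illustrative; the actual unit lives in &lt;code&gt;packer/addons/gpumon&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Unit]
Description=GPU metrics publisher for CloudWatch (gpumon)
After=network-online.target

[Service]
ExecStart=/usr/bin/python /opt/gpumon/gpumon.py
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;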

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: the python script has only been tested on python2, which &lt;a href="https://www.python.org/doc/sunset-python-2/" rel="noopener noreferrer"&gt;is deprecated&lt;/a&gt;. &lt;code&gt;pip&lt;/code&gt; warns about that on the installation process while you create the AMI. You should keep that in mind if you intend to use this script for any production workload.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  About the GPU memory usage metric gathering
&lt;/h4&gt;

&lt;p&gt;The &lt;a href="https://s3.amazonaws.com/aws-bigdata-blog/artifacts/GPUMonitoring/gpumon.py" rel="noopener noreferrer"&gt;original script&lt;/a&gt; gets the GPU memory usage from the &lt;code&gt;nvmlDeviceGetUtilizationRates()&lt;/code&gt; function. I noticed through some tests that this metric was 0 even though I had data loaded into the GPU.&lt;/p&gt;

&lt;p&gt;From the &lt;a href="https://docs.nvidia.com/deploy/nvml-api/group__nvmlDeviceQueries.html#group__nvmlDeviceQueries_1g540824faa6cef45500e0d1dc2f50b321" rel="noopener noreferrer"&gt;NVIDIA documentation&lt;/a&gt; this function actually &lt;a href="https://docs.nvidia.com/deploy/nvml-api/structnvmlUtilization__t.html#structnvmlUtilization__t" rel="noopener noreferrer"&gt;returns&lt;/a&gt; the amount of memory that is being read/written, which isn't what I wanted. In order to get the amount of GPU memory allocated, &lt;a href="https://docs.nvidia.com/deploy/nvml-api/group__nvmlDeviceQueries.html#group__nvmlDeviceQueries_1g2dfeb1db82aa1de91aa6edf941c85ca8" rel="noopener noreferrer"&gt;&lt;code&gt;nvmlDeviceGetMemoryInfo()&lt;/code&gt;&lt;/a&gt; should be used instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  AMI Usage example
&lt;/h2&gt;

&lt;p&gt;As an example of how to use this AMI, there is also a terraform project that contains the necessary resources to provision an instance and monitor it using the CloudWatch interface.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;tf/main.tf&lt;/code&gt; is the root file containing the reference to the module &lt;code&gt;tf/modules/monitored-gpu&lt;/code&gt;, which encapsulates resources such as the instance and IAM permissions.&lt;/p&gt;

&lt;p&gt;This example doesn't require SSH access to the instance. We will use AWS Systems Manager - Session Manager to access the instance (the base AMI already comes with the SSM agent preinstalled). This method is better because access is logged in AWS, allowing security audits of instance access. Also, there are no credentials or keys stored on any machine to be leaked.&lt;/p&gt;

&lt;p&gt;The required AWS managed permissions are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;CloudWatchAgentServerPolicy&lt;/code&gt;: allow the instance to publish CloudWatch metrics;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AmazonSSMManagedInstanceCore&lt;/code&gt;: allow instance access through Session Manager.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to run it
&lt;/h2&gt;

&lt;p&gt;All right, let's go to the fun part! To play with this project we first need to install some dependencies (&lt;code&gt;packer&lt;/code&gt; and &lt;code&gt;terraform&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;A really handy tool to install and manage multiple versions of tools is &lt;code&gt;asdf&lt;/code&gt;. It helps you track and use different versions of a variety of tools, with no need to uninstall the versions you may already have. With a few simple commands it installs the versions needed and makes them context-aware (the tooling version changes automatically when you enter a directory that has a &lt;code&gt;.tool-versions&lt;/code&gt; file).&lt;/p&gt;
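&lt;p&gt;For example, a &lt;code&gt;.tool-versions&lt;/code&gt; file is just a plain list of tool/version pairs, one per line (the versions below are illustrative; the project pins its own):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform 1.0.3
packer 1.7.4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;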

&lt;p&gt;You can go to &lt;a href="https://asdf-vm.com/guide/getting-started.html" rel="noopener noreferrer"&gt;this link&lt;/a&gt; to install &lt;code&gt;asdf&lt;/code&gt;. After that you can simply run the following to have the correct versions of &lt;code&gt;packer&lt;/code&gt; and &lt;code&gt;terraform&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;asdf plugin-add terraform https://github.com/asdf-community/asdf-hashicorp.git
asdf plugin-add packer https://github.com/asdf-community/asdf-hashicorp.git

asdf install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, it's time to build the AMI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd packer
packer init
packer build .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will start the process of building the AMI in the &lt;code&gt;us-east-1&lt;/code&gt; region. You can follow the terminal to see what is happening and the logs of the scripts. You can also see the snapshot being taken by accessing the AWS console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fubm33n7lcpfyo85zpfi8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fubm33n7lcpfyo85zpfi8.png" alt="AMI page" width="800" height="119"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And get a progress bar in the "Snapshots" page like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqemiuthrh7qz3qx9qon.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqemiuthrh7qz3qx9qon.png" alt="EBS Snapshot page" width="800" height="114"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The snapshot name tag will appear after the AMI has been created.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AMI creation will be completed when you see something like this on your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
==&amp;gt; amazon-ebs.gpu: Terminating the source AWS instance...
==&amp;gt; amazon-ebs.gpu: Cleaning up any extra volumes...
==&amp;gt; amazon-ebs.gpu: No volumes to clean up, skipping
==&amp;gt; amazon-ebs.gpu: Deleting temporary security group...
==&amp;gt; amazon-ebs.gpu: Deleting temporary keypair...
Build 'amazon-ebs.gpu' finished after 9 minutes 38 seconds.

==&amp;gt; Wait completed after 9 minutes 38 seconds

==&amp;gt; Builds finished. The artifacts of successful builds are:
--&amp;gt; amazon-ebs.gpu: AMIs were created:
us-east-1: ami-09a9fd45137e9129e
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ At this point, you should have an AMI ready to be used!!&lt;/p&gt;

&lt;p&gt;Now it's time to test it! Grab the AMI id (&lt;code&gt;ami-09a9fd45137e9129e&lt;/code&gt; in this case) and paste it, replacing the text &lt;code&gt;"&amp;lt;your-ami-id&amp;gt;"&lt;/code&gt; in the &lt;code&gt;tf/main.tf&lt;/code&gt; file. After the modification, the section of the file that specifies the module should look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"gpu_vm"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"./modules/monitored-gpu"&lt;/span&gt;

  &lt;span class="nx"&gt;ami&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ami-09a9fd45137e9129e"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, just run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd tf
terraform init
terraform apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;terraform&lt;/code&gt; will ask whether you want to perform the planned actions. If, right before the prompt, it reports that 6 resources will be created, as shown below, type &lt;code&gt;yes&lt;/code&gt; to start the provisioning.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
Plan: 6 to add, 0 to change, 0 to destroy.
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a few minutes (roughly 5), go to the &lt;em&gt;All metrics&lt;/em&gt; page in CloudWatch. You should already see two new custom namespaces: &lt;code&gt;CWAgent&lt;/code&gt; and &lt;code&gt;GPU&lt;/code&gt;. This is the newly created instance publishing its metrics while idle.&lt;/p&gt;
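
&lt;p&gt;If you prefer the terminal, you can also confirm that the namespaces exist with the AWS CLI (a quick check, assuming your credentials and region are already configured):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# List the metrics published by the CloudWatch agent
aws cloudwatch list-metrics --namespace CWAgent

# List the custom GPU metrics
aws cloudwatch list-metrics --namespace GPU
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An empty response means the instance hasn't published anything yet, so give it a few more minutes.&lt;/p&gt;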

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzftyatlcwrfqnj9h6kj5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzftyatlcwrfqnj9h6kj5.png" alt="CW main interface" width="800" height="399"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see more details about RAM and swap, for example, using the &lt;code&gt;CWAgent&lt;/code&gt; namespace, as the next figure shows. With that, you can monitor the boot behavior of the AMI, assess its performance, and verify that it's behaving as expected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5azcrr02cca54px3vuku.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5azcrr02cca54px3vuku.png" alt="CW metrics" width="800" height="528"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The swap usage is 0 because no swap is configured in this AMI (you can follow &lt;a href="https://aws.amazon.com/premiumsupport/knowledge-center/ec2-memory-swap-file/" rel="noopener noreferrer"&gt;this documentation&lt;/a&gt; to add it). The spike in RAM usage you see is from a test I was running 😅.&lt;/p&gt;

&lt;p&gt;Now, let's use this hardware a bit to see the metrics move. Go to the &lt;em&gt;Instances&lt;/em&gt; tab on the EC2 page, as shown in the next figure. Right-click on the running instance and hit &lt;em&gt;Connect&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvassimnj4wxsmcc472ua.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvassimnj4wxsmcc472ua.png" alt="SSM connect 1" width="800" height="305"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After that, go to the &lt;em&gt;Session Manager&lt;/em&gt; tab and hit &lt;em&gt;Connect&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcjkhgb83dtq0xv0sf775.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcjkhgb83dtq0xv0sf775.png" alt="SSM connect 2" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;
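
&lt;p&gt;As an alternative to the console, you can open the same session from your terminal with the AWS CLI (this assumes the Session Manager plugin for the AWS CLI is installed locally; the instance ID below is a placeholder — use the one from your &lt;em&gt;Instances&lt;/em&gt; tab):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Open an interactive shell on the instance via SSM (no SSH key or open port needed)
aws ssm start-session --target i-0123456789abcdef0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;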

&lt;p&gt;You should now have shell access through your browser. Running the commands below will clone and build a utility and then stress-test the GPU for 10 minutes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo -s
yum install -y git

cd ~
git clone https://github.com/wilicc/gpu-burn.git
cd gpu-burn
make CUDAPATH=/opt/nvidia/cuda

./gpu_burn 600
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can look at CloudWatch to see the impact of the resource usage while &lt;code&gt;gpu-burn&lt;/code&gt; does its thing, as shown in the figure below.&lt;/p&gt;

&lt;p&gt;With these metrics, it's now easy to create alarms that alert you when an anomaly is detected in resource usage, or to build autoscaling capabilities for a cluster based on custom metrics.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhkspyq8uaepwzk07q0bo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhkspyq8uaepwzk07q0bo.png" alt="GPU stress test" width="800" height="329"&gt;&lt;/a&gt;&lt;/p&gt;
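
&lt;p&gt;As a sketch of the alarm idea, something like the command below would raise an alarm on sustained high GPU utilization. Note that the metric name and the lack of dimensions here are assumptions — check the exact metric names and dimensions under the &lt;code&gt;GPU&lt;/code&gt; namespace in your account before using it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Alarm when average GPU utilization stays above 90% for two 5-minute periods
aws cloudwatch put-metric-alarm \
  --alarm-name gpu-utilization-high \
  --namespace GPU \
  --metric-name utilization_gpu \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 90 \
  --comparison-operator GreaterThanThreshold
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In practice you would also add &lt;code&gt;--alarm-actions&lt;/code&gt; with an SNS topic ARN to actually be notified.&lt;/p&gt;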

&lt;h2&gt;
  
  
  Clean up
&lt;/h2&gt;

&lt;p&gt;To finish the party and turn off the lights, just:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;run &lt;code&gt;terraform destroy&lt;/code&gt; from the &lt;code&gt;tf/&lt;/code&gt; directory;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;deregister the AMI;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9bghbgaxjwfucv23fm4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9bghbgaxjwfucv23fm4.png" alt="Deregister AMI" width="753" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;and delete the EBS snapshot.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgzrd8oudreiofyysl6mp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgzrd8oudreiofyysl6mp.png" alt="Delete snapshot" width="689" height="332"&gt;&lt;/a&gt;&lt;/p&gt;
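
&lt;p&gt;The last two steps can also be done from the AWS CLI. The AMI ID is the one from this walkthrough; the snapshot ID below is a placeholder — grab the real one from the AMI details or the &lt;em&gt;Snapshots&lt;/em&gt; page before running this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Deregister the AMI
aws ec2 deregister-image --image-id ami-09a9fd45137e9129e

# Delete the EBS snapshot that backed it
aws ec2 delete-snapshot --snapshot-id snap-0123456789abcdef0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;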

&lt;p&gt;Thank you, guys! Comments and feedback are much appreciated.&lt;/p&gt;

&lt;p&gt;Feel free to reach out to me on &lt;a href="https://www.linkedin.com/in/daniel-kneipp/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://github.com/DanielKneipp" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>aws</category>
      <category>machinelearning</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
