<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yeonggyoo Jeon</title>
    <description>The latest articles on DEV Community by Yeonggyoo Jeon (@loganjeon).</description>
    <link>https://dev.to/loganjeon</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F720167%2F8d98da2c-c0f0-4356-b8b0-bb0f55f75a43.jpeg</url>
      <title>DEV Community: Yeonggyoo Jeon</title>
      <link>https://dev.to/loganjeon</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/loganjeon"/>
    <language>en</language>
    <item>
      <title>The Struggle to Optimize the Performance of the NVIDIA Triton Inference Server Running on AWS ECS</title>
      <dc:creator>Yeonggyoo Jeon</dc:creator>
      <pubDate>Thu, 23 Apr 2026 15:19:35 +0000</pubDate>
      <link>https://dev.to/aws-builders/the-struggle-to-optimize-the-performance-of-the-nvidia-triton-inference-server-running-on-aws-ecs-42i3</link>
      <guid>https://dev.to/aws-builders/the-struggle-to-optimize-the-performance-of-the-nvidia-triton-inference-server-running-on-aws-ecs-42i3</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“Why is it so slow even though I have a GPU?”&lt;/strong&gt; I’d like to share my three-week struggle, which began with this single question.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;While developing the Vision AI service, I chose &lt;strong&gt;Nvidia Triton Inference Server&lt;/strong&gt; as the framework for model serving. Its features—such as multi-framework support, dynamic batching, and ensemble pipelines—were excellent, and I was particularly drawn to its ability to fully leverage NVIDIA GPUs.&lt;/p&gt;

&lt;p&gt;For the deployment environment, I chose &lt;strong&gt;AWS ECS&lt;/strong&gt; over SageMaker. I was already familiar with ECS from previous experience, and there was a requirement to expose Triton’s gRPC endpoints directly. However, once we actually deployed Triton on ECS, we encountered some unexpected issues.&lt;/p&gt;

&lt;p&gt;This post documents the &lt;strong&gt;three main issues we faced and how we resolved them&lt;/strong&gt; during that process.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment Environment Overview
&lt;/h2&gt;

&lt;p&gt;First, here is a brief overview of the overall architecture.&lt;/p&gt;

&lt;p&gt;[Triton on ECS Architecture]&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqpn2fxj59lpw14f4ulw7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqpn2fxj59lpw14f4ulw7.png" alt=" " width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The main components are as follows. We deploy the Triton container as an ECS service on an ECS cluster composed of GPU instances (&lt;code&gt;g4dn.xlarge&lt;/code&gt;, NVIDIA T4). The model files are stored in S3 and loaded from S3 when Triton starts. We route HTTP (&lt;code&gt;:8000&lt;/code&gt;) and gRPC (&lt;code&gt;:8001&lt;/code&gt;) traffic through ALB and monitor GPU metrics using Prometheus and Grafana.&lt;/p&gt;

&lt;p&gt;The key settings for the ECS Task Definition are as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;“containerDefinitions”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;“name”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“triton-server”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;‘image’:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“nvcr.io/nvidia/tritonserver:&lt;/span&gt;&lt;span class="mf"&gt;23.10&lt;/span&gt;&lt;span class="err"&gt;-py&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;“command”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;‘tritonserver’&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;“--model-repository=s&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="err"&gt;://my-bucket/model_repository”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;“--allow-grpc=&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;“--grpc-port=&lt;/span&gt;&lt;span class="mi"&gt;8001&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;“--allow-http=&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;“--http-port=&lt;/span&gt;&lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;“--allow-metrics=&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;“--metrics-port=&lt;/span&gt;&lt;span class="mi"&gt;8002&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;“portMappings”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“containerPort”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“protocol”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“tcp”&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“containerPort”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;‘protocol’:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“tcp”&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“containerPort”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8002&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“protocol”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“tcp”&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;“resourceRequirements”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="err"&gt;“type”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“GPU”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="err"&gt;‘value’:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;“logConfiguration”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;“logDriver”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“awslogs”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;‘options’:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="err"&gt;“awslogs-group”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“/ecs/triton-server”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="err"&gt;“awslogs-region”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“ap-northeast&lt;/span&gt;&lt;span class="mi"&gt;-2&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;“awslogs-stream-prefix”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“triton”&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
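&lt;p&gt;Before registering the task definition, it can save a debugging round trip to confirm that the file is valid JSON and actually requests a GPU; curly quotes introduced by copy-pasting are a common cause of registration failures. A minimal sketch (the embedded JSON is an abbreviated version of the definition above; in practice you would load the file passed to &lt;code&gt;aws ecs register-task-definition&lt;/code&gt;):&lt;/p&gt;

```python
import json

# Abbreviated copy of the task definition above, embedded for illustration;
# in practice, read the file you pass to `aws ecs register-task-definition`.
raw = """
{
  "containerDefinitions": [
    {
      "name": "triton-server",
      "image": "nvcr.io/nvidia/tritonserver:23.10-py3",
      "resourceRequirements": [{ "type": "GPU", "value": "1" }]
    }
  ]
}
"""

task_def = json.loads(raw)  # fails loudly on curly quotes or trailing commas
container = task_def["containerDefinitions"][0]

# Without an explicit GPU requirement, ECS schedules the task with no GPU at all
assert any(r["type"] == "GPU" for r in container["resourceRequirements"])
print("valid task definition for:", container["name"])
```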






&lt;h2&gt;
  
  
  Issue 1: GPU Not Detected
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Symptoms
&lt;/h3&gt;

&lt;p&gt;The ECS task appeared to be running normally, but the following warning continued to appear in the Triton logs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;W0115 09:23:41.123456 1 backend_manager.cc:295]

Unable to load backend ‘tensorrt’: 
  failed to load library /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so: 
  libcuda.so.1: cannot open shared object file: No such file or directory
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model had loaded, but &lt;strong&gt;inference was running on the CPU&lt;/strong&gt; instead of the GPU. When I ran &lt;code&gt;nvidia-smi&lt;/code&gt; inside the container, there was no output.&lt;/p&gt;

&lt;h3&gt;
  
  
  Root Cause Analysis
&lt;/h3&gt;

&lt;p&gt;The issue lay in the instance configuration of the ECS cluster. Although we were using a GPU instance (&lt;code&gt;g4dn.xlarge&lt;/code&gt;), we were using a standard ECS-Optimized AMI instead of an &lt;strong&gt;ECS-Optimized GPU AMI&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To use GPUs in ECS, both of the following conditions must be met.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Condition&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ECS-Optimized GPU AMI&lt;/td&gt;
&lt;td&gt;An AMI with NVIDIA drivers and &lt;code&gt;nvidia-container-toolkit&lt;/code&gt; pre-installed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;resourceRequirements&lt;/code&gt; in Task Definition&lt;/td&gt;
&lt;td&gt;GPU resources must be explicitly requested for ECS to allocate a GPU to the container&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Since the standard ECS AMI does not have NVIDIA drivers installed, the container was unable to recognize the GPU.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resolution
&lt;/h3&gt;

&lt;p&gt;In the AWS Console, I replaced the Auto Scaling Group AMI for the ECS cluster with &lt;strong&gt;&lt;code&gt;ami-xxxxxxxx&lt;/code&gt; (ECS-Optimized GPU AMI)&lt;/strong&gt;. Here’s how to check the latest GPU AMI ID using the AWS CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check the latest ECS-Optimized GPU AMI ID for the current region&lt;/span&gt;
aws ssm get-parameters &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--names&lt;/span&gt; /aws/service/ecs/optimized-ami/amazon-linux-2/gpu/recommended/image_id &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; ap-northeast-2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; “Parameters[0].Value” &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After replacing the AMI and restarting the instance, &lt;code&gt;nvidia-smi&lt;/code&gt; recognized the GPU normally, and Triton successfully loaded the TensorRT backend.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaway:&lt;/strong&gt; When using GPUs in ECS, you must use an &lt;strong&gt;ECS-Optimized GPU AMI&lt;/strong&gt;. While it is possible to manually install NVIDIA drivers on a standard ECS AMI, this is not recommended because the drivers may be reset during AMI updates.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Issue 2: Throughput is Only One-Third of Expectations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Symptoms
&lt;/h3&gt;

&lt;p&gt;After resolving the GPU recognition issue, I ran a load test. The results measured by &lt;code&gt;perf_analyzer&lt;/code&gt; were shocking.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# perf_analyzer execution results (Dynamic Batching OFF)
&lt;/span&gt;&lt;span class="py"&gt;Concurrency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;8&lt;/span&gt;
  &lt;span class="py"&gt;Throughput&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;22.3 infer/sec&lt;/span&gt;
  &lt;span class="err"&gt;Latency&lt;/span&gt; &lt;span class="py"&gt;p50&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;178ms&lt;/span&gt;

&lt;span class="err"&gt;Latency&lt;/span&gt; &lt;span class="py"&gt;p95&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;312ms&lt;/span&gt;
  &lt;span class="err"&gt;Latency&lt;/span&gt; &lt;span class="py"&gt;p99&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;445ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The inference throughput of 22 requests per second was only one-third of the expected value (~70 req/s). Upon checking the GPU utilization, it was &lt;strong&gt;only 18% on average&lt;/strong&gt;. The GPU was sitting idle despite being available.&lt;/p&gt;

&lt;h3&gt;
  
  
  Root Cause Analysis
&lt;/h3&gt;

&lt;p&gt;The problem was that &lt;strong&gt;Dynamic Batching was disabled&lt;/strong&gt;. Since each inference request was being sent to the GPU individually, we were not utilizing the GPU’s parallel processing capabilities at all.&lt;/p&gt;

&lt;p&gt;GPUs are specialized for parallel matrix operations. The time difference between running inference with &lt;code&gt;batch=1&lt;/code&gt; and &lt;code&gt;batch=8&lt;/code&gt; is not as significant as one might think. In other words, by grouping multiple requests and processing them at once, we can simultaneously increase both GPU utilization and throughput.&lt;/p&gt;
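&lt;p&gt;The intuition can be sketched with a toy cost model: a forward pass costs a roughly fixed launch overhead plus a small per-item cost, so the overhead is amortized across the batch. The numbers below are illustrative assumptions, not measurements from our service:&lt;/p&gt;

```python
# Toy cost model for batched GPU inference (assumed numbers, not measurements):
# one forward pass costs a fixed overhead plus a small per-item cost.
FIXED_OVERHEAD_MS = 30.0  # kernel launches, host-device transfers, etc.
PER_ITEM_MS = 2.0         # marginal cost of one extra image in the batch

def throughput(batch_size: int) -> float:
    """Inferences per second when the GPU always runs full batches."""
    batch_time_ms = FIXED_OVERHEAD_MS + PER_ITEM_MS * batch_size
    return batch_size / (batch_time_ms / 1000.0)

for b in (1, 4, 8):
    print(f"batch={b}: {throughput(b):6.1f} infer/sec")
```

&lt;p&gt;Under this model a batch of 8 takes 46 ms instead of 8 separate 32 ms passes, which is exactly the gap Dynamic Batching exploits.&lt;/p&gt;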

&lt;p&gt;[Comparison of Dynamic Batching Effects]&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2z4rok9su9khuqlzuw38.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2z4rok9su9khuqlzuw38.png" width="800" height="1232"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Solution
&lt;/h3&gt;

&lt;p&gt;We added the Dynamic Batching configuration to the model’s &lt;code&gt;config.pbtxt&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight protobuf"&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="n"&gt;model_repository&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;vision_model&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;config.pbtxt&lt;/span&gt;

&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="n"&gt;vision_model&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;
&lt;span class="n"&gt;backend&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="n"&gt;onnxruntime&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;
&lt;span class="n"&gt;max_batch_size&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;

&lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="n"&gt;input_image&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;

&lt;span class="n"&gt;data_type&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TYPE_FP32&lt;/span&gt;
    &lt;span class="n"&gt;dims&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;640&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="n"&gt;output_detections&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;
    &lt;span class="n"&gt;data_type&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TYPE_FP32&lt;/span&gt;
    &lt;span class="n"&gt;dims&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="n"&gt;Enable&lt;/span&gt; &lt;span class="n"&gt;Dynamic&lt;/span&gt; &lt;span class="n"&gt;Batching&lt;/span&gt;
&lt;span class="n"&gt;dynamic_batching&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;preferred_batch_size&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="n"&gt;max_queue_delay_microseconds&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt; &lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="n"&gt;Execute&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="n"&gt;after&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt; &lt;span class="n"&gt;wait&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="n"&gt;Number&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;concurrent&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="n"&gt;instances&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;within&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;limits&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;available&lt;/span&gt; &lt;span class="n"&gt;GPU&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;instance_group&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;KIND_GPU&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;max_queue_delay_microseconds: 5000&lt;/code&gt; means the system waits up to 5ms to fill a batch. If this value is too large, latency increases; if it is too small, batch efficiency decreases. For our service, 5ms was the optimal balance between throughput and latency.&lt;/p&gt;
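&lt;p&gt;The worst case of that trade-off is easy to bound: a request arriving just after a batch was flushed waits in the queue until either the preferred batch fills or the delay expires, whichever comes first. A small sketch (the arrival intervals are assumed examples):&lt;/p&gt;

```python
MAX_QUEUE_DELAY_MS = 5.0  # max_queue_delay_microseconds: 5000

def worst_case_added_latency_ms(arrival_interval_ms: float, preferred_batch: int) -> float:
    """Upper bound on extra queueing latency introduced by dynamic batching.

    With one request every `arrival_interval_ms`, the batch fills after
    (preferred_batch - 1) further arrivals, but Triton flushes it no later
    than the configured queue delay, whichever happens first.
    """
    time_to_fill_ms = (preferred_batch - 1) * arrival_interval_ms
    return min(time_to_fill_ms, MAX_QUEUE_DELAY_MS)

# Light load: the 5 ms cap, not the batch size, bounds the wait
print(worst_case_added_latency_ms(13.0, 8))  # 5.0
# Heavy load: the batch fills before the delay expires
print(worst_case_added_latency_ms(0.5, 8))   # 3.5
```

&lt;p&gt;In other words, enabling batching here adds at most 5 ms of queueing latency per request.&lt;/p&gt;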

&lt;p&gt;The results after changing the settings are as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# perf_analyzer execution results (Dynamic Batching ON)
&lt;/span&gt;&lt;span class="py"&gt;Concurrency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;8&lt;/span&gt;
  &lt;span class="py"&gt;Throughput&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;76.1 infer/sec (+241%)&lt;/span&gt;
  &lt;span class="err"&gt;Latency&lt;/span&gt; &lt;span class="py"&gt;p50&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;98ms (-45%)&lt;/span&gt;
  &lt;span class="err"&gt;Latency&lt;/span&gt; &lt;span class="py"&gt;p95&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;187ms (-40%)&lt;/span&gt;

&lt;span class="err"&gt;Latency&lt;/span&gt; &lt;span class="py"&gt;p99&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;234ms (-47%)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Throughput improved by roughly &lt;strong&gt;3.4 times&lt;/strong&gt;, from 22 req/s to 76 req/s, and GPU utilization also increased from 18% to 72%.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaway:&lt;/strong&gt; Enable Dynamic Batching by default when deploying Triton for the first time. Set &lt;code&gt;preferred_batch_size&lt;/code&gt; based on the model’s &lt;code&gt;max_batch_size&lt;/code&gt; and actual traffic patterns, and adjust &lt;code&gt;max_queue_delay_microseconds&lt;/code&gt; to meet your service’s latency SLA.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Issue 3: ECS tasks periodically crash due to OOM
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Symptoms
&lt;/h3&gt;

&lt;p&gt;After enabling Dynamic Batching and running it for a few days, we observed that ECS tasks were restarting periodically. Checking the CloudWatch logs revealed that the cause was &lt;strong&gt;OOMKilled (Out of Memory)&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# CloudWatch Logs
[ERROR] Container 'triton-server' failed with exit code 137 (OOMKilled)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What was strange was that it was not GPU memory but &lt;strong&gt;CPU memory (RAM)&lt;/strong&gt; that was running low. The &lt;code&gt;g4dn.xlarge&lt;/code&gt; instance provides 16GB of RAM, but the Triton container was using over 12GB.&lt;/p&gt;

&lt;h3&gt;
  
  
  Root Cause Analysis
&lt;/h3&gt;

&lt;p&gt;The issue lay in how Triton’s &lt;strong&gt;CUDA Unified Memory&lt;/strong&gt; operates. By default, Triton uses CPU memory as a fallback when GPU memory is insufficient. It also caches model weights in CPU memory during model loading.&lt;/p&gt;

&lt;p&gt;In the case of our model, the ONNX file size was approximately 800MB, and Triton was excessively using CPU memory by copying it multiple times internally. In particular, enabling Dynamic Batching resulted in additional intermediate buffers being allocated for batch processing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution
&lt;/h3&gt;

&lt;p&gt;We resolved the issue by combining three approaches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Adjusting Memory Limits in the ECS Task Definition&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We explicitly set the soft limit (&lt;code&gt;memoryReservation&lt;/code&gt;) and hard limit (&lt;code&gt;memory&lt;/code&gt;) for the ECS Task.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;“name”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“triton-server”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;“memory”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;14336&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;“memoryReservation”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10240&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;“resourceRequirements”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“type”:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“GPU”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;‘value’:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
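&lt;p&gt;As a sanity check on those numbers: the hard limit of 14336 MiB deliberately leaves about 2 GiB of the instance's 16 GiB of RAM for the OS and the ECS agent (in practice slightly less is usable, since the kernel reserves some memory):&lt;/p&gt;

```python
# Memory budget for the Triton container on a g4dn.xlarge (16 GiB RAM), in MiB
instance_ram_mib = 16 * 1024  # 16384
hard_limit_mib = 14336        # "memory": container is OOM-killed above this
soft_limit_mib = 10240        # "memoryReservation": used for task placement

headroom_mib = instance_ram_mib - hard_limit_mib
assert instance_ram_mib >= hard_limit_mib > soft_limit_mib
print(f"headroom for OS and ECS agent: {headroom_mib} MiB")  # 2048 MiB
```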



&lt;p&gt;&lt;strong&gt;2. Limiting Triton’s GPU Memory Usage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We explicitly specified the GPU memory pool size using the &lt;code&gt;--cuda-memory-pool-byte-size&lt;/code&gt; option.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tritonserver &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-repository&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;s3://my-bucket/model_repository &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cuda-memory-pool-byte-size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0:3221225472 &lt;span class="se"&gt;\ &lt;/span&gt;&lt;span class="c"&gt;# Allocate 3GB to GPU 0&lt;/span&gt;

&lt;span class="nt"&gt;--pinned-memory-pool-byte-size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1073741824 &lt;span class="se"&gt;\ &lt;/span&gt;&lt;span class="c"&gt;# 1GB of pinned memory&lt;/span&gt;
  &lt;span class="nt"&gt;--allow-grpc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--allow-http&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
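&lt;p&gt;Those flag values are raw byte counts, so it is worth computing them explicitly rather than eyeballing ten-digit numbers:&lt;/p&gt;

```python
GIB = 1024 ** 3  # one gibibyte in bytes

cuda_pool_bytes = 3 * GIB    # value for --cuda-memory-pool-byte-size=0:...
pinned_pool_bytes = 1 * GIB  # value for --pinned-memory-pool-byte-size=...

assert cuda_pool_bytes == 3221225472
assert pinned_pool_bytes == 1073741824
print(f"--cuda-memory-pool-byte-size=0:{cuda_pool_bytes}")
print(f"--pinned-memory-pool-byte-size={pinned_pool_bytes}")
```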



&lt;p&gt;&lt;strong&gt;3. Add memory optimization settings to the model's config.pbtxt&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight protobuf"&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="n"&gt;Add&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;model_repository&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;vision_model&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;config.pbtxt&lt;/span&gt;
&lt;span class="n"&gt;optimization&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;cuda&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;graphs&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="n"&gt;Reduce&lt;/span&gt; &lt;span class="n"&gt;kernel&lt;/span&gt; &lt;span class="n"&gt;launch&lt;/span&gt; &lt;span class="n"&gt;overhead&lt;/span&gt; &lt;span class="n"&gt;using&lt;/span&gt; &lt;span class="n"&gt;CUDA&lt;/span&gt; &lt;span class="n"&gt;Graphs&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="err"&gt;#&lt;/span&gt; &lt;span class="n"&gt;Disable&lt;/span&gt; &lt;span class="n"&gt;CPU&lt;/span&gt; &lt;span class="n"&gt;fallback&lt;/span&gt; &lt;span class="n"&gt;when&lt;/span&gt; &lt;span class="n"&gt;GPU&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="n"&gt;is&lt;/span&gt; &lt;span class="n"&gt;insufficient&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;force&lt;/span&gt; &lt;span class="n"&gt;explicit&lt;/span&gt; &lt;span class="n"&gt;failure&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;instance_group&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;KIND_GPU&lt;/span&gt;
    &lt;span class="n"&gt;gpus&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After applying these three measures, memory usage stabilized, and restarts caused by OOM completely disappeared.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaway:&lt;/strong&gt; When deploying Triton on ECS, you must &lt;strong&gt;monitor CPU memory usage&lt;/strong&gt; as well as GPU memory. It is particularly important to explicitly control memory usage with the &lt;code&gt;--cuda-memory-pool-byte-size&lt;/code&gt; and &lt;code&gt;--pinned-memory-pool-byte-size&lt;/code&gt; options when serving large models.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Final Performance Comparison
&lt;/h2&gt;

&lt;p&gt;The final performance results after resolving all three issues are summarized below.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Initial (Issue State)&lt;/th&gt;
&lt;th&gt;Final (After Optimization)&lt;/th&gt;
&lt;th&gt;Improvement Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Throughput (req/s)&lt;/td&gt;
&lt;td&gt;22.3&lt;/td&gt;
&lt;td&gt;76.1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+241%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P50 Latency&lt;/td&gt;
&lt;td&gt;178ms&lt;/td&gt;
&lt;td&gt;98ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-45%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P99 Latency&lt;/td&gt;
&lt;td&gt;445ms&lt;/td&gt;
&lt;td&gt;234ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-47%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPU Utilization&lt;/td&gt;
&lt;td&gt;18%&lt;/td&gt;
&lt;td&gt;72%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+300%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OOM Restarts&lt;/td&gt;
&lt;td&gt;2–3 times per week&lt;/td&gt;
&lt;td&gt;0 times&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Fully Resolved&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Triton Inference Server is a powerful tool, but using it effectively in an ECS environment means avoiding a few pitfalls. The three issues covered in this article (failed GPU detection, dynamic batching left disabled, and CPU-memory OOM) are all mentioned in the official documentation, yet they are easy to overlook until you actually run into them.&lt;/p&gt;

&lt;p&gt;In particular, &lt;strong&gt;Dynamic Batching is one of Triton’s most powerful features&lt;/strong&gt;, so it’s unfortunate that it is disabled by default. If you are deploying Triton for the first time, please be sure to check this setting.&lt;/p&gt;
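&lt;p&gt;For reference, enabling it is a small addition to &lt;code&gt;config.pbtxt&lt;/code&gt;; the batch sizes and queue delay below are illustrative values, not the ones used in this article:&lt;/p&gt;

```
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]       # illustrative; tune per model
  max_queue_delay_microseconds: 100    # illustrative latency-vs-batching trade-off
}
```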

&lt;p&gt;In the next post, we will cover how to use Triton’s &lt;strong&gt;Model Ensemble&lt;/strong&gt; to handle the preprocessing-inference-postprocessing pipeline on the server side.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/optimization.html" rel="noopener noreferrer"&gt;NVIDIA Triton Inference Server — Optimization Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/batcher.html" rel="noopener noreferrer"&gt;NVIDIA Triton Inference Server — Dynamic Batching&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/containers/running-gpu-based-container-applications-with-amazon-ecs-anywhere/" rel="noopener noreferrer"&gt;Running GPU-based container applications with Amazon ECS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/managed-instances-gpu.html" rel="noopener noreferrer"&gt;Use GPUs with Amazon ECS Managed Instances&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/debugging_guide.html" rel="noopener noreferrer"&gt;NVIDIA Triton Inference Server — Debugging Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>tritoninferenceserver</category>
      <category>ecs</category>
      <category>serving</category>
    </item>
    <item>
      <title>A Practical Guide to Building a Vision AI Model Serving Pipeline with AWS CDK</title>
      <dc:creator>Yeonggyoo Jeon</dc:creator>
      <pubDate>Thu, 23 Apr 2026 14:50:54 +0000</pubDate>
      <link>https://dev.to/aws-builders/a-practical-guide-to-building-a-vision-ai-model-serving-pipeline-with-aws-cdk-31i</link>
      <guid>https://dev.to/aws-builders/a-practical-guide-to-building-a-vision-ai-model-serving-pipeline-with-aws-cdk-31i</guid>
      <description>&lt;p&gt;&lt;strong&gt;This article is based on a VAIaaS (Vision AI as a Service) project we developed for production deployment.&lt;/strong&gt; I’ll share our experience building a Vision AI model serving pipeline using a combination of CDK, API Gateway, Lambda, and SageMaker.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Introduction
&lt;/h2&gt;

&lt;p&gt;Developing a Vision AI model and deploying that model as an actual service are two entirely different challenges. Even a model that achieves high accuracy in a research environment faces numerous engineering challenges when transitioning it into a stable and scalable API service. From infrastructure provisioning, automated model deployment, and request processing pipeline design to cost optimization and operational monitoring—a unified DevOps framework is essential to manage all of these aspects systematically. When using AWS as your cloud infrastructure, &lt;strong&gt;AWS CDK&lt;/strong&gt; is an excellent choice for this purpose.&lt;/p&gt;

&lt;p&gt;In this article, I’ll share an architecture and CDK examples for serving models—specifically Vision models—on the AWS cloud. This isn’t just a simple tutorial; it honestly details the problems encountered in a production environment and the process of solving them.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Overall Architecture of the Vision AI Service
&lt;/h2&gt;

&lt;p&gt;Below is the overall architecture diagram for the tentatively named “VAIaaS (Vision AI as a Service)” project, designed to serve various types of Vision AI models end to end.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftci3d9ewyip80tvi7f51.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftci3d9ewyip80tvi7f51.jpg" alt=" " width="800" height="387"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The request flow is described step-by-step as follows.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A client sends image data to the &lt;code&gt;POST /v1/analyze&lt;/code&gt; endpoint, and Amazon API Gateway receives the request.&lt;/li&gt;
&lt;li&gt;The Lambda Authorizer validates the JWT token or API key, after which the Router Lambda routes the request.&lt;/li&gt;
&lt;li&gt;The Pre-processing Lambda uploads the image to S3 and preprocesses it (resizing, normalization) to match the model’s input format.&lt;/li&gt;
&lt;li&gt;The preprocessed data is sent to the SageMaker real-time endpoint, where inference is executed.&lt;/li&gt;
&lt;li&gt;The Post-processing Lambda converts the results into a client-friendly JSON format and returns them.&lt;/li&gt;
&lt;/ol&gt;
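&lt;p&gt;The flow can be sketched as a chain of stages. The function names and data shapes below are hypothetical simplifications of the actual Lambdas, and the SageMaker invocation is injected as a callback so the pipeline logic can be exercised in isolation:&lt;/p&gt;

```typescript
// Minimal sketch of the analyze pipeline: preprocess, infer, postprocess.
// "infer" stands in for the SageMaker real-time endpoint invocation.
interface AnalyzeResult {
  label: string;
  confidence: number;
}

function preprocess(imageBytes: Uint8Array): Uint8Array {
  // Real code would resize and normalize; here the bytes pass through.
  return imageBytes;
}

function postprocess(rawScores: number[]): AnalyzeResult {
  const labels = ['cat', 'dog', 'other']; // placeholder class names
  const best = rawScores.indexOf(Math.max(...rawScores));
  return { label: labels[best], confidence: rawScores[best] };
}

function analyze(
  imageBytes: Uint8Array,
  infer: (input: Uint8Array) => number[],
): AnalyzeResult {
  const modelInput = preprocess(imageBytes);
  const rawScores = infer(modelInput);
  return postprocess(rawScores);
}

// Stubbed inference for demonstration; production code calls SageMaker.
const result = analyze(new Uint8Array([1, 2, 3]), () => [0.1, 0.8, 0.1]);
console.log(result); // { label: 'dog', confidence: 0.8 }
```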

&lt;h3&gt;
  
  
  CDK Stack Structure
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmzdoz7cdqzqm5gn7e6m2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmzdoz7cdqzqm5gn7e6m2.png" alt=" " width="800" height="243"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The entire infrastructure is divided into four CDK Stacks.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stack&lt;/th&gt;
&lt;th&gt;Key Resources&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;StorageStack&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;S3 (Input/Output/Model)&lt;/td&gt;
&lt;td&gt;Storage for data and model artifacts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ModelStack&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SageMaker Model, EndpointConfig, Endpoint&lt;/td&gt;
&lt;td&gt;AI model hosting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ApiStack&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;API Gateway, Lambda x3&lt;/td&gt;
&lt;td&gt;Request processing pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ObservabilityStack&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;CloudWatch, X-Ray&lt;/td&gt;
&lt;td&gt;Monitoring and tracing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The reason for separating the stacks is to enable &lt;strong&gt;independent deployment and rollback&lt;/strong&gt;. When updating a model, only the &lt;code&gt;ModelStack&lt;/code&gt; needs to be redeployed, and when modifying API logic, only the &lt;code&gt;ApiStack&lt;/code&gt; needs to be updated. This is a critical design decision in a production environment.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. StorageStack: Configuring the Data Layer
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lib/storage-stack.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-s3&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;constructs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;StorageStack&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Stack&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Expose it publicly so that it can be referenced from other stacks&lt;/span&gt;
  &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;inputBucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;outputBucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;modelBucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StackProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Input Image Buckets: Cost Optimization with Lifecycle Policies&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;inputBucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;InputBucket&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;bucketName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`vaiaas-input-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;account&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;removalPolicy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RemovalPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RETAIN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;lifecycleRules&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="c1"&gt;// Processed images are automatically deleted after 7 days&lt;/span&gt;
          &lt;span class="na"&gt;expiration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;days&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
          &lt;span class="na"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;processed/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;cors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;allowedMethods&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HttpMethods&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PUT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HttpMethods&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
          &lt;span class="na"&gt;allowedOrigins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
          &lt;span class="na"&gt;allowedHeaders&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Result Bucket: Cost Optimization with Intelligent-Tiering&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;outputBucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OutputBucket&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;bucketName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`vaiaas-output-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;account&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;removalPolicy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RemovalPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RETAIN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;intelligentTieringConfigurations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;EntireBucket&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;archiveAccessTierTime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;days&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
          &lt;span class="na"&gt;deepArchiveAccessTierTime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;days&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;180&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Model Artifact Bucket: Enable Version Control&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;modelBucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ModelBucket&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;bucketName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`vaiaas-models-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;account&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;versioned&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;removalPolicy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RemovalPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RETAIN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Practical Tip:&lt;/strong&gt; Including &lt;code&gt;${this.account}-${this.region}&lt;/code&gt; in your S3 bucket name can help prevent naming conflicts during multi-region deployments. Additionally, &lt;code&gt;removalPolicy: RETAIN&lt;/code&gt; is an important setting that ensures your data is preserved even if you accidentally delete the stack.&lt;/p&gt;
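&lt;p&gt;A small sketch of that naming pattern (the prefix and the validation are illustrative; S3 requires lowercase names of 3–63 characters):&lt;/p&gt;

```typescript
// Compose a globally unique bucket name and check a basic S3 constraint.
function bucketName(prefix: string, account: string, region: string): string {
  const name = `${prefix}-${account}-${region}`.toLowerCase();
  // S3 bucket names must be at most 63 characters long.
  if (name.length > 63) {
    throw new Error(`bucket name too long: ${name}`);
  }
  return name;
}

console.log(bucketName('vaiaas-input', '123456789012', 'ap-northeast-2'));
// → vaiaas-input-123456789012-ap-northeast-2
```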




&lt;h2&gt;
  
  
  4. ModelStack: Configuring SageMaker Endpoints
&lt;/h2&gt;

&lt;p&gt;Configuring SageMaker endpoints is one of the most complex parts of working with CDK: the model artifact path, container image, instance type, and autoscaling policy must all be managed in code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lib/model-stack.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;sagemaker&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-sagemaker&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-iam&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-s3&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;constructs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;ModelStackProps&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StackProps&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;modelBucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;modelVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// e.g., 'v1.2.0'&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ModelStack&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Stack&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;endpointName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ModelStackProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;endpointName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`vaiaas-endpoint-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;modelVersion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;-&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// SageMaker execution role&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sagemakerRole&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Role&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SageMakerRole&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;assumedBy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ServicePrincipal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sagemaker.amazonaws.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;managedPolicies&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nx"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ManagedPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromAwsManagedPolicyName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AmazonSageMakerFullAccess&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Grant read permissions for the model bucket&lt;/span&gt;
    &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;modelBucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grantRead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sagemakerRole&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Define SageMaker model&lt;/span&gt;
    &lt;span class="c1"&gt;// Use PyTorch inference container (AWS Deep Learning Container)&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;sagemaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;VisionAIModel&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`vaiaas-model-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;modelVersion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;-&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;executionRoleArn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;sagemakerRole&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;roleArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;primaryContainer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// AWS-provided PyTorch inference containers&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`763104351884.dkr.ecr.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.amazonaws.com/pytorch-inference:2.1.0-gpu-py310-cu118-ubuntu20.04-sagemaker`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;modelDataUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`s3://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;modelBucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bucketName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/models/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;modelVersion&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/model.tar.gz`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;SAGEMAKER_PROGRAM&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;inference.py&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;SAGEMAKER_SUBMIT_DIRECTORY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/opt/ml/model/code&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;MODEL_VERSION&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;modelVersion&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Endpoint Configuration: GPU Instance + Data Capture&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;endpointConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;sagemaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnEndpointConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;EndpointConfig&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;endpointConfigName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`vaiaas-config-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;modelVersion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;-&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;productionVariants&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;variantName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AllTraffic&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;instanceType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ml.g4dn.xlarge&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// NVIDIA T4 GPU&lt;/span&gt;
          &lt;span class="na"&gt;initialInstanceCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;initialVariantWeight&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="c1"&gt;// Inference Data Capture: Utilization for Model Quality Monitoring&lt;/span&gt;
      &lt;span class="na"&gt;dataCaptureConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;enableCapture&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;initialSamplingPercentage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;destinationS3Uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`s3://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;modelBucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bucketName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/data-capture/`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;captureOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;captureMode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Input&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;captureMode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Output&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Real-time endpoint deployment&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;sagemaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnEndpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Endpoint&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;endpointName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;endpointName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;endpointConfigName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;endpointConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;endpointConfigName&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Auto Scaling Settings (Application Auto Scaling)&lt;/span&gt;
    &lt;span class="c1"&gt;// Automatically adjusts between a minimum of 1 and a maximum of 4 instances based on traffic&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scalingTarget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ScalingTarget&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AWS::ApplicationAutoScaling::ScalableTarget&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;MaxCapacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;MinCapacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;ResourceId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`endpoint/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;endpointName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/variant/AllTraffic`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;ScalableDimension&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sagemaker:variant:DesiredInstanceCount&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;ServiceNamespace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sagemaker&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;RoleARN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;sagemakerRole&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;roleArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nx"&gt;scalingTarget&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addDependency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;EndpointNameOutput&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;endpointName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;exportName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;VaiaasSageMakerEndpointName&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Practical Tip:&lt;/strong&gt; &lt;code&gt;ml.g4dn.xlarge&lt;/code&gt; comes with a single NVIDIA T4 GPU and, in our load tests, offered the best price-performance ratio for Vision AI inference. We initially ran &lt;code&gt;ml.g4dn.2xlarge&lt;/code&gt;, but the tests showed that &lt;code&gt;ml.g4dn.xlarge&lt;/code&gt; was sufficient, so we switched. &lt;strong&gt;Be sure to measure your actual model’s memory usage and inference time before selecting an instance type.&lt;/strong&gt;&lt;/p&gt;
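&lt;p&gt;When sizing from load-test data, tail percentiles matter more than averages. Below is a minimal Python sketch of the kind of summary we relied on; the sample latencies are hypothetical, and in practice they would come from timing &lt;code&gt;invoke_endpoint&lt;/code&gt; calls against the candidate instance type.&lt;/p&gt;

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ranked = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[k]

# Hypothetical per-request latencies (ms) collected from a load-test run
latencies = [38, 41, 44, 47, 52, 55, 61, 74, 90, 210]
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies, p)} ms")
```

&lt;p&gt;A p95/p99 far above p50 (as in the hypothetical numbers above) usually points at GPU memory pressure or cold batches rather than a generally undersized instance.&lt;/p&gt;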




&lt;h2&gt;
  
  
  5. ApiStack: Request Processing Pipeline
&lt;/h2&gt;

&lt;p&gt;The API layer consists of three Lambda functions. Each function follows the Single Responsibility Principle (SRP), so it can be deployed and tested independently.&lt;br&gt;
&lt;/p&gt;
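&lt;p&gt;The Lambda source itself is out of scope for this post, but the router's flow can be sketched in Python against the environment variables the stack below defines. This is a minimal sketch, not the actual implementation: the request fields &lt;code&gt;filename&lt;/code&gt;/&lt;code&gt;image&lt;/code&gt;, the &lt;code&gt;requests/&lt;/code&gt; key layout, and the inference payload format are all assumptions.&lt;/p&gt;

```python
import base64
import json
import os
import uuid

def build_object_key(request_id: str, filename: str) -> str:
    """Deterministic S3 layout so inputs can be traced back to a request.
    The requests/<id>/<filename> scheme is an assumption for this sketch."""
    return f"requests/{request_id}/{filename}"

def handler(event, context):
    import boto3  # imported lazily so the pure helper can be unit-tested without the AWS SDK
    s3 = boto3.client("s3")
    sm_runtime = boto3.client("sagemaker-runtime")

    body = json.loads(event["body"])
    request_id = str(uuid.uuid4())
    key = build_object_key(request_id, body["filename"])

    # Keep the original image so failed or drifting predictions can be replayed
    s3.put_object(
        Bucket=os.environ["INPUT_BUCKET"],
        Key=key,
        Body=base64.b64decode(body["image"]),
    )

    # Synchronous real-time inference against the SageMaker endpoint
    response = sm_runtime.invoke_endpoint(
        EndpointName=os.environ["SAGEMAKER_ENDPOINT"],
        ContentType="application/json",
        Body=json.dumps({"s3_key": key}),
    )
    result = json.loads(response["Body"].read())

    return {
        "statusCode": 200,
        "body": json.dumps({"requestId": request_id, "result": result}),
    }
```

&lt;p&gt;The authorizer and the remaining function follow the same shape: each owns exactly one step of the pipeline, which is what makes them independently deployable.&lt;/p&gt;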

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lib/api-stack.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;apigateway&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-apigateway&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-lambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-iam&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-s3&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;constructs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;ApiStackProps&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StackProps&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;inputBucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;outputBucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;sagemakerEndpointName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ApiStack&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Stack&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ApiStackProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Lambda Common Layer (common utilities, latest version of boto3, etc.)&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;commonLayer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LayerVersion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;CommonLayer&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Code&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromAsset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;lambda/layers/common&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;compatibleRuntimes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PYTHON_3_11&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Common utilities and dependencies&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// 1. Lambda Authorizer: JWT Validation&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;authorizerFn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AuthorizerFunction&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;runtime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PYTHON_3_11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Code&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromAsset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;lambda/authorizer&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;index.handler&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;JWT_SECRET_ARN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;arn:aws:secretsmanager:...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;authorizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TokenAuthorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;JwtAuthorizer&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;authorizerFn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;resultsCacheTtl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;// Reducing latency through authentication result caching&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// 2. Router Lambda: Request Routing and S3 Uploads&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;routerFn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RouterFunction&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;runtime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PYTHON_3_11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Code&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromAsset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;lambda/router&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;index.handler&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;commonLayer&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;INPUT_BUCKET&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;inputBucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bucketName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;OUTPUT_BUCKET&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;outputBucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bucketName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;SAGEMAKER_ENDPOINT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sagemakerEndpointName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;memorySize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;tracing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Tracing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ACTIVE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Enable X-Ray Tracking&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Granting S3 and SageMaker permissions&lt;/span&gt;
    &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;inputBucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grantReadWrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;routerFn&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;outputBucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grantWrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;routerFn&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;routerFn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addToRolePolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PolicyStatement&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sagemaker:InvokeEndpoint&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;`arn:aws:sagemaker:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;account&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:endpoint/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sagemakerEndpointName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;}));&lt;/span&gt;

    &lt;span class="c1"&gt;// 3. API Gateway Configuration&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RestApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;VaiaaasApi&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;restApiName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;VAIaaS API&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Vision AI as a Service REST API&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;deployOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;stageName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;v1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;tracingEnabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;metricsEnabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;loggingLevel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;MethodLoggingLevel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="c1"&gt;// Throttling: 100 requests per second, with a burst of 200 requests&lt;/span&gt;
        &lt;span class="na"&gt;throttlingRateLimit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;throttlingBurstLimit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="c1"&gt;// CORS Configuration&lt;/span&gt;
      &lt;span class="na"&gt;defaultCorsPreflightOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;allowOrigins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Cors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ALL_ORIGINS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;allowMethods&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OPTIONS&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="na"&gt;allowHeaders&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// POST /v1/analyze endpoint&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;analyzeResource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;analyze&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;analyzeResource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addMethod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LambdaIntegration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;routerFn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;29&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;// The maximum timeout for the API Gateway is 29 seconds&lt;/span&gt;
      &lt;span class="p"&gt;}),&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;authorizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;authorizationType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AuthorizationType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CUSTOM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="c1"&gt;// Request validation: Content-Type header required&lt;/span&gt;
        &lt;span class="na"&gt;requestValidator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RequestValidator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RequestValidator&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;restApi&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;validateRequestBody&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;validateRequestParameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}),&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ApiUrl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;VAIaaS API Gateway URL&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  6. Issues Encountered in Practice
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Issue 1: SageMaker Endpoint Cold Start
&lt;/h3&gt;

&lt;p&gt;SageMaker real-time endpoints do not have cold start issues because the instances are always running. However, &lt;strong&gt;during the initial deployment, it takes 5–15 minutes for the endpoint to transition from the &lt;code&gt;Creating&lt;/code&gt; state to the &lt;code&gt;InService&lt;/code&gt; state&lt;/strong&gt;. This wait time is included during CDK deployment, which extends the total deployment time.&lt;/p&gt;

&lt;p&gt;As a solution, we set the CDK’s &lt;code&gt;waitForDeployment&lt;/code&gt; option to &lt;code&gt;false&lt;/code&gt; and adopted a method of polling the endpoint status in a separate deployment pipeline.&lt;/p&gt;
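&lt;p&gt;The polling step can be sketched as follows. This is an illustrative helper, not the actual pipeline code: the injected &lt;code&gt;fetchStatus&lt;/code&gt; function stands in for a call to the AWS SDK's &lt;code&gt;DescribeEndpoint&lt;/code&gt; API, which returns the endpoint's current &lt;code&gt;EndpointStatus&lt;/code&gt;.&lt;/p&gt;

```typescript
// Sketch of the pipeline's polling step (illustrative, not the exact code).
// In production, fetchStatus would call the AWS SDK DescribeEndpoint API
// and return EndpointStatus; it is injected here so the wait logic stands alone.
async function waitForInService(
  fetchStatus: Function,   // returns a Promise resolving to the endpoint status
  intervalMs: number,
  maxAttempts: number,
) {
  let attempt = 0;
  while (attempt !== maxAttempts) {
    attempt = attempt + 1;
    const status = await fetchStatus();
    if (status === 'InService') {
      return attempt;                       // endpoint is ready
    }
    if (status === 'Failed') {
      throw new Error('Endpoint creation failed');
    }
    // wait before polling again
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Timed out waiting for InService');
}
```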

&lt;h3&gt;
  
  
  Issue 2: Timeout When Calling SageMaker from Lambda
&lt;/h3&gt;

&lt;p&gt;The maximum integration timeout for API Gateway is &lt;strong&gt;29 seconds&lt;/strong&gt;. In some cases, Vision AI inference for high-resolution images exceeded this limit.&lt;/p&gt;

&lt;p&gt;We implemented two parallel approaches as a solution. First, we resized images to the model’s input size (e.g., 640×640) in a pre-processing Lambda function to reduce inference time. Second, we introduced an &lt;strong&gt;asynchronous processing pattern&lt;/strong&gt; for requests with long processing times. The client receives a request ID immediately and later polls for the results, which are processed asynchronously via SQS.&lt;/p&gt;
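&lt;p&gt;The asynchronous pattern can be sketched with an in-memory job store. The function names here are illustrative; in the real service the hand-off between submission and the worker happens through SQS, and results are persisted rather than held in memory:&lt;/p&gt;

```typescript
import { randomUUID } from 'crypto';

// Minimal sketch of the asynchronous request pattern: submit() returns a
// request ID immediately, a worker records the result later, and the client
// polls with the ID until the status flips to 'done'.
type Job = { status: string; result?: string };
const jobs: { [id: string]: Job } = {};

function submit(): string {
  const id = randomUUID();
  jobs[id] = { status: 'pending' };   // enqueueing to SQS would happen here
  return id;                          // returned to the client right away
}

function complete(id: string, result: string) {
  jobs[id] = { status: 'done', result };   // written by the async worker
}

function poll(id: string) {
  return jobs[id];                    // the client checks the status later
}
```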

&lt;h3&gt;
  
  
  Issue 3: SageMaker Endpoint Cost Optimization
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;ml.g4dn.xlarge&lt;/code&gt; instance costs approximately $0.736 per hour. Running it 24 hours a day results in a monthly cost of about $530. The issue was that costs were incurred even during nighttime hours when there was no traffic.&lt;/p&gt;

&lt;p&gt;We considered &lt;strong&gt;SageMaker Serverless Inference&lt;/strong&gt; as a solution, but since it does not support GPUs, we had to switch to CPU inference. Ultimately, we analyzed traffic patterns and implemented a scheduling strategy that sets the &lt;code&gt;MinCapacity&lt;/code&gt; for autoscaling to 0 during nighttime hours. This reduced monthly costs by approximately 40%.&lt;/p&gt;
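&lt;p&gt;As a sanity check on those numbers, the cost arithmetic works out as follows; the 10-hour nightly scale-to-zero window is an assumption chosen for illustration, matching the roughly 40% saving:&lt;/p&gt;

```typescript
// Back-of-the-envelope check of the nightly scale-to-zero saving.
// The 10-hour window is an illustrative assumption, not a measured figure.
const hourlyCost = 0.736;                        // ml.g4dn.xlarge, USD per hour
const alwaysOnMonthly = hourlyCost * 24 * 30;    // about $530 per month

function monthlyCostWithNightScaleDown(nightHours: number) {
  return hourlyCost * (24 - nightHours) * 30;    // only daytime hours billed
}

const scheduled = monthlyCostWithNightScaleDown(10);   // about $309
const savings = 1 - scheduled / alwaysOnMonthly;       // about 0.42
```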




&lt;h2&gt;
  
  
  7. Deployment and Operations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  CDK Deployment Command
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Deploy the entire stack (for the initial deployment)&lt;/span&gt;
cdk bootstrap aws://ACCOUNT_ID/REGION
cdk deploy &lt;span class="nt"&gt;--all&lt;/span&gt;

&lt;span class="c"&gt;# Deploy a specific stack (when updating a model)&lt;/span&gt;
cdk deploy ModelStack &lt;span class="nt"&gt;--parameters&lt;/span&gt; &lt;span class="nv"&gt;modelVersion&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;v1.3.0

&lt;span class="c"&gt;# Review changes before deployment&lt;/span&gt;
cdk diff ApiStack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Model Update Workflow
&lt;/h3&gt;

&lt;p&gt;When deploying a new model version, use the &lt;strong&gt;Blue/Green deployment&lt;/strong&gt; strategy. When updating an endpoint, SageMaker prepares new instances while keeping the existing ones running, and switches traffic once the new instances are ready. In the CDK, you can implement this by creating a new &lt;code&gt;EndpointConfig&lt;/code&gt; and updating the &lt;code&gt;endpointConfigName&lt;/code&gt; of the &lt;code&gt;Endpoint&lt;/code&gt;.&lt;/p&gt;
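&lt;p&gt;A minimal sketch of that switch using CDK L1 constructs is shown below. The construct IDs and the &lt;code&gt;modelV2&lt;/code&gt;/&lt;code&gt;endpoint&lt;/code&gt; variables are placeholders, and the fragment assumes it runs inside a stack's constructor with those resources defined elsewhere:&lt;/p&gt;

```typescript
// Sketch of a blue/green model update with CDK L1 constructs (placeholder
// names; assumes this runs inside a Stack, with modelV2 a CfnModel and
// endpoint a CfnEndpoint defined elsewhere in the same stack).
import * as sagemaker from 'aws-cdk-lib/aws-sagemaker';

const configV2 = new sagemaker.CfnEndpointConfig(this, 'EndpointConfigV2', {
  productionVariants: [{
    modelName: modelV2.attrModelName,
    variantName: 'AllTraffic',
    initialInstanceCount: 1,
    initialVariantWeight: 1.0,
    instanceType: 'ml.g4dn.xlarge',
  }],
});

// Pointing the existing endpoint at the new config triggers SageMaker's
// blue/green update: new instances come up before traffic is switched.
endpoint.endpointConfigName = configV2.attrEndpointConfigName;
```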

&lt;h3&gt;
  
  
  Monitoring Dashboard
&lt;/h3&gt;

&lt;p&gt;Monitor the following metrics on the CloudWatch dashboard.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Threshold&lt;/th&gt;
&lt;th&gt;Alarm Condition&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SageMaker &lt;code&gt;ModelLatency&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;5,000ms&lt;/td&gt;
&lt;td&gt;When P99 is exceeded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SageMaker &lt;code&gt;Invocations&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;Track calls per minute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lambda &lt;code&gt;Duration&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;25,000ms&lt;/td&gt;
&lt;td&gt;When P95 is exceeded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Gateway &lt;code&gt;5XXError&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;1%&lt;/td&gt;
&lt;td&gt;When error rate is exceeded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SageMaker &lt;code&gt;CPUUtilization&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;td&gt;Auto-scaling trigger&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
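&lt;p&gt;For example, the &lt;code&gt;ModelLatency&lt;/code&gt; alarm from the table could be defined in CDK roughly as follows. The endpoint and variant names are placeholders, and note that SageMaker emits &lt;code&gt;ModelLatency&lt;/code&gt; in microseconds, so 5,000 ms is 5,000,000 on the metric:&lt;/p&gt;

```typescript
// Sketch of the ModelLatency P99 alarm (endpoint/variant names are
// placeholders; assumes this runs inside a Stack). SageMaker reports
// ModelLatency in microseconds.
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import { Duration } from 'aws-cdk-lib';

const modelLatencyP99 = new cloudwatch.Metric({
  namespace: 'AWS/SageMaker',
  metricName: 'ModelLatency',
  dimensionsMap: { EndpointName: 'vaiaas-endpoint', VariantName: 'AllTraffic' },
  statistic: 'p99',
  period: Duration.minutes(1),
});

new cloudwatch.Alarm(this, 'ModelLatencyP99Alarm', {
  metric: modelLatencyP99,
  threshold: 5000000,   // 5,000 ms expressed in microseconds
  evaluationPeriods: 3,
  comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
});
```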




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;While building a pipeline for Vision AI serving using AWS CDK, the most significant insight I gained was that &lt;strong&gt;managing infrastructure as code is not merely a matter of convenience, but a matter of service quality&lt;/strong&gt;. Thanks to CDK, model updates, infrastructure changes, and environment replication all went through a code review process, which greatly improved operational stability.&lt;/p&gt;

&lt;p&gt;In the next post, I plan to cover how to integrate &lt;strong&gt;SageMaker Model Monitor&lt;/strong&gt; into this pipeline to detect model drift in a production environment.&lt;/p&gt;

&lt;p&gt;I hope this post is helpful to those looking to build services using Vision AI models. Please leave any questions or feedback in the comments!&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cdk/v2/guide/home.html" rel="noopener noreferrer"&gt;AWS CDK Developer Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html" rel="noopener noreferrer"&gt;Amazon SageMaker Developer Guide - Deploy Real-Time Inference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html" rel="noopener noreferrer"&gt;Amazon SageMaker Serverless Inference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_sagemaker-readme.html" rel="noopener noreferrer"&gt;AWS CDK API Reference - SageMaker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/aws-builders/developing-service-with-aws-cdk-3p8l"&gt;Developing service with AWS CDK&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/welcome.html" rel="noopener noreferrer"&gt;AWS Well-Architected Framework - Machine Learning Lens&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>awscdk</category>
      <category>visionai</category>
      <category>devops</category>
      <category>cdk</category>
    </item>
    <item>
      <title>AWS Amplify vs Netlify comparison for hosting static websites</title>
      <dc:creator>Yeonggyoo Jeon</dc:creator>
      <pubDate>Sun, 31 Mar 2024 11:55:40 +0000</pubDate>
      <link>https://dev.to/aws-builders/aws-amplify-vs-netlify-comparison-for-hosting-static-websites-3hlg</link>
      <guid>https://dev.to/aws-builders/aws-amplify-vs-netlify-comparison-for-hosting-static-websites-3hlg</guid>
      <description>&lt;h1&gt;
  
  
  Before we get into
&lt;/h1&gt;

&lt;p&gt;When you're working on a software development project, you often need a simple demo webpage or demo app to demonstrate your technology or run a POC. Static web pages organized as a simple Single Page App (SPA) are useful here. Many web development frameworks help you implement an SPA (Next.js, Gatsby, Hugo, Nuxt, Jekyll, etc.). You can build a demo app as an SPA and host it on the web as a simple static page to make it accessible. Many services support these frameworks for hosting (Netlify, StackBlitz, AWS Amplify, etc.). There are also services (including frameworks) such as Streamlit and Gradio that let you create an SPA-style demo web app in Python and host it on the web; these specialize in rapidly building web apps around machine learning models (I'll cover them in another article when I get a chance).&lt;/p&gt;

&lt;h1&gt;
  
  
  Configuring the demo app with Next.js
&lt;/h1&gt;

&lt;p&gt;Next.js is an open source framework for developing React-based web applications. It offers a wide range of features such as server-side rendering (SSR), static site generation (SSG), automatic code splitting, image optimization, and more to enable fast and user-friendly web app development.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key features of Next.js:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Server-side rendering: Render HTML, CSS, and JavaScript on the server before sending the page to the client to speed up page loading.&lt;/li&gt;
&lt;li&gt;Static site generation: Deploy your web app as a static site by generating HTML, CSS, and JavaScript as static files during the build phase.&lt;/li&gt;
&lt;li&gt;Automatic code splitting: Load only the code you need to speed up page loading.&lt;/li&gt;
&lt;li&gt;Image optimization: Automatically optimize images to improve the performance of your web app.&lt;/li&gt;
&lt;li&gt;SEO-friendly: Server-side rendering makes it easy for search engine crawlers to index your pages.&lt;/li&gt;
&lt;li&gt;Multiple routing options: Along with default routing, Next.js supports file-based routing, dynamic routing, catch-all routing, and more.&lt;/li&gt;
&lt;li&gt;API routing: Easily create and manage server-side APIs.&lt;/li&gt;
&lt;li&gt;Community support: Next.js has an active community and a wealth of documentation and tutorials.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next.js is a good fit if you want to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Develop fast, user-friendly web apps&lt;/li&gt;
&lt;li&gt;Create SEO-optimized web apps&lt;/li&gt;
&lt;li&gt;Develop React-based web apps&lt;/li&gt;
&lt;li&gt;Use a framework that offers a wide range of features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next.js is a powerful framework that is easy to learn and use, and its feature set can greatly simplify web app development.&lt;/p&gt;
&lt;h2&gt;
  
  
  Learn more about Next.js:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Official documentation: &lt;a href="https://nextjs.org/docs/" rel="noopener noreferrer"&gt;https://nextjs.org/docs/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Tutorials: &lt;a href="https://nextjs.org/learn/" rel="noopener noreferrer"&gt;https://nextjs.org/learn/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Community: &lt;a href="https://discord.gg/nextjs" rel="noopener noreferrer"&gt;https://discord.gg/nextjs&lt;/a&gt;

&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Hume AI's demo app
&lt;/h2&gt;

&lt;p&gt;Hume AI is a company that provides an API that recognizes human emotions from text, voice, and vision using AI technology. They provide a Next.js demo app that exercises their API, which I used for testing. In this article, I will build and run this web app locally and finally deploy it to Netlify and AWS Amplify.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;App Code Repository : &lt;a href="https://github.com/HumeAI/hume-api-examples" rel="noopener noreferrer"&gt;https://github.com/HumeAI/hume-api-examples&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can try it out here: &lt;a href="https://www.hume.ai" rel="noopener noreferrer"&gt;https://www.hume.ai&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Requirements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://nodejs.org/" rel="noopener noreferrer"&gt;Node&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Download Demo app code
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/HumeAI/hume-api-examples
&lt;span class="nb"&gt;cd &lt;/span&gt;hume-api-examples
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Development
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install
&lt;/span&gt;npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Build
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Development mode will start serving on &lt;code&gt;localhost:3001&lt;/code&gt;.&lt;/p&gt;
&lt;h1&gt;
  
  
  The choice for hosting static websites
&lt;/h1&gt;

&lt;p&gt;Two popular options that web developers often utilize to host static websites or single page apps (SPAs) are Netlify and AWS Amplify. Both platforms support automatic builds and hosting from Git repositories, making static site deployment very easy. In this article, we'll compare the pros and cons of each based on a real-world Next.js app deployment.&lt;br&gt;
We've deployed Hume AI's example demo app written in Next.js to &lt;strong&gt;Netlify&lt;/strong&gt; and &lt;strong&gt;AWS Amplify&lt;/strong&gt; respectively.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Netlify: &lt;a href="https://6575c5e5fc89902adec12a2c--delicate-dodol-3779df.netlify.app" rel="noopener noreferrer"&gt;https://6575c5e5fc89902adec12a2c--delicate-dodol-3779df.netlify.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Amplify: &lt;a href="https://dev.d1c4gaqa0opsnq.mynextapp.com/" rel="noopener noreferrer"&gt;https://dev.d1c4gaqa0opsnq.mynextapp.com/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Netlify
&lt;/h2&gt;

&lt;p&gt;Netlify brands itself as an "All-in-one platform for automating modern web projects." It offers continuous deployment from Git, serverless functions, a global CDN, domain management, SSL/TLS certificates, and more.&lt;/p&gt;
&lt;h3&gt;
  
  
  Deploy and Hosting
&lt;/h3&gt;

&lt;p&gt;The simplest way to host with Netlify is 'Deploy manually': drag and drop (or browse to upload) the built output files, and your web app is served at an automatically generated URL.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnrdvzd1aksvbt5qh5akp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnrdvzd1aksvbt5qh5akp.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frggxwak1tapwnklfmmsx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frggxwak1tapwnklfmmsx.png" alt=" " width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After it finishes running, you'll have a URL hosted by Netlify.&lt;br&gt;
Netlify URL: &lt;a href="https://6575c5e5fc89902adec12a2c--delicate-dodol-3779df.netlify.app" rel="noopener noreferrer"&gt;https://6575c5e5fc89902adec12a2c--delicate-dodol-3779df.netlify.app&lt;/a&gt; &lt;br&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Pros:
&lt;/h4&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1.Excellent developer experience with instant cache invalidation and atomic deploys
2. Simple HTTPS setup with free Let's Encrypt SSL/TLS certificates
3. Form handling without needing a backend
4. Serverless functions with generous free tier
5. Plugin ecosystem for extending builds
6. GitHub integration with deploy previews
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h4&gt;
  
  
  Cons:
&lt;/h4&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Paid plans required for features like split testing, analytics, etc.
2. Limited control over CDN caching and behavior
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  AWS Amplify
&lt;/h2&gt;

&lt;p&gt;AWS Amplify is a service that provides an all-in-one solution to build and deploy full-stack serverless web apps, including hosting for single page apps and static sites. It layers on top of other AWS services.&lt;/p&gt;
&lt;h3&gt;
  
  
  Deploy and Hosting
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Requirements
&lt;/h4&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- [Node](https://nodejs.org/)
- AWS Account
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h4&gt;
  
  
  Install Amplify CLI
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @aws-amplify/cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Configure the Amplify CLI
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;amplify configure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Adding Amplify hosting
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;amplify init

? Enter a name &lt;span class="k"&gt;for &lt;/span&gt;the project: mynextapp
? Enter a name &lt;span class="k"&gt;for &lt;/span&gt;the environment: dev
? Choose your default editor: Visual Studio Code &lt;span class="o"&gt;(&lt;/span&gt;or your preferred editor&lt;span class="o"&gt;)&lt;/span&gt;
? Choose the &lt;span class="nb"&gt;type &lt;/span&gt;of app that youre building: javascript
? What javascript framework are you using: react
? Source Directory Path: src
? Distribution Directory Path: out
? Build Command: npm run-script build
? Start Command: npm run-script start
? Do you want to use an AWS profile? Y
? Please choose the profile you want to use: &amp;lt;your profile&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Add hosting with the Amplify add command:
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;amplify add hosting
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Deploy the app with the Amplify publish command:
&lt;/h4&gt;

&lt;p&gt;Without touching the web UI, you can wire up Amplify through the configuration in your source folder, then go from build to deployment in a single step with the 'amplify publish' command in the CLI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;amplify publish

✔ Successfully pulled backend environment dev from the cloud.

    Current Environment: dev

┌──────────┬────────────────┬───────────┬───────────────────┐
│ Category │ Resource name  │ Operation │ Provider plugin   │
├──────────┼────────────────┼───────────┼───────────────────┤
│ Hosting  │ amplifyhosting │ Create    │ awscloudformation │
└──────────┴────────────────┴───────────┴───────────────────┘

? Are you sure you want to &lt;span class="k"&gt;continue&lt;/span&gt;? Yes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After it finishes running, you'll have a URL hosted by AWS Amplify.&lt;br&gt;
Amplify URL: &lt;a href="https://dev.d1c4gaqa0opsnq.mynextapp.com/" rel="noopener noreferrer"&gt;https://dev.d1c4gaqa0opsnq.mynextapp.com/&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Pros:
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Tightly integrated with the AWS ecosystem

&lt;ol&gt;
&lt;li&gt;Custom CDN behaviors and cache policies&lt;/li&gt;
&lt;li&gt;Feature branches and pull request previews&lt;/li&gt;
&lt;li&gt;CI/CD capabilities beyond just hosting&lt;/li&gt;
&lt;li&gt;Built-in monitoring and logging
&lt;/li&gt;
&lt;/ol&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h4&gt;


Cons:
&lt;/h4&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. More complex configuration than Netlify
&lt;li&gt;No free tier, pay-per-use pricing&lt;/li&gt;
&lt;li&gt;SSL certificates require provisioning
&lt;/li&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h1&gt;


Conclusion
&lt;/h1&gt;


&lt;p&gt;Both &lt;strong&gt;Netlify and AWS Amplify offer powerful solutions for hosting static web pages&lt;/strong&gt;, each with unique features and considerations. &lt;strong&gt;Netlify&lt;/strong&gt;, with its &lt;strong&gt;simplicity and ease of use&lt;/strong&gt;, is ideal for small to medium-sized projects with simple deployment requirements. On the other hand, &lt;strong&gt;AWS Amplify&lt;/strong&gt; offers &lt;strong&gt;advanced customization options and seamless integration with the broader AWS ecosystem&lt;/strong&gt;, making it ideal for larger projects that require scalability and flexibility. Ultimately, the choice between Netlify and AWS Amplify depends on your project requirements, budget, and familiarity with the respective platforms.&lt;/p&gt;

&lt;p&gt;In the end, the best platform for you depends on your specific needs and skill level. Both Netlify and AWS Amplify offer free trials, so we recommend trying both to see which platform is best for you.&lt;/p&gt;

&lt;h3&gt;
  
  
  Additional Information
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Allow insecure connection on web browser setting
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Setting up for insecure content &lt;br&gt;
&lt;/h4&gt;

&lt;p&gt;This addresses the mixed-content problem that occurs when the hosted page makes insecure (non-SSL) connections to another app's endpoint:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chrome : Settings &amp;gt; Privacy and security &amp;gt; Site settings &amp;gt; Additional content settings &amp;gt; Insecure content &lt;/li&gt;
&lt;li&gt;Add the sites : 

&lt;ul&gt;
&lt;li&gt;[*.]netlify.app&lt;/li&gt;
&lt;li&gt;[*.]amplifyapp.com&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h1&gt;
  
  
  References
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.netlify.com/jamstack/" rel="noopener noreferrer"&gt;https://www.netlify.com/jamstack/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.netlify.com/get-started/" rel="noopener noreferrer"&gt;https://docs.netlify.com/get-started/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.amplify.aws/javascript/deploy-and-host/frameworks/deploy-nextjs-app/" rel="noopener noreferrer"&gt;https://docs.amplify.aws/javascript/deploy-and-host/frameworks/deploy-nextjs-app/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/HumeAI/hume-api-examples" rel="noopener noreferrer"&gt;https://github.com/HumeAI/hume-api-examples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.damirscorner.com/blog/posts/20210528-AllowingInsecureWebsocketConnections.html" rel="noopener noreferrer"&gt;https://www.damirscorner.com/blog/posts/20210528-AllowingInsecureWebsocketConnections.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>netlify</category>
      <category>webdev</category>
      <category>singlepageapp</category>
    </item>
    <item>
      <title>Developing service with AWS CDK</title>
      <dc:creator>Yeonggyoo Jeon</dc:creator>
      <pubDate>Fri, 28 Oct 2022 02:31:17 +0000</pubDate>
      <link>https://dev.to/aws-builders/developing-service-with-aws-cdk-3p8l</link>
      <guid>https://dev.to/aws-builders/developing-service-with-aws-cdk-3p8l</guid>
      <description>&lt;h1&gt;
  
  
  1. What is AWS CDK(Cloud Development Kit)?
&lt;/h1&gt;

&lt;p&gt;AWS CDK is an open-source software development framework for designing cloud infrastructure in code. AWS released it to the public in July 2019. The &lt;a href="https://aws.amazon.com/cdk/" rel="noopener noreferrer"&gt;official website&lt;/a&gt; describes it as a framework that "lets you define your cloud infrastructure as code in one of its supported programming languages". Unlike earlier IaC tools, the CDK lets you configure cloud resources with verified default values in a familiar programming language. This allows non-specialists to start developing cloud applications by configuring service infrastructure quickly, easily, and in a secure, reusable way.&lt;/p&gt;
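&lt;p&gt;To give a flavor of it, here is a minimal CDK v2 app in TypeScript; this is a generic illustration (not tied to any project in this post) that declares a versioned S3 bucket, which &lt;code&gt;cdk deploy&lt;/code&gt; turns into a CloudFormation stack:&lt;/p&gt;

```typescript
// A minimal CDK v2 app: infrastructure declared as an ordinary TypeScript
// program. Deploying this stack creates one versioned S3 bucket.
import * as cdk from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';

class HelloCdkStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string) {
    super(scope, id);
    // The Bucket construct fills in verified default values for us.
    new s3.Bucket(this, 'DemoBucket', { versioned: true });
  }
}

const app = new cdk.App();
new HelloCdkStack(app, 'HelloCdkStack');
app.synth();  // emits the CloudFormation template
```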

&lt;h3&gt;
  
  
  Which IaC would I use while I develop with AWS?
&lt;/h3&gt;

&lt;p&gt;In my case, infrastructure management for service development on the public cloud went through the following stages.&lt;/p&gt;

&lt;blockquote&gt;
&lt;h3&gt;
  
  
  a. Deploy/Manage service infra using the UI console
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;If you use the AWS Console to configure infrastructure for service development or hands-on work, designing and deploying the entire service architecture by mouse click, you soon wonder, 'Is it really possible to configure a complex system through this process?'. It also becomes inconvenient when you have to build the same infrastructure elements over and over, or rebuild the entire infrastructure across multiple clusters.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  b. Use the CLI command
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Feeling limited by the point-and-click approach, people usually move on to deploying services and configuring cloud infrastructure with AWS CLI scripts.  However, it is hard to handle retries in error situations or to cope with race conditions across concurrent tasks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  c. Interest in IaC (Infrastructure as Code)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;You can learn AWS CloudFormation, a tool that lets you manage service infrastructure as scripts, or choose HashiCorp's Terraform to describe and manage infrastructure components in its own script format.  However, I was skeptical when I looked at those scripts as IaC and watched colleagues work with them.  From a programmer's point of view, IaC scripts in JSON or YAML seem to repeat the same text over and over.  It was clear that even a small increase in the size of the system would make the number and size of the files too large to manage.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  d. CDK, the truly programmable IaC
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;I found the AWS CDK when I felt that scripting IaC was not for me.  The CDK appealed to me, as someone who has been developing software for a long time, because it makes it possible to write infrastructure in a programming language and design it with software development skills honed over decades.  I therefore expected it to make designing, composing, and managing service infrastructure more efficient, and this conviction has only grown stronger now that I have completed an API service development project using the CDK.&lt;br&gt;&lt;br&gt;
 A comparison with Terraform, the representative of existing IaC tools, is given in the table below.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;



&lt;h4&gt;
  
  
  Terraform vs. AWS CDK
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;-&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Terraform&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;CDK&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Programming feature&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Implemented in YAML or HCL; requires learning a new language.  Auto-completion only via assist tools (imperfect).  Hard to implement safely without compile errors, etc.&lt;/td&gt;
&lt;td&gt;Five existing programming languages can be used.  Various extensions are possible using OOP; flexible structure and reusability can be improved with patterns.  Expandability limited only by the author's programming ability.  Automation through existing IDEs (VSCode).  Safe implementation through compile errors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Workload for composing infra&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Many references approach infrastructure configuration from an IaaS perspective&lt;/td&gt;
&lt;td&gt;Combines the latest container/serverless technologies rather than focusing on IaaS.  Fully supports IaaS too.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Support for Public Cloud&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supports various public clouds&lt;/td&gt;
&lt;td&gt;Specialized for AWS.  High growth potential as ecosystems such as CDK for Terraform and CDK8s are being created.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;License, Maturity and Stability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Requires an enterprise contract for more than the basic features.  Deployment stability is slightly lower, as it works directly with the SDK&lt;/td&gt;
&lt;td&gt;Free (uses CloudFormation and Parameter Store).  Mature deployment stability, using CloudFormation as the backend.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  Who is it for?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You want more flexible, convenient, and robust infrastructure management than the AWS Console offers.&lt;/li&gt;
&lt;li&gt;You want to configure and manage resources with IaC, but cannot afford to maintain monotonous, huge YAML files.&lt;/li&gt;
&lt;li&gt;You want to actively apply your programming skills to writing IaC.&lt;/li&gt;
&lt;li&gt;You want the optimal IaC for building serverless architectures on AWS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Developers with the above concerns can consider the CDK as a tool for designing and managing service infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does it operate?
&lt;/h3&gt;

&lt;p&gt;Cloud infrastructure is modeled as reusable components written in a software development language.  Five languages are currently supported (TypeScript, JavaScript, Python, Java, C#).  As for how the CDK operates: the CDK application you write is executed by the CDK CLI, synthesized into a CloudFormation template, and deployed through AWS CloudFormation.  It may help to picture it running through those stages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F93xinpbohft90zacpzyy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F93xinpbohft90zacpzyy.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
[Deploying infra by writing code]&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwyicr6cwv3zokgcq3ml5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwyicr6cwv3zokgcq3ml5.png" alt=" " width="800" height="228"&gt;&lt;/a&gt;&lt;br&gt;
&lt;br&gt;&lt;/p&gt;
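&lt;p&gt;To make the flow concrete, here is a minimal sketch of a CDK application in TypeScript; the stack and bucket names are illustrative, and it assumes aws-cdk-lib v2 is installed in the project:&lt;/p&gt;

```typescript
import { App, Stack, StackProps } from 'aws-cdk-lib';
import { Bucket } from 'aws-cdk-lib/aws-s3';
import { Construct } from 'constructs';

// A stack is the unit of deployment; it synthesizes to one CloudFormation template.
class HelloCdkStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    // A high-level construct: one line yields a bucket with sensible defaults.
    new Bucket(this, 'MyFirstBucket', { versioned: true });
  }
}

const app = new App();
new HelloCdkStack(app, 'HelloCdkStack');
// `cdk synth` turns this app into a CloudFormation template; `cdk deploy` ships it.
```

&lt;p&gt;Running 'cdk synth' against this app emits the CloudFormation template, and 'cdk deploy' hands it to CloudFormation, which is exactly the pipeline pictured above.&lt;/p&gt;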

&lt;h1&gt;
  
  
  2. How to start?
&lt;/h1&gt;

&lt;p&gt;Since the official Amazon documentation is well written, you can easily get started by referring to it.  Use the Developer Guide to learn, and the API Reference for the detailed specifications of APIs during actual development.  Because the CDK API abstracts AWS services well, reading the CDK API Reference is also a good way to learn the AWS services themselves.  In addition, a hands-on lab for first-time users is provided on the CDK Workshop page.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developer Guide (&lt;a href="https://docs.aws.amazon.com/cdk/latest/guide/home.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/cdk/latest/guide/home.html&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;API Reference (&lt;a href="https://docs.aws.amazon.com/cdk/api/latest/docs/aws-construct-library.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/cdk/api/latest/docs/aws-construct-library.html&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;CDK Workshop (&lt;a href="https://cdkworkshop.com/" rel="noopener noreferrer"&gt;https://cdkworkshop.com/&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;



&lt;h2&gt;
  
  
  Development using AWS CDK
&lt;/h2&gt;

&lt;p&gt;The CDK is implemented with Node.js, so whichever programming language you develop in, Node must be installed first.  Install aws-cdk using npm in an environment where Node is available.  After initializing the project with 'cdk init', install the required node libraries with 'npm install'.  With the preparations complete, you develop your stacks in code and check them for errors with 'cdk list'.  During development and debugging you use 'list', 'diff', and 'synth', and run 'cdk deploy' repeatedly to ship the confirmed code.  When the service is finally retired or the infrastructure needs to be deleted, 'cdk destroy' cleanly removes the deployed applications and infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Install CDK CLI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install -g aws-cdk
cdk version
mkdir hello-cdk
cd hello-cdk
cdk init --language [typescript/javascript/python/java/csharp]
npm install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Commands of CDK
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cdk bootstrap       # Deploy the stack for the CDK Toolkit in your AWS environment
cdk init        # Initializes a new default application in the language of the user's choice
cdk diff        # Identify differences between local AWS CDK code and applications running on AWS 
cdk synth       # Compiling AWS CDK Applications to AWS CloudFormation Templates
cdk deploy      # Deploy the AWS CDK application to a set up AWS account 
cdk destory     # Delete the deployed CDK application 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;br&gt;&lt;br&gt;
In this article I will stop at this introduction; I am planning a follow-up that covers CDK development methods in more detail.&lt;br&gt;
If you want to experiment a little further, try the hands-on lab on the CDK Workshop page via the reference links above; it should help your understanding.&lt;br&gt;
&lt;br&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  3. Features of development with the CDK
&lt;/h1&gt;

&lt;h3&gt;
  
  
  Advantages of AWS development using CDK
&lt;/h3&gt;

&lt;p&gt;The CDK is developed using Constructs, which abstract AWS resources into high-level components. Simply by using these Constructs you get the default settings from AWS best practices, and by reading the API documents you can come to understand each service in depth.&lt;br&gt;
&lt;br&gt;&lt;br&gt;
[AWS CDK Construct]&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7911pt9sveciuplehbl7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7911pt9sveciuplehbl7.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;br&gt;&lt;/p&gt;

&lt;p&gt;On the official AWS CDK page, the advantages of developing with the CDK are grouped into the following four categories.&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Easier cloud onboarding&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Even if you're new to AWS, you can speed up your onboarding to the cloud. The CDK's API abstracts AWS resources into high-level components initialized with sensible defaults, so you can configure an appropriate system without being an expert.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Faster development process&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Since the infrastructure is defined using the features of a programming language, such as OOP, loops, and conditional statements, development can be efficient and fast depending on how the logic is structured.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Customizable and shareable&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It is possible to design reusable components tailored to each requirement, and sharing those components as libraries makes it easy to quickly extend security, regulatory, and compliance coverage.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;No context switching&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Since code can be developed and deployed from the IDE in your existing development environment, developers can build applications and manage infrastructure without switching to or setting up a separate environment.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/li&gt;

&lt;/ul&gt;
&lt;br&gt;
&lt;/blockquote&gt;
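&lt;p&gt;The "faster development" point above can be sketched with ordinary language features. In this illustrative example (queue names and retention values are my own, assuming aws-cdk-lib v2), a plain loop and a conditional replace what would be three copy-pasted YAML blocks:&lt;/p&gt;

```typescript
import { App, Stack, Duration } from 'aws-cdk-lib';
import { Queue } from 'aws-cdk-lib/aws-sqs';

const app = new App();
const stack = new Stack(app, 'QueuesStack');

// A plain loop stamps out one queue per environment.
for (const env of ['dev', 'staging', 'prod']) {
  new Queue(stack, `JobsQueue-${env}`, {
    // In-line conditional logic: longer retention only in prod.
    retentionPeriod: env === 'prod' ? Duration.days(14) : Duration.days(4),
  });
}
```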

&lt;h3&gt;
  
  
  Reference: AWS Solutions Constructs
&lt;/h3&gt;

&lt;p&gt;Recurring arrangements of components have been captured as patterns and published as CDK Constructs on the &lt;a href="https://docs.aws.amazon.com/solutions/latest/constructs/welcome.html" rel="noopener noreferrer"&gt;AWS Solutions Constructs&lt;/a&gt; page. You can take them, use them as-is, or adapt and reuse them in the systems you develop to construct your desired service design more quickly.&lt;/p&gt;
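&lt;p&gt;As a taste of what these patterns look like, here is a sketch using the aws-apigateway-lambda pattern; it assumes the @aws-solutions-constructs/aws-apigateway-lambda package is installed, and the handler asset path is illustrative:&lt;/p&gt;

```typescript
import { App, Stack } from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { ApiGatewayToLambda } from '@aws-solutions-constructs/aws-apigateway-lambda';

const app = new App();
const stack = new Stack(app, 'PatternStack');

// One pattern wires an API Gateway REST API in front of a Lambda function,
// with logging and IAM permissions following AWS well-architected defaults.
new ApiGatewayToLambda(stack, 'ApiToLambda', {
  lambdaFunctionProps: {
    runtime: lambda.Runtime.NODEJS_18_X,
    handler: 'index.handler',
    code: lambda.Code.fromAsset('lambda'), // illustrative asset directory
  },
});
```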

&lt;h3&gt;
  
  
  Ecosystem Expansion
&lt;/h3&gt;

&lt;p&gt;The CDK is a great choice for development on AWS, with particular strengths for designing serverless architectures. However, you might think it cannot target other public clouds, or directly build an on-premises or IaaS-focused Kubernetes cluster. To address this, there are projects, run together with other platforms and the CNCF, that bring the advantages of the CDK to other platforms and let you design Kubernetes workloads with CDK code.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://github.com/hashicorp/terraform-cdk" rel="noopener noreferrer"&gt;Terraform-cdk&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;HashiCorp has started supporting the CDK as a tool for defining and provisioning infrastructure with Terraform.&lt;br&gt;
[Terraform Providers]&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fufzf8866af5eb9k7w417.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fufzf8866af5eb9k7w417.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
(&lt;a href="https://www.hashicorp.com/blog/cdk-for-terraform-enabling-python-and-typescript-support/" rel="noopener noreferrer"&gt;https://www.hashicorp.com/blog/cdk-for-terraform-enabling-python-and-typescript-support/&lt;/a&gt;)&lt;br&gt;
(&lt;a href="https://www.hashicorp.com/blog/announcing-cdk-for-terraform-0-1" rel="noopener noreferrer"&gt;https://www.hashicorp.com/blog/announcing-cdk-for-terraform-0-1&lt;/a&gt;)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://cdk8s.io/" rel="noopener noreferrer"&gt;CDK8s&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You can use CDK for Kubernetes (CDK8s) to define and manage Kubernetes applications. CDK8s is currently registered as &lt;a href="https://www.cncf.io/sandbox-projects/" rel="noopener noreferrer"&gt;CNCF's Sandbox Project&lt;/a&gt;.&lt;br&gt;
[Workflow of CDK8s]&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jzk5sm3h1nta22s9jqh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jzk5sm3h1nta22s9jqh.png" alt=" " width="800" height="201"&gt;&lt;/a&gt;&lt;br&gt;
(&lt;a href="https://aws.amazon.com/ko/blogs/korea/using-cdk8s-for-kubernetes-applications/" rel="noopener noreferrer"&gt;https://aws.amazon.com/ko/blogs/korea/using-cdk8s-for-kubernetes-applications/&lt;/a&gt;)&lt;br&gt;
(&lt;a href="https://aws.amazon.com/blogs/containers/introducing-cdk-for-kubernetes/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/containers/introducing-cdk-for-kubernetes/&lt;/a&gt;)&lt;br&gt;
&lt;br&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  Ending
&lt;/h1&gt;

&lt;p&gt;In this article, I introduced the CDK and briefly explained how development proceeds. In the next article, I plan to explain in more detail how to carry out CDK development.  My team is developing a system that provides Vision AI-related APIs using the AWS CDK, so I will be able to share related tips on this blog as development progresses.  &lt;/p&gt;

&lt;p&gt;Thank you.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cdk</category>
      <category>iac</category>
    </item>
    <item>
      <title>Applying for the AWS Community Builder</title>
      <dc:creator>Yeonggyoo Jeon</dc:creator>
      <pubDate>Tue, 25 Oct 2022 07:31:57 +0000</pubDate>
      <link>https://dev.to/aws-builders/applying-for-the-aws-community-builder-48ha</link>
      <guid>https://dev.to/aws-builders/applying-for-the-aws-community-builder-48ha</guid>
      <description>&lt;p&gt;AWS Community Builder is a DevRel(Developer Relation) program operated by AWS.  AWS operates those two DevRel programs each the AWS community builder and the AWS hero.&lt;/p&gt;

&lt;h1&gt;
  
  
  1. What is the AWS Community Builders program?
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce2rqdnwpa52pnbgwknm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce2rqdnwpa52pnbgwknm.png" alt=" " width="800" height="153"&gt;&lt;/a&gt;&lt;br&gt;
Official webpage : &lt;a href="https://aws.amazon.com/developer/community/community-builders/" rel="noopener noreferrer"&gt;https://aws.amazon.com/developer/community/community-builders/&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;One of the DevRel programs officially operated by AWS.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provides technical resources, support, and opportunities for networking.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Shares technical knowledge about AWS and connects global tech communities.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The program is operated in the following technical categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Containers&lt;/li&gt;
&lt;li&gt;Data (Databases, Analytics and BI)&lt;/li&gt;
&lt;li&gt;Developer Tools&lt;/li&gt;
&lt;li&gt;Front-End Web and Mobile&lt;/li&gt;
&lt;li&gt;Game Tech&lt;/li&gt;
&lt;li&gt;Graviton/Arm Development&lt;/li&gt;
&lt;li&gt;Cloud Ops&lt;/li&gt;
&lt;li&gt;Machine Learning&lt;/li&gt;
&lt;li&gt;Network Content &amp;amp; Delivery&lt;/li&gt;
&lt;li&gt;Security &amp;amp; Identity&lt;/li&gt;
&lt;li&gt;Serverless&lt;/li&gt;
&lt;li&gt;Storage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Although you apply under a single category, there are no restrictions on activities across categories.&lt;/p&gt;

&lt;h1&gt;
  
  
  2. What benefits are provided?
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fje8hacboos66jneyqi0z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fje8hacboos66jneyqi0z.png" alt=" " width="800" height="216"&gt;&lt;/a&gt;&lt;br&gt;
If you participate as a Community Builder, you will be invited to a Slack channel that brings together the program operators and builders.  The benefits introduced there are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Connecting with others&lt;/li&gt;
&lt;li&gt;Webinars &amp;amp; Briefings&lt;/li&gt;
&lt;li&gt;Certification Exam Vouchers&lt;/li&gt;
&lt;li&gt;Swag Welcome Kit&lt;/li&gt;
&lt;li&gt;AWS Credits&lt;/li&gt;
&lt;li&gt;A Free One-Year Subscription to Cloud Academy&lt;/li&gt;
&lt;li&gt;Third-Party ISV Offers&lt;/li&gt;
&lt;li&gt;Publishing Content on Dev.to&lt;/li&gt;
&lt;li&gt;Super sweet logos, Wallpaper &amp;amp; Assets&lt;/li&gt;
&lt;li&gt;re:Invent discount passes&lt;/li&gt;
&lt;li&gt;Service beta opportunities&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Welcome Kit&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4csurpsuzsrkjze0c051.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4csurpsuzsrkjze0c051.jpeg" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  3. Ending
&lt;/h1&gt;

&lt;p&gt;It is a great opportunity for anyone who is interested in AWS technology and enjoys sharing knowledge through a community or blog.  Applications are accepted twice a year; the term of activity is one year and can be extended.  I hope that many people around the world will share their know-how and knowledge of AWS-related technologies through this community.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
