<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Akash Hardia</title>
    <description>The latest articles on DEV Community by Akash Hardia (@akashhardia).</description>
    <link>https://dev.to/akashhardia</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2982661%2F68df8609-45e8-4537-a0ea-c2889fd84f86.jpeg</url>
      <title>DEV Community: Akash Hardia</title>
      <link>https://dev.to/akashhardia</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/akashhardia"/>
    <language>en</language>
    <item>
      <title>Improve container boot time by lazy loading with SOCI</title>
      <dc:creator>Akash Hardia</dc:creator>
      <pubDate>Sat, 12 Apr 2025 12:53:39 +0000</pubDate>
      <link>https://dev.to/aws-builders/improve-container-boot-time-by-lazy-loading-with-soci-2b9d</link>
      <guid>https://dev.to/aws-builders/improve-container-boot-time-by-lazy-loading-with-soci-2b9d</guid>
      <description>&lt;p&gt;Managing a big sale event is not simple, and autoscaling alone is not enough: scaling out is slow, mostly because new containers take time to boot up before they can process requests. That is the problem we are solving today, and it is where a recent offering from AWS, SOCI (Seekable OCI), can help us!&lt;/p&gt;

&lt;p&gt;Originally published &lt;a href="https://medium.com/@akashhardia/improve-container-boot-time-on-ecs-fargate-by-lazy-loading-with-soci-7146d1e7e7dc" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Booting Containers — before SOCI&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To speed up container boot time, let’s first look at how ECS provisions a task.&lt;/p&gt;

&lt;p&gt;When the AWS ECS agent provisions a new ECS task on a host, whether because of auto-scaling or scheduling, the snapshotter first has to pull the container image onto the task. Only once the image is fully downloaded &amp;amp; decompressed is the container started.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fao9288az51wti8xr53wj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fao9288az51wti8xr53wj.png" alt="ECS Agent provisioning a task which pulls image from ECR &amp;amp; starts container — traditional way" width="800" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Factors affecting container boot-time&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;From this flow, two key points stand out:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Image size&lt;/em&gt; directly impacts the &lt;em&gt;time taken to boot the containers.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Image has to be &lt;strong&gt;first downloaded &amp;amp; decompressed&lt;/strong&gt; before the container can be started — unnecessary blocking.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Point 1, image size, depends on application requirements and can be reduced significantly with optimization techniques such as multi-stage builds. But even a slimmed-down image (which can still be large) has to be downloaded in full every time before a container can start serving requests.&lt;/p&gt;

&lt;p&gt;This matters most during a sale, when a sudden spike in traffic makes ECS double the containers (200 -&amp;gt; 400). While the new containers boot, requests queue up &amp;amp; that’s not an experience we want for our customers. What’s worse, we end up overprovisioning our resources as a safety net for this.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Lazy Loading with SOCI - for faster boot-times&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;So our focus today is on point 2. Instead of waiting for the whole image to download &amp;amp; decompress first, we pull only some files of the image from the registry &amp;amp; start the container with those, while the rest is lazy loaded in the background. With this non-blocking approach, the container starts much earlier than it otherwise would, as the figure shows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnve1blnc4s6i8fvvwhy3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnve1blnc4s6i8fvvwhy3.png" alt="load the container image asynchronously" width="800" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Challenges for lazy loading&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;But wait…&lt;/p&gt;

&lt;p&gt;We don’t know &lt;strong&gt;which file is present in which layer&lt;/strong&gt; of the image&lt;/p&gt;

&lt;p&gt;&amp;amp; &lt;strong&gt;how do we pull only some files&lt;/strong&gt;? 🤔&lt;/p&gt;

&lt;p&gt;The latter is taken care of by ECR, since it’s an &lt;a href="https://aws.amazon.com/blogs/containers/oci-artifact-support-in-amazon-ecr/" rel="noopener noreferrer"&gt;OCI-compatible&lt;/a&gt; registry (OCI is an initiative to standardize image format, storage &amp;amp; distribution across vendors). The former requires some additional steps in our docker build process.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;A Docker Image&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A Docker image is composed of multiple layers tied together by a manifest file. Each layer is a tarball that stores its files in different compressed sections (spans). Different images often share common layers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyesrexle7xmh5jiyjl07.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyesrexle7xmh5jiyjl07.png" alt="representation of layers for docker image 1 &amp;amp; 2" width="800" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Solution&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To load the files asynchronously, we need a way to know which file is present in which layer &amp;amp; particularly which span/section of that layer tarball.&lt;/p&gt;

&lt;p&gt;For this, we create a Table of Contents index file (a zTOC) for every layer, recording every file in that layer &amp;amp; its location (offset) in the tarball. We also create a SOCI index that maps the image’s layers to their respective zTOCs. A zTOC is much like the table of contents at the beginning of a book, before the chapters come in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2lpivh7k45xfk8911dhn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2lpivh7k45xfk8911dhn.png" alt="SOCI index &amp;amp; zTOCs" width="800" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the SOCI index &amp;amp; the zTOCs, we know exactly where a particular file can be found inside an image.&lt;/p&gt;
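
&lt;p&gt;To make the idea concrete, here is a minimal Python sketch of a per-layer table of contents. It is an illustration only, not SOCI’s real zTOC format: it uses an uncompressed, in-memory tarball (real layers are compressed, which is what the spans deal with), and the helper names build_layer, build_toc and read_file are hypothetical. The slice in read_file stands in for fetching only that byte range from the registry.&lt;/p&gt;

```python
import io
import tarfile


def build_layer(files):
    """Build an uncompressed, in-memory tar archive from a {name: bytes} dict."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def build_toc(layer_bytes):
    """Map each file name to (offset of its data in the tarball, size in bytes)."""
    toc = {}
    with tarfile.open(fileobj=io.BytesIO(layer_bytes), mode="r:") as tar:
        for member in tar:
            toc[member.name] = (member.offset_data, member.size)
    return toc


def read_file(layer_bytes, toc, name):
    """Fetch one file by slicing only the byte range recorded in the TOC."""
    offset, size = toc[name]
    return layer_bytes[offset:offset + size]


# Hypothetical layer content, used only for this illustration.
layer = build_layer({"app/main.py": b"print('hi')", "app/config.json": b"{}"})
toc = build_toc(layer)
print(toc)
print(read_file(layer, toc, "app/config.json"))
```

&lt;p&gt;Once the offsets are known, a single file can be read without walking through the rest of the archive, which is the property lazy loading relies on.&lt;/p&gt;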

&lt;h2&gt;
  
  
  &lt;strong&gt;Integrate SOCI&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Don’t worry, there is a tool that generates the indexes for us. It can be just one more step in your typical docker build workflow, after the image is built &amp;amp; pushed to ECR. We do two things here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk3d9b6i5rzvpvq04btd3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk3d9b6i5rzvpvq04btd3.png" alt="basic pipeline to deploy container" width="800" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Pull the image &amp;amp; create the indices for it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Push the generated SOCI artifacts back to ECR.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The script below does both. It downloads the SOCI CLI, pulls the image from ECR, generates the artifacts &amp;amp; finally pushes them back to the image’s registry. Note that soci doesn’t work with images held by the Docker runtime, which is why we pull the image with ctr so that it’s available to containerd.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wget https://github.com/awslabs/soci-snapshotter/releases/download/v0.3.0/soci-snapshotter-0.3.0-linux-amd64.tar.gz
sudo tar -C . -xvf soci-snapshotter-0.3.0-linux-amd64.tar.gz soci

sudo ctr i pull --user xyz:password &amp;lt;image-uri&amp;gt;:latest
sudo ctr i ls
sudo ./soci create &amp;lt;image-uri&amp;gt;:latest
sudo ./soci index list
sudo ./soci push --user xyz:password &amp;lt;image-uri&amp;gt;:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffwl8jgtnym7numxuinan.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffwl8jgtnym7numxuinan.png" alt="ECR image with SOCI artifacts (not visible to the user)" width="800" height="362"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;SOCI in action (figure below)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;ECS Fargate&lt;/strong&gt; has a special &lt;strong&gt;SOCI snapshotter&lt;/strong&gt; for pulling docker images, which works with OCI-compatible registries like ECR. If it detects a SOCI index when pulling an image from ECR, it immediately pulls in full every layer for which no zTOC was created &amp;amp; automatically lazy loads the rest of the image.&lt;/p&gt;

&lt;p&gt;The best part: if it doesn’t detect the SOCI artifacts, it falls back to the traditional way of loading the image (download the image completely, then start the container).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsh3c4n2fwwsa5yzqjgfr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsh3c4n2fwwsa5yzqjgfr.png" alt="Faster booting containers with SOCI lazy loading" width="800" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To verify that the container was started using the SOCI artifacts, query the task metadata endpoint from inside the container; the snapshotter value should be set to “soci”:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wget "${ECS_CONTAINER_METADATA_URI_V4}/task"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
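
&lt;p&gt;If you would rather check this programmatically, the sketch below parses a task metadata response and reports the snapshotter per container. The sample JSON is hypothetical and heavily trimmed, and the “Snapshotter” field name is an assumption to verify against your own endpoint output; snapshotters and lazy_loaded are illustrative helper names.&lt;/p&gt;

```python
import json

# Hypothetical, trimmed-down response from "${ECS_CONTAINER_METADATA_URI_V4}/task".
# The "Snapshotter" field name is an assumption; check it against your own output.
SAMPLE_RESPONSE = """
{
  "Containers": [
    {"Name": "web", "Snapshotter": "soci"},
    {"Name": "sidecar", "Snapshotter": "overlayfs"}
  ]
}
"""


def snapshotters(task_metadata):
    """Return {container name: snapshotter}, using "unknown" when the field is absent."""
    return {
        container.get("Name", "unnamed"): container.get("Snapshotter", "unknown")
        for container in task_metadata.get("Containers", [])
    }


def lazy_loaded(task_metadata):
    """List the containers whose snapshotter is reported as "soci"."""
    return [name for name, value in snapshotters(task_metadata).items() if value == "soci"]


metadata = json.loads(SAMPLE_RESPONSE)
print(snapshotters(metadata))
print(lazy_loaded(metadata))
```

&lt;p&gt;Inside a real task, the same functions can be fed the JSON returned by the endpoint instead of the sample string.&lt;/p&gt;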



&lt;h2&gt;
  
  
  &lt;strong&gt;Improved Scaling&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To measure the improvement, we can simply look at the difference between a task’s creation time &amp;amp; its start time. A similar observation can be made at the ALB level, by comparing how long a container takes to be considered healthy before &amp;amp; after SOCI.&lt;/p&gt;

&lt;p&gt;In my tests, boot time dropped by &lt;strong&gt;30%&lt;/strong&gt;, which is a really good number &amp;amp; can result in quicker scaling during sales as well. Your improvement may differ depending on your workload, and it also depends on the number of layers you create zTOCs for.&lt;/p&gt;
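
&lt;p&gt;As a rough sketch of that measurement, the snippet below computes the boot time from a task’s creation and start timestamps and the percentage reduction between two runs. The timestamps are hypothetical and assumed to be ISO 8601 strings (for example, the createdAt and startedAt values of a task, converted if needed); boot_seconds and reduction_percent are illustrative names.&lt;/p&gt;

```python
from datetime import datetime


def boot_seconds(created_at, started_at):
    """Seconds between task creation and task start, from ISO 8601 timestamps."""
    created = datetime.fromisoformat(created_at)
    started = datetime.fromisoformat(started_at)
    return (started - created).total_seconds()


def reduction_percent(before_seconds, after_seconds):
    """Percentage by which boot time dropped relative to the baseline."""
    return 100 * (before_seconds - after_seconds) / before_seconds


# Hypothetical timestamps for one task without SOCI and one with it.
before = boot_seconds("2025-04-12T10:00:00+00:00", "2025-04-12T10:00:50+00:00")
after = boot_seconds("2025-04-12T11:00:00+00:00", "2025-04-12T11:00:35+00:00")
print(before, after, reduction_percent(before, after))
```

&lt;p&gt;Averaging over several tasks on each side gives a steadier number than a single pair, since pull times vary from launch to launch.&lt;/p&gt;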

&lt;h2&gt;
  
  
  &lt;strong&gt;Few Limitations&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;SOCI is a recent introduction by AWS &amp;amp; is only available on Fargate as of now, so community support is limited. In fact, during my first iteration, I hit a roadblock that turned out to be a problem on AWS’s end, &amp;amp; support gave an ETA of a month for the fix. Apart from this, there are a few things to keep in mind:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;You need an image larger than 250 MB to see a noticeable difference.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The improvement you see is also dependent on the number of layers you create zTOCs for. Yes, that’s configurable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It doesn’t support ARM architecture for Fargate Spot provider as of now.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It only works for tasks running on Linux platform version 1.4.0.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It’s not for those using zstd compression with docker.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Service Connect support might be a hit or miss.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Although there’s no direct cost involved with using SOCI, we’re pushing extra artifacts to ECR continuously &amp;amp; for that, we’ll have to pay for the ECR storage cost eventually.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;…&lt;/p&gt;

</description>
      <category>ecs</category>
      <category>docker</category>
      <category>soci</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
