<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pavan Madduri</title>
    <description>The latest articles on DEV Community by Pavan Madduri (@pavan_madduri).</description>
    <link>https://dev.to/pavan_madduri</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2932890%2F3edefe3a-d10f-4ebb-a50a-6d20040e2812.png</url>
      <title>DEV Community: Pavan Madduri</title>
      <link>https://dev.to/pavan_madduri</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pavan_madduri"/>
    <language>en</language>
    <item>
      <title>From Docker Compose on My Laptop to OKE in Production — Same App, Zero Rewrites</title>
      <dc:creator>Pavan Madduri</dc:creator>
      <pubDate>Mon, 11 May 2026 17:06:09 +0000</pubDate>
      <link>https://dev.to/pavan_madduri/from-docker-compose-on-my-laptop-to-oke-in-production-same-app-zero-rewrites-m8p</link>
      <guid>https://dev.to/pavan_madduri/from-docker-compose-on-my-laptop-to-oke-in-production-same-app-zero-rewrites-m8p</guid>
      <description>&lt;p&gt;I have a rule: if I can't run the full stack on my laptop with &lt;code&gt;docker compose up&lt;/code&gt;, the architecture is too complicated.&lt;/p&gt;

&lt;p&gt;But then you need to deploy to production, and suddenly you're rewriting everything as Kubernetes manifests. The Compose file that worked on your machine is useless. Config lives in two places and they drift apart.&lt;/p&gt;

&lt;p&gt;Here's the workflow I settled on after trying a bunch of things that didn't work well.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Local Stack
&lt;/h2&gt;

&lt;p&gt;Standard web app — API, Redis, Postgres. Three services.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose.yml&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;api&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DATABASE_URL=postgres://app:secret@db:5432/myapp&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;REDIS_URL=redis://cache:6379&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service_healthy&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;curl"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-f"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8080/health"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;

  &lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres:16-alpine&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_USER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;secret&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_DB&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;myapp&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;pgdata:/var/lib/postgresql/data&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pg_isready"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-U"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

  &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis:7-alpine&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pgdata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;docker compose up&lt;/code&gt; and the full stack is running. No external dependencies, fast iteration.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Tried and Abandoned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Kompose&lt;/strong&gt; — &lt;code&gt;kompose convert&lt;/code&gt; technically works, but the output is ugly: tons of annotations, weird formatting, and so much cleanup needed that I might as well write the YAML by hand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker Compose on Kubernetes&lt;/strong&gt; — Various tools that try to run Compose files directly on K8s. They all add complexity and break in subtle ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trying to share one config&lt;/strong&gt; — I wasted a weekend trying to make the same file work for both. Local dev and production have genuinely different requirements. Pretending otherwise creates worse problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Works: Convention Over Tooling
&lt;/h2&gt;

&lt;p&gt;I keep two config sets, aligned by convention:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;project/
├── docker-compose.yml          # Local dev
├── Dockerfile
├── k8s/
│   ├── base/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   └── kustomization.yaml
│   └── overlays/
│       ├── staging/
│       └── production/
└── Makefile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same image names, same env var names, same port numbers in both places. When I change a port in Compose, I grep for it in &lt;code&gt;k8s/&lt;/code&gt; and update it. Manual, but nothing breaks silently.&lt;/p&gt;

&lt;p&gt;The key differences between local and OKE:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Database&lt;/strong&gt; — Container locally, OCI managed service in production. I don't run databases on K8s.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secrets&lt;/strong&gt; — Plain text in Compose, OCI Vault via External Secrets Operator on OKE.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scaling&lt;/strong&gt; — One replica locally, HPA on OKE (sketch below).&lt;/li&gt;
&lt;/ul&gt;
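
&lt;p&gt;That scaling piece is smaller than it sounds. On OKE it can start as a single &lt;code&gt;kubectl autoscale&lt;/code&gt; command, or the equivalent HPA manifest in the production overlay. A minimal sketch, assuming the Deployment is named &lt;code&gt;api&lt;/code&gt; as in the next section and that metrics-server is installed on the cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Create an HPA for the api Deployment in the production namespace
kubectl -n production autoscale deployment api --cpu-percent=70 --min=2 --max=6

# Watch what the autoscaler is doing
kubectl -n production get hpa api
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;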

&lt;h2&gt;
  
  
  The K8s Side
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# k8s/base/deployment.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api&lt;/span&gt;
          &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;iad.ocir.io/mytenancy/myapp:latest&lt;/span&gt;
          &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
          &lt;span class="na"&gt;envFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-secrets&lt;/span&gt;
          &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/health&lt;/span&gt;
              &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
            &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Kustomize overlays handle the per-environment differences:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# k8s/overlays/production/kustomization.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kustomize.config.k8s.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Kustomization&lt;/span&gt;
&lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
&lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;../../base&lt;/span&gt;
&lt;span class="na"&gt;images&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;iad.ocir.io/mytenancy/myapp&lt;/span&gt;
    &lt;span class="na"&gt;newTag&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1.2.3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
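
&lt;p&gt;One way to sanity-check an overlay before applying it is to render it locally and diff it against the live cluster. Both commands are stock kubectl, nothing extra to install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Render the production overlay without touching the cluster
kubectl kustomize k8s/overlays/production

# Show what would change if it were applied
kubectl diff -k k8s/overlays/production
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;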



&lt;h2&gt;
  
  
  The Alignment Check I Actually Use
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight make"&gt;&lt;code&gt;&lt;span class="c"&gt;# Makefile
&lt;/span&gt;&lt;span class="nl"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    docker compose up &lt;span class="nt"&gt;--build&lt;/span&gt;

&lt;span class="nl"&gt;build&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    docker build &lt;span class="nt"&gt;-t&lt;/span&gt; iad.ocir.io/&lt;span class="p"&gt;$(&lt;/span&gt;TENANCY&lt;span class="p"&gt;)&lt;/span&gt;/myapp:&lt;span class="p"&gt;$(&lt;/span&gt;TAG&lt;span class="p"&gt;)&lt;/span&gt; .
    docker push iad.ocir.io/&lt;span class="p"&gt;$(&lt;/span&gt;TENANCY&lt;span class="p"&gt;)&lt;/span&gt;/myapp:&lt;span class="p"&gt;$(&lt;/span&gt;TAG&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nl"&gt;deploy-staging&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    kubectl apply &lt;span class="nt"&gt;-k&lt;/span&gt; k8s/overlays/staging

&lt;span class="nl"&gt;check-alignment&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Compose ports ==="&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-A1&lt;/span&gt; &lt;span class="s2"&gt;"ports:"&lt;/span&gt; docker-compose.yml
    &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== K8s ports ==="&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"containerPort"&lt;/span&gt; k8s/base/deployment.yaml
    &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Compose health ==="&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"test:"&lt;/span&gt; docker-compose.yml
    &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== K8s health ==="&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"path:"&lt;/span&gt; k8s/base/deployment.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;make check-alignment&lt;/code&gt; is dumb but it catches drift. I run it before every deploy. It's saved me twice already from deploying with mismatched health check paths.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Take
&lt;/h2&gt;

&lt;p&gt;This isn't elegant. I'd love a single config file that works everywhere. But every tool I tried to achieve that added more complexity than it removed.&lt;/p&gt;

&lt;p&gt;The current setup is boring and it works. Compose for local, Kustomize for OKE, same Docker image, same env var names, a Makefile to keep me honest. I understand every piece of it, and when something breaks at 2am, that matters more than elegance.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Pavan Madduri — Oracle ACE Associate, CNCF Golden Kubestronaut. &lt;a href="https://github.com/pmady" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://linkedin.com/in/pavanmadduri" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://pmady.github.io/" rel="noopener noreferrer"&gt;Website&lt;/a&gt; | &lt;a href="https://scholar.google.com/citations?view_op=list_works&amp;amp;hl=en&amp;amp;user=au0O-8oAAAAJ" rel="noopener noreferrer"&gt;Google Scholar&lt;/a&gt; | &lt;a href="https://www.researchgate.net/profile/Pavan-Madduri-2?ev=hdr_xprf" rel="noopener noreferrer"&gt;ResearchGate&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>docker</category>
      <category>oci</category>
      <category>kubernetes</category>
      <category>devops</category>
    </item>
    <item>
      <title>I Cut My Container Image Costs 60% by Building Multi-Arch Docker Images on OCI ARM</title>
      <dc:creator>Pavan Madduri</dc:creator>
      <pubDate>Fri, 08 May 2026 19:40:30 +0000</pubDate>
      <link>https://dev.to/pavan_madduri/i-cut-my-container-image-costs-60-by-building-multi-arch-docker-images-on-oci-arm-5gm4</link>
      <guid>https://dev.to/pavan_madduri/i-cut-my-container-image-costs-60-by-building-multi-arch-docker-images-on-oci-arm-5gm4</guid>
      <description>&lt;p&gt;I was running all my containers on AMD64 shapes because that's what I'd always done. x86, Intel/AMD, the default. Then I looked at my OCI bill and realized I was paying $0.064/OCPU/hr for AMD64 when ARM shapes cost $0.010/OCPU/hr. Six times cheaper for the same work.&lt;/p&gt;

&lt;p&gt;The catch? My Docker images were all built for AMD64. They wouldn't run on ARM nodes. I had to figure out multi-arch builds.&lt;/p&gt;

&lt;p&gt;It took me an afternoon to get right, and now every image I build supports both architectures. Here's what I learned.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why ARM on OCI Is Different From ARM Everywhere Else
&lt;/h2&gt;

&lt;p&gt;AWS has Graviton. GCP has Tau T2A. Azure has Ampere Altra. They're all ARM, and they're all cheaper than their x86 equivalents.&lt;/p&gt;

&lt;p&gt;But OCI's pricing gap is the widest I've seen:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;Shape&lt;/th&gt;
&lt;th&gt;$/OCPU/hr&lt;/th&gt;
&lt;th&gt;4 OCPU + 24GB monthly&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ARM&lt;/td&gt;
&lt;td&gt;VM.Standard.A1.Flex&lt;/td&gt;
&lt;td&gt;$0.010&lt;/td&gt;
&lt;td&gt;~$29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AMD64&lt;/td&gt;
&lt;td&gt;VM.Standard.E4.Flex&lt;/td&gt;
&lt;td&gt;$0.064&lt;/td&gt;
&lt;td&gt;~$184&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;And the Always Free tier gives you 4 ARM OCPUs and 24GB RAM forever. There's nothing comparable on the x86 side.&lt;/p&gt;

&lt;p&gt;The problem is that most Docker images on Docker Hub are x86 only, and if you've been building images without thinking about architecture, yours probably are too.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Actually Did the Multi-Arch Build
&lt;/h2&gt;

&lt;p&gt;Docker Buildx makes this surprisingly painless. The first time I tried it I expected hours of yak-shaving. It took about 20 minutes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a builder that supports multiple platforms&lt;/span&gt;
docker buildx create &lt;span class="nt"&gt;--name&lt;/span&gt; multiarch &lt;span class="nt"&gt;--driver&lt;/span&gt; docker-container &lt;span class="nt"&gt;--use&lt;/span&gt;

&lt;span class="c"&gt;# Build and push for both architectures&lt;/span&gt;
docker buildx build &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--platform&lt;/span&gt; linux/amd64,linux/arm64 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-t&lt;/span&gt; iad.ocir.io/mytenancy/myapp:v1.2.0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--push&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the core of it. Buildx uses QEMU emulation to build the ARM image on your x86 machine (or vice versa). The &lt;code&gt;--push&lt;/code&gt; flag creates a manifest list in the registry that points to both architecture-specific images. When an ARM node pulls the image, it gets the ARM version automatically.&lt;/p&gt;
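
&lt;p&gt;If you want to confirm the manifest list really contains both variants before a cluster pulls it, Buildx can inspect it directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Should list both linux/amd64 and linux/arm64 entries
docker buildx imagetools inspect iad.ocir.io/mytenancy/myapp:v1.2.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;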

&lt;h3&gt;
  
  
  The Gotchas I Hit
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;QEMU is slow.&lt;/strong&gt; Building a Go binary under QEMU emulation took about 4x longer than a native build. For Go, I got around this with the language's built-in cross-compilation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;--platform=$BUILDPLATFORM golang:1.22-alpine&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; TARGETARCH&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nv"&gt;GOARCH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$TARGETARCH&lt;/span&gt; &lt;span class="nv"&gt;CGO_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0 go build &lt;span class="nt"&gt;-o&lt;/span&gt; server .

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; alpine:3.20&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /app/server /server&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["/server"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--platform=$BUILDPLATFORM&lt;/code&gt; runs the build stage on your host architecture (fast), and &lt;code&gt;GOARCH=$TARGETARCH&lt;/code&gt; tells Go to cross-compile for the target. No QEMU needed for the slow compilation step. Build went from 3 minutes back down to 40 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python images need attention.&lt;/strong&gt; Some pip packages have pre-built wheels for x86 but not ARM. When that happens, pip tries to compile from source inside the container, which needs gcc and build headers you might not have in your image. I hit this with &lt;code&gt;numpy&lt;/code&gt; on an older version. Pinning to a version with ARM wheels fixed it.&lt;/p&gt;
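
&lt;p&gt;A cheap way to catch the missing-wheel problem before a slow emulated build is to ask pip for an ARM wheel directly. If this fails, the package will end up compiling from source on ARM (the pinned version here is just an example):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Try to fetch a prebuilt aarch64 wheel without building anything
pip download numpy==1.26.4 \
  --only-binary=:all: \
  --platform manylinux2014_aarch64 \
  --python-version 3.11 \
  -d /tmp/arm-wheels
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;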

&lt;p&gt;&lt;strong&gt;Alpine vs Debian base images.&lt;/strong&gt; Alpine uses musl libc, not glibc. Some binaries compiled for ARM + glibc won't work on Alpine. If you're getting weird segfaults on ARM, try switching to &lt;code&gt;debian:bookworm-slim&lt;/code&gt; as your base and see if it goes away.&lt;/p&gt;

&lt;h2&gt;
  
  
  CI Pipeline for Multi-Arch
&lt;/h2&gt;

&lt;p&gt;I have this in GitHub Actions. It builds for both architectures and pushes to OCIR:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build Multi-Arch&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Set up QEMU&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/setup-qemu-action@v3&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Set up Docker Buildx&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/setup-buildx-action@v3&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Login to OCIR&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/login-action@v3&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;registry&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;iad.ocir.io&lt;/span&gt;
          &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.OCIR_USERNAME }}&lt;/span&gt;
          &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.OCIR_TOKEN }}&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build and push&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/build-push-action@v5&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
          &lt;span class="na"&gt;platforms&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;linux/amd64,linux/arm64&lt;/span&gt;
          &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;iad.ocir.io/${{ secrets.OCIR_TENANCY }}/myapp:${{ github.sha }}&lt;/span&gt;
          &lt;span class="na"&gt;cache-from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;type=gha&lt;/span&gt;
          &lt;span class="na"&gt;cache-to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;type=gha,mode=max&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;cache-from: type=gha&lt;/code&gt; uses GitHub's cache for layer caching, which makes subsequent builds much faster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying on OKE with Mixed Node Pools
&lt;/h2&gt;

&lt;p&gt;On OKE I run two node pools — one ARM, one x86. Most workloads go to ARM because it's cheaper. GPU workloads stay on x86 in my clusters, since the NVIDIA GPU shapes I use are x86-based.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# ARM node pool (cheap, general workloads)&lt;/span&gt;
oci ce node-pool create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; arm-workers &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--node-shape&lt;/span&gt; VM.Standard.A1.Flex &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--node-shape-config&lt;/span&gt; &lt;span class="s1"&gt;'{"ocpus": 4, "memoryInGBs": 24}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  ...

&lt;span class="c"&gt;# AMD64 node pool (GPU workloads, x86-only dependencies)&lt;/span&gt;
oci ce node-pool create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; x86-workers &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--node-shape&lt;/span&gt; VM.Standard.E4.Flex &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--node-shape-config&lt;/span&gt; &lt;span class="s1"&gt;'{"ocpus": 4, "memoryInGBs": 32}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Kubernetes handles the scheduling automatically. When a multi-arch image gets pulled, each node gets the right architecture variant. I didn't have to add any node selectors or affinity rules for most workloads — the manifest list takes care of it.&lt;/p&gt;
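
&lt;p&gt;To see how pods spread across the two pools, listing nodes with their architecture label is enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# kubernetes.io/arch is set by the kubelet on every node
kubectl get nodes -L kubernetes.io/arch

# Which pods landed where
kubectl get pods -o wide
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;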

&lt;p&gt;For workloads that must run on a specific architecture (like anything that needs NVIDIA GPUs), I use a node selector:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;nodeSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;kubernetes.io/arch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;amd64&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Actual Savings
&lt;/h2&gt;

&lt;p&gt;I moved 6 microservices from x86 to ARM over two weeks. These are Go and Python services — nothing exotic. All of them worked on ARM without code changes. The Docker images needed rebuilding with Buildx, but the Dockerfiles didn't change.&lt;/p&gt;

&lt;p&gt;Monthly compute cost went from ~$184 to ~$29 for the same 4 OCPU / 24GB configuration per service. Across 6 services, that's about $930/month saved. Not life-changing for a company, but for my side projects and dev environments? That's real money.&lt;/p&gt;

&lt;h2&gt;
  
  
  When ARM Doesn't Work
&lt;/h2&gt;

&lt;p&gt;Not everything runs on ARM. In my experience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NVIDIA GPU workloads&lt;/strong&gt; — x86 only for now&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legacy binaries&lt;/strong&gt; — anything compiled for x86 without source code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Some Java native libraries&lt;/strong&gt; — JNI libraries that ship x86 .so files only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Electron / desktop tools&lt;/strong&gt; — not relevant for server containers but worth mentioning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything else — Go, Python, Node, Rust, Java (pure), Ruby — works fine on ARM. The ecosystem has matured a lot in the last two years.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;If you're on OCI and not using ARM shapes, you're leaving money on the table. Start with one service. Build it multi-arch with Buildx. Deploy it on an A1.Flex node. Compare the bill.&lt;/p&gt;

&lt;p&gt;The Docker workflow doesn't change. &lt;code&gt;docker build&lt;/code&gt;, &lt;code&gt;docker push&lt;/code&gt;, &lt;code&gt;kubectl apply&lt;/code&gt;. The only difference is adding &lt;code&gt;--platform linux/amd64,linux/arm64&lt;/code&gt; to your build command.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Pavan Madduri — Oracle ACE Associate, CNCF Golden Kubestronaut. I write about containers, Kubernetes, and GPU infrastructure. &lt;a href="https://github.com/pmady" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://linkedin.com/in/pavanmadduri" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://pmady.github.io/" rel="noopener noreferrer"&gt;Website&lt;/a&gt; | &lt;a href="https://scholar.google.com/citations?view_op=list_works&amp;amp;hl=en&amp;amp;user=au0O-8oAAAAJ" rel="noopener noreferrer"&gt;Google Scholar&lt;/a&gt; | &lt;a href="https://www.researchgate.net/profile/Pavan-Madduri-2?ev=hdr_xprf" rel="noopener noreferrer"&gt;ResearchGate&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>oke</category>
      <category>docker</category>
      <category>oci</category>
      <category>containers</category>
    </item>
    <item>
      <title>The Zero-Trust Docker Pipeline: Securing GPU/AI Container Images from Build to Production</title>
      <dc:creator>Pavan Madduri</dc:creator>
      <pubDate>Fri, 08 May 2026 15:23:20 +0000</pubDate>
      <link>https://dev.to/pavan_madduri/the-zero-trust-docker-pipeline-securing-gpuai-container-images-from-build-to-production-50g2</link>
      <guid>https://dev.to/pavan_madduri/the-zero-trust-docker-pipeline-securing-gpuai-container-images-from-build-to-production-50g2</guid>
      <description>&lt;p&gt;GPU container images are the softest target in your infrastructure. A typical vLLM image is 15GB with hundreds of packages, a CUDA runtime, Python dependencies, and model weights. Most teams build these images once, push them, and never scan them again. That's a problem.&lt;/p&gt;

&lt;p&gt;I've been building GPU infrastructure tools on Docker and Kubernetes for the past year — &lt;a href="https://github.com/pmady/keda-gpu-scaler" rel="noopener noreferrer"&gt;keda-gpu-scaler&lt;/a&gt; for autoscaling, &lt;a href="https://github.com/pmady/otel-gpu-receiver" rel="noopener noreferrer"&gt;otel-gpu-receiver&lt;/a&gt; for observability, and GPU NUMA topology scheduling for &lt;a href="https://github.com/volcano-sh/volcano" rel="noopener noreferrer"&gt;Volcano&lt;/a&gt;. Every one of these ships as a Docker container. This post walks through the zero-trust pipeline I use to build, scan, sign, and deploy GPU containers — from &lt;code&gt;docker build&lt;/code&gt; to production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Attack Surface
&lt;/h2&gt;

&lt;p&gt;A standard GPU inference image has five layers of dependencies, and every layer is a CVE vector:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;Risk&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OS packages&lt;/td&gt;
&lt;td&gt;Ubuntu 22.04, libc, OpenSSL&lt;/td&gt;
&lt;td&gt;OS-level CVEs, often unpatched in CUDA base images&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CUDA toolkit&lt;/td&gt;
&lt;td&gt;libcudart, libnvml, cuDNN, NCCL&lt;/td&gt;
&lt;td&gt;NVIDIA releases on their own cycle, often behind on OS patches&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python runtime&lt;/td&gt;
&lt;td&gt;CPython 3.11+&lt;/td&gt;
&lt;td&gt;Python CVEs, pip supply chain attacks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ML framework&lt;/td&gt;
&lt;td&gt;PyTorch, TensorFlow, vLLM&lt;/td&gt;
&lt;td&gt;Hundreds of transitive Python deps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Application code&lt;/td&gt;
&lt;td&gt;Custom serving logic, prompt templates&lt;/td&gt;
&lt;td&gt;Your code — hopefully the smallest attack surface&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The CUDA layer is the sneaky one. NVIDIA maintains their own base images (&lt;code&gt;nvidia/cuda:12.4-base-ubuntu22.04&lt;/code&gt;) on their own release schedule. When Ubuntu patches a critical OpenSSL vulnerability, the NVIDIA base image might not pick it up for weeks. If you're building on top of &lt;code&gt;nvidia/cuda&lt;/code&gt;, you inherit that lag.&lt;/p&gt;
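
&lt;p&gt;An easy way to see what you inherit is to scan the base image on its own, before your Dockerfile adds anything. A quick check with Docker Scout might look like this (same tag as the examples below):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# CVEs in the NVIDIA base ship with every image built on top of it
docker scout cves nvidia/cuda:12.4-base-ubuntu22.04 --only-severity critical,high
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;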

&lt;h2&gt;
  
  
  Step 1: Docker Hardened Images as the Foundation
&lt;/h2&gt;

&lt;p&gt;Docker Hardened Images are pre-vetted, continuously patched base images maintained by Docker. Instead of building on NVIDIA's base image directly, you can use a Docker Hardened base and selectively copy in only the CUDA libraries you need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Instead of inheriting the full NVIDIA base:&lt;/span&gt;
&lt;span class="c"&gt;# FROM nvidia/cuda:12.4-base-ubuntu22.04&lt;/span&gt;

&lt;span class="c"&gt;# Use Docker Hardened base + only the CUDA libs you need&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; docker.io/docker/hardened-runtime:ubuntu-22.04&lt;/span&gt;

&lt;span class="c"&gt;# Copy ONLY the NVIDIA libraries required at runtime&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=nvidia/cuda:12.4-base-ubuntu22.04 \&lt;/span&gt;
  /usr/local/cuda/lib64/libcudart.so* \
  /usr/local/cuda/lib64/libnvml.so* \
  /usr/local/cuda/lib64/libcublas.so* \
  /usr/local/lib/

&lt;span class="k"&gt;RUN &lt;/span&gt;ldconfig
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What this gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Patched OS layer&lt;/strong&gt; — Docker maintains the base, not NVIDIA. Patches ship within hours, not weeks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smaller image&lt;/strong&gt; — only the CUDA libraries your application actually links against. Not the full 3GB toolkit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scout-optimized&lt;/strong&gt; — Docker Scout has first-party provenance data for Hardened Images, so scanning is faster and more accurate.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 2: Multi-Stage Builds with Minimal Runtime
&lt;/h2&gt;

&lt;p&gt;The goal is to ship the smallest possible runtime image. Everything that's only needed at build time — compilers, Go toolchain, npm, development headers — stays in the build stage.&lt;/p&gt;

&lt;p&gt;Here's the pattern I use for keda-gpu-scaler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# === Build Stage ===&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;golang:1.22-bookworm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; go.mod go.sum ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;go mod download

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nv"&gt;CGO_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 go build &lt;span class="nt"&gt;-ldflags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"-s -w"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; gpu-scaler ./cmd/scaler

&lt;span class="c"&gt;# === Runtime Stage ===&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; docker.io/docker/hardened-runtime:ubuntu-22.04&lt;/span&gt;

&lt;span class="c"&gt;# Only the NVML library — nothing else from CUDA&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=nvidia/cuda:12.4-base-ubuntu22.04 \&lt;/span&gt;
  /usr/local/cuda/lib64/libnvml.so* /usr/local/lib/
&lt;span class="k"&gt;RUN &lt;/span&gt;ldconfig

&lt;span class="c"&gt;# Non-root user&lt;/span&gt;
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; 65534:65534&lt;/span&gt;

&lt;span class="c"&gt;# Read-only filesystem — binary is statically positioned&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /app/gpu-scaler /usr/local/bin/&lt;/span&gt;

&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["gpu-scaler"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Runtime image is ~80MB instead of 3.5GB. Attack surface reduced by 97%.&lt;/p&gt;

&lt;p&gt;For Python-based inference images (vLLM, Triton), the same principle applies but with pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Build stage: install Python deps&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;python:3.11-bookworm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;--prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/install &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Runtime stage: minimal base + only installed packages&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; docker.io/docker/hardened-runtime:ubuntu-22.04&lt;/span&gt;

&lt;span class="c"&gt;# Copy CUDA runtime libs&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=nvidia/cuda:12.4-runtime-ubuntu22.04 \&lt;/span&gt;
  /usr/local/cuda/lib64/ /usr/local/cuda/lib64/

&lt;span class="c"&gt;# Copy Python and installed packages&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=python:3.11-slim /usr/local/ /usr/local/&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /install /usr/local&lt;/span&gt;

&lt;span class="c"&gt;# Non-root&lt;/span&gt;
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; 65534:65534&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; ./app /app&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["python", "-m", "app.serve"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Docker Scout in CI — Block on CVEs
&lt;/h2&gt;

&lt;p&gt;Docker Scout integrates into your CI pipeline to catch vulnerabilities at build time, not after deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/docker-security.yml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Docker Security Pipeline&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build-and-scan&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/setup-buildx-action@v3&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/login-action@v3&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DOCKERHUB_USERNAME }}&lt;/span&gt;
          &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DOCKERHUB_TOKEN }}&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build image&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/build-push-action@v5&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
          &lt;span class="na"&gt;load&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
          &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-gpu-image:${{ github.sha }}&lt;/span&gt;

      &lt;span class="c1"&gt;# Scan for CVEs — fail on critical/high&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Docker Scout CVE scan&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/scout-action@v1&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cves&lt;/span&gt;
          &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-gpu-image:${{ github.sha }}&lt;/span&gt;
          &lt;span class="na"&gt;sarif-file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scout-results.sarif&lt;/span&gt;
          &lt;span class="na"&gt;only-severities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical,high&lt;/span&gt;
          &lt;span class="na"&gt;exit-code&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

      &lt;span class="c1"&gt;# Check against Docker Scout policies&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Docker Scout policy evaluation&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/scout-action@v1&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;policy&lt;/span&gt;
          &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-gpu-image:${{ github.sha }}&lt;/span&gt;

      &lt;span class="c1"&gt;# Upload SARIF to GitHub Security tab&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Upload SARIF&lt;/span&gt;
        &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always()&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github/codeql-action/upload-sarif@v3&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;sarif_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scout-results.sarif&lt;/span&gt;

      &lt;span class="c1"&gt;# Only push if scan passes&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Push image&lt;/span&gt;
        &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;success()&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/build-push-action@v5&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
          &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
          &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;pmady/keda-gpu-scaler:${{ github.sha }}&lt;/span&gt;
            &lt;span class="s"&gt;pmady/keda-gpu-scaler:latest&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key configuration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;exit-code: true&lt;/code&gt;&lt;/strong&gt; — the build fails if Scout finds critical or high CVEs. No exceptions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;sarif-file&lt;/code&gt;&lt;/strong&gt; — results show up in GitHub's Security tab for tracking over time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy evaluation&lt;/strong&gt; — Docker Scout policies can enforce rules like "no images older than 30 days" or "must use Docker Official Images as base."&lt;/li&gt;
&lt;/ul&gt;
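
&lt;p&gt;The same checks run locally, which is handy before pushing a branch. Something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Quick summary of the image's CVE posture
docker scout quickview my-gpu-image:dev

# Full listing with the same severity filter as CI
docker scout cves my-gpu-image:dev --only-severity critical,high
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;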

&lt;h3&gt;
  
  
  Common GPU Image Findings
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Finding&lt;/th&gt;
&lt;th&gt;Cause&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Critical OpenSSL CVE&lt;/td&gt;
&lt;td&gt;NVIDIA base image lags Ubuntu patches&lt;/td&gt;
&lt;td&gt;Use Hardened Image as base&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High-severity Python CVE&lt;/td&gt;
&lt;td&gt;Transitive deps in PyTorch/vLLM&lt;/td&gt;
&lt;td&gt;Pin versions, run &lt;code&gt;pip-audit&lt;/code&gt; in CI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Medium glibc CVE&lt;/td&gt;
&lt;td&gt;Base image outdated&lt;/td&gt;
&lt;td&gt;Rebuild weekly with &lt;code&gt;--no-cache&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Outdated CUDA libraries&lt;/td&gt;
&lt;td&gt;NVIDIA release cycle&lt;/td&gt;
&lt;td&gt;Cherry-pick only needed &lt;code&gt;.so&lt;/code&gt; files from latest CUDA image&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
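
&lt;p&gt;The &lt;code&gt;pip-audit&lt;/code&gt; fix from the table is a one-liner in CI. A minimal sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Audit the pinned requirements; a non-zero exit fails the CI step
pip install pip-audit
pip-audit -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;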

&lt;h2&gt;
  
  
  Step 4: Container Signing with Docker Content Trust
&lt;/h2&gt;

&lt;p&gt;Sign your images so that Kubernetes admission controllers can verify provenance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable Docker Content Trust&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DOCKER_CONTENT_TRUST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1

&lt;span class="c"&gt;# Push signs automatically&lt;/span&gt;
docker push pmady/keda-gpu-scaler:v1.0.0

&lt;span class="c"&gt;# Verify a signed image&lt;/span&gt;
docker trust inspect pmady/keda-gpu-scaler:v1.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For CI, use cosign (Sigstore) for keyless signing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In your GitHub Actions workflow&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Sign image with cosign&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sigstore/cosign-installer@v3&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Sign&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;cosign sign --yes \&lt;/span&gt;
      &lt;span class="s"&gt;pmady/keda-gpu-scaler:${{ github.sha }}&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;COSIGN_EXPERIMENTAL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then enforce in Kubernetes with Kyverno:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kyverno.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;require-signed-images&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;validationFailureAction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Enforce&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;check-signature&lt;/span&gt;
      &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;any&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;kinds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pod"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;verifyImages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;imageReferences&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pmady/*"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
          &lt;span class="na"&gt;attestors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;entries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;keyless&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                    &lt;span class="na"&gt;subject&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pavan4devops@gmail.com"&lt;/span&gt;
                    &lt;span class="na"&gt;issuer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://accounts.google.com"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now unsigned or tampered images are rejected at admission. The chain is: &lt;strong&gt;Build → Scout scan → Sign → Push → Kubernetes admission verifies → Runtime policy enforces.&lt;/strong&gt;&lt;/p&gt;
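
&lt;p&gt;The same verification Kyverno does at admission can be run by hand, which helps when debugging a rejected deploy. With the keyless identity from the policy above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Verify the keyless signature against the identity the ClusterPolicy expects
cosign verify \
  --certificate-identity pavan4devops@gmail.com \
  --certificate-oidc-issuer https://accounts.google.com \
  pmady/keda-gpu-scaler:v1.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;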

&lt;h2&gt;
  
  
  Step 5: Runtime Security for GPU Containers
&lt;/h2&gt;

&lt;p&gt;GPU containers need device access (&lt;code&gt;/dev/nvidia*&lt;/code&gt;) but they don't need anything else privileged. Lock them down:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpu-inference&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runAsNonRoot&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;runAsUser&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;65534&lt;/span&gt;
    &lt;span class="na"&gt;fsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;65534&lt;/span&gt;
    &lt;span class="na"&gt;seccompProfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RuntimeDefault&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;inference&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pmady/keda-gpu-scaler:v1.0.0&lt;/span&gt;
      &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;readOnlyRootFilesystem&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;allowPrivilegeEscalation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
        &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;drop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ALL"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;8Gi&lt;/span&gt;
        &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;4Gi&lt;/span&gt;
      &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tmp&lt;/span&gt;
          &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/tmp&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;models&lt;/span&gt;
          &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/models&lt;/span&gt;
          &lt;span class="na"&gt;readOnly&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tmp&lt;/span&gt;
      &lt;span class="na"&gt;emptyDir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;sizeLimit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;100Mi&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;models&lt;/span&gt;
      &lt;span class="na"&gt;persistentVolumeClaim&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;claimName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;model-weights&lt;/span&gt;
        &lt;span class="na"&gt;readOnly&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What this enforces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Non-root&lt;/strong&gt; — NVML reads from sysfs, doesn't need root&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read-only root filesystem&lt;/strong&gt; — no writes except &lt;code&gt;/tmp&lt;/code&gt; (ephemeral) and &lt;code&gt;/models&lt;/code&gt; (read-only PVC)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No privilege escalation&lt;/strong&gt; — prevents container escape&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drop all capabilities&lt;/strong&gt; — the NVIDIA device plugin handles GPU device injection, the container doesn't need any Linux capabilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seccomp&lt;/strong&gt; — default syscall filter&lt;/li&gt;
&lt;/ul&gt;
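
&lt;p&gt;A quick way to confirm these settings actually took effect on the running pod (a sketch; assumes the manifest above was applied unchanged):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Confirm the container runs as the non-root user
kubectl exec gpu-inference -- id
# uid=65534 gid=65534 ...

# Writes outside /tmp and /models should fail with "Read-only file system"
kubectl exec gpu-inference -- touch /should-fail

# The GPU is still visible: the device plugin injects it without extra privileges
kubectl exec gpu-inference -- nvidia-smi -L
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;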

&lt;h2&gt;
  
  
  The Full Pipeline
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Source Code
    │
    ▼
docker build (multi-stage, Hardened Image base)
    │
    ▼
Docker Scout CVE scan ── FAIL? → Block merge
    │
    ▼
Docker Scout policy check ── FAIL? → Block merge
    │
    ▼
cosign sign (keyless, Sigstore)
    │
    ▼
docker push (to registry)
    │
    ▼
Kubernetes admission (Kyverno verifies signature + base image)
    │
    ▼
Runtime (non-root, read-only fs, drop caps, seccomp, resource limits)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every stage has a gate. No image reaches production without being scanned, signed, and policy-checked. This is the same pipeline whether you're deploying a GPU inference service, a training job orchestrated by Volcano, or my keda-gpu-scaler DaemonSet.&lt;/p&gt;

&lt;h2&gt;
  
  
  This Isn't Optional for GPU Workloads
&lt;/h2&gt;

&lt;p&gt;GPU containers are high-value targets. They run on expensive hardware ($3-30/hour per instance), they often have network access to model registries (HuggingFace, S3), and they process potentially sensitive input data. A compromised inference container is a direct path to data exfiltration and compute theft (cryptomining on your A100s).&lt;/p&gt;

&lt;p&gt;The zero-trust pipeline adds ~2 minutes to your CI build. The alternative is finding out about a critical CVE from your security team after it's been running in production for three weeks.&lt;/p&gt;

&lt;p&gt;Docker Scout + Hardened Images + container signing. Use all three.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Pavan Madduri is a Senior Cloud Platform Engineer at W.W. Grainger, Inc., CNCF Golden Kubestronaut, and Oracle ACE Associate. He maintains &lt;a href="https://github.com/pmady/keda-gpu-scaler" rel="noopener noreferrer"&gt;keda-gpu-scaler&lt;/a&gt; and &lt;a href="https://github.com/pmady/otel-gpu-receiver" rel="noopener noreferrer"&gt;otel-gpu-receiver&lt;/a&gt;, and contributed GPU NUMA topology scheduling to &lt;a href="https://github.com/volcano-sh/volcano/pull/5095" rel="noopener noreferrer"&gt;Volcano&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>docker</category>
      <category>gpu</category>
      <category>security</category>
    </item>
    <item>
      <title>GPU-Aware Autoscaling for Docker Containers: From NVML to Production</title>
      <dc:creator>Pavan Madduri</dc:creator>
      <pubDate>Fri, 08 May 2026 15:21:41 +0000</pubDate>
      <link>https://dev.to/pavan_madduri/gpu-aware-autoscaling-for-docker-containers-from-nvml-to-production-jkf</link>
      <guid>https://dev.to/pavan_madduri/gpu-aware-autoscaling-for-docker-containers-from-nvml-to-production-jkf</guid>
      <description>&lt;p&gt;Every GPU inference container has the same problem: Kubernetes HPA can't see the GPU. You scale on CPU and memory while your GPU sits at 95% utilization, completely invisible to the autoscaler. Or worse — your GPU is idle and you're paying $3/hour for an instance doing nothing.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/pmady/keda-gpu-scaler" rel="noopener noreferrer"&gt;keda-gpu-scaler&lt;/a&gt; to fix this. It's a KEDA external scaler that reads real GPU metrics via NVIDIA NVML and drives Kubernetes autoscaling decisions — including scale-to-zero. This post covers the Docker-specific parts: how GPU metrics flow from the NVIDIA Container Toolkit through Docker to KEDA, and how to build GPU-aware containers that actually scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Docker Exposes GPUs to Containers
&lt;/h2&gt;

&lt;p&gt;When you run a GPU container with Docker, three layers work together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--gpus&lt;/span&gt; all nvidia/cuda:12.4-base nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Docker Engine&lt;/strong&gt; detects the &lt;code&gt;--gpus&lt;/code&gt; flag and calls the NVIDIA Container Toolkit&lt;/li&gt;
&lt;li&gt;The toolkit configures &lt;strong&gt;nvidia-container-runtime&lt;/strong&gt; as the OCI runtime for this container&lt;/li&gt;
&lt;li&gt;The runtime injects GPU device files (&lt;code&gt;/dev/nvidia0&lt;/code&gt;, &lt;code&gt;/dev/nvidiactl&lt;/code&gt;) and NVIDIA driver libraries into the container's filesystem&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The container now has full access to NVML (NVIDIA Management Library), which exposes GPU utilization, memory usage, temperature, power draw, and more. This is the same mechanism my GPU scaler uses — each scaler pod runs on a GPU node and reads NVML metrics from the GPUs Docker has exposed to it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────┐     gRPC      ┌──────────────────────────┐
│ KEDA Operator│─────────────→│ keda-gpu-scaler (Docker)  │
│ (central pod)│              │ DaemonSet on each GPU node│
└─────────────┘              │                            │
                              │  NVML ──→ /dev/nvidia0    │
                              │  NVML ──→ /dev/nvidia1    │
                              │       (Docker-exposed)     │
                              └──────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
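
&lt;p&gt;You can read the same counters the scaler sees with &lt;code&gt;nvidia-smi&lt;/code&gt;, which is itself an NVML client. A quick check, reusing the same CUDA image as above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Query utilization and memory through NVML from inside a GPU container
docker run --rm --gpus all nvidia/cuda:12.4-base \
  nvidia-smi --query-gpu=name,utilization.gpu,memory.used,memory.total --format=csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;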



&lt;h2&gt;
  
  
  Building GPU Containers: The Dockerfile
&lt;/h2&gt;

&lt;p&gt;GPU containers need CGO for NVML access. Here's the multi-stage Dockerfile I use for keda-gpu-scaler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Stage 1: Build&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;golang:1.22-bookworm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; go.mod go.sum ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;go mod download

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
&lt;span class="c"&gt;# CGO_ENABLED=1 is required — NVML needs CGO&lt;/span&gt;
&lt;span class="c"&gt;# This is why GPU scaling can't be a native KEDA scaler&lt;/span&gt;
&lt;span class="c"&gt;# (KEDA builds with CGO_ENABLED=0)&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nv"&gt;CGO_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 go build &lt;span class="nt"&gt;-ldflags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"-s -w"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; keda-gpu-scaler ./cmd/scaler

&lt;span class="c"&gt;# Stage 2: Minimal runtime&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; nvidia/cuda:12.4-base-ubuntu22.04&lt;/span&gt;

&lt;span class="c"&gt;# Security: non-root user&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;useradd &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; 65534 &lt;span class="nt"&gt;-s&lt;/span&gt; /bin/false scaler
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; 65534:65534&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /app/keda-gpu-scaler /usr/local/bin/&lt;/span&gt;

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 6000&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["keda-gpu-scaler"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;CGO_ENABLED=1&lt;/code&gt;&lt;/strong&gt; — NVML requires C bindings. This is the fundamental architectural reason keda-gpu-scaler exists as an external scaler instead of being built into KEDA core.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;nvidia/cuda&lt;/code&gt; base image&lt;/strong&gt; — provides the NVML shared libraries (&lt;code&gt;libnvidia-ml.so&lt;/code&gt;) at runtime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-root execution&lt;/strong&gt; — NVML reads GPU data from sysfs, doesn't require root. Standard Docker security practice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-stage build&lt;/strong&gt; — final image is ~150MB instead of 1.5GB (no Go toolchain, no build deps).&lt;/li&gt;
&lt;/ul&gt;
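
&lt;p&gt;To sanity-check that last point after a local build (the tag is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build -t keda-gpu-scaler:dev .
docker image ls keda-gpu-scaler:dev
# SIZE should land around 150MB; the ~1.5GB Go toolchain stays in the builder stage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;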

&lt;h2&gt;
  
  
  Docker Compose for Local GPU Development
&lt;/h2&gt;

&lt;p&gt;Before deploying to Kubernetes, test the full stack locally with Docker Compose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.8"&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# The GPU scaler — reads NVML metrics, serves gRPC&lt;/span&gt;
  &lt;span class="na"&gt;gpu-scaler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
    &lt;span class="na"&gt;runtime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;devices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
              &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;all&lt;/span&gt;
              &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gpu&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;6000:6000"&lt;/span&gt;    &lt;span class="c1"&gt;# gRPC for KEDA&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9090:9090"&lt;/span&gt;    &lt;span class="c1"&gt;# Prometheus metrics&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;LOG_LEVEL=debug&lt;/span&gt;

  &lt;span class="c1"&gt;# A real GPU workload to scale&lt;/span&gt;
  &lt;span class="na"&gt;vllm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm/vllm-openai:latest&lt;/span&gt;
    &lt;span class="na"&gt;runtime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;devices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
              &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
              &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gpu&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8000:8000"&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;--model meta-llama/Llama-3.2-1B&lt;/span&gt;
      &lt;span class="s"&gt;--port 8000&lt;/span&gt;
      &lt;span class="s"&gt;--max-model-len 2048&lt;/span&gt;

  &lt;span class="c1"&gt;# Prometheus to scrape GPU metrics&lt;/span&gt;
  &lt;span class="na"&gt;prometheus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prom/prometheus:latest&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9091:9090"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./prometheus.yml:/etc/prometheus/prometheus.yml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start the stack&lt;/span&gt;
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;

&lt;span class="c"&gt;# Check GPU metrics via gRPC&lt;/span&gt;
grpcurl &lt;span class="nt"&gt;-plaintext&lt;/span&gt; localhost:6000 externalscaler.ExternalScaler/GetMetrics

&lt;span class="c"&gt;# Check GPU metrics via Prometheus&lt;/span&gt;
curl localhost:9090/metrics | &lt;span class="nb"&gt;grep &lt;/span&gt;gpu

&lt;span class="c"&gt;# Send requests to vLLM and watch GPU utilization climb&lt;/span&gt;
curl localhost:8000/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"model": "meta-llama/Llama-3.2-1B", "messages": [{"role": "user", "content": "Hello"}]}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you the full autoscaling feedback loop locally: vLLM serving → GPU utilization rises → scaler reports metrics → you can verify KEDA would trigger a scale-up.&lt;/p&gt;
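
&lt;p&gt;To exercise that loop, fire a burst of concurrent requests at vLLM and watch utilization move. A rough sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Send 20 concurrent requests to the same endpoint as above
for i in $(seq 1 20); do
  curl -s localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "meta-llama/Llama-3.2-1B", "messages": [{"role": "user", "content": "Hello"}]}' \
    -o /dev/null &amp;
done

# Watch GPU utilization climb while the batch is processed
watch -n 1 nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;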

&lt;h2&gt;
  
  
  Docker Scout: Scanning GPU Container Images
&lt;/h2&gt;

&lt;p&gt;GPU images are large and have deep dependency trees. A typical vLLM image pulls in CUDA, cuDNN, NCCL, Python, PyTorch, and dozens of transitive dependencies. More packages = more CVE surface.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Scan the GPU scaler image&lt;/span&gt;
docker scout cves pmady/keda-gpu-scaler:latest

&lt;span class="c"&gt;# Get base image recommendations&lt;/span&gt;
docker scout recommendations pmady/keda-gpu-scaler:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Common findings I've hit with GPU images:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;th&gt;Cause&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;High-severity OpenSSL CVE&lt;/td&gt;
&lt;td&gt;CUDA base image uses older Ubuntu&lt;/td&gt;
&lt;td&gt;Multi-stage build with patched base&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python package CVEs&lt;/td&gt;
&lt;td&gt;Transitive deps in ML frameworks&lt;/td&gt;
&lt;td&gt;Pin versions, use &lt;code&gt;pip audit&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Outdated CUDA libs&lt;/td&gt;
&lt;td&gt;NVIDIA base image release lag&lt;/td&gt;
&lt;td&gt;Use Docker Hardened Images as base&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For production GPU containers, I run Scout in CI and block merges on critical/high CVEs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# GitHub Actions&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Docker Scout CVE scan&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/scout-action@v1&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cves&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pmady/keda-gpu-scaler:${{ github.sha }}&lt;/span&gt;
    &lt;span class="na"&gt;only-severities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical,high&lt;/span&gt;
    &lt;span class="na"&gt;exit-code&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  &lt;span class="c1"&gt;# Fail the build on findings&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
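
&lt;p&gt;The same gate runs locally before you ever push a branch. A sketch, using the image tag scanned earlier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Non-zero exit code if any critical or high CVEs are present
docker scout cves --only-severity critical,high --exit-code pmady/keda-gpu-scaler:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;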



&lt;h2&gt;
  
  
  Pre-Built Scaling Profiles
&lt;/h2&gt;

&lt;p&gt;Different GPU workloads need different scaling strategies. keda-gpu-scaler ships profiles so you don't have to figure this out yourself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ScaledObject for vLLM inference&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;keda.sh/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ScaledObject&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm-gpu-scaler&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;scaleTargetRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm-deployment&lt;/span&gt;
  &lt;span class="na"&gt;minReplicaCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;    &lt;span class="c1"&gt;# Scale to zero when idle&lt;/span&gt;
  &lt;span class="na"&gt;maxReplicaCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
  &lt;span class="na"&gt;triggers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;external&lt;/span&gt;
      &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;scalerAddress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keda-gpu-scaler.gpu-scaler.svc.cluster.local:6000"&lt;/span&gt;
        &lt;span class="na"&gt;profile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vllm-inference"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Profile&lt;/th&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;th&gt;Scale-to-Zero&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;vllm-inference&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GPU memory (%)&lt;/td&gt;
&lt;td&gt;80%&lt;/td&gt;
&lt;td&gt;Yes (5% activation)&lt;/td&gt;
&lt;td&gt;vLLM fills KV cache proportional to request load&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;triton-inference&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GPU utilization (%)&lt;/td&gt;
&lt;td&gt;75%&lt;/td&gt;
&lt;td&gt;Yes (10% activation)&lt;/td&gt;
&lt;td&gt;Triton batches requests, SM utilization is the bottleneck&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;training&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GPU utilization (%)&lt;/td&gt;
&lt;td&gt;90%&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Training jobs should saturate GPUs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;batch&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GPU memory (%)&lt;/td&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;td&gt;Yes (1% activation)&lt;/td&gt;
&lt;td&gt;Batch inference, aggressive scale-down&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Scale-to-zero is the killer feature for inference. A single A100 instance costs ~$3/hour, so an inference service that sits idle for 12 hours overnight wastes roughly $36 per GPU, every night. keda-gpu-scaler detects the idle GPU and scales the deployment to zero pods; KEDA spins it back up on the first incoming request.&lt;/p&gt;
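
&lt;p&gt;You can watch that happen with nothing more than kubectl, using the names from the ScaledObject above (run each watch in its own terminal):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Leave the service idle and watch KEDA drive replicas down to zero
kubectl get scaledobject vllm-gpu-scaler -w
kubectl get deployment vllm-deployment -w

# The first request after that reactivates the deployment (0 back to 1 replica)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;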

&lt;h2&gt;
  
  
  From Docker Desktop to GPU Cluster
&lt;/h2&gt;

&lt;p&gt;The workflow end-to-end:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Build&lt;/strong&gt; your GPU container with &lt;code&gt;docker build&lt;/code&gt; — multi-stage, non-root, minimal runtime&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test locally&lt;/strong&gt; with &lt;code&gt;docker compose&lt;/code&gt; — verify GPU metrics, NVML access, gRPC endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scan&lt;/strong&gt; with &lt;code&gt;docker scout&lt;/code&gt; — catch CVEs before pushing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Push&lt;/strong&gt; to your registry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy&lt;/strong&gt; to Kubernetes with keda-gpu-scaler for autoscaling&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Docker is the consistent runtime from development to production. Same Dockerfile, same NVML metrics, same security model — just different scale.&lt;/p&gt;

&lt;p&gt;The project is open source and being discussed for &lt;a href="https://github.com/kedacore/keda/issues/7538" rel="noopener noreferrer"&gt;adoption under the KEDA organization&lt;/a&gt;. If you're running GPU workloads on Kubernetes and want autoscaling that actually looks at the GPU, give it a try: &lt;a href="https://github.com/pmady/keda-gpu-scaler" rel="noopener noreferrer"&gt;github.com/pmady/keda-gpu-scaler&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Pavan Madduri is a Senior Cloud Platform Engineer at W.W. Grainger, Inc., CNCF Golden Kubestronaut, and Oracle ACE Associate. He maintains &lt;a href="https://github.com/pmady/keda-gpu-scaler" rel="noopener noreferrer"&gt;keda-gpu-scaler&lt;/a&gt; and &lt;a href="https://github.com/pmady/otel-gpu-receiver" rel="noopener noreferrer"&gt;otel-gpu-receiver&lt;/a&gt;, and contributed GPU NUMA topology scheduling to &lt;a href="https://github.com/volcano-sh/volcano/pull/5095" rel="noopener noreferrer"&gt;Volcano&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>gpu</category>
      <category>nvml</category>
      <category>nvidia</category>
      <category>docker</category>
    </item>
    <item>
      <title>I Replaced a $3/hr GPU Dev Workflow with Docker Model Runner. Here's How</title>
      <dc:creator>Pavan Madduri</dc:creator>
      <pubDate>Fri, 08 May 2026 15:19:29 +0000</pubDate>
      <link>https://dev.to/pavan_madduri/i-replaced-a-3hr-gpu-dev-workflow-with-docker-model-runner-heres-how-170</link>
      <guid>https://dev.to/pavan_madduri/i-replaced-a-3hr-gpu-dev-workflow-with-docker-model-runner-heres-how-170</guid>
      <description>&lt;p&gt;Last month I was debugging a prompt template for a vLLM inference service. The change was two lines — swap the system prompt and adjust the temperature. To test it, I had to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Rebuild a 15GB Docker image (the CUDA base alone is 3.5GB)&lt;/li&gt;
&lt;li&gt;Push it to our registry (8 minutes on a good day)&lt;/li&gt;
&lt;li&gt;Wait for Kubernetes to pull it on a GPU node&lt;/li&gt;
&lt;li&gt;Realize the prompt still wasn't right&lt;/li&gt;
&lt;li&gt;Repeat&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Total cycle time: &lt;strong&gt;22 minutes per iteration.&lt;/strong&gt; For a two-line text change.&lt;/p&gt;

&lt;p&gt;Then I tried Docker Model Runner. Pull the model once. Run inference locally. Iterate on the prompt in seconds. Push only when it's right. The same change took &lt;strong&gt;14 seconds&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Docker shipped two features this year that I think every GPU/AI engineer needs to know about: &lt;strong&gt;Model Runner&lt;/strong&gt; and &lt;strong&gt;Sandboxes&lt;/strong&gt;. This post is the walkthrough I wish I had when I started using them.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;My background:&lt;/strong&gt; I build GPU infrastructure tools — &lt;a href="https://github.com/pmady/keda-gpu-scaler" rel="noopener noreferrer"&gt;keda-gpu-scaler&lt;/a&gt; for GPU autoscaling on Kubernetes, &lt;a href="https://github.com/pmady/otel-gpu-receiver" rel="noopener noreferrer"&gt;otel-gpu-receiver&lt;/a&gt; for GPU observability, and I contributed GPU NUMA topology scheduling to &lt;a href="https://github.com/volcano-sh/volcano/pull/5095" rel="noopener noreferrer"&gt;CNCF Volcano&lt;/a&gt;. Everything I build runs in Docker containers.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Part 1: Docker Model Runner — Run LLMs Like You Run Containers
&lt;/h2&gt;

&lt;p&gt;If you've used &lt;code&gt;docker pull&lt;/code&gt; and &lt;code&gt;docker run&lt;/code&gt;, you already know how Model Runner works. Same mental model, same CLI patterns, but for AI models instead of containers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup (one time)
&lt;/h3&gt;

&lt;p&gt;Update Docker Desktop to 4.40+ and enable Model Runner:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Settings → Features in Development → Enable Docker Model Runner&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Pull your first model
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker model pull ai/llama3.2:1B-Q8_0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This downloads quantized model weights from Docker Hub — same registry, same content-addressable storage, same layer deduplication. If two models share base weights, you only download the diff.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run inference from the CLI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker model run ai/llama3.2:1B-Q8_0 &lt;span class="s2"&gt;"What is NUMA topology in GPU scheduling?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NUMA (Non-Uniform Memory Access) topology in GPU scheduling refers to the 
arrangement of CPUs and GPUs on a server where memory access speed depends 
on physical proximity. GPUs on the same NUMA node as the requesting CPU 
have faster memory access. NUMA-aware schedulers like Volcano place GPU 
workloads on nodes where the GPUs share a NUMA domain with the allocated 
CPUs, reducing cross-node memory latency by 10-20% for multi-GPU training...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That ran locally on my MacBook Pro. No cloud GPU. No 15GB container image. No Kubernetes cluster. Apple Silicon handles the inference via Metal/MLX. On Linux with NVIDIA GPUs, it uses CUDA automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  The killer feature: OpenAI-compatible API
&lt;/h3&gt;

&lt;p&gt;Model Runner exposes a local endpoint that speaks the exact same protocol as OpenAI's API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="c1"&gt;# Local — hits Docker Model Runner
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:12434/engines/llama3.2/v1/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;not-needed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# no key required locally
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# This SAME code works in production with one env var change:
# client = OpenAI(base_url=os.environ["VLLM_ENDPOINT"])
&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ai/llama3.2:1B-Q8_0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a Kubernetes GPU infrastructure expert.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain GPU memory fragmentation in 3 sentences.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; Your application code is identical between local dev and production. Locally you hit Model Runner. In production you hit vLLM, Triton, or OpenAI. Change one environment variable. That's it.&lt;/p&gt;
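
&lt;p&gt;In practice the switch is one environment variable. A sketch, where the production URL and &lt;code&gt;app.py&lt;/code&gt; are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Local development: point the app at Docker Model Runner
export VLLM_ENDPOINT="http://localhost:12434/engines/llama3.2/v1/"

# Production: point the same code at the in-cluster vLLM service
export VLLM_ENDPOINT="http://vllm.inference.svc.cluster.local:8000/v1"

python app.py   # the OpenAI client reads VLLM_ENDPOINT; no code changes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;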

&lt;h3&gt;
  
  
  List and manage models
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker model &lt;span class="nb"&gt;ls
&lt;/span&gt;NAME                       SIZE      CREATED
ai/llama3.2:1B-Q8_0       1.3 GB    10 minutes ago
ai/mistral:7B-Q4_K_M      4.1 GB    2 hours ago

&lt;span class="nv"&gt;$ &lt;/span&gt;docker model &lt;span class="nb"&gt;rm &lt;/span&gt;ai/mistral:7B-Q4_K_M
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same workflow as &lt;code&gt;docker image ls&lt;/code&gt; and &lt;code&gt;docker image rm&lt;/code&gt;. If you know Docker, you know Model Runner. Zero learning curve.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 2: The Real Problem Model Runner Solves
&lt;/h2&gt;

&lt;p&gt;I run GPU inference in production on Kubernetes. Here's what the inner development loop looked like &lt;strong&gt;before&lt;/strong&gt; Model Runner:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Edit prompt → docker build (8 min) → docker push (8 min) → kubectl rollout → test → repeat
                    ↑                                                              │
                    └──────────── 22 minutes per iteration ────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And &lt;strong&gt;after&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Edit prompt → docker model run (14 sec) → test → iterate → ship when ready
                                                              │
                                            ← seconds per iteration →
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The three pain points Model Runner eliminates:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dependency hell&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CUDA 12 vs 11, cuDNN mismatches, PyTorch pinning&lt;/td&gt;
&lt;td&gt;Model Runner handles the inference runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Image bloat&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;15GB vLLM image with full CUDA toolkit&lt;/td&gt;
&lt;td&gt;1.3GB quantized model, no container build needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dev-prod gap&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Can't run A100 inference on a MacBook&lt;/td&gt;
&lt;td&gt;Model Runner uses Apple Silicon or local NVIDIA GPU&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The architecture end-to-end:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─── Your Laptop ──────────────────┐     ┌─── Production K8s Cluster ────────────┐
│                                   │     │                                        │
│  Docker Model Runner              │     │  vLLM containers (A100/H100)           │
│  ├─ Llama 3.2 (local inference)  │     │  ├─ keda-gpu-scaler (auto-scaling)    │
│  └─ OpenAI-compatible API        │     │  ├─ otel-gpu-receiver (GPU metrics)   │
│          ↕                        │     │  ├─ Volcano (NUMA-aware scheduling)   │
│  Your Application Code            │ ──→ │  └─ OpenAI-compatible API             │
│  (same code, same SDK)           │     │          ↕                              │
│                                   │     │  Same Application Code                 │
└───────────────────────────────────┘     └────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Same application code. Same API. Same Docker. Different scale.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 3: Docker Sandboxes — Because AI Agents Will Try to Delete Your Files
&lt;/h2&gt;

&lt;p&gt;Here's a scenario that keeps me up at night: an AI coding agent decides the best way to fix a test failure is to &lt;code&gt;rm -rf&lt;/code&gt; the test directory. Or it installs a malicious pip package. Or it &lt;code&gt;curl&lt;/code&gt;s your AWS credentials to an external server.&lt;/p&gt;

&lt;p&gt;If you're building &lt;strong&gt;agentic workflows&lt;/strong&gt; — LLMs that execute code, call APIs, or modify files — running that code on your host is reckless. Docker Sandboxes fix this.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Sandboxes give you
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Filesystem isolation&lt;/strong&gt; — the agent gets its own filesystem. Your SSH keys, browser cookies, and credentials are invisible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network whitelisting&lt;/strong&gt; — you specify exactly which hosts the agent can reach. Everything else is blocked.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource caps&lt;/strong&gt; — CPU, memory, GPU limits per sandbox. No runaway processes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ephemeral by default&lt;/strong&gt; — sandbox is destroyed when the task completes. No persistent state leakage.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  A real example: sandboxed coding agent
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose.sandbox.yml&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;coding-agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-coding-agent:latest&lt;/span&gt;
    &lt;span class="na"&gt;sandbox&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;network&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;egress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api.openai.com:443"&lt;/span&gt;      &lt;span class="c1"&gt;# LLM API calls&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pypi.org:443"&lt;/span&gt;            &lt;span class="c1"&gt;# pip install&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;github.com:443"&lt;/span&gt;          &lt;span class="c1"&gt;# git clone&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;4g&lt;/span&gt;
        &lt;span class="na"&gt;cpus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./workspace:/workspace&lt;/span&gt;        &lt;span class="c1"&gt;# Only this directory is accessible&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;MODEL_ENDPOINT=http://host.docker.internal:12434&lt;/span&gt;  &lt;span class="c1"&gt;# Model Runner&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What the agent &lt;strong&gt;can&lt;/strong&gt; do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read/write files in &lt;code&gt;/workspace&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Call OpenAI, install Python packages, clone repos&lt;/li&gt;
&lt;li&gt;Use up to 4GB RAM, 2 CPUs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What the agent &lt;strong&gt;cannot&lt;/strong&gt; do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Touch &lt;code&gt;~/.ssh&lt;/code&gt;, &lt;code&gt;~/.aws&lt;/code&gt;, or any file outside &lt;code&gt;/workspace&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Reach arbitrary servers (no data exfiltration)&lt;/li&gt;
&lt;li&gt;Consume unlimited resources (no fork bombs)&lt;/li&gt;
&lt;li&gt;Persist state after completion (clean slate every run)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The production mirror
&lt;/h3&gt;

&lt;p&gt;This is the same isolation model I enforce in production Kubernetes with PodSecurityStandards:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Production: Kubernetes pod security&lt;/span&gt;
&lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;runAsNonRoot&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;readOnlyRootFilesystem&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;allowPrivilegeEscalation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;drop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ALL"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="c1"&gt;# Local: Docker Sandbox&lt;/span&gt;
&lt;span class="na"&gt;sandbox&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;network&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;egress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api.openai.com:443"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Same security boundary. Same isolation model.&lt;/strong&gt; Docker Sandboxes for local dev, Kubernetes PodSecurity for production. The gap between "works on my machine" and "works in production" shrinks to almost nothing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 4: Putting It All Together — The Full GPU/AI Stack on Docker
&lt;/h2&gt;

&lt;p&gt;Here's the picture I've been building toward. Docker isn't just a container runtime anymore — it's the full development platform for AI:&lt;/p&gt;

&lt;h3&gt;
  
  
  On Your Laptop (Docker Desktop)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inference&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Docker Model Runner&lt;/td&gt;
&lt;td&gt;Pull and run LLMs locally, OpenAI-compatible API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Docker Sandboxes&lt;/td&gt;
&lt;td&gt;Isolate AI agents, whitelist network access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Supply Chain&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Docker Scout&lt;/td&gt;
&lt;td&gt;Scan GPU images for CVEs in CUDA/Python dependency trees&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Docker Extensions&lt;/td&gt;
&lt;td&gt;I built a &lt;a href="https://github.com/pmady/docker-gpu-dashboard-extension" rel="noopener noreferrer"&gt;GPU Dashboard&lt;/a&gt; showing real-time NVML metrics in Docker Desktop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-service&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Docker Compose&lt;/td&gt;
&lt;td&gt;Run inference + app + monitoring together locally&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  In Production (Kubernetes)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Runtime&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Docker containers (containerd)&lt;/td&gt;
&lt;td&gt;vLLM, Triton inference servers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Autoscaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/pmady/keda-gpu-scaler" rel="noopener noreferrer"&gt;keda-gpu-scaler&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Scale on real GPU utilization, not CPU proxy metrics. Scale to zero when idle.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/pmady/otel-gpu-receiver" rel="noopener noreferrer"&gt;otel-gpu-receiver&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;GPU metrics → OpenTelemetry → Prometheus/Grafana&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scheduling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/volcano-sh/volcano/pull/5095" rel="noopener noreferrer"&gt;Volcano GPU NUMA&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Place multi-GPU training on NUMA-aligned GPUs (10-20% throughput improvement)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NetworkPolicy + PodSecurity&lt;/td&gt;
&lt;td&gt;Same isolation as Docker Sandboxes, enforced by the cluster&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The container is the unit of deployment from your laptop to the GPU cluster. Docker owns the inner loop. Kubernetes owns the outer loop. Both speak the same language.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 5: Try It Right Now (5-Minute Walkthrough)
&lt;/h2&gt;

&lt;p&gt;Everything below runs on Docker Desktop. No GPU required (it'll use CPU). Takes 5 minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Pull a model
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker model pull ai/llama3.2:1B-Q8_0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Chat with it
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker model run ai/llama3.2:1B-Q8_0 &lt;span class="s2"&gt;"Write a Dockerfile for a Python FastAPI app"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Hit the API
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:12434/engines/llama3.2/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "ai/llama3.2:1B-Q8_0",
    "messages": [
      {"role": "system", "content": "You are a Docker and Kubernetes expert."},
      {"role": "user", "content": "How do I expose GPUs to a Docker container?"}
    ],
    "temperature": 0.3
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Use it from Python
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# pip install openai
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:12434/engines/llama3.2/v1/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;not-needed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ai/llama3.2:1B-Q8_0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain Docker Scout in one paragraph&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. You're running local LLM inference with the same API your production code uses. No CUDA installation. No PyTorch dependency conflicts. No 15GB images.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Want to See Next from Docker
&lt;/h2&gt;

&lt;p&gt;Model Runner is already good. Here's what would make it great for production GPU engineers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;docker model stats&lt;/code&gt;&lt;/strong&gt; — VRAM usage per model, like &lt;code&gt;docker stats&lt;/code&gt; for containers. Right now I have to run nvidia-smi separately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-model serving&lt;/strong&gt; — run Llama + Mistral + CodeLlama concurrently with per-model resource limits. Production inference servers (Triton) do this; local dev should too.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenTelemetry integration&lt;/strong&gt; — emit inference latency, tokens/second, and queue depth to an OTel collector. My &lt;a href="https://github.com/pmady/otel-gpu-receiver" rel="noopener noreferrer"&gt;otel-gpu-receiver&lt;/a&gt; handles hardware GPU metrics — application-level model metrics would complete the picture.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandbox GPU passthrough&lt;/strong&gt; — let sandboxed AI agents access the GPU for local inference. Currently sandboxes are CPU-only.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I've shared this feedback in the Docker community forums. If you're building GPU/AI infrastructure on Docker, I'd love to hear what you're missing — drop a comment or reach out.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Docker used to be where you built containers and pushed them somewhere else. In 2026, Docker is where you &lt;strong&gt;develop AI applications end-to-end&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model Runner&lt;/strong&gt; replaces your 22-minute build-push-deploy cycle with a 14-second local inference call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandboxes&lt;/strong&gt; give your AI agents the same security boundary they'll have in production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scout&lt;/strong&gt; catches CVEs in your CUDA dependency tree before they reach a GPU cluster&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compose&lt;/strong&gt; runs your entire AI stack locally — inference, app, monitoring, all together&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And when it's time to ship to production, the same containers deploy to Kubernetes with &lt;a href="https://github.com/pmady/keda-gpu-scaler" rel="noopener noreferrer"&gt;GPU autoscaling&lt;/a&gt;, &lt;a href="https://github.com/pmady/otel-gpu-receiver" rel="noopener noreferrer"&gt;GPU observability&lt;/a&gt;, and &lt;a href="https://github.com/volcano-sh/volcano/pull/5095" rel="noopener noreferrer"&gt;NUMA-aware scheduling&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The gap between "works on my machine" and "works on 8 A100s" just got a lot smaller.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://linkedin.com/in/pavanmadduri" rel="noopener noreferrer"&gt;Pavan Madduri&lt;/a&gt; is a Senior Cloud Platform Engineer at W.W. Grainger, Inc., CNCF Golden Kubestronaut, and Oracle ACE Associate. He maintains &lt;a href="https://github.com/pmady/keda-gpu-scaler" rel="noopener noreferrer"&gt;keda-gpu-scaler&lt;/a&gt; and &lt;a href="https://github.com/pmady/otel-gpu-receiver" rel="noopener noreferrer"&gt;otel-gpu-receiver&lt;/a&gt;, contributed GPU NUMA topology scheduling to &lt;a href="https://github.com/volcano-sh/volcano/pull/5095" rel="noopener noreferrer"&gt;Volcano&lt;/a&gt;, and is a &lt;a href="https://github.com/dragonflyoss" rel="noopener noreferrer"&gt;Dragonfly&lt;/a&gt; Community Member. Published: &lt;a href="https://platformengineering.com/contributed-content/abstracting-ai-infrastructure-native-gpu-scaling-for-internal-developer-platforms/" rel="noopener noreferrer"&gt;PlatformEngineering.com&lt;/a&gt;. Follow on &lt;a href="https://facebook.com/YOUR_PAGE_URL" rel="noopener noreferrer"&gt;Facebook: Docker AI &amp;amp; Cloud-Native DevOps&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gpu</category>
      <category>docker</category>
      <category>dockermodelrunner</category>
    </item>
    <item>
      <title>Docker + OKE: Running GPU Inference Containers on Oracle Cloud</title>
      <dc:creator>Pavan Madduri</dc:creator>
      <pubDate>Fri, 08 May 2026 14:59:52 +0000</pubDate>
      <link>https://dev.to/pavan_madduri/docker-oke-running-gpu-inference-containers-on-oracle-cloud-3i94</link>
      <guid>https://dev.to/pavan_madduri/docker-oke-running-gpu-inference-containers-on-oracle-cloud-3i94</guid>
      <description>&lt;p&gt;&lt;em&gt;I wanted to deploy an LLM inference API without spending $1,200/month on AWS GPU instances. OCI turned out to be significantly cheaper, and the Docker workflow was identical. Here's what I set up.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I Looked at OCI for GPU Workloads
&lt;/h2&gt;

&lt;p&gt;I've been building GPU infrastructure tools for a while now (keda-gpu-scaler, otel-gpu-receiver, GPU NUMA scheduling for Volcano), and most of my testing was on AWS. The g5.xlarge instances with A10G GPUs run about $1.01/hr, plus $73/month for the EKS control plane. It adds up fast when you're iterating.&lt;/p&gt;

&lt;p&gt;Someone on the Volcano Slack mentioned OCI's GPU pricing and I was skeptical. But when I looked it up, the numbers were real: the A10 shape costs more per hour than a g5.xlarge but bundles 15 OCPUs and 240 GB of RAM with it, preemptible A10s cost about half of AWS's on-demand rate, and OKE doesn't charge for the Kubernetes control plane at all. So I tried moving a vLLM inference workload over.&lt;/p&gt;

&lt;h2&gt;
  
  
  OCI GPU Pricing
&lt;/h2&gt;

&lt;p&gt;Here's what OCI actually charges for GPU instances. I had to double-check these because they seemed too low:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Shape&lt;/th&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;GPU Memory&lt;/th&gt;
&lt;th&gt;OCPUs&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;th&gt;Price/hr (on-demand)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;VM.GPU.A10.1&lt;/td&gt;
&lt;td&gt;1x A10&lt;/td&gt;
&lt;td&gt;24 GB&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;240 GB&lt;/td&gt;
&lt;td&gt;~$1.65&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VM.GPU.A10.2&lt;/td&gt;
&lt;td&gt;2x A10&lt;/td&gt;
&lt;td&gt;48 GB&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;480 GB&lt;/td&gt;
&lt;td&gt;~$3.30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BM.GPU.A100-v2.8&lt;/td&gt;
&lt;td&gt;8x A100&lt;/td&gt;
&lt;td&gt;640 GB&lt;/td&gt;
&lt;td&gt;128&lt;/td&gt;
&lt;td&gt;2 TB&lt;/td&gt;
&lt;td&gt;~$25.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BM.GPU.H100.8&lt;/td&gt;
&lt;td&gt;8x H100&lt;/td&gt;
&lt;td&gt;640 GB&lt;/td&gt;
&lt;td&gt;112&lt;/td&gt;
&lt;td&gt;2 TB&lt;/td&gt;
&lt;td&gt;~$38.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VM.GPU.A10.1 (preemptible)&lt;/td&gt;
&lt;td&gt;1x A10&lt;/td&gt;
&lt;td&gt;24 GB&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;240 GB&lt;/td&gt;
&lt;td&gt;~$0.50&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That preemptible A10 price made me do a double-take. $0.50/hr for an A10 GPU works out to roughly $365 a month always-on. I was paying about double that on AWS for the same class of hardware.&lt;/p&gt;
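
&lt;p&gt;If you want to sanity-check those numbers, the conversion is just the hourly rate times roughly 730 hours in a month:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# ~730 hours in a month, so hourly rate x 730 gives the always-on monthly cost
echo "1.65 * 730" | bc    # on-demand VM.GPU.A10.1  -&gt; ~$1,205/month
echo "0.50 * 730" | bc    # preemptible A10         -&gt; ~$365/month
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;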

&lt;h2&gt;
  
  
  Building the Inference Image
&lt;/h2&gt;

&lt;p&gt;I used vLLM because it's what I was already running on AWS. The Dockerfile doesn't change at all between clouds, which is the whole reason I'm using containers in the first place.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Dockerfile.inference&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; nvidia/cuda:12.4-runtime-ubuntu22.04&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    python3 python3-pip &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;pip3 &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="nv"&gt;vllm&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;0.6.0 &lt;span class="se"&gt;\
&lt;/span&gt;    fastapi &lt;span class="se"&gt;\
&lt;/span&gt;    uvicorn

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8000&lt;/span&gt;

&lt;span class="k"&gt;HEALTHCHECK&lt;/span&gt;&lt;span class="s"&gt; --interval=30s --timeout=10s --retries=3 \&lt;/span&gt;
    CMD python3 -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"

&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["python3", "-m", "vllm.entrypoints.openai.api_server"]&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["--model", "microsoft/Phi-3-mini-4k-instruct", \&lt;/span&gt;
     "--max-model-len", "4096", \
     "--gpu-memory-utilization", "0.9"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build and test locally (you'll need an NVIDIA GPU and the NVIDIA Container Toolkit installed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Build&lt;/span&gt;
docker build &lt;span class="nt"&gt;-f&lt;/span&gt; Dockerfile.inference &lt;span class="nt"&gt;-t&lt;/span&gt; gpu-inference:v1 &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Run with GPU access&lt;/span&gt;
docker run &lt;span class="nt"&gt;--gpus&lt;/span&gt; all &lt;span class="nt"&gt;-p&lt;/span&gt; 8000:8000 gpu-inference:v1

&lt;span class="c"&gt;# Test inference&lt;/span&gt;
curl http://localhost:8000/v1/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "microsoft/Phi-3-mini-4k-instruct",
    "prompt": "Explain Kubernetes in one sentence:",
    "max_tokens": 50
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;--gpus all&lt;/code&gt; is the magic flag. It tells Docker to use the NVIDIA Container Toolkit, which injects the GPU device files and driver libraries into the container at runtime. Your image only needs the CUDA runtime libraries, not the full driver stack.&lt;/p&gt;
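
&lt;p&gt;A quick way to confirm that wiring before involving vLLM at all: the toolkit injects &lt;code&gt;nvidia-smi&lt;/code&gt; and the driver libraries from the host, so even a plain Ubuntu image can see the GPU:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# If this prints your GPU, the --gpus flag and the NVIDIA Container Toolkit are set up correctly
docker run --rm --gpus all ubuntu:22.04 nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;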

&lt;h3&gt;
  
  
  If You Don't Have a Local GPU
&lt;/h3&gt;

&lt;p&gt;I do most of my development on a Mac, which obviously doesn't have an NVIDIA GPU. Docker Model Runner is what I use to test the LLM interaction pattern locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker model pull ai/phi3-mini
docker model run ai/phi3-mini &lt;span class="s2"&gt;"Explain Kubernetes in one sentence"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API is OpenAI-compatible so the client code I write against Model Runner works unchanged against vLLM in production. I've been using this for prompt template iteration and it cut my feedback loop from 20+ minutes (push to registry, wait for K8s pull, test) to about 15 seconds.&lt;/p&gt;
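
&lt;p&gt;To make that concrete, here's a sketch of the switch. Only the base URL and the model name change; the request shape and the code that parses the response stay the same (the production IP below is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Local: Docker Model Runner on the laptop
BASE=http://localhost:12434/engines/phi3-mini/v1
MODEL=ai/phi3-mini

# Production: vLLM behind the OKE load balancer (placeholder IP)
# BASE=http://203.0.113.10/v1
# MODEL=microsoft/Phi-3-mini-4k-instruct

curl -s "$BASE/completions" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\", \"prompt\": \"What is OKE?\", \"max_tokens\": 50}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;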

&lt;h2&gt;
  
  
  Pushing to OCIR
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Login to OCIR&lt;/span&gt;
docker login iad.ocir.io &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;tenancy-namespace&amp;gt;/oracleidentitycloudservice/&amp;lt;email&amp;gt;'&lt;/span&gt;

&lt;span class="c"&gt;# Tag&lt;/span&gt;
docker tag gpu-inference:v1 iad.ocir.io/&amp;lt;tenancy&amp;gt;/gpu-inference/vllm:v1

&lt;span class="c"&gt;# Scan before push&lt;/span&gt;
docker scout cves gpu-inference:v1 &lt;span class="nt"&gt;--only-severity&lt;/span&gt; critical,high

&lt;span class="c"&gt;# Push&lt;/span&gt;
docker push iad.ocir.io/&amp;lt;tenancy&amp;gt;/gpu-inference/vllm:v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fair warning: GPU images are big. Mine was about 8GB. The first push took a while, but after that Docker's layer caching means only changed layers get uploaded. Most rebuilds push in under a minute.&lt;/p&gt;
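
&lt;p&gt;&lt;code&gt;docker history&lt;/code&gt; shows where the weight actually is, which helps with ordering the Dockerfile so the heavy layers stay cached:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# See which layers carry the weight; the pip install of vLLM dominates
docker history gpu-inference:v1

# Rule of thumb: keep the apt/pip layers early in the Dockerfile so routine
# code changes only rebuild (and re-push) the small layers stacked on top
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;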

&lt;h2&gt;
  
  
  Setting Up OKE with GPU Nodes
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create cluster (control plane is free)&lt;/span&gt;
oci ce cluster create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compartment-id&lt;/span&gt; &lt;span class="nv"&gt;$COMPARTMENT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--kubernetes-version&lt;/span&gt; v1.30.1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; gpu-inference-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vcn-id&lt;/span&gt; &lt;span class="nv"&gt;$VCN_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--endpoint-subnet-id&lt;/span&gt; &lt;span class="nv"&gt;$API_SUBNET_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-lb-subnet-ids&lt;/span&gt; &lt;span class="s1"&gt;'["'&lt;/span&gt;&lt;span class="nv"&gt;$LB_SUBNET_ID&lt;/span&gt;&lt;span class="s1"&gt;'"]'&lt;/span&gt;

&lt;span class="c"&gt;# Create GPU node pool&lt;/span&gt;
oci ce node-pool create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cluster-id&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compartment-id&lt;/span&gt; &lt;span class="nv"&gt;$COMPARTMENT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--kubernetes-version&lt;/span&gt; v1.30.1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; gpu-a10-pool &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--node-shape&lt;/span&gt; VM.GPU.A10.1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--node-config-details&lt;/span&gt; &lt;span class="s1"&gt;'{
    "size": 2,
    "placementConfigs": [{
      "availabilityDomain": "Uocm:US-ASHBURN-AD-1",
      "subnetId": "'&lt;/span&gt;&lt;span class="nv"&gt;$WORKER_SUBNET_ID&lt;/span&gt;&lt;span class="s1"&gt;'"
    }]
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--node-source-details&lt;/span&gt; &lt;span class="s1"&gt;'{
    "imageId": "'&lt;/span&gt;&lt;span class="nv"&gt;$GPU_IMAGE_ID&lt;/span&gt;&lt;span class="s1"&gt;'",
    "sourceType": "IMAGE"
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--initial-node-labels&lt;/span&gt; &lt;span class="s1"&gt;'[{
    "key": "nvidia.com/gpu",
    "value": "present"
  }]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One thing I liked about OKE — the GPU node pools come with the NVIDIA device plugin already installed. On EKS I had to install the device plugin myself via a DaemonSet. Here it just works, and &lt;code&gt;nvidia.com/gpu&lt;/code&gt; shows up as a schedulable resource immediately.&lt;/p&gt;
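
&lt;p&gt;A couple of quick checks to confirm that before deploying anything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# nvidia.com/gpu should appear under Allocatable on every GPU worker
kubectl get nodes -l nvidia.com/gpu=present
kubectl describe nodes -l nvidia.com/gpu=present | grep -A8 'Allocatable:'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;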

&lt;h2&gt;
  
  
  Deploying the Inference Service
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# inference-deployment.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm-inference&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;inference&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm-inference&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm-inference&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm&lt;/span&gt;
          &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;iad.ocir.io/&amp;lt;tenancy&amp;gt;/gpu-inference/vllm:v1&lt;/span&gt;
          &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8000&lt;/span&gt;
              &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http&lt;/span&gt;
          &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
            &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4"&lt;/span&gt;
              &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;16Gi"&lt;/span&gt;
          &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HUGGING_FACE_HUB_TOKEN&lt;/span&gt;
              &lt;span class="na"&gt;valueFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;secretKeyRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hf-token&lt;/span&gt;
                  &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;token&lt;/span&gt;
          &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;model-cache&lt;/span&gt;
              &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/root/.cache/huggingface&lt;/span&gt;
          &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/health&lt;/span&gt;
              &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http&lt;/span&gt;
            &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;120&lt;/span&gt;
            &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
          &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/health&lt;/span&gt;
              &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http&lt;/span&gt;
            &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
            &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;model-cache&lt;/span&gt;
          &lt;span class="na"&gt;persistentVolumeClaim&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;claimName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;model-cache&lt;/span&gt;
      &lt;span class="na"&gt;imagePullSecrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ocir-secret&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm-inference&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;inference&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;oci.oraclecloud.com/load-balancer-type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lb"&lt;/span&gt;
    &lt;span class="na"&gt;service.beta.kubernetes.io/oci-load-balancer-shape&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flexible"&lt;/span&gt;
    &lt;span class="na"&gt;service.beta.kubernetes.io/oci-load-balancer-shape-flex-min&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10"&lt;/span&gt;
    &lt;span class="na"&gt;service.beta.kubernetes.io/oci-load-balancer-shape-flex-max&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;100"&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LoadBalancer&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm-inference&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PersistentVolumeClaim&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;model-cache&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;inference&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;accessModes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ReadWriteOnce"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;storageClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;oci-bv&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;50Gi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things I learned the hard way while setting this up:&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;nvidia.com/gpu: 1&lt;/code&gt; in resource limits is how Kubernetes knows to schedule this on a GPU node. Forget it and your pod lands on a CPU node and crashes.&lt;/p&gt;

&lt;p&gt;The PVC for model cache is important. Without it, the model downloads from HuggingFace every time the pod restarts. Phi-3-mini is a few GB — that's 5-10 minutes of startup time you don't want to repeat.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;initialDelaySeconds: 120&lt;/code&gt; on the liveness probe took me a restart loop to figure out. Model loading is slow. If your liveness probe fires before the model is loaded, Kubernetes kills the pod, it restarts, starts loading again, gets killed again... you get the idea. Give it at least 2 minutes.&lt;/p&gt;
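
&lt;p&gt;If you do hit that loop, it's easy to spot: a climbing restart count plus liveness-probe failure events in the pod description:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# A climbing RESTARTS count plus liveness-probe failures is the telltale sign
kubectl get pods -n inference
kubectl describe pod -n inference -l app=vllm-inference | grep -i -A3 liveness
kubectl logs -n inference deploy/vllm-inference --previous   # logs from the killed container
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;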

&lt;p&gt;The OCI Load Balancer annotations tell OKE to automatically provision a load balancer. No separate Terraform resource needed.&lt;/p&gt;

&lt;p&gt;Deploy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create namespace inference

&lt;span class="c"&gt;# Create OCIR pull secret&lt;/span&gt;
kubectl create secret docker-registry ocir-secret &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; inference &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--docker-server&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;iad.ocir.io &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--docker-username&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'&amp;lt;tenancy&amp;gt;/&amp;lt;user&amp;gt;'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--docker-password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'&amp;lt;auth-token&amp;gt;'&lt;/span&gt;

&lt;span class="c"&gt;# Create HuggingFace token secret (from OCI Vault ideally)&lt;/span&gt;
kubectl create secret generic hf-token &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; inference &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--from-literal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$HF_TOKEN&lt;/span&gt;

kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; inference-deployment.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a few minutes (mostly model download time), the service is up and accessible via the load balancer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LB_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get svc vllm-inference &lt;span class="nt"&gt;-n&lt;/span&gt; inference &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.status.loadBalancer.ingress[0].ip}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

curl http://&lt;span class="nv"&gt;$LB_IP&lt;/span&gt;/v1/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "microsoft/Phi-3-mini-4k-instruct",
    "prompt": "What is Oracle Cloud Infrastructure?",
    "max_tokens": 100
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Monitoring GPU Utilization
&lt;/h2&gt;

&lt;p&gt;Once the inference service was running, I wanted to see actual GPU utilization. Without this you're flying blind — you have no idea if the GPU is sitting at 10% or 95%. DCGM Exporter gives you Prometheus metrics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add gpu-helm-charts https://nvidia.github.io/dcgm-exporter/helm-charts
helm &lt;span class="nb"&gt;install &lt;/span&gt;dcgm-exporter gpu-helm-charts/dcgm-exporter &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; monitoring &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; serviceMonitor.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you &lt;code&gt;DCGM_FI_DEV_GPU_UTIL&lt;/code&gt; (GPU utilization), &lt;code&gt;DCGM_FI_DEV_MEM_COPY_UTIL&lt;/code&gt; (memory copy utilization), temperature, power draw, and more. I have a Grafana dashboard that shows all of these, and it's been useful for right-sizing.&lt;/p&gt;
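
&lt;p&gt;The most useful number for right-sizing is usually just average utilization over a few hours. Here's a rough query against the Prometheus HTTP API; adjust the URL to wherever your Prometheus lives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Average GPU utilization over the last 6 hours, per GPU
curl -sG 'http://prometheus-server.monitoring.svc:9090/api/v1/query' \
  --data-urlencode 'query=avg_over_time(DCGM_FI_DEV_GPU_UTIL[6h])'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;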

&lt;p&gt;I also built &lt;a href="https://github.com/pmady/otel-gpu-receiver" rel="noopener noreferrer"&gt;otel-gpu-receiver&lt;/a&gt; which does something similar but for OpenTelemetry. If you're already running an OTel collector, it might be a better fit than DCGM Exporter.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm Actually Paying
&lt;/h2&gt;

&lt;p&gt;Here's the monthly bill comparison for running Phi-3-mini on a single A10, always-on:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OCI OKE + VM.GPU.A10.1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed K8s + GPU node&lt;/td&gt;
&lt;td&gt;~$1,210&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OCI OKE + preemptible A10&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Same, but preemptible&lt;/td&gt;
&lt;td&gt;~$365&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS EKS + g5.xlarge&lt;/td&gt;
&lt;td&gt;Managed K8s + GPU node&lt;/td&gt;
&lt;td&gt;~$1,100 + $73 (control plane)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GCP GKE + g2-standard-4&lt;/td&gt;
&lt;td&gt;Managed K8s + GPU node&lt;/td&gt;
&lt;td&gt;~$1,300 + $73 (control plane)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure AKS + NC4as_T4_v3&lt;/td&gt;
&lt;td&gt;Managed K8s + T4 GPU&lt;/td&gt;
&lt;td&gt;~$550 (less powerful T4 GPU)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The free control plane saves $73/mo by itself compared to EKS or GKE. And for my dev/test workloads I switched to preemptible instances, which dropped the GPU cost to $365/mo. The pods get evicted occasionally but for development that's fine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local Dev with Docker Model Runner
&lt;/h2&gt;

&lt;p&gt;I keep coming back to this because it changed how I work. Before Model Runner, testing a prompt change meant: edit prompt, rebuild image, push to OCIR, wait for OKE to pull it, test, realize it's wrong, repeat. Twenty minutes per iteration.&lt;/p&gt;

&lt;p&gt;Now I just run the model locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pull a model&lt;/span&gt;
docker model pull ai/phi3-mini

&lt;span class="c"&gt;# Run inference&lt;/span&gt;
docker model run ai/phi3-mini &lt;span class="s2"&gt;"Summarize: Oracle Cloud Infrastructure provides..."&lt;/span&gt;

&lt;span class="c"&gt;# Or use the API endpoint&lt;/span&gt;
curl http://localhost:12434/engines/phi3-mini/v1/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"prompt": "What is OKE?", "max_tokens": 50}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same API, same prompt format. When the prompt works locally, I rebuild the production image and push. The container is what makes this portable.&lt;/p&gt;
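
&lt;p&gt;The promote step itself is short. Something like this, with a hypothetical v2 tag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Rebuild the production image, push to OCIR, and roll the deployment
docker build -f Dockerfile.inference -t iad.ocir.io/$TENANCY/gpu-inference/vllm:v2 .
docker push iad.ocir.io/$TENANCY/gpu-inference/vllm:v2

kubectl -n inference set image deployment/vllm-inference vllm=iad.ocir.io/$TENANCY/gpu-inference/vllm:v2
kubectl -n inference rollout status deployment/vllm-inference
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;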

&lt;h2&gt;
  
  
  Was It Worth Switching?
&lt;/h2&gt;

&lt;p&gt;Honestly, yes. The Docker workflow didn't change at all — same Dockerfile, same &lt;code&gt;docker build&lt;/code&gt;, same &lt;code&gt;docker push&lt;/code&gt;. I just changed the registry URL and the Kubernetes annotations. The inference service runs the same. The GPU utilization is the same. The API responses are the same.&lt;/p&gt;

&lt;p&gt;What changed is the bill. And the fact that I don't pay $73/month for a Kubernetes control plane anymore. If you're running GPU workloads on AWS or GCP and haven't priced out OCI, it's worth 30 minutes of your time.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Pavan Madduri — Oracle ACE Associate, CNCF Golden Kubestronaut. I write about containers, Kubernetes, and GPU infrastructure. &lt;a href="https://github.com/pmady" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://linkedin.com/in/pavanmadduri" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://pmady.github.io/" rel="noopener noreferrer"&gt;Website&lt;/a&gt; | &lt;a href="https://scholar.google.com/citations?view_op=list_works&amp;amp;hl=en&amp;amp;user=au0O-8oAAAAJ" rel="noopener noreferrer"&gt;Google Scholar&lt;/a&gt; | &lt;a href="https://www.researchgate.net/profile/Pavan-Madduri-2?ev=hdr_xprf" rel="noopener noreferrer"&gt;ResearchGate&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>oke</category>
      <category>gpu</category>
      <category>oci</category>
      <category>docker</category>
    </item>
    <item>
      <title>Securing Docker Images on OCI — OCIR, Docker Scout, and OCI Vault</title>
      <dc:creator>Pavan Madduri</dc:creator>
      <pubDate>Fri, 08 May 2026 14:56:18 +0000</pubDate>
      <link>https://dev.to/pavan_madduri/securing-docker-images-on-oci-ocir-docker-scout-and-oci-vault-14pl</link>
      <guid>https://dev.to/pavan_madduri/securing-docker-images-on-oci-ocir-docker-scout-and-oci-vault-14pl</guid>
      <description>&lt;p&gt;&lt;em&gt;I found a critical CVE in a production container image last month. It had been there for five months. Here's the setup I built on OCI so that doesn't happen again.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How I Found Out the Hard Way
&lt;/h2&gt;

&lt;p&gt;A few weeks ago I ran &lt;code&gt;docker scout cves&lt;/code&gt; on a container image that had been running in production for months. I wasn't expecting anything — I'd scanned it when I first built it and it was clean. But the base image (ubuntu:22.04) had picked up a handful of CVEs since then, including one rated critical.&lt;/p&gt;

&lt;p&gt;Nobody had checked. The image was built once, pushed once, and forgotten. The container was humming along fine, but it was running vulnerable libraries that we never would've shipped knowingly.&lt;/p&gt;

&lt;p&gt;This is a workflow problem more than a tooling problem. The scanners exist. I just wasn't running them at the right points. So I spent a weekend wiring together a proper pipeline on OCI using what was already available — Docker Scout, OCIR's built-in scanning, and OCI Vault for secrets.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup (Three Places to Catch Problems)
&lt;/h2&gt;

&lt;p&gt;The idea is simple: scan before you push, scan after you push, and never put secrets in the image. Three checkpoints, each catching different things.&lt;/p&gt;

&lt;p&gt;Docker Scout runs on my laptop and in CI — it catches CVEs before the image leaves my machine. OCIR scans again after the push, which catches anything Scout might miss and gives me a second opinion from Oracle's vulnerability database. OCI Vault handles secrets so I'm not baking API keys into environment variables like it's 2015.&lt;/p&gt;

&lt;h2&gt;
  
  
  Docker Scout — Catching CVEs Before They Leave My Machine
&lt;/h2&gt;

&lt;p&gt;I'd been ignoring Docker Scout for a while, thinking it was just another scanning tool. It's actually pretty good. It comes built into Docker Desktop and the CLI, so there's no extra install.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Scan a local image&lt;/span&gt;
docker scout cves my-api:latest

&lt;span class="c"&gt;# Quick view — just critical and high severity&lt;/span&gt;
docker scout cves my-api:latest &lt;span class="nt"&gt;--only-severity&lt;/span&gt; critical,high

&lt;span class="c"&gt;# Compare two image versions&lt;/span&gt;
docker scout compare my-api:v2 &lt;span class="nt"&gt;--to&lt;/span&gt; my-api:v1

&lt;span class="c"&gt;# Get remediation recommendations&lt;/span&gt;
docker scout recommendations my-api:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;recommendations&lt;/code&gt; command is the one that made me a convert. Instead of just listing CVE IDs and making you figure out what to do, it tells you exactly which base image version fixes the problem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Recommended fixes:
  Base image: golang:1.22-alpine → golang:1.22.4-alpine
  Fixes: CVE-2024-24790, CVE-2024-24789

  Base image: alpine:3.19 → alpine:3.20
  Fixes: 3 vulnerabilities
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That saved me probably 30 minutes of Googling CVE IDs and cross-referencing which alpine version patched what. I just updated the FROM line and rebuilt.&lt;/p&gt;
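
&lt;p&gt;The loop after that is short: edit the FROM line to the recommended tag, rebuild, and rescan until the critical/high list is empty:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# After bumping the FROM line to the recommended base image tag
docker build -t my-api:latest .
docker scout cves my-api:latest --only-severity critical,high
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;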

&lt;h3&gt;
  
  
  Putting Scout in CI
&lt;/h3&gt;

&lt;p&gt;Running it locally is fine, but the real value is when it blocks bad images in CI automatically. Here's what I have in GitHub Actions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/security.yml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Container Security&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;scan&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build image&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker build -t my-api:${{ github.sha }} .&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Docker Scout scan&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/scout-action@v1&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cves&lt;/span&gt;
          &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-api:${{ github.sha }}&lt;/span&gt;
          &lt;span class="na"&gt;only-severities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical,high&lt;/span&gt;
          &lt;span class="na"&gt;exit-code&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  &lt;span class="c1"&gt;# Fail the build if critical/high CVEs found&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Docker Scout recommendations&lt;/span&gt;
        &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always()&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/scout-action@v1&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;recommendations&lt;/span&gt;
          &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-api:${{ github.sha }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important bit is &lt;code&gt;exit-code: true&lt;/code&gt;. Without that flag, Scout just prints the results and the build happily continues. With it, any critical or high CVE fails the pipeline. I've had this block two PRs in the last month and both times it was a legitimate issue in the base image.&lt;/p&gt;

&lt;h2&gt;
  
  
  OCIR Scanning — The Second Pair of Eyes
&lt;/h2&gt;

&lt;p&gt;OCIR has its own vulnerability scanner that runs against Oracle's database. It sometimes catches things Scout doesn't (different vulnerability feeds) and vice versa. I like having both.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting It Up
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a repository with scanning enabled&lt;/span&gt;
oci artifacts container repository create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compartment-id&lt;/span&gt; &lt;span class="nv"&gt;$COMPARTMENT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--display-name&lt;/span&gt; &lt;span class="s2"&gt;"production-api"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--is-immutable&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--readme-enabled&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--is-immutable true&lt;/code&gt; flag is one I wish I'd known about earlier. It prevents anyone from overwriting a tag. So once &lt;code&gt;v1.2.3&lt;/code&gt; is pushed, that's it — nobody can push a different image with the same tag. Sounds obvious but I've been bitten by &lt;code&gt;:latest&lt;/code&gt; being silently overwritten before.&lt;/p&gt;

&lt;h3&gt;
  
  
  Push and Scan
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Tag and push&lt;/span&gt;
docker tag my-api:v1.2.3 iad.ocir.io/&lt;span class="nv"&gt;$TENANCY&lt;/span&gt;/production-api:v1.2.3
docker push iad.ocir.io/&lt;span class="nv"&gt;$TENANCY&lt;/span&gt;/production-api:v1.2.3

&lt;span class="c"&gt;# Trigger a scan (or it runs automatically)&lt;/span&gt;
oci vulnerability-scanning container scan create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compartment-id&lt;/span&gt; &lt;span class="nv"&gt;$COMPARTMENT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image-id&lt;/span&gt; &amp;lt;image-ocid&amp;gt;

&lt;span class="c"&gt;# Check scan results&lt;/span&gt;
oci vulnerability-scanning container scan result list &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compartment-id&lt;/span&gt; &lt;span class="nv"&gt;$COMPARTMENT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image-id&lt;/span&gt; &amp;lt;image-ocid&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scan Policies
&lt;/h3&gt;

&lt;p&gt;You can also set up automated scan targets in Terraform so every push to specific repos gets scanned without anyone having to remember to do it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Terraform — scan policy&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"oci_vulnerability_scanning_container_scan_recipe"&lt;/span&gt; &lt;span class="s2"&gt;"strict"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;compartment_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;compartment_id&lt;/span&gt;
  &lt;span class="nx"&gt;display_name&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"strict-scan-recipe"&lt;/span&gt;

  &lt;span class="nx"&gt;scan_settings&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;scan_level&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"STANDARD"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"oci_vulnerability_scanning_container_scan_target"&lt;/span&gt; &lt;span class="s2"&gt;"production"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;compartment_id&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;compartment_id&lt;/span&gt;
  &lt;span class="nx"&gt;container_scan_recipe_id&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;oci_vulnerability_scanning_container_scan_recipe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;strict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;display_name&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"production-registry-scan"&lt;/span&gt;

  &lt;span class="nx"&gt;target_registry&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;compartment_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;compartment_id&lt;/span&gt;
    &lt;span class="nx"&gt;type&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"OCIR"&lt;/span&gt;
    &lt;span class="nx"&gt;repositories&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"production-api"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  OCI Vault — Stop Putting Secrets in Environment Variables
&lt;/h2&gt;

&lt;p&gt;This one is embarrassing to admit, but I've shipped containers with API keys in environment variables more times than I'd like. Not in the Dockerfile directly (I know better than that), but in docker-compose files that ended up in git, or in Kubernetes manifests that got copy-pasted around.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# WRONG — secrets baked into the image&lt;/span&gt;
&lt;span class="s"&gt;ENV DATABASE_PASSWORD=hunter2&lt;/span&gt;
&lt;span class="s"&gt;ENV API_KEY=sk-abc123&lt;/span&gt;

&lt;span class="c1"&gt;# WRONG — secrets in compose file committed to git&lt;/span&gt;
&lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DATABASE_PASSWORD=hunter2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;OCI Vault stores secrets in an HSM. The container pulls them at startup. They never exist in the image, in your compose file, or in git.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create Vault and Secrets
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a vault&lt;/span&gt;
oci kms management vault create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compartment-id&lt;/span&gt; &lt;span class="nv"&gt;$COMPARTMENT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--display-name&lt;/span&gt; &lt;span class="s2"&gt;"docker-secrets-vault"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vault-type&lt;/span&gt; DEFAULT

&lt;span class="c"&gt;# Create a master encryption key&lt;/span&gt;
oci kms management key create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compartment-id&lt;/span&gt; &lt;span class="nv"&gt;$COMPARTMENT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--display-name&lt;/span&gt; &lt;span class="s2"&gt;"docker-secrets-key"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--key-shape&lt;/span&gt; &lt;span class="s1"&gt;'{"algorithm": "AES", "length": 32}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--endpoint&lt;/span&gt; &lt;span class="nv"&gt;$VAULT_MGMT_ENDPOINT&lt;/span&gt;

&lt;span class="c"&gt;# Store a secret&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"my-database-password"&lt;/span&gt; | &lt;span class="nb"&gt;base64&lt;/span&gt; | &lt;span class="se"&gt;\&lt;/span&gt;
oci vault secret create-base64 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compartment-id&lt;/span&gt; &lt;span class="nv"&gt;$COMPARTMENT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vault-id&lt;/span&gt; &lt;span class="nv"&gt;$VAULT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--key-id&lt;/span&gt; &lt;span class="nv"&gt;$KEY_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--secret-name&lt;/span&gt; &lt;span class="s2"&gt;"prod-db-password"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--secret-content-content&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; -&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pull Secrets at Deploy Time
&lt;/h3&gt;

&lt;p&gt;For OCI Container Instances, use an init script that fetches secrets from Vault:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/sh&lt;/span&gt;
&lt;span class="c"&gt;# entrypoint.sh — fetch secrets from OCI Vault before starting the app&lt;/span&gt;

&lt;span class="c"&gt;# Using instance principal authentication (no credentials needed)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DATABASE_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;oci secrets secret-bundle get-secret-bundle-by-name &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vault-id&lt;/span&gt; &lt;span class="nv"&gt;$VAULT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--secret-name&lt;/span&gt; &lt;span class="s2"&gt;"prod-db-password"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--stage&lt;/span&gt; CURRENT &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'data."secret-bundle-content".content'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--raw-output&lt;/span&gt; | &lt;span class="nb"&gt;base64&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;oci secrets secret-bundle get-secret-bundle-by-name &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vault-id&lt;/span&gt; &lt;span class="nv"&gt;$VAULT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--secret-name&lt;/span&gt; &lt;span class="s2"&gt;"prod-api-key"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--stage&lt;/span&gt; CURRENT &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'data."secret-bundle-content".content'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--raw-output&lt;/span&gt; | &lt;span class="nb"&gt;base64&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Start the application&lt;/span&gt;
&lt;span class="nb"&gt;exec&lt;/span&gt; /server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Dockerfile then wires that script in front of the app binary. Note that the script needs a shell and the OCI CLI inside the final image, so a bare distroless/static base can't run it as-is; either start from a base image that ships both, or move the secret fetch into the application code itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; gcr.io/distroless/static-debian12:nonroot&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /app/server /server&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; entrypoint.sh /entrypoint.sh&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["/entrypoint.sh"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For OKE, I use External Secrets Operator which syncs secrets from OCI Vault into Kubernetes secrets automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;external-secrets.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ExternalSecret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-secrets&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;refreshInterval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1h&lt;/span&gt;
  &lt;span class="na"&gt;secretStoreRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;oci-vault&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterSecretStore&lt;/span&gt;
  &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-secrets&lt;/span&gt;
  &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DATABASE_PASSWORD&lt;/span&gt;
      &lt;span class="na"&gt;remoteRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prod-db-password"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;secretKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;API_KEY&lt;/span&gt;
      &lt;span class="na"&gt;remoteRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prod-api-key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
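

&lt;p&gt;The &lt;code&gt;oci-vault&lt;/code&gt; ClusterSecretStore referenced above is the piece that points ESO at the vault. Roughly what mine looks like; the fields follow ESO's Oracle provider and the vault OCID is a placeholder, so check the operator docs for your auth setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: oci-vault
spec:
  provider:
    oracle:
      # Placeholder OCID; use your vault's OCID
      vault: ocid1.vault.oc1.iad.exampleexample
      region: us-ashburn-1
      # Instance/workload principal auth keeps OCI API keys out of the cluster;
      # user-principal auth via a secretRef is also supported
      principalType: InstancePrincipal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;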



&lt;p&gt;The nice thing here is that when I rotate a secret in Vault, ESO picks it up within an hour and updates the Kubernetes secret. Pods get the new value on their next restart. No redeployment, no new image push, no PR to change a YAML file.&lt;/p&gt;
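
&lt;p&gt;On the workload side nothing special is needed; the Deployment consumes the synced Secret like any other. A minimal sketch, with the deployment and container names as placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: iad.ocir.io/&amp;lt;tenancy&amp;gt;/production-api:v1.2.3
          envFrom:
            - secretRef:
                name: app-secrets   # the Secret ESO keeps in sync with Vault
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;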

&lt;h2&gt;
  
  
  The Full Flow
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Build&lt;/span&gt;
docker build &lt;span class="nt"&gt;-t&lt;/span&gt; my-api:v1.2.3 &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# 2. Scan locally (fail fast)&lt;/span&gt;
docker scout cves my-api:v1.2.3 &lt;span class="nt"&gt;--only-severity&lt;/span&gt; critical,high &lt;span class="nt"&gt;--exit-code&lt;/span&gt;

&lt;span class="c"&gt;# 3. Push to OCIR (triggers registry scan)&lt;/span&gt;
docker tag my-api:v1.2.3 iad.ocir.io/&lt;span class="nv"&gt;$TENANCY&lt;/span&gt;/production-api:v1.2.3
docker push iad.ocir.io/&lt;span class="nv"&gt;$TENANCY&lt;/span&gt;/production-api:v1.2.3

&lt;span class="c"&gt;# 4. Deploy (secrets pulled from Vault at runtime)&lt;/span&gt;
oci container-instances container-instance create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--containers&lt;/span&gt; &lt;span class="s1"&gt;'[{
    "imageUrl": "iad.ocir.io/'&lt;/span&gt;&lt;span class="nv"&gt;$TENANCY&lt;/span&gt;&lt;span class="s1"&gt;'/production-api:v1.2.3",
    "environmentVariables": {
      "VAULT_ID": "'&lt;/span&gt;&lt;span class="nv"&gt;$VAULT_ID&lt;/span&gt;&lt;span class="s1"&gt;'"
    }
  }]'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two scan layers, immutable tags, no secrets in source control. None of this required buying a third-party security platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;The tooling was never the problem. Docker Scout, OCIR scanning, OCI Vault — they were all available. I just wasn't using them consistently. The five-month-old CVE I found wasn't a failure of technology; it was a failure of workflow.&lt;/p&gt;

&lt;p&gt;Now scanning happens automatically at two points (local/CI and registry), secrets are in Vault instead of YAML files, and image tags are immutable so nobody accidentally overwrites a production image. It took a weekend to set up. I should've done it a year ago.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Pavan Madduri — Oracle ACE Associate, CNCF Golden Kubestronaut. I write about containers, Kubernetes, and GPU infrastructure. &lt;a href="https://github.com/pmady" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://linkedin.com/in/pavanmadduri" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://pmady.github.io/" rel="noopener noreferrer"&gt;Website&lt;/a&gt; | &lt;a href="https://scholar.google.com/citations?view_op=list_works&amp;amp;hl=en&amp;amp;user=au0O-8oAAAAJ" rel="noopener noreferrer"&gt;Google Scholar&lt;/a&gt; | &lt;a href="https://www.researchgate.net/profile/Pavan-Madduri-2?ev=hdr_xprf" rel="noopener noreferrer"&gt;ResearchGate&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>oci</category>
      <category>oke</category>
      <category>dockerscout</category>
      <category>ocivault</category>
    </item>
    <item>
      <title>Running Docker Containers on OCI Without Kubernetes</title>
      <dc:creator>Pavan Madduri</dc:creator>
      <pubDate>Fri, 08 May 2026 14:51:57 +0000</pubDate>
      <link>https://dev.to/pavan_madduri/running-docker-containers-on-oci-without-kubernetes-58k8</link>
      <guid>https://dev.to/pavan_madduri/running-docker-containers-on-oci-without-kubernetes-58k8</guid>
      <description>&lt;p&gt;&lt;em&gt;I needed to run a container in the cloud. Not a microservices platform. Not a service mesh. Just one container, one port, accessible from the internet. Here's how OCI Container Instances turned out to be the right tool.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I Stopped Reaching for Kubernetes First
&lt;/h2&gt;

&lt;p&gt;I'll be honest — I'm a Kubestronaut. I have all the CNCF Kubernetes certifications. My default muscle memory is &lt;code&gt;kubectl apply -f&lt;/code&gt; for everything. But last month I needed to deploy a small Go API for a side project and I caught myself writing a Helm chart for a single container.&lt;/p&gt;

&lt;p&gt;That felt ridiculous.&lt;/p&gt;

&lt;p&gt;The API had two endpoints. It needed 256MB of RAM. There was no reason to stand up a control plane, configure node pools, set up an ingress controller, and maintain all of that just to serve JSON over HTTP.&lt;/p&gt;

&lt;p&gt;I'd used OCI Container Instances before for a quick test and remembered it being dead simple. So I tried it for real this time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What OCI Container Instances Actually Are
&lt;/h2&gt;

&lt;p&gt;The closest analogy is &lt;code&gt;docker run&lt;/code&gt;, but Oracle manages the host. You give it an image, tell it how much CPU and memory you want, point it at a subnet, and it runs. The container gets a real IP on your VCN. You can pull from OCIR, Docker Hub, or any OCI-compliant registry (OCI here meaning the Open Container Initiative, not Oracle Cloud).&lt;/p&gt;

&lt;p&gt;I was surprised by the resource limits — you can go up to 64 OCPUs and 1TB of RAM on a single instance. Fargate caps out at 16 vCPUs. Cloud Run at 8. For most of my use cases that doesn't matter, but it's nice to know the ceiling is high if I need it later.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;OCI Container Instances&lt;/th&gt;
&lt;th&gt;AWS Fargate&lt;/th&gt;
&lt;th&gt;Cloud Run&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Max vCPUs&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max Memory&lt;/td&gt;
&lt;td&gt;1024 GB&lt;/td&gt;
&lt;td&gt;120 GB&lt;/td&gt;
&lt;td&gt;32 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPU Support&lt;/td&gt;
&lt;td&gt;Yes (A10, A100)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (L4)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cold Start&lt;/td&gt;
&lt;td&gt;~2-3s&lt;/td&gt;
&lt;td&gt;5-15s&lt;/td&gt;
&lt;td&gt;2-8s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Min Billing&lt;/td&gt;
&lt;td&gt;1 second&lt;/td&gt;
&lt;td&gt;1 minute&lt;/td&gt;
&lt;td&gt;100ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The GPU support is worth mentioning — you can run NVIDIA GPU containers without managing drivers or CUDA installs on the host. I haven't used this in production yet, but I've tested it with a vLLM image and it worked without any changes to the Dockerfile.&lt;/p&gt;

&lt;h2&gt;
  
  
  The API I Deployed
&lt;/h2&gt;

&lt;p&gt;Nothing fancy. A Go service with two endpoints — &lt;code&gt;/health&lt;/code&gt; and &lt;code&gt;/info&lt;/code&gt;. I chose Go because the final image is tiny (under 15MB with distroless) and it starts in milliseconds, which matters when you're paying per second.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// main.go&lt;/span&gt;
&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"encoding/json"&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"log"&lt;/span&gt;
    &lt;span class="s"&gt;"net/http"&lt;/span&gt;
    &lt;span class="s"&gt;"os"&lt;/span&gt;
    &lt;span class="s"&gt;"time"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;HealthResponse&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Status&lt;/span&gt;    &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"status"`&lt;/span&gt;
    &lt;span class="n"&gt;Timestamp&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"timestamp"`&lt;/span&gt;
    &lt;span class="n"&gt;Host&lt;/span&gt;      &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"host"`&lt;/span&gt;
    &lt;span class="n"&gt;Region&lt;/span&gt;    &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"region"`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;InfoResponse&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Service&lt;/span&gt;  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"service"`&lt;/span&gt;
    &lt;span class="n"&gt;Version&lt;/span&gt;  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"version"`&lt;/span&gt;
    &lt;span class="n"&gt;Runtime&lt;/span&gt;  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"runtime"`&lt;/span&gt;
    &lt;span class="n"&gt;Platform&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"platform"`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;port&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"PORT"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"8080"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HandleFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/health"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Hostname&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;HealthResponse&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;Status&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;"healthy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Timestamp&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UTC&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RFC3339&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;Host&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;      &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Region&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"OCI_REGION"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Header&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Content-Type"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HandleFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/info"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;InfoResponse&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;Service&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;"oci-docker-demo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Version&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"APP_VERSION"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;Runtime&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;"OCI Container Instances"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Platform&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Oracle Cloud Infrastructure"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Header&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Content-Type"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Starting server on :%s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ListenAndServe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;":%s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Dockerfile is a straightforward multi-stage build. The builder compiles the binary, and the final image is distroless so there's almost nothing in it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;golang:1.22-alpine&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; go.mod main.go ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nv"&gt;CGO_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0 &lt;span class="nv"&gt;GOOS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;linux go build &lt;span class="nt"&gt;-ldflags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"-s -w"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; server .

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; gcr.io/distroless/static-debian12:nonroot&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /app/server /server&lt;/span&gt;
&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8080&lt;/span&gt;
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; nonroot:nonroot&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["/server"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build and test locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; oci-docker-demo:v1 &lt;span class="nb"&gt;.&lt;/span&gt;
docker run &lt;span class="nt"&gt;-p&lt;/span&gt; 8080:8080 &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;OCI_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-ashburn-1 oci-docker-demo:v1

&lt;span class="c"&gt;# Test it&lt;/span&gt;
curl http://localhost:8080/health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Pushing to OCIR
&lt;/h2&gt;

&lt;p&gt;OCIR is OCI's private container registry. Free for standard usage, which is nice. The login command is a bit verbose compared to Docker Hub — you need the tenancy namespace prefix — but it works the same way after that.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Log in to OCIR&lt;/span&gt;
docker login iad.ocir.io &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;tenancy-namespace&amp;gt;/&amp;lt;username&amp;gt;'&lt;/span&gt;

&lt;span class="c"&gt;# Tag for OCIR&lt;/span&gt;
docker tag oci-docker-demo:v1 iad.ocir.io/&amp;lt;tenancy-namespace&amp;gt;/docker-demos/oci-demo:v1

&lt;span class="c"&gt;# Push&lt;/span&gt;
docker push iad.ocir.io/&amp;lt;tenancy-namespace&amp;gt;/docker-demos/oci-demo:v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I ran Docker Scout on it before pushing, mostly out of habit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker scout cves oci-docker-demo:v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Zero critical or high CVEs, which is expected with distroless. If you're using Ubuntu or Debian as your base, you'll probably see a few here.&lt;/p&gt;
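
&lt;p&gt;If you want to see the difference for yourself, scan any Debian-based image the same way (the tag below is just an illustration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker scout cves --only-severity critical,high debian:bookworm-slim
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;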

&lt;h2&gt;
  
  
  Deploying — The Part That Surprised Me
&lt;/h2&gt;

&lt;p&gt;This is where Container Instances won me over. One CLI command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;oci container-instances container-instance create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compartment-id&lt;/span&gt; &lt;span class="nv"&gt;$COMPARTMENT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--availability-domain&lt;/span&gt; &lt;span class="s2"&gt;"Uocm:US-ASHBURN-AD-1"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--display-name&lt;/span&gt; &lt;span class="s2"&gt;"docker-demo-api"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--shape&lt;/span&gt; &lt;span class="s2"&gt;"CI.Standard.A1.Flex"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--shape-config&lt;/span&gt; &lt;span class="s1"&gt;'{"ocpus": 1, "memoryInGBs": 2}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--containers&lt;/span&gt; &lt;span class="s1"&gt;'[{
    "imageUrl": "iad.ocir.io/&amp;lt;tenancy&amp;gt;/docker-demos/oci-demo:v1",
    "displayName": "api",
    "environmentVariables": {
      "PORT": "8080",
      "OCI_REGION": "us-ashburn-1",
      "APP_VERSION": "1.0.0"
    },
    "resourceConfig": {
      "vcpusLimit": 1,
      "memoryLimitInGBs": 2
    }
  }]'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vnics&lt;/span&gt; &lt;span class="s1"&gt;'[{
    "subnetId": "'&lt;/span&gt;&lt;span class="nv"&gt;$SUBNET_ID&lt;/span&gt;&lt;span class="s1"&gt;'",
    "isPublicIpAssigned": true
  }]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I ran this and had a public IP with a working API in about 3 seconds. No joke. I spent more time writing the CLI command than waiting for it to deploy. Coming from Kubernetes where I'm used to waiting for nodes to scale up, load balancers to provision, and pods to pass readiness checks... this was refreshingly fast.&lt;/p&gt;
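
&lt;p&gt;Once the instance shows ACTIVE, checking it is the usual curl. Grab the public IP from the console or the instance's VNIC details; the address below is a placeholder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;PUBLIC_IP=&amp;lt;public-ip-of-the-instance&amp;gt;

curl http://$PUBLIC_IP:8080/health
curl http://$PUBLIC_IP:8080/info
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;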

&lt;h2&gt;
  
  
  Terraform Version (for when this isn't just a side project)
&lt;/h2&gt;

&lt;p&gt;I wouldn't run that CLI command manually every time in a real workflow. Here's the same thing in Terraform, which I'd use for anything that needs to be reproducible:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"oci_container_instances_container_instance"&lt;/span&gt; &lt;span class="s2"&gt;"demo"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;compartment_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;compartment_id&lt;/span&gt;
  &lt;span class="nx"&gt;availability_domain&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;oci_identity_availability_domains&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;availability_domains&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;display_name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"docker-demo-api"&lt;/span&gt;

  &lt;span class="nx"&gt;shape&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"CI.Standard.A1.Flex"&lt;/span&gt;
  &lt;span class="nx"&gt;shape_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;ocpus&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="nx"&gt;memory_in_gbs&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;containers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;image_url&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"iad.ocir.io/${var.tenancy_namespace}/docker-demos/oci-demo:v1"&lt;/span&gt;
    &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"api"&lt;/span&gt;

    &lt;span class="nx"&gt;environment_variables&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;PORT&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"8080"&lt;/span&gt;
      &lt;span class="nx"&gt;OCI_REGION&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;
      &lt;span class="nx"&gt;APP_VERSION&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"1.0.0"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;resource_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;vcpus_limit&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
      &lt;span class="nx"&gt;memory_limit_in_gbs&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;health_checks&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;health_check_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"HTTP"&lt;/span&gt;
      &lt;span class="nx"&gt;port&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt;
      &lt;span class="nx"&gt;path&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/health"&lt;/span&gt;
      &lt;span class="nx"&gt;interval_in_seconds&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;vnics&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;subnet_id&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subnet_id&lt;/span&gt;
    &lt;span class="nx"&gt;is_public_ip_assigned&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;image_pull_secrets&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;registry_endpoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"iad.ocir.io"&lt;/span&gt;
    &lt;span class="nx"&gt;secret_id&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;oci_vault_secret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ocir_creds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
    &lt;span class="nx"&gt;secret_type&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"VAULT"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="s2"&gt;"container_public_ip"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;oci_container_instances_container_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;demo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vnics&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;private_ip&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Adding a Load Balancer
&lt;/h2&gt;

&lt;p&gt;Container Instances give you a public IP directly, but for anything with real traffic you probably want a load balancer in front. TLS termination, health checks, the usual. Here's the Terraform for that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"oci_load_balancer_load_balancer"&lt;/span&gt; &lt;span class="s2"&gt;"api_lb"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;compartment_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;compartment_id&lt;/span&gt;
  &lt;span class="nx"&gt;display_name&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"docker-demo-lb"&lt;/span&gt;
  &lt;span class="nx"&gt;shape&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"flexible"&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_ids&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public_subnet_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;shape_details&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;minimum_bandwidth_in_mbps&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
    &lt;span class="nx"&gt;maximum_bandwidth_in_mbps&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"oci_load_balancer_backend_set"&lt;/span&gt; &lt;span class="s2"&gt;"api_backend"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;load_balancer_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;oci_load_balancer_load_balancer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;api_lb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"api-backends"&lt;/span&gt;
  &lt;span class="nx"&gt;policy&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ROUND_ROBIN"&lt;/span&gt;

  &lt;span class="nx"&gt;health_checker&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"HTTP"&lt;/span&gt;
    &lt;span class="nx"&gt;port&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt;
    &lt;span class="nx"&gt;url_path&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/health"&lt;/span&gt;
    &lt;span class="nx"&gt;interval_ms&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;
    &lt;span class="nx"&gt;return_code&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"oci_load_balancer_listener"&lt;/span&gt; &lt;span class="s2"&gt;"https"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;load_balancer_id&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;oci_load_balancer_load_balancer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;api_lb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;                     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"https-listener"&lt;/span&gt;
  &lt;span class="nx"&gt;default_backend_set_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;oci_load_balancer_backend_set&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;api_backend&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;port&lt;/span&gt;                     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt;
  &lt;span class="nx"&gt;protocol&lt;/span&gt;                 &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"HTTP"&lt;/span&gt;

  &lt;span class="nx"&gt;ssl_configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;certificate_ids&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;certificate_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;protocols&lt;/span&gt;                &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"TLSv1.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"TLSv1.3"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;server_order_preference&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ENABLED"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What It Actually Costs
&lt;/h2&gt;

&lt;p&gt;I tracked the bill for a month. It was comically low:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OCI Container Instance (1 OCPU ARM, 2GB)&lt;/td&gt;
&lt;td&gt;~$3.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OCI Load Balancer (flexible, 10 Mbps)&lt;/td&gt;
&lt;td&gt;~$12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$15.50/mo&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS Fargate equivalent&lt;/td&gt;
&lt;td&gt;~$35/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GCP Cloud Run equivalent&lt;/td&gt;
&lt;td&gt;~$25/mo (usage-based)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;ARM shapes on OCI are genuinely cheap. I was paying more for my morning coffee than for this API.&lt;/p&gt;

&lt;h2&gt;
  
  
  When I'd Still Use OKE
&lt;/h2&gt;

&lt;p&gt;Container Instances aren't a Kubernetes replacement. They're for the cases where Kubernetes is more infrastructure than you need.&lt;/p&gt;

&lt;p&gt;If I'm running 10+ services that talk to each other, need rolling deployments, RBAC, network policies, or auto-scaling — I'm using OKE. I work with Kubernetes daily and I'm not trying to avoid it.&lt;/p&gt;

&lt;p&gt;But for a single API, a batch job, a webhook handler, or a quick prototype? Container Instances get me to production faster with less stuff to maintain. And the Docker workflow is the same — same Dockerfile, same &lt;code&gt;docker build&lt;/code&gt;, same image. I just change where I deploy it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;I've been using Container Instances for about a month now for small services and side projects. The thing I keep coming back to is how little I think about infrastructure when using them. No node pools to right-size, no cluster upgrades to schedule, no ingress controllers to debug.&lt;/p&gt;

&lt;p&gt;If you're on OCI and you haven't tried Container Instances yet, spend 10 minutes with it. You might realize, like I did, that half the containers you're running on Kubernetes don't actually need Kubernetes.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Pavan Madduri — Oracle ACE Associate, CNCF Golden Kubestronaut. I write about containers, Kubernetes, and GPU infrastructure. &lt;a href="https://github.com/pmady" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://linkedin.com/in/pavanmadduri" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://pmady.github.io/" rel="noopener noreferrer"&gt;Website&lt;/a&gt; | &lt;a href="https://scholar.google.com/citations?view_op=list_works&amp;amp;hl=en&amp;amp;user=au0O-8oAAAAJ" rel="noopener noreferrer"&gt;Google Scholar&lt;/a&gt; | &lt;a href="https://www.researchgate.net/profile/Pavan-Madduri-2?ev=hdr_xprf" rel="noopener noreferrer"&gt;ResearchGate&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>oracle</category>
      <category>docker</category>
      <category>oke</category>
      <category>containers</category>
    </item>
    <item>
      <title>Deploying a Production-Ready K3s Cluster on OCI Always Free ARM Instances</title>
      <dc:creator>Pavan Madduri</dc:creator>
      <pubDate>Wed, 11 Mar 2026 14:50:00 +0000</pubDate>
      <link>https://dev.to/pavan_madduri/deploying-a-production-ready-k3s-cluster-on-oci-always-free-arm-instances-mmj</link>
      <guid>https://dev.to/pavan_madduri/deploying-a-production-ready-k3s-cluster-on-oci-always-free-arm-instances-mmj</guid>
      <description>&lt;h1&gt;
  
  
  Deploying a Production-Ready K3s Cluster on OCI Always Free ARM Instances
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;How I turned Oracle Cloud's free ARM compute into a fully functional Kubernetes cluster — with ingress, persistent storage, and TLS — all without spending a dollar.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I have been running Kubernetes clusters professionally for years — managed services like EKS, AKS, GKE, and self-hosted clusters with kubeadm. They all cost money. Even the cheapest managed Kubernetes offering runs $70-80/month just for the control plane.&lt;/p&gt;

&lt;p&gt;Then I looked at what Oracle Cloud gives away for free: 4 ARM OCPUs and 24GB of RAM on the Always Free tier. That is more compute than most developers use for their entire home lab. The question was obvious — could I run a real Kubernetes cluster on it?&lt;/p&gt;

&lt;p&gt;The answer is yes, and it works better than I expected.&lt;/p&gt;

&lt;p&gt;In this post, I will walk through deploying K3s — Rancher's lightweight Kubernetes distribution — on OCI Always Free ARM instances. Not a toy cluster. A cluster with ingress routing, persistent volumes, automatic TLS certificates, and enough resources to run real workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why K3s on OCI ARM?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why K3s over full Kubernetes?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;K3s strips out the components most developers never use — cloud controller, storage drivers, legacy API versions — and replaces etcd with SQLite (or embedded etcd for HA). The result is a single binary under 100MB that starts in seconds.&lt;/p&gt;

&lt;p&gt;On resource-constrained Always Free instances, this matters. Full kubeadm clusters consume 2-3GB of RAM just for the control plane. K3s uses around 512MB.&lt;/p&gt;
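
&lt;p&gt;Once the server node is up (Step 3 below), you can sanity-check that footprint with systemd's memory accounting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sudo systemctl status k3s --no-pager | grep -i memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;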

&lt;p&gt;&lt;strong&gt;Why OCI ARM over other clouds?&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Free Compute&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OCI Always Free&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4 ARM OCPUs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;24 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Forever&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS Free Tier&lt;/td&gt;
&lt;td&gt;1 vCPU (t2.micro)&lt;/td&gt;
&lt;td&gt;1 GB&lt;/td&gt;
&lt;td&gt;12 months&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GCP Free Tier&lt;/td&gt;
&lt;td&gt;0.25 vCPU (e2-micro)&lt;/td&gt;
&lt;td&gt;1 GB&lt;/td&gt;
&lt;td&gt;Forever&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure Free&lt;/td&gt;
&lt;td&gt;1 vCPU (B1S)&lt;/td&gt;
&lt;td&gt;1 GB&lt;/td&gt;
&lt;td&gt;12 months&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;There is no comparison. OCI gives you 24x the RAM of any competitor's free tier, permanently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;Here is what we are building:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────────────────────────────────────────┐
│                    OCI VCN (10.0.0.0/16)             │
│                                                      │
│  ┌────────────────────────────────────────────────┐  │
│  │           Public Subnet (10.0.1.0/24)          │  │
│  │                                                │  │
│  │  ┌──────────────────┐  ┌────────────────────┐  │  │
│  │  │   K3s Server     │  │   K3s Agent        │  │  │
│  │  │   (Control Plane)│  │   (Worker Node)    │  │  │
│  │  │                  │  │                    │  │  │
│  │  │  2 OCPU / 12GB   │  │  2 OCPU / 12GB     │  │  │
│  │  │  Oracle Linux 9  │  │  Oracle Linux 9    │  │  │
│  │  │                  │  │                    │  │  │
│  │  │  - K3s server    │  │  - K3s agent       │  │  │
│  │  │  - Traefik       │  │  - Workloads       │  │  │
│  │  │  - CoreDNS       │  │  - Pods            │  │  │
│  │  │  - Metrics       │  │                    │  │  │
│  │  └──────────────────┘  └────────────────────┘  │  │
│  │                                                │  │
│  └────────────────────────────────────────────────┘  │
│                                                      │
│  Security List:                                      │
│    Ingress: SSH(22), HTTP(80), HTTPS(443),           │
│             K8s API(6443), Kubelet(10250),           │
│             NodePort(30000-32767)                    │
│    Egress:  All traffic                               │
│                                                      │
│  ┌────────────────────────────────────────────────┐  │
│  │  OCI Load Balancer (10 Mbps - Always Free)     │  │
│  │  → Forwards 80/443 to K3s Traefik Ingress      │  │
│  └────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We split the 4 OCPUs and 24GB evenly: 2 OCPUs + 12GB for the server node, 2 OCPUs + 12GB for the worker. This gives the control plane enough room to breathe while leaving serious capacity for workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before starting, you need:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;OCI account with Always Free tier&lt;/strong&gt; — Sign up at &lt;a href="https://cloud.oracle.com" rel="noopener noreferrer"&gt;cloud.oracle.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OCI CLI configured&lt;/strong&gt; — Use Cloud Shell (pre-configured) or install locally&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two A1.Flex instances provisioned&lt;/strong&gt; — Follow the VCN + compute setup from my earlier posts, but create two instances instead of one&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SSH access to both instances&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you do not have the instances yet, provision them with these shapes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Server node&lt;/span&gt;
&lt;span class="nv"&gt;SHAPE_CONFIG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{"ocpus":2,"memoryInGBs":12}'&lt;/span&gt;

&lt;span class="c"&gt;# Agent node (same config)&lt;/span&gt;
&lt;span class="nv"&gt;SHAPE_CONFIG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{"ocpus":2,"memoryInGBs":12}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both must use an &lt;code&gt;aarch64&lt;/code&gt; Oracle Linux 9 image — ARM architecture is critical here.&lt;/p&gt;
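
&lt;p&gt;A quick sanity check once you are on the box:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uname -m
# aarch64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;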

&lt;h2&gt;
  
  
  Step 1: Preparing the Instances
&lt;/h2&gt;

&lt;p&gt;SSH into both instances and run the same preparation steps. OCI's Oracle Linux 9 images have &lt;code&gt;firewalld&lt;/code&gt; and &lt;code&gt;iptables&lt;/code&gt; rules that interfere with Kubernetes networking. We need to handle this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On BOTH nodes&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;dnf update &lt;span class="nt"&gt;-y&lt;/span&gt;

&lt;span class="c"&gt;# Disable firewalld — K3s manages its own iptables rules&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl stop firewalld
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl disable firewalld

&lt;span class="c"&gt;# Load required kernel modules&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | sudo tee /etc/modules-load.d/k3s.conf
br_netfilter
overlay
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;modprobe br_netfilter
&lt;span class="nb"&gt;sudo &lt;/span&gt;modprobe overlay

&lt;span class="c"&gt;# Set required sysctl parameters&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | sudo tee /etc/sysctl.d/k3s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;sysctl &lt;span class="nt"&gt;--system&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why these specific settings?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;br_netfilter&lt;/strong&gt; — Enables iptables to see bridged traffic (required for pod-to-pod communication across nodes)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;overlay&lt;/strong&gt; — Required by the container runtime for overlay filesystem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ip_forward&lt;/strong&gt; — Allows the kernel to forward packets between network interfaces (essential for routing traffic to pods)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I spent two hours debugging connectivity issues on my first attempt because I forgot &lt;code&gt;br_netfilter&lt;/code&gt;. Pods on different nodes simply could not talk to each other. The symptom was DNS resolution failures — CoreDNS pods could not reach each other.&lt;/p&gt;
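
&lt;p&gt;Before moving on, confirm on both nodes that the modules loaded and the sysctls took effect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lsmod | grep -E 'br_netfilter|overlay'
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;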

&lt;h2&gt;
  
  
  Step 2: OCI Security List Configuration
&lt;/h2&gt;

&lt;p&gt;This is where most OCI + Kubernetes guides fall short. The default security list blocks inter-node communication that K3s needs.&lt;/p&gt;

&lt;p&gt;You need these ingress rules on the security list attached to your subnet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Update security list with K3s-required ports&lt;/span&gt;
oci network security-list update &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--security-list-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$SL_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--ingress-security-rules&lt;/span&gt; &lt;span class="s1"&gt;'[
        {"source":"0.0.0.0/0","protocol":"6",
         "tcpOptions":{"destinationPortRange":{"min":22,"max":22}}},
        {"source":"0.0.0.0/0","protocol":"6",
         "tcpOptions":{"destinationPortRange":{"min":80,"max":80}}},
        {"source":"0.0.0.0/0","protocol":"6",
         "tcpOptions":{"destinationPortRange":{"min":443,"max":443}}},
        {"source":"10.0.0.0/16","protocol":"6",
         "tcpOptions":{"destinationPortRange":{"min":6443,"max":6443}}},
        {"source":"10.0.0.0/16","protocol":"6",
         "tcpOptions":{"destinationPortRange":{"min":10250,"max":10250}}},
        {"source":"10.0.0.0/16","protocol":"17",
         "udpOptions":{"destinationPortRange":{"min":8472,"max":8472}}},
        {"source":"10.0.0.0/16","protocol":"6",
         "tcpOptions":{"destinationPortRange":{"min":2379,"max":2380}}},
        {"source":"0.0.0.0/0","protocol":"6",
         "tcpOptions":{"destinationPortRange":{"min":30000,"max":32767}}}
    ]'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--egress-security-rules&lt;/span&gt; &lt;span class="s1"&gt;'[
        {"destination":"0.0.0.0/0","protocol":"all"}
    ]'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--force&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Port breakdown:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Port&lt;/th&gt;
&lt;th&gt;Protocol&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;TCP&lt;/td&gt;
&lt;td&gt;SSH access&lt;/td&gt;
&lt;td&gt;Anywhere&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;td&gt;TCP&lt;/td&gt;
&lt;td&gt;HTTP ingress&lt;/td&gt;
&lt;td&gt;Anywhere&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;443&lt;/td&gt;
&lt;td&gt;TCP&lt;/td&gt;
&lt;td&gt;HTTPS ingress&lt;/td&gt;
&lt;td&gt;Anywhere&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6443&lt;/td&gt;
&lt;td&gt;TCP&lt;/td&gt;
&lt;td&gt;K3s API server&lt;/td&gt;
&lt;td&gt;VCN only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10250&lt;/td&gt;
&lt;td&gt;TCP&lt;/td&gt;
&lt;td&gt;Kubelet metrics&lt;/td&gt;
&lt;td&gt;VCN only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8472&lt;/td&gt;
&lt;td&gt;UDP&lt;/td&gt;
&lt;td&gt;VXLAN (Flannel CNI)&lt;/td&gt;
&lt;td&gt;VCN only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2379-2380&lt;/td&gt;
&lt;td&gt;TCP&lt;/td&gt;
&lt;td&gt;etcd (if HA)&lt;/td&gt;
&lt;td&gt;VCN only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;30000-32767&lt;/td&gt;
&lt;td&gt;TCP&lt;/td&gt;
&lt;td&gt;NodePort services&lt;/td&gt;
&lt;td&gt;Anywhere&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Notice that internal K3s ports (6443, 10250, 8472) are restricted to the VCN CIDR &lt;code&gt;10.0.0.0/16&lt;/code&gt;. Never expose the Kubernetes API to the internet in production.&lt;/p&gt;
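
&lt;p&gt;A quick way to confirm the API server is not reachable from outside the VCN (run this from your laptop with the server's public IP; it should time out, which is exactly what you want):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nc -vz -w 5 &amp;lt;SERVER_PUBLIC_IP&amp;gt; 6443
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;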

&lt;h2&gt;
  
  
  Step 3: Installing K3s Server
&lt;/h2&gt;

&lt;p&gt;SSH into your first instance (the server node) and install K3s:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On the SERVER node&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;INSTALL_K3S_EXEC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"server"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;K3S_NODE_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"k3s-server"&lt;/span&gt;

curl &lt;span class="nt"&gt;-sfL&lt;/span&gt; https://get.k3s.io | sh &lt;span class="nt"&gt;-s&lt;/span&gt; - &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--write-kubeconfig-mode&lt;/span&gt; 644 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--tls-san&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://169.254.169.254/opc/v1/instance/metadata/public_ip&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--node-external-ip&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://169.254.169.254/opc/v1/instance/metadata/public_ip&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--flannel-iface&lt;/span&gt; enp0s6 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--disable&lt;/span&gt; servicelb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let me explain each flag because they all matter on OCI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--write-kubeconfig-mode 644&lt;/code&gt;&lt;/strong&gt; — Makes the kubeconfig readable without sudo. Useful for development but tighten this in production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--tls-san &amp;lt;public_ip&amp;gt;&lt;/code&gt;&lt;/strong&gt; — Adds the public IP to the K3s API server's TLS certificate. Without this, &lt;code&gt;kubectl&lt;/code&gt; from your laptop will get TLS errors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--node-external-ip &amp;lt;public_ip&amp;gt;&lt;/code&gt;&lt;/strong&gt; — Tells K3s about the node's public IP. OCI instances only see their private IP on the network interface&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--flannel-iface enp0s6&lt;/code&gt;&lt;/strong&gt; — Forces Flannel to use the correct network interface. OCI ARM instances use &lt;code&gt;enp0s6&lt;/code&gt; as the primary interface, not &lt;code&gt;eth0&lt;/code&gt;. I discovered this the hard way — Flannel defaulted to the wrong interface and VXLAN tunnels failed silently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--disable servicelb&lt;/code&gt;&lt;/strong&gt; — Disables K3s's built-in load balancer (ServiceLB/Klipper). We will use OCI's Always Free Load Balancer instead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The instance metadata endpoint &lt;code&gt;169.254.169.254&lt;/code&gt; is OCI's equivalent of AWS's metadata service. It returns instance details without needing the OCI CLI.&lt;/p&gt;
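
&lt;p&gt;You can poke at the endpoint directly to see what else it exposes. The v1 path used above works where IMDSv1 is still enabled; newer images may require the v2 endpoint with an &lt;code&gt;Authorization: Bearer Oracle&lt;/code&gt; header:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Full instance document via IMDSv1
curl -s http://169.254.169.254/opc/v1/instance/

# Same document via IMDSv2
curl -s -H "Authorization: Bearer Oracle" http://169.254.169.254/opc/v2/instance/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;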

&lt;p&gt;Verify the server is running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl status k3s

&lt;span class="c"&gt;# Check node status&lt;/span&gt;
kubectl get nodes
&lt;span class="c"&gt;# NAME         STATUS   ROLES                  AGE   VERSION&lt;/span&gt;
&lt;span class="c"&gt;# k3s-server   Ready    control-plane,master   45s   v1.31.4+k3s1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Grab the join token — the agent node needs this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo cat&lt;/span&gt; /var/lib/rancher/k3s/server/node-token
&lt;span class="c"&gt;# K10xxxx::server:yyyy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Joining the Agent Node
&lt;/h2&gt;

&lt;p&gt;SSH into your second instance and install K3s in agent mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On the AGENT node&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;K3S_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://&amp;lt;SERVER_PRIVATE_IP&amp;gt;:6443"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;K3S_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;TOKEN_FROM_STEP_3&amp;gt;"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;K3S_NODE_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"k3s-agent"&lt;/span&gt;

curl &lt;span class="nt"&gt;-sfL&lt;/span&gt; https://get.k3s.io | sh &lt;span class="nt"&gt;-s&lt;/span&gt; - &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--node-external-ip&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://169.254.169.254/opc/v1/instance/metadata/public_ip&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--flannel-iface&lt;/span&gt; enp0s6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Important: use the &lt;strong&gt;private IP&lt;/strong&gt; of the server node for &lt;code&gt;K3S_URL&lt;/code&gt;, not the public IP. Both instances are in the same VCN subnet, so they communicate over the private network. This is faster, free (no egress charges), and more secure.&lt;/p&gt;
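
&lt;p&gt;If you are not sure what the private IP is, read it off the primary interface on the server node, or ask the metadata service. A small sketch; the interface name matches the &lt;code&gt;--flannel-iface&lt;/code&gt; flag from Step 3, and &lt;code&gt;jq&lt;/code&gt; is assumed to be installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# On the SERVER node: private IPv4 on the primary interface
ip -4 -br addr show enp0s6 | awk '{print $3}' | cut -d/ -f1

# Or from the instance metadata service
curl -s http://169.254.169.254/opc/v1/vnics/ | jq -r '.[0].privateIp'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
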

&lt;p&gt;Back on the server node, verify both nodes are ready:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get nodes &lt;span class="nt"&gt;-o&lt;/span&gt; wide
&lt;span class="c"&gt;# NAME         STATUS   ROLES                  AGE     VERSION        INTERNAL-IP   EXTERNAL-IP&lt;/span&gt;
&lt;span class="c"&gt;# k3s-server   Ready    control-plane,master   5m      v1.31.4+k3s1   10.0.1.10     &amp;lt;public&amp;gt;&lt;/span&gt;
&lt;span class="c"&gt;# k3s-agent    Ready    &amp;lt;none&amp;gt;                 30s     v1.31.4+k3s1   10.0.1.11     &amp;lt;public&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two nodes. 4 OCPUs. 24GB RAM. Zero dollars.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Configuring the OCI Load Balancer
&lt;/h2&gt;

&lt;p&gt;OCI's Always Free tier includes a 10 Mbps Flexible Load Balancer. We will point it at our K3s nodes to route HTTP/HTTPS traffic to the Traefik ingress controller.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create the load balancer&lt;/span&gt;
&lt;span class="nv"&gt;LB_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;oci lb load-balancer create &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--compartment-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$COMPARTMENT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--display-name&lt;/span&gt; &lt;span class="s2"&gt;"k3s-ingress-lb"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--shape-name&lt;/span&gt; &lt;span class="s2"&gt;"flexible"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--shape-details&lt;/span&gt; &lt;span class="s1"&gt;'{"minimumBandwidthInMbps":10,"maximumBandwidthInMbps":10}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--subnet-ids&lt;/span&gt; &lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt;$SUBNET_ID&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--is-private&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'data.id'&lt;/span&gt; &lt;span class="nt"&gt;--raw-output&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--wait-for-state&lt;/span&gt; SUCCEEDED&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Create a backend set with health check&lt;/span&gt;
oci lb backend-set create &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--load-balancer-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LB_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"k3s-backends"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--policy&lt;/span&gt; &lt;span class="s2"&gt;"ROUND_ROBIN"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--health-checker-protocol&lt;/span&gt; &lt;span class="s2"&gt;"TCP"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--health-checker-port&lt;/span&gt; 80 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--health-checker-interval-in-ms&lt;/span&gt; 10000 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--health-checker-timeout-in-ms&lt;/span&gt; 3000 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--health-checker-retries&lt;/span&gt; 3 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--wait-for-state&lt;/span&gt; SUCCEEDED

&lt;span class="c"&gt;# Add both nodes as backends&lt;/span&gt;
oci lb backend create &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--load-balancer-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LB_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--backend-set-name&lt;/span&gt; &lt;span class="s2"&gt;"k3s-backends"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--ip-address&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;SERVER_PRIVATE_IP&amp;gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--port&lt;/span&gt; 80 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--wait-for-state&lt;/span&gt; SUCCEEDED

oci lb backend create &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--load-balancer-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LB_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--backend-set-name&lt;/span&gt; &lt;span class="s2"&gt;"k3s-backends"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--ip-address&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;AGENT_PRIVATE_IP&amp;gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--port&lt;/span&gt; 80 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--wait-for-state&lt;/span&gt; SUCCEEDED

&lt;span class="c"&gt;# Create HTTP listener&lt;/span&gt;
oci lb listener create &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--load-balancer-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LB_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"http-listener"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--default-backend-set-name&lt;/span&gt; &lt;span class="s2"&gt;"k3s-backends"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--protocol&lt;/span&gt; &lt;span class="s2"&gt;"HTTP"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--port&lt;/span&gt; 80 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--wait-for-state&lt;/span&gt; SUCCEEDED
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 10 Mbps shape is Always Free. It is enough for development, personal projects, and moderate traffic. The load balancer gets its own public IP, which becomes your cluster's entry point.&lt;/p&gt;

&lt;p&gt;Get the load balancer IP:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LB_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;oci lb load-balancer get &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--load-balancer-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LB_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'data."ip-addresses"[0]."ip-address"'&lt;/span&gt; &lt;span class="nt"&gt;--raw-output&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Load Balancer IP: &lt;/span&gt;&lt;span class="nv"&gt;$LB_IP&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 6: Deploying a Test Workload
&lt;/h2&gt;

&lt;p&gt;Let us deploy something real to verify the entire pipeline works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# nginx-demo.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-demo&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-demo&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-demo&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-demo&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx:alpine&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
        &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;50m&lt;/span&gt;
            &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;64Mi&lt;/span&gt;
          &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;100m&lt;/span&gt;
            &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;128Mi&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-demo&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-demo&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
    &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-demo&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;traefik.ingress.kubernetes.io/router.entrypoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;web&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
        &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prefix&lt;/span&gt;
        &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx-demo&lt;/span&gt;
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; nginx-demo.yaml

&lt;span class="c"&gt;# Watch the pods come up&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-w&lt;/span&gt;
&lt;span class="c"&gt;# NAME                          READY   STATUS    RESTARTS   AGE&lt;/span&gt;
&lt;span class="c"&gt;# nginx-demo-6d9f7c8b4-abc12   1/1     Running   0          10s&lt;/span&gt;
&lt;span class="c"&gt;# nginx-demo-6d9f7c8b4-def34   1/1     Running   0          10s&lt;/span&gt;
&lt;span class="c"&gt;# nginx-demo-6d9f7c8b4-ghi56   1/1     Running   0          10s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three replicas spread across both nodes. Test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://&lt;span class="nv"&gt;$LB_IP&lt;/span&gt;
&lt;span class="c"&gt;# &amp;lt;!DOCTYPE html&amp;gt;&lt;/span&gt;
&lt;span class="c"&gt;# &amp;lt;html&amp;gt;&lt;/span&gt;
&lt;span class="c"&gt;# &amp;lt;head&amp;gt;&amp;lt;title&amp;gt;Welcome to nginx!&amp;lt;/title&amp;gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Traffic flows: Internet → OCI Load Balancer → Traefik Ingress → nginx pods. All on free infrastructure.&lt;/p&gt;
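
&lt;p&gt;To convince yourself the replicas really are spread across both nodes and that the load balancer is answering, two quick checks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Which node is each replica on?
kubectl get pods -l app=nginx-demo -o wide

# A handful of requests through the load balancer; each should return 200
for i in $(seq 1 5); do curl -s -o /dev/null -w "%{http_code}\n" http://$LB_IP; done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
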

&lt;h2&gt;
  
  
  Step 7: Persistent Storage with OCI Block Volumes
&lt;/h2&gt;

&lt;p&gt;K3s includes the &lt;code&gt;local-path&lt;/code&gt; storage provisioner by default, which creates volumes on the node's local disk. For Always Free instances, this works well since we have 200GB of boot volume.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify the storage class exists&lt;/span&gt;
kubectl get storageclass
&lt;span class="c"&gt;# NAME                   PROVISIONER             RECLAIMPOLICY   AGE&lt;/span&gt;
&lt;span class="c"&gt;# local-path (default)   rancher.io/local-path   Delete          10m&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test it with a PVC:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# pvc-test.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PersistentVolumeClaim&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test-pvc&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;accessModes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ReadWriteOnce&lt;/span&gt;
  &lt;span class="na"&gt;storageClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;local-path&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pvc-test&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sh"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-c"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;echo&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;'Persistent&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;storage&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;works&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;OCI&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ARM'&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;/data/test.txt&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cat&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;/data/test.txt&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;sleep&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;3600"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;data&lt;/span&gt;
      &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/data&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;data&lt;/span&gt;
    &lt;span class="na"&gt;persistentVolumeClaim&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;claimName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test-pvc&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; pvc-test.yaml
kubectl logs pvc-test
&lt;span class="c"&gt;# Persistent storage works on OCI ARM&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production workloads that need data to survive node replacement, consider the OCI CSI driver — but for Always Free instances, local-path is practical and simple.&lt;/p&gt;
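
&lt;p&gt;One caveat worth knowing: a &lt;code&gt;local-path&lt;/code&gt; volume is pinned to whichever node first scheduled the pod, so the data does not follow the pod to the other node. You can see the pinning on the PersistentVolume itself (an optional check, not required for the setup):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# The provisioned PV carries a node affinity for the node that holds the data
kubectl get pv
kubectl describe pv $(kubectl get pvc test-pvc -o jsonpath='{.spec.volumeName}') | grep -A 5 "Node Affinity"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
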

&lt;h2&gt;
  
  
  Cluster Resource Usage
&lt;/h2&gt;

&lt;p&gt;After deploying K3s with Traefik, CoreDNS, and the test workload, here is what the resource consumption looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl top nodes
&lt;span class="c"&gt;# NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%&lt;/span&gt;
&lt;span class="c"&gt;# k3s-server   180m         9%     1.2Gi           10%&lt;/span&gt;
&lt;span class="c"&gt;# k3s-agent    95m          5%     780Mi           6%&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The entire Kubernetes infrastructure — control plane, networking, DNS, ingress, and three nginx replicas — uses about 2GB of the available 24GB. That leaves &lt;strong&gt;22GB free for your actual workloads&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For context, here is what fits comfortably:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload&lt;/th&gt;
&lt;th&gt;CPU&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;th&gt;Fits?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PostgreSQL&lt;/td&gt;
&lt;td&gt;200m&lt;/td&gt;
&lt;td&gt;512Mi&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Redis&lt;/td&gt;
&lt;td&gt;100m&lt;/td&gt;
&lt;td&gt;256Mi&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go API server&lt;/td&gt;
&lt;td&gt;100m&lt;/td&gt;
&lt;td&gt;128Mi&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python Flask app&lt;/td&gt;
&lt;td&gt;200m&lt;/td&gt;
&lt;td&gt;256Mi&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grafana&lt;/td&gt;
&lt;td&gt;100m&lt;/td&gt;
&lt;td&gt;256Mi&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prometheus&lt;/td&gt;
&lt;td&gt;200m&lt;/td&gt;
&lt;td&gt;512Mi&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;900m&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.9Gi&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Easily&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You could run a complete application stack — database, cache, API, monitoring — with room to spare.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting Common OCI + K3s Issues
&lt;/h2&gt;

&lt;p&gt;I hit every one of these during my own setup, so here they are to save you the debugging time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Pods stuck in ContainerCreating&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Usually a Flannel networking issue. Check if VXLAN traffic (UDP 8472) is allowed in the security list and verify Flannel is using the correct interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;journalctl &lt;span class="nt"&gt;-u&lt;/span&gt; k3s &lt;span class="nt"&gt;-f&lt;/span&gt; | &lt;span class="nb"&gt;grep &lt;/span&gt;flannel
&lt;span class="c"&gt;# If you see "failed to find interface" — fix the --flannel-iface flag&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
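

&lt;p&gt;A quick way to confirm the interface names before digging further (Flannel's VXLAN device typically shows up as &lt;code&gt;flannel.1&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# List interfaces; on these ARM shapes the primary NIC should be enp0s6
ip -br addr show

# Inspect the VXLAN device Flannel created and which NIC it is bound to
ip -d link show flannel.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
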



&lt;p&gt;&lt;strong&gt;2. Agent node shows NotReady&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent cannot reach the server on port 6443. Verify the security list allows TCP 6443 from the VCN CIDR and that you used the private IP in &lt;code&gt;K3S_URL&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# From the agent node&lt;/span&gt;
curl &lt;span class="nt"&gt;-k&lt;/span&gt; https://&amp;lt;SERVER_PRIVATE_IP&amp;gt;:6443
&lt;span class="c"&gt;# Should return JSON (even if it says Unauthorized)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
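

&lt;p&gt;If the port is reachable but the node still will not join, the agent's own logs usually say why (the agent installs as the &lt;code&gt;k3s-agent&lt;/code&gt; systemd unit):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# On the AGENT node
sudo systemctl status k3s-agent
sudo journalctl -u k3s-agent --no-pager | tail -50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
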



&lt;p&gt;&lt;strong&gt;3. Ingress returns 404 for all routes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traefik is running but not seeing your Ingress resources. Check Traefik logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl logs &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system &lt;span class="nt"&gt;-l&lt;/span&gt; app.kubernetes.io/name&lt;span class="o"&gt;=&lt;/span&gt;traefik
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
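

&lt;p&gt;Also confirm the Ingress object exists and points at the right Service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get ingress
kubectl describe ingress nginx-demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
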



&lt;p&gt;&lt;strong&gt;4. OCI Load Balancer shows backends as Critical&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Health check is failing. Verify that Traefik is listening on port 80 on both nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ss &lt;span class="nt"&gt;-tlnp&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; :80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
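

&lt;p&gt;You can also test the same path the health check takes: one node to the other, over the private network, on port 80. A 404 from Traefik is fine here; it proves the port is open:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# From the server node, probe the agent (and vice versa)
curl -sI http://&amp;lt;AGENT_PRIVATE_IP&amp;gt;:80 | head -1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
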



&lt;p&gt;&lt;strong&gt;5. Cannot pull container images&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OCI instances need outbound internet access through a NAT gateway or Internet Gateway. Verify your route table has a default route to the Internet Gateway.&lt;/p&gt;
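
&lt;p&gt;Two quick checks from the node itself; a 401 from the registry is expected and means the network path works (&lt;code&gt;k3s crictl&lt;/code&gt; talks to the bundled containerd):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Can the node reach a public registry at all?
curl -sI https://registry-1.docker.io/v2/ | head -1

# Try a pull through K3s's bundled containerd
sudo k3s crictl pull docker.io/library/nginx:alpine
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
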

&lt;h2&gt;
  
  
  Security Hardening
&lt;/h2&gt;

&lt;p&gt;For a cluster exposed to the internet, apply these minimum security measures:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Restrict API server access to your IP&lt;/span&gt;
&lt;span class="c"&gt;# Update security list: change 6443 source from VCN to your specific IP&lt;/span&gt;

&lt;span class="c"&gt;# 2. Create a non-root kubeconfig&lt;/span&gt;
kubectl create serviceaccount deploy-sa
kubectl create clusterrolebinding deploy-sa-binding &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--clusterrole&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;edit &lt;span class="nt"&gt;--serviceaccount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;default:deploy-sa

&lt;span class="c"&gt;# 3. Enable Network Policies&lt;/span&gt;
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; - &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# 4. Set resource limits on all deployments (prevent noisy neighbors)&lt;/span&gt;
&lt;span class="c"&gt;# 5. Use OCI Vault for Kubernetes secrets (covered in my earlier post)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
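

&lt;p&gt;For item 4, a namespace-level &lt;code&gt;LimitRange&lt;/code&gt; gives every container a default request and limit even when a manifest forgets to set them. A minimal example; the numbers are starting points, tune them for your workloads:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply -f - &amp;lt;&amp;lt;EOF
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: default
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    default:
      cpu: 250m
      memory: 256Mi
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
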



&lt;h2&gt;
  
  
  Accessing kubectl from Your Laptop
&lt;/h2&gt;

&lt;p&gt;Copy the kubeconfig from the server node to your local machine (the &lt;code&gt;sed -i ''&lt;/code&gt; syntax below is for macOS; on Linux, drop the empty quotes):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# From your laptop&lt;/span&gt;
scp opc@&amp;lt;SERVER_PUBLIC_IP&amp;gt;:/etc/rancher/k3s/k3s.yaml ~/.kube/oci-k3s-config

&lt;span class="c"&gt;# Update the server address from 127.0.0.1 to the public IP&lt;/span&gt;
&lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt; &lt;span class="s2"&gt;"s/127.0.0.1/&amp;lt;SERVER_PUBLIC_IP&amp;gt;/g"&lt;/span&gt; ~/.kube/oci-k3s-config

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;KUBECONFIG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;~/.kube/oci-k3s-config
kubectl get nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works because we added &lt;code&gt;--tls-san&lt;/code&gt; with the public IP during installation.&lt;/p&gt;
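
&lt;p&gt;The copied kubeconfig names its cluster, user, and context &lt;code&gt;default&lt;/code&gt;, which collides with other clusters quickly. Renaming the context is optional but keeps things tidy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl config rename-context default oci-k3s
kubectl config get-contexts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
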

&lt;h2&gt;
  
  
  Cost Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;th&gt;Nodes&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OCI Always Free + K3s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;24 GB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EKS (t3.medium x2)&lt;/td&gt;
&lt;td&gt;~$150&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;8 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GKE Autopilot (equivalent)&lt;/td&gt;
&lt;td&gt;~$120&lt;/td&gt;
&lt;td&gt;Auto&lt;/td&gt;
&lt;td&gt;Auto&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AKS (B2s x2)&lt;/td&gt;
&lt;td&gt;~$65&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;8 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DigitalOcean K8s&lt;/td&gt;
&lt;td&gt;~$48&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Civo K3s&lt;/td&gt;
&lt;td&gt;~$40&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;OCI gives you 3x the RAM of paid alternatives, for free. The trade-off is that you manage K3s yourself — no managed control plane. For learning, development, and personal projects, that trade-off is excellent.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Can You Run on This Cluster?
&lt;/h2&gt;

&lt;p&gt;This is not theoretical. Here are workloads I have tested on this exact setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gitea&lt;/strong&gt; (self-hosted Git) — 128Mi RAM, works perfectly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drone CI&lt;/strong&gt; (CI/CD) — 256Mi RAM, builds containers on ARM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL&lt;/strong&gt; — 512Mi RAM, handles small-to-medium databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grafana + Prometheus&lt;/strong&gt; — 768Mi combined, full monitoring stack&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Go/Rust microservices&lt;/strong&gt; — Under 64Mi each, ARM-native builds are fast&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static sites with Hugo&lt;/strong&gt; — Trivial resources, served through Traefik&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Oracle Cloud's Always Free ARM allocation is the best-kept secret in cloud computing for Kubernetes enthusiasts. 4 OCPUs, 24GB RAM, 200GB storage, a load balancer, and 10TB of outbound transfer — all free, permanently.&lt;/p&gt;

&lt;p&gt;K3s is the perfect match for this hardware. It is lightweight, ARM-native, and production-tested. The combination gives you a Kubernetes cluster that would cost $100-150/month on any other provider.&lt;/p&gt;

&lt;p&gt;The setup takes about 30 minutes from scratch, and the result is a cluster you can use for learning, development, CI/CD, or running personal projects. I have had mine running for weeks with zero issues.&lt;/p&gt;

&lt;p&gt;Stop paying for Kubernetes clusters you use for development. OCI and K3s give you a better option.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All resources in this post use OCI Always Free tier. No charges will be incurred.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;#OracleCloud&lt;/code&gt; &lt;code&gt;#Kubernetes&lt;/code&gt; &lt;code&gt;#K3s&lt;/code&gt; &lt;code&gt;#ARM&lt;/code&gt; &lt;code&gt;#OCI&lt;/code&gt; &lt;code&gt;#AlwaysFree&lt;/code&gt; &lt;code&gt;#CloudNative&lt;/code&gt; &lt;code&gt;#DevOps&lt;/code&gt; &lt;code&gt;#Containers&lt;/code&gt;&lt;/p&gt;

</description>
      <category>oracle</category>
      <category>k3s</category>
      <category>kubernetes</category>
      <category>cloudnative</category>
    </item>
    <item>
      <title>Why Your Kubernetes Cluster Breaks 18 Minutes After a Successful Deployment</title>
      <dc:creator>Pavan Madduri</dc:creator>
      <pubDate>Sat, 07 Mar 2026 17:29:46 +0000</pubDate>
      <link>https://dev.to/pavan_madduri/why-your-kubernetes-cluster-breaks-18-minutes-after-a-successful-deployment-229p</link>
      <guid>https://dev.to/pavan_madduri/why-your-kubernetes-cluster-breaks-18-minutes-after-a-successful-deployment-229p</guid>
      <description>&lt;p&gt;You merge the Pull Request. The CI/CD pipeline flashes green. ArgoCD reports that your application is "Synced" and "Healthy." You grab a coffee, thinking the deployment was a complete success.&lt;/p&gt;

&lt;p&gt;Then, 18 minutes later, your pager goes off. The cluster is degraded, and users are experiencing errors. What just happened?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Delay of Reactive Monitoring
&lt;/h2&gt;

&lt;p&gt;This scenario is incredibly common in large-scale Kubernetes environments. The problem lies in how GitOps tools handle configuration drift. Tools like ArgoCD use continuous reconciliation loops, constantly comparing your Git manifests against the live cluster resources.&lt;/p&gt;

&lt;p&gt;However, this is a reactive approach. It only discovers problems post-deployment. According to comprehensive production benchmarks (Madduri, 2024), traditional monitoring detects drift an average of 18 minutes after problematic deployments complete.&lt;/p&gt;

&lt;p&gt;For 18 minutes, your system might have been starved of resources, stuck in a circular dependency, or suffering from a security policy breach. In a mission-critical platform, an 18-minute delay means dropped transactions and unhappy users.&lt;br&gt;
(To see the exact performance metrics comparing reactive vs. proactive monitoring, review the full study here: [&lt;a href="https://scholar.google.com/citations?view_op=view_citation&amp;amp;hl=en&amp;amp;user=au0O-8oAAAAJ&amp;amp;citation_for_view=au0O-8oAAAAJ:roLk4NBRz8UC" rel="noopener noreferrer"&gt;Google Scholar&lt;/a&gt;])&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing the 18-Minute Gap
&lt;/h2&gt;

&lt;p&gt;To fix this, we have to stop relying on monitoring tools to catch our mistakes. We need to verify our manifests mathematically during the continuous integration phase, before the deployment ever reaches the cluster.&lt;/p&gt;

&lt;p&gt;By using formal verification, we can construct state transition models to explore every possible failure mode of a manifest. When this proactive approach was tested across 850 production applications, it reduced the mean time to detect drift from 18 minutes down to under 30 seconds. It represents a 36x improvement in detection speed, entirely eliminating the dangerous 18-minute window.&lt;/p&gt;

&lt;p&gt;Stop waiting for your monitoring tools to tell you that your deployment failed. Prove that it will succeed before you ever click merge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Reading &amp;amp; Formal Citation:
&lt;/h2&gt;

&lt;p&gt;The metrics and architectural solutions discussed in this article are drawn from my formal academic research on GitOps stability. If you are building internal developer platforms or CI/CD pipelines, you can cite the original research here:&lt;br&gt;
Madduri, Pavan. "GITOPS &amp;amp; STABILITY: FORMAL VERIFICATION OF ARGOCD MANIFESTS-PREVENTING DEPLOYMENT DRIFT IN MISSION-CRITICAL PLATFORMS." Power System Protection and Control 52, no. 3 (2024): 13-21.&lt;br&gt;
[&lt;a href="https://scholar.google.com/citations?view_op=view_citation&amp;amp;hl=en&amp;amp;user=au0O-8oAAAAJ&amp;amp;citation_for_view=au0O-8oAAAAJ:roLk4NBRz8UC" rel="noopener noreferrer"&gt;Google Scholar&lt;/a&gt;] | [&lt;a href="https://www.researchgate.net/publication/401271158_GITOPS_STABILITY_FORMAL_VERIFICATION_OF_ARGOCD_MANIFESTS_-PREVENTING_DEPLOYMENT_DRIFT_IN_MISSION-CRITICAL_PLATFORMS" rel="noopener noreferrer"&gt;ResearchGate&lt;/a&gt;]&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>gitops</category>
      <category>argocd</category>
      <category>devops</category>
    </item>
    <item>
      <title>Alert Fatigue is Breaking DevOps: Here is the Math</title>
      <dc:creator>Pavan Madduri</dc:creator>
      <pubDate>Mon, 02 Mar 2026 18:24:09 +0000</pubDate>
      <link>https://dev.to/pavan_madduri/alert-fatigue-is-breaking-devops-here-is-the-math-24eg</link>
      <guid>https://dev.to/pavan_madduri/alert-fatigue-is-breaking-devops-here-is-the-math-24eg</guid>
      <description>&lt;p&gt;"The Boy Who Cried Wolf" is the oldest story about monitoring systems ever written. If the alarm goes off every five minutes for a minor issue, eventually, the villagers stop waking up. In the tech industry, we call this &lt;strong&gt;Alert Fatigue&lt;/strong&gt;, and it is quietly destroying DevOps teams from the inside out.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Math Behind the Noise
&lt;/h2&gt;

&lt;p&gt;Let’s look at a standard microservices architecture. You might have 50 services, each reporting on CPU, memory, error rates, and latency. That is 200 potential thresholds.&lt;/p&gt;

&lt;p&gt;If you configure your alerts to trigger a Slack notification whenever CPU hits 80%, you are going to get spammed. Why? Because CPU spiking to 80% during a garbage-collection cycle is normal behavior for many Java applications.&lt;/p&gt;

&lt;p&gt;A mid-sized enterprise system easily generates &lt;strong&gt;thousands of alerts per day&lt;/strong&gt;. The human brain is simply not equipped to process a feed of 2,000 notifications and accurately spot the one critical database deadlock hidden in the noise.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cost of Context Switching
&lt;/h2&gt;

&lt;p&gt;The real danger of alert fatigue isn't just missing a critical outage (though that happens frequently). The real danger is the cognitive load on the engineer.&lt;/p&gt;

&lt;p&gt;Every time a Slack notification pings or a pager goes off, a developer’s context is broken. Studies show it takes roughly 23 minutes to get back into a state of deep focus after an interruption. If an on-call engineer receives just three non-critical alerts in an afternoon, their entire day of productive coding is effectively gone.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Death of the "Static Threshold"
&lt;/h2&gt;

&lt;p&gt;The reason we suffer from alert fatigue is that we rely on static, dumb thresholds. We tell our systems: "If X &amp;gt; 80, send an email." To fix this, the industry is moving toward contextual, AI-driven monitoring. Instead of sending an alert when a single metric spikes, modern systems use machine learning to look at the entire environment. They ask: "CPU is at 80%, but are users actually experiencing errors? If not, suppress the alert."&lt;/p&gt;

&lt;p&gt;We need to stop sending humans raw data, and start sending them actual, actionable context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cite this research:&lt;/strong&gt;&lt;br&gt;
I recently conducted a study across three production environments supporting 2.8 million users, demonstrating how replacing static alerts with autonomous AI agents reduced Mean Time to Detection (MTTD) to under 60 seconds. Cite the formal research here:&lt;br&gt;
Madduri, P. (2026). "Agentic SRE Teams: Human-Agent Collaboration - A New Operational Model for Autonomous Incident Response." Power System Protection and Control, 54(1).&lt;br&gt;
[&lt;a href="https://scholar.google.com/citations?view_op=view_citation&amp;amp;hl=en&amp;amp;user=au0O-8oAAAAJ&amp;amp;citation_for_view=au0O-8oAAAAJ:UeHWp8X0CEIC" rel="noopener noreferrer"&gt;Link to Google Scholar&lt;/a&gt;] | [&lt;a href="https://www.researchgate.net/publication/401333715_AGENTIC_SRE_TEAMS_HUMAN-AGENT_COLLABORATION_-A_NEW_OPERATIONAL_MODEL_FOR_AUTONOMOUS_INCIDENT_RESPONSE" rel="noopener noreferrer"&gt;Link to ResearchGate PDF&lt;/a&gt;]&lt;/p&gt;

</description>
      <category>devops</category>
      <category>sre</category>
      <category>mentalhealth</category>
      <category>observability</category>
    </item>
    <item>
      <title>What is an AI Agent? (And Why SREs Need Them)</title>
      <dc:creator>Pavan Madduri</dc:creator>
      <pubDate>Mon, 02 Mar 2026 18:13:31 +0000</pubDate>
      <link>https://dev.to/pavan_madduri/what-is-an-ai-agent-and-why-sres-need-them-3ec2</link>
      <guid>https://dev.to/pavan_madduri/what-is-an-ai-agent-and-why-sres-need-them-3ec2</guid>
      <description>&lt;p&gt;If you spend any time on Tech Twitter or LinkedIn, you are probably drowning in the phrase "AI Agents." But if you strip away the marketing hype, what actually is an AI agent, and how is it different from just asking ChatGPT a question?&lt;/p&gt;

&lt;p&gt;If you work in Site Reliability Engineering (SRE) or platform engineering, understanding this difference is going to define the next five years of your career.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chatbots vs. Agents: The "Agency" Difference&lt;/strong&gt;&lt;br&gt;
A standard Large Language Model (LLM) like ChatGPT is a &lt;strong&gt;generator&lt;/strong&gt;. You give it a prompt, and it generates text. It is entirely passive. It doesn't know what time it is, it can't check your database, and it certainly can't restart a crashed Kubernetes pod.&lt;/p&gt;

&lt;p&gt;An &lt;strong&gt;AI Agent&lt;/strong&gt;, on the other hand, has agency.&lt;/p&gt;

&lt;p&gt;An agent is an LLM wrapped in a framework that allows it to interact with the outside world. It operates on a continuous loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Observe&lt;/strong&gt;: It pulls real-time data from its environment (e.g., reading a Datadog alert or a Prometheus metric).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reason&lt;/strong&gt;: It uses the LLM "brain" to analyze that data and decide what to do next.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Act&lt;/strong&gt;: It uses "Tools" (APIs, scripts, CLI commands) to take a real-world action (e.g., querying a database to see if a table is locked).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Why Do SREs Need Them?&lt;/strong&gt;&lt;br&gt;
Imagine it is 3:00 AM and you get a PagerDuty alert: CPU Spike on Payment Service.&lt;/p&gt;

&lt;p&gt;Without an agent, you drag yourself out of bed, open four different dashboards, write three different log queries, and spend 20 minutes just trying to figure out what is broken before you even try to fix it.&lt;/p&gt;

&lt;p&gt;An AI agent acts as your junior SRE. By the time you open your laptop, the agent has already:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Acknowledged the alert.&lt;/li&gt;
&lt;li&gt;Queried the logs for the last 10 minutes.&lt;/li&gt;
&lt;li&gt;Checked the recent Git commits to see who deployed code last.&lt;/li&gt;
&lt;li&gt;Summarized all of this into a neat, three-bullet-point summary waiting for you in Slack.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Agents don't replace SREs; they replace the boring, repetitive data-gathering tasks that burn SREs out. They do the digging, so humans can do the deciding.&lt;/p&gt;

&lt;p&gt;Cite this research:&lt;br&gt;
If you are building AIOps tools or researching AI in operations, you can cite my recent production benchmarks on how AI agents can autonomously resolve 67% of common incidents:&lt;br&gt;
Madduri, P. (2026). "Agentic SRE Teams: Human-Agent Collaboration - A New Operational Model for Autonomous Incident Response." Power System Protection and Control, 54(1).&lt;br&gt;
[&lt;a href="https://scholar.google.com/citations?view_op=view_citation&amp;amp;hl=en&amp;amp;user=au0O-8oAAAAJ&amp;amp;citation_for_view=au0O-8oAAAAJ:UeHWp8X0CEIC" rel="noopener noreferrer"&gt;Link to Google Scholar&lt;/a&gt;] | [&lt;a href="https://www.researchgate.net/publication/401333715_AGENTIC_SRE_TEAMS_HUMAN-AGENT_COLLABORATION_-A_NEW_OPERATIONAL_MODEL_FOR_AUTONOMOUS_INCIDENT_RESPONSE" rel="noopener noreferrer"&gt;Link to ResearchGate PDF&lt;/a&gt;]&lt;/p&gt;

</description>
      <category>ai</category>
      <category>sre</category>
      <category>devops</category>
      <category>aiops</category>
    </item>
  </channel>
</rss>
