<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ogonna Nnamani</title>
    <description>The latest articles on DEV Community by Ogonna Nnamani (@cloudiepad).</description>
    <link>https://dev.to/cloudiepad</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1373520%2F7ef2a204-c7c4-462f-abe3-7b045f39d4b8.jpeg</url>
      <title>DEV Community: Ogonna Nnamani</title>
      <link>https://dev.to/cloudiepad</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cloudiepad"/>
    <language>en</language>
    <item>
      <title>Self-Hosted GitHub Runners on GKE: My $800/Month Mistake That Led to a Better Solution</title>
      <dc:creator>Ogonna Nnamani</dc:creator>
      <pubDate>Wed, 03 Sep 2025 16:41:12 +0000</pubDate>
      <link>https://dev.to/cloudiepad/self-hosted-github-runners-on-gke-my-800month-mistake-that-led-to-a-better-solution-1nk4</link>
      <guid>https://dev.to/cloudiepad/self-hosted-github-runners-on-gke-my-800month-mistake-that-led-to-a-better-solution-1nk4</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F16dxb45ux8x4zi0szx35.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F16dxb45ux8x4zi0szx35.png" alt="Hero - Image"&gt;&lt;/a&gt;&lt;br&gt;
So, it's 3 PM on a Friday, your team is trying to push a critical hotfix, and GitHub Actions decides to put your build in a queue. For 15 minutes. Then 20 minutes. Your deployment window is closing, the stakeholders are breathing down your neck, and you're watching your GitHub Actions bill climb past $800 for the month.&lt;/p&gt;

&lt;p&gt;That was my reality six months ago. And like most problems that keep you awake at 2 AM, it started small and innocent.&lt;/p&gt;
&lt;h2&gt;
  
  
  The $800 Problem That Kept Getting Worse
&lt;/h2&gt;

&lt;p&gt;It began innocently enough. Our team grew from 3 to 15 developers, our deployment frequency increased, and suddenly our GitHub Actions usage exploded. What started as a manageable $100/month became $400, then $600, then crossed the dreaded $800 mark.&lt;/p&gt;

&lt;p&gt;But the cost wasn't even the worst part. The worst part was the waiting.&lt;/p&gt;

&lt;p&gt;During deployment rushes, builds would queue for 10-15 minutes. Developers would start their builds, then go grab coffee, chat with colleagues, or worse – start working on something else entirely, breaking their flow. Our feedback loops became molasses-slow, and productivity plummeted.&lt;/p&gt;

&lt;p&gt;I'd sit there watching the queue, thinking: "There has to be a better way."&lt;/p&gt;

&lt;p&gt;Spoiler alert: There was. And it involved making some spectacular mistakes along the way.&lt;/p&gt;
&lt;h2&gt;
  
  
  Enter Actions Runner Controller (ARC): The Light at the End of the Tunnel
&lt;/h2&gt;

&lt;p&gt;After countless late nights researching alternatives, I stumbled upon Actions Runner Controller (ARC). Think of it as having a smart assistant who only hires contractors when there's work to do, then sends them home when they're done.&lt;/p&gt;

&lt;p&gt;Traditional GitHub runners are like having a full-time employee sitting at their desk 24/7, even when there's no work. ARC creates &lt;strong&gt;ephemeral pods&lt;/strong&gt; – containers that materialize when a job arrives, do their work, and vanish when complete. It's beautiful in its simplicity.&lt;/p&gt;

&lt;p&gt;The promise was tantalizing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scale from 0 to 100+ runners instantly&lt;/li&gt;
&lt;li&gt;Pay only for compute you actually use
&lt;/li&gt;
&lt;li&gt;Never wait in GitHub's queue again&lt;/li&gt;
&lt;li&gt;Full control over the runtime environment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course, getting there required navigating through my usual minefield of spectacular failures.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 2: Installing the ARC Controller (And My First Epic Fail)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Mistake #1: The Great Firewall Fiasco
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37xkb18z159jrdftoodw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37xkb18z159jrdftoodw.png" alt="ARC FLOW"&gt;&lt;/a&gt;&lt;br&gt;
My first attempt at installing ARC was... educational. I spent an entire weekend setting everything up perfectly, only to find that GitHub couldn't talk to my cluster. Webhooks were failing, runners weren't registering, and I was questioning my life choices.&lt;/p&gt;

&lt;p&gt;The culprit? I'd forgotten to configure the firewall to allow GitHub's webhook IP ranges. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Face, meet palm.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This taught me my first crucial lesson: &lt;strong&gt;networking isn't an afterthought&lt;/strong&gt;. When you're dealing with webhooks, your cluster needs to be accessible from the internet, and GitHub needs to be able to reach your ARC controller.&lt;/p&gt;
&lt;h3&gt;
  
  
  How ARC Actually Communicates
&lt;/h3&gt;

&lt;p&gt;Here's what I learned about the communication flow the hard way:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;GitHub → ARC Controller&lt;/strong&gt;: GitHub sends webhook events when workflows are triggered&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ARC Controller → Kubernetes API&lt;/strong&gt;: Creates/deletes runner pods based on job queue&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runner Pods → GitHub&lt;/strong&gt;: Self-register and poll for jobs to execute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runner Pods → ARC Controller&lt;/strong&gt;: Report status and job completion&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The critical insight: &lt;strong&gt;GitHub initiates the conversation&lt;/strong&gt;. Your cluster must be reachable from GitHub's servers, which means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Firewall rules allowing GitHub's webhook IPs&lt;/li&gt;
&lt;li&gt;Load balancer exposing the ARC webhook endpoint&lt;/li&gt;
&lt;li&gt;Proper DNS configuration for webhook URLs&lt;/li&gt;
&lt;/ul&gt;
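
&lt;p&gt;As a hedged sketch (the network and rule names here are placeholders, not the exact setup from this project): GitHub publishes its current webhook source ranges under the &lt;code&gt;hooks&lt;/code&gt; key of &lt;code&gt;api.github.com/meta&lt;/code&gt;, so on GCP the firewall side can look roughly like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Fetch GitHub's current webhook source ranges (the "hooks" key of /meta)
HOOK_RANGES=$(curl -s https://api.github.com/meta | jq -r '.hooks | join(",")')

# Allow those ranges to reach the webhook listener (names are placeholders)
gcloud compute firewall-rules create allow-github-webhooks \
  --network my-gke-network \
  --direction INGRESS \
  --action ALLOW \
  --rules tcp:443 \
  --source-ranges "$HOOK_RANGES"
&lt;/code&gt;&lt;/pre&gt;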
&lt;h3&gt;
  
  
  What Actually Worked
&lt;/h3&gt;

&lt;p&gt;After my networking debacle, here's the proper setup:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Certificate Manager First&lt;/strong&gt;&lt;br&gt;
I installed cert-manager to handle SSL certificates automatically. This ensures secure communication between GitHub and our cluster, because webhooks over plain HTTP are a security nightmare.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ARC Controller Installation&lt;/strong&gt;&lt;br&gt;
The ARC controller gets installed in its own namespace (&lt;code&gt;arc-systems&lt;/code&gt;) and acts as the orchestrator. It's essentially a Kubernetes operator that watches GitHub webhook events and translates them into pod creation/deletion actions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Testing Connectivity&lt;/strong&gt;&lt;br&gt;
Before proceeding, I learned to test the webhook endpoint thoroughly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verify external IP accessibility
&lt;/li&gt;
&lt;li&gt;Test SSL certificate validity&lt;/li&gt;
&lt;li&gt;Confirm GitHub can reach the webhook URL&lt;/li&gt;
&lt;li&gt;Monitor webhook delivery logs&lt;/li&gt;
&lt;/ul&gt;
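
&lt;p&gt;A couple of quick commands cover most of that checklist (the hostname here is a placeholder):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;WEBHOOK_URL=https://arc.example.com/webhook

# Reachability plus TLS validity in one shot: curl exits non-zero on
# an unreachable host or an invalid certificate
curl -sSf -o /dev/null "$WEBHOOK_URL"

# Inspect the certificate's validity window directly
echo | openssl s_client -connect arc.example.com:443 -servername arc.example.com 2&gt;/dev/null \
  | openssl x509 -noout -dates
&lt;/code&gt;&lt;/pre&gt;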
&lt;h2&gt;
  
  
  Building the Foundation: GKE Cluster Setup
&lt;/h2&gt;

&lt;p&gt;Once I'd conquered the networking nightmare, I focused on building a solid foundation. The key decisions that made or broke the implementation:&lt;/p&gt;
&lt;h3&gt;
  
  
  The Spot Instance Gamble
&lt;/h3&gt;

&lt;p&gt;I made a bold choice: run everything on Google Cloud Spot instances. These preemptible nodes can disappear with just 30 seconds' notice, but they cost 60-70% less than regular instances.&lt;/p&gt;

&lt;p&gt;"This will either be brilliant or catastrophic," I thought.&lt;/p&gt;

&lt;p&gt;Turns out, it was brilliant. ARC handles preemptions gracefully – if a node gets terminated, pods simply reschedule elsewhere. The cost savings were immediate and substantial.&lt;/p&gt;
&lt;h3&gt;
  
  
  Storage: Where I Learned About Shared State
&lt;/h3&gt;

&lt;p&gt;Initially, I ignored storage entirely. "How hard can it be?" I thought. &lt;/p&gt;

&lt;p&gt;Very hard, as it turns out.&lt;/p&gt;

&lt;p&gt;Without shared storage for build caches, every job started from scratch. Build times were actually &lt;em&gt;slower&lt;/em&gt; than GitHub's hosted runners. My "optimization" had made things worse.&lt;/p&gt;

&lt;p&gt;The solution involved integrating:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Google Cloud Filestore&lt;/strong&gt;: 250GB shared volume for build caches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart cache organization&lt;/strong&gt;: Structured by repository and branch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Suddenly, build times dropped by 40%. Sometimes the obvious solutions are obvious for a reason.&lt;/p&gt;
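
&lt;p&gt;The shared cache is plumbed in as a ReadWriteMany volume. A sketch of the static binding, with the Filestore IP and export path as placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: PersistentVolume
metadata:
  name: build-cache
spec:
  capacity:
    storage: 250Gi
  accessModes: ["ReadWriteMany"]
  nfs:
    server: 10.0.0.2      # Filestore instance IP (placeholder)
    path: /build_cache
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: build-cache
  namespace: arc-runners
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: ""
  volumeName: build-cache
  resources:
    requests:
      storage: 250Gi
&lt;/code&gt;&lt;/pre&gt;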
&lt;h2&gt;
  
  
  Step 3: Building Custom Images (Learning Docker the Hard Way)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Mistake #2: The 8GB Image Monster
&lt;/h3&gt;

&lt;p&gt;My first custom runner image was... ambitious. I threw everything I could think of into it: multiple Node.js versions, Python 2 and 3, every CLI tool I'd ever heard of, and enough packages to power a small data center.&lt;/p&gt;

&lt;p&gt;The result? An 8GB monster that took 15 minutes to pull on each pod creation.&lt;/p&gt;

&lt;p&gt;Watching developers wait 15 minutes just to start their build was painful. I quickly learned that &lt;strong&gt;in the world of ephemeral pods, image size directly impacts developer happiness&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Right Approach to Custom Images
&lt;/h3&gt;

&lt;p&gt;After several iterations, I developed a strategy:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xdryam9wsj9nioc9lll.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xdryam9wsj9nioc9lll.png" alt="docker custom"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Base Image Philosophy&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ubuntu-22-04&lt;/strong&gt;: Lean base with Node.js, Python, and essential tools only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ubuntu-22-04-infra&lt;/strong&gt;: Infrastructure-focused with Terraform, kubectl, and cloud CLIs
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ubuntu-22-04-qa&lt;/strong&gt;: Testing-focused with Selenium, browsers, and test frameworks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Size Optimization Lessons&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-stage builds to eliminate build dependencies&lt;/li&gt;
&lt;li&gt;Careful package selection (do you really need that 500MB SDK?)&lt;/li&gt;
&lt;li&gt;Layer optimization to maximize Docker cache hits&lt;/li&gt;
&lt;li&gt;Regular cleanup of apt caches and temp files&lt;/li&gt;
&lt;/ul&gt;
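
&lt;p&gt;Those lessons translate into a Dockerfile shaped roughly like this (a hypothetical sketch: the tool versions are illustrative, and &lt;code&gt;ghcr.io/actions/actions-runner&lt;/code&gt; is the stock ARC runner image):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Stage 1: fetch and unpack tools; nothing here ships in the final image
FROM ubuntu:22.04 AS builder
RUN apt-get update &amp;amp;&amp;amp; apt-get install -y curl xz-utils
RUN curl -fsSL -o /tmp/node.tar.xz https://nodejs.org/dist/v20.11.0/node-v20.11.0-linux-x64.tar.xz \
 &amp;amp;&amp;amp; mkdir -p /opt/node \
 &amp;amp;&amp;amp; tar -xJf /tmp/node.tar.xz -C /opt/node --strip-components=1

# Stage 2: lean runtime image; copy only the artifacts we need
FROM ghcr.io/actions/actions-runner:latest
USER root
RUN apt-get update &amp;amp;&amp;amp; apt-get install -y --no-install-recommends git jq \
 &amp;amp;&amp;amp; rm -rf /var/lib/apt/lists/*   # clean apt caches in the same layer
COPY --from=builder /opt/node /opt/node
ENV PATH="/opt/node/bin:${PATH}"
USER runner
&lt;/code&gt;&lt;/pre&gt;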

&lt;p&gt;&lt;strong&gt;The Sweet Spot&lt;/strong&gt;&lt;br&gt;
My optimized images clock in at 1.5-2GB and pull in under 60 seconds. The difference in developer experience is night and day.&lt;/p&gt;

&lt;p&gt;The results were dramatic: job setup time dropped from 3-4 minutes to under 30 seconds.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 4: Configuring Ephemeral Runners (Pod Lifecycle Mysteries)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Understanding the Magic
&lt;/h3&gt;

&lt;p&gt;Each runner pod is ephemeral, meaning it has a complete lifecycle:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Creation&lt;/strong&gt;: ARC sees a queued job and creates a pod&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Registration&lt;/strong&gt;: Pod starts up and registers with GitHub
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution&lt;/strong&gt;: Receives and executes the workflow job&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cleanup&lt;/strong&gt;: Job completes, pod reports back, and gets deleted&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Container Architecture
&lt;/h3&gt;

&lt;p&gt;Each runner pod actually runs two containers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Main runner container&lt;/strong&gt;: Executes the GitHub Actions workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker-in-Docker (DinD) sidecar&lt;/strong&gt;: Handles container builds securely&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This architecture provides isolation while enabling Docker builds – crucial for most modern CI/CD workflows.&lt;/p&gt;
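
&lt;p&gt;In the &lt;code&gt;gha-runner-scale-set&lt;/code&gt; chart, this two-container layout is a single switch. A hedged values-file sketch (the org URL and secret name are placeholders):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;githubConfigUrl: https://github.com/my-org
githubConfigSecret: github-app-secret
containerMode:
  type: dind    # adds the Docker-in-Docker sidecar to every runner pod
template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
&lt;/code&gt;&lt;/pre&gt;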
&lt;h3&gt;
  
  
  Mistake #3: The Docker-in-Docker Discovery
&lt;/h3&gt;

&lt;p&gt;My next challenge was handling Docker builds within the runners. My initial approach was to mount the Docker socket from the host into the pods.&lt;/p&gt;

&lt;p&gt;This worked beautifully in testing. In production? Not so much.&lt;/p&gt;

&lt;p&gt;Security-wise, it was equivalent to giving every job root access to the host. One badly configured job could potentially compromise the entire node.&lt;/p&gt;

&lt;p&gt;The better approach: &lt;strong&gt;Docker-in-Docker (DinD)&lt;/strong&gt;. This provided isolation while enabling Docker builds. No more security nightmares, no more compromised nodes.&lt;/p&gt;
&lt;h2&gt;
  
  
  Mistake #4: The Resource Allocation Disaster
&lt;/h2&gt;

&lt;p&gt;Confident in my progress, I deployed to production with minimal resource limits. "Let Kubernetes figure it out," I thought.&lt;/p&gt;

&lt;p&gt;Bad idea.&lt;/p&gt;

&lt;p&gt;Jobs started failing mysteriously. Pods were getting OOMKilled. The cluster was thrashing under memory pressure. I'd created a resource contention nightmare.&lt;/p&gt;

&lt;p&gt;The solution required careful tuning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CPU&lt;/strong&gt;: 1 core request, 4 core limit per runner&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt;: 1GB request, 2GB limit
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt;: Shared cache access for all runners&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pro tip: Always set resource requests and limits. Kubernetes is smart, but it's not psychic.&lt;/p&gt;
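
&lt;p&gt;In pod-spec terms, the tuning above is just standard Kubernetes resource syntax:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;resources:
  requests:
    cpu: "1"        # guaranteed scheduling baseline
    memory: 1Gi
  limits:
    cpu: "4"        # burst headroom for heavy builds
    memory: 2Gi     # hard ceiling; beyond this the pod gets OOMKilled
&lt;/code&gt;&lt;/pre&gt;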
&lt;h2&gt;
  
  
  The Scaling Sweet Spot
&lt;/h2&gt;

&lt;p&gt;After months of tuning, I found our ideal scaling configuration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Minimum runners&lt;/strong&gt;: 1 (always ready for immediate pickup)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maximum runners&lt;/strong&gt;: 100 (handles our largest deployment batches)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scale-to-zero&lt;/strong&gt;: Pods disappear when not needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gave us the best of both worlds: instant job pickup for small changes, and massive parallel capacity for large deployments.&lt;/p&gt;
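
&lt;p&gt;In the scale-set values file, those bounds come down to two lines:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;minRunners: 1     # one warm runner for instant pickup
maxRunners: 100   # ceiling for the largest deployment batches
&lt;/code&gt;&lt;/pre&gt;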
&lt;h2&gt;
  
  
  The Results: When Everything Finally Clicks
&lt;/h2&gt;

&lt;p&gt;Six months later, the transformation has been remarkable:&lt;/p&gt;
&lt;h3&gt;
  
  
  Cost Victory 💰
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Monthly CI/CD costs: $800+ → $200&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;70% reduction&lt;/strong&gt; in infrastructure spend&lt;/li&gt;
&lt;li&gt;Predictable costs with no surprise overages&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Performance Revolution 🚀
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Job queue time: 15 minutes → 30 seconds&lt;/li&gt;
&lt;li&gt;Build speed: 40% faster due to effective caching&lt;/li&gt;
&lt;li&gt;Deployment reliability: Near-zero failures due to resource constraints&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Developer Happiness 😊
&lt;/h3&gt;

&lt;p&gt;The real win? Developers stopped complaining about builds. Feedback loops became fast again. People could stay in flow instead of context-switching while waiting for deployments.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Antifragile System
&lt;/h2&gt;

&lt;p&gt;What we built isn't just cost-effective – it's antifragile. When traffic spikes hit, it scales up. When nodes get preempted, pods reschedule. When builds fail, we have granular logs to diagnose issues quickly.&lt;/p&gt;

&lt;p&gt;Each failure along the way taught us something valuable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The firewall issue taught us to plan networking carefully&lt;/li&gt;
&lt;li&gt;The storage problems taught us the importance of shared state&lt;/li&gt;
&lt;li&gt;The resource disasters taught us the value of proper limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every broken deployment made the system stronger.&lt;/p&gt;
&lt;h2&gt;
  
  
  Should You Make the Jump?
&lt;/h2&gt;

&lt;p&gt;If you're spending $500+ monthly on GitHub Actions and dealing with queue times, self-hosted runners on GKE might be your answer. But go in with realistic expectations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Initial setup time&lt;/strong&gt;: 2-3 weeks for a robust implementation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning curve&lt;/strong&gt;: Steep if you're new to Kubernetes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ongoing maintenance&lt;/strong&gt;: You're now responsible for infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost savings&lt;/strong&gt;: Significant, but not immediate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Start simple. Get basic runners working, then add complexity gradually. Monitor everything. And don't be afraid to fail – each failure teaches you something you couldn't learn any other way.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Six months ago, I was paying $800/month to wait in GitHub's queue. Today, I'm paying $200/month for instant deployments and custom runtime environments.&lt;/p&gt;

&lt;p&gt;Sometimes the best solutions come from the problems that annoy you most.&lt;/p&gt;
&lt;h2&gt;
  
  
  If you enjoyed reading this, connect with me on &lt;a href="https://www.linkedin.com/in/ogonna-nnamani" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Building your own CI/CD infrastructure? I'd love to hear about your journey and the spectacular failures along the way. They make the best stories.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/your-username" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Follow me for more CI/CD war stories and Kubernetes adventures!&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>githubactions</category>
      <category>devops</category>
      <category>gcp</category>
    </item>
    <item>
      <title>Hello, I am a DevOps Engineer and I Broke Production Today</title>
      <dc:creator>Ogonna Nnamani</dc:creator>
      <pubDate>Wed, 23 Jul 2025 15:27:43 +0000</pubDate>
      <link>https://dev.to/cloudiepad/hello-i-am-a-devops-engineer-and-i-broke-production-today-3aop</link>
      <guid>https://dev.to/cloudiepad/hello-i-am-a-devops-engineer-and-i-broke-production-today-3aop</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxfbdsf4oxqp5wricrv8b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxfbdsf4oxqp5wricrv8b.png" alt="Production Down Alert" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Beauty of Failure
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt;: The failure lasted no more than 4 minutes, and I quickly reverted to a previous stable version. But here's the thing — I tweeted about it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F578h1441j211s2msw26o.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F578h1441j211s2msw26o.jpg" alt="The Tweet" width="800" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That simple tweet made me realize some things that I wasn't quite prepared for. In the comment section, I encountered four distinct types of people:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Those who found it hilarious&lt;/strong&gt; — Fellow engineers sharing their own war stories.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Those who didn't believe me.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Those who immediately started suggesting fixes&lt;/strong&gt; — Those who couldn't help but jump into troubleshooting mode.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Those who thought it meant I was just incompetent.&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But here's what that interaction taught me: there's raw beauty in admitting that you can fail and that it's absolutely okay to fail.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Failure Really Makes You
&lt;/h2&gt;

&lt;p&gt;Failure doesn't make you incompetent—it makes you &lt;strong&gt;experienced&lt;/strong&gt;. Each mistake adds a line of rank to your experience bar, a badge that says "I've been there, I've survived it, and I know how to handle it next time." More importantly, it removes the burden of claiming to know everything.&lt;/p&gt;

&lt;p&gt;Spoiler alert: you never will, and that's perfectly fine.&lt;/p&gt;

&lt;p&gt;Let me share some of my greatest hits in the failure department.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Great Email Blackout of Monday Morning
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnair3zv7tnepv0wkn4dg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnair3zv7tnepv0wkn4dg.png" alt="DNS EMAIL" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;When you forget to migrate nameservers and an entire company loses email access&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Picture this: I'm performing an AWS cross-account migration for a major Oil &amp;amp; Gas company. Everything is going smoothly until the DNS migration phase. In my meticulous planning, I managed to overlook one tiny detail — migrating the nameservers.&lt;/p&gt;

&lt;p&gt;Monday morning arrives, and suddenly an entire company wakes up to find themselves locked out of their emails. For over five hours. On a Monday. In the oil and gas industry.&lt;/p&gt;

&lt;p&gt;The phone calls were… let's just say they were intense. But that failure taught me more about DNS propagation, backup communication channels, and the critical importance of testing every single component of a migration than any certification course ever could. Sometimes the most expensive lessons are the most valuable ones.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Database Credential Catastrophe
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38oafc3q07zlcxjspayd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38oafc3q07zlcxjspayd.png" alt="Database swap" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The horror of realizing your production app is talking to the staging database&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Then there was the time I pushed what I thought was a simple fix. An old branch from my git repository deployed to production and decided to pick up the staging database credentials, completely replacing the production database credentials.&lt;/p&gt;

&lt;p&gt;Our production application was suddenly trying to connect to our staging database. The irony wasn't lost on me — I had created the perfect test of our monitoring systems, just not intentionally.&lt;/p&gt;

&lt;p&gt;That incident changed how I approached environment isolation. Now I have strict compliance checks before any PR is merged, proper credential management, and multiple validation layers. That "simple fix" became the catalyst for implementing some of our most robust security practices.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Kubernetes Scheduling Nightmare
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdi2sc0t7kn7qz3xy1jgq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdi2sc0t7kn7qz3xy1jgq.jpg" alt="k8s scheduling nightmare" width="800" height="345"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;More recently, I pushed a fix and mistakenly changed the annotations of our self-hosted GitHub runners. Suddenly, our pods couldn't schedule on our node pools because they had a &lt;code&gt;nodeSelector&lt;/code&gt; rule that no longer matched.&lt;/p&gt;

&lt;p&gt;Our entire CI/CD pipeline ground to a halt. Developers couldn't deploy. The build queue started backing up like traffic on a Friday afternoon.&lt;/p&gt;
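
&lt;p&gt;For anyone who hasn't hit this failure mode: when a pod's &lt;code&gt;nodeSelector&lt;/code&gt; asks for a label no node carries, the scheduler simply leaves the pod Pending. A hypothetical before/after (the label values are illustrative, not our real ones):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Nodes after the bad change carry:
#   cloud.google.com/gke-nodepool: runner-pool-v2
nodeSelector:
  cloud.google.com/gke-nodepool: runner-pool   # stale value: no node matches
&lt;/code&gt;&lt;/pre&gt;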

&lt;p&gt;Each of these failures taught me something invaluable that I couldn't have learned any other way. The list is endless, honestly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Road to Antifragility
&lt;/h2&gt;

&lt;p&gt;The road to antifragility is a continuous process. What happens after you break production is remarkably similar to a murder investigation: you need evidence, you need witnesses, you need to reconstruct the timeline, and you need to understand what went wrong. This is why you build systems that anticipate these trying times.&lt;/p&gt;

&lt;h3&gt;
  
  
  Your Detective Toolkit
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm8snn194dp94zc8oxdlg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm8snn194dp94zc8oxdlg.jpg" alt="Detective Toolkit" width="800" height="634"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The essential tools for investigating production failures&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Granular logs become your crime scene evidence.&lt;/strong&gt; They tell you exactly what happened, when it happened, and in what sequence. Without them, you're investigating a case blindfolded. I can't stress this enough — log everything that matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Comprehensive metrics are your witnesses.&lt;/strong&gt; They saw everything unfold in real-time and can testify to the state of your system at any given moment. Tools like CloudWatch, Prometheus, and Grafana have become my best friends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A similar test environment is your crime lab.&lt;/strong&gt; It's where you can safely recreate the incident, test your theories, and validate your fixes without risking further damage to production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clear post-mortems are your case files.&lt;/strong&gt; They document not just what went wrong, but why it went wrong, what you learned, and how you're preventing it from happening again. Write them like your future self will thank you for it.&lt;/p&gt;

&lt;p&gt;These are your detective tools when all hell breaks loose. And trust me, hell will break loose — it's not a matter of if, but when.&lt;/p&gt;

&lt;h2&gt;
  
  
  Every Failure is a Lesson
&lt;/h2&gt;

&lt;p&gt;Every failure is a lesson. Some lessons are more expensive than others, but the process invariably makes us better engineers. The engineer who has never broken production is either lying, hasn't been doing this long enough, or isn't pushing boundaries hard enough to drive real innovation.&lt;/p&gt;

&lt;p&gt;The most senior engineers I know aren't the ones who never make mistakes — they're the ones who've made the most mistakes, learned from them, and built systems resilient enough to handle future failures gracefully.&lt;/p&gt;

&lt;h2&gt;
  
  
  Embracing the Inevitable
&lt;/h2&gt;

&lt;p&gt;So here's to failure. Here's to the 3 AM phone calls, the sweaty palms during incident response, and the wisdom we gain from each crash.&lt;/p&gt;

&lt;p&gt;Because at the end of the day, our failures don't define our incompetence — they define our experience.&lt;/p&gt;

&lt;p&gt;Once again, my name is Ogonna, I am a DevOps Engineer, and I broke production today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What did you do today?&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you enjoyed this story of production failures and lessons learned, follow me for more DevOps insights and real-world experiences. Also connect with me on &lt;a href="https://linkedin.com/in/ogonna-nnamani" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; for more behind-the-scenes DevOps stories.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Have your own production failure story?&lt;/strong&gt; Share it in the comments — let's normalize talking about our failures and learning from each other!&lt;/p&gt;

</description>
      <category>devops</category>
      <category>career</category>
      <category>failure</category>
      <category>postmortem</category>
    </item>
    <item>
      <title>Securing Your Internal Tools: Implementing Identity-Aware Proxy (IAP) for GKE Resources with CDKTF</title>
      <dc:creator>Ogonna Nnamani</dc:creator>
      <pubDate>Tue, 22 Jul 2025 10:47:39 +0000</pubDate>
      <link>https://dev.to/cloudiepad/securing-your-internal-tools-implementing-identity-aware-proxy-iap-for-gke-resources-with-cdktf-gm5</link>
      <guid>https://dev.to/cloudiepad/securing-your-internal-tools-implementing-identity-aware-proxy-iap-for-gke-resources-with-cdktf-gm5</guid>
<description>&lt;p&gt;Hello! Today I want to share something that's become increasingly critical in our cloud-native world — securing internal tools and dashboards without the complexity of traditional VPN setups.&lt;/p&gt;

&lt;p&gt;Picture this: Your company has grown from a small startup to a mid-sized organization. You have internal dashboards, monitoring tools, admin panels, and various services running on Google Kubernetes Engine (GKE). Initially, maybe you secured these with basic auth or just left them on internal networks. But as your team grows and remote work becomes more common, you realize you need something more robust, more scalable, and frankly, more professional.&lt;/p&gt;

&lt;p&gt;That's where Google's Identity-Aware Proxy (IAP) comes in, and today I'll walk you through implementing it using Infrastructure as Code with CDKTF.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is IAP and Why Should You Care?
&lt;/h2&gt;

&lt;p&gt;Identity-Aware Proxy (IAP) is Google Cloud's solution to the age-old problem of "how do I securely control access to my applications?" Think of IAP as a sophisticated bouncer at an exclusive club — it checks not just if you have a ticket (authentication), but also if you're on the guest list for that specific event (authorization).&lt;/p&gt;

&lt;p&gt;Here's the beautiful part: IAP sits between your users and your applications, handling all the authentication and authorization logic without you having to modify your applications. It integrates seamlessly with Google's identity systems, supports your corporate Google Workspace accounts, and can enforce granular access controls based on user attributes, device security status, and more.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why IAP is a Game-Changer
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Zero Trust Security Model&lt;/strong&gt;: IAP doesn't trust anyone by default, not even users inside your corporate network&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No VPN Complexity&lt;/strong&gt;: Users can access internal tools from anywhere with just their corporate Google account&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Granular Access Control&lt;/strong&gt;: You can control who accesses what, when, and from which devices&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit Trail&lt;/strong&gt;: Every access attempt is logged, giving you complete visibility&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration with Google Workspace&lt;/strong&gt;: Leverage your existing Google accounts and groups&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Understanding CDKTF: Infrastructure as Code for the Modern Age
&lt;/h2&gt;

&lt;p&gt;Before we dive into the implementation, let's talk about CDKTF — Cloud Development Kit for Terraform. If you've worked with traditional Terraform, you know it uses HCL (HashiCorp Configuration Language). While HCL is powerful, it can feel limiting when you need complex logic or loops, or when you want to leverage your existing programming skills.&lt;/p&gt;

&lt;p&gt;CDKTF bridges this gap by allowing you to define your infrastructure using familiar programming languages like TypeScript, Python, Java, C#, or Go. For this article, we'll use TypeScript because of its excellent type safety and IntelliSense support.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why CDKTF with TypeScript?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Type Safety&lt;/strong&gt;: Catch configuration errors at compile time, not deployment time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Reusability&lt;/strong&gt;: Create functions, classes, and modules to reduce duplication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IDE Support&lt;/strong&gt;: Full IntelliSense, autocomplete, and refactoring capabilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Familiar Syntax&lt;/strong&gt;: If you know TypeScript/JavaScript, you're already halfway there&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Testing&lt;/strong&gt;: Unit test your infrastructure code just like application code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it this way: traditional Terraform is like writing configuration files, while CDKTF is like writing a program that generates those configuration files. The end result is the same, but the development experience is significantly better.&lt;/p&gt;
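&lt;p&gt;To make that concrete, here's a minimal, self-contained sketch of the "program that generates configuration" idea in plain TypeScript (no CDKTF dependency; the &lt;code&gt;BucketConfig&lt;/code&gt; shape and environment names are invented for illustration). A typed loop emits one bucket definition per environment, something that would take &lt;code&gt;for_each&lt;/code&gt; gymnastics in HCL:&lt;/p&gt;

```typescript
// Illustrative sketch: mimics how CDKTF code *generates* configuration.
// The type and names below are invented for this example, not from CDKTF.
interface BucketConfig {
  name: string;
  location: string;
  forceDestroy: boolean;
}

// The compiler catches a missing or misspelled field before anything deploys.
function bucketFor(env: string): BucketConfig {
  return {
    name: `my-app-${env}-assets`,
    location: "US",
    forceDestroy: env !== "prod", // never force-destroy production data
  };
}

const environments = ["dev", "staging", "prod"];
const buckets = environments.map(bucketFor);

console.log(JSON.stringify(buckets, null, 2));
```

&lt;p&gt;In real CDKTF code the same loop would instantiate constructs instead of plain objects, and &lt;code&gt;cdktf synth&lt;/code&gt; would turn them into Terraform JSON, but the development experience — types, loops, refactoring — is exactly this.&lt;/p&gt;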

&lt;h2&gt;
  
  
  Backend Services vs. Backend Configs: The Kubernetes Ingress Story
&lt;/h2&gt;

&lt;p&gt;Before we implement IAP, it's crucial to understand the difference between Backend Services and Backend Configs in the GKE context — this tripped me up when I first started working with GKE ingress.&lt;/p&gt;

&lt;h3&gt;
  
  
  Backend Services
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;Backend Service&lt;/strong&gt; is a Google Cloud resource that defines how traffic should be distributed to your backend instances (in our case, Kubernetes pods). It's part of Google's load balancing infrastructure and handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Health checks&lt;/li&gt;
&lt;li&gt;Load balancing algorithms&lt;/li&gt;
&lt;li&gt;Session affinity&lt;/li&gt;
&lt;li&gt;Traffic distribution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you create a Kubernetes Service and expose it through an Ingress, GKE automatically creates a corresponding Backend Service in Google Cloud.&lt;/p&gt;

&lt;h3&gt;
  
  
  Backend Configs
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;Backend Config&lt;/strong&gt; is a Kubernetes Custom Resource Definition (CRD) that allows you to customize the behavior of the automatically created Backend Services. Think of it as a way to tell GKE: "When you create the Backend Service for my Kubernetes Service, please apply these additional configurations."&lt;/p&gt;

&lt;p&gt;Backend Configs can control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Connection draining timeouts&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Session affinity settings&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Custom request/response headers&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Security policies&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;And most importantly for us — IAP settings&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key insight here is that Backend Configs are Kubernetes resources that influence Google Cloud Backend Services. It's GKE's way of bridging Kubernetes-native configuration with Google Cloud's load balancing features.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hands-On: Implementing IAP for a SonarQube Instance
&lt;/h2&gt;

&lt;p&gt;Now let's get our hands dirty. We'll implement IAP for a SonarQube instance — a popular code quality and security analysis tool that's perfect for demonstrating internal tool security.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A GKE cluster&lt;/li&gt;
&lt;li&gt;CDKTF installed and configured&lt;/li&gt;
&lt;li&gt;Google Cloud project with appropriate permissions&lt;/li&gt;
&lt;li&gt;Basic understanding of Kubernetes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 1: Project Setup and Service Enablement
&lt;/h3&gt;

&lt;p&gt;First, we need to enable the IAP API. This is crucial because it also creates an IAP service agent that we'll need later:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;enableServices&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@examplecompany/iac&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Enable Google IAP Service&lt;/span&gt;
&lt;span class="nf"&gt;enableServices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;project&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;iap.googleapis.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What's happening here? The &lt;code&gt;enableServices&lt;/code&gt; function is a helper that enables Google Cloud APIs in your project. When you enable the IAP API (&lt;code&gt;iap.googleapis.com&lt;/code&gt;), Google automatically creates a special service account called the "IAP service agent." This service account is what IAP uses behind the scenes to communicate with your applications. Think of it as giving IAP the keys to act on behalf of your project.&lt;/p&gt;
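&lt;p&gt;For reference, that service agent follows Google's standard service-agent naming convention, &lt;code&gt;service-PROJECT_NUMBER@gcp-sa-iap.iam.gserviceaccount.com&lt;/code&gt;. The tiny helper below (with a made-up project number) just makes the pattern explicit — confirm the actual account in your project's IAM page with Google-provided role grants shown:&lt;/p&gt;

```typescript
// Construct the IAP service agent email for a given project number.
// The format follows Google Cloud's service-agent convention; this is a
// string-building sketch, not an API lookup, so verify against your project.
function iapServiceAgent(projectNumber: string): string {
  return `service-${projectNumber}@gcp-sa-iap.iam.gserviceaccount.com`;
}

console.log(iapServiceAgent("123456789012")); // placeholder project number
```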

&lt;h3&gt;
  
  
  Step 2: Create OAuth Credentials
&lt;/h3&gt;

&lt;p&gt;Before we can use IAP, we need OAuth 2.0 credentials. Head to the Google Cloud Console:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Navigate to &lt;strong&gt;APIs &amp;amp; Services &amp;gt; Credentials&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Create Credentials &amp;gt; OAuth client ID&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Choose &lt;strong&gt;Web application&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Add your domain to &lt;strong&gt;Authorized redirect URIs&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Note down the Client ID and Client Secret&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We'll store these as environment variables for security:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;IAP_CLIENT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-client-id-here"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;IAP_CLIENT_SECRET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-client-secret-here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Create the Helper Function
&lt;/h3&gt;

&lt;p&gt;Let's create a reusable function for IAP implementation. This is where the magic happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Manifest&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@cdktf/provider-kubernetes/lib/manifest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;constructs&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;IapConfigProps&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;backendConfigName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;oauthSecretName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;clientId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;clientSecret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createIapResources&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;IapConfigProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Validate credentials are provided&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;clientId&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;clientSecret&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;IAP_CLIENT_ID and IAP_CLIENT_SECRET must be set&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Create Kubernetes Secret for OAuth credentials&lt;/span&gt;
  &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Manifest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;iap-oauth-secret&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;manifest&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Secret&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;oauthSecretName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Opaque&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;client_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;clientId&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="na"&gt;client_secret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;clientSecret&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Create BackendConfig with IAP enabled&lt;/span&gt;
  &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Manifest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;iap-backendconfig&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;manifest&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cloud.google.com/v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;BackendConfig&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;backendConfigName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;iap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;oauthclientCredentials&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;secretName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;oauthSecretName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;timeoutSec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;connectionDraining&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;drainingTimeoutSec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let me break down what this function is doing step by step:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Interface&lt;/strong&gt;: The &lt;code&gt;IapConfigProps&lt;/code&gt; interface is like a contract that ensures anyone using this function provides all the required information. It's TypeScript's way of saying "these are the mandatory parameters."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Credential Validation&lt;/strong&gt;: The first thing we do is check if OAuth credentials are provided. This prevents silent failures where you deploy everything successfully but IAP doesn't work because credentials are missing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating the Secret&lt;/strong&gt;: The first &lt;code&gt;Manifest&lt;/code&gt; creates a Kubernetes Secret to store our OAuth credentials. Notice how we use &lt;code&gt;Buffer.from().toString("base64")&lt;/code&gt; — this is because Kubernetes secrets must be base64 encoded. The secret type is "Opaque," which is Kubernetes' way of saying "this is arbitrary user-defined data."&lt;/p&gt;
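&lt;p&gt;You can sanity-check that encoding in isolation. The snippet below (using a made-up placeholder value, not a real credential) shows the same &lt;code&gt;Buffer&lt;/code&gt; round trip the helper relies on:&lt;/p&gt;

```typescript
// Kubernetes Secret `data` values must be base64-encoded strings;
// the cluster decodes them before exposing them to workloads.
const clientId = "example-client-id"; // placeholder, not a real credential

const encoded = Buffer.from(clientId).toString("base64");
const decoded = Buffer.from(encoded, "base64").toString("utf8");

console.log(encoded);
console.log(decoded === clientId); // the value round-trips cleanly
```

&lt;p&gt;(Kubernetes also accepts a &lt;code&gt;stringData&lt;/code&gt; field where the API server does the encoding for you, but since our manifest writes &lt;code&gt;data&lt;/code&gt; directly, we encode explicitly.)&lt;/p&gt;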

&lt;p&gt;&lt;strong&gt;Creating the BackendConfig&lt;/strong&gt;: The second &lt;code&gt;Manifest&lt;/code&gt; is where the real IAP magic happens. We're telling GKE: "When you create the Google Cloud Backend Service for any service that references this BackendConfig, please enable IAP and use the OAuth credentials from this secret." The &lt;code&gt;timeoutSec&lt;/code&gt; and &lt;code&gt;connectionDraining&lt;/code&gt; are additional configurations to ensure graceful handling of requests during deployments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Create the Main Stack
&lt;/h3&gt;

&lt;p&gt;Now let's put it all together in our main infrastructure stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DataGoogleContainerCluster&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@cdktf/provider-google/lib/data-google-container-cluster&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;IapWebIamBinding&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@cdktf/provider-google/lib/iap-web-iam-binding&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Namespace&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@cdktf/provider-kubernetes/lib/namespace&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SonarQubeStack&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;TerraformStack&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;project&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;my-example-project&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="c1"&gt;// Enable IAP service&lt;/span&gt;
    &lt;span class="nf"&gt;enableServices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;project&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;iap.googleapis.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Reference existing GKE cluster&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;gke&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DataGoogleContainerCluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gke-cluster&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;project&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;us-central1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;my-cluster&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Create namespace&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="k"&gt;namespace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Namespace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sonarqube-namespace&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sonarqube&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Create IAP resources&lt;/span&gt;
    &lt;span class="nf"&gt;createIapResources&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;backendConfigName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sonarqube-backendconfig&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;oauthSecretName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sonarqube-iap-oauth-secret&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;clientId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;IAP_CLIENT_ID&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;clientSecret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;IAP_CLIENT_SECRET&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Grant access to specific users/groups&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;IapWebIamBinding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sonarqube-iap-access&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;project&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;roles/iap.httpsResourceAccessor&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;members&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;group:developers@examplecompany.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;group:devops@examplecompany.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user:admin@examplecompany.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what's happening in our main stack:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Source Reference&lt;/strong&gt;: &lt;code&gt;DataGoogleContainerCluster&lt;/code&gt; is not creating a new cluster — it's referencing an existing one. This is CDKTF's way of saying "I need information about this resource that already exists." It's like looking up a contact in your phone book rather than adding a new one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Namespace Creation&lt;/strong&gt;: We create a Kubernetes namespace to isolate our SonarQube resources. Think of namespaces like folders in your filesystem — they help organize and separate different applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Calling Our Helper&lt;/strong&gt;: We call our &lt;code&gt;createIapResources&lt;/code&gt; function with specific values. Notice the &lt;code&gt;process.env.IAP_CLIENT_ID ?? ""&lt;/code&gt; pattern: the nullish coalescing operator &lt;code&gt;??&lt;/code&gt; falls back to an empty string when the environment variable is unset. That keeps this call from crashing, and the empty string is then rejected by the validation inside &lt;code&gt;createIapResources&lt;/code&gt;.&lt;/p&gt;
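&lt;p&gt;The difference between &lt;code&gt;??&lt;/code&gt; and the older &lt;code&gt;||&lt;/code&gt; fallback is worth a quick demonstration: &lt;code&gt;??&lt;/code&gt; only falls back on &lt;code&gt;null&lt;/code&gt; or &lt;code&gt;undefined&lt;/code&gt;, while &lt;code&gt;||&lt;/code&gt; also swallows legitimate falsy values such as an empty string:&lt;/p&gt;

```typescript
// ?? falls back only when the left-hand side is null or undefined.
const unset: string | undefined = undefined;
const fromUnset = unset ?? "fallback";

// || would also replace an explicitly set (but falsy) empty string.
const empty: string | undefined = "";
const withNullish = empty ?? "fallback"; // keeps ""
const withOr = empty || "fallback";      // replaces with "fallback"

console.log(fromUnset, JSON.stringify(withNullish), withOr);
```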

&lt;p&gt;&lt;strong&gt;IAM Binding - The Access Control&lt;/strong&gt;: This is crucial! &lt;code&gt;IapWebIamBinding&lt;/code&gt; is what actually grants people access. The role &lt;code&gt;roles/iap.httpsResourceAccessor&lt;/code&gt; is Google's predefined role that allows access through IAP. Without this binding, even authenticated users would be denied access. The &lt;code&gt;members&lt;/code&gt; array supports both individual users (&lt;code&gt;user:someone@company.com&lt;/code&gt;) and Google Groups (&lt;code&gt;group:teamname@company.com&lt;/code&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Configure Your Service
&lt;/h3&gt;

&lt;p&gt;The final piece is connecting your Kubernetes Service to the BackendConfig. In your service manifest (or Helm values), add this annotation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sonarqube&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sonarqube&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;cloud.google.com/backend-config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{"default":&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"sonarqube-backendconfig"}'&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# ... rest of your service configuration&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're using Helm (like we are with SonarQube), update your &lt;code&gt;values.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterIP&lt;/span&gt;
  &lt;span class="na"&gt;externalPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9000&lt;/span&gt;
  &lt;span class="na"&gt;internalPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9000&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
    &lt;span class="na"&gt;cloud.google.com/backend-config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{"default":&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"sonarqube-backendconfig"}'&lt;/span&gt;
    &lt;span class="na"&gt;cloud.google.com/load-balancer-type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;External"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This annotation is the bridge between your Kubernetes Service and the BackendConfig. Here's what's happening:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Magic Annotation&lt;/strong&gt;: &lt;code&gt;cloud.google.com/backend-config&lt;/code&gt; tells GKE's ingress controller: "When you create a Google Cloud Backend Service for this Kubernetes Service, apply the configuration from this BackendConfig." The &lt;code&gt;{"default": "sonarqube-backendconfig"}&lt;/code&gt; part means "apply this BackendConfig to the default port" — if your service had multiple ports, you could specify different BackendConfigs for each.&lt;/p&gt;
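
&lt;p&gt;For example, a hypothetical two-port service could map each port to its own BackendConfig using the &lt;code&gt;ports&lt;/code&gt; form of the annotation (the port numbers and second config name here are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;metadata:
  annotations:
    # One BackendConfig per port, keyed by the service port name or number
    cloud.google.com/backend-config: '{"ports": {"9000": "sonarqube-backendconfig", "8080": "admin-backendconfig"}}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;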

&lt;p&gt;&lt;strong&gt;Load Balancer Type&lt;/strong&gt;: The &lt;code&gt;cloud.google.com/load-balancer-type: "External"&lt;/code&gt; annotation ensures your service gets an external IP address that can be reached from the internet (after passing through IAP, of course).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Moment of Truth: Testing Your Implementation
&lt;/h2&gt;

&lt;p&gt;Deploy your infrastructure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cdktf deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After deployment, navigate to your service URL. Instead of direct access, you should see the Google sign-in page. After authentication with your corporate account, IAP will check if you have the necessary permissions and either grant or deny access.&lt;/p&gt;
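
&lt;p&gt;You can also smoke-test this from the terminal. A minimal sketch, assuming your service lives at &lt;code&gt;sonarqube.example.com&lt;/code&gt; (substitute your real domain):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# An unauthenticated request should be redirected to Google's sign-in flow
curl -sI https://sonarqube.example.com/ | head -n 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You should see a 302 with a Location header pointing at Google's OAuth endpoint. A direct 200 response from SonarQube itself would mean IAP is not intercepting traffic.&lt;/p&gt;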

&lt;h2&gt;
  
  
  What Could Go Wrong (And How to Fix It)
&lt;/h2&gt;

&lt;p&gt;From my experience implementing IAP across multiple services, here are the common gotchas:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. OAuth Configuration Issues
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Users see "Error: redirect_uri_mismatch"&lt;br&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Ensure your OAuth client's authorized redirect URIs include your actual domain&lt;/p&gt;

&lt;h3&gt;
  
  
  2. IAM Permission Problems
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Users authenticate but get "You don't have access"&lt;br&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Check your IapWebIamBinding members list and verify users are in the specified groups&lt;/p&gt;
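
&lt;p&gt;One way to verify group membership from the CLI, as a sketch (assuming the Cloud Identity API is enabled for your organization; the group address is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# List members of the group that was granted access
gcloud identity groups memberships list \
    --group-email="devops-team@examplecompany.com"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;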

&lt;h3&gt;
  
  
  3. BackendConfig Not Applied
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: IAP doesn't seem to work at all&lt;br&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Verify the service annotation is correct and the BackendConfig exists in the same namespace&lt;/p&gt;
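
&lt;p&gt;These two checks, a sketch using the names from our setup, usually pinpoint the problem:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Confirm the annotation actually landed on the Service
kubectl get service sonarqube -n sonarqube \
    -o jsonpath='{.metadata.annotations.cloud\.google\.com/backend-config}'

# Confirm the BackendConfig exists in the same namespace
kubectl get backendconfig sonarqube-backendconfig -n sonarqube
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;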

&lt;h3&gt;
  
  
  4. SSL Certificate Issues
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: IAP works but with SSL warnings&lt;br&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Ensure you have proper SSL certificates configured for your domain&lt;/p&gt;

&lt;h2&gt;
  
  
  Best Practices I've Learned the Hard Way
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use Google Groups&lt;/strong&gt;: Instead of individual users, manage access through Google Groups. It's much easier to maintain.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Environment Separation&lt;/strong&gt;: Use different OAuth clients for different environments (dev, staging, prod) for better security isolation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Monitor Everything&lt;/strong&gt;: Enable IAP access logging and set up alerts for failed authentication attempts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test Thoroughly&lt;/strong&gt;: Always test with users who shouldn't have access to ensure your permissions are working correctly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Document Your Groups&lt;/strong&gt;: Keep clear documentation of which Google Groups have access to which services.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Payoff
&lt;/h2&gt;

&lt;p&gt;After implementing IAP across our internal tools, the benefits were immediately apparent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Developer Productivity&lt;/strong&gt;: No more VPN hassles or remembering different passwords&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Compliance&lt;/strong&gt;: Clear audit trails and granular access control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational Simplicity&lt;/strong&gt;: Centralized identity management through Google Workspace&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Easy to add new tools and services under the same security model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The initial setup might seem complex, but once you have the pattern established, securing additional services becomes a matter of copying and adapting your existing code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Identity-Aware Proxy represents a shift from traditional perimeter-based security to a zero-trust model. Combined with Infrastructure as Code practices using CDKTF, you get both security and maintainability.&lt;/p&gt;

&lt;p&gt;The implementation we've covered here is just the beginning. IAP supports advanced features like device-based access controls, context-aware access based on user location and device security posture, and integration with third-party identity providers.&lt;/p&gt;

&lt;p&gt;My advice? Start simple with basic IAP implementation, get comfortable with the concepts and workflows, then gradually add more sophisticated policies as your security requirements evolve.&lt;/p&gt;

&lt;p&gt;Remember, security isn't just about keeping the bad guys out — it's about making it easy for the good guys to get their work done safely and efficiently.&lt;/p&gt;

&lt;p&gt;What internal tools are you planning to secure with IAP? I'd love to hear about your implementation experiences in the comments!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you found this helpful, follow me on &lt;a href="https://www.linkedin.com/in/ogonna-nnamani" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; for more DevOps and cloud security content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>gcp</category>
      <category>iac</category>
    </item>
    <item>
      <title>The Role Of Chaos Engineering in Building Anti-Fragile Systems</title>
      <dc:creator>Ogonna Nnamani</dc:creator>
      <pubDate>Fri, 12 Apr 2024 03:29:11 +0000</pubDate>
      <link>https://dev.to/cloudiepad/the-role-of-chaos-engineering-in-building-anti-fragile-systems-17bg</link>
      <guid>https://dev.to/cloudiepad/the-role-of-chaos-engineering-in-building-anti-fragile-systems-17bg</guid>
      <description>&lt;p&gt;&lt;strong&gt;Intro&lt;/strong&gt;&lt;br&gt;
Welcome back to the Antifragile series guys!&lt;br&gt;
We will be discussing the role of Chaos Engineering in designing antifragile systems.&lt;/p&gt;

&lt;p&gt;Firstly, what is chaos engineering?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chaos engineering&lt;/strong&gt; is the practice of introducing controlled chaos into systems design. It involves deliberately injecting failures and unexpected events into a system to see how it responds. The goal is to uncover weaknesses and vulnerabilities before they cause major issues in real-world scenarios.&lt;/p&gt;

&lt;p&gt;Building a system that recovers from failure almost immediately demonstrates resiliency, and that is what antifragility is really about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Who needs Chaos Engineering?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Implementing chaos engineering in an architecture involves a lot of planning, because nobody wants to destroy what they have just built. Certain use cases and industries inspire this method of systems design, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tech Companies&lt;/strong&gt; &lt;br&gt;
Especially those providing online services, cloud computing, or software as a service (SaaS) platforms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Financial Services&lt;/strong&gt;&lt;br&gt;
Banks, stock exchanges, payment processors, and other financial institutions rely on highly available and secure systems to process transactions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Healthcare&lt;/strong&gt; &lt;br&gt;
With the increasing digitization of medical records and telemedicine, healthcare organizations need reliable systems to provide critical services to patients.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Energy and Utilities&lt;/strong&gt;&lt;br&gt;
Power plants, oil refineries, and utility companies use complex systems for monitoring and managing infrastructure.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These industries require constant uptime, and design methods like chaos engineering can be used to test for resiliency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools used for Chaos Engineering&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Chaos Mesh&lt;/strong&gt;&lt;br&gt;
An open-source chaos engineering platform for Kubernetes-based applications. It allows users to orchestrate chaos experiments to test the resilience of their Kubernetes clusters. These include pod failure, network latency and load testing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz8to2ftn4qa0eeuqe6oq.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz8to2ftn4qa0eeuqe6oq.PNG" alt="Image of types of chaos by Chaos mesh" width="800" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pumba&lt;/strong&gt;&lt;br&gt;
 A chaos testing tool specifically designed for Docker containers. It allows users to introduce network latency, packet loss, and other disruptions to Docker containers to simulate real-world failures. &lt;br&gt;
Pumba can kill, stop, or remove running containers. It can also pause all processes within a running container for a specified period of time.&lt;/p&gt;
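
&lt;p&gt;A rough sketch of what these disruptions look like in practice (container names are illustrative; check &lt;code&gt;pumba --help&lt;/code&gt; for the exact flags in your version):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Kill a random container whose name starts with "test"
pumba --random kill --signal SIGKILL "re2:^test"

# Add 3 seconds of network latency to "mycontainer" for 5 minutes
pumba netem --duration 5m delay --time 3000 mycontainer

# Pause all processes in "mycontainer" for 30 seconds
pumba pause --duration 30s mycontainer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;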

&lt;p&gt;&lt;strong&gt;Chaos Monkey&lt;/strong&gt;&lt;br&gt;
Developed by Netflix, Chaos Monkey is one of the earliest chaos engineering tools. It randomly terminates virtual machine instances to ensure that engineers design systems that can withstand failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Litmus Chaos&lt;/strong&gt;&lt;br&gt;
An open-source chaos engineering platform for Kubernetes. It provides a framework and a set of pre-defined chaos experiments for testing Kubernetes resilience.&lt;br&gt;
Litmus was accepted to CNCF on June 25, 2020 and moved to the Incubating maturity level on January 11, 2022.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13c4wjp80hkigzjc7u1q.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13c4wjp80hkigzjc7u1q.PNG" alt="LITMUS CHAOS" width="677" height="322"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Apache Bench&lt;/strong&gt;&lt;br&gt;
Apache Bench (invoked as &lt;code&gt;ab&lt;/code&gt;) is a single-threaded command-line program for benchmarking the performance of HTTP web servers. It can be used to stress test your APIs or endpoints to ensure they can withstand heavy concurrent traffic before deployment.&lt;/p&gt;

&lt;p&gt;These are some real-life tools used to test your products for resiliency and help you achieve an antifragile infrastructure.&lt;/p&gt;

&lt;p&gt;There are some special use cases where chaos engineering is automated and continually applied.&lt;/p&gt;

&lt;p&gt;A real-life example is FINBOURNE, a financial technology company that provides a cloud-based investment management platform called LUSID. LUSID is designed to help asset managers, wealth managers, and financial institutions streamline their investment operations.&lt;/p&gt;

&lt;p&gt;Finbourne hosts their infrastructure on AWS and implements an automated, special type of chaos engineering that terminates an application every &lt;strong&gt;seventeen (17) minutes&lt;/strong&gt;, terminates an EC2 instance every &lt;strong&gt;six (6) hours&lt;/strong&gt; and fails an Availability Zone &lt;strong&gt;twice weekly&lt;/strong&gt;, just to continually evaluate how quickly they recover from a failure.&lt;/p&gt;

&lt;p&gt;Mind-blowing, right?!&lt;/p&gt;

&lt;p&gt;These are some of the extreme design methods some companies undergo just to ensure optimal performance and resiliency.&lt;br&gt;
That will be all on chaos engineering today!!&lt;/p&gt;

&lt;p&gt; If you enjoyed this read, connect with me on &lt;a href="https://www.linkedin.com/in/ogonna-nnamani/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;HAPPY CLOUD COMPUTING!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>chaosengineering</category>
      <category>devops</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Building Anti-Fragile Systems For Modern-Day DevOps</title>
      <dc:creator>Ogonna Nnamani</dc:creator>
      <pubDate>Fri, 12 Apr 2024 03:09:36 +0000</pubDate>
      <link>https://dev.to/cloudiepad/building-anti-fragile-systems-for-modern-day-devops-39ff</link>
      <guid>https://dev.to/cloudiepad/building-anti-fragile-systems-for-modern-day-devops-39ff</guid>
      <description>&lt;p&gt;&lt;strong&gt;INTRODUCTION&lt;/strong&gt;&lt;br&gt;
What is antifragility?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Antifragility&lt;/strong&gt; is a concept introduced by Nassim Nicholas Taleb in his book “&lt;strong&gt;Antifragile: Things That Gain from Disorder,&lt;/strong&gt;” published in 2012. The term refers to a property of systems or entities that thrive and benefit from volatility, uncertainty, stress, and disorder.&lt;/p&gt;

&lt;p&gt;It’s common to say that robust or resilient is the opposite of fragile. Here, however, I respectfully disagree. I want to talk about an idea known as antifragility.&lt;/p&gt;

&lt;p&gt;“Resilience refers to the ability of a system or entity to withstand shocks, recover from adversity, and return to its original state or function. Resilience suggests the capacity to absorb and adapt to challenges without significant damage”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fadypehoboeqd19d6ctl0.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fadypehoboeqd19d6ctl0.jpg" alt="chain" width="800" height="569"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Antifragility goes beyond resilience. An antifragile system not only withstands stressors but actually benefits and improves as a result of exposure to adversity. It thrives in dynamic and uncertain environments, becoming stronger and more robust through challenges. The human muscular system is one example of an antifragile mechanism in nature. Our muscles experience stress when we work out, which causes them to grow and strengthen. Another term for this is post-traumatic growth.&lt;/p&gt;

&lt;p&gt;Let’s now consider the example of a courier service delivering a glass piece. Packages marked “fragile” are those that detest stress and break easily upon experiencing it. The exact opposite would be an item that anticipates stress and is designed to withstand mishandling.&lt;/p&gt;

&lt;p&gt;Let’s bring that into the day-to-day designing, building, and managing of scalable systems by trying to build systems that expect variability and predict outcomes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HOW TO MEASURE FRAGILITY&lt;/strong&gt;&lt;br&gt;
Fragility refers to the quality or state of being fragile, which means easily broken, delicate, or vulnerable. It can be used to describe physical objects that are prone to breaking or damage.&lt;/p&gt;

&lt;p&gt;Let’s revisit the glass mirror example. There are only two states for a mirror: whole and broken. There is no middle ground when it comes to measuring the risk associated with that mirror; it either breaks or it doesn’t. This means we already know the second-order derivative (the outcome should the mirror fall), and this knowledge helps us prevent the mirror from falling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmmgi6v0n6y2dmeoxzenf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmmgi6v0n6y2dmeoxzenf.jpg" alt="mirrors" width="800" height="1000"&gt;&lt;/a&gt;&lt;br&gt;
The same ideology applies to systems. Only when we are aware of all that could go wrong in the system do we stand a better chance at preventing failure in the system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FACTORS THAT INFLUENCE ANTIFRAGILITY&lt;/strong&gt;&lt;br&gt;
Build, test, and fail fast: AWS and other public cloud providers have made building and developing systems easier because we can have multiple environments quickly. This has also enabled us to build and test quickly. Imagine spinning up a high-end server for a 30-minute test compared to having to rent that same server 20 years ago. The ease cannot be overemphasized. The ability to test fast also comes with failing fast, and when we fail fast, we learn fast. And by learning, we are able to measure the fragility of that environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Funq8sf42doa3imhrbx9k.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Funq8sf42doa3imhrbx9k.jpg" alt=" " width="758" height="416"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Size&lt;/strong&gt;: The business’s size is a critical factor. The stressors on a monolithic program with ten infrequent users cannot be the same as those on a mobile app with over 500k concurrent users. Determine the environmental risk level and apply antifragility accordingly.&lt;br&gt;
&lt;strong&gt;Complexity&lt;/strong&gt;: There are various ways that complexity can manifest itself; it might originate from the way certain functionalities are handled in the code or from the architecture of the entire infrastructure. I’ll use two AWS environments, ENV A and ENV B, as an example.&lt;br&gt;
&lt;strong&gt;ENV A&lt;/strong&gt; comprises a single server responsible for hosting file systems, databases, and the web server. In the event of a server failure, a recent backup can be deployed to replace the malfunctioning server. Additionally, when faced with a surge in traffic, auto-scaling mechanisms come into play to ensure the server remains operational. In this scenario, it can be asserted that the principles of disaster recovery contribute to the concept of antifragility.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ENV B&lt;/strong&gt;, on the other hand, is a complex, loosely coupled microservice environment that relies on Lambda functions, EventBridge, SNS, SQS, Step Functions, and databases. Since the playing field is now bigger, so is the risk: multiple parts can fail, which is what makes observability necessary. Observability will in turn provide insights that predict failures using patterns. We may then configure automated actions and alerts that, depending on the kind of action, can be set to reverse or effect changes.&lt;br&gt;
Because both environments were able to measure fragility and predict the second-order derivative, antifragility was introduced. This has also made it possible to build systems that anticipate variability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AREAS WHERE ANTIFRAGILITY CAN BE IMPLEMENTED&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Security&lt;/strong&gt;: Ensuring the security of our infrastructure is crucial, requiring a comprehensive approach from entry to exit. While traditional firewalls primarily served as detective systems, the evolution to next-generation firewalls (NGFWs) brings enhanced features. These include the Intrusion Prevention System (IPS), application awareness and control, and cloud-delivered threat intelligence, among others. NGFWs employ automation rules to detect anomalies and promptly respond by adjusting rules to mitigate potential threats. Notable examples of such advanced firewalls include AWS WAF and Fortinet FortiGate.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69ue106icazjvraec4u4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69ue106icazjvraec4u4.jpg" alt=" " width="800" height="657"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compute&lt;/strong&gt;: Public cloud providers, such as AWS and Azure, have taken significant steps to enhance system robustness. One notable feature at the compute level is Auto-Scaling, which dynamically adjusts resources in response to sudden increases in traffic. Additionally, Elastic Load Balancer (ELB) is a key service that spans multiple Availability Zones (AZs) and conducts regular health checks. This ensures that only healthy servers receive traffic, and in the event of any issues, the ELB automatically redirects traffic to other healthy instances from a pool.&lt;br&gt;
This approach guarantees continuous uptime for the environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Networking:&lt;/strong&gt; Networking is crucial for building robust systems, and achieving antifragility at the network level is essential for prioritizing interconnectivity. Spanning two networks can enhance this antifragility, with services like AWS Route 53 enabling availability at a global scale. Route 53, a scalable domain name system (DNS) web service, efficiently routes end-user requests to globally distributed endpoints, contributing to application availability and reliability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Monitoring and Observability:&lt;/strong&gt;&lt;br&gt;
To gain insight into patterns for measuring fragility, we require systems that bolster monitoring. Tools like CloudWatch, Prometheus, and Grafana are employed to establish alerts and updates when anomalies are detected. Observability tools, such as AWS X-RAY, are utilized to monitor existing systems. The insights gathered from observation are then leveraged to predict and anticipate anomalies, enhancing the predictability of fragility.&lt;/p&gt;

&lt;p&gt;By examining the breakdowns provided above, I hope I have been able to show you that building antifragile systems that can thrive in disorder is truly a robust way of development.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;QUICK SUMMARY&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Building anti-fragile systems is possible.&lt;br&gt;
Fragility should always be measured.&lt;br&gt;
The next part will focus on the day-to-day DevOps practices, including developing CI/CD pipelines, testing and integration, automated deployment, and monitoring.&lt;/p&gt;

&lt;p&gt;I hope you enjoyed this read. If you did, kindly connect with me on &lt;a href="https://www.linkedin.com/in/ogonna-nnamani/" rel="noopener noreferrer"&gt;LinkedIn.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HAPPY CLOUD COMPUTING!!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>antifragility</category>
      <category>cloud</category>
      <category>cloudcomputing</category>
    </item>
    <item>
      <title>The Kubernetes Resume Challenge Part 2</title>
      <dc:creator>Ogonna Nnamani</dc:creator>
      <pubDate>Sun, 24 Mar 2024 00:05:29 +0000</pubDate>
      <link>https://dev.to/cloudiepad/the-kubernetes-resume-challenge-part-2-2op1</link>
      <guid>https://dev.to/cloudiepad/the-kubernetes-resume-challenge-part-2-2op1</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/cloudiepad/the-kubernetes-resume-challenge-part-1-488d"&gt;&lt;strong&gt;CLICK FOR PART 1 HERE&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 10: Autoscale Your Application&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Task&lt;/strong&gt;: Automate scaling based on CPU usage to handle unpredictable traffic spikes.&lt;/p&gt;

&lt;p&gt;Implement HPA: Create a Horizontal Pod Autoscaler targeting 50% CPU utilization, with a minimum of 2 and a maximum of 10 pods.&lt;/p&gt;

&lt;p&gt;Apply HPA: Execute kubectl autoscale deployment ecom-web --cpu-percent=50 --min=2 --max=10.&lt;/p&gt;

&lt;p&gt;Simulate Load: Use a tool like Apache Bench to generate traffic and increase CPU load.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation&lt;/strong&gt;&lt;br&gt;
To implement this, instead of using the autoscale command, I generated an "hpa.yaml" file that defines the metrics that should trigger the autoscaling.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  namespace: default
  name: ecomm-hpa
  labels:
    app: ecomm-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ecomm-app-deployment
  minReplicas: 2 
  maxReplicas: 10  
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that we have defined our HPA, it does not mean the deployment will automatically scale. I encountered an error: no matter the amount of load testing done with Apache Bench, the pods did not scale.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqn4rs5i6ws608899e35y.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqn4rs5i6ws608899e35y.PNG" alt="HPA did not scale" width="693" height="74"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we can see here, when we apply the HPA to the deployment it is not able to track the current utilization of the pods, hence the "unknown" status. After some research, the fix was that the resources section under the container part of the deployment file was not defined. This is an example of how it is defined in a deployment file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  labels:
    app: sample
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
      - name: sample-app
        image: your-registry/sample-app:latest
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "300m"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Explanation&lt;/strong&gt;:&lt;br&gt;
resources: Specifies the computational resources (CPU, memory, etc.) needed by the pods.&lt;/p&gt;

&lt;p&gt;requests: Defines the minimum amount of resources required by the pods to run.&lt;/p&gt;

&lt;p&gt;cpu: "300m": Sets the CPU request to 300 milliCPU (300m), which represents 0.3 of a CPU core. This indicates the minimum amount of CPU that each pod in the deployment requires to function properly.&lt;/p&gt;

&lt;p&gt;After applying this updated file, to monitor it in real time and test quickly, I changed my HPA average utilization from 50% to 10% and ran&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get hpa -w
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and we get the following output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4mdotcb8lg4y8a65l01.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4mdotcb8lg4y8a65l01.PNG" alt="HPA now scales" width="611" height="177"&gt;&lt;/a&gt;&lt;br&gt;
The HPA finally tracked the CPU utilization, and our replicas are now scaling from 3 to 5.&lt;/p&gt;
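&lt;p&gt;For reference, the HPA itself can also be declared as a manifest rather than created imperatively. A minimal sketch, assuming the sample deployment's names; the 50% target mirrors the original setting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;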

&lt;p&gt;&lt;strong&gt;LOAD TESTING TOOLS&lt;/strong&gt;&lt;br&gt;
The guidelines suggested we use &lt;strong&gt;Apache Bench&lt;/strong&gt;, which I found really interesting. The command below sends simulated requests to the Kubernetes endpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ab -n 1000 -c 10 http://&amp;lt;endpoint_url or IP&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The ab command is used for benchmarking HTTP server performance. Here's what each part of the command above does:&lt;/li&gt;
&lt;li&gt;-n 1000: Specifies the number of requests to perform. In this case, it's set to 1000, meaning Apache Bench (ab) will send 1000 requests to the server.&lt;/li&gt;
&lt;li&gt;-c 10: Specifies the number of multiple requests to perform at a time. Here, it's set to 10, meaning Apache Bench will send 10 requests concurrently.&lt;/li&gt;
&lt;li&gt;http://&amp;lt;endpoint_url or IP&amp;gt;: Specifies the URL or IP address of the server to benchmark.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also discovered another approach, a &lt;strong&gt;kubectl load generator&lt;/strong&gt; pod, which runs load tests specifically against Kubernetes endpoints. With a single command, simulated requests begin flowing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://website-service; done"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;-i --tty: Allocates an interactive terminal for the command to run.&lt;/li&gt;
&lt;li&gt;load-generator: Name of the pod.&lt;/li&gt;
&lt;li&gt;--rm: Removes the pod after it terminates.&lt;/li&gt;
&lt;li&gt;--image=busybox: Specifies the Docker image to use for the pod.&lt;/li&gt;
&lt;li&gt;--restart=Never: Indicates that the pod should not be restarted automatically if it fails.&lt;/li&gt;
&lt;li&gt;/bin/sh -c "while sleep 0.01; do wget -q -O- &lt;a href="http://website-service" rel="noopener noreferrer"&gt;http://website-service&lt;/a&gt;; done": The command to run inside the pod, which continuously sends HTTP requests to the specified endpoint - (&lt;a href="http://website-service" rel="noopener noreferrer"&gt;http://website-service&lt;/a&gt;) with a delay of 0.01 seconds between requests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 11: Implement Liveness and Readiness Probes&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Task&lt;/strong&gt;: Add liveness and readiness probes to website-deployment.yaml, targeting an endpoint in your application that confirms its operational status.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation&lt;/strong&gt;&lt;br&gt;
This is another step where I reached out to a friend (a PHP developer) to configure health checks on both the website and the DB. Hitting the /app-healthcheck endpoint returned "App is running", and the same applied to the database.&lt;/p&gt;
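&lt;p&gt;A sketch of how such probes can be declared under the container spec; the /app-healthcheck path and port are assumptions based on the setup described:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        livenessProbe:
          httpGet:
            path: /app-healthcheck
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 15
        readinessProbe:
          httpGet:
            path: /app-healthcheck
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;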

&lt;p&gt;&lt;strong&gt;Step 12: Utilize ConfigMaps and Secrets&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Task&lt;/strong&gt;: Securely manage the database connection string and feature toggles without hardcoding them in the application.&lt;br&gt;
Create Secret and ConfigMap: For sensitive data like DB credentials, use a Secret. For non-sensitive data like feature toggles, use a ConfigMap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation&lt;/strong&gt;&lt;br&gt;
Similar to the feature-toggle-config, I generated a configmap.yaml file that stores the environment variables initially hardcoded in the deployment file. The configs now live in the ConfigMap and are referenced from the deployment file, while a website-secret.yaml and a mariadb-secret.yaml are also created to handle the DB credentials of both resources.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: ConfigMap
metadata:
  name: website-configmap
data:
  DB_NAME: "ecomdb"
  DB_HOST: "mariadb-service"
  DB_USER: "ecomdb-user"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Above is an example of a configmap.yaml file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
data:
  DB_PASSWORD: cGFzc3dvcmQxMjM= 
kind: Secret
metadata:
  name: db-secret
type: Opaque
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An example of a secret.yaml file.&lt;br&gt;
Now we reference this in our deployment.yaml file&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  labels:
    app: sample
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
      - name: sample-app
        image: your-registry/sample-app:latest
        ports:
        - containerPort: 80
        env:
        - name: DB_HOST
          valueFrom:
            configMapKeyRef:
              name: website-configmap
              key: DB_HOST
        - name: DB_USER
          valueFrom:
            configMapKeyRef:
              name: website-configmap
              key: DB_USER
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: DB_PASSWORD
        - name: DB_NAME
          valueFrom:
            configMapKeyRef:
              name: website-configmap
              key: DB_NAME
        - name: FEATURE_DARK_MODE
          valueFrom:
            configMapKeyRef:
              name: feature-toggle-config
              key: FEATURE_DARK_MODE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This references the configmap.yaml and secret.yaml files shown above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extra credit&lt;/strong&gt;:&lt;br&gt;
&lt;strong&gt;Package Everything in Helm&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Task&lt;/strong&gt;: Utilize Helm to package your application, making deployment and management on Kubernetes clusters more efficient and scalable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation&lt;/strong&gt;&lt;br&gt;
Helm charts make deploying and managing applications on Kubernetes clusters very efficient. By utilizing a values.yaml file, we can define the values used across our various manifests, making the chart generic and highly reusable. Below is a sample deployment template and a values.yaml supplying the specific configuration values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-app
  labels:
    app: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}
    spec:
      containers:
      - name: {{ .Release.Name }}-container
        image: {{ .Values.image.repository }}:{{ .Values.image.tag }}
        ports:
        - containerPort: {{ .Values.containerPort }}
        resources:
{{ toYaml .Values.resources | indent 10 }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sample Deployment.yaml file&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;replicaCount: 3
image:
  repository: your-registry/sample-app
  tag: latest
containerPort: 8080
resources:
  requests:
    cpu: "300m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sample Values.yaml file&lt;/p&gt;

&lt;p&gt;The deployment.yaml file is a Helm template file. It uses Go templating syntax to inject values from the values.yaml file.&lt;br&gt;
The values.yaml file defines configurable values for the Helm chart, such as the number of replicas, the Docker image repository and tag, container port, and resource requests and limits.&lt;/p&gt;

&lt;p&gt;Running the helm install command deploys our website to a Kubernetes cluster.&lt;/p&gt;
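&lt;p&gt;A minimal example, assuming the chart lives in the current directory and a release name of ecom-site:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install ecom-site . -f values.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;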

&lt;p&gt;&lt;strong&gt;Implement Persistent Storage&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Task&lt;/strong&gt;: Ensure data persistence for the MariaDB database across pod restarts and redeployments.&lt;/p&gt;

&lt;p&gt;Create a PVC: Define a PersistentVolumeClaim for MariaDB storage needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mariadb-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Mi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above is a sample pvc.yaml file that claims persistent storage decoupled from the pod lifecycle, ensuring that data survives pod restarts as long as the volume remains mounted.&lt;/p&gt;
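&lt;p&gt;On its own, the PVC only claims storage; the MariaDB deployment still has to mount it. A sketch of the relevant additions to the pod spec, where the mount path is MariaDB's default data directory and the image tag is illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;      containers:
      - name: mariadb
        image: mariadb:10.11
        volumeMounts:
        - name: mariadb-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mariadb-storage
        persistentVolumeClaim:
          claimName: mariadb-pvc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;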

&lt;p&gt;&lt;strong&gt;Implement Basic CI/CD Pipeline&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Task&lt;/strong&gt;: Automate the build and deployment process using GitHub Actions.&lt;br&gt;
GitHub Actions Workflow: Create a .github/workflows/deploy.yml file to build the Docker image, push it to Docker Hub, and update the Kubernetes deployment upon push to the main branch&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation&lt;/strong&gt;&lt;br&gt;
To automate this deployment, I generated a CI/CD workflow file that does the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses the azure/setup-kubectl action to install kubectl for our pipeline.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout code
      uses: actions/checkout@v3
    - name: Install kubectl
      uses: azure/setup-kubectl@v2.0
      with:
        version: 'v1.27.0' # default is latest stable
      id: install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Authenticates to an AWS account using an access key ID and secret access key.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- name: Configure AWS Credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws-region: us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;As we are deploying to ECR, we log in, then build, tag, and finally push the image to an ECR repository.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- name: Login to Amazon ECR Public
      id: login-ecr-public
      uses: aws-actions/amazon-ecr-login@v2
      with:
        registry-type: public

    - name: Build, tag, and push docker image to Amazon ECR Public
      env:
        REGISTRY: ${{ secrets.ECR_REGISTRY }}
        REPOSITORY: kubernetes-resume-challenge-repo
        IMAGE_TAG: latest
      run: |
        docker build -t $REGISTRY/$REPOSITORY:$IMAGE_TAG .
        docker push $REGISTRY/$REPOSITORY:$IMAGE_TAG
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The command below updates the kubeconfig file with the credentials and endpoint information necessary to connect to the Amazon EKS cluster named kubernetes-cluster. This allows subsequent steps in the pipeline to interact with the Kubernetes cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- name: Update kube config
  run: aws eks update-kubeconfig --name kubernetes-cluster
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The step below navigates to the Helm chart directory of this project to execute the helm uninstall and install commands. Because this is a continuous pipeline, the first command removes the release if it exists, and the chart is then re-installed to pick up new changes made to the project files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
- name: Deploy go-app helm chart to EKS
      run: |
        helm uninstall helm-app -n helm || true
        cd kubernetes/helm-app
        helm install helm-app . -f values.yaml -n helm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
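&lt;p&gt;As a side note, the same result can be achieved in one idempotent step with helm upgrade --install, which installs the release if it is absent and upgrades it otherwise:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm upgrade --install helm-app . -f values.yaml -n helm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;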



&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Ladies and gentlemen, we have come to the end of this wonderful project, and I didn't realize how much went into it until I had to document and relive the roller-coaster moments I had along the way. My knowledge of Kubernetes and the cloud in general has been stretched as a result. &lt;/p&gt;

&lt;p&gt;I hope you enjoyed going through this with me, and that it encourages you to try it yourself and have some fun while at it, because I sure did!&lt;/p&gt;

&lt;p&gt;To connect with me&lt;br&gt;
&lt;a href="https://www.linkedin.com/in/ogonna-nnamani/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/Waveey/Kubernetes-Resume-Challenge" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;br&gt;
&lt;a href="https://twitter.com/WavebuoyOG" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The Kubernetes Resume Challenge Part 1</title>
      <dc:creator>Ogonna Nnamani</dc:creator>
      <pubDate>Sun, 24 Mar 2024 00:04:44 +0000</pubDate>
      <link>https://dev.to/cloudiepad/the-kubernetes-resume-challenge-part-1-488d</link>
      <guid>https://dev.to/cloudiepad/the-kubernetes-resume-challenge-part-1-488d</guid>
      <description>&lt;p&gt;&lt;strong&gt;Intro&lt;/strong&gt;&lt;br&gt;
This is a two-part blog article on the steps I took while doing the Kubernetes resume challenge by &lt;a href="https://newsletter.goodtechthings.com/p/take-on-the-kubernetes-resume-challenge" rel="noopener noreferrer"&gt;Forrest Brazeal&lt;/a&gt;, who is also known for creating the popular &lt;a href="https://cloudresumechallenge.dev/docs/the-challenge/aws/" rel="noopener noreferrer"&gt;Cloud Resume Challenge&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;A little backstory: some months after I got into cloud computing as a DevOps Engineer, I really wanted to do the Cloud Resume Challenge, but something always got in the way of completing it. Most times, it was the lack of AWS credits (don't blame me, it's easy to drown in cloud bills as an enthusiastic newbie), or some other urgent project or task at work. At some point I gave up on the idea. &lt;/p&gt;

&lt;p&gt;Fast forward to March 2024, Forrest Brazeal is out with another challenge called "&lt;a href="https://cloudresumechallenge.dev/docs/extensions/kubernetes-challenge/?utm_source=substack&amp;amp;utm_medium=email" rel="noopener noreferrer"&gt;The Kubernetes Resume&lt;/a&gt;", Click on the link to access this challenge and guidelines. Ignore the name, this is far from a resume! &lt;br&gt;
I got right into it and finally completed it. Link to my GitHub repo here. &lt;br&gt;
Come along as I walk you through the steps I followed to accomplish this task.&lt;br&gt;
Let's jump straight in!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario&lt;/strong&gt;&lt;br&gt;
We have to deploy a PHP e-commerce website, a web application facing challenges around scalability and availability. To address these, we will leverage containerization with Docker and orchestration with Kubernetes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Docker and Kubernetes CLI Tools&lt;/li&gt;
&lt;li&gt;Cloud Provider Account: Access to AWS, Azure, or GCP for - 
creating a Kubernetes cluster.&lt;/li&gt;
&lt;li&gt;GitHub Account&lt;/li&gt;
&lt;li&gt;Kubernetes Crash Course&lt;/li&gt;
&lt;li&gt;E-commerce Application Source Code and DB Scripts: Available at 
kodekloudhub/learning-app-ecommerce.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Certification&lt;/strong&gt;&lt;br&gt;
The first step is to have the CKAD certification or complete the CKAD course on KodeKloud, which I concluded last year, but I had become rusty; nonetheless, I proceeded. (Don't be like me, take the course if you're not hands-on with Kubernetes.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Containerize Your E-Commerce Website and Database&lt;/strong&gt;&lt;br&gt;
The second step is containerizing the e-commerce website and the database. Your proficiency with Docker is tested here because you have to update the database connection string and attach an initialization script that is mounted on the database during creation. That means creating two Dockerfiles, one for the website and another for the database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Website&lt;/strong&gt;&lt;br&gt;
Use php:7.4-apache as the base image,&lt;br&gt;
install the mysqli extension for PHP,&lt;br&gt;
and expose port 80.&lt;br&gt;
Test this to ensure there are no errors during the build.&lt;/p&gt;
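&lt;p&gt;Putting those steps together, the website Dockerfile can be sketched roughly like this; the COPY path assumes the site source sits at the repo root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Use PHP 7.4 with Apache as the base image
FROM php:7.4-apache

# Install the mysqli extension the app uses to talk to MariaDB
RUN docker-php-ext-install mysqli

# Copy the application source into Apache's web root
COPY . /var/www/html/

EXPOSE 80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;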

&lt;p&gt;&lt;strong&gt;Database &lt;/strong&gt;&lt;br&gt;
The database Dockerfile was built using an official MariaDB image, but a database initialization script will be mounted on launch of the database. I spent a lot of time working on this step as I am not really a fan of PHP. I was torn between having the script as an entrypoint script or as a Kubernetes ConfigMap object that will be used on my deployment. &lt;/p&gt;

&lt;p&gt;The errors I had were because I was trying to create a .env file, but I just needed to hardcode the values either in the Dockerfile or as the website ConfigMap variables. Either way, we need to ensure that the application has variables referencing DB_NAME, DB_HOST, DB_PASSWORD and DB_USER.&lt;/p&gt;
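&lt;p&gt;In the Dockerfile, hardcoding them looks something like this; the values shown are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ENV DB_HOST=mariadb-service \
    DB_NAME=ecomdb \
    DB_USER=ecomdb-user \
    DB_PASSWORD=password
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;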

&lt;p&gt;Now, after building locally, we push to Docker Hub so the image can be pulled in the deployment file. The commands below can be used to build, tag and push to Docker Hub.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker build -t cloudiepad/ecomm-img:v5 .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker push cloudiepad/ecomm-img:v5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3: Set Up Kubernetes on a Public Cloud Provider&lt;/strong&gt;&lt;br&gt;
As an AWS guy I used EKS to set up my clusters, If you are new to kubernetes as a whole, I would advise testing your cluster locally using minikube before going ahead to provision using EKS. it's pretty easy to rack up cloud debts using managed services on the cloud. &lt;/p&gt;

&lt;p&gt;It's best to test and ensure that all your manifest files are correct and working before deploying to EKS.&lt;br&gt;
 &lt;br&gt;
&lt;strong&gt;Steps to deploy an EKS cluster using CLI&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install and configure AWS CLI&lt;/li&gt;
&lt;li&gt;Install eksctl: This is the EKS command-line tool that enables us to run commands against EKS from the CLI.&lt;/li&gt;
&lt;li&gt;Install kubectl: we will need this to interact with our cluster.&lt;/li&gt;
&lt;li&gt;To set up an EKS cluster, ensure that the IAM user has the appropriate permissions and roles to access both ECR and EKS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use eksctl to create a cluster with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl create cluster --name eks-cluster --region us-east-1 --zones=us-east-1a,us-east-1b --nodegroup-name node-group --node-type t2.small --nodes 2 --nodes-min 2 --nodes-max 5 --managed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;eks-cluster&lt;/code&gt;,&lt;code&gt;us-east-1&lt;/code&gt; and &lt;code&gt;us-east-1a,us-east-1b&lt;/code&gt; with your preferred cluster name, AWS region and availability zones.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;create cluster: This part of the command instructs eksctl to create a new EKS cluster.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;name eks-cluster: This specifies the name of the EKS cluster to be created, in this case, it's named "eks-cluster".&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;region us-east-1: Specifies the AWS region in which the cluster will be created. In this case, it's US East (N. Virginia).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;zones=us-east-1a,us-east-1b: Specifies the availability zones in which the worker nodes will be created. In this example, it's specifying us-east-1a and us-east-1b.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;nodegroup-name node-group: This specifies the name of the node group within the EKS cluster.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;node-type t2.small: Specifies the EC2 instance type for the worker nodes. In this case, it's t2.small.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;nodes 2: Specifies the initial number of worker nodes in the node group. In this example, it's set to 2.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;nodes-min 2: Specifies the minimum number of worker nodes in the node group. In this example, it's set to 2.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;nodes-max 5: Specifies the maximum number of worker nodes in the node group. In this example, it's set to 5.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;managed: Indicates that this node group will be managed by Amazon EKS.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Deploy Your Website to Kubernetes&lt;/strong&gt;&lt;br&gt;
At this stage, I generated a &lt;em&gt;website-deployment.yaml&lt;/em&gt; file that declares my website deployment including the name of the website, the amount of replicas I want at all times, the name of my docker image and the location. &lt;/p&gt;

&lt;p&gt;There is also a need to generate a mariadb-deployment.yaml file for the database. Instead of using a plain MariaDB image for the DB, I baked all the environment variables into my Dockerfile, so that when the container is spun up, it has all the details I need already in place. &lt;/p&gt;

&lt;p&gt;Also ensure that the database pod has the db-load-script.sql script loaded onto the database and that it creates a database called 'ecomdb'. If you see that database, then you have successfully loaded the script as an entrypoint. &lt;/p&gt;
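&lt;p&gt;A quick way to verify this is to exec into the database pod and list the databases; the pod name here is a placeholder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl exec -it &amp;lt;mariadb-pod-name&amp;gt; -- mysql -u ecomdb-user -p -e "SHOW DATABASES;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;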

&lt;p&gt;Another issue that I encountered several times was my DB user being unable to access the DB.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR 1045 (28000): Access denied for user 'ecomm-user'@ 'localhost' (using password: NO)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This error happens when the right password is not captured in the Dockerfile. Little errors like this can consume a lot of time. Watch out: the password the website expects and the password used to launch the DB should be the same.&lt;/p&gt;

&lt;p&gt;Below is an example of a Kubernetes deployment.yaml file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  labels:
    app: sample
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
      - name: sample-app
        image: your-registry/sample-app:latest
        ports:
        - containerPort: 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 5: Expose Your Website&lt;/strong&gt;&lt;br&gt;
In this stage we have to generate another YAML file, a service.yaml; we can call this one website-service.yaml, and as the guidelines state, we create a service of type LoadBalancer. &lt;br&gt;
When we deploy this service to EKS, AWS provisions a load balancer (a Classic Load Balancer by default, not an ALB) to serve our website. A corresponding service is created for the database, though the DB service does not need to be publicly exposed. A Kubernetes service.yaml file looks like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: sample-service
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: sample
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 6: Implement Configuration Management&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Task&lt;/strong&gt;: Add a feature toggle to the web application to enable a "dark mode" for the website. &lt;/p&gt;

&lt;p&gt;Modify the Web Application: Add a simple feature toggle in the application code (e.g., an environment variable FEATURE_DARK_MODE that enables a CSS dark theme).&lt;/p&gt;

&lt;p&gt;Use ConfigMaps: Create a ConfigMap named feature-toggle-config with the data FEATURE_DARK_MODE=true.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation&lt;/strong&gt;&lt;br&gt;
This stage was a bit tricky for me because I am not a developer, and even as a DevOps Engineer, PHP is not my favorite. The aim of this stage is to show that ConfigMaps can be used in several capacities. Anyway, I reached out to a friend of mine who is a PHP developer, and he helped me refactor the code and added a "style-dark.css" file to actually do the toggling. I was then tasked with implementing this feature via a ConfigMap. I finally figured that out, and my ConfigMap file looked like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-toggle-config
data:
  FEATURE_DARK_MODE: "true"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ConfigMap also had to be referenced in the deployment file so the deployment is aware of this addition. When that is completed, run&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f feature-toggle-config.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;to apply this config, and voila, we had a dark-mode website. I felt like Superman after this, haha!&lt;/p&gt;
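&lt;p&gt;For completeness, referencing the ConfigMap in the deployment's container spec looks roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        env:
        - name: FEATURE_DARK_MODE
          valueFrom:
            configMapKeyRef:
              name: feature-toggle-config
              key: FEATURE_DARK_MODE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;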

&lt;p&gt;&lt;strong&gt;Step 7: Scale Your Application&lt;/strong&gt;&lt;br&gt;
At this point we have deployment files, service files and a ConfigMap file to toggle dark mode. The next task is to scale this website using&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl scale deployment/&amp;lt;deployment_name&amp;gt; --replicas=6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt; This command refers to the active deployment and tells it to scale to 6 replicas.&lt;br&gt;
The scaling happens immediately and can be observed in real time by running&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pods -w
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We now observe 6 pods in the Running state, as against the initial replica count.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 8: Perform a Rolling Update&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Task&lt;/strong&gt;: Update the website to include a new promotional banner for the marketing campaign.&lt;/p&gt;

&lt;p&gt;Update Application: Modify the web application's code to include the promotional banner.&lt;/p&gt;

&lt;p&gt;Build and Push New Image: Build the updated Docker image as yourdockerhubusername/ecom-web:v2 and push it to Docker Hub.&lt;br&gt;
Rolling Update: Update website-deployment.yaml with the new image version and apply the changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation&lt;/strong&gt;&lt;br&gt;
This stage involves making a change to the website, building and pushing a new Docker image version, and updating the deployment with that new image version. &lt;br&gt;
Also, instead of just applying this new deployment, we use the rolling-update mechanism, where old pods terminate while new pods are created simultaneously so that there is no downtime, thereby enhancing availability.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl rollout status deployment/&amp;lt;deployment_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &amp;lt;deployment_name&amp;gt; with the name of your deployment. This command will provide you with the status of the rollout, including whether it's in progress, successful, or failed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 9: Roll Back a Deployment&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Task&lt;/strong&gt;: Suppose the new banner introduced a bug. Roll back to the previous version.&lt;/p&gt;

&lt;p&gt;Identify Issue: After deployment, monitoring tools indicate a problem affecting user experience.&lt;/p&gt;

&lt;p&gt;Roll Back: Execute kubectl rollout undo deployment/ecom-web to revert to the previous deployment state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation&lt;/strong&gt;&lt;br&gt;
This is similar to the last step but the exact opposite: how do we undo a rollout that causes our website to break? First, we roll back to a working version while we troubleshoot the issue with that particular buggy release.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl rollout undo deployment/&amp;lt;deployment_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above command seamlessly handles this.&lt;/p&gt;
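&lt;p&gt;Before undoing, it can help to inspect the revision history; a specific revision can also be targeted:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl rollout history deployment/&amp;lt;deployment_name&amp;gt;
kubectl rollout undo deployment/&amp;lt;deployment_name&amp;gt; --to-revision=2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;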

&lt;p&gt;Let's take a coffee break now, and then move on to the second part of the project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/cloudiepad/the-kubernetes-resume-challenge-part-2-2op1"&gt;&lt;strong&gt;CLICK HERE FOR PART 2 &lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
If you enjoyed this article and would love to connect with me. Find me with the below links.&lt;br&gt;
&lt;a href="https://www.linkedin.com/in/ogonna-nnamani/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/Waveey/Kubernetes-Resume-Challenge" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;br&gt;
&lt;a href="https://twitter.com/wavebuoyOG" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>kubernetesresumechallenge</category>
      <category>docker</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
