<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mustkhim Inamdar</title>
    <description>The latest articles on DEV Community by Mustkhim Inamdar (@mustkhim_inamdar).</description>
    <link>https://dev.to/mustkhim_inamdar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2403410%2Fb2d0d004-b829-4e07-a27a-abe460fa25f0.jpeg</url>
      <title>DEV Community: Mustkhim Inamdar</title>
      <link>https://dev.to/mustkhim_inamdar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mustkhim_inamdar"/>
    <language>en</language>
    <item>
      <title>Troubleshooting Real-World AWS EKS Issues in Production</title>
      <dc:creator>Mustkhim Inamdar</dc:creator>
      <pubDate>Fri, 11 Jul 2025 04:04:37 +0000</pubDate>
      <link>https://dev.to/mustkhim_inamdar/troubleshooting-real-world-aws-eks-issues-in-production-3331</link>
      <guid>https://dev.to/mustkhim_inamdar/troubleshooting-real-world-aws-eks-issues-in-production-3331</guid>
      <description>&lt;p&gt;&lt;em&gt;by &lt;a href="https://www.linkedin.com/in/m-inamdar" rel="noopener noreferrer"&gt;M Inamdar&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Amazon EKS makes it easier to run Kubernetes workloads in the cloud, but as any platform engineer knows, production-grade reliability still demands deep visibility, sound architecture, and well-drilled troubleshooting.&lt;/p&gt;

&lt;p&gt;In this post, I’ll walk you through some real-world EKS incidents I’ve personally resolved. You'll find:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Root causes (RCA)&lt;/li&gt;
&lt;li&gt;Troubleshooting steps&lt;/li&gt;
&lt;li&gt;Fixes and lessons learned&lt;/li&gt;
&lt;li&gt;Diagrams and code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s get into it 👇&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;1. Node in &lt;code&gt;NotReady&lt;/code&gt; State&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Symptoms&lt;/em&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;kubectl get nodes&lt;/code&gt; → shows &lt;code&gt;NotReady&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Pods evicted or stuck
&lt;/li&gt;
&lt;li&gt;High node disk usage or kubelet crash&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Fix&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check disk space&lt;/span&gt;
&lt;span class="nb"&gt;df&lt;/span&gt; &lt;span class="nt"&gt;-h&lt;/span&gt;

&lt;span class="c"&gt;# Clear logs&lt;/span&gt;
&lt;span class="nb"&gt;sudo truncate&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; 0 /var/log/containers/&lt;span class="k"&gt;*&lt;/span&gt;.log

&lt;span class="c"&gt;# Restart kubelet&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart kubelet

&lt;span class="c"&gt;# Replace node&lt;/span&gt;
kubectl drain &amp;lt;node&amp;gt; &lt;span class="nt"&gt;--ignore-daemonsets&lt;/span&gt; &lt;span class="nt"&gt;--delete-emptydir-data&lt;/span&gt;
kubectl delete node &amp;lt;node&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
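&lt;p&gt;Before truncating logs, it is worth confirming that disk usage is really what pushed the node over. A minimal sketch to run on the node itself, assuming an 85% threshold and a hypothetical helper name &lt;code&gt;disk_pressure&lt;/code&gt; (neither is a kubelet setting):&lt;/p&gt;

```shell
# Hypothetical helper: succeeds when a filesystem is at or above a usage threshold.
# Usage: disk_pressure MOUNTPOINT [THRESHOLD_PERCENT]
disk_pressure() {
  mount_point="$1"
  threshold="${2:-85}"   # 85% is an assumed default, not a kubelet setting
  # df -P prints one POSIX-format line per filesystem; column 5 is the used percentage
  used=$(df -P "$mount_point" | awk 'NR == 2 { sub("%", "", $5); print $5 }')
  [ "$used" -ge "$threshold" ]
}

if disk_pressure / 85; then
  echo "Root filesystem is at or above 85%: clear logs or replace the node"
else
  echo "Disk usage looks fine: check kubelet logs with journalctl -u kubelet"
fi
```

&lt;p&gt;If the disk is not the problem, restarting the kubelet or replacing the node is usually the faster path than cleaning up files.&lt;/p&gt;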



&lt;p&gt;🌐 Visual: Node in NotReady State&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+------------------------+
|      EKS Node          |
|------------------------|
|  Kubelet process       |
|  Disk usage &amp;gt; 85%      |
|  Memory pressure       |
+-----------+------------+
            |
            | Heartbeat to API server fails
            v
+------------------------+
| Kubernetes Control Plane|
|------------------------|
| Node marked NotReady    |
| Events generated        |
+-----------+------------+
            |
            | Admin runs diagnostics
            v
+------------------------+
| Resolution actions      |
| - Clear disk/logs       |
| - Restart kubelet       |
| - Drain/replace node    |
+------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. LoadBalancer Service Stuck in Pending&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Symptoms&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;kubectl get svc&lt;/code&gt; → EXTERNAL-IP = &lt;code&gt;&amp;lt;pending&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;No ELB visible in AWS Console&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Fix&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Tag subnets&lt;/span&gt;
aws ec2 create-tags &lt;span class="nt"&gt;--resources&lt;/span&gt; &amp;lt;subnet-id&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tags&lt;/span&gt; &lt;span class="nv"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;kubernetes.io/cluster/&amp;lt;cluster&amp;gt;,Value&lt;span class="o"&gt;=&lt;/span&gt;shared

&lt;span class="c"&gt;# Install AWS Load Balancer Controller&lt;/span&gt;
helm repo add eks https://aws.github.io/eks-charts
helm &lt;span class="nb"&gt;install &lt;/span&gt;aws-load-balancer-controller eks/aws-load-balancer-controller &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;clusterName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;cluster-name&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; serviceAccount.name&lt;span class="o"&gt;=&lt;/span&gt;aws-load-balancer-controller &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
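&lt;p&gt;Besides the cluster tag, subnet auto-discovery also relies on role tags: &lt;code&gt;kubernetes.io/role/elb&lt;/code&gt; on public subnets and &lt;code&gt;kubernetes.io/role/internal-elb&lt;/code&gt; on private ones. A small sketch, where &lt;code&gt;subnet_role_tag&lt;/code&gt; is a hypothetical helper and &lt;code&gt;SUBNET_ID&lt;/code&gt; is a placeholder. It only prints the command so you can review it before running it:&lt;/p&gt;

```shell
# Hypothetical helper: print the role tag key expected on a given subnet type.
subnet_role_tag() {
  case "$1" in
    public)  echo "kubernetes.io/role/elb" ;;           # internet-facing load balancers
    private) echo "kubernetes.io/role/internal-elb" ;;  # internal load balancers
    *)       return 1 ;;                                # unknown subnet type
  esac
}

# Print (rather than run) the tagging command so it can be reviewed first.
# SUBNET_ID is a placeholder for a real subnet ID.
echo aws ec2 create-tags --resources SUBNET_ID --tags "Key=$(subnet_role_tag public),Value=1"
```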



&lt;p&gt;🌐 Visual: EKS LoadBalancer Flow&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------------------------+
|  Kubernetes Service (LB) |
|  Type: LoadBalancer      |
|  Exposes app externally  |
+-----------+--------------+
            |
            | Triggers AWS ELB provisioning
            v
+--------------------------+
|   aws-load-balancer-     |
|       controller         |
| (Installed via Helm)     |
+-----------+--------------+
            |
            | Creates ELB in AWS
            v
+--------------------------+
|     AWS Elastic LB       |
| - Internet-facing ELB    |
| - Listens on 80/443      |
+-----------+--------------+
            |
            | Forwards traffic to EKS nodes
            v
+--------------------------+
|     EKS Worker Nodes     |
| - Backed by Target Group |
| - Runs app pods          |
+--------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Pods in CrashLoopBackOff&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Symptoms&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Application crashes repeatedly&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;kubectl describe pod&lt;/code&gt; shows &lt;code&gt;BackOff&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Logs show stack traces or missing configs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Fix&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check logs&lt;/span&gt;
kubectl logs &amp;lt;pod&amp;gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &amp;lt;container&amp;gt;

&lt;span class="c"&gt;# Edit deployment and fix&lt;/span&gt;
kubectl edit deployment &amp;lt;deployment-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🌐 Visual: CrashLoopBackOff Lifecycle&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+------------------------+
|      Kubernetes        |
|     Deployment/Job     |
+----------+-------------+
           |
           | Schedules Pod
           v
+------------------------+
|        Pod Starts      |
|  init containers run   |
+----------+-------------+
           |
           | Starts Main Container
           v
+------------------------+
|  Container Crashes     |  &amp;lt;---- App bug, bad config, secret missing
|  Exit Code ≠ 0         |
+----------+-------------+
           |
           | Kubernetes Restarts Pod
           v
+------------------------+
| CrashLoopBackOff Timer |
+------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
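&lt;p&gt;To get a feel for the backoff timer: the kubelet's documented default is a restart delay that starts at 10s, doubles after each crash, and is capped at 300s. An illustrative calculation (&lt;code&gt;backoff_delay&lt;/code&gt; is a hypothetical name, not a kubectl feature):&lt;/p&gt;

```shell
# Illustrative only: approximate kubelet delay before the Nth restart.
# Assumes the documented defaults: 10s base, doubling, capped at 300s.
backoff_delay() {
  restarts="$1"
  delay=10
  i=1
  while [ "$i" -lt "$restarts" ]; do
    delay=$((delay * 2))
    if [ "$delay" -ge 300 ]; then
      delay=300
      break
    fi
    i=$((i + 1))
  done
  echo "$delay"
}

for n in 1 2 3 4 5 6 7; do
  echo "restart $n: wait $(backoff_delay "$n")s"
done
```

&lt;p&gt;From the sixth restart onward the container is retried only every five minutes, which is why a crash loop looks slower and slower the longer it runs.&lt;/p&gt;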



&lt;p&gt;&lt;strong&gt;4. API Server Latency &amp;amp; 5xx Errors&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Symptoms&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;kubectl&lt;/code&gt; commands slow or fail&lt;/li&gt;
&lt;li&gt;Elevated API server latency and error metrics in CloudWatch&lt;/li&gt;
&lt;li&gt;Prometheus or controllers hammering API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Fix&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Example Prometheus config&lt;/span&gt;
&lt;span class="na"&gt;global&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;scrape_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;60s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Batch CRD updates&lt;/li&gt;
&lt;li&gt;Add jitter/delay in loops&lt;/li&gt;
&lt;/ul&gt;
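&lt;p&gt;Jitter can be as simple as adding a random offset to each sleep, so that many clients do not hit the API server in the same second. A sketch assuming a POSIX shell with &lt;code&gt;/dev/urandom&lt;/code&gt; available; &lt;code&gt;jittered_delay&lt;/code&gt; and the 30s/10s values are hypothetical:&lt;/p&gt;

```shell
# Hypothetical helper: print BASE seconds plus 0..SPREAD seconds of random jitter.
jittered_delay() {
  base="$1"
  spread="$2"
  # Two random bytes from the kernel, read as an unsigned integer (0..65535)
  rand=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
  echo $((base + rand % (spread + 1)))
}

# The echo stands in for a real loop body such as: kubectl get pods; sleep "$delay"
for round in 1 2 3; do
  delay=$(jittered_delay 30 10)
  echo "round $round: would sleep ${delay}s before the next API call"
done
```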

&lt;p&gt;🌐 Visual: API Server Under Stress&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-----------------------+
|      Clients          |
| - kubectl, CI/CD      |
| - Prometheus scrapes  |
| - Controllers         |
+-----------+-----------+
            |
            | High-frequency requests
            v
+----------------------------+
|     Kubernetes API Server |
| - Latency / 5xx errors     |
| - Queued requests          |
+-----------+---------------+
            |
            v
+-----------------------------+
|            etcd             |
| - Slow writes/timeouts      |
+-----------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;5. DNS Resolution Fails in Pod&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Symptoms&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;App can't resolve DNS (e.g., can't reach &lt;code&gt;mydb.default.svc&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ping&lt;/code&gt; or &lt;code&gt;nslookup&lt;/code&gt; fails inside pod&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;resolv.conf&lt;/code&gt; points to broken CoreDNS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Fix&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl rollout restart deployment coredns &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system

&lt;span class="c"&gt;# Edit CoreDNS config if needed&lt;/span&gt;
kubectl edit configmap coredns &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
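&lt;p&gt;Before restarting CoreDNS, check which resolver the pod is actually using. A minimal sketch to run inside the affected pod; &lt;code&gt;pod_nameservers&lt;/code&gt; is a hypothetical helper, and on EKS the CoreDNS Service is normally named &lt;code&gt;kube-dns&lt;/code&gt;:&lt;/p&gt;

```shell
# Hypothetical helper: list the nameserver IPs in a resolv.conf-style file.
pod_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Inside a pod this should print the kube-dns Service IP, which can be compared with:
#   kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'
if [ -r /etc/resolv.conf ]; then
  pod_nameservers /etc/resolv.conf
fi
```

&lt;p&gt;If the two IPs differ, the problem is the pod's DNS configuration rather than CoreDNS itself.&lt;/p&gt;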



&lt;p&gt;🌐 Visual: DNS Failure Inside Pod&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+--------------------+        +---------------------+
|   Pod in EKS Node  |  --&amp;gt;   |    CoreDNS Pod      |
| /etc/resolv.conf   |        | Resolves DNS queries|
| nameserver 10.x.x.x|        +----------+----------+
+--------+-----------+                   |
         |                               v
         |                     +---------------------+
         |                     |   VPC Network Layer  |
         |                     | - NACL / SG rules    |
         +--------------------&amp;gt;+---------------------+
                                |
                                v
                        DNS Query Fails
                      (Timeout or No Response)
                                |
                                v
                    App Failure / Connection Error
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;💡 Final Takeaways&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Always run postmortems&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Ensure IAM/SG/Subnet configs are in place&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Monitor metrics, logs, and event streams&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Automate node drain and self-healing&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;About the Author&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Mustkhim Inamdar&lt;/em&gt;&lt;br&gt;
Cloud-Native DevOps Architect | Platform Engineer | CI/CD Specialist&lt;br&gt;
Passionate about automation, scalability, and next-gen tooling. With years of experience across Big Data, Cloud Operations (AWS), CI/CD, and DevOps for automotive systems, I’ve delivered robust solutions using tools like Terraform, Jenkins, Kubernetes, LDRA, Polyspace, MATLAB/Simulink, and more.&lt;/p&gt;

&lt;p&gt;I love exploring emerging tech like GitOps, MLOps, and Generative AI, and sharing practical insights from real-world projects.&lt;br&gt;
🔗 &lt;a href="https://www.linkedin.com/in/m-inamdar" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;br&gt;
🔗 &lt;a href="https://github.com/M-Inamdar" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Do bookmark ⭐ if you found this helpful. Comment below if you want me to share full runbooks or reusable Terraform modules I’ve built for EKS production clusters.&lt;/p&gt;




</description>
      <category>aws</category>
      <category>devops</category>
      <category>kubernetes</category>
      <category>eks</category>
    </item>
    <item>
      <title>Jenkins in Production: Real Issues, RCA, and Fixes That Actually Work</title>
      <dc:creator>Mustkhim Inamdar</dc:creator>
      <pubDate>Tue, 01 Jul 2025 11:35:15 +0000</pubDate>
      <link>https://dev.to/mustkhim_inamdar/jenkins-in-production-real-issues-rca-and-fixes-that-actually-work-3bfn</link>
      <guid>https://dev.to/mustkhim_inamdar/jenkins-in-production-real-issues-rca-and-fixes-that-actually-work-3bfn</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This isn’t theory. These are real production issues I faced with Jenkins, documented with the actual RCA, the troubleshooting I followed, and how I fixed and hardened the pipeline after recovery.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  1. Jenkins Master Became Unresponsive During Peak Hours
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What happened:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Jenkins UI crashed
&lt;/li&gt;
&lt;li&gt;Builds queued indefinitely
&lt;/li&gt;
&lt;li&gt;Engineers across multiple teams blocked&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Duration of impact:
&lt;/h3&gt;

&lt;p&gt;~2 hours&lt;/p&gt;




&lt;h3&gt;
  
  
  Root Cause:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;JVM heap space exhausted → &lt;code&gt;OutOfMemoryError&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Disk usage at 100% → no cleanup of old builds/artifacts
&lt;/li&gt;
&lt;li&gt;SCM hooks and Git polling overwhelmed the executor queue
&lt;/li&gt;
&lt;li&gt;No workspace cleanup on matrix builds&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Troubleshooting
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; /var/log/jenkins/jenkins.log
jcmd &amp;lt;pid&amp;gt; GC.heap_info
&lt;span class="nb"&gt;df&lt;/span&gt; &lt;span class="nt"&gt;-h&lt;/span&gt;
htop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Killed large zombie processes&lt;/li&gt;
&lt;li&gt;Cleared build directories&lt;/li&gt;
&lt;li&gt;Restarted Jenkins after freeing up memory&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  ✅ Fixes Applied
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;JVM Memory Configuration&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;JENKINS_JAVA_OPTIONS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"-Xms2g -Xmx4g"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;strong&gt;Pipeline Cleanup &amp;amp; Discarder&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight groovy"&gt;&lt;code&gt;&lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;buildDiscarder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logRotator&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nl"&gt;numToKeepStr:&lt;/span&gt; &lt;span class="s1"&gt;'10'&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;post&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;always&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;cleanWs&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;strong&gt;Log Rotation &amp;amp; Backup&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-czf&lt;/span&gt; jenkins_backup_&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%F&lt;span class="si"&gt;)&lt;/span&gt;.tar.gz /var/lib/jenkins
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;strong&gt;SCM/Webhook Control&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub webhook throttled&lt;/li&gt;
&lt;li&gt;Added quiet periods and throttle plugin&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  2. Jenkins Agent Disconnecting Mid-Build
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What happened:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Builds failed halfway through execution&lt;/li&gt;
&lt;li&gt;Logs were incomplete&lt;/li&gt;
&lt;li&gt;Rebuilds triggered, wasting compute&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Root Cause:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;SSH/JNLP connection dropped due to firewall timeout&lt;/li&gt;
&lt;li&gt;Cloud auto-scaling agents terminated mid-job&lt;/li&gt;
&lt;li&gt;No agent lifecycle hooks defined&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Troubleshooting
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Checked &lt;code&gt;jenkins.log&lt;/code&gt; and agent logs&lt;/li&gt;
&lt;li&gt;Verified cloud termination settings&lt;/li&gt;
&lt;li&gt;Monitored for memory/cpu bottlenecks on agents&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  ✅ Fixes Applied
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Enable SSH Keep-Alive&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
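&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# SSH client config (e.g. ~/.ssh/config): probe every 60s, drop after 5 missed replies&lt;/span&gt;
ServerAliveInterval 60
ServerAliveCountMax 5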
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;strong&gt;Cloud Agent Protection&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Graceful shutdown scripts&lt;/li&gt;
&lt;li&gt;Increased idle timeout before scale-in&lt;/li&gt;
&lt;li&gt;Only terminate idle agents with no active job&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  3. Jenkins Master UI Was Freezing Frequently
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What happened:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Jenkins dashboard became sluggish&lt;/li&gt;
&lt;li&gt;Admin actions timed out&lt;/li&gt;
&lt;li&gt;Pipelines remained queued&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Root Cause:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Scripted pipelines executing heavy shell operations on master&lt;/li&gt;
&lt;li&gt;Plugins with memory leaks&lt;/li&gt;
&lt;li&gt;Too many concurrent builds running on the master’s executors&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Troubleshooting
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Analyzed &lt;code&gt;/metrics&lt;/code&gt; endpoint&lt;/li&gt;
&lt;li&gt;Monitored heap and thread dumps&lt;/li&gt;
&lt;li&gt;Disabled high-impact plugins&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  ✅ Fixes Applied
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Move All Jobs Off Master&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set “Restrict where this project runs”&lt;/li&gt;
&lt;li&gt;Enforce master isolation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Switch to Declarative Pipelines&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduced memory footprint&lt;/li&gt;
&lt;li&gt;Improved readability and safety&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Enable Monitoring&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Installed &lt;a href="https://plugins.jenkins.io/metrics/" rel="noopener noreferrer"&gt;Metrics Plugin&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Integrated with Prometheus + Grafana&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  4. Secrets Leaked into Logs and Artifacts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What happened:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Tokens and &lt;code&gt;.pem&lt;/code&gt; files showed up in Jenkins logs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.env&lt;/code&gt; files archived as pipeline artifacts&lt;/li&gt;
&lt;li&gt;Security audit failed&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Root Cause:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use of &lt;code&gt;echo $TOKEN&lt;/code&gt; inside &lt;code&gt;sh&lt;/code&gt; blocks&lt;/li&gt;
&lt;li&gt;No use of &lt;code&gt;withCredentials&lt;/code&gt; wrapper&lt;/li&gt;
&lt;li&gt;Logs not masked automatically&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Troubleshooting
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Scanned logs using keywords like &lt;code&gt;AKIA&lt;/code&gt;, &lt;code&gt;BEGIN RSA&lt;/code&gt;, etc.&lt;/li&gt;
&lt;li&gt;Reviewed artifact contents for secrets&lt;/li&gt;
&lt;li&gt;Reviewed pipeline code across teams&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  ✅ Fixes Applied
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Mask Secrets&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight groovy"&gt;&lt;code&gt;&lt;span class="n"&gt;withCredentials&lt;/span&gt;&lt;span class="o"&gt;([&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nl"&gt;credentialsId:&lt;/span&gt; &lt;span class="s1"&gt;'secret-token'&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nl"&gt;variable:&lt;/span&gt; &lt;span class="s1"&gt;'TOKEN'&lt;/span&gt;&lt;span class="o"&gt;)])&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;sh&lt;/span&gt; &lt;span class="s1"&gt;'curl -H "Authorization: Bearer $TOKEN" https://secure-api/'&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;strong&gt;Block Sensitive Artifact Upload&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight groovy"&gt;&lt;code&gt;&lt;span class="n"&gt;steps&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;sh&lt;/span&gt; &lt;span class="s1"&gt;'''
    if grep -rqE 'AKIA[0-9A-Z]{16}' ./build; then echo "Possible AWS access key in ./build"; exit 1; fi
  '''&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
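&lt;p&gt;The same check can be extended to the private key headers mentioned under Troubleshooting. A sketch only: &lt;code&gt;scan_for_secrets&lt;/code&gt; is a hypothetical helper and the patterns are illustrative, not exhaustive:&lt;/p&gt;

```shell
# Hypothetical helper: fail when a directory contains likely secrets.
# Patterns are illustrative: AWS access key IDs and PEM private key headers.
scan_for_secrets() {
  target_dir="$1"
  # -q suppresses the matching lines so the secret itself never reaches the build log
  if grep -rqE -e 'AKIA[0-9A-Z]{16}' -e '-----BEGIN (RSA |EC |OPENSSH )?PRIVATE KEY-----' "$target_dir"; then
    echo "Potential secret found under $target_dir"
    return 1
  fi
  echo "No known secret patterns under $target_dir"
}

# Example pipeline usage: fail the step before artifacts are archived.
if [ -d ./build ]; then
  scan_for_secrets ./build || exit 1
fi
```

&lt;p&gt;Using &lt;code&gt;-q&lt;/code&gt; matters here: a plain &lt;code&gt;grep&lt;/code&gt; would print the matched secret straight into the console log.&lt;/p&gt;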






&lt;p&gt;&lt;strong&gt;Enable Secret Scanning Tools&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Integrated &lt;a href="https://github.com/zricethezav/gitleaks" rel="noopener noreferrer"&gt;GitLeaks&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Pre-check builds for secret patterns&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  5. Jenkins Plugin Incompatibility After Upgrade
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What happened:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Jenkins failed to start after a routine upgrade&lt;/li&gt;
&lt;li&gt;Multiple jobs crashed due to missing plugin dependencies&lt;/li&gt;
&lt;li&gt;UI elements broke, pipelines wouldn't compile&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Root Cause:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Plugin versions upgraded without checking compatibility&lt;/li&gt;
&lt;li&gt;Jenkins core version jumped ahead&lt;/li&gt;
&lt;li&gt;Deprecated scripted pipelines using outdated plugin APIs&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Troubleshooting
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Accessed Jenkins in safe mode&lt;/li&gt;
&lt;li&gt;Checked &lt;code&gt;/var/lib/jenkins/plugins&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Rolled back version via backup restore&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  ✅ Fixes Applied
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Version Lock with &lt;code&gt;plugins.txt&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git:4.11.5
workflow-aggregator:2.6
credentials:2.6.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
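&lt;p&gt;A pinned file only helps if every entry really carries a version. A small sketch that lists unpinned entries; &lt;code&gt;unpinned_plugins&lt;/code&gt; is a hypothetical helper, and it assumes one &lt;code&gt;name:version&lt;/code&gt; entry per line:&lt;/p&gt;

```shell
# Hypothetical helper: print plugins.txt entries that are not pinned to a version.
# Blank lines and lines starting with # are skipped.
unpinned_plugins() {
  awk -F: '
    /^[ \t]*(#|$)/ { next }                           # skip comments and blank lines
    NF == 1 || $2 == "" || $2 == "latest" { print $1 }
  ' "$1"
}

# Example: review the file before baking it into an image.
if [ -f plugins.txt ]; then
  unpinned_plugins plugins.txt
fi
```

&lt;p&gt;In the official Jenkins Docker image, the same file is what &lt;code&gt;jenkins-plugin-cli --plugin-file&lt;/code&gt; consumes, so an empty result here means the image build is reproducible with respect to plugin versions.&lt;/p&gt;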






&lt;p&gt;&lt;strong&gt;Test Updates on Staging First&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Jenkins Docker image with pinned plugins&lt;/li&gt;
&lt;li&gt;Automated plugin diff validation via &lt;code&gt;jenkins-plugin-cli&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Upgrade Policy Aligned with LTS Cycle&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  📋 Summary Table: RCA &amp;amp; Fixes
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;th&gt;Root Cause&lt;/th&gt;
&lt;th&gt;Fix Applied&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Master Unresponsive&lt;/td&gt;
&lt;td&gt;Heap/Disk full, SCM flooding&lt;/td&gt;
&lt;td&gt;Memory tuning, cleanup, webhook control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent Disconnects&lt;/td&gt;
&lt;td&gt;Network timeout, auto-scale kills&lt;/td&gt;
&lt;td&gt;Keep-alive, lifecycle hooks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI Freezing&lt;/td&gt;
&lt;td&gt;Master overload, heavy plugins&lt;/td&gt;
&lt;td&gt;Pipeline refactor, monitoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Secrets in Logs&lt;/td&gt;
&lt;td&gt;Unsafe usage of shell/env&lt;/td&gt;
&lt;td&gt;withCredentials, scanning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plugin Failures&lt;/td&gt;
&lt;td&gt;Incompatible versions&lt;/td&gt;
&lt;td&gt;Pin versions, test on staging&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  ✅ Jenkins Production Readiness Checklist
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[x] JVM heap and thread monitoring&lt;/li&gt;
&lt;li&gt;[x] Log/artifact cleanup via pipeline config&lt;/li&gt;
&lt;li&gt;[x] Declarative pipelines with clean stages&lt;/li&gt;
&lt;li&gt;[x] Secrets masked and scanned&lt;/li&gt;
&lt;li&gt;[x] Plugin versions pinned in code&lt;/li&gt;
&lt;li&gt;[x] Staging Jenkins for dry runs&lt;/li&gt;
&lt;li&gt;[x] Backup + disaster recovery tested monthly&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Take
&lt;/h2&gt;

&lt;p&gt;Jenkins is a battle-tested CI/CD engine, but &lt;strong&gt;left unchecked, it can become fragile and costly in production&lt;/strong&gt;. These five real-world issues cost teams hours, if not days. But they also taught us how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Think of Jenkins like core infrastructure&lt;/li&gt;
&lt;li&gt;Use IaC principles to control configuration&lt;/li&gt;
&lt;li&gt;Automate hygiene and disaster recovery&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Mustkhim Inamdar&lt;/strong&gt;&lt;br&gt;
Cloud-Native DevOps Architect | Platform Engineer | CI/CD Specialist&lt;br&gt;
Passionate about automation, scalability, and next-gen tooling. With years of experience across Big Data, Cloud Operations (AWS), CI/CD, and DevOps for automotive systems, I’ve delivered robust solutions using tools like Terraform, Jenkins, Kubernetes, LDRA, Polyspace, MATLAB/Simulink, and more.&lt;/p&gt;

&lt;p&gt;I love exploring emerging tech like GitOps, MLOps, and Generative AI, and sharing practical insights from real-world projects.&lt;/p&gt;

&lt;p&gt;📬 Let’s connect:&lt;br&gt;
🔗 &lt;a href="https://www.linkedin.com/in/m-inamdar" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;br&gt;
📘 &lt;a href="https://github.com/M-Inamdar" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;br&gt;
🧠 Blog series on DevOps + AI coming soon!&lt;/p&gt;




&lt;p&gt;💬 Got your own Jenkins horror story?&lt;br&gt;
Drop it in the comments or DM me on LinkedIn. Let’s learn from each other’s scars and build resilient CI/CD systems.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>jenkins</category>
      <category>cicd</category>
      <category>troubleshooting</category>
    </item>
    <item>
      <title>Building Scalable Modular AWS Infrastructure with Terraform (IaC)</title>
      <dc:creator>Mustkhim Inamdar</dc:creator>
      <pubDate>Mon, 23 Jun 2025 09:57:19 +0000</pubDate>
      <link>https://dev.to/mustkhim_inamdar/building-scalable-modular-aws-infrastructure-with-terraform-iac-340a</link>
      <guid>https://dev.to/mustkhim_inamdar/building-scalable-modular-aws-infrastructure-with-terraform-iac-340a</guid>
      <description>&lt;p&gt;&lt;em&gt;by &lt;a href="https://www.linkedin.com/in/m-inamdar" rel="noopener noreferrer"&gt;M Inamdar&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;👋 Hey there! I’m Mustkhim Inamdar, a Cloud-Native DevOps Architect passionate about automation, scalability, and next-gen tooling. With 9+ years of experience across Big Data, Cloud Operations (AWS), CI/CD, and DevOps for automotive systems, I’ve delivered robust solutions using tools like Terraform, Jenkins, Kubernetes, LDRA, Polyspace, MATLAB/Simulink, and more.&lt;/p&gt;

&lt;p&gt;I love exploring emerging tech like GitOps, MLOps, and GenAI, and sharing practical insights from real-world projects. Let’s dive into the world of DevOps, cloud, and automation together!&lt;/p&gt;




&lt;p&gt;Modern cloud infrastructure demands automation, scalability, and maintainability, and that’s exactly where &lt;strong&gt;Infrastructure as Code (IaC)&lt;/strong&gt; shines. But as your infrastructure grows, maintaining hundreds of lines of Terraform in a single file becomes chaotic.&lt;/p&gt;

&lt;p&gt;That's why I built this:&lt;br&gt;
A &lt;strong&gt;clean, modular, and production-grade Terraform repository&lt;/strong&gt; that can deploy scalable AWS infrastructure across multiple environments.&lt;/p&gt;

&lt;p&gt;This article is a deep dive into the repo:&lt;br&gt;
&lt;strong&gt;&lt;a href="https://github.com/M-Inamdar/iac-aws-modular-infra" rel="noopener noreferrer"&gt;iac-aws-modular-infra&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Why This Project Exists
&lt;/h2&gt;

&lt;p&gt;Most Terraform tutorials and starter templates are overly simple or don’t reflect real-world complexity. This project is built for engineers and teams who need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Modular, reusable infrastructure components&lt;/li&gt;
&lt;li&gt;✅ Multi-environment separation (e.g., dev, prod)&lt;/li&gt;
&lt;li&gt;✅ Remote state management&lt;/li&gt;
&lt;li&gt;✅ Cleaner code structure and CI/CD-ready setup&lt;/li&gt;
&lt;li&gt;✅ A launchpad for integrating EKS, RDS, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This repo follows the &lt;strong&gt;best practices&lt;/strong&gt; recommended by HashiCorp and the Terraform community.&lt;/p&gt;


&lt;h2&gt;
  
  
  What’s Inside the Repo?
&lt;/h2&gt;

&lt;p&gt;This Terraform repo includes the following &lt;strong&gt;modular AWS resources&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;vpc&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Creates custom VPCs, subnets, route tables, IGW, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ec2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Deploys EC2 instances inside public/private subnets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;s3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Creates S3 buckets with custom configuration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sg&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Custom security groups for EC2, load balancers, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;iam&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;IAM roles and policies (basic version included)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cloudwatch&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Adds alarms and monitoring for EC2 or custom metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h2&gt;
  
  
  Directory Layout
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;iac-aws-modular-infra/
│
├── backend/                &lt;span class="c"&gt;# Remote state config using S3&lt;/span&gt;
│   └── s3_backend.tf
│
├── environments/           &lt;span class="c"&gt;# Separated environments&lt;/span&gt;
│   ├── dev/
│   │   └── main.tf
│   └── prod/
│
├── modules/                &lt;span class="c"&gt;# All infrastructure modules&lt;/span&gt;
│   ├── vpc/
│   ├── ec2/
│   ├── s3/
│   ├── sg/
│   ├── iam/
│   └── cloudwatch/
│
└── README.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
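
&lt;p&gt;To make the layout concrete, here is a minimal sketch of how &lt;code&gt;environments/dev/main.tf&lt;/code&gt; might wire two of the modules together. It is an illustration, not a copy of the file in the repo: the input names mirror the &lt;code&gt;terraform.tfvars&lt;/code&gt; keys used later in this post, while &lt;code&gt;subnet_id&lt;/code&gt; and the &lt;code&gt;public_subnet_ids&lt;/code&gt; output are assumed names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# environments/dev/main.tf (illustrative sketch, not the repo's actual file)
# The var.* values are expected to be declared in a variables.tf next to this file.

provider "aws" {
  region = var.region
}

# Networking comes first; other modules consume its outputs.
module "vpc" {
  source = "../../modules/vpc"

  vpc_cidr_block       = var.vpc_cidr_block
  public_subnet_cidrs  = var.public_subnet_cidrs
  private_subnet_cidrs = var.private_subnet_cidrs
}

module "ec2" {
  source = "../../modules/ec2"

  ami_id        = var.ami_id
  instance_type = var.instance_type

  # Assumed input and output names; check the module's variables.tf and outputs.tf.
  subnet_id = module.vpc.public_subnet_ids[0]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because &lt;code&gt;prod/&lt;/code&gt; would call the same modules with different values, a fix made inside a module reaches every environment the next time it is planned.&lt;/p&gt;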



&lt;h2&gt;
  
  
  How to Deploy
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Clone the Repo
&lt;/h3&gt;

&lt;p&gt;Clone the infrastructure code to your local machine and navigate to the desired environment (e.g., &lt;code&gt;dev&lt;/code&gt;).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/M-Inamdar/iac-aws-modular-infra.git
&lt;span class="nb"&gt;cd &lt;/span&gt;iac-aws-modular-infra/environments/dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  2. Create &lt;code&gt;terraform.tfvars&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Define environment-specific values for the input variables. Keeping them in this file, rather than in the module code, is what lets the same modules be reused across environments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;region&lt;/span&gt;               &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ap-south-1"&lt;/span&gt;
&lt;span class="nx"&gt;vpc_cidr_block&lt;/span&gt;       &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"10.0.0.0/16"&lt;/span&gt;
&lt;span class="nx"&gt;public_subnet_cidrs&lt;/span&gt;  &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"10.0.1.0/24"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nx"&gt;private_subnet_cidrs&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"10.0.2.0/24"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nx"&gt;instance_type&lt;/span&gt;        &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"t2.micro"&lt;/span&gt;
&lt;span class="nx"&gt;ami_id&lt;/span&gt;               &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ami-0abcdef1234567890"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip&lt;/strong&gt;: Add this file to &lt;code&gt;.gitignore&lt;/code&gt; to avoid committing sensitive or environment-specific data.&lt;/p&gt;
&lt;/blockquote&gt;
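
&lt;p&gt;Each key in &lt;code&gt;terraform.tfvars&lt;/code&gt; needs a matching &lt;code&gt;variable&lt;/code&gt; declaration in the environment. The sketch below shows what those declarations could look like; the descriptions, the default, and the validation rule are my assumptions rather than the repo's exact contents.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# variables.tf (sketch): declarations matching the terraform.tfvars keys above

variable "region" {
  description = "AWS region to deploy into"
  type        = string
}

variable "vpc_cidr_block" {
  description = "CIDR range for the VPC"
  type        = string

  validation {
    # cidrhost() errors on a malformed CIDR; can() converts that error to false.
    condition     = can(cidrhost(var.vpc_cidr_block, 0))
    error_message = "The vpc_cidr_block value must be a valid CIDR range such as 10.0.0.0/16."
  }
}

variable "public_subnet_cidrs" {
  description = "One CIDR per public subnet"
  type        = list(string)
}

variable "private_subnet_cidrs" {
  description = "One CIDR per private subnet"
  type        = list(string)
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t2.micro" # assumed default; override it in terraform.tfvars
}

variable "ami_id" {
  description = "AMI to launch; AMI IDs are region-specific"
  type        = string
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With explicit types and a validation rule, a typo in a CIDR fails at &lt;code&gt;terraform plan&lt;/code&gt; instead of partway through an apply.&lt;/p&gt;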




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ae2kxjmc1wzceq6emkv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ae2kxjmc1wzceq6emkv.png" alt="Create the .tf files with touch command" width="610" height="195"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Initialize Terraform
&lt;/h3&gt;

&lt;p&gt;Set up the working directory and download the required provider plugins and modules.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9icvvwhv5a83xietkz6d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9icvvwhv5a83xietkz6d.png" alt="Result" width="779" height="301"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  4. Plan &amp;amp; Apply
&lt;/h3&gt;

&lt;p&gt;Preview the changes with &lt;code&gt;plan&lt;/code&gt; and then provision the infrastructure using &lt;code&gt;apply&lt;/code&gt;.&lt;/p&gt;






&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform plan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1mivi6vnmppf71kzpbk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1mivi6vnmppf71kzpbk.png" alt="Result" width="436" height="922"&gt;&lt;/a&gt;&lt;/p&gt;






&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;✅ You’ll be prompted to confirm before resources are created.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdipwnl2oms1yzv7u1hrf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdipwnl2oms1yzv7u1hrf.png" alt="Result" width="435" height="1012"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  5. Optional: Destroy Infrastructure
&lt;/h3&gt;

&lt;p&gt;Run this from the environment directory to tear down every resource tracked in that environment’s state once it is no longer needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform destroy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Caution&lt;/strong&gt;: This will remove all resources. Double-check before running.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Remote State Management
&lt;/h2&gt;

&lt;p&gt;This setup stores Terraform state in an S3 bucket so the whole team works from one central copy. Turn on bucket versioning if you also want a history of previous state files.&lt;/p&gt;

&lt;p&gt;Example backend config in &lt;code&gt;backend/s3_backend.tf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;backend&lt;/span&gt; &lt;span class="s2"&gt;"s3"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-terraform-state"&lt;/span&gt;
    &lt;span class="nx"&gt;key&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dev/terraform.tfstate"&lt;/span&gt;
    &lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ap-south-1"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip&lt;/strong&gt;: Enable &lt;a href="https://developer.hashicorp.com/terraform/language/settings/backends/s3#dynamodb-table" rel="noopener noreferrer"&gt;DynamoDB state locking&lt;/a&gt; to avoid concurrent apply conflicts by adding this argument inside the &lt;code&gt;backend "s3"&lt;/code&gt; block:&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;dynamodb_table&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform-locks"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
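
&lt;p&gt;Put together, a locked backend could look like the sketch below. The bucket, key, region, and table name come from the examples above; &lt;code&gt;encrypt&lt;/code&gt; and the on-demand billing mode are my additions, and the lock table is assumed to be created once from a separate bootstrap configuration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "dev/terraform.tfstate"
    region         = "ap-south-1"
    encrypt        = true              # server-side encryption for the state object
    dynamodb_table = "terraform-locks" # lock table; it must already exist
  }
}

# Bootstrap configuration (kept separate from the one that uses the backend).
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # the S3 backend expects a string key with this exact name

  attribute {
    name = "LockID"
    type = "S"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Recent Terraform releases also document an S3-native lock file option (&lt;code&gt;use_lockfile&lt;/code&gt;), so check the backend documentation for the version you run before choosing a locking mechanism.&lt;/p&gt;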






&lt;h2&gt;
  
  
  Additional Suggestions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Modular Code&lt;/strong&gt;: Reuse modules from the &lt;code&gt;modules/&lt;/code&gt; directory across different environments (e.g., dev, staging, prod).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secrets Management&lt;/strong&gt;: Avoid hardcoding credentials. Use environment variables or tools like AWS Secrets Manager or SSM Parameter Store.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validation&lt;/strong&gt;: Run &lt;code&gt;terraform validate&lt;/code&gt; before applying to catch config errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation&lt;/strong&gt;: Integrate with CI/CD pipelines (e.g., GitHub Actions, Jenkins) for consistent deployment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation&lt;/strong&gt;: Maintain a README per environment or module for clarity.&lt;/li&gt;
&lt;/ul&gt;
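
&lt;p&gt;For the secrets suggestion, one possible approach is to read values from SSM Parameter Store at plan time, or to pass them in as sensitive variables. The parameter path and variable name below are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Option 1: read a SecureString parameter instead of hardcoding the value.
# "/dev/db/password" is a hypothetical parameter name.
data "aws_ssm_parameter" "db_password" {
  name            = "/dev/db/password"
  with_decryption = true
}

# Reference it wherever a module expects the value, for example:
#   db_password = data.aws_ssm_parameter.db_password.value

# Option 2: accept the secret as a sensitive input variable and supply it
# through the TF_VAR_db_password environment variable.
variable "db_password" {
  description = "Database password (hypothetical example)"
  type        = string
  sensitive   = true # redacted in plan and apply output
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Either way, the value still ends up in the state file, so keep the state bucket encrypted and restrict who can read it.&lt;/p&gt;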




&lt;h2&gt;
  
  
  Design Philosophy
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;“Infrastructure should be treated as code, and that code should be clean, reusable, and easy to test.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I designed this repo with the following principles:&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Modularity:&lt;/strong&gt; Each AWS resource lives in its own module&lt;br&gt;
✅ &lt;strong&gt;Reusability:&lt;/strong&gt; Modules can be called from any environment&lt;br&gt;
✅ &lt;strong&gt;Clarity:&lt;/strong&gt; Variables and outputs are explicitly defined&lt;br&gt;
✅ &lt;strong&gt;Separation of concerns:&lt;/strong&gt; Code for prod/dev stays isolated&lt;br&gt;
✅ &lt;strong&gt;Cloud-readiness:&lt;/strong&gt; Backend is set up for remote storage&lt;/p&gt;
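
&lt;p&gt;As a rough illustration of the modularity and clarity principles, a module can be reduced to three parts: declared inputs, resources, and declared outputs. The snippet is a simplified sketch of what &lt;code&gt;modules/vpc&lt;/code&gt; might contain, not the actual module; the tag value and output name are assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# modules/vpc/variables.tf: everything the caller must provide
variable "vpc_cidr_block" {
  description = "CIDR range for the VPC"
  type        = string
}

# modules/vpc/main.tf: the resources the module owns
resource "aws_vpc" "this" {
  cidr_block           = var.vpc_cidr_block
  enable_dns_hostnames = true

  tags = {
    Name = "modular-infra-vpc" # hypothetical tag value
  }
}

# modules/vpc/outputs.tf: the only values other modules may depend on
output "vpc_id" {
  description = "ID of the VPC, consumed by modules such as sg and ec2"
  value       = aws_vpc.this.id
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the interface this explicit means an environment never reaches into a module’s internals; it passes inputs in and reads outputs back.&lt;/p&gt;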




&lt;h2&gt;
  
  
  What’s Next?
&lt;/h2&gt;

&lt;p&gt;Here’s what I’m planning to add next:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Terraform GitHub Actions for CI/CD&lt;/li&gt;
&lt;li&gt;Cost estimates with Infracost&lt;/li&gt;
&lt;li&gt;State locking via DynamoDB&lt;/li&gt;
&lt;li&gt;Terratest-based module unit tests&lt;/li&gt;
&lt;li&gt;EKS-ready VPC modules&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Modular Terraform setup ✅&lt;/li&gt;
&lt;li&gt;Real-world AWS patterns ✅&lt;/li&gt;
&lt;li&gt;Remote state and multi-env support ✅&lt;/li&gt;
&lt;li&gt;Ready for scaling and CI/CD ✅&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🤝 Contributing
&lt;/h2&gt;

&lt;p&gt;Feel free to fork the repo, raise issues, or open pull requests. Feedback and contributions are always welcome!&lt;/p&gt;

&lt;p&gt;If you use this in your projects, I’d love to hear about it. Drop a comment or star ⭐ the repo: &lt;a href="https://github.com/M-Inamdar/iac-aws-modular-infra" rel="noopener noreferrer"&gt;iac-aws-modular-infra&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Happy automating!&lt;/p&gt;




&lt;p&gt;💬 &lt;strong&gt;Got questions or stuck somewhere?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Feel free to drop a comment below or DM me on &lt;a href="https://www.linkedin.com/in/m-inamdar" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
I’m always happy to help!&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>aws</category>
      <category>infrastructureascode</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
