<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jeancy Joachim Mukaka</title>
    <description>The latest articles on DEV Community by Jeancy Joachim Mukaka (@jeancy).</description>
    <link>https://dev.to/jeancy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3822295%2F820b8506-9bc3-4eab-960d-cd36d34b1f2e.jpeg</url>
      <title>DEV Community: Jeancy Joachim Mukaka</title>
      <link>https://dev.to/jeancy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jeancy"/>
    <language>en</language>
    <item>
      <title>When VPC Peering Looks Fine But Nothing Works: A 3-Day Debugging Story</title>
      <dc:creator>Jeancy Joachim Mukaka</dc:creator>
      <pubDate>Tue, 30 Jun 2026 00:47:22 +0000</pubDate>
      <link>https://dev.to/jeancy/when-vpc-peering-looks-fine-but-nothing-works-a-3-day-debugging-story-3pn5</link>
      <guid>https://dev.to/jeancy/when-vpc-peering-looks-fine-but-nothing-works-a-3-day-debugging-story-3pn5</guid>
      <description>&lt;p&gt;&lt;em&gt;A real-world lesson from a production-like AWS lab&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Imagine this: two servers, two VPCs, a peering connection marked as Active, DNS enabled, routes in place. Your colleague tries to reach the PeerServer from the ApiServer. Timeout.&lt;/p&gt;

&lt;p&gt;You check the peering connection. Active. You check the routes. Present. You check the Security Groups. Looks fine. Still timing out.&lt;br&gt;
That was me, for 3 days, stuck on a single challenge while the other five were already solved.&lt;br&gt;
This is the story of two misconfigurations that are easy to miss, and that most checklists forget to mention.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Lab Scenario
&lt;/h3&gt;

&lt;p&gt;The challenge was straightforward on paper.&lt;br&gt;
Two servers. Two VPCs. One peering connection between them.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ApiServer&lt;/strong&gt; lives inside &lt;strong&gt;ApiVPC&lt;/strong&gt; (CIDR: &lt;code&gt;10.201.0.0/16&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PeerServer&lt;/strong&gt; lives inside &lt;strong&gt;PeerVPC&lt;/strong&gt; (CIDR: &lt;code&gt;10.202.0.0/16&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;The two VPCs are connected via &lt;strong&gt;AWS VPC Peering&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The requirement: both servers must communicate with each other over &lt;strong&gt;private DNS&lt;/strong&gt;, using &lt;strong&gt;all ports&lt;/strong&gt;. And any other server launched in the same subnet as the ApiServer must have the same level of access automatically. One warning was explicit: &lt;em&gt;"Make sure the relevant CIDR range is restricted as much as possible."&lt;/em&gt; Simple enough. Except it wasn't.&lt;/p&gt;

&lt;p&gt;When my colleague attempted to reach the PeerServer from within the ApiServer, the response was always the same: &lt;strong&gt;timeout&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Day 1: Flying Solo
&lt;/h3&gt;

&lt;p&gt;My first instinct was to follow the classic VPC Peering troubleshooting checklist: peering status, route tables.&lt;br&gt;
The peering connection was Active. No issue there.&lt;br&gt;
The route tables looked broken at first: only local routes, nothing pointing to the peering connection. But I couldn't edit them; the lab didn't allow it. Digging further, I found &lt;strong&gt;6 route tables&lt;/strong&gt; across both VPCs, not just the two main ones I had initially seen. Two of them already had the correct routes in place.&lt;br&gt;
The routing was fine all along. I had just spent a day looking at the wrong tables.&lt;/p&gt;

&lt;p&gt;End of Day 1: still timing out.&lt;/p&gt;
&lt;h3&gt;
  
  
  Day 2: Even AI Couldn't Find It
&lt;/h3&gt;

&lt;p&gt;On Day 2, I brought in AI assistants to speed things up. The suggestions were consistent: peering status, DNS resolution, Security Group rules.&lt;/p&gt;

&lt;p&gt;I worked through all of it. DNS resolution enabled on both sides, Requester and Accepter. Security Groups verified and restricted to the right CIDR.&lt;br&gt;
Still timing out. Every suggestion felt right. None of them mentioned one entire layer of AWS networking.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(See the kind of checklist I was working with below)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fh6va5a3ynismx1cx6mh8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fh6va5a3ynismx1cx6mh8.png" alt="Checklist from an AI" width="800" height="374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;End of Day 2: DNS enabled, SGs adjusted, routes confirmed. Still timing out.&lt;/p&gt;
&lt;h3&gt;
  
  
  Day 3: The Two Real Culprits
&lt;/h3&gt;

&lt;p&gt;On Day 3, I changed my approach. Instead of applying suggestions, I decided to go through every single networking layer systematically, one by one, and verify each one with my own eyes before moving to the next.&lt;br&gt;
That's when the two real problems revealed themselves.&lt;/p&gt;
&lt;h4&gt;
  
  
  Culprit #1 — DNS Resolution Was Disabled
&lt;/h4&gt;

&lt;p&gt;Yes, I had been told to check DNS on Day 2. But what I hadn't fully verified was the exact state of both sides of the peering connection.&lt;br&gt;
In VPC Peering, DNS resolution must be explicitly enabled on &lt;strong&gt;both sides&lt;/strong&gt; independently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allow accepter VPC to resolve DNS of hosts in requester VPC → &lt;strong&gt;Enabled&lt;/strong&gt; ✅&lt;/li&gt;
&lt;li&gt;Allow requester VPC to resolve DNS of hosts in accepter VPC → &lt;strong&gt;Enabled&lt;/strong&gt; ✅&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once both were confirmed active, private hostnames could finally resolve to private IP addresses across the peering connection. Without this, even with perfect routing and open Security Groups, the servers simply couldn't find each other by name.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This was the first fix.&lt;/strong&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Culprit #2 — The NACL Nobody Mentioned
&lt;/h4&gt;

&lt;p&gt;This is where it gets interesting.&lt;/p&gt;

&lt;p&gt;After confirming DNS, I went deeper and looked at something that had never appeared in any checklist I had received over two days: &lt;strong&gt;Network ACLs&lt;/strong&gt;.&lt;br&gt;
The PeerServer's subnet was associated with a NACL called &lt;code&gt;PrivateACL2&lt;/code&gt;. When I opened its inbound rules, this is what I found:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rule&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Protocol&lt;/th&gt;
&lt;th&gt;Port Range&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Allow/Deny&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;All traffic&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;td&gt;0.0.0.0/0&lt;/td&gt;
&lt;td&gt;❌ Deny&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;One single rule. A catch-all Deny. Zero Allow rules.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every single packet arriving at the PeerServer's subnet from the ApiServer was being silently dropped at the NACL level, before it could even reach the instance or the Security Group.&lt;/p&gt;

&lt;p&gt;This is the critical difference between &lt;strong&gt;NACLs and Security Groups&lt;/strong&gt; that is easy to forget:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Security Groups are stateful&lt;/strong&gt; → once a connection is allowed in one direction, its return traffic is automatically allowed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NACLs are stateless&lt;/strong&gt; → every direction must be explicitly allowed, inbound AND outbound, independently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NACLs apply to the entire subnet&lt;/strong&gt; → every server launched in that subnet is automatically subject to the same rules, without needing to touch individual instances&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point was actually the key to satisfying the challenge requirement: &lt;em&gt;"any other server launched in the same subnet must have the same level of access automatically."&lt;/em&gt; A Security Group change on one instance would never achieve that. A NACL rule would.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix&lt;/strong&gt;: I added one inbound rule to &lt;code&gt;PrivateACL2&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rule&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Protocol&lt;/th&gt;
&lt;th&gt;Port Range&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Allow/Deny&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;All traffic&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;td&gt;10.201.0.0/16&lt;/td&gt;
&lt;td&gt;✅ Allow&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Source restricted to exactly &lt;code&gt;10.201.0.0/16&lt;/code&gt; — the ApiVPC CIDR — and nothing else. Respecting the warning about keeping CIDR ranges as restricted as possible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenge validated. ✅&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Key Lesson: Always Check the Full Stack
&lt;/h3&gt;

&lt;p&gt;Three days. Two misconfigurations. One layer that nobody mentioned.&lt;br&gt;
Looking back, the debugging process taught me something more valuable than the fix itself: &lt;strong&gt;in AWS networking, a timeout doesn't tell you where the problem is. It only tells you that something, somewhere in the stack, is blocking traffic.&lt;/strong&gt;&lt;br&gt;
And that stack has more layers than most checklists cover.&lt;/p&gt;
&lt;h4&gt;
  
  
  Why NACLs Are Always Forgotten
&lt;/h4&gt;

&lt;p&gt;Security Groups get all the attention. They are instance-level, they are stateful, they are the first thing everyone checks. And because they handle return traffic automatically, they feel complete.&lt;br&gt;
NACLs are different. They are subnet-level, stateless, and silent. They don't send back an error. They just drop the packet. Which is exactly why a NACL misconfiguration produces a timeout, not a rejection message.&lt;br&gt;
And because they sit at the subnet level, they are invisible when you are focused on individual instances.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Complete VPC Peering Troubleshooting Checklist
&lt;/h4&gt;

&lt;p&gt;Next time you face a VPC Peering connectivity issue, go through this list in order:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Peering Connection&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Status is Active&lt;/li&gt;
&lt;li&gt;The two VPC CIDR blocks do not overlap (peering works across regions and accounts, but not between overlapping CIDRs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. DNS Resolution&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enabled on the Requester VPC side&lt;/li&gt;
&lt;li&gt;Enabled on the Accepter VPC side&lt;/li&gt;
&lt;li&gt;Both must be explicitly enabled independently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Route Tables&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Subnet of Server A has a route to VPC-B CIDR via the peering connection&lt;/li&gt;
&lt;li&gt;Subnet of Server B has a route to VPC-A CIDR via the peering connection&lt;/li&gt;
&lt;li&gt;Check all route tables, not just the Main one&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Network ACLs&lt;/strong&gt; ← &lt;em&gt;the one everyone forgets&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inbound rules on Server A's subnet allow traffic from VPC-B CIDR&lt;/li&gt;
&lt;li&gt;Outbound rules on Server A's subnet allow traffic to VPC-B CIDR&lt;/li&gt;
&lt;li&gt;Inbound rules on Server B's subnet allow traffic from VPC-A CIDR&lt;/li&gt;
&lt;li&gt;Outbound rules on Server B's subnet allow traffic to VPC-A CIDR&lt;/li&gt;
&lt;li&gt;Always use the specific VPC CIDR, never &lt;code&gt;0.0.0.0/0&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Security Groups&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Server B's SG allows inbound traffic from VPC-A CIDR on required ports&lt;/li&gt;
&lt;li&gt;Server A's SG allows outbound traffic to VPC-B CIDR&lt;/li&gt;
&lt;li&gt;Restrict CIDR ranges as much as possible&lt;/li&gt;
&lt;/ul&gt;
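
&lt;p&gt;As a rough illustration of item 3, the two routes could be written in Terraform along these lines. This is only a sketch: the route table names &lt;code&gt;api_private&lt;/code&gt; and &lt;code&gt;peer_private&lt;/code&gt; are hypothetical, the CIDRs are the ones from the lab, and the peering connection name matches the one used in the Terraform further down in this article.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Sketch only: the aws_route_table references are hypothetical names,
# not resources defined in this article.

# ApiVPC side: send PeerVPC-bound traffic over the peering connection
resource "aws_route" "api_to_peer" {
  route_table_id            = aws_route_table.api_private.id # hypothetical
  destination_cidr_block    = "10.202.0.0/16"                # PeerVPC CIDR
  vpc_peering_connection_id = aws_vpc_peering_connection.api_to_peer.id
}

# PeerVPC side: the return path back to ApiVPC
resource "aws_route" "peer_to_api" {
  route_table_id            = aws_route_table.peer_private.id # hypothetical
  destination_cidr_block    = "10.201.0.0/16"                 # ApiVPC CIDR
  vpc_peering_connection_id = aws_vpc_peering_connection.api_to_peer.id
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each route has to live in the route table that is actually associated with the server's subnet, which is the same trap as the six route tables on Day 1.&lt;/p&gt;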
&lt;h4&gt;
  
  
  The Subnet-Level Requirement
&lt;/h4&gt;

&lt;p&gt;One last thing worth highlighting. The challenge required that any server launched in the same subnet as the ApiServer automatically inherits the same level of access.&lt;/p&gt;

&lt;p&gt;This is precisely why the NACL was the right tool here, not the Security Group. A Security Group is attached per instance. A NACL covers the entire subnet. Any new server launched in that subnet automatically inherits the NACL rules, with zero additional configuration.&lt;/p&gt;

&lt;p&gt;If you solve a connectivity requirement at the Security Group level only, you will need to manually replicate that configuration for every new instance. The NACL approach enforces it by design.&lt;/p&gt;
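
&lt;p&gt;For comparison, the Security Group side of the same traffic might look like the sketch below. It assumes a recent AWS provider that ships the standalone &lt;code&gt;aws_vpc_security_group_ingress_rule&lt;/code&gt; resource, and the Security Group name &lt;code&gt;peer_server&lt;/code&gt; is hypothetical. Unlike the NACL rule, it only takes effect on instances that have this Security Group attached.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Sketch only: aws_security_group.peer_server is a hypothetical name.
resource "aws_vpc_security_group_ingress_rule" "peer_from_api_vpc" {
  security_group_id = aws_security_group.peer_server.id # hypothetical
  description       = "All traffic from ApiVPC over the peering connection"
  cidr_ipv4         = "10.201.0.0/16" # ApiVPC CIDR, nothing wider
  ip_protocol       = "-1"            # all protocols, so no port range is set
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;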
&lt;h3&gt;
  
  
  Codify It So It Never Happens Again
&lt;/h3&gt;

&lt;p&gt;This entire debugging story raises an obvious question: why was any of this discoverable only by clicking through the console for three days?&lt;br&gt;
The answer is that both misconfigurations (DNS resolution disabled, NACL missing an Allow rule) are exactly the kind of settings that get silently skipped during manual setup and silently missed during manual review. If this infrastructure had been defined in Terraform from the start, both issues would have been visible in a pull request, not buried three clicks deep in the console.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Force DNS resolution at the peering connection level&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc_peering_connection"&lt;/span&gt; &lt;span class="s2"&gt;"api_to_peer"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;api_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;peer_vpc_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;peer_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;auto_accept&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"api-to-peer"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc_peering_connection_options"&lt;/span&gt; &lt;span class="s2"&gt;"api_to_peer_options"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_peering_connection_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc_peering_connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;api_to_peer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;requester&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;allow_remote_vpc_dns_resolution&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;accepter&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;allow_remote_vpc_dns_resolution&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this in code, DNS resolution on both sides is no longer an optional checkbox someone might forget to tick in the console. It's an explicit, reviewable, enforced setting. If a teammate ever tries to remove it, the change shows up in a diff.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Make NACL rules explicit, not implicit&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_network_acl_rule"&lt;/span&gt; &lt;span class="s2"&gt;"allow_inbound_from_api_vpc"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;network_acl_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_network_acl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_acl_2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;rule_number&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
  &lt;span class="nx"&gt;egress&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="nx"&gt;protocol&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"-1"&lt;/span&gt;
  &lt;span class="nx"&gt;rule_action&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"allow"&lt;/span&gt;
  &lt;span class="nx"&gt;cidr_block&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;api_vpc_cidr&lt;/span&gt;   &lt;span class="c1"&gt;# 10.201.0.0/16&lt;/span&gt;
  &lt;span class="nx"&gt;from_port&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="nx"&gt;to_port&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_network_acl_rule"&lt;/span&gt; &lt;span class="s2"&gt;"allow_outbound_to_api_vpc"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;network_acl_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_network_acl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_acl_2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;rule_number&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
  &lt;span class="nx"&gt;egress&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;protocol&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"-1"&lt;/span&gt;
  &lt;span class="nx"&gt;rule_action&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"allow"&lt;/span&gt;
  &lt;span class="nx"&gt;cidr_block&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;api_vpc_cidr&lt;/span&gt;
  &lt;span class="nx"&gt;from_port&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="nx"&gt;to_port&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that the CIDR is a variable, not a hardcoded value, and definitely not &lt;code&gt;0.0.0.0/0&lt;/code&gt;. This keeps the "restrict the CIDR range as much as possible" requirement enforced by design, not by memory.&lt;/p&gt;
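
&lt;p&gt;One possible way to back that up is to declare the variable with validation rules. The sketch below is an assumption, not part of the lab: the description, the default, and the error messages are illustrative, and it relies on Terraform's custom variable validation (available since 0.13).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Sketch only: description, default and messages are illustrative.
variable "api_vpc_cidr" {
  type        = string
  description = "CIDR block of ApiVPC, used as the only allowed NACL source"
  default     = "10.201.0.0/16"

  # Reject anything that does not parse as a CIDR block
  validation {
    condition     = can(cidrhost(var.api_vpc_cidr, 0))
    error_message = "The api_vpc_cidr value must be a valid IPv4 CIDR block."
  }

  # Reject the catch-all range explicitly
  validation {
    condition     = var.api_vpc_cidr != "0.0.0.0/0"
    error_message = "The api_vpc_cidr value must not be 0.0.0.0/0."
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With something like this in place, a plan that passes &lt;code&gt;0.0.0.0/0&lt;/code&gt; should fail at validation time instead of quietly widening the NACL.&lt;/p&gt;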

&lt;p&gt;&lt;strong&gt;3. Catch drift before it becomes a 3-day debugging session&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The real value of this approach isn't the code itself; it's what it prevents. A &lt;code&gt;terraform plan&lt;/code&gt; run in CI on every pull request would have flagged a missing NACL rule or a disabled DNS option immediately, as a visible diff, instead of a silent timeout discovered days later in production or in a lab.&lt;/p&gt;

&lt;p&gt;NAT Gateways, NACLs, peering DNS options: these are exactly the settings that survive for months unnoticed because nobody is actively looking at them. Infrastructure as Code doesn't just make deployments repeatable. It makes the invisible parts of your network visible again.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article is part of my AWS Solutions Architect Associate (SAA-C03) preparation series. I document real hands-on lab experiences, networking challenges, and lessons learned along the way.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow along for more practical AWS architecture and networking content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>networking</category>
      <category>peering</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>How a Single NAT Gateway Can Silently Kill Your AWS High Availability</title>
      <dc:creator>Jeancy Joachim Mukaka</dc:creator>
      <pubDate>Thu, 04 Jun 2026 15:47:31 +0000</pubDate>
      <link>https://dev.to/jeancy/how-a-single-nat-gateway-can-silently-kill-your-aws-high-availability-2ggk</link>
      <guid>https://dev.to/jeancy/how-a-single-nat-gateway-can-silently-kill-your-aws-high-availability-2ggk</guid>
      <description>&lt;p&gt;&lt;em&gt;A real-world lesson from a production-like AWS lab challenge&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Scenario That Should Scare You
&lt;/h3&gt;

&lt;p&gt;Imagine this: your AWS environment has two Availability Zones, public and private subnets, an Application Load Balancer, Auto Scaling. Your architecture diagram looks solid. Then one Availability Zone goes down. Your ALB fails over instantly, and your EC2 instances in AZ-B are running fine. But your application is still broken.&lt;/p&gt;

&lt;p&gt;Because every private subnet instance, including those in AZ-B, is routing outbound traffic through one NAT Gateway sitting in AZ-A. Which is now unreachable.&lt;/p&gt;

&lt;p&gt;You didn't have a highly available architecture. You had the illusion of one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding the Problem: NAT Gateways Are Zonal
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A NAT Gateway is not a regional resource. It lives in a specific Availability Zone.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When you create a NAT Gateway, you place it in a specific subnet, which belongs to a specific AZ. If that AZ goes down, your NAT Gateway goes down with it.&lt;/p&gt;

&lt;p&gt;Many teams create a single NAT Gateway to save costs, then route all private subnet traffic across all AZs through that one gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Private Subnet AZ-A → 0.0.0.0/0 → nat-09xxxxx (AZ-A) ✅
Private Subnet AZ-B → 0.0.0.0/0 → nat-09xxxxx (AZ-A) ❌
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The private subnet in AZ-B is routing through a NAT Gateway in AZ-A. This is a &lt;strong&gt;cross-AZ dependency&lt;/strong&gt;, and a silent Single Point of Failure.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I Found in the Lab
&lt;/h3&gt;

&lt;p&gt;The lab presented a VPC with this structure:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;CIDR / Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;VPC&lt;/td&gt;
&lt;td&gt;10.0.0.0/16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Public Subnet AZ-A&lt;/td&gt;
&lt;td&gt;10.0.128.0/20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Public Subnet AZ-B&lt;/td&gt;
&lt;td&gt;10.0.144.0/20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Private Subnet 1A (AZ-A)&lt;/td&gt;
&lt;td&gt;10.0.0.0/19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Private Subnet 1B (AZ-A)&lt;/td&gt;
&lt;td&gt;10.0.192.0/21&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Private Subnet 2A (AZ-B)&lt;/td&gt;
&lt;td&gt;10.0.32.0/19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Private Subnet 2B (AZ-B)&lt;/td&gt;
&lt;td&gt;10.0.200.0/21&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two NAT Gateways existed: one in AZ-A, one in AZ-B. At first glance, this looked correct.&lt;/p&gt;

&lt;p&gt;But when I inspected the Route Tables, the problem was immediately visible. &lt;strong&gt;All four private subnet Route Tables had the same entry:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Destination: 0.0.0.0/0 → Target: nat-09xxxxxxxx (AZ-A)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The NAT Gateway in AZ-B existed, but nobody was using it. It was provisioned but completely disconnected from the routing logic. The two private subnets in AZ-B were silently depending on the NAT Gateway in AZ-A for all outbound internet traffic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Happens
&lt;/h3&gt;

&lt;p&gt;There are two common causes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Cost-cutting gone wrong&lt;/strong&gt;&lt;br&gt;
Teams create one NAT Gateway to reduce costs, then forget that high availability requires one per AZ. A NAT Gateway costs approximately $0.045/hour plus per-GB data processing charges. Running two instead of one adds roughly $32 to $33/month in hourly charges ($0.045 × 720 to 744 hours), a small price compared to the cost of an outage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Infrastructure drift&lt;/strong&gt;&lt;br&gt;
The architecture was correct at some point, then someone modified the Route Tables manually, or via a flawed IaC change, and the second NAT Gateway became orphaned without anyone noticing. No alerts, no errors, no warnings. Everything looks fine until AZ-A goes down.&lt;/p&gt;

&lt;p&gt;This is what makes this particular SPOF so dangerous: &lt;strong&gt;it is completely invisible during normal operations.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Fix: One NAT Gateway Per AZ, One Route Table Per Private Subnet
&lt;/h3&gt;

&lt;p&gt;The solution is straightforward: each private subnet must route its outbound internet traffic through the NAT Gateway &lt;strong&gt;in its own Availability Zone.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Correct routing after the fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Private Subnet 1A (AZ-A) → 0.0.0.0/0 → nat-AZ-A ✅
Private Subnet 1B (AZ-A) → 0.0.0.0/0 → nat-AZ-A ✅
Private Subnet 2A (AZ-B) → 0.0.0.0/0 → nat-AZ-B ✅
Private Subnet 2B (AZ-B) → 0.0.0.0/0 → nat-AZ-B ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 1 — Identify which NAT Gateway belongs to which AZ&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Go to &lt;strong&gt;VPC → NAT Gateways&lt;/strong&gt;, click each NAT Gateway, and check the &lt;strong&gt;Subnet&lt;/strong&gt; field. The subnet's Availability Zone tells you which AZ the gateway belongs to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2 — Fix the Route Tables for AZ-B private subnets&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;strong&gt;VPC → Route Tables&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Find the Route Table associated with &lt;strong&gt;Private Subnet 2A (AZ-B)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Edit routes&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Change &lt;code&gt;0.0.0.0/0&lt;/code&gt; from &lt;code&gt;nat-AZ-A&lt;/code&gt; → &lt;code&gt;nat-AZ-B&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Save changes&lt;/li&gt;
&lt;li&gt;Repeat for &lt;strong&gt;Private Subnet 2B (AZ-B)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Step 3 — Verify&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All four private subnet Route Tables should now point exclusively to the NAT Gateway in their own AZ. If AZ-A goes down, the AZ-B private subnets keep their outbound internet path through their own NAT Gateway.&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting It Right From the Start: Terraform
&lt;/h3&gt;

&lt;p&gt;If you're provisioning your VPC with Infrastructure as Code, which you should be, here's how to enforce this pattern correctly with Terraform from day one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# NAT Gateway in AZ-A&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_eip"&lt;/span&gt; &lt;span class="s2"&gt;"nat_a"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"vpc"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_nat_gateway"&lt;/span&gt; &lt;span class="s2"&gt;"nat_a"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;allocation_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_eip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nat_a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public_a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"nat-gateway-az-a"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# NAT Gateway in AZ-B&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_eip"&lt;/span&gt; &lt;span class="s2"&gt;"nat_b"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"vpc"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_nat_gateway"&lt;/span&gt; &lt;span class="s2"&gt;"nat_b"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;allocation_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_eip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nat_b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public_b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"nat-gateway-az-b"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Route Table — AZ-A private subnets&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_route_table"&lt;/span&gt; &lt;span class="s2"&gt;"private_a"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;route&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_block&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"0.0.0.0/0"&lt;/span&gt;
    &lt;span class="nx"&gt;nat_gateway_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_nat_gateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nat_a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"private-rt-az-a"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Route Table — AZ-B private subnets&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_route_table"&lt;/span&gt; &lt;span class="s2"&gt;"private_b"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;route&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_block&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"0.0.0.0/0"&lt;/span&gt;
    &lt;span class="nx"&gt;nat_gateway_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_nat_gateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nat_b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"private-rt-az-b"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Associations — AZ-A&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_route_table_association"&lt;/span&gt; &lt;span class="s2"&gt;"private_1a"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_1a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;route_table_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_route_table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_route_table_association"&lt;/span&gt; &lt;span class="s2"&gt;"private_1b"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_1b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;route_table_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_route_table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Associations — AZ-B&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_route_table_association"&lt;/span&gt; &lt;span class="s2"&gt;"private_2a"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_2a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;route_table_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_route_table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_route_table_association"&lt;/span&gt; &lt;span class="s2"&gt;"private_2b"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_2b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;route_table_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_route_table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The beauty of this approach: &lt;strong&gt;the correct pattern is enforced by design.&lt;/strong&gt; Each AZ has its own NAT Gateway, its own Route Table, and explicit associations. Because every change goes through code review, infrastructure drift becomes much harder to introduce unnoticed.&lt;/p&gt;
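
&lt;p&gt;If the per-AZ duplication feels heavy, the same pattern can probably be expressed once with for_each. This is only a sketch under assumptions: it reuses aws_vpc.main, aws_subnet.public_a and aws_subnet.public_b from the listing above, while the resource names nat and private and the local public_subnet_ids are hypothetical. Moving existing resources to for_each changes their addresses, so an already deployed stack would need moved blocks or state moves first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;# Sketch only: one NAT Gateway and one private Route Table per AZ via for_each
locals {
  # Hypothetical map: AZ suffix to the public subnet hosting that AZ's NAT Gateway
  public_subnet_ids = {
    a = aws_subnet.public_a.id
    b = aws_subnet.public_b.id
  }
}

resource "aws_eip" "nat" {
  for_each = local.public_subnet_ids
  domain   = "vpc"
}

resource "aws_nat_gateway" "nat" {
  for_each      = local.public_subnet_ids
  allocation_id = aws_eip.nat[each.key].id
  subnet_id     = each.value

  tags = { Name = "nat-gateway-az-${each.key}" }
}

# One private Route Table per NAT Gateway, so each AZ keeps its own default route
resource "aws_route_table" "private" {
  for_each = aws_nat_gateway.nat
  vpc_id   = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = each.value.id
  }

  tags = { Name = "private-rt-az-${each.key}" }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With this shape, adding a third AZ means adding one entry to the map rather than copying three resource blocks.&lt;/p&gt;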

&lt;h3&gt;
  
  
  The Broader Lesson: Designing for Failure
&lt;/h3&gt;

&lt;p&gt;AWS high availability is built on one fundamental principle:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Assume everything will fail. Design so that the failure of any single component does not bring down the entire system.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A NAT Gateway is a component. An Availability Zone is a failure domain. When you route cross-AZ traffic through a single NAT Gateway, you create an invisible dependency that violates this principle, and the worst part is that &lt;strong&gt;everything looks fine until the moment it isn't.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The AWS Well-Architected Framework's Reliability Pillar specifically calls for eliminating Single Points of Failure. A shared NAT Gateway is a textbook SPOF, easy to miss precisely because the architecture looks correct at first glance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A NAT Gateway is &lt;strong&gt;zonal&lt;/strong&gt;: it belongs to one specific Availability Zone&lt;/li&gt;
&lt;li&gt;Routing all private subnet traffic through a single NAT Gateway creates a &lt;strong&gt;hidden Single Point of Failure&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The fix: &lt;strong&gt;one NAT Gateway per AZ, one Route Table per AZ&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Terraform&lt;/strong&gt; to enforce this pattern by design and prevent infrastructure drift&lt;/li&gt;
&lt;li&gt;The cost of a second NAT Gateway (~$32/month extra, before data processing charges) is small compared to the cost of an outage&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This article is part of my AWS Solutions Architect Associate (SAA-C03) preparation series. I document real hands-on lab experiences, architecture challenges, and lessons learned along the way.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow along for more practical AWS architecture and Infrastructure as Code content.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>terraform</category>
      <category>devops</category>
      <category>sre</category>
    </item>
    <item>
      <title>Stop Putting Everything in One Terraform State: Use Terragrunt Dependency Blocks</title>
      <dc:creator>Jeancy Joachim Mukaka</dc:creator>
      <pubDate>Wed, 29 Apr 2026 15:40:49 +0000</pubDate>
      <link>https://dev.to/jeancy/stop-putting-everything-in-one-terraform-state-use-terragrunt-dependency-blocks-1lhl</link>
      <guid>https://dev.to/jeancy/stop-putting-everything-in-one-terraform-state-use-terragrunt-dependency-blocks-1lhl</guid>
      <description>&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Before getting started, make sure you have the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basic knowledge of Terraform (HCL syntax, resources, variables, remote state)&lt;/li&gt;
&lt;li&gt;Terraform &amp;gt;= 1.11 installed - &lt;a href="https://developer.hashicorp.com/terraform/install" rel="noopener noreferrer"&gt;Download&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Terragrunt installed - &lt;a href="https://terragrunt.gruntwork.io/docs/getting-started/quick-start/#install-terragrunt" rel="noopener noreferrer"&gt;Installation guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;AWS CLI configured with sufficient permissions to create S3 buckets and EC2 instances&lt;/li&gt;
&lt;li&gt;Visual Studio Code with the &lt;a href="https://marketplace.visualstudio.com/items?itemName=HashiCorp.terraform" rel="noopener noreferrer"&gt;HashiCorp Terraform extension&lt;/a&gt; for syntax highlighting and autocompletion&lt;/li&gt;
&lt;li&gt;Read Part 1 of this series: &lt;a href="https://dev.to/jeancy/stop-copy-pasting-terraform-state-configs-use-terragrunt-instead-ana"&gt;Stop Copy-Pasting Terraform State Configs: Use Terragrunt Instead&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In Part 1 of this series, we saw how Terragrunt eliminates the repetition of remote state backend configurations across environments. If you haven't read it yet, I recommend starting there: &lt;a href="https://dev.to/jeancy/stop-copy-pasting-terraform-state-configs-use-terragrunt-instead-ana"&gt;Stop Copy-Pasting Terraform State Configs: Use Terragrunt Instead&lt;/a&gt;.&lt;br&gt;
Today, we go one step further.&lt;br&gt;
Most Terraform projects start the same way: everything in one state file. Your VPC, your security groups, your EC2 instances, your RDS database, all managed together. It feels simple and convenient at first. But as your infrastructure grows, this approach becomes a hidden risk.&lt;br&gt;
Imagine this: a developer runs terraform apply to redeploy an EC2 instance that is rebuilt multiple times a day. Because everything is in the same state file, that single command now has access to your VPC configuration, your production database, and your security groups, resources that should never be touched during a routine EC2 redeployment.&lt;br&gt;
One wrong move, one bad variable, one interrupted apply, and you could accidentally destroy or corrupt critical infrastructure that takes hours to rebuild.&lt;br&gt;
In this article, we'll explore how Terragrunt's dependency blocks allow you to split your Terraform state between infrastructure components, so that frequently changed resources never put your critical infrastructure at risk.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Problem: One State File to Rule Them All
&lt;/h2&gt;

&lt;p&gt;When everything lives in a single Terraform state file, your infrastructure looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;single state file
├── VPC                ← modified once a month
├── Subnets            ← modified once a month
├── Security Groups    ← modified occasionally
├── RDS Database       ← critical, rarely modified
└── EC2 Instances      ← modified 10x per day
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every terraform apply, no matter how small, touches this single state file. This creates three serious problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Problem 1: &lt;em&gt;Blast radius&lt;/em&gt;: If something goes wrong during a routine EC2 redeployment, the entire state file is at risk. A corrupted state means Terraform loses track of all your resources: VPC, database, everything.&lt;/li&gt;
&lt;li&gt;Problem 2: &lt;em&gt;No separation of concerns&lt;/em&gt;: A junior developer redeploying an EC2 instance has the same Terraform access as a senior engineer modifying the VPC. There is no natural boundary between critical and non-critical infrastructure.&lt;/li&gt;
&lt;li&gt;Problem 3: &lt;em&gt;Slow operations&lt;/em&gt;: As your infrastructure grows, Terraform has to refresh the state of every single resource on every terraform plan or terraform apply, even if you're only changing one EC2 instance. This makes operations increasingly slow.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The solution is to split your state between infrastructure components, and Terragrunt dependency blocks make this both simple and elegant.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Separate States with Dependency Blocks
&lt;/h2&gt;

&lt;p&gt;Instead of one monolithic state file, Terragrunt allows you to give each infrastructure component its own isolated state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vpc/                    ← state 1 — modified rarely
security-groups/        ← state 2 — modified occasionally
rds/                    ← state 3 — critical, rarely modified
ec2/                    ← state 4 — modified daily
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each component lives in its own folder with its own terragrunt.hcl file and its own state file in S3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;s3://my-terraform-state/
├── dev/vpc/terraform.tfstate
├── dev/security-groups/terraform.tfstate
├── dev/rds/terraform.tfstate
└── dev/ec2/terraform.tfstate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
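
&lt;p&gt;One way to arrive at this layout, as a minimal sketch: the root terragrunt.hcl can derive each component's state key from its folder path with path_relative_to_include(). The bucket name matches the listing above; the region and the generated file name backend.tf are assumptions, and the root file from Part 1 may be configured differently.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# terragrunt.hcl (root) - sketch, the values below are assumptions

remote_state {
  backend = "s3"

  # Write a backend.tf into each component so Terraform picks up the backend
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket  = "my-terraform-state"
    # For dev/vpc/terragrunt.hcl this resolves to "dev/vpc/terraform.tfstate"
    key     = "${path_relative_to_include()}/terraform.tfstate"
    region  = "us-west-2"
    encrypt = true
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the key comes from the folder path, adding a new component folder gives it a new state file with no extra backend configuration.&lt;/p&gt;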



&lt;p&gt;Now when a developer runs terraform apply on the EC2 component, only the EC2 state is touched. The VPC, the database, and the security groups are completely isolated and protected.&lt;br&gt;
&lt;strong&gt;But here's the challenge&lt;/strong&gt;: if components are separated, how does the EC2 module know the subnet ID from the VPC module? How does the security group know the VPC ID?&lt;br&gt;
This is where Terragrunt's dependency block comes in.&lt;br&gt;
The dependency block allows a component to &lt;strong&gt;read the outputs of another component&lt;/strong&gt; without sharing the same state file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ec2/terragrunt.hcl&lt;/span&gt;

&lt;span class="nx"&gt;include&lt;/span&gt; &lt;span class="s2"&gt;"root"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;find_in_parent_folders&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Declare dependency on VPC component&lt;/span&gt;
&lt;span class="nx"&gt;dependency&lt;/span&gt; &lt;span class="s2"&gt;"vpc"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;config_path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"../vpc"&lt;/span&gt;

  &lt;span class="nx"&gt;mock_outputs&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;subnet_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"subnet-00000000"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Declare dependency on security groups component&lt;/span&gt;
&lt;span class="nx"&gt;dependency&lt;/span&gt; &lt;span class="s2"&gt;"security_groups"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;config_path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"../security-groups"&lt;/span&gt;

  &lt;span class="nx"&gt;mock_outputs&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;sg_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"sg-00000000"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Use outputs from dependencies as inputs&lt;/span&gt;
&lt;span class="nx"&gt;inputs&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;dependency&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subnet_id&lt;/span&gt;
  &lt;span class="nx"&gt;security_group&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;dependency&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;security_groups&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sg_id&lt;/span&gt;
  &lt;span class="nx"&gt;instance_type&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"t2.micro"&lt;/span&gt;
  &lt;span class="nx"&gt;environment&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dev"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things to notice here:&lt;br&gt;
First, config_path points to the folder of the dependency, not a specific file. Terragrunt reads the outputs from that component's state.&lt;br&gt;
Second, mock_outputs provides fake values for when you run terragrunt plan before the dependencies have been deployed. This allows you to validate your configuration before deploying anything.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Complete Project Structure
&lt;/h2&gt;

&lt;p&gt;Here is the complete project structure for a dev environment with separated state files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;project/
├── terragrunt.hcl                    # Root — remote state defined once
└── dev/
    ├── vpc/
    │   ├── terragrunt.hcl
    │   └── main.tf                   # VPC + Subnets
    ├── security-groups/
    │   ├── terragrunt.hcl
    │   └── main.tf                   # Security Groups
    ├── rds/
    │   ├── terragrunt.hcl
    │   └── main.tf                   # RDS Database
    └── ec2/
        ├── terragrunt.hcl
        └── main.tf                   # EC2 Instances
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each component exposes its key values through Terraform outputs, which are then consumed by dependent components via the dependency block.&lt;br&gt;
Here is how the dependency chain flows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vpc/
  └── outputs: vpc_id, subnet_id
        │
        ├─────────────────────────┐
        ▼                         ▼
security-groups/                 ec2/
  inputs: vpc_id            inputs: subnet_id
  outputs: sg_id                  │
        │                         │
        └─────────────────────────┘
                    ▼
                  ec2/
             inputs: sg_id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you run a run-all command, Terragrunt reads this dependency graph automatically and deploys components in the correct order: VPC first, then security groups, then EC2. You never have to think about the deployment order manually.&lt;br&gt;
The VPC component is the simplest because it has no dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# dev/vpc/terragrunt.hcl&lt;/span&gt;

&lt;span class="nx"&gt;include&lt;/span&gt; &lt;span class="s2"&gt;"root"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;find_in_parent_folders&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;inputs&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;environment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dev"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_cidr&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"10.0.0.0/16"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And its main.tf exposes the values that other components need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="c1"&gt;# dev/vpc/main.tf&lt;/span&gt;

&lt;span class="k"&gt;output&lt;/span&gt; &lt;span class="s2"&gt;"vpc_id"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;output&lt;/span&gt; &lt;span class="s2"&gt;"subnet_id"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The outputs are what the dependency block reads when EC2 asks for dependency.vpc.outputs.subnet_id.&lt;/p&gt;
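
&lt;p&gt;By the same pattern, the security-groups component could consume vpc_id. The following is a sketch rather than the repository's actual file: the input names vpc_id and environment are assumed to match variables declared in that component's main.tf, and the mock value is a placeholder.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# dev/security-groups/terragrunt.hcl (sketch)

include "root" {
  path = find_in_parent_folders()
}

# Read the VPC component's outputs without sharing its state file
dependency "vpc" {
  config_path = "../vpc"

  # Placeholder used only while the VPC has no real outputs yet
  mock_outputs = {
    vpc_id = "vpc-00000000"
  }
}

inputs = {
  vpc_id      = dependency.vpc.outputs.vpc_id
  environment = "dev"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
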

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;The full code for this project structure is available in the &lt;a href="https://github.com/JM01lab/aws-terragrunt-examples/tree/main/part2-component-isolation/dev" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Deploying with Dependency Blocks
&lt;/h2&gt;

&lt;p&gt;Once your structure is in place, deploying is as simple as one command from the dev/ folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terragrunt run-all apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Terragrunt automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reads the dependency graph across all components&lt;/li&gt;
&lt;li&gt;Deploys in the correct order — VPC → Security Groups → EC2&lt;/li&gt;
&lt;li&gt;Passes outputs from one component as inputs to the next&lt;/li&gt;
&lt;li&gt;Creates a separate state file in S3 for each component&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can also target individual components:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Redeploy only EC2 — VPC and Security Groups are untouched&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;dev/ec2/
terragrunt apply

&lt;span class="c"&gt;# Check outputs of a specific component&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;dev/vpc/
terragrunt output

&lt;span class="c"&gt;# Destroy in reverse order automatically&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;dev/
terragrunt run-all destroy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Notice the power here&lt;/strong&gt;: when you run terragrunt apply in dev/ec2/ only, Terraform writes only to the EC2 state file. The other components are read solely for their outputs, so your VPC and database state files stay safe even if something goes wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  The mock_outputs - Why They Matter
&lt;/h3&gt;

&lt;p&gt;When you run terragrunt plan on the EC2 component before the VPC is deployed, Terragrunt needs values for subnet_id and sg_id to validate the configuration. Since the real values don't exist yet, mock_outputs provides temporary placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;dependency&lt;/span&gt; &lt;span class="s2"&gt;"vpc"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;config_path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"../vpc"&lt;/span&gt;

  &lt;span class="nx"&gt;mock_outputs&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;subnet_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"subnet-00000000"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;mock_outputs_allowed_terraform_commands&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"validate"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"plan"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mock_outputs_allowed_terraform_commands parameter ensures that mock values are only used during validate and plan, never during apply. This prevents accidental deployments with fake values.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A Note on Security&lt;/strong&gt;&lt;br&gt;
Before wrapping up, a quick but important note on security, raised by &lt;a href="https://dev.to/sqlxpert"&gt;Paul Marcelin&lt;/a&gt; in the comments of Part 1.&lt;br&gt;
When your state files are separated by component, you have a natural opportunity to apply different IAM permissions per component. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developers can have read/write access to the EC2 state file&lt;/li&gt;
&lt;li&gt;Only senior engineers or CI/CD pipelines can access the VPC and RDS state files&lt;/li&gt;
&lt;li&gt;Production state files can be encrypted with dedicated KMS keys per environment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a significant security improvement over a single state file where everyone has access to everything. Separating state files is the first step; securing them with IAM policies and KMS encryption is the natural next step.&lt;br&gt;
For a deep dive on Terraform state file security, I recommend this LinkedIn post by Yaroslav Naumenko: &lt;a href="https://www.linkedin.com/posts/ynaumenko_terraform-terragrunt-iac-share-7439240151249231872-OMca?utm_source=social_share_send&amp;amp;utm_medium=member_desktop_web&amp;amp;rcm=ACoAADjV-nEBBut00mj1Y619QEAPOvT8nA_vQb8" rel="noopener noreferrer"&gt;"Your Terraform state file is a secret. Most teams don't treat it that way."&lt;/a&gt;&lt;/p&gt;
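
&lt;p&gt;As a rough illustration of per-component permissions, an IAM policy for people who only redeploy EC2 might look like the sketch below. The bucket and key names follow the S3 layout shown earlier; the policy and statement names are hypothetical. It assumes that whoever applies EC2 also needs read-only access to the VPC and security-groups state files so that dependency outputs can be resolved, and it leaves out the extra permissions that state locking requires.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;# Sketch only: read/write on the EC2 state, read-only on upstream states
data "aws_iam_policy_document" "ec2_state_access" {
  statement {
    sid       = "ReadWriteEc2State"
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["arn:aws:s3:::my-terraform-state/dev/ec2/terraform.tfstate"]
  }

  # Needed so dependency blocks can read vpc and security-groups outputs
  statement {
    sid     = "ReadUpstreamStates"
    actions = ["s3:GetObject"]
    resources = [
      "arn:aws:s3:::my-terraform-state/dev/vpc/terraform.tfstate",
      "arn:aws:s3:::my-terraform-state/dev/security-groups/terraform.tfstate",
    ]
  }

  statement {
    sid       = "ListStateBucket"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::my-terraform-state"]
  }
}

# Hypothetical policy name; attach it to the developers' role or group
resource "aws_iam_policy" "ec2_deployer" {
  name   = "dev-ec2-state-access"
  policy = data.aws_iam_policy_document.ec2_state_access.json
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Nothing in this policy grants access to dev/rds/terraform.tfstate, so the database state stays out of reach for routine EC2 deployments.&lt;/p&gt;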
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Let's recap what we covered in this article:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The problem: a monolithic state file creates a dangerous blast radius where routine operations can accidentally affect critical infrastructure&lt;/li&gt;
&lt;li&gt;The solution: Terragrunt dependency blocks allow each component to have its own isolated state file&lt;/li&gt;
&lt;li&gt;The dependency block reads outputs from other components without sharing their state&lt;/li&gt;
&lt;li&gt;mock_outputs allow you to validate configurations before dependencies are deployed&lt;/li&gt;
&lt;li&gt;terragrunt run-all apply automatically respects the dependency order&lt;/li&gt;
&lt;li&gt;Separating state files is also the foundation for better security: different IAM permissions per component&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, Part 1 and Part 2 give you a complete Terragrunt workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Part 1 → One root terragrunt.hcl    = No repeated backend configs
Part 2 → Dependency blocks          = No more monolithic state files
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you found this helpful, share it and follow me for the next article in the series.&lt;br&gt;
The code for this article is available on my &lt;a href="https://github.com/JM01lab/aws-terragrunt-examples/tree/main/part2-component-isolation/dev" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>aws</category>
      <category>devops</category>
      <category>infrastructureascode</category>
    </item>
    <item>
      <title>Stop Copy-Pasting Terraform State Configs: Use Terragrunt instead</title>
      <dc:creator>Jeancy Joachim Mukaka</dc:creator>
      <pubDate>Mon, 13 Apr 2026 14:31:31 +0000</pubDate>
      <link>https://dev.to/jeancy/stop-copy-pasting-terraform-state-configs-use-terragrunt-instead-ana</link>
      <guid>https://dev.to/jeancy/stop-copy-pasting-terraform-state-configs-use-terragrunt-instead-ana</guid>
      <description>&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;&lt;br&gt;
Before getting started, make sure you have the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basic knowledge of Terraform (HCL syntax, resources, variables, remote state); the full prerequisite code is available in the &lt;a href="https://github.com/JM01lab/aws-terraform-infrastructure" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Terraform installed on your machine (v1.10 or higher, since the backend examples use the S3 use_lockfile option)&lt;/li&gt;
&lt;li&gt;Terragrunt installed, check the &lt;a href="https://terragrunt.gruntwork.io/docs/getting-started/install/" rel="noopener noreferrer"&gt;official installation guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;An AWS account with sufficient permissions to create S3 buckets and DynamoDB tables&lt;/li&gt;
&lt;li&gt;AWS CLI configured with your credentials (aws configure)&lt;/li&gt;
&lt;li&gt;Visual Studio Code as your code editor, with the &lt;a href="https://marketplace.visualstudio.com/items?itemName=HashiCorp.terraform" rel="noopener noreferrer"&gt;HashiCorp Terraform extension&lt;/a&gt; for syntax highlighting and autocompletion.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If you have been working with Terraform for a while, you have probably faced this situation: you have a working configuration for your dev environment, and now you need to deploy the same infrastructure to staging and prod. So you copy the folder, update a few values, including the remote state backend configuration, and repeat. It works, but something feels wrong.&lt;br&gt;
That "something" is a violation of the DRY principle: Don't Repeat Yourself. Every time you duplicate your backend configuration, you create a new opportunity for error and a new file to maintain.&lt;br&gt;
In this article, we will explore how Terragrunt solves this problem by allowing you to define your remote state configuration once and reuse it across all your environments.&lt;br&gt;
If you are new to Terraform, I recommend exploring the &lt;a href="https://github.com/JM01lab/aws-terraform-infrastructure" rel="noopener noreferrer"&gt;prerequisite code on GitHub&lt;/a&gt; before diving in.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;The Problem: Repeated Remote State&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When managing multiple environments with Terraform, most developers end up with a structure like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;environments/
├── dev/
│   └── main.tf
├── staging/
│   └── main.tf
└── prod/
    └── main.tf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And inside each main.tf, the same backend block appears with only one line changing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="c1"&gt;# dev/main.tf&lt;/span&gt;
&lt;span class="k"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;backend&lt;/span&gt; &lt;span class="s2"&gt;"s3"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;bucket&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-terraform-state"&lt;/span&gt;
    &lt;span class="nx"&gt;key&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dev/terraform.tfstate"&lt;/span&gt;
    &lt;span class="nx"&gt;region&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-west-2"&lt;/span&gt;
    &lt;span class="nx"&gt;use_lockfile&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; 
    &lt;span class="nx"&gt;encrypt&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same block is then copy-pasted into staging/main.tf and prod/main.tf, with only the key value changing (staging/terraform.tfstate, prod/terraform.tfstate). That's three files containing the same configuration three times. And if you ever need to change the bucket name, the region, or the encryption setting, you have to update every single file manually. This is exactly the kind of repetition that leads to human error and maintenance nightmares.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What is Terragrunt?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Terragrunt is a thin wrapper around Terraform, developed by Gruntwork. It doesn't replace Terraform; it enhances it by providing additional tools to keep your configurations DRY, manageable, and consistent across environments.&lt;br&gt;
Think of it this way: Terraform is the engine, and Terragrunt is the intelligent framework built around it. You still write the same HCL code you know, but Terragrunt handles the repetitive parts for you.&lt;br&gt;
With Terragrunt you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Define your remote state configuration once and reuse it across all environments&lt;/li&gt;
&lt;li&gt;Automatically create your S3 bucket and DynamoDB table if they don't exist&lt;/li&gt;
&lt;li&gt;Deploy multiple environments with a single command&lt;/li&gt;
&lt;li&gt;Keep your codebase clean, readable, and easy to maintain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key concept we'll focus on in this article is the &lt;code&gt;remote_state&lt;/code&gt; block, the feature that eliminates repeated backend configurations across environments.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;The Solution: Centralized Remote State with Terragrunt&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Instead of repeating the backend configuration in every environment, Terragrunt lets you define it once in a root terragrunt.hcl file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;project/
├── terragrunt.hcl        ← defined once here
├── dev/
│   └── terragrunt.hcl    ← only what changes
├── staging/
│   └── terragrunt.hcl    ← only what changes
└── prod/
    └── terragrunt.hcl    ← only what changes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The root terragrunt.hcl contains the full remote state configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# terragrunt.hcl (root)&lt;/span&gt;
&lt;span class="nx"&gt;remote_state&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;backend&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"s3"&lt;/span&gt;

  &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;bucket&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"your-terraform-state-bucket"&lt;/span&gt;
    &lt;span class="nx"&gt;key&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${path_relative_to_include()}/terraform.tfstate"&lt;/span&gt;
    &lt;span class="nx"&gt;region&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-west-2"&lt;/span&gt;
    &lt;span class="nx"&gt;encrypt&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;use_lockfile&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c1"&gt;# Native S3 locking, replaces DynamoDB (Terraform v1.11+)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;generate&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;path&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"backend.tf"&lt;/span&gt;
    &lt;span class="nx"&gt;if_exists&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"overwrite_terragrunt"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; As of Terraform v1.11, DynamoDB-based state &lt;br&gt;
locking is deprecated. This example uses native S3 locking &lt;br&gt;
via &lt;code&gt;use_lockfile = true&lt;/code&gt;. Thanks to Paul Marcelin for &lt;br&gt;
pointing this out!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Each environment file simply inherits from the root. Here is the dev example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# dev/terragrunt.hcl&lt;/span&gt;
&lt;span class="nx"&gt;include&lt;/span&gt; &lt;span class="s2"&gt;"root"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;find_in_parent_folders&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;inputs&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;environment&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dev"&lt;/span&gt;
  &lt;span class="nx"&gt;instance_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"t2.micro"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The staging and prod files follow the exact same structure; only the environment and instance_type values change. That's it. Three environments, three small files, each containing only what is unique to that environment. The backend configuration lives in one place and is never repeated.&lt;br&gt;
&lt;em&gt;The full project structure with all environments is available in the GitHub repository.&lt;/em&gt;&lt;/p&gt;
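
&lt;p&gt;As a rough sketch, the staging file could look like the one below. The &lt;code&gt;t3.small&lt;/code&gt; instance type is only an assumed placeholder; the article itself only states that these two input values differ per environment.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# staging/terragrunt.hcl
include "root" {
  # Same include as dev: pulls in the remote_state block from the root file
  path = find_in_parent_folders()
}

inputs = {
  environment   = "staging"
  instance_type = "t3.small" # assumed value, use whatever staging needs
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Nothing about the backend appears here, which is the point: the only lines that differ from dev are the two inputs.&lt;/p&gt;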
&lt;h3&gt;
  
  
  &lt;strong&gt;Key Terragrunt Functions Explained&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Two functions make all of this possible. Understanding them is the key to mastering Terragrunt.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;find_in_parent_folders()&lt;/strong&gt;
This function automatically searches parent directories for the root terragrunt.hcl file. It allows each environment file to inherit the root configuration without hardcoding the path.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;include&lt;/span&gt; &lt;span class="s2"&gt;"root"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;find_in_parent_folders&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# finds ../terragrunt.hcl automatically&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;No matter how deeply nested your environment folder is, Terragrunt will always find the root configuration.&lt;/p&gt;
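
&lt;p&gt;To illustrate, assume a hypothetical deeper layout such as &lt;code&gt;project/prod/network/terragrunt.hcl&lt;/code&gt; (this folder is not part of the article's repository). The include block stays exactly the same, because the function walks up the directory tree until it finds a terragrunt.hcl:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# prod/network/terragrunt.hcl (hypothetical nested folder)
include "root" {
  # Looks in prod/ first, then in project/, and returns the first
  # terragrunt.hcl it finds on the way up
  path = find_in_parent_folders()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;One caveat with this sketch: the search stops at the first match, so it only resolves to the root file as long as &lt;code&gt;prod/&lt;/code&gt; does not contain a terragrunt.hcl of its own.&lt;/p&gt;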

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;path_relative_to_include()&lt;/strong&gt;
This is the function that makes the state key dynamic. It returns the relative path of the current environment folder from the root.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${path_relative_to_include()}/terraform.tfstate"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Concretely, this means:&lt;br&gt;
&lt;/p&gt;

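&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Environment folder&lt;/th&gt;
&lt;th&gt;Generated state key&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;dev/terraform.tfstate&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;staging/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;staging/terraform.tfstate&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;prod/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;prod/terraform.tfstate&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;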



&lt;p&gt;Each environment automatically gets its own isolated state file in S3, with zero manual configuration.&lt;/p&gt;
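
&lt;p&gt;The same mechanism extends to nested folders. The excerpt below is a minimal sketch of the root configuration with the evaluation spelled out in comments; &lt;code&gt;prod/network/&lt;/code&gt; is a hypothetical folder used only for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# terragrunt.hcl (root), excerpt
remote_state {
  backend = "s3"

  config = {
    bucket = "your-terraform-state-bucket"
    region = "us-west-2"

    # Evaluated once per child file that includes this root file:
    #   dev/terragrunt.hcl           becomes "dev/terraform.tfstate"
    #   prod/network/terragrunt.hcl  becomes "prod/network/terraform.tfstate" (hypothetical)
    key = "${path_relative_to_include()}/terraform.tfstate"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the key mirrors the folder path, moving or renaming an environment folder also changes where Terragrunt looks for its state, so it is worth settling the layout before the first apply.&lt;/p&gt;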

&lt;h4&gt;
  
  
  &lt;strong&gt;The generate Block&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;This block is often overlooked but extremely powerful. It tells Terragrunt to automatically generate a backend.tf file in each environment folder before running Terraform.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;generate&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;path&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"backend.tf"&lt;/span&gt;
  &lt;span class="nx"&gt;if_exists&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"overwrite_terragrunt"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means you never have to manually write a backend.tf file again. Terragrunt generates it for you, every time, with the correct values.&lt;/p&gt;
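
&lt;p&gt;For the dev environment, the generated file should look roughly like the sketch below. The exact formatting and attribute order are assumptions and can vary between Terragrunt versions; Terragrunt also writes its own signature comment at the top of the file, which is omitted here.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;# backend.tf for dev (written by Terragrunt, not meant to be edited by hand)
terraform {
  backend "s3" {
    bucket       = "your-terraform-state-bucket"
    encrypt      = true
    key          = "dev/terraform.tfstate" # path_relative_to_include() already resolved
    region       = "us-west-2"
    use_lockfile = true
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;overwrite_terragrunt&lt;/code&gt; setting tells Terragrunt to overwrite the file only if Terragrunt generated it in the first place; if it finds a hand-written backend.tf at that path, it stops with an error instead of replacing it.&lt;/p&gt;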

&lt;h3&gt;
  
  
  &lt;strong&gt;Deploying All Environments&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Once your configuration is in place, deploying all environments is as simple as running a single command from the root folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Deploy all environments at once&lt;/span&gt;
terragrunt run-all apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Terragrunt will automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Detect all terragrunt.hcl files in subdirectories&lt;/li&gt;
&lt;li&gt;Run terraform init for each environment&lt;/li&gt;
&lt;li&gt;Deploy each environment in the correct order&lt;/li&gt;
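&lt;li&gt;Create the S3 bucket if it doesn't exist yet (plus a DynamoDB lock table, if your configuration still uses one)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can also target a specific environment, or run other commands across all of them:&lt;/p&gt;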

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Deploy only dev&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;dev/
terragrunt apply

&lt;span class="c"&gt;# Check outputs across all environments&lt;/span&gt;
terragrunt run-all output

&lt;span class="c"&gt;# Destroy all environments&lt;/span&gt;
terragrunt run-all destroy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compare this to the old approach where you had to navigate into each folder manually, run terraform init, then terraform apply, and repeat for every environment. With Terragrunt, that entire workflow collapses into one command.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Managing Terraform remote state across multiple environments doesn't have to be painful. With Terragrunt's &lt;code&gt;remote_state&lt;/code&gt; block, &lt;code&gt;find_in_parent_folders()&lt;/code&gt;, and &lt;code&gt;path_relative_to_include()&lt;/code&gt;, you can define your backend configuration once and let Terragrunt handle the rest.&lt;br&gt;
Let's recap what we covered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The problem: repeated backend configuration across environments violates the DRY principle&lt;/li&gt;
&lt;li&gt;The solution: a single root terragrunt.hcl that centralizes the remote state configuration&lt;/li&gt;
&lt;li&gt;The magic functions: &lt;code&gt;find_in_parent_folders()&lt;/code&gt; and &lt;code&gt;path_relative_to_include()&lt;/code&gt; that make everything dynamic&lt;/li&gt;
&lt;li&gt;The power of &lt;code&gt;terragrunt run-all apply&lt;/code&gt;: deploy all environments with one command&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is just the beginning of what Terragrunt can do. In the next article, Part 2, we will go deeper and explore how to split your infrastructure using dependency blocks. You will learn why putting your VPC, your security groups, and your EC2 instances in the same state file is a risk and how to fix it.&lt;/p&gt;

&lt;p&gt;If you found this article helpful, feel free to share it and follow me for Part 2. The code for this article is available on &lt;a href="https://github.com/JM01lab/aws-terragrunt-examples" rel="noopener noreferrer"&gt;my GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>aws</category>
      <category>infrastructureascode</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
