<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Paril Sanghvi</title>
    <description>The latest articles on DEV Community by Paril Sanghvi (@paril_sanghvi_bf1c9276086).</description>
    <link>https://dev.to/paril_sanghvi_bf1c9276086</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3717139%2Fe7ff01b2-f5fa-4d7c-b437-a89c07a3ee91.png</url>
      <title>DEV Community: Paril Sanghvi</title>
      <link>https://dev.to/paril_sanghvi_bf1c9276086</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/paril_sanghvi_bf1c9276086"/>
    <language>en</language>
    <item>
      <title>We Cut AWS Onboarding from 7 Days to 1 Hour with Terragrunt (Here's How)</title>
      <dc:creator>Paril Sanghvi</dc:creator>
      <pubDate>Sat, 17 Jan 2026 21:04:56 +0000</pubDate>
      <link>https://dev.to/paril_sanghvi_bf1c9276086/we-cut-aws-onboarding-from-7-days-to-1-hour-with-terragrunt-heres-how-54f0</link>
      <guid>https://dev.to/paril_sanghvi_bf1c9276086/we-cut-aws-onboarding-from-7-days-to-1-hour-with-terragrunt-heres-how-54f0</guid>
      <description>&lt;p&gt;We re-architected our AWS infrastructure from manual provisioning → Terraform → Terragrunt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The result:&lt;/strong&gt; New environment onboarding dropped from &lt;strong&gt;~7 days to ~1 hour&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But the path there wasn't obvious. We hit real problems that the Terragrunt docs don't warn you about.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Challenge: Multi-Account AWS with Real DR
&lt;/h2&gt;

&lt;p&gt;Our constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each client runs in &lt;strong&gt;their own AWS account&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Dev/UAT/Prod/DR all in the &lt;strong&gt;same account&lt;/strong&gt; (different regions)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2-person platform team&lt;/strong&gt; (optimize for simplicity, not clever automation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actual RTO/RPO targets&lt;/strong&gt; we had to meet (RTO: 1 hour / RPO: &amp;lt;10 seconds)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manual infrastructure didn't scale. Terraform helped, but every new environment meant copying hundreds of lines of config.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Dependency Cycle That Almost Broke Us
&lt;/h2&gt;

&lt;p&gt;Here's a problem the docs don't prepare you for:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;S3 Cross-Region Replication creates a circular dependency:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Source + destination buckets must exist&lt;/li&gt;
&lt;li&gt;IAM policies reference both buckets&lt;/li&gt;
&lt;li&gt;Replication rules reference IAM roles&lt;/li&gt;
&lt;li&gt;Everything references everything 😅&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Terraform's dependency graph just... chokes.&lt;/p&gt;

&lt;p&gt;We had to split resources into "physical" and "logical" layers. I'll show you the exact pattern in the full article.&lt;/p&gt;
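
&lt;p&gt;As a rough illustration of that split (a sketch, not the exact pattern from the full article), a "logical" unit can read the outputs of two "physical" bucket units through Terragrunt &lt;code&gt;dependency&lt;/code&gt; blocks. The directory names, module path, and input/output names below are hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# s3-replication/terragrunt.hcl  (hypothetical "logical" unit)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  # Hypothetical module holding the IAM role, policy, and replication rule.
  source = "${get_repo_root()}/modules/s3-replication"
}

# "Physical" units: each one creates a bucket and nothing else.
dependency "source_bucket" {
  config_path = "../source-bucket"
}

dependency "destination_bucket" {
  # "dr-region" is a placeholder for the DR region folder.
  config_path = "../../../dr-region/s3/destination-bucket"
}

inputs = {
  # Assumes both bucket modules expose a "bucket_arn" output.
  source_bucket_arn      = dependency.source_bucket.outputs.bucket_arn
  destination_bucket_arn = dependency.destination_bucket.outputs.bucket_arn
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the bucket units know nothing about replication, the graph only points one way: buckets first, then the role and rule that reference them.&lt;/p&gt;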




&lt;h2&gt;
  
  
  The "80 GB Cache" Problem
&lt;/h2&gt;

&lt;p&gt;Terragrunt's &lt;code&gt;.terragrunt-cache&lt;/code&gt; grew to &lt;strong&gt;75-80 GB&lt;/strong&gt; in our CI/CD pipeline and crashed build agents.&lt;/p&gt;

&lt;p&gt;The fix wasn't obvious, and it's not in the troubleshooting docs.&lt;/p&gt;
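
&lt;p&gt;One generic mitigation, shown only as a sketch and not necessarily the fix described in the full article, is Terragrunt's &lt;code&gt;download_dir&lt;/code&gt; attribute. It points every unit at one cache location, so a CI job can delete a single directory between runs. The environment variable name and fallback path are hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Root terragrunt.hcl: send all unit caches to one directory.
# CI_TG_CACHE_DIR is a hypothetical variable set by the pipeline;
# the second argument is the fallback when it is not set.
download_dir = get_env("CI_TG_CACHE_DIR", "/tmp/terragrunt-cache")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;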




&lt;h2&gt;
  
  
  What Actually Worked
&lt;/h2&gt;

&lt;p&gt;After 2 weeks of building this (with AI-powered tools to speed things up), here's what we landed on:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Repository Structure That Scales&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Infra/
  terragrunt.hcl              # Root config (DRY logic)
  &amp;lt;client&amp;gt;/
    &amp;lt;env&amp;gt;/
      &amp;lt;region&amp;gt;/
        &amp;lt;resource-type&amp;gt;/
          &amp;lt;resource&amp;gt;/
            terragrunt.hcl    # Leaf config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The folder structure IS the documentation. If you can navigate the filesystem, you understand the infrastructure.&lt;/p&gt;
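
&lt;p&gt;To make that concrete, here is a minimal sketch (assumed, not our actual files) of a root config that derives client, environment, and region from the folder path, plus a leaf that inherits it. The client name, module path, and input names are hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Infra/terragrunt.hcl  (root, sketch only)
locals {
  # For a leaf at Infra/acme/prod/us-east-1/s3/app-data this is
  # "acme/prod/us-east-1/s3/app-data".
  path_parts = split("/", path_relative_to_include())
  client     = local.path_parts[0]
  env        = local.path_parts[1]
  region     = local.path_parts[2]
}

# Merged into every leaf's inputs, so leaves never repeat these values.
inputs = {
  client = local.client
  env    = local.env
  region = local.region
}

# Infra/acme/prod/us-east-1/s3/app-data/terragrunt.hcl  (leaf, sketch only)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  # Hypothetical module location.
  source = "${get_repo_root()}/modules/s3-bucket"
}

inputs = {
  # Only resource-specific values live in the leaf.
  bucket_name = "acme-prod-app-data"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;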

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Pilot-Light DR Strategy&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Keep data warm (always replicating), keep compute cold (spin up on failover).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Always on:&lt;/strong&gt; Aurora Global DB, S3 CRR, networking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provision on DR:&lt;/strong&gt; EKS, OpenSearch, Redis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This balances cost with our 1-hour RTO target.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Skip DynamoDB Locking (For Now)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Controversial take: For a 2-person team, human coordination beats complex locking.&lt;/p&gt;

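&lt;p&gt;We enforced "one set of hands on infra" by agreement. Without a lock, two concurrent applies can corrupt state, so revisit this if the team grows or CI starts running applies in parallel.&lt;/p&gt;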
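
&lt;p&gt;For reference, turning locking on later is a one-line change in the S3 backend config. A sketch of a root &lt;code&gt;remote_state&lt;/code&gt; block, with hypothetical bucket, table, and region names:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Root terragrunt.hcl (sketch only)
remote_state {
  backend = "s3"

  # Writes backend.tf into each unit so modules need no backend block.
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket  = "example-tfstate-bucket"   # hypothetical
    key     = "${path_relative_to_include()}/terraform.tfstate"
    region  = "us-east-1"                # hypothetical
    encrypt = true

    # Leave this out to run without locking; add it to enable DynamoDB locking.
    dynamodb_table = "example-tf-locks"  # hypothetical
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;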




&lt;h2&gt;
  
  
  The Lessons I Wish I Knew Earlier
&lt;/h2&gt;

&lt;p&gt;❌ &lt;strong&gt;Don't modularize too early.&lt;/strong&gt; Use Terraform first to eliminate variance between environments, then Terragrunt to eliminate duplication.&lt;/p&gt;

&lt;p&gt;❌ &lt;strong&gt;Don't ignore circular dependencies in DR.&lt;/strong&gt; S3 CRR and Aurora Global DB will break your dependency graph. You need layers.&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Do test your DR regularly.&lt;/strong&gt; We ran drills every 6 months. Untested DR is just a hypothesis.&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Do design for your team size.&lt;/strong&gt; 2 people need clarity &amp;gt; automation complexity.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Full Architecture Breakdown
&lt;/h2&gt;

&lt;p&gt;I wrote a detailed deep dive covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The exact Terragrunt folder structure we use&lt;/li&gt;
&lt;li&gt;How root inheritance keeps configs DRY (with code examples)&lt;/li&gt;
&lt;li&gt;The dependency cycle solutions for S3 CRR and Aurora Global DB&lt;/li&gt;
&lt;li&gt;Multi-region DR architecture (what stays on, what spins up)&lt;/li&gt;
&lt;li&gt;State management decisions (and why we skipped DynamoDB locking)&lt;/li&gt;
&lt;li&gt;The CI/CD cache cleanup fix that saved us&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 &lt;a href="https://www.parilsanghvi.in/blog/terragrunt-aws-dr" rel="noopener noreferrer"&gt;&lt;strong&gt;Read the full article here&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Hit a Terragrunt challenge yourself?&lt;/strong&gt; Drop it in the comments. I'll share what worked (or didn't) for us.&lt;/p&gt;

&lt;p&gt;If this was useful, follow me for the next article: &lt;strong&gt;AWS/Azure cost optimization strategies&lt;/strong&gt; - how discount percentages translate to actual runtime hours (spoiler: Azure licensing between D-series and B-series is a trap).&lt;/p&gt;

</description>
      <category>aws</category>
      <category>terraform</category>
      <category>devops</category>
      <category>infrastructure</category>
    </item>
  </channel>
</rss>
