<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: TerraformMonkey</title>
    <description>The latest articles on DEV Community by TerraformMonkey (@terraformmonkey).</description>
    <link>https://dev.to/terraformmonkey</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3130291%2F8f98e1fa-d172-4ec0-9804-194570f70eda.png</url>
      <title>DEV Community: TerraformMonkey</title>
      <link>https://dev.to/terraformmonkey</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/terraformmonkey"/>
    <language>en</language>
    <item>
      <title>Azure Disaster Recovery: Why Backup and Failover Aren’t Enough</title>
      <dc:creator>TerraformMonkey</dc:creator>
      <pubDate>Fri, 15 May 2026 09:13:00 +0000</pubDate>
      <link>https://dev.to/terraformmonkey/azure-disaster-recovery-why-backup-and-failover-arent-enough-28k9</link>
      <guid>https://dev.to/terraformmonkey/azure-disaster-recovery-why-backup-and-failover-arent-enough-28k9</guid>
      <description>&lt;h1&gt;
  
  
  Azure Disaster Recovery: Why Backup and Failover Aren’t Enough
&lt;/h1&gt;

&lt;p&gt;Azure disaster recovery is more than keeping workloads alive.&lt;/p&gt;

&lt;p&gt;Yes, workload recovery matters. But a complete Azure disaster recovery strategy also needs to restore the full operating environment around those workloads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Applications&lt;/li&gt;
&lt;li&gt;Data&lt;/li&gt;
&lt;li&gt;Identities&lt;/li&gt;
&lt;li&gt;Networks&lt;/li&gt;
&lt;li&gt;Permissions&lt;/li&gt;
&lt;li&gt;Routing&lt;/li&gt;
&lt;li&gt;Infrastructure configurations&lt;/li&gt;
&lt;li&gt;Governance controls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because when a disaster hits, recovering a VM or restoring a database is only part of the story.&lt;/p&gt;

&lt;p&gt;If the app comes back online but users cannot authenticate, traffic cannot route, policies block deployments, or permissions are missing, you are still not recovered.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/image-link-placeholder" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/image-link-placeholder" alt="Layered Azure disaster recovery architecture" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR ⚡
&lt;/h2&gt;

&lt;p&gt;Azure disaster recovery should cover more than backup and failover.&lt;/p&gt;

&lt;p&gt;Azure provides strong DR building blocks, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Azure Regions&lt;/li&gt;
&lt;li&gt;Availability Zones&lt;/li&gt;
&lt;li&gt;Storage redundancy&lt;/li&gt;
&lt;li&gt;Azure Site Recovery&lt;/li&gt;
&lt;li&gt;Azure Backup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But backup and failover alone do not fully restore the cloud environment.&lt;/p&gt;

&lt;p&gt;Teams also need a way to restore governance controls, network paths, IAM models, and infrastructure configuration within acceptable Recovery Time Objective (RTO) and Recovery Point Objective (RPO) targets.&lt;/p&gt;

&lt;p&gt;That is where configuration disaster recovery becomes critical.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Azure Handles Disaster Recovery 🧱
&lt;/h2&gt;

&lt;p&gt;Disaster recovery is not just about restoring data.&lt;/p&gt;

&lt;p&gt;When something breaks, the business also needs to restore the systems that make the environment usable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Networks that route traffic&lt;/li&gt;
&lt;li&gt;Identities that authenticate users and workloads&lt;/li&gt;
&lt;li&gt;Permissions that allow teams to act&lt;/li&gt;
&lt;li&gt;Infrastructure that reflects the last known working state&lt;/li&gt;
&lt;li&gt;Security and governance policies that keep the environment compliant&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Azure provides the platform-level resilience: Microsoft is responsible for keeping Azure’s underlying cloud platform healthy and available.&lt;/p&gt;

&lt;p&gt;But customers are responsible for designing, protecting, and restoring their own workloads, configurations, access models, and cloud architecture.&lt;/p&gt;

&lt;p&gt;That shared responsibility is where many Azure disaster recovery plans become incomplete.&lt;/p&gt;

&lt;h2&gt;
  
  
  RTO and RPO: The Two Metrics That Shape DR Strategy 📊
&lt;/h2&gt;

&lt;p&gt;Two metrics define how effective a disaster recovery strategy really is:&lt;/p&gt;

&lt;h3&gt;
  
  
  Recovery Time Objective
&lt;/h3&gt;

&lt;p&gt;RTO defines how quickly your system needs to recover after a disruption.&lt;/p&gt;

&lt;p&gt;In simple terms:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How much downtime can the business tolerate?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Recovery Point Objective
&lt;/h3&gt;

&lt;p&gt;RPO defines how much data loss is acceptable.&lt;/p&gt;

&lt;p&gt;In simple terms:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How far back in time can you restore without causing unacceptable damage?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The lower your RTO and RPO, the more advanced and costly your DR strategy usually becomes.&lt;/p&gt;

&lt;p&gt;For example, an airline reservation system cannot afford long downtime. Every second matters. That kind of system may require active failover, multi-region replication, and continuous testing.&lt;/p&gt;

&lt;p&gt;A reporting system may be different. If reports are unavailable for a few hours, the business may tolerate it. In that case, a backup-and-restore model may be enough.&lt;/p&gt;

&lt;p&gt;The key is matching the recovery model to the business impact.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Missing Layer: Infrastructure Configuration Recovery 🧩
&lt;/h2&gt;

&lt;p&gt;Data recovery is not enough if the infrastructure around the data is broken.&lt;/p&gt;

&lt;p&gt;Before restoring workloads and data, teams often need to restore the infrastructure configuration that makes the environment functional.&lt;/p&gt;

&lt;p&gt;That includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IAM roles and permissions&lt;/li&gt;
&lt;li&gt;Network security groups&lt;/li&gt;
&lt;li&gt;Route tables&lt;/li&gt;
&lt;li&gt;Private networking&lt;/li&gt;
&lt;li&gt;DNS records&lt;/li&gt;
&lt;li&gt;Policies&lt;/li&gt;
&lt;li&gt;Resource groups&lt;/li&gt;
&lt;li&gt;SaaS and third-party configuration&lt;/li&gt;
&lt;li&gt;Terraform state and cloud resource definitions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where ControlMonkey fits into Azure disaster recovery.&lt;/p&gt;

&lt;p&gt;ControlMonkey continuously tracks Azure cloud resources, automatically generates Terraform code, detects drift, and enables rollback to a known stable state.&lt;/p&gt;

&lt;p&gt;In other words, it adds configuration recovery to Azure disaster recovery.&lt;/p&gt;
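
&lt;p&gt;As a rough illustration of what configuration captured as code looks like, here is a minimal Terraform sketch of a resource group. The names, region, and tags are hypothetical, not the output of any specific tool:&lt;/p&gt;

```hcl
# Hypothetical example of a resource group captured as Terraform.
# With this definition under version control, the group, its location,
# and its tags can be re-created after accidental deletion.
resource "azurerm_resource_group" "app" {
  name     = "rg-app-prod" # assumed name
  location = "westeurope"  # assumed region

  tags = {
    environment = "production"
    recovery    = "tier-1"
  }
}
```

&lt;p&gt;Multiply this by every network rule, role assignment, and policy in the environment, and the value of keeping the representation current becomes clear.&lt;/p&gt;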

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/image-link-placeholder" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/image-link-placeholder" alt="Azure configuration disaster recovery workflow" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Azure Disaster Recovery Architecture: Redundancy as Risk Management 🏗️
&lt;/h2&gt;

&lt;p&gt;The first step in building a strong Azure disaster recovery architecture is understanding the building blocks Azure provides.&lt;/p&gt;

&lt;p&gt;These include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regions&lt;/li&gt;
&lt;li&gt;Availability Zones&lt;/li&gt;
&lt;li&gt;Storage redundancy&lt;/li&gt;
&lt;li&gt;Cross-region recovery capabilities&lt;/li&gt;
&lt;li&gt;Backup and restore services&lt;/li&gt;
&lt;li&gt;Failover orchestration services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Azure organizes its physical infrastructure into logical resilience layers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Availability Zones
&lt;/h3&gt;

&lt;p&gt;An Availability Zone contains one or more datacenters with independent power, cooling, and networking.&lt;/p&gt;

&lt;p&gt;If one zone fails, workloads can continue operating in another zone, assuming the application was designed for zone-level resilience.&lt;/p&gt;
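
&lt;p&gt;A minimal sketch of zone-level resilience in Terraform, using the azurerm provider (all names and the region are assumptions for illustration):&lt;/p&gt;

```hcl
# Sketch: a Standard-SKU public IP spread across all three zones,
# so the address survives the loss of any single Availability Zone.
resource "azurerm_public_ip" "app" {
  name                = "pip-app-prod" # assumed name
  resource_group_name = "rg-app-prod"  # assumed resource group
  location            = "westeurope"   # assumed region
  allocation_method   = "Static"
  sku                 = "Standard"     # zone redundancy requires the Standard SKU
  zones               = ["1", "2", "3"]
}
```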

&lt;h3&gt;
  
  
  Regions
&lt;/h3&gt;

&lt;p&gt;An Azure Region contains multiple datacenters and may include multiple Availability Zones.&lt;/p&gt;

&lt;p&gt;For high-availability systems, teams often design workloads across zones or across regions, depending on the business requirement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cross-Region Resilience
&lt;/h3&gt;

&lt;p&gt;For larger disruptions, a zonal design is not enough.&lt;/p&gt;

&lt;p&gt;Cross-region architecture helps protect against broader outages. Some Azure services support paired regions, geo-replication, or geo-redundancy. Others require more manual architecture decisions.&lt;/p&gt;

&lt;p&gt;The key point:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Zonal design protects against local failure. Cross-region design protects against larger regional disruption.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Storage Redundancy and Backup 💾
&lt;/h2&gt;

&lt;p&gt;Azure Storage supports several redundancy options, including local, zone, and geo-redundant replication.&lt;/p&gt;
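
&lt;p&gt;In Terraform, the redundancy choice is a single attribute on the storage account. A hedged sketch (account and group names are hypothetical):&lt;/p&gt;

```hcl
# Sketch: a storage account using geo-zone-redundant storage (GZRS),
# which replicates data across zones in the primary region and
# asynchronously to the paired region.
resource "azurerm_storage_account" "backup" {
  name                     = "stbackupprod001" # assumed; must be globally unique
  resource_group_name      = "rg-app-prod"     # assumed
  location                 = "westeurope"      # assumed
  account_tier             = "Standard"
  account_replication_type = "GZRS" # alternatives include "LRS", "ZRS", "GRS", "RAGRS"
}
```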

&lt;p&gt;Azure Backup provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Backup policies&lt;/li&gt;
&lt;li&gt;Retention policies&lt;/li&gt;
&lt;li&gt;Recovery points&lt;/li&gt;
&lt;li&gt;Point-in-time restore workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are essential for protecting data.&lt;/p&gt;

&lt;p&gt;But durable data copies do not guarantee that the full workload can be restored into a working, governed environment.&lt;/p&gt;

&lt;p&gt;If the data restores but the surrounding cloud configuration is missing or broken, recovery is still incomplete.&lt;/p&gt;

&lt;h2&gt;
  
  
  Existing Azure Disaster Recovery Solutions 🔁
&lt;/h2&gt;

&lt;p&gt;Azure provides several built-in disaster recovery services. Two of the most important are Azure Site Recovery and Azure Backup.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Azure Site Recovery?
&lt;/h2&gt;

&lt;p&gt;Azure Site Recovery is a managed disaster recovery service that replicates workloads and orchestrates failover and failback.&lt;/p&gt;

&lt;p&gt;It supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Azure VM replication&lt;/li&gt;
&lt;li&gt;On-premises to Azure recovery&lt;/li&gt;
&lt;li&gt;Recovery plans&lt;/li&gt;
&lt;li&gt;Test failovers&lt;/li&gt;
&lt;li&gt;Failback workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Azure Site Recovery is useful for warm or hot recovery patterns where speed matters and the cost of replication is acceptable.&lt;/p&gt;
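
&lt;p&gt;The anchor resource for Site Recovery (and Azure Backup) is a Recovery Services vault, typically placed in the secondary region. A minimal Terraform sketch, with assumed names and region:&lt;/p&gt;

```hcl
# Sketch: a Recovery Services vault in a secondary region. Replication
# policies and replicated VMs attach to a vault like this one.
resource "azurerm_recovery_services_vault" "dr" {
  name                = "rsv-dr-prod"  # assumed name
  resource_group_name = "rg-dr-prod"   # assumed
  location            = "northeurope"  # assumed secondary region
  sku                 = "Standard"
  soft_delete_enabled = true
}
```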

&lt;p&gt;But Site Recovery mainly focuses on workload replication.&lt;/p&gt;

&lt;p&gt;It does not fully capture the surrounding cloud configuration, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IAM policies&lt;/li&gt;
&lt;li&gt;Network setups&lt;/li&gt;
&lt;li&gt;Routing rules&lt;/li&gt;
&lt;li&gt;Governance policies&lt;/li&gt;
&lt;li&gt;Cloud resource configuration drift&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means a workload may fail over successfully but still land in an incomplete environment.&lt;/p&gt;

&lt;p&gt;ControlMonkey helps close this gap by capturing the configuration layer that replication alone does not cover.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Azure Backup?
&lt;/h2&gt;

&lt;p&gt;Azure Backup is a cloud-based backup and recovery service for supported Azure workloads.&lt;/p&gt;

&lt;p&gt;It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Backup policies&lt;/li&gt;
&lt;li&gt;Retention&lt;/li&gt;
&lt;li&gt;Recovery points&lt;/li&gt;
&lt;li&gt;Snapshot-based restore&lt;/li&gt;
&lt;li&gt;Protection against data loss, corruption, and ransomware scenarios&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Azure Backup is especially useful for cold restore scenarios and data protection.&lt;/p&gt;
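
&lt;p&gt;Backup policy settings are where RPO becomes concrete: a daily backup frequency implies an RPO of up to 24 hours for the protected VMs. A hedged Terraform sketch (names and the vault are assumptions):&lt;/p&gt;

```hcl
# Sketch: a daily VM backup policy with 30 days of retention.
# Daily frequency means up to 24 hours of data loss (the effective RPO).
resource "azurerm_backup_policy_vm" "daily" {
  name                = "bkpol-vm-daily" # assumed name
  resource_group_name = "rg-dr-prod"     # assumed
  recovery_vault_name = "rsv-dr-prod"    # assumed existing vault

  backup {
    frequency = "Daily"
    time      = "23:00"
  }

  retention_daily {
    count = 30
  }
}
```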

&lt;p&gt;But backups protect data, not the full operating environment.&lt;/p&gt;

&lt;p&gt;A backup snapshot usually does not include the IAM model, network paths, routing configuration, SaaS dependencies, or governance controls needed to make the restored system fully usable.&lt;/p&gt;

&lt;p&gt;ControlMonkey fills that gap by capturing and versioning cloud infrastructure state, so the full environment can be reconstructed alongside the data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extending Azure Disaster Recovery With ControlMonkey 🐒
&lt;/h2&gt;

&lt;p&gt;ControlMonkey extends Azure disaster recovery into the configuration layer.&lt;/p&gt;

&lt;p&gt;It continuously tracks Azure cloud resource state and helps teams restore infrastructure configuration, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Network settings&lt;/li&gt;
&lt;li&gt;Security settings&lt;/li&gt;
&lt;li&gt;Identity settings&lt;/li&gt;
&lt;li&gt;Resource definitions&lt;/li&gt;
&lt;li&gt;Terraform-based infrastructure representation&lt;/li&gt;
&lt;li&gt;Drifted or deleted configurations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is the difference between traditional Azure DR and configuration-aware DR with ControlMonkey:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;ControlMonkey&lt;/th&gt;
&lt;th&gt;Traditional Azure DR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Primary focus&lt;/td&gt;
&lt;td&gt;Configuration and environment recovery&lt;/td&gt;
&lt;td&gt;Workload and data recovery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resource discovery&lt;/td&gt;
&lt;td&gt;Continuous discovery of Azure resources&lt;/td&gt;
&lt;td&gt;Often manual or partial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IaC representation&lt;/td&gt;
&lt;td&gt;Real environment converted into Terraform&lt;/td&gt;
&lt;td&gt;Repository-based and may be outdated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rollback&lt;/td&gt;
&lt;td&gt;Snapshot-based rollback&lt;/td&gt;
&lt;td&gt;Often manual restoration steps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Drift visibility&lt;/td&gt;
&lt;td&gt;Yes, across subscriptions&lt;/td&gt;
&lt;td&gt;Limited or none&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recovery outcome&lt;/td&gt;
&lt;td&gt;Complete, governed, reproducible environment&lt;/td&gt;
&lt;td&gt;Workloads may recover, but environment rebuild can remain manual&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;ControlMonkey acts as an infrastructure recovery control plane for Azure disaster recovery.&lt;/p&gt;

&lt;p&gt;It continuously discovers Azure resources, generates Terraform from real environments, detects configuration drift, and enables rollback to a known reliable state.&lt;/p&gt;

&lt;p&gt;This changes what failover means.&lt;/p&gt;

&lt;p&gt;It is no longer only about redirecting traffic or bringing workloads back online.&lt;/p&gt;

&lt;p&gt;It is about restoring a complete environment that is reproducible, governed, and operational.&lt;/p&gt;

&lt;h2&gt;
  
  
  Disaster Recovery Scenarios in Azure 🚨
&lt;/h2&gt;

&lt;p&gt;Different workloads need different recovery strategies.&lt;/p&gt;

&lt;p&gt;An internal tool, customer-facing application, financial system, and compliance-sensitive production environment should not all have the same DR model.&lt;/p&gt;

&lt;p&gt;Here are several common Azure disaster recovery scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Backup-Based Recovery
&lt;/h2&gt;

&lt;p&gt;Backup-based recovery is typically a cold restore model.&lt;/p&gt;

&lt;p&gt;After a disaster, teams restore data from backup and then rebuild or fix the infrastructure configuration around it.&lt;/p&gt;

&lt;p&gt;This is usually the most cost-effective option, but also the slowest.&lt;/p&gt;

&lt;p&gt;It works best for workloads where the business can tolerate more relaxed RTO and RPO targets, meaning longer downtime and more potential data loss, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Internal tools&lt;/li&gt;
&lt;li&gt;Development environments&lt;/li&gt;
&lt;li&gt;Archival systems&lt;/li&gt;
&lt;li&gt;Non-critical reporting systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The risk is that infrastructure configuration may still require manual restoration.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Replication-Based Disaster Recovery
&lt;/h2&gt;

&lt;p&gt;Replication-based DR uses warm or hot standby environments.&lt;/p&gt;

&lt;p&gt;Workloads are replicated to another Azure region or recovery target, allowing faster failover.&lt;/p&gt;

&lt;p&gt;This reduces RTO and RPO, but it also increases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cost&lt;/li&gt;
&lt;li&gt;Operational complexity&lt;/li&gt;
&lt;li&gt;Testing requirements&lt;/li&gt;
&lt;li&gt;Monitoring requirements&lt;/li&gt;
&lt;li&gt;Architecture complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Azure Site Recovery is commonly used for this model.&lt;/p&gt;

&lt;p&gt;This approach is stronger than basic backup and restore, but it still needs configuration recovery to ensure the failover environment is actually functional.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Active-Active Resilience
&lt;/h2&gt;

&lt;p&gt;In an active-active architecture, workloads operate across multiple active environments at the same time.&lt;/p&gt;

&lt;p&gt;This model helps support near-zero downtime and is often used for mission-critical systems where even a short outage can cause significant business damage.&lt;/p&gt;

&lt;p&gt;Active-active resilience is powerful, but it requires careful design around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traffic routing&lt;/li&gt;
&lt;li&gt;Data consistency&lt;/li&gt;
&lt;li&gt;Identity&lt;/li&gt;
&lt;li&gt;Networking&lt;/li&gt;
&lt;li&gt;Failover behavior&lt;/li&gt;
&lt;li&gt;Regional dependencies&lt;/li&gt;
&lt;li&gt;Cost management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is not just an infrastructure decision. It is a business continuity decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Full Region or Subscription Failure
&lt;/h2&gt;

&lt;p&gt;Some failures are bigger than a single resource or workload.&lt;/p&gt;

&lt;p&gt;An Azure Region issue or subscription-level access problem can disrupt many services at once.&lt;/p&gt;

&lt;p&gt;That is why local redundancy is not enough for mission-critical systems.&lt;/p&gt;

&lt;p&gt;Teams need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cross-region recovery paths&lt;/li&gt;
&lt;li&gt;Dependency maps&lt;/li&gt;
&lt;li&gt;Repeatable infrastructure restoration&lt;/li&gt;
&lt;li&gt;Restorable permissions&lt;/li&gt;
&lt;li&gt;Recovery environments that are tested before a crisis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the recovery region exists but the infrastructure configuration is incomplete, the failover can still fail.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Control Plane Failure and Configuration Loss
&lt;/h2&gt;

&lt;p&gt;Not every disaster affects the data plane.&lt;/p&gt;

&lt;p&gt;Sometimes the data is intact, but the surrounding environment is damaged.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A resource group is deleted&lt;/li&gt;
&lt;li&gt;A policy blocks deployments&lt;/li&gt;
&lt;li&gt;Route tables are misconfigured&lt;/li&gt;
&lt;li&gt;Role assignments disappear&lt;/li&gt;
&lt;li&gt;Network rules are changed&lt;/li&gt;
&lt;li&gt;Terraform state no longer matches reality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These incidents can create partial recovery states.&lt;/p&gt;

&lt;p&gt;On the surface, resources may appear available. But once users or systems try to do real work, the environment fails.&lt;/p&gt;

&lt;p&gt;That is why any serious Azure disaster recovery strategy should include configuration recovery.&lt;/p&gt;
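
&lt;p&gt;Configuration recovery is easiest to picture with network rules. If a rule like the one below lives in Terraform, an out-of-band deletion or change can be reverted by re-applying the definition (names and addresses here are hypothetical):&lt;/p&gt;

```hcl
# Sketch: a security rule kept as code. If someone deletes or edits it
# in the portal, re-applying this definition restores the intended state.
resource "azurerm_network_security_group" "app" {
  name                = "nsg-app-prod" # assumed name
  location            = "westeurope"   # assumed region
  resource_group_name = "rg-app-prod"  # assumed

  security_rule {
    name                       = "allow-https-inbound"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "443"
    source_address_prefix      = "Internet"
    destination_address_prefix = "*"
  }
}
```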

&lt;p&gt;For teams building a broader resilience strategy, Azure disaster recovery should also connect to &lt;a href="https://controlmonkey.io/solution/cyber-resilience-solution/" rel="noopener noreferrer"&gt;cyber resilience&lt;/a&gt; planning. Recovery is not only about outages; it is also about restoring trusted infrastructure after misconfigurations, ransomware, unauthorized changes, or control-plane incidents.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Compliance, Audit, and Regulatory Pressure
&lt;/h2&gt;

&lt;p&gt;Disaster recovery is not only an operations issue.&lt;/p&gt;

&lt;p&gt;For regulated teams, it is also a compliance issue.&lt;/p&gt;

&lt;p&gt;Auditors often expect evidence of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recovery procedures&lt;/li&gt;
&lt;li&gt;Backup coverage&lt;/li&gt;
&lt;li&gt;Tested restore records&lt;/li&gt;
&lt;li&gt;Change logs&lt;/li&gt;
&lt;li&gt;Recovery actions&lt;/li&gt;
&lt;li&gt;Access controls&lt;/li&gt;
&lt;li&gt;Governance enforcement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A static recovery plan in a wiki is not enough.&lt;/p&gt;

&lt;p&gt;Teams need evidence that recovery works.&lt;/p&gt;

&lt;p&gt;That evidence becomes weaker when infrastructure state is not recorded and environments are rebuilt manually.&lt;/p&gt;

&lt;p&gt;In cloud environments, recovery readiness and audit readiness are becoming the same conversation.&lt;/p&gt;

&lt;p&gt;If you cannot prove recoverability, you may also have a compliance gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Hybrid Dependencies and Identity Risk
&lt;/h2&gt;

&lt;p&gt;Many Azure recovery failures come from outside the core application stack.&lt;/p&gt;

&lt;p&gt;The application may restore, but dependencies around it may fail.&lt;/p&gt;

&lt;p&gt;Common examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identity services&lt;/li&gt;
&lt;li&gt;Certificates&lt;/li&gt;
&lt;li&gt;Key Vault access&lt;/li&gt;
&lt;li&gt;Private networking&lt;/li&gt;
&lt;li&gt;VPN connectivity&lt;/li&gt;
&lt;li&gt;ExpressRoute&lt;/li&gt;
&lt;li&gt;On-prem integrations&lt;/li&gt;
&lt;li&gt;Third-party dependencies&lt;/li&gt;
&lt;li&gt;SaaS configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where many DR plans fall short.&lt;/p&gt;

&lt;p&gt;Teams plan around compute and storage, but treat identity and networking as secondary details.&lt;/p&gt;

&lt;p&gt;Then during recovery, the application boots but cannot authenticate. Or it passes health checks but cannot connect to a downstream service.&lt;/p&gt;

&lt;p&gt;Azure disaster recovery needs to treat identity, networking, and dependency mapping as core recovery layers.&lt;/p&gt;

&lt;p&gt;Not as an appendix.&lt;/p&gt;

&lt;h2&gt;
  
  
  Azure Disaster Recovery Architecture With ControlMonkey Embedded 🧠
&lt;/h2&gt;

&lt;p&gt;At enterprise scale, the strongest Azure disaster recovery architecture is layered.&lt;/p&gt;

&lt;p&gt;Azure-native services handle workload and data recovery:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Azure Site Recovery handles replication and orchestrated failover&lt;/li&gt;
&lt;li&gt;Azure Backup protects recovery points and restore paths&lt;/li&gt;
&lt;li&gt;Regions and Availability Zones improve resilience&lt;/li&gt;
&lt;li&gt;Storage redundancy protects data availability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the surrounding environment also needs protection.&lt;/p&gt;

&lt;p&gt;That includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identity&lt;/li&gt;
&lt;li&gt;Networking&lt;/li&gt;
&lt;li&gt;Permissions&lt;/li&gt;
&lt;li&gt;Policies&lt;/li&gt;
&lt;li&gt;Resource configuration&lt;/li&gt;
&lt;li&gt;Terraform representation&lt;/li&gt;
&lt;li&gt;Drift visibility&lt;/li&gt;
&lt;li&gt;Rollback capability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ControlMonkey adds this missing layer.&lt;/p&gt;

&lt;p&gt;It provides configuration backup, drift detection, rollback, and reproducible infrastructure recovery.&lt;/p&gt;

&lt;p&gt;The mature Azure DR model looks like this:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
text
Azure Regions + Availability Zones
        ↓
Storage Redundancy + Azure Backup
        ↓
Azure Site Recovery + Failover Plans
        ↓
Identity + Network Dependency Mapping
        ↓
ControlMonkey Configuration Recovery
        ↓
Complete, Governed, Reproducible Recovery

The mature Azure DR model looks like this:

~~~text
Azure Regions + Availability Zones
        ↓
Storage Redundancy + Azure Backup
        ↓
Azure Site Recovery + Failover Plans
        ↓
Identity + Network Dependency Mapping
        ↓
ControlMonkey Configuration Recovery
        ↓
Complete, Governed, Reproducible Recovery
~~~

That is how cloud recovery actually works.

Workloads must recover.

Data must recover.

And the environment around them must recover too.

If you are evaluating [cloud DR products](https://controlmonkey.io/solution/disaster-recovery-solution/), make sure configuration recovery is part of the checklist. Backup and failover matter, but teams also need to restore IAM, networking, policies, routing, and infrastructure state.

## Final Thought 💡

Azure disaster recovery cannot stop at backup and failover.

Those are essential, but they are not enough on their own.

If the recovered environment is missing permissions, routing, policies, identity access, or infrastructure configuration, the business is still exposed.

The real goal is not just to bring workloads back online.

The real goal is to restore a complete, governed, and operational cloud environment.

That is why configuration recovery needs to be part of every serious Azure disaster recovery strategy.

&amp;gt; 💬 How does your team handle Azure configuration recovery today? Is it automated, documented, or still mostly manual? Let’s discuss in the comments.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>azure</category>
      <category>disasterrecovery</category>
      <category>tutorial</category>
      <category>devops</category>
    </item>
    <item>
      <title>☁️ What Is Cloud Disaster Recovery?</title>
      <dc:creator>TerraformMonkey</dc:creator>
      <pubDate>Thu, 14 May 2026 13:08:29 +0000</pubDate>
      <link>https://dev.to/terraformmonkey/what-is-cloud-disaster-recovery-38o0</link>
      <guid>https://dev.to/terraformmonkey/what-is-cloud-disaster-recovery-38o0</guid>
      <description>&lt;p&gt;Cloud disaster recovery is the process of restoring cloud workloads after a failure so the business can keep running.&lt;/p&gt;

&lt;p&gt;But cloud DR is not only about restoring data.&lt;/p&gt;

&lt;p&gt;To bring an application back online, teams also need to recover the infrastructure configuration, permissions, DNS, networking, service dependencies, and control-plane settings that allow the workload to run again.&lt;/p&gt;

&lt;p&gt;A backup may restore your database.&lt;/p&gt;

&lt;p&gt;But if IAM roles, routes, DNS records, secrets, or cloud configurations are missing or misconfigured, the application can still stay down.&lt;/p&gt;

&lt;p&gt;That is why modern cloud disaster recovery needs to cover both data recovery and infrastructure recovery.&lt;/p&gt;
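
&lt;p&gt;DNS is a simple example of infrastructure recovery: if records are defined as code, a deleted entry can be re-applied rather than reconstructed from memory. A minimal Terraform sketch (zone, record, and IP are hypothetical):&lt;/p&gt;

```hcl
# Sketch: a DNS record kept as code. Re-applying this definition
# restores the record even if it was deleted during an incident.
resource "azurerm_dns_a_record" "app" {
  name                = "app"             # assumed record name
  zone_name           = "example.com"     # assumed DNS zone
  resource_group_name = "rg-dns-prod"     # assumed
  ttl                 = 300
  records             = ["203.0.113.10"]  # documentation-range IP, illustrative
}
```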




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Cloud disaster recovery helps teams restore cloud workloads after failures.&lt;/p&gt;

&lt;p&gt;It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data&lt;/li&gt;
&lt;li&gt;Infrastructure configuration&lt;/li&gt;
&lt;li&gt;IAM and permissions&lt;/li&gt;
&lt;li&gt;DNS&lt;/li&gt;
&lt;li&gt;Networking&lt;/li&gt;
&lt;li&gt;Service dependencies&lt;/li&gt;
&lt;li&gt;External control-layer services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Backups are important, but they do not guarantee recovery.&lt;/p&gt;

&lt;p&gt;A restore can still fail if the surrounding cloud configuration is missing, outdated, or drifted from the intended state.&lt;/p&gt;

&lt;p&gt;This is where infrastructure recovery becomes critical.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚨 Why Cloud Disaster Recovery Matters
&lt;/h2&gt;

&lt;p&gt;Traditional disaster recovery was built around backup sites, extra hardware, and manual runbooks.&lt;/p&gt;

&lt;p&gt;Cloud environments are different.&lt;/p&gt;

&lt;p&gt;Modern workloads depend on many moving parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IAM roles&lt;/li&gt;
&lt;li&gt;DNS records&lt;/li&gt;
&lt;li&gt;Network routes&lt;/li&gt;
&lt;li&gt;Load balancers&lt;/li&gt;
&lt;li&gt;Secrets&lt;/li&gt;
&lt;li&gt;Queues&lt;/li&gt;
&lt;li&gt;Cloud service configurations&lt;/li&gt;
&lt;li&gt;Automation accounts&lt;/li&gt;
&lt;li&gt;Third-party control-plane services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A cloud incident does not always start with a full outage.&lt;/p&gt;

&lt;p&gt;It can start with a bad configuration, a deleted DNS record, a permissions change, or an automated process making changes at scale.&lt;/p&gt;

&lt;p&gt;That means recovery is no longer just a platform issue. It is also a governance, compliance, and cyber resilience issue.&lt;/p&gt;

&lt;p&gt;For teams looking to strengthen this layer, ControlMonkey’s &lt;a href="https://controlmonkey.io/solution/cyber-resilience-solution/" rel="noopener noreferrer"&gt;Cyber resilience solution&lt;/a&gt; helps recover known-good infrastructure configurations and improve cloud recovery readiness.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 Backups Are Not Enough
&lt;/h2&gt;

&lt;p&gt;One of the biggest mistakes in cloud disaster recovery is assuming that backups solve the whole problem.&lt;/p&gt;

&lt;p&gt;They do not.&lt;/p&gt;

&lt;p&gt;Backups protect data.&lt;/p&gt;

&lt;p&gt;But workloads also depend on configuration.&lt;/p&gt;

&lt;p&gt;Your data may be restored successfully, while the application still fails because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The IAM role was changed&lt;/li&gt;
&lt;li&gt;The DNS record was deleted&lt;/li&gt;
&lt;li&gt;A security group blocks traffic&lt;/li&gt;
&lt;li&gt;A required secret is missing&lt;/li&gt;
&lt;li&gt;The route table is wrong&lt;/li&gt;
&lt;li&gt;The load balancer points to the wrong target&lt;/li&gt;
&lt;li&gt;Live infrastructure no longer matches Terraform&lt;/li&gt;
&lt;li&gt;Manual changes created drift&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where many DR plans break.&lt;/p&gt;

&lt;p&gt;The data exists.&lt;/p&gt;

&lt;p&gt;The workload still cannot run.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚙️ How Cloud Disaster Recovery Works
&lt;/h2&gt;

&lt;p&gt;Cloud DR usually combines several recovery methods:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Main Purpose&lt;/th&gt;
&lt;th&gt;Recovery Speed&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Complexity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Backup&lt;/td&gt;
&lt;td&gt;Durable copy for later restore&lt;/td&gt;
&lt;td&gt;Slowest&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Snapshot&lt;/td&gt;
&lt;td&gt;Point-in-time state capture&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replication&lt;/td&gt;
&lt;td&gt;Keep a secondary copy close to current state&lt;/td&gt;
&lt;td&gt;Fastest&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Backups, snapshots, and replication solve different problems.&lt;/p&gt;

&lt;p&gt;The right strategy depends on business impact, recovery targets, and how quickly the workload needs to return to service.&lt;/p&gt;

&lt;p&gt;But none of these methods fully solves infrastructure configuration recovery on its own.&lt;/p&gt;

&lt;p&gt;That is why cloud DR also needs visibility into the live environment, dependency mapping, drift detection, and rollback to known-good states.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧩 The Hidden Gap: Infrastructure Configuration
&lt;/h2&gt;

&lt;p&gt;As cloud environments grow across accounts, regions, teams, and unmanaged resources, recovery starts depending on tribal knowledge.&lt;/p&gt;

&lt;p&gt;That does not hold up well during an incident.&lt;/p&gt;

&lt;p&gt;A workload may depend on hundreds of configuration details that are not part of a database backup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IAM policies&lt;/li&gt;
&lt;li&gt;Role trust relationships&lt;/li&gt;
&lt;li&gt;DNS records&lt;/li&gt;
&lt;li&gt;CDN settings&lt;/li&gt;
&lt;li&gt;Network routes&lt;/li&gt;
&lt;li&gt;Firewall rules&lt;/li&gt;
&lt;li&gt;Kubernetes settings&lt;/li&gt;
&lt;li&gt;Observability alerts&lt;/li&gt;
&lt;li&gt;SaaS control-plane configurations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If these are missing or out of sync, recovery becomes manual, slow, and risky.&lt;/p&gt;

&lt;p&gt;ControlMonkey focuses on this gap by helping teams capture infrastructure state, roll back to known-good configurations, and improve recovery coverage across cloud environments.&lt;/p&gt;




&lt;h2&gt;
  
  
  ✅ Testing Cloud DR: Prove Recoverability
&lt;/h2&gt;

&lt;p&gt;A disaster recovery plan is only useful if it has been tested.&lt;/p&gt;

&lt;p&gt;Without testing, teams usually discover gaps during the actual incident.&lt;/p&gt;

&lt;p&gt;A better approach is to restore into a separate environment and validate the full recovery path.&lt;/p&gt;

&lt;p&gt;That means checking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can the workload start?&lt;/li&gt;
&lt;li&gt;Are dependencies available?&lt;/li&gt;
&lt;li&gt;Are secrets accessible?&lt;/li&gt;
&lt;li&gt;Are IAM permissions correct?&lt;/li&gt;
&lt;li&gt;Does DNS resolve correctly?&lt;/li&gt;
&lt;li&gt;Does traffic flow as expected?&lt;/li&gt;
&lt;li&gt;Does restored infrastructure match the intended state?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Testing gives engineering leaders and auditors what they actually need: verified recovery coverage, measured recovery time, known gaps, and evidence that recovery is controlled.&lt;/p&gt;
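&lt;p&gt;These checks are easy to script so every restore test produces the same evidence. A minimal sketch with placeholder probes (the &lt;code&gt;true&lt;/code&gt;/&lt;code&gt;false&lt;/code&gt; commands stand in for real health checks such as curling an endpoint or resolving a DNS name):&lt;/p&gt;

```shell
# Generic check runner; swap each placeholder for a real probe
# (e.g. curl a health endpoint, dig a DNS name, call sts get-caller-identity)
check() {
  name=$1; shift
  if "$@"; then echo "PASS: $name"; else echo "FAIL: $name"; fi
}

check "workload starts"  true    # placeholder probe
check "DNS resolves"     true    # placeholder probe
check "IAM permissions"  false   # placeholder simulating a failed check
```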




&lt;h2&gt;
  
  
  🔁 Failover and Failback
&lt;/h2&gt;

&lt;p&gt;Failover is not just turning systems back on.&lt;/p&gt;

&lt;p&gt;Cloud workloads need to be restored in the right order.&lt;/p&gt;

&lt;p&gt;DNS may update before dependencies are ready. A service may come online before its permissions exist. A workload may start before the network path is complete.&lt;/p&gt;

&lt;p&gt;Small ordering mistakes can turn a short outage into a long one.&lt;/p&gt;
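&lt;p&gt;Restore ordering can be computed instead of remembered. As a hedged sketch with hypothetical resource names, coreutils &lt;code&gt;tsort&lt;/code&gt; turns pairwise dependencies into a safe bring-up order:&lt;/p&gt;

```shell
# Each line reads "left must be restored before right" (hypothetical resources)
printf 'network iam\nnetwork workload\niam workload\nworkload dns\n' > restore-deps.txt

# tsort emits an order that honors every dependency:
# network, iam, workload, dns
tsort restore-deps.txt
```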

&lt;p&gt;Failback can be even harder.&lt;/p&gt;

&lt;p&gt;During an incident, teams often make emergency fixes. Data moves. Permissions change. Manual workarounds appear.&lt;/p&gt;

&lt;p&gt;To return to the primary environment safely, teams need to decide what the source of truth is and remove incident shortcuts before they become permanent drift.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏢 Cloud DR vs Traditional DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Traditional DR&lt;/th&gt;
&lt;th&gt;Cloud DR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure model&lt;/td&gt;
&lt;td&gt;Duplicate hardware and facilities&lt;/td&gt;
&lt;td&gt;Elastic cloud capacity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recovery work&lt;/td&gt;
&lt;td&gt;Manual procedures&lt;/td&gt;
&lt;td&gt;Automation and orchestration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Testing cadence&lt;/td&gt;
&lt;td&gt;Often infrequent&lt;/td&gt;
&lt;td&gt;Easier to test more often&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Drift risk&lt;/td&gt;
&lt;td&gt;Lower change velocity&lt;/td&gt;
&lt;td&gt;Higher change velocity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost model&lt;/td&gt;
&lt;td&gt;High fixed cost&lt;/td&gt;
&lt;td&gt;Variable operating cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Restore scope&lt;/td&gt;
&lt;td&gt;Systems and data center assets&lt;/td&gt;
&lt;td&gt;Data, infra config, identity, networking, and control plane&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Cloud DR gives teams more flexibility.&lt;/p&gt;

&lt;p&gt;But it also increases the need for visibility, automation, and configuration control.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Where ControlMonkey Fits
&lt;/h2&gt;

&lt;p&gt;ControlMonkey helps teams recover cloud infrastructure configurations across environments such as AWS, Azure, GCP, Cloudflare, Okta, and selected third-party platforms.&lt;/p&gt;

&lt;p&gt;This matters because many production incidents are configuration incidents.&lt;/p&gt;

&lt;p&gt;A workload can break because of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A bad IAM policy&lt;/li&gt;
&lt;li&gt;A deleted DNS record&lt;/li&gt;
&lt;li&gt;A wrong route&lt;/li&gt;
&lt;li&gt;A missing edge setting&lt;/li&gt;
&lt;li&gt;A drifted security group&lt;/li&gt;
&lt;li&gt;A rushed manual fix&lt;/li&gt;
&lt;li&gt;An unmanaged cloud resource&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your recovery plan only restores data, your team may still need to rebuild the rest of the environment under pressure.&lt;/p&gt;

&lt;p&gt;ControlMonkey helps teams improve cloud disaster recovery with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Terraform-based infrastructure snapshots&lt;/li&gt;
&lt;li&gt;Rollback to known-good states&lt;/li&gt;
&lt;li&gt;Drift visibility&lt;/li&gt;
&lt;li&gt;Recovery coverage visibility&lt;/li&gt;
&lt;li&gt;Better alignment between cloud reality and IaC&lt;/li&gt;
&lt;li&gt;Audit-ready recovery evidence&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  📋 Cloud DR and Compliance
&lt;/h2&gt;

&lt;p&gt;Cloud disaster recovery becomes especially important when teams need to prove readiness.&lt;/p&gt;

&lt;p&gt;The question is no longer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Can we recover?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What can we recover, from where, by whom, how fast, and with what proof?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Compliance teams need evidence of tested restore procedures, recovery ownership, infrastructure state history, and known gaps.&lt;/p&gt;

&lt;p&gt;That is why cloud DR should be treated as part of cyber resilience, not just backup operations.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Cloud disaster recovery is not just about restoring data.&lt;/p&gt;

&lt;p&gt;It is about restoring the full cloud environment required to run the business.&lt;/p&gt;

&lt;p&gt;That includes infrastructure configuration, permissions, network paths, DNS, dependencies, and control-plane services.&lt;/p&gt;

&lt;p&gt;Backups help preserve data.&lt;/p&gt;

&lt;p&gt;Infrastructure recovery helps bring the workload back online.&lt;/p&gt;

&lt;p&gt;That is the difference.&lt;/p&gt;




&lt;h2&gt;
  
  
  💬 Discussion
&lt;/h2&gt;

&lt;p&gt;How does your team test cloud disaster recovery today?&lt;/p&gt;

&lt;p&gt;Do you validate only data recovery, or do you also test IAM, DNS, networking, Terraform drift, and configuration dependencies?&lt;/p&gt;

&lt;p&gt;Let’s discuss in the comments.&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>clouddisaster</category>
      <category>outage</category>
      <category>programming</category>
    </item>
    <item>
      <title>Affected by the AWS Outage? 5 Things to Do Tomorrow for Your Cloud Resilience ⚡</title>
      <dc:creator>TerraformMonkey</dc:creator>
      <pubDate>Wed, 26 Nov 2025 20:19:42 +0000</pubDate>
      <link>https://dev.to/terraformmonkey/affected-by-the-aws-outage-5-things-to-do-tomorrow-for-your-cloud-resilience-2emb</link>
      <guid>https://dev.to/terraformmonkey/affected-by-the-aws-outage-5-things-to-do-tomorrow-for-your-cloud-resilience-2emb</guid>
      <description>&lt;p&gt;In a recent large-scale AWS outage, more than &lt;strong&gt;6.5 million disruption reports&lt;/strong&gt; were logged across banks, airlines, AI companies, and apps like Snapchat and Fortnite.  &lt;/p&gt;

&lt;p&gt;Root cause: a malfunction in &lt;strong&gt;AWS’s EC2 network monitoring subsystem&lt;/strong&gt; that cascaded across multiple regions.&lt;/p&gt;

&lt;p&gt;For DevOps and cloud teams, this wasn’t “a few minutes of downtime.”&lt;br&gt;&lt;br&gt;
It was a blunt reminder:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Disaster Recovery isn’t just about data.&lt;br&gt;&lt;br&gt;
Real &lt;strong&gt;cloud&lt;/strong&gt; disaster recovery means protecting your &lt;strong&gt;entire configuration&lt;/strong&gt; — infrastructure, policies, and dependencies — not just storage.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When configuration breaks, recovery breaks with it.&lt;/p&gt;

&lt;p&gt;This post walks through &lt;strong&gt;five things you can do tomorrow&lt;/strong&gt; to harden cloud resilience — not just data recovery, but &lt;strong&gt;fast configuration recovery&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  1. 🔍 Audit What You &lt;em&gt;Really&lt;/em&gt; Run
&lt;/h2&gt;

&lt;p&gt;Start with visibility.&lt;/p&gt;

&lt;p&gt;Use tools like the &lt;strong&gt;AWS Well-Architected Tool&lt;/strong&gt; to baseline your setup and map the resources your workloads depend on — across services, regions, and integrations.&lt;/p&gt;

&lt;p&gt;Questions to answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which regions host your most critical workloads?&lt;/li&gt;
&lt;li&gt;Do you have single-region choke points?&lt;/li&gt;
&lt;li&gt;Are there “shadow” or untracked resources in production?&lt;/li&gt;
&lt;li&gt;Are staging and test environments included in your DR scope?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many teams found out the hard way that their most sensitive workloads lived in &lt;strong&gt;&lt;code&gt;us-east-1&lt;/code&gt; — the region most impacted in the outage&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Untracked resources become silent risks for any Cloud DR strategy. You can’t protect what you can’t see.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ &lt;strong&gt;Action item:&lt;/strong&gt; Build or refresh a &lt;strong&gt;single source of truth&lt;/strong&gt; for all cloud resources that matter to uptime.&lt;/p&gt;
&lt;/blockquote&gt;
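&lt;p&gt;A quick way to spot single-region choke points is to group that inventory by region. An illustrative sketch over a synthetic inventory file (a real audit would pull this from AWS Config, resource tags, or your CMDB):&lt;/p&gt;

```shell
# Synthetic inventory: "region resource" per line
printf 'us-east-1 payments-db\nus-east-1 api-gateway\nus-east-1 auth-service\neu-west-1 static-assets\n' > inventory.txt

# Count resources per region; a heavy skew toward one region is a choke point
awk '{print $1}' inventory.txt | sort | uniq -c | sort -rn
```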


&lt;h2&gt;
  
  
  2. 🧱 Close the IaC Gap
&lt;/h2&gt;

&lt;p&gt;If you were forced to click around in the AWS console during the outage, it’s a warning sign:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Parts of your environment still live &lt;strong&gt;outside&lt;/strong&gt; Infrastructure as Code.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Typical IaC gaps include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Legacy stacks not migrated
&lt;/li&gt;
&lt;li&gt;ClickOps-created resources
&lt;/li&gt;
&lt;li&gt;“Temporary” patches that became permanent
&lt;/li&gt;
&lt;li&gt;Manually tuned network or security settings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your goal: &lt;strong&gt;minimize the infrastructure that can’t be recreated from code.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Bring those gaps under Terraform or another IaC tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Example: capturing a "previously manual" security group in Terraform&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_security_group"&lt;/span&gt; &lt;span class="s2"&gt;"api_sg"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"api-sg"&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Security group for public API"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_id&lt;/span&gt;

  &lt;span class="nx"&gt;ingress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow HTTPS"&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tcp"&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"0.0.0.0/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;egress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"-1"&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"0.0.0.0/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When everything lives in code, Cloud DR becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Predictable
&lt;/li&gt;
&lt;li&gt;Repeatable
&lt;/li&gt;
&lt;li&gt;Region-agnostic
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  3. 🧪 Run a “Mini AWS Outage” Drill
&lt;/h2&gt;

&lt;p&gt;Don’t wait for the next global event to test your resilience.&lt;/p&gt;

&lt;p&gt;Pick one critical service and simulate a regional failure:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Assume a region is down.
&lt;/li&gt;
&lt;li&gt;Try to bring the service up in an alternate region or environment.
&lt;/li&gt;
&lt;li&gt;Measure:

&lt;ul&gt;
&lt;li&gt;Time to detect
&lt;/li&gt;
&lt;li&gt;Time to fail over
&lt;/li&gt;
&lt;li&gt;Time to full restore
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
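&lt;p&gt;Capturing timestamps during the drill makes those three numbers concrete. A sketch with hypothetical epoch-second values:&lt;/p&gt;

```shell
# Hypothetical timestamps captured during a drill (epoch seconds)
incident_start=1700000000
detected=1700000180        # monitoring alert fired
failed_over=1700000900     # traffic served from the alternate region
fully_restored=1700003600  # all dependencies healthy again

echo "time to detect:       $(( detected - incident_start ))s"
echo "time to fail over:    $(( failed_over - incident_start ))s"
echo "time to full restore: $(( fully_restored - incident_start ))s"
```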

&lt;p&gt;Validate your assumptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did runbooks match reality?
&lt;/li&gt;
&lt;li&gt;Did scripts still work?
&lt;/li&gt;
&lt;li&gt;Were secrets, configs, and dependencies all accessible?
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These drills expose where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automation is missing
&lt;/li&gt;
&lt;li&gt;Documentation is outdated
&lt;/li&gt;
&lt;li&gt;Human steps introduce delays
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ &lt;strong&gt;Action item:&lt;/strong&gt; Schedule a &lt;strong&gt;60–90 min mini-DR drill&lt;/strong&gt; this week for one critical system.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  4. 🌪️ Detect and Eliminate Drift
&lt;/h2&gt;

&lt;p&gt;Every outage reveals hidden &lt;strong&gt;drift&lt;/strong&gt; — when live infra no longer matches your IaC.&lt;/p&gt;

&lt;p&gt;Drift during recovery leads to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Failed redeployments
&lt;/li&gt;
&lt;li&gt;Security inconsistencies
&lt;/li&gt;
&lt;li&gt;Environments behaving unpredictably
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common drift sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hotfixes applied in the AWS console
&lt;/li&gt;
&lt;li&gt;Emergency manual security group changes
&lt;/li&gt;
&lt;li&gt;One-off scripts creating untracked resources
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep code and infra aligned by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Continuously comparing live infra to your IaC
&lt;/li&gt;
&lt;li&gt;Alerting on unmanaged changes
&lt;/li&gt;
&lt;li&gt;Auto-remediating drift when safe
&lt;/li&gt;
&lt;/ul&gt;
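&lt;p&gt;At its core, drift detection is a continuous diff between declared and live configuration. A toy sketch with synthetic files (real tooling compares provider APIs against IaC state):&lt;/p&gt;

```shell
# Declared (from IaC) vs. live (from the cloud API) - synthetic data
printf 'ingress_port=443\ninstance_type=t3.medium\n' > declared.txt
printf 'ingress_port=443\ninstance_type=t3.large\n'  > live.txt

# Any difference is drift; a real pipeline would alert or auto-remediate here
if diff declared.txt live.txt; then
  echo "no drift"
else
  echo "drift detected"
fi
```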

&lt;p&gt;When your code mirrors reality, recovery is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clean
&lt;/li&gt;
&lt;li&gt;Fast
&lt;/li&gt;
&lt;li&gt;Auditable
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  5. ⏪ Automate Daily Snapshots and Recovery Workflows
&lt;/h2&gt;

&lt;p&gt;Traditional backups protect data — &lt;strong&gt;not operations&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For real Cloud DR maturity, you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Daily infrastructure snapshots&lt;/strong&gt; (configs, policies, dependencies)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated rebuild workflows&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Capture Terraform state + config in a central versioned repo
&lt;/li&gt;
&lt;li&gt;Use nightly CI jobs to validate plans
&lt;/li&gt;
&lt;li&gt;Archive validated DR artifacts
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Nightly job example (simplified)&lt;/span&gt;
terraform init
terraform validate
terraform plan &lt;span class="nt"&gt;-out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;nightly.tfplan

&lt;span class="c"&gt;# Archive plan &amp;amp; state for DR artifacts&lt;/span&gt;
&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-czf&lt;/span&gt; dr-artifacts-&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%F&lt;span class="si"&gt;)&lt;/span&gt;.tar.gz &lt;span class="se"&gt;\&lt;/span&gt;
  nightly.tfplan terraform.tfstate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These snapshots are essentially a &lt;strong&gt;cloud time machine&lt;/strong&gt;, enabling quick rebuilds when (not if) outages occur.&lt;/p&gt;




&lt;h2&gt;
  
  
  🌐 Resilience Can’t Depend on One Provider
&lt;/h2&gt;

&lt;p&gt;The AWS outage showed the fragility of shared cloud infrastructure.&lt;/p&gt;

&lt;p&gt;Your systems might depend on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS, Azure, GCP
&lt;/li&gt;
&lt;li&gt;Datadog or other observability tools
&lt;/li&gt;
&lt;li&gt;Cloudflare or other CDNs
&lt;/li&gt;
&lt;li&gt;Managed databases
&lt;/li&gt;
&lt;li&gt;SaaS APIs
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key principles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Avoid single-region and single-AZ designs
&lt;/li&gt;
&lt;li&gt;Understand third-party blast radius
&lt;/li&gt;
&lt;li&gt;Treat DR as &lt;strong&gt;end-to-end&lt;/strong&gt;: infra, data, configs, dependencies
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧠 AWS Outage FAQs for DevOps Teams
&lt;/h2&gt;

&lt;h3&gt;
  
  
  💡 What caused the AWS outage?
&lt;/h3&gt;

&lt;p&gt;A failure in the &lt;strong&gt;EC2 network monitoring subsystem&lt;/strong&gt; disrupted instance communication and caused widespread downtime, especially in &lt;strong&gt;us-east-1&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Always check the official AWS Service Health Dashboard for active incidents.&lt;/p&gt;

&lt;h3&gt;
  
  
  🛡️ How can DevOps teams prepare for the next outage?
&lt;/h3&gt;

&lt;p&gt;A practical playbook includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visibility &amp;amp; audits
&lt;/li&gt;
&lt;li&gt;IaC coverage
&lt;/li&gt;
&lt;li&gt;Drift detection
&lt;/li&gt;
&lt;li&gt;Snapshots &amp;amp; automated recovery
&lt;/li&gt;
&lt;li&gt;Regular DR drills
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  📚 Want to Go Deeper on Cloud Disaster Recovery?
&lt;/h2&gt;

&lt;p&gt;Long-form version of this article:&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://controlmonkey.io/blog/aws-outage-cloud-disaster-recovery/" rel="noopener noreferrer"&gt;https://controlmonkey.io/blog/aws-outage-cloud-disaster-recovery/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Related deep dives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IaC &amp;amp; DR strategy: &lt;a href="https://controlmonkey.io/blog/infra-as-code-critical-aspect-for-your-disaster-recovery-plan/" rel="noopener noreferrer"&gt;https://controlmonkey.io/blog/infra-as-code-critical-aspect-for-your-disaster-recovery-plan/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Business continuity &amp;amp; DR guide: &lt;a href="https://controlmonkey.io/resource/cloud-business-continuity-and-disaster-recovery/" rel="noopener noreferrer"&gt;https://controlmonkey.io/resource/cloud-business-continuity-and-disaster-recovery/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  💬 Let’s Talk: How Are You Preparing for the Next Outage?
&lt;/h2&gt;

&lt;p&gt;Outages are inevitable. Downtime doesn’t have to be.&lt;/p&gt;

&lt;p&gt;I’d love to hear from other DevOps leaders and platform teams:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Have you run a DR drill in the last 6 months?
&lt;/li&gt;
&lt;li&gt;Where does your plan break first — infra, data, or people?
&lt;/li&gt;
&lt;li&gt;What tools or patterns helped the most?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop your lessons learned in the comments 👇&lt;/p&gt;




</description>
      <category>cloud</category>
      <category>aws</category>
      <category>ai</category>
    </item>
    <item>
      <title>Terraform Plan: Your Last Line of Defense Before Infrastructure Changes</title>
      <dc:creator>TerraformMonkey</dc:creator>
      <pubDate>Wed, 26 Nov 2025 20:08:37 +0000</pubDate>
      <link>https://dev.to/terraformmonkey/terraform-plan-your-last-line-of-defense-before-infrastructure-changes-5ge1</link>
      <guid>https://dev.to/terraformmonkey/terraform-plan-your-last-line-of-defense-before-infrastructure-changes-5ge1</guid>
      <description>&lt;p&gt;Terraform &lt;code&gt;plan&lt;/code&gt; is the guardrail between your code and your live infrastructure. Every time you run it, Terraform compares your desired configuration with the current state and shows you &lt;strong&gt;exactly&lt;/strong&gt; what’s going to change — before anything actually happens.&lt;/p&gt;

&lt;p&gt;If you want to avoid destructive changes, catch drift early, and prevent misconfigured variables from sneaking into production, this guide is for you. 🚀&lt;/p&gt;

&lt;p&gt;This post covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How the Terraform plan engine works
&lt;/li&gt;
&lt;li&gt;How to read plan output (add/change/destroy)
&lt;/li&gt;
&lt;li&gt;How to automate plan checks in CI/CD
&lt;/li&gt;
&lt;li&gt;Common flags you'll actually use
&lt;/li&gt;
&lt;li&gt;Real copy/paste examples
&lt;/li&gt;
&lt;li&gt;Team-friendly best practices
&lt;/li&gt;
&lt;li&gt;Bonus: risk-aware reviews with ControlMonkey
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s get into it.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧩 Terraform Plan Basics
&lt;/h2&gt;

&lt;p&gt;Terraform &lt;code&gt;plan&lt;/code&gt; generates an execution plan &lt;strong&gt;without changing any resources&lt;/strong&gt;. It refreshes state (unless disabled), evaluates providers and data sources, compares current vs. desired state, and prints a diff of proposed actions.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔧 Common &lt;code&gt;terraform plan&lt;/code&gt; flags you'll actually use
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-out=plan.tfplan        # Save a binary plan file for apply
-refresh=true|false     # Control state refresh before diff
-var / -var-file        # Pass inputs consistently
-target=addr            # Break-glass-only resource targeting
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  📘 Exit codes with &lt;code&gt;-detailed-exitcode&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0 → No changes  
1 → Error  
2 → Changes present  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;👉 Official reference (recommended): &lt;em&gt;terraform plan command reference&lt;/em&gt;&lt;br&gt;&lt;br&gt;
👉 Also relevant: &lt;a href="https://controlmonkey.io/resource/how-to-use-atlantis-plan/" rel="noopener noreferrer"&gt;How to use Atlantis plan&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  ⚙️ How Terraform Plan Works Under the Hood
&lt;/h2&gt;

&lt;p&gt;Terraform loads your state (local or remote), optionally refreshes it using provider APIs, and computes the diff.&lt;/p&gt;
&lt;h3&gt;
  
  
  Key components involved:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;State&lt;/strong&gt; → Terraform’s source of truth
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Providers&lt;/strong&gt; → Define schemas + CRUD operations
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data sources&lt;/strong&gt; → Read-only lookups executed during the plan
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resources&lt;/strong&gt; → Infrastructure objects that may be created, updated, replaced, or destroyed
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A refresh step pulls the actual state of resources from the provider, and Terraform compares it with what your code declares.&lt;/p&gt;


&lt;h2&gt;
  
  
  🧭 How to Read Terraform Plan Output
&lt;/h2&gt;

&lt;p&gt;Terraform uses clear symbols in the diff:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+   create
-   destroy
~   update in-place
-/+ replace (destroy + create)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A typical summary looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Plan: X to add, Y to change, Z to destroy.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  🚨 Production rule of thumb
&lt;/h3&gt;

&lt;p&gt;Treat &lt;strong&gt;any destroy&lt;/strong&gt; or &lt;strong&gt;any replace (-/+)&lt;/strong&gt; as a red flag that requires a second reviewer.&lt;/p&gt;

&lt;p&gt;Small input changes (e.g., variable tweak, module version update) can cascade into unintended replacements — including databases or network resources.&lt;/p&gt;




&lt;h2&gt;
  
  
  📦 Working With Plan Files + JSON (Automation-Ready)
&lt;/h2&gt;

&lt;p&gt;A recommended workflow is to save the plan, export it to JSON, and run validations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Canonical snippet:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform plan &lt;span class="nt"&gt;-out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;plan.tfplan
terraform show &lt;span class="nt"&gt;-json&lt;/span&gt; plan.tfplan &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; plan.json

&lt;span class="c"&gt;# Count destroys &amp;amp; replaces&lt;/span&gt;
jq &lt;span class="s1"&gt;'[.resource_changes[] | select(.change.actions|index("delete"))] | length'&lt;/span&gt; plan.json
jq &lt;span class="s1"&gt;'[.resource_changes[] | select(.change.actions|index("replace"))] | length'&lt;/span&gt; plan.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What you can automate with JSON:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Block destroys in production unless approved
&lt;/li&gt;
&lt;li&gt;Enforce mandatory tags/owners
&lt;/li&gt;
&lt;li&gt;Fail if predicted cost exceeds budget
&lt;/li&gt;
&lt;li&gt;Annotate PRs with risk indicators
&lt;/li&gt;
&lt;/ul&gt;
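&lt;p&gt;For example, a minimal destroy gate can be a few lines of shell. The plan JSON below is synthetic, standing in for the output of &lt;code&gt;terraform show -json plan.tfplan&lt;/code&gt;:&lt;/p&gt;

```shell
# Synthetic plan JSON for illustration
printf '%s\n' '{"resource_changes":[{"address":"aws_s3_bucket.logs","change":{"actions":["update"]}},{"address":"aws_db_instance.main","change":{"actions":["delete","create"]}}]}' > sample-plan.json

# Count resources whose planned actions include a delete
destroys=$(jq '[.resource_changes[] | select(.change.actions|index("delete"))] | length' sample-plan.json)

if [ "$destroys" -gt 0 ]; then
  echo "BLOCKED: $destroys resource(s) would be destroyed"
  # in CI you would: exit 1
fi
```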




&lt;h2&gt;
  
  
  🔄 Terraform Plan Examples: Local CLI → CI/CD
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Local workflow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform init
terraform plan &lt;span class="nt"&gt;-out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;plan.tfplan
terraform show plan.tfplan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Review → approve → apply.&lt;/p&gt;

&lt;h3&gt;
  
  
  Minimal GitHub Actions gate using &lt;code&gt;-detailed-exitcode&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terraform-plan&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pull_request&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;plan&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hashicorp/setup-terraform@v3&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;terraform_version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.6.6&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terraform init&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;plan&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;set +e&lt;/span&gt;
          &lt;span class="s"&gt;terraform plan -detailed-exitcode -out=plan.tfplan&lt;/span&gt;
          &lt;span class="s"&gt;echo "code=$?" &amp;gt;&amp;gt; $GITHUB_OUTPUT&lt;/span&gt;
          &lt;span class="s"&gt;set -e&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Upload Plan Artifact&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/upload-artifact@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tfplan&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;plan.tfplan&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Fail If Changes Present&lt;/span&gt;
        &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;steps.plan.outputs.code == '2'&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;exit &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🧨 Troubleshooting &amp;amp; Best Practices for Stable Terraform Plans
&lt;/h2&gt;

&lt;p&gt;Here are proven ways to reduce surprises:&lt;/p&gt;

&lt;h3&gt;
  
  
  ✔️ Pin provider versions
&lt;/h3&gt;

&lt;p&gt;Run &lt;code&gt;terraform init -upgrade&lt;/code&gt; intentionally, not automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✔️ Use a consistent remote backend
&lt;/h3&gt;

&lt;p&gt;S3 + DynamoDB locking, GCS, or Terraform Cloud.&lt;br&gt;&lt;br&gt;
Avoid local state in team environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✔️ Stabilize your plan
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Avoid volatile data sources
&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;depends_on&lt;/code&gt; when needed
&lt;/li&gt;
&lt;li&gt;Keep var-files consistent across environments
&lt;/li&gt;
&lt;li&gt;Align Terraform + provider versions across laptops &amp;amp; CI
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A stable plan = fewer loops of “why is this resource changing again?”&lt;/p&gt;




&lt;h2&gt;
  
  
  🦍 Where ControlMonkey Fits In (Optional but Powerful)
&lt;/h2&gt;

&lt;p&gt;ControlMonkey adds context around Terraform plans so teams spot risk instantly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Highlights destroys &amp;amp; replacements
&lt;/li&gt;
&lt;li&gt;Surfaces drift before running plan
&lt;/li&gt;
&lt;li&gt;Enforces org-wide guardrails
&lt;/li&gt;
&lt;li&gt;Adds automatic insights during plan review
&lt;/li&gt;
&lt;li&gt;Runs across GitHub, GitLab, Bitbucket, Azure DevOps
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your team reviews plans daily, the noise reduction alone is a productivity unlock.&lt;/p&gt;

&lt;p&gt;🔗 Related reads:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IaC Risk Index → &lt;a href="https://controlmonkey.io/news/iac-risk-index/" rel="noopener noreferrer"&gt;https://controlmonkey.io/news/iac-risk-index/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Atlantis + Plan Guide → &lt;a href="https://controlmonkey.io/resource/how-to-use-atlantis-plan/" rel="noopener noreferrer"&gt;https://controlmonkey.io/resource/how-to-use-atlantis-plan/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ✅ Wrap-Up: Review Terraform Plans With Confidence
&lt;/h2&gt;

&lt;p&gt;Terraform plan is the most important checkpoint in IaC. Use it consistently, export JSON for policy checks, and fail PRs when risky changes appear.&lt;/p&gt;

&lt;p&gt;If you want faster reviews, automated guardrails, and risk-aware change visibility across teams, ControlMonkey can help — request a demo to see how it works.&lt;/p&gt;




&lt;p&gt;💬 &lt;strong&gt;What are your best Terraform plan tips or horror stories? Drop them in the comments — DevOps managers learn best from each other.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>tutorial</category>
      <category>cicd</category>
      <category>devops</category>
    </item>
    <item>
      <title>AWS Outage: Cloud DR — 5 Things to Do Tomorrow</title>
      <dc:creator>TerraformMonkey</dc:creator>
      <pubDate>Mon, 20 Oct 2025 18:17:39 +0000</pubDate>
      <link>https://dev.to/terraformmonkey/aws-outage-cloud-dr-5-things-to-do-tomorrow-24f</link>
      <guid>https://dev.to/terraformmonkey/aws-outage-cloud-dr-5-things-to-do-tomorrow-24f</guid>
      <description>&lt;h2&gt;
  
  
  🌀 Were You Affected by the AWS Outage Today? 5 Things to Do Tomorrow for Your Cloud Resilience
&lt;/h2&gt;

&lt;p&gt;If you were caught in today’s &lt;strong&gt;&lt;a href="https://edition.cnn.com/2025/10/20/tech/aws-outage/index.html" rel="noopener noreferrer"&gt;AWS outage&lt;/a&gt;&lt;/strong&gt;, you weren’t alone. CNN reported more than &lt;strong&gt;6.5 million disruption reports&lt;/strong&gt; worldwide — from banks and airlines to AI companies and popular apps like Snapchat and Fortnite.&lt;br&gt;&lt;br&gt;
The root cause? A malfunction in &lt;strong&gt;AWS’s EC2 network monitoring subsystem&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For DevOps and cloud teams, this was more than downtime — it was a reminder that &lt;strong&gt;Disaster Recovery isn’t just about data&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
Real &lt;strong&gt;Cloud Disaster Recovery&lt;/strong&gt; means protecting your &lt;em&gt;entire configuration&lt;/em&gt; — infrastructure, policies, and dependencies, not just your storage.&lt;/p&gt;

&lt;p&gt;When configuration breaks, recovery breaks with it.  &lt;/p&gt;

&lt;p&gt;Tomorrow, take these &lt;strong&gt;five practical steps&lt;/strong&gt; to build real resilience across your environment — not just to recover data, but to recover fast.&lt;/p&gt;




&lt;h2&gt;
  
  
  1️⃣ Audit What You Really Run
&lt;/h2&gt;

&lt;p&gt;Start with &lt;strong&gt;visibility&lt;/strong&gt;. Use &lt;a href="https://aws.amazon.com/well-architected-tool/" rel="noopener noreferrer"&gt;AWS’s Well-Architected Tool&lt;/a&gt; to baseline your setup and map every resource your workloads rely on — services, regions, and dependencies.  &lt;/p&gt;

&lt;p&gt;Many organizations only discovered today that their most critical workloads lived in &lt;strong&gt;us-east-1&lt;/strong&gt;, the region most impacted by the AWS outage.&lt;/p&gt;

&lt;p&gt;Untracked or shadow resources are silent risks in any &lt;strong&gt;&lt;a href="https://controlmonkey.io/blog/cloud-disaster-recovery-plan/" rel="noopener noreferrer"&gt;Cloud Disaster Recovery plan&lt;/a&gt;&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
Centralize your inventory, including staging and testing environments, so you always know what needs replication and protection.&lt;/p&gt;




&lt;h2&gt;
  
  
  2️⃣ Close the IaC Gap
&lt;/h2&gt;

&lt;p&gt;If you had to log into the AWS console and apply manual fixes today, that’s a clear signal:&lt;br&gt;&lt;br&gt;
parts of your environment are still outside your &lt;strong&gt;Infrastructure as Code (IaC)&lt;/strong&gt; coverage.&lt;/p&gt;

&lt;p&gt;Identify those gaps — legacy stacks, ClickOps-created resources, or untracked configurations — and bring them under &lt;strong&gt;&lt;a href="https://controlmonkey.io/terraform-errors-guide/" rel="noopener noreferrer"&gt;Terraform or another IaC tool&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;IaC coverage isn’t just about speed — it’s about precision.&lt;br&gt;&lt;br&gt;
When every configuration lives in code, your &lt;strong&gt;Cloud Disaster Recovery&lt;/strong&gt; process becomes predictable, repeatable, and multi-cloud ready.&lt;/p&gt;
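&lt;p&gt;Closing a gap can be as small as one &lt;code&gt;import&lt;/code&gt; block (Terraform 1.5+); the resource and bucket name below are hypothetical:&lt;/p&gt;

```hcl
# import.tf: adopt a ClickOps-created bucket into Terraform state
import {
  to = aws_s3_bucket.legacy
  id = "example-legacy-bucket"
}

resource "aws_s3_bucket" "legacy" {
  bucket = "example-legacy-bucket"
}
```

&lt;p&gt;On the next &lt;code&gt;terraform plan&lt;/code&gt;, the resource shows up as an import rather than a create.&lt;/p&gt;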




&lt;h2&gt;
  
  
  3️⃣ Run a Mini Cloud DR Drill — “Mini AWS Outage”
&lt;/h2&gt;

&lt;p&gt;Don’t wait for another global AWS outage to test your readiness.  &lt;/p&gt;

&lt;p&gt;Pick one critical service tomorrow, simulate a regional failure, and measure how long it takes to restore full operations.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Did your failover scripts work? Were your runbooks current?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These short, focused drills turn theory into practice and highlight exactly where automation or documentation needs to improve.&lt;/p&gt;




&lt;h2&gt;
  
  
  4️⃣ Detect and Eliminate Drift
&lt;/h2&gt;

&lt;p&gt;Every outage exposes hidden &lt;strong&gt;drift&lt;/strong&gt; — when production no longer matches what’s defined in IaC.  &lt;/p&gt;

&lt;p&gt;During recovery, that mismatch can cause unpredictable behavior, failed redeployments, or security gaps.  &lt;/p&gt;

&lt;p&gt;Implement automated &lt;strong&gt;drift detection and remediation&lt;/strong&gt; to keep your configurations aligned with reality.&lt;br&gt;&lt;br&gt;
When your code and infrastructure mirror each other, your recovery is clean, fast, and verifiable.&lt;/p&gt;




&lt;h2&gt;
  
  
  5️⃣ Automate Daily Snapshots and Recovery Workflows
&lt;/h2&gt;

&lt;p&gt;Static backups protect &lt;em&gt;data&lt;/em&gt; but not &lt;em&gt;operations&lt;/em&gt;.&lt;br&gt;&lt;br&gt;
Automate daily &lt;strong&gt;&lt;a href="https://docs.aws.amazon.com/aws-backup/latest/devguide/whatisbackup.html" rel="noopener noreferrer"&gt;infrastructure snapshots&lt;/a&gt;&lt;/strong&gt; across all environments.&lt;/p&gt;

&lt;p&gt;Capture every policy, dependency, and configuration so you can roll back instantly if another AWS outage hits.&lt;/p&gt;
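&lt;p&gt;The data side of this can be codified too; a sketch using AWS Backup (names and schedule are placeholders, and infrastructure-configuration snapshots still need an IaC-level tool on top):&lt;/p&gt;

```hcl
# Daily backups driven by code, not console clicks
resource "aws_backup_vault" "daily" {
  name = "example-daily-vault"
}

resource "aws_backup_plan" "daily" {
  name = "example-daily-plan"

  rule {
    rule_name         = "daily-0300-utc"
    target_vault_name = aws_backup_vault.daily.name
    schedule          = "cron(0 3 * * ? *)"   # every day at 03:00 UTC
  }
}
```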

&lt;p&gt;These automated snapshots create a &lt;strong&gt;&lt;a href="https://www.networkworld.com/article/3853808/controlmonkey-aims-to-bring-order-to-cloud-disaster-recovery-chaos.html" rel="noopener noreferrer"&gt;“time machine” for your cloud&lt;/a&gt;&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
Combined with code-based recovery workflows, they turn &lt;strong&gt;Cloud Disaster Recovery&lt;/strong&gt; into a proactive discipline — not a panic-driven event.&lt;/p&gt;




&lt;h2&gt;
  
  
  🌍 Resilience Can’t Depend on One Provider
&lt;/h2&gt;

&lt;p&gt;Today’s AWS outage was a reminder that the internet’s backbone is only as reliable as its weakest link.  &lt;/p&gt;

&lt;p&gt;Whether your systems run on &lt;strong&gt;AWS, Azure, GCP&lt;/strong&gt;, or depend on providers like &lt;strong&gt;&lt;a href="https://developers.cloudflare.com/terraform/" rel="noopener noreferrer"&gt;Cloudflare&lt;/a&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;a href="https://docs.snowflake.com/en/user-guide/terraform" rel="noopener noreferrer"&gt;Snowflake&lt;/a&gt;&lt;/strong&gt;, or &lt;strong&gt;Datadog&lt;/strong&gt;, resilience must span your entire ecosystem.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 ControlMonkey’s Approach to Cloud Resilience
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;ControlMonkey&lt;/strong&gt; helps DevOps teams achieve that resilience through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automated drift detection
&lt;/li&gt;
&lt;li&gt;IaC-based recovery pipelines
&lt;/li&gt;
&lt;li&gt;Daily infrastructure snapshots
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, they ensure your cloud stays ready — no matter which provider goes down next.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://controlmonkey.io/blog/cloud-disaster-recovery-plan/" rel="noopener noreferrer"&gt;Learn how ControlMonkey automates Cloud Disaster Recovery and keeps your infrastructure resilient.&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;💬 &lt;strong&gt;What’s your team doing post-outage?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Share your lessons or plans to strengthen resilience in the comments — let’s make the next AWS outage a non-event.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>terraform</category>
      <category>infrastructureascode</category>
      <category>disasterrecovery</category>
    </item>
    <item>
      <title>GCP Compute Engine Terraform 2025: Create a VM Instance</title>
      <dc:creator>TerraformMonkey</dc:creator>
      <pubDate>Mon, 20 Oct 2025 02:08:00 +0000</pubDate>
      <link>https://dev.to/terraformmonkey/gcp-compute-engine-terraform-2025-create-a-vm-instance-4k61</link>
      <guid>https://dev.to/terraformmonkey/gcp-compute-engine-terraform-2025-create-a-vm-instance-4k61</guid>
      <description>&lt;p&gt;&lt;em&gt;By Daniel Alfasi — Backend Developer &amp;amp; AI Researcher&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When teams need to spin up infrastructure quickly, nothing beats &lt;strong&gt;GCP Compute Engine with Terraform&lt;/strong&gt; for consistent, declarative deployments.&lt;/p&gt;

&lt;p&gt;By combining Terraform’s state management with &lt;a href="https://registry.terraform.io/providers/hashicorp/google/latest/docs" rel="noopener noreferrer"&gt;Google’s robust APIs&lt;/a&gt;, you can treat every Terraform GCP instance like code — repeatable in any environment.&lt;/p&gt;

&lt;p&gt;Whether you’re creating a small sandbox or a production-ready cluster, learning how to &lt;strong&gt;create a Compute Engine VM with Terraform&lt;/strong&gt; pays off immediately.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;For a broader view on managing Terraform with Google Cloud, check out the &lt;a href="https://controlmonkey.io/terraform-gcp-provider-best-practices/" rel="noopener noreferrer"&gt;GCP Terraform Provider Best Practices Guide&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚙️ Basic Compute Engine Terraform Configuration
&lt;/h2&gt;

&lt;p&gt;The snippet below shows the absolute minimum needed to define a Terraform GCP instance.&lt;br&gt;&lt;br&gt;
Once applied, &lt;a href="https://cloud.google.com/compute/docs?hl=he" rel="noopener noreferrer"&gt;Terraform talks to the Google Cloud API&lt;/a&gt; and delivers a ready-to-use &lt;a href="https://controlmonkey.io/terraform-projects-for-gcp/" rel="noopener noreferrer"&gt;Terraform VM in GCP&lt;/a&gt; — no console clicks required.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# main.tf — minimal GCP Compute Engine Terraform example&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_compute_instance"&lt;/span&gt; &lt;span class="s2"&gt;"demo"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"demo-vm"&lt;/span&gt;
  &lt;span class="nx"&gt;machine_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"e2-small"&lt;/span&gt;
  &lt;span class="nx"&gt;zone&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-central1-a"&lt;/span&gt;

  &lt;span class="nx"&gt;boot_disk&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;initialize_params&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;image&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"debian-cloud/debian-12"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;network_interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;network&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"default"&lt;/span&gt;
    &lt;span class="nx"&gt;access_config&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before running &lt;code&gt;terraform apply&lt;/code&gt;, initialize the working directory and preview the changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform init
terraform plan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you apply, you’ll have compute resources that can be shared, versioned, audited, and destroyed just as easily.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧩 Configuring Machine Types, Zones, and Metadata
&lt;/h2&gt;

&lt;p&gt;Scaling a Terraform VM in GCP is as simple as changing the &lt;code&gt;machine_type&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;machine_type&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"e2-medium"&lt;/span&gt;  &lt;span class="c1"&gt;# or "c3-standard-8" for more power&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Need to burst into another region?&lt;br&gt;&lt;br&gt;
Just update the &lt;code&gt;zone&lt;/code&gt;, and Terraform builds a twin — perfectly codified and drift-free.&lt;/p&gt;

&lt;p&gt;Teams can safely experiment, knowing that peer reviews catch issues &lt;strong&gt;before&lt;/strong&gt; production resources are created.&lt;/p&gt;

&lt;p&gt;If you store state in Cloud Storage with a backend block, teammates can collaborate safely and avoid conflicting writes.&lt;br&gt;&lt;br&gt;
Use a service account with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;roles/compute.admin
roles/storage.objectViewer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;as a starting point for scoping access. Note that &lt;code&gt;roles/compute.admin&lt;/code&gt; is still broad; narrow it further where your workflow allows.&lt;/p&gt;
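&lt;p&gt;Granting those roles can itself live in Terraform; a sketch with hypothetical project and service-account IDs:&lt;/p&gt;

```hcl
# Bind only the roles the pipeline's service account needs
resource "google_project_iam_member" "tf_compute" {
  project = "example-project"
  role    = "roles/compute.admin"
  member  = "serviceAccount:terraform@example-project.iam.gserviceaccount.com"
}

resource "google_project_iam_member" "tf_state_read" {
  project = "example-project"
  role    = "roles/storage.objectViewer"
  member  = "serviceAccount:terraform@example-project.iam.gserviceaccount.com"
}
```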

&lt;p&gt;For more secure and automated access, read the &lt;a href="https://controlmonkey.io/gcp-terraform-authentication-guide/" rel="noopener noreferrer"&gt;GCP Terraform Authentication Guide&lt;/a&gt; and the &lt;a href="https://controlmonkey.io/gcp-pam-terraform-guide/" rel="noopener noreferrer"&gt;GCP PAM Terraform Guide&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 Provisioning Startup Scripts and SSH in Terraform GCP Instances
&lt;/h2&gt;

&lt;p&gt;A common pattern when authoring Terraform VM blueprints is attaching a &lt;strong&gt;startup script&lt;/strong&gt; to install packages, configure logging, or register nodes in CI.&lt;/p&gt;

&lt;p&gt;You can keep the script inline or reference an external file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;metadata_startup_script&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="err"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"scripts/startup.sh"&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you add startup scripts, you’ll realize how much manual setup disappears.&lt;br&gt;&lt;br&gt;
That’s when the &lt;strong&gt;repeatability of GCP Compute Engine with Terraform&lt;/strong&gt; really clicks.&lt;/p&gt;
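&lt;p&gt;For very short scripts, an inline string works as an alternative to &lt;code&gt;file()&lt;/code&gt;; a sketch extending the earlier demo instance (the package choice is illustrative):&lt;/p&gt;

```hcl
resource "google_compute_instance" "demo" {
  name         = "demo-vm"
  machine_type = "e2-small"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    network       = "default"
    access_config {}
  }

  # Runs once on first boot; keep longer scripts in scripts/startup.sh instead
  metadata_startup_script = "#!/bin/bash\napt-get update -y\napt-get install -y nginx"
}
```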




&lt;h2&gt;
  
  
  🏁 Conclusion: Why Standardize on GCP Compute Engine Terraform
&lt;/h2&gt;

&lt;p&gt;With just ~20 lines of code, you’ve gone from nothing to a reproducible VM — all from your terminal.&lt;/p&gt;

&lt;p&gt;💡 Ready for production?&lt;br&gt;&lt;br&gt;
Check out &lt;a href="https://controlmonkey.io/news/announcing-controlmonkeys-terraform-private-modules-registry/" rel="noopener noreferrer"&gt;ControlMonkey’s GCP Compute Module&lt;/a&gt; for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built-in firewall rules
&lt;/li&gt;
&lt;li&gt;SSH key management
&lt;/li&gt;
&lt;li&gt;Monitoring hooks
&lt;/li&gt;
&lt;li&gt;Best-practice defaults
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Clone it and start shipping infrastructure today!  &lt;/p&gt;

&lt;p&gt;💬 Questions or feedback? Drop a comment below or &lt;a href="https://controlmonkey.io/book-intro-meeting/" rel="noopener noreferrer"&gt;book a quick intro call&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Related reads:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://controlmonkey.io/terraform-gcp-provider-best-practices/" rel="noopener noreferrer"&gt;GCP Terraform Provider Best Practices&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://controlmonkey.io/terraform-projects-for-gcp/" rel="noopener noreferrer"&gt;Terraform Projects for GCP&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://controlmonkey.io/gcp-terraform-authentication-guide/" rel="noopener noreferrer"&gt;GCP Terraform Authentication Guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://controlmonkey.io/gcp-pam-terraform-guide/" rel="noopener noreferrer"&gt;GCP PAM Terraform Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>terraform</category>
      <category>gcp</category>
      <category>cloud</category>
      <category>sre</category>
    </item>
    <item>
      <title>Start Safe: Terragrunt Import for Multi-Account AWS</title>
      <dc:creator>TerraformMonkey</dc:creator>
      <pubDate>Thu, 16 Oct 2025 23:00:00 +0000</pubDate>
      <link>https://dev.to/terraformmonkey/start-safe-terragrunt-import-for-multi-account-aws-197p</link>
      <guid>https://dev.to/terraformmonkey/start-safe-terragrunt-import-for-multi-account-aws-197p</guid>
      <description>&lt;p&gt;Terragrunt Import lets you bring brownfield infrastructure under Terraform control across multi-repo and multi-account setups. Done right, you’ll avoid state drift, unstable addresses, and risky access patterns.&lt;/p&gt;

&lt;p&gt;The goal is a reproducible, auditable workflow with clean plans and minimal permissions. Use a consistent remote state, pin tooling versions, and validate every step in CI.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔎 At a Glance: Terragrunt Import Best Practices
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;✅ Standardize remote state &lt;strong&gt;and lock it&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;📌 Pin &lt;strong&gt;Terraform&lt;/strong&gt;, &lt;strong&gt;providers&lt;/strong&gt;, and &lt;strong&gt;Terragrunt&lt;/strong&gt; versions&lt;/li&gt;
&lt;li&gt;🧾 Document intent with &lt;strong&gt;Terraform import blocks&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;🤖 Automate plans and &lt;strong&gt;halt on drift or diffs&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;🔐 Use &lt;strong&gt;least-privilege, short-lived credentials&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Mini-story: An engineer imported dozens of resources on a laptop with a newer provider than CI. The next pipeline showed a wall of “changes” — all caused by version drift. Pinning would have caught this earlier.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you’re managing multiple accounts and environments, keeping configuration clean and consistent can be a real challenge.&lt;br&gt;&lt;br&gt;
👉 Check out &lt;a href="https://controlmonkey.io/terragrunt-less-verbose/" rel="noopener noreferrer"&gt;&lt;strong&gt;Terragrunt Less Verbose&lt;/strong&gt;&lt;/a&gt; for tips to reduce boilerplate and simplify Terragrunt structure across large repos.&lt;/p&gt;


&lt;h2&gt;
  
  
  🧱 Do: Prepare State, Providers &amp;amp; Repo for Safe Terragrunt Import
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use a &lt;strong&gt;remote backend&lt;/strong&gt; with &lt;strong&gt;locking + encryption&lt;/strong&gt; (S3 + DynamoDB lock, GCS, or Azure Blob). Inherit backend config via a &lt;strong&gt;root &lt;code&gt;terragrunt.hcl&lt;/code&gt;&lt;/strong&gt; to avoid divergent state and concurrent writes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pin versions&lt;/strong&gt; for Terraform, providers, and Terragrunt. Run &lt;code&gt;terraform init -upgrade&lt;/code&gt; only in controlled windows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate in CI&lt;/strong&gt; with &lt;code&gt;terraform validate&lt;/code&gt; and &lt;code&gt;terraform plan -detailed-exitcode&lt;/code&gt; gates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preflight with snapshots&lt;/strong&gt;: enable bucket/container versioning and take a &lt;strong&gt;state backup&lt;/strong&gt; before each import; start with a &lt;strong&gt;read-only discovery&lt;/strong&gt; run.
&lt;/li&gt;
&lt;/ul&gt;
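&lt;p&gt;The root-level inheritance mentioned above can be sketched like this (bucket, table, and region are placeholders):&lt;/p&gt;

```hcl
# terragrunt.hcl (repo root): every child unit inherits this backend
remote_state {
  backend = "s3"

  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket         = "example-tf-state"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "example-tf-locks"
  }
}
```

&lt;p&gt;Child units then only declare an &lt;code&gt;include&lt;/code&gt; block and their own inputs.&lt;/p&gt;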
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# example CI guard&lt;/span&gt;
terraform &lt;span class="nb"&gt;fmt&lt;/span&gt; &lt;span class="nt"&gt;-check&lt;/span&gt;
terraform validate
terraform plan &lt;span class="nt"&gt;-detailed-exitcode&lt;/span&gt;   &lt;span class="c"&gt;# exit 2 on diff; fail the job&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  🧭 Do: Use Import Blocks + Terragrunt Hooks for Clear, Stable Addresses
&lt;/h2&gt;

&lt;p&gt;Prefer &lt;strong&gt;Terraform ≥ 1.5 import blocks&lt;/strong&gt; to document import intent in code and keep resource addresses stable across runs. Combine with &lt;strong&gt;Terragrunt hooks&lt;/strong&gt; to (a) generate import IDs, (b) run a plan immediately after import, and (c) fail on any unexpected diff.&lt;/p&gt;

&lt;p&gt;Start with a &lt;strong&gt;skeleton HCL&lt;/strong&gt;: declare essential arguments only; add temporary &lt;code&gt;lifecycle.ignore_changes&lt;/code&gt; for noisy attributes until parity is verified.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Caveat: import blocks require Terraform &lt;strong&gt;1.5+&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Canonical snippet (HCL):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# modules/storage/main.tf&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket"&lt;/span&gt; &lt;span class="s2"&gt;"logs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bucket_name&lt;/span&gt;

  &lt;span class="nx"&gt;lifecycle&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;ignore_changes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# temporary while achieving parity&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# modules/storage/import.tf (Terraform ≥ 1.5)&lt;/span&gt;
&lt;span class="nx"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logs&lt;/span&gt;
  &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-company-logs"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Live config with Terragrunt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# live/prod/storage/terragrunt.hcl&lt;/span&gt;
&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"../../../modules/storage"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;inputs&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-company-logs"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Optional Terragrunt hook: force a plan after import and fail on drift&lt;/span&gt;
&lt;span class="nx"&gt;after_hook&lt;/span&gt; &lt;span class="s2"&gt;"after_import_plan"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;commands&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"import"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;execute&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"bash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"-lc"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"terraform plan -detailed-exitcode || exit 1"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Keep module paths stable across environments so resource &lt;strong&gt;addresses never change&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🚫 Don’t: Refactor Modules Mid-Import or Apply Without a Clean Plan
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Don’t refactor module names, move modules, or rename resources &lt;strong&gt;during&lt;/strong&gt; an import — it changes addresses and &lt;strong&gt;breaks state mapping&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Never apply after an import unless the &lt;strong&gt;plan is clean&lt;/strong&gt; (no unintended creates/destroys). Enforce with &lt;code&gt;-detailed-exitcode&lt;/code&gt; in CI.&lt;/li&gt;
&lt;li&gt;If you discover an address mismatch, fix it with:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform state &lt;span class="nb"&gt;mv&lt;/span&gt; &lt;span class="s1"&gt;'aws_s3_bucket.logs'&lt;/span&gt; &lt;span class="s1"&gt;'aws_s3_bucket.logs_new'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Caveat: &lt;code&gt;state mv&lt;/code&gt; ops should be reviewed in PRs and run from the &lt;strong&gt;same pinned toolchain&lt;/strong&gt; as your plans.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🔐 Do: Enforce Least-Privilege &amp;amp; Short-Lived Access
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;assume-role&lt;/strong&gt; with external IDs/MFA and &lt;strong&gt;short sessions&lt;/strong&gt;, scoped to import-only APIs for the target services.&lt;/li&gt;
&lt;li&gt;Separate &lt;strong&gt;read-only discovery&lt;/strong&gt; from &lt;strong&gt;write&lt;/strong&gt; operations; rotate credentials; store secrets securely in CI.&lt;/li&gt;
&lt;li&gt;Keep &lt;strong&gt;audit trails&lt;/strong&gt;: confirm who imported what and when using provider/cloud logs (e.g., CloudTrail). Keep local CI run metadata as a cross-check (provider logs can lag).
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Example: short-lived AWS session (assume-role)&lt;/span&gt;
aws sts assume-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-arn&lt;/span&gt; arn:aws:iam::123456789012:role/terraform-importer &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-session-name&lt;/span&gt; terragrunt-import
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🗂️ Example: Importing a Storage Bucket (Pattern Applies Broadly)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Create a &lt;strong&gt;minimal&lt;/strong&gt; resource block and keep the &lt;strong&gt;terragrunt.hcl path stable&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Add a Terraform &lt;strong&gt;import block&lt;/strong&gt; with the bucket’s canonical ID.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;terraform init&lt;/code&gt;, then nudge Terragrunt to plan:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terragrunt run-all plan &lt;span class="nt"&gt;-detailed-exitcode&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;The plan should show &lt;strong&gt;no changes&lt;/strong&gt; except legitimate drift.&lt;/li&gt;
&lt;li&gt;If noise appears (e.g., tags or server-generated fields), add &lt;strong&gt;temporary&lt;/strong&gt; &lt;code&gt;ignore_changes&lt;/code&gt;, reconcile code to reality, then &lt;strong&gt;remove ignores&lt;/strong&gt; once parity is achieved.&lt;/li&gt;
&lt;li&gt;Commit the &lt;strong&gt;import block + configuration&lt;/strong&gt; together so future plans remain clean.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you hit unexpected diffs or failed imports, read &lt;a href="https://controlmonkey.io/terraform-errors-guide/" rel="noopener noreferrer"&gt;&lt;strong&gt;The Complete Terraform Errors Guide&lt;/strong&gt;&lt;/a&gt; to decode plan output, debug root causes, and avoid destructive applies.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧰 Bring It Together with Guardrails
&lt;/h2&gt;

&lt;p&gt;A disciplined Terragrunt Import flow yields &lt;strong&gt;reproducible, auditable&lt;/strong&gt; results with &lt;strong&gt;clean plans&lt;/strong&gt; and &lt;strong&gt;least-privilege&lt;/strong&gt; access. Codify intent with &lt;strong&gt;import blocks&lt;/strong&gt;, keep &lt;strong&gt;addresses stable&lt;/strong&gt;, and &lt;strong&gt;block applies on drift&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Looking for acceleration?&lt;/strong&gt; ControlMonkey can help with discovery, safe sequencing, and policy guardrails across multi-account environments.&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://controlmonkey.io" rel="noopener noreferrer"&gt;Request a demo&lt;/a&gt; to see it in action.&lt;/p&gt;




&lt;h2&gt;
  
  
  💬 Discussion
&lt;/h2&gt;

&lt;p&gt;What’s your &lt;strong&gt;Terragrunt import playbook&lt;/strong&gt; for multi-account AWS?&lt;br&gt;&lt;br&gt;
Which &lt;strong&gt;drift signals&lt;/strong&gt; or &lt;strong&gt;CI gates&lt;/strong&gt; have saved you from bad applies? Share your setup in the comments!&lt;/p&gt;

</description>
      <category>security</category>
      <category>devops</category>
      <category>aws</category>
      <category>terraform</category>
    </item>
    <item>
      <title>⚙️ 7 AI-Powered Prompts That Supercharge Your Terraform Workflow</title>
      <dc:creator>TerraformMonkey</dc:creator>
      <pubDate>Thu, 16 Oct 2025 13:56:57 +0000</pubDate>
      <link>https://dev.to/terraformmonkey/7-ai-powered-prompts-that-supercharge-your-terraform-workflow-5f08</link>
      <guid>https://dev.to/terraformmonkey/7-ai-powered-prompts-that-supercharge-your-terraform-workflow-5f08</guid>
      <description>&lt;h1&gt;
  
  
  ⚙️ 7 AI-Powered Prompts That Supercharge Your Terraform Workflow
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;By &lt;a href="https://il.linkedin.com/in/daniel-alfasi-7a6056222" rel="noopener noreferrer"&gt;Daniel Alfasi&lt;/a&gt; — Backend Developer &amp;amp; AI Researcher&lt;/em&gt;  &lt;/p&gt;




&lt;p&gt;For years, &lt;strong&gt;Terraform&lt;/strong&gt; has been the backbone of Infrastructure as Code (IaC).&lt;br&gt;&lt;br&gt;
Now, with &lt;strong&gt;AI entering the workflow&lt;/strong&gt;, engineers no longer need to spend hours troubleshooting syntax, writing repetitive modules, or combing through verbose plan outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Terraform + AI&lt;/strong&gt; brings the same revolution that developers already enjoy in their editors — directly into the world of cloud infrastructure.&lt;/p&gt;


&lt;h2&gt;
  
  
  🤖 LLMs for Terraform in IDEs &amp;amp; CLI
&lt;/h2&gt;

&lt;p&gt;AI copilots are no longer confined to browser tabs — they now sit &lt;em&gt;inside the tools you already use&lt;/em&gt; every day.&lt;/p&gt;
&lt;h3&gt;
  
  
  🧩 GitHub Copilot &amp;amp; Amazon CodeWhisperer
&lt;/h3&gt;

&lt;p&gt;Both autocomplete HCL, Bash, and Go test code; suggest variable names; generate resource blocks; and explain errors inline.&lt;br&gt;&lt;br&gt;
🔗 &lt;a href="https://github.com/features/copilot" rel="noopener noreferrer"&gt;GitHub Copilot&lt;/a&gt; | &lt;a href="https://aws.amazon.com/codewhisperer/" rel="noopener noreferrer"&gt;Amazon CodeWhisperer&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  💡 Cursor AI &amp;amp; Continue (VS Code / JetBrains)
&lt;/h3&gt;

&lt;p&gt;Run one-shot refactors like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Extract these CIDRs into variables”&lt;br&gt;&lt;br&gt;
“Convert count loops to for_each”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Both tools also highlight hard-coded values as you type.&lt;br&gt;&lt;br&gt;
🔗 &lt;a href="https://cursor.sh" rel="noopener noreferrer"&gt;Cursor AI&lt;/a&gt; | &lt;a href="https://continue.dev" rel="noopener noreferrer"&gt;Continue.dev&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  💬 OpenAI Chat in Editors
&lt;/h3&gt;

&lt;p&gt;Chat about the current file or diff.&lt;br&gt;&lt;br&gt;
Ask things like:  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Why is this plan destroying prod?”&lt;br&gt;&lt;br&gt;
and get an instant summary — no context switching.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  💻 Natural-Language CLI Wrappers
&lt;/h3&gt;

&lt;p&gt;Tools like Warp AI let you type:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Add S3 bucket encryption”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;…and get the exact Terraform or AWS CLI command.&lt;br&gt;&lt;br&gt;
🔗 &lt;a href="https://www.warp.dev" rel="noopener noreferrer"&gt;Warp AI&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  🐵 ControlMonkey KoMo — The IaC Copilot
&lt;/h3&gt;

&lt;p&gt;Meet &lt;a href="https://controlmonkey.io/news/iac-ai-copilot-komo/" rel="noopener noreferrer"&gt;&lt;strong&gt;KoMo&lt;/strong&gt;&lt;/a&gt;, ControlMonkey’s AI Copilot for Terraform.&lt;br&gt;&lt;br&gt;
KoMo helps engineers &lt;strong&gt;tag resources, detect drift, and flag destructive changes&lt;/strong&gt; before merges — all &lt;em&gt;within governed workflows&lt;/em&gt; connected to policy checks and audit logs.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🎥 &lt;em&gt;See KoMo in action — &lt;a href="https://controlmonkey.io/request-demo/" rel="noopener noreferrer"&gt;Request a Demo&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  🧠 7 AI Prompts to Level Up Your Terraform Workflow
&lt;/h2&gt;

&lt;p&gt;These prompts work with AI assistants like &lt;strong&gt;Cursor AI&lt;/strong&gt;, &lt;strong&gt;GitHub Copilot&lt;/strong&gt;, or &lt;strong&gt;Warp AI&lt;/strong&gt; — helping you write cleaner Terraform faster, with fewer mistakes.&lt;/p&gt;


&lt;h3&gt;
  
  
  🧮 Prompt 1: Convert Magic Numbers into Variables
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prompt&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Highlight every hard-coded CIDR, AMI, or instance size and convert it to a variable.&lt;/span&gt;
&lt;span class="c"&gt;# Add sensible defaults in variables.tf and environment-specific values in dev.tfvars and prod.tfvars.&lt;/span&gt;
&lt;span class="c"&gt;# Then run: terraform validate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Hard-coding introduces fragility. Extracting values into variables improves reusability and prevents accidental rebuilds.&lt;/p&gt;

&lt;p&gt;✅ Promotes reusable modules&lt;br&gt;&lt;br&gt;
✅ Catches wiring mistakes early&lt;br&gt;&lt;br&gt;
✅ Reduces environment drift&lt;/p&gt;
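<p>&lt;p&gt;As a rough sketch of what this prompt should produce (the resource, AMI ID, and defaults below are illustrative, not from a real environment):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Before: magic values baked into the resource
#   ami           = "ami-0abcdef1234567890"
#   instance_type = "t3.micro"

# After: values extracted into variables.tf with sensible defaults
variable "instance_type" {
  type        = string
  description = "Instance size; override per environment via tfvars"
  default     = "t3.micro"
}

resource "aws_instance" "web" {
  ami           = var.ami_id  # declared the same way as instance_type
  instance_type = var.instance_type
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With &lt;code&gt;dev.tfvars&lt;/code&gt; and &lt;code&gt;prod.tfvars&lt;/code&gt; supplying per-environment values, &lt;code&gt;terraform validate&lt;/code&gt; confirms the wiring.&lt;/p&gt;</p>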


&lt;h3&gt;
  
  
  🏷️ Prompt 2: Tag or Label All Resources
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prompt&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Scan this folder and list any resource that lacks a tags block (AWS) or labels block (GCP/Azure).&lt;/span&gt;
&lt;span class="c"&gt;# Show the file and line number, and generate a patch snippet for each offender.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Tagging is essential for FinOps, cost visibility, and cleanup automation.&lt;/p&gt;

&lt;p&gt;✅ Enforces tagging compliance&lt;br&gt;&lt;br&gt;
✅ Improves billing insights&lt;br&gt;&lt;br&gt;
✅ Enables automated lifecycle management&lt;/p&gt;
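<p>&lt;p&gt;For AWS specifically, a &lt;code&gt;default_tags&lt;/code&gt; block on the provider enforces a baseline without patching every resource individually — a minimal sketch (tag values are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;provider "aws" {
  region = "us-east-1"

  # Applied to every taggable resource this provider creates
  default_tags {
    tags = {
      Team        = "platform"
      Environment = "dev"
      ManagedBy   = "terraform"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;</p>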


&lt;h3&gt;
  
  
  💥 Prompt 3: Detect Destructive Terraform Code Changes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prompt&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Given this Terraform plan output (preferably terraform show -json plan.tfplan):&lt;/span&gt;
&lt;span class="c"&gt;# - List every resource marked for destruction or replacement (-, -/+, or delete actions).&lt;/span&gt;
&lt;span class="c"&gt;# - Explain the cause for each.&lt;/span&gt;
&lt;span class="c"&gt;# - Suggest safer alternatives:&lt;/span&gt;
&lt;span class="c"&gt;#     - terraform apply -replace=RESOURCE_ADDR&lt;/span&gt;
&lt;span class="c"&gt;#     - lifecycle { create_before_destroy = true }&lt;/span&gt;
&lt;span class="c"&gt;#     - lifecycle { prevent_destroy = true }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Prevents outages by highlighting destructive changes and offering safer alternatives.&lt;/p&gt;

&lt;p&gt;✅ Reduces production risk&lt;br&gt;&lt;br&gt;
✅ Improves plan review clarity&lt;br&gt;&lt;br&gt;
✅ Encourages safer lifecycle patterns&lt;/p&gt;
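<p>&lt;p&gt;A minimal sketch of the lifecycle guard the prompt suggests, on an illustrative stateful resource:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;resource "aws_s3_bucket" "state" {
  bucket = "example-terraform-state"  # illustrative name

  lifecycle {
    # terraform plan fails outright if this resource would be destroyed
    prevent_destroy = true
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;</p>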


&lt;h3&gt;
  
  
  🔍 Prompt 4: AI-Powered Drift Detection
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prompt&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Given this terraform plan output (ideally JSON via terraform show -json plan.tfplan):&lt;/span&gt;
&lt;span class="c"&gt;# - Highlight resources where current infra differs from desired config.&lt;/span&gt;
&lt;span class="c"&gt;# - Categorize drift: console change, autoscaling, or unknown.&lt;/span&gt;
&lt;span class="c"&gt;# - Suggest remediation for each category.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Drift detection ensures reproducibility and prevents “ClickOps chaos.”&lt;/p&gt;

&lt;p&gt;✅ Flags console changes early&lt;br&gt;&lt;br&gt;
✅ Keeps IaC in sync with cloud reality&lt;br&gt;&lt;br&gt;
✅ Supports import/revert workflows&lt;/p&gt;
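<p>&lt;p&gt;For the import-remediation path, Terraform 1.5+ supports declarative &lt;code&gt;import&lt;/code&gt; blocks — a sketch with an illustrative bucket name:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Adopt a console-created resource into state on the next apply
import {
  to = aws_s3_bucket.logs
  id = "example-logs-bucket"  # illustrative bucket name
}

resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-bucket"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;</p>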


&lt;h3&gt;
  
  
  📋 Prompt 5: Human-Readable Terraform Plan Summaries
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prompt&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# You are an expert DevOps engineer.&lt;/span&gt;
&lt;span class="c"&gt;# Given the output of a Terraform plan:&lt;/span&gt;
&lt;span class="c"&gt;# - Explain it in plain language.&lt;/span&gt;
&lt;span class="c"&gt;# - List which resources are created, updated, or destroyed.&lt;/span&gt;
&lt;span class="c"&gt;# - Keep it concise and human-readable.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Plan outputs are notoriously verbose — AI can translate them into actionable English.&lt;/p&gt;

&lt;p&gt;✅ Improves cross-team visibility&lt;br&gt;&lt;br&gt;
✅ Builds confidence in IaC&lt;br&gt;&lt;br&gt;
✅ Simplifies code reviews&lt;/p&gt;




&lt;h2&gt;
  
  
  🔒 Want Prompts 6 &amp;amp; 7?
&lt;/h2&gt;

&lt;p&gt;The next two prompts cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Security &amp;amp; compliance scanning with AI&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dependency &amp;amp; version drift control&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 Read the full article here:&lt;br&gt;&lt;br&gt;
➡️ &lt;a href="https://controlmonkey.io/news/iac-ai-copilot-komo/" rel="noopener noreferrer"&gt;&lt;strong&gt;7 AI-Powered Prompts That Supercharge Your Terraform Workflow →&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;💬 &lt;em&gt;Which of these prompts will you try first?&lt;/em&gt;&lt;br&gt;&lt;br&gt;
Share your favorite Terraform + AI tricks in the comments 👇&lt;/p&gt;

</description>
      <category>devops</category>
      <category>terraform</category>
      <category>ai</category>
      <category>sre</category>
    </item>
    <item>
      <title>GCP Cloud SQL + Terraform: Quick Start Guide</title>
      <dc:creator>TerraformMonkey</dc:creator>
      <pubDate>Tue, 23 Sep 2025 08:50:30 +0000</pubDate>
      <link>https://dev.to/terraformmonkey/gcp-cloud-sql-terraform-quick-start-guide-2mk4</link>
      <guid>https://dev.to/terraformmonkey/gcp-cloud-sql-terraform-quick-start-guide-2mk4</guid>
      <description>&lt;p&gt;So you want to spin up a &lt;strong&gt;Cloud SQL instance on GCP&lt;/strong&gt; but avoid endless ClickOps? Terraform has your back. With just a few lines of HCL, you can go from nothing → fully working database in minutes.&lt;/p&gt;

&lt;p&gt;This guide walks you through the essentials and gives you a safe, production-ready starting point.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏗️ Why use Terraform for Cloud SQL?
&lt;/h2&gt;

&lt;p&gt;Sure, you &lt;em&gt;could&lt;/em&gt; create your database via the GCP console... but that’s fragile and error-prone. Terraform gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Version control&lt;/strong&gt; – every DB change tracked in Git
&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Repeatability&lt;/strong&gt; – no more “it works on my account” setups
&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Collaboration&lt;/strong&gt; – teammates share the same IaC definitions
&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Safety&lt;/strong&gt; – drift detection, plan previews, and easier rollbacks
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 For broader advice, check out &lt;a href="https://controlmonkey.io/terraform-gcp-provider-best-practices/" rel="noopener noreferrer"&gt;Terraform GCP Provider Best Practices&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚡ Minimal Example: Postgres 15 on GCP
&lt;/h2&gt;

&lt;p&gt;Here’s the simplest setup to get you started:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_sql_database_instance"&lt;/span&gt; &lt;span class="s2"&gt;"default"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"example-sql"&lt;/span&gt;
  &lt;span class="nx"&gt;database_version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"POSTGRES_15"&lt;/span&gt;
  &lt;span class="nx"&gt;region&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-central1"&lt;/span&gt;

  &lt;span class="nx"&gt;settings&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;tier&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"db-f1-micro"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_sql_user"&lt;/span&gt; &lt;span class="s2"&gt;"users"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"example-user"&lt;/span&gt;
  &lt;span class="nx"&gt;instance&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;google_sql_database_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;password&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;db_password&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_sql_database"&lt;/span&gt; &lt;span class="s2"&gt;"database"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"example-db"&lt;/span&gt;
  &lt;span class="nx"&gt;instance&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;google_sql_database_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;default&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;📖 See the &lt;a href="https://registry.terraform.io/providers/hashicorp/google/latest/docs" rel="noopener noreferrer"&gt;Terraform Google Provider docs&lt;/a&gt; for all available options.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🔒 Handling Passwords Safely
&lt;/h2&gt;

&lt;p&gt;⚠️ Never hardcode DB passwords in your &lt;code&gt;.tf&lt;/code&gt; files. Instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;a href="https://cloud.google.com/secret-manager" rel="noopener noreferrer"&gt;Google Secret Manager&lt;/a&gt; and fetch secrets at runtime
&lt;/li&gt;
&lt;li&gt;Or inject passwords as environment variables:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;TF_VAR_db_password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"super-secret"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Terraform automatically picks this up when you run &lt;code&gt;terraform apply&lt;/code&gt;.&lt;/p&gt;
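<p>&lt;p&gt;The &lt;code&gt;TF_VAR_&lt;/code&gt; convention maps onto a declared input variable; marking it &lt;code&gt;sensitive&lt;/code&gt; keeps the value out of plan output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;variable "db_password" {
  type      = string
  sensitive = true  # redacted in plan/apply output
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;</p>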




&lt;h2&gt;
  
  
  🛠️ Hardening for Production
&lt;/h2&gt;

&lt;p&gt;The above works fine for demos or dev environments. For production, consider adding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automated backups&lt;/strong&gt; + Point-in-Time Recovery (PITR)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintenance windows&lt;/strong&gt; for predictable updates
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Private IPs&lt;/strong&gt; to keep DB traffic off the public internet
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High availability&lt;/strong&gt; replicas
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These settings can all be added inside the &lt;code&gt;settings {}&lt;/code&gt; block of your instance.&lt;/p&gt;
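<p>&lt;p&gt;A sketch of those hardening options (the tier is illustrative, and the VPC reference is assumed to exist elsewhere in your config):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;settings {
  tier              = "db-custom-2-7680"
  availability_type = "REGIONAL"  # high-availability failover replica

  backup_configuration {
    enabled                        = true
    point_in_time_recovery_enabled = true  # PITR requires backups enabled
  }

  maintenance_window {
    day  = 7  # Sunday
    hour = 3  # 03:00 UTC
  }

  ip_configuration {
    ipv4_enabled    = false  # no public IP
    private_network = google_compute_network.main.id  # assumed VPC resource
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;</p>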




&lt;h2&gt;
  
  
  🚀 Next Steps
&lt;/h2&gt;

&lt;p&gt;If you want to go beyond basics, you can modularize this setup and reuse it across projects. A Terraform module helps you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standardize DB configurations
&lt;/li&gt;
&lt;li&gt;Apply org-wide policies (naming, networking, backups)
&lt;/li&gt;
&lt;li&gt;Scale faster without duplicating boilerplate
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 Learn more about scaling with &lt;a href="https://controlmonkey.io/scale-terraform-guide/" rel="noopener noreferrer"&gt;Terraform at scale&lt;/a&gt;.&lt;/p&gt;
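<p>&lt;p&gt;Wrapped as a module, the earlier resources shrink to a single call like this (the module path and inputs are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;module "app_db" {
  source           = "./modules/cloud-sql"  # hypothetical local module
  name             = "app-db"
  database_version = "POSTGRES_15"
  region           = "us-central1"
  tier             = "db-f1-micro"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;</p>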




&lt;h2&gt;
  
  
  💬 Over to you!
&lt;/h2&gt;

&lt;p&gt;How are you managing &lt;strong&gt;Cloud SQL on GCP&lt;/strong&gt; today? Do you keep it simple with raw resources, or wrap things into reusable modules?  &lt;/p&gt;

&lt;p&gt;Drop your thoughts in the comments 👇  &lt;/p&gt;

&lt;p&gt;👉 Want the full version? Read the complete &lt;a href="https://controlmonkey.io/gcp-cloud-sql-terraform-quick-start-guide/" rel="noopener noreferrer"&gt;GCP Cloud SQL Terraform Quick Start Guide&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>terraform</category>
      <category>sql</category>
      <category>devops</category>
    </item>
    <item>
      <title>🌊 AI Is Coming Faster Than Your Infra Can Handle</title>
      <dc:creator>TerraformMonkey</dc:creator>
      <pubDate>Tue, 09 Sep 2025 13:21:00 +0000</pubDate>
      <link>https://dev.to/terraformmonkey/ai-is-coming-faster-than-your-infra-can-handle-3g1m</link>
      <guid>https://dev.to/terraformmonkey/ai-is-coming-faster-than-your-infra-can-handle-3g1m</guid>
      <description>&lt;p&gt;Everywhere I go, CIOs and DevOps leaders are asking the same question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“Are we ready for AI?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;(And honestly—it’s not just IT. Every exec in every division is asking it.)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After talking to hundreds of cloud teams this year, I had a strong hunch about the answer. But I wanted numbers. So we surveyed &lt;strong&gt;300 cloud and infra leaders across industries&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;The results? Clear as day:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Most teams aren’t ready for the AI surge at all.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 The AI Wave Is Bigger Than Most Realize
&lt;/h2&gt;

&lt;p&gt;Workloads aren’t just growing—they’re exploding.  &lt;/p&gt;

&lt;p&gt;Cloud leaders expect a &lt;strong&gt;50% increase in AI-driven workloads in the next 12–24 months&lt;/strong&gt;, with almost 40% predicting exponential growth.  &lt;/p&gt;

&lt;p&gt;That means: more clusters, pipelines, policies… and more risk.  &lt;/p&gt;

&lt;p&gt;AI doesn’t just add scale—it &lt;strong&gt;accelerates the pace of change&lt;/strong&gt;, magnifying every weakness in your infra.  &lt;/p&gt;

&lt;p&gt;If your team is already stretched thin, AI could break you.  &lt;/p&gt;

&lt;p&gt;This is why forward-looking orgs are leaning into &lt;a href="https://controlmonkey.io/blog/amazon-bedrock-terraform-controlmonkey-windward/" rel="noopener noreferrer"&gt;AWS transformation stories like Windward’s Amazon Bedrock journey&lt;/a&gt; as blueprints for what’s coming.&lt;/p&gt;




&lt;h2&gt;
  
  
  📊 The Numbers Confirm It
&lt;/h2&gt;

&lt;p&gt;From &lt;a href="https://lp.controlmonkey.io/genai-cloud-infra-2025-report" rel="noopener noreferrer"&gt;our latest report&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only &lt;strong&gt;46% say they’re fully prepared&lt;/strong&gt; to automate at AI scale.
&lt;/li&gt;
&lt;li&gt;Average IaC coverage: &lt;strong&gt;51%&lt;/strong&gt; (half of infra is still manual).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;98% admit they face blockers&lt;/strong&gt; to scaling and resilience.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;27%&lt;/strong&gt; already see costs rising due to AI.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even the “ready” orgs have holes—performance, cost, compliance, skills…&lt;br&gt;&lt;br&gt;
There’s no such thing as “safe.”&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚡ Infra Will Decide Who Wins AI
&lt;/h2&gt;

&lt;p&gt;AI will expose infra maturity more brutally than anything before it.  &lt;/p&gt;

&lt;p&gt;The companies that thrive won’t just be the ones with the biggest AI labs or data scientists. They’ll be the ones whose cloud teams can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reconcile infra continuously (no drift, no blind spots).
&lt;/li&gt;
&lt;li&gt;Automate everything: provisioning, scaling, rollback, compliance.
&lt;/li&gt;
&lt;li&gt;Give developers speed &lt;em&gt;and&lt;/em&gt; keep the business secure.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren’t nice-to-haves. They’re critical.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Because here’s the truth: If infra lags, AI fails.&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  🛑 What’s Really Blocking Scale
&lt;/h3&gt;

&lt;p&gt;The biggest barriers aren’t GPUs or budgets. They’re the basics: &lt;strong&gt;security, governance, and visibility.&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;Nearly every team (98%) admits they’re hitting blockers to both scale and resilience.  &lt;/p&gt;

&lt;p&gt;Without automated compliance checks, real-time drift detection, and policy-driven scaling, you’re building on sand.  &lt;/p&gt;

&lt;p&gt;Until those gaps close, &lt;strong&gt;total automation isn’t optional—it’s survival.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxp51m5v9a9wlb5zw2t2q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxp51m5v9a9wlb5zw2t2q.png" alt="Scaling barriers chart" width="642" height="1000"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;What’s Stopping Organizations Scaling with Confidence?&lt;/em&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  👀 What Cloud Leaders Say They Need Most
&lt;/h3&gt;

&lt;p&gt;When asked what would actually move the needle, cloud leaders were clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;More training (23%)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better visibility into infra + AI workloads (22%)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words—skills and sightlines.  &lt;/p&gt;

&lt;p&gt;The fix isn’t a magic platform. It’s &lt;strong&gt;frameworks, playbooks, and &lt;a href="https://controlmonkey.io/blog/iac-modernization-guide/" rel="noopener noreferrer"&gt;IaC modernization strategies&lt;/a&gt;&lt;/strong&gt; that make readiness real.  &lt;/p&gt;

&lt;p&gt;The clock’s ticking—those gaps won’t close themselves.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔑 What Needs to Change Right Now
&lt;/h2&gt;

&lt;p&gt;If you’re a CIO or CTO staring down the AI wave, the takeaway isn’t &lt;em&gt;“buy more GPUs.”&lt;/em&gt; It’s:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Expand IaC coverage until manual infra is gone.
&lt;/li&gt;
&lt;li&gt;Put guardrails in place so console changes can’t bypass policy.
&lt;/li&gt;
&lt;li&gt;Invest in skills + visibility, not just cost cutting.
&lt;/li&gt;
&lt;li&gt;Free DevOps teams from firefighting by automating repetitive tasks.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://controlmonkey.io/blog/devops-ai/" rel="noopener noreferrer"&gt;AI is already forcing DevOps to adapt and accelerate&lt;/a&gt;. The difference between scaling and drowning is what you do with your infra.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; AI is coming whether you’re ready or not.&lt;br&gt;&lt;br&gt;
The wave is here. The question is: will your infra ride it—or break under it?&lt;/p&gt;




&lt;p&gt;👉 &lt;a href="https://controlmonkey.io/genai-cloud-infrastructure-report-2025/" rel="noopener noreferrer"&gt;Download the full report&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;💬 What do you think—are most orgs underestimating how hard infra readiness will be for AI? Drop your thoughts below!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>🔧 Ending Engineering Toil in DevOps: Why Automation Matters</title>
      <dc:creator>TerraformMonkey</dc:creator>
      <pubDate>Fri, 05 Sep 2025 08:35:00 +0000</pubDate>
      <link>https://dev.to/terraformmonkey/ending-engineering-toil-in-devops-why-automation-matters-3fn3</link>
      <guid>https://dev.to/terraformmonkey/ending-engineering-toil-in-devops-why-automation-matters-3fn3</guid>
      <description>&lt;p&gt;If you’re leading a DevOps or platform team, chances are you’ve seen this cycle before:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tickets piling up for infra changes
&lt;/li&gt;
&lt;li&gt;Manual reviews dragging down delivery speed
&lt;/li&gt;
&lt;li&gt;Engineers stuck firefighting misconfigs instead of building
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s &lt;strong&gt;engineering toil&lt;/strong&gt;—work that’s &lt;em&gt;manual, repetitive, and adds little long-term value&lt;/em&gt;. And in cloud infrastructure, it’s everywhere.  &lt;/p&gt;




&lt;h2&gt;
  
  
  🚨 The Cost of Toil
&lt;/h2&gt;

&lt;p&gt;Engineering toil might feel small in isolation (“just fix this drift,” “just patch that config”), but it compounds:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Slow delivery&lt;/strong&gt; → Infra requests bottleneck in tickets
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Burnout&lt;/strong&gt; → Engineers spend more time debugging than innovating
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Higher risk&lt;/strong&gt; → Manual changes mean more room for error
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Innovation drag&lt;/strong&gt; → Time that should go to features gets lost to maintenance
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As &lt;a href="https://sre.google/sre-book/eliminating-toil/" rel="noopener noreferrer"&gt;Google SRE principles&lt;/a&gt; put it: too much toil &lt;em&gt;kills scalability&lt;/em&gt;.  &lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ How IaC Automation Reduces Toil
&lt;/h2&gt;

&lt;p&gt;The cure isn’t “work harder”—it’s &lt;strong&gt;automate the repeatable stuff&lt;/strong&gt;. With Infrastructure as Code (IaC) pipelines and guardrails in place, teams can:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Catch misconfigs early&lt;/strong&gt; with automated policy checks in CI/CD
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stop chasing drift&lt;/strong&gt; with continuous drift detection &amp;amp; remediation
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eliminate tickets&lt;/strong&gt; by enabling self-service infra delivery
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retire legacy pain&lt;/strong&gt; through Terraform import of unmanaged resources
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ban ClickOps chaos&lt;/strong&gt; by surfacing console-created changes instantly
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Example: IaC automation pipeline snippet&lt;/span&gt;
&lt;span class="nx"&gt;workflow&lt;/span&gt; &lt;span class="s2"&gt;"terraform-ci"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;step&lt;/span&gt; &lt;span class="s2"&gt;"lint"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;runs&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"terraform fmt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"terraform validate"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;step&lt;/span&gt; &lt;span class="s2"&gt;"plan"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;runs&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"terraform plan -out=tfplan"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;step&lt;/span&gt; &lt;span class="s2"&gt;"apply"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;runs&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"terraform apply tfplan"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every change flows through the same automated process → safer, faster, less toil.  &lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 The Payoff
&lt;/h2&gt;

&lt;p&gt;Teams that prioritize &lt;a href="https://controlmonkey.io/solution/reduce-engineering-toil/" rel="noopener noreferrer"&gt;reducing engineering toil&lt;/a&gt; see benefits across the board:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reduced manual work&lt;/strong&gt; → No more boilerplate Terraform or ticket loops
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster infra delivery&lt;/strong&gt; → Devs get self-service, compliant infra on demand
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Less firefighting&lt;/strong&gt; → Issues are caught early, before they hit production
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Happier engineers&lt;/strong&gt; → Time is spent building, not cleaning up
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to go deeper, here’s a good read on &lt;a href="https://controlmonkey.io/resource/engineering-toil-signs/" rel="noopener noreferrer"&gt;signs of engineering toil&lt;/a&gt; and how to break the cycle with automation.  &lt;/p&gt;




&lt;h2&gt;
  
  
  💬 Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Engineering toil is a tax on innovation. The longer you ignore it, the more it grows.  &lt;/p&gt;

&lt;p&gt;The solution isn’t throwing more people at the problem—it’s building systems that make toil disappear for good.  &lt;/p&gt;

&lt;p&gt;💬 How does your team deal with infra toil today—scripts, tickets, or full IaC automation? Drop your thoughts below 👇&lt;/p&gt;

</description>
    </item>
    <item>
      <title>👀 Why IaC Visibility Is Critical for DevOps Teams</title>
      <dc:creator>TerraformMonkey</dc:creator>
      <pubDate>Thu, 04 Sep 2025 13:32:49 +0000</pubDate>
      <link>https://dev.to/terraformmonkey/why-iac-visibility-is-critical-for-devops-teams-49m3</link>
      <guid>https://dev.to/terraformmonkey/why-iac-visibility-is-critical-for-devops-teams-49m3</guid>
      <description>&lt;p&gt;If you’re running cloud infrastructure at scale, you’ve probably asked yourself a few of these questions:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Do we know every resource running in our cloud accounts?&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Which resources are actually managed by Terraform (or OpenTofu/Terragrunt)?&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;What happens when someone does ClickOps in the console?&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The answer, for most teams, is: &lt;em&gt;we don’t fully know&lt;/em&gt;. And that’s a problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  🌩️ The Hidden Risks of Blind Spots
&lt;/h2&gt;

&lt;p&gt;Without Infrastructure as Code (IaC) visibility, you’re basically flying blind:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unmanaged resources&lt;/strong&gt; → Someone spun up a database directly in AWS? Good luck finding it until the bill spikes.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://controlmonkey.io/blog/enterprise-cloud-control-model/" rel="noopener noreferrer"&gt;Drift&lt;/a&gt;&lt;/strong&gt; &amp;amp; misconfigurations → Resources change outside of Terraform, leaving code and reality out of sync.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance gaps&lt;/strong&gt; → Auditors ask “show me all your cloud assets” … and you scramble through scripts, spreadsheets, and hope.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team burnout&lt;/strong&gt; → Engineers waste hours troubleshooting why infra doesn’t match the plan.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blind spots don’t just create chaos—they slow you down and make risk invisible.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧭 What IaC Visibility Really Means
&lt;/h2&gt;

&lt;p&gt;When we talk about &lt;strong&gt;IaC visibility&lt;/strong&gt;, we mean being able to answer—instantly and confidently:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;What resources exist across all accounts and regions?&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Which ones are covered by Terraform, OpenTofu, or Terragrunt?&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Which ones aren’t?&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;What changed recently—and was it code or ClickOps?&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This level of insight flips the script: instead of &lt;em&gt;finding problems reactively&lt;/em&gt;, you &lt;em&gt;govern proactively&lt;/em&gt;.  &lt;/p&gt;

&lt;p&gt;For a good primer on why &lt;strong&gt;&lt;a href="https://www.wiz.io/academy/cloud-visibility" rel="noopener noreferrer"&gt;cloud visibility&lt;/a&gt;&lt;/strong&gt; is foundational to security and governance, check out Wiz’s guide.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚡ Why It Matters for DevOps Leaders
&lt;/h2&gt;

&lt;p&gt;For DevOps managers and platform engineers, IaC visibility directly impacts:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Governance &amp;amp; compliance&lt;/strong&gt; → Full &lt;a href="https://controlmonkey.io/solution/cloud-inventory/" rel="noopener noreferrer"&gt;cloud inventory&lt;/a&gt; mapped to code means no unknowns during audits.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Productivity&lt;/strong&gt; → Engineers spend less time firefighting and more time building.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resilience&lt;/strong&gt; → Drift and ClickOps are detected early, before they break production.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scaling safely&lt;/strong&gt; → As cloud grows, visibility ensures you don’t lose control.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s the difference between reactive firefighting and confident, future-ready infrastructure.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ How to Achieve It
&lt;/h2&gt;

&lt;p&gt;Here are a few practical steps you can take:  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with a Cloud Inventory&lt;/strong&gt; → Use tools/scripts to scan accounts and regions.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Map resources to Terraform&lt;/strong&gt; → Identify what’s already in IaC and what’s unmanaged.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set up Drift Detection&lt;/strong&gt; → Regularly compare code vs. cloud state.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor for ClickOps&lt;/strong&gt; → Track changes made outside Terraform.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review IaC Coverage&lt;/strong&gt; → Audit which providers, modules, and versions are in use.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Example: drift check with Terraform&lt;/span&gt;
terraform plan -detailed-exitcode

&lt;span class="c"&gt;# Exit codes:&lt;/span&gt;
&lt;span class="c"&gt;# 0 = no changes (code and cloud are in sync)&lt;/span&gt;
&lt;span class="c"&gt;# 1 = the plan itself failed&lt;/span&gt;
&lt;span class="c"&gt;# 2 = plan succeeded with changes pending (drift detected)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
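&lt;p&gt;In CI you'll want to branch on that exit code rather than eyeball the output. Here's a minimal sketch—the function takes the exit status as an argument so the branching logic is visible without a real Terraform workspace behind it:&lt;/p&gt;

```shell
# Classify the exit status of `terraform plan -detailed-exitcode`.
# Passing the code in as $1 is a test-friendly stand-in for reading
# $? right after the real plan command.
classify_plan_exit() {
  case "$1" in
    0) echo "in-sync" ;;          # no changes: code matches cloud
    2) echo "drift-detected" ;;   # plan succeeded, diff is non-empty
    *) echo "plan-error" ;;       # 1 (or anything else): plan failed
  esac
}

classify_plan_exit 2   # prints "drift-detected"
```

&lt;p&gt;A scheduled pipeline job that fails on &lt;code&gt;drift-detected&lt;/code&gt; turns drift from a surprise into a ticket.&lt;/p&gt;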



&lt;p&gt;If you’re operating across multiple providers, &lt;strong&gt;&lt;a href="https://controlmonkey.io/news/cross-cloud-visibility/" rel="noopener noreferrer"&gt;cross-cloud visibility&lt;/a&gt;&lt;/strong&gt; becomes even more important—blind spots multiply when AWS, Azure, and GCP all come into play.&lt;/p&gt;
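&lt;p&gt;Catching ClickOps (step 4 above) often comes down to filtering audit-log write events by user agent, since Terraform's AWS provider identifies itself in CloudTrail's &lt;code&gt;userAgent&lt;/code&gt; field. The sketch below runs &lt;code&gt;jq&lt;/code&gt; (assumed installed) over two hand-written sample events instead of a live &lt;code&gt;aws cloudtrail lookup-events&lt;/code&gt; call, so treat the field values as illustrative, not authoritative:&lt;/p&gt;

```shell
# Flag write events that did not come from Terraform (likely ClickOps).
# The sample payload mimics the shape of CloudTrail output: each
# Events[] entry carries the full event as a JSON string in
# CloudTrailEvent. Field values are illustrative.
events='{"Events":[
  {"CloudTrailEvent":"{\"eventName\":\"RunInstances\",\"userAgent\":\"console.amazonaws.com\"}"},
  {"CloudTrailEvent":"{\"eventName\":\"CreateBucket\",\"userAgent\":\"APN/1.0 HashiCorp/1.0 Terraform/1.7\"}"}
]}'

echo "$events" | jq -r '
  .Events[].CloudTrailEvent | fromjson
  | select(.userAgent | test("Terraform") | not)
  | .eventName'
# prints: RunInstances
```

&lt;p&gt;Anything that surfaces here was changed by a human in the console—prime material for a drift review or an import into code.&lt;/p&gt;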




&lt;h2&gt;
  
  
  🚀 Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Visibility isn’t a “nice-to-have” in modern cloud—it’s survival.&lt;br&gt;&lt;br&gt;
The bigger your infra, the more you need &lt;strong&gt;a single source of truth across cloud and code&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;💬 What about your team—do you feel you have &lt;em&gt;true visibility&lt;/em&gt; into your Terraform coverage, or are blind spots still hiding in the dark?  &lt;/p&gt;

&lt;p&gt;Let’s discuss 👇&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
