<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: lssh</title>
    <description>The latest articles on DEV Community by lssh (@lbcristaldo).</description>
    <link>https://dev.to/lbcristaldo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3763164%2F4102bafe-b360-4777-b64f-42c3d373bd0e.jpg</url>
      <title>DEV Community: lssh</title>
      <link>https://dev.to/lbcristaldo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lbcristaldo"/>
    <language>en</language>
    <item>
      <title>AlmaLinux: From Firmware Preparation to Continuous Auditing</title>
      <dc:creator>lssh</dc:creator>
      <pubDate>Sat, 28 Feb 2026 05:59:16 +0000</pubDate>
      <link>https://dev.to/lbcristaldo/almalinux-from-firmware-preparation-to-continuous-auditing-38c1</link>
      <guid>https://dev.to/lbcristaldo/almalinux-from-firmware-preparation-to-continuous-auditing-38c1</guid>
      <description>&lt;h1&gt;
  
  
  Setting Up an AlmaLinux System: From Hardware Planning to Enterprise Hardening
&lt;/h1&gt;

&lt;p&gt;Setting up an AlmaLinux system follows a structured process that spans from hardware planning to post-installation hardening, including resilience, reproducibility, and continuous auditing strategies. This guide integrates enterprise security best practices aligned with CIS, DISA STIG, and HIPAA benchmarks.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Preparation and Firmware (BIOS/UEFI)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1.1 System Requirements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Minimum RAM:&lt;/strong&gt; 1.5 GB (4 GB or more recommended for production)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disk space:&lt;/strong&gt; 10 GB minimum, 20 GB recommended for general use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supported architectures:&lt;/strong&gt; x86_64 (Intel/AMD), aarch64 (ARM64), ppc64le (PowerPC), s390x (IBM Z)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  1.2 BIOS/UEFI Configuration
&lt;/h3&gt;

&lt;p&gt;It's essential to decide between legacy BIOS and UEFI before starting. UEFI is required for Secure Boot, a feature AlmaLinux supports to ensure only signed, authorized kernels and modules are loaded. Verify the boot order, and disable Secure Boot only if unsigned drivers cause conflicts. The firmware choice also determines the partitioning scheme: MBR for legacy BIOS, GPT for UEFI. Verify compatibility of critical controllers (Intel 8254x NICs, Atheros chips, storage HBAs), as some require additional firmware in isolated environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.3 ISO Download and Verification
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Available types:&lt;/strong&gt; boot (network installation), minimal (standalone base), dvd (full packages)&lt;/li&gt;
&lt;li&gt;Import the AlmaLinux public key and verify the SHA256 checksum of the downloaded image to ensure integrity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Boot media creation:&lt;/strong&gt; &lt;code&gt;dd&lt;/code&gt; on Linux/macOS, or Rufus/Fedora Media Writer on Windows&lt;/li&gt;
&lt;li&gt;USB of at least 8 GB (12 GB recommended for convenience)&lt;/li&gt;
&lt;/ul&gt;
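
&lt;p&gt;The verification flow can be sketched as follows. The key and checksum filenames are placeholders for whatever release you actually downloaded; the last three lines are a self-contained illustration of how &lt;code&gt;sha256sum -c&lt;/code&gt; behaves.&lt;/p&gt;

```shell
# Hypothetical filenames -- substitute the key and CHECKSUM file of your release:
#   gpg --import RPM-GPG-KEY-AlmaLinux      # import the AlmaLinux public key
#   gpg --verify CHECKSUM                   # confirm the checksum file is signed
#   sha256sum -c CHECKSUM --ignore-missing  # compare the ISO hash against it

# Self-contained demo of the checksum step:
echo "demo image content" > alma-demo.iso
sha256sum alma-demo.iso > CHECKSUM.demo
sha256sum -c CHECKSUM.demo    # prints "alma-demo.iso: OK" when the hash matches
```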




&lt;h2&gt;
  
  
  2. Configuration in the Anaconda Installer
&lt;/h2&gt;

&lt;p&gt;Once booted from the USB, Anaconda will guide the configuration through the "Installation Summary."&lt;/p&gt;

&lt;h3&gt;
  
  
  2.1 Localization and Time
&lt;/h3&gt;

&lt;p&gt;Configure keyboard, language support, and time zone. Enable network time (NTP) for precise synchronization, which is critical for forensic log consistency.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.2 Software Selection
&lt;/h3&gt;

&lt;p&gt;Base environment: Server with GUI, Server (no GUI), Minimal Installation, or Virtualization Host. Select add-ons according to the server role, applying the principle of minimal attack surface.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.3 Partitioning Strategy with LVM
&lt;/h3&gt;

&lt;p&gt;Custom partitioning with LVM (Logical Volume Manager) is fundamental for operational resilience. LVM allows expanding disks online without downtime and applying restrictive mount options per volume.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Separate &lt;code&gt;/var&lt;/code&gt;, &lt;code&gt;/tmp&lt;/code&gt;, &lt;code&gt;/home&lt;/code&gt;, and &lt;code&gt;/var/log&lt;/code&gt; into independent LVM volumes&lt;/li&gt;
&lt;li&gt;Apply options in &lt;code&gt;/etc/fstab&lt;/code&gt;: for &lt;code&gt;/tmp&lt;/code&gt; and &lt;code&gt;/var/tmp&lt;/code&gt; use &lt;code&gt;noexec&lt;/code&gt; (blocks binary execution), &lt;code&gt;nosuid&lt;/code&gt; (ignores set-user-ID), &lt;code&gt;nodev&lt;/code&gt; (ignores special devices)&lt;/li&gt;
&lt;li&gt;For &lt;code&gt;/var/log&lt;/code&gt;, isolation prevents a disk-filling DoS attack from compromising the root partition&lt;/li&gt;
&lt;li&gt;The XFS filesystem (default in AlmaLinux) allows online growth with &lt;code&gt;xfs_growfs&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
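
&lt;p&gt;An illustrative &lt;code&gt;/etc/fstab&lt;/code&gt; excerpt for the layout above. The volume group and logical volume names are examples; adapt them to your own scheme:&lt;/p&gt;

```
# /etc/fstab -- example entries (device paths are illustrative)
/dev/mapper/vg_sys-tmp     /tmp      xfs  defaults,noexec,nosuid,nodev  0 0
/dev/mapper/vg_sys-vartmp  /var/tmp  xfs  defaults,noexec,nosuid,nodev  0 0
/dev/mapper/vg_sys-varlog  /var/log  xfs  defaults,nodev                0 0
/dev/mapper/vg_sys-home    /home     xfs  defaults,nosuid,nodev         0 0
```

&lt;p&gt;After editing, validate the file with &lt;code&gt;findmnt --verify&lt;/code&gt; before rebooting, since a typo here can leave the system unbootable.&lt;/p&gt;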

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; unlike XFS, ext3/ext4 filesystems reserve 5% of blocks exclusively for root by default, leaving room to recover if a user fills the disk.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  2.4 Encryption
&lt;/h3&gt;

&lt;p&gt;Enable "Encrypt my data" in Anaconda to protect data at rest using LUKS encryption with a strong passphrase.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.5 Network and Hostname
&lt;/h3&gt;

&lt;p&gt;Enable detected network interfaces and assign an FQDN (Fully Qualified Domain Name) hostname.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.6 Security Profile (SCAP Compliance)
&lt;/h3&gt;

&lt;p&gt;From Anaconda, SCAP policies can be applied that automate the initial hardening of the system, establishing the security baseline from the first boot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Available profiles:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CIS Benchmark Level 1 or Level 2&lt;/li&gt;
&lt;li&gt;DISA STIG&lt;/li&gt;
&lt;li&gt;HIPAA&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;CIS Level 2 is the most restrictive and suitable for environments with strict compliance requirements.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  2.7 User Settings
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Define a strong password for root&lt;/li&gt;
&lt;li&gt;Create a regular user and mark them as Administrator for sudo access&lt;/li&gt;
&lt;li&gt;Root should never be used for routine operations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2.8 FIPS (Highly Regulated Environments)
&lt;/h3&gt;

&lt;p&gt;In environments that require it, enabling FIPS mode ensures the system uses only certified cryptographic algorithms.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Important warning:&lt;/strong&gt; FIPS must be enabled from the installer boot using the &lt;code&gt;fips=1&lt;/code&gt; kernel parameter, not as a post-installation step. Post-installation conversion can break software that depends on non-certified algorithms (legacy MD5 implementations, certain OpenSSL versions). Cold-initialized FIPS mode is significantly more reliable.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  3. Post-Installation and Hardening Phase
&lt;/h2&gt;

&lt;p&gt;After the first reboot, proactive maintenance and layered security steps are executed.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.1 Initial Update
&lt;/h3&gt;

&lt;p&gt;Run &lt;code&gt;sudo dnf update&lt;/code&gt; immediately to apply the latest security patches and errata (ALSA). Audit active services with &lt;code&gt;systemctl list-units --type=service&lt;/code&gt; and disable everything not necessary for the server role (minimal attack surface principle).&lt;/p&gt;

&lt;h3&gt;
  
  
  3.2 SELinux Management
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Verify that SELinux is in &lt;strong&gt;Enforcing&lt;/strong&gt; mode&lt;/li&gt;
&lt;li&gt;Never disable it as a solution to problems&lt;/li&gt;
&lt;li&gt;Fix incorrect labels with &lt;code&gt;restorecon&lt;/code&gt;, not by disabling protection&lt;/li&gt;
&lt;li&gt;Adjust behaviors with SELinux booleans before resorting to policy changes&lt;/li&gt;
&lt;li&gt;Analyze denials with &lt;code&gt;ausearch -m avc&lt;/code&gt; or &lt;code&gt;sealert&lt;/code&gt; for precise diagnosis&lt;/li&gt;
&lt;/ul&gt;
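
&lt;p&gt;A typical triage session, as a sketch. The &lt;code&gt;/srv/www&lt;/code&gt; path and the boolean are examples, not prescriptions, and these commands require root on a SELinux-enabled host:&lt;/p&gt;

```shell
getenforce                                 # should print "Enforcing"
restorecon -Rv /srv/www                    # relabel a mislabeled tree in place
getsebool -a | grep httpd                  # list booleans relevant to the service
setsebool -P httpd_can_network_connect on  # persistently adjust one behavior
ausearch -m avc -ts recent                 # review recent denials for diagnosis
```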

&lt;h3&gt;
  
  
  3.3 SSH Management
&lt;/h3&gt;

&lt;p&gt;SSH configuration is one of the most critical attack surfaces on any network-exposed server.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Disable root login: &lt;code&gt;PermitRootLogin no&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Ensure only Protocol 2 is used (the default; protocol 1 was removed from modern OpenSSH)&lt;/li&gt;
&lt;li&gt;Force public key authentication and disable password authentication&lt;/li&gt;
&lt;li&gt;Restrict access with &lt;code&gt;AllowUsers&lt;/code&gt; or &lt;code&gt;AllowGroups&lt;/code&gt; in &lt;code&gt;/etc/ssh/sshd_config&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Consider changing the default port (22) to reduce noise in access logs&lt;/li&gt;
&lt;/ul&gt;
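
&lt;p&gt;The corresponding &lt;code&gt;/etc/ssh/sshd_config&lt;/code&gt; excerpt might look like this (the group name is an example):&lt;/p&gt;

```
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowGroups sshusers
```

&lt;p&gt;Run &lt;code&gt;sshd -t&lt;/code&gt; to syntax-check the file, and keep an existing session open while restarting the service so a mistake doesn't lock you out.&lt;/p&gt;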

&lt;h3&gt;
  
  
  3.4 Firewall Configuration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Verify that &lt;code&gt;firewalld&lt;/code&gt; is active and enabled at boot&lt;/li&gt;
&lt;li&gt;Apply the principle of least network privilege: only strictly necessary services (SSH, HTTP/S, etc.) with open ports&lt;/li&gt;
&lt;li&gt;Review active zones and connections with &lt;code&gt;firewall-cmd --list-all&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3.5 Auditing with auditd
&lt;/h3&gt;

&lt;p&gt;SELinux and firewalld protect the system, but &lt;code&gt;auditd&lt;/code&gt; provides the forensic event logging required by frameworks like CIS or STIG. Without &lt;code&gt;auditd&lt;/code&gt;, a system may be secure but not auditable.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verify that &lt;code&gt;auditd&lt;/code&gt; is active and configured to record critical security events&lt;/li&gt;
&lt;li&gt;Configure a remote log server via Syslog-ng with SSL/stunnel tunnels — this ensures logs are tamper-proof even if the host is compromised (auditd is useless if an attacker with root access can wipe local records)&lt;/li&gt;
&lt;li&gt;Validate post-installation time synchronization with &lt;code&gt;chronyc tracking&lt;/code&gt;, as precise timestamps are essential for forensic log consistency&lt;/li&gt;
&lt;/ul&gt;
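
&lt;p&gt;A minimal set of watch rules, as an illustration. The file paths and key names are examples; real baselines such as the CIS rule sets ship far more:&lt;/p&gt;

```
# /etc/audit/rules.d/hardening.rules -- load with: augenrules --load
-w /etc/passwd -p wa -k identity     # watch writes/attribute changes to passwd
-w /etc/sudoers -p wa -k privilege   # watch privilege configuration
-w /var/log/lastlog -p wa -k logins  # watch login accounting
```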

&lt;h3&gt;
  
  
  3.6 Least Privilege with sudo
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Configure &lt;code&gt;/etc/sudoers&lt;/code&gt; with least privilege policies, limiting which commands each user or group can execute&lt;/li&gt;
&lt;li&gt;Enable logging of all commands executed via sudo for full traceability&lt;/li&gt;
&lt;li&gt;Always use &lt;code&gt;visudo&lt;/code&gt; to edit sudoers, avoiding syntax errors that could lock out access&lt;/li&gt;
&lt;/ul&gt;
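
&lt;p&gt;A least-privilege &lt;code&gt;sudoers&lt;/code&gt; sketch. The group and command list are invented for illustration; always edit via &lt;code&gt;visudo&lt;/code&gt;:&lt;/p&gt;

```
Defaults logfile="/var/log/sudo.log"
%webadmins ALL=(root) /usr/bin/systemctl restart httpd, /usr/bin/journalctl -u httpd
```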

&lt;h3&gt;
  
  
  3.7 Kernel Live Patching
&lt;/h3&gt;

&lt;p&gt;In environments with high-availability SLAs, the traditional patch → reboot → maintenance window cycle represents a significant operational cost. Implement &lt;strong&gt;KernelCare&lt;/strong&gt; or &lt;strong&gt;TuxCare&lt;/strong&gt; to apply critical security patches to the kernel and system libraries (Glibc, OpenSSL) without needing to reboot. This maximizes availability and eliminates maintenance windows for urgent security patches.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Resilience and Operational Continuity
&lt;/h2&gt;

&lt;h3&gt;
  
  
  4.1 Backup Strategy
&lt;/h3&gt;

&lt;p&gt;A hardened system without backups is as serious an operational risk as a system without hardening. Backups must be defined before going to production.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configure LVM snapshots for quick captures of the system state before critical changes&lt;/li&gt;
&lt;li&gt;Implement regular external backups with tools like &lt;strong&gt;Restic&lt;/strong&gt;, &lt;strong&gt;Bacula&lt;/strong&gt;, or automated &lt;code&gt;rsync&lt;/code&gt; policies&lt;/li&gt;
&lt;li&gt;Periodically verify the integrity and restorability of backups&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4.2 Continuity with LVM
&lt;/h3&gt;

&lt;p&gt;LVM allows extending disks and growing XFS filesystems online with minimal or no downtime. Plan volumes with anticipated growth space to avoid capacity emergencies.&lt;/p&gt;
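
&lt;p&gt;Assuming free extents in the volume group, growing a volume online is two commands (the VG/LV names are examples):&lt;/p&gt;

```shell
lvextend -L +10G /dev/vg_sys/var   # extend the logical volume by 10 GiB
xfs_growfs /var                    # grow the mounted XFS filesystem to match
# or both at once: lvextend -r -L +10G /dev/vg_sys/var
```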

&lt;h3&gt;
  
  
  4.3 Long-Term Update Strategy
&lt;/h3&gt;

&lt;p&gt;For migrations between major versions (e.g., AlmaLinux 8 to 9), use the &lt;strong&gt;ELevate&lt;/strong&gt; project, which allows in-place upgrades without reinstalling the system. Plan update windows and document the pre-migration state with LVM snapshots.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Reproducibility and Configuration Management
&lt;/h2&gt;

&lt;h3&gt;
  
  
  5.1 Automation with Ansible
&lt;/h3&gt;

&lt;p&gt;The manual guide becomes real value when translated into reusable, idempotent code. Ansible allows reproducing exactly the same state across any number of servers.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automate configuration of SELinux labels, booleans, and firewall rules&lt;/li&gt;
&lt;li&gt;Ensure every deployed server is identical to the previous one, eliminating configuration drift&lt;/li&gt;
&lt;li&gt;Version the playbooks in a source control system for change traceability&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5.2 Integration with Existing Infrastructure
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Domain joining with &lt;code&gt;realm join&lt;/code&gt; for integration with Active Directory (via Winbind or SSSD) or FreeIPA, centralizing identity management&lt;/li&gt;
&lt;li&gt;Configure internal package repositories for air-gapped environments or those with software approval policies&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  6. Active Vulnerability Management
&lt;/h2&gt;

&lt;p&gt;Everything above covers initial configuration and maintenance. A complete enterprise cycle requires continuous evaluation — a server that meets CIS Level 2 today may not meet it six months later after updates and operational changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  6.1 Scanning with OpenSCAP
&lt;/h3&gt;

&lt;p&gt;Run &lt;code&gt;oscap xccdf eval&lt;/code&gt; periodically against the same CIS/STIG profile applied during installation. Reports identify configuration drift from the initial baseline. Integrate scans into CI/CD pipelines or schedule them with &lt;code&gt;cron&lt;/code&gt; for automatic evaluation.&lt;/p&gt;
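
&lt;p&gt;A typical invocation looks like the following. The datastream path and profile id match what &lt;code&gt;scap-security-guide&lt;/code&gt; usually installs on AlmaLinux 9, but verify them locally with &lt;code&gt;oscap info&lt;/code&gt; before relying on this exact command:&lt;/p&gt;

```shell
oscap xccdf eval \
  --profile xccdf_org.ssgproject.content_profile_cis \
  --report /var/log/oscap-report-$(date +%F).html \
  /usr/share/xml/scap/ssg/content/ssg-almalinux9-ds.xml
```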

&lt;h3&gt;
  
  
  6.2 Auditing with Lynis
&lt;/h3&gt;

&lt;p&gt;Lynis complements OpenSCAP with a broader system security approach. Run &lt;code&gt;lynis audit system&lt;/code&gt; and review the additional hardening recommendations.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This order of operations ensures the system is not only functional from the first minute, but meets enterprise security standards from its initial deployment. The sequence covers the four dimensions of a mature production system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔒 &lt;strong&gt;Secure by design&lt;/strong&gt; — hardening from Anaconda&lt;/li&gt;
&lt;li&gt;🔍 &lt;strong&gt;Auditable&lt;/strong&gt; — auditd + remote logs&lt;/li&gt;
&lt;li&gt;🔄 &lt;strong&gt;Resilient&lt;/strong&gt; — LVM + backups + live patching&lt;/li&gt;
&lt;li&gt;♻️ &lt;strong&gt;Reproducible&lt;/strong&gt; — Ansible + configuration management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Active vulnerability management with OpenSCAP and Lynis closes the cycle by ensuring compliance is maintained over time, not just at the moment of deployment.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://lbcristaldo.hashnode.dev/" rel="noopener noreferrer"&gt;https://lbcristaldo.hashnode.dev/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>linux</category>
      <category>ansible</category>
      <category>sysadmin</category>
    </item>
    <item>
      <title>From NGINX Ingress to Gateway API</title>
      <dc:creator>lssh</dc:creator>
      <pubDate>Sun, 22 Feb 2026 03:13:45 +0000</pubDate>
      <link>https://dev.to/lbcristaldo/de-nginx-ingress-a-gateway-api-4m1i</link>
      <guid>https://dev.to/lbcristaldo/de-nginx-ingress-a-gateway-api-4m1i</guid>
      <description>&lt;p&gt;A migration guide: why, how, and what to expect&lt;/p&gt;

&lt;p&gt;&lt;u&gt;1. Why migrate: the end of NGINX Ingress&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;The community Ingress NGINX controller is entering a "best effort" maintenance phase until its official retirement in March 2026. After that date there will be no more bug fixes or security patches. Any CVE discovered will go unanswered.&lt;br&gt;
That makes the migration a &lt;strong&gt;compliance and security priority, not just an optional technical upgrade.&lt;/strong&gt; Postponing it means accumulating unsustainable technical debt: an exposed, unsupported routing system carrying production traffic.&lt;br&gt;
Migrating to Gateway API is not a version bump. It is a re-architecture of traffic handling in Kubernetes.&lt;/p&gt;

&lt;p&gt;&lt;u&gt;2. The paradigm shift: from monolith to a decoupled model&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;The Ingress model packed everything related to routing into a single object: infrastructure provisioning, network configuration, and application rules. It worked, but with significant friction.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The problem with Ingress&lt;/em&gt;&lt;br&gt;
    • It depended on proprietary, vendor-specific annotations for advanced features (canary, rewrites, headers). &lt;strong&gt;That created vendor lock-in: switch controllers and you rewrite everything.&lt;/strong&gt;&lt;br&gt;
    • Because everything lived in a single object, an error in one route could destabilize the entire controller. The blast radius was global.&lt;br&gt;
    • It was limited to HTTP/HTTPS. There was no native support for TCP, UDP, gRPC, or TLS terminated at layer 4.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What Gateway API brings&lt;/em&gt;&lt;br&gt;
Gateway API separates responsibilities into three well-defined layers:&lt;br&gt;
    •  &lt;strong&gt;GatewayClass:&lt;/strong&gt; defines the infrastructure provider (who operates the controller).&lt;br&gt;
    •  &lt;strong&gt;Gateway:&lt;/strong&gt; managed by the platform or cluster team. Global TLS, WAF, and security policies live here.&lt;br&gt;
    •  &lt;strong&gt;Routes (HTTPRoute, TCPRoute, etc.):&lt;/strong&gt; managed by the application teams. Each team controls its own routes without touching the global configuration.&lt;/p&gt;

&lt;p&gt;This decoupling has concrete consequences:&lt;br&gt;
    • A developer can no longer accidentally override the cluster's TLS configuration.&lt;br&gt;
    • A syntax error in one team's namespace does not affect other teams' routes.&lt;br&gt;
    • Advanced features (traffic splitting, header modification, weighted routing) are first-class fields in the specification, not ad hoc annotations.&lt;br&gt;
    • Native support for L4 and L7 protocols: HTTP, HTTPS, TCP, UDP, TLS, gRPC.&lt;br&gt;
Portability is another direct win: because it is based on a SIG-Network standard, switching controllers or cloud providers no longer means rewriting every manifest.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7t46gu9asvrq7wnhe8c7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7t46gu9asvrq7wnhe8c7.png" alt=" " width="800" height="1381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;u&gt;3. How it's done: the migration process&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;The recommendation is clear: avoid the big-bang approach. A progressive, parallel migration reduces the risk to manageable levels.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step by step&lt;/strong&gt;&lt;br&gt;
    1. &lt;strong&gt;Audit:&lt;/strong&gt; Inventory every dependency of the current Ingress. Identify custom or "exotic" annotations, DNS configuration, TLS certificates, and special use cases. Not every annotation has a direct equivalent in Gateway API; better to find out sooner than later.&lt;br&gt;
    2. &lt;strong&gt;Assisted conversion:&lt;/strong&gt; Use the ingress2gateway tool, which takes Ingress NGINX manifests and generates equivalent Gateway API resources. The output requires review and manual adjustment; it is a starting point, not a final result.&lt;br&gt;
    3. &lt;strong&gt;Parallel deployment ("double run"):&lt;/strong&gt; Install the new Gateway API controller alongside the existing NGINX controller. Create the Gateway and HTTPRoute objects without directing real traffic yet. Validate the configuration in a staging environment.&lt;br&gt;
    4. &lt;strong&gt;Progressive cutover via DNS:&lt;/strong&gt; Redirect small percentages of traffic to the new stack (start with 1-5%) and monitor latency metrics (p99) and 5xx error rates before continuing. Scale gradually to 100%.&lt;br&gt;
    5. &lt;strong&gt;Validation and cleanup:&lt;/strong&gt; Once 100% of the traffic runs stably on Gateway API, delete the Ingress resources and uninstall the old controller. This reduces the attack surface and eliminates the technical debt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Critical configuration points&lt;/strong&gt;&lt;br&gt;
Two Gateway API mechanisms deserve special attention during the migration:&lt;br&gt;
    •  &lt;strong&gt;ReferenceGrant:&lt;/strong&gt; explicitly allows a Gateway in an infrastructure namespace to send traffic to a Service in an application namespace. Without this resource, cross-namespace routing is blocked by default. That is security by design.&lt;br&gt;
    •  &lt;strong&gt;Status Conditions:&lt;/strong&gt; Gateway API objects report their state through conditions (Accepted, Programmed, ResolvedRefs). This makes it possible to diagnose problems directly on the resource, without digging through controller logs.&lt;/p&gt;
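
&lt;p&gt;A minimal ReferenceGrant sketch. The namespace and resource names below are invented for illustration: the grant lives in the application namespace and explicitly allows routes from the infrastructure namespace to target its Services.&lt;/p&gt;

```
# referencegrant.yaml -- example names; apply with: kubectl apply -f referencegrant.yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-infra-routes
  namespace: team-a        # the namespace that owns the target Services
spec:
  from:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    namespace: infra       # where the platform team's routes live
  to:
  - group: ""              # core API group (Service)
    kind: Service
```

&lt;p&gt;Until a grant like this exists, a route in &lt;code&gt;infra&lt;/code&gt; that references a backend in &lt;code&gt;team-a&lt;/code&gt; stays blocked, and the route reports it through its status conditions.&lt;/p&gt;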

&lt;p&gt;&lt;u&gt;4. Expected behavior after the migration&lt;/u&gt;&lt;br&gt;
Once the migration is complete, the infrastructure operates in a qualitatively different way:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Governance and granular security&lt;/strong&gt;&lt;br&gt;
Infrastructure teams define and protect the global policies on the Gateway object. Development teams manage their HTTPRoutes autonomously. Kubernetes RBAC enforces this separation natively, with no workarounds or informal conventions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reduced blast radius&lt;/strong&gt;&lt;br&gt;
A misconfiguration in one team's namespace affects only that team's routes. The controller does not enter an unstable state; it simply reports the error in the status field of the affected object.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Convergence with Service Mesh (the GAMMA initiative)&lt;/strong&gt;&lt;br&gt;
The Gateway API project's GAMMA initiative allows the same syntax to manage both north-south ingress traffic and internal east-west traffic between services. This simplifies the technology stack and removes the need for separate tooling for each traffic type.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WAF integration and centralized security&lt;/strong&gt;&lt;br&gt;
Implementations such as NGINX App Protect or Envoy Gateway's security policies make it possible to centralize threat protection (OWASP Top 10) directly at the entry point, managed by the Gateway object and with no per-service configuration.&lt;/p&gt;

&lt;p&gt;&lt;u&gt;5. Considerations and points of attention&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;Gateway API solves real Ingress problems, but its greater granularity introduces complexities of its own that are worth keeping in mind.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cognitive load&lt;/strong&gt;&lt;br&gt;
Where there used to be a single Ingress object, there are now three layers (GatewayClass, Gateway, Routes) that must be correctly wired together. The learning curve is real. Teams without prior experience with the model need adoption time and clear internal documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Control plane scalability&lt;/strong&gt;&lt;br&gt;
In large clusters with thousands of routes, translating Kubernetes objects into data plane configuration (for example, Envoy's xDS protocol) can be computationally expensive. This can cause CPU spikes and update propagation latency during periods of heavy change churn.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Annotations with no direct equivalent&lt;/strong&gt;&lt;br&gt;
Not every advanced NGINX Ingress annotation has an equivalent field in the current Gateway API specification. Some use cases require redesign, not conversion. The ingress2gateway tool helps, but it does not cover 100% of the scenarios.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fragility under dynamic load&lt;/strong&gt;&lt;br&gt;
Some implementations have shown unstable behavior under a very high rate of route changes or connection spikes. In shared-proxy models, a namespace with excessive traffic can starve the resources available to others. Continuous monitoring during and after the migration is not optional.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How the community manages these risks&lt;/strong&gt;&lt;br&gt;
The Gateway API project, led by SIG-Network, addresses these risks through concrete mechanisms:&lt;br&gt;
    • Strict conformance tests that guarantee consistent behavior across different controllers, reducing fragmentation.&lt;br&gt;
    • The Policy Attachment model, which applies security and rate-limiting policies declaratively and hierarchically, reducing the impact of misconfigurations.&lt;br&gt;
    • Clear scope rules: a controller only reports errors on objects within its ownership chain, avoiding conflicts between multiple implementations in the same cluster.&lt;/p&gt;




&lt;p&gt;The migration to Gateway API is not urgent because of technical fashion; it is urgent because March 2026 is a concrete date, and CVEs do not wait for maintenance windows.&lt;/p&gt;

&lt;p&gt;Kubernetes SIG-Network · Gateway API · 2025–2026&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>nginx</category>
      <category>gatewayapi</category>
    </item>
    <item>
      <title>Cloud workstation on AWS for $36/month: Windows EC2, static IP and Denver egress explained</title>
      <dc:creator>lssh</dc:creator>
      <pubDate>Sun, 15 Feb 2026 22:21:18 +0000</pubDate>
      <link>https://dev.to/lbcristaldo/cloud-workstation-on-aws-for-36month-windows-ec2-static-ip-and-denver-egress-explained-36f3</link>
      <guid>https://dev.to/lbcristaldo/cloud-workstation-on-aws-for-36month-windows-ec2-static-ip-and-denver-egress-explained-36f3</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;I built this for a client based in Denver. The project fell through (as they do). But the setup was too pretty to waste, so here it is: a Windows cloud workstation with a static Denver IP, built from Argentina, for a use case that no longer exists. Turns out it's useful anyway.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instance:&lt;/strong&gt; &lt;code&gt;t3.large&lt;/code&gt; — 2 vCPU, 8 GB RAM, Windows Server 2022&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static inbound IP:&lt;/strong&gt; Elastic IP (free while the instance is running)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Denver egress:&lt;/strong&gt; a $3/month static residential proxy — &lt;em&gt;not&lt;/em&gt; a VPN, not a second EC2&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total monthly cost:&lt;/strong&gt; ~$36–39 running 8h/day on weekdays, or ~$118 if you leave it on 24/7&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No overengineering. No enterprise fluff. Just a clean, hardened Windows box that does what it needs to do.&lt;/p&gt;
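
&lt;p&gt;A back-of-envelope check of that monthly figure. The hourly rate is derived from the ~$110/month 24/7 number, and the EBS line assumes roughly 100 GB of gp3 at ~$0.08/GB-month, so treat the result as an estimate rather than a quote:&lt;/p&gt;

```shell
hours=$((8 * 22))                              # 8 h/day across ~22 weekdays
total=$(awk -v rate=0.151 -v h="$hours" \
  'BEGIN { printf "%.2f", rate * h + 8 + 3 }') # compute + EBS (~$8) + proxy ($3)
echo "estimated monthly total: \$${total}"     # lands inside the ~$36-39 range
```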




&lt;h2&gt;
  
  
  This is what we're building
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcfgf8n0alt7tu6jw0xjo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcfgf8n0alt7tu6jw0xjo.png" alt=" " width="800" height="852"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Three independent flows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You → EC2&lt;/strong&gt; via RDP (encrypted, locked to your IP only)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chrome → Proxy → Web&lt;/strong&gt; (all browser traffic exits from a real Denver residential IP)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EC2 → EBS&lt;/strong&gt; (persistent disk that survives stops and restarts)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key insight the diagram makes obvious: AWS region and Denver geolocation are completely separate concerns. Your instance lives in &lt;code&gt;us-east-1&lt;/code&gt; (Virginia) for billing reasons. Denver happens at the proxy layer, outside AWS entirely. These two decisions don't interfere with each other.&lt;/p&gt;




&lt;h2&gt;
  
  
  Who is this actually for?
&lt;/h2&gt;

&lt;p&gt;Before going further: this setup makes sense if you need any of the following.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;persistent Windows environment&lt;/strong&gt; with a stable identity, especially if you travel frequently and your IP changes constantly. Your cloud machine stays in "Denver" while you're physically in Buenos Aires, Bangkok, or an airport lounge somewhere in between&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser-based workflows that benefit from a consistent IP identity.&lt;/strong&gt; Think platforms that tie account trust to IP history, or any service that behaves differently depending on your apparent location&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;static outbound IP&lt;/strong&gt; so you can whitelist yourself on external services&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;geo-specific IP&lt;/strong&gt; (Denver or otherwise) without paying for a full dedicated server&lt;/li&gt;
&lt;li&gt;A starting point you can scale. This setup is deliberately one node. The same pattern extends to a fleet: one workstation per city, or multiple instances for a distributed remote team. Today it's one VM; later, it's infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's &lt;em&gt;not&lt;/em&gt; the right call if you need GPU compute, if you're running heavy desktop software, or if multiple people need simultaneous access (look at WorkSpaces for that).&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture decisions (and what I ruled out)
&lt;/h2&gt;

&lt;p&gt;This is the section most tutorials skip. They tell you &lt;em&gt;what&lt;/em&gt; to do but not &lt;em&gt;why this and not that&lt;/em&gt;. Here's every real decision I made.&lt;/p&gt;

&lt;h3&gt;
  
  
  Instance type: why &lt;code&gt;t3.large&lt;/code&gt; and not smaller or bigger
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;t3&lt;/code&gt; family is burstable — you get a baseline of CPU performance with the ability to spike when needed. For browser-based work, that's ideal: mostly idle, occasionally intense when loading heavy pages or running multiple tabs.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instance&lt;/th&gt;
&lt;th&gt;vCPU&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;th&gt;Windows On-Demand (us-east-1)&lt;/th&gt;
&lt;th&gt;Verdict&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;t3.medium&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4 GB&lt;/td&gt;
&lt;td&gt;~$55/mo&lt;/td&gt;
&lt;td&gt;Too little RAM for Chrome + RDP overhead&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;t3.large&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8 GB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$110/mo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Sweet spot&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;t3.xlarge&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;~$220/mo&lt;/td&gt;
&lt;td&gt;Overkill — 2x cost, same use case&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Windows Server itself consumes ~2 GB at idle. An RDP session adds ~500 MB. Chrome with a few tabs adds another 1–2 GB. On a &lt;code&gt;t3.medium&lt;/code&gt; you're already at the ceiling before doing any work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Region: &lt;code&gt;us-east-1&lt;/code&gt;, not whatever's closest to Denver
&lt;/h3&gt;

&lt;p&gt;Counter-intuitive but important: &lt;strong&gt;don't choose your AWS region based on geographic proximity to your target city.&lt;/strong&gt; Region determines compute pricing. Denver egress is handled at the proxy layer. These are independent.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;us-east-1&lt;/code&gt; (N. Virginia) is the cheapest AWS region for Windows On-Demand. There's no AWS region in Denver, and even if there were, AWS IP geolocation at the city level is unreliable — you cannot guarantee a city-level geo from a raw AWS IP regardless of region.&lt;/p&gt;

&lt;h3&gt;
  
  
  Denver egress: proxy, not VPN, not a second EC2
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Denver because that's where the client was. The client is gone. The architecture isn't. Swap the city for wherever you need. The setup is identical.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This was the most important decision. Three options I evaluated:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A: VPN client on EC2 (Mullvad, ProtonVPN)&lt;/strong&gt;&lt;br&gt;
Installs a VPN client on the Windows instance, routes all traffic through a Denver server. Works, but: adds latency overhead, costs $5–10/mo, routes &lt;em&gt;all&lt;/em&gt; traffic including RDP (which you don't need geo'd), and VPN IPs are well-known to commercial services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option B: Second EC2 instance in a hypothetical Denver region&lt;/strong&gt;&lt;br&gt;
Doesn't exist. AWS has no Denver region. Even the closest region (us-west-2, Oregon) wouldn't give you a Denver IP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option C: Static residential ISP proxy (chosen)&lt;/strong&gt;&lt;br&gt;
A single HTTP/SOCKS5 endpoint from a provider like IPRoyal or Webshare. Configured once in Chrome. Costs $2–5/mo for a single static Denver IP with unlimited bandwidth. The exit IP is a real residential ISP address — not a datacenter IP — which matters for services like LinkedIn that flag datacenter ranges.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why residential matters:&lt;/strong&gt; Many platforms cross-reference your IP against known datacenter CIDR blocks. A residential proxy from an actual Denver ISP looks like a person in Denver, not a server in Denver.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  Storage: 50 GB gp3, not the default
&lt;/h3&gt;

&lt;p&gt;AWS will suggest &lt;code&gt;gp2&lt;/code&gt; by default. Use &lt;code&gt;gp3&lt;/code&gt; — same performance baseline, 20% cheaper, and you can provision throughput independently if needed later. 50 GB gives Windows room to breathe (baseline install + updates + Chrome profile + downloads).&lt;/p&gt;
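If you already launched on the default gp2, you don't need to rebuild: the conversion is a single in-place API call with no downtime or detach. A sketch, assuming the AWS CLI is configured (the volume ID is a placeholder):

```shell
# Volume ID is a placeholder -- find yours with: aws ec2 describe-volumes
VOLUME_ID="vol-xxxxxxxxx"

# Convert gp2 -> gp3 in place. gp3's baseline (3000 IOPS, 125 MB/s)
# already exceeds what a 50 GB gp2 volume gets, so nothing extra to provision.
aws ec2 modify-volume \
  --volume-id "$VOLUME_ID" \
  --volume-type gp3
```

The volume stays attached and online while the modification runs in the background.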
&lt;h3&gt;
  
  
  Elastic IP: attach it, don't skip it
&lt;/h3&gt;

&lt;p&gt;Without an Elastic IP, your instance gets a new public IP every time it starts. That means updating your RDP bookmark, your Security Group rules, and any external whitelists every single time. One EIP solves all of that permanently. It's free while attached to a running instance.&lt;/p&gt;


&lt;h2&gt;
  
  
  Execution: step by step, both CLI and console
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt; AWS account, AWS CLI configured (&lt;code&gt;aws configure&lt;/code&gt;), an RDP client (built into Windows; &lt;a href="https://apps.apple.com/app/microsoft-remote-desktop/id1295203466" rel="noopener noreferrer"&gt;Microsoft Remote Desktop&lt;/a&gt; on Mac).&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h3&gt;
  
  
  Step 1: Create a Security Group
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Console:&lt;/strong&gt; EC2 → Security Groups → Create Security Group&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CLI:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create the security group&lt;/span&gt;
aws ec2 create-security-group &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--group-name&lt;/span&gt; windows-workstation-sg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--description&lt;/span&gt; &lt;span class="s2"&gt;"Windows cloud workstation"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vpc-id&lt;/span&gt; vpc-xxxxxxxxx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add RDP inbound rule — YOUR IP ONLY&lt;/span&gt;
&lt;span class="nv"&gt;MY_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://checkip.amazonaws.com&lt;span class="si"&gt;)&lt;/span&gt;
aws ec2 authorize-security-group-ingress &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--group-name&lt;/span&gt; windows-workstation-sg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--protocol&lt;/span&gt; tcp &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 3389 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cidr&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;MY_IP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/32"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Critical:&lt;/strong&gt; Never open port 3389 to &lt;code&gt;0.0.0.0/0&lt;/code&gt;. Bots will attempt brute-force login within minutes. Your IP only, always.&lt;/p&gt;
&lt;/blockquote&gt;
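That rule is worth auditing periodically, not just at creation time. A hedged sketch that lists any security group in the account exposing 3389 to the world — the output should be empty:

```shell
PORT=3389

# List security groups with an inbound RDP rule open to 0.0.0.0/0.
# No output means you're clean.
aws ec2 describe-security-groups \
  --filters "Name=ip-permission.from-port,Values=${PORT}" \
            "Name=ip-permission.cidr,Values=0.0.0.0/0" \
  --query "SecurityGroups[].[GroupId,GroupName]" \
  --output text
```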




&lt;h3&gt;
  
  
  Step 2: Launch the EC2 instance
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Console:&lt;/strong&gt; EC2 → Launch Instance → Windows Server 2022 Base → t3.large → 50 GB gp3 → select your security group → launch&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CLI:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Get the latest Windows Server 2022 AMI ID for us-east-1&lt;/span&gt;
aws ec2 describe-images &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--owners&lt;/span&gt; amazon &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--filters&lt;/span&gt; &lt;span class="s2"&gt;"Name=name,Values=Windows_Server-2022-English-Full-Base-*"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"sort_by(Images, &amp;amp;CreationDate)[-1].ImageId"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Launch instance (replace ami-xxxxxxxxx with the ID from above)&lt;/span&gt;
aws ec2 run-instances &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image-id&lt;/span&gt; ami-xxxxxxxxx &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--instance-type&lt;/span&gt; t3.large &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--key-name&lt;/span&gt; your-key-pair-name &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--security-group-ids&lt;/span&gt; sg-xxxxxxxxx &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--block-device-mappings&lt;/span&gt; &lt;span class="s1"&gt;'[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":50,"VolumeType":"gp3","DeleteOnTermination":true}}]'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--no-associate-public-ip-address&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tag-specifications&lt;/span&gt; &lt;span class="s1"&gt;'ResourceType=instance,Tags=[{Key=Name,Value=windows-workstation}]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
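`run-instances` returns immediately, but the instance takes several minutes to pass its status checks. Rather than refreshing the console, you can block on the CLI's built-in waiter (the instance ID is a placeholder — take it from the `run-instances` output):

```shell
INSTANCE_ID="i-xxxxxxxxx"  # placeholder -- from the run-instances output

# Blocks until both system and instance status checks pass
aws ec2 wait instance-status-ok --instance-ids "$INSTANCE_ID"
echo "Instance $INSTANCE_ID passed status checks"
```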






&lt;h3&gt;
  
  
  Step 3: Allocate and attach an Elastic IP
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Console:&lt;/strong&gt; EC2 → Elastic IPs → Allocate → Associate → select your instance&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CLI:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Allocate a new Elastic IP&lt;/span&gt;
&lt;span class="nv"&gt;ALLOC_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ec2 allocate-address &lt;span class="nt"&gt;--domain&lt;/span&gt; vpc &lt;span class="nt"&gt;--query&lt;/span&gt; AllocationId &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Allocation ID: &lt;/span&gt;&lt;span class="nv"&gt;$ALLOC_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Get your instance ID&lt;/span&gt;
&lt;span class="nv"&gt;INSTANCE_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ec2 describe-instances &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--filters&lt;/span&gt; &lt;span class="s2"&gt;"Name=tag:Name,Values=windows-workstation"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"Reservations[0].Instances[0].InstanceId"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Associate the EIP&lt;/span&gt;
aws ec2 associate-address &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--instance-id&lt;/span&gt; &lt;span class="nv"&gt;$INSTANCE_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--allocation-id&lt;/span&gt; &lt;span class="nv"&gt;$ALLOC_ID&lt;/span&gt;

&lt;span class="c"&gt;# Get your permanent IP&lt;/span&gt;
aws ec2 describe-addresses &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--allocation-ids&lt;/span&gt; &lt;span class="nv"&gt;$ALLOC_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"Addresses[0].PublicIp"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save that IP. It's yours permanently until you explicitly release it.&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 4: Get the Windows password
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Console:&lt;/strong&gt; EC2 → Instances → select instance → Actions → Security → Get Windows password → upload .pem → decrypt&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CLI:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ec2 get-password-data &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--instance-id&lt;/span&gt; &lt;span class="nv"&gt;$INSTANCE_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--priv-launch-key&lt;/span&gt; /path/to/your-key.pem &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; PasswordData &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Wait 4–5 minutes after launch before this works — the instance needs to finish initializing and encrypt the password.&lt;/p&gt;
&lt;/blockquote&gt;
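You don't have to guess the timing, either — the CLI ships a waiter for exactly this, polling until the encrypted password data exists. A sketch (the instance ID is a placeholder):

```shell
INSTANCE_ID="i-xxxxxxxxx"  # placeholder -- your instance ID from Step 3

# Polls until the encrypted Windows password is available, then returns
aws ec2 wait password-data-available --instance-id "$INSTANCE_ID"
```

Chain it directly in front of the `get-password-data` call above and the whole step becomes hands-off.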




&lt;h3&gt;
  
  
  Step 5: Connect via RDP and baseline setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Windows&lt;/span&gt;
mstsc /v:&amp;lt;your-elastic-ip&amp;gt;

&lt;span class="c"&gt;# Mac — open Microsoft Remote Desktop, add PC with your Elastic IP&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once connected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Open PowerShell as Administrator, then:&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# 1. Disable IE Enhanced Security (lets you download Chrome)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$AdminKey&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HKLM:\SOFTWARE\Microsoft\Active Setup\Installed Components\{A509B1A7-37EF-4b3f-8CFC-4F3A74704073}"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Set-ItemProperty&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$AdminKey&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"IsInstalled"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Value&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Stop-Process&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Explorer&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# 2. Install Chrome silently&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$installer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="nn"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;TEMP&lt;/span&gt;&lt;span class="s2"&gt;\ChromeSetup.exe"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Invoke-WebRequest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://dl.google.com/chrome/install/ChromeStandaloneSetup64.exe"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-OutFile&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$installer&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Start-Process&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$installer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ArgumentList&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/silent /install"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Wait&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# 3. Disable unnecessary services&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Stop-Service&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Spooler"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Force&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Set-Service&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Spooler"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-StartupType&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Disabled&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Stop-Service&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Fax"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Force&lt;/span&gt;&lt;span class="w"&gt;  
&lt;/span&gt;&lt;span class="n"&gt;Set-Service&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Fax"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-StartupType&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Disabled&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# 4. Set account lockout policy (5 attempts, 30 min lockout)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;accounts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/lockoutthreshold:5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/lockoutduration:30&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;/lockoutwindow:30&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Step 6: Configure the Denver proxy
&lt;/h3&gt;

&lt;p&gt;Purchase a static residential proxy with Denver, CO targeting from IPRoyal (~$2/mo) or Webshare (~$3/mo). You'll get: a host, a port, a username, and a password.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A: Chrome only (recommended, most surgical)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create a desktop shortcut pointing to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"C:\Program Files\Google\Chrome\Application\chrome.exe" --proxy-server="socks5://USERNAME:PASSWORD@HOST:PORT"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use this shortcut for all geo-sensitive work. Regular Chrome remains proxy-free for anything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option B: System-wide proxy (all traffic)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set system proxy via PowerShell&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$proxyAddress&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HOST:PORT"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Set-ItemProperty&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HKCU:\Software\Microsoft\Windows\CurrentVersion\Internet Settings"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ProxyServer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Value&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$proxyAddress&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Set-ItemProperty&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HKCU:\Software\Microsoft\Windows\CurrentVersion\Internet Settings"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ProxyEnable&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Value&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Verify Denver egress:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Open Chrome (via proxy shortcut) → navigate to https://ipinfo.io
Expected output: city: Denver, region: Colorado
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
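The same check can be scripted, which is handy as a quick pre-flight before starting work. A sketch using ipinfo.io's plain-text field endpoints; the credentials and endpoint are placeholders from your proxy provider:

```shell
# Placeholders -- substitute the credentials your proxy provider issued
PROXY="socks5://USERNAME:PASSWORD@HOST:PORT"

# ipinfo.io serves individual fields as plain text
curl -s --proxy "$PROXY" https://ipinfo.io/city     # expect: Denver
curl -s --proxy "$PROXY" https://ipinfo.io/region   # expect: Colorado
```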






&lt;h3&gt;
  
  
  Step 7: OS hardening
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Disable Remote Assistance (separate from RDP — you don't need it)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Set-ItemProperty&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HKLM:\SYSTEM\CurrentControlSet\Control\Remote Assistance"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fAllowToGetHelp"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Value&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;0&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Enable Windows Defender real-time protection&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Set-MpPreference&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-DisableRealtimeMonitoring&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="bp"&gt;$false&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Disable Xbox services (not needed on a server)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$xboxServices&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;@(&lt;/span&gt;&lt;span class="s2"&gt;"XblAuthManager"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"XblGameSave"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"XboxGipSvc"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"XboxNetApiSvc"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="kr"&gt;foreach&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$svc&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kr"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$xboxServices&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;Stop-Service&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$svc&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Force&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ErrorAction&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;SilentlyContinue&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;Set-Service&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$svc&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-StartupType&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Disabled&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ErrorAction&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;SilentlyContinue&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Rename Administrator account (minor but effective hardening)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Rename-LocalUser&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Administrator"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-NewName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cloudadmin"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Monthly cost scenarios
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;EC2 Compute&lt;/th&gt;
&lt;th&gt;EBS&lt;/th&gt;
&lt;th&gt;Elastic IP&lt;/th&gt;
&lt;th&gt;Proxy&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Always on (730 hrs)&lt;/td&gt;
&lt;td&gt;$109.79&lt;/td&gt;
&lt;td&gt;$4.00&lt;/td&gt;
&lt;td&gt;$0.00&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$117/mo&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Business hours (175 hrs)&lt;/td&gt;
&lt;td&gt;$26.32&lt;/td&gt;
&lt;td&gt;$4.00&lt;/td&gt;
&lt;td&gt;$3.60*&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$37/mo&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Minimal use (80 hrs)&lt;/td&gt;
&lt;td&gt;$12.03&lt;/td&gt;
&lt;td&gt;$4.00&lt;/td&gt;
&lt;td&gt;$3.60*&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$23/mo&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Instance stopped entirely&lt;/td&gt;
&lt;td&gt;$0.00&lt;/td&gt;
&lt;td&gt;$4.00&lt;/td&gt;
&lt;td&gt;$3.60&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$11/mo&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;*The EIP bills $0.005/hr while the instance is stopped, which works out to at most ~$3.60/mo.&lt;/em&gt;&lt;/p&gt;
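A quick sanity check on the table's arithmetic: the compute column is just the implied hourly rate times hours used, derived here from the always-on row:

```shell
# Derive the implied Windows On-Demand rate from the always-on row,
# then re-check the partial-use rows at the same rate
awk 'BEGIN {
  rate = 109.79 / 730                            # ~0.1504 $/hr
  printf "business hours: $%.2f\n", 175 * rate   # $26.32
  printf "minimal use:    $%.2f\n",  80 * rate   # $12.03
}'
```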

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The single most effective cost optimization:&lt;/strong&gt; stop the instance when you're done working. Not terminate — &lt;em&gt;stop&lt;/em&gt;. The EBS volume persists. Your Chrome profile, your files, your proxy config — all intact. You pick up exactly where you left off.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Want to automate stop/start?&lt;/strong&gt; A Lambda function + EventBridge rule can auto-stop at 8 PM and auto-start at 8 AM on weekdays. Adds maybe 30 minutes of setup. Saves ~$80/month if you'd otherwise leave it running.&lt;/p&gt;
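If you'd rather skip the Lambda entirely, EventBridge Scheduler can call the EC2 stop API directly as a "universal target". A hedged sketch — the role ARN (which needs `ec2:StopInstances` permission and a trust policy for `scheduler.amazonaws.com`) and the instance ID are placeholders:

```shell
SCHEDULE_NAME="stop-workstation-weeknights"

# Stop the workstation at 8 PM local time on weekdays -- no Lambda required
aws scheduler create-schedule \
  --name "$SCHEDULE_NAME" \
  --schedule-expression "cron(0 20 ? * MON-FRI *)" \
  --schedule-expression-timezone "America/Denver" \
  --flexible-time-window Mode=OFF \
  --target '{
    "Arn": "arn:aws:scheduler:::aws-sdk:ec2:stopInstances",
    "RoleArn": "arn:aws:iam::123456789012:role/scheduler-ec2-stop",
    "Input": "{\"InstanceIds\": [\"i-xxxxxxxxx\"]}"
  }'
```

A matching 8 AM schedule pointed at `ec2:startInstances` completes the pair.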




&lt;h2&gt;
  
  
  Troubleshooting and maintenance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  "I can't RDP in"
&lt;/h3&gt;

&lt;p&gt;Most common cause: your local IP changed (happens with residential ISPs).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Get your current IP&lt;/span&gt;
curl https://checkip.amazonaws.com

&lt;span class="c"&gt;# Update the Security Group rule&lt;/span&gt;
aws ec2 revoke-security-group-ingress &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--group-name&lt;/span&gt; windows-workstation-sg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--protocol&lt;/span&gt; tcp &lt;span class="nt"&gt;--port&lt;/span&gt; 3389 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cidr&lt;/span&gt; OLD_IP/32

aws ec2 authorize-security-group-ingress &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--group-name&lt;/span&gt; windows-workstation-sg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--protocol&lt;/span&gt; tcp &lt;span class="nt"&gt;--port&lt;/span&gt; 3389 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cidr&lt;/span&gt; NEW_IP/32
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  "The proxy isn't working / ipinfo.io shows wrong city"
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Confirm the Chrome shortcut includes the full &lt;code&gt;--proxy-server&lt;/code&gt; flag&lt;/li&gt;
&lt;li&gt;Check proxy credentials haven't expired (some providers rotate them)&lt;/li&gt;
&lt;li&gt;Try &lt;code&gt;curl --proxy socks5://USER:PASS@HOST:PORT https://ipinfo.io&lt;/code&gt; from PowerShell to isolate whether it's a Chrome config issue or a proxy issue&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  "Instance is slow / Chrome is laggy"
&lt;/h3&gt;

&lt;p&gt;Check CPU credit balance — t3 instances use CPU credits and can throttle if you've sustained high load:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws cloudwatch get-metric-statistics &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; AWS/EC2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--metric-name&lt;/span&gt; CPUCreditBalance &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--dimensions&lt;/span&gt; &lt;span class="nv"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;InstanceId,Value&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$INSTANCE_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--start-time&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'1 hour ago'&lt;/span&gt; +%Y-%m-%dT%H:%M:%SZ&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--end-time&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; +%Y-%m-%dT%H:%M:%SZ&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--period&lt;/span&gt; 300 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--statistics&lt;/span&gt; Average
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If credits are near zero, either wait for them to replenish (they refill at 36 credits/hr on a &lt;code&gt;t3.large&lt;/code&gt;) or switch the instance to unlimited credit mode, which lets it burst past its balance for a small per-vCPU-hour surcharge.&lt;/p&gt;
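Unlimited mode can be flipped on a running instance with one call, no stop/start required. A sketch (the instance ID is a placeholder; remember surplus credits are billed, so watch for sustained load):

```shell
INSTANCE_ID="i-xxxxxxxxx"  # placeholder

# Switch from 'standard' to 'unlimited' credit mode on the fly
aws ec2 modify-instance-credit-specification \
  --instance-credit-specifications "InstanceId=${INSTANCE_ID},CpuCredits=unlimited"

# Confirm the change took effect
aws ec2 describe-instance-credit-specifications \
  --instance-ids "$INSTANCE_ID"
```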

&lt;h3&gt;
  
  
  Routine maintenance (monthly, 10 minutes)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Windows Update — run in PowerShell&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Install-Module&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;PSWindowsUpdate&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Force&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Get-WUInstall&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-AcceptAll&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-AutoReboot&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Check disk space&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Get-PSDrive&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;C&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Select-Object&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Used&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;Free&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Rotate your Administrator password if shared&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cloudadmin&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;NewStrongPassword123&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  "I accidentally clicked Terminate instead of Stop"
&lt;/h3&gt;

&lt;p&gt;I'm sorry. The instance is gone. The EBS volume may still exist if you unchecked "Delete on termination" during setup — check EC2 → Volumes for an available volume and attach it to a new instance. This is why snapshots exist:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Take a snapshot before anything risky&lt;/span&gt;
aws ec2 create-snapshot &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--volume-id&lt;/span&gt; vol-xxxxxxxxx &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--description&lt;/span&gt; &lt;span class="s2"&gt;"workstation-backup-&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y%m%d&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run this once a week. It costs ~$0.05/GB/month and has saved me more than once.&lt;/p&gt;
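&lt;p&gt;To make the weekly habit stick, the same snapshot command can be wrapped in a tiny script and scheduled from cron. A sketch, reusing the placeholder volume ID from above and guarded so it no-ops on machines without the AWS CLI:&lt;/p&gt;

```shell
# Weekly EBS snapshot sketch; wire it up with a cron entry such as:
#   0 3 * * 1 /usr/local/bin/snapshot-workstation.sh
# VOLUME_ID is a placeholder, exactly as in the command above.
VOLUME_ID="vol-xxxxxxxxx"
DESC="workstation-backup-$(date +%Y%m%d)"

# Guard: only call AWS when the CLI is actually present.
if command -v aws >/dev/null 2>/dev/null; then
  aws ec2 create-snapshot \
    --volume-id "$VOLUME_ID" \
    --description "$DESC" || echo "snapshot request failed"
fi

echo "$DESC"
```

&lt;p&gt;The date-stamped description makes it easy to find and prune old snapshots later.&lt;/p&gt;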




&lt;p&gt;The project that started this is gone. The setup isn't. If it saves you three hours of clicking around the AWS console and a surprise bill, it did its job.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All pricing based on AWS us-east-1 On-Demand rates as of early 2026. Proxy pricing based on IPRoyal single static residential IP. Your numbers may vary slightly — always verify at &lt;a href="https://aws.amazon.com/ec2/pricing" rel="noopener noreferrer"&gt;aws.amazon.com/ec2/pricing&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Infrastructure Archaeology: Diagnosing Multi-Layer CI/CD Failures</title>
      <dc:creator>lssh</dc:creator>
      <pubDate>Thu, 12 Feb 2026 00:50:50 +0000</pubDate>
      <link>https://dev.to/lbcristaldo/infrastructure-archaeology-diagnosing-multi-layer-cicd-failures-3ahi</link>
      <guid>https://dev.to/lbcristaldo/infrastructure-archaeology-diagnosing-multi-layer-cicd-failures-3ahi</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;u&gt;The Pattern&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Modern cloud infrastructure often evolves through incremental additions. &lt;br&gt;
A team starts with basic CI/CD, adds Terraform for IaC, integrates &lt;br&gt;
security scanning, sets up monitoring—each piece works in isolation, &lt;br&gt;
but the system as a whole becomes fragile.&lt;/p&gt;

&lt;p&gt;Here's a failure pattern I've observed across multiple production &lt;br&gt;
GCP environments: what appears to be "a few broken configs" is actually &lt;br&gt;
a multi-layer architectural problem spanning Docker, Terraform, GitHub &lt;br&gt;
Actions, and cloud-native security tooling.&lt;/p&gt;

&lt;p&gt;Let's dissect it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DISCLAIMER:&lt;/strong&gt; All code examples, project names, domains, and configurations in this article are sanitized examples for educational purposes. No real client data or proprietary information is exposed. This analysis is based on publicly available documentation and common infrastructure patterns.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;&lt;u&gt;The Symptom List&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this pattern, teams typically surface a cluster of related failures:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build &amp;amp; Container Issues:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Docker multi-stage build misconfigurations&lt;/strong&gt; — CI/CD pipelines reference non-existent stage names in Dockerfiles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Duplicate or conflicting CMD instructions&lt;/strong&gt; — containers exhibit unpredictable startup behavior&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image scanning pipeline breaks&lt;/strong&gt; — security tools block pushes but jobs still succeed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure-as-Code Failures:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Terraform module reference errors&lt;/strong&gt; — output files reference modules that don't exist in the configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Variable interface mismatches&lt;/strong&gt; — calling code passes variables that modules don't accept&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wrong execution context&lt;/strong&gt; — CI runs IaC commands in incorrect directories&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider version drift&lt;/strong&gt; — different environments use incompatible provider versions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;CI/CD Architecture Gaps:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Missing deployment automation&lt;/strong&gt; — builds succeed but nothing triggers actual deployments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No quality gates&lt;/strong&gt; — tests and builds run in parallel; failures don't block progression&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardcoded deployment paths&lt;/strong&gt; — only specific branches trigger deploys; others require manual intervention&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuration drift&lt;/strong&gt; — production URLs and domains missing from automation config&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Security Tooling Integration Conflicts:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Overlapping vulnerability detection&lt;/strong&gt; — Trivy, GCP Container Analysis, and Security Command Center all scan the same images&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime security false positives&lt;/strong&gt; — Falco rules trigger on legitimate Cloud Run startup syscalls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fragmented security reporting&lt;/strong&gt; — findings appear in multiple systems with no single source of truth&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy enforcement gaps&lt;/strong&gt; — security scans run but don't actually block deployments&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Tech stack representative of this pattern:&lt;/strong&gt; GitHub Actions, GCP Cloud Run, Artifact Registry, Terraform, Firebase Hosting, containerized microservices with pnpm/npm monorepo structure.&lt;/p&gt;

&lt;p&gt;Seems like a lot of small fixes, right? The reality is more complex.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;&lt;u&gt;What I Actually Found: The 3-Layer Problem&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These aren't isolated bugs. They're symptoms of failures at three distinct levels.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: The Obvious (Syntax &amp;amp; Configuration Errors)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These are the errors you see immediately when you run the tools:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker Target Mismatch:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Dockerfile declares:&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:20-alpine&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;runner&lt;/span&gt;

&lt;span class="c"&gt;# GitHub Action requests:&lt;/span&gt;
with:
  target: app &lt;span class="c"&gt;# ❌ Stage "app" doesn't exist&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Terraform Module Reference:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# outputs.tf tries to reference:&lt;/span&gt;
&lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="s2"&gt;"api_url"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloud_run_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;service_url&lt;/span&gt; &lt;span class="c1"&gt;# ❌ Module doesn't exist&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# main.tf actually has:&lt;/span&gt;
&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"api_service"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c1"&gt;# Different name!&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"../../modules/cloud-run"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Variable Name Mismatch:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# envs/prod/main.tf sends:&lt;/span&gt;
&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"api"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;service_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"api-prod"&lt;/span&gt; &lt;span class="c1"&gt;# ❌ Module doesn't accept this&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# modules/cloud-run/variables.tf expects:&lt;/span&gt;
&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"name"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c1"&gt;# Different variable!&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are &lt;strong&gt;language and consistency errors&lt;/strong&gt;. Terraform requires that any resource or module referenced in output files be explicitly declared in the active configuration. When you refactor and change module names in &lt;code&gt;main.tf&lt;/code&gt; but forget to update &lt;code&gt;outputs.tf&lt;/code&gt;, you get this.&lt;/p&gt;

&lt;p&gt;The fix? Run &lt;code&gt;terraform validate&lt;/code&gt; — it catches these immediately without even connecting to the cloud.&lt;/p&gt;
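&lt;p&gt;The same mismatch can be caught even earlier with plain text tools. A sketch, assuming the conventional &lt;code&gt;main.tf&lt;/code&gt;/&lt;code&gt;outputs.tf&lt;/code&gt; split shown above:&lt;/p&gt;

```shell
# Sketch: list module references in outputs.tf that main.tf never
# declares. Pure grep/sed/awk, no cloud access or terraform binary.
check_module_refs() {
  dir="$1"
  # Declared module names: lines like   module "api_service" {
  decls=$(grep -oE '^module "[a-zA-Z0-9_]+"' "$dir/main.tf" | tr -d '"' | awk '{print $2}')
  # Referenced module names: expressions like module.cloud_run_api.service_url
  grep -oE 'module\.[a-zA-Z0-9_]+' "$dir/outputs.tf" | sed 's/^module\.//' | sort -u |
  while read -r ref; do
    echo "$decls" | grep -qx "$ref" || echo "undeclared: $ref"
  done
}
```

&lt;p&gt;Run against the example above, it prints &lt;code&gt;undeclared: cloud_run_api&lt;/code&gt;, pointing straight at the stale output.&lt;/p&gt;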

&lt;p&gt;&lt;strong&gt;Layer 2: Platform Changes (Hidden Causes)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is where it gets interesting. Some failures aren't in the code — they're in how GCP's platform has evolved.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GCP Service Account Permission Changes:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GCP recently changed how Cloud Build uses service accounts. What used to work automatically now fails because the build service account no longer has default permissions to write logs or read from Artifact Registry.&lt;/p&gt;

&lt;p&gt;The missing piece: the &lt;code&gt;iam.serviceAccounts.actAs&lt;/code&gt; permission, required before one identity can deploy as (act as) a runtime service account.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Organization Policy Restrictions:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That "Firebase region conflict" isn't a typo in your Terraform. It's a collision with &lt;code&gt;constraints/gcp.resourceLocations&lt;/code&gt; — an organization policy that blocks deployments to certain regions, even if your Terraform syntax is perfect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VPC Service Controls:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the project sits inside a VPC Service Controls perimeter, Cloud Run deployments can fail silently with confusing 403/404 errors. The perimeter blocks communication between Google services — like the Cloud Run agent trying to read images from Artifact Registry.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security Tooling Conflicts:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When security tools are added incrementally — each solving a specific &lt;br&gt;
problem in isolation — they create overlapping responsibilities and &lt;br&gt;
contradictory enforcement policies.&lt;/p&gt;

&lt;p&gt;A typical pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trivy&lt;/strong&gt; is added to CI to scan container images before push&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Falco&lt;/strong&gt; is added to monitor runtime behavior in Cloud Run&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GCP Container Analysis API&lt;/strong&gt; scans images automatically on push 
to Artifact Registry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Command Center&lt;/strong&gt; aggregates findings across the project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each tool works. The integration doesn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The failure cascade:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Trivy finds a CVE and is configured to block the push&lt;/li&gt;
&lt;li&gt;The GitHub Action reports success anyway (exit code not wired correctly)&lt;/li&gt;
&lt;li&gt;Image gets pushed to Artifact Registry&lt;/li&gt;
&lt;li&gt;Container Analysis API scans the same image 10 minutes later&lt;/li&gt;
&lt;li&gt;Falco triggers alerts on normal Cloud Run startup syscalls 
(false positive)&lt;/li&gt;
&lt;li&gt;Security Command Center reports the same CVE 3 hours later&lt;/li&gt;
&lt;li&gt;Three different alerting systems fire&lt;/li&gt;
&lt;li&gt;No one knows which finding to trust or act on first&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; No centralized security policy. Each tool was added &lt;br&gt;
without defining ownership, enforcement boundaries, or a single &lt;br&gt;
source of truth for findings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The hidden cost:&lt;/strong&gt; Security tools that don't actually gate deployments give a false sense of protection. The pipeline feels secure. It isn't.&lt;/p&gt;
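&lt;p&gt;The minimal repair for that false sense of security is making the scanner's exit code the step's exit code. A sketch (the registry path is a placeholder; &lt;code&gt;--exit-code&lt;/code&gt; and &lt;code&gt;--severity&lt;/code&gt; are standard Trivy flags):&lt;/p&gt;

```shell
# Sketch: a scan gate whose verdict actually fails the CI step.
scan_gate() {
  image="$1"
  if command -v trivy >/dev/null 2>/dev/null; then
    # --exit-code 1 makes trivy return nonzero on HIGH/CRITICAL
    # findings, so the function's (and the CI step's) status reflects it.
    trivy image --exit-code 1 --severity HIGH,CRITICAL "$image"
  else
    echo "trivy not installed; gate skipped"
  fi
}
```

&lt;p&gt;In a GitHub Actions step, calling &lt;code&gt;scan_gate "$IMAGE"&lt;/code&gt; as the last command is enough: a nonzero return fails the job instead of letting the push proceed.&lt;/p&gt;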

&lt;p&gt;&lt;strong&gt;GCP Resource Name Limits:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GCP enforces a 63-character limit on most resource names. If your Terraform generates longer names (for example, by stacking environment and team prefixes into a &lt;code&gt;baseInstanceName&lt;/code&gt;), the API truncates or rejects them, which surfaces as duplicate name conflicts and failed deployments.&lt;/p&gt;
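&lt;p&gt;A cheap guard is to length-check generated names before they ever reach a plan. A sketch with illustrative names:&lt;/p&gt;

```shell
# Sketch: flag generated resource names over GCP's 63-character limit.
check_name_length() {
  name="$1"
  if [ "${#name}" -gt 63 ]; then
    echo "too long (${#name} chars): $name"
    return 1
  fi
  echo "ok (${#name} chars): $name"
}
```

&lt;p&gt;Dropping this into the CI job before &lt;code&gt;terraform plan&lt;/code&gt; turns a confusing truncation conflict into an explicit, early failure.&lt;/p&gt;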

&lt;p&gt;These aren't bugs in your code. They're &lt;strong&gt;platform governance and technical constraints&lt;/strong&gt; that interact badly with naive configurations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3: Architectural Debt (The Root Problem)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The deepest layer isn't about syntax or permissions — it's about &lt;strong&gt;missing architecture&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No CI/CD Gates:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The build and CI workflows are decoupled. Tests can fail, but images still get built and pushed. There's no &lt;code&gt;needs:&lt;/code&gt; dependency chain enforcing that tests pass before builds run.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# What's happening:&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt; &lt;span class="c1"&gt;# ❌ Runs in parallel, doesn't wait for tests&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Wrong Directory Context:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GitHub Actions runs &lt;code&gt;terraform plan&lt;/code&gt; in the repository root instead of &lt;code&gt;envs/staging/&lt;/code&gt;. Terraform is directory-dependent — without the right context, it validates an empty or incomplete configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardcoded Feature Branch:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Only one deployment path works: a specific feature branch → staging. There's no &lt;code&gt;development&lt;/code&gt; → staging automation, no &lt;code&gt;main&lt;/code&gt; → production workflow. Everything else is manual.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Missing Environment Variables:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Production URLs and domains aren't defined anywhere in the automation. Cloud Run services deploy without knowing their actual domain mappings, leaving SSL certificates stuck in provisioning or external access failing with 404/502.&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;lifecycle orchestration failure&lt;/strong&gt;. Someone built pieces that "worked" in isolation but never architected how they fit together.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8rriy1fijjldwxyc0ffx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8rriy1fijjldwxyc0ffx.png" alt=" " width="800" height="672"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Why Fixing Order Matters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can't just "fix what's broken." Here's why sequence matters:&lt;/p&gt;

&lt;p&gt;❌ &lt;strong&gt;Fix production Terraform first&lt;/strong&gt; → Staging still broken, can't test changes&lt;br&gt;&lt;br&gt;
❌ &lt;strong&gt;Wire up CI gates first&lt;/strong&gt; → Builds still fail, nothing to gate&lt;br&gt;&lt;br&gt;
❌ &lt;strong&gt;Add domain configs first&lt;/strong&gt; → Deployments fail before they even reach the domain mapping step&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Fix build errors&lt;/strong&gt; → then CI validation → then deployment automation → then configuration gaps&lt;/p&gt;

&lt;p&gt;Think of it like renovating a house: you can't install the roof if the foundation is cracked. You can't paint the walls if the plumbing leaks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Remediation Strategy:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day 1-2:&lt;/strong&gt; Fix blocking issues (foundation)&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Day 3-4:&lt;/strong&gt; Wire up automation (plumbing)&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Day 5:&lt;/strong&gt; Clean up medium issues (finishing touches)&lt;/p&gt;

&lt;p&gt;This bottom-up approach ensures each layer is stable before building on top of it.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;How to Actually Fix This&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Issue #1: Docker Target Mismatch&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick diagnosis:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"AS "&lt;/span&gt; apps/api/Dockerfile &lt;span class="c"&gt;# See what stage names actually exist&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"target:"&lt;/span&gt; .github/workflows/&lt;span class="k"&gt;*&lt;/span&gt;.yml &lt;span class="c"&gt;# See what CI requests&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Option A: Fix the composite action (recommended)&lt;/span&gt;
&lt;span class="c1"&gt;# .github/actions/build-push/action.yml&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build and push&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/build-push-action@v5&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;runner&lt;/span&gt; &lt;span class="c1"&gt;# ✅ Match Dockerfile stage name&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Option B: Fix the Dockerfile&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:20-alpine&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;app # ✅ Match action target&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Docker multi-stage builds use &lt;code&gt;FROM ... AS &amp;lt;name&amp;gt;&lt;/code&gt; to label stages. The &lt;code&gt;--target&lt;/code&gt; flag tells Docker which stage to stop at. Mismatched names = build failure.&lt;/p&gt;
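&lt;p&gt;That diagnosis step can be scripted: extract every declared stage name and compare against what CI requests. A sketch (the Dockerfile path is illustrative; matching is case-insensitive because lowercase &lt;code&gt;as&lt;/code&gt; is also valid):&lt;/p&gt;

```shell
# Sketch: list the stage names a Dockerfile actually declares.
list_stages() {
  grep -iE '^FROM .+ AS ' "$1" | awk '{print $NF}'
}
```

&lt;p&gt;If the workflow's &lt;code&gt;target:&lt;/code&gt; value isn't in this list, the build will fail before it starts.&lt;/p&gt;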




&lt;p&gt;&lt;strong&gt;Issue #2: Staging Terraform Undefined Module&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick diagnosis:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;envs/staging
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"module&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; outputs.tf &lt;span class="c"&gt;# Find all module references&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s1"&gt;'module "'&lt;/span&gt; main.tf &lt;span class="c"&gt;# Find all module declarations&lt;/span&gt;
&lt;span class="c"&gt;# Names must match exactly&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# outputs.tf (BEFORE)&lt;/span&gt;
&lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="s2"&gt;"api_url"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloud_run_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;service_url&lt;/span&gt; &lt;span class="c1"&gt;# ❌&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# outputs.tf (AFTER)&lt;/span&gt;
&lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="s2"&gt;"api_url"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;api_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;service_url&lt;/span&gt; &lt;span class="c1"&gt;# ✅ Match actual module name&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Validation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform init
terraform validate &lt;span class="c"&gt;# Must pass&lt;/span&gt;
terraform plan &lt;span class="c"&gt;# Should show changes, not errors&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Terraform's output system requires module references to exist in the configuration. This is caught during the validation phase, which checks internal consistency without cloud access.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Issue #3: Production Variable Mismatch&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick diagnosis:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check what the module expects&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;modules/cloud-run/variables.tf

&lt;span class="c"&gt;# Check what production sends&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-A&lt;/span&gt; 10 &lt;span class="s1"&gt;'module "api"'&lt;/span&gt; envs/prod/main.tf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# envs/prod/main.tf (BEFORE)&lt;/span&gt;
&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"api"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"../../modules/cloud-run"&lt;/span&gt;
  &lt;span class="nx"&gt;service_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"api-prod"&lt;/span&gt; &lt;span class="c1"&gt;# ❌ Module doesn't have this variable&lt;/span&gt;
  &lt;span class="nx"&gt;container_port&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt; &lt;span class="c1"&gt;# ❌&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# envs/prod/main.tf (AFTER)&lt;/span&gt;
&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"api"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"../../modules/cloud-run"&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"api-prod"&lt;/span&gt; &lt;span class="c1"&gt;# ✅ Match module's variable.tf&lt;/span&gt;
  &lt;span class="nx"&gt;port&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt; &lt;span class="c1"&gt;# ✅&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Terraform modules define a contract through &lt;code&gt;variables.tf&lt;/code&gt;. The calling code must provide values that match these declared variables. Interface mismatches halt plan generation.&lt;/p&gt;
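&lt;p&gt;The module's side of that contract can be listed mechanically as well. A sketch (the file path is illustrative):&lt;/p&gt;

```shell
# Sketch: list the variable names a Terraform module declares in its
# variables.tf, to compare against what callers actually pass.
list_module_vars() {
  grep -oE '^variable "[a-zA-Z0-9_]+"' "$1" | tr -d '"' | awk '{print $2}'
}
```

&lt;p&gt;Arguments in the calling &lt;code&gt;module&lt;/code&gt; block that aren't in this list (aside from meta-arguments like &lt;code&gt;source&lt;/code&gt;) will halt &lt;code&gt;terraform plan&lt;/code&gt;.&lt;/p&gt;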




&lt;p&gt;&lt;strong&gt;Issue #4: Wrong Directory in CI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick diagnosis:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check if workflow sets working directory&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-A&lt;/span&gt; 5 &lt;span class="s2"&gt;"defaults:"&lt;/span&gt; .github/workflows/terraform-ci.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/terraform-ci.yml (BEFORE)&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;validate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terraform init&lt;/span&gt; &lt;span class="c1"&gt;# ❌ Runs in repo root&lt;/span&gt;

&lt;span class="c1"&gt;# .github/workflows/terraform-ci.yml (AFTER)&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;validate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;working-directory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./envs/staging&lt;/span&gt; &lt;span class="c1"&gt;# ✅ Set context&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terraform init&lt;/span&gt; &lt;span class="c1"&gt;# Now runs in correct directory&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Terraform is context-dependent. Without explicit directory specification, commands run in &lt;code&gt;$GITHUB_WORKSPACE&lt;/code&gt; (repo root), where no &lt;code&gt;.tf&lt;/code&gt; files exist for the specific environment.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Issue #5-6: Missing Deployment Automation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create:&lt;/strong&gt; &lt;code&gt;.github/workflows/deploy-staging.yml&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy to Staging&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;development&lt;/span&gt;
    &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;apps/**'&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;packages/**'&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Setup Node&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;node-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;20'&lt;/span&gt;
          &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pnpm'&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm install&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm test&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm lint&lt;/span&gt;

  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test&lt;/span&gt; &lt;span class="c1"&gt;# ✅ Only runs if tests pass&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Auth to GCP&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;google-github-actions/auth@v2&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;workload_identity_provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.WIF_PROVIDER }}&lt;/span&gt;
          &lt;span class="na"&gt;service_account&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GCP_SA_EMAIL }}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build API&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./.github/actions/build-push&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;dockerfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/api/Dockerfile&lt;/span&gt;
          &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-central1-docker.pkg.dev/${{ secrets.GCP_PROJECT }}/images/api&lt;/span&gt;
          &lt;span class="na"&gt;tag&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;staging-${{ github.sha }}&lt;/span&gt;
          &lt;span class="na"&gt;build-target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;runner&lt;/span&gt; &lt;span class="c1"&gt;# ✅ Fix for issue #1&lt;/span&gt;

  &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build&lt;/span&gt; &lt;span class="c1"&gt;# ✅ Only runs if build succeeds&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;working-directory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./envs/staging&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Auth to GCP&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;google-github-actions/auth@v2&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;workload_identity_provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.WIF_PROVIDER }}&lt;/span&gt;
          &lt;span class="na"&gt;service_account&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GCP_SA_EMAIL }}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Setup Terraform&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hashicorp/setup-terraform@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terraform init&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terraform plan -var="image_tag=staging-${{ github.sha }}" -out=tfplan&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terraform apply -auto-approve tfplan&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; The &lt;code&gt;needs:&lt;/code&gt; keyword creates job dependencies: GitHub Actions won't run &lt;code&gt;build&lt;/code&gt; until &lt;code&gt;test&lt;/code&gt; succeeds, and won't run &lt;code&gt;deploy&lt;/code&gt; until &lt;code&gt;build&lt;/code&gt; succeeds. This is the "gating" that was missing.&lt;/p&gt;




&lt;p&gt;Issue #7: CI Doesn't Gate Deployments&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Already solved&lt;/strong&gt; in Issues #5-6. The key is the &lt;code&gt;needs:&lt;/code&gt; chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;test → build → deploy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each job must complete successfully before the next begins.&lt;/p&gt;
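&lt;p&gt;As a sketch, the same chain in workflow YAML looks like this (job names match the earlier workflow; bodies elided):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;jobs:
  test:
    runs-on: ubuntu-latest
    steps: [...]
  build:
    needs: test    # waits for test
    runs-on: ubuntu-latest
    steps: [...]
  deploy:
    needs: build   # waits for build
    runs-on: ubuntu-latest
    steps: [...]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If any job in the chain fails, every job downstream of it is skipped.&lt;/p&gt;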




&lt;p&gt;Issue #8: URL Configuration Gaps&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create centralized config:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# envs/staging/terraform.tfvars&lt;/span&gt;
&lt;span class="nx"&gt;project_id&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"myproject-staging"&lt;/span&gt;
&lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-central1"&lt;/span&gt;

&lt;span class="nx"&gt;domains&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;api&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"api-staging.myapp.com"&lt;/span&gt;
  &lt;span class="nx"&gt;web&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"staging.myapp.com"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use in module:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# modules/cloud-run/main.tf&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_cloud_run_service"&lt;/span&gt; &lt;span class="s2"&gt;"service"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;location&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;

  &lt;span class="nx"&gt;template&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;spec&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;containers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;image&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;image&lt;/span&gt;

        &lt;span class="nx"&gt;env&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"API_URL"&lt;/span&gt;
          &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"https://${var.api_domain}"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nx"&gt;env&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"WEB_URL"&lt;/span&gt;
          &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"https://${var.web_domain}"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_cloud_run_domain_mapping"&lt;/span&gt; &lt;span class="s2"&gt;"domain"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;location&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;custom_domain&lt;/span&gt;

  &lt;span class="nx"&gt;spec&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;route_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;google_cloud_run_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Update GitHub Secrets:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh secret &lt;span class="nb"&gt;set &lt;/span&gt;STAGING_API_URL &lt;span class="nt"&gt;--body&lt;/span&gt; &lt;span class="s2"&gt;"https://api-staging.myapp.com"&lt;/span&gt;
gh secret &lt;span class="nb"&gt;set &lt;/span&gt;STAGING_WEB_URL &lt;span class="nt"&gt;--body&lt;/span&gt; &lt;span class="s2"&gt;"https://staging.myapp.com"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Cloud Run requires domain validation and DNS configuration. Without these URLs in Terraform, the platform can't set up SSL certificates or route external traffic correctly.&lt;/p&gt;
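&lt;p&gt;To see which DNS records a mapping expects, you can describe it. A sketch using the example staging domain from above (the command group may vary by gcloud version):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Prints the A/AAAA/CNAME records to create at your DNS provider
gcloud beta run domain-mappings describe \
  --domain=api-staging.myapp.com \
  --region=us-central1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;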

&lt;p&gt;Issues #12-15: Security Tooling Integration Conflicts&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick diagnosis:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check if Trivy actually fails the job on findings&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-A&lt;/span&gt; 10 &lt;span class="s2"&gt;"trivy"&lt;/span&gt; .github/workflows/&lt;span class="k"&gt;*&lt;/span&gt;.yml
&lt;span class="c"&gt;# Look for: exit-code: '1' and severity threshold&lt;/span&gt;

&lt;span class="c"&gt;# Check for duplicate scanning&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;"scan&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;trivy&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;falco&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;vulnerability"&lt;/span&gt; .github/workflows/&lt;span class="k"&gt;*&lt;/span&gt;.yml

&lt;span class="c"&gt;# Check Falco rules for Cloud Run compatibility&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;falco-rules/custom-rules.yaml | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s2"&gt;"container&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;syscall"&lt;/span&gt;

&lt;span class="c"&gt;# Check if Container Analysis is enabled&lt;/span&gt;
gcloud services list &lt;span class="nt"&gt;--enabled&lt;/span&gt; | &lt;span class="nb"&gt;grep &lt;/span&gt;containeranalysis
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The fix — Option A: GCP Native (simpler):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Consolidate on GCP's built-in security tooling and remove redundant third-party tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/deploy-staging.yml&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;security-scan&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Scan image with Trivy&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aquasecurity/trivy-action@master&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;image-ref&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ env.IMAGE_TAG }}&lt;/span&gt;
          &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sarif'&lt;/span&gt;
          &lt;span class="na"&gt;exit-code&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1'&lt;/span&gt;        &lt;span class="c1"&gt;# ✅ Actually fails the job&lt;/span&gt;
          &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CRITICAL,HIGH'&lt;/span&gt;
          &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;trivy-results.sarif'&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Upload results to Security Command Center&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github/codeql-action/upload-sarif@v2&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;sarif_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;trivy-results.sarif'&lt;/span&gt;

  &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;security-scan&lt;/span&gt;  &lt;span class="c1"&gt;# ✅ Deploy only if scan passes&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;...&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# modules/cloud-run/main.tf&lt;/span&gt;
&lt;span class="c1"&gt;# Use GCP Binary Authorization instead of Falco for deploy-time enforcement&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_binary_authorization_policy"&lt;/span&gt; &lt;span class="s2"&gt;"policy"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;project&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;project_id&lt;/span&gt;

  &lt;span class="nx"&gt;default_admission_rule&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;evaluation_mode&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"REQUIRE_ATTESTATION"&lt;/span&gt;
    &lt;span class="nx"&gt;enforcement_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ENFORCED_BLOCK_AND_AUDIT_LOG"&lt;/span&gt;

    &lt;span class="nx"&gt;require_attestations_by&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="nx"&gt;google_binary_authorization_attestor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;trivy_passed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The fix — Option B: Trivy + Falco (more control):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Keep both tools but define clear ownership boundaries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Trivy owns: pre-deploy image scanning (CI gate)&lt;/span&gt;
&lt;span class="c1"&gt;# Falco owns: runtime anomaly detection (post-deploy monitoring)&lt;/span&gt;
&lt;span class="c1"&gt;# Security Command Center owns: compliance reporting (audit trail)&lt;/span&gt;
&lt;span class="c1"&gt;# Container Analysis: disabled (redundant with Trivy)&lt;/span&gt;

&lt;span class="c1"&gt;# .github/workflows/deploy-staging.yml&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;scan&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Trivy scan&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aquasecurity/trivy-action@master&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;image-ref&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ env.IMAGE_TAG }}&lt;/span&gt;
          &lt;span class="na"&gt;exit-code&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1'&lt;/span&gt;          &lt;span class="c1"&gt;# ✅ Hard gate&lt;/span&gt;
          &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CRITICAL'&lt;/span&gt;
          &lt;span class="na"&gt;ignore-unfixed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;    &lt;span class="c1"&gt;# Reduce noise&lt;/span&gt;

  &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scan&lt;/span&gt;  &lt;span class="c1"&gt;# ✅ Trivy must pass&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;...&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# falco-rules/cloud-run-rules.yaml&lt;/span&gt;
&lt;span class="c1"&gt;# Tune Falco to ignore Cloud Run startup behavior&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;rule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Unexpected syscall in container&lt;/span&gt;
  &lt;span class="na"&gt;desc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Detect anomalous syscalls at runtime&lt;/span&gt;
  &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;spawned_process and container&lt;/span&gt;
    &lt;span class="s"&gt;and not proc.name in (cloud_run_allowed_processes)&lt;/span&gt;
    &lt;span class="s"&gt;and not container.image.repository contains "gcr.io/cloudrun"&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unexpected&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;process&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;%proc.name&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;%container.name"&lt;/span&gt;
  &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;WARNING&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;macro&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cloud_run_allowed_processes&lt;/span&gt;
  &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;proc.name in (node, python, java, nginx, sh, bash)&lt;/span&gt;
    &lt;span class="s"&gt;and not proc.cmdline contains "curl metadata"  # Block SSRF attempts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fix for Security Command Center duplicate findings:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Disable Container Analysis if using Trivy (avoid duplicates)&lt;/span&gt;
gcloud services disable containeranalysis.googleapis.com

&lt;span class="c"&gt;# OR: Configure SCC to deduplicate findings&lt;/span&gt;
gcloud scc settings update &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--organization&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YOUR_ORG_ID &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-asset-discovery&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Each security tool has a defined role with clear enforcement boundaries. Trivy gates at build time. Falco monitors at runtime. Security Command Center handles compliance reporting. No overlaps, no gaps, no false sense of security.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The architectural principle:&lt;/strong&gt; Security tools should be additive in coverage, not redundant in scope.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;&lt;u&gt;Common Gotchas During Remediation&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;🚩 &lt;strong&gt;"I fixed the Dockerfile but CI still fails"&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
→ Check if the composite action caches the old target name. Clear workflow cache or update the action's default input.&lt;/p&gt;

&lt;p&gt;🚩 &lt;strong&gt;"Terraform validate passes but plan fails"&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
→ You're probably in the wrong directory. Check &lt;code&gt;pwd&lt;/code&gt; in your CI logs and verify &lt;code&gt;working-directory&lt;/code&gt; is set.&lt;/p&gt;

&lt;p&gt;🚩 &lt;strong&gt;"Images build but Cloud Run deployment fails"&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
→ Service account permissions (Layer 2). Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud projects get-iam-policy YOUR_PROJECT &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--flatten&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"bindings[].members"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"bindings.members:serviceAccount:*@cloudbuild.gserviceaccount.com"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🚩 &lt;strong&gt;"Firebase deployment fails with region conflict"&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
→ Check org policy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud resource-manager org-policies describe &lt;span class="se"&gt;\&lt;/span&gt;
  constraints/gcp.resourceLocations &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YOUR_PROJECT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🚩 &lt;strong&gt;"Variables are undefined in running container"&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
→ Don't put them in the Dockerfile. Inject via Terraform's &lt;code&gt;env&lt;/code&gt; blocks in the Cloud Run service definition.&lt;/p&gt;

&lt;p&gt;🚩 &lt;strong&gt;"Trivy scan passes but vulnerable images still get deployed"&lt;/strong&gt;&lt;br&gt;
→ Check exit-code configuration. Trivy reports findings by default &lt;br&gt;
but doesn't fail the job unless &lt;code&gt;exit-code: '1'&lt;/code&gt; is explicitly set &lt;br&gt;
with a severity threshold.&lt;/p&gt;

&lt;p&gt;🚩 &lt;strong&gt;"Falco generates hundreds of alerts on Cloud Run startup"&lt;/strong&gt;&lt;br&gt;
→ Cloud Run has a specific startup sequence that triggers generic &lt;br&gt;
Falco rules. Add Cloud Run-specific macros to your custom rules &lt;br&gt;
to filter legitimate startup behavior.&lt;/p&gt;

&lt;p&gt;🚩 &lt;strong&gt;"Security Command Center shows the same CVE from 3 different sources"&lt;/strong&gt;&lt;br&gt;
→ You have overlapping scanners. Decide on a single source of truth &lt;br&gt;
(Trivy OR Container Analysis, not both) and disable the redundant one.&lt;/p&gt;

&lt;p&gt;🚩 &lt;strong&gt;"Binary Authorization blocks deployment after security scan passes"&lt;/strong&gt;&lt;br&gt;
→ The attestor isn't linked to your Trivy results. The attestation &lt;br&gt;
step must explicitly create a Binary Authorization attestation after &lt;br&gt;
a successful scan.&lt;/p&gt;
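&lt;p&gt;A sketch of that attestation step, assuming a Cloud KMS signing key and the &lt;code&gt;trivy_passed&lt;/code&gt; attestor from the Terraform above (all resource names are illustrative, and exact flags may differ by gcloud version):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Run after the Trivy job succeeds, before terraform apply
gcloud container binauthz attestations sign-and-create \
  --artifact-url="us-central1-docker.pkg.dev/${PROJECT}/images/api@${DIGEST}" \
  --attestor=trivy-passed \
  --attestor-project="${PROJECT}" \
  --keyversion="projects/${PROJECT}/locations/us-central1/keyRings/binauthz/cryptoKeys/attestor-key/cryptoKeyVersions/1"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Without an attestation matching &lt;code&gt;require_attestations_by&lt;/code&gt;, Binary Authorization blocks the deploy even though the scan itself passed.&lt;/p&gt;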




&lt;p&gt;&lt;strong&gt;&lt;u&gt;What This Analysis Doesn't Cover&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If this were real infrastructure, you would also need to check the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Terraform state drift (manual changes in GCP)&lt;/li&gt;
&lt;li&gt;Networking/DNS configuration details&lt;/li&gt;
&lt;li&gt;Secret management implementation&lt;/li&gt;
&lt;li&gt;The full history of how the system reached this state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;But:&lt;/strong&gt; For the issues as declared, these are the documented root causes according to official Terraform, Docker, GitHub Actions, and GCP documentation.&lt;/p&gt;

&lt;p&gt;Think of this as: &lt;strong&gt;symptoms → probable diagnosis&lt;/strong&gt;. The real fix needs hands on the actual system.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;&lt;u&gt;Visual: The 3-Layer Problem&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdewdmxnw7mkgzlrb001c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdewdmxnw7mkgzlrb001c.png" alt=" " width="355" height="1860"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fix bottom-up, not top-down.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;&lt;u&gt;Conclusion&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Infrastructure failures rarely have a single cause. What looks like "broken Terraform" is usually a combination of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configuration errors (Layer 1)&lt;/li&gt;
&lt;li&gt;Platform evolution you didn't track (Layer 2)&lt;/li&gt;
&lt;li&gt;Missing architectural decisions (Layer 3)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fix isn't just correcting syntax — it's understanding how these layers interact and building a system that's resilient to change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key takeaways:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Diagnose in layers.&lt;/strong&gt; Don't stop at the obvious errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix in order.&lt;/strong&gt; Foundation before plumbing before paint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build in gates.&lt;/strong&gt; Make it impossible for broken code to reach production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document decisions.&lt;/strong&gt; Future you (or the next developer) needs context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scope honestly.&lt;/strong&gt; Complex infrastructure work takes time. Price accordingly.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The goal isn't just to fix what's broken today — it's to build a system that won't break the same way tomorrow.&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>gcp</category>
      <category>githubactions</category>
      <category>docker</category>
    </item>
    <item>
      <title>Static IP Addresses for GKE Outbound Traffic: A Practical Guide to Cloud NAT</title>
      <dc:creator>lssh</dc:creator>
      <pubDate>Tue, 10 Feb 2026 03:01:31 +0000</pubDate>
      <link>https://dev.to/lbcristaldo/static-ip-addresses-for-gke-outbound-traffic-a-practical-guide-to-cloud-nat-1ie8</link>
      <guid>https://dev.to/lbcristaldo/static-ip-addresses-for-gke-outbound-traffic-a-practical-guide-to-cloud-nat-1ie8</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To get a &lt;strong&gt;fixed public IP&lt;/strong&gt; for your GKE cluster's outbound traffic:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reserve a regional static IP
&lt;/li&gt;
&lt;li&gt;Create a Cloud Router in the same region
&lt;/li&gt;
&lt;li&gt;Configure Cloud NAT with &lt;strong&gt;Manual IP assignment&lt;/strong&gt; using that reserved IP
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Done! All outbound traffic from your pods will always exit through the same IP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;The problem:&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your application running on Google Kubernetes Engine (GKE) needs to connect to an external database that requires IP whitelisting. But pods in Kubernetes have ephemeral IPs that change constantly. The solution? &lt;strong&gt;Cloud NAT with manual static IP assignment.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Why is this necessary?&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In modern microservice architectures, it's common for Kubernetes applications to need access to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Managed databases in other GCP projects&lt;/li&gt;
&lt;li&gt;Third-party APIs with strict firewall policies&lt;/li&gt;
&lt;li&gt;Legacy services that only allow access from known IPs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The challenge: GKE nodes (especially in private clusters) don't have fixed public IPs, making it impossible to maintain a stable whitelist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;The solution:&lt;/u&gt;&lt;/strong&gt; Cloud NAT with manual assignment&lt;/p&gt;

&lt;p&gt;Cloud NAT (Network Address Translation) acts as a gateway that translates your cluster's internal private addresses to a fixed, predictable public IP address.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Step-by-step implementation:&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Reserve a static IP address&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;First, reserve a regional IP that will serve as the public "face" of your cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud compute addresses create nat-static-ip &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-central1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important note:&lt;/strong&gt; The IP must be in the same region as your GKE cluster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Create a Cloud Router&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cloud NAT requires a Cloud Router, which acts as the control plane for NAT configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud compute routers create nat-router &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--network&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my-vpc &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-central1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3: Configure Cloud NAT with manual assignment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the critical step. You must choose &lt;strong&gt;manual assignment&lt;/strong&gt; (not automatic) to ensure the IP remains fixed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud compute routers nats create nat-config &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--router&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;nat-router &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--nat-external-ip-pool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;nat-static-ip &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--nat-all-subnet-ip-ranges&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--nat-external-ip-pool&lt;/code&gt; flag specifies the static IP we reserved in step 1.&lt;/p&gt;
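&lt;p&gt;To double-check the allocation mode after creation (assuming the resource names used above), you can describe the NAT configuration; &lt;code&gt;natIpAllocateOption&lt;/code&gt; should read &lt;code&gt;MANUAL_ONLY&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Inspect the NAT config; look for natIpAllocateOption: MANUAL_ONLY
gcloud compute routers nats describe nat-config \
  --router=nat-router \
  --region=us-central1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;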

&lt;p&gt;&lt;strong&gt;Step 4: Add the IP to your destination's whitelist&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once Cloud NAT is configured, all outbound traffic from your cluster will use the static IP. You can now confidently add it to your database or external service's firewall.&lt;/p&gt;
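&lt;p&gt;If you need the actual address value to hand over for the whitelist (assuming the resource name from step 1), you can query the reservation directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Print only the reserved IP address
gcloud compute addresses describe nat-static-ip \
  --region=us-central1 \
  --format='value(address)'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;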

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Key benefits&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistence:&lt;/strong&gt; The IP won't change even if the cluster restarts or nodes are recreated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security:&lt;/strong&gt; Your GKE nodes can remain in private subnets without public IPs, reducing your attack surface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability:&lt;/strong&gt; Cloud NAT is a managed service that scales automatically without impacting performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No application changes:&lt;/strong&gt; If you use GitOps with ArgoCD, you don't need to modify your deployments. Configuration lives entirely at the infrastructure level.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Important considerations&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capacity management:&lt;/strong&gt; In manual assignment mode, you're responsible for calculating how many IPs/ports you need. If your cluster grows significantly, you might hit &lt;code&gt;OUT_OF_RESOURCES&lt;/code&gt; errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring:&lt;/strong&gt; Set up alerts for NAT port utilization to detect issues before they impact production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alternatives:&lt;/strong&gt; For very specific use cases (such as custom NAT logic or complex firewall requirements), consider whether a self-managed NAT instance might be more appropriate, though this increases operational overhead.&lt;/li&gt;
&lt;/ul&gt;
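&lt;p&gt;For a quick operational check (resource names as above), the router status shows the NAT's live state, including which external IPs are currently in use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Show live NAT status for the router, including the NAT IPs in use
gcloud compute routers get-status nat-router \
  --region=us-central1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;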

&lt;p&gt;&lt;strong&gt;&lt;u&gt;When to use this solution?&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✅ You need to communicate with services requiring IP whitelisting&lt;br&gt;
✅ You run private GKE clusters&lt;br&gt;
✅ You want a scalable, managed solution&lt;br&gt;
✅ You need compliance and centralized auditing of outbound traffic&lt;/p&gt;

&lt;p&gt;❌ You have extremely custom NAT logic&lt;br&gt;
❌ You need granular control the managed service doesn't offer&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;How to verify it's working&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once configured, you can easily test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a temporary pod and check your public IP&lt;/span&gt;
kubectl run curl-test &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;radial/busyboxplus:curl &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  curl &lt;span class="nt"&gt;-s&lt;/span&gt; ifconfig.me

&lt;span class="c"&gt;# Or run continuous checks to confirm the IP stays consistent&lt;/span&gt;
kubectl run &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="nt"&gt;--tty&lt;/span&gt; curl-test &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;radial/busyboxplus:curl &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"while true; do curl -s ifconfig.me; echo; sleep 2; done"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see your reserved static IP returned consistently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Common issues and how to fix them&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;IP keeps changing&lt;/strong&gt; → Double-check that you selected &lt;strong&gt;"Manual"&lt;/strong&gt; (not "Automatic") in your Cloud NAT configuration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reserved IP in wrong region&lt;/strong&gt; → The static IP and Cloud NAT must be in the &lt;strong&gt;same region&lt;/strong&gt; as your GKE cluster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pods still using dynamic IPs&lt;/strong&gt; → Ensure the NAT is applied to the subnetwork where your GKE cluster runs (NAT configuration → "Selected subnetworks").&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Using GKE Autopilot&lt;/strong&gt; → It works exactly the same. No special configuration needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No traffic showing in NAT&lt;/strong&gt; → Wait 2-3 minutes after applying changes (Cloud NAT takes a moment to propagate).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Conclusion&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cloud NAT with manual IP assignment is GCP's standard solution for this common use case. It's reliable, scalable, and relatively simple to configure. Most importantly: it allows you to keep your resources secure in private networks while maintaining controlled connectivity to the outside world.&lt;/p&gt;

&lt;p&gt;Have you implemented Cloud NAT in your infrastructure? What challenges did you encounter? &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcn9zuox7foh3j3iw9sq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcn9zuox7foh3j3iw9sq.png" alt=" " width="679" height="840"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>gcp</category>
      <category>cloudnative</category>
      <category>kubernetes</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
