Argo CD 3.3 Changed the Source Hydrator — Here's What to Audit Before You Upgrade
Argo CD v3.3.2 shipped on February 22nd. The release notes are reasonable. The Source Hydrator behavior change gets a few lines. What those lines represent in practice is worth a slower read.
If you're running the Source Hydrator in production — meaning you're using it to generate or transform manifests before they land in your application path — this is the one upgrade note that deserves a dedicated conversation with your team before you merge the Helm chart bump.
Here's what changed, why it matters, and how to audit your setup before upgrading.
What Is the Source Hydrator?
The Source Hydrator is a feature in Argo CD that handles the transformation step between your source repository and your rendered manifests. In a standard Argo CD setup, you point an Application at a Git repo and Argo CD renders the manifests directly. The Source Hydrator adds a middle layer: a controller that runs before sync, processes sources (Helm templates, Kustomize overlays, or custom plugins), and writes the rendered output into a specific path before the sync loop picks it up.
It's designed for teams that want to decouple the manifest generation step from the sync step — useful for auditing rendered output, enforcing policy checks between generation and deployment, or building custom rendering pipelines.
For most clusters, it's not in the critical path. But for teams that have built workflows around it, it's fundamental.
The Old Behavior: Delete First, Write Second
Before v3.3, the Source Hydrator operated with a specific sequence:
1. Receive a sync trigger
2. Delete all files in the application path
3. Run the hydration pipeline
4. Write new manifests to the now-empty path
5. Signal Argo CD to proceed with sync
Step 2 was deliberate. The idea was clean state: every hydration starts from scratch. No stale manifests from a previous run. No partial overlaps from a configuration change that removed a resource.
```yaml
# Example Application spec using Source Hydrator
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  source:
    repoURL: https://github.com/my-org/my-app
    targetRevision: main
    path: config/base
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  hydrator:
    enabled: true
    outputPath: config/rendered  # <-- this path was auto-cleared before v3.3
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```
The hydrator would clear config/rendered completely before writing the new output. That's the behavior that changed.
Why the Old Behavior Was a Problem
Clean-state semantics sound correct. The failure mode is subtle.
The deletion and the write are not atomic. They happen sequentially. If the write phase fails — for any reason — after the deletion has already completed, you're left with an empty path.
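That non-atomic gap is easy to demonstrate. A minimal sketch in shell — this is illustrative, not the controller's actual code; the `render` function and the temp-directory paths are invented stand-ins:

```shell
# Illustrative sketch of the pre-v3.3 sequence (not the real controller code).
# The gap between the delete and the write is the failure window.
OUT=$(mktemp -d)
echo "kind: Deployment" > "$OUT/deploy.yaml"   # manifest left by the previous run

rm -rf "$OUT"/*                                # step 2: clear the path
# --- if the pipeline fails at this point, the path stays empty ---
render() { echo "kind: Deployment"; }          # stand-in for the hydration pipeline
render > "$OUT/deploy.yaml"                    # step 4: write fresh output
ls "$OUT"                                      # prints: deploy.yaml
```

Everything between the `rm` and the final write is exposure time: any failure there hands the sync loop an empty directory.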
What does an empty application path mean for Argo CD?
It means the sync loop sees no manifests for that application. Depending on your sync policy, this can result in:
- The application entering a `Missing` or `Unknown` state
- Argo CD pruning live resources if `prune: true` is set (it will delete what's running in the cluster because nothing in Git says it should exist)
- Automated sync loops re-triggering repeatedly against the empty path
None of these outcomes are recoverable without manual intervention once the sync has propagated. And the failure mode is timing-dependent — most of the time it won't happen. That makes it harder to detect in testing and much more surprising when it fires in production.
The specific conditions that can trigger this:
- Network interruption between the hydration controller and the Git repository during the write phase
- Plugin timeout — a custom rendering plugin that takes longer than expected, causing the hydrator to time out after deletion but before the write completes
- API rate limiting — if your hydration pipeline makes API calls and hits a rate limit after the deletion step
- Partial manifest set — an edge case where the hydration pipeline writes some manifests before failing midway
I've seen the plugin timeout variant. A cluster with a custom rendering step that parsed external config during hydration. Under load, the config fetch would occasionally stall past the hydrator's timeout. The deletion would complete. The write would not. Four minutes passed before the monitoring alert surfaced — and by then the automated sync had already pruned two deployments.
The alert was set on sync state, not path health. That's a gap worth closing regardless of which Argo CD version you're running.
What v3.3 Changes
Argo CD 3.3 removes the automatic deletion step. The Source Hydrator now writes new manifests to the application path without clearing it first.
The new sequence:
1. Receive a sync trigger
2. Run the hydration pipeline
3. Write new manifests into the existing path (overwriting changed files, leaving unchanged files in place)
4. Signal Argo CD to proceed with sync
This eliminates the empty-path failure window. If the write fails midway, the previous manifests are still in place. The sync loop operates against a known state.
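The same sketch under the v3.3 sequence shows why the window closes: a failed pipeline run leaves the previous output untouched. Again illustrative — the write-to-temp-then-discard guard is my way of modeling "a failed write doesn't clobber the path," not a documented implementation detail:

```shell
# Illustrative sketch of the v3.3 sequence: no pre-delete step.
OUT=$(mktemp -d)
echo "kind: Deployment" > "$OUT/deploy.yaml"    # output from the previous run

render() { return 1; }                          # simulate a hydration failure
if render > "$OUT/deploy.yaml.new"; then
  mv "$OUT/deploy.yaml.new" "$OUT/deploy.yaml"  # only replace on success
else
  rm -f "$OUT/deploy.yaml.new"                  # discard the failed attempt
fi
cat "$OUT/deploy.yaml"                          # prints: kind: Deployment
```

The sync loop never sees an empty path; worst case, it re-syncs against the previous (known-good) manifests.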
The tradeoff: stale manifests are no longer cleaned up automatically. If a resource was removed from your source but the rendered manifest file still exists in the output path, it won't be deleted by the hydrator. You need Argo CD's own prune setting to handle that — which it does, but only after the sync runs against the stale manifest.
In practice, for most setups, this is the correct tradeoff. Argo CD's prune behavior handles stale resources correctly. The hydrator's job should be writing manifests, not managing the lifecycle of the path itself.
But if you have custom logic that depends on the auto-deletion — a script that expects the path to be empty before hydration, or a policy check that runs on "fresh" output — you'll need to handle that explicitly in v3.3.
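For workflows that genuinely need the old clean-state semantics, the deletion can be made explicit as a step you run yourself before hydration. A hedged sketch — whether this belongs in a pre-sync hook or a CI step depends on your pipeline, and the temp directory stands in for your real `outputPath`:

```shell
# Explicit replacement for the removed auto-delete (sketch; path is an example).
OUT=$(mktemp -d)                    # stands in for your outputPath, e.g. config/rendered
touch "$OUT/stale.yaml" "$OUT/old.yaml"

find "$OUT" -mindepth 1 -delete     # clear the path before hydration runs
ls -A "$OUT" | wc -l                # prints: 0
```

This puts the delete-first tradeoff back under your control: you own the failure window instead of inheriting it from the hydrator.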
How to Audit Your Setup Before Upgrading
Before upgrading to Argo CD 3.3, run through these checks.
1. Identify all Applications using the Source Hydrator
```shell
# List all Applications with the hydrator enabled
kubectl get applications -n argocd -o json | \
  jq -r '.items[] | select(.spec.hydrator.enabled == true) | .metadata.name'
```
For each application that returns, review the output path and any downstream tooling that depends on it.
2. Check for downstream dependencies on the auto-deletion
Look for:
- Pre-sync hooks that assume an empty output path
- CI/CD scripts that generate manifests into the output path and expect the hydrator to clean up previous generations
- Policy scanners that are invoked after hydration and treat the output directory as authoritative (if stale files can persist, the scanner may approve stale resources)
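A quick way to surface those dependencies is to grep your CI and hook scripts for the output path. A sketch — `config/rendered` matches the example Application earlier, and the repo layout here is invented for the demonstration:

```shell
# Search scripts for references to the hydrator output path (sketch; layout invented).
REPO=$(mktemp -d)
mkdir -p "$REPO/ci"
printf 'rm -rf config/rendered/*\n' > "$REPO/ci/pre-hydrate.sh"  # assumes auto-clearing
printf 'echo deploy\n' > "$REPO/ci/deploy.sh"                    # no dependency

grep -rl 'config/rendered' "$REPO/ci"   # flags only pre-hydrate.sh
```

Any hit is a script to review by hand: does it merely read the path, or does it assume the hydrator cleared it?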
3. Review your monitoring coverage
This is the gap that made the failure silent for too long. Check whether you have alerting on:
- Source Hydrator controller errors (not just application sync state)
- Path health — specifically whether the output path contains a minimum expected number of files
- Time-to-sync — if a hydration run takes longer than expected, alert before it fails
```yaml
# Example PrometheusRule for Argo CD hydrator errors
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-hydrator-alerts
  namespace: argocd
spec:
  groups:
    - name: argocd-hydrator
      rules:
        - alert: ArgoCDHydratorError
          expr: |
            increase(argocd_hydrator_error_total[5m]) > 0
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "Argo CD Source Hydrator errors detected"
            description: "Hydrator errors in the last 5 minutes. Check if manifests were written successfully."
        - alert: ArgoCDHydratorSyncDuration
          expr: |
            argocd_hydrator_duration_seconds{quantile="0.95"} > 60
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Argo CD Source Hydrator taking longer than expected"
            description: "P95 hydration duration exceeds 60s — investigate plugin performance."
```
These won't catch everything, but they close the most obvious visibility gap.
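Beyond metrics, a direct path-health check can run in CI or a cron job and catch an emptied path before a sync does. A sketch, assuming the output path is a flat directory of YAML files; the threshold and file names are examples:

```shell
# Minimal path-health check (sketch): alert if the rendered path has too few manifests.
OUT=$(mktemp -d)                    # stands in for your outputPath
touch "$OUT/deploy.yaml" "$OUT/svc.yaml"

MIN=1                               # minimum expected manifest count for this app
COUNT=$(find "$OUT" -name '*.yaml' | wc -l)
if [ "$COUNT" -lt "$MIN" ]; then
  echo "ALERT: only $COUNT manifests in $OUT (expected >= $MIN)"
else
  echo "ok: $COUNT manifests present"
fi
```

A check like this is what would have turned the four-minute gap in the incident above into a near-immediate page.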
4. Read the migration guide
The v3.2 to v3.3 migration guide in the Argo CD docs has a Source Hydrator section. It's short — three paragraphs. Read it before upgrading. The actual upgrade instructions are straightforward; the value is in understanding the behavioral expectations around the new defaults.
The Broader Point About GitOps Behavior Changes
The Source Hydrator change is a good example of a class of bugs that are easy to miss: behavioral changes in the write path of a GitOps controller.
GitOps tools operate on the assumption that Git is the source of truth. What actually happens between "Git says X" and "cluster state is X" is a pipeline — and each step in that pipeline has behavior that can change between versions.
The sync state is usually well-monitored. The steps before sync — hydration, transformation, validation — often aren't. And they're where behavioral changes tend to hide.
A few habits worth building for any GitOps upgrade:
- Read the full changelog, not just the "what's new" section. Behavior changes often appear under bug fixes.
- Identify every controller in your GitOps pipeline that has a write step. Know what it writes and what it reads.
- Test upgrades in a staging environment that mirrors your production sync policies exactly — including `prune` settings.
- Monitor the pipeline, not just the outcome. If the only alert you have is on Application sync state, you're monitoring too late in the chain.
This isn't specific to Argo CD. The same applies to Flux, Helm operators, or any GitOps tooling with a non-trivial sync pipeline. The reliability of your cluster is only as good as your understanding of what the reconciliation loop is actually doing.
Upgrading?
Argo CD 3.3.2 is the current stable release. If you're on 3.2.x and using the Source Hydrator, the upgrade path is documented and straightforward — the behavioral change itself is a safety improvement. The main work is auditing what you've built around the old behavior.
If you're not using the Source Hydrator, this change doesn't affect you. Upgrade normally.
And if you're evaluating whether to adopt the Source Hydrator feature for a new pipeline — v3.3's write behavior is the one you want to build around. The old delete-first model had a correctness problem that made it unsuitable for production pipelines where write failures were possible under load.
The feature is now safer to adopt. That's the right direction.