daniel jeong

Posted on Jun 1 • Originally published at manoit.co.kr

Argo CD 3.4 Deep Dive: Cluster Pause Reconciliation, Helm valueFiles Globs & Source Hydrator Commit Authorship

#kubernetes #cicd #devops #gitops

Anyone who has moved GitOps from demo to production knows the hard part isn't deploying — it's everything after, the so-called Day-2 operations. An incident hits at midnight, but the Argo CD controller keeps stubbornly reconciling everything back to its "desired state." Your values files have multiplied into dozens per environment and you'd kill for a single glob. Hydration commits give no clue who authored them. And on dual-stack clusters, mysterious DNS timeouts quietly eat away at the controller.

Argo CD 3.4 (GA in May 2026, first stable tag v3.4.1) takes direct aim at these operational pains. As the official v3.4.1 release notes put it, the focus of this cycle is Day-2: incident management, alert routing, and Helm template flexibility. This article examines five of 3.4's key features from an operations and architecture angle, tracing each back to the problem it solves, then lays out how ManoIT rolled them into our internal multi-cluster GitOps.

1. Why 3.4 — Quarterly Cadence, Center of Gravity Shifts to Day-2

Some context first. Argo CD ships a minor release once per quarter, and only the three most recent minor versions get patches. If 3.2 was about UI and performance and 3.3 established the Source Hydrator (the rendered-manifests pattern that hydrates manifests into a separate branch), then 3.4 sits on top of that and asks: "in production, what do we pause, what do we track, and what do we route?" The feature set was frozen at v3.4.0-rc2, GA landed in early May 2026, and patches followed quickly: v3.4.3 arrived on May 28, 2026.

Version	GA	Theme	Headline
3.2	H2 2025	UI / performance	UI overhaul, controller perf
3.3	Early 2026	Rendered manifests	Source Hydrator, PreDelete Hooks
3.4	2026-05	Day-2 operations	Cluster pause, Helm globs, hydrator commit author, AppSet Watch, gRPC DNS TXT off

3.4 is an operations-hardening release with few breaking changes, but three environment shifts must be checked before upgrading (see section 7). First, the new features.

2. Per-Cluster Pause Reconciliation — A New Standard for Incident Response

Until now, "pausing reconciliation" in Argo CD was a per-application operation: you switched an Application's sync policy to manual or applied a sync window. The problem: the unit of an incident is often an entire cluster. When a target cluster is unstable but the controller keeps pushing hundreds of its apps toward desired state, unintended rollbacks and redeploys pile on in the middle of an outage and make things worse.

3.4 introduces an annotation that pauses reconciliation for an entire cluster (PR #26442). Add the pause annotation to the cluster secret (or target resource) and the controller stops attempting reconciles against that cluster. It's exactly "hitting the brakes at the cluster level."

# Add the pause annotation to a cluster secret -> reconcile halts for that cluster
apiVersion: v1
kind: Secret
metadata:
  name: cluster-prod-apac
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
  annotations:
    # WARNING: pausing only stops "automatic convergence to desired state."
    # Already-running workloads keep running, and drift can accumulate.
    argocd.argoproj.io/pause-reconciliation: "true"
type: Opaque
stringData:
  name: prod-apac
  server: https://k8s-prod-apac.internal:6443

Ops tip: read pause as "observation continues, only auto-convergence stops." Drift accumulates during the incident, so right after un-pausing, always inspect the diff first and sync only the intended changes.
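
The pause, resume, and diff-first steps above could be wrapped in a few runbook helpers. This is a minimal sketch, not an official procedure: the function names and the ARGOCD_NS variable are hypothetical, the annotation key is the one quoted in this article, and the cluster secret and app names are whatever your installation uses.

```shell
#!/bin/sh
# Hypothetical runbook helpers; names and defaults are illustrative assumptions.
ARGOCD_NS="${ARGOCD_NS:-argocd}"
PAUSE_KEY="argocd.argoproj.io/pause-reconciliation"  # key as given in this article

# Pause: set the annotation on the cluster secret.
pause_cluster() {
  kubectl -n "$ARGOCD_NS" annotate secret "$1" "${PAUSE_KEY}=true" --overwrite
}

# Resume: a trailing "-" tells kubectl to remove the annotation.
resume_cluster() {
  kubectl -n "$ARGOCD_NS" annotate secret "$1" "${PAUSE_KEY}-"
}

# After resuming, inspect drift per app before syncing anything.
# `argocd app diff` exits non-zero when a diff exists, so keep going on failure.
review_drift() {
  for app in "$@"; do
    echo "=== $app ==="
    argocd app diff "$app" || true
  done
}
```

A mock-incident run might then be `pause_cluster cluster-prod-apac`, followed later by `resume_cluster cluster-prod-apac` and `review_drift payments`, syncing only the changes you actually intend.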

3. Helm valueFiles Wildcard Globs — Taming the values File Explosion

Run multi-env, multi-region and your values files multiply with every environment-region combination: values-base.yaml, values-prod.yaml, values-prod-apac.yaml, values-feature-x.yaml… Previously you had to list each one in valueFiles, and every new file meant editing the Application manifest too.

3.4 supports wildcard glob patterns in valueFiles (PR #26768, cherry-picked to 3.4 as #26919). Get your directory convention right and you can pull in "every environment file under values/" with a single line.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments
  namespace: argocd
spec:
  source:
    repoURL: https://git.internal/manoit/payments-chart
    targetRevision: main
    path: chart
    helm:
      valueFiles:
        - values/base.yaml
        # 3.4: collect env values via glob (sort order = merge order)
        - "values/prod-*.yaml"
  destination:
    server: https://kubernetes.default.svc
    namespace: payments

WARNING: globs merge matched files in sorted order. Helm lets later values override earlier ones, so control merge precedence explicitly with filename prefixes (e.g. 10-, 20-). When the question is "why isn't my prod value taking effect?", check merge order first: it is usually the culprit.
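
To see why prefixes matter, it helps to look at how a plain sort orders typical file names. The sketch below uses only the local filesystem, not Argo CD; the file names are hypothetical, and byte-order (C locale) sorting is an assumption about how matched files are ordered.

```shell
#!/bin/sh
# Illustration only: hypothetical file names, byte-order sort assumed.
dir=$(mktemp -d)
touch "$dir/prod.yaml" "$dir/prod-apac.yaml"

# Unprefixed: "-" (0x2D) sorts before "." (0x2E), so the regional file comes
# first and the broader prod.yaml would be merged last and override it.
unprefixed=$(cd "$dir" && printf '%s\n' prod*.yaml | LC_ALL=C sort | paste -sd ' ' -)
echo "unprefixed: $unprefixed"   # prod-apac.yaml prod.yaml

# Prefixed: numeric prefixes make the intended precedence explicit.
mv "$dir/prod.yaml" "$dir/10-prod.yaml"
mv "$dir/prod-apac.yaml" "$dir/20-prod-apac.yaml"
prefixed=$(cd "$dir" && printf '%s\n' [0-9][0-9]-*.yaml | LC_ALL=C sort | paste -sd ' ' -)
echo "prefixed:   $prefixed"     # 10-prod.yaml 20-prod-apac.yaml

rm -rf "$dir"
```

In the unprefixed case the more specific regional file loses to the general one, which is the opposite of what most layouts intend. Snapshot-testing the rendered output is the safest way to confirm the order your Argo CD version actually applies.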

3.4 also added the ability to send custom User-Agent headers for Helm repository requests (PR #25473) — handy when an internal artifact proxy or OCI registry requires client identification.

4. Source Hydrator — Commit Authorship and UI Integration

The Source Hydrator that landed in 3.3 is a first-class implementation of the rendered-manifests pattern: it renders the source (dry source) kept in Git and commits it to a separate hydrated branch. Put it into production, though, and one thing grates immediately — every hydration commit has the same anonymous author, rendering audit logs and git blame meaningless.

3.4 makes the authorName/Email used for hydration commits configurable (PR #25746). Each commit's metadata now records which environment or bot produced it, restoring audit trails and accountability. After applying the setting, you can verify the identity directly from the hydrated branch's commit log.

# Verify the hydrated branch's commit author is stamped with the bot identity
# (Source Hydrator renders the dry source and commits it to a separate hydrated branch)
git fetch origin environments/prod
git log origin/environments/prod --pretty='%h | %an <%ae> | %s' -n 5

# Expected output - before authorName/Email is set: anonymous/identical author
#   3f1a2b9 | argocd <argocd@noreply> | hydrate: payments @ a1b2c3d
# After: environment/bot identity is clearly distinguished
#   9c8d7e6 | argocd-hydrator-prod <gitops-bot+prod@manoit.co.kr> | hydrate: payments @ a1b2c3d
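
The manual check above can also be turned into a pass/fail gate, for example in CI. This is a small sketch under assumptions: the helper name is hypothetical, and the expected email is the example bot identity used in this article.

```shell
#!/bin/sh
# Hypothetical audit helper: reads one author email per line on stdin and
# returns non-zero if any line differs from the expected bot identity.
check_hydrator_authors() {
  expected="$1"
  bad=0
  while IFS= read -r email; do
    [ -n "$email" ] || continue          # skip blank lines
    if [ "$email" != "$expected" ]; then
      echo "unexpected author: $email" >&2
      bad=1
    fi
  done
  return "$bad"
}
```

One way to use it is `git log origin/environments/prod --pretty='%ae' -n 20 | check_hydrator_authors 'gitops-bot+prod@manoit.co.kr'`. Commits made before the setting was applied will still carry the old author, so limit the range to commits created after the change.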

On top of that came UI integration: you can enable the hydrator from the app-create panel (#26485) and view hydrator properties directly in the Summary tab (#26152). On the stability side, GetDrySource() was fixed to preserve all source-type fields (cherry-pick #27189→#27196), and a batch of 3.3-era hydrator bugs (missing hydrated SHA on no-ops, missing creds) was cleaned up.

Area	3.4 change	Operational effect
Commit metadata	authorName/Email configurable (#25746)	Restores audit log / blame
UI	Hydrator toggle in create panel (#26485), Summary tab exposure (#26152)	Config visibility, fewer mistakes
Stability	GetDrySource field preservation, no-op SHA/creds fixes	Higher hydration reliability

5. ApplicationSet Operability — Health Field, Watch, listResourceEvents

The real unit of large-scale GitOps isn't a single Application — it's the ApplicationSet (AppSet). In a structure that stamps out tens to hundreds of apps at once via cluster/directory/Git generators, 3.4 elevates AppSet into a first-class, operable object.

Health field added to status (#25753) — read overall AppSet health directly from status, no need to manually aggregate hundreds of child apps.
ApplicationSet Watch API (#26409) and listResourceEvents API (#25537) — standard APIs to stream/query AppSet changes and events. External dashboards and automation attach via watch instead of polling.
Controller performance/correctness — the path that fetches cluster secrets was optimized, and AppSets in disallowed namespaces no longer trigger unnecessary reconciles on cluster-secret changes (#25622). A DuckType generator panic on non-string values was also fixed (cherry-pick #27265→#27526).

The UI gained an AppSet slide-out summary, a tree-view detail page, and a list page, completing the "operate AppSets visually" experience.

6. Notification & Networking — appProject Access and gRPC DNS TXT Opt-Out

Notifications are also core to Day-2. 3.4 lets notification templates access appProject information (#26470) — so you can put "which project's app failed to sync" directly into the alert body, sharpening routing accuracy. It also exposes the notifications controller's processors count as a command parameter (#26798) to tune throughput in high-volume alert environments.

The most operationally relevant networking change is disabling gRPC service-config DNS TXT lookups by default (#26077). It looks small but the root cause runs deep — in dual-stack (IPv4+IPv6) Kubernetes environments, gRPC clients excessively queried DNS TXT records looking for service config, causing timeouts and latency. 3.4 turns that lookup off by default, improving controller stability on dual-stack clusters.

If you've experienced "Argo CD intermittently slowing down" on a dual-stack cluster, the 3.4 upgrade alone may make the symptom disappear. This change is a default, so no extra configuration is required.
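
If you want to see the lookup in question before upgrading, gRPC's DNS resolver searches for service config in a TXT record named `_grpc_config.<target host>`. The sketch below builds that name and times a single query with `dig`. The helper names are hypothetical, and the example host is an assumption about a default install, so substitute the address your components actually dial.

```shell
#!/bin/sh
# gRPC's DNS resolver looks for service config in a TXT record named
# _grpc_config.<target host>. Helper names here are illustrative.
grpc_config_txt_name() {
  printf '_grpc_config.%s\n' "$1"
}

# Time one TXT lookup for that name (requires `dig`).
# Prints dig's "Query time" line; prints nothing if the query times out.
time_txt_lookup() {
  dig +tries=1 +time=2 TXT "$(grpc_config_txt_name "$1")" | grep 'Query time'
}
```

For example, `time_txt_lookup argocd-repo-server.argocd.svc.cluster.local` run from a pod on the affected cluster gives a rough before-and-after number to compare against controller latency.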

7. Upgrade Watch-Outs — Helm 3.19 K8s Version Interpretation, Dex 2.45, MS Teams O365 Connectors

3.4 is an operations-hardening release with a light migration burden, but check these three before upgrading.

Item	Change	Action
Helm 3.19.0	How Helm interprets the K8s cluster version changed → Argo CD aligns to it	Regression-test charts that depend on `.Capabilities.KubeVersion` rendering
Dex 2.45.0	Bundled Dex version upgrade (SSO)	Validate Dex connector config / OIDC flow in staging
MS Teams notifications	Microsoft deprecates and removes legacy Office 365 Connectors	Migrate Teams webhook delivery to the new mechanism

WARNING: if you were sending Teams notifications via O365 Connector webhooks, this migration is required, not optional. Microsoft's deprecation breaks the existing path regardless of whether you upgrade to 3.4, so alert delivery itself may stop.
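
For the Helm row in the table above, one way to regression-test KubeVersion-dependent charts is to render the same chart with the old and new Helm binaries and diff the results. This is a sketch under assumptions: the function name, release name, and binary paths are hypothetical, and pinning `--kube-version` to the version string your cluster reports only approximates how Argo CD renders.

```shell
#!/bin/sh
# Hypothetical regression check: render one chart with two Helm binaries at a
# pinned Kubernetes version and diff the output. Returns 0 when identical.
render_diff() {
  old_helm="$1"; new_helm="$2"; chart="$3"; kube_version="$4"
  tmp=$(mktemp -d)
  "$old_helm" template release "$chart" --kube-version "$kube_version" > "$tmp/old.yaml"
  "$new_helm" template release "$chart" --kube-version "$kube_version" > "$tmp/new.yaml"
  rc=0
  diff -u "$tmp/old.yaml" "$tmp/new.yaml" || rc=$?
  rm -rf "$tmp"
  return "$rc"
}
```

A call such as `render_diff /opt/helm-old/helm /opt/helm-3.19/helm ./chart 1.33.1` prints a unified diff and returns non-zero when the renders differ; any diff that is not an intended change deserves a look before the upgrade.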

8. ManoIT Internal Adoption Checklist

#	Task	Owner	Done criteria
1	Upgrade to 3.4.x in staging, reflect non-HA→HA manifest diffs	Platform	All apps Synced/Healthy post-upgrade
2	Helm 3.19 K8s version interpretation — regression-test KubeVersion-dependent charts	Chart owners	Render diff = 0 (excl. intended changes)
3	Add cluster pause annotation to the incident runbook	SRE	Pause/resume/diff procedure validated in a mock incident
4	Reorganize per-env values into glob rules (prefix ordering)	Domain owners	Deterministic merge order (snapshot test)
5	Assign per-env bot identity for Source Hydrator commit author (authorName/Email)	Platform	Identity visible in hydrated-branch blame
6	Move external dashboards from polling to AppSet Health field + Watch API	Observability	Lower dashboard latency, fewer API calls
7	Add appProject context to notification templates, migrate MS Teams path	SRE	Per-project routing + Teams delivery working
8	Measure gRPC DNS TXT-off effect (latency p99) on dual-stack clusters	Network	Controller reconcile latency stabilized

9. Conclusion — "The Next GitOps Challenge Isn't Deployment, It's Operations"

In one line, Argo CD 3.4 is a declaration that deployment automation is already a solved problem, and what remains is the Day-2 work of safely pausing, tracking, and routing that automation in the middle of an incident. Per-cluster pause aligns the unit of incident response with reality (the cluster); Helm valueFiles globs collapse the environment explosion into one line; the Source Hydrator's commit authorship returns audit trails to the rendered-manifests pattern. ApplicationSet's Health/Watch/listResourceEvents elevate the real unit of large-scale GitOps to a first-class object, and the gRPC DNS TXT default opt-out quietly removes invisible latency on dual-stack environments.

Three closing recommendations. (1) Before upgrading, check the Helm 3.19 impact and the MS Teams O365 Connector deprecation first — neither tolerates "later." (2) Put cluster pause into your runbook first — it's the change that raises incident-response capability the most for the least code. (3) If you use the Source Hydrator, set the commit author first — auto-commits without an audit trail are a powder keg for operational incidents. The shortest one-line recommendation: "This sprint, bump staging to 3.4 and run the cluster-pause → diff → selective-sync procedure once in a mock incident."

Originally published at ManoIT Tech Blog.
