DEV Community

Falolu Olaitan

From Helm AGIC Headaches to the AKS Add-on: a Real-World Migration + Troubleshooting Playbook

This write-up distills exactly what we just did: triaging an aging Helm-based AGIC install, fixing identity and tooling gotchas, and cleanly migrating to the AKS ingress-appgw add-on while keeping the same Application Gateway and public IP. I’m keeping it practical—commands, failure modes, and what to check next.


The situation we started with

  • AGIC (Helm) was old (1.5.x era) and running in the default namespace.
  • It still used AAD Pod Identity patterns (aadpodidbinding, USE_MANAGED_IDENTITY_FOR_POD), which are deprecated in favor of Azure Workload Identity (WI).
  • A bunch of confusing errors popped up:
      • AGIC couldn’t get tokens (“Identity not found”) after UAMI changes.
      • APPGW_RESOURCE_ID was corrupted to C:/Program Files/Git/... (Git Bash path conversion).
      • An invalid API version warning (older CLI/extensions) appeared during scripting.
      • AGIC logs showed malformed ARM targets like Subscription="Git" or an empty Name, classic signs of a broken config map.
  • We also needed to keep the same IP (52.157.252.178) on the existing App Gateway.

Key decisions

  1. Stop fighting the old chart. Microsoft moved AGIC Helm charts to OCI on MCR; the old blob repo is retired. If you stay on Helm, pull from oci://mcr.microsoft.com/azure-application-gateway/charts/ingress-azure and use Workload Identity.
  2. Prefer the AKS add-on for simplicity (identity + RBAC wiring handled for you). You can point it at an existing App Gateway—no new IP if you pass --appgw-id.
  3. Ensure the gateway is v2 SKU (Standard_v2 or WAF_v2); AGIC requires v2.
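
Decision 3 is easy to verify before touching the cluster. The sketch below is only an illustration: is_v2_tier is a helper name made up for this post, and APPGW_ID is assumed to hold the gateway's full resource ID.

```shell
# Returns 0 for the SKU tiers AGIC supports (v2 only), non-zero for anything else.
is_v2_tier() {
  case "$1" in
    Standard_v2|WAF_v2) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage sketch (assumes APPGW_ID is already set to the gateway's resource ID):
#   tier=$(az network application-gateway show --ids "$APPGW_ID" --query sku.tier -o tsv)
#   is_v2_tier "$tier" || echo "Gateway tier '$tier' is not v2; AGIC needs Standard_v2 or WAF_v2"
```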

What actually fixed things (chronologically)

  1. Kill Git Bash path mangling
    If you must use Git Bash on Windows, disable MSYS path conversion so Azure resource IDs don’t become C:\Program Files\Git...
export MSYS_NO_PATHCONV=1
export MSYS2_ARG_CONV_EXCL="*"
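
Path conversion happens when Git Bash hands arguments to native Windows executables, so the variable can look fine inside the shell while the value that lands in the cluster is mangled. A rough way to catch that is to validate the value you read back. This is a minimal sketch: is_appgw_id is a made-up helper, and <agic-configmap> is a placeholder for whatever config map your Helm release created.

```shell
# Returns 0 when the argument looks like an Application Gateway ARM resource ID,
# non-zero otherwise (for example a value mangled into a Windows path).
# The match is done on a lowercased copy because ARM IDs are case-insensitive.
is_appgw_id() {
  local id
  id=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$id" in
    /subscriptions/?*/resourcegroups/?*/providers/microsoft.network/applicationgateways/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical usage against the value stored in the cluster:
#   value=$(kubectl -n default get configmap <agic-configmap> -o jsonpath='{.data.APPGW_RESOURCE_ID}')
#   is_appgw_id "$value" || echo "APPGW_RESOURCE_ID looks mangled: $value"
```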

  2. Stop the old Helm controller
    Running two controllers (Helm + add-on) leads to churn. Uninstall Helm, or at minimum ensure only one is active:
helm uninstall ingress-azure -n default
kubectl -n default delete deploy,sa,cm,clusterrole,clusterrolebinding -l app=ingress-azure --ignore-not-found=true


If you keep Helm instead of the add-on, upgrade to the OCI chart and Workload Identity (commands are in the last section).

  3. Enable the AKS add-on against the existing App Gateway
    This reuses the same gateway and keeps your IP:
APPGW_ID="/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Network/applicationGateways/<name>"
az aks enable-addons -g <rg> -n <cluster> -a ingress-appgw --appgw-id "$APPGW_ID"
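
After enabling, it is worth confirming that the add-on attached to the gateway you intended. A minimal sketch, assuming the cluster's add-on profile exposes the gateway under addonProfiles.ingressApplicationGateway.config.effectiveApplicationGatewayId; same_arm_id and addon_gateway_id are helper names invented here.

```shell
# Case-insensitive comparison of two ARM resource IDs.
# Returns 0 only when both are non-empty and equal ignoring case.
same_arm_id() {
  local a b
  a=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  b=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Prints the gateway ID the add-on reports it is managing.
# $1 = cluster resource group, $2 = cluster name.
addon_gateway_id() {
  az aks show -g "$1" -n "$2" \
    --query "addonProfiles.ingressApplicationGateway.config.effectiveApplicationGatewayId" -o tsv
}

# Usage sketch (placeholders as in the commands above):
#   same_arm_id "$(addon_gateway_id <rg> <cluster>)" "$APPGW_ID" \
#     || echo "Add-on is not attached to the expected gateway"
```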


Microsoft’s tutorial covers enabling the add-on on an existing AKS cluster with an existing App Gateway (even in separate VNets).

  4. Check identity/RBAC for the add-on
    The add-on wires a user-assigned identity in the node resource group (MC_...). Give it rights on the gateway:
ADDON_MI="/subscriptions/<sub>/resourceGroups/<mc_rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<addon-mi>"
ADDON_PRINCIPAL=$(az identity show --ids "$ADDON_MI" --query principalId -o tsv)

# Required at minimum:
az role assignment create --assignee "$ADDON_PRINCIPAL" --role "Contributor" --scope "$APPGW_ID"

# Helpful read scope at RG (prevents odd read failures of related objects):
az role assignment create --assignee "$ADDON_PRINCIPAL" --role "Reader" \
  --scope "/subscriptions/<sub>/resourceGroups/<gateway-rg>"
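
If you would rather not hand-build the identity's resource path, the principal can usually be read from the cluster's add-on profile, and the assignments can be listed afterwards to confirm they stuck. This is a sketch under that assumption; the three function names are made up for illustration.

```shell
# Prints the object (principal) ID of the add-on's managed identity.
# $1 = cluster resource group, $2 = cluster name.
addon_principal_id() {
  az aks show -g "$1" -n "$2" \
    --query "addonProfiles.ingressApplicationGateway.identity.objectId" -o tsv
}

# Prints the role names a principal holds at a scope, one per line.
# $1 = principal object ID, $2 = scope (for example "$APPGW_ID").
# Only assignments made at that scope are listed; inherited ones are not.
roles_at_scope() {
  az role assignment list --assignee "$1" --scope "$2" \
    --query "[].roleDefinitionName" -o tsv
}

# Returns 0 when role $3 is among the roles the principal holds at the scope.
has_role() {
  roles_at_scope "$1" "$2" | grep -qxF "$3"
}

# Usage sketch:
#   pid=$(addon_principal_id <rg> <cluster>)
#   has_role "$pid" "$APPGW_ID" "Contributor" || echo "Contributor on the gateway is missing"
```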

  5. Confirm AGIC is actually watching your Ingress
    AGIC processes Ingresses that carry the annotation kubernetes.io/ingress.class: azure/application-gateway or set spec.ingressClassName: azure-application-gateway (IngressClass names cannot contain a slash). Your manifest already has the legacy annotation, which is fine.
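
To see which of the two selectors a given Ingress actually carries, you can read both fields in one call. A rough sketch, assuming the IngressClass is named azure-application-gateway (IngressClass names cannot contain a slash, so the slash form only applies to the annotation); ingress_class_of and is_agic_class are hypothetical helper names.

```shell
# Prints "<annotation>|<ingressClassName>" for one Ingress; either side may be empty.
# $1 = namespace, $2 = Ingress name.
ingress_class_of() {
  kubectl -n "$1" get ingress "$2" \
    -o jsonpath='{.metadata.annotations.kubernetes\.io/ingress\.class}|{.spec.ingressClassName}'
}

# Returns 0 when the "<annotation>|<ingressClassName>" pair selects AGIC.
is_agic_class() {
  case "$1" in
    "azure/application-gateway|"*) return 0 ;;
    *"|azure-application-gateway") return 0 ;;
    *) return 1 ;;
  esac
}

# Usage sketch:
#   is_agic_class "$(ingress_class_of default <ingress-name>)" \
#     || echo "This Ingress is not claimed by AGIC"
```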

  6. Make sure your Services have Endpoints
    Most “backend not updated” cases are just Services resolving to zero endpoints (selectors don’t match pods, wrong targetPort, probes failing). AGIC won’t add pool members without endpoints:

kubectl -n default get svc <name> -o wide
kubectl -n default get endpoints <name> -o wide
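
If you want that check in a script rather than by eye, something like the following works as a starting point. It is a minimal sketch: both function names are made up, and only addresses listed as ready in the Endpoints object are counted.

```shell
# Prints the ready endpoint IPs behind a Service, space separated (empty if none).
# $1 = namespace, $2 = Service name.
ready_endpoint_ips() {
  kubectl -n "$1" get endpoints "$2" -o jsonpath='{.subsets[*].addresses[*].ip}'
}

# Returns 0 when the Service has at least one ready endpoint.
has_endpoints() {
  [ -n "$(ready_endpoint_ips "$1" "$2")" ]
}

# Usage sketch:
#   has_endpoints default <name> \
#     || echo "Service has no ready endpoints; check selectors, targetPort and probes"
```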


Why the original errors happened (and how to recognize them)

If you stay on Helm instead of the add-on
Use the OCI chart and Workload Identity:

# Enable OIDC + WI
az aks update -g <rg> -n <cluster> --enable-oidc-issuer --enable-workload-identity

# Federate your UAMI to the service account AGIC uses
AKS_OIDC_ISSUER=$(az aks show -g <rg> -n <cluster> --query oidcIssuerProfile.issuerUrl -o tsv)
az identity federated-credential create \
  --name agic \
  --identity-name <your-uami> \
  --resource-group <rg> \
  --issuer "$AKS_OIDC_ISSUER" \
  --subject "system:serviceaccount:<ns>:<sa>"

IDENTITY_CLIENT_ID=$(az identity show -g <rg> -n <your-uami> --query clientId -o tsv)
APPGW_ID="/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Network/applicationGateways/<name>"

# Upgrade/install from OCI chart on MCR
helm upgrade --install ingress-azure oci://mcr.microsoft.com/azure-application-gateway/charts/ingress-azure \
  -n <ns> \
  --set appgw.applicationGatewayID="$APPGW_ID" \
  --set armAuth.type=workloadIdentity \
  --set armAuth.identityClientID="$IDENTITY_CLIENT_ID" \
  --set rbac.enabled=true


NOTE: When you enable the add-on with --appgw-id, it reuses your existing App Gateway and therefore keeps the same public IP. Your DNS records pointing at that IP don’t need to change. Enabling the add-on without --appgw-id would create a new gateway (and a new IP).
