How to Fix Azure Application Gateway Stuck in "Failed" Provisioning State (Without Downtime)
If you're using Azure Application Gateway with AKS and AGIC, you might encounter the dreaded "Failed" provisioning state. Here's a quick fix that saved my Sunday.
The Problem
After deploying a new container image, my Application Gateway got stuck:
az network application-gateway show \
--name ingress-appgateway \
--resource-group MC_my-aks-cluster_uksouth \
--query "provisioningState" -o tsv
# Output: Failed
The gateway was still running (operational), but any configuration updates would fail with:
"Last configuration update operation on this Application Gateway failed."
This meant:
- AGIC couldn't sync new pod IPs to backend pools → 502 errors
- SSL certificate renewals couldn't complete
- Any ingress changes were blocked
The Nuclear Option (What I Almost Did)
Stop and start the gateway:
az network application-gateway stop --name ingress-appgateway --resource-group MC_...
az network application-gateway start --name ingress-appgateway --resource-group MC_...
This works, but causes ~2 minutes of downtime for all services behind the gateway.
The Better Fix (Zero Downtime)
Force a re-commit of the configuration by making a trivial change:
az network application-gateway update \
--name ingress-appgateway \
--resource-group MC_my-aks-cluster_uksouth \
--set tags.reset="reset-$(date +%s)"
This triggers Azure to re-process the entire configuration, which resets the provisioning state back to "Succeeded" - without any downtime.
Verify It Worked
az network application-gateway show \
--name ingress-appgateway \
--resource-group MC_my-aks-cluster_uksouth \
--query "provisioningState" -o tsv
# Output: Succeeded 🎉
Why Does This Happen?
The "Failed" state usually occurs when:
- Azure has a transient issue during a config update
- AGIC tries to sync while Azure is having problems
- SSL certificate operations fail mid-way
The gateway keeps running with its last good config, but gets stuck unable to accept new changes.
TL;DR
# Check state
az network application-gateway show --name YOUR_APPGW --resource-group YOUR_RG --query "provisioningState" -o tsv
# Fix it (no downtime)
az network application-gateway update --name YOUR_APPGW --resource-group YOUR_RG --set tags.reset="reset-$(date +%s)"
Credit to this Microsoft Q&A thread for the PowerShell equivalent that pointed me in the right direction.
Top comments (0)