Paul Yu for Microsoft Azure

Originally published at paulyu.dev

Progressive Delivery on AKS: A Step-by-Step Guide using Flagger with Istio and FluxCD

In my previous post, we set up an Azure Kubernetes Service (AKS) cluster to automatically update images based on new image tags in a container registry. As soon as a new image was pushed to the registry, it was immediately rolled out to the cluster.

But what if you don't want an agent automatically pushing out new images without some sort of testing? 🤔

In this article, we'll build upon Flux's image update automation capability and add Flagger to implement a canary release strategy.

Flagger is a progressive delivery tool that enables a Kubernetes operator to automate the promotion or rollback of deployments based on metrics analysis. It supports a variety of metrics including Prometheus, Datadog, and New Relic to name a few. It also works well with Istio service mesh, and can implement progressive traffic splitting between primary and canary releases.
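To make the traffic splitting concrete, here is a rough sketch (illustrative only, nothing you need to apply) of the kind of weighted Istio VirtualService routing Flagger manages on your behalf mid-analysis. The store-front-primary and store-front-canary hosts are the services Flagger generates, which we'll see later in this post.

# illustrative sketch: Flagger creates and adjusts these weights for you
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: store-front
spec:
  hosts:
    - "*"
  gateways:
    - store-front
  http:
    - route:
        - destination:
            host: store-front-primary
          weight: 90 # most traffic stays on the current release
        - destination:
            host: store-front-canary
          weight: 10 # a small slice goes to the canary under test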

The goal here is to harness the power of image update automation while implementing some sort of gating process around it.

Here is the intended workflow:

  1. Modify application code, then commit and push the change to the repo.
  2. Create a new release in GitHub which kicks off a release workflow to build and push an updated container image to GitHub Container Registry.
  3. FluxCD detects the new image and updates the image tag in a YAML manifest.
  4. FluxCD rolls out the new image to the cluster.
  5. Flagger detects a new deployment revision and starts a canary deployment.
  6. Flagger progressively routes traffic to the new deployment based on metrics.
  7. Flagger promotes the new deployment to production if the metrics are within the threshold.

We'll move quickly through the AKS cluster provisioning, the FluxCD bootstrapping process, and the deployment of the AKS Store Demo sample app.

If you want a closer look at how image update automation is configured using FluxCD, check out my previous post.

Let's go!

Prerequisites

Before you begin, you need to have the following:

  * An Azure subscription
  * Azure CLI
  * kubectl
  * GitHub CLI (gh)
  * Flux CLI
  * jq

Create an AKS cluster and bootstrap FluxCD

Run the following commands to log into Azure and register the AzureServiceMeshPreview feature flag.

az login
az feature register --namespace "Microsoft.ContainerService" --name "AzureServiceMeshPreview"
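Feature registration is asynchronous and can take several minutes. You can poll its state and then refresh the resource provider with the following standard commands (not specific to this feature):

# check the registration state until it reports "Registered"
az feature show --namespace "Microsoft.ContainerService" --name "AzureServiceMeshPreview" --query properties.state -o tsv

# once registered, propagate the change to the resource provider
az provider register --namespace "Microsoft.ContainerService"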

Next, run the following commands to set up some variables for your deployment.

RG_NAME=rg-flagger
AKS_NAME=aks-flagger
LOC_NAME=westus2

We'll deploy an AKS cluster with the Istio service mesh add-on enabled. If you are unfamiliar with service mesh in general, check out the Istio documentation and my previous post on Service Mesh Considerations for more information.

If you still have the AKS cluster from the previous post, you might want to delete it and start fresh.

Run the following commands to create the resource group and AKS cluster with the Istio add-on enabled.

az group create -n $RG_NAME -l $LOC_NAME
az aks create -n $AKS_NAME -g $RG_NAME --enable-azure-service-mesh --generate-ssh-keys -s Standard_B4s_v2

Istio offers internal and external ingress capabilities, which control traffic coming into the cluster. We'll use the external ingress as our entry point for the sample app.

Run the following command to enable the external ingress gateway.

az aks mesh enable-ingress-gateway \
  -n $AKS_NAME \
  -g $RG_NAME \
  --ingress-gateway-type external

After the cluster and Istio external ingress gateway are deployed, run the following command to connect to the cluster.

az aks get-credentials -n $AKS_NAME -g $RG_NAME
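If you want to confirm the managed Istio control plane is up before moving on, the add-on runs istiod in the aks-istio-system namespace, so a quick check looks like this:

# istiod pods for the managed add-on live in aks-istio-system
kubectl get pods -n aks-istio-system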

Let's move on to bootstrapping FluxCD and deploying the AKS Store Demo app.

Bootstrap cluster using FluxCD

We'll use the GitHub CLI to work with GitHub and the Flux CLI to generate new Flux manifests, so be sure you have both tools installed.
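If you don't have them yet, one way to install both (assuming Homebrew on macOS or Linux) is:

# GitHub CLI
brew install gh

# Flux CLI from the official tap
brew install fluxcd/tap/flux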

Connect to GitHub using the GitHub CLI

Run the following command to log into GitHub.

gh auth login --scopes repo,workflow

Fork and clone the AKS Store Demo repo

If you're continuing from the previous post, you should already have the AKS Store Demo repo forked and cloned. If not, run the following commands to fork and clone the repo.

gh repo fork https://github.com/azure-samples/aks-store-demo.git --clone
cd aks-store-demo
gh repo set-default

Create a release workflow

If you're continuing from the previous post, you should already have a release workflow in the AKS Store Demo repo. If not, make sure you are in the root of the aks-store-demo repository and run the following commands to create a release workflow.

# download the release workflow
wget -O .github/workflows/release-store-front.yaml https://raw.githubusercontent.com/pauldotyu/aks-store-demo/main/.github/workflows/release-store-front.yaml

# download the TopNav.vue file which we'll be modifying
wget -O src/store-front/src/components/TopNav.vue https://raw.githubusercontent.com/pauldotyu/aks-store-demo/main/src/store-front/src/components/TopNav.vue

# commit and push
git add -A
git commit -m "feat: add release workflow"
git push

# back out to the previous directory
cd -

Fork and clone the AKS Store Demo Manifests repo

The great thing about Flux is that it can be bootstrapped using GitOps. So we'll point to a branch in my AKS Store Demo Manifests repo which has everything we need to get the cluster set up quickly.

If you're continuing from the previous post, you should already have the AKS Store Demo Manifests repo forked and cloned. If not, run the following commands to fork and clone it.

gh repo fork https://github.com/pauldotyu/aks-store-demo-manifests.git --clone
cd aks-store-demo-manifests
gh repo set-default

I have updated the manifests in the istio branch to include Istio resources. We'll use this branch to bootstrap the cluster.

git fetch
git checkout --track origin/istio

# get the latest from upstream
git fetch upstream istio
git rebase upstream/istio

Secrets for FluxCD Image Update Automation

As mentioned in my previous post, we'll need to create a Flux secret to allow Flux to write to our GitHub repo.

Run the following command to create the namespace that the Kubernetes secret will be placed in.

kubectl create namespace flux-system

Run the following commands to set your GitHub info.

# make sure you are in the aks-store-demo-manifests repo
export GITHUB_USER=$(gh api user --jq .login)
export GITHUB_TOKEN=$(gh auth token)
export GITHUB_REPO_URL=$(gh repo view --json url | jq .url -r)

Run the following command to create the secret.

flux create secret git aks-store-demo \
  --url=$GITHUB_REPO_URL \
  --username=$GITHUB_USER \
  --password=$GITHUB_TOKEN
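The Flux CLI defaults to the flux-system namespace, so the secret named aks-store-demo lands in the namespace we just created. You can verify it with:

kubectl get secret aks-store-demo -n flux-system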

Update the GitRepository URL in a couple of Flux manifests to point to your repo.

sed "s/pauldotyu/${GITHUB_USER}/g" clusters/dev/flux-system/gotk-sync.yaml > tmp && mv tmp clusters/dev/flux-system/gotk-sync.yaml
sed "s/pauldotyu/${GITHUB_USER}/g" clusters/dev/aks-store-demo-source.yaml > tmp && mv tmp clusters/dev/aks-store-demo-source.yaml
sed "s/pauldotyu/${GITHUB_USER}/g" clusters/dev/aks-store-demo-store-front-image.yaml > tmp && mv tmp clusters/dev/aks-store-demo-store-front-image.yaml

git add -A
git commit -m 'feat: update git sync url'
git push

We can now bootstrap our cluster with FluxCD using the manifests in the istio branch of the AKS Store Demo Manifests repo.

flux bootstrap github \
  --owner=$GITHUB_USER \
  --repository=aks-store-demo-manifests \
  --personal \
  --path=./clusters/dev \
  --branch=istio \
  --reconcile \
  --network-policy \
  --components-extra=image-reflector-controller,image-automation-controller
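Before tailing logs, you can sanity-check that the Flux controllers, including the two image automation components we requested, are healthy:

# verifies prerequisites and controller health
flux check

# the image-reflector and image-automation controllers should be listed
kubectl get pods -n flux-system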

After a minute or two, run the following command to watch the bootstrap process.

flux logs --kind=Kustomization --name=aks-store-demo -f
# press ctrl-c to exit

Once the Kustomization reconciliation process is complete, run the following command to retrieve the public IP address of the Istio ingress gateway.

echo "http://$(kubectl get svc -n aks-istio-ingress aks-istio-ingressgateway-external -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"

You should see the AKS Store Demo app running in your browser.

Install Flagger

Time to install Flagger, the GitOps way!

We'll use the Flux CLI to generate the Flagger manifests and commit them to our repo.

Run the following command to generate a HelmRepository for Flagger's Helm chart.

flux create source helm flagger \
  --url=oci://ghcr.io/fluxcd/charts \
  --export > ./clusters/dev/flagger-source.yaml

Run the following command to create a values.yaml file which will be used to configure Flagger.

cat <<EOF > values.yaml
meshProvider: istio
prometheus:
  install: true
EOF

Here, we are telling Flagger to use Istio as the service mesh provider and to install Prometheus to collect metrics.

Next we need to create a HelmRelease resource to install Flagger and pass in the values.yaml file we just created to configure Flagger.

flux create helmrelease flagger \
  --target-namespace=flagger-system \
  --create-target-namespace=true \
  --crds CreateReplace \
  --source=HelmRepository/flagger \
  --chart=flagger \
  --values=values.yaml \
  --export > ./clusters/dev/flagger-helmrelease.yaml
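For reference, the exported file should look roughly like the following sketch (abbreviated, and the apiVersion may differ depending on your Flux version). Note that the contents of values.yaml are inlined into the spec, which is why the file is safe to delete next.

# abbreviated sketch of the exported HelmRelease
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: flagger
  namespace: flux-system
spec:
  targetNamespace: flagger-system
  chart:
    spec:
      chart: flagger
      sourceRef:
        kind: HelmRepository
        name: flagger
  values:
    meshProvider: istio
    prometheus:
      install: true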

You don't need the values.yaml file anymore, so run the following command to delete it.

rm values.yaml

Flagger can also run load tests against your application to generate metrics. We'll use its load testing service to drive traffic to the application during canary analysis.

Flagger's load testing service can be installed via a Kustomization resource based on manifests packaged as an artifact in an Open Container Initiative (OCI) registry.

Run the following command to create an OCIRepository pointing to an OCI registry.

flux create source oci flagger-loadtester \
  --url=oci://ghcr.io/fluxcd/flagger-manifests \
  --tag-semver=1.x \
  --export > ./clusters/dev/flagger-loadtester-source.yaml

Run the following command to create a Kustomization resource for the installation manifests.

flux create kustomization flagger-loadtester \
  --target-namespace=dev \
  --prune=true \
  --interval=6h \
  --wait=true \
  --timeout=5m \
  --path=./tester \
  --source=OCIRepository/flagger-loadtester \
  --export > ./clusters/dev/flagger-loadtester-kustomization.yaml

We're ready to commit our changes to our repo.

# pull the latest changes from the repo
git pull

# add the new files and commit the changes
git add -A
git commit -m 'feat: add flagger'
git push

This will trigger a FluxCD reconciliation and install Flagger in our cluster.

After a minute or two, run any of the following commands to see the status of the new resources.

flux get source helm
flux get source chart
flux get source oci
flux get helmrelease
flux get kustomization

Confirm that Flagger is installed and running.

kubectl get deploy -n flagger-system
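The load tester lands in the dev namespace. Assuming the upstream manifests keep the conventional flagger-loadtester deployment name, you can check on it too:

kubectl get deploy -n dev flagger-loadtester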

Deploy a Canary

With Flagger installed, a Canary Custom Resource Definition (CRD) is available to us.

A Canary resource automates the creation of several Kubernetes resources for us: it creates the Service, the Istio VirtualService, and the canary Deployment, so we don't need to create these resources ourselves.

Open the ./base/store-front.yaml manifest file using your favorite editor and remove the Service resource.

The Service resource looks like this.

---
apiVersion: v1
kind: Service
metadata:
  name: store-front
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 80
      targetPort: 8080
  selector:
    app: store-front

Next, remove the VirtualService resource.

The VirtualService resource looks like this.

---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: store-front
spec:
  hosts:
    - "*"
  gateways:
    - store-front
  http:
    - route:
        - destination:
            host: store-front
            port:
              number: 80

Finally, add the following Canary resource to the end of the manifest file.

---
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: store-front
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: store-front
  progressDeadlineSeconds: 60
  service:
    port: 80
    targetPort: 8080 # make sure this matches container port
    portDiscovery: true
    hosts:
      - "*"
    gateways:
      - store-front
  analysis:
    interval: 1m
    threshold: 10
    maxWeight: 20
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 30s
      - name: error-rate
        thresholdRange:
          max: 10
        interval: 30s
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.dev/
        timeout: 60s
        metadata:
          type: bash
          cmd: "curl -s http://store-front-canary.dev"
      - name: load-test
        url: http://flagger-loadtester.dev/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://store-front-canary.dev"

There's a lot going on here, but essentially we are telling Flagger to create a canary deployment for our store-front app. We are also telling it to route traffic through the Istio ingress gateway, and to use the load testing service to generate load against the app so Flagger has metrics to analyze when deciding whether the canary deployment should be promoted to production.

Note the threshold, maxWeight, and stepWeight values in the Canary manifest. Traffic starts with 10% directed to the canary and increases in 10% steps. Once analysis succeeds at 20%, Flagger promotes the canary to primary, which then receives 100% of the traffic. A maxWeight of 20% is intentionally low here to speed up the testing process; normally you would set it closer to 100.

Commit and push the changes to your repo.

git add ./base/store-front.yaml
git commit -m 'feat: add store-front canary'
git push

Get Flux to reconcile the changes.

flux reconcile kustomization aks-store-demo --with-source

After Flux has finished reconciling, wait a minute or two, then run the following command to watch the canary deployment until its status reads Initialized.

kubectl get canary -n dev store-front -w
# press ctrl-c to exit
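The watch output will look something like this (columns can vary slightly by Flagger version):

NAME          STATUS        WEIGHT   LASTTRANSITIONTIME
store-front   Initializing  0        <timestamp>
store-front   Initialized   0        <timestamp>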

You can view Flagger logs with the following command:

kubectl logs -n flagger-system deployment/flagger-system-flagger

Run the following command to view important resources.

kubectl get service,destinationrule,virtualservice -n dev

Flagger has created two Service resources (one for the canary and one for the primary), two DestinationRule resources (one routing traffic to the canary and one to the primary), and one VirtualService resource that handles traffic shifting between the two services. Right now, the primary service is weighted to receive 100% of the traffic, which you can confirm by running the following command.

kubectl get virtualservice -n dev store-front -o yaml
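In the output, the route weights confirm the 100/0 split. The relevant section looks roughly like this (abbreviated):

# only the http route section shown
http:
  - route:
      - destination:
          host: store-front-primary
        weight: 100
      - destination:
          host: store-front-canary
        weight: 0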

With all this in place, ensure you can still access the AKS Store Demo app in your browser.

Test the Canary

Now we can test the progressive deployment of our canary.

Flip back to the aks-store-demo repo so that we can make another change to the TopNav.vue file.

Run the following commands to update the version number in the TopNav.vue file from 1.0.0 to 2.0.0.

# set the version number
PREVIOUS_VERSION=1.0.0
CURRENT_VERSION=2.0.0

# make sure you are in the aks-store-demo directory
sed "s/Azure Pet Supplies v${PREVIOUS_VERSION}/Azure Pet Supplies v${CURRENT_VERSION}/g" src/store-front/src/components/TopNav.vue > TempTopNav.vue
mv TempTopNav.vue src/store-front/src/components/TopNav.vue

Commit and push the changes to your repo.

git add -A
git commit -m "feat: update title again"
git push

Create a new release in GitHub and watch the magic happen!

gh release create $CURRENT_VERSION --generate-notes

Wait a few seconds then run the following command to watch the release build.

gh run watch

With the new image built, the Flux ImagePolicy resource will reconcile, detect the new image tag, and trigger the ImageUpdateAutomation resource's reconciliation process.

The new image tag will be written to the kustomization.yaml manifest, and the sample app's Kustomization resource will reconcile and update its Deployment.

Here is where Flagger picks up the baton. Flagger will detect a new deployment revision and trigger a canary deployment. It will progressively route traffic to the new deployment based on the metrics we defined in the Canary resource. If the metrics are within the threshold, Flagger will promote the new deployment as the primary.

You can run the following commands to watch the image update process.

# watch image policy
flux logs --kind=ImagePolicy --name=store-front -f

# watch kustomization
flux logs --kind=Kustomization --name=aks-store-demo -f

# confirm the image tag was updated
kubectl get deploy -n dev store-front -o yaml | grep image:

You can then run the following command to watch the canary deployment.

kubectl logs -n flagger-system deployment/flagger-system-flagger -f | jq .msg

By the end of the Canary deployment process, you should see the following messages.

"New revision detected! Scaling up store-front.dev"
"Starting canary analysis for store-front.dev"
"Pre-rollout check acceptance-test passed"
"Advance store-front.dev canary weight 10"
"Advance store-front.dev canary weight 20"
"Copying store-front.dev template spec to store-front-primary.dev"
"Routing all traffic to primary"
"Promotion completed! Scaling down store-front.dev"

Now you can refresh the AKS Store Demo app in your browser and see the new version of the app 🥳

Conclusion

Image update automation is cool, but it's even cooler when you can implement some sort of gating process around it. Flagger is a great tool to help you implement progressive delivery strategies in your Kubernetes cluster. It works well with Istio, can be configured to use a variety of metrics providers, and will roll back a deployment if it detects an issue. It's a great tool to have in your GitOps tool belt and will help you automate your deployments with confidence.

We've covered a lot of ground when it comes to GitOps and AKS but we have only scratched the surface. So stay tuned for more GitOps goodness!

Continue the conversation

If you have any feedback or suggestions, please feel free to reach out to me on Twitter or LinkedIn.

You can also find me in the Microsoft Open Source Discord, so feel free to DM me or drop a note in the cloud-native channel where my team hangs out!

Peace ✌️
