Horizontal Pod autoscaling based on HTTP requests metric from Istio

Maciej Raszplewicz

After reading this article, you will know how to configure autoscaling based on the average number of HTTP requests per second.

I couldn't find any article describing how to configure horizontal Pod autoscaling based on the HTTP requests metric from Istio. There are a few sources, but all of them are outdated or much more complicated than my solution.

Some time ago, we created a similar autoscaling solution based on metrics from the AWS Load Balancer, which scaled containers deployed on ECS. We first tried CPU-based autoscaling, but it caused a lot of problems because of high CPU usage during application startup. Scaling based on the number of HTTP requests worked much better. However, in the Kubernetes world, things are completely different…

All source code is available here: https://github.com/devopsbox-io/example-istio-hpa


I used several tools in this article, although you could probably replace some of them. Use a fairly recent version of Istio (I tested with 1.7.2): older versions probably do not expose the istio_requests_total metric per Pod.
List of tools:

  • Minikube (tested with v1.10.1, Kubernetes v1.17.12)
  • KVM (required by the Minikube kvm2 driver)
  • Kubectl (tested with v1.17.12)
  • Helm (tested with v3.2.1)
  • Siege (tested with 4.0.4)
  • Istioctl (tested with 1.7.2)


First of all, we have to start Minikube:

minikube start --driver=kvm2 --kubernetes-version v1.17.12 --memory=8192 --cpus=4 && minikube tunnel

A few things to mention here. I use the kvm2 driver and Kubernetes 1.17, but the solution will probably work with other Kubernetes versions and Minikube drivers (or even other Kubernetes distributions). We need quite a lot of RAM and CPU because we want to test autoscaling. Finally, we have to run minikube tunnel to reach the Istio ingress gateway. It will ask for your sudo password and keep running in the foreground, so you will have to open a new terminal for the remaining commands.

Next, we need to install Istio:

istioctl install -y

We are using the default Istio configuration. Nothing special here.

Then, we will create namespaces and enable Istio automatic sidecar injection on one of them:

kubectl create namespace monitoring
kubectl create namespace dev
kubectl label namespace dev istio-injection=enabled

I won't explain what a sidecar is or how Istio works here; if you want to know more, see the Istio documentation.

Then, we will deploy the sample application (code available here: https://github.com/devopsbox-io/example-istio-hpa/blob/main/sample-app-with-istio.yaml):

kubectl -n dev apply -f sample-app-with-istio.yaml

and wait until the application responds (this may take a few minutes):

export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
until curl -f http://$INGRESS_HOST; do echo "Waiting for the application to start..."; sleep 1; done

This is an almost unmodified httpbin application from the Istio documentation.
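Because istio_requests_total is emitted by the Envoy sidecar, it is worth confirming that the sidecar was actually injected into the httpbin Pod. The helper below is a sketch, not part of the original repository: the function name check_sidecar is hypothetical, and the label selector app=httpbin is an assumption (the Istio httpbin sample uses that label, but check your manifest).

```shell
# Hypothetical helper (not from the original repo): print the container names
# of the httpbin Pods so you can confirm the istio-proxy sidecar was injected.
# Assumes the sample app's Pods carry the label app=httpbin.
check_sidecar() {
  local ns="${1:-dev}"
  kubectl -n "$ns" get pod -l app=httpbin \
    -o jsonpath='{.items[*].spec.containers[*].name}'
}
```

Running check_sidecar dev should list istio-proxy next to the application container. If it doesn't, the Pod was probably created before the dev namespace was labeled and needs to be recreated.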

The solution

For our solution, we need Prometheus to scrape metrics from Istio:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add stable https://charts.helm.sh/stable
helm repo update
helm -n monitoring install prometheus prometheus-community/prometheus

This is the standard Prometheus Helm chart from the prometheus-community repository. We use it because Istio is configured by default to expose metrics in a way this chart's scrape configuration picks up: each Pod with an Istio sidecar gets the following annotations:

  • prometheus.io/path: /stats/prometheus
  • prometheus.io/port: 15020
  • prometheus.io/scrape: true
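A quick way to check that Prometheus is really scraping the sidecars is to query it directly. This is a sketch under assumptions: prom_query and PROM_URL are hypothetical names, and it assumes you have port-forwarded the prometheus-server Service (port 80, the same one used in the adapter configuration) to local port 9090.

```shell
# Hypothetical helper: run a PromQL query against the Prometheus HTTP API.
# Assumes the prometheus-server Service is port-forwarded first, e.g.:
#   kubectl -n monitoring port-forward svc/prometheus-server 9090:80
# PROM_URL defaults to that local port; override it if you forward elsewhere.
prom_query() {
  curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}
```

For example, prom_query 'sum(istio_requests_total{kubernetes_namespace="dev"}) by (kubernetes_pod_name)' should return a non-empty result array once some requests (such as those from the curl loop above) have passed through the sidecar.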

Having Prometheus installed doesn't mean that we can use its metrics for horizontal Pod autoscaling. We need one more component: Prometheus Adapter, installed with a customized configuration file, prometheus-adapter-values.yaml:

prometheus:
  url: http://prometheus-server.monitoring.svc.cluster.local
  port: 80
rules:
  custom:
  - seriesQuery: 'istio_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
    resources:
      overrides:
        kubernetes_namespace: {resource: "namespace"}
        kubernetes_pod_name: {resource: "pod"}
    name:
      matches: "^(.*)_total"
      as: "${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'

Here, we can see our Prometheus instance URL, port, and one custom rule. Let's focus on this rule:

  • seriesQuery is used for metric discovery: it selects the Prometheus series the adapter will expose.
  • resources/overrides maps labels from the metric (kubernetes_namespace, kubernetes_pod_name) to the Kubernetes resources they identify (namespace, pod).
  • name/matches and name/as rename the metric. Since the query turns a counter into a rate, it makes sense to rename istio_requests_total to istio_requests_per_second.
  • metricsQuery is the actual query (strictly speaking, a query template) that the adapter runs against Prometheus when the metric is requested. rate with the [2m] range "calculates the per-second average rate of increase of the time series in the range vector" (from the Prometheus documentation); here, it is the per-second rate of HTTP requests measured over the last 2 minutes, per time series in the range vector. sum(...) by (<<.GroupBy>>) then adds up those per-series rates for each requested object, for example each Pod.
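As a rough worked example of what this rule computes: ignoring Prometheus's extrapolation at the window edges and its handling of counter resets, rate over a 2-minute window is approximately the counter's increase divided by 120 seconds. Here c_s is the counter of one series s and S(p) is the set of istio_requests_total series belonging to Pod p (there is one series per label combination, e.g. per response code). The counter values are made up for illustration.

```latex
\begin{aligned}
\mathrm{rate}_{2\mathrm{m}}(c_s) &\approx \frac{c_s(t) - c_s(t - 120\,\mathrm{s})}{120\,\mathrm{s}} \\
\mathtt{istio\_requests\_per\_second}(p) &= \sum_{s \in S(p)} \mathrm{rate}_{2\mathrm{m}}(c_s) \\
&= \frac{2400 - 1200}{120} + \frac{300 - 180}{120} = 10 + 1 = 11 \ \text{requests/s}
\end{aligned}
```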

Now that we have the adapter configuration, we can deploy it:

helm -n monitoring install prometheus-adapter prometheus-community/prometheus-adapter -f prometheus-adapter-values.yaml
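Before creating the autoscaler, you can check that the adapter actually serves the new metric by calling the custom metrics API, which is what the HPA controller itself reads. The helper name custom_metric is hypothetical; the path follows the custom.metrics.k8s.io/v1beta1 convention for per-Pod metrics in a namespace.

```shell
# Hypothetical helper: read the per-Pod metric straight from the custom
# metrics API served by Prometheus Adapter (the same API the HPA uses).
custom_metric() {
  local ns="${1:-dev}"
  kubectl get --raw \
    "/apis/custom.metrics.k8s.io/v1beta1/namespaces/${ns}/pods/*/istio_requests_per_second"
}
```

custom_metric dev should return a MetricValueList with one item per httpbin Pod. If the call fails, kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 lists the metrics the adapter has discovered, which helps to tell whether the rule matched at all.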

The last step is to create the Horizontal Pod Autoscaler using the following configuration (hpa.yaml):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: httpbin
spec:
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: istio_requests_per_second
      target:
        type: AverageValue
        averageValue: "10"
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: httpbin

Most of the configuration is self-explanatory. scaleTargetRef references our application’s Deployment, and minReplicas and maxReplicas set the boundaries. The most interesting part is metrics: here we tell the autoscaler to use our custom istio_requests_per_second metric (which is calculated per Pod) and to scale out when the average exceeds 10 requests per second per Pod.
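To get a feel for how the autoscaler reacts, it helps to know the calculation described in the Kubernetes documentation: desiredReplicas = ceil[currentReplicas × (currentMetricValue / desiredMetricValue)], where for this Pods metric the current value is the average istio_requests_per_second across the Pods. The sketch below is a simplified approximation, not the controller's code: desired_replicas is a hypothetical helper that models only the formula, the default 10% tolerance, and the minReplicas/maxReplicas clamping, and it ignores unready Pods, missing metrics, and the scale-down stabilization window.

```shell
# Hypothetical helper: approximate the HPA replica calculation for one metric.
# Usage: desired_replicas CURRENT_REPLICAS AVG_METRIC TARGET MIN MAX
# Simplified: ignores unready Pods, missing metrics and scale-down stabilization.
desired_replicas() {
  awk -v cur="$1" -v avg="$2" -v target="$3" -v min_r="$4" -v max_r="$5" 'BEGIN {
    ratio = avg / target
    # Default tolerance is 0.1: no change if the ratio is within 10% of 1.0
    if (ratio >= 0.9 && ratio <= 1.1) { print cur; exit }
    raw = cur * ratio
    d = int(raw)
    if (d < raw) d = d + 1        # ceil() for positive values
    if (d < min_r) d = min_r      # clamp to minReplicas
    if (d > max_r) d = max_r      # clamp to maxReplicas
    print d
  }'
}
```

For example, desired_replicas 1 25 10 1 5 prints 3: a single Pod averaging 25 requests per second against the target of 10 gives ceil(2.5) = 3 replicas, while an average of 80 would be capped at maxReplicas, 5.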

One important point: when other articles on this topic were written, the istio_requests_total metric probably wasn't available per Pod. Now it is, which makes things much easier!

Now let’s create the Horizontal Pod Autoscaler:

kubectl -n dev apply -f hpa.yaml

and wait until the metric becomes available (this may take a few minutes):

until kubectl -n dev describe hpa | grep "\"istio_requests_per_second\" on pods:" | grep -v "<unknown> / 10"; do echo "Waiting for the metric availability..."; sleep 1; done

We have our autoscaling up and running. Let’s test it!


First of all, we can open two new terminal windows and watch what is happening (run each command in a separate window):

watch kubectl -n dev get pod
watch kubectl -n dev describe hpa httpbin

Then, let’s start testing:

export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
siege -c 2 -t 5m http://$INGRESS_HOST

You can use tools other than siege (e.g. hey), but the tool must support HTTP/1.1, so ab (Apache Bench), which sends HTTP/1.0 requests, is not the right choice.
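If you prefer hey, the equivalent of the siege run looks roughly like the sketch below; the helper names load_with_hey and check_http10 are hypothetical. The second helper sends a single HTTP/1.0 request with curl, which is a quick way to see why ab is a poor fit: Envoy, which backs the Istio ingress gateway, rejects HTTP/1.0 by default and typically answers with 426 Upgrade Required.

```shell
# Hypothetical helper: the same load test using hey instead of siege.
# $1 = URL, $2 = concurrent workers (default 2), $3 = duration (default 5m)
load_with_hey() {
  hey -c "${2:-2}" -z "${3:-5m}" "$1"
}

# Hypothetical helper: print only the HTTP status code of one HTTP/1.0 request.
check_http10() {
  curl --http1.0 -s -o /dev/null -w '%{http_code}\n' "$1"
}
```

load_with_hey "http://$INGRESS_HOST" runs 2 workers for 5 minutes, mirroring siege -c 2 -t 5m.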

After a few minutes, you should see more Pods running, and the describe hpa output should show the current number of requests per second.


It is not that hard to create an autoscaling solution based on the HTTP requests per second metric if you know what you are doing. It should also be quite simple to switch it to another Prometheus metric. But should you really do it yourself? Our DevOpsBox platform already has full autoscaling built in, with reasonable defaults!

For more details about the DevOpsBox platform please visit https://www.devopsbox.io/

