Scalability is a key requirement for cloud native applications. With Kubernetes, scaling your application is as simple as increasing the number of replicas for the corresponding Deployment or ReplicaSet - but this is a manual process. Kubernetes makes it possible to automatically scale your applications (i.e. the Pods in a Deployment or ReplicaSet) in a declarative manner using the Horizontal Pod Autoscaler specification. By default, CPU utilization (Resource metrics) is supported as the criterion for auto-scaling, but it is also possible to integrate custom as well as externally provided metrics.
This blog will demonstrate how you can use external metrics to auto-scale a Kubernetes application. As an example, we will use HTTP access request metrics exposed via Prometheus. Instead of using the Horizontal Pod Autoscaler directly, we will leverage Kubernetes Event Driven Autoscaling aka KEDA - an open source Kubernetes operator that integrates natively with the Horizontal Pod Autoscaler to provide fine-grained autoscaling (including to/from zero) for event-driven workloads.
The code is available on GitHub
I would love to have your feedback and suggestions! Feel free to tweet or drop a comment 😃
Overview
Here is a summary of how things work end to end - each of these steps will be discussed in detail in this section:
- The application exposes HTTP access count metrics in Prometheus format
- Prometheus is configured to scrape those metrics
- Prometheus scaler in KEDA is configured and deployed to auto-scale the app based on the HTTP access count metrics
KEDA and Prometheus
Prometheus is an open-source systems monitoring and alerting toolkit that is a part of the Cloud Native Computing Foundation. Prometheus scrapes metrics from various sources and stores them as time-series data; tools like Grafana or other API consumers can be used to visualize the collected data.
KEDA supports the concept of Scalers, which act as a bridge between KEDA and an external system. A Scaler implementation is specific to a target system and fetches relevant data from it, which is then used by KEDA to help drive auto-scaling. There is support for multiple scalers (Kafka, Redis, etc.), including Prometheus. This means that you can leverage KEDA to auto-scale your Kubernetes Deployments using Prometheus metrics as the criteria.
Sample application
The example Golang app exposes an HTTP endpoint and does two important things:
- Uses the Prometheus Go client library to instrument the app and expose the http_requests metric, backed by a Counter. The Prometheus metrics endpoint is available at /metrics.
var httpRequestsCounter = promauto.NewCounter(prometheus.CounterOpts{
    Name: "http_requests",
    Help: "number of http requests",
})
- In response to a GET request, it also increments a key (access_count) in Redis - this is a simple way of getting some "work done" as part of the HTTP handler, and it also helps to validate the Prometheus metrics (the count should be the same as the value of the access_count key in Redis)
func main() {
    // Prometheus metrics endpoint
    http.Handle("/metrics", promhttp.Handler())

    http.HandleFunc("/test", func(w http.ResponseWriter, r *http.Request) {
        // increment the http_requests Prometheus counter once the handler returns
        defer httpRequestsCounter.Inc()

        // increment the access_count key in Redis ("client" and "redisCounterName"
        // are initialized elsewhere in the app)
        count, err := client.Incr(redisCounterName).Result()
        if err != nil {
            fmt.Println("Unable to increment redis counter", err)
            os.Exit(1)
        }

        resp := "Accessed on " + time.Now().String() + "\nAccess count " + strconv.Itoa(int(count))
        w.Write([]byte(resp))
    })

    http.ListenAndServe(":8080", nil)
}
The application is deployed to Kubernetes as a Deployment, and a ClusterIP service is also created to allow the Prometheus server to scrape the app's /metrics endpoint.
Here is the manifest for the application
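The full manifest (go-app.yaml) is in the GitHub repo. As a rough sketch, it contains something along these lines - the container image name below is a placeholder, so use the image referenced in the repo:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-prom-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: go-prom-app
  template:
    metadata:
      labels:
        app: go-prom-app
    spec:
      containers:
        - name: go-prom-app
          image: <your-registry>/go-prom-app:latest # placeholder - use the image from the repo
          ports:
            - containerPort: 8080
          # (environment variables for the Redis connection are omitted here)
---
apiVersion: v1
kind: Service
metadata:
  name: go-prom-app-service
  labels:
    run: go-prom-app-service # the Prometheus service discovery config matches on this label
spec:
  type: ClusterIP
  selector:
    app: go-prom-app
  ports:
    - port: 8080
      targetPort: 8080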
Prometheus server
The Prometheus deployment manifest consists of:
- ConfigMap to capture Prometheus configuration
- Deployment for the Prometheus server itself
- ClusterIP service to access the Prometheus UI
- ClusterRole, ClusterRoleBinding and ServiceAccount to allow Kubernetes service discovery to work

Here is the manifest for the Prometheus setup
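The service-discovery piece depends on RBAC being in place. As a rough sketch (assuming the default ServiceAccount in the default namespace, which is consistent with the apply output shown later), the RBAC portion looks something like this:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: default
    namespace: default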
KEDA Prometheus ScaledObject
As explained previously, a Scaler implementation acts as a bridge between KEDA and the external system from which metrics need to be fetched. ScaledObject is a custom resource that needs to be deployed in order to sync a Deployment with an event source (Prometheus in this case). It contains information on which Deployment to scale, metadata about the event source (e.g. connection string secret, queue name), polling interval, cooldown period, etc. The ScaledObject results in a corresponding autoscaling resource (an HPA definition) being created to scale the Deployment.
When a ScaledObject gets deleted, the corresponding HPA definition is cleaned up.
Here is the ScaledObject definition for our example, which uses the Prometheus scaler:
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: prometheus-scaledobject
  namespace: default
  labels:
    deploymentName: go-prom-app
spec:
  scaleTargetRef:
    deploymentName: go-prom-app
  pollingInterval: 15
  cooldownPeriod: 30
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-service.default.svc.cluster.local:9090
        metricName: access_frequency
        threshold: '3'
        query: sum(rate(http_requests[2m]))
Notice the following:
- It targets a Deployment named go-prom-app
- The trigger type is prometheus. The Prometheus serverAddress is specified along with the metricName, threshold and the PromQL query (sum(rate(http_requests[2m]))) to be used
- As per pollingInterval, KEDA will poll the Prometheus target every fifteen seconds. A minimum of one Pod will be maintained (minReplicaCount) and the maximum number of Pods will not exceed maxReplicaCount (ten in this example)

It is possible to set minReplicaCount to zero. In this case, KEDA will "activate" the deployment from zero to one and then leave it to the HPA to auto-scale it further (the same happens the other way around, i.e. scale in from one to zero). We haven't chosen zero here since this is an HTTP service and not an on-demand system such as a message queue/topic consumer.
Magic behind auto-scale
The threshold count acts as the trigger to scale the Deployment. In this example, the PromQL query sum(rate(http_requests[2m])) returns the aggregated per-second rate of HTTP requests, as measured over the last two minutes. Since the threshold count is three, there will be one Pod as long as the value of sum(rate(http_requests[2m])) stays below three. If it goes up, an additional Pod is added roughly every time the value increases by another three - for example, if the value is 12, the Deployment will be scaled out to 4 Pods.
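A quick back-of-the-envelope sketch of that calculation (this assumes the HPA's usual AverageValue rounding and is only an illustration, not KEDA's exact code path):

desiredReplicas = ceil( sum(rate(http_requests[2m])) / threshold )

ceil(1.7 / 3) = 1 Pod
ceil(7 / 3)   = 3 Pods
ceil(12 / 3)  = 4 Pods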
Ok it's time to try it hands-on!
Pre-requisites
All you need is a Kubernetes cluster and kubectl
Kubernetes cluster - This example uses minikube, but feel free to use any other. You can install it using this guide.
To install the latest version on Mac:
curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-darwin-amd64 \
  && chmod +x minikube
sudo mkdir -p /usr/local/bin/
sudo install minikube /usr/local/bin/
Please install kubectl to access your Kubernetes cluster.
To install the latest version on Mac:
curl -LO "https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/darwin/amd64/kubectl"
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
kubectl version
Setup
Install KEDA
You can deploy KEDA in multiple ways as per the documentation. I am simply using a single monolithic YAML file to get the job done:
kubectl apply -f https://raw.githubusercontent.com/kedacore/keda/master/deploy/KedaScaleController.yaml
KEDA and its components are installed in the keda namespace.
To confirm,
kubectl get pods -n keda
Wait for the KEDA operator Pod to start (Running state) before you proceed.
Set up Redis using Helm
If you don't have Helm installed, simply use this guide. On a Mac, you can get this done quickly using:
brew install kubernetes-helm
helm init --history-max 200
helm init initializes the local CLI and also installs Tiller into your Kubernetes cluster.
kubectl get pods -n kube-system | grep tiller
Wait for the Tiller Pod to switch to the Running state.
Once Helm is set up, getting a Redis server is as simple as running:
helm install --name redis-server --set cluster.enabled=false --set usePassword=false stable/redis
To confirm whether Redis is ready:
kubectl get pods/redis-server-master-0
Wait for the Redis server Pod to start (Running state) before you proceed.
Deploy the application
To deploy:
kubectl apply -f go-app.yaml
//output
deployment.apps/go-prom-app created
service/go-prom-app-service created
Confirm whether it's running:
kubectl get pods -l=app=go-prom-app
Wait for the application Pod to start (Running state) before you proceed.
Deploy Prometheus server
The Prometheus manifest uses Kubernetes Service Discovery for Prometheus to dynamically detect application Pods based on the service label:
kubernetes_sd_configs:
  - role: service
relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_run]
    regex: go-prom-app-service
    action: keep
To deploy:
kubectl apply -f prometheus.yaml
//output
clusterrole.rbac.authorization.k8s.io/prometheus created
serviceaccount/default configured
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
configmap/prom-conf created
deployment.extensions/prometheus-deployment created
service/prometheus-service created
Confirm whether it's running:
kubectl get pods -l=app=prometheus-server
Wait for the Prometheus server Pod to start (Running state) before you proceed.
Use kubectl port-forward to access the Prometheus UI (and its API) at http://localhost:9090
kubectl port-forward service/prometheus-service 9090
Deploy the KEDA auto-scale config
You need to create the ScaledObject
kubectl apply -f keda-prometheus-scaledobject.yaml
Check KEDA operator logs
KEDA_POD_NAME=$(kubectl get pods -n keda -o=jsonpath='{.items[0].metadata.name}')
kubectl logs $KEDA_POD_NAME -n keda
You should see:
time="2019-10-15T09:38:28Z" level=info msg="Watching ScaledObject: default/prometheus-scaledobject"
time="2019-10-15T09:38:28Z" level=info msg="Created HPA with namespace default and name keda-hpa-go-prom-app"
Check the application Pod - there should be one instance running, since minReplicaCount was 1
kubectl get pods -l=app=go-prom-app
Confirm that the HPA resource was created as well
kubectl get hpa
You should see something like this:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
keda-hpa-go-prom-app Deployment/go-prom-app 0/3 (avg) 1 10 1 45s
Autoscaling in action...
Sanity test: Access the app
To access the REST endpoint for our app, simply run:
kubectl port-forward service/go-prom-app-service 8080
You should now be able to access the Go app using http://localhost:8080
To access the endpoint:
curl http://localhost:8080/test
You should see a response similar to:
Accessed on 2019-10-21 11:29:10.560385986 +0000 UTC m=+406004.817901246
Access count 1
At this point, check Redis as well. You will see that the access_count key has been incremented to 1:
kubectl exec -it redis-server-master-0 -- redis-cli get access_count
//output
"1"
Confirm that the http_requests metric count is also the same:
curl http://localhost:8080/metrics | grep http_requests
//output
# HELP http_requests number of http requests
# TYPE http_requests counter
http_requests 1
Generate load
We will use hey, a utility program, to generate the load:
curl -o hey https://storage.googleapis.com/hey-release/hey_darwin_amd64 && chmod a+x hey
Invoke it as follows:
./hey http://localhost:8080/test
By default, the utility sends 200 requests. You should be able to confirm this using the Prometheus metrics as well as Redis:
curl http://localhost:8080/metrics | grep http_requests
//output
# HELP http_requests number of http requests
# TYPE http_requests counter
http_requests 201
kubectl exec -it redis-server-master-0 -- redis-cli get access_count
//output
201
Confirm the actual metric (returned by the PromQL query)
curl -g 'http://localhost:9090/api/v1/query?query=sum(rate(http_requests[2m]))'
//output
{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1571734214.228,"1.686057971014493"]}]}}
In this case, the actual result is 1.686057971014493 (in value). This is not enough to trigger a scale-out, since the threshold we have set is 3.
Moar load!
In a new terminal, keep track of the application Pods:
kubectl get pods -l=app=go-prom-app -w
Let's simulate a heavy load with:
./hey -n 2000 http://localhost:8080/test
After some time, you will see that the Deployment gets scaled out by the HPA and new Pods are spun up.
Check the HPA to confirm the same,
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
keda-hpa-go-prom-app Deployment/go-prom-app 1830m/3 (avg) 1 10 6 4m22s
If the load is not sustained, the Deployment will be scaled back down to the point where only a single Pod is running.
You can also check the actual metric (returned by the PromQL query) using the same curl command as before and watch it change along with the load.
To clean up
//Delete KEDA
kubectl delete namespace keda
//Delete the app, Prometheus server and KEDA scaled object
kubectl delete -f .
//Delete Redis
helm del --purge redis-server
Conclusion
KEDA allows you to auto-scale your Kubernetes Deployments (to/from zero) based on data from external metrics such as Prometheus metrics, queue length in Redis, consumer lag of a Kafka topic, etc. It does all the heavy lifting of integrating with the external source, as well as exposing its metrics via a metrics server, so the Horizontal Pod Autoscaler can weave its magic!
That's all for this blog. Also, if you found this article useful, please like and follow 😃😃