<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aleksi Waldén</title>
    <description>The latest articles on DEV Community by Aleksi Waldén (@aleksiwalden).</description>
    <link>https://dev.to/aleksiwalden</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1159289%2Fa1070cef-d8b7-4815-b659-50ab4233eb25.png</url>
      <title>DEV Community: Aleksi Waldén</title>
      <link>https://dev.to/aleksiwalden</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aleksiwalden"/>
    <language>en</language>
    <item>
      <title>Prometheus Observability Platform: Grafana</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:28:01 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-grafana-40d3</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-grafana-40d3</guid>
      <description>&lt;p&gt;Grafana is the industry standard open-source product for visualising metrics stored in a TSDB format, or a variety of other data sources. With Grafana, we can create dashboards, queries, and alerts from the data that we have. With all our metrics in long-term storage, we can use a single data source to access all the metrics from all our infrastructure that uses the metrics platform. This enables easily creating dashboards that aggregate data from multiple different Kubernetes clusters, and enable drilling down to a single resource easily.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Next, we will set up a Grafana instance in our minikube cluster and use Promxy as the default data source. This example assumes that you have completed the following steps, as their components are needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj"&gt;Prometheus Observability Platform: Long-term storage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-handling-multiple-regions-25ib"&gt;Prometheus Observability Platform: Handling multiple regions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;base64&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;First, we add the Grafana Helm chart repository; we will install its chart into the &lt;code&gt;grafana&lt;/code&gt; namespace shortly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm repo add grafana https://grafana.github.io/helm-charts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we define Promxy as the data source. In the Helm values file, we need the following block to do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;datasources.yaml:
  apiVersion: 1
  datasources:
  - name: Promxy
    type: prometheus
    url: "http://promxy.promxy.svc.cluster.local:8082"
    isDefault: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We are using the &lt;code&gt;svc.cluster.local&lt;/code&gt; address for the Promxy service, because all our services are inside the cluster.&lt;/p&gt;
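&lt;p&gt;The conversion from those YAML values to the JSON one-liner that Helm's &lt;code&gt;--set-json&lt;/code&gt; flag expects can also be scripted instead of done by hand. A minimal sketch using Python's standard &lt;code&gt;json&lt;/code&gt; module (the dictionary simply mirrors the values block above):&lt;/p&gt;

```python
import json

# Python mirror of the Helm values block defining Promxy as the
# default Grafana data source.
values = {
    "datasources.yaml": {
        "apiVersion": 1,
        "datasources": [
            {
                "name": "Promxy",
                "type": "prometheus",
                "url": "http://promxy.promxy.svc.cluster.local:8082",
                "isDefault": True,
            }
        ],
    }
}

# Compact separators produce the one-liner form used with --set-json.
one_liner = json.dumps(values, separators=(",", ":"))
print(f"datasources={one_liner}")
```

&lt;p&gt;For real values files, a YAML parser such as PyYAML could load the file first; the dictionary literal here just keeps the sketch dependency-free.&lt;/p&gt;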

&lt;p&gt;I have converted the above into JSON so that it can be passed to Helm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install grafana grafana/grafana --create-namespace --namespace grafana --set-json 'datasources={"datasources.yaml":{"apiVersion":1,"datasources":[{"name":"Promxy","type":"prometheus","url":"http://promxy.promxy.svc.cluster.local:8082","isDefault":true}]}}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we need to retrieve the password for the &lt;code&gt;admin&lt;/code&gt; user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get secret --namespace grafana grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can port-forward the Grafana service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n grafana services/grafana 9090:80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Navigate to &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt; to access the web UI, and log in with the username &lt;code&gt;admin&lt;/code&gt;, and the password acquired in the previous step. From here you can verify that Promxy is set up and acting as the default data source by navigating to Administration -&amp;gt; Data sources -&amp;gt; Promxy, and clicking the Test button at the bottom of the page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsik76cty2tbpisxi2lj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsik76cty2tbpisxi2lj.png" alt="Grafana UI" width="639" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtnmxyfaxxyblg43oerc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtnmxyfaxxyblg43oerc.png" alt="Data source test" width="276" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Assuming the test was successful, we can then navigate to the “Explore” item in the menu&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvcog4aklr3ykw4kblf0y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvcog4aklr3ykw4kblf0y.png" alt="Grafana Explore tab" width="258" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and check that we have metrics available in the “metrics explorer” section. Alternatively, we can use the following query to check that metrics are available:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sum(kube_pod_container_status_restarts_total) by (namespace, container)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;N.B. You might have to widen the query's time range to get results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxf3d89qret3ve8h0sqgt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxf3d89qret3ve8h0sqgt.png" alt="Metrics in Grafana" width="777" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have now set up Grafana as the metrics visualisation tool for our metrics platform. This enables us to create dashboards and Grafana alerts for metrics from all sources sending data to our long-term storage cluster (or clusters, if we have multiple regions), queried through Promxy.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>grafana</category>
      <category>observability</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Application metrics</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:27:31 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-application-metrics-2024</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-application-metrics-2024</guid>
      <description>&lt;p&gt;When creating our own applications, we need to use a metrics library to generate the metrics and then inside our application functions increment said metrics. With Go for example, we can use the Prometheus library. The metrics will then be exposed to the /metrics endpoint. If our application is inside a Kubernetes cluster with a prometheus-operator, we can use a ServiceMonitor to scrape its metrics. If we don’t have such a possibility, we can instead set up an application to send metrics straight to our long-term storage solution. For VictoriaMetrics, we can use the &lt;a href="https://github.com/VictoriaMetrics/metrics"&gt;github.com/VictoriaMetrics/metrics&lt;/a&gt; library to send the metrics to VictoriaMetrics. Remember to add authentication logic into the section pushing the metrics to the long-term storage, if necessary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;This example assumes that you have completed the following steps, as their components are needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's set up a hello-world Golang application in our cluster, and use ServiceMonitor to send its metrics to Prometheus.&lt;/p&gt;

&lt;p&gt;First, we need to update our kube-prometheus-stack Helm deployment to pick up ServiceMonitor resources with a certain label attached. We need to pass the following value to our Helm chart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prometheus:
  prometheusSpec:
    serviceMonitorSelector:
      matchExpressions:
      - key: app
        operator: Exists
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I have converted that into JSON so that it can be passed to Helm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack --namespace prometheus --reuse-values --set-json 'prometheus.prometheusSpec.serviceMonitorSelector={"matchExpressions":[{"key":"app","operator":"Exists"}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will update our kube-prometheus-stack to pick up ServiceMonitor resources from any namespace, as long as they have an &lt;code&gt;app&lt;/code&gt; label attached.&lt;/p&gt;

&lt;p&gt;Next, we are going to create a namespace for our hello-world application, a simple Golang application that exposes metrics via the &lt;a href="https://github.com/prometheus/client_golang/tree/main/prometheus"&gt;Prometheus module&lt;/a&gt;. We will borrow &lt;a href="https://github.com/okteto/go-prometheus-monitoring/blob/master/main.go"&gt;this&lt;/a&gt; ready-made application, which increments a metric called &lt;code&gt;hello_processed_total&lt;/code&gt; each time the page is loaded.&lt;/p&gt;

&lt;p&gt;To create a namespace and a pod, we use the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl create namespace hello-world
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl run hello-world --namespace=hello-world --image='okteto/hello-world:golang-metrics' --labels app=hello-world
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we need to create a service for the new pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;lt;&amp;lt;'EOF' | kubectl create -f -
apiVersion: v1
kind: Service
metadata:
  labels:
    app: hello-world
  name: hello-world
  namespace: hello-world
spec:
  ports:
  - name: http
    port: 8080
  selector:
    app: hello-world
  type: ClusterIP
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can test that our application is working by port-forwarding it. We can also check what the &lt;code&gt;hello_processed_total&lt;/code&gt; metric looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n hello-world services/hello-world 9090:8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now navigate to &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt; and &lt;a href="http://localhost:9090/metrics"&gt;http://localhost:9090/metrics&lt;/a&gt;. You should see a metric called &lt;code&gt;hello_processed_total&lt;/code&gt; with a number attached. Each reload of the page will increment this number.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzbv5q9gyxtv62j2klkuy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzbv5q9gyxtv62j2klkuy.png" alt="Metrics" width="616" height="59"&gt;&lt;/a&gt;&lt;/p&gt;
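&lt;p&gt;Checking the counter can also be scripted instead of reloading the page in a browser. A sketch that fetches the port-forwarded &lt;code&gt;/metrics&lt;/code&gt; endpoint and parses the counter out of the text exposition format (the URL assumes the port-forward above is still running):&lt;/p&gt;

```python
import urllib.request

def parse_counter(exposition_text, name):
    """Return the value of an unlabelled counter from Prometheus text format."""
    for line in exposition_text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        parts = line.split()
        if len(parts) == 2 and parts[0] == name:
            return float(parts[1])
    raise ValueError(f"metric {name!r} not found")

if __name__ == "__main__":
    with urllib.request.urlopen("http://localhost:9090/metrics") as resp:
        text = resp.read().decode()
    print(parse_counter(text, "hello_processed_total"))
```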

&lt;p&gt;Next, we need to set up a ServiceMonitor to send these metrics to Prometheus:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;lt;&amp;lt;'EOF' | kubectl create -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hello-world
  namespace: hello-world
  labels:
    app: hello-world
spec:
  selector:
    matchLabels:
      app: hello-world
  endpoints:
    - port: http
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ServiceMonitor will target services matching the label selector (&lt;code&gt;app=hello-world&lt;/code&gt;) and will scrape the port called “http”.&lt;/p&gt;

&lt;p&gt;Now, if we port-forward our Prometheus service, we should see a new service in the service discovery section:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n prometheus services/kube-prometheus-stack-prometheus 9090:9090
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Navigate to &lt;a href="http://localhost:9090/service-discovery"&gt;http://localhost:9090/service-discovery&lt;/a&gt; and you should see that there is a new service discovered with the name &lt;code&gt;serviceMonitor/hello-world/hello-world/0&lt;/code&gt; and it should show 1/1 active targets.&lt;/p&gt;

&lt;p&gt;We can now query the &lt;code&gt;hello_processed_total&lt;/code&gt; metric:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1yo9jctopwjomapxjl8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1yo9jctopwjomapxjl8.png" alt="Metrics in Prometheus" width="781" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have now achieved sending metrics from our custom app running in its own namespace into Prometheus.&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-grafana-40d3"&gt;Prometheus Observability Platform: Grafana&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      &lt;category&gt;prometheus&lt;/category&gt;
      <category>metrics</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Handling multiple regions</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:27:13 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-handling-multiple-regions-25ib</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-handling-multiple-regions-25ib</guid>
      <description>&lt;p&gt;When we have multiple regions such as the EU and US, we need to have a long-term storage solution running in both of those. If we want to combine the resources into a single query we need to use a query layer that can query both endpoints. One such component we can use is Promxy.&lt;/p&gt;

&lt;p&gt;Promxy uses the same PromQL syntax as Prometheus and we can define server groups with multiple endpoints. In our case, we would define our EU and US long-term storage endpoints under one server group. We can then use the single Promxy endpoint to query both the EU and the US.&lt;/p&gt;
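&lt;p&gt;Because Promxy exposes the standard Prometheus HTTP API, a query spanning both regions is an ordinary &lt;code&gt;/api/v1/query&lt;/code&gt; call against the single Promxy endpoint. A sketch in Python, assuming Promxy has been port-forwarded to &lt;code&gt;localhost:9090&lt;/code&gt; as in the demo below:&lt;/p&gt;

```python
import json
import urllib.parse
import urllib.request

def instant_query_url(base_url, promql):
    """Build a Prometheus-compatible instant query URL."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url}/api/v1/query?{query}"

def run_query(base_url, promql):
    # The response envelope is {"status": ..., "data": {"result": [...]}}.
    with urllib.request.urlopen(instant_query_url(base_url, promql)) as resp:
        payload = json.load(resp)
    return payload["data"]["result"]

if __name__ == "__main__":
    # With the EU and US storage endpoints in one server group, a single
    # query returns series carrying the region label from both.
    for series in run_query("http://localhost:9090", "sum(up) by (region)"):
        print(series["metric"], series["value"])
```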

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;This example assumes that you have completed the following steps, as the components from those are needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj"&gt;Prometheus Observability Platform: Long-term storage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can use the Helm chart offered in the &lt;a href="https://github.com/jacksontj/promxy/tree/master"&gt;Promxy repository&lt;/a&gt; to deploy a proxy to our Kubernetes cluster.&lt;/p&gt;

&lt;p&gt;First we clone the repository, because the Helm chart is not published to a public registry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/jacksontj/promxy.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we navigate to the folder containing the Helm chart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd promxy/deploy/k8s/helm-charts/promxy/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We set up Promxy with the following &lt;code&gt;server_groups:&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;server_groups:
  - static_configs:
    - targets:
      - vmcluster-victoria-metrics-cluster-vmselect.victoriametrics.svc.cluster.local:8481
      labels:
        region: eu
    scheme: http
    path_prefix: /select/0/prometheus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I have converted this to JSON so we can pass it to the Helm chart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"server_groups":[{"static_configs":[{"targets":["vmcluster-victoria-metrics-cluster-vmselect.victoriametrics.svc.cluster.local:8481"],"labels":{"region":"eu"}}],"scheme":"http","path_prefix":"/select/0/prometheus"}]}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To install Promxy from the local Helm chart we use the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install promxy . --create-namespace --namespace promxy --set 'image.tag=latest' --set-json 'config.promxy={"server_groups":[{"static_configs":[{"targets":["vmcluster-victoria-metrics-cluster-vmselect.victoriametrics.svc.cluster.local:8481"],"labels":{"region":"eu"}}],"scheme":"http","path_prefix":"/select/0/prometheus"}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now port-forward the Promxy service and access the web UI at &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n promxy services/promxy 9090:8082
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we can perform the same query for the &lt;code&gt;kube_pod_container_status_restarts_total&lt;/code&gt; metric to verify that Promxy is able to reach the VictoriaMetrics data:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4nhxmxnnfh5mcl0hnku.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4nhxmxnnfh5mcl0hnku.png" alt="Promxy" width="777" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we have more regions than just the EU, such as the US, we can add them under &lt;code&gt;server_groups&lt;/code&gt; and query multiple VictoriaMetrics instances from a single Promxy source.&lt;/p&gt;

&lt;p&gt;We have now set up Promxy with &lt;code&gt;server_groups&lt;/code&gt; for querying VictoriaMetrics instances.&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-application-metrics-2024"&gt;Prometheus Observability Platform: Application metrics&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>promxy</category>
      <category>observability</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Alert routing</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:26:37 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-alert-routing-139o</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-alert-routing-139o</guid>
      <description>&lt;p&gt;Alertmanager is a component usually bundled with Prometheus to handle routing the alerts to receivers such as Slack, e-mail, and PagerDuty. It uses a routing tree to send alerts to one or multiple receivers.&lt;/p&gt;

&lt;p&gt;Routes define which receivers each alert should be sent to. You can define rules for the routes. The rules are evaluated from top to bottom, and alerts are sent to matching receivers. Usually, the match block is used to match the label name and value for a certain receiver. Notification integrations are configured for each receiver. There are multiple different options available, such as &lt;code&gt;email_configs&lt;/code&gt;, &lt;code&gt;slack_configs&lt;/code&gt;, and &lt;code&gt;webhook_configs&lt;/code&gt;.&lt;/p&gt;
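&lt;p&gt;The top-to-bottom evaluation of the routing tree can be sketched in a few lines of Python. This simplified model supports only flat routes and equality matchers; the real Alertmanager adds nested routes, &lt;code&gt;continue&lt;/code&gt; flags, and regex matchers:&lt;/p&gt;

```python
def route_alert(alert_labels, routes, default_receiver):
    """Return the receiver for an alert: the first matching route wins,
    otherwise fall back to the tree's default receiver."""
    for route in routes:
        if all(alert_labels.get(k) == v for k, v in route["match"].items()):
            return route["receiver"]
    return default_receiver

# Mirrors the shape of an Alertmanager routing tree: route rules
# with match blocks, evaluated from top to bottom.
routes = [
    {"match": {"team": "test-team"}, "receiver": "test-team-receiver"},
    {"match": {"severity": "critical"}, "receiver": "pagerduty"},
]

print(route_alert({"team": "test-team"}, routes, "default-receiver"))
# → test-team-receiver
```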

&lt;p&gt;Alertmanager has a web UI that can be used to view current alerts and silence them if needed.&lt;/p&gt;

&lt;p&gt;With a platform setup, we usually don’t want to run multiple Alertmanagers, so we disable the provisioning of additional Alertmanagers in Prometheus deployments that would otherwise include one automatically. Instead, we use a single centralised Alertmanager, for example inside a Kubernetes cluster dedicated to monitoring the platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;This example assumes that you have completed the following steps, as their components are needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj"&gt;Prometheus Observability Platform: Long-term storage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-alerts-4dbb"&gt;Prometheus Observability Platform: Alerts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amtool (&lt;a href="https://github.com/prometheus/alertmanager#install-1"&gt;https://github.com/prometheus/alertmanager#install-1&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that we have an alert defined and deployed to vmalert, we can add Alertmanager to our platform. Because we are building this with a platform aspect in mind, we will install Alertmanager as a separate resource, not as part of the kube-prometheus-stack. We will use a tool called amtool, which is bundled with Alertmanager, to test our alert routing configuration.&lt;/p&gt;

&lt;p&gt;We can install Alertmanager with the following Helm command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install alertmanager prometheus-community/alertmanager --create-namespace --namespace alertmanager
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now port-forward the Alertmanager service to access the Alertmanager web UI at &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n alertmanager services/alertmanager 9090:9093
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To trigger a test alert, we can use the following command from another terminal tab while keeping the port-forwarding on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -H "Content-Type: application/json" -d '[{"labels":{"alertname":"TestAlert"}}]' localhost:9090/api/v1/alerts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now use amtool to list the currently firing alerts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;amtool alert query --alertmanager.url=http://localhost:9090
---
Alertname   Starts At                Summary  State   
TestAlert   2023-07-07 07:23:55 UTC           active 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's add a test receiver and routing for it. Below is an example of the configuration we want to pass to Alertmanager in Helm values format.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;config:
  receivers:
    - name: default-receiver
    - name: test-team-receiver

  route:
    receiver: 'default-receiver'
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 4h
    routes:
      - receiver: 'test-team-receiver'
        matchers:
        - team="test-team"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I have converted the above into a JSON one-liner so we can pass it into Helm without having to create an intermediate file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm upgrade alertmanager prometheus-community/alertmanager --namespace alertmanager --set-json 'config.receivers=[{"name":"default-receiver"},{"name":"test-team-receiver"}]' --set-json 'config.route={"receiver":"default-receiver","group_wait":"30s","group_interval":"5m","repeat_interval":"4h","routes":[{"receiver":"test-team-receiver","matchers":["team=\"test-team\""]}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now use amtool to test that an alert that has the label &lt;code&gt;team=test-team&lt;/code&gt; gets routed to the test-team-receiver:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;amtool config routes test --alertmanager.url=http://localhost:9090 team=test-team
---
test-team-receiver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;amtool config routes test --alertmanager.url=http://localhost:9090 team=test     
---
default-receiver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have now set up an Alertmanager that routes alerts depending on the &lt;code&gt;team&lt;/code&gt; label value.&lt;/p&gt;

&lt;p&gt;Next, we need to update vmalert to route alerts to Alertmanager using the cluster-local address of the Alertmanager service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm upgrade vmalert vm/victoria-metrics-alert --namespace victoriametrics --reuse-values --set server.notifier.alertmanager.url="http://alertmanager.alertmanager.svc.cluster.local:9093"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can increment the &lt;code&gt;kube_pod_container_status_restarts_total&lt;/code&gt; metric by running a pod that keeps crashing: a deliberate typo in the sleep command makes the container exit immediately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl run crashpod --image busybox:latest --command -- slep 1d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we port-forward the Alertmanager service. We should see an alert there when we navigate to &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n alertmanager services/alertmanager 9090:9093
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3vu2jngob5p7evtrpeq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3vu2jngob5p7evtrpeq.png" alt="alertmanager" width="778" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have now set up Alertmanager as our tool for routing the alerts fired by the vmalert component.&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-handling-multiple-regions-25ib"&gt;Prometheus Observability Platform: Handling multiple regions&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>alertmanager</category>
      <category>observability</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Alerts</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:25:36 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-alerts-4dbb</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-alerts-4dbb</guid>
      <description>&lt;p&gt;With Prometheus, we can use PromQL to write alert rules and evaluate them using the given evaluation rules and intervals. Alerts have an evaluation period: if an alert is active for the duration of the evaluation period, then it will fire. Prometheus is usually bundled with a component called Alertmanager, which is used to route alerts to different receivers such as Slack and email. Once an alert fires, it is sent to the Alertmanager which uses a routing table to find out if the alert is to be sent to a receiver, and how to route it.&lt;/p&gt;

&lt;p&gt;Prometheus alerts are evaluated against the local storage. With VictoriaMetrics, we can use the vmalert component to evaluate alert rules against the VictoriaMetrics long-term storage using the same PromQL syntax as with Prometheus. It is tempting to write all the alerting rules in VictoriaMetrics, but depending on the size of the infrastructure we might want to evaluate some rules on the Prometheus servers where the data originates from, to avoid overloading VictoriaMetrics.&lt;/p&gt;

&lt;p&gt;Alert rules can be very complex, and it is best to validate them before deploying them to Prometheus. Promtool can be used to validate Prometheus alerting rules and run unit tests on them. You can implement these simple validation and unit testing steps in your continuous integration (CI) system.&lt;/p&gt;
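
&lt;p&gt;As a sketch of those CI steps (the file layout and the CI system are assumptions; GitHub Actions step syntax is used here), the jobs can be as simple as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical CI steps: validate syntax first, then run the unit tests
- name: Validate alert rules
  run: promtool check rules alerts/*.rules.yml
- name: Unit test alert rules
  run: promtool test rules alerts/*.test.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;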

&lt;p&gt;A good monitoring platform enables teams to write their own alerts against the metrics stored in the long-term storage. We can do this in a mono-repository or multi-repository fashion. With a mono-repository, we have all the infrastructure and the alerting defined in the same repository and pipelines delivering them to servers. A multi-repository approach would set up a separate repository for the alerts, where we define the alerting rules using PromQL, and add validation and unit tests.&lt;/p&gt;

&lt;p&gt;The main benefit of the multi-repository approach is reduced cognitive load. Contributors see, and need to be aware of, nothing but the alert rules. This also eliminates the possibility of introducing bugs into the underlying infrastructure. The downside of this approach is tying the separated alerting configuration back to the Prometheus server.&lt;/p&gt;

&lt;p&gt;Terraform can be used to consume the alerting repository as a remote module, pulling the alerting rules in when the Prometheus server is deployed. With a mono-repository, we can more easily tie the alerts to the Prometheus server, but if we are using Terraform then we need to either split the alerts into their own state or accept that contributors might affect more resources than just the alerts, which can also make contributing more stressful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;This example assumes that you have completed the following steps, as the components from those are needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj"&gt;Prometheus Observability Platform: Long-term storage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Promtool (&lt;a href="https://github.com/prometheus/prometheus/tree/main#building-from-source"&gt;https://github.com/prometheus/prometheus/tree/main#building-from-source&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;yq (optional)&lt;/li&gt;
&lt;li&gt;jq (optional)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Figuring out suitable metrics for alerts can be hard. The &lt;a href="https://samber.github.io/awesome-prometheus-alerts/"&gt;awesome-prometheus-alerts&lt;/a&gt; website is an excellent source for inspiration for this. It has a collection of pre-made alerts using the PromQL syntax. For example, we can set up an alert for crash-looping Kubernetes pods, with the alert named &lt;code&gt;KubernetesPodCrashLooping&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Below, we build an example unit test for the &lt;code&gt;KubernetesPodCrashLooping&lt;/code&gt; alert. First, we simplify the alert a little and add the blocks that promtool requires to validate the rule. This file is saved as &lt;code&gt;kube-alert.rules.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;groups:
  - name: kube-alerts
    rules:
    - alert: KubernetesPodCrashLooping
      expr: increase(kube_pod_container_status_restarts_total[5m]) &amp;gt; 2
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: Pod {{$labels.namespace}}/{{$labels.pod}} is crash looping
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can use the command &lt;code&gt;promtool check rules kube-alert.rules.yml&lt;/code&gt; to validate the rule. If everything is OK, the response looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;promtool check rules kube-alert.rules.yml
---
Checking kube-alert.rules.yml
  SUCCESS: 1 rules found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To write a unit test for this alert, we create a file called &lt;code&gt;kube-alert.test.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rule_files:
  - kube-alert.rules.yml

evaluation_interval: 1m

tests:
  - interval: 1m

    input_series:
      - series: kube_pod_container_status_restarts_total{namespace="test-namespace",pod="test-pod"}
        values: '1+2x15'

    alert_rule_test:
      - alertname: KubernetesPodCrashLooping
        eval_time: 15m
        exp_alerts:
          - exp_labels:
              severity: warning
              namespace: test-namespace
              pod: test-pod
            exp_annotations:
              summary: Pod test-namespace/test-pod is crash looping
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the rule expects the &lt;code&gt;kube_pod_container_status_restarts_total&lt;/code&gt; counter to increase by more than 2 within 5 minutes, and that condition must stay active for at least 10 minutes before the alert fires. In the expected alert, we look for the namespace and pod labels in the summary annotation, and for a severity label with the value “warning”.&lt;/p&gt;

&lt;p&gt;To write a test for this rule, we need to create an input series that triggers the rule and carries all the labels needed for the summary field. Because our evaluation time is 15 minutes and the interval is 1 minute, we need at least 15 samples in our series. The syntax &lt;code&gt;'1+2x15'&lt;/code&gt; starts at 1 and adds 2 to the previous value 15 times, producing the series 1, 3, 5, … 31. We also pass the required namespace and pod labels and write out the expected summary field.&lt;/p&gt;

&lt;p&gt;To run the unit test we use the command &lt;code&gt;promtool test rules kube-alert.test.yml&lt;/code&gt;, which returns the following response if all went well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;promtool test rules kube-alert.test.yml
---
Unit Testing:  kube-alert.test.yml
  SUCCESS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we need to deploy vmalert so that we can evaluate alert rules against the data in the long-term storage.&lt;/p&gt;

&lt;p&gt;First, we have to convert our alert rule into a format that works with Helm. The problem with promtool and Helm charts is that both expect to own the &lt;code&gt;groups:&lt;/code&gt; section: we need to remove it from the rules before passing them to Helm, but if we remove it from the file itself, promtool no longer works. There are multiple ways to handle this; for example, the Terraform &lt;code&gt;trimprefix()&lt;/code&gt; function can be used to strip the &lt;code&gt;groups:&lt;/code&gt; section from the alert rules. For this use case, we are going to use a monstrous one-liner that removes the &lt;code&gt;groups:&lt;/code&gt; section, converts the output to JSON, and flattens it onto a single line so we can pass it to Helm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat kube-alert.rules.yml | sed '/groups:/d' | yq -o=json | jq -c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us the following single-line JSON string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[{"name":"kube-alerts","rules":[{"alert":"KubernetesPodCrashLooping","expr":"increase(kube_pod_container_status_restarts_total[5m]) &amp;gt; 2","for":"10m","labels":{"severity":"warning"},"annotations":{"summary":"Pod {{$labels.namespace}}/{{$labels.pod}} is crash looping"}}]}]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can deploy the vmalert Helm chart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install vmalert vm/victoria-metrics-alert --namespace victoriametrics --set 'server.notifier.alertmanager.url=http://localhost:9093' --set 'server.datasource.url=http://vmcluster-victoria-metrics-cluster-vmselect:8481/select/0/prometheus' --set 'server.remote.write.url=http://vmcluster-victoria-metrics-cluster-vminsert:8480/insert/0/prometheus' --set 'server.remote.read.url=http://vmcluster-victoria-metrics-cluster-vmselect:8481/select/0/prometheus' --set-json 'server.config.alerts.groups=[{"name":"kube-alerts","rules":[{"alert":"KubernetesPodCrashLooping","expr":"increase(kube_pod_container_status_restarts_total[5m]) &amp;gt; 2","for":"10m","labels":{"severity":"warning"},"annotations":{"summary":"Pod {{$labels.namespace}}/{{$labels.pod}} is crash looping"}}]}]'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;server.notifier.alertmanager.url:&lt;/code&gt; A placeholder value for now, as the chart cannot be installed without providing one&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;server.datasource.url:&lt;/code&gt; Prometheus HTTP API compatible datasource&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;server.remote.write.url:&lt;/code&gt; Remote write url for storing rules and alert states&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;server.remote.read.url:&lt;/code&gt; URL to restore the alert states from&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can now port-forward the vmalert service and navigate to the web UI at &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1atpvxvxhmrfy192ikqj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1atpvxvxhmrfy192ikqj.png" alt="alert" width="783" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have now created an alert rule, written a unit test for it, and set up vmalert with the alert rule defined.&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-alert-routing-139o"&gt;Prometheus Observability Platform: Alert routing&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>alerting</category>
      <category>observability</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Long-term storage</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:25:05 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj</guid>
<description>&lt;p&gt;Since Prometheus is not designed for long-term persistence of data, a dedicated long-term storage solution is called for. Multiple products can handle long-term storage for Prometheus metrics, for example VictoriaMetrics, Grafana Mimir, Thanos, and M3.&lt;/p&gt;

&lt;p&gt;With some of these options, we get the capability to store the data in object storage, which is ideal for modern workloads running in Kubernetes, as we don’t want to store any persistent data inside our cluster. Object storage can be, for example, Azure Blob Storage or AWS S3. This option does, however, carry a performance penalty compared to block storage, so if you have high performance requirements, you might have to look into block storage options.&lt;/p&gt;

&lt;p&gt;In this document, we will be focusing on VictoriaMetrics. It was chosen because it is open-source, highly performant, and all its crucial components are free. VictoriaMetrics can only handle block storage, but it is also very fast due to its simple architecture designed only for local storage. It can be run in single mode or clustered. The central part of the architecture consists of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The vmstorage component, which stores the time series data;&lt;/li&gt;
&lt;li&gt;vmselect, used to fetch and merge data from vmstorage; and&lt;/li&gt;
&lt;li&gt;vminsert, which inserts the data into vmstorage nodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the clustered version, data is distributed evenly across the vmstorage nodes by the vminsert component, and the distributed data is then fetched and merged by the vmselect component. In Kubernetes, each of these components will have its own pod and the vmselect and vminsert components will have a service to load balance the traffic. All the vmstorage endpoints (pods) will be connected to the vminsert and vmselect pods.&lt;/p&gt;

&lt;p&gt;VictoriaMetrics also has multiple additional features, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the vmalert component, which can be used for alerting on the data;&lt;/li&gt;
&lt;li&gt;vmagent, which can be used as a data ingestion point and for filtering and re-labelling metrics; and&lt;/li&gt;
&lt;li&gt;the vmauth component for simple authentication, which uses credentials from the Authorization header. (You can also put some other component, such as oauth2-proxy, in front of vminsert or vmagent to handle authentication.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fju9mnto04ijmfyqba1fs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fju9mnto04ijmfyqba1fs.png" alt="Basic architecture" width="708" height="740"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can set up Prometheus to write the data it receives into the long-term storage using the &lt;code&gt;remote_write&lt;/code&gt; block in the configuration. If authentication is set up, it also needs to be defined in the &lt;code&gt;remote_write&lt;/code&gt; block.&lt;/p&gt;
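
&lt;p&gt;As an illustrative sketch (the URL and credential paths are placeholders), a &lt;code&gt;remote_write&lt;/code&gt; block with basic authentication could look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;remote_write:
  - url: http://vminsert.example.internal:8480/insert/0/prometheus/  # placeholder endpoint
    basic_auth:
      username: prometheus
      password_file: /etc/prometheus/remote-write-password  # placeholder path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;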

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;We will now set up the following architecture with minikube, Prometheus, and VictoriaMetrics. This example assumes that you have completed the steps from &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbi988mm3bbk13219fo2r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbi988mm3bbk13219fo2r.png" alt="Demo architecture" width="523" height="752"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First we add the VictoriaMetrics Helm chart repository and install the chart into the victoriametrics namespace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm repo add vm https://victoriametrics.github.io/helm-charts/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install vmcluster vm/victoria-metrics-cluster --create-namespace --namespace victoriametrics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We should now see six pods running in the victoriametrics namespace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pods -n victoriametrics
---
NAME                                                           READY   STATUS    RESTARTS   AGE
vmcluster-victoria-metrics-cluster-vminsert-f8d48695c-gqx25    1/1     Running   0          58s
vmcluster-victoria-metrics-cluster-vminsert-f8d48695c-t8kcn    1/1     Running   0          58s
vmcluster-victoria-metrics-cluster-vmselect-77465fb479-42wjs   1/1     Running   0          58s
vmcluster-victoria-metrics-cluster-vmselect-77465fb479-t2jhp   1/1     Running   0          58s
vmcluster-victoria-metrics-cluster-vmstorage-0                 1/1     Running   0          58s
vmcluster-victoria-metrics-cluster-vmstorage-1                 1/1     Running   0          58s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To access the vmselect web UI we need to port forward the vmselect service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n victoriametrics services/vmcluster-victoria-metrics-cluster-vmselect 9090:8481
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can then navigate to &lt;a href="http://localhost:9090/select/0/prometheus/vmui"&gt;http://localhost:9090/select/0/prometheus/vmui&lt;/a&gt; to access the vmselect web UI (VMUI). This is the clustered URL format, with the 0 representing the account ID of the tenant in the cluster.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwa28olty3pma1bdbrspz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwa28olty3pma1bdbrspz.png" alt="VMUI" width="780" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To set up remote writing from Prometheus into VictoriaMetrics, we need to update our kube-prometheus-stack deployment with the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack --namespace prometheus --reuse-values --set 'prometheus.prometheusSpec.remoteWrite[0].url=http://vmcluster-victoria-metrics-cluster-vminsert.victoriametrics.svc.cluster.local:8480/insert/0/prometheus/'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s break down the URL provided above. First, we have the service name for vminsert, which you can find with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get svc -n victoriametrics 
---
NAME                                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
vmcluster-victoria-metrics-cluster-vminsert    ClusterIP   10.100.63.152   &amp;lt;none&amp;gt;        8480/TCP                     3h32m
vmcluster-victoria-metrics-cluster-vmselect    ClusterIP   10.99.12.151    &amp;lt;none&amp;gt;        8481/TCP                     3h32m
vmcluster-victoria-metrics-cluster-vmstorage   ClusterIP   None            &amp;lt;none&amp;gt;        8482/TCP,8401/TCP,8400/TCP   3h32m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we have the namespace, victoriametrics. Since Prometheus and VictoriaMetrics live in different namespaces, we use the fully qualified service address ending in &lt;code&gt;svc.cluster.local&lt;/code&gt;. This is followed by the port number of the vminsert service and, finally, the Prometheus-compatible write endpoint with the account ID 0 in it, because we are using the clustered version of VictoriaMetrics.&lt;/p&gt;
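
&lt;p&gt;The general shape of the clustered vminsert remote write URL is therefore:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://&amp;lt;vminsert-service&amp;gt;.&amp;lt;namespace&amp;gt;.svc.cluster.local:&amp;lt;port&amp;gt;/insert/&amp;lt;accountid&amp;gt;/prometheus/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;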

&lt;p&gt;On the VMUI (&lt;a href="http://localhost:9090/select/0/prometheus/vmui"&gt;http://localhost:9090/select/0/prometheus/vmui&lt;/a&gt;) we can now verify that metrics are coming from Prometheus.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n victoriametrics services/vmcluster-victoria-metrics-cluster-vmselect 9090:8481
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Navigate to the Query section and insert &lt;code&gt;kube_pod_container_status_restarts_total&lt;/code&gt; into the Query field. You should now see approximately the same output as you did from Prometheus.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbozct054mgkx3ww10if.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbozct054mgkx3ww10if.png" alt="Metrics" width="774" height="806"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have now set up a simple Prometheus and VictoriaMetrics integration.&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-alerts-4dbb"&gt;Prometheus Observability Platform: Alerts&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Prometheus Observability Platform: Prometheus</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:24:21 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019</guid>
<description>&lt;p&gt;Prometheus is an open-source application used for monitoring and alerting. It uses a time series database (TSDB) to store the data fed into it. It is designed to scrape metrics from the targets’ HTTP endpoints using a pull model, but it can also be a push target for metrics when using a push gateway. It provides its own query language, PromQL, for querying the data stored in its TSDB.&lt;/p&gt;
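
&lt;p&gt;For example (the metric and label names here are hypothetical), a PromQL query for the per-second request rate over the last five minutes could look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rate(http_requests_total{job="api-server"}[5m])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;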

&lt;p&gt;The Prometheus server component can evaluate the data it holds and create alerts based on PromQL queries. These alerts are then sent to a component called Alertmanager, where you can set up routing for the alerts, for example to Slack or email.&lt;/p&gt;

&lt;p&gt;The default data retention period is 15 days, but it can be set as low as 2 hours and has no upper limit. The local storage that Prometheus uses cannot be clustered or replicated, which is why it is not advised to use Prometheus itself for long-term storage of metrics. Also, the local TSDB gets corrupted easily, so having the option to just drop it without losing metrics is desirable. This is why we want to use the remote write capability to forward the metrics into a more robust long-term storage solution.&lt;/p&gt;
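
&lt;p&gt;The retention period is controlled with a command-line flag on the Prometheus server; for example, to keep 30 days of data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=30d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;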

&lt;p&gt;In Kubernetes, we have kube-prometheus-stack. This is a Helm chart that contains the Prometheus components for Kubernetes. Prometheus uses a node exporter to scrape metrics from the nodes and a custom resource definition (CRD) called ServiceMonitor to scrape metrics from pods behind a service. There is also a CRD called PodMonitor if you don’t have a service in front of the pods.&lt;/p&gt;
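
&lt;p&gt;As a minimal sketch (the names, namespace, and port are hypothetical), a ServiceMonitor that scrapes a service labelled &lt;code&gt;app: my-app&lt;/code&gt; could look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: prometheus
spec:
  selector:
    matchLabels:
      app: my-app          # selects the service to scrape
  endpoints:
    - port: metrics        # name of the service port exposing /metrics
      interval: 30s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;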

&lt;p&gt;Prometheus has a concept of exporters: a collection of libraries and servers capable of exporting metrics from third-party systems as Prometheus metrics. The most used one is node exporter, which collects hardware and OS metrics exposed by Linux kernels.&lt;/p&gt;
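
&lt;p&gt;Exporters expose their metrics as plain text over HTTP in the Prometheus exposition format; node exporter output, for example, contains lines such as these (the values are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 31258.42
node_cpu_seconds_total{cpu="0",mode="user"} 842.13
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;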

&lt;p&gt;In Prometheus, we can use the &lt;code&gt;remote_write&lt;/code&gt; block to forward data to another source that accepts Prometheus metrics. If we want to chain remote writing from one Prometheus to another, we need to enable a feature flag for the remote write receiver on the receiving server. Remote writing supports multiple authentication methods, such as OAuth2, for when the long-term storage requires authentication.&lt;/p&gt;
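
&lt;p&gt;To let a Prometheus server act as a remote write target, start the receiving server with the corresponding feature flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prometheus --config.file=prometheus.yml --enable-feature=remote-write-receiver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;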

&lt;p&gt;With Prometheus, we want a Prometheus server as close as possible to the physical servers, so that we get the lowest networking latency between the targets and the Prometheus server. In the case of data centres, we can set up a Prometheus server in each data centre zone and have it pull metrics from the targets in that zone, or act as a &lt;code&gt;remote_write&lt;/code&gt; target for workloads that push metrics through telemetry agents such as OpenTelemetry. This leads to multiple Prometheus servers across data centre regions and zones, so your applications need to be aware of where they are located and which Prometheus instance they are supposed to connect to.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;To test out Prometheus we can use minikube to run a local Kubernetes cluster and then the kube-prometheus-stack Helm chart to install Prometheus into the minikube cluster. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Helm (&lt;a href="https://helm.sh/docs/intro/install/"&gt;https://helm.sh/docs/intro/install/&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Docker (&lt;a href="https://docs.docker.com/engine/install/"&gt;https://docs.docker.com/engine/install/&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Minikube (&lt;a href="https://minikube.sigs.k8s.io/docs/start/"&gt;https://minikube.sigs.k8s.io/docs/start/&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Kubectl (&lt;a href="https://kubernetes.io/docs/tasks/tools/"&gt;https://kubernetes.io/docs/tasks/tools/&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;First, we need to start our minikube cluster. I pinned the Kubernetes version to v1.26.3 here, as this is the version these examples have been tested with.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube start --kubernetes-version=v1.26.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can validate that you have a connection to your minikube cluster by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl cluster-info
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we want to add the kube-prometheus-stack Helm chart repository and install the chart into the prometheus namespace.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack --create-namespace --namespace prometheus --set grafana.enabled=false --set alertmanager.enabled=false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that we are disabling Grafana and Alertmanager for now, as we will be installing them manually in the coming parts.&lt;/p&gt;

&lt;p&gt;To test that Prometheus is operational, we can port forward the Prometheus service locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n prometheus services/kube-prometheus-stack-prometheus 9090:9090
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now navigate to &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt; to access the Prometheus web UI. Notice that some of the targets are giving errors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fov1kayshm55un2ndv76f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fov1kayshm55un2ndv76f.png" alt="prometheus-ui" width="779" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have now set up a simple kube-prometheus-stack onto our cluster and are ready to tackle the next steps.&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj"&gt;Prometheus Observability Platform: Long-term storage&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>prometheus</category>
      <category>observability</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Platform</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:24:13 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-platform-5d8p</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-platform-5d8p</guid>
      <description>&lt;h1&gt;
  
  
  What is a platform?
&lt;/h1&gt;

&lt;p&gt;In the modern world of IT, a platform is considered to be a set of shared resources utilised by multiple people, such as teams in an organisation. A platform usually consists of a shared codebase that teams can either take into use in a self-service fashion, or that is automatically included in their workflow.&lt;/p&gt;

&lt;p&gt;A while ago, the phrase “you build it, you run it” created a buzz and became a big thing. It meant that each team in the organisation was capable of building their whole application stack, including the infrastructure components, and also had the responsibility to maintain that infrastructure. There are benefits to this approach; the biggest one is the fast iteration rate of new ideas and technologies, enabled by all the responsibility sitting inside the team rather than being tied to a bigger technology stack.&lt;/p&gt;

&lt;p&gt;One of the biggest downsides of this approach is that teams become technology and knowledge silos. Every team can have different technologies running and the information is often siloed into that team only. This can lead to a situation where multiple teams are using multiple different technology stacks, and the wheel gets reinvented multiple times, due to a lack of proper knowledge sharing practices. Since the technology stacks can be so different between the teams, getting new team members to replace old ones can also be very difficult.&lt;/p&gt;

&lt;p&gt;Nowadays the full stack of an application is very complex. We need specialists in full-stack application development, infrastructure specialists for cloud or data centre environments, network specialists to fully understand the networking stack and architecture, and security specialists to keep everything secure. Getting all this expertise into a single team can be very challenging. To partly remedy this, we can look to platform engineering, where we offload parts of the complexity to units specialised in them.&lt;/p&gt;

&lt;p&gt;With platform engineering, we create a platform that each team can build upon. We can, for example, initially have the networking people, the infrastructure people, and the security people collaborate to create a shared codebase that embodies industry best practices and the special requirements of the company. This shared codebase can be, for example, a self-service platform for Kubernetes clusters (AKS, EKS, or GKE) with all the bells and whistles included, such as cert-manager, external-dns, an ingress controller, etc. It can be a single Kubernetes cluster where we separate teams with namespaces and provide the teams with ways of deploying their workloads into the cluster. Or it can be a bigger platform codebase from which teams can deploy modules of components that conform to security standards and best practices (e.g. network integration in private endpoint form, firewalls enabled with preset whitelists for VPN endpoints).&lt;/p&gt;

&lt;p&gt;This drastically reduces the cognitive load on teams developing their applications. There are some downsides to this approach due to the increased reliance on the shared codebase. Updates to the code are no longer done by just a single team: you have to communicate the updates and the steps required to perform them. For example, if you upgrade the Kubernetes codebase to a new major version of the cluster, you need to communicate this and write instructions for performing the upgrade and dealing with any deprecated resources. You will probably end up versioning your shared codebase (depending on the size of the organisation), as requiring all teams to track the main branch at all times can cause too much overhead for the teams.&lt;/p&gt;

&lt;p&gt;The cloud moves very fast and new technologies are constantly being developed. With the platform approach, infrastructure teams can focus on shared practices and keep evolving them. Security people can focus on building shared security tooling, such as providing teams with ways to use SonarQube. The networking team can focus on creating a shared networking infrastructure whose components can easily be used across teams, with DNS resolution that works across multiple clouds and on-premises data centres. The platform team can focus on listening to the needs of the development teams, creating shared code for the most commonly used components, and helping teams get past the hurdles that the cloud imposes (such as migrating from one PostgreSQL version to another).&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>prometheus</category>
      <category>platform</category>
      <category>observability</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Intro</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:24:00 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-intro-4em6</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-intro-4em6</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frjx0g5abkyd35s64rdp5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frjx0g5abkyd35s64rdp5.png" alt="architecture diagram" width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When a company reaches a certain size and complexity, it becomes hard to track all the metrics that the applications are generating. We can end up with teams in the company running their own observability tooling, or multiple sets of stand-alone Prometheus servers which are handled as multiple data sources in Grafana.&lt;/p&gt;

&lt;p&gt;The observability platform for metrics with Prometheus (later referred to as metrics platform) is a way for all the teams and products in the company to utilise the same observability tooling for metrics-based telemetry. In short, this means that every team will send metrics to the same long-term Prometheus storage, use the same data source in Grafana when creating dashboards, and be able to set up alerts from these metrics using either Grafana alerts or Prometheus native alerts.&lt;/p&gt;

&lt;p&gt;We want all our Prometheus servers to write their data into a long-term storage solution. If the architecture consists of multiple Kubernetes clusters, we want every cluster to have its own prometheus-operator installed and configured to ship its metrics to that storage via Prometheus remote write.&lt;/p&gt;
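
&lt;p&gt;As a minimal sketch, the per-cluster remote write setup could look roughly like the following kube-prometheus-stack Helm values (the storage endpoint URL and the &lt;code&gt;cluster&lt;/code&gt; label value are hypothetical placeholders, to be replaced with your own):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# values.yaml (kube-prometheus-stack) -- a sketch, adjust to your setup
prometheus:
  prometheusSpec:
    # Label every sample with the cluster it came from,
    # so clusters can be told apart in long-term storage
    externalLabels:
      cluster: my-cluster-eu-west   # hypothetical
    remoteWrite:
      # vminsert endpoint of a VictoriaMetrics cluster (hypothetical URL)
      - url: http://vminsert.monitoring.svc:8480/insert/0/prometheus/api/v1/write
&lt;/code&gt;&lt;/pre&gt;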

&lt;p&gt;With this centralisation, we can use a single data source to access all the metrics from all the infrastructure connected to the metrics platform. This enables creating dashboards that aggregate data from multiple Kubernetes clusters in a single panel, and allows drilling down to a single resource from the dashboard.&lt;/p&gt;

&lt;p&gt;This series of posts will be a deep dive into the concept of a metrics platform running on Kubernetes, consisting of the following parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://prometheus.io/"&gt;Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://victoriametrics.com/"&gt;VictoriaMetrics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/jacksontj/promxy"&gt;Promxy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://grafana.com/"&gt;Grafana&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first part of this series is a look at what a platform is. From here we will continue with setting Prometheus up on our minikube cluster, and leveraging VictoriaMetrics as our long-term storage system. We will set up alerts using Prometheus alerting syntax and use &lt;a href="https://prometheus.io/docs/prometheus/latest/command-line/promtool/"&gt;promtool&lt;/a&gt; to run unit tests on them. We will then continue setting up &lt;a href="https://docs.victoriametrics.com/vmalert.html"&gt;vmalert&lt;/a&gt; as our alert handling component and send alerts to &lt;a href="https://prometheus.io/docs/alerting/latest/alertmanager/"&gt;Alertmanager&lt;/a&gt;. Then we will use Promxy to handle situations involving multiple Kubernetes clusters in multiple regions. We will set up a custom app in our cluster and use Prometheus ServiceMonitor to pick up its metrics. Lastly, we will set up Grafana to use a single data source to access all the metrics from our whole platform.&lt;/p&gt;
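
&lt;p&gt;To give a taste of the promtool unit testing covered later in the series, a test file could look roughly like this sketch (the rule file name, alert name, and label values are hypothetical):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# alerts-test.yaml -- run with: promtool test rules alerts-test.yaml
rule_files:
  - alerts.yaml          # assumed to contain an InstanceDown alert: up == 0 for 2m
tests:
  - interval: 1m
    input_series:
      # Target is up for two minutes, then goes down
      - series: 'up{job="node", instance="host:9100"}'
        values: '1 1 0 0 0 0'
    alert_rule_test:
      # After the "for" duration has passed, the alert should be firing
      - eval_time: 5m
        alertname: InstanceDown
        exp_alerts:
          - exp_labels:
              job: node
              instance: host:9100
&lt;/code&gt;&lt;/pre&gt;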

&lt;p&gt;Links to each part of this series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-platform-5d8p"&gt;Prometheus Observability Platform: Platform&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj"&gt;Prometheus Observability Platform: Long-term storage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-alerts-4dbb"&gt;Prometheus Observability Platform: Alerts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-alert-routing-139o"&gt;Prometheus Observability Platform: Alert routing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-handling-multiple-regions-25ib"&gt;Prometheus Observability Platform: Handling multiple regions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-application-metrics-2024"&gt;Prometheus Observability Platform: Application metrics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-grafana-40d3"&gt;Prometheus Observability Platform: Grafana&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devops</category>
      <category>observability</category>
      <category>metrics</category>
      <category>platform</category>
    </item>
  </channel>
</rss>
