<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aleksi Waldén</title>
    <description>The latest articles on DEV Community by Aleksi Waldén (@aleksiwalden).</description>
    <link>https://dev.to/aleksiwalden</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1159289%2Fa1070cef-d8b7-4815-b659-50ab4233eb25.png</url>
      <title>DEV Community: Aleksi Waldén</title>
      <link>https://dev.to/aleksiwalden</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aleksiwalden"/>
    <language>en</language>
    <item>
      <title>Prometheus Observability Platform: Grafana</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:28:01 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-grafana-40d3</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-grafana-40d3</guid>
      <description>&lt;p&gt;Grafana is the industry standard open-source product for visualising metrics stored in a TSDB format, or a variety of other data sources. With Grafana, we can create dashboards, queries, and alerts from the data that we have. With all our metrics in long-term storage, we can use a single data source to access all the metrics from all our infrastructure that uses the metrics platform. This enables easily creating dashboards that aggregate data from multiple different Kubernetes clusters, and enable drilling down to a single resource easily.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Next, we will set up a Grafana instance in our minikube cluster and use Promxy as the default data source. This example assumes that you have completed the following steps, as their components are needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj"&gt;Prometheus Observability Platform: Long-term storage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-handling-multiple-regions-25ib"&gt;Prometheus Observability Platform: Handling multiple regions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;base64&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;First, we add the Grafana Helm chart repository; we will install its chart into the &lt;code&gt;grafana&lt;/code&gt; namespace shortly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm repo add grafana https://grafana.github.io/helm-charts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we define Promxy as the data source. In the Helm values file, we need the following block to do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;datasources.yaml:
  apiVersion: 1
  datasources:
  - name: Promxy
    type: prometheus
    url: "http://promxy.promxy.svc.cluster.local:8082"
    isDefault: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We are using the &lt;code&gt;svc.cluster.local&lt;/code&gt; address for the Promxy service, because all our services are inside the cluster.&lt;/p&gt;
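&lt;p&gt;The conversion from those YAML values to the JSON one-liner that Helm's &lt;code&gt;--set-json&lt;/code&gt; flag expects can also be scripted instead of done by hand. A minimal sketch using Python's standard &lt;code&gt;json&lt;/code&gt; module (the dictionary simply mirrors the values block above):&lt;/p&gt;

```python
import json

# Python mirror of the Helm values block defining Promxy as the
# default Grafana data source.
values = {
    "datasources.yaml": {
        "apiVersion": 1,
        "datasources": [
            {
                "name": "Promxy",
                "type": "prometheus",
                "url": "http://promxy.promxy.svc.cluster.local:8082",
                "isDefault": True,
            }
        ],
    }
}

# Compact separators produce the one-liner form used with --set-json.
one_liner = json.dumps(values, separators=(",", ":"))
print(f"datasources={one_liner}")
```

&lt;p&gt;For real values files, a YAML parser such as PyYAML could load the file first; the dictionary literal here just keeps the sketch dependency-free.&lt;/p&gt;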

&lt;p&gt;I have converted the above into JSON so that it can be passed to Helm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install grafana grafana/grafana --create-namespace --namespace grafana --set-json 'datasources={"datasources.yaml":{"apiVersion":1,"datasources":[{"name":"Promxy","type":"prometheus","url":"http://promxy.promxy.svc.cluster.local:8082","isDefault":true}]}}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we need to retrieve the password for the &lt;code&gt;admin&lt;/code&gt; user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get secret --namespace grafana grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can port-forward the Grafana service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n grafana services/grafana 9090:80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Navigate to &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt; to access the web UI, and log in with the username &lt;code&gt;admin&lt;/code&gt;, and the password acquired in the previous step. From here you can verify that Promxy is set up and acting as the default data source by navigating to Administration -&amp;gt; Data sources -&amp;gt; Promxy, and clicking the Test button at the bottom of the page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsik76cty2tbpisxi2lj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsik76cty2tbpisxi2lj.png" alt="Grafana UI" width="639" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtnmxyfaxxyblg43oerc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtnmxyfaxxyblg43oerc.png" alt="Data source test" width="276" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Assuming the test was successful, we can then navigate to the “Explore” item in the menu&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvcog4aklr3ykw4kblf0y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvcog4aklr3ykw4kblf0y.png" alt="Grafana Explore tab" width="258" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and check that we have metrics available in the “metrics explorer” section. Alternatively, we can use the following query to check that metrics are available:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sum(kube_pod_container_status_restarts_total) by (namespace, container)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;N.B. You might have to widen the query's time range to get results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxf3d89qret3ve8h0sqgt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxf3d89qret3ve8h0sqgt.png" alt="Metrics in Grafana" width="777" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have now set up Grafana as the metrics visualisation tool for our metrics platform. This enables us to create dashboards and Grafana alerts for metrics from all sources sending data to our long-term storage cluster (or clusters, if we have multiple regions), queried through Promxy.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>grafana</category>
      <category>observability</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Application metrics</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:27:31 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-application-metrics-2024</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-application-metrics-2024</guid>
      <description>&lt;p&gt;When creating our own applications, we need to use a metrics library to generate the metrics and then inside our application functions increment said metrics. With Go for example, we can use the Prometheus library. The metrics will then be exposed to the /metrics endpoint. If our application is inside a Kubernetes cluster with a prometheus-operator, we can use a ServiceMonitor to scrape its metrics. If we don’t have such a possibility, we can instead set up an application to send metrics straight to our long-term storage solution. For VictoriaMetrics, we can use the &lt;a href="https://github.com/VictoriaMetrics/metrics"&gt;github.com/VictoriaMetrics/metrics&lt;/a&gt; library to send the metrics to VictoriaMetrics. Remember to add authentication logic into the section pushing the metrics to the long-term storage, if necessary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;This example assumes that you have completed the following steps, as their components are needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's set up a hello-world Golang application in our cluster, and use ServiceMonitor to send its metrics to Prometheus.&lt;/p&gt;

&lt;p&gt;First, we need to update our kube-prometheus-stack Helm deployment to pick up ServiceMonitor resources with a certain label attached. We need to pass the following value to our Helm chart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prometheus:
  prometheusSpec:
    serviceMonitorSelector:
      matchExpressions:
      - key: app
        operator: Exists
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I have converted that into JSON so that it can be passed to Helm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack --namespace prometheus --reuse-values --set-json 'prometheus.prometheusSpec.serviceMonitorSelector={"matchExpressions":[{"key":"app","operator":"Exists"}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will update our kube-prometheus-stack to pick up ServiceMonitor resources from any namespace, as long as they have an &lt;code&gt;app&lt;/code&gt; label attached.&lt;/p&gt;

&lt;p&gt;Next, we are going to create a namespace for our hello-world application, a simple Golang application that exposes metrics via the &lt;a href="https://github.com/prometheus/client_golang/tree/main/prometheus"&gt;Prometheus module&lt;/a&gt;. We will borrow &lt;a href="https://github.com/okteto/go-prometheus-monitoring/blob/master/main.go"&gt;this&lt;/a&gt; ready-made application, which increments a metric called &lt;code&gt;hello_processed_total&lt;/code&gt; each time the page is loaded.&lt;/p&gt;

&lt;p&gt;To create a namespace and a pod, we use the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl create namespace hello-world
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl run hello-world --namespace=hello-world --image='okteto/hello-world:golang-metrics' --labels app=hello-world
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we need to create a service for the new pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;lt;&amp;lt;'EOF' | kubectl create -f -
apiVersion: v1
kind: Service
metadata:
  labels:
    app: hello-world
  name: hello-world
  namespace: hello-world
spec:
  ports:
  - name: http
    port: 8080
  selector:
    app: hello-world
  type: ClusterIP
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can test that our application is working by port-forwarding it. We can also check what the &lt;code&gt;hello_processed_total&lt;/code&gt; metric looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n hello-world services/hello-world 9090:8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now navigate to &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt; and &lt;a href="http://localhost:9090/metrics"&gt;http://localhost:9090/metrics&lt;/a&gt;. You should see a metric called &lt;code&gt;hello_processed_total&lt;/code&gt; with a number attached. Each reload of the page will increment this number.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzbv5q9gyxtv62j2klkuy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzbv5q9gyxtv62j2klkuy.png" alt="Metrics" width="616" height="59"&gt;&lt;/a&gt;&lt;/p&gt;
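&lt;p&gt;Checking the counter can also be scripted instead of reloading the page in a browser. A sketch that fetches the port-forwarded &lt;code&gt;/metrics&lt;/code&gt; endpoint and parses the counter out of the text exposition format (the URL assumes the port-forward above is still running):&lt;/p&gt;

```python
import urllib.request

def parse_counter(exposition_text, name):
    """Return the value of an unlabelled counter from Prometheus text format."""
    for line in exposition_text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        parts = line.split()
        if len(parts) == 2 and parts[0] == name:
            return float(parts[1])
    raise ValueError(f"metric {name!r} not found")

if __name__ == "__main__":
    with urllib.request.urlopen("http://localhost:9090/metrics") as resp:
        text = resp.read().decode()
    print(parse_counter(text, "hello_processed_total"))
```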

&lt;p&gt;Next, we need to set up a ServiceMonitor to send these metrics to Prometheus:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;lt;&amp;lt;'EOF' | kubectl create -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hello-world
  namespace: hello-world
  labels:
    app: hello-world
spec:
  selector:
    matchLabels:
      app: hello-world
  endpoints:
    - port: http
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ServiceMonitor will target services matching the label selector (&lt;code&gt;app=hello-world&lt;/code&gt;) and will scrape the port called “http”.&lt;/p&gt;

&lt;p&gt;Now, if we port-forward our Prometheus service, we should see a new service in the service discovery section:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n prometheus services/kube-prometheus-stack-prometheus 9090:9090
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Navigate to &lt;a href="http://localhost:9090/service-discovery"&gt;http://localhost:9090/service-discovery&lt;/a&gt; and you should see that there is a new service discovered with the name &lt;code&gt;serviceMonitor/hello-world/hello-world/0&lt;/code&gt; and it should show 1/1 active targets.&lt;/p&gt;

&lt;p&gt;We can now query the &lt;code&gt;hello_processed_total&lt;/code&gt; metric:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1yo9jctopwjomapxjl8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1yo9jctopwjomapxjl8.png" alt="Metrics in Prometheus" width="781" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have now achieved sending metrics from our custom app running in its own namespace into Prometheus.&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-grafana-40d3"&gt;Prometheus Observability Platform: Grafana&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      &lt;category&gt;prometheus&lt;/category&gt;
      <category>metrics</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Handling multiple regions</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:27:13 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-handling-multiple-regions-25ib</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-handling-multiple-regions-25ib</guid>
      <description>&lt;p&gt;When we have multiple regions such as the EU and US, we need to have a long-term storage solution running in both of those. If we want to combine the resources into a single query we need to use a query layer that can query both endpoints. One such component we can use is Promxy.&lt;/p&gt;

&lt;p&gt;Promxy uses the same PromQL syntax as Prometheus and we can define server groups with multiple endpoints. In our case, we would define our EU and US long-term storage endpoints under one server group. We can then use the single Promxy endpoint to query both the EU and the US.&lt;/p&gt;
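&lt;p&gt;Because Promxy exposes the standard Prometheus HTTP API, a query spanning both regions is an ordinary &lt;code&gt;/api/v1/query&lt;/code&gt; call against the single Promxy endpoint. A sketch in Python, assuming Promxy has been port-forwarded to &lt;code&gt;localhost:9090&lt;/code&gt; as in the demo below:&lt;/p&gt;

```python
import json
import urllib.parse
import urllib.request

def instant_query_url(base_url, promql):
    """Build a Prometheus-compatible instant query URL."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url}/api/v1/query?{query}"

def run_query(base_url, promql):
    # The response envelope is {"status": ..., "data": {"result": [...]}}.
    with urllib.request.urlopen(instant_query_url(base_url, promql)) as resp:
        payload = json.load(resp)
    return payload["data"]["result"]

if __name__ == "__main__":
    # With the EU and US storage endpoints in one server group, a single
    # query returns series carrying the region label from both.
    for series in run_query("http://localhost:9090", "sum(up) by (region)"):
        print(series["metric"], series["value"])
```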

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;This example assumes that you have completed the following steps, as the components from those are needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj"&gt;Prometheus Observability Platform: Long-term storage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can use the Helm chart offered in the &lt;a href="https://github.com/jacksontj/promxy/tree/master"&gt;Promxy repository&lt;/a&gt; to deploy a proxy to our Kubernetes cluster.&lt;/p&gt;

&lt;p&gt;First we clone the repository, because the Helm chart is not published to a public registry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/jacksontj/promxy.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we navigate to the folder containing the Helm chart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd promxy/deploy/k8s/helm-charts/promxy/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We set up Promxy with the following &lt;code&gt;server_groups:&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;server_groups:
  - static_configs:
    - targets:
      - vmcluster-victoria-metrics-cluster-vmselect.victoriametrics.svc.cluster.local:8481
      labels:
        region: eu
    scheme: http
    path_prefix: /select/0/prometheus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I have converted this to JSON so we can pass it to the Helm chart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"server_groups":[{"static_configs":[{"targets":["vmcluster-victoria-metrics-cluster-vmselect.victoriametrics.svc.cluster.local:8481"],"labels":{"region":"eu"}}],"scheme":"http","path_prefix":"/select/0/prometheus"}]}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To install Promxy from the local Helm chart we use the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install promxy . --create-namespace --namespace promxy --set 'image.tag=latest' --set-json 'config.promxy={"server_groups":[{"static_configs":[{"targets":["vmcluster-victoria-metrics-cluster-vmselect.victoriametrics.svc.cluster.local:8481"],"labels":{"region":"eu"}}],"scheme":"http","path_prefix":"/select/0/prometheus"}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now port-forward the Promxy service and access the web UI at &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n promxy services/promxy 9090:8082
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we can perform the same query for the &lt;code&gt;kube_pod_container_status_restarts_total&lt;/code&gt; metric to verify that Promxy is able to reach the VictoriaMetrics data:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4nhxmxnnfh5mcl0hnku.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4nhxmxnnfh5mcl0hnku.png" alt="Promxy" width="777" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we have more regions than just the EU, such as the US, we can add them under &lt;code&gt;server_groups&lt;/code&gt; and query multiple VictoriaMetrics instances from a single Promxy source.&lt;/p&gt;

&lt;p&gt;We have now set up Promxy with &lt;code&gt;server_groups&lt;/code&gt; for querying VictoriaMetrics instances.&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-application-metrics-2024"&gt;Prometheus Observability Platform: Application metrics&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>promxy</category>
      <category>observability</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Alert routing</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:26:37 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-alert-routing-139o</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-alert-routing-139o</guid>
      <description>&lt;p&gt;Alertmanager is a component usually bundled with Prometheus to handle routing the alerts to receivers such as Slack, e-mail, and PagerDuty. It uses a routing tree to send alerts to one or multiple receivers.&lt;/p&gt;

&lt;p&gt;Routes define which receivers each alert should be sent to. You can define rules for the routes. The rules are evaluated from top to bottom, and alerts are sent to matching receivers. Usually, the match block is used to match the label name and value for a certain receiver. Notification integrations are configured for each receiver. There are multiple different options available, such as &lt;code&gt;email_configs&lt;/code&gt;, &lt;code&gt;slack_configs&lt;/code&gt;, and &lt;code&gt;webhook_configs&lt;/code&gt;.&lt;/p&gt;
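&lt;p&gt;The top-to-bottom evaluation of the routing tree can be sketched in a few lines of Python. This simplified model supports only flat routes and equality matchers; the real Alertmanager adds nested routes, &lt;code&gt;continue&lt;/code&gt; flags, and regex matchers:&lt;/p&gt;

```python
def route_alert(alert_labels, routes, default_receiver):
    """Return the receiver for an alert: the first matching route wins,
    otherwise fall back to the tree's default receiver."""
    for route in routes:
        if all(alert_labels.get(k) == v for k, v in route["match"].items()):
            return route["receiver"]
    return default_receiver

# Mirrors the shape of an Alertmanager routing tree: route rules
# with match blocks, evaluated from top to bottom.
routes = [
    {"match": {"team": "test-team"}, "receiver": "test-team-receiver"},
    {"match": {"severity": "critical"}, "receiver": "pagerduty"},
]

print(route_alert({"team": "test-team"}, routes, "default-receiver"))
# → test-team-receiver
```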

&lt;p&gt;Alertmanager has a web UI that can be used to view current alerts and silence them if needed.&lt;/p&gt;

&lt;p&gt;With a platform setup, we usually don’t want to run multiple Alertmanagers, so we disable the provisioning of additional Alertmanagers in Prometheus deployments that would otherwise include one automatically. Instead, we use a single centralised Alertmanager, for example inside a Kubernetes cluster dedicated to monitoring the platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;This example assumes that you have completed the following steps, as their components are needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj"&gt;Prometheus Observability Platform: Long-term storage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-alerts-4dbb"&gt;Prometheus Observability Platform: Alerts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amtool (&lt;a href="https://github.com/prometheus/alertmanager#install-1"&gt;https://github.com/prometheus/alertmanager#install-1&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that we have an alert defined and deployed to vmalert, we can add Alertmanager to our platform. Because we are building this with a platform aspect in mind, we will install Alertmanager as a separate resource, not as part of the kube-prometheus-stack. We will use a tool called amtool, which is bundled with Alertmanager, to test our alert routing configuration.&lt;/p&gt;

&lt;p&gt;We can install Alertmanager with the following Helm command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install alertmanager prometheus-community/alertmanager --create-namespace --namespace alertmanager
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now port-forward the Alertmanager service to access the Alertmanager web UI at &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n alertmanager services/alertmanager 9090:9093
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To trigger a test alert, we can use the following command from another terminal tab while keeping the port-forwarding on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -H "Content-Type: application/json" -d '[{"labels":{"alertname":"TestAlert"}}]' localhost:9090/api/v1/alerts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now use amtool to list the currently firing alerts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;amtool alert query --alertmanager.url=http://localhost:9090
---
Alertname   Starts At                Summary  State   
TestAlert   2023-07-07 07:23:55 UTC           active 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's add a test receiver and routing for it. Below is an example of the configuration we want to pass to Alertmanager in Helm values format.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;config:
  receivers:
    - name: default-receiver
    - name: test-team-receiver

  route:
    receiver: 'default-receiver'
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 4h
    routes:
      - receiver: 'test-team-receiver'
        matchers:
        - team="test-team"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I have converted the above into a JSON one-liner so we can pass it into Helm without having to create an intermediate file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm upgrade alertmanager prometheus-community/alertmanager --namespace alertmanager --set-json 'config.receivers=[{"name":"default-receiver"},{"name":"test-team-receiver"}]' --set-json 'config.route={"receiver":"default-receiver","group_wait":"30s","group_interval":"5m","repeat_interval":"4h","routes":[{"receiver":"test-team-receiver","matchers":["team=\"test-team\""]}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now use amtool to test that an alert that has the label &lt;code&gt;team=test-team&lt;/code&gt; gets routed to the test-team-receiver:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;amtool config routes test --alertmanager.url=http://localhost:9090 team=test-team
---
test-team-receiver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;amtool config routes test --alertmanager.url=http://localhost:9090 team=test     
---
default-receiver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have now set up an Alertmanager that routes alerts depending on the &lt;code&gt;team&lt;/code&gt; label value.&lt;/p&gt;

&lt;p&gt;Next, we need to update vmalert to route alerts to Alertmanager using the cluster-local address of the Alertmanager service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm upgrade vmalert vm/victoria-metrics-alert --namespace victoriametrics --reuse-values --set server.notifier.alertmanager.url="http://alertmanager.alertmanager.svc.cluster.local:9093"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can increment the &lt;code&gt;kube_pod_container_status_restarts_total&lt;/code&gt; metric by running a pod that keeps crashing: a deliberate typo in the sleep command makes the container exit immediately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl run crashpod --image busybox:latest --command -- slep 1d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we port-forward the Alertmanager service. We should see an alert there when we navigate to &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n alertmanager services/alertmanager 9090:9093
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3vu2jngob5p7evtrpeq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3vu2jngob5p7evtrpeq.png" alt="alertmanager" width="778" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have now set up Alertmanager as our tool for routing the alerts fired by the vmalert component.&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-handling-multiple-regions-25ib"&gt;Prometheus Observability Platform: Handling multiple regions&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>alertmanager</category>
      <category>observability</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Alerts</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:25:36 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-alerts-4dbb</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-alerts-4dbb</guid>
      <description>&lt;p&gt;With Prometheus, we can use PromQL to write alert rules and evaluate them using the given evaluation rules and intervals. Alerts have an evaluation period: if an alert is active for the duration of the evaluation period, then it will fire. Prometheus is usually bundled with a component called Alertmanager, which is used to route alerts to different receivers such as Slack and email. Once an alert fires, it is sent to the Alertmanager which uses a routing table to find out if the alert is to be sent to a receiver, and how to route it.&lt;/p&gt;

&lt;p&gt;Prometheus alerts are evaluated against the local storage. With VictoriaMetrics, we can use the vmalert component to evaluate alert rules against the VictoriaMetrics long-term storage using the same PromQL syntax as with Prometheus. It is tempting to write all the alerting rules in VictoriaMetrics, but depending on the size of the infrastructure we might want to evaluate some rules on the Prometheus servers where the data originates from, to avoid overloading VictoriaMetrics.&lt;/p&gt;

&lt;p&gt;Alert rules can be very complex, and it is best to validate them before deploying them to Prometheus. Promtool can be used to validate Prometheus alerting rules and run unit tests on them. You can implement these simple validation and unit testing steps in your continuous integration (CI) system.&lt;/p&gt;
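
&lt;p&gt;As a sketch of those CI steps (the file layout and the CI system are assumptions; GitHub Actions step syntax is used here), the jobs can be as simple as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical CI steps: validate syntax first, then run the unit tests
- name: Validate alert rules
  run: promtool check rules alerts/*.rules.yml
- name: Unit test alert rules
  run: promtool test rules alerts/*.test.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;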

&lt;p&gt;A good monitoring platform enables teams to write their own alerts against the metrics stored in the long-term storage. We can do this in a mono-repository or multi-repository fashion. With a mono-repository, we have all the infrastructure and the alerting defined in the same repository and pipelines delivering them to servers. A multi-repository approach would set up a separate repository for the alerts, where we define the alerting rules using PromQL, and add validation and unit tests.&lt;/p&gt;

&lt;p&gt;The main benefit of the multi-repository approach is reduced cognitive load. Contributors see, and need to be aware of, nothing but the alert rules. This also eliminates the possibility of introducing bugs into the underlying infrastructure. The downside of this approach is tying the separated alerting configuration back to the Prometheus server.&lt;/p&gt;

&lt;p&gt;Terraform can be used to consume the alerting repository as a remote module, pulling the alerting rules in when the Prometheus server is deployed. With a mono-repository, we can more easily tie the alerts to the Prometheus server, but if we are using Terraform then we need to either split the alerts into their own state or accept that contributors might affect more resources than just the alerts, which can also make contributing more stressful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;This example assumes that you have completed the following steps, as the components from those are needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj"&gt;Prometheus Observability Platform: Long-term storage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Promtool (&lt;a href="https://github.com/prometheus/prometheus/tree/main#building-from-source"&gt;https://github.com/prometheus/prometheus/tree/main#building-from-source&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;yq (optional)&lt;/li&gt;
&lt;li&gt;jq (optional)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Figuring out suitable metrics for alerts can be hard. The &lt;a href="https://samber.github.io/awesome-prometheus-alerts/"&gt;awesome-prometheus-alerts&lt;/a&gt; website is an excellent source for inspiration for this. It has a collection of pre-made alerts using the PromQL syntax. For example, we can set up an alert for crash-looping Kubernetes pods, with the alert named &lt;code&gt;KubernetesPodCrashLooping&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Below, we build an example unit test for the &lt;code&gt;KubernetesPodCrashLooping&lt;/code&gt; alert. First, we simplify the alert a little and add the blocks that promtool requires to validate the rule. This file is saved as &lt;code&gt;kube-alert.rules.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;groups:
  - name: kube-alerts
    rules:
    - alert: KubernetesPodCrashLooping
      expr: increase(kube_pod_container_status_restarts_total[5m]) &amp;gt; 2
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: Pod {{$labels.namespace}}/{{$labels.pod}} is crash looping
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can use the command &lt;code&gt;promtool check rules kube-alert.rules.yml&lt;/code&gt; to validate the rule. If everything is OK, the response looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;promtool check rules kube-alert.rules.yml
---
Checking kube-alert.rules.yml
  SUCCESS: 1 rules found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To write a unit test for this alert, we create a file called &lt;code&gt;kube-alert.test.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rule_files:
  - kube-alert.rules.yml

evaluation_interval: 1m

tests:
  - interval: 1m

    input_series:
      - series: kube_pod_container_status_restarts_total{namespace="test-namespace",pod="test-pod"}
        values: '1+2x15'

    alert_rule_test:
      - alertname: KubernetesPodCrashLooping
        eval_time: 15m
        exp_alerts:
          - exp_labels:
              severity: warning
              namespace: test-namespace
              pod: test-pod
            exp_annotations:
              summary: Pod test-namespace/test-pod is crash looping
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the rule expects the &lt;code&gt;kube_pod_container_status_restarts_total&lt;/code&gt; counter to increase by more than 2 within 5 minutes, and that condition must stay active for at least 10 minutes before the alert fires. In the expected alert, we look for the namespace and pod labels in the summary annotation, and for a severity label with the value “warning”.&lt;/p&gt;

&lt;p&gt;To write a test for this rule, we need to create an input series that triggers the rule and carries all the labels needed for the summary field. Because our evaluation time is 15 minutes and the interval is 1 minute, we need at least 15 samples in our series. The syntax &lt;code&gt;'1+2x15'&lt;/code&gt; starts at 1 and adds 2 to the previous value 15 times, producing the series 1, 3, 5, … 31. We also pass the required namespace and pod labels and write out the expected summary field.&lt;/p&gt;

&lt;p&gt;To run the unit test we use the command &lt;code&gt;promtool test rules kube-alert.test.yml&lt;/code&gt;, which returns the following response if all went well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;promtool test rules kube-alert.test.yml
---
Unit Testing:  kube-alert.test.yml
  SUCCESS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we need to deploy vmalert so that we can evaluate alert rules against the data in the long-term storage.&lt;/p&gt;

&lt;p&gt;First, we have to convert our alert rule into a format that works with Helm. The problem with promtool and Helm charts is that both expect to own the &lt;code&gt;groups:&lt;/code&gt; section: we need to remove it from the rules before passing them to Helm, but if we remove it from the file itself, promtool no longer works. There are multiple ways to handle this; for example, the Terraform &lt;code&gt;trimprefix()&lt;/code&gt; function can be used to strip the &lt;code&gt;groups:&lt;/code&gt; section from the alert rules. For this use case, we are going to use a monstrous one-liner that removes the &lt;code&gt;groups:&lt;/code&gt; section, converts the output to JSON, and flattens it onto a single line so we can pass it to Helm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat kube-alert.rules.yml | sed '/groups:/d' | yq -o=json | jq -c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us the following single-line JSON string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[{"name":"kube-alerts","rules":[{"alert":"KubernetesPodCrashLooping","expr":"increase(kube_pod_container_status_restarts_total[5m]) &amp;gt; 2","for":"10m","labels":{"severity":"warning"},"annotations":{"summary":"Pod {{$labels.namespace}}/{{$labels.pod}} is crash looping"}}]}]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can deploy the vmalert Helm chart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install vmalert vm/victoria-metrics-alert --namespace victoriametrics --set 'server.notifier.alertmanager.url=http://localhost:9093' --set 'server.datasource.url=http://vmcluster-victoria-metrics-cluster-vmselect:8481/select/0/prometheus' --set 'server.remote.write.url=http://vmcluster-victoria-metrics-cluster-vminsert:8480/insert/0/prometheus' --set 'server.remote.read.url=http://vmcluster-victoria-metrics-cluster-vmselect:8481/select/0/prometheus' --set-json 'server.config.alerts.groups=[{"name":"kube-alerts","rules":[{"alert":"KubernetesPodCrashLooping","expr":"increase(kube_pod_container_status_restarts_total[5m]) &amp;gt; 2","for":"10m","labels":{"severity":"warning"},"annotations":{"summary":"Pod {{$labels.namespace}}/{{$labels.pod}} is crash looping"}}]}]'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;server.notifier.alertmanager.url:&lt;/code&gt; A placeholder value for now, as the chart cannot be installed without providing one&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;server.datasource.url:&lt;/code&gt; Prometheus HTTP API compatible datasource&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;server.remote.write.url:&lt;/code&gt; Remote write url for storing rules and alert states&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;server.remote.read.url:&lt;/code&gt; URL to restore the alert states from&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can now port-forward the vmalert service and navigate to the web UI at &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1atpvxvxhmrfy192ikqj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1atpvxvxhmrfy192ikqj.png" alt="alert" width="783" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have now created an alert rule, written a unit test for it, and set up vmalert with the alert rule defined.&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-alert-routing-139o"&gt;Prometheus Observability Platform: Alert routing&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>alerting</category>
      <category>observability</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Long-term storage</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:25:05 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj</guid>
<description>&lt;p&gt;Since Prometheus is not designed for long-term persistence of data, a dedicated long-term storage solution is called for. Multiple products can handle long-term storage for Prometheus metrics, for example VictoriaMetrics, Grafana Mimir, Thanos, and M3.&lt;/p&gt;

&lt;p&gt;With some of these options, we get the capability to store the data in object storage, which is ideal for modern workloads running in Kubernetes, as we don’t want to store any persistent data inside our cluster. Object storage can be, for example, Azure Blob Storage or AWS S3. This option does, however, carry a performance penalty compared to block storage, so if you have high performance requirements, you might have to look into block storage options.&lt;/p&gt;

&lt;p&gt;In this document, we will be focusing on VictoriaMetrics. It was chosen because it is open-source, highly performant, and all its crucial components are free. VictoriaMetrics can only handle block storage, but it is also very fast due to its simple architecture designed only for local storage. It can be run in single mode or clustered. The central part of the architecture consists of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The vmstorage component, which stores the time series data;&lt;/li&gt;
&lt;li&gt;vmselect, used to fetch and merge data from vmstorage; and&lt;/li&gt;
&lt;li&gt;vminsert, which inserts the data into vmstorage nodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the clustered version, data is distributed evenly across the vmstorage nodes by the vminsert component, and the distributed data is then fetched and merged by the vmselect component. In Kubernetes, each of these components will have its own pod and the vmselect and vminsert components will have a service to load balance the traffic. All the vmstorage endpoints (pods) will be connected to the vminsert and vmselect pods.&lt;/p&gt;

&lt;p&gt;VictoriaMetrics also has multiple additional features, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the vmalert component, which can be used for alerting on the data;&lt;/li&gt;
&lt;li&gt;vmagent, which can be used as a data ingestion point and for filtering and re-labelling metrics; and&lt;/li&gt;
&lt;li&gt;the vmauth component for simple authentication, which uses credentials from the Authorization header. (You can also put some other component, such as oauth2-proxy, in front of vminsert or vmagent to handle authentication.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fju9mnto04ijmfyqba1fs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fju9mnto04ijmfyqba1fs.png" alt="Basic architecture" width="708" height="740"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can set up Prometheus to write the data it receives into the long-term storage using the &lt;code&gt;remote_write&lt;/code&gt; block in the configuration. If authentication is set up, it also needs to be defined in the &lt;code&gt;remote_write&lt;/code&gt; block.&lt;/p&gt;
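
&lt;p&gt;As an illustrative sketch (the URL and credential paths are placeholders), a &lt;code&gt;remote_write&lt;/code&gt; block with basic authentication could look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;remote_write:
  - url: http://vminsert.example.internal:8480/insert/0/prometheus/  # placeholder endpoint
    basic_auth:
      username: prometheus
      password_file: /etc/prometheus/remote-write-password  # placeholder path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;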

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;We will now set up the following architecture with minikube, Prometheus, and VictoriaMetrics. This example assumes that you have completed the steps from &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbi988mm3bbk13219fo2r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbi988mm3bbk13219fo2r.png" alt="Demo architecture" width="523" height="752"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First we add the VictoriaMetrics Helm chart repository and install the chart into the victoriametrics namespace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm repo add vm https://victoriametrics.github.io/helm-charts/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install vmcluster vm/victoria-metrics-cluster --create-namespace --namespace victoriametrics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We should now see six pods running in the victoriametrics namespace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pods -n victoriametrics
---
NAME                                                           READY   STATUS    RESTARTS   AGE
vmcluster-victoria-metrics-cluster-vminsert-f8d48695c-gqx25    1/1     Running   0          58s
vmcluster-victoria-metrics-cluster-vminsert-f8d48695c-t8kcn    1/1     Running   0          58s
vmcluster-victoria-metrics-cluster-vmselect-77465fb479-42wjs   1/1     Running   0          58s
vmcluster-victoria-metrics-cluster-vmselect-77465fb479-t2jhp   1/1     Running   0          58s
vmcluster-victoria-metrics-cluster-vmstorage-0                 1/1     Running   0          58s
vmcluster-victoria-metrics-cluster-vmstorage-1                 1/1     Running   0          58s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To access the vmselect web UI we need to port forward the vmselect service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n victoriametrics services/vmcluster-victoria-metrics-cluster-vmselect 9090:8481
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can then navigate to &lt;a href="http://localhost:9090/select/0/prometheus/vmui"&gt;http://localhost:9090/select/0/prometheus/vmui&lt;/a&gt; to access the vmselect web UI (VMUI). This is the clustered URL format, with the 0 representing the account ID of the tenant in the cluster.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwa28olty3pma1bdbrspz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwa28olty3pma1bdbrspz.png" alt="VMUI" width="780" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To set up remote writing from Prometheus into VictoriaMetrics, we need to update our kube-prometheus-stack deployment with the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack --namespace prometheus --reuse-values --set 'prometheus.prometheusSpec.remoteWrite[0].url=http://vmcluster-victoria-metrics-cluster-vminsert.victoriametrics.svc.cluster.local:8480/insert/0/prometheus/'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s break down the URL provided above. First, we have the service name for vminsert, which you can find with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get svc -n victoriametrics 
---
NAME                                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
vmcluster-victoria-metrics-cluster-vminsert    ClusterIP   10.100.63.152   &amp;lt;none&amp;gt;        8480/TCP                     3h32m
vmcluster-victoria-metrics-cluster-vmselect    ClusterIP   10.99.12.151    &amp;lt;none&amp;gt;        8481/TCP                     3h32m
vmcluster-victoria-metrics-cluster-vmstorage   ClusterIP   None            &amp;lt;none&amp;gt;        8482/TCP,8401/TCP,8400/TCP   3h32m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we have the namespace, victoriametrics. Since Prometheus and VictoriaMetrics live in different namespaces, we use the fully qualified service address ending in &lt;code&gt;svc.cluster.local&lt;/code&gt;. This is followed by the port number of the vminsert service and, finally, the Prometheus-compatible write endpoint with the account ID 0 in it, because we are using the clustered version of VictoriaMetrics.&lt;/p&gt;
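
&lt;p&gt;The general shape of the clustered vminsert remote write URL is therefore:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://&amp;lt;vminsert-service&amp;gt;.&amp;lt;namespace&amp;gt;.svc.cluster.local:&amp;lt;port&amp;gt;/insert/&amp;lt;accountid&amp;gt;/prometheus/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;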

&lt;p&gt;On the VMUI (&lt;a href="http://localhost:9090/select/0/prometheus/vmui"&gt;http://localhost:9090/select/0/prometheus/vmui&lt;/a&gt;) we can now verify that metrics are coming from Prometheus.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n victoriametrics services/vmcluster-victoria-metrics-cluster-vmselect 9090:8481
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Navigate to the Query section and insert &lt;code&gt;kube_pod_container_status_restarts_total&lt;/code&gt; into the Query field. You should now see approximately the same output as you did from Prometheus.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbozct054mgkx3ww10if.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbozct054mgkx3ww10if.png" alt="Metrics" width="774" height="806"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have now set up a simple Prometheus and VictoriaMetrics integration.&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-alerts-4dbb"&gt;Prometheus Observability Platform: Alerts&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Prometheus Observability Platform: Prometheus</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:24:21 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019</guid>
<description>&lt;p&gt;Prometheus is an open-source application used for monitoring and alerting. It uses a time series database (TSDB) to store the data fed into it. It is designed to scrape metrics from the targets’ HTTP endpoints using a pull model, but it can also be a push target for metrics when using a push gateway. It provides its own query language, PromQL, for querying the data stored in its TSDB.&lt;/p&gt;
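
&lt;p&gt;For example (the metric and label names here are hypothetical), a PromQL query for the per-second request rate over the last five minutes could look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rate(http_requests_total{job="api-server"}[5m])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;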

&lt;p&gt;The Prometheus server component can evaluate the data it holds and create alerts based on PromQL queries. These alerts are then sent to a component called Alertmanager, where you can set up routing for the alerts, for example to Slack or email.&lt;/p&gt;

&lt;p&gt;The default data retention period is 15 days, but it can be set as low as 2 hours and has no upper limit. The local storage that Prometheus uses cannot be clustered or replicated, which is why it is not advised to use Prometheus itself for long-term storage of metrics. Also, the local TSDB gets corrupted easily, so having the option to just drop it without losing metrics is desirable. This is why we want to use the remote write capability to forward the metrics into a more robust long-term storage solution.&lt;/p&gt;
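
&lt;p&gt;The retention period is controlled with a command-line flag on the Prometheus server; for example, to keep 30 days of data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=30d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;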

&lt;p&gt;In Kubernetes, we have kube-prometheus-stack. This is a Helm chart that contains the Prometheus components for Kubernetes. Prometheus uses a node exporter to scrape metrics from the nodes and a custom resource definition (CRD) called ServiceMonitor to scrape metrics from pods behind a service. There is also a CRD called PodMonitor if you don’t have a service in front of the pods.&lt;/p&gt;
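
&lt;p&gt;As a minimal sketch (the names, namespace, and port are hypothetical), a ServiceMonitor that scrapes a service labelled &lt;code&gt;app: my-app&lt;/code&gt; could look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: prometheus
spec:
  selector:
    matchLabels:
      app: my-app          # selects the service to scrape
  endpoints:
    - port: metrics        # name of the service port exposing /metrics
      interval: 30s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;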

&lt;p&gt;Prometheus has a concept of exporters: a collection of libraries and servers capable of exporting metrics from third-party systems as Prometheus metrics. The most used one is node exporter, which collects hardware and OS metrics exposed by Linux kernels.&lt;/p&gt;
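
&lt;p&gt;Exporters expose their metrics as plain text over HTTP in the Prometheus exposition format; node exporter output, for example, contains lines such as these (the values are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 31258.42
node_cpu_seconds_total{cpu="0",mode="user"} 842.13
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;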

&lt;p&gt;In Prometheus, we can use the &lt;code&gt;remote_write&lt;/code&gt; block to forward data to another source that accepts Prometheus metrics. If we want to chain remote writing from one Prometheus to another, we need to enable a feature flag for the remote write receiver on the receiving server. Remote writing supports multiple authentication methods, such as OAuth2, for when the long-term storage requires authentication.&lt;/p&gt;
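
&lt;p&gt;To let a Prometheus server act as a remote write target, start the receiving server with the corresponding feature flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prometheus --config.file=prometheus.yml --enable-feature=remote-write-receiver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;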

&lt;p&gt;With Prometheus, we want a Prometheus server as close as possible to the physical servers, so that we get the lowest networking latency between the targets and the Prometheus server. In the case of data centres, we can set up a Prometheus server in each data centre zone and have it pull metrics from the targets in that zone, or act as a &lt;code&gt;remote_write&lt;/code&gt; target for workloads that push metrics through telemetry agents such as OpenTelemetry. This leads to multiple Prometheus servers across data centre regions and zones, so your applications need to be aware of where they are located and which Prometheus instance they are supposed to connect to.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;To test out Prometheus we can use minikube to run a local Kubernetes cluster and then the kube-prometheus-stack Helm chart to install Prometheus into the minikube cluster. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Helm (&lt;a href="https://helm.sh/docs/intro/install/"&gt;https://helm.sh/docs/intro/install/&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Docker (&lt;a href="https://docs.docker.com/engine/install/"&gt;https://docs.docker.com/engine/install/&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Minikube (&lt;a href="https://minikube.sigs.k8s.io/docs/start/"&gt;https://minikube.sigs.k8s.io/docs/start/&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Kubectl (&lt;a href="https://kubernetes.io/docs/tasks/tools/"&gt;https://kubernetes.io/docs/tasks/tools/&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;First, we need to start our minikube cluster. I pinned the Kubernetes version to v1.26.3 here, as this is the version these examples have been tested with.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;minikube start --kubernetes-version=v1.26.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can validate that you have a connection to your minikube cluster by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl cluster-info
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we want to add the kube-prometheus-stack Helm chart repository and install the chart into the prometheus namespace.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack --create-namespace --namespace prometheus --set grafana.enabled=false --set alertmanager.enabled=false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that we are disabling Grafana and Alertmanager for now, as we will be installing them manually in the coming parts.&lt;/p&gt;

&lt;p&gt;To test that Prometheus is operational, we can port forward the Prometheus service locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n prometheus services/kube-prometheus-stack-prometheus 9090:9090
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now navigate to &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt; to access the Prometheus web UI. Notice that some of the targets are giving errors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fov1kayshm55un2ndv76f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fov1kayshm55un2ndv76f.png" alt="prometheus-ui" width="779" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have now set up a simple kube-prometheus-stack onto our cluster and are ready to tackle the next steps.&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj"&gt;Prometheus Observability Platform: Long-term storage&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>prometheus</category>
      <category>observability</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Platform</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:24:13 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-platform-5d8p</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-platform-5d8p</guid>
      <description>&lt;h1&gt;
  
  
  What is a platform?
&lt;/h1&gt;

&lt;p&gt;In the modern world of IT, a platform is considered to be a set of shared resources utilised by multiple people, such as teams in an organisation. A platform usually consists of a shared codebase that teams can either take into use in a self-service fashion, or that is automatically included in their workflow.&lt;/p&gt;

&lt;p&gt;A while ago, the phrase “you build it, you run it” created a buzz and became a big thing. It meant that each team in the organisation was capable of building their whole application stack, including the infrastructure components, and also had the responsibility to maintain that infrastructure. There are benefits to this approach; the biggest one is the fast iteration rate of new ideas and technologies, enabled by all the responsibility sitting inside the team rather than being tied to a bigger technology stack.&lt;/p&gt;

&lt;p&gt;One of the biggest downsides of this approach is that teams become technology and knowledge silos. Every team can have different technologies running and the information is often siloed into that team only. This can lead to a situation where multiple teams are using multiple different technology stacks, and the wheel gets reinvented multiple times, due to a lack of proper knowledge sharing practices. Since the technology stacks can be so different between the teams, getting new team members to replace old ones can also be very difficult.&lt;/p&gt;

&lt;p&gt;Nowadays the full stack of an application is very complex. We need specialists in full-stack application development, infrastructure specialists for cloud or data centre environments, network specialists to fully understand the networking stack and architecture, and security specialists to keep everything secure. Getting all this expertise into a single team can be very challenging. To partly remedy this, we can look to platform engineering, where we offload parts of the complexity to units specialised in them.&lt;/p&gt;

&lt;p&gt;With platform engineering, we create a platform that each team can build upon. We can, for example, initially have the networking people, the infrastructure people, and the security people collaborate to create a shared codebase that embodies industry best practices and the special requirements of the company. This shared codebase can be, for example, a self-service platform for Kubernetes clusters (AKS, EKS, or GKE) with all the bells and whistles included, such as cert-manager, external-dns, an ingress controller, etc. It can be a single Kubernetes cluster where we separate teams with namespaces and provide the teams with ways of deploying their workloads into the cluster. Or it can be a bigger platform codebase from which teams can deploy modules of components that conform to security standards and best practices (e.g. network integration in private endpoint form, firewalls enabled with preset whitelists for VPN endpoints).&lt;/p&gt;

&lt;p&gt;This drastically reduces the cognitive load on teams developing their applications. There are some downsides to this approach due to the increased reliance on the shared codebase. Updates to the code are no longer done by just a single team: you have to communicate the updates and the steps required to perform them. For example, if you upgrade the Kubernetes codebase to a new major version of the cluster, you need to communicate this and write instructions for performing the upgrade and dealing with any deprecated resources. You will probably end up versioning your shared codebase (depending on the size of the organisation), as requiring all teams to track the main branch at all times can cause too much overhead for the teams.&lt;/p&gt;

&lt;p&gt;The cloud moves very fast and new technologies are constantly being developed. With the platform approach, infrastructure teams can focus on shared practices and keep evolving them. Security people can focus on building shared security tooling, such as providing teams with ways to use SonarQube. The networking team can focus on creating a shared networking infrastructure whose components can easily be used across teams, with DNS resolution that works across multiple clouds and on-premises data centres. The platform team can focus on listening to the needs of the development teams, creating shared code for the most commonly used components, and helping teams get past the hurdles that the cloud imposes (such as migrating from one PostgreSQL version to another).&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>prometheus</category>
      <category>platform</category>
      <category>observability</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Intro</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:24:00 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-intro-4em6</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-intro-4em6</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frjx0g5abkyd35s64rdp5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frjx0g5abkyd35s64rdp5.png" alt="architecture diagram" width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When a company reaches a certain size and complexity, it becomes hard to track all the metrics that the applications are generating. We can end up with teams in the company running their own observability tooling, or multiple sets of stand-alone Prometheus servers which are handled as multiple data sources in Grafana.&lt;/p&gt;

&lt;p&gt;The observability platform for metrics with Prometheus (later referred to as metrics platform) is a way for all the teams and products in the company to utilise the same observability tooling for metrics-based telemetry. In short, this means that every team will send metrics to the same long-term Prometheus storage, use the same data source in Grafana when creating dashboards, and be able to set up alerts from these metrics using either Grafana alerts or Prometheus native alerts.&lt;/p&gt;

&lt;p&gt;We want all our Prometheus servers to write their data into a long-term storage solution. If the architecture consists of multiple Kubernetes clusters, we want every cluster to have its own prometheus-operator installed and configured to ship its metrics to that storage via Prometheus remote write.&lt;/p&gt;
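
&lt;p&gt;As a minimal sketch, the per-cluster remote write setup could look roughly like the following kube-prometheus-stack Helm values (the storage endpoint URL and the &lt;code&gt;cluster&lt;/code&gt; label value are hypothetical placeholders, to be replaced with your own):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# values.yaml (kube-prometheus-stack) -- a sketch, adjust to your setup
prometheus:
  prometheusSpec:
    # Label every sample with the cluster it came from,
    # so clusters can be told apart in long-term storage
    externalLabels:
      cluster: my-cluster-eu-west   # hypothetical
    remoteWrite:
      # vminsert endpoint of a VictoriaMetrics cluster (hypothetical URL)
      - url: http://vminsert.monitoring.svc:8480/insert/0/prometheus/api/v1/write
&lt;/code&gt;&lt;/pre&gt;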

&lt;p&gt;With this centralisation, we can use a single data source to access all the metrics from all the infrastructure connected to the metrics platform. This enables creating dashboards that aggregate data from multiple Kubernetes clusters in a single panel, and allows drilling down to a single resource from the dashboard.&lt;/p&gt;

&lt;p&gt;This series of posts will be a deep dive into the concept of a metrics platform running on Kubernetes, consisting of the following parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://prometheus.io/"&gt;Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://victoriametrics.com/"&gt;VictoriaMetrics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/jacksontj/promxy"&gt;Promxy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://grafana.com/"&gt;Grafana&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first part of this series is a look at what a platform is. From here we will continue with setting Prometheus up on our minikube cluster, and leveraging VictoriaMetrics as our long-term storage system. We will set up alerts using Prometheus alerting syntax and use &lt;a href="https://prometheus.io/docs/prometheus/latest/command-line/promtool/"&gt;promtool&lt;/a&gt; to run unit tests on them. We will then continue setting up &lt;a href="https://docs.victoriametrics.com/vmalert.html"&gt;vmalert&lt;/a&gt; as our alert handling component and send alerts to &lt;a href="https://prometheus.io/docs/alerting/latest/alertmanager/"&gt;Alertmanager&lt;/a&gt;. Then we will use Promxy to handle situations involving multiple Kubernetes clusters in multiple regions. We will set up a custom app in our cluster and use Prometheus ServiceMonitor to pick up its metrics. Lastly, we will set up Grafana to use a single data source to access all the metrics from our whole platform.&lt;/p&gt;
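
&lt;p&gt;To give a taste of the promtool unit testing covered later in the series, a test file could look roughly like this sketch (the rule file name, alert name, and label values are hypothetical):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# alerts-test.yaml -- run with: promtool test rules alerts-test.yaml
rule_files:
  - alerts.yaml          # assumed to contain an InstanceDown alert: up == 0 for 2m
tests:
  - interval: 1m
    input_series:
      # Target is up for two minutes, then goes down
      - series: 'up{job="node", instance="host:9100"}'
        values: '1 1 0 0 0 0'
    alert_rule_test:
      # After the "for" duration has passed, the alert should be firing
      - eval_time: 5m
        alertname: InstanceDown
        exp_alerts:
          - exp_labels:
              job: node
              instance: host:9100
&lt;/code&gt;&lt;/pre&gt;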

&lt;p&gt;Links to each part of this series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-platform-5d8p"&gt;Prometheus Observability Platform: Platform&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj"&gt;Prometheus Observability Platform: Long-term storage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-alerts-4dbb"&gt;Prometheus Observability Platform: Alerts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-alert-routing-139o"&gt;Prometheus Observability Platform: Alert routing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-handling-multiple-regions-25ib"&gt;Prometheus Observability Platform: Handling multiple regions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-application-metrics-2024"&gt;Prometheus Observability Platform: Application metrics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-grafana-40d3"&gt;Prometheus Observability Platform: Grafana&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devops</category>
      <category>observability</category>
      <category>metrics</category>
      <category>platform</category>
    </item>
  </channel>
</rss>
