<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alessandro Vozza</title>
    <description>The latest articles on DEV Community by Alessandro Vozza (@ams0).</description>
    <link>https://dev.to/ams0</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F20554%2F9a8c830c-ebbc-40b9-b813-d57b71b0d307.jpg</url>
      <title>DEV Community: Alessandro Vozza</title>
      <link>https://dev.to/ams0</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ams0"/>
    <language>en</language>
    <item>
      <title>Stateless, Secretless Multi-cluster Monitoring in Azure Kubernetes Service with Thanos, Prometheus and Azure Managed Grafana</title>
      <dc:creator>Alessandro Vozza</dc:creator>
      <pubDate>Mon, 25 Jul 2022 09:15:52 +0000</pubDate>
      <link>https://dev.to/ams0/stateless-secretless-multi-cluster-monitoring-in-azure-kubernetes-service-with-thanos-prometheus-and-azure-managed-grafana-37jg</link>
      <guid>https://dev.to/ams0/stateless-secretless-multi-cluster-monitoring-in-azure-kubernetes-service-with-thanos-prometheus-and-azure-managed-grafana-37jg</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Observability is paramount in every distributed system, and it is becoming increasingly complicated in a cloud native world where we might deploy multiple ephemeral clusters and want to keep their metrics beyond their lifespan.&lt;/p&gt;

&lt;p&gt;This article is aimed at cloud native engineers who face the challenge of observing multiple Azure Kubernetes Service (AKS) clusters and need a flexible, stateless solution: one that leverages readily available, cost-effective blob storage for long-term retention of metrics, and one that does not require injecting static secrets to access the storage (it leverages the native Azure Managed Identities associated with the cluster).&lt;/p&gt;

&lt;p&gt;This solution builds upon well-established Cloud Native Computing Foundation (&lt;a href="https://cncf.io" rel="noopener noreferrer"&gt;CNCF&lt;/a&gt;) open source projects like &lt;a href="https://thanos.io" rel="noopener noreferrer"&gt;Thanos&lt;/a&gt; and &lt;a href="https://prometheus.io" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt;, together with a new managed service, Azure Managed Grafana, &lt;a href="https://azure.microsoft.com/en-us/blog/enhance-your-data-visualizations-with-azure-managed-grafana-now-in-preview/" rel="noopener noreferrer"&gt;recently released in public preview&lt;/a&gt;. It allows ephemeral clusters to still report up-to-date metrics without the two-hour local metrics buffer of the classic Thanos-sidecar-alongside-Prometheus deployment.&lt;/p&gt;

&lt;p&gt;This article was inspired by several sources, most importantly these two articles: &lt;a href="https://techcommunity.microsoft.com/t5/apps-on-azure-blog/using-azure-kubernetes-service-with-grafana-and-prometheus/ba-p/3020459" rel="noopener noreferrer"&gt;Using Azure Kubernetes Service with Grafana and Prometheus&lt;/a&gt; and &lt;a href="https://techcommunity.microsoft.com/t5/apps-on-azure-blog/store-prometheus-metrics-with-thanos-azure-storage-and-azure/ba-p/3067849" rel="noopener noreferrer"&gt;Store Prometheus Metrics with Thanos, Azure Storage and Azure Kubernetes Service&lt;/a&gt; on the &lt;a href="https://techcommunity.microsoft.com" rel="noopener noreferrer"&gt;Microsoft Tech Community blog&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A 1.23 or 1.24 AKS cluster with either a user-assigned managed identity as the kubelet identity or a system-assigned identity&lt;/li&gt;
&lt;li&gt;Ability to assign roles on Azure resources (User Access Administrator role)&lt;/li&gt;
&lt;li&gt;A storage account&lt;/li&gt;
&lt;li&gt;(Recommended) A public DNS zone in Azure&lt;/li&gt;
&lt;li&gt;Azure CLI&lt;/li&gt;
&lt;li&gt;Helm CLI&lt;/li&gt;
&lt;/ul&gt;
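&lt;p&gt;If you still need the storage account and container, a minimal Azure CLI sketch could look like this (the resource group, account and container names below are placeholders; storage account names must be globally unique):&lt;br&gt;
&lt;/p&gt;

```shell
# Placeholder names -- replace with your own.
RG=monitoring-rg
ACCOUNT=thanostore
CONTAINER=thanostore

az group create --name "$RG" --location westeurope
az storage account create --name "$ACCOUNT" --resource-group "$RG" --sku Standard_LRS
az storage container create --name "$CONTAINER" --account-name "$ACCOUNT" --auth-mode login
```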

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;

&lt;p&gt;We will deploy all components of Thanos and Prometheus in a single cluster, but since they are coupled only via the ingress, they don't need to be co-located.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmanl414h5bozfe0365z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmanl414h5bozfe0365z.png" alt="Diagram" width="800" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Cluster-wide services
&lt;/h3&gt;

&lt;p&gt;For the Thanos Receive and Query components to be reachable from outside the cluster and secured with TLS, we will need &lt;a href="https://github.com/kubernetes/ingress-nginx" rel="noopener noreferrer"&gt;ingress-nginx&lt;/a&gt; and &lt;a href="https://cert-manager.io/" rel="noopener noreferrer"&gt;cert-manager&lt;/a&gt;. For ingress, deploy the Helm chart using the following command, which accounts for this &lt;a href="https://github.com/Azure/AKS/issues/2955" rel="noopener noreferrer"&gt;issue&lt;/a&gt; with AKS clusters &amp;gt;1.23:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm upgrade &lt;span class="nt"&gt;--install&lt;/span&gt; ingress-nginx ingress-nginx &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--repo&lt;/span&gt; https://kubernetes.github.io/ingress-nginx &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; controller.service.annotations.&lt;span class="s2"&gt;"service&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;beta&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;kubernetes&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;io/azure-load-balancer-health-probe-request-path"&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/healthz &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; controller.service.externalTrafficPolicy&lt;span class="o"&gt;=&lt;/span&gt;Local &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; ingress-nginx &lt;span class="nt"&gt;--create-namespace&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the extra annotations and the &lt;code&gt;externalTrafficPolicy&lt;/code&gt; set to &lt;code&gt;Local&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Next, we need &lt;code&gt;cert-manager&lt;/code&gt; to automatically provision SSL certificates from Let's Encrypt; we will just need a valid email address for the ClusterIssuer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm upgrade &lt;span class="nt"&gt;-i&lt;/span&gt; cert-manager &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; cert-manager &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;installCRDs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; ingressShim.defaultIssuerName&lt;span class="o"&gt;=&lt;/span&gt;letsencrypt-prod &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; ingressShim.defaultIssuerKind&lt;span class="o"&gt;=&lt;/span&gt;ClusterIssuer &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--repo&lt;/span&gt; https://charts.jetstack.io cert-manager

kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; - &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: email@email.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Last but not least, we will add a DNS record for our ingress load balancer IP, so we can seamlessly get public FQDNs for the Thanos Receive and Thanos Query endpoints.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;az network dns record-set a add-record  &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"*.thanos"&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; dns &lt;span class="nt"&gt;-z&lt;/span&gt; cookingwithazure.com &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--ipv4-address&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;kubectl get svc ingress-nginx-controller &lt;span class="nt"&gt;-n&lt;/span&gt; ingress-nginx &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"{.status.loadBalancer.ingress[0].ip}"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note how we use &lt;code&gt;kubectl&lt;/code&gt; with &lt;code&gt;jsonpath&lt;/code&gt; output to get the ingress public IP. We can now leverage the wildcard FQDN &lt;code&gt;*.thanos.cookingwithazure.com&lt;/code&gt; in our ingresses, and cert-manager will be able to obtain the corresponding certificate seamlessly.&lt;/p&gt;
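&lt;p&gt;A quick way to confirm the wildcard record is in place (using the example zone above; any label under &lt;code&gt;*.thanos&lt;/code&gt; should resolve to the ingress IP):&lt;br&gt;
&lt;/p&gt;

```shell
# Both commands should print the same public IP.
dig +short query.thanos.cookingwithazure.com
kubectl get svc ingress-nginx-controller -n ingress-nginx \
  -o jsonpath="{.status.loadBalancer.ingress[0].ip}"
```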

&lt;h3&gt;
  
  
  Storage account preparation
&lt;/h3&gt;

&lt;p&gt;Because we do not want to store any secret or service principal in-cluster, we will leverage the Managed Identities assigned to the cluster and assign the relevant Azure Roles to the storage account.&lt;/p&gt;

&lt;p&gt;Once you have created or identified the storage account to use, and created a container within it to store the Thanos metrics, assign the roles using the &lt;code&gt;az&lt;/code&gt; CLI. First, determine the client ID of the managed identity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;clientid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az aks show &lt;span class="nt"&gt;-g&lt;/span&gt; &amp;lt;rg&amp;gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &amp;lt;cluster_name&amp;gt; &lt;span class="nt"&gt;-o&lt;/span&gt; json &lt;span class="nt"&gt;--query&lt;/span&gt; identityProfile.kubeletidentity.clientId&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, assign the &lt;code&gt;Reader and Data Access&lt;/code&gt; role on the &lt;strong&gt;storage account&lt;/strong&gt; (needed so the cloud controller can generate access keys for the containers) and the &lt;code&gt;Storage Blob Data Contributor&lt;/code&gt; role &lt;strong&gt;on the container only&lt;/strong&gt; (there's no need to grant this permission at the storage-account level, because that would enable writing to &lt;em&gt;every&lt;/em&gt; container, which we don't need. Always remember to apply the &lt;a href="https://www.cisa.gov/uscert/bsi/articles/knowledge/principles/least-privilege" rel="noopener noreferrer"&gt;principle of least privilege&lt;/a&gt;!)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;az role assignment create &lt;span class="nt"&gt;--role&lt;/span&gt; &lt;span class="s2"&gt;"Reader and data access"&lt;/span&gt; &lt;span class="nt"&gt;--assignee&lt;/span&gt; &lt;span class="nv"&gt;$clientid&lt;/span&gt; &lt;span class="nt"&gt;--scope&lt;/span&gt; /subscriptions/&amp;lt;subID&amp;gt;/resourceGroups/&amp;lt;rg&amp;gt;/providers/Microsoft.Storage/storageAccounts/&amp;lt;account_name&amp;gt;

az role assignment create &lt;span class="nt"&gt;--role&lt;/span&gt; &lt;span class="s2"&gt;"Storage Blob Data Contributor"&lt;/span&gt; &lt;span class="nt"&gt;--assignee&lt;/span&gt; &lt;span class="nv"&gt;$clientid&lt;/span&gt; &lt;span class="nt"&gt;--scope&lt;/span&gt; /subscriptions/&amp;lt;subID&amp;gt;/resourceGroups/&amp;lt;rg&amp;gt;/providers/Microsoft.Storage/storageAccounts/&amp;lt;account_name&amp;gt;/containers/&amp;lt;container_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
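&lt;p&gt;You can double-check that both assignments landed on the intended scopes:&lt;br&gt;
&lt;/p&gt;

```shell
# List the roles held by the kubelet identity at every scope.
az role assignment list --assignee "$clientid" --all -o table \
  --query "[].{role:roleDefinitionName, scope:scope}"
```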



&lt;h3&gt;
  
  
  Create basic auth credentials
&lt;/h3&gt;

&lt;p&gt;Ok, we cheated a little in the title: you &lt;strong&gt;do&lt;/strong&gt; need at least one credential for this setup, the one used by Azure Managed Grafana to access the Prometheus API exposed by Thanos. We will use the same credentials (feel free to generate different ones) to push metrics from Prometheus to Thanos using &lt;code&gt;remote-write&lt;/code&gt; via the ingress controller. You'll need a strong password stored in a local file called &lt;code&gt;pass&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;htpasswd &lt;span class="nt"&gt;-c&lt;/span&gt;  &lt;span class="nt"&gt;-i&lt;/span&gt; auth thanos &amp;lt; pass

&lt;span class="c"&gt;#Create the namespaces&lt;/span&gt;
kubectl create ns thanos
kubectl create ns prometheus

&lt;span class="c"&gt;#for Thanos Query and Receive&lt;/span&gt;
kubectl create secret generic &lt;span class="nt"&gt;-n&lt;/span&gt; thanos basic-auth &lt;span class="nt"&gt;--from-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;auth

&lt;span class="c"&gt;#for Prometheus remote write&lt;/span&gt;
kubectl create secret generic &lt;span class="nt"&gt;-n&lt;/span&gt; prometheus remotewrite-secret &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--from-literal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;thanos &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--from-literal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;pass&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We now have the secrets in place for the ingresses and for deploying Prometheus.&lt;/p&gt;
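&lt;p&gt;A quick sanity check that both secrets exist in the namespaces where we created them:&lt;br&gt;
&lt;/p&gt;

```shell
# Both commands should return the secret, not "NotFound".
kubectl get secret basic-auth -n thanos
kubectl get secret remotewrite-secret -n prometheus
```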

&lt;h3&gt;
  
  
  Deploying Thanos
&lt;/h3&gt;

&lt;p&gt;We will use the &lt;a href="https://github.com/bitnami/charts/tree/master/bitnami/thanos/" rel="noopener noreferrer"&gt;Bitnami chart&lt;/a&gt; to deploy the Thanos components we need.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm upgrade &lt;span class="nt"&gt;-i&lt;/span&gt; thanos &lt;span class="nt"&gt;-n&lt;/span&gt; monitoring &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="nt"&gt;--values&lt;/span&gt; thanos-values.yaml bitnami/thanos
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's go through the relevant sections of the &lt;a href="https://github.com/ams0/ams0/blob/main/blog/dev.to/posts/stateless-monitoring-with-aks-thanos-prometheus-grafana/assets/files/thanos-values.yaml"&gt;values file&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;objstoreConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|-&lt;/span&gt;
  &lt;span class="s"&gt;type: AZURE&lt;/span&gt;
  &lt;span class="s"&gt;config:&lt;/span&gt;
    &lt;span class="s"&gt;storage_account: "thanostore"&lt;/span&gt;
    &lt;span class="s"&gt;container: "thanostore"&lt;/span&gt;
    &lt;span class="s"&gt;endpoint: "blob.core.windows.net"&lt;/span&gt;
    &lt;span class="s"&gt;max_retries: 0&lt;/span&gt;
    &lt;span class="s"&gt;user_assigned_id: "5c424851-e907-4cb0-acb5-3ea42fc56082"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(Replace &lt;code&gt;user_assigned_id&lt;/code&gt; with the client ID of your kubelet identity, as retrieved above; for more information about AKS identities, check out &lt;a href="https://docs.microsoft.com/en-us/azure/aks/use-managed-identity#use-a-pre-created-kubelet-managed-identity" rel="noopener noreferrer"&gt;this article&lt;/a&gt;.) This section instructs the Thanos Store Gateway and Compactor to use an Azure Blob store, and to use the kubelet identity to access it. Next, we enable the ruler and the query components:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;ruler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;span class="na"&gt;queryFrontend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also enable autoscaling for the stateless query components (&lt;code&gt;query&lt;/code&gt; and &lt;code&gt;query-frontend&lt;/code&gt;; the latter helps aggregate read queries), and we enable simple authentication for the Query Frontend service using &lt;code&gt;ingress-nginx&lt;/code&gt; annotations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;queryFrontend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="nn"&gt;...&lt;/span&gt;
  &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cert-manager.io/cluster-issuer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;letsencrypt-prod&lt;/span&gt;
      &lt;span class="na"&gt;nginx.ingress.kubernetes.io/auth-type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;basic&lt;/span&gt;
      &lt;span class="na"&gt;nginx.ingress.kubernetes.io/auth-secret&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;basic-auth&lt;/span&gt;
      &lt;span class="na"&gt;nginx.ingress.kubernetes.io/auth-realm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Authentication&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Required&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;thanos'&lt;/span&gt;
    &lt;span class="na"&gt;hostname&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;query.thanos.cookingwithazure.com&lt;/span&gt;
    &lt;span class="na"&gt;ingressClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
    &lt;span class="na"&gt;tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The annotation references the &lt;code&gt;basic-auth&lt;/code&gt; secret we created before from the &lt;code&gt;htpasswd&lt;/code&gt; credentials. Note that the same annotations are also under the &lt;code&gt;receive&lt;/code&gt; section, as we're using the exact same secret for pushing metrics &lt;em&gt;into&lt;/em&gt; Thanos (although with a different &lt;code&gt;hostname&lt;/code&gt;).&lt;/p&gt;
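&lt;p&gt;Once the chart is deployed, it's worth verifying that cert-manager issued the certificates and that basic auth is enforced on the ingress (the hostname below is from the example above):&lt;br&gt;
&lt;/p&gt;

```shell
# Certificates should report READY=True once Let's Encrypt has issued them.
kubectl get certificate -A

# Without credentials the ingress should answer 401...
curl -s -o /dev/null -w "%{http_code}\n" https://query.thanos.cookingwithazure.com
# ...and 200 with the htpasswd credentials.
curl -s -o /dev/null -w "%{http_code}\n" \
  -u thanos:"$(cat pass)" https://query.thanos.cookingwithazure.com
```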

&lt;h3&gt;
  
  
  Prometheus remote-write
&lt;/h3&gt;

&lt;p&gt;Until full support for Agent mode lands in the Prometheus operator (follow this &lt;a href="https://github.com/prometheus-community/helm-charts/issues/1519" rel="noopener noreferrer"&gt;issue&lt;/a&gt;), we can use the &lt;a href="https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage" rel="noopener noreferrer"&gt;remote write feature&lt;/a&gt; to ship every metric to a remote endpoint as soon as it is scraped, in our case the Thanos Receive ingress. Let's start by deploying Prometheus using the &lt;a href="https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack" rel="noopener noreferrer"&gt;kube-prometheus-stack helm chart&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm  upgrade &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; prometheus promremotewrite &lt;span class="nt"&gt;-f&lt;/span&gt; prom-remotewrite.yaml prometheus-community/kube-prometheus-stack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's go through the &lt;a href="https://github.com/ams0/ams0/blob/main/blog/dev.to/posts/stateless-monitoring-with-aks-thanos-prometheus-grafana/assets/files/prometheus-values.yaml"&gt;values file&lt;/a&gt; to explain the options needed to enable remote-write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;prometheus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;prometheusSpec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;externalLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;datacenter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;westeu&lt;/span&gt;
      &lt;span class="na"&gt;cluster&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;playground&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This enables Prometheus and attaches two extra labels to every metric, making it easier to filter data coming from multiple sources/clusters later in Grafana.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;    &lt;span class="na"&gt;remoteWrite&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://receive.thanos.cookingwithazure.com/api/v1/receive"&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Thanos&lt;/span&gt;
      &lt;span class="na"&gt;basicAuth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;remotewrite-secret&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;user&lt;/span&gt;
        &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;remotewrite-secret&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;password&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This section points Prometheus to the remote endpoint (secured via TLS using Let's Encrypt certificates, and thus trusted by the certificate store on the AKS nodes; if you use a non-trusted certificate, refer to the &lt;a href="https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#tlsconfig" rel="noopener noreferrer"&gt;TLSConfig&lt;/a&gt; section of the PrometheusSpec API). Note how the credentials for the remote endpoint come from the secret created beforehand in the &lt;code&gt;prometheus&lt;/code&gt; namespace.&lt;/p&gt;

&lt;p&gt;Note that although Prometheus is deployed in the same cluster as Thanos for simplicity, it sends its metrics to the ingress FQDN; it is therefore trivial to extend this setup to multiple remote clusters and collect their metrics in a single, centralized Thanos Receive (and a single blob store), with all metrics correctly labeled and identifiable.&lt;/p&gt;
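&lt;p&gt;To confirm metrics are flowing end to end, you can query the Query Frontend through its ingress using the standard Prometheus HTTP API (hostname and credentials as configured above; &lt;code&gt;jq&lt;/code&gt; is assumed to be installed):&lt;br&gt;
&lt;/p&gt;

```shell
# Ask for the "up" series; the external labels (datacenter, cluster)
# configured in prometheusSpec should be present on the result.
curl -s -u thanos:"$(cat pass)" \
  "https://query.thanos.cookingwithazure.com/api/v1/query?query=up" \
  | jq '.data.result[0].metric'
```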

&lt;h3&gt;
  
  
  Observing the stack with Azure Managed Grafana
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://azure.microsoft.com/en-us/services/managed-grafana/" rel="noopener noreferrer"&gt;Azure Managed Grafana&lt;/a&gt;(AME) is a new offering in the toolset of observability tools in Azure, and it's based on the popular open source dashboarding system &lt;a href="https://grafana.com" rel="noopener noreferrer"&gt;Grafana&lt;/a&gt;. Beside out of the box integration with Azure, AME is a fully functional Grafana deployment that can be used to monitor and graph different sources, including Thanos and Prometheus. To start, head to the Azure Portal and deploy AME; then, get the endpoint from the Overview tab and connect to your AME instance.&lt;/p&gt;

&lt;p&gt;Add a new data source of type Prometheus with basic authentication (the same credentials we created before):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdlzfv7k5im58qg1kpty.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdlzfv7k5im58qg1kpty.png" alt="Datasource" width="632" height="826"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Congratulations! Data is now flowing from Prometheus; we just need a dashboard to display it properly. In the left-side navigation bar, go to Dashboards -&amp;gt; Browse and click Import; import the "Kubernetes / Views / Global" dashboard (ID: 15757) into your Grafana and you'll be able to see the metrics from the cluster:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjenot2v7cok5vxwu5ud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjenot2v7cok5vxwu5ud.png" alt="Dashboard" width="800" height="559"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The imported dashboard has no filter for cluster or region, so it shows metrics from all clusters aggregated. In a future post we will show how to add a variable to a Grafana dashboard to properly select and filter cluster views.&lt;/p&gt;

&lt;h3&gt;
  
  
  Future work
&lt;/h3&gt;

&lt;p&gt;This setup allows the receiver and query frontend to autoscale, as horizontal pod autoscalers are deployed alongside the Thanos components. For even greater scalability and metrics isolation, Thanos can be deployed multiple times (each instance associated with a different storage account as needed), each with its own ingress, separating the metrics at the source; the instances then appear as separate data sources in Grafana and can be displayed in the same dashboard by selecting the appropriate source for each graph and query.&lt;/p&gt;

</description>
      <category>grafana</category>
      <category>thanos</category>
      <category>aks</category>
      <category>observability</category>
    </item>
    <item>
      <title>Managing multiple clusters with ArgoCD in Azure/k3s secured w/ Traefik&amp;Let’s Encrypt</title>
      <dc:creator>Alessandro Vozza</dc:creator>
      <pubDate>Fri, 16 Oct 2020 15:51:03 +0000</pubDate>
      <link>https://dev.to/ams0/managing-multiple-clusters-with-argocd-in-azure-k3s-secured-w-traefik-let-s-encrypt-2h8</link>
      <guid>https://dev.to/ams0/managing-multiple-clusters-with-argocd-in-azure-k3s-secured-w-traefik-let-s-encrypt-2h8</guid>
      <description>&lt;p&gt;&lt;em&gt;Easily deploy ArgoCD in Azure to manage multiple clusters in a GitOps way.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Code available at &lt;a href="https://github.com/ams0/argocd-azure-k3s-traefik"&gt;https://github.com/ams0/argocd-azure-k3s-traefik&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I heard about &lt;a href="https://argoproj.github.io/argo-cd/"&gt;ArgoCD&lt;/a&gt; many times (recently from my friends at Fullstaq &lt;a href="https://www.meetup.com/Cloud-Native-Kubernetes-Netherlands/events/270457327/"&gt;here&lt;/a&gt;) but never came around to kicking its tires until now. If you don’t know it, ArgoCD is a platform for declarative continuous deployment of Kubernetes applications, and it’s quickly becoming an exceedingly popular choice (it’s now an &lt;a href="https://www.cncf.io/blog/2020/04/07/toc-welcomes-argo-into-the-cncf-incubator/?fbclid=IwAR0uGLZVEJxyAUKAPC5Q4ZlDAt2xbkX-kh9zuXLL4n5i-KUUFPKEI43JWZA"&gt;incubated CNCF project&lt;/a&gt;) for deploying and managing applications at scale on &lt;em&gt;multiple&lt;/em&gt; clusters.&lt;/p&gt;

&lt;p&gt;Since I want to use it for deploying to a cluster, my plan is to have an ArgoCD instance outside my clusters that can manage them independently of the clusters’ lifecycle; hence, I devised this method of deploying ArgoCD into a VM in Azure running the lightweight Kubernetes distribution from Rancher Labs, &lt;a href="https://k3s.io/"&gt;k3s&lt;/a&gt; (deployed using the &lt;a href="https://github.com/rancher/k3d"&gt;k3d helper tool&lt;/a&gt;), exposed via the &lt;a href="https://traefik.io/"&gt;Traefik ingress controller&lt;/a&gt; and secured with Let’s Encrypt certificates. Let’s get to it!&lt;/p&gt;

&lt;p&gt;Start by cloning the &lt;a href="https://github.com/ams0/argocd-azure-k3s-traefik"&gt;repo&lt;/a&gt; and entering it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone [*https://github.com/ams0/argocd-azure-k3s-traefik](https://github.com/ams0/argocd-azure-k3s-traefik)
cd argocd-azure-k3s-traefik*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now some prerequisites:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Azure CLI&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Azure subscription (already logged in)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;kubectl and jq installed&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s create the infrastructure (one VM with some extra ports open to access the Kubernetes APIs and 80/443 to expose our application):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./deploy.sh &amp;lt;rg&amp;gt; &amp;lt;dns_name&amp;gt; &amp;lt;location&amp;gt; &amp;lt;size&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;em&gt;dns_name&lt;/em&gt; should be unique in the region of choice. After a couple of minutes, you’ll have the &lt;em&gt;config&lt;/em&gt; file for the k3s cluster (a single VM, with two virtual nodes running inside as Docker containers).&lt;/p&gt;
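&lt;p&gt;You can point kubectl at the generated kubeconfig to verify the cluster is up (the file name &lt;em&gt;config&lt;/em&gt; is taken from the paragraph above; adjust it if your deploy.sh writes it elsewhere):&lt;/p&gt;

```shell
# Use the kubeconfig written by deploy.sh and list the k3s nodes.
export KUBECONFIG=$PWD/config
kubectl get nodes
```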

&lt;p&gt;You may have noticed that I skipped the installation of k3s’ built-in Traefik ingress; that’s because it still ships the 1.7 branch, and I want to use the newer 2.x branch that introduced the IngressRoute CRD (you can follow the progress in this &lt;a href="https://github.com/rancher/k3s/issues/1141"&gt;issue&lt;/a&gt;). Let’s now install Traefik with this &lt;a href="https://github.com/ams0/argocd-azure-k3s-traefik/blob/main/install-traefik.sh"&gt;script&lt;/a&gt; (you’ll need to pass your email for the Let’s Encrypt certificate authority):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./install-traefik &amp;lt;email&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Finally, install ArgoCD passing the same dns_name/region and a password of your choice:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./install-argo.sh &amp;lt;dns_name&amp;gt; &amp;lt;region&amp;gt; &amp;lt;password&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That’s it! The script will patch argocd-server to run over HTTP (SSL termination is done by Traefik) and will patch the secret with the bcrypt-hashed version of your password. Navigate to &lt;a href="https://dns_name.region.cloudapp.azure.com"&gt;https://dns_name.region.cloudapp.azure.com&lt;/a&gt; and log in to ArgoCD. You can also &lt;a href="https://github.com/argoproj/argo-cd/releases/download"&gt;download the CLI&lt;/a&gt; and log in with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;argocd login --username admin \
--password Password \
dns_name.region.cloudapp.azure.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Finally, you can add one or more clusters to be managed by Argo (provided you already have their kubeconfig file available, for example by retrieving it with the Azure CLI):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;az aks get-credentials -g rg -n *cluster_name* -f kubeconfig
argocd cluster add --kubeconfig  ./kubeconfig manageme
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that the last argument of the command above must match a context name inside the &lt;em&gt;kubeconfig&lt;/em&gt; file (you can list the available contexts with &lt;em&gt;kubectl config get-contexts&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;Now, let’s deploy some apps!&lt;/p&gt;

&lt;p&gt;I’m a big fan of the &lt;a href="https://github.com/fluxcd/helm-controller"&gt;helm-controller&lt;/a&gt; project and I wanted to use it with ArgoCD. In a nutshell, the controller lets you create objects of kind &lt;em&gt;HelmRelease&lt;/em&gt; in your cluster (representing Helm releases) and manages the lifecycle of those releases programmatically (create/update/destroy). So in the &lt;em&gt;manifests/&lt;/em&gt; folder of my repository, you’ll find the templates to deploy a helm-controller that will in turn deploy the HelmReleases also present in the same folder.&lt;/p&gt;
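&lt;p&gt;As an illustration (not the exact contents of the repo), one of those &lt;em&gt;HelmRelease&lt;/em&gt; objects might look like the sketch below; the apiVersion and field names follow the Flux Helm operator schema, and the release name and chart version are arbitrary examples:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Illustrative sketch; apiVersion/fields per the Flux Helm operator schema
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: nginx-ingress
  namespace: default
spec:
  releaseName: nginx-ingress
  chart:
    repository: https://kubernetes.github.io/ingress-nginx
    name: ingress-nginx
    version: 2.11.1   # pin whatever chart version you need
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;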

&lt;p&gt;Fork the repository I provided at the top of this post and head over to the Argo UI to create a new app (name it to your liking and choose the default project) pointing to your fork and to the &lt;em&gt;manifests/&lt;/em&gt; path.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eqDSYyPq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3078/1%2Az-ZKpKiNIu85SAKxCZmZSw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eqDSYyPq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3078/1%2Az-ZKpKiNIu85SAKxCZmZSw.png" alt="" width="880" height="1462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Importantly, make sure that the directory recurse option is on. You can also create an object of kind “Application” inside your Argo/k3s cluster to achieve the same result:&lt;/p&gt;
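&lt;p&gt;For reference, a minimal &lt;em&gt;Application&lt;/em&gt; manifest equivalent to those UI steps might look like this (the app name and fork URL are placeholders, and &lt;em&gt;directory.recurse&lt;/em&gt; mirrors the recurse toggle):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-apps                  # any name you like
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/&amp;lt;your-user&amp;gt;/argocd-azure-k3s-traefik
    targetRevision: HEAD
    path: manifests
    directory:
      recurse: true              # same as the “directory recurse” toggle in the UI
  destination:
    server: https://kubernetes.default.svc   # or the API URL of a cluster added with argocd cluster add
    namespace: default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;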



&lt;p&gt;The app will start syncing right away, installing the helm-controller first and then an nginx ingress controller, and you’ll see the tree of resources being created.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eS2IvCpy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3940/1%2AOMbojl19nbrfb1K1yw6Opg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eS2IvCpy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3940/1%2AOMbojl19nbrfb1K1yw6Opg.png" alt="" width="880" height="641"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s it! Now every change to the github repo will be reflected in your cluster.&lt;/p&gt;

&lt;p&gt;In conclusion, you might be tempted to ask: is this production-ready? Absolutely not: there are still some aspects of the deployment I want to improve (AAD authentication for Argo, for instance, and data persistence as well). However, it’s a quickstart for ArgoCD on Azure, and it will help me learn more going forward. I hope you enjoy it too!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://medium.com/cooking-with-azure/managing-multiple-clusters-with-argocd-in-azure-k3s-secured-w-traefik-lets-encrypt-2de7daabbefa"&gt;Medium&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>argoproj</category>
      <category>traefik</category>
      <category>kubernetes</category>
      <category>azure</category>
    </item>
    <item>
      <title>Hi, I'm Alessandro Vozza</title>
      <dc:creator>Alessandro Vozza</dc:creator>
      <pubDate>Thu, 01 Jun 2017 14:22:06 +0000</pubDate>
      <link>https://dev.to/ams0/hi-im-alessandro-vozza</link>
      <guid>https://dev.to/ams0/hi-im-alessandro-vozza</guid>
      <description>&lt;p&gt;I have been coding for &lt;em&gt;null&lt;/em&gt; years.&lt;/p&gt;

&lt;p&gt;You can find me on GitHub as &lt;a href="https://github.com/ams0" rel="noopener noreferrer"&gt;ams0&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I live in Amsterdam (The Netherlands).&lt;/p&gt;

&lt;p&gt;I work for Microsoft.&lt;/p&gt;

&lt;p&gt;I mostly program in these languages: &lt;em&gt;is Kubernetes a language?&lt;/em&gt; .&lt;/p&gt;

&lt;p&gt;I am currently learning more about node.&lt;/p&gt;

&lt;p&gt;Nice to meet you.&lt;/p&gt;

</description>
      <category>introduction</category>
    </item>
  </channel>
</rss>
