DEV Community

Michael Levan

Implementing Open-Source Monitoring and Observability In Kubernetes

When you begin to implement monitoring and observability solutions, you’ll face many questions, but the primary one is: “Do we go the paid route or the open-source route?”

In many cases, the “homegrown” route is open-source and the “enterprise” route is paid. The paid route consists of tools like Datadog, New Relic, and AppDynamics. The homegrown route consists of various open-source tools that, combined, make monitoring and observability possible for your environment.

In this blog post, you’ll learn about the homegrown/open-source route.

Grafana

When it comes to real-time monitoring with graphs and alerting, Grafana is a great open-source tool that can visualize just about any workload you need. Because Grafana is open-source, you also have the ability to create your own dashboards and share them with the community. For example, there’s a dashboard that was created specifically for monitoring Argo CD workloads.

You’re going to learn how to combine both Prometheus and Grafana in the next section, so don’t worry about deploying Grafana right now. However, if you decide at some point that you want to deploy Grafana without Prometheus, you can use the following code.

First, add the Helm Chart.

```shell
helm repo add grafana https://grafana.github.io/helm-charts
```

Next, ensure that the Helm Chart is updated.

```shell
helm repo update
```

Lastly, install the Helm Chart.

```shell
helm install -n grafana grafana grafana/grafana --create-namespace
```

To access the dashboard, you can use the port-forward command (with `:80`, kubectl picks a random local port and prints it).

```shell
kubectl port-forward -n grafana svc/grafana :80
```

Open the printed local port in your browser and you should see that the dashboard is up and running.
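If you install Grafana this way without overriding any values, the chart generates a random admin password and stores it in a Secret named `grafana` (the default behavior of the chart above; names assume the install command shown earlier). A quick way to retrieve it:

```shell
# Pull the auto-generated admin password out of the grafana Secret
kubectl get secret -n grafana grafana \
  -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
```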


Prometheus

When you’re thinking about metrics, Prometheus is one of the most popular open-source metrics tools in the observability category. Remember, Prometheus only handles metrics, not the whole observability stack.

There are several ways to deploy Prometheus, including a standard Helm Chart like you saw in the Grafana section. However, there’s also the kube-prometheus stack, which bundles Prometheus, Grafana, Alertmanager, and a set of exporters, all wired together. The really cool thing about kube-prometheus is that it also comes with best-practice dashboards out of the box for any Kubernetes environment.

Essentially, it packages up what you’d have to deploy anyway when using Grafana and Prometheus separately, so the best path forward is to do the entire configuration in one shot. That’s what kube-prometheus does for you.

To install kube-prometheus, you can use a Helm Chart.

First, add the repo to your local Helm configuration.

```shell
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
```

Next, ensure that the repo is updated.

```shell
helm repo update
```

Install the package and put it into the monitoring Namespace.


```shell
helm install -n monitoring kube-prometheus prometheus-community/kube-prometheus-stack --create-namespace
```

Once complete, you can run `kubectl get all -n monitoring` and you’ll see that several resources have been deployed.
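The stack also gives you the raw Prometheus UI. The Prometheus Operator creates a `prometheus-operated` Service on port 9090 for the Prometheus instances it manages (standard operator behavior, assumed here), so you can port-forward to it directly:

```shell
# Reach the Prometheus UI at http://localhost:9090
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090
```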


To access the Grafana dashboard, run the following:

```shell
kubectl port-forward -n monitoring svc/kube-prometheus-grafana :80
```

When you’re logging in for the first time, the following credentials can be used:

Username: admin

Password: prom-operator

After logging in, go to Dashboards in Grafana and you’ll see several dashboards available out of the box.
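Out of the box, the stack only scrapes the cluster components it knows about. To have Prometheus scrape your own application, you can create a ServiceMonitor. A minimal sketch, assuming your app’s Service is labeled `app: my-app` and exposes a named `metrics` port, and that you kept the release name `kube-prometheus` (by default, the stack discovers ServiceMonitors labeled with the release name):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    release: kube-prometheus   # matches the Helm release so the operator picks it up
spec:
  selector:
    matchLabels:
      app: my-app              # must match the labels on your app's Service
  namespaceSelector:
    matchNames:
      - default                # namespace where the Service lives
  endpoints:
    - port: metrics            # named port on the Service exposing /metrics
      interval: 30s
```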


Fluentd

When you’re thinking about how to capture logs specifically, there are several log aggregators available, anything from paid tools to open-source options to the log-capturing mechanisms built into almost every cloud (at least the big ones). If you’d like to implement an open-source solution, there’s Fluentd.

To install Fluentd, you can use a Helm Chart.

```shell
helm repo add fluent https://fluent.github.io/helm-charts

helm repo update

helm install fluentd fluent/fluentd
```

One caveat about Fluentd: although it’s absolutely great at log capture, if you’re using a cloud platform like AWS or Azure where the logs are being collected anyway, it honestly may be a better solution to use what’s available in the cloud. The reason is that there are a few hoops to jump through to get the logs Fluentd collects to a specific location, whether that’s S3, Azure Files, or another data store. After that, you still have to parse through them.

You can take a look at the Fluentd output plugin documentation to help in the process of sending the logs from Fluentd to a specific location.
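As a rough illustration of what that routing looks like, here’s a minimal sketch of a Fluentd output section that ships everything to S3, assuming the `fluent-plugin-s3` output plugin is installed and the node has IAM access to the bucket (the bucket name, region, and path are placeholders):

```
<match **>
  @type s3
  s3_bucket my-log-bucket        # placeholder bucket name
  s3_region us-east-1            # placeholder region
  path logs/
  <buffer time>
    timekey 3600                 # flush one chunk per hour
    timekey_wait 10m
  </buffer>
</match>
```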

Tempo

As part of the observability stack, you’ll want tracing enabled. Tracing follows a request through your system, telling you the health of an application from an end-to-end perspective.

Tempo is part of the Grafana stack, and you can install it from the same Helm repository.

First, add the Grafana Helm repository if you don’t already have it.

```shell
helm repo add grafana https://grafana.github.io/helm-charts
```

Next, install Tempo from the Grafana Helm repository and add it to the monitoring Namespace where the rest of the stack exists.

```shell
helm install tempo grafana/tempo-distributed -n monitoring
```

You should see output confirming that the release was deployed.

If you run `kubectl get all -n monitoring`, you’ll see the Tempo Pods running.

```
pod/tempo-compactor-5db447f8c9-wzv69                         1/1     Running   0          110s
pod/tempo-distributor-5c6b879865-dv5gp                       1/1     Running   0          110s
pod/tempo-ingester-0                                         1/1     Running   0          110s
pod/tempo-ingester-1                                         1/1     Running   0          110s
pod/tempo-ingester-2                                         1/1     Running   0          110s
pod/tempo-memcached-0                                        1/1     Running   0          110s
pod/tempo-querier-674455845b-gw94n                           1/1     Running   0          110s
pod/tempo-query-frontend-764dcf4699-wdc27                    1/1     Running   0          110s
```
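To actually query traces, point Grafana at Tempo as a data source. A minimal sketch using Grafana’s data source provisioning format, assuming Tempo’s default HTTP port (3100) and the query-frontend Service name shown in the Pod list above:

```yaml
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    access: proxy
    # query-frontend Service created by the tempo-distributed chart
    url: http://tempo-query-frontend.monitoring.svc.cluster.local:3100
```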

Loki

The last part of the observability stack is logging. Logs are arguably the most important piece of observability. They’re the difference between understanding an application or system failure and guessing at problems as you troubleshoot.

Unfortunately, a lot of the time logs are exported somewhere and never looked at. Used properly, they can be the best tool in your tool belt.

Loki is part of the Grafana stack, and you can install it from the same Helm repository.

First, add the Grafana Helm repository if you don’t already have it.

```shell
helm repo add grafana https://grafana.github.io/helm-charts
```

Next, install Loki from the Grafana Helm repository and add it to the monitoring Namespace where the rest of the stack exists.

```shell
helm install loki grafana/loki -n monitoring
```

You should see output confirming that the release was deployed.

If you run `kubectl get all -n monitoring`, you’ll see the Loki Pods running.

```
pod/loki-backend-0                                           1/1     Running   0          2m50s
pod/loki-backend-1                                           1/1     Running   0          2m50s
pod/loki-backend-2                                           1/1     Running   0          2m49s
pod/loki-canary-26zxc                                        1/1     Running   0          2m50s
pod/loki-canary-rfx9n                                        1/1     Running   0          2m50s
pod/loki-canary-zhcmj                                        1/1     Running   0          2m50s
pod/loki-gateway-5d85cb5d7f-n9h2s                            1/1     Running   0          2m50s
pod/loki-grafana-agent-operator-7b7b8bd969-8hmfh             1/1     Running   0          2m50s
pod/loki-logs-fc4s8                                          2/2     Running   0          2m46s
pod/loki-logs-pdcn7                                          2/2     Running   0          2m46s
pod/loki-logs-sj4nl                                          2/2     Running   0          2m46s
pod/loki-read-5f7dc67977-69wgx                               1/1     Running   0          2m50s
pod/loki-read-5f7dc67977-hffbz                               1/1     Running   0          2m50s
pod/loki-read-5f7dc67977-ssgst                               1/1     Running   0          2m50s
pod/loki-write-0                                             1/1     Running   0          2m50s
pod/loki-write-1                                             1/1     Running   0          2m50s
pod/loki-write-2                                             1/1     Running   0          2m50s
```
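As with Tempo, you’ll want Grafana to query Loki as a data source. A minimal sketch in Grafana’s provisioning format, assuming the gateway Service from the Pod list above (the Loki chart’s gateway listens on port 80 by default):

```yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    # gateway Service created by the loki chart
    url: http://loki-gateway.monitoring.svc.cluster.local
```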

Wrapping Up

To put the tools in this blog post into categories:

  • Monitoring/graphs/visuals == Grafana
  • Metrics == Prometheus
  • Logs == Fluentd and Loki (you don’t need to use both; using just Loki with this stack is a valid choice)
  • Tracing == Tempo

With proper monitoring and observability, you should have just about everything you need to truly understand what’s happening under the hood in your environment.
