Observability provides complete insights into the health, performance, and behavior of your Kubernetes cluster and the applications deployed within it. Companies, whether or not they use Kubernetes, have leveraged open-source observability tools like Prometheus, Grafana, Loki, and OpenTelemetry (OTel) to achieve significant improvements in cost, efficiency, and incident response.
For example, companies that reduced observability costs with OpenTelemetry reported notable savings — 84% of these companies saw at least a 10% decrease in costs. A real-world case study shows how Loki helped Paytm Insider save 75% of logging and monitoring costs. Similarly, a 2025 survey by Apica found that nearly half of organizations (48.5%) are already using OpenTelemetry, with another 25.3% planning implementation soon.
Why Observability is Important
Observability — which uses logs, metrics, and traces to provide deep system insights — is particularly crucial for navigating the complexity of modern cloud-native and microservices-based architectures. It helps organizations reduce downtime, increase efficiency, improve developer productivity, and boost revenue.
The setup combining Prometheus, Grafana, Loki, Tempo, Kube-State-Metrics, Node Exporter, and OpenTelemetry offers an open-source alternative to the ELK stack (Elasticsearch, Logstash, and Kibana), providing seamless integration across metrics, logs, and traces. It scales from local development (Minikube) to enterprise-grade clusters, making it cost-effective and easy to adopt.
In this blog post, we will walk through this open-source observability setup and deploy it step by step. At the end, we'll deploy a sample Java application to demonstrate collecting logs, metrics, and traces in action.
Understanding the Observability Setup
Let's dive into the observability setup and clearly understand the role of each component.
- Prometheus: A time-series monitoring system used to collect metrics from Kubernetes components and services. It supports powerful querying and alerting.
- Kube-State-Metrics: An add-on service that generates detailed metrics about the state of Kubernetes objects like deployments, pods, and nodes. These metrics are consumed by Prometheus.
- Node Exporter: A Prometheus exporter that exposes hardware and OS metrics from your Kubernetes nodes.
- Grafana: A visualization and analytics tool that connects to Prometheus and other data sources to display real-time dashboards for your metrics.
- Loki: A log aggregation system from Grafana Labs that works seamlessly with Prometheus and Grafana. It collects logs from your Kubernetes workloads and enables easy correlation with metrics.
- Tempo: A distributed tracing backend used to collect and visualize traces. It helps in tracking requests as they flow through different services, enabling root-cause analysis.
- OpenTelemetry (OTel): A collection of tools, APIs, and SDKs for collecting telemetry data (traces, metrics, and logs) from your applications. It standardizes observability data collection.
Prerequisites
- Minikube — used to set up a local Kubernetes cluster
- Helm — the package manager for Kubernetes
- App Repo — the test application we will clone
Step 1: Installing Prometheus
Once you clone the repository, change directory to the observability folder and run the command below. The repository includes a Prometheus Helm chart with a custom configuration that captures the labels of all applications deployed in Minikube.
Note: The ConfigMap is configured to enable a limited set of metrics, but you can enable any metrics from the Prometheus configuration docs as required.
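For reference, a scrape job that discovers pods and maps their Kubernetes labels onto the scraped series typically looks like the following. This is an illustrative sketch, not the exact configuration shipped in the repo's chart; the job name and relabeling rules are assumptions:

```yaml
# Illustrative Prometheus scrape job — not the exact config from the repo's chart.
scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Copy every Kubernetes pod label onto the scraped series
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      # Record namespace and pod name as labels
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```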
helm upgrade --install prometheus prometheus-helm
Step 2: Install kube-state-metrics and Node Exporter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-state-metrics prometheus-community/kube-state-metrics --namespace monitoring
helm install node-exporter prometheus-community/prometheus-node-exporter --namespace monitoring
Once both steps are completed successfully and the pods are up and running, verify that all targets are green in Prometheus by port-forwarding the service:
kubectl port-forward service/prometheus-service -n monitoring 9090:9090
Then, access Prometheus at http://localhost:9090.
To confirm metrics are populating, run the following queries:
kube_pod_info
node_cpu_seconds_total
Step 3: Installing Grafana
helm install grafana grafana/grafana --namespace monitoring
After the Grafana pods are in the Running state, port-forward the Grafana service and retrieve the login credentials from the Grafana secret.
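Assuming the release is named grafana (so the service and secret carry that name, with the chart's default admin-password key), the commands look like this:

```shell
# Port-forward the Grafana service (service name assumes a release named "grafana")
kubectl port-forward service/grafana -n monitoring 3000:80

# In another terminal, fetch the auto-generated admin password from the chart's secret
kubectl get secret grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 --decode
```

The default username is admin.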
Access the UI at http://localhost:3000, then use the fetched credentials to log in.
Navigate to Connections → Data Sources → Add data source. Set the name to prometheus and the connection URL to:
http://prometheus-service.monitoring.svc.cluster.local:9090
Save and exit.
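As an alternative to the UI steps, Grafana data sources can be provisioned declaratively. A minimal provisioning file equivalent to the steps above might look like this (sketch only; how you mount it depends on your chart values):

```yaml
# Illustrative Grafana datasource provisioning file — an alternative to the UI steps.
apiVersion: 1
datasources:
  - name: prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-service.monitoring.svc.cluster.local:9090
    isDefault: true
```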
To verify the metrics, go to the Explore section and run the query below. You will see a time series showing the memory utilization (in MiB) of all running pods:
avg(container_memory_usage_bytes{pod=~".*"}) by (pod) / (1024 * 1024)
Step 4: Install Loki and Tempo
Run the following commands and wait until all pods are in the Running state:
helm upgrade --install loki -f loki.yaml grafana/loki-stack --namespace monitoring
helm upgrade --install tempo -f tempo.yaml grafana/tempo --namespace monitoring
📄 Note: You can find loki.yaml and tempo.yaml in the Git repository. Promtail in the Loki configuration allows you to parse log lines into labels; refer to the Promtail stages docs on how to extract labels.
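As an example of what such a Promtail pipeline can look like, the sketch below parses JSON log lines and promotes the level field to a Loki label. The label names and log format here are assumptions, not the repo's actual configuration:

```yaml
# Illustrative Promtail pipeline — label names and log format are assumptions.
scrape_configs:
  - job_name: kubernetes-pods
    pipeline_stages:
      # Parse a "level" field out of JSON-formatted log lines...
      - json:
          expressions:
            level: level
      # ...and promote it to a Loki label for filtering
      - labels:
          level:
```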
Once the pods are ready, follow the same steps used for Prometheus to add Loki and Tempo as data sources in Grafana:
- Loki URL: http://loki.monitoring.svc.cluster.local:3100
- Tempo URL: http://tempo.monitoring.svc.cluster.local:3100
To view logs, go to Explore in Grafana, select Loki as the datasource, and run the following query to fetch logs from all namespaces:
{namespace=~".+"} |= ``
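A few more LogQL variations can be useful here. Note that the app label in the last query is an assumption about how your pods are labeled:

```
# Logs from a single namespace
{namespace="monitoring"}

# Only lines containing "error" (case-sensitive substring match)
{namespace=~".+"} |= "error"

# Filter by a pod label, with a case-insensitive regex match
{app="calc"} |~ "(?i)exception"
```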
Step 5: Install OpenTelemetry and Sample Application
Run the following commands to install OpenTelemetry:
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm upgrade --install opentelemetry-collector open-telemetry/opentelemetry-collector \
--namespace monitoring
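The collector also needs to be told where to send traces. A minimal values override for the chart might look like the following; the deployment mode, ports, and Tempo endpoint are assumptions matching this setup, and Tempo must have its OTLP receiver enabled (check the tempo.yaml in the repo):

```yaml
# Illustrative values for open-telemetry/opentelemetry-collector —
# receives OTLP from applications and forwards traces to Tempo.
mode: deployment
config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318
  exporters:
    otlp:
      endpoint: tempo.monitoring.svc.cluster.local:4317
      tls:
        insecure: true
  service:
    pipelines:
      traces:
        receivers: [otlp]
        exporters: [otlp]
```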
Once the OpenTelemetry pods are in the Running state, note that the sample application's Helm chart includes an init container that injects the OpenTelemetry Java agent for trace collection.
To deploy the application, run:
helm upgrade --install calc helm-chart/ --namespace monitoring
In the deployment.yaml file of the Helm chart, you'll find the following init container configuration:
initContainers:
- name: opentelemetry-auto-instrumentation
image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
command: ["cp", "/javaagent.jar", "/otel-auto-instrumentation/javaagent.jar"]
volumeMounts:
- mountPath: /otel-auto-instrumentation
name: opentelemetry-auto-instrumentation
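The init container only stages the agent jar on a shared volume; the application container must mount that volume and load the agent at startup. The repo's chart already wires this up, but for reference the wiring typically looks like the sketch below. The OTLP endpoint and service name here are assumptions for this setup:

```yaml
containers:
  - name: calc
    env:
      # Load the copied agent into the JVM at startup
      - name: JAVA_TOOL_OPTIONS
        value: "-javaagent:/otel-auto-instrumentation/javaagent.jar"
      # Send telemetry to the collector installed earlier
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: "http://opentelemetry-collector.monitoring.svc.cluster.local:4318"
      - name: OTEL_SERVICE_NAME
        value: "calc"
    volumeMounts:
      - mountPath: /otel-auto-instrumentation
        name: opentelemetry-auto-instrumentation
volumes:
  - name: opentelemetry-auto-instrumentation
    emptyDir: {}
```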
To generate traces, port-forward the application's service and interact with the app using some inputs to generate trace data. To view traces, navigate to the Explore page in Grafana, select Tempo as the datasource, and run the query:
{}
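The empty query {} matches all traces. More targeted TraceQL queries are also possible; the service name below is an assumption matching the sample app's release name:

```
# All traces from the sample service
{resource.service.name="calc"}

# Only spans slower than 500 ms
{duration > 500ms}
```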
Why Use This Stack Over ELK?
All these tools together provide a modern, cloud-native, cost-efficient, and tightly integrated observability solution compared to the traditional ELK stack. Key advantages include:
- Native support for metrics, logs, and traces: A unified experience and correlation across telemetry types (ELK is primarily log-centric).
- Lower resource & storage cost: Loki indexes only metadata (labels), not full log content, making it lighter and cheaper to operate.
- Better scalability & resilience in cloud/Kubernetes environments: These tools are built for distributed, elastic infrastructure.
- OpenTelemetry compatibility & vendor neutrality: Instrumentation is portable and standards-based.
- Operational simplicity & lower overhead: Fewer cluster tuning demands, simpler scaling, and less JVM burden compared to Elasticsearch.
Final Words
You cannot fix what you cannot see. With the sheer amount of data and complexity in modern tech, having a proper observability system in place is critical. The primary aim of this guide was to establish full-stack observability for a Kubernetes cluster by enabling metrics, logs, and traces using Prometheus, Loki, Tempo, and OpenTelemetry — and finally visualizing them with Grafana.
With this setup, you can now monitor, visualize, and troubleshoot applications in real time using metrics, logs, and traces all in one unified observability stack. This not only enhances visibility into the cluster's health and performance but also enables faster root cause analysis and proactive incident response, aligning with modern DevOps and SRE practices.