Observability provides complete insights into the health, performance, and behavior of your Kubernetes cluster and the applications deployed within it. Companies, whether or not they use Kubernetes, have leveraged open-source observability tools like Prometheus, Grafana, Loki, and OpenTelemetry (OTel) to achieve significant improvements in cost, efficiency, and incident response.
For example, companies that reduced observability costs with OpenTelemetry reported notable savings — 84% of these companies saw at least a 10% decrease in costs. A real-world case study shows how Loki helped Paytm Insider save 75% of logging and monitoring costs. Similarly, a 2025 survey by Apica found that nearly half of organizations (48.5%) are already using OpenTelemetry, with another 25.3% planning implementation soon.
Why Observability is Important
Observability — which uses logs, metrics, and traces to provide deep system insights — is particularly crucial for navigating the complexity of modern cloud-native and microservices-based architectures. It helps organizations reduce downtime, increase efficiency, improve developer productivity, and boost revenue.
The setup combining Prometheus, Grafana, Loki, Tempo, Kube-State-Metrics, Node Exporter, and OpenTelemetry offers an open-source alternative to the ELK stack (Elasticsearch, Logstash, and Kibana), providing seamless integration across metrics, logs, and traces. It scales from local development (Minikube) to enterprise-grade clusters, making it cost-effective and easy to adopt.
In this blog post, we will walk through this open-source observability setup and deploy it step by step. At the end, we'll deploy a sample Java application to demonstrate collecting logs, metrics, and traces in action.
Understanding the Observability Setup
Let's dive into the observability setup and clearly understand the role of each component.
- Prometheus: A time-series monitoring system used to collect metrics from Kubernetes components and services. It supports powerful querying and alerting.
- Kube-State-Metrics: An add-on service that generates detailed metrics about the state of Kubernetes objects like deployments, pods, and nodes. These metrics are consumed by Prometheus.
- Node Exporter: A Prometheus exporter that exposes hardware and OS metrics from your Kubernetes nodes.
- Grafana: A visualization and analytics tool that connects to Prometheus and other data sources to display real-time dashboards for your metrics.
- Loki: A log aggregation system from Grafana Labs that works seamlessly with Prometheus and Grafana. It collects logs from your Kubernetes workloads and enables easy correlation with metrics.
- Tempo: A distributed tracing backend used to collect and visualize traces. It helps in tracking requests as they flow through different services, enabling root-cause analysis.
- OpenTelemetry (OTel): A collection of tools, APIs, and SDKs for collecting telemetry data (traces, metrics, and logs) from your applications. It standardizes observability data collection.
Prerequisites
- Minikube — used to set up a local Kubernetes cluster
- Helm — the package manager for Kubernetes
- App Repo — the test application we will clone
Step 1: Installing Prometheus
Once you clone the repository, change directory to the observability folder and run the command below. The repository includes a Prometheus Helm chart with a custom configuration that captures the labels of all applications deployed in Minikube.
Note: The ConfigMap is configured to enable a limited set of metrics, but you can enable any metrics from the Prometheus configuration docs as required.
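For reference, a scrape job that discovers pods and maps their Kubernetes labels onto the scraped series typically looks like the following. This is an illustrative sketch, not the exact configuration shipped in the repo's chart; the job name and relabeling rules are assumptions:

```yaml
# Illustrative Prometheus scrape job — not the exact config from the repo's chart.
scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Copy every Kubernetes pod label onto the scraped series
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      # Record namespace and pod name as labels
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```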
helm upgrade --install prometheus prometheus-helm
Step 2: Install kube-state-metrics and Node Exporter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-state-metrics prometheus-community/kube-state-metrics --namespace monitoring
helm install node-exporter prometheus-community/prometheus-node-exporter --namespace monitoring
Once both steps are completed successfully and the pods are up and running, verify that all targets are green in Prometheus by port-forwarding the service:
kubectl port-forward service/prometheus-service -n monitoring 9090:9090
Then, access Prometheus at http://localhost:9090.
To confirm metrics are populating, run the following queries:
kube_pod_info
node_cpu_seconds_total
Step 3: Installing Grafana
helm install grafana grafana/grafana --namespace monitoring
After the Grafana pods are in the Running state, port-forward the Grafana service and retrieve the login credentials from the Grafana secret.
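Assuming the release is named grafana (so the service and secret carry that name, with the chart's default admin-password key), the commands look like this:

```shell
# Port-forward the Grafana service (service name assumes a release named "grafana")
kubectl port-forward service/grafana -n monitoring 3000:80

# In another terminal, fetch the auto-generated admin password from the chart's secret
kubectl get secret grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 --decode
```

The default username is admin.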
Access the UI at http://localhost:3000, then use the fetched credentials to log in.
Navigate to Connections → Data Sources → Add data source. Set the name to prometheus and the connection URL to:
http://prometheus-service.monitoring.svc.cluster.local:9090
Save and exit.
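As an alternative to the UI steps, Grafana data sources can be provisioned declaratively. A minimal provisioning file equivalent to the steps above might look like this (sketch only; how you mount it depends on your chart values):

```yaml
# Illustrative Grafana datasource provisioning file — an alternative to the UI steps.
apiVersion: 1
datasources:
  - name: prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-service.monitoring.svc.cluster.local:9090
    isDefault: true
```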
To verify the metrics, go to the Explore section and run the query below. You will see a time series showing the memory utilization (in MiB) of all running pods:
avg(container_memory_usage_bytes{pod=~".*"}) by (pod) / (1024 * 1024)
Step 4: Install Loki and Tempo
Run the following commands and wait until all pods are in the Running state:
helm upgrade --install loki -f loki.yaml grafana/loki-stack --namespace monitoring
helm upgrade --install tempo -f tempo.yaml grafana/tempo --namespace monitoring
📄 Note: You can find loki.yaml and tempo.yaml in the Git repository. Promtail in the Loki configuration allows you to parse log lines into labels; refer to the Promtail stages docs on how to extract labels.
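As an example of what such a Promtail pipeline can look like, the sketch below parses JSON log lines and promotes the level field to a Loki label. The label names and log format here are assumptions, not the repo's actual configuration:

```yaml
# Illustrative Promtail pipeline — label names and log format are assumptions.
scrape_configs:
  - job_name: kubernetes-pods
    pipeline_stages:
      # Parse a "level" field out of JSON-formatted log lines...
      - json:
          expressions:
            level: level
      # ...and promote it to a Loki label for filtering
      - labels:
          level:
```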
Once the pods are ready, follow the same steps used for Prometheus to add Loki and Tempo as data sources in Grafana:
- Loki URL: http://loki.monitoring.svc.cluster.local:3100
- Tempo URL: http://tempo.monitoring.svc.cluster.local:3100
To view logs, go to Explore in Grafana, select Loki as the datasource, and run the following query to fetch logs from all namespaces:
{namespace=~".+"} |= ``
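A few more LogQL variations can be useful here. Note that the app label in the last query is an assumption about how your pods are labeled:

```
# Logs from a single namespace
{namespace="monitoring"}

# Only lines containing "error" (case-sensitive substring match)
{namespace=~".+"} |= "error"

# Filter by a pod label, with a case-insensitive regex match
{app="calc"} |~ "(?i)exception"
```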
Step 5: Install OpenTelemetry and Sample Application
Run the following commands to install OpenTelemetry:
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm upgrade --install opentelemetry-collector open-telemetry/opentelemetry-collector \
--namespace monitoring
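The collector also needs to be told where to send traces. A minimal values override for the chart might look like the following; the deployment mode, ports, and Tempo endpoint are assumptions matching this setup, and Tempo must have its OTLP receiver enabled (check the tempo.yaml in the repo):

```yaml
# Illustrative values for open-telemetry/opentelemetry-collector —
# receives OTLP from applications and forwards traces to Tempo.
mode: deployment
config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318
  exporters:
    otlp:
      endpoint: tempo.monitoring.svc.cluster.local:4317
      tls:
        insecure: true
  service:
    pipelines:
      traces:
        receivers: [otlp]
        exporters: [otlp]
```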
Once the OpenTelemetry pods are in the Running state, note that the sample application's Helm chart includes an init container that injects the OpenTelemetry Java agent for trace collection.
To deploy the application, run:
helm upgrade --install calc helm-chart/ --namespace monitoring
In the deployment.yaml file of the Helm chart, you'll find the following init container configuration:
initContainers:
- name: opentelemetry-auto-instrumentation
image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
command: ["cp", "/javaagent.jar", "/otel-auto-instrumentation/javaagent.jar"]
volumeMounts:
- mountPath: /otel-auto-instrumentation
name: opentelemetry-auto-instrumentation
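The init container only stages the agent jar on a shared volume; the application container must mount that volume and load the agent at startup. The repo's chart already wires this up, but for reference the wiring typically looks like the sketch below. The OTLP endpoint and service name here are assumptions for this setup:

```yaml
containers:
  - name: calc
    env:
      # Load the copied agent into the JVM at startup
      - name: JAVA_TOOL_OPTIONS
        value: "-javaagent:/otel-auto-instrumentation/javaagent.jar"
      # Send telemetry to the collector installed earlier
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: "http://opentelemetry-collector.monitoring.svc.cluster.local:4318"
      - name: OTEL_SERVICE_NAME
        value: "calc"
    volumeMounts:
      - mountPath: /otel-auto-instrumentation
        name: opentelemetry-auto-instrumentation
volumes:
  - name: opentelemetry-auto-instrumentation
    emptyDir: {}
```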
To generate traces, port-forward the application's service and interact with the app using some inputs to generate trace data. To view traces, navigate to the Explore page in Grafana, select Tempo as the datasource, and run the query:
{}
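The empty query {} matches all traces. More targeted TraceQL queries are also possible; the service name below is an assumption matching the sample app's release name:

```
# All traces from the sample service
{resource.service.name="calc"}

# Only spans slower than 500 ms
{duration > 500ms}
```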
Why Use This Stack Over ELK?
All these tools together provide a modern, cloud-native, cost-efficient, and tightly integrated observability solution compared to the traditional ELK stack. Key advantages include:
- Native support for metrics, logs, and traces: A unified experience and correlation across telemetry types (ELK is primarily log-centric).
- Lower resource & storage cost: Loki indexes only metadata (labels), not full log content, making it lighter and cheaper to operate.
- Better scalability & resilience in cloud/Kubernetes environments: These tools are built for distributed, elastic infrastructure.
- OpenTelemetry compatibility & vendor neutrality: Instrumentation is portable and standards-based.
- Operational simplicity & lower overhead: Fewer cluster tuning demands, simpler scaling, and less JVM burden compared to Elasticsearch.
Final Words
You cannot fix what you cannot see. With the sheer amount of data and complexity in modern tech, having a proper observability system in place is critical. The primary aim of this guide was to establish full-stack observability for a Kubernetes cluster by enabling metrics, logs, and traces using Prometheus, Loki, Tempo, and OpenTelemetry — and finally visualizing them with Grafana.
With this setup, you can now monitor, visualize, and troubleshoot applications in real time using metrics, logs, and traces all in one unified observability stack. This not only enhances visibility into the cluster's health and performance but also enables faster root cause analysis and proactive incident response, aligning with modern DevOps and SRE practices.