Abhay Singh Kathayat

Kubernetes Metrics and Monitoring with Prometheus and Grafana

Monitoring a Kubernetes cluster is essential for keeping applications performant, using resources efficiently, and catching problems early. Prometheus and Grafana are widely used open-source tools for collecting and visualizing Kubernetes metrics.

This article covers the essentials of monitoring Kubernetes using Prometheus and Grafana, from setup to best practices.


Overview of Prometheus and Grafana

Prometheus

  • A time-series database and monitoring tool.
  • Collects metrics from applications and Kubernetes components using an HTTP pull model.
  • Features a powerful query language called PromQL for analyzing metrics.
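
To make the pull model concrete: each target serves its current metrics as plain text over HTTP (conventionally at /metrics), and Prometheus fetches that page on a schedule. The sketch below only illustrates reading that text format by hand; filter_metric is a hypothetical helper name, not part of Prometheus.

```shell
# Print the samples for one metric name from Prometheus text-format input on stdin.
filter_metric() {
  awk -v name="$1" '
    /^#/ { next }                  # skip "# HELP" and "# TYPE" comment lines
    {
      metric = $0
      sub(/[{ ].*$/, "", metric)   # the metric name ends at the first "{" or space
      if (metric == name) print $0
    }
  '
}
```

For example, assuming a node exporter is reachable on its usual port 9100 (say, through a port-forward), `curl -s http://localhost:9100/metrics | filter_metric node_load1` would print the current one-minute load sample.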

Grafana

  • A visualization tool for creating interactive dashboards.
  • Integrates seamlessly with Prometheus for displaying Kubernetes metrics.

Kubernetes Metrics to Monitor

Key metrics to monitor in Kubernetes include:

  1. Node Metrics: CPU, memory, and disk usage for cluster nodes.
  2. Pod Metrics: Resource utilization by individual Pods.
  3. Container Metrics: CPU and memory usage by containers.
  4. Cluster Metrics: Overall health, such as the number of running Pods and nodes.
  5. Network Metrics: Data transfer rates and error counts.
  6. Application Metrics: Custom metrics from application code.

Setting Up Prometheus and Grafana

1. Prerequisites

  • A running Kubernetes cluster.
  • kubectl configured to interact with your cluster.
  • Helm installed for deploying Prometheus and Grafana.
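
A quick preflight check can catch a missing CLI before the install steps fail halfway. This is a minimal sketch; check_tools is a made-up helper name, and it only checks that the commands are on your PATH, not that they can reach a cluster.

```shell
# Report any required CLI that is missing from PATH.
# Prints one line per missing tool and returns non-zero if any are absent.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_tools kubectl helm && kubectl cluster-info
```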

2. Install Prometheus and Grafana with Helm

Helm charts simplify the installation process for Prometheus and Grafana.

Step 1: Add the Helm Repository

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Step 2: Install the kube-prometheus-stack
The kube-prometheus-stack chart includes Prometheus, Grafana, and related monitoring components.

helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace
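
After the install it is worth confirming that the release deployed and its Pods are starting. A small sketch, assuming the release name (prometheus) and namespace (monitoring) from the command above; verify_stack is a hypothetical wrapper name.

```shell
# Show the Helm release status, then list the Pods in its namespace.
# Defaults match the release name and namespace used in the install command.
verify_stack() {
  release="${1:-prometheus}"
  namespace="${2:-monitoring}"
  helm status "$release" --namespace "$namespace" &&
    kubectl get pods --namespace "$namespace"
}

# Usage: verify_stack
```

The Pods can take a minute or two to reach the Running state, so rerun the check if some are still starting.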

3. Access Prometheus and Grafana

  • Prometheus: Port-forward the Prometheus server to your local machine.
  kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090

Access Prometheus at http://localhost:9090.

  • Grafana: Port-forward the Grafana service.
  kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

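Access Grafana at http://localhost:3000. Unless you overrode it during installation, the chart's default login is admin / prom-operator; change this password before exposing Grafana beyond localhost.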
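
If the default password does not work, the current one can usually be read from the Secret that the Grafana subchart creates. The sketch below assumes the release is named prometheus, which makes the Secret name prometheus-grafana; grafana_admin_password is a hypothetical helper name.

```shell
# Print the Grafana admin password stored in the chart-managed Secret.
# Assumes the release is named "prometheus", so the Secret is "prometheus-grafana".
grafana_admin_password() {
  namespace="${1:-monitoring}"
  kubectl get secret prometheus-grafana --namespace "$namespace" \
    -o jsonpath='{.data.admin-password}' | base64 -d
}

# Usage: grafana_admin_password; echo
```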


Configuring Dashboards in Grafana

1. Add Prometheus as a Data Source

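  • The kube-prometheus-stack chart normally provisions a Prometheus data source for you. Check under Configuration > Data Sources (Connections > Data sources in newer Grafana versions).
  • To add one manually, select Prometheus and provide the in-cluster URL of the Prometheus server, including the port (e.g., http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090).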

2. Import Pre-Built Dashboards

The kube-prometheus-stack chart ships with a set of Kubernetes dashboards, and more community dashboards can be imported from grafana.com:

  • Go to Dashboards > Import.
  • Use dashboard IDs like:
    • 3119 for Kubernetes cluster monitoring.
    • 6417 for node exporter statistics.
  • Alternatively, download a dashboard's JSON from Grafana’s dashboard library and upload it on the same Import page.

3. Customize Dashboards

Create custom dashboards tailored to your needs using PromQL queries.

Example PromQL Queries:

  • CPU Usage by Node (busy CPU cores, averaged over 5 minutes):
  sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)
  • Memory Usage by Pod:
  sum(container_memory_usage_bytes{container!="",container!="POD",pod!=""}) by (pod)
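
Queries like these can also be tried outside Grafana through the Prometheus HTTP API, which is handy for checking an expression before building a panel. A minimal sketch, assuming Prometheus is still port-forwarded to localhost:9090; prom_query and the PROM_URL variable are names made up for this example.

```shell
# Run an instant PromQL query against the Prometheus HTTP API and print the JSON response.
# PROM_URL is an assumed variable; it defaults to the port-forwarded address.
prom_query() {
  curl --silent --show-error --fail --get \
    --data-urlencode "query=$1" \
    "${PROM_URL:-http://localhost:9090}/api/v1/query"
}

# Usage:
# prom_query 'sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)'
```

Letting curl URL-encode the expression avoids escaping braces and quotes by hand.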

Setting Up Alerts

Prometheus and Grafana support alerting for critical events:

1. Alerts in Prometheus

Define alerts in Prometheus using rules.

Example: Alert for High CPU Usage

groups:
  - name: node-alerts
    rules:
      - alert: HighCPUUsage
        expr: (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 0.9
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "Node {{ $labels.instance }} has CPU usage > 90% for the last 2 minutes."
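
For intuition about where a CPU usage percentage comes from: node_cpu_seconds_total is a cumulative counter of seconds spent in each mode, so usage has to be derived from how fast the idle counter grows, not from its raw value. The arithmetic for a single core can be sketched as follows; cpu_usage_percent is a hypothetical helper, and the numbers are invented.

```shell
# Estimate CPU usage (%) for one core from two readings of its idle counter.
#   $1, $2: node_cpu_seconds_total{mode="idle"} at the start and end of the window
#   $3:     seconds between the two readings
cpu_usage_percent() {
  awk -v s="$1" -v e="$2" -v w="$3" 'BEGIN {
    idle_fraction = (e - s) / w          # same idea as rate(...[window])
    printf "%.1f\n", (1 - idle_fraction) * 100
  }'
}

# cpu_usage_percent 1000 1030 300   # idle grew 30s in 300s, prints 90.0
```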

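Save the rule group in a file. For a standalone Prometheus, list the file under rule_files in prometheus.yml and reload the configuration. With kube-prometheus-stack, the Prometheus Operator manages the configuration, so rules are normally supplied as PrometheusRule resources instead of hand-edited files.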
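
On a cluster installed with kube-prometheus-stack, one way to load such a rule is to wrap it in a PrometheusRule resource. The sketch below writes the manifest to a file. It assumes the release name prometheus (the release label is what the chart's default rule selector usually matches) and uses a rate-based idle expression, since node_cpu_seconds_total is a counter. write_node_alert_rule is a hypothetical helper name.

```shell
# Write a PrometheusRule manifest that the Prometheus Operator can load.
# The quoted heredoc keeps {{ $labels.instance }} literal instead of expanding $labels.
write_node_alert_rule() {
  cat > "$1" <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-alerts
  namespace: monitoring
  labels:
    release: prometheus   # assumed to match the chart's default rule selector
spec:
  groups:
    - name: node-alerts
      rules:
        - alert: HighCPUUsage
          expr: (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 0.9
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "High CPU usage on {{ $labels.instance }}"
EOF
}

# Usage: write_node_alert_rule node-alerts.yaml && kubectl apply -f node-alerts.yaml
```

Once applied, the rule should appear on the Alerts page of the Prometheus UI after the operator reloads the configuration.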

2. Alerts in Grafana

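  • Navigate to the Alerting section in Grafana. Older versions configure destinations under Alerting > Notification Channels; Grafana 9 and later call them contact points.
  • Create alert rules from panel queries and route them to a destination such as email, Slack, or PagerDuty.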

Best Practices for Kubernetes Monitoring

1. Monitor Key Metrics

Focus on resource usage, cluster health, and application performance to detect and resolve issues early.

2. Use Resource Limits and Requests

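Define CPU and memory requests and limits in your Pod specs. The scheduler and autoscalers rely on requests, and usage metrics are much easier to interpret when they can be compared against a declared request or limit.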
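
For an existing workload, requests and limits can be set without editing the manifest by hand. A sketch using kubectl set resources; the Deployment name my-app, the default namespace, and the sizes are placeholders rather than recommendations, and set_app_resources is a hypothetical wrapper.

```shell
# Set CPU and memory requests/limits on the containers of a Deployment.
# "my-app" and the sizes below are placeholder values, not recommendations.
set_app_resources() {
  deployment="${1:-my-app}"
  namespace="${2:-default}"
  kubectl set resources deployment "$deployment" --namespace "$namespace" \
    --requests=cpu=100m,memory=128Mi \
    --limits=cpu=500m,memory=256Mi
}

# Usage: set_app_resources web production
```

Changing resources this way triggers a rolling update of the Deployment, so the same values should also go into the manifest kept in version control.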

3. Enable Persistent Storage

Use persistent storage for Prometheus to retain historical metrics after restarts.
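
With kube-prometheus-stack, persistence is typically enabled through the chart's prometheus.prometheusSpec.storageSpec value. The sketch below writes a small values file. The 50Gi size is a placeholder, the cluster's default StorageClass is assumed, and write_storage_values is a made-up helper name.

```shell
# Write a Helm values file that gives Prometheus a PersistentVolumeClaim.
# The 50Gi size is a placeholder; the cluster's default StorageClass is assumed.
write_storage_values() {
  cat > "$1" <<'EOF'
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
EOF
}

# Usage:
# write_storage_values storage-values.yaml
# helm upgrade prometheus prometheus-community/kube-prometheus-stack \
#   --namespace monitoring -f storage-values.yaml
```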

4. Secure Access

  • Use Role-Based Access Control (RBAC) for Prometheus and Grafana.
  • Enable HTTPS for secure communication.

5. Optimize Retention Period

Adjust the Prometheus retention period to balance storage requirements with historical data needs.
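
The Prometheus storage documentation gives a rough sizing rule: disk space is about retention time in seconds, times ingested samples per second, times bytes per sample (typically 1 to 2 bytes). A sketch of that arithmetic, with estimate_disk_gib as a hypothetical helper and invented inputs:

```shell
# Rough disk estimate (GiB) from the sizing rule in the Prometheus storage docs:
#   disk = retention_seconds * samples_per_second * bytes_per_sample
#   $1: retention in days, $2: ingested samples per second, $3: bytes per sample (default 2)
estimate_disk_gib() {
  awk -v days="$1" -v sps="$2" -v bps="${3:-2}" 'BEGIN {
    bytes = days * 86400 * sps * bps
    printf "%.1f\n", bytes / (1024 * 1024 * 1024)
  }'
}

# estimate_disk_gib 15 100000   # 15 days at 100k samples/s, prints 241.4
```

In kube-prometheus-stack the retention period itself is set with the prometheus.prometheusSpec.retention value, for example `--set prometheus.prometheusSpec.retention=15d` on helm upgrade.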

6. Integrate Logging

Combine Prometheus and Grafana with logging tools like Elasticsearch and Fluentd for comprehensive observability.


Challenges and Considerations

  1. Storage Overhead: Prometheus can consume significant storage for metrics; optimize retention policies.
  2. Scaling: Use Thanos or Cortex to scale Prometheus for larger clusters.
  3. Complex Dashboards: Avoid overly complex Grafana dashboards that can impact performance.

Conclusion

Prometheus and Grafana provide robust tools for monitoring Kubernetes clusters, enabling teams to track resource usage, optimize performance, and respond proactively to issues. By integrating these tools and following best practices, you can ensure a well-monitored and efficient Kubernetes environment.

