JamallMahmoudi

How to monitor multiple Kubernetes clusters from an external Prometheus server

The goal of this article is to explore the best method for running Prometheus outside the monitored Kubernetes (k8s) cluster, or to determine any additional development needed for this. Running monitoring software outside the monitored stack is important to ensure access during cluster outages. Additionally, a centralized Prometheus setup is beneficial for monitoring multiple clusters. Acceptable solutions include: configuring Prometheus against the Kubernetes API using host and client certificate data, running a proxy inside the cluster for token management and network access, or providing documentation on using the current Prometheus kubernetes_sd_configs option to achieve similar results.
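For reference, the first of those options (pointing Prometheus at the cluster's API server with client certificates) is already expressible with kubernetes_sd_configs. Below is a minimal, hedged sketch; the API server address and certificate paths are placeholders for your own cluster:

scrape_configs:
  - job_name: 'cluster-a-nodes'
    scheme: https
    kubernetes_sd_configs:
      - role: node
        # External API server endpoint and client certificate material are placeholders
        api_server: https://k8s-api.cluster-a.example.com:6443
        tls_config:
          ca_file: /etc/prometheus/cluster-a/ca.crt
          cert_file: /etc/prometheus/cluster-a/client.crt
          key_file: /etc/prometheus/cluster-a/client.key
    # The same client certificates are reused when scraping the discovered targets
    tls_config:
      ca_file: /etc/prometheus/cluster-a/ca.crt
      cert_file: /etc/prometheus/cluster-a/client.crt
      key_file: /etc/prometheus/cluster-a/client.key
    # Note: the discovered node addresses must still be reachable from this server,
    # which is exactly the constraint discussed below.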

In the context of monitoring Kubernetes clusters, it’s often necessary for the monitoring server to communicate with the internal network of the Kubernetes clusters to gather metrics, logs, and other data. However, exposing the Kubernetes internal network to the monitoring server can introduce several challenges:

Note:

IP Range Conflicts:
  • Multiple Clusters: When multiple Kubernetes clusters are being monitored, it's common for them to use overlapping or identical IP address ranges for their internal networks. This overlap complicates network routing because the monitoring server wouldn't be able to differentiate between similar IP addresses in different clusters.
  • Routing Issues: Properly routing traffic to the correct cluster becomes complex and prone to errors when IP ranges overlap. Network configurations would need to be highly customized and maintained to handle these overlaps, increasing the risk of misrouting and the administrative overhead.
Geographic and Network Latency:
  • Different Locations/Zones: The monitoring server might be located in a different physical location or network zone than the Kubernetes clusters. For instance, the monitoring server could be in a different data center, region, or even cloud provider.
  • Latency Issues: Accessing the Kubernetes internal network across different locations can introduce significant latency. Network latency impacts the performance and responsiveness of the monitoring setup, which is crucial for timely alerting and analysis. High latency can cause delays in data collection, leading to outdated or less accurate monitoring data.
Network Reliability: Cross-zone or cross-region communication may also be subject to network reliability issues, further affecting the consistency and reliability of monitoring data.
Alternative Solutions
To address these issues, alternative approaches can be considered:

Federated Monitoring:
Deploy monitoring agents within each Kubernetes cluster. These agents collect and pre-process the data locally before sending it to a central monitoring server or database. This approach avoids the need to expose the entire internal network and mitigates IP range conflicts.

  • Proxy or Gateway Solutions: Use a proxy or gateway that securely bridges the monitoring server and the Kubernetes clusters. This can help manage routing more effectively and ensure secure communication without direct network exposure.
  • Service Meshes: Implement a service mesh (like Istio or Linkerd) to handle cross-cluster communication securely and efficiently. Service meshes provide advanced routing, security, and observability features that can simplify multi-cluster monitoring.
  • VPNs or VPC Peering: Establish VPNs or VPC peering connections with careful network planning to avoid IP conflicts. These connections can securely extend the network but require proper configuration to manage latency and routing complexities.

By addressing IP conflicts and latency issues through these alternative solutions, monitoring servers can effectively collect data from multiple Kubernetes clusters without the drawbacks of direct network exposure.

Prometheus federation is a powerful feature that allows for scalable and centralized metric collection across multiple Prometheus instances, making it ideal for large, distributed systems. Here’s a summary of how it works and its typical topology:

Local Prometheus Servers (Leaf Nodes):
  • Deployment: Each Kubernetes cluster or monitoring domain has a local Prometheus server to scrape metrics from local targets like application instances, nodes, and Kubernetes components.
  • Advantages: Local servers operate independently, reducing the load on any single server and providing redundancy. They collect detailed metrics at high resolution (short intervals).
Federated Prometheus Servers (Aggregation Layer):
  • Deployment: These servers aggregate metrics from the local Prometheus servers and act as middle-tier nodes.
  • Scraping Configuration: Configured to scrape a subset of metrics from local servers by defining specific scrape jobs.
  • Query Efficiency: Typically collect downsampled or summarized metrics to reduce data volume and query load, possibly scraping at longer intervals and applying aggregations.
Global Prometheus Servers (Root Nodes):
  • Deployment: A global server aggregates metrics from the federated servers, providing a centralized view of the entire infrastructure.
  • Centralized Monitoring: Enables centralized querying, alerting, and dashboarding, useful for high-level overviews and global alerting.
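To illustrate, a federated or global server scrapes the /federate endpoint of the servers below it. The sketch below is a minimal example; the job name, match[] selector, and hostname are assumptions and should be adapted to your clusters:

scrape_configs:
  - job_name: 'federate-cluster-a'
    honor_labels: true              # keep the labels set by the leaf server instead of overwriting them
    metrics_path: /federate
    params:
      'match[]':
        - '{job=~".+"}'             # illustrative selector; in practice narrow it to aggregated series
    scrape_interval: 1m             # longer interval at the aggregation layer
    static_configs:
      - targets: ['prometheus-cluster-a.example.com:9090']
        labels:
          cluster: cluster-a        # identify the source cluster on the aggregating server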
Key Considerations

  • Label Management: Proper label management is essential to indicate the source of metrics, such as cluster name or region, for better querying and visualization (see the external_labels sketch further below).

  • Scrape Intervals: Configured thoughtfully at different hierarchy levels. Local servers might scrape every 15 seconds, while federated servers might scrape every minute to reduce load.
  • Downsampling and Aggregation: Federated servers can perform these to reduce data volume sent upstream, improving storage efficiency and query performance.
  • Alerting: Alerts can be managed at various levels. Local servers handle local alerts, while federated or global servers handle higher-level aggregated alerts to ensure relevance and reduce noise.
Example topology and typical use cases:

  • Large Scale Environments: Helps manage and scale monitoring in environments with many clusters or data centers.

  • Centralized Monitoring: Suitable for organizations requiring a centralized metric view across various locations or departments.

  • Data Aggregation: Useful for aggregating data and reducing metric granularity as they move up the hierarchy, saving storage, and improving query performance.
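To make the label management point above concrete, each leaf (local) Prometheus can attach its cluster identity to every series it ships upstream via external_labels. A minimal sketch, with illustrative label values:

global:
  scrape_interval: 15s
  external_labels:
    cluster: cluster-a     # illustrative; identifies this leaf server's cluster
    region: eu-west-1      # illustrative; attached to series when they are federated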

Prometheus federation allows for efficient monitoring of large, distributed systems by using a hierarchical topology of servers, balancing the load, managing data volume, and maintaining a centralized view for effective monitoring and alerting.

Main link: https://prometheus.io/docs/prometheus/latest/federation/#use-cases

To monitor a Kubernetes cluster from an external Prometheus server using a federation topology, follow these steps to set up a scalable and secure monitoring solution:

Steps to Set Up Prometheus Federation

  1. Install Node Exporter and Prometheus in the Kubernetes Cluster:
    • Node Exporter: Deploy “node-exporter” pods within your Kubernetes cluster to collect metrics about the nodes.
    • Prometheus Instance: Set up a Prometheus server within the cluster. Configure it with short-term storage to handle the metrics collection and storage needs of the cluster.

  2. Expose the Prometheus Service Externally:
    • Expose the Prometheus Service: Make the internal Prometheus service accessible from outside the Kubernetes cluster. You can achieve this through either of the following:
    • Ingress Controller (Load Balancer): Use an ingress controller to route external traffic to your Prometheus instance. This setup can be more flexible and scalable.
    • Node Port: Alternatively, expose the service via a node port, allowing external access through a specific port on the node IPs.
    • Secure the Endpoint: Protect the external endpoint with HTTPS to encrypt data in transit and use basic authentication to restrict access, so that only authorized users can reach the Prometheus metrics (see the example ingress manifest after these steps).

  3. Configure the External Prometheus Server:
    • Scrape Configuration: Configure your external Prometheus server (the central Prometheus) to scrape metrics from the exposed endpoint of the Prometheus instance running inside the Kubernetes cluster.
    • Authentication and Labels: Ensure that the external Prometheus server is set up with the correct authentication credentials and labels that identify the source of the metrics, so it can securely and accurately collect metrics from the Kubernetes cluster.
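One possible implementation of the exposure and security requirements in step 2, using the ingress-nginx controller with TLS and basic authentication, is sketched below. The hostname, namespace, secret names, and Service name are assumptions; the prometheus-basic-auth secret (an htpasswd file) and the TLS secret have to be created separately.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-external
  namespace: monitoring                      # namespace of the in-cluster Prometheus (assumption)
  annotations:
    # ingress-nginx basic auth; "prometheus-basic-auth" must contain an htpasswd file
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: prometheus-basic-auth
    nginx.ingress.kubernetes.io/auth-realm: "Prometheus"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - prometheus.cluster-a.example.com   # external hostname (assumption)
      secretName: prometheus-tls             # TLS certificate secret (assumption)
  rules:
    - host: prometheus.cluster-a.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-server      # name of the in-cluster Prometheus Service (assumption)
                port:
                  number: 9090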

Scalable Monitoring Solution

  • Adding More Clusters: To monitor additional Kubernetes clusters, repeat the setup process for each new cluster: install node-exporter, deploy a Prometheus instance, expose the service, and configure the external Prometheus server to scrape the new endpoint.

  • Scaling Central Prometheus: If the number of clusters grows beyond the capacity of a single central Prometheus server, you can:
    • Add Another Central Prometheus: Deploy additional central Prometheus instances to handle the increased load and aggregate metrics from the various clusters.
    • Federation Hierarchy: Implement a hierarchical federation topology where multiple central Prometheus servers aggregate metrics from other central Prometheus servers, creating a scalable and efficient monitoring setup.

Next, install kube-state-metrics with Helm on the Kubernetes cluster and replace its default ClusterIP Service with a NodePort Service so that the external Prometheus server can reach it.
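For reference, installing the chart from the prometheus-community repository typically looks like the following; the release name and namespace are illustrative and match the Service manifest shown below:

mahmoudi@master1:~$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
mahmoudi@master1:~$ helm repo update
mahmoudi@master1:~$ helm install kube-state-metrics prometheus-community/kube-state-metrics --namespace default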

Correct YAML Configuration:

mahmoudi@master1:~$ sudo cat kube-state-nodeport.yaml 

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: kube-state-metrics
    meta.helm.sh/release-namespace: default
    prometheus.io/scrape: "true"
  creationTimestamp: "2024-07-29T05:29:29Z"
  labels:
    app.kubernetes.io/component: metrics
    app.kubernetes.io/instance: kube-state-metrics
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/part-of: kube-state-metrics
    app.kubernetes.io/version: 2.12.0
    helm.sh/chart: kube-state-metrics-5.19.1
  name: kube-state-metrics
  namespace: default
  resourceVersion: "166572251"
  uid: 9135babd-d36a-4c5e-8181-35b4a2340f1c
spec:
  # clusterIP: 10.233.3.71
  # clusterIPs:
  # - 10.233.3.71
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 8080
    nodePort: 32080

  selector:
    app.kubernetes.io/instance: kube-state-metrics
    app.kubernetes.io/name: kube-state-metrics
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}

mahmoudi@master1:~$ kubectl delete svc kube-state-metrics -n default   # use the namespace the chart was installed in (default here; kube-system in some setups)
mahmoudi@master1:~$ kubectl apply -f kube-state-nodeport.yaml
mahmoudi@master1:~$ kubectl get svc kube-state-metrics -n default
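Before pointing the external Prometheus server at it, confirm from outside the cluster that the NodePort answers; the node IP and port below are the ones used in this example:

# run from the external Prometheus host
curl -s http://192.168.33.200:32080/metrics | head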

Now edit the prometheus.yml file on the external Prometheus server and add this scrape block:

# Scrape kube-state-metrics through the NodePort exposed above.
# (If prometheus.yml already has a scrape_configs section, append only the job entry.)
scrape_configs:
  - job_name: 'kube-state-metrics'
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets: ['192.168.33.200:32080']



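Finally, you can validate the edited configuration and confirm the target is up; promtool ships with the Prometheus distribution, and the up query is run in the expression browser of the external server:

# validate prometheus.yml before reloading the external Prometheus server
promtool check config prometheus.yml

# after a reload, this expression should return 1 for the new target:
#   up{job="kube-state-metrics"}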
