Sergey Zhekpisov
Memorious Prometheus

Recently, Alertmanager, deployed with the kube-prometheus-stack Helm chart, fired an alert saying that 50% of the EKS endpoints for "apiserver/kubernetes" were down:

50% of the apiserver/kubernetes targets in the default namespace are down.

A quick look at Prometheus showed four (!) targets for the serviceMonitor/monitoring/prometheus-operator-monito-apiserver/0 scrape pool: two were down and two were up. Checking our other clusters confirmed that each cluster normally has only two targets.

It turned out that the EKS control plane had been updated overnight and the apiserver endpoints had received new IP addresses. Prometheus, however, kept scraping the old IP addresses alongside the new ones.
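One way to confirm this kind of mismatch is to compare the IPs currently behind the "kubernetes" Service with the IPs Prometheus is scraping. The snippet below is a minimal sketch, not what we actually ran: the helper name stale_targets is made up for illustration, and the commented-out commands assume jq is installed, that Prometheus is reachable on localhost:9090 through a port-forward, and that the scrape job is labelled "apiserver" with IPv4 ip:port instances.

```shell
#!/bin/sh
# stale_targets LIVE SCRAPED
# Prints every IP from SCRAPED that is not in LIVE, one per line.
# Both arguments are whitespace-separated lists of IP addresses.
# (Hypothetical helper, named here for illustration only.)
stale_targets() {
  st_live=" $(echo $1) "   # unquoted on purpose: collapses whitespace to single spaces
  for st_ip in $2; do
    case "$st_live" in
      *" $st_ip "*) ;;       # still a live endpoint, nothing to report
      *) echo "$st_ip" ;;    # scraped by Prometheus but no longer live
    esac
  done
}

# Gathering the real lists (assumptions, adjust to your cluster):
#   kubectl port-forward -n monitoring svc/prometheus-operated 9090 &
#   live=$(kubectl get endpoints kubernetes -n default \
#            -o jsonpath='{.subsets[*].addresses[*].ip}')
#   scraped=$(curl -s 'http://localhost:9090/api/v1/targets?state=active' \
#            | jq -r '.data.activeTargets[] | select(.labels.job == "apiserver") | .labels.instance' \
#            | cut -d: -f1)
#   stale_targets "$live" "$scraped"

# Example with made-up addresses: two live endpoints, four scraped targets.
stale_targets "10.0.1.5 10.0.2.7" "10.0.1.5 10.0.2.7 10.0.3.9 10.0.4.11"
```

With the made-up addresses above, the function prints 10.0.3.9 and 10.0.4.11, the two targets that no longer correspond to a live endpoint. An empty result means Prometheus and the Service agree.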

The solution was simple:

kubectl rollout restart statefulset prometheus-prometheus-operator-monito-prometheus -n monitoring

After the restart, the old "down" targets disappeared and the alert resolved.
