Nowsath for AWS Community Builders

Setup Prometheus and Grafana with existing EKS Fargate cluster - Monitoring

In this article, I will walk through the fundamental steps for setting up Prometheus and Grafana on an existing EKS Fargate cluster, along with configuring custom metrics. These tools are commonly used for monitoring and alerting.

Steps to follow:

  1. Configure Node Groups
  2. Install AWS EBS CSI driver
  3. Install Prometheus
  4. Install Grafana

1. Configure Node Groups

Since the cluster has no pre-existing node groups, let's create a new one.

i. Create an IAM Role for EC2 worker nodes

Go to the AWS IAM console and create a role named 'AmazonEKSWorkerNodeRole' with the following three AWS managed policies (a CLI equivalent is sketched after the list).

  • AmazonEC2ContainerRegistryReadOnly
  • AmazonEKS_CNI_Policy
  • AmazonEKSWorkerNodePolicy

(Screenshot: AWS IAM console)
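If you prefer the CLI over the console, a rough AWS CLI equivalent looks like the sketch below; the trust policy file name is illustrative.

# Trust policy allowing EC2 to assume the role (save as ec2-trust-policy.json)
cat > ec2-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create the role and attach the three managed policies
aws iam create-role \
  --role-name AmazonEKSWorkerNodeRole \
  --assume-role-policy-document file://ec2-trust-policy.json

for policy in AmazonEC2ContainerRegistryReadOnly AmazonEKS_CNI_Policy AmazonEKSWorkerNodePolicy; do
  aws iam attach-role-policy \
    --role-name AmazonEKSWorkerNodeRole \
    --policy-arn "arn:aws:iam::aws:policy/$policy"
done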

ii. Create Node groups

Navigate to the AWS EKS console and add a node group to the cluster. EC2 instances are required for Prometheus and Grafana, because both applications need EBS volumes mounted on them, which Fargate pods do not support.

When configuring the node group for the cluster, take the following into consideration:

Node IAM role: select the role created in the previous step (AmazonEKSWorkerNodeRole).
Instance type: select based on your requirements (t3.small in my case).
Subnets: the private subnets within the VPC where the EKS cluster is located. If you want to enable remote access to the nodes, use public subnets instead.

In my configuration, I opted for a t3.small instance with a desired size of 1. The setup worked without any issues.
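For reference, an equivalent node group can also be created from the command line with eksctl; the cluster name, node group name, and sizes below are illustrative, and eksctl creates and manages the node IAM role itself by default.

eksctl create nodegroup \
  --cluster my-cluster \
  --name monitoring-nodes \
  --node-type t3.small \
  --nodes 1 \
  --nodes-min 1 \
  --nodes-max 1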

To verify that the EC2 worker nodes are functioning properly, execute the following command (k is an alias for kubectl). The output should show a running aws-node pod.

$ k get po -l k8s-app=aws-node -n kube-system
NAME             READY   STATUS    RESTARTS   AGE
aws-node-hbvz2   1/1     Running   0          58m

2. Install AWS EBS CSI driver

The Amazon EBS CSI driver manages the lifecycle of Amazon EBS volumes as storage for the Kubernetes volumes that you create.

It provisions EBS volumes for two types of Kubernetes volumes: generic ephemeral volumes and persistent volumes.

Prometheus and Grafana require persistent storage, commonly referred to as PV (Persistent Volume) in Kubernetes terminology, to be attached to them.

i. Create AWS EBS CSI driver IAM role and associate to service account

Create a service account named 'ebs-csi-controller-sa' and associate it with the AWS managed AmazonEBSCSIDriverPolicy. This service account will be used during the installation of the AWS EBS CSI driver.

Replace 'my-cluster' with the name of your cluster.

eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster my-cluster \
  --role-name AmazonEKS_EBS_CSI_DriverRole \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --override-existing-serviceaccounts --approve

If you don't specify a role name, eksctl assigns one automatically.

ii. Add Helm repositories

We will use Helm to install the components required to run Prometheus and Grafana.

helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
helm repo add kube-state-metrics https://kubernetes.github.io/kube-state-metrics
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

iii. Install the AWS EBS CSI driver

After adding the new Helm repositories, install the AWS EBS CSI driver using the Helm command below.

Replace the region 'eu-north-1' with your cluster region.

helm upgrade --install aws-ebs-csi-driver \
  --namespace kube-system \
  --set controller.region=eu-north-1 \
  --set controller.serviceAccount.create=false \
  --set controller.serviceAccount.name=ebs-csi-controller-sa \
  aws-ebs-csi-driver/aws-ebs-csi-driver
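To confirm the driver came up, check that its controller and node pods are running; a simple way is to filter the kube-system pods:

kubectl get pods -n kube-system | grep ebs-csi

You should see the ebs-csi-controller deployment pods and the ebs-csi-node daemonset pods in the Running state.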

3. Install Prometheus

For persistent storage of scraped metrics and configurations, Prometheus leverages two EBS volumes: one dedicated to the prometheus-server pod and another for the prometheus-alertmanager pod.

i. Create a namespace for Prometheus

Create a namespace called 'prometheus'.
kubectl create namespace prometheus

ii. Set the Availability Zone and create a storage class

There are two options for storage class:

  • Create a storage class in your worker node's Availability Zone (AZ).

  • Use the default storage class (proceed to step iii).

Get the Availability Zone of one of the worker nodes:

EBS_AZ=$(kubectl get nodes \
  -o=jsonpath="{.items[0].metadata.labels['topology\.kubernetes\.io/zone']}")

Create a storage class:

echo "
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: prometheus
provisioner: ebs.csi.aws.com
parameters:
  type: gp2
reclaimPolicy: Retain
allowedTopologies:
- matchLabelExpressions:
  - key: topology.ebs.csi.aws.com/zone
    values:
    - $EBS_AZ
" | kubectl apply -f -

iii. Installing Prometheus

First, download the Helm values file for Prometheus:

wget https://github.com/aws-samples/containers-blog-maelstrom/raw/main/fargate-monitoring/prometheus_values.yml

If you wish to scrape custom metrics endpoints, add them under the 'extraScrapeConfigs: |' section in the prometheus_values.yml file, as demonstrated here:

extraScrapeConfigs: |
  - job_name: 'api-svc'
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets: ['api-svc.api-dev.svc.cluster.local:5557']
  - job_name: 'apps-svc'
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets: ['apps-svc.api-dev.svc.cluster.local:5559']

Run Helm command to install Prometheus (for worker node's AZ storage class option):

helm upgrade -i prometheus -f prometheus_values.yml prometheus-community/prometheus \
  --namespace prometheus --version 15

Run Helm command to install Prometheus (for default storage class option):

helm upgrade -i prometheus -f prometheus_values.yml prometheus-community/prometheus \
  --namespace prometheus \
  --set alertmanager.persistentVolume.storageClass="gp2",server.persistentVolume.storageClass="gp2" \
  --version 15

Important Note: The default storage class has a reclaim policy set to "Delete". Consequently, any EBS volumes used by Prometheus will be automatically deleted when you remove Prometheus itself.

Once the Helm installation is complete, let's verify the resources.

$ k get all -n prometheus
NAME                                                 READY   STATUS    RESTARTS   AGE
pod/prometheus-alertmanager-c7644896-7kfjq           2/2     Running   0          103m
pod/prometheus-kube-state-metrics-8476bdcc64-wng4m   1/1     Running   0          103m
pod/prometheus-node-exporter-8hf57                   1/1     Running   0          103m
pod/prometheus-pushgateway-665779d98f-v8q5d          1/1     Running   0          103m
pod/prometheus-server-6fd8bc8576-wwmvw               2/2     Running   0          103m

NAME                                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
service/prometheus-alertmanager         ClusterIP   172.20.84.40     <none>        80/TCP         103m
service/prometheus-kube-state-metrics   ClusterIP   172.20.192.129   <none>        8080/TCP       103m
service/prometheus-node-exporter        ClusterIP   None             <none>        9100/TCP       103m
service/prometheus-pushgateway          ClusterIP   172.20.181.13    <none>        9091/TCP       103m
service/prometheus-server               NodePort    172.20.167.19    <none>        80:30900/TCP   103m

NAME                                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-node-exporter   1         1         1       1            1           <none>          103m

NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-alertmanager         1/1     1            1           103m
deployment.apps/prometheus-kube-state-metrics   1/1     1            1           103m
deployment.apps/prometheus-pushgateway          1/1     1            1           103m
deployment.apps/prometheus-server               1/1     1            1           103m

NAME                                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-alertmanager-c7644896           1         1         1       103m
replicaset.apps/prometheus-kube-state-metrics-8476bdcc64   1         1         1       103m
replicaset.apps/prometheus-pushgateway-665779d98f          1         1         1       103m
replicaset.apps/prometheus-server-6fd8bc8576               1         1         1       103m


The chart creates two persistent volume claims: an 8Gi volume for the prometheus-server pod and a 2Gi volume for prometheus-alertmanager.
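You can confirm that both claims were bound to EBS-backed persistent volumes; both PVCs should show a STATUS of Bound.

kubectl get pvc -n prometheus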

iv. Check metrics from Prometheus

To inspect metrics from Prometheus in the browser, you must initiate port forwarding.

kubectl port-forward -n prometheus deploy/prometheus-server 8081:9090 &

Now, open a web browser and navigate to http://localhost:8081/targets

From this page you can see all configured scrape targets, as well as alerts, rules, and other configuration.

(Screenshot: Prometheus dashboard)
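With the port-forward still running, you can also query Prometheus from the terminal through its HTTP API; the example below uses the built-in 'up' metric and assumes jq is installed.

curl -s 'http://localhost:8081/api/v1/query?query=up' \
  | jq '.data.result[] | {job: .metric.job, value: .value[1]}'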


4. Install Grafana

In this step, we will create a dedicated Kubernetes namespace for Grafana, create the Grafana manifest file, set up security groups, set up an Ingress, and finally configure the dashboard.

i. Create a namespace for Grafana

Create a namespace called 'grafana'.
kubectl create namespace grafana

ii. Create a Grafana manifest file

We also need a manifest file to configure the Prometheus data source in Grafana. Below is an example of the file, named grafana.yaml.

# grafana.yaml
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://prometheus-server.prometheus.svc.cluster.local
        access: proxy
        isDefault: true

iii. Installing Grafana

Now, proceed to install Grafana using Helm. Replace 'my-password' with your password.

helm install grafana grafana/grafana \
    --namespace grafana \
    --set persistence.storageClass='gp2' \
    --set persistence.enabled=true \
    --set adminPassword='my-password' \
    --values grafana.yaml \
    --set service.type=NodePort
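Before exposing Grafana, it is worth confirming that the pod is running and its persistent volume claim is bound:

kubectl get pods -n grafana
kubectl get pvc -n grafana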

iv. Create a security group

Create a security group (grafana-alb-sg) for the Ingress ALB, with an inbound rule allowing HTTPS (port 443) from anywhere.
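If you prefer the CLI, a rough sketch with the AWS CLI follows; the VPC ID is a placeholder for your cluster's VPC.

ALB_SG_ID=$(aws ec2 create-security-group \
  --group-name grafana-alb-sg \
  --description "Inbound HTTPS for Grafana ALB" \
  --vpc-id vpc-0123456789abcdef0 \
  --query 'GroupId' --output text)

aws ec2 authorize-security-group-ingress \
  --group-id "$ALB_SG_ID" \
  --protocol tcp --port 443 --cidr 0.0.0.0/0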

v. Allow inbound requests to the EC2 worker node security group

Before exposing Grafana to the external world, let's examine the definition of the Kubernetes service responsible for running Grafana.

$ k -n grafana get svc grafana -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: grafana
    meta.helm.sh/release-namespace: grafana
  creationTimestamp: "2023-12-29T05:59:40Z"
  labels:
    app.kubernetes.io/instance: grafana
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: grafana
    app.kubernetes.io/version: 10.2.2
    helm.sh/chart: grafana-7.0.14
  name: grafana
  namespace: grafana
  resourceVersion: "179748053"
  uid: 7da370e2-63a4-4ca6-8ad4-14e624a51c4f
spec:
  clusterIP: 172.20.102.0
  clusterIPs:
  - 172.20.102.0
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: service
    nodePort: 31059
    port: 80
    protocol: TCP
    targetPort: 3000
  selector:
    app.kubernetes.io/instance: grafana
    app.kubernetes.io/name: grafana
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}


The target port is set to 3000, which corresponds to the port utilized by pods running Grafana.

To allow these inbound requests on port 3000, add an inbound rule to the EC2 worker node security group that permits TCP traffic on port 3000 from the ALB security group created in the previous step, as sketched below.
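As a sketch, this rule can be added with the AWS CLI by referencing the ALB security group as the source; both security group IDs below are placeholders.

aws ec2 authorize-security-group-ingress \
  --group-id <worker-node-sg-id> \
  --protocol tcp --port 3000 \
  --source-group <grafana-alb-sg-id>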

vi. Set up Ingress

Define a new Kubernetes Ingress so that an ALB is provisioned for Grafana.

This assumes you have already installed the AWS Load Balancer Controller, which the Ingress needs in order to create an ALB.
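If the controller is not installed yet, a minimal Helm-based sketch looks roughly like this; it assumes the controller's IAM role and service account already exist (for example, created with eksctl as described in the AWS documentation), and 'my-cluster' is a placeholder.

helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  --namespace kube-system \
  --set clusterName=my-cluster \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller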

Let's define the Ingress definition file for Grafana.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: grafana
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/load-balancer-name: grafana-alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/subnets: ${PUBLIC_SUBNET_IDs}
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
    alb.ingress.kubernetes.io/security-groups: ${ALB_SECURITY_GROUP_ID}
    alb.ingress.kubernetes.io/healthcheck-port: "3000"
    alb.ingress.kubernetes.io/healthcheck-path: /api/health
    alb.ingress.kubernetes.io/certificate-arn: ${ACM_CERT_ARN}
spec:
  rules:
    - host: ${YOUR_ROUTE53_DOMAIN}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 80

Replace the values for subnets, security-groups, certificate-arn, and host with your own values.

After applying the new Ingress, and once the new ALB is ready, navigate to ${YOUR_ROUTE53_DOMAIN} to confirm that Grafana is now accessible.

After logging into your Grafana account, proceed to import the necessary dashboards.

(Screenshot: Grafana import dashboard)

You can download dashboards from Grafana Dashboards.

I utilized these two dashboards, which proved to be valuable for monitoring the overall EKS Fargate cluster.


That concludes our walkthrough! In this guide, we established a new node group essential for Prometheus and Grafana, and successfully installed and configured both tools.

I trust this post proves valuable to you! 😊


Troubleshooting:
I've written another post detailing additional issues encountered during the setup of Prometheus & Grafana, along with their solutions.

You can find the post here.

