Trying out Amazon CloudWatch Network Flow Monitor in EKS

1. Introduction

The Amazon CloudWatch Network Flow Monitor service, which monitors the communication status between resources within AWS, was released in December 2024.
In this post, we walk through the setup procedure and usability of the EKS variant, where the agent runs as a DaemonSet.

2. What We Did

  • Prepare a barebones EKS cluster.
  • Add the Network Flow Monitor (for EKS) add-on to the EKS cluster.
  • Configure the Network Flow Monitor "monitors".
  • Access an Nginx pod launched inside EKS from an external client and verify that Network Flow Monitor metrics are collected.
  • Introduce packet loss to one of the Nginx pods and confirm that the Network Flow Monitor metrics change.

3. Architecture Diagram

4. Configuration Steps

4.1 Pre-environment Setup

We will build a VPC and EKS cluster for this evaluation (detailed steps omitted). This time, we use the Management Console with mostly default settings.

  • The Kubernetes version is 1.33.
  • We prepared two t3.medium worker nodes as the node group.
  • The add-on "Amazon EKS Pod Identity Agent" was added (necessary for Network Flow Monitor; automatically added by default).

    • The pod status after environment construction is as follows:
[ec2-user@ip-10-0-0-60 mysample]$ kubectl get pod -A -o wide
NAMESPACE      NAME                              READY   STATUS    RESTARTS   AGE   IP            NODE                                             NOMINATED NODE   READINESS GATES
external-dns   external-dns-754cf78755-ks8nc     1/1     Running   0          19h   10.0.10.131   ip-10-0-10-159.ap-northeast-3.compute.internal   <none>           <none>
kube-system    aws-node-bwshx                    2/2     Running   0          19h   10.0.11.7     ip-10-0-11-7.ap-northeast-3.compute.internal     <none>           <none>
kube-system    aws-node-ckvdl                    2/2     Running   0          19h   10.0.10.159   ip-10-0-10-159.ap-northeast-3.compute.internal   <none>           <none>
kube-system    coredns-bdbfddcf5-54sbq           1/1     Running   0          19h   10.0.10.13    ip-10-0-10-159.ap-northeast-3.compute.internal   <none>           <none>
kube-system    coredns-bdbfddcf5-zvdr9           1/1     Running   0          19h   10.0.10.115   ip-10-0-10-159.ap-northeast-3.compute.internal   <none>           <none>
kube-system    eks-node-monitoring-agent-ltzc9   1/1     Running   0          19h   10.0.10.159   ip-10-0-10-159.ap-northeast-3.compute.internal   <none>           <none>
kube-system    eks-node-monitoring-agent-nckx7   1/1     Running   0          19h   10.0.11.7     ip-10-0-11-7.ap-northeast-3.compute.internal     <none>           <none>
kube-system    eks-pod-identity-agent-jz6kc      1/1     Running   0          19h   10.0.11.7     ip-10-0-11-7.ap-northeast-3.compute.internal     <none>           <none>
kube-system    eks-pod-identity-agent-khq8q      1/1     Running   0          19h   10.0.10.159   ip-10-0-10-159.ap-northeast-3.compute.internal   <none>           <none>
kube-system    kube-proxy-569w9                  1/1     Running   0          19h   10.0.10.159   ip-10-0-10-159.ap-northeast-3.compute.internal   <none>           <none>
kube-system    kube-proxy-cm94v                  1/1     Running   0          19h   10.0.11.7     ip-10-0-11-7.ap-northeast-3.compute.internal     <none>           <none>
kube-system    metrics-server-fdccf8449-2b2sj    1/1     Running   0          19h   10.0.10.110   ip-10-0-10-159.ap-northeast-3.compute.internal   <none>           <none>
kube-system    metrics-server-fdccf8449-5584h    1/1     Running   0          19h   10.0.10.55    ip-10-0-10-159.ap-northeast-3.compute.internal   <none>           <none>
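For reproducibility, a similar cluster could also be created with eksctl. This is a minimal sketch: the cluster name is a placeholder, not the one actually used above.

```shell
# Sketch: create a 2-node EKS cluster similar to the one above.
# The cluster name "nfm-test-cluster" is a placeholder.
eksctl create cluster \
  --name nfm-test-cluster \
  --region ap-northeast-3 \
  --version 1.33 \
  --node-type t3.medium \
  --nodes 2

# Enable the Pod Identity Agent add-on explicitly if it was not added by default.
eksctl create addon --cluster nfm-test-cluster --name eks-pod-identity-agent
```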

4.2 Adding the Network Flow Monitor Add-on

We add the Network Flow Monitor add-on using the Management Console. The official procedure is "Install the EKS AWS Network Flow Monitor Agent add-on."

  • From the Add-ons section of the constructed EKS cluster, select "Get more add-ons."

  • Select the AWS Network Flow Monitor Agent.

  • Create the necessary IAM role to be attached to the Network Flow Monitor Agent pods by selecting "Create Recommended Role."

  • Create the IAM role with the default settings and configure it as the role to be attached to the pods.

  • After being added as an add-on, confirm that it is running as a DaemonSet:
[ec2-user@ip-10-0-0-60 mysample]$ kubectl get daemonsets -A
NAMESPACE                     NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
amazon-network-flow-monitor   aws-network-flow-monitor-agent   2         2         2       2            2           kubernetes.io/os=linux   75s
kube-system                   aws-node                         2         2         2       2            2           <none>                   19h
kube-system                   dcgm-server                      0         0         0       0            0           kubernetes.io/os=linux   19h
kube-system                   eks-node-monitoring-agent        2         2         2       2            2           kubernetes.io/os=linux   19h
kube-system                   eks-pod-identity-agent           2         2         2       2            2           <none>                   19h
kube-system                   kube-proxy                       2         2         2       2            2           <none>                   19h
[ec2-user@ip-10-0-0-60 mysample]$ kubectl get pod -A -o wide
NAMESPACE                     NAME                                   READY   STATUS    RESTARTS   AGE   IP            NODE                                             NOMINATED NODE   READINESS GATES
amazon-network-flow-monitor   aws-network-flow-monitor-agent-7v24v   1/1     Running   0          64s   10.0.11.7     ip-10-0-11-7.ap-northeast-3.compute.internal     <none>           <none>
amazon-network-flow-monitor   aws-network-flow-monitor-agent-rpqr6   1/1     Running   0          64s   10.0.10.159   ip-10-0-10-159.ap-northeast-3.compute.internal   <none>           <none>
... (other pods)
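The same add-on installation can be done with the AWS CLI instead of the console. A sketch, assuming a hypothetical cluster name and a pre-created IAM role ARN; the service-account name used by the add-on should be verified against the add-on's documentation.

```shell
# Sketch: install the Network Flow Monitor agent add-on via the CLI.
# The cluster name, account ID, role name, and service-account name are placeholders.
aws eks create-addon \
  --cluster-name my-eks-cluster \
  --addon-name aws-network-flow-monitor-agent \
  --pod-identity-associations \
    serviceAccount=aws-network-flow-monitor-agent-service-account,roleArn=arn:aws:iam::123456789012:role/NFMAgentRole
```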

4.3 Configuring the Network Flow Monitor "Monitors"

We create three "monitors": for the entire VPC, for the AZ-3a side of the VPC, and for the AZ-3b side of the VPC.

  • From CloudWatch - Flow Monitors, select "Create Monitor."

  • Create a monitor targeting the entire EKS VPC.

  • Create a monitor selecting the AZ-3a subnet of the EKS VPC (and similarly for AZ-3b).
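The same monitors can also be created with the `networkflowmonitor` CLI. This is a sketch with placeholder names and ARNs; the exact `--local-resources` parameter shape should be verified against the CLI reference.

```shell
# Sketch: a monitor scoped to the entire VPC (monitor name and ARN are placeholders).
aws networkflowmonitor create-monitor \
  --monitor-name eks-vpc-monitor \
  --local-resources type=AWS::EC2::VPC,identifier=arn:aws:ec2:ap-northeast-3:123456789012:vpc/vpc-0123456789abcdef0

# Sketch: a monitor scoped to one subnet (e.g. the AZ-3a side).
aws networkflowmonitor create-monitor \
  --monitor-name eks-az3a-monitor \
  --local-resources type=AWS::EC2::Subnet,identifier=arn:aws:ec2:ap-northeast-3:123456789012:subnet/subnet-0123456789abcdef0
```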

4.4 Preparing Nginx

Preparing Nginx with tc

  • To introduce packet loss to a pod later, we prepare an Nginx container image that can use the tc (Traffic Control) command and register it in ECR (steps omitted). The Dockerfile is as follows:
# Use the official Nginx image as the base image
FROM nginx:alpine

# Install the iproute2 package, which includes the tc command
RUN apk update && apk add iproute2

# Start nginx when the container launches
CMD ["nginx", "-g", "daemon off;"]
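The omitted build-and-push steps typically look like the following. The account ID is a placeholder; the region and repository name are taken from the image reference used later in the Deployment.

```shell
# Sketch: build the image and push it to ECR (the account ID is a placeholder).
ACCOUNT_ID=123456789012
REGION=ap-northeast-3
REPO=mksamba/mynginx-with-tc-repo

# Authenticate Docker against the ECR registry.
aws ecr get-login-password --region "$REGION" |
  docker login --username AWS --password-stdin "$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com"

# Build and push the tc-enabled Nginx image.
docker build -t "$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO:latest" .
docker push "$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO:latest"
```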

Deploying Nginx

  • We deploy Nginx so that one pod runs on each of the two worker nodes and expose the HTTP port externally. The NET_ADMIN capability is required to run the tc command, and podAntiAffinity prevents both pods from being scheduled on the same worker node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mynginx-with-tc-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mynginx-with-tc
  template:
    metadata:
      labels:
        app: mynginx-with-tc
    spec:
      containers:
      - name: mynginx-with-tc-container
        image: xxxxxxxxxxxx.dkr.ecr.ap-northeast-3.amazonaws.com/mksamba/mynginx-with-tc-repo:latest
        ports:
        - containerPort: 80
        securityContext:
          capabilities:
            add: ["NET_ADMIN"]
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - mynginx-with-tc
            topologyKey: "kubernetes.io/hostname"
---
apiVersion: v1
kind: Service
metadata:
  name: mynginx-with-tc-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  type: LoadBalancer
  selector:
    app: mynginx-with-tc
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  • Confirm that pods are running on each worker node:
[ec2-user@ip-10-0-0-60 mysample]$ kubectl apply -f mynginx-with-tc.yaml 
deployment.apps/mynginx-with-tc-deployment created
service/mynginx-with-tc-service created

[ec2-user@ip-10-0-0-60 mysample]$ kubectl get pod -A -o wide | grep nginx
default                       mynginx-with-tc-deployment-68cb4fff79-qjw9q   1/1     Running   0          71s   10.0.10.8     ip-10-0-10-159.ap-northeast-3.compute.internal   <none>           <none>
default                       mynginx-with-tc-deployment-68cb4fff79-tfc8s   1/1     Running   0          71s   10.0.11.191   ip-10-0-11-7.ap-northeast-3.compute.internal     <none>           <none>
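The external endpoint used in the next step can be read directly from the Service:

```shell
# Print the DNS name the CLB was assigned for the LoadBalancer Service.
kubectl get svc mynginx-with-tc-service \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```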

4.5 Accessing Nginx from an External Client

We access Nginx via the CLB (Classic Load Balancer) about 10,000 times using curl from the Internet. Traffic is distributed to the two pods by the CLB.

#!/bin/bash
# Specify the target URL
URL="http://xxxxxxxxxx.ap-northeast-3.elb.amazonaws.com"

# Loop
for ((i=1; i<=10000; i++))
do
  echo "Request #$i"
  curl -o /dev/null -s -w "%{http_code}\n" "$URL"
done

4.6 Introducing Packet Loss to Nginx

We introduce a 3% packet loss to only one pod (the AZ-3b side).

# Identify the pod name to configure
[ec2-user@ip-10-0-0-60 ~]$ kubectl get pod -A -o wide |grep mynginx
default                       mynginx-with-tc-deployment-68cb4fff79-qjw9q   1/1     Running   0          27m   10.0.10.8     ip-10-0-10-159.ap-northeast-3.compute.internal   <none>           <none>
default                       mynginx-with-tc-deployment-68cb4fff79-tfc8s   1/1     Running   0          27m   10.0.11.191   ip-10-0-11-7.ap-northeast-3.compute.internal     <none>           <none>

# Introduce packet loss to the AZ-3b side pod
[ec2-user@ip-10-0-0-60 ~]$ kubectl exec -it mynginx-with-tc-deployment-68cb4fff79-tfc8s -- tc qdisc add dev eth0 root netem loss 3%

# (Reference) Command to revert the packet loss setting
[ec2-user@ip-10-0-0-60 ~]$ kubectl exec -it mynginx-with-tc-deployment-68cb4fff79-tfc8s -- tc qdisc del dev eth0 root
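Whether netem is actually in place can be checked from inside the pod; the qdisc listing should include the configured loss rate.

```shell
# Show the qdisc on eth0; with netem configured, the output should include "loss 3%".
kubectl exec -it mynginx-with-tc-deployment-68cb4fff79-tfc8s -- tc qdisc show dev eth0
```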

4.7 Checking the Network Flow Monitor "Monitors"

We check the monitor values during normal operation and after packet loss is introduced to one pod.
Around 21:20 is normal operation, and around 21:40 is traffic with the packet loss in effect.

  • VPC-wide Monitor: Around 21:40, retransmissions are occurring.

  • AZ-3a Monitor: Traffic is half of the VPC-wide total, but since the pod is normal, there are no retransmissions.

  • AZ-3b Monitor: Traffic is half of the VPC-wide total, and around 21:40 a large number of retransmissions occur due to the packet loss in the pod.

  • In this example, Network Flow Monitor lets us narrow down the investigation like this: "Increased retransmissions across the entire VPC" -> "No issue on the AZ-3a side" -> "Retransmissions only on the AZ-3b side" -> "The Network Health Indicator for AZ-3b is Healthy, so it is not an AWS infrastructure issue" -> "Perhaps an anomaly within the user's scope of responsibility, such as the AZ-3b worker node or pod?"

5. Impressions

  • The setup process was extremely easy, simply adding the add-on via the Management Console.
  • This time, we confirmed the metric difference by introducing packet loss to generate retransmissions; going forward, we want to explore how this service can raise our level of monitoring.
