1. Introduction
Amazon CloudWatch Network Flow Monitor, a service that monitors the communication status between resources within AWS, was released in December 2024.
In this post, we walk through the setup procedure and usability of the EKS flavor, where the monitoring agent runs as a DaemonSet.
2. What We Did
- Prepare a barebones EKS cluster.
- Add the Network Flow Monitor (for EKS) add-on to the EKS cluster.
- Configure the Network Flow Monitor "monitors".
- Access an Nginx pod launched inside EKS from an external client and verify that Network Flow Monitor metrics are collected.
- Introduce packet loss to one of the Nginx pods and confirm that the Network Flow Monitor metrics change.
3. Architecture Diagram
(Diagram: an external client reaches two Nginx pods, one per worker node in AZ-3a and AZ-3b, through an internet-facing CLB; the Network Flow Monitor agent runs on each worker node.)
4. Configuration Steps
4.1 Pre-environment Setup
We build a VPC and EKS cluster for this evaluation using the Management Console with mostly default settings (detailed steps omitted; for reference, a rough CLI equivalent is sketched at the end of this subsection).
- The Kubernetes version is 1.33.
- We prepared two t3.medium worker nodes as the node group.
- The "Amazon EKS Pod Identity Agent" add-on was added (required by Network Flow Monitor; it is added automatically by default).
- The pod status after environment construction is as follows:
[ec2-user@ip-10-0-0-60 mysample]$ kubectl get pod -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
external-dns external-dns-754cf78755-ks8nc 1/1 Running 0 19h 10.0.10.131 ip-10-0-10-159.ap-northeast-3.compute.internal <none> <none>
kube-system aws-node-bwshx 2/2 Running 0 19h 10.0.11.7 ip-10-0-11-7.ap-northeast-3.compute.internal <none> <none>
kube-system aws-node-ckvdl 2/2 Running 0 19h 10.0.10.159 ip-10-0-10-159.ap-northeast-3.compute.internal <none> <none>
kube-system coredns-bdbfddcf5-54sbq 1/1 Running 0 19h 10.0.10.13 ip-10-0-10-159.ap-northeast-3.compute.internal <none> <none>
kube-system coredns-bdbfddcf5-zvdr9 1/1 Running 0 19h 10.0.10.115 ip-10-0-10-159.ap-northeast-3.compute.internal <none> <none>
kube-system eks-node-monitoring-agent-ltzc9 1/1 Running 0 19h 10.0.10.159 ip-10-0-10-159.ap-northeast-3.compute.internal <none> <none>
kube-system eks-node-monitoring-agent-nckx7 1/1 Running 0 19h 10.0.11.7 ip-10-0-11-7.ap-northeast-3.compute.internal <none> <none>
kube-system eks-pod-identity-agent-jz6kc 1/1 Running 0 19h 10.0.11.7 ip-10-0-11-7.ap-northeast-3.compute.internal <none> <none>
kube-system eks-pod-identity-agent-khq8q 1/1 Running 0 19h 10.0.10.159 ip-10-0-10-159.ap-northeast-3.compute.internal <none> <none>
kube-system kube-proxy-569w9 1/1 Running 0 19h 10.0.10.159 ip-10-0-10-159.ap-northeast-3.compute.internal <none> <none>
kube-system kube-proxy-cm94v 1/1 Running 0 19h 10.0.11.7 ip-10-0-11-7.ap-northeast-3.compute.internal <none> <none>
kube-system metrics-server-fdccf8449-2b2sj 1/1 Running 0 19h 10.0.10.110 ip-10-0-10-159.ap-northeast-3.compute.internal <none> <none>
kube-system metrics-server-fdccf8449-5584h 1/1 Running 0 19h 10.0.10.55 ip-10-0-10-159.ap-northeast-3.compute.internal <none> <none>
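- For reference, a rough eksctl equivalent of the console-based setup above might look like the sketch below. The cluster and node-group names are our own placeholders, not values from this environment:
# Minimal sketch of an equivalent cluster (names are placeholders)
eksctl create cluster \
  --name nfm-test-cluster \
  --region ap-northeast-3 \
  --version 1.33 \
  --nodegroup-name nfm-test-nodes \
  --node-type t3.medium \
  --nodes 2
# The Pod Identity Agent add-on is normally included by default;
# if it is missing, it can be added explicitly:
eksctl create addon --cluster nfm-test-cluster --name eks-pod-identity-agent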
4.2 Adding the Network Flow Monitor Add-on
We add the Network Flow Monitor add-on using the Management Console; a CLI equivalent is sketched at the end of this subsection. The official procedure is "Install the EKS AWS Network Flow Monitor Agent add-on."
- From the Add-ons section of the constructed EKS cluster, select "Get more add-ons."
- Select the AWS Network Flow Monitor Agent.
- Select "Create Recommended Role" to create the IAM role that will be attached to the Network Flow Monitor Agent pods.
- Create the IAM role with the default settings and configure it as the role attached to the pods.
- After being added as an add-on, confirm that it is running as a DaemonSet:
[ec2-user@ip-10-0-0-60 mysample]$ kubectl get daemonsets -A
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
amazon-network-flow-monitor aws-network-flow-monitor-agent 2 2 2 2 2 kubernetes.io/os=linux 75s
kube-system aws-node 2 2 2 2 2 <none> 19h
kube-system dcgm-server 0 0 0 0 0 kubernetes.io/os=linux 19h
kube-system eks-node-monitoring-agent 2 2 2 2 2 kubernetes.io/os=linux 19h
kube-system eks-pod-identity-agent 2 2 2 2 2 <none> 19h
kube-system kube-proxy 2 2 2 2 2 <none> 19h
[ec2-user@ip-10-0-0-60 mysample]$ kubectl get pod -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
amazon-network-flow-monitor aws-network-flow-monitor-agent-7v24v 1/1 Running 0 64s 10.0.11.7 ip-10-0-11-7.ap-northeast-3.compute.internal <none> <none>
amazon-network-flow-monitor aws-network-flow-monitor-agent-rpqr6 1/1 Running 0 64s 10.0.10.159 ip-10-0-10-159.ap-northeast-3.compute.internal <none> <none>
... (other pods)
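- For reference, a rough CLI equivalent of this subsection might look like the sketch below. The service-account name and role ARN are our own assumptions (the cluster name is the placeholder from 4.1), so verify them against what the console's recommended-role flow actually creates:
# Add the Network Flow Monitor Agent add-on
aws eks create-addon \
  --cluster-name nfm-test-cluster \
  --addon-name aws-network-flow-monitor-agent \
  --region ap-northeast-3
# Attach the IAM role to the agent pods via an EKS Pod Identity association
# (the service-account name below is an assumption; check the DaemonSet spec)
aws eks create-pod-identity-association \
  --cluster-name nfm-test-cluster \
  --namespace amazon-network-flow-monitor \
  --service-account aws-network-flow-monitor-agent-service-account \
  --role-arn arn:aws:iam::xxxxxxxxxxxx:role/NetworkFlowMonitorAgentRole \
  --region ap-northeast-3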
4.3 Configuring the Network Flow Monitor "Monitors"
We create three "monitors": for the entire VPC, for the AZ-3a side of the VPC, and for the AZ-3b side of the VPC.
- From CloudWatch - Flow Monitors, select "Create Monitor."
- Create a monitor targeting the entire EKS VPC.
- Create a monitor selecting the AZ-3a subnet of the EKS VPC (and similarly for AZ-3b).
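- For reference, monitors can also be created from the CLI. Below is a minimal sketch for the VPC-wide monitor with placeholder ARNs; verify the exact parameters, including the required scope ARN, with "aws networkflowmonitor create-monitor help":
# Hypothetical example; all ARNs are placeholders
aws networkflowmonitor create-monitor \
  --monitor-name eks-vpc-monitor \
  --scope-arn arn:aws:networkflowmonitor:ap-northeast-3:xxxxxxxxxxxx:scope/scope-xxxxxxxx \
  --local-resources type=AWS::EC2::VPC,identifier=arn:aws:ec2:ap-northeast-3:xxxxxxxxxxxx:vpc/vpc-xxxxxxxx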
4.4 Preparing Nginx
Preparing Nginx with tc
- To introduce packet loss to a pod later, we prepare an Nginx container image that includes the tc (Traffic Control) command and register it in ECR (a typical build-and-push sequence is sketched after the Dockerfile). The Dockerfile is as follows:
# Use the official Nginx image as the base image
FROM nginx:alpine
# Install the iproute2 package, which includes the tc command
RUN apk update && apk add iproute2
# Start nginx when the container launches
CMD ["nginx", "-g", "daemon off;"]
Deploying Nginx
- We deploy Nginx so that one pod runs on each of the two worker nodes and expose the HTTP port externally. The NET_ADMIN capability is required to run the tc command, and podAntiAffinity prevents both pods from being scheduled onto the same worker node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mynginx-with-tc-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mynginx-with-tc
  template:
    metadata:
      labels:
        app: mynginx-with-tc
    spec:
      containers:
      - name: mynginx-with-tc-container
        image: xxxxxxxxxxxx.dkr.ecr.ap-northeast-3.amazonaws.com/mksamba/mynginx-with-tc-repo:latest
        ports:
        - containerPort: 80
        securityContext:
          capabilities:
            add: ["NET_ADMIN"]
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - mynginx-with-tc
            topologyKey: "kubernetes.io/hostname"
---
apiVersion: v1
kind: Service
metadata:
  name: mynginx-with-tc-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  type: LoadBalancer
  selector:
    app: mynginx-with-tc
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
- Confirm that pods are running on each worker node:
[ec2-user@ip-10-0-0-60 mysample]$ kubectl apply -f mynginx-with-tc.yaml
deployment.apps/mynginx-with-tc-deployment created
service/mynginx-with-tc-service created
[ec2-user@ip-10-0-0-60 mysample]$ kubectl get pod -A -o wide | grep nginx
default mynginx-with-tc-deployment-68cb4fff79-qjw9q 1/1 Running 0 71s 10.0.10.8 ip-10-0-10-159.ap-northeast-3.compute.internal <none> <none>
default mynginx-with-tc-deployment-68cb4fff79-tfc8s 1/1 Running 0 71s 10.0.11.191 ip-10-0-11-7.ap-northeast-3.compute.internal <none> <none>
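- The CLB DNS name used by the client in the next step can be read from the Service's EXTERNAL-IP column:
[ec2-user@ip-10-0-0-60 mysample]$ kubectl get svc mynginx-with-tc-service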
4.5 Accessing Nginx from an External Client
We access Nginx via the CLB (Classic Load Balancer) 10,000 times using curl from the Internet; the CLB distributes the requests across the two pods.
#!/bin/bash
# Specify the target URL
URL="http://xxxxxxxxxx.ap-northeast-3.elb.amazonaws.com"
# Loop
for ((i=1; i<=10000; i++))
do
echo "Request #$i"
curl -o /dev/null -s -w "%{http_code}\n" "$URL"
done
4.6 Introducing Packet Loss to Nginx
We introduce 3% packet loss on only one of the two pods (the one on the AZ-3b side).
# Identify the pod name to configure
[ec2-user@ip-10-0-0-60 ~]$ kubectl get pod -A -o wide |grep mynginx
default mynginx-with-tc-deployment-68cb4fff79-qjw9q 1/1 Running 0 27m 10.0.10.8 ip-10-0-10-159.ap-northeast-3.compute.internal <none> <none>
default mynginx-with-tc-deployment-68cb4fff79-tfc8s 1/1 Running 0 27m 10.0.11.191 ip-10-0-11-7.ap-northeast-3.compute.internal <none> <none>
# Introduce packet loss to the AZ-3b side pod
[ec2-user@ip-10-0-0-60 ~]$ kubectl exec -it mynginx-with-tc-deployment-68cb4fff79-tfc8s -- tc qdisc add dev eth0 root netem loss 3%
# (Reference) Command to revert the packet loss setting
[ec2-user@ip-10-0-0-60 ~]$ kubectl exec -it mynginx-with-tc-deployment-68cb4fff79-tfc8s -- tc qdisc del dev eth0 root
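# (Reference) Command to verify the current netem setting on the pod
[ec2-user@ip-10-0-0-60 ~]$ kubectl exec -it mynginx-with-tc-deployment-68cb4fff79-tfc8s -- tc qdisc show dev eth0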
4.7 Checking the Network Flow Monitor "Monitors"
We check the monitor values during normal operation and when packet loss is introduced to one pod.
Around 21:20 is normal operation, and around 21:40 is the period with packet loss introduced.
- VPC-wide monitor: around 21:40, retransmissions are occurring.
- AZ-3a monitor: traffic is half of the VPC-wide total, and since its pod is healthy, there are no retransmissions.
- AZ-3b monitor: traffic is half of the VPC-wide total, and around 21:40 a large number of retransmissions occur due to the packet loss introduced in its pod.
- In this example, Network Flow Monitor lets us narrow down the investigation like this: "Retransmissions increased across the entire VPC" -> "No issue on the AZ-3a side" -> "Retransmissions only on the AZ-3b side" -> "The network health indicator for AZ-3b is Healthy, so it is not an AWS infrastructure issue" -> "The anomaly is probably within the user's scope of responsibility, such as the AZ-3b worker node or pod."
5. Impressions
- The setup process was extremely easy, simply adding the add-on via the Management Console.
- This time, we confirmed the difference in metrics by introducing packet loss to generate retransmissions; going forward, we want to explore how this service can raise our overall level of monitoring.