For the past few months I have been observing intermittent network delays in the internal HTTP communication between pods in our AKS clusters. But what do "intermittent" and "slow" mean? A picture is worth a thousand words, so here we go:
You will quickly notice that there are some outliers, ranging from 200-300 ms to 2 seconds!
Upon closer inspection one of the out-of-process pod-to-pod calls is taking too long:
Now, the reasons can be manifold:
- Sender issue (delay in sending)
- Network issue (delay in transmitting)
- Recipient issue (delay in receiving/processing)
From the last screenshot above it seems that the recipient was very fast at processing the request, and the time is lost somewhere in between.
Even though I had the feeling it would not help much, I opened an MS support ticket to check for any network issues in our AKS clusters... The MS support agent advised sniffing the network traffic, and this was a really good hint (I had to brush up my Wireshark knowledge, though).
It turns out it is quite easy to sniff network traffic between pods in a K8s cluster (provided you have admin access to it). MS advised using ksniff, but there are also easier ways, which I only googled later on. So with ksniff you just need to:
1. Install krew (the plugin manager for kubectl): https://krew.sigs.k8s.io/docs/user-guide/setup/install/
2. Install ksniff: https://github.com/eldadru/ksniff#installation
3. Install Wireshark/tshark. Here I ran into a problem: the default Ubuntu repositories contain an old version of Wireshark/tshark, so this GitHub issue comment helped me.
4. Run:

```shell
kubectl sniff POD-NAME -n POD-NAMESPACE-NAME -p
```

(The `-p` argument is important; without it I was getting a "... cannot access '/tmp/static-tcpdump': No such file or directory" error.)
... and voilà, Wireshark opens automatically and network traffic starts flowing in. I put `http` in the filter so that I could see only the HTTP traffic, and started waiting for another occurrence of the problem... which took 7-8 hours. This is what I got (317 ms instead of 2 seconds this time, but it varies):
However, the request that was taking 300+ ms in the Application Insights screenshots took only 1 ms in this trace... and the strange thing is that it started much later: 300 ms later than what I saw in Application Insights...
Removing the `http` filter in Wireshark revealed some interesting DNS communication, with the request-response pair marked in red taking 300 ms!
It turns out we were using fully qualified target service hostnames (a Kubernetes Service, behind which there are one or more pods) in our calling pod configuration, like this:
and somehow AKS or K8s tries multiple DNS lookups by appending .default.svc.cluster.local (or parts of it) again, until at the end it looks up the original hostname, which is of course found immediately... and one of these DNS lookups takes longer from time to time.
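This is the standard search-list expansion done by the resolver inside the pod. As a rough sketch (assuming the typical pod `/etc/resolv.conf` that AKS generates, with `ndots:5` and the default cluster search list for a pod in the `default` namespace — verify the actual values in your own pods), the candidate names are tried in this order:

```python
# Simplified sketch of glibc-style search-domain expansion.
# Assumptions (check your pod's /etc/resolv.conf): ndots:5 and the
# default search list for a pod in the "default" namespace.
NDOTS = 5
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

def candidate_queries(name: str) -> list[str]:
    """Return the DNS names tried, in order, for a given hostname."""
    if name.endswith("."):
        # Trailing dot = absolute name: looked up as-is, search list skipped.
        return [name.rstrip(".")]
    expanded = [f"{name}.{domain}" for domain in SEARCH]
    if name.count(".") >= NDOTS:
        return [name] + expanded   # "enough" dots: literal name tried first
    return expanded + [name]       # few dots: search list tried first

# The fully qualified name has only 4 dots (< ndots:5), so three doomed
# lookups such as
#   myservice.default.svc.cluster.local.default.svc.cluster.local
# are attempted before the name that actually exists:
print(candidate_queries("myservice.default.svc.cluster.local"))

# The short name is resolved by the very first search-list candidate:
print(candidate_queries("myservice"))
```

This matches what the capture showed: the extra lookups usually fail fast, but occasionally one of them stalls for hundreds of milliseconds.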
Solution (at least for now, until MS advises what a better one could be): remove the suffix .default.svc.cluster.local from all target service hostnames in our calling pod configurations. The picture is different now:
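Concretely, the change amounted to something like the following (hypothetical service name and config key, not our actual ones):

```yaml
# Before: the name has 4 dots (< ndots:5), so the resolver walks the
# whole search list before trying the name that actually exists
targetServiceUrl: http://orders-api.default.svc.cluster.local:8080

# After: short name, resolved by the first search-list entry on the first try
targetServiceUrl: http://orders-api:8080
```

In principle a trailing dot (orders-api.default.svc.cluster.local.) would also make the name absolute and bypass the search list entirely, but not every HTTP client or config format accepts it, and we did not go that route.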
Hope the above helps someone avoid this issue!
Top comments (8)
Hello, good afternoon.
Thanks for sharing this knowledge.
I am following the deployment script; however, when running the command against the application pod, the ksniff pod is not created.
I checked on the internet and found a reference saying I should add the following parameter to the command: --socket /run/k3s/containerd/containerd.sock
This caused the ksniff pod to be created, but ksniff cannot mount the volume.
Did you do something different from what is in the post to make the solution work in AKS?
What does kubectl get events say?
I don't remember anymore, but I must have diligently described what I did...
Thanks for answering.
Executing the command without the --socket /run/k3s/containerd/containerd.sock parameter, the error presented is: ctr: container "ksniff"" in namespace "k8s.io": not found
Inserting this parameter, the pod is deployed; however, the volume is not mounted and it reports the error: Unable to attach or mount volumes: unmounted volumes=[container-socket], unattached volumes=[container-socket host default-token-7tfcb]: timed out waiting for the condition
I opened a support request in the Ksniff project, but so far I haven't had any feedback.
I am not sure what command you are sending...
I see in your github issue this:
kubectl sniff -p nginx -n sniff --socket /run/k3s/containerd/containerd.sock -v
Maybe you should try rearranging the parameters, making sure nginx is the pod name (and the pod really exists) and sniff is the namespace name (and it really exists)...
kubectl sniff POD-NAME -n POD-NAMESPACE-NAME -p
Not sure why you would need the --socket stuff ...
We are facing the same issue as well, but we can't remove the .namespace.svc.cluster.local suffix because the dependent service is in a different namespace. Did you get any update from Microsoft, or do you have any other workarounds?
I no longer have an open ticket with MS about this, and I am not waiting on any answer from them. The workaround was enough for us...
Did you try appending the namespace only (i.e. service-name.namespace)?
Glad you liked ksniff!
Yeap, thanks for creating it!