For the past few months I have been observing intermittent network delays in the internal HTTP communication between pods in our AKS clusters. But what do "intermittent" and "slow" mean? A picture is worth a thousand words, so here we go:
You will quickly notice that there are some outliers, ranging from 200-300ms to 2 seconds!
Upon closer inspection, one of the out-of-process pod-to-pod calls is taking too long:
Now, the reasons can be manifold:
- Sender issue (delay in sending)
- Network issue (delay in transmitting)
- Recipient issue (delay in receiving/processing)
From the last screenshot above it seems that the recipient was very fast at processing the request, and the time is lost somewhere in between.
Even though I had the feeling it would not help much, I opened an MS Support ticket to have our AKS clusters checked for network issues. The MS Support agent advised me to sniff the network traffic, and this was a really good hint (I had to brush up my Wireshark knowledge though).
It turns out it is quite easy to sniff network traffic between pods in a K8s cluster (provided you have admin access to it). MS advised using ksniff, but there are also easier ways, which I found only later. So with ksniff you just need to:
- Install krew (plugin installer for kubectl) - https://krew.sigs.k8s.io/docs/user-guide/setup/install/
- Install ksniff - https://github.com/eldadru/ksniff#installation
- Install Wireshark/tshark - here I ran into the problem that the default Ubuntu repositories contain an old version of Wireshark/tshark; a GitHub issue comment helped me get past this
- Run kubectl sniff POD-NAME -n POD-NAMESPACE-NAME -p
(the -p argument is important; without it I was getting a " ... cannot access '/tmp/static-tcpdump': No such file or directory" error)
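Since the problem can take hours to show up, it may be more convenient to write the capture to a file and inspect it afterwards. The following is only a sketch: capture_pod is a hypothetical helper name, the pod, namespace and file names are placeholders, and it assumes the -o (output file) option described in the ksniff README.

```shell
# Hypothetical wrapper around the ksniff invocation described above.
# -p runs ksniff in privileged mode; -o writes the capture to a local
# file instead of streaming it into Wireshark.
capture_pod() {
  pod="$1"; namespace="$2"; outfile="$3"
  kubectl sniff "$pod" -n "$namespace" -p -o "$outfile"
}

# One-time plugin install, once krew itself is set up:
#   kubectl krew install sniff
# Example call (placeholder names):
#   capture_pod my-pod my-namespace capture.pcap
```

The resulting file can then be opened in Wireshark whenever the slow request has been spotted in Application Insights.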
... and voila, Wireshark opens automatically and starts receiving network traffic. I put http in the filter so that I would see only the HTTP traffic, and started waiting for another occurrence of the problem ... which took 7-8 hours. This is what I got (317 ms instead of 2 seconds this time, but it varies):
However, the request that was taking 300+ ms in the Application Insights screenshots was taking only 1 ms in this trace ... and the strange thing is that it was starting much later, 300 ms later than what I saw in Application Insights ...
Removing the http filter in Wireshark revealed some interesting DNS communication, with the request-response pair marked in red taking 300 ms!
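Slow DNS lookups like this one can also be picked out of a saved capture without scrolling through it by hand. The sketch below assumes a capture file (capture.pcap is a placeholder name) and uses Wireshark's dns.time field, which holds the time between a DNS query and its response; slow_dns_filter and slow_dns are hypothetical helper names.

```shell
# Build a display filter that matches DNS responses slower than N seconds.
slow_dns_filter() {
  printf 'dns.flags.response == 1 && dns.time > %s' "$1"
}

# Print relative time, queried name and response time for every slow
# DNS response in a saved capture.
slow_dns() {
  tshark -r "$1" -Y "$(slow_dns_filter "$2")" \
    -T fields -e frame.time_relative -e dns.qry.name -e dns.time
}

# Example call (placeholder file name, 0.1 s threshold):
#   slow_dns capture.pcap 0.1
```

The string printed by slow_dns_filter can also be pasted into the Wireshark filter bar directly.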
It turns out we are using the hostnames of the target services (a Kubernetes service with one or more pods behind it) in our calling pod configuration in their fully qualified form, ending in .default.svc.cluster.local, and somehow AKS or K8s tries multiple DNS lookups by appending .default.svc.cluster.local (or parts of it) again, until at the end it looks up the original hostname, which is of course found immediately ... and one of these DNS lookups takes longer from time to time.
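A likely explanation for the repeated lookups, offered as an assumption rather than something the capture proves: Kubernetes normally gives pods an /etc/resolv.conf with a search list and options ndots:5, and the resolver tries the search domains first for any name containing fewer dots than ndots. A name ending in .default.svc.cluster.local has only four dots. The sketch below imitates that ordering; candidate_names is a hypothetical helper, my-service is a placeholder, and the search list is the typical one for a pod in the default namespace (the real list on AKS may contain more entries).

```shell
# Print, in order, the names a stub resolver would try for a hostname,
# assuming glibc-style search-list handling.
# $1 = hostname, $2 = ndots value, remaining arguments = search domains.
candidate_names() {
  name="$1"; ndots="$2"; shift 2
  case "$name" in
    *.) printf '%s\n' "$name"; return ;;  # trailing dot: no search list
  esac
  # Count the dots in the name.
  only_dots=$(printf '%s' "$name" | tr -cd '.')
  dots=${#only_dots}
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s\n' "$name"                 # enough dots: as-is comes first
  fi
  for domain in "$@"; do
    printf '%s.%s\n' "$name" "$domain"
  done
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s\n' "$name"                 # too few dots: as-is comes last
  fi
}

candidate_names my-service.default.svc.cluster.local 5 \
  default.svc.cluster.local svc.cluster.local cluster.local
```

With these inputs the original hostname is only the fourth name tried, whereas candidate_names my-service 5 with the same search list produces my-service.default.svc.cluster.local as the very first candidate. The actual search list and ndots value of a pod can be read from its /etc/resolv.conf.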
Solution (at least for now, until MS advises on a better one): remove the .default.svc.cluster.local suffix from all target service hostnames in our calling pod configurations. The picture is different now:
Hope the above helps someone avoid this issue!