Last week, my colleague Marc and I faced a timeout issue in an Istio service mesh. An idle PostgreSQL connection was shut down precisely one hour after it had been opened. During our investigation, I had to capture the network traffic entering and leaving the PostgreSQL client pod.
For this, I used Ksniff. This Kubernetes plugin deploys a statically compiled tcpdump binary inside the pod whose traffic you want to capture, and then streams the captured packets to a Wireshark instance running on your workstation. This plugin is just awesome, thanks a lot to its author Eldad Rudich!
Istio works by injecting an Envoy proxy sidecar container inside every pod, which intercepts inbound and outbound network traffic. So when a process inside your container communicates with an external server, there are in fact two TCP connections: one between the process and Envoy, and one between Envoy and the distant server. However, in Wireshark, I saw only the packets between Envoy and the distant server and the packets from Envoy to the process, but not the packets from the process to Envoy.
In the Wireshark screenshot below,
10.0.9.225 is the IP address of the process,
172.20.94.219 is the IP address of the virtual service the process communicates with, and
10.0.5.234 is the IP address of the distant (real) server backing the virtual service.
I was a bit surprised 🤔 So I looked into how Istio diverts the network traffic through Envoy. It does so by adding iptables REDIRECT rules that send the traffic to port 15001 on localhost, which Envoy is listening on. Indeed, I could see it in Wireshark:
But then, how can Envoy know where to forward the traffic if all the traffic it sees coming in is destined for 127.0.0.1:15001? Everything I knew about IP networking was called into question! However, after some time spent looking for the answer, I finally found it!
When a connection matches an iptables REDIRECT rule, the kernel keeps track of its original destination and exposes it through a socket option named SO_ORIGINAL_DST. Envoy just has to read this option on the accepted socket to know where to forward the traffic.
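To make this more concrete, here is a minimal Python sketch of how a transparent proxy could read the original destination of an accepted IPv4 connection. It only illustrates the mechanism and is not Envoy's actual code. The function names are made up for this example, and the constant value 80 is taken from the Linux header linux/netfilter_ipv4.h, since Python's socket module does not export it.

```python
import socket
import struct
from typing import Tuple

# Not exported by Python's socket module; value from <linux/netfilter_ipv4.h>.
SO_ORIGINAL_DST = 80


def parse_sockaddr_in(raw: bytes) -> Tuple[str, int]:
    """Decode a C `struct sockaddr_in` into an (ip, port) pair."""
    # Layout: 2-byte address family (skipped), then a 2-byte port and a
    # 4-byte IPv4 address, both in network byte order. Trailing padding
    # bytes are ignored by unpack_from.
    port, packed_ip = struct.unpack_from("!2xH4s", raw)
    return socket.inet_ntoa(packed_ip), port


def original_destination(conn: socket.socket) -> Tuple[str, int]:
    """Return the destination the client dialed before REDIRECT rewrote it.

    Hypothetical helper: Linux only, IPv4 only, and only meaningful for a
    connection that was actually redirected by iptables.
    """
    raw = conn.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(raw)
```

A proxy listening on 127.0.0.1:15001 would call original_destination(conn) on each socket returned by accept() and then open its own connection to the address it gets back. For a connection that did not go through a REDIRECT rule, the getsockopt call is expected to fail with an OSError.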
We've seen how network traffic is redirected in an Istio service mesh using iptables REDIRECT rules and the SO_ORIGINAL_DST socket option.
Going back to our original issue, the investigation confirmed that the culprit was Envoy's idle timeout. From what I understand, it should be possible to configure it, but I haven't figured out how to do so yet.