Editor's note: This was originally much more ambitious, but for various reasons detailed below it had to be scaled back a bit. It's still pretty cool.
The final k8s cluster project is here and the thing I built to visualize it is here, but read on for the story of how I got there!
So I got hooked on the kubernetes-the-hard-way walkthroughs a couple of years back and ran through them on several different public cloud providers. Since iximiuz Labs is my current favorite place to play, I thought I'd give it a go there as well.
It turns out that somebody else already did this, and it's way better than I would have done anyway, so I guess I might as well just forget about it.
...or maybe not!!!!
What if, I thought to myself...just what IF, instead of the traditional 1-3 controller nodes and 2-3 worker nodes (numbers which, I assume, are kept small in the typical k8s the hard way setup due to cost), I networked together not only more worker nodes, but different networks altogether?!?!
The canonical walkthrough uses only 4 machines (1 controller and 3 workers) all in the same subnet (this, as we shall see, MASSIVELY simplifies the networking). I decided that since I had so much fun playing with Tailscale in a previous post, I would use it a bit more to connect the various subnets across multiple iximiuz Labs playgrounds.
So the original idea was to do all the inter-node routing via Tailscale, so that each controller and worker node could address the others by hostname, which would map to a Tailscale overlay IP address (e.g., 100.101.20.144). So even though we were wiring together a bunch of different playgrounds/subnets, all of the VMs (and hence also the eventual pods) would think they were on a flat network.
My Kingdom for Some Cool Viz
Then I decided that a big ole kubernetes cluster is pretty cool all by itself, but it's a bit difficult to get a good picture of exactly HOW it differs from a regular one (i.e., all the other k8s the hard way walkthroughs).
So I thought: maybe we could have some sort of visual. Even better would be a live traffic graph UI that made it really obvious when we scaled a deployment up or down, or when a traffic generator hit one of the services with a ton of requests, so you could actually see the difference between a 3-worker cluster and a 9-worker cluster.
And so after a bit of research, it seemed like the easiest way to do this would be with some sort of service mesh: the pods don't care about it, and there's usually baked-in telemetry, so there'd be a ready data source I could consume.
The ones I was looking at were Istio/Envoy/Cilium, and since the iximiuz Labs VMs max out at 4 CPUs and 8 GiB of RAM, I was trying to minimize the resource usage of the service mesh. Of those, not only is Cilium the clear winner on footprint, but it uses eBPF, which I thought was really cool and wanted to play with more.
Okay, so I decided on Cilium, now what?
I started looking around for traffic visualization projects using Cilium (more specifically, Hubble, which is Cilium's observability platform). The default view of hubble-ui, the UI that ships with Hubble, just shows static traffic routes.
Basically nothing I found showed live traffic. There was a super old Netflix project, but it isn't maintained anymore.
Hmmmm...there's nothing that does what I need, and I've been spending the last year or so getting acquainted with agentic workflows...
Let's Just Make The Thing We Need
Okay, so I need something running in the cluster that will grab data from Hubble, and I need a frontend that will open up and display it in a browser tab. Let's mix some Go and React and do the damned thing!
I used codex for this, and I was actually surprised at how far it got on the first attempt. Most of the prompting work was getting a local dev example up and running with sample data so I could see what it looked like and iterate on stuff...
- no, those graph edges need to be thinner
- ...thinner!
- alright, and stop resetting everything every 2 seconds, I said DECOUPLE the frontend and backend!
- WHY DID YOU SUGGEST WEBSOCKETS AGAIN?!?!! THIS COULD EASILY BE SERVER SENT EVENTS!!!! (to be fair, it probably doesn't matter that much in this case, since it's just a throwaway demo, but the agent always starts out with websockets even when it's definitely a worse choice than SSE)
- that's okay, but we need to auto-zoom when the topology changes
- forget everything I just said, we need to also cater for a mobile UI (I'm obsessed with running iximiuz Labs stuff on my phone, so this is a must for any of my projects)
...eventually, I had something that looked the way I wanted, and I was ready to get it showing some actual live data.
My MVP Meets Live Cluster Reality
Okay, so this is where things went a bit pear-shaped, and where I spent probably 75% of the total project time.
To start with, actually running a service inside kubernetes was going to require all kinds of extra stuff like RBAC, network policies, and pod specs for the deployment. No biggie, I've got an Intern to do that.
I'll also admit to heavy utilization of the scripts in the project, since there's no way I was gonna "hard way" build it from scratch every day. (The playgrounds have a max lifetime of 8 hours; since it's a group of them, I suppose we could have saved VM state and restarted them, but I also wanted to be sure the project would always come up from zero, so I went with the scripts.) And that's when I started to see some really weird failures and intermittent behavior:
- Everything would come up, but no data was coming through hubble-gazer
- I could see initial data in hubble-gazer, but then every 2 seconds (the same interval as the SSE pushes to the frontend) there was less and less of it, until it disappeared completely
- Sometimes the scripts would just choke and things wouldn't start up
The short answer to "why?" is that not only was I running an overlay network (Cilium's VXLAN tunnels) wrapped in another overlay network (Tailscale), but the topology had 5 different subnets and 12 kubernetes nodes all needing to reach each other.
We also ended up in a situation where some of the playgrounds had the same public IP address but sat on different private subnets, so Tailscale had to fall back to a relayed (DERP) connection, which still works but introduces latency.
So things I had initially wanted to try, like HPA scaling based on CPU from the metrics server to show a cool demo of dynamic pod growth, just wouldn't work. I couldn't even get KEDA scaling off Redis queues to work, since the jitter and general network conditions just wouldn't support it.
This was all very frustrating and a bit confusing to me, and I'm sure somebody with better networking knowledge could have sorted it out, but I had already learned a bunch, and a deadline (I wanted this done by KubeCon EU 2026) was fast approaching.
Updating the topology so that all of the worker nodes were in the same subnet basically solved everything: inter-node traffic no longer had to transit two layers of tunnelling, and Tailscale was only used for communication between the workers and the control plane.
So we ended up with a much simplified topology, but it also finally worked, and I could finish wiring up hubble-gazer to show the live traffic.
I had been hoping that the KEDA-based scaling I added would be more obvious, but oh well.
The final topology (3 playgrounds, 9 VMs) looks like this:
And here are some tasty gifs of the live traffic being port-forwarded from the jumpbox back to localhost, so it's accessible from the laptop I used to set up the cluster (that's why the browser URL bar says localhost:8888).

Application L4 traffic, also grouped by worker node
Application L7 traffic, as well as DNS queries