Robert Scott

Posted on Sep 14

Kubernetes cluster marathon!

#kubernetes #docker #devops #containers

Setting up a Kubernetes Cluster on Ubuntu 24.04: A Troubleshooting Journey

Or: How I learned that sometimes starting over is the best solution

Setting up Kubernetes should be straightforward, right? Well, as I discovered today, reality has other plans. Here's my troubleshooting journey setting up a two-node Kubernetes cluster on Ubuntu 24.04, complete with all the roadblocks I hit and how to fix them.

The Initial Problem: Package Repository Issues

My first hurdle came immediately when trying to install kubectl and kubelet:

sudo apt install kubectl kubelet kubeadm
# Error: couldn't find the programs kubectl and kubelet

The Fix: Updated Repository URLs

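The issue was that the Kubernetes project moved its packages to the community-owned pkgs.k8s.io repositories. The legacy Google-hosted repositories (apt.kubernetes.io, served from packages.cloud.google.com) were frozen in September 2023 and removed in early 2024, but many tutorials still reference them. Here's the correct way for Ubuntu 24.04: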

# Remove any old repository entries
sudo rm -f /etc/apt/sources.list.d/kubernetes.list
sudo rm -f /etc/apt/keyrings/kubernetes-apt-keyring.gpg

# Add the current official Kubernetes repository
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gpg

# Add the official GPG key (note the updated URL)
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.31/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

# Add the repository
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.31/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

# Update and install
sudo apt-get update
sudo apt-get install -y kubectl kubelet kubeadm
sudo apt-mark hold kubelet kubeadm kubectl
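
If apt still can't find the packages after this, a leftover source entry pointing at the retired repositories may be the cause. Here is a rough sketch for spotting such entries. The function name find_legacy_k8s_sources is made up for illustration, and it assumes the legacy hosts were apt.kubernetes.io and packages.cloud.google.com:

```shell
# Hypothetical helper: list apt source files that still mention the
# legacy Google-hosted Kubernetes repositories.
# $1 is the directory to scan (defaults to the usual apt location).
find_legacy_k8s_sources() {
  dir="${1:-/etc/apt/sources.list.d}"
  # -r: recurse, -l: print file names only, -E: extended regex
  grep -rlE 'apt\.kubernetes\.io|packages\.cloud\.google\.com' "$dir" 2>/dev/null
}

# Example: report stale entries, if any
if stale="$(find_legacy_k8s_sources)"; then
  echo "Stale Kubernetes apt sources:"
  echo "$stale"
else
  echo "No legacy Kubernetes apt sources found"
fi
```

Any file it reports is a candidate for removal before running apt-get update again.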

Problem #2: CRI Socket Confusion

When running kubeadm init, I got:

error: define which one you wish to use by setting crisocket field for kubeadm

This happens when kubeadm finds more than one container runtime socket on the host. On Ubuntu 24.04 it is easy to end up with both Docker and containerd installed.

The Fix: Choose Your Container Runtime

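I chose containerd, which Kubernetes talks to directly over CRI (Docker Engine has needed the separate cri-dockerd adapter since dockershim was removed in v1.24):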

# Install and configure containerd
sudo apt-get install -y containerd

# Generate proper config with CRI enabled
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml

# Make sure CRI plugin isn't disabled
sudo sed -i 's/disabled_plugins = \["cri"\]/disabled_plugins = []/g' /etc/containerd/config.toml

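# Enable at boot and restart so the new config is actually loaded
# (the apt package usually starts containerd already, so "enable --now" alone would not reload it)
sudo systemctl enable containerd
sudo systemctl restart containerd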

# Use explicit CRI socket
sudo kubeadm init --cri-socket unix:///var/run/containerd/containerd.sock
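
To see which runtimes kubeadm is likely to trip over, it can help to check for the well-known CRI socket paths before running kubeadm init. This is only a sketch. The function name and the optional root-prefix argument are illustrative, and the three paths are the conventional ones for containerd, CRI-O, and cri-dockerd, which may differ on a customized install:

```shell
# Hypothetical helper: print every well-known CRI socket path that exists.
# $1 is an optional root prefix, handy for trying this against a scratch directory.
list_cri_sockets() {
  root="${1:-}"
  for sock in \
    /var/run/containerd/containerd.sock \
    /var/run/crio/crio.sock \
    /var/run/cri-dockerd.sock
  do
    # -e keeps the check simple; use -S to insist on a real socket
    if [ -e "${root}${sock}" ]; then
      echo "unix://${sock}"
    fi
  done
}

list_cri_sockets
```

More than one line of output suggests kubeadm will need an explicit --cri-socket.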

Problem #3: The CRI v1 Runtime API Error

Even after installing containerd, I got:

failed to create new CRI runtime service: validate service connection: validate CRI v1 runtime API for endpoint "unix:///var/run/containerd/containerd.sock": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService

The Fix: Proper containerd Configuration

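The containerd config on the machine can have the CRI plugin disabled. This is often the case when containerd comes from Docker's containerd.io package, whose stock config.toml lists "cri" under disabled_plugins. The key is generating a proper config: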

# Stop containerd
sudo systemctl stop containerd

# Remove bad config
sudo rm /etc/containerd/config.toml

# Generate proper config
sudo containerd config default | sudo tee /etc/containerd/config.toml

# Use the systemd cgroup driver to match kubelet's default (Ubuntu 24.04 runs systemd with cgroup v2)
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml

# Check that "cri" is NOT listed under disabled_plugins (this only inspects the file)
grep -A5 -B5 disabled_plugins /etc/containerd/config.toml

# Restart
sudo systemctl start containerd

Test with:

sudo crictl version
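
The "unknown service runtime.v1.RuntimeService" error generally means containerd is running without its CRI plugin, so it is worth inspecting the config file directly as well. The helper below is a sketch (the name check_containerd_config is made up). It assumes the TOML layout produced by containerd config default on containerd 1.x, with disabled_plugins and SystemdCgroup each on a single line:

```shell
# Hypothetical helper: report the two config settings that matter for kubeadm.
# $1 is the config file to inspect (defaults to the standard location).
check_containerd_config() {
  cfg="${1:-/etc/containerd/config.toml}"
  if [ ! -r "$cfg" ]; then
    echo "cannot read $cfg"
    return 1
  fi
  # The CRI plugin is off if "cri" appears in the disabled_plugins list
  if grep -Eq '^[[:space:]]*disabled_plugins[[:space:]]*=.*"cri"' "$cfg"; then
    echo "cri: disabled"
  else
    echo "cri: enabled"
  fi
  # kubelet defaults to the systemd cgroup driver, so containerd should match
  if grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$cfg"; then
    echo "SystemdCgroup: true"
  else
    echo "SystemdCgroup: not set to true"
  fi
}

check_containerd_config || true
```

Remember that containerd only picks up config changes after a restart.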

Problem #4: Hostname Resolution Issues

During kubeadm init, I got errors about kubelet not being able to reach the hostname. Ubuntu 24.04 sets up hostname resolution with:

127.0.0.1 for localhost
127.0.1.1 for your hostname

But 127.0.1.1 is a loopback address, so it isn't reachable from other machines!
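
To check whether a machine is affected, you can look at which names /etc/hosts maps to 127.0.1.1. A small sketch, assuming the usual whitespace-separated hosts format (the function name is illustrative):

```shell
# Hypothetical helper: print the hostnames that a hosts file maps to 127.0.1.1.
# $1 is the hosts file to read (defaults to /etc/hosts).
loopback_hostnames() {
  hosts_file="${1:-/etc/hosts}"
  awk '$1 == "127.0.1.1" {
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^#/) break   # stop at a trailing comment
      print $i
    }
  }' "$hosts_file"
}

loopback_hostnames
```

If the node's own hostname shows up in the output, that is the entry the fix below replaces.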

The Fix: Use Your Real Network IP

# Find your actual network IP
ip route get 8.8.8.8 | grep -oP 'src \K\S+'

# Update /etc/hosts to use the real IP instead of 127.0.1.1
# (replace 192.168.1.244 with the address printed above)
sudo sed -i 's/^127\.0\.1\.1/192.168.1.244/' /etc/hosts

# Initialize with your real IP
sudo kubeadm init --cri-socket unix:///var/run/containerd/containerd.sock --apiserver-advertise-address=192.168.1.244
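
The error in Problem #2 mentions a crisocket field, which refers to kubeadm's configuration file. As an alternative to the flags, the same two settings can live in a file. The following is a sketch, assuming the kubeadm.k8s.io/v1beta4 config API that ships with v1.31 (older releases use v1beta3). The file name and the IP are placeholders taken from this post:

```shell
# Write a minimal kubeadm config equivalent to the flags used above.
# CONFIG_FILE and ADVERTISE_IP are illustrative; adjust both.
CONFIG_FILE="${CONFIG_FILE:-kubeadm-config.yaml}"
ADVERTISE_IP="${ADVERTISE_IP:-192.168.1.244}"

cat > "$CONFIG_FILE" <<EOF
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
localAPIEndpoint:
  advertiseAddress: ${ADVERTISE_IP}
  bindPort: 6443
EOF

echo "Wrote $CONFIG_FILE"
# Then, on the control plane:
#   sudo kubeadm init --config "$CONFIG_FILE"
```

Keeping the settings in a file makes it easier to repeat the exact same init after a reset.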

Problem #5: Port Already in Use

Even after kubeadm reset, I kept getting "port 6443 is in use" errors.

The Fix: Thorough Cleanup

# Reset with CRI socket specified
sudo kubeadm reset --force --cri-socket unix:///var/run/containerd/containerd.sock

# Clean up everything
sudo rm -rf /etc/kubernetes/
sudo rm -rf /var/lib/etcd/
sudo rm -rf /var/lib/kubelet/
sudo rm -rf ~/.kube/
sudo rm -rf /etc/cni/net.d/

# Reset iptables (this flushes ALL rules in these tables, not just the ones Kubernetes added)
sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X

# Kill any hanging processes
sudo pkill -f kube-apiserver
sudo pkill -f etcd
sudo pkill -f kubelet

# Restart services (kubelet will crash-loop until the next kubeadm init; that is expected)
sudo systemctl restart containerd
sudo systemctl restart kubelet
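
Before re-running kubeadm init, it is worth confirming that nothing is still listening on 6443. A minimal sketch, assuming ss from iproute2 is available. The helper name is made up, and all it does is filter ss -ltnH output by local port:

```shell
# Hypothetical helper: read `ss -ltnH` output on stdin and print the
# lines whose local address ends in the given port.
listeners_on_port() {
  awk -v port="$1" '{
    n = split($4, parts, ":")      # $4 is "Local Address:Port"
    if (parts[n] == port) print
  }'
}

# Usage on a real host (-p adds the owning process and needs root):
#   sudo ss -ltnpH | listeners_on_port 6443
```

No output means the port is free. Otherwise the process column shows what to stop.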

Problem #6: Worker Node Network Connectivity

When trying to join my worker node, I got:

error execution phase preflight: couldn't validate the identity of the API server - failed to request cluster info configmap: the client timed out waiting for headers

The worker node simply couldn't reach the control plane, even though I was using the correct IP address.

The Root Cause: Network Complexity

This is where things got complicated. I had:

Tailscale running on the control plane but not the worker
Potential firewall issues
VM networking complications
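
One way to narrow down which of these is responsible is to test plain TCP reachability of the API server port from the worker before retrying kubeadm join. The helper below is a sketch with a made-up name. It relies on bash's /dev/tcp feature and on timeout from coreutils, so it assumes both are installed:

```shell
# Hypothetical helper: succeed if a TCP connection to host:port opens
# within 3 seconds. Arguments: $1 = host, $2 = port.
can_reach() {
  timeout 3 bash -c ': < "/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example, run from the worker (the IP is the control plane address used earlier):
#   can_reach 192.168.1.244 6443 && echo "API server port reachable" \
#     || echo "blocked or unreachable"
```

If this fails while ping works, a firewall or a mismatch between the Tailscale and LAN interfaces is a likely suspect.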

The Final Solution: Sometimes Starting Over Is Best

After hours of debugging network connectivity, container runtime conflicts, and configuration issues, I realized something important: it's okay to start over.

Instead of continuing to debug a complex setup with multiple moving parts, the better approach was:

Start with a clean VM
Set up Tailscale first (before any Kubernetes components)
Use a single container runtime from the beginning
Use Tailscale IPs for all cluster communication

This eliminates:

Network routing issues
Firewall complications
IP address confusion
Container runtime conflicts
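
In practice, "use Tailscale IPs" means feeding the node's Tailscale address into kubeadm instead of the LAN one. As a sketch, here is a sanity check that an address really is a Tailscale one, assuming Tailscale's default 100.64.0.0/10 range (the function name is illustrative). Depending on the setup, kubelet may also need to be told to register with that address, for example through its --node-ip flag. That part is not covered here.

```shell
# Hypothetical helper: succeed if the IPv4 address falls in 100.64.0.0/10
# (first octet 100, second octet 64-127). It does not validate the rest.
is_tailscale_ip() {
  case "$1" in
    100.6[4-9].*.*|100.[7-9][0-9].*.*|100.1[01][0-9].*.*|100.12[0-7].*.*) return 0 ;;
    *) return 1 ;;
  esac
}

# On a host with Tailscale up:
#   TS_IP="$(tailscale ip -4)"
#   is_tailscale_ip "$TS_IP" && sudo kubeadm init \
#     --cri-socket unix:///var/run/containerd/containerd.sock \
#     --apiserver-advertise-address="$TS_IP"
```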

Key Takeaways

The Kubernetes packages moved off the Google-hosted repositories, which are now gone - use the new pkgs.k8s.io URLs
On Ubuntu 24.04, containerd needs SystemdCgroup = true so it matches kubelet's systemd cgroup driver
Always specify the CRI socket when you have multiple container runtimes
Use your real network IP, not 127.0.1.1 for multi-node clusters
Thorough cleanup is essential when resetting kubeadm
Network connectivity issues are the hardest to debug - consider putting every node on a mesh VPN like Tailscale from the start
Starting over with a plan beats fixing a messy setup

The Most Important Lesson

Don't feel bad about starting over! Kubernetes has a steep learning curve, and networking issues can be genuinely tricky even for experienced developers. Sometimes the fastest path to success is a clean slate with lessons learned.

Getting the control plane running (which I did!) is actually the hardest part. The worker node join should be straightforward once the networking is sorted out properly.

Have you faced similar Kubernetes setup challenges? What was your biggest hurdle? Share your experiences in the comments!

#kubernetes #ubuntu #devops #troubleshooting #containerization #networking
