Sergei Gannochenko

Originally published at gannochenko.dev

Baremetal Kubernetes with kubeadm in 10 minutes? Easy!

A while ago, being on sick leave, I challenged myself with setting up a baremetal K8s cluster. I chose Kubeadm back then, and so now I am sharing the results.

Yeah, maybe this question has already popped up in your mind:

Why the hell

Well, you may want to consider a bare-metal cluster, because

👉 you are setting up an internal infrastructure for your company, and using cloud providers is not an option,

👉 you wish to get close-to-production experience with your own pet K8s,

👉 you are well-aware of Minikube, MicroK8s and others, but always wanted to have a real multi-node cluster out there,

👉 you don't want to pay for a pre-configured AKS/GKE/EKS/whatever solution just yet,

👉 you want to set everything up from scratch, at least once, out of curiosity,

👉 you have free time :)

Why write another article when there are plenty of them out there already? Well, because most of the existing material is clearly not aimed at beginners.
Sometimes it feels fragmented, ripped out of the context of something bigger, or it simply doesn't cover the entire process.

A few things you should understand before we continue:

☝️ It still won't be a production-ready environment for mature projects! 🚨🚨🚨

☝️ With some extra hours dedicated, though, the cluster can be brought much closer to production readiness.

☝️ Even as-is, it will be good enough for hobby projects, proofs of concept, learning and exploration.

☝️ You will not receive any GUI control panel, only raw kubectl.

Resonates with your needs? Then let's do it!

Nodes

Kubernetes is a cluster, which means that typically more than one machine is involved.
I got myself two VDS (VPS) instances with 2 cores and 4 GB of RAM each (the minimum requirements), running CentOS 7. It was a cheap Russian hosting provider with a data center in Moscow and unlimited bandwidth.
It works just fine for me, but you can pick AWS EC2, DigitalOcean, Vultr, or any other provider.

You may as well get two virtual machines with VirtualBox and Vagrant, but it will not give you the same experience.

  • X.X.X.X - IP address of Node A
  • Y.Y.Y.Y - IP address of Node B

As part of the pre-flight preparations, I run the following commands to make the whole process a bit easier.

On my machine:{% raw %}

~~~bash
printf "\nX.X.X.X k8s-master\nY.Y.Y.Y k8s-node01\n" | sudo tee -a /etc/hosts;
scp ~/.ssh/id_rsa.pub root@X.X.X.X:~
scp ~/.ssh/id_rsa.pub root@Y.Y.Y.Y:~
~~~{% endraw %}

On each node under {% raw %}`root`{% endraw %}:{% raw %}

~~~bash
yum update

# create a non-root user
useradd admin

# enable sudo
usermod -aG wheel admin
passwd admin
passwd -l admin

# enable key-based sign-in
cd /home/admin/
mkdir ./.ssh
chmod 711 ./.ssh
cat ~/id_rsa.pub >> ./.ssh/authorized_keys
chmod 600 ./.ssh/authorized_keys
chown -R admin:admin ./.ssh
~~~{% endraw %}

You may also want to disable password-based SSH authentication, disable root login and switch SSH to a different port to ward off brute-forcers.
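
If you decide to do that, here is a rough sketch of the relevant `/etc/ssh/sshd_config` changes. These are example values, not part of the original setup, so adjust them to your needs:

~~~bash
# disable password auth and root login (example; switching the SSH port additionally
# requires SELinux/firewall adjustments, which are not shown here)
sudo sed -i 's/^#\?PasswordAuthentication .*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo sed -i 's/^#\?PermitRootLogin .*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo systemctl restart sshd   # keep the current session open until you confirm key-based login still works
~~~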

Now I can sign in to the nodes by typing {% raw %}`ssh admin@k8s-node01`{% endraw %} without a password, and I am able to run {% raw %}`sudo`{% endraw %}.

## Common setup

Kubernetes logically consists of two parts: {% raw %}`Control plane`{% endraw %} and {% raw %}`Compute`{% endraw %}.

* {% raw %}`Control plane`{% endraw %} operates the cluster, and it is hosted on so-called {% raw %}`master node(s)`{% endraw %}.
* {% raw %}`Compute`{% endraw %} runs containerized applications and is hosted on {% raw %}`worker nodes`{% endraw %}.

Therefore, I choose {% raw %}`Node A`{% endraw %} to be the {% raw %}`master node`{% endraw %}, and {% raw %}`Node B`{% endraw %} to be the {% raw %}`worker node`{% endraw %}.

I need to run the following set of commands <ins>on each node</ins>:

➡️ Allow nodes to address each other by name:{% raw %}

~~~bash
printf "\nX.X.X.X master\nY.Y.Y.Y node01\n" | sudo tee -a /etc/hosts;
~~~{% endraw %}

Make sure nodes can actually talk to each other:{% raw %}

~~~bash
ping master
ping node01
~~~{% endraw %}

➡️ K8s does not play nicely with SELinux, so I switch it off:{% raw %}

~~~bash
sudo setenforce 0
sudo sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
~~~{% endraw %}

➡️ K8s uses the {% raw %}`br_netfilter`{% endraw %} kernel module for its internal networking, so I load it and enable loading on boot:{% raw %}

~~~bash
sudo modprobe br_netfilter
echo "br_netfilter" | sudo tee -a /etc/modules-load.d/br_netfilter.conf
~~~{% endraw %}
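
To double-check that the module is actually loaded and queued for loading on boot:

~~~bash
lsmod | grep br_netfilter                   # should print the module name
cat /etc/modules-load.d/br_netfilter.conf   # should contain "br_netfilter"
~~~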

➡️ Tell the `bridge module` to use `iptables`:

~~~bash
echo '1' | sudo tee -a /proc/sys/net/bridge/bridge-nf-call-iptables
printf "\nnet.bridge.bridge-nf-call-iptables = 1\nnet.ipv4.ip_forward=1\n" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
~~~

➡️ Disable firewalld 👻 😱

Some of the out-of-the-box firewall rules are set in a way that paralyzes cluster networking. In production, it totally makes sense to find out which rules cause that and fix them. Since I am building a non-production environment, I don't bother investing the time: I just disable {% raw %}`firewalld`{% endraw %}, and by doing so I wipe out all firewall rules.{% raw %}

~~~bash
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo systemctl mask --now firewalld

# making sure there is not a single rule left, and the policy is "ACCEPT" on every chain
sudo iptables -L -v -n
~~~{% endraw %}

➡️ K8s requires the {% raw %}`swap`{% endraw %} partition to be disabled for good:{% raw %}

~~~bash
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab;
~~~{% endraw %}
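
A quick sanity check that swap is really gone:

~~~bash
free -h                 # the "Swap" line should show 0B total
grep swap /etc/fstab    # the swap entry should now be commented out
~~~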

➡️ Install and configure {% raw %}`Docker`{% endraw %}{% raw %}

~~~bash
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install -y docker-ce

# changing docker cgroupdriver to "systemd" 
sudo mkdir /etc/docker
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
EOF
~~~{% endraw %}

➡️ Install {% raw %}`Kubernetes`{% endraw %} and {% raw %}`kubeadm`{% endraw %}{% raw %}

~~~bash
sudo tee /etc/yum.repos.d/kubernetes.repo > /dev/null <<'EOF'
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
        https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF

sudo yum install -y kubelet kubeadm kubectl
~~~{% endraw %}
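
Just to confirm what actually got installed (the exact version will differ depending on when you run this; in my case it was around v1.19.3):

~~~bash
kubeadm version -o short
kubelet --version
kubectl version --client --short
~~~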

➡️ Launch {% raw %}`Docker`{% endraw %} daemon{% raw %}

~~~bash
sudo systemctl start docker
sudo systemctl enable docker
~~~{% endraw %}

➡️ Launch {% raw %}`Kubelet`{% endraw %} daemon

The kubelet daemon is in charge of doing all the K8s-related work on this particular node.{% raw %}

~~~bash
sudo systemctl start kubelet
sudo systemctl enable kubelet
~~~{% endraw %}

Alright, hopefully there were no errors, and we can proceed to the neat stuff.

## Master-specific setup

The commands below should be executed <ins>on the master node only</ins>.

➡️ Let {% raw %}`kubeadm`{% endraw %} do all the heavy lifting:{% raw %}
~~~bash
sudo kubeadm init --apiserver-advertise-address=X.X.X.X --pod-network-cidr=10.244.0.0/16 --apiserver-cert-extra-sans=localhost,127.0.0.1
~~~{% endraw %}

If everything went well, I just need to do a few extra steps.

➡️ Create a configuration file for the current user (admin){% raw %}

~~~bash
mkdir -p ${HOME}/.kube
sudo cp -i /etc/kubernetes/admin.conf ${HOME}/.kube/config
sudo chown $(id -u):$(id -g) ${HOME}/.kube/config
~~~{% endraw %}
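
At this point `kubectl` should already be able to talk to the API server. The master node will most likely report `NotReady` until a pod network is deployed in the next step, and that is expected:

~~~bash
kubectl cluster-info
kubectl get nodes    # the master will likely stay NotReady until the pod network is up
~~~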

➡️ Deploy a pod network

So we have containers. A {% raw %}`container`{% endraw %} is a packaged application written in any language (Node, Java, Go, ...) and running inside the cluster.
A {% raw %}`pod`{% endraw %} is a group of {% raw %}`containers`{% endraw %} which share common resources.
{% raw %}`Pods`{% endraw %} are exposed to the outer world via {% raw %}`services`{% endraw %}.

Yeah, it sounds complicated. But this abstraction is exactly what makes K8s so flexible and technology-agnostic.

The {% raw %}`pod network`{% endraw %}, then, is an implementation of the K8s networking model: it enables pods to talk to each other, to services, and, via services, to the outer world.

There are numerous implementations out there, but I will stick with the de-facto default one, called {% raw %}`flannel`{% endraw %}. It is basic and proven to work.{% raw %}

~~~bash
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
~~~{% endraw %}
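
To verify that flannel came up (with this manifest the pods land in the `kube-system` namespace; newer flannel versions may use a dedicated `kube-flannel` namespace instead):

~~~bash
kubectl get pods -n kube-system -o wide | grep flannel
kubectl get nodes    # once flannel is Running, the master should flip to Ready
~~~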

## Joining the nodes

Hooray! Technically nothing stops us from joining the nodes together now.

On the master node I create a new token for joining the cluster:{% raw %}

~~~bash
sudo kubeadm token create
~~~{% endraw %}

This will get me the token of format {% raw %}`xxxxxx.xxxxxxxxxxxxxxxx`{% endraw %}.
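
Alternatively, `kubeadm` can print the complete join command (token, API server address and CA cert hash included), which saves some copy-pasting:

~~~bash
sudo kubeadm token create --print-join-command
~~~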

## Worker-specific setup

The commands below should be executed <ins>on the worker node(s) only</ins>.

➡️ Add a worker node to the cluster

Remember that token I created? I need it to execute the {% raw %}`join`{% endraw %} command <ins>on each worker node</ins>:{% raw %}

~~~bash
sudo kubeadm join X.X.X.X:6443 --token xxxxxx.xxxxxxxxxxxxxxxx --discovery-token-unsafe-skip-ca-verification
~~~{% endraw %}

From this moment on, there is nothing else to do on the {% raw %}`worker`{% endraw %} node, so I can end that SSH session.

## Checking the status of the cluster

I can now go back to the {% raw %}`master`{% endraw %} node and check how the cluster is doing:{% raw %}

~~~bash
kubectl get nodes
~~~{% endraw %}

If I see something like this...{% raw %}

~~~text
NAME     STATUS   ROLES    AGE    VERSION
node01   Ready    <none>   4h7m   v1.19.3
master   Ready    master   45h    v1.19.3
~~~{% endraw %}

... that means: "Congratulations 🎉🎉🎉 You and I just got our own K8s clusters up and running!" 

Most tutorials usually end here, or maybe they show how to expose an Apache container via a {% raw %}`NodePort`{% endraw %}-type service (making a container accessible directly on the worker node via a random high port, 30000–32767 by default). This is obviously not how it's commonly done IRL, as it defeats one of the key advantages of K8s. So instead, let's go a bit further and set up `ingress`.

➡️ Ingress

Basically, ingress tells the cluster: *«Hey, for this domain, path and protocol, route all requests to service A. For that other domain, path and protocol, route them to service B (and so on)»*. So ingress works as a router for incoming requests.

Sounds quite powerful, doesn't it? I am gonna use {% raw %}`ingress-nginx`{% endraw %}, the de-facto default Kubernetes ingress controller, as it is pretty simple and easy to configure:{% raw %}

~~~bash
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.40.2/deploy/static/provider/baremetal/deploy.yaml
~~~{% endraw %}

Let's check how ingress is doing:

~~~bash
kubectl get pods -n ingress-nginx
~~~

In a few minutes, the status should look somewhat like this:

~~~text
NAME                                        READY   STATUS      RESTARTS   AGE
ingress-nginx-admission-create-7dhcj        0/1     Completed   0          22m
ingress-nginx-admission-patch-kcsvt         0/1     Completed   0          22m
ingress-nginx-controller-785557f9c9-7ldcl   1/1     Running     0          22m
~~~
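
Just to illustrate what an ingress rule looks like, here is a minimal sketch. The host and the `web` service are made up, and since this controller version still works with the older `networking.k8s.io/v1beta1` API, I'm using that form here (on newer clusters you would use `networking.k8s.io/v1`):

~~~yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: web-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
    - host: example.my-domain.dev     # hypothetical domain
      http:
        paths:
          - path: /
            backend:
              serviceName: web        # hypothetical service that must already exist
              servicePort: 80
~~~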

➡️ Load balancer

What ingress does not know, obviously, is how exactly the cluster is connected to the outer world. This task is solved by a `load balancer`, which typically runs outside of the cluster. Every cloud provider implements its own, and the implementations vary from provider to provider. Since I don't have a cloud provider here, I need a substitute.

There is a project called `MetalLB`, which I am going to leverage for this. Please check which version is [the latest](https://metallb.universe.tf/). Also, they sometimes make backward-incompatible changes to the installation instructions, so if something does not work, feel free to [check their docs](https://metallb.universe.tf/installation/)!

~~~bash
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.4/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.4/manifests/metallb.yaml
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
~~~

I also need to provide a config file, otherwise the balancer stays dormant.

~~~bash
tee /tmp/metallb-conf.yml > /dev/null <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
    namespace: metallb-system
    name: config
data:
    config: |
        address-pools:
        - name: default
          protocol: layer2
          addresses:
          - Y.Y.Y.Y
EOF
kubectl apply -f /tmp/metallb-conf.yml;
rm /tmp/metallb-conf.yml;
~~~

To see how the balancer is doing:

~~~bash
kubectl get pods -n metallb-system
~~~

It should look like this:

~~~text
NAME                         READY   STATUS    RESTARTS   AGE
controller-8687cdc65-h6d5m   1/1     Running   0          14s
speaker-b4p95                1/1     Running   0          14s
speaker-xhm72                1/1     Running   0          14s
~~~
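
To actually put the balancer to work, the service that should receive the external address has to be of type `LoadBalancer`. The bare-metal ingress-nginx manifest used above creates a `NodePort` service by default, so one way to wire the two together (a sketch, not the only option) is to patch that service and then check that MetalLB handed out an address from the pool:

~~~bash
kubectl patch svc ingress-nginx-controller -n ingress-nginx -p '{"spec": {"type": "LoadBalancer"}}'
kubectl get svc -n ingress-nginx ingress-nginx-controller   # EXTERNAL-IP should show an address from the pool (Y.Y.Y.Y here)
~~~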

***

Okay, so this is pretty much it. I have set the cluster up, and it is working. In the next article (coming soon) I will deploy infrastructure onto it using Terraform, with actual domains and SSL support! Stay tuned.

## Troubleshooting

### 👉🐛 When joining a new node, the process hangs at the "[preflight] Running pre-flight checks" stage forever.

An endless wait or a timeout error usually comes down to connectivity. Make sure the nodes can ping each other by IP and that the `master` node has only Kubernetes-produced rules in `iptables`.
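
A few quick checks that usually narrow it down (the addresses and the port are the ones used earlier in this article):

~~~bash
# from the worker: is the master reachable, and is the API server port open?
ping X.X.X.X
curl -k https://X.X.X.X:6443/healthz   # any HTTP response at all means the port is reachable

# on the master: inspect the current firewall rules
sudo iptables -L -v -n
~~~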

### 👉🐛 Pods are stuck at "ContainerCreating" step

This problem may have many causes. In general, to see a pod's events and find out where it got stuck, use the `describe pods` command:

~~~bash
kubectl describe pods -n ingress-nginx
~~~

If something goes totally pear-shaped, you may consider running the following set of commands <ins>on each node</ins> to revert the cluster to its pre-`kubeadm init` (`kubeadm join`) state:

~~~bash
sudo kubeadm reset -f
sudo rm -rf /etc/cni/net.d

sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT
sudo iptables -t nat -F
sudo iptables -t mangle -F
sudo iptables -F
sudo iptables -X

sudo systemctl restart docker
sudo systemctl restart kubelet

rm -rf ${HOME}/.kube
~~~

## Useful links

* [kubeadm](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/) - K8s's official bootstrapping tool
* [kops](https://kops.sigs.k8s.io/) - K8s installer on AWS/GCE/OpenStack/Digital Ocean/Spot Ocean
* [kubespray](https://kubespray.io/) - another installer
* [k8s cheatsheet](https://kubernetes.io/docs/reference/kubectl/cheatsheet/)
* [list of pod networks in the alphabetical order](https://kubernetes.io/docs/concepts/cluster-administration/networking/)
* [list of ingress controllers](https://medium.com/flant-com/comparing-ingress-controllers-for-kubernetes-9b397483b46b)

---

Cover image by Toni Vuohelainen at [Flickr](https://www.flickr.com/photos/tonivuohelainen/)