Manasseh

Setting Up a High Availability Kubernetes Cluster on AWS with Kubeadm, Terraform, and Ansible

Tech Stack

Terraform, Ansible, Docker, cri-dockerd, kubeadm, Kubernetes, Ubuntu, AWS (VPC, EC2, NLB)

This project contains all the automation code required to set up a Kubernetes cluster with kubeadm in an AWS cloud environment.

Infrastructure Provisioning

Terraform is used for all infrastructure provisioning automation.

Kubernetes Cluster Setup

Ansible is used for all server and cluster configuration.

Architecture Diagram

Architecture Diagram

Writeup

GitHub Repo
Step 1: Clone the GitHub Repository and Initialize Terraform
terraform init
This command initializes the working directory, downloading provider plugins and preparing the environment.
terraform validate
This command checks whether your Terraform configuration is syntactically valid and internally consistent.
terraform apply
This command applies the execution plan and deploys your resources.
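
Taken together, the whole step looks roughly like this (the repository URL and directory name are placeholders for the repo linked above; adjust the path to match the actual layout):

git clone <github-repo-url>     # the repo linked above
cd <repo-directory>             # or the subdirectory containing the .tf files
terraform init                  # download providers, prepare the working directory
terraform validate              # check syntax and internal consistency
terraform apply                 # review the plan, then confirm with "yes"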

Initialize Terraform

terraform validate

terraform apply

Step 2: Provisioning of Resources on AWS and Ansible Playbook Configuration

After you run terraform apply and confirm the action with "yes," Terraform creates the specified resources: VPC, subnets, NAT gateway, security groups, load balancer, and EC2 instances (workers, masters, bastion).
Resources being Created

Execution of terraform command

The Ansible playbook successfully gathers system information (facts) from all the EC2 instances (ip-10-0-*) without issues. Fact gathering is a default step that collects information about the managed nodes.

Execution

The following tasks are executed across the EC2 instances (a rough shell equivalent is sketched after the list):

  • Docker Installation: This task reports a changed state, indicating that Docker was successfully installed on the target instances.

  • Install APT Transport HTTPS: Another package installation task required for using APT with HTTPS-based repositories.

  • Install curl: Curl was already installed, so no changes were made, but the task ran successfully on all instances.

  • Get Kubernetes package key: The task retrieves the Kubernetes package signing key, which is necessary for adding the Kubernetes repository.

  • Install Kubernetes repository: The repository is successfully installed, allowing Kubernetes tools to be installed.

  • Kubelet & Kubeadm Installation: These are essential Kubernetes components, and their installation completed successfully.

  • CRI-Dockerd Version: The playbook retrieves the latest version of cri-dockerd, the CRI shim that lets Kubernetes use Docker Engine as its container runtime. The output shows the version as v0.3.15.
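
For reference, a rough shell equivalent of these tasks on a single Ubuntu node is sketched below. The Kubernetes package repository version (v1.30) and the use of docker.io from the Ubuntu archive are illustrative assumptions; the playbook may pin different versions or sources.

# Docker and prerequisite packages
sudo apt-get update
sudo apt-get install -y docker.io apt-transport-https curl

# Kubernetes package signing key and repository (v1.30 is only an example)
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | \
  sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /" | \
  sudo tee /etc/apt/sources.list.d/kubernetes.list

# kubelet and kubeadm, held back so apt upgrades don't move them unexpectedly
sudo apt-get update
sudo apt-get install -y kubelet kubeadm
sudo apt-mark hold kubelet kubeadm

# Ask GitHub for the latest cri-dockerd release tag (the playbook reported v0.3.15)
curl -s https://api.github.com/repos/Mirantis/cri-dockerd/releases/latest | grep '"tag_name"'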

Execution

Execution

Execution

The output shows the following (the underlying kubeadm commands are sketched after the list):

  1. The playbook generates a Master Join command using kubeadm token create.

  2. The same process happens for worker nodes, where a join command is generated and copied locally for use.

  3. kubectl, the Kubernetes command-line tool, is installed on the control plane nodes.

  4. The additional control plane nodes are joined to the cluster using the join command with the token and certificate key.

  5. The worker nodes are similarly joined to the cluster, completing the setup.
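
Under the hood, the join commands the playbook copies around come from kubeadm itself. A minimal sketch of the relevant commands, run on the first control plane node (the exact invocation in the playbook may differ):

# Print a worker join command (includes the token and CA cert hash)
sudo kubeadm token create --print-join-command

# Upload the control-plane certificates and print the certificate key
sudo kubeadm init phase upload-certs --upload-certs

# Additional control planes reuse the worker join command plus these flags:
#   ... --control-plane --certificate-key <certificate-key-from-above>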

Execution

Execution

Play Recap: shows the status for each EC2 instance involved (both control plane and worker nodes). At this point the Kubernetes cluster should be fully set up.

ok: number of tasks that executed successfully.
changed: number of tasks that made changes on the host.
unreachable: 0 indicates all nodes were reachable.
failed: 0 means no tasks failed.
Total resources added: Terraform's apply summary of "49 added" shows that 49 resources were created (EC2 instances, network resources, etc.).

Output: Bastion Host IP: bastion_host_public_ip = "3.235.87.56" indicates the public IP of the bastion host, which you can use to access the cluster.
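
If you need this value again later, it can be re-read from the Terraform state without re-applying:

terraform output bastion_host_public_ip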

output1

Output2

PART TWO:

resource "tls_private_key" "ssh" {
  algorithm = "RSA"
  rsa_bits  = "4096"
}

resource "local_file" "k8_ssh_key" {
    filename = "k8_ssh_key.pem"
    file_permission = "600"
    content  = tls_private_key.ssh.private_key_pem
}

resource "aws_key_pair" "k8_ssh" {
  key_name   = "k8_ssh"
  public_key = tls_private_key.ssh.public_key_openssh
}

The keys.tf file generates an SSH key pair using Terraform's tls_private_key resource, stores the private key locally in a PEM file (k8_ssh_key.pem), and uploads the public key to AWS as a key pair (k8_ssh) for use with the EC2 instances.

Therefore, a PEM file (k8_ssh_key.pem) will be generated in your current working directory with the correct permissions.

Step 1: Connect to the bastion host over SSH.

Step 2: Get the private IPv4 address of one of the control plane nodes and SSH into it from the bastion. An example of both hops is shown below.
Bastion host ip
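
A minimal sketch of both hops, assuming the default ubuntu user on the Ubuntu AMIs; the bastion IP comes from the Terraform output above, and 10.0.1.10 stands in for a control plane node's private IP:

# Workstation -> bastion, using the key generated by keys.tf
ssh -i k8_ssh_key.pem ubuntu@3.235.87.56

# Bastion -> control plane node by its private IP (10.0.1.10 is illustrative)
ssh -i k8_ssh_key.pem ubuntu@10.0.1.10

For the second hop to authenticate, either copy the PEM file to the bastion first or use SSH agent forwarding (ssh -A).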

kubectl get nodes
This command shows the nodes in your Kubernetes cluster. Each node represents a machine (physical or virtual) on which Kubernetes runs your applications.

master get nodes

Troubleshooting

kubectl get pods -A
This shows all the pods running across all namespaces (-A means all namespaces).

kube-flannel pods are crashing and coredns pods are stuck in ContainerCreating state.
not in running state

Since Flannel provides the pod network, it can fail when its network configuration does not match the node network setup.
Step 1: Check the kube-flannel ConfigMap to ensure that the Network CIDR matches the PodCIDR assigned to your nodes.
kubectl get configmap -n kube-flannel kube-flannel-cfg -o yaml
Step 2: Check the node configuration.
kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'

Based on the output from the kubectl get configmap and kubectl get nodes commands, we can see that there is a mismatch between the Flannel network configuration and the PodCIDRs assigned to the nodes.
Step 3: Update the Flannel configuration by editing the ConfigMap.
kubectl edit configmap -n kube-flannel kube-flannel-cfg
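
A sketch of the fix, assuming the nodes' PodCIDRs fall inside 10.244.0.0/16 (kubeadm's common default); use whatever CIDR the previous command actually reported:

# In the ConfigMap's net-conf.json, set Network to the cluster's pod CIDR, e.g.:
#   "Network": "10.244.0.0/16"

# Then restart the Flannel pods so they pick up the corrected config
# (app=flannel is the label used by the upstream Flannel manifest)
kubectl delete pods -n kube-flannel -l app=flannel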

editing the config map

All pods are up and Running

running state

These pods are essential Kubernetes components:

  • coredns: Responsible for DNS resolution within the cluster.
  • kube-apiserver, kube-controller-manager, kube-scheduler: These components are essential to the control plane and are all running without issues, meaning the control plane is functional.
  • kube-proxy: This is a network proxy that maintains network rules on each node. All kube-proxy pods are running successfully.

PART THREE:

High availability is critical in Kubernetes deployments, especially for production environments. By leveraging a load balancer to manage traffic and implementing a multi-AZ architecture, we ensure that the Kubernetes cluster remains resilient, scalable, and secure. This approach minimizes the risk of downtime and ensures that your applications are always available, even in the event of failures.
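
The piece that ties the NLB to the control plane is kubeadm's control-plane endpoint. A minimal sketch of how the first control plane node would be initialized, assuming the NLB forwards TCP 6443 to the masters (the DNS name is a placeholder, and 10.244.0.0/16 is just the common Flannel default):

sudo kubeadm init \
  --control-plane-endpoint "<nlb-dns-name>:6443" \
  --upload-certs \
  --pod-network-cidr 10.244.0.0/16

Every kubeconfig and every join command then points at the load balancer instead of an individual master, so losing one control plane node (or one AZ) does not take the API endpoint offline.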

LOAD BALANCER1

LOAD BALANCER2

LOAD BALANCER3
