Experimenting with Kubernetes on a 7.6 GB RAM VPS
Running a resource-intensive platform like Kubernetes on small-scale server environments, especially on a Virtual Private Server (VPS) with only 7.6 GB of RAM, often leads to unexpected and frustrating problems. In this post, I'll share my experiences dealing with the "swap fire" issue while setting up Kubernetes on my own VPS and how I overcame it. My goal is to explain with concrete examples what can go wrong in such situations, why it goes wrong, and most importantly, what we need to pay attention to when setting up this kind of infrastructure.
This experiment was purely for learning purposes. While running Kubernetes with such limited resources for production environments might not be very sensible, the knowledge I gained during this process sheds light on memory management and performance issues that can be encountered even in larger systems. This experiment on my own system shows how complex systems can react in unexpected ways and how much they can teach us system administrators.
Diving Deep into Swap Usage
Swap memory is space on disk that the operating system uses as a temporary extension of physical RAM (Random Access Memory) when it runs short. A portion of the data in RAM is moved to disk, allowing more data to fit into RAM. While this sounds like a great solution in theory, disk access is significantly slower than RAM, so increased swap usage leads to severe drops in system performance. Especially in environments where services and containers like Kubernetes consume a lot of memory, excessive swap usage can lead to a complete performance collapse.
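Before digging in, it helps to know the two standard Linux commands for inspecting swap; both ship with any mainstream distribution:

```bash
# List active swap devices/files and their sizes
swapon --show

# Summarize RAM and swap usage in human-readable units
free -h
```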
In my experiment, I started installing Kubernetes on my 7.6 GB RAM VPS after setting up a few basic services. At first, everything seemed to be going well. However, as the services and Kubernetes components started running, I noticed the system slowing down. When I ran the htop command, I saw RAM usage exceeding 90% and swap usage suddenly increasing rapidly. This indicated that the system was practically suffocating.
ℹ️ Why Does Swap Usage Cause Problems?
Because swap memory is orders of magnitude slower than physical RAM, heavy swap usage dramatically increases the system's overall response time (latency). This is an unacceptable situation, especially for real-time or high-performance applications. In distributed systems like Kubernetes, excessive swap usage on one node can cause that node to become unstable and even drop out of the cluster.
Triggers That Ignited the Swap Fire
So, how exactly did this "swap fire" start? Kubernetes, with its control plane components (API server, etcd, scheduler, controller manager) and agents like kubelet, already consumes a certain amount of memory. Additionally, the applications or services we run must also meet their own memory needs. In my case, the few basic services I initially set up (e.g., a monitoring tool and a database), combined with the Kubernetes components I added, quickly filled up the 7.6 GB of physical RAM.
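If you want to see at a glance where the memory is actually going, a plain process listing sorted by memory usage is often enough. This is standard procps tooling, nothing specific to my setup:

```bash
# Show the ten most memory-hungry processes (sorted by %MEM, descending)
ps aux --sort=-%mem | head -n 11
```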
To understand the source of the problem, I carefully examined the output of the dmesg command. In the system logs, I was seeing messages indicating that some processes were being terminated by the "Out Of Memory (OOM) killer" due to low memory. This showed that the system had completely exhausted RAM, and the kernel was forcefully closing the most memory-intensive processes to free up more memory. However, these terminations provided only temporary relief; the system was still becoming dependent on swap space.
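You can confirm OOM kills yourself by searching the kernel log. A minimal check, with the second variant assuming a systemd-based distribution:

```bash
# Scan the kernel ring buffer for OOM-killer activity (-T adds human-readable timestamps)
sudo dmesg -T | grep -i "out of memory\|killed process"

# On systemd-based systems, the persistent journal works too
journalctl -k | grep -i "oom"
```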
Another important point was that Kubernetes itself had a specific memory profile. For example, components like etcd, having a database-like structure, can increase their memory consumption over time. Kubelet also uses memory while monitoring the status of containers on the node. When all these components came together, the 7.6 GB RAM quickly became insufficient.
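To put numbers on the Kubernetes side specifically, two quick checks help. Note that kubectl top requires the metrics-server add-on to be installed, and /var/lib/etcd is the default data directory for a kubeadm-style install, so adjust both for your own setup:

```bash
# Per-pod memory usage in the control plane namespace (needs metrics-server)
kubectl top pods -n kube-system --sort-by=memory

# etcd's on-disk footprint under the default kubeadm layout
sudo du -sh /var/lib/etcd
```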
```
# top output shortly after connecting to the server
top - 14:30:00 up 1 day, 1:15, 1 user, load average: 0.50, 0.60, 0.70
Tasks: 200 total, 1 running, 199 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.0 us, 2.0 sy, 0.0 ni, 93.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7800 total, 150 free, 6500 used, 1150 buff/cache
MiB Swap: 2048 total, 500 free, 1548 used. 1000 avail Mem
```

```
# As memory usage climbs past 83%, swap usage also rises sharply
top - 14:35:00 up 1 day, 1:20, 1 user, load average: 1.20, 1.00, 0.90
Tasks: 210 total, 2 running, 208 sleeping, 0 stopped, 0 zombie
%Cpu(s): 10.0 us, 5.0 sy, 0.0 ni, 85.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7800 total, 50 free, 7200 used, 550 buff/cache
MiB Swap: 2048 total, 50 free, 1998 used. 0 avail Mem   <-- Swap is almost full!
```
The Root Cause: Dependency on Swap
Using swap memory is not inherently "bad." The real problem is when the system is forced to constantly rely on swap space. This leads to a cycle known as "swap thrashing," where the system constantly moves data between disk and memory, and the CPU is heavily occupied with these transfer operations, preventing real work from being done. In systems like Kubernetes, this can cause a node to become unstable and even drop out of the cluster.
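Thrashing is easy to spot once you know where to look. In vmstat, the si (swap-in) and so (swap-out) columns show pages moved per second; sustained non-zero values in both directions at once are the classic signature:

```bash
# Print memory/swap statistics once per second, five times;
# watch the si/so columns under the swap header
vmstat 1 5
```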
In my situation, the value of the swappiness parameter was also a significant factor. swappiness is a parameter that determines how aggressively the kernel will use swap space. Its value ranges from 0 to 100. A value of 0 tells the kernel to use swap only when RAM is completely full, while a value of 100 encourages the kernel to actively use swap space even if RAM is not full. The default value is usually around 60. The high value of this parameter on my VPS caused the system to start using swap even when there was still available space in RAM.
Another indicator of this dependency was the system becoming unresponsive. Even connecting via ssh could take minutes, and commands would not respond. Running tools like htop became difficult because these tools also need memory and CPU to run. This was a complete performance bottleneck.
⚠️ The swappiness Value
You can see the current swappiness value with the command sysctl vm.swappiness. To change it, use a command like sudo sysctl vm.swappiness=10. To make the change permanent, however, you need to add it to the /etc/sysctl.conf file. In production environments, especially if your server has sufficient RAM, lowering the swappiness value (e.g., to 10) generally has a positive impact on performance.
Solution: Increasing Swap Resources and Adjusting Swappiness
The first step was to increase the swap space to alleviate the urgency of the problem. The existing 2 GB of swap space was filling up quickly. I created a new swap file and increased its size. This would allow the system to temporarily hold more memory on disk, but it wouldn't solve the fundamental problem.
The steps to create a new swap file are generally as follows:
1. Create the swap file: allocate space for it using fallocate or dd.

   ```bash
   # Create a new 4 GB swap file
   sudo fallocate -l 4G /swapfile2
   # Or with dd:
   # sudo dd if=/dev/zero of=/swapfile2 bs=1M count=4096
   ```

2. Set file permissions: ensure only the root user can access the swap file.

   ```bash
   sudo chmod 600 /swapfile2
   ```

3. Format it as a swap area: mark the created file as swap.

   ```bash
   sudo mkswap /swapfile2
   ```

4. Activate the swap: add the new swap file to the system.

   ```bash
   sudo swapon /swapfile2
   ```

5. Make it permanent: add an entry to the /etc/fstab file so the swap file is activated automatically on reboot.

   ```bash
   echo '/swapfile2 none swap sw 0 0' | sudo tee -a /etc/fstab
   ```
After completing these steps, you can see that the swap space has increased by running commands like htop or free -h.
```
# Status after increasing swap space
$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7.6G        7.2G        50Mi       100Mi       550Mi       100Mi
Swap:          6.0G        1.5G        4.5G
```
```bash
# Lower swappiness to 10
sudo sysctl vm.swappiness=10

# Make the change permanent
echo 'vm.swappiness = 10' | sudo tee -a /etc/sysctl.conf
```
These steps allowed the system to breathe temporarily. However, the real solution was not just to increase swap space but also to reduce the kernel's swap usage by lowering the swappiness value. Lowering swappiness to 10 reduced the kernel's tendency to resort to swap when there was still free space in RAM. This directed the system to utilize physical RAM more, reducing disk I/O.
Optimizations and Alternatives for Kubernetes
To run Kubernetes smoothly on a small VPS, additional optimizations and different approaches might be necessary. Firstly, some settings can be adjusted to reduce Kubernetes' own memory consumption. For example, adjusting kubelet's memory limits or using lighter alternatives for components like etcd (if possible). However, this is generally a more complex topic, and distributed systems inherently have certain resource requirements.
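As a sketch of what "adjusting kubelet's memory limits" can look like: the KubeletConfiguration API lets you reserve memory for system daemons and set hard eviction thresholds, so kubelet evicts pods before the node itself starts swapping. The values below are purely illustrative guesses for a small node, not recommendations:

```yaml
# Illustrative fragment of a KubeletConfiguration (values are examples only)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:
  memory: "512Mi"            # keep sshd, journald, etc. out of pod memory
kubeReserved:
  memory: "512Mi"            # headroom for kubelet and the container runtime
evictionHard:
  memory.available: "200Mi"  # evict pods before the node runs dry
```

After merging settings like these into your kubelet config (on kubeadm installs it typically lives at /var/lib/kubelet/config.yaml), a sudo systemctl restart kubelet applies them.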
A more practical approach might be to reduce the memory footprint of the applications to be run. For instance, using lightweight container images, fixing memory leaks in applications, or choosing alternative services that consume less memory. However, this would deviate from the purpose of my experiment.
💡 Lightweight Kubernetes Distributions
If your goal is truly to run Kubernetes in low-resource environments, I recommend looking into lighter Kubernetes distributions like K3s or MicroK8s. These distributions offer core Kubernetes functionality while being optimized to use less memory and CPU. For example, K3s can use SQLite instead of etcd or reduce its dependencies.
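For reference, K3s installs with its documented one-line installer; the --disable flags below skip optional bundled components (the Traefik ingress and the service load balancer) to shave off some memory, and are entirely optional:

```bash
# Install K3s as a single-node server, skipping optional components
curl -sfL https://get.k3s.io | sh -s - --disable traefik --disable servicelb
```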
Another strategy would be to step away from Kubernetes entirely, for example managing services with a simpler tool like Docker Compose. For a 7.6 GB RAM VPS, Docker Compose would generally be a much more suitable solution. Although this goes against the purpose of my experiment, it's an alternative that should definitely be considered in real-world scenarios.
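For comparison, here is a minimal, hypothetical docker-compose.yml for a workload like mine (a database plus a monitoring tool). The image choices and the mem_limit values are illustrative, but hard per-container memory caps are exactly what keeps a small host out of swap:

```yaml
# docker-compose.yml - illustrative sketch only
services:
  db:
    image: postgres:16-alpine  # hypothetical database choice
    mem_limit: 512m            # hard cap: the container gets OOM-killed, not the host
  monitoring:
    image: prom/prometheus     # hypothetical monitoring tool
    mem_limit: 256m
```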
Conclusion: Kubernetes Experiment on Small VPSs and Lessons Learned
In conclusion, experimenting with Kubernetes on a 7.6 GB RAM VPS resulted in extensive swap usage and performance issues. This experience once again showed me that Kubernetes has a certain minimum resource requirement, and when these limits are pushed, the system can become unstable. While increasing swap space and lowering swappiness provided temporary solutions, the fundamental problem was insufficient resources.
One of the most important lessons learned from this experiment is that every technology or platform has a "minimum viable resource" requirement. When a powerful and complex system like Kubernetes is used in an environment that lacks sufficient resources to run it, it creates more problems than it solves. If you need to perform container orchestration in a low-resource environment, considering lighter alternatives like K3s or MicroK8s, or simpler tools like Docker Compose, would be much more sensible.
These kinds of experiments are the best way to combine theoretical knowledge with practical experience. The problems I encountered and the solutions I found will allow me to be better prepared for similar situations in the future. It's important to remember that every "fire" is actually a learning opportunity.
As I mentioned in my related post, My VPS Migration Experience, accurate resource estimation is critical when planning infrastructure. This experiment once again highlighted that fact.